Hello Daniel,

Thank you very much for testing Suricata on various hardware.

However, looking at the figures, this does not look right to me.

The SoC in the APU is not very fast. I guess it will indeed perform very poorly because of its small cache sizes. It only has 2MB of L2 cache that is shared by four cores. Each of the cores is clocked at only 1 GHz. That is not really speedy.

However, we are seeing that throughput is going down from (let’s round it up) 1000 MBit/s to only 30 MBit/s - only looking at the downstream. That is a loss of 97% of bandwidth - or only 3% of bandwidth remains.

If that were the case with loads of rules enabled and loads of decoding happening… well… I would have said that this is basically what the hardware can do. But when suricata only gets a copy of the packet and then does almost nothing with it, the drop should not be this severe.

Yesterday evening, I changed some options around the queueing, which is in my view the culprit here. As we can see from your CPU stats, user space (i.e. suricata) is not very busy. It is the kernel that is consuming around 27% of CPU time.

So I enabled an option that ties each queue to a single CPU. That should ensure that the caches remain “hotter”.

https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d093b810552339a6a7df774412c8e144f799331

I also enabled CPU affinity in suricata:

https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=35cdc506b06ed2e5fc8f7ad7fe57239eaadbda58

This does much the same. Each process is tied to a single processor, which makes cache misses less likely. The verdict processes also have a higher priority now, which might decrease latency.

The nightly build has already run through for x86_64:

https://nightly.ipfire.org/next/2019-03-26%2021:58:01%20+0000-35cdc506/x86_64/

Could you please re-test and report any changes?
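To illustrate what tying a process to a single CPU means in general, here is a minimal Python sketch. It is not how suricata or the commits above implement it; `pin_to_cpu` is a made-up helper name, and it relies on `os.sched_setaffinity`, which is only available on Linux and a few other Unix platforms.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the calling process to one CPU and return that CPU's index.

    Hypothetical helper for illustration only. Returns None on platforms
    that do not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None

    # CPUs the scheduler currently allows this process to run on.
    allowed = os.sched_getaffinity(0)

    if cpu is None:
        # No preference given: take the lowest-numbered allowed CPU.
        cpu = min(allowed)
    elif cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")

    # Restrict the process (pid 0 = ourselves) to exactly this one CPU,
    # so its working set is more likely to stay in that core's cache.
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    print("pinned to CPU:", pin_to_cpu())
```

Once pinned, the process no longer migrates between cores, which is the effect described above as keeping the caches “hotter”.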

Best,
-Michael

> On 26 Mar 2019, at 20:47, Daniel Weismüller wrote:
>
> Here are the first bidirectional iperf benchmarcs with the apu
>
> 797/922 mbit/s without suricata
> 23/68 suricata no rules active
> 30/60 suricata with 1 rule active
> 28/63 suricata with 7 rules active
>
> top cpu usage
> 10%us 27%sy 0%ni 50%id 0%wa 1,5%hi 12%si 0%st
>
> Wow, this is slower than I imagined.
> Tomorrow I try better hardware.
>
> -
> Daniel
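
For reference, the remaining share of bandwidth can be worked out from the quoted iperf figures with a few lines of Python. This is only a sketch: the function names are made up, and since the quote does not say which value of each pair is the downstream, the pairs are kept in the order given.

```python
# Figures copied from the quoted iperf results, in Mbit/s, as the
# (first, second) value of each pair. Which one is "downstream" is an
# open question; the quote does not say.
BASELINE = (797, 922)  # without suricata
RUNS = {
    "no rules active": (23, 68),
    "1 rule active": (30, 60),
    "7 rules active": (28, 63),
}


def remaining_percent(measured, baseline):
    """Return the measured throughput as a percentage of the baseline."""
    return 100.0 * measured / baseline


def summarize(runs=RUNS, baseline=BASELINE):
    """Map each run label to its (first, second) remaining percentages."""
    summary = {}
    for label, pair in runs.items():
        summary[label] = tuple(
            round(remaining_percent(m, b), 1) for m, b in zip(pair, baseline)
        )
    return summary


if __name__ == "__main__":
    for label, (first, second) in summarize().items():
        print(f"{label}: {first}% / {second}% of baseline remains")
```

With the rounded figures used in the reply, `remaining_percent(30, 1000)` gives 3.0, i.e. the 3% mentioned above.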