Hello Daniel,
Thank you very much for testing Suricata on various hardware.
However, looking at the figures, this does not look right to me.
The SoC in the APU is not very fast, and I would indeed expect it to perform poorly because of its small caches: it has only 2 MB of L2 cache shared by four cores, and each core runs at only 1 GHz. That is not exactly speedy.
However, looking only at the downstream, throughput drops from (let's round it up) 1000 Mbit/s to only 30 Mbit/s. That is a loss of 97% of the bandwidth; only 3% remains.
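For reference, the arithmetic behind those two percentages, as a tiny sketch using the rounded figures above (the helper name is just for illustration):

```python
def bandwidth_change(before_mbit: float, after_mbit: float) -> tuple[float, float]:
    """Return (fraction retained, fraction lost) when throughput drops
    from before_mbit to after_mbit (both in Mbit/s)."""
    if before_mbit <= 0:
        raise ValueError("baseline throughput must be positive")
    retained = after_mbit / before_mbit
    return retained, 1.0 - retained


# Rounded downstream figures: ~1000 Mbit/s without Suricata, ~30 Mbit/s with it.
retained, lost = bandwidth_change(1000, 30)
print(f"retained {retained:.0%}, lost {lost:.0%}")  # prints: retained 3%, lost 97%
```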
If that happened with loads of rules enabled and loads of decoding going on, I would have said that this is simply what the hardware can do.
But when Suricata only gets a copy of each packet and does almost nothing with it, the drop should not be this severe.
Yesterday evening I changed some options around the queueing, which in my view is the culprit here. As your CPU stats show, user space (i.e. Suricata) is not very busy; it is the kernel that is consuming around 27% of the CPU time.
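Looking at sy alone actually understates the kernel's share: top's hi and si columns (hardware and software interrupt handling) are kernel time too, and receive-side packet processing largely runs in softirq context. A small sketch that splits Daniel's top line into user-space and kernel time (the function name is hypothetical; it accepts both "1,5" and "1.5" as decimals):

```python
import re


def cpu_breakdown(top_line: str) -> dict[str, float]:
    """Parse a top CPU summary such as '10%us 27%sy ... 1,5%hi 12%si'
    into {field: percent}. Accepts ',' or '.' as decimal separator."""
    fields = {}
    for value, name in re.findall(r"([\d.,]+)%(\w+)", top_line):
        fields[name] = float(value.replace(",", "."))
    return fields


# CPU line from Daniel's benchmark mail.
stats = cpu_breakdown("10%us 27%sy 0%ni 50%id 0%wa 1,5%hi 12%si 0%st")
user = stats["us"] + stats["ni"]
# sy = kernel code, hi/si = hardware/software interrupt handling (all in-kernel)
kernel = stats["sy"] + stats["hi"] + stats["si"]
print(f"user space: {user}%, kernel: {kernel}%, idle: {stats['id']}%")
# prints: user space: 10.0%, kernel: 40.5%, idle: 50.0%
```

So roughly four times as much CPU time goes to the kernel as to Suricata itself.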
So I enabled an option that ties each queue to a single CPU. That should ensure that the caches remain “hotter”.
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d093b810552339a6a7d...
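The commit link is truncated here, so the exact option is not visible. Assuming it is the NFQUEUE target's CPU fan-out mode (iptables `--queue-balance` combined with `--queue-cpu-fanout`), the kernel picks the queue from the ID of the CPU handling the packet instead of hashing the flow. A toy model of that mapping (function name and queue layout are hypothetical):

```python
def fanout_queue(cpu_id: int, first_queue: int, num_queues: int) -> int:
    """Model of NFQUEUE CPU fan-out: the packet goes to the queue indexed
    by the ID of the CPU processing it, wrapping around when there are
    more CPUs than queues."""
    if num_queues <= 0:
        raise ValueError("need at least one queue")
    return first_queue + cpu_id % num_queues


# Hypothetical setup: four CPUs (as on the APU) and four queues 0..3.
# Each CPU always feeds the same queue, so whatever consumes that queue
# can be pinned to the same CPU and keep its caches warm.
mapping = {cpu: fanout_queue(cpu, first_queue=0, num_queues=4) for cpu in range(4)}
print(mapping)  # prints: {0: 0, 1: 1, 2: 2, 3: 3}
```

The point of the design is locality: a packet's queue is fixed by where it is already being processed, rather than by a hash that may send it to a queue serviced on another core.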
I also enabled CPU affinity in suricata:
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=35cdc506b06ed2e5fc8f...
This does much the same thing: each process is tied to a single processor, which makes cache misses less likely.
The verdict processes now also have a higher priority, which might reduce latency.
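Suricata applies its CPU affinity itself from its configuration; the snippet below is only a generic Linux illustration of the same idea, pinning the calling process to one CPU with Python's `os.sched_setaffinity` (Linux-only; the helper name is hypothetical):

```python
import os


def pin_to_cpu(cpu: int, pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = calling process) to a single CPU and return the
    resulting affinity mask. Linux-only: these os functions do not exist
    on macOS or Windows."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed CPUs:", allowed)
    # Pin to the first CPU we are allowed to run on; from now on the
    # scheduler will not migrate this process to another core.
    print("now pinned to:", pin_to_cpu(allowed[0]))
```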
The nightly build for x86_64 has already finished:
https://nightly.ipfire.org/next/2019-03-26%2021:58:01%20+0000-35cdc506/x86_6...
Could you please re-test and report any changes?
Best,
-Michael
On 26 Mar 2019, at 20:47, Daniel Weismüller <daniel.weismueller@ipfire.org> wrote:
Here are the first bidirectional iperf benchmarks with the APU:
- without Suricata: 797/922 Mbit/s
- Suricata, no rules active: 23/68 Mbit/s
- Suricata, 1 rule active: 30/60 Mbit/s
- Suricata, 7 rules active: 28/63 Mbit/s
CPU usage (top): 10%us 27%sy 0%ni 50%id 0%wa 1.5%hi 12%si 0%st
Wow, this is slower than I imagined. Tomorrow I will try better hardware.
Daniel