* APU / Suricata Benchmarks
@ 2019-03-26 20:47 Daniel Weismüller
From: Daniel Weismüller @ 2019-03-26 20:47 UTC (permalink / raw)
To: development
Here are the first bidirectional iperf benchmarks with the APU:

797/922 MBit/s without Suricata
23/68 MBit/s with Suricata, no rules active
30/60 MBit/s with Suricata, 1 rule active
28/63 MBit/s with Suricata, 7 rules active

top CPU usage:
10%us 27%sy 0%ni 50%id 0%wa 1,5%hi 12%si 0%st
Wow, this is slower than I imagined.
Tomorrow I will try better hardware.
-
Daniel
* Re: APU / Suricata Benchmarks
From: Michael Tremer @ 2019-03-27 9:57 UTC (permalink / raw)
To: development
Hello Daniel,
Thank you very much for testing Suricata on various hardware.
However, looking at the figures, this does not look right to me.
The SoC in the APU is not very fast. I guess it will indeed perform very poorly because of its small cache sizes: it only has 2 MB of L2 cache shared by four cores, and each core is clocked at only 1 GHz. That is not very speedy.
However, we are seeing throughput drop from (let’s round it up) 1000 MBit/s to only 30 MBit/s, looking at the downstream alone. That is a loss of 97% of bandwidth; only 3% remains.
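The arithmetic behind that figure, as a quick shell check (using Daniel’s downstream numbers, rounded up to 1000 MBit/s as above):

```shell
# ~1000 MBit/s without Suricata, ~30 MBit/s with it enabled.
# Integer arithmetic is precise enough for whole-percent figures.
baseline=1000
with_suricata=30
lost=$(( (baseline - with_suricata) * 100 / baseline ))
remaining=$(( 100 - lost ))
echo "${lost}% of bandwidth lost, ${remaining}% remains"
```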
If that were the case with loads of rules enabled and lots of decoding happening… well, I would have said that this is simply what the hardware can do.
But when Suricata only receives a copy of each packet and then does almost nothing with it, the impact should not be this severe.
Yesterday evening, I changed some options around the queueing, which is, from my point of view, the culprit here. As your CPU stats show, user space (i.e. Suricata) is not very busy; it is the kernel that is consuming around 27% of CPU time.
So I enabled an option that ties each queue to a single CPU. That should ensure that the caches remain “hotter”.
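For illustration, pinning each receive queue to its own CPU generally looks like the sketch below (an assumption for illustration only, not the contents of the commit; the interface name eth0 and the CPU count are made up):

```shell
# Generic sketch (assumption): give each RX queue IRQ of eth0 its own
# CPU by writing a single-CPU hex mask to its smp_affinity file.
cpus=4
cpu=0
for irq in $(awk -F: '/eth0/ { gsub(/ /, "", $1); print $1 }' /proc/interrupts); do
    mask=$(printf '%x' $(( 1 << cpu )))   # CPU 0 -> 1, CPU 1 -> 2, CPU 2 -> 4, ...
    echo "$mask" > "/proc/irq/$irq/smp_affinity"
    cpu=$(( (cpu + 1) % cpus ))
done
```

With one queue per CPU, packets from the same flow keep hitting the same core, so the relevant cache lines stay resident.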
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d093b810552339a6a7df774412c8e144f799331
I also enabled CPU affinity in suricata:
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=35cdc506b06ed2e5fc8f7ad7fe57239eaadbda58
This does much the same: each process is tied to a single processor, making cache misses less likely.
The verdict processes also have a higher priority now which might decrease latency.
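In suricata.yaml, both the affinity and the priority settings live in the threading section; here is a sketch of what such a configuration can look like (the specific CPU sets and priorities below are assumptions, not what the commit actually sets):

```yaml
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]            # keep housekeeping threads on one core
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"     # pin one worker thread per CPU
        prio:
          default: "high"     # raise worker/verdict thread priority
```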
The nightly build has already run through for x86_64:
https://nightly.ipfire.org/next/2019-03-26%2021:58:01%20+0000-35cdc506/x86_64/
Could you please re-test and report any changes?
Best,
-Michael
> On 26 Mar 2019, at 20:47, Daniel Weismüller <daniel.weismueller(a)ipfire.org> wrote:
>
> Here are the first bidirectional iperf benchmarks with the APU
>
> 797/922 mbit/s without suricata
> 23/68 suricata no rules active
> 30/60 suricata with 1 rule active
> 28/63 suricata with 7 rules active
>
> top cpu usage
> 10%us 27%sy 0%ni 50%id 0%wa 1,5%hi 12%si 0%st
>
> Wow, this is slower than I imagined.
> Tomorrow I try better hardware.
>
> -
> Daniel