From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: APU / Suricata Benchmarks
Date: Wed, 27 Mar 2019 09:57:41 +0000
Message-ID: <989322CA-5D95-43C3-A268-BD2FA1829F9B@ipfire.org>
In-Reply-To: <20190326204758.Horde.DnISwUw2OMOxDZiG-LkF5cz@whytea.ipfire-zuhause.de>

Hello Daniel,

Thank you very much for testing Suricata on various hardware.

However, looking at the figures, this does not look right to me.

The SoC in the APU is not very fast. I would indeed expect it to perform poorly because of its small cache sizes: it only has 2 MB of L2 cache, shared by four cores, and each core runs at only 1 GHz. That is not very speedy.

However, we are seeing throughput drop from (let's round it up) 1000 MBit/s to only 30 MBit/s, looking at the downstream alone. That is a loss of 97% of bandwidth; only 3% remains.

If that were the case with loads of rules enabled and loads of decoding happening… well… I would have said that this is simply what the hardware can do.

But when Suricata only gets a copy of each packet and then does almost nothing with it, the impact should not be this severe.

Yesterday evening I changed some options around the queueing, which in my view is the culprit here. As we can see from your CPU stats, user space (i.e. Suricata) is not very busy; it is the kernel that is consuming around 27% of CPU time.

So I enabled an option that ties each queue to a single CPU. That should ensure that the caches remain "hotter".
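The queue-to-CPU pinning described here is what netfilter calls CPU fanout. As a rough illustration only (the chain, queue count and placement below are assumptions, not the exact rules from the commit), an NFQUEUE rule of this shape spreads packets over one queue per core and selects the queue by the CPU that received the packet:

```shell
# Sketch only -- chain, queue numbers and placement are illustrative,
# not necessarily what the IPFire commit does.
#
# Balance packets across four NFQUEUE queues (0-3, one per APU core)
# and let the kernel pick the queue based on the CPU that handled the
# packet, so the verdict runs on the same core and its caches stay hot.
iptables -I FORWARD -j NFQUEUE --queue-balance 0:3 --queue-cpu-fanout
```

Note that --queue-cpu-fanout only takes effect together with --queue-balance; with it, the queue index is derived from the receiving CPU rather than a flow hash.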
https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=4d093b810552339a6a7df774412c8e144f799331

I also enabled CPU affinity in Suricata:

https://git.ipfire.org/?p=ipfire-2.x.git;a=commitdiff;h=35cdc506b06ed2e5fc8f7ad7fe57239eaadbda58

This does much the same: each process is tied to a single processor, making cache misses less likely.

The verdict processes also run at a higher priority now, which might decrease latency.

The nightly build has already run through for x86_64:

https://nightly.ipfire.org/next/2019-03-26%2021:58:01%20+0000-35cdc506/x86_64/

Could you please re-test and report any changes?

Best,
-Michael

> On 26 Mar 2019, at 20:47, Daniel Weismüller wrote:
> 
> Here are the first bidirectional iperf benchmarks with the APU:
> 
> 797/922 MBit/s without Suricata
> 23/68 Suricata, no rules active
> 30/60 Suricata with 1 rule active
> 28/63 Suricata with 7 rules active
> 
> top CPU usage:
> 10%us 27%sy 0%ni 50%id 0%wa 1.5%hi 12%si 0%st
> 
> Wow, this is slower than I imagined.
> Tomorrow I will try better hardware.
> 
> -
> Daniel
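For anyone who wants to try the affinity change on their own installation: the behaviour Michael describes is controlled through the threading section of suricata.yaml. A rough sketch of the relevant knobs follows; the key names come from upstream Suricata documentation, but the concrete CPU numbers, mode and priority are illustrative, not necessarily the committed values:

```yaml
# Illustrative values only -- not necessarily what the commit sets.
threading:
  set-cpu-affinity: yes
  cpu-affinity:
    - management-cpu-set:
        cpu: [ 0 ]          # keep management threads on one core
    - worker-cpu-set:
        cpu: [ "all" ]
        mode: "exclusive"   # pin each worker thread to its own core
        prio:
          default: "high"   # raise scheduling priority to cut latency
```

With mode "exclusive" each worker gets a dedicated core from the set, which is what keeps the per-core caches warm; the prio block is what raises the priority of the packet-processing threads.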