From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: Advice with adding RPS/RFS for PPP connections
Date: Tue, 02 Jun 2020 08:52:36 +0100
Message-ID: <220C4081-4ED0-41F8-B1E1-E6D643388C49@ipfire.org>

Hi Adam,

> On 2 Jun 2020, at 02:38, Adam Jaremko wrote:
>
>> On Mon, 1 Jun 2020 at 05:36, Michael Tremer wrote:
>>
>> Hello Adam,
>>
>> Thank you for getting in touch.
>>
>>> On 29 May 2020, at 21:18, Adam Jaremko wrote:
>>>
>>> I've been doing RPS/RFS in my own shell scripts for some time and also
>>> implemented primitive support into the web UI, but I'm very much a Perl
>>> novice since my last real usage was in the 90s.
>>
>> What did you change in your scripts?
>
> For my basic implementation I've added new parameters to
> /var/ipfire/ppp/settings called RPS={on|off} and RPS_CPUS=.
> I added a new script to /etc/rc.d/init.d/networking/red.up/ called
> 01-rps which reads and acts upon those values by setting the following:
>
> /sys/class/net/${RED_DEV}/queues/rx-*/rps_cpus
> /sys/class/net/${RED_DEV}/queues/rx-*/rps_flow_cnt
> /sys/class/net/${RED_DEV}/queues/tx-*/xps_cpus

This makes sense.

> As for the WUI, I modified /srv/web/ipfire/cgi-bin/pppsetup.cgi to
> get output from /usr/bin/nproc to display a list of checkboxes (to
> represent the bitmask) and convert the mask to hex on write. It follows
> verbatim what is documented at

However, I do not really understand why you want to give the user a choice
about this.

Is it not best to always load-balance across all processors? That can be
automatically detected and will still work even after the user changes
hardware. It will also be zero-configuration in the first place.

> https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-rps
> https://www.suse.com/support/kb/doc/?id=000018430
>
> I understand it's a little more involved with NUMA systems, which I
> don't address, but I've found a script or two in my research, such as
>
> https://stackoverflow.com/questions/30618524/setting-receive-packet-steering-rps-for-32-cores/49544150#49544150
>
>>> First and foremost, would it be relevant to add such support in an
>>> official capacity?
>>
>> Generally I would say yes. Does it come with any downsides?
>>
>> I am not aware that anyone ran into resource issues here, because PPP
>> connections are usually rather slow (up to 100 MBit/s). But we would
>> probably gain some throughput by better utilisation of the load-balancing
>> on the IPS, etc.
>
> I generalized PPP, but in my case it's PPPoE on symmetrical gigabit, and
> any form of encapsulation is often not handled by RSS algorithms, with
> the exception of VLANs (PPPoE via VLAN is part of that exception). And
> as I'm sure you're aware, network processing was being delegated to
> CPU0 only.

Yes, that can happen. I would argue that the network interfaces make a
difference here, but generally CPU0 will become a bottleneck. Normally that
does not matter too much since, as I said, bandwidth over PPP sessions is
usually not high enough to even saturate a moderate processor. But
encapsulation is expensive, and a Gigabit is a lot of traffic to push!
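For what it's worth, such a hook would not need any configuration at all. A
minimal, untested sketch of a zero-configuration 01-rps (assuming the red
device name can be read from /var/ipfire/red/iface, and using the flow count
value from the Red Hat tuning guide you linked) could look like this:

#!/bin/sh
# Untested sketch of a zero-configuration red.up hook (e.g. 01-rps) that
# spreads RPS/RFS and XPS over all online processors.
# Assumption: the red device name is available in /var/ipfire/red/iface.

RED_DEV=$(cat /var/ipfire/red/iface 2>/dev/null)
[ -n "${RED_DEV}" ] || exit 0

# Bitmask covering all online processors (fine for up to 63 cores; larger
# systems need the comma-separated mask format).
MASK=$(printf "%x" $(( (1 << $(nproc)) - 1 )))

# Global RFS flow table, split evenly across the RX queues below.
FLOWS=32768
echo "${FLOWS}" > /proc/sys/net/core/rps_sock_flow_entries

RXQS=$(ls -d "/sys/class/net/${RED_DEV}/queues/rx-"* | wc -l)

for rxq in "/sys/class/net/${RED_DEV}/queues/rx-"*; do
	echo "${MASK}" > "${rxq}/rps_cpus"
	echo $(( FLOWS / RXQS )) > "${rxq}/rps_flow_cnt"
done

for txq in "/sys/class/net/${RED_DEV}/queues/tx-"*; do
	echo "${MASK}" > "${txq}/xps_cpus"
done

That would keep the settings file and the WUI out of the picture entirely.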
>
>>> Second, are there any guidelines for accessing the filesystem from the
>>> scripts? I ask because my (not very thorough) browsing of the code only
>>> revealed the use of readhash, whereas I would need to poke and prod at
>>> procfs to work with complex CPU affinity per queue.
>>
>> What are you doing here? Are you assigning a processor to a queue using
>> that script?
>
> I just realized I said procfs where I meant sysfs. The basic
> implementation is explained above, but in my own scripts I am assigning
> a single processor to each rx/tx queue and IRQ, which is where I would
> like to move the WUI part forward. I would like to represent each queue
> with its own CPU bitmask.

We have some code in IPFire 3 that potentially does automatically what you
want to do:

https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.interrupts;h=83a57b35145888fad075f1e4ea58832c81789967;hb=HEAD

The main part is here, which is called when a new network interface is
plugged in (it is all hotplugging here):

https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.device;hb=ea4abb82bc6e613ddebd6235f792dd5bbbc469c9#l1007

It will then search for a processor that is not very busy and assign all
queues of the NIC to the least busy processor core, "least busy" being the
one with the fewest queues assigned. (A rough standalone sketch of that idea
follows below the quoted text.)

Can you re-use some of this code?

-Michael

>
>>> I can submit a review patch of my primitive implementation (non-NUMA),
>>> wherein the same CPU affinity is used across each queue using nproc and
>>> some checkboxes, just to get feedback and a start in the right direction.
>>>
>>> Thanks,
>>> AJ
>>
>> -Michael
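Here is the sketch mentioned above. It is not the actual IPFire 3 code and
is untested; it approximates "least busy" by counting how many RX queues are
already steered to each processor through their rps_cpus masks (the real
code works on the interrupts), and then pins all queues of the given device
to the least loaded core:

#!/bin/sh
# Standalone, untested sketch of the "pick the least busy processor" idea
# from functions.interrupts, expressed in terms of RPS/XPS masks.

DEVICE="${1}"
[ -n "${DEVICE}" ] || exit 1

least_busy_cpu() {
	local best_cpu=0
	local best_count=-1
	local cpu

	for cpu in $(seq 0 $(( $(nproc) - 1 ))); do
		local mask=$(( 1 << cpu ))
		local count=0
		local file

		# Count how many RX queues across all interfaces already
		# include this processor in their RPS mask.
		for file in /sys/class/net/*/queues/rx-*/rps_cpus; do
			local current="0x$(tr -d ',' < "${file}")"
			if [ $(( current & mask )) -ne 0 ]; then
				count=$(( count + 1 ))
			fi
		done

		if [ ${best_count} -lt 0 ] || [ ${count} -lt ${best_count} ]; then
			best_count=${count}
			best_cpu=${cpu}
		fi
	done

	echo ${best_cpu}
}

CPU=$(least_busy_cpu)
MASK=$(printf "%x" $(( 1 << CPU )))

# Steer all RX and TX queues of the device to that single core.
for queue in "/sys/class/net/${DEVICE}/queues/rx-"*/rps_cpus \
	     "/sys/class/net/${DEVICE}/queues/tx-"*/xps_cpus; do
	echo "${MASK}" > "${queue}"
done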