Thanks Michael for the input, I feel I'm headed in the right direction now.

> On Tue, 2 Jun 2020 at 03:52, Michael Tremer wrote:
>
> Hi Adam,
>
> > On 2 Jun 2020, at 02:38, Adam Jaremko wrote:
> >
> >> On Mon, 1 Jun 2020 at 05:36, Michael Tremer wrote:
> >>
> >> Hello Adam,
> >>
> >> Thank you for getting in touch.
> >>
> >>> On 29 May 2020, at 21:18, Adam Jaremko wrote:
> >>>
> >>> I've been doing RPS/RFS in my own shell scripts for some time and have also
> >>> implemented primitive support in the web UI, but I'm very much a Perl novice
> >>> since my last real usage was in the '90s.
> >>
> >> What did you change in your scripts?
> >
> > For my basic implementation I've added new parameters to
> > /var/ipfire/ppp/settings called RPS={on|off} and RPS_CPUS=.
> > I added a new script to /etc/rc.d/init.d/networking/red.up/ called
> > 01-rps which reads those values and acts upon them by setting the following:
> >
> > /sys/class/net/${RED_DEV}/queues/rx-*/rps_cpus
> > /sys/class/net/${RED_DEV}/queues/rx-*/rps_flow_cnt
> > /sys/class/net/${RED_DEV}/queues/tx-*/xps_cpus
>
> This makes sense.
>
> > As for the WUI, I modified /srv/web/ipfire/cgi-bin/pppsetup.cgi to
> > take the output of /usr/bin/nproc and display a list of checkboxes (to
> > represent the bitmask), converting the mask to hex on write. It follows
> > what is documented at
>
> However, I do not really understand why you want to give the user a choice about this?
>
> Is it not best to always load-balance across all processors? That can be detected automatically and will still work even after the user changes hardware. It will also be zero-configuration in the first place.

I see your point, and yes, zero configuration makes absolute sense.

> > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/network-rps
> > https://www.suse.com/support/kb/doc/?id=000018430
> >
> > I understand it's a little more involved on NUMA systems, which I
> > don't address, but I've found a script or two in my research, such as
> >
> > https://stackoverflow.com/questions/30618524/setting-receive-packet-steering-rps-for-32-cores/49544150#49544150
> >
> >>> First and foremost, would it be relevant to add such support in an official capacity?
> >>
> >> Generally I would say yes. Does it come with any downsides?
> >>
> >> I am not aware that anyone has run into resource issues here, because PPP connections are usually rather slow (up to 100 MBit/s). But we would probably gain some throughput through better utilisation of the load-balancing in the IPS, etc.
> >
> > I generalized to PPP, but in my case it's PPPoE on symmetrical gigabit, and
> > any form of encapsulation is often not handled by RSS algorithms, with
> > the exception of VLANs (PPPoE via VLAN is part of that exception). And
> > as I'm sure you're aware, network processing was being delegated to
> > CPU0 only.
>
> Yes, that can happen. I would argue that the network interfaces make a difference here, but generally CPU0 will become a bottleneck. Normally that does not matter too much because, as I said, bandwidth over PPP sessions is normally not high enough to saturate even a moderate processor. But encapsulation is expensive, and a Gigabit is a lot of traffic to push!
>
> >
> >>> Second, are there any guidelines for accessing the filesystem from the
> >>> scripts? I ask because my cursory browsing of the code only revealed the
> >>> use of readhash, whereas I would need to poke and prod at procfs to work
> >>> with complex CPU affinity per queue.
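For illustration, a minimal sketch of the kind of sysfs writes the 01-rps hook described above would perform. The settings names RPS and RPS_CPUS come from this thread; how RED_DEV is determined, the sed-based parsing of the settings file, and the flow-count value are assumptions, not shipped IPFire code:

#!/bin/sh
# Sketch of a red.up hook along the lines of the 01-rps script described
# above (assumptions noted; not actual IPFire code).

SETTINGS="/var/ipfire/ppp/settings"

RPS="$(sed -n 's/^RPS=//p' "${SETTINGS}")"
RPS_CPUS="$(sed -n 's/^RPS_CPUS=//p' "${SETTINGS}")"

# Assumed: the RED interface name (e.g. ppp0 for PPPoE) is available to
# the hook; fall back to ppp0 for the sketch.
RED_DEV="${RED_DEV:-ppp0}"

[ "${RPS}" = "on" ] || exit 0

for queue in /sys/class/net/${RED_DEV}/queues/rx-*; do
	# Hex bitmask of CPUs allowed to run receive processing,
	# e.g. "f" = CPU0-CPU3.
	echo "${RPS_CPUS}" > "${queue}/rps_cpus"

	# Per-queue RFS flow table size (example value). RFS also needs the
	# global net.core.rps_sock_flow_entries sysctl to be non-zero.
	echo 4096 > "${queue}/rps_flow_cnt"
done

for queue in /sys/class/net/${RED_DEV}/queues/tx-*; do
	# Mirror the mask for transmit packet steering (XPS).
	echo "${RPS_CPUS}" > "${queue}/xps_cpus"
done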
> >>
> >> What are you doing here? Are you assigning a processor to a queue using that script?
> >
> > I just realized I said procfs where I meant sysfs. The basic
> > implementation is explained above, but in my own scripts I am assigning
> > a single processor to each rx/tx queue and IRQ, which is where I would
> > like to take the WUI part next. I would like to represent each queue
> > with its own CPU bitmask.
>
> We have some code in IPFire 3 that potentially does what you want to do automatically:
>
> https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.interrupts;h=83a57b35145888fad075f1e4ea58832c81789967;hb=HEAD
>
> The main part is here, which is called when a new network interface is plugged in (it is all hotplugging here):
>
> https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.device;hb=ea4abb82bc6e613ddebd6235f792dd5bbbc469c9#l1007
>
> It will then search for a processor that is not very busy and assign all queues of the NIC to the least busy processor core, "least busy" being the one with the fewest queues assigned.
>
> Can you re-use some of this code?

Wonderful to see that IPFire 3 already has it all in place; I'll definitely re-use the code.

> -Michael
>
> >
> >>> I can submit a review patch of my primitive implementation (non-NUMA),
> >>> wherein the same CPU affinity is used across every queue, using nproc
> >>> and some checkboxes, just to get feedback and a start in the right direction.
> >>>
> >>> Thanks,
> >>> AJ
> >>
> >> -Michael

-- 
AJ
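For reference, a rough sketch of the zero-configuration idea behind the IPFire 3 code linked above: pick the processor with the fewest receive queues currently steered to it and steer every queue of a newly plugged-in interface there. This only illustrates the approach, it is not the actual functions.interrupts/functions.device code; counting assignments via rps_cpus masks and the single-64-bit-mask limit are simplifications:

#!/bin/sh
# Assign all queues of a new interface to the least busy CPU,
# "least busy" meaning the CPU with the fewest RX queues steered to it.

DEVICE="${1:?usage: $0 <interface>}"
NUM_CPUS="$(nproc)"

# Return the CPU whose bit is set in the fewest rps_cpus masks system-wide.
least_busy_cpu() {
	best_cpu=0
	best_count=-1

	cpu=0
	while [ "${cpu}" -lt "${NUM_CPUS}" ]; do
		count=0
		for mask_file in /sys/class/net/*/queues/rx-*/rps_cpus; do
			[ -r "${mask_file}" ] || continue
			mask="$(tr -d ',' < "${mask_file}")"
			# Is bit ${cpu} set in this queue's hex mask?
			if [ $(( 0x${mask:-0} >> cpu & 1 )) -eq 1 ]; then
				count=$((count + 1))
			fi
		done

		if [ "${best_count}" -lt 0 ] || [ "${count}" -lt "${best_count}" ]; then
			best_cpu=${cpu}
			best_count=${count}
		fi

		cpu=$((cpu + 1))
	done

	echo "${best_cpu}"
}

cpu="$(least_busy_cpu)"
mask="$(printf "%x" $((1 << cpu)))"

# Steer all RX and TX queues of the new interface to that single core.
for queue in /sys/class/net/${DEVICE}/queues/rx-*; do
	echo "${mask}" > "${queue}/rps_cpus"
done

for queue in /sys/class/net/${DEVICE}/queues/tx-*; do
	echo "${mask}" > "${queue}/xps_cpus"
done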