Hi Adam,
On 2 Jun 2020, at 02:38, Adam Jaremko adam.jaremko+ipfire@gmail.com wrote:
On Mon, 1 Jun 2020 at 05:36, Michael Tremer michael.tremer@ipfire.org wrote:
Hello Adam,
Thank you for getting in touch.
On 29 May 2020, at 21:18, Adam Jaremko adam.jaremko+ipfire@gmail.com wrote:
I've been doing RPS/RFS in my own shell scripts for some time and have also implemented primitive support in the web UI, but I'm very much a Perl novice since my last real usage was in the 90s.
What did you change in your scripts?
For my basic implementation I've added new parameters to /var/ipfire/ppp/settings called RPS={on|off} and RPS_CPUS=<CPU MASK>, and added a new script to /etc/rc.d/init.d/networking/red.up/ called 01-rps which reads and acts upon those values by setting the following:
/sys/class/net/${RED_DEV}/queues/rx-*/rps_cpus
/sys/class/net/${RED_DEV}/queues/rx-*/rps_flow_cnt
/sys/class/net/${RED_DEV}/queues/tx-*/xps_cpus
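For reference, a minimal sketch of what such a 01-rps hook could look like (the RED_DEV lookup via /var/ipfire/red/iface and the flow count of 4096 are assumptions for illustration, not taken from the actual patch):

#!/bin/sh
# 01-rps - apply RPS/XPS settings to the RED interface (illustrative sketch)

# read RPS and RPS_CPUS from the key=value settings file
eval $(grep -E "^(RPS|RPS_CPUS)=" /var/ipfire/ppp/settings)

# assumption: the RED interface name can be read from here
RED_DEV=$(cat /var/ipfire/red/iface)

if [ "${RPS}" = "on" ] && [ -n "${RPS_CPUS}" ]; then
    for queue in /sys/class/net/${RED_DEV}/queues/rx-*; do
        echo "${RPS_CPUS}" > "${queue}/rps_cpus"
        # example flow count; RFS additionally needs the global
        # net.core.rps_sock_flow_entries sysctl to be set
        echo 4096 > "${queue}/rps_flow_cnt"
    done

    for queue in /sys/class/net/${RED_DEV}/queues/tx-*; do
        echo "${RPS_CPUS}" > "${queue}/xps_cpus"
    done
fi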
This makes sense.
As for the WUI, I modified /srv/web/ipfire/cgi-bin/pppsetup.cgi to get the output from /usr/bin/nproc, display a list of checkboxes (to represent the bitmask), and convert the mask to hex on write. It follows verbatim what is documented at
However, I do not really understand why you want to give the user a choice about this?
Is it not best to always load-balance across all processors? That can be detected automatically and will still work even after the user changes hardware. It will also be zero-configuration in the first place.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/htm...
https://www.suse.com/support/kb/doc/?id=000018430
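Just to illustrate the mask handling (this is the arithmetic in shell terms, not the actual Perl in pppsetup.cgi): with four CPUs, ticking the boxes for CPU0 and CPU2 should end up as the hex mask 5.

# build the hex CPU mask from the ticked checkboxes (illustration only)
mask=0
for cpu in 0 2; do                  # e.g. CPU0 and CPU2 selected
    mask=$(( mask | (1 << cpu) ))
done
printf "%x\n" "${mask}"             # prints 5, stored as RPS_CPUS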
I understand it's a little more involved with NUMA systems, which I don't address, but I've found a script or two in my research, such as
https://stackoverflow.com/questions/30618524/setting-receive-packet-steering...
First and foremost, would it be relevant to add such support in an official capacity?
Generally I would say yes. Does it come with any downsides?
I am not aware that anyone ran into resource issues here, because PPP connections are usually rather slow (up to 100 MBit/s). But we would probably gain some throughput by better utilisation of the load-balancing on the IPS, etc.
I generalized to PPP, but in my case it's PPPoE on symmetrical gigabit, and any form of encapsulation is often not handled by RSS algorithms, with the exception of VLANs (PPPoE via VLAN is part of that exception). And as I'm sure you're aware, network processing was being delegated to CPU0 only.
Yes, that can happen. I would argue that the network interfaces make a difference here, but generally CPU0 will become a bottleneck. Normally that does not matter too much since, as I said, bandwidth over PPP sessions is normally not high enough to even saturate a moderate processor. But encapsulation is expensive and a Gigabit is a lot of traffic to push!
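As a quick sanity check for that (purely diagnostic, not part of any patch), the per-CPU counters in /proc/net/softnet_stat show whether the work really lands on CPU0 only; the file has one line per CPU and the first column is the number of processed packets in hex:

i=0
while read processed dropped rest; do
    printf "cpu%d: %d packets processed\n" "${i}" "0x${processed}"
    i=$(( i + 1 ))
done < /proc/net/softnet_stat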
Second, are there any guidelines for accessing the filesystem from the scripts? I ask because my not-so-thorough browsing of the code only revealed the use of readhash, whereas I would need to poke and prod at procfs to work with complex CPU affinity per queue.
What are you doing here? Are you assigning a processor to a queue using that script?
I just realized I said procfs where I meant sysfs. The basic implementation is explained above, but in my own scripts I am assigning a single processor to each rx/tx queue and IRQ, which is where I would like to move the WUI part forward. I would like to represent each queue with its own CPU bitmask.
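Roughly what that per-queue assignment amounts to, as a sketch (the interface name and the simple round-robin over nproc are placeholders for illustration, not your actual script):

#!/bin/sh
# spread the rx queues and IRQs of one NIC across CPUs, one CPU each (sketch)
DEV="red0"                      # placeholder interface name
NPROC=$(nproc)
cpu=0

for rxq in /sys/class/net/${DEV}/queues/rx-*; do
    printf "%x" $(( 1 << cpu )) > "${rxq}/rps_cpus"   # single-CPU mask per queue
    cpu=$(( (cpu + 1) % NPROC ))
done

# pin the NIC's IRQs the same way (IRQ numbers taken from /proc/interrupts)
for irq in $(awk -v dev="${DEV}" '$NF ~ dev { sub(":", "", $1); print $1 }' /proc/interrupts); do
    printf "%x" $(( 1 << cpu )) > "/proc/irq/${irq}/smp_affinity"
    cpu=$(( (cpu + 1) % NPROC ))
done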
We have some code in IPFire 3 that potentially automatically does what you want to do:
https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.inter...
The main part is here which is called when a new network interface is plugged in (it is all hotplugging here):
https://git.ipfire.org/?p=network.git;a=blob;f=src/functions/functions.devic...
It will then search for a processor that is not very busy and assign all queues of the NIC to the least busy processor core, least busy being the one with the fewest queues already assigned.
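In shell terms the idea boils down to something like this (a paraphrase for illustration, not the actual code from functions.interrupts):

# pick the CPU that currently has the fewest rx queues steered to it (sketch)
least_busy_cpu() {
    local best_cpu=0 best_count=-1 cpu count mask f

    for cpu in $(seq 0 $(( $(nproc) - 1 ))); do
        count=0
        for f in /sys/class/net/*/queues/rx-*/rps_cpus; do
            mask=$(( 0x$(tr -d ',' < "${f}") ))
            [ $(( (mask >> cpu) & 1 )) -eq 1 ] && count=$(( count + 1 ))
        done
        if [ "${best_count}" -lt 0 ] || [ "${count}" -lt "${best_count}" ]; then
            best_cpu=${cpu}
            best_count=${count}
        fi
    done

    echo "${best_cpu}"
}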
Can you re-use some of this code?
-Michael
I can submit a review patch of my primitive implementation (non-NUMA), in which the same CPU affinity is used across each queue, using nproc and some checkboxes, just to get feedback and a start in the right direction.
Thanks, AJ
-Michael