I had a failure today on my new ipfire installation that didn't survive the same kind of outage that my hand-made Debian-based firewall box had survived many times in the past: a power failure and restoration.
In the community pages, I picked up an existing discussion and discussed the scenario in detail. I won't repeat that discussion here, but it was suggested that I post to this list and work towards an upstream change.
https://community.ipfire.org/t/dhcp-client-on-red0-wont-reassign-ip-upon-rec...
I won't repeat all the details of how I discovered this, but allow me to summarize the small changes I made. (See the community post and followups there for full details.)
Basically, when ipfire boots and DHCP on red doesn't provide an address, dhcpcd times out after 60 seconds and then stops trying and nothing makes it try again. This leaves the green network up (good!) but the red network completely dead until someone reboots ipfire (or takes some other steps that re-trigger a start of dhcpcd).
My simple repair so far has been:
1. Edit /etc/init.d/networking/functions.network to start dhcpcd in the background with no timeout.
--- /root/functions.network.orig 2022-01-08 16:26:02.956856033 -0400
+++ functions.network 2022-01-08 21:07:28.617170885 -0400
@@ -56,7 +56,7 @@
   # This function will start a dhcpcd on a speciefied device.
   local device="$1"
-  local dhcp_start=""
+  local dhcp_start="--timeout 0 --background "
   boot_mesg -n "Starting dhcpcd on the ${device} interface..."
(Be sure to include that last space inside the quotes!)
2. For my testing, I also set ntp's ENABLESETONBOOT in /var/ipfire/time/settings to off (aka “Force setting the system clock on boot”) because it sits in a loop waiting for red0 to come up otherwise!
At the time, I didn't notice that the loop in /etc/init.d/ntp stops after a minute, but nonetheless, it was handy to turn it off while testing :) So, I _think_ only the first change is necessary.
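For anyone who wants to flip ENABLESETONBOOT from a shell while testing, here is a minimal sketch. It assumes the settings file is a plain list of KEY=value lines and that GNU sed (for `-i`) is available. The helper names `get_setting` and `set_setting` and the `OTHER_KEY` entry are made up for illustration, and the demo works on a scratch file rather than the live /var/ipfire/time/settings.

```shell
#!/bin/sh
# Sketch only: read and change one KEY=value entry in a settings-style file.
# get_setting and set_setting are hypothetical helpers, not IPFire functions.

get_setting() {
    # $1: file, $2: key; prints the value from the last matching line
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

set_setting() {
    # $1: file, $2: key, $3: value (the value must not contain "/")
    if grep -q "^$2=" "$1"; then
        sed -i "s/^$2=.*/$2=$3/" "$1"      # rewrite the existing entry
    else
        printf '%s=%s\n' "$2" "$3" >> "$1" # or append a new one
    fi
}

# Demo on a scratch copy; OTHER_KEY is a made-up second entry.
demo=$(mktemp)
printf 'ENABLESETONBOOT=on\nOTHER_KEY=example\n' > "$demo"

set_setting "$demo" ENABLESETONBOOT off
get_setting "$demo" ENABLESETONBOOT
rm -f "$demo"
```

Pointing the two helpers at the real settings file would have the same effect as the manual edit described in step 2.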
All of the testing I did so far seems to indicate that, provided I don't have rules that explicitly mention the red0 IP address, all works well when the lease is acquired, or even when the lease changes the IP unexpectedly.
Is a change like this something that could become part of ipfire?
Thanks for making ipfire! I'm impressed so far.
I've had dhcp do that to me and it doesn't take a power failure but a downed isp will do.
My solution is to have a script run every 5 minutes to try again and after an hour reboot ipfire again.
My desktop just died and I'm rebuilding it but I can post that script if it's helpful.
-------- Original Message --------
From: Brad Spencer spencer@jacknife.org
Sent: Sat Jan 08 20:27:28 EST 2022
To: development@lists.ipfire.org
Subject: Keep trying DHCP on red0 if unavailable at boot
On 1/8/2022 10:03 PM, Jose Dias wrote:
I've had dhcp do that to me and it doesn't take a power failure but a downed isp will do.
My solution is to have a script run every 5 minutes to try again and after an hour reboot ipfire again.
My desktop just died and I'm rebuilding it but I can post that script if it's helpful.
Yes, I agree. In the community post I linked to, I tried to explain the failure in detail.
With my change to its startup arguments, I've been able to demonstrate that even if dhcpcd is unable to obtain an initial DHCP lease at boot, ipfire's boot sequence is not blocked, dhcpcd correctly keeps retrying for longer than 60 seconds, and ipfire reacts correctly when an IP is leased.
If you're interested, you can see my most recent followup in the community: https://community.ipfire.org/t/dhcp-client-on-red0-wont-reassign-ip-upon-reconnection/2455/38?u=spencer
So, I'm going to try this instead of rebooting. The retries are frequent and cheap, and other (local) ipfire services remain available the whole time.
Thanks for the offer!
Hi,
thanks for getting in touch. The behaviour you described is tracked in bug #10813. I am currently working on a clean solution for this and bug #11502 but this takes time. The scripts involved in this are rather old (2007) which makes the fix more of a rewrite.
You can see my progress here: https://git.ipfire.org/?p=people/jschlag/ipfire-2.x.git;a=shortlog;h=refs/heads/improve_network_startup
As this needs very thorough testing, I will reach out to the list when I have an ISO image with all the necessary changes.
Greetings
Jonatan
This is what I use. It actually runs out of /etc/fcron.hourly. The way it works for me, it'll keep trying to get an IP, and if after 15 cycles it still doesn't have an IP then it reboots. Rebooting has no impact on DHCP on green, as lease times will withstand a reboot.
[root@harold fcron.hourly]# cat /etc/fcron.hourly/monitor_red_device.sh
#!/bin/sh

# set -vx

SLEEP=30s
DEV=red0
CYCLES=15
IPLOST=0

for c in `seq 1 $CYCLES` ; do
    # get ip address from external device, default red0
    IP=
    IP=`ip address show dev ${DEV} | grep inet | awk '{print $2}'`
    echo IP=${IP}

    # if we got an ip then exit
    if [ -n "${IP}" ] ; then
        if [ ${IPLOST} -eq 0 ] ; then
            logger -t ipfire "IP= ${IP}"
        else
            logger -t ipfire "IP= ${IP} reacquired."
        fi
        exit 0
    fi

    IPLOST=1
    logger -t ipfire 'IP lost'
    # we don't have an IP address. wait a minute
    sleep ${SLEEP}
    /etc/init.d/networking/red stop ${DEV}
    sleep ${SLEEP}
    /etc/init.d/networking/red start ${DEV}
    sleep ${SLEEP}
done

logger -t ipfire 'IP not acquired. Rebooting.'
# if we reach this far then we might as well restart
telinit 6
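One detail of the address check may be worth a look: `grep inet` also matches `inet6` lines, so an interface that only has an IPv6 link-local address would still count as "has an IP". The sketch below shows one possible IPv4-only variant. It assumes the one-line-per-address output of `ip -o -4 address show`. The helper name `first_ipv4` is hypothetical, and the sample addresses are made up from documentation ranges.

```shell
#!/bin/sh
# Sketch only: pick the first IPv4 address/prefix from `ip -o address show`
# style input on stdin. first_ipv4 is a hypothetical helper name.
first_ipv4() {
    # In one-line output the fields are: index, device, family, address/prefix
    awk '$3 == "inet" { print $4; exit }'
}

# Made-up sample in the one-line format; the second line is IPv6 and is ignored.
sample='2: red0    inet 192.0.2.10/24 brd 192.0.2.255 scope global red0\       valid_lft forever preferred_lft forever
2: red0    inet6 fe80::1/64 scope link \       valid_lft forever preferred_lft forever'

printf '%s\n' "$sample" | first_ipv4
```

On a live system the equivalent call would be `ip -o -4 address show dev red0 | first_ipv4`, which prints nothing when red0 has no IPv4 address.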
Well, Rogers (my ISP) does not disappoint in this regard. They disappoint in other ways, but downtime they do deliver.
This is what the logs show for today, taken from the IPFire section of the System Logs. I lost the IP at about 10:30 local time. I did some maintenance while it was out and then let the script cycle through. The only change I've made in the script below was to use CYCLES=18 to let it run a couple more times in the hour. This is rough, but it gets around the bug for now.
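As a rough check on how much of the hour the monitoring script can use, the sketch below adds up only the sleeps: three per cycle at SLEEP=30s. The function name `min_sleep_seconds` is made up for illustration. The time spent inside the `red stop` and `red start` calls is not counted, so a real run takes longer.

```shell
#!/bin/sh
# Sketch only: lower bound on the script's run time, counting sleeps alone.
# min_sleep_seconds is a hypothetical helper; $1 = cycles, $2 = sleep seconds.
min_sleep_seconds() {
    echo $(( $1 * 3 * $2 ))
}

min_sleep_seconds 15 30   # 1350 seconds, i.e. 22.5 minutes
min_sleep_seconds 18 30   # 1620 seconds, i.e. 27 minutes
```

By this estimate both values leave room before the next hourly run, as long as the stop and start calls themselves do not take too long.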
From: Development development-bounces@lists.ipfire.org On Behalf Of Jose A. Dias
Sent: Sunday, January 9, 2022 4:02 PM
To: Jonatan Schlag jonatan.schlag@ipfire.org; Brad Spencer spencer@jacknife.org
Cc: IPFire Development development@lists.ipfire.org
Subject: RE: Keep trying DHCP on red0 if unavailable at boot
This is what I use. It actually runs out of /etc/fcron.hourly. The way it works for me, it'll keep trying to get an IP, and if after 15 cycles it still doesn't have an IP then it reboots. Rebooting has no impact on DHCP on green, as lease times will withstand a reboot.
[root@harold fcron.hourly]# cat /etc/fcron.hourly/monitor_red_device.sh
#!/bin/sh

# set -vx

SLEEP=30s
DEV=red0
CYCLES=15
IPLOST=0

for c in `seq 1 $CYCLES` ; do
    # get ip address from external device, default red0
    IP=
    IP=`ip address show dev ${DEV} | grep inet | awk '{print $2}'`
    echo IP=${IP}

    # if we got an ip then exit
    if [ -n "${IP}" ] ; then
        if [ ${IPLOST} -eq 0 ] ; then
            logger -t ipfire "IP= ${IP}"
        else
            logger -t ipfire "IP= ${IP} reacquired."
        fi
        exit 0
    fi

    IPLOST=1
    logger -t ipfire 'IP lost'
    # we don't have an IP address. wait a minute
    sleep ${SLEEP}
    /etc/init.d/networking/red stop ${DEV}
    sleep ${SLEEP}
    /etc/init.d/networking/red start ${DEV}
    sleep ${SLEEP}
done

logger -t ipfire 'IP not acquired. Rebooting.'
# if we reach this far then we might as well restart
telinit 6