Michael,
Thank you for your response. I agree with you that a complete absence of service is not the only reason to switch to a backup network connection. My particular scenario involves a primary connection that is an "unlimited" connection, and my backup has a 5GB/mo cap, so that is where I'm starting: with one specific use case. I think the foundation is here to address other concerns, such as latency, but that will require some additional work to accomplish.
I also agree that with DHCP, this gets tricky, but again, my specific use case was what I based this on, which uses static IPs.
In my case, the secondary IP address on the red0 interface is because my red0 interface has a public, static IP that is within a range that belongs to my primary carrier. There are other scenarios where that would not be necessary, but would require interaction with a BGP domain or would require private IPs and a double-NAT.
I agree with moving to the internal mail agent. I wasn't aware of how to invoke it, so I added my own mail settings instead. I imagine it's easily rectified.
I considered multiple ping targets, but didn't implement yet because there will be some more if/then/else or do/while loops to consider. :)
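For what it's worth, the loop may turn out smaller than it sounds. Below is a minimal sketch, not what I currently run. It assumes a hypothetical `probe` helper (a plain `ping` here, whereas my script uses `nping` with a forced source IP and destination MAC) and a made-up function name, `link_is_up`. The link is treated as up as soon as any one target answers, so a single unreachable host would not trigger a failover.

```shell
# probe TARGET: hypothetical helper, succeeds if TARGET answers one ping.
# Swap in the nping call from check-gateway.sh to pin the probe to the primary WAN.
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# link_is_up TARGET...: succeeds as soon as any one target answers,
# fails only when every target was tried and none replied.
link_is_up() {
    local target
    for target in "$@"; do
        if probe "$target"; then
            return 0
        fi
    done
    return 1
}
```

It would be called along the lines of `link_is_up 192.0.2.1 198.51.100.1 || echo "all targets lost"`, where both addresses are documentation placeholders rather than real targets.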
As for the frequency of the ICMP check, this could easily be modified to let the user change the interval. Every 5 minutes is sufficient for my purposes, but I imagine that in a production environment once a minute, or even more often, may be preferable.
I do not have this in a git branch, as I don't know how to do so. I'd love to learn! :)
Thank you,
Jeff Russell
-----Original Message-----
From: Michael Tremer [mailto:michael.tremer@ipfire.org]
Sent: Tuesday, November 29, 2016 10:51 AM
To: Jeffrey S. Russell jeff.russell@russellingupfun.com; development@lists.ipfire.org
Subject: Re: WAN Failover
Hello Jeff,
On Wed, 2016-11-16 at 21:50 -0500, Jeffrey S. Russell wrote:
First, I apologize if I am not conforming to any specific processes or etiquette in this correspondence. I did look for guidance prior to sending this email, and didn’t find any. Now, on to the purpose of this missive:
There aren't many guidelines beyond common sense, plus this page for patches: http://wiki.ipfire.org/devel/submit-patches
I have been using IPFire for about 4 years now. I’m quite impressed with the platform and its sophistication. However, I ran into a need for WAN Failover, which was absent in the platform. After looking for solutions and finding very little, I became adventurous. I’m hoping that this can be taken as a contribution to be incorporated into the core platform, though it will likely need a little tweaking. I created my own WAN Failover solution for IPFire, including a script that checks and verifies the primary connection and switches to the secondary WAN in the event of a failure. It continues to monitor the primary and switches back once it’s reliably up again. Additionally, I “hacked” the integration of configuration pages in the Web GUI. To be clear, I am not a typically educated programmer, but have learned along the way due to various needs I or others have had. My technique may be lacking, so please be gentle.
That's okay. We can work on that.
However, there are a few reasons why IPFire doesn't support automatic failover. Simply: It never works well.
But I can understand the problem that you are trying to solve here, and it makes sense in the case that your primary connection *entirely* fails. It is not just getting a bit wobbly - it has to cut off completely - which is quite frankly not the only reason to act and probably not a very common one either.
Very often you have a significant amount of packet loss and high latency and those would be reasons to act for me if this should be a highly sophisticated solution. But let's start with the basics and get to that...
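To make the packet-loss and latency idea a bit more concrete, here is a minimal sketch and nothing more. The function name `link_degraded` and both thresholds are made up for illustration, and it assumes the summary format printed by the common Linux `ping` (a "packet loss" line and a "min/avg/max" line). It reads that summary on stdin and reports the link as degraded when either threshold is exceeded, or when no summary was produced at all.

```shell
# link_degraded MAX_LOSS_PCT MAX_AVG_RTT_MS
# Reads a ping summary on stdin. Returns 0 (degraded) if packet loss or the
# average round-trip time exceeds its threshold, or if no summary was seen.
# Returns 1 if the link looks healthy.
link_degraded() {
    awk -v max_loss="$1" -v max_rtt="$2" '
        /packet loss/ {
            seen = 1
            # The loss figure is the field ending in "%", e.g. "20%".
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
        }
        /min\/avg\/max/ {
            # e.g. "rtt min/avg/max/mdev = 10.1/20.2/30.3/5.0 ms"
            split($0, halves, "= ")
            split(halves[2], rtt, "/")
            avg = rtt[2]
        }
        END {
            if (!seen) exit 0
            exit !((loss + 0 > max_loss + 0) || (avg + 0 > max_rtt + 0))
        }
    '
}
```

Something like `ping -c 5 "$PING_TARGET" | link_degraded 20 150` would then flag the link at more than 20% loss or more than 150 ms average latency. Those numbers are placeholders, not recommendations.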
:) Here's what I did:

On the IPFire box, I had to do the following:

1. Add a secondary IP address to the RED interface
2. Modify the file "/var/ipfire/menu.d/30-network.menu" to include a menu item for "WAN Failover"
3. Modify the file "/var/ipfire/langs/en.pl" to include certain language additions (I didn't know the correct words for the other languages, but could research to add)
4. Create a CGI file for WAN failover, "/srv/web/ipfire/cgi-bin/failover.cgi"
5. Create a folder for the failover config, "/var/ipfire/failover", and a file in the folder, "failover.conf", marking both for owner and group, "nobody:nobody"
6. Create a script to switch from Primary to Secondary WAN and vice-versa
7. Add a fcrontab entry to run the script every 5 minutes

To add the secondary IP address, I edited "/etc/rc.d/init.d/networking/red" and inserted "ip addr add 1.1.1.2/30 dev red0" on line 117. I'm not happy with specifying the IP address explicitly in this file, but I have not yet delved deep enough to understand the best way to include this as a variable, especially given the complexity of this file. My implementation is based on Ethernet-only, but could be further developed to encompass any RED interface type. To do this, someone smarter than me would need to make this more integrated with the IPFire setup. This is, in my mind, the weakest part of my implementation.
Well, maintaining multiple connections alive that use DHCP is a bit trickier than this. I think focusing on static IPs is okay for now.
But what is the reason to use a second IP address?
Here are the entries I added to the /var/ipfire/langs/en.pl:
'Failover' => 'WAN Failover',
'ping target' => 'Ping Target',
'primary isp name' => 'Primary ISP Name',
'secondary isp name' => 'Secondary ISP Name',

The CGI file for WAN Failover includes all the required fields for the gateway switching script:

- Ping Target: This is the IP address that should be used to check for Layer-3 connectivity over the Primary WAN. This should always be an IP address, because a DNS resolution failure could cause the script to fail in error.
- Gateway: This is the IP address of the Primary WAN gateway, which we discover from the "/var/ipfire/ethernet/settings" file.
- Primary ISP Name: This is just a useful name for the primary, typically the name of your ISP, and is used in the logging.
- Secondary ISP Name: Again, a useful name that is relevant in logging.
- Source IP: This is the source IP address to be used to verify connectivity and should align with the subnet of the primary WAN link gateway.
- Mailserver Address: This is the FQDN of the mail server for notifications of gateway changes.
- Mailserver Port: This is the TCP port the mail server listens on.
- Mail Sender: This is the sender email address, which should represent the IPFire box.
- Mail Recipient: This is the receiver email address, which should represent the individual that manages the IPFire box, or can be an email address linked to a distribution list for multiple recipients.
- Username: This is the mail server username for the IPFire login to email.
- Password: This is the mail server password for the IPFire login to email.
For this to be merged into the distribution, the email settings must be replaced by the new(ish) internal mail agent. That should even be better, since it retries delivery in case both connections are down. With it, these fields become redundant, and nothing is needed that doesn't already exist.
Choosing targets to ping is always a huge problem. First of all, I think the gateway should be checked. If that is okay, a host on the Internet should be tried.
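A rough sketch of that two-stage check, under a few assumptions: `probe` and `classify_link` are hypothetical names, the probe is a plain `ping`, and the gateway is assumed to answer ICMP at all, which not every ISP gateway does.

```shell
# probe TARGET: hypothetical helper, succeeds if TARGET answers one ping.
probe() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# classify_link GATEWAY HOST: prints one of
#   gateway-down   the first hop does not answer (local link or modem problem)
#   upstream-down  the gateway answers but the Internet host does not
#   ok             both answer
classify_link() {
    local gateway=$1 host=$2
    if ! probe "$gateway"; then
        echo "gateway-down"
    elif ! probe "$host"; then
        echo "upstream-down"
    else
        echo "ok"
    fi
}
```

Separating the two cases would also make the log more useful, because "gateway-down" and "upstream-down" usually point at different culprits.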
If I had more time and/or better skills, I would have added an optional checkbox to enable/disable email notification as well as optional credential entry. I would also have considered using the "Mailserver" package in IPFire and its relevant credentials.
The following is my CGI file for WAN Failover:

#!/usr/bin/perl
###############################################################################
#                                                                             #
# IPFire.org - A linux based firewall                                         #
# Copyright (C) 2007 Michael Tremer & Christian Schmidt                       #
#                                                                             #
# This program is free software: you can redistribute it and/or modify        #
# it under the terms of the GNU General Public License as published by        #
# the Free Software Foundation, either version 3 of the License, or           #
# (at your option) any later version.                                         #
#                                                                             #
# This program is distributed in the hope that it will be useful,             #
# but WITHOUT ANY WARRANTY; without even the implied warranty of              #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the                #
# GNU General Public License for more details.                                #
#                                                                             #
# You should have received a copy of the GNU General Public License           #
# along with this program. If not, see http://www.gnu.org/licenses/.          #
#                                                                             #
###############################################################################

use strict;

require '/var/ipfire/general-functions.pl';
require "${General::swroot}/lang.pl";
require "${General::swroot}/header.pl";

my %cgiparams=();
my %mainsettings=();
my %failoversettings=();
my %netsettings=();
my %color=();
my %checked=();
my $errormessage='';

$cgiparams{'ACTION'} = '';
&Header::getcgihash(\%cgiparams);

# Load the theme colours, the saved failover settings and the RED network settings.
&Header::showhttpheaders();
&General::readhash("${General::swroot}/main/settings", \%mainsettings);
&General::readhash("/srv/web/ipfire/html/themes/".$mainsettings{'THEME'}."/include/colors.txt", \%color);
&General::readhash("${General::swroot}/failover/failover.conf", \%failoversettings);
&General::readhash("${General::swroot}/ethernet/settings", \%netsettings);

if ($cgiparams{'ACTION'} eq "$Lang::tr{'save'}") {
    # Save: copy the submitted form values into the settings hash and write it out.
    $failoversettings{'TARGET'} = $cgiparams{'TARGET'};
    $failoversettings{'MAILFROM'} = $cgiparams{'MAILFROM'};
    $failoversettings{'MAILTO'} = $cgiparams{'MAILTO'};
    $failoversettings{'PRIMARYISP'} = $cgiparams{'PRIMARYISP'};
    $failoversettings{'SECONDARYISP'} = $cgiparams{'SECONDARYISP'};
    $failoversettings{'MAILSERVER'} = $cgiparams{'MAILSERVER'};
    $failoversettings{'MAILPORT'} = $cgiparams{'MAILPORT'};
    $failoversettings{'MAILUSER'} = $cgiparams{'MAILUSER'};
    $failoversettings{'MAILPWD'} = $cgiparams{'MAILPWD'};
    $failoversettings{'GATEWAY'} = $cgiparams{'GATEWAY'};
    $failoversettings{'SOURCEIP'} = $cgiparams{'SOURCEIP'};

    &General::writehash("${General::swroot}/failover/failover.conf", \%failoversettings);
SAVE_ERROR:
} else {
    # Display: pre-fill the form from the saved settings.
    if ($failoversettings{'TARGET'}) { $cgiparams{'TARGET'} = $failoversettings{'TARGET'}; }
    if ($failoversettings{'MAILFROM'}) { $cgiparams{'MAILFROM'} = $failoversettings{'MAILFROM'}; }
    if ($failoversettings{'MAILTO'}) { $cgiparams{'MAILTO'} = $failoversettings{'MAILTO'}; }
    if ($failoversettings{'PRIMARYISP'}) { $cgiparams{'PRIMARYISP'} = $failoversettings{'PRIMARYISP'}; }
    if ($failoversettings{'SECONDARYISP'}) { $cgiparams{'SECONDARYISP'} = $failoversettings{'SECONDARYISP'}; }
    if ($failoversettings{'MAILSERVER'}) { $cgiparams{'MAILSERVER'} = $failoversettings{'MAILSERVER'}; }
    if ($failoversettings{'MAILPORT'}) { $cgiparams{'MAILPORT'} = $failoversettings{'MAILPORT'}; }
    if ($failoversettings{'MAILUSER'}) { $cgiparams{'MAILUSER'} = $failoversettings{'MAILUSER'}; }
    if ($failoversettings{'MAILPWD'}) { $cgiparams{'MAILPWD'} = $failoversettings{'MAILPWD'}; }
    # Gateway and source IP fall back to the RED settings when nothing has been saved yet.
    if ($failoversettings{'GATEWAY'}) {
        $cgiparams{'GATEWAY'} = $failoversettings{'GATEWAY'};
    } else {
        $cgiparams{'GATEWAY'} = $netsettings{'DEFAULT_GATEWAY'};
    }
    if ($failoversettings{'SOURCEIP'}) {
        $cgiparams{'SOURCEIP'} = $failoversettings{'SOURCEIP'};
    } else {
        $cgiparams{'SOURCEIP'} = $netsettings{'RED_ADDRESS'};
    }
}

&Header::openpage($Lang::tr{'Failover'}, 1, '');

print <<END
<form method='post' action='$ENV{'SCRIPT_NAME'}'>
<table width='100%'>
<tr>
    <td> </td>
    <td width='25%' class='base'>$Lang::tr{'ping target'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='TARGET' value='$cgiparams{'TARGET'}' /></td>
    <td width='25%' class='base'>$Lang::tr{'primary isp name'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='PRIMARYISP' value='$cgiparams{'PRIMARYISP'}' /></td>
</tr>
<tr>
    <td> </td>
    <td width='25%' class='base'>$Lang::tr{'gateway'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='GATEWAY' value='$cgiparams{'GATEWAY'}' /></td>
    <td width='25%' class='base'>$Lang::tr{'secondary isp name'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='SECONDARYISP' value='$cgiparams{'SECONDARYISP'}' /></td>
</tr>
<tr>
    <td> </td>
    <td width='25%' class='base'></td>
    <td width='25%'></td>
    <td width='25%' class='base'>Source IP: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='SOURCEIP' value='$cgiparams{'SOURCEIP'}' /></td>
</tr>
<tr>
    <td> </td>
    <td width='25%' class='base'>$Lang::tr{'email mailaddr'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='MAILSERVER' value='$cgiparams{'MAILSERVER'}' /></td>
    <td width='25%' class='base'>$Lang::tr{'email mailport'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='MAILPORT' value='$cgiparams{'MAILPORT'}' /></td>
</tr>
<tr>
    <td> </td>
    <td width='25%' class='base'>$Lang::tr{'email mailsender'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='MAILFROM' value='$cgiparams{'MAILFROM'}' /></td>
    <td width='25%' class='base'>$Lang::tr{'email mailrcpt'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='MAILTO' value='$cgiparams{'MAILTO'}' /></td>
</tr>
<tr>
    <td> </td>
    <td width='25%' class='base'>$Lang::tr{'email mailuser'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='text' name='MAILUSER' value='$cgiparams{'MAILUSER'}' /></td>
    <td width='25%' class='base'>$Lang::tr{'email mailpass'}: <img src='/blob.gif' alt='*' /></td>
    <td width='25%'><input type='password' name='MAILPWD' value='$cgiparams{'MAILPWD'}' /></td>
</tr>
</table>
<table width='100%'>
<tr>
    <td> </td>
    <td width='25%'><img src='/blob.gif' alt='*' /> $Lang::tr{'required field'}</td>
    <td width='50%' align='right'></td>
    <td width='25%' align='right'><input type='submit' name='ACTION' value='$Lang::tr{'save'}' /></td>
</tr>
</table>
</form>
END
;

&Header::closebox();
&Header::closebigbox();
&Header::closepage();
The following is the format of the "failover.conf" file:
MAILUSER=<username>
GATEWAY=<gateway_address>
MAILPORT=<mail_port>
MAILPWD=<mail_password>
TARGET=<ping_target>
SOURCEIP=<source_ip_address>
MAILTO=<email_recipient>
MAILFROM=<email_sender>
SECONDARYISP=<secondary_isp>
PRIMARYISP=<primary_isp>
MAILSERVER=<mail_server_fqdn>
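One note on reading this format from a shell script: a `cat | grep KEY | cut -d "=" -f 2` pipeline matches KEY anywhere on the line, including inside another setting's value, and it cuts off any value that itself contains "=" (a password, for example). A minimal sketch of a stricter reader follows. The name `conf_get` is made up for illustration, and the value is returned verbatim, including any quoting that may be present in the file.

```shell
# conf_get KEY FILE: print the value of the first line whose key is exactly KEY.
# Everything after the first "=" is returned, so values containing "=" survive.
conf_get() {
    local key=$1 file=$2
    awk -F= -v key="$key" '
        $1 == key { sub(/^[^=]*=/, ""); print; exit }
    ' "$file"
}
```

A call such as `PING_TARGET=$(conf_get TARGET /var/ipfire/failover/failover.conf)` would then replace one of the pipelines. An unknown key simply yields an empty string.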
The following is the "check-gateway.sh" script:
#!/bin/bash
# Script to check on the status of the Primary WAN network and switch to a Secondary WAN network.

# Create logfile and send all output to logfile
logfile=/var/log/gateway-check.log
exec &> >(tee -a "$logfile")

# Define variable for a date/time stamp
datetimestamp=$(date)

# Define variable for the source IP to use for the ping source
SRC_IP=$(cat /var/ipfire/failover/failover.conf | grep SOURCEIP | cut -d "=" -f 2)

# Define variable for Primary Gateway Address
GATEWAY_IP=$(cat /var/ipfire/failover/failover.conf | grep GATEWAY | cut -d "=" -f 2)

# Define variable for destination MACs
DEST_MAC=$(arp -n $GATEWAY_IP | grep $GATEWAY_IP | awk '{print $3}')

# Define variable for ping target
PING_TARGET=$(cat /var/ipfire/failover/failover.conf | grep TARGET | cut -d "=" -f 2)

# Define email fields
MAIL_FROM=$(cat /var/ipfire/failover/failover.conf | grep MAILFROM | cut -d "=" -f 2)
MAIL_TO=$(cat /var/ipfire/failover/failover.conf | grep MAILTO | cut -d "=" -f 2)
PRIMARY_ISP=$(cat /var/ipfire/failover/failover.conf | grep PRIMARYISP | cut -d "=" -f 2)
SECONDARY_ISP=$(cat /var/ipfire/failover/failover.conf | grep SECONDARYISP | cut -d "=" -f 2)
MAIL_SERVER=$(cat /var/ipfire/failover/failover.conf | grep MAILSERVER | cut -d "=" -f 2)
MAIL_PORT=$(cat /var/ipfire/failover/failover.conf | grep MAILPORT | cut -d "=" -f 2)
MAIL_USER=$(cat /var/ipfire/failover/failover.conf | grep MAILUSER | cut -d "=" -f 2)
MAIL_PWD=$(cat /var/ipfire/failover/failover.conf | grep MAILPWD | cut -d "=" -f 2)

# Beginning bracket for logging results to show the date/time of the event
echo "=========="$datetimestamp"=========="

echo "------------------Parameter Check------------------"
echo "---------------------------------------------------"
echo "Field                     | Parameter"
echo "___________________________________________________"
echo "Source IP Address         | "$SRC_IP
echo "Gateway IP Address        | "$GATEWAY_IP
echo "Destination MAC Address   | "$DEST_MAC
echo "Ping Target               | "$PING_TARGET
echo "Source email Address      | "$MAIL_FROM
echo "Destination email Address | "$MAIL_TO
echo "Primary ISP Name          | "$PRIMARY_ISP
echo "Secondary ISP Name        | "$SECONDARY_ISP
echo "Mail Server FQDN          | "$MAIL_SERVER
echo "Mail Server Port          | "$MAIL_PORT
echo "Mail Server Username      | "$MAIL_USER
echo "Mail Server Password      | "$MAIL_PWD
echo

# Creating a variable to determine the result of the nping command, which sends 5 pings to the ping target, always over my Primary WAN, no matter the default gateway
# The command pulls the number of "Lost" pings, which will be a value from 0 to 5.
result=$(nping --icmp --source-ip $SRC_IP --dest-mac $DEST_MAC $PING_TARGET | grep Lost | awk '{print $12}')

# Creating a variable to determine the default gateway. The "route" command shows the routing table, the "grep" command is pulling the default route line and the "awk" command is printing the second value on the line.
gateway=$(route | grep default | awk '{print $2}')

# This prints the current gateway and the number of lost pings to the logfile.
echo "The current gateway is ("$gateway"), and the number of lost pings is "$result"."

# Here, I evaluate the conditions and create an action depending on the result. First, I am trying to determine if all pings were lost. If all pings are lost, I'm declaring the Primary WAN link to be "dead".
if [ "$result" -eq 5 ]; then
    # Next, knowing that the Primary WAN has failed, I check to see if the default gateway is over my Primary WAN ("gateway" is the hostname).
    if [ "$gateway" = "gateway" ]; then
        # The next line logs that the Primary WAN link has failed and we will switch over to Secondary WAN.
        echo "The $PRIMARY_ISP link has failed. Switching over to the $SECONDARY_ISP link..."
        # The next lines delete the current Primary WAN gateway, switch to Secondary WAN and show the routing table for confirmation.
        route delete default gw gateway
        route add default gw 1.1.1.1
        echo "Gateway switched from $PRIMARY_ISP to $SECONDARY_ISP. Following is the routing table:"
        route
        # The next command sends me an email to alert me that the Primary WAN link is down and we are switching over to Secondary WAN.
        echo
        sendEmail -f $MAIL_FROM -t $MAIL_TO -u "Gateway Change Alert" -m "The $PRIMARY_ISP link has failed. Switching over to $SECONDARY_ISP link." -s $MAIL_SERVER:$MAIL_PORT -xu $MAIL_USER -xp $MAIL_PWD
    else
        # Now, again knowing that the Primary WAN has failed, if the default gateway is Secondary WAN, we don't do anything.
        echo "$SECONDARY_ISP is the current active link. $PRIMARY_ISP is still down."
        # Since there is nothing we need to do, we next exit the shell script.
        exit
    fi
# The next section is based on if the Primary WAN is operational (i.e. less than 5 lost pings)
else
    # Knowing that the Primary WAN link is viable, we now evaluate if we are using the Secondary WAN link (1.1.1.1 is the gateway when using Secondary WAN).
    if [ "$gateway" = "1.1.1.1" ]; then
        # The next line logs that the Primary WAN is operational and we will switch back to Primary WAN from Secondary WAN.
        echo "The $PRIMARY_ISP link is now operational. Switching over to the $PRIMARY_ISP link."
        # The next lines delete the current Secondary WAN gateway, switch to Primary WAN and show the routing table for verification.
        route delete default gw 1.1.1.1
        route add default gw gateway
        echo "Gateway switched from $SECONDARY_ISP to $PRIMARY_ISP. Following is the routing table:"
        route
        # The next command sends me an email to alert me that the Primary WAN link is up and we are switching back to the Primary WAN.
        echo
        sendEmail -f $MAIL_FROM -t $MAIL_TO -u "Gateway Change Alert" -m "The $PRIMARY_ISP link is now operational. Switching back to the $PRIMARY_ISP link." -s $MAIL_SERVER:$MAIL_PORT -xu $MAIL_USER -xp $MAIL_PWD
    else
        # Now, again knowing that the Primary WAN is operational, if the default gateway is already the Primary WAN, then we don't do anything.
        echo "The $PRIMARY_ISP link is operational and $PRIMARY_ISP is the current active link."
    fi
fi

# We end the logging with the same date/time stamp to indicate that this is the end of this iteration of the script run.
echo "=========="$datetimestamp"=========="
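The decision at the heart of the script can also be written as a small pure function, which makes it possible to test without touching the routing table. This is only a sketch of the same nested if/else. The name `decide_action` and its argument order are made up here, and the caller would still run the `route` and mail commands itself.

```shell
# decide_action LOST SENT CURRENT_GW PRIMARY_GW SECONDARY_GW
# Prints "failover", "failback" or "none". Changes nothing on the system.
decide_action() {
    local lost=$1 sent=$2 current=$3 primary=$4 secondary=$5
    if [ "$lost" -ge "$sent" ]; then
        # Every probe was lost: the primary is considered dead.
        # Only switch if traffic is still routed through it.
        if [ "$current" = "$primary" ]; then
            echo "failover"
        else
            echo "none"
        fi
    else
        # At least one probe came back: the primary is considered usable.
        # Only switch back if traffic is currently on the secondary.
        if [ "$current" = "$secondary" ]; then
            echo "failback"
        else
            echo "none"
        fi
    fi
}
```

With the values from the script, `decide_action "$result" 5 "$gateway" gateway 1.1.1.1` would print the action to take for the current run.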
Finally, the following is the addition to fcrontab for the 5-minute interval run of the script:
# Check the gateway connection every 5 minutes, if down, switch gateway to Verizon backup
*/5 * * * * /root/check-gateway.sh
Not sure if this is "often enough", but probably better than nothing :)
Do you have this in a git branch or something? It is a bit hard to read the code in this email.
The script is currently stored in /root/, but that is just in my implementation while I am ignorant of the proper location of such a script.
/usr/sbin probably.
So, that's it! I'm open to thoughts or suggestions, but I'm currently at my limit of capability. I'd love to see this get added as an option in the initial setup, or packaged into an addon. I've seen numerous requests for this kind of feature, so I did this to get something functional in place. If there are any resources that can help me develop this into an official add-on, I'd love to see it so this can be available to the community.
This would probably be a part of the core package since it is modifying the network scripts.
Hope my feedback helps a little and we can work on this.
Best,
-Michael
Thank you!
Jeff Russell