From: "Jeffrey S. Russell"
To: development@lists.ipfire.org
Subject: RE: WAN Failover
Date: Tue, 29 Nov 2016 14:46:28 -0500
Message-ID: <0c2801d24a79$47697ec0$d63c7c40$@russellingupfun.com>
In-Reply-To: <1480434670.13949.82.camel@ipfire.org>

Michael,

Thank you for your response. I agree with you that a complete absence of
service is not the only reason to switch to a backup network connection. My
particular scenario involves a primary connection that is an "unlimited"
connection, and my backup has a 5GB/mo cap, so this is where I'm starting,
based on a specific use case. I think the foundation is here to address other
concerns, such as latency, but that will require some additional work.

I also agree that with DHCP, this gets tricky, but again, my specific use case
was what I based this on, and it uses static IPs.

In my case, the secondary IP address on the red0 interface is there because my
red0 interface has a public, static IP that is within a range that belongs to
my primary carrier. There are other scenarios where that would not be
necessary, but they would require interaction with a BGP domain or would
require private IPs and a double-NAT.

I agree with moving to the internal mail agent. I wasn't aware of how to
invoke it, so I added mail handling of my own. I imagine it's easily rectified.

I considered multiple ping targets, but didn't implement them yet because
there will be some more if/then/else or do/while loops to consider. :)

As it concerns the frequency of the ICMP check, this could be easily modified
to allow the user to change the interval. Every 5 minutes is sufficient for my
purposes, but I imagine that in a production environment once a minute, or
even more often, may be preferable.
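
As a rough illustration of the multiple-ping-target idea, the extra loop could
be kept in one small helper. This is only a sketch under assumptions: the
function names probe_target and any_target_reachable are hypothetical, and
plain ping stands in for the nping call (with a forced source IP and
destination MAC) that the actual script uses.

```shell
#!/bin/bash
# Hypothetical probe: one echo request with a 2 second timeout.
# Assumption: plain ping is used here only to keep the sketch short; the
# script in this thread probes with nping so that the check always leaves
# through the Primary WAN.
probe_target() {
    ping -c 1 -W 2 "$1" > /dev/null 2>&1
}

# any_target_reachable TARGET...
# Returns 0 as soon as one target answers, 1 if every target fails
# (or if no target was given at all).
any_target_reachable() {
    local target
    for target in "$@"; do
        if probe_target "$target"; then
            return 0
        fi
    done
    return 1
}
```

The failover decision would then treat the primary link as down only when
any_target_reachable fails for the whole list, so a single unresponsive host
does not trigger a switch.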

I do not have this in a git branch, as I don't know how to set one up. I'd
love to learn! :)

Thank you,

Jeff Russell

-----Original Message-----
From: Michael Tremer [mailto:michael.tremer(a)ipfire.org]
Sent: Tuesday, November 29, 2016 10:51 AM
To: Jeffrey S. Russell; development(a)lists.ipfire.org
Subject: Re: WAN Failover

Hello Jeff,

On Wed, 2016-11-16 at 21:50 -0500, Jeffrey S. Russell wrote:
> First, I apologize if I am not conforming to any specific processes or
> etiquette in this correspondence. I did look for guidance prior to
> sending this email, and didn’t find any. Now, on to the purpose of this
> missive:

There aren't many guidelines beyond common sense, plus this page for patches:

http://wiki.ipfire.org/devel/submit-patches

> I have been using IPFire for about 4 years now. I’m quite impressed
> with the platform and its sophistication. However, I ran into a need
> for WAN Failover, which was absent in the platform. After looking for
> solutions and finding very little, I became adventurous. I’m hoping
> that this can be taken as a contribution to be incorporated into the
> core platform, though it will likely need a little tweaking. I
> created my own WAN Failover solution for IPFire, including a script
> that checks and verifies the primary connection and switches to the
> secondary WAN in the event of a failure. It continues to monitor the
> primary and switches back once it’s reliably up again.
> Additionally, I “hacked” the integration of configuration pages in the
> Web GUI. To be clear, I am not a formally educated programmer, but
> have learned along the way due to various needs I or others have had.
> My technique may be lacking, so please be gentle.

That's okay. We can work on that.

However, there are a few reasons why IPFire doesn't support automatic
failover. Simply: It never works well.

But I can understand the problem that you are trying to solve here, and it
makes sense in the case that your primary connection *entirely* fails. It is
not enough for it to get a bit wobbly; it has to cut off completely, which is
quite frankly not the only reason to act and probably not a very common one
either.

Very often you have a significant amount of packet loss and high latency, and
those would be reasons to act for me if this should be a highly sophisticated
solution. But let's start with the basics and get to that...

> :) Here’s what I did:
>
> On the IPFire box, I had to do the following:
> Add a secondary IP address to the RED interface
> Modify the file "/var/ipfire/menu.d/30-network.menu" to include a menu
> item for "WAN Failover"
> Modify the file "/var/ipfire/langs/en.pl" to include certain language
> additions (I didn't know the correct words for the other languages,
> but could research to add)
> Create a CGI file for WAN failover, "/srv/web/ipfire/cgi-bin/failover.cgi"
> Create a folder for the failover config, "/var/ipfire/failover", and a
> file in the folder, "failover.conf", marking both for owner and group,
> "nobody:nobody"
> Create a script to switch from Primary to Secondary WAN and vice-versa
> Add a fcrontab entry to run the script every 5 minutes
>
> To add the secondary IP address, I edited "/etc/rc.d/init.d/networking/red"
> and inserted "ip addr add 1.1.1.2/30 dev red0" on line 117. I'm not
> happy with specifying the IP address explicitly in this file, but I
> have not yet delved deep enough to understand the best way to include
> this as a variable, especially given the complexity of this file. My
> implementation is based on Ethernet-only, but could be further
> developed to encompass any RED interface type. To do this, someone
> smarter than me would need to make this more integrated with the
> IPFire setup. This is, in my mind, the weakest part of my implementation.
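
One possible way to avoid the hard-coded address, sketched under assumptions:
the address is passed in as a parameter (for example from a settings value),
and the helper name add_secondary_address is hypothetical. It uses
"ip addr replace" rather than "ip addr add" so that running it a second time
does not fail when the address is already present.

```shell
#!/bin/bash
# add_secondary_address ADDRESS/PREFIX [DEVICE]
# Hypothetical helper: configure an extra address on the RED interface.
# Assumption: the caller supplies the address from configuration instead of
# having it written into the init script; DEVICE defaults to red0.
add_secondary_address() {
    local address="$1" device="${2:-red0}"

    # Nothing configured: leave the interface untouched.
    if [ -z "$address" ]; then
        return 0
    fi

    # "replace" adds the address, or updates it if it already exists.
    ip addr replace "$address" dev "$device"
}
```

Called as "add_secondary_address 1.1.1.2/30" it would have the same effect as
the line quoted above, with the address living outside the script.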

Well, maintaining multiple connections alive that use DHCP is a bit trickier
than this. I think focusing on static IPs is okay for now.

But what is the reason to use a second IP address?

> Here are the entries I added to the /var/ipfire/langs/en.pl:
>
> 'Failover' => 'WAN Failover',
> 'ping target' => 'Ping Target',
> 'primary isp name' => 'Primary ISP Name',
> 'secondary isp name' => 'Secondary ISP Name',
>
> The CGI file for WAN Failover includes all the required fields for the
> gateway switching script:
> Ping Target - This is the IP address that should be used to check for
> Layer-3 connectivity over the Primary WAN. This should always be an IP
> address, because a DNS resolution failure could cause the script to fail
> with an error.
> Gateway - This is the IP address of the Primary WAN gateway, which we
> discover from the "/var/ipfire/network/settings" file.
> Primary ISP Name - This is just a useful name for the primary,
> typically the name of your ISP, and is used in the logging.
> Secondary ISP Name - Again, a useful name that is relevant in logging.
> Source IP - This is the Source IP address to be used to verify
> connectivity and should align with the subnet of the primary WAN link
> gateway.
> Mailserver Address - This is the FQDN of the mail server for
> notifications of gateway changes.
> Mailserver Port - This is the TCP port the mail server listens on.
> Mail Sender - This is the sender email address, which should represent
> the IPFire box.
> Mail recipient - This is the receiver email address, which should
> represent the individual that manages the IPFire box, or can be an
> email address linked to a distribution list for multiple recipients.
> Username - This is the mail server username for the IPFire login to email.
> Password - This is the mail server password for the IPFire login to email.
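
The fields above end up as KEY=value lines in failover.conf, and the switching
script later in this message pulls them out with cat, grep and cut. A small
reader function might be a little more robust; this is a sketch only, and the
name read_setting is hypothetical. It matches the key exactly and keeps
everything after the first "=", so a value that itself contains "=" (a
password, say) is not cut short.

```shell
#!/bin/bash
# read_setting KEY FILE
# Hypothetical helper: print the value stored under KEY in a KEY=value file.
# Assumption: one setting per line, no quoting, as in the failover.conf
# layout shown in this thread.
read_setting() {
    local wanted="$1" file="$2" line

    while IFS= read -r line || [ -n "$line" ]; do
        # Skip anything that is not a KEY=value line.
        case "$line" in
            *=*) ;;
            *) continue ;;
        esac
        # Compare the text before the first "=" with the requested key.
        if [ "${line%%=*}" = "$wanted" ]; then
            # Print everything after the first "=".
            printf '%s\n' "${line#*=}"
            return 0
        fi
    done < "$file"

    # Key not found.
    return 1
}
```

For example, PING_TARGET=$(read_setting TARGET /var/ipfire/failover/failover.conf)
would replace one of the cat/grep/cut pipelines.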

For this to be merged into the distribution, the email address stuff must be
replaced by the new(ish) internal mail agent. That should even be better, since
it tries to retransmit in case both connections are down. However, these
settings are redundant; nothing is needed that doesn't already exist.

Choosing targets to ping is always a huge problem. First of all, I think the
gateway itself should be tried. If that is okay, a host on the Internet should
be tried.

> If I had more time and/or better skills, I would have added an
> optional checkbox to enable/disable email notification as well as
> optional credential entry. I would also have considered using the
> "Mailserver" package in IPFire and its relevant credentials.
>
> The following is my CGI file for WAN Failover:
> #!/usr/bin/perl
> ###############################################################################
> #                                                                             #
> # IPFire.org - A linux based firewall                                         #
> # Copyright (C) 2007 Michael Tremer & Christian Schmidt                       #
> #                                                                             #
> # This program is free software: you can redistribute it and/or modify        #
> # it under the terms of the GNU General Public License as published by        #
> # the Free Software Foundation, either version 3 of the License, or           #
> # (at your option) any later version.                                         #
> #                                                                             #
> # This program is distributed in the hope that it will be useful,             #
> # but WITHOUT ANY WARRANTY; without even the implied warranty of              #
> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the                #
> # GNU General Public License for more details.                                #
> #                                                                             #
> # You should have received a copy of the GNU General Public License           #
> # along with this program. If not, see .                                      #
> #                                                                             #
> ###############################################################################
>
> use strict;
>
> require '/var/ipfire/general-functions.pl';
> require "${General::swroot}/lang.pl";
> require "${General::swroot}/header.pl";
>
> my %cgiparams=();
> my %mainsettings=();
> my %failoversettings=();
> my %netsettings=();
> my %color=();
> my %checked=();
> my $errormessage='';
>
> $cgiparams{'ACTION'} = '';
> &Header::getcgihash(\%cgiparams);
>
> &Header::showhttpheaders();
> &General::readhash("${General::swroot}/main/settings",\%mainsettings);
> &General::readhash("/srv/web/ipfire/html/themes/".$mainsettings{'THEME'}."/include/colors.txt", \%color);
> &General::readhash("${General::swroot}/failover/failover.conf",\%failoversettings);
> &General::readhash("${General::swroot}/ethernet/settings",\%netsettings);
>
> if ($cgiparams{'ACTION'} eq "$Lang::tr{'save'}") {
>     $failoversettings{'TARGET'} = $cgiparams{'TARGET'};
>     $failoversettings{'MAILFROM'} = $cgiparams{'MAILFROM'};
>     $failoversettings{'MAILTO'} = $cgiparams{'MAILTO'};
>     $failoversettings{'PRIMARYISP'} = $cgiparams{'PRIMARYISP'};
>     $failoversettings{'SECONDARYISP'} = $cgiparams{'SECONDARYISP'};
>     $failoversettings{'MAILSERVER'} = $cgiparams{'MAILSERVER'};
>     $failoversettings{'MAILPORT'} = $cgiparams{'MAILPORT'};
>     $failoversettings{'MAILUSER'} = $cgiparams{'MAILUSER'};
>     $failoversettings{'MAILPWD'} = $cgiparams{'MAILPWD'};
>     $failoversettings{'GATEWAY'} = $cgiparams{'GATEWAY'};
>     $failoversettings{'SOURCEIP'} = $cgiparams{'SOURCEIP'};
>
>     &General::writehash("${General::swroot}/failover/failover.conf", \%failoversettings);
> SAVE_ERROR:
> } else {
>     # Pre-fill the form from the saved settings, if any.
>     if ($failoversettings{'TARGET'}) {
>         $cgiparams{'TARGET'} = $failoversettings{'TARGET'};
>     }
>     if ($failoversettings{'MAILFROM'}) {
>         $cgiparams{'MAILFROM'} = $failoversettings{'MAILFROM'};
>     }
>     if ($failoversettings{'MAILTO'}) {
>         $cgiparams{'MAILTO'} = $failoversettings{'MAILTO'};
>     }
>     if ($failoversettings{'PRIMARYISP'}) {
>         $cgiparams{'PRIMARYISP'} = $failoversettings{'PRIMARYISP'};
>     }
>     if ($failoversettings{'SECONDARYISP'}) {
>         $cgiparams{'SECONDARYISP'} = $failoversettings{'SECONDARYISP'};
>     }
>     if ($failoversettings{'MAILSERVER'}) {
>         $cgiparams{'MAILSERVER'} = $failoversettings{'MAILSERVER'};
>     }
>     if ($failoversettings{'MAILPORT'}) {
>         $cgiparams{'MAILPORT'} = $failoversettings{'MAILPORT'};
>     }
>     if ($failoversettings{'MAILUSER'}) {
>         $cgiparams{'MAILUSER'} = $failoversettings{'MAILUSER'};
>     }
>     if ($failoversettings{'MAILPWD'}) {
>         $cgiparams{'MAILPWD'} = $failoversettings{'MAILPWD'};
>     }
>     # Fall back to the network settings when nothing has been saved yet.
>     if ($failoversettings{'GATEWAY'}) {
>         $cgiparams{'GATEWAY'} = $failoversettings{'GATEWAY'};
>     } else {
>         $cgiparams{'GATEWAY'} = $netsettings{'DEFAULT_GATEWAY'};
>     }
>     if ($failoversettings{'SOURCEIP'}) {
>         $cgiparams{'SOURCEIP'} = $failoversettings{'SOURCEIP'};
>     } else {
>         $cgiparams{'SOURCEIP'} = $netsettings{'RED_ADDRESS'};
>     }
> }
> &Header::openpage($Lang::tr{'Failover'}, 1, '');
>
> print <<END
> [The HTML form markup was stripped when this message was archived. The
> surviving fragments show a table of inputs, each marked as required with
> /blob.gif (alt='*'):
>   $Lang::tr{'ping target'}         value='$cgiparams{'TARGET'}'
>   $Lang::tr{'primary isp name'}    value='$cgiparams{'PRIMARYISP'}'
>   $Lang::tr{'gateway'}             value='$cgiparams{'GATEWAY'}'
>   $Lang::tr{'secondary isp name'}  value='$cgiparams{'SECONDARYISP'}'
>   Source IP                        value='$cgiparams{'SOURCEIP'}'
>   $Lang::tr{'email mailaddr'}      value='$cgiparams{'MAILSERVER'}'
>   $Lang::tr{'email mailport'}      value='$cgiparams{'MAILPORT'}'
>   $Lang::tr{'email mailsender'}    value='$cgiparams{'MAILFROM'}'
>   $Lang::tr{'email mailrcpt'}      value='$cgiparams{'MAILTO'}'
>   $Lang::tr{'email mailuser'}      value='$cgiparams{'MAILUSER'}'
>   $Lang::tr{'email mailpass'}      value='$cgiparams{'MAILPWD'}'
> followed by a $Lang::tr{'required field'} legend and a submit button with
> name='ACTION' value='$Lang::tr{'save'}'.]
> END
> ;
> &Header::closebox();
> &Header::closebigbox();
> &Header::closepage();
>
> The following is the format of the "failover.conf" file:
>
> MAILUSER=
> GATEWAY=
> MAILPORT=
> MAILPWD=
> TARGET=
> SOURCEIP=
> MAILTO=
> MAILFROM=
> SECONDARYISP=
> PRIMARYISP=
> MAILSERVER=
>
> The following is the "check-gateway.sh" script:
>
> #!/bin/bash
> # Script to check on the status of the Primary WAN network and switch to a
> # Secondary WAN network.
>
> # Create logfile and send all output to logfile
> logfile=/var/log/gateway-check.log
> exec &> >(tee -a "$logfile")
>
> # Define variable for a date/time stamp
> datetimestamp=$(date)
>
> # Define variable for the source IP to use for the ping source
> SRC_IP=$(cat /var/ipfire/failover/failover.conf | grep SOURCEIP | cut -d "=" -f 2)
>
> # Define variable for Primary Gateway Address
> GATEWAY_IP=$(cat /var/ipfire/failover/failover.conf | grep GATEWAY | cut -d "=" -f 2)
>
> # Define variable for destination MACs
> DEST_MAC=$(arp -n $GATEWAY_IP | grep $GATEWAY_IP | awk '{print $3}')
>
> # Define variable for ping target
> PING_TARGET=$(cat /var/ipfire/failover/failover.conf | grep TARGET | cut -d "=" -f 2)
>
> # Define email fields
> MAIL_FROM=$(cat /var/ipfire/failover/failover.conf | grep MAILFROM | cut -d "=" -f 2)
> MAIL_TO=$(cat /var/ipfire/failover/failover.conf | grep MAILTO | cut -d "=" -f 2)
> PRIMARY_ISP=$(cat /var/ipfire/failover/failover.conf | grep PRIMARYISP | cut -d "=" -f 2)
> SECONDARY_ISP=$(cat /var/ipfire/failover/failover.conf | grep SECONDARYISP | cut -d "=" -f 2)
> MAIL_SERVER=$(cat /var/ipfire/failover/failover.conf | grep MAILSERVER | cut -d "=" -f 2)
> MAIL_PORT=$(cat /var/ipfire/failover/failover.conf | grep MAILPORT | cut -d "=" -f 2)
> MAIL_USER=$(cat /var/ipfire/failover/failover.conf | grep MAILUSER | cut -d "=" -f 2)
> MAIL_PWD=$(cat /var/ipfire/failover/failover.conf | grep MAILPWD | cut -d "=" -f 2)
>
> # Beginning bracket for logging results to show the date/time of the event
> echo "=========="$datetimestamp"=========="
>
> echo "------------------Parameter Check------------------"
> echo "---------------------------------------------------"
> echo "Field                     | Parameter"
> echo "___________________________________________________"
> echo "Source IP Address         | "$SRC_IP
> echo "Gateway IP Address        | "$GATEWAY_IP
> echo "Destination MAC Address   | "$DEST_MAC
> echo "Ping Target               | "$PING_TARGET
> echo "Source email Address      | "$MAIL_FROM
> echo "Destination email Address | "$MAIL_TO
> echo "Primary ISP Name          | "$PRIMARY_ISP
> echo "Secondary ISP Name        | "$SECONDARY_ISP
> echo "Mail Server FQDN          | "$MAIL_SERVER
> echo "Mail Server Port          | "$MAIL_PORT
> echo "Mail Server Username      | "$MAIL_USER
> echo "Mail Server Password      | "$MAIL_PWD
> echo
> # Creating a variable to determine the result of the nping command, which
> # sends 5 pings to the ping target, always over my Primary WAN, no matter
> # the default gateway.
> # The command pulls the number of "Lost" pings, which will be a value from
> # 0 to 5.
> result=$(nping --icmp --source-ip $SRC_IP --dest-mac $DEST_MAC $PING_TARGET | grep Lost | awk '{print $12}')
>
> # Creating a variable to determine the default gateway. The "route" command
> # shows the routing table, the "grep" command is pulling the default route
> # line and the "awk" command is printing the second value on the line.
> gateway=$(route | grep default | awk '{print $2}')
>
> # This prints the current gateway and the number of lost pings to the logfile.
> echo "The current gateway is ("$gateway"), and the number of lost pings is "$result"."
>
> # Here, I evaluate the conditions and create an action depending on the
> # result. First, I am trying to determine if all pings were lost. If all
> # pings are lost, I'm declaring the Primary WAN link to be "dead".
> if [ $result -eq 5 ];
> then
>     # Next, knowing that the Primary WAN has failed, I check to see if the
>     # default gateway is over my Primary WAN ("gateway" is the hostname).
>     if [ "$gateway" = "gateway" ];
>     then
>         # The next line logs that the Primary WAN link has failed and we
>         # will switch over to Secondary WAN.
>         echo "The $PRIMARY_ISP link has failed. Switching over to the $SECONDARY_ISP link..."
>         # The next lines delete the current Primary WAN gateway, switch to
>         # Secondary WAN and show the routing table for confirmation.
>         route delete default gw gateway
>         route add default gw 1.1.1.1
>         echo "Gateway switched from $PRIMARY_ISP to $SECONDARY_ISP. Following is the routing table:"
>         route
>         # The next command sends me an email to alert me that the Primary
>         # WAN link is down and we are switching over to Secondary WAN.
>         echo
>         sendEmail -f $MAIL_FROM -t $MAIL_TO -u "Gateway Change Alert" \
>             -m "The $PRIMARY_ISP link has failed. Switching over to $SECONDARY_ISP link." \
>             -s $MAIL_SERVER:$MAIL_PORT -xu $MAIL_USER -xp $MAIL_PWD
>     else
>         # Now, again knowing that the Primary WAN has failed, if the default
>         # gateway is Secondary WAN, we don't do anything.
>         echo "$SECONDARY_ISP is the current active link. $PRIMARY_ISP is still down."
>         # Since there is nothing we need to do, we next exit the shell script.
>         exit
>     fi
> # The next section is based on if the Primary WAN is operational (i.e.
> # less than 5 lost pings)
> else
>     # Knowing that the Primary WAN link is viable, we now evaluate if we are
>     # using the Secondary WAN link (1.1.1.1 is the gateway when using
>     # Secondary WAN).
>     if [ "$gateway" = "1.1.1.1" ];
>     then
>         # The next line logs that the Primary WAN is operational and we will
>         # switch back to Primary WAN from Secondary WAN.
>         echo "The $PRIMARY_ISP link is now operational. Switching over to the $PRIMARY_ISP link."
>         # The next lines delete the current Secondary WAN gateway, switch to
>         # Primary WAN and show the routing table for verification.
>         route delete default gw 1.1.1.1
>         route add default gw gateway
>         echo "Gateway switched from $SECONDARY_ISP to $PRIMARY_ISP. Following is the routing table:"
>         route
>         # The next command sends me an email to alert me that the Primary
>         # WAN link is up and we are switching back to the Primary WAN.
>         echo
>         sendEmail -f $MAIL_FROM -t $MAIL_TO -u "Gateway Change Alert" \
>             -m "The $PRIMARY_ISP link is now operational. Switching back to the $PRIMARY_ISP link." \
>             -s $MAIL_SERVER:$MAIL_PORT -xu $MAIL_USER -xp $MAIL_PWD
>     else
>         # Now, again knowing that the Primary WAN is operational, if the
>         # default gateway is already the cable modem, then we don't do
>         # anything.
>         echo "The $PRIMARY_ISP link is operational and $PRIMARY_ISP is the current active link."
>     fi
> fi
> # We end the logging with the same date/time stamp to indicate that this is
> # the end of this iteration of the script run.
> echo "=========="$datetimestamp"=========="
>
> Finally, the following is the addition to fcrontab for the 5-minute
> interval run of the script:
>
> # Check the gateway connection every 5 minutes, if down, switch
> # gateway to Verizon backup
> */5 * * * * /root/check-gateway.sh

Not sure if this is "often enough", but probably better than nothing :)

Do you have this in a git branch or something? It is a bit hard to read the
code in this email.

> The script is currently stored in /root/, but that is just in my
> implementation while I am ignorant of the proper location of such a script.

/usr/sbin probably.

> So, that's it! I'm open to thoughts or suggestions, but I'm currently
> at my limit of capability.
> I'd love to see this get added as an option
> in the initial setup, or packaged into an addon. I've seen numerous
> requests for this kind of feature, so I did this to get something
> functional in place. If there are any resources that can help me
> develop this into an official add-on, I'd love to see them so this can be
> available to the community.

This would probably be a part of the core package since it is modifying the
network scripts.

Hope my feedback helps a little and we can work on this.

Best,
-Michael

>
> Thank you!
>
> Jeff Russell