From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: development@lists.ipfire.org Subject: Re: Stop unbound-dhcp-leases-bridge from continually restarting unbound Date: Tue, 26 Mar 2024 10:44:38 +0000 Message-ID: <220FEAD1-86A0-4D0E-9D50-C69F0D92F8A6@ipfire.org> In-Reply-To: <21c9ef28-5340-4712-adde-4def514038f4@howitts.co.uk> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6776982501287799780==" List-Id: --===============6776982501287799780== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hello, > On 25 Mar 2024, at 11:46, Nick Howitt wrote: >=20 >=20 >=20 > On 25/03/2024 11:21, Michael Tremer wrote: >> Hello, >>> On 24 Mar 2024, at 15:25, Nick Howitt wrote: >>>=20 >>> Hi all, >>>=20 >>> Please bear with me as I am new to IPFire and not at all used to this sty= le of development, so any guidance would be appreciated. >> No worries, feel free to ask any questions that you have=E2=80=A6 There is= documentation around all of this here: >> https://www.ipfire.org/docs/devel >> Especially here: >> https://www.ipfire.org/docs/devel/submit-patches >>> There is a bug, https://bugzilla.ipfire.org/show_bug.cgi?id=3D13254, wher= e unbound-dhcp-leases-bridge restarts unbound a lot when it is totally unnece= ssary. It appears to be because when a lease is renewed, the script gets trig= gered, created a new /etc/unbound/dhcp-leases.conf then restarts Unbound to r= ead the file. Generally this seems to be unnecessary as, with a renewed lease= , the old and new /etc/unbound/dhcp-leases.conf files are the same. With 5 le= ases in that file (one for a machine not active) I am getting 3-4 restarts an= hour of Unbound when I have a min/max lease time of 60/120min. >> I am still not entirely sure why it is such an issue to reload Unbound tha= t often, but I am happy to avoid doing this whenever possible. > I am seeing odd pauses in browsing and I was wondering if it was related to= having to wait on DNS lookups as Unbound was reloading. That is when I bumpe= d into the bugzilla report and noticed others possibly experiencing something= similar. Hmm, the setup where I tested this initially and which was the reason for imp= lementing some changes has about 2000 clients in a day. There is a lot of act= ivity and we would noticed any problems if Unbound went to sleep for a little= while every now and then. They have a reload pretty much every few seconds. >>> Looking at the code, it is fairly easy to fix. The current code creates a= temp file then copies it into place then restarts unbound. All I am doing is= doing a file comparison before the copy and skipping the restart if the file= s are the same. >> Yeah, I think this is a workable solution to mitigate the problem slightly. >> Note that Jon has sent a couple of ideas a while back solving this problem= by removing the bridge and running shell commands instead, but I did not hav= e the time to look into how robust that solution would be. With the bridge, w= e have at least a certain amount of confidence. > I like what Jon has done and want to evaluate it. Also, as it is in Bash I = have a better chance of understanding all of it. I would consider my solution= as a short term sticking plaster and Jon's as the proper way to go longer te= rm. There is really to way around modifying Unbounds configuration. If we can rea= lly rule out that there is no problems when it=E2=80=99s reloading I don=E2= =80=99t think we need Jon=E2=80=99s approach. If we can prove that there is a= small outage, then we do. >>> There were a couple of gotchas because setting the file attributes and co= pying the file were done inside the "with" block for generating the temporary= file. This meant a file comparison always returned False as the temp file wa= s still open and so never the same is the old file. >>>=20 >>> I moved those two statements outside the "with". This forced me to change= the fchmod to chmod. >> In theory you could have left it there, because even if the file is being = removed later, there is no harm in changing permissions before. > Agreed. It just didn't seem to be the right place inside the with statement= . It also meant using a file descriptor rather than a file name which I find = harder to read/understand as I am not a programmer, just a hacker. As I was i= n that area, I thought it a good opportunity to change it. >>> It could be argued that the file copy should not be done if the files are= not different, but it would take an extra "if" and you'd have to remember to= delete the temp file. If required, I can make that change. >> I think I can live with that. This while thing is very far from efficient = anyways and serves as some glue between Unbound and ISC DHCP. The latter will= definitely go soon as it has become EOL and whenever we make that transition= we will have to come up with something new which hopefully does not have the= shortcomings that the bridge has. >>> Also, one small thing I noticed once is that the old and new dhcp-leases.= conf files could occasionally contain the same leases but in a different orde= r. I have been unable to reproduce, but to sidestep it, instead of stepping t= hrough the leases variable directly, I sort it and step through that. It shou= ld make the resulting file completely deterministic and make the file compari= son more effective. >> I think we are evaluating where to do the maths. I always considered it pe= rfectly safe to just throw the new file into the Unbound configuration, have = Unbound read it again and then everything is good. I would also argue that th= at works without any problems even in places where I have 2000 active leases = with a rather short lease time. However, there is the log message=E2=80=A6 >> Sorting creates some extra work, but I am sure that this is a lot cheaper = than reloading Unbound. So once again, I am happy to go down this route. >>>=20 >>> My patch is: >>>=20 >>> From 73873d4944944a2f02317a73074a6894726f36f7 Mon Sep 17 00:00:00 2001 >>> From: Nick Howitt >>> Date: Sun, 24 Mar 2024 15:17:19 +0000 >>> Subject: [PATCH] Stop unbound-dhcp-leases-bridge from restarting unbound = every >>> time it write dhcp-leases.conf as it is very often unchanged and does not >>> require a restart. >>>=20 >>> --- >>> config/unbound/unbound-dhcp-leases-bridge | 28 +++++++++++++++-------- >>> 1 file changed, 19 insertions(+), 9 deletions(-) >>>=20 >>> diff --git a/config/unbound/unbound-dhcp-leases-bridge b/config/unbound/u= nbound-dhcp-leases-bridge >>> index e9f022aff..d22772066 100644 >>> --- a/config/unbound/unbound-dhcp-leases-bridge >>> +++ b/config/unbound/unbound-dhcp-leases-bridge >>> @@ -22,6 +22,7 @@ >>> import argparse >>> import datetime >>> import daemon >>> +import filecmp >> This module does however not seem very fit for purpose for me. It tries to= check the metadata of the file using stat() calls which will not do at all w= hat we want them to do and it will cache any results which is also not at all= helpful. >> I would suggest to simply implement this function yourself like so: >> def compare(f1, f2): >> buffersize =3D 16 * 1024 >> # Reset file handles >> f1.seek(0) >> f2.seek(0) >> while True: >> b1 =3D f1.read(buffersize) >> b2 =3D f2.read(buffersize) >> if not b1 =3D=3D b2: >> return False >> elif not b1 or not b2: >> break >> return True >> You can pass the file descriptors to each of the files and since we are no= w changing a couple of things, former assumptions can be dropped: >> * Unbound used to reload the configuration file, but there could have been= the case of concurrent reads. Therefore the entire atomic file replacement c= an be removed. >> * With that, you could remove the delete=3DFalse attribute from the tempor= ary file. You already have the target file open and you can just copy the con= tent from the temporary file. Because of the former assumption I would consid= er that safe. >> That way, you will be able to avoid copying anything if the files don=E2= =80=99t match and you will not have to introduce any extra code to remove the= temporary file. And the chmod statement can go, too :) > If you ise filecmp.cmp with "shallow=3DFalse" it does not use stat() and do= es a proper file comparison. It that is OK with you, I'd like to keep using f= ilecmp as it is a one-liner. But then you will have to live with the cache storing useless information. I = find the =E2=80=9Cclear_cache()=E2=80=9D statement really quite unsettling. >>> import functools >>> import ipaddress >>> import logging >>> @@ -516,26 +517,35 @@ class UnboundConfigWriter(object): >>>=20 >>> def update_dhcp_leases(self, leases): >>> # Write out all leases >>> - self.write_dhcp_leases(leases) >>> + if self.write_dhcp_leases(leases): >>>=20 >>> - log.debug("Reloading Unbound...") >>> + log.debug("Reloading Unbound...") >>>=20 >>> - # Reload the configuration without dropping the cache >>> - self._control("reload_keep_cache") >>> + # Reload the configuration without dropping the cache >>> + self._control("reload_keep_cache") >>> + >>> + else: >>> + >>> + log.debug("Not reloading Unbound. Leases not changed.") >>>=20 >>> def write_dhcp_leases(self, leases): >>> log.debug("Writing DHCP leases...") >>>=20 >>> with tempfile.NamedTemporaryFile(mode=3D"w", delete=3DFalse) as f: >>> - for l in leases: >>> + for l in sorted(leases): >>> for rr in l.rrset: >>> f.write("local-data: \"%s\"\n" % " ".join(rr)) >>>=20 >>> - # Make file readable for everyone >>> - os.fchmod(f.fileno(), stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|stat.S_IRO= TH) >>> + filecmp.clear_cache() >>> + RequireUnboundReload =3D not filecmp.cmp(f.name, self.path, shallow=3DF= alse) >> Oh and please use snake case for variables. It is Python after all. > > Never heard of the expression, but then I'm a hacker, not a programmer! I'l= l change it. It is just that I like consistency. Yes really. I know that there is a lot of= code that is anything but. It just helps with reading when your eyes can lea= rn a pattern. >>> + # Make file readable for everyone >>> + os.chmod(f.name, stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|stat.S_IROTH) >>> + >>> + # Move the file to its destination >>> + os.rename(f.name, self.path) >>>=20 >>> - # Move the file to its destination >>> - os.rename(f.name, self.path) >>> + return RequireUnboundReload >>>=20 >>> def _control(self, *args): >>> command =3D ["unbound-control"] >>> --=20 >>> 2.43.0 >>>=20 >>> I have this locally on a branch on my firewall. If it is OK, do I need so= me sort of rights to push my branch and then how I get it merged? >> I can create your own Git repository for you to backup/manage your own cha= nges, but we do not allow anyone apart from two people to push into the maste= r repository. > I only have basic knowledge of Git and am not sure how much I will do as co= ntribs. I am happy to work locally. If I had my own git repo, would I then ha= ve to create a merge request as well as submit by email? No, we don=E2=80=99t send merge requests here. Only very rarely. Maybe less t= han once a year. We simply email patch files to the mailing list for a paper trail and because= it is the best way to put comments on the code :) -Michael >> I am also very happy to see your first code contribution to IPFire! >> -Michael >>>=20 >>> Regards, >>>=20 >>> Nick --===============6776982501287799780==--