From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: development@lists.ipfire.org Subject: Re: [PATCH] Stop unbound-dhcp-leases-bridge from continually restarting unbound Date: Fri, 26 Apr 2024 17:12:08 +0200 Message-ID: <44A5C7CC-564C-463D-86EC-60A2F092E600@ipfire.org> In-Reply-To: <1309030d-35bf-4f7c-bb01-99d07d6ba0a4@howitts.co.uk> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1241252988555480848==" List-Id: --===============1241252988555480848== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hello Nick, > On 25 Apr 2024, at 13:43, Nick Howitt wrote: >=20 > Please can I bump this again to request a review of my patch submitted on 2= 8th March? Thanks, Nick. It is a little bit difficult to test your proposed changes if testers have to= copy and paste them out of an email. I have just submitted a patch set to this list that breaks down all the chang= es that you proposed and I cleaned up the code so that we can use the filecmp= function without any extra steps. We can simply jump out of the write functi= on and return False to prevent Unbound being reloaded. We will also avoid replacing the file all the time as that causes lots of unn= ecessary disk I/O. I am not saying it makes a significant difference, but fla= sh has a limited write count that we are still wasting. Please review and you if you are happy, please add the appropriate Git tag. Best, -Michael > On 10/04/2024 12:04, Nick Howitt wrote: >> Bump >>=20 >> On 28/03/2024 17:25, Nick Howitt wrote: >>> Next version.=20 >>>=20 >>> On 28/03/2024 10:41, Michael Tremer wrote: >>>>=20 >>>> Hello,=20 >>>>=20 >>>>> On 27 Mar 2024, at 15:28, Nick Howitt wrote:=20 >>>>>=20 >>>>>=20 >>>>>=20 >>>>> On 26/03/2024 11:10, Michael Tremer wrote: >>>>>>=20 >>>>>>=20 >>>>>>> On 25 Mar 2024, at 13:11, Nick Howitt wrote:=20 >>>>>>>=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> On 25/03/2024 11:31, Michael Tremer wrote: >>>>>>>> Hello,=20 >>>>>>>> I don=E2=80=99t think that the risk is very high, but I would accoun= t for this problem.=20 >>>>>>>> With my proposed changes from my previous email this should not be a= large problem at all. When opening /etc/unbound/dhcp-leases.conf with mode = =E2=80=9Cr+=E2=80=9D you can catch the Exception and skip the comparison and = open the file again with =E2=80=9Cw=E2=80=9D. >>>>>>>>> On 25 Mar 2024, at 10:10, Nick Howitt wrote:= =20 >>>>>>>>>=20 >>>>>>>>> If there is a risk that the /etc/unbound/dhcp-leases.conf does not = exist, for example on a clean install (I don't know), the filecmp.cmp line ne= eds to be wrapped in a try...except to stop it blowing up:=20 >>>>>>>>>=20 >>>>>>>>> try:=20 >>>>>>>>> RequireUnboundReload =3D not filecmp.cmp(f.name, self.path, shallow= =3DFalse)=20 >>>>>>>>> except:=20 >>>>>>>>> RequireUnboundReload =3D True >>>>>>>> I generally don=E2=80=99t like blanco exception catches. You will ne= ver know about the daemons that you are hiding. >>>>>>>>> This will force the file to be to be generated, even if it is only = copying in a blank temp file, and unbound will restart. It will only ever hap= pen once.=20 >>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>> On 24/03/2024 15:49, Nick Howitt wrote: >>>>>>>>>> Hi all,=20 >>>>>>>>>> **** This is a repost of an email incorrectly submitted. ****=20 >>>>>>>>>> Please bear with me as I am new to IPFire and not at all used to t= his style of development, so any guidance would be appreciated.=20 >>>>>>>>>> There is a bug, https://bugzilla.ipfire.org/show_bug.cgi?id=3D1325= 4, where unbound-dhcp-leases-bridge restarts unbound a lot when it is totally= unnecessary. It appears to be because when a lease is renewed, the script ge= ts triggered, created a new /etc/unbound/dhcp-leases.conf then restarts Unbou= nd to read the file. Generally this seems to be unnecessary as, with a renewe= d lease, the old and new /etc/unbound/dhcp-leases.conf files are the same. Wi= th 5 leases in that file (one for a machine not active) I am getting 3-4 rest= arts an hour of Unbound when I have a min/max lease time of 60/120min.=20 >>>>>>>>>> Looking at the code, it is fairly easy to fix. The current code cr= eates a temp file then copies it into place then restarts unbound. All I am d= oing is doing a file comparison before the copy and skipping the restart if t= he files are the same.=20 >>>>>>>>>> There were a couple of gotchas because setting the file attributes= and copying the file were done inside the "with" block for generating the te= mporary file. This meant a file comparison always returned False as the temp = file was still open and so never the same is the old file.=20 >>>>>>>>>> I moved those two statements outside the "with". This forced me to= change the fchmod to chmod.=20 >>>>>>>>>> It could be argued that the file copy should not be done if the fi= les are not different, but it would take an extra "if" and you'd have to reme= mber to delete the temp file. If required, I can make that change.=20 >>>>>>>>>> Also, one small thing I noticed once is that the old and new dhcp-= leases.conf files could occasionally contain the same leases but in a differe= nt order. I have been unable to reproduce, but to sidestep it, instead of ste= pping through the leases variable directly, I sort it and step through that. = It should make the resulting file completely deterministic and make the file = comparison more effective.=20 >>>>>>>>>> My patch is:=20 >>>>>>>>>> From 73873d4944944a2f02317a73074a6894726f36f7 Mon Sep 17 00:00:00= 2001=20 >>>>>>>>>> From: Nick Howitt =20 >>>>>>>>>> Date: Sun, 24 Mar 2024 15:17:19 +0000=20 >>>>>>>>>> Subject: [PATCH] Stop unbound-dhcp-leases-bridge from restarting u= nbound every=20 >>>>>>>>>> time it write dhcp-leases.conf as it is very often unchanged and= does not=20 >>>>>>>>>> require a restart.=20 >>>>>>>>>> ---=20 >>>>>>>>>> config/unbound/unbound-dhcp-leases-bridge | 28 +++++++++++++++--= ------=20 >>>>>>>>>> 1 file changed, 19 insertions(+), 9 deletions(-)=20 >>>>>>>>>> diff --git a/config/unbound/unbound-dhcp-leases-bridge b/config/un= bound/unbound-dhcp-leases-bridge=20 >>>>>>>>>> index e9f022aff..d22772066 100644=20 >>>>>>>>>> --- a/config/unbound/unbound-dhcp-leases-bridge=20 >>>>>>>>>> +++ b/config/unbound/unbound-dhcp-leases-bridge=20 >>>>>>>>>> @@ -22,6 +22,7 @@=20 >>>>>>>>>> import argparse=20 >>>>>>>>>> import datetime=20 >>>>>>>>>> import daemon=20 >>>>>>>>>> +import filecmp=20 >>>>>>>>>> import functools=20 >>>>>>>>>> import ipaddress=20 >>>>>>>>>> import logging=20 >>>>>>>>>> @@ -516,26 +517,35 @@ class UnboundConfigWriter(object):=20 >>>>>>>>>> def update_dhcp_leases(self, leases):=20 >>>>>>>>>> # Write out all leases=20 >>>>>>>>>> - self.write_dhcp_leases(leases)=20 >>>>>>>>>> + if self.write_dhcp_leases(leases):=20 >>>>>>>>>> - log.debug("Reloading Unbound...")=20 >>>>>>>>>> + log.debug("Reloading Unbound...")=20 >>>>>>>>>> - # Reload the configuration without dropping the cache=20 >>>>>>>>>> - self._control("reload_keep_cache")=20 >>>>>>>>>> + # Reload the configuration without dropping the cache= =20 >>>>>>>>>> + self._control("reload_keep_cache")=20 >>>>>>>>>> +=20 >>>>>>>>>> + else:=20 >>>>>>>>>> +=20 >>>>>>>>>> + log.debug("Not reloading Unbound. Leases not changed.= ")=20 >>>>>>>>>> def write_dhcp_leases(self, leases):=20 >>>>>>>>>> log.debug("Writing DHCP leases...")=20 >>>>>>>>>> with tempfile.NamedTemporaryFile(mode=3D"w", delete=3DFa= lse) as f:=20 >>>>>>>>>> - for l in leases:=20 >>>>>>>>>> + for l in sorted(leases):=20 >>>>>>>>>> for rr in l.rrset:=20 >>>>>>>>>> f.write("local-data: \"%s\"\n" % " ".join(rr= ))=20 >>>>>>>>>> - # Make file readable for everyone=20 >>>>>>>>>> - os.fchmod(f.fileno(), stat.S_IRUSR|stat.S_IWUSR|stat.= S_IRGRP|stat.S_IROTH)=20 >>>>>>>>>> + filecmp.clear_cache()=20 >>>>>>>>>> + RequireUnboundReload =3D not filecmp.cmp(f.name, self.pat= h, shallow=3DFalse)=20 >>>>>>>>>> +=20 >>>>>>>>>> + # Make file readable for everyone=20 >>>>>>>>>> + os.chmod(f.name, stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|s= tat.S_IROTH)=20 >>>>>>>>>> +=20 >>>>>>>>>> + # Move the file to its destination=20 >>>>>>>>>> + os.rename(f.name, self.path)=20 >>>>>>>>>> - # Move the file to its destination=20 >>>>>>>>>> - os.rename(f.name, self.path)=20 >>>>>>>>>> + return RequireUnboundReload=20 >>>>>>>>>> def _control(self, *args):=20 >>>>>>>>>> command =3D ["unbound-control"] >>>>>>> Resubmitting the patch.=20 >>>>>>>=20 >>>>>>> Following my previous reply, I would like to stick with filecmp as I = am using it with "shallow=3DFalse" to avoid doing the comparison just with st= at() calls. Note that I also flush the filecmp cache before using the command= so it is running with the latest file data. Talking about file descriptors, = atomic file replacement etc., scares me. >>>>>> File descriptors are great. They avoid race-conditions and all sorts o= f things. >>>>> But I have never used them so I don't intend to go that way. Too much f= or me to get my brain round and a limited ability to test. >>>>=20 >>>> Well, I pretty much gave you the entire solution=E2=80=A6 There was noth= ing left to do apart from copying and pasting the snippets.=20 >>>>=20 >>> Ok I've tried to use your version but struggled to get it going. If I use= d it as was and called compare(file1_name, file2_name) it failed as "compare"= was not defined.=20 >>> I could call it using self.compare but then it failed with "TypeError: Un= boundConfigWriter.compare() takes 2 positional arguments but 3 were given". I= changed the function to compare(self, f1, f2).=20 >>> Then it failed with "AttributeError: 'str' object has no attribute 'seek'= ", so I changed it again to accept file names then open the files before doin= g the seeks.=20 >>> I really hope it is OK as I am out of my depth when defining functions wi= th "self" as an argument. Please can you check it? If it is not OK, can I rev= ert to filecmp?=20 >>>=20 >>> I have made one futher change, the sorted(leases) line was not working an= d not sorting the object ever, and I saw lines changing places in the resulti= ng leases file. I have changed it to "sorted(leases, key=3Dlambda x: x.ipaddr= )". It definitely sorts this time.=20 >>>=20 >>>>>> I don=E2=80=99t think that module is the best choice really if you hav= e to hack around and flush the cache. Why would it even cache things in the f= irst place? I don=E2=80=99t see where the cache helps. >>>>> From the filecmp.clear_cache documentation:=20 >>>>>=20 >>>>> Clear the filecmp cache. This may be useful if a file is compared so= =20 >>>>> quickly after it is modified that it is within the mtime resolution = >>>>> of the underlying filesystem.=20 >>>>>=20 >>>>> Now this makes me believe that the clear_cache is only relevant when us= ing shallow=3DTrue, which I'm not and I don't know how to check the underlyin= g code. It is used to make sure the stat() info is updated, but we are not us= ing the stat() info. I guess we could as easily add a one second sleep instea= d as that is way longer than the mtime resolution. >>>>=20 >>>> The function is here and it is rather simple: https://github.com/python/= cpython/blob/3.12/Lib/filecmp.py#L30=20 >>>>=20 >>>> It will always stat() both files and compares three items in case shallo= w is False: The type of the file, the size and mtime. If the file is not a re= gular file, the function will exist rather early. So in our case, that is all= work the function is doing for thing. It boils down to something very simila= r to what I suggested to use to compare the files.=20 >>>>=20 >>>> On top of that, there is a cache which will always be used but will neve= r return a match for us because the temporary file at least will always have = another mtime.=20 >>>>=20 >>>> You could clear the cache after every call, but I would rather not have = our code messy and care about implementation details of some Python module.=20 >>>>=20 >>>>> The module is chosen because all the references I read about comparing = files preferred this one-liner for comparing files, although other methods we= re given and there were lots of discussions about large files (GB+) because i= f you chomp them it may be much quicker to say they are not the same when you= find the first block that does not match. Having said this, I don't know how= filecmp works, but we only have a small file. At the same time, I am not a r= egular programmer so simple one-liners are good for me. >>>>=20 >>>> You would always want to read both files block for block and compare the= blocks. There is no way to implement this significantly faster.=20 >>>>=20 >>>>>>> I have modified the earlier try/except I proposed to catch only FileN= otFoundError, and I have changed CamelCase to snake_case.=20 >>>>>>>=20 >>>>>>> ---=20 >>>>>>> config/unbound/unbound-dhcp-leases-bridge | 31 ++++++++++++++++------= -=20 >>>>>>> 1 file changed, 22 insertions(+), 9 deletions(-)=20 >>>>>>>=20 >>>>>>> diff --git a/config/unbound/unbound-dhcp-leases-bridge b/config/unbou= nd/unbound-dhcp-leases-bridge=20 >>>>>>> index e9f022aff..a9ec8e1e2 100644=20 >>>>>>> --- a/config/unbound/unbound-dhcp-leases-bridge=20 >>>>>>> +++ b/config/unbound/unbound-dhcp-leases-bridge=20 >>>>>>> @@ -22,6 +22,7 @@=20 >>>>>>> import argparse=20 >>>>>>> import datetime=20 >>>>>>> import daemon=20 >>>>>>> +import filecmp=20 >>>>>>> import functools=20 >>>>>>> import ipaddress=20 >>>>>>> import logging=20 >>>>>>> @@ -516,26 +517,38 @@ class UnboundConfigWriter(object):=20 >>>>>>>=20 >>>>>>> def update_dhcp_leases(self, leases):=20 >>>>>>> # Write out all leases=20 >>>>>>> - self.write_dhcp_leases(leases)=20 >>>>>>> + if self.write_dhcp_leases(leases):=20 >>>>>>>=20 >>>>>>> - log.debug("Reloading Unbound...")=20 >>>>>>> + log.debug("Reloading Unbound...")=20 >>>>>>>=20 >>>>>>> - # Reload the configuration without dropping the cache=20 >>>>>>> - self._control("reload_keep_cache")=20 >>>>>>> + # Reload the configuration without dropping the cache=20 >>>>>>> + self._control("reload_keep_cache")=20 >>>>>>> +=20 >>>>>>> + else:=20 >>>>>>> +=20 >>>>>>> + log.debug("Not reloading Unbound. Leases not changed.")=20 >>>>>>>=20 >>>>>>> def write_dhcp_leases(self, leases):=20 >>>>>>> log.debug("Writing DHCP leases...")=20 >>>>>>>=20 >>>>>>> with tempfile.NamedTemporaryFile(mode=3D"w", delete=3DFalse) as f:=20 >>>>>>> - for l in leases:=20 >>>>>>> + for l in sorted(leases):=20 >>>>>>> for rr in l.rrset:=20 >>>>>>> f.write("local-data: \"%s\"\n" % " ".join(rr))=20 >>>>>>>=20 >>>>>>> - # Make file readable for everyone=20 >>>>>>> - os.fchmod(f.fileno(), stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|stat.S= _IROTH)=20 >>>>>>> + filecmp.clear_cache()=20 >>>>>>> + try:=20 >>>>>>> + require_unbound_reload =3D not filecmp.cmp(f.name, self.path, shall= ow=3DFalse)=20 >>>>>>> + except FileNotFoundError:=20 >>>>>>> + require_unbound_reload =3D True >>>>>> If you want to go this route, you will have to flush the temporary fil= e first. Otherwise you might have some data left in the buffer that has not b= een written (because the file is still open). You will then compare incomplet= e output. This would not become a problem if you used the file descriptors. >>>>> The temp file is closed at the point of doing the filecmp, and the file= cmp cache has been flushed. How do I flush the temporary file further. Readin= g up, is it just having the last statement in the with block as f.flush() wit= h the same indent as "for l in sorted(leases):" >>>>=20 >>>> If you have left the with block you will already have flushed the buffer= s and closed the file. It looks like I misread the indentation in the patch t= hen.=20 >>>>=20 >>>>>>=20 >>>>>> Catching FileNotFound would work for me. >>>>> Good >>>>>>=20 >>>>>>> +=20 >>>>>>> + # Make file readable for everyone=20 >>>>>>> + os.chmod(f.name, stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|stat.S_IROT= H)=20 >>>>>>> +=20 >>>>>>> + # Move the file to its destination=20 >>>>>>> + os.rename(f.name, self.path)=20 >>>>>>>=20 >>>>>>> - # Move the file to its destination=20 >>>>>>> - os.rename(f.name, self.path)=20 >>>>>>> + return require_unbound_reload=20 >>>>>>>=20 >>>>>>> def _control(self, *args):=20 >>>>>>> command =3D ["unbound-control"]=20 >>>>>>> --=20 >>>>>>> 2.43.0=20 >>>>>>>=20 >>>>>>>=20 >>>>>>> Nick >>>>=20 >>>>=20 >>> ---=20 >>> config/unbound/unbound-dhcp-leases-bridge | 50 +++++++++++++++++++----=20 >>> 1 file changed, 41 insertions(+), 9 deletions(-)=20 >>>=20 >>> diff --git a/config/unbound/unbound-dhcp-leases-bridge b/config/unbound/u= nbound-dhcp-leases-bridge=20 >>> index e9f022aff..80d9523ea 100644=20 >>> --- a/config/unbound/unbound-dhcp-leases-bridge=20 >>> +++ b/config/unbound/unbound-dhcp-leases-bridge=20 >>> @@ -516,26 +516,37 @@ class UnboundConfigWriter(object):=20 >>>=20 >>> def update_dhcp_leases(self, leases):=20 >>> # Write out all leases=20 >>> - self.write_dhcp_leases(leases)=20 >>> + if self.write_dhcp_leases(leases):=20 >>>=20 >>> - log.debug("Reloading Unbound...")=20 >>> + log.debug("Reloading Unbound...")=20 >>>=20 >>> - # Reload the configuration without dropping the cache=20 >>> - self._control("reload_keep_cache")=20 >>> + # Reload the configuration without dropping the cache=20 >>> + self._control("reload_keep_cache")=20 >>> +=20 >>> + else:=20 >>> +=20 >>> + log.debug("Not reloading Unbound. Leases not changed.")=20 >>>=20 >>> def write_dhcp_leases(self, leases):=20 >>> log.debug("Writing DHCP leases...")=20 >>>=20 >>> with tempfile.NamedTemporaryFile(mode=3D"w", delete=3DFalse) as = f:=20 >>> - for l in leases:=20 >>> + for l in sorted(leases, key=3Dlambda x: x.ipaddr):=20 >>> for rr in l.rrset:=20 >>> f.write("local-data: \"%s\"\n" % " ".join(rr))=20 >>>=20 >>> - # Make file readable for everyone=20 >>> - os.fchmod(f.fileno(), stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP= |stat.S_IROTH)=20 >>> + try:=20 >>> + require_unbound_reload =3D not self.compare(f.name, self.pat= h)=20 >>> + except FileNotFoundError:=20 >>> + require_unbound_reload =3D True=20 >>> +=20 >>> + # Make file readable for everyone=20 >>> + os.chmod(f.name, stat.S_IRUSR|stat.S_IWUSR|stat.S_IRGRP|stat.S_I= ROTH)=20 >>> +=20 >>> + # Move the file to its destination=20 >>> + os.rename(f.name, self.path)=20 >>>=20 >>> - # Move the file to its destination=20 >>> - os.rename(f.name, self.path)=20 >>> + return require_unbound_reload=20 >>>=20 >>> def _control(self, *args):=20 >>> command =3D ["unbound-control"]=20 >>> @@ -551,6 +562,27 @@ class UnboundConfigWriter(object):=20 >>>=20 >>> raise e=20 >>>=20 >>> + def compare(self, if1, if2):=20 >>> + buffersize =3D 16 * 1024=20 >>> +=20 >>> + f1 =3D open(if1, "r")=20 >>> + f2 =3D open(if2, "r")=20 >>> +=20 >>> + # Reset file handles=20 >>> + f1.seek(0)=20 >>> + f2.seek(0)=20 >>> +=20 >>> + while True:=20 >>> + b1 =3D f1.read(buffersize)=20 >>> + b2 =3D f2.read(buffersize)=20 >>> +=20 >>> + if not b1 =3D=3D b2:=20 >>> + return False=20 >>> +=20 >>> + elif not b1 or not b2:=20 >>> + break=20 >>> +=20 >>> + return True=20 >>>=20 >>> if __name__ =3D=3D "__main__":=20 >>> parser =3D argparse.ArgumentParser(description=3D"Bridge for DHCP Le= ases and Unbound DNS") >>=20 >=20 --===============1241252988555480848==--