From: Peter Müller
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: Conduct sanity checks per DROP list
Date: Sun, 02 Oct 2022 11:04:22 +0000
Message-ID: <397a7f14-d0a4-743d-5c5c-2f05ccb8259f@ipfire.org>

Hello Michael,

> Hello,
>
>> On 26 Sep 2022, at 19:26, Peter Müller wrote:
>>
>> Hello Michael,
>>
>> better late than never: Thanks for your reply and the comments.
>>
>>> Hello Peter,
>>>
>>>> On 18 Aug 2022, at 22:42, Peter Müller wrote:
>>>>
>>>> Previously, the lack of distinction between different DROP lists caused
>>>> only the last one to be persisted.
>>>>
>>>> Signed-off-by: Peter Müller
>>>> ---
>>>> src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
>>>> 1 file changed, 39 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>>>> index 8d47497..e4a9ab0 100644
>>>> --- a/src/scripts/location-importer.in
>>>> +++ b/src/scripts/location-importer.in
>>>> @@ -1427,18 +1427,21 @@ class CLI(object):
>>>> def _update_overrides_for_spamhaus_drop(self):
>>>> downloader = location.importer.Downloader()
>>>>
>>>> - ip_urls = [
>>>> - "https://www.spamhaus.org/drop/drop.txt",
>>>> - "https://www.spamhaus.org/drop/edrop.txt",
>>>> - "https://www.spamhaus.org/drop/dropv6.txt"
>>>> + ip_lists = [
>>>> + ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
>>>> + ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
>>>> + ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
>>>> ]
>>>
>>> Would it not be better to have this as a dict?
>>>
>>> Or if you prefer something to iterate over, a tuple?
>>>
>>> Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted?
>>
>> Hm, good point. I have changed that in the upcoming second version of the patch.
>>
>>>
>>>>
>>>> - asn_urls = [
>>>> - "https://www.spamhaus.org/drop/asndrop.txt"
>>>> + asn_lists = [
>>>> + ("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
>>>> ]
>>>>
>>>> - for url in ip_urls:
>>>> - # Fetch IP list
>>>> + for list in ip_lists:
>>>> + name = list[0]
>>>> + url = list[1]
>>>
>>> It would be more Pythonic if you wrote it like this:
>>>
>>> for name, url in ip_lists:
>>>     ...
>>>
>>>> +
>>>> + # Fetch IP list from given URL
>>>> f = downloader.retrieve(url)
>>>>
>>>> # Split into lines
>>>> @@ -1448,11 +1451,11 @@ class CLI(object):
>>>> # downloads.
>>>> if len(fcontent) > 10:
>>>> self.db.execute("""
>>>> - DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
>>>> - DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
>>>> - """)
>>>> + DELETE FROM network_overrides WHERE source = '%s';
>>>
>>> No need for the ‘’ around the %s.
>>
>> Yes there is, the SQL command results in an exception otherwise.
>
> Err, you are allowing some SQL command injection here.
>
> You do not want to use “%s” as a string format parameter for Python strings. It is more of a placeholder for the value that has to be escaped first - SQLite is using ? which is probably a better choice for Python.
>
> The database binding is doing this automatically, but the arguments need to be passed after the query.
>
> Under no circumstances whatsoever do you want to change the query dynamically. Never ever.
>
> I fixed this here: https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=b56bf4bf065130b38968b580e0bea6db809783d8

better late than never: Many thanks for spotting and fixing this before it went into anything stable or productive!

Guess I'm not a good programmer after all...
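For the archives, a minimal standalone sketch of the difference - plain sqlite3 and a throwaway table here, not the actual importer or its database wrapper:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE network_overrides(network TEXT, source TEXT)")
db.execute("INSERT INTO network_overrides VALUES ('192.0.2.0/24', 'SPAMHAUS-DROP')")
db.execute("INSERT INTO network_overrides VALUES ('198.51.100.0/24', 'SPAMHAUS-EDROP')")

# A "name" that was never meant to be SQL:
name = "SPAMHAUS-DROP' OR '1' = '1"

# Broken: the value becomes part of the SQL text, the WHERE clause
# collapses to "always true", and every row is deleted.
#db.execute("DELETE FROM network_overrides WHERE source = '%s'" % name)

# Correct: the statement stays constant and the value is passed separately,
# so the driver escapes it (sqlite3 spells the placeholder ?).
db.execute("DELETE FROM network_overrides WHERE source = ?", (name,))

print(db.execute("SELECT count(*) FROM network_overrides").fetchone())   # (2,)

If I read the importer right, the PostgreSQL binding it uses spells its placeholder %s as well, which is why the formatted string looked deceptively similar to a bound parameter - but the query and the values still have to be handed over separately, exactly as in your fix.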
All the best,
Peter Müller

>
> Best,
> -Michael
>
>>
>> Second version of the patch is coming right up.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>>
>>>> + """ % name,
>>>> + )
>>>> else:
>>>> - log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
>>>> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>> continue
>>>>
>>>> # Iterate through every line, filter comments and add remaining networks to
>>>> @@ -1475,8 +1478,8 @@ class CLI(object):
>>>>
>>>> # Sanitize parsed networks...
>>>> if not self._check_parsed_network(network):
>>>> - log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>>> - (url, network))
>>>> + log.warning("Skipping bogus network found in %s (%s): %s" % \
>>>> + (name, url, network))
>>>> continue
>>>>
>>>> # Conduct SQL statement...
>>>> @@ -1488,14 +1491,31 @@ class CLI(object):
>>>> ) VALUES (%s, %s, %s)
>>>> ON CONFLICT (network) DO UPDATE SET is_drop = True""",
>>>> "%s" % network,
>>>> - "Spamhaus DROP lists",
>>>> + name,
>>>> True
>>>> )
>>>>
>>>> - for url in asn_urls:
>>>> + for list in asn_lists:
>>>> + name = list[0]
>>>> + url = list[1]
>>>> +
>>>
>>> See above.
>>>
>>>> # Fetch URL
>>>> f = downloader.retrieve(url)
>>>>
>>>> + # Split into lines
>>>> + fcontent = f.readlines()
>>>> +
>>>> + # Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
>>>> + # downloads.
>>>> + if len(fcontent) > 10:
>>>> + self.db.execute("""
>>>> + DELETE FROM autnum_overrides WHERE source = '%s';
>>>> + """ % name,
>>>> + )
>>>> + else:
>>>> + log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>> + continue
>>>
>>> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data.
>>>
>>> Would it also not be programmatically better to start the import and then count how many lines you are importing?
>>>
>>> The “f.readlines()” call is something that I would consider a bit ugly.
>>>
>>>> +
>>>> # Iterate through every line, filter comments and add remaining ASNs to
>>>> # the override table in case they are valid...
>>>> with self.db.transaction():
>>>> @@ -1518,8 +1538,8 @@ class CLI(object):
>>>>
>>>> # Filter invalid ASNs...
>>>> if not self._check_parsed_asn(asn):
>>>> - log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>>> - (url, asn))
>>>> + log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>>>> + (name, url, asn))
>>>> continue
>>>>
>>>> # Conduct SQL statement...
>>>> @@ -1531,7 +1551,7 @@ class CLI(object):
>>>> ) VALUES (%s, %s, %s)
>>>> ON CONFLICT (number) DO UPDATE SET is_drop = True""",
>>>> "%s" % asn,
>>>> - "Spamhaus ASN-DROP list",
>>>> + name,
>>>> True
>>>> )
>>>>
>>>> --
>>>> 2.35.3
>
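P.S. Regarding the nested question further up about the DELETE statement not being part of the transaction and the f.readlines() call: the sketch below is the direction I have in mind for the next iteration. It is only a toy, self-contained illustration (sqlite3, a throwaway table and a fake in-memory download, not the real importer code):

import io
import sqlite3

def import_drop_list(db, name, f, minimum=10):
    # One transaction around DELETE + INSERTs: if the download turns out
    # to be bogus, everything is rolled back and the old data survives.
    with db:
        db.execute("DELETE FROM network_overrides WHERE source = ?", (name,))

        count = 0
        for line in f:
            line = line.split(";")[0].strip()   # drop comments
            if not line:
                continue

            db.execute(
                "INSERT INTO network_overrides(network, source, is_drop) VALUES (?, ?, 1)",
                (line, name),
            )
            count += 1

        # Count while importing instead of calling f.readlines() up front
        if count < minimum:
            raise RuntimeError("%s returned likely bogus file" % name)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE network_overrides(network TEXT PRIMARY KEY, source TEXT, is_drop BOOL)")

fake_download = io.StringIO("; header comment\n192.0.2.0/24 ; SBL0000\n")

try:
    import_drop_list(db, "SPAMHAUS-DROP", fake_download)
except RuntimeError as e:
    print(e)

print(db.execute("SELECT count(*) FROM network_overrides").fetchone())   # (0,) - nothing persisted

With sqlite3 the connection context manager rolls back when an exception escapes it; I would expect self.db.transaction() in the importer to behave the same way, but that is an assumption on my part. Whether one should bail out with an exception or just log and continue as the current code does is debatable; the point is merely that the DELETE and the sanity check end up inside the same transaction.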