From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: Conduct sanity checks per DROP list
Date: Tue, 27 Sep 2022 10:22:36 +0100
Message-ID:
In-Reply-To: <87682839-dc30-ac0e-6bb1-ec705bdc793e@ipfire.org>

Hello,

> On 26 Sep 2022, at 19:26, Peter Müller wrote:
> 
> Hello Michael,
> 
> better late than never: Thanks for your reply and the comments.
> 
>> Hello Peter,
>> 
>>> On 18 Aug 2022, at 22:42, Peter Müller wrote:
>>> 
>>> Previously, the lack of distinction between different DROP lists caused
>>> only the last one to be persisted.
>>> 
>>> Signed-off-by: Peter Müller
>>> ---
>>>  src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
>>>  1 file changed, 39 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>>> index 8d47497..e4a9ab0 100644
>>> --- a/src/scripts/location-importer.in
>>> +++ b/src/scripts/location-importer.in
>>> @@ -1427,18 +1427,21 @@ class CLI(object):
>>>  	def _update_overrides_for_spamhaus_drop(self):
>>>  		downloader = location.importer.Downloader()
>>> 
>>> -		ip_urls = [
>>> -			"https://www.spamhaus.org/drop/drop.txt",
>>> -			"https://www.spamhaus.org/drop/edrop.txt",
>>> -			"https://www.spamhaus.org/drop/dropv6.txt"
>>> +		ip_lists = [
>>> +			("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
>>> +			("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
>>> +			("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
>>>  		]
>> 
>> Would it not be better to have this as a dict?
>> 
>> Or if you prefer something to iterate over, a tuple?
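As a minimal sketch of the "iterate over tuples" idea (the entries mirror the patch; the loop body is only a placeholder):

```python
# Entries copied from the patch; a list of (name, url) tuples.
ip_lists = [
    ("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
    ("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
    ("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt"),
]

# Tuple unpacking in the loop header replaces list[0]/list[1] indexing
# (and avoids shadowing the built-in name "list"):
for name, url in ip_lists:
    pass  # fetch and process the list here
```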
>> 
>> Would it not be better/easier if we would use a handle like "SPAMHAUS-DROP", "SPAMHAUS-EDROP", … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted?
> 
> Hm, good point. I have changed that in the upcoming second version of the patch.
> 
>> 
>>> 
>>> -		asn_urls = [
>>> -			"https://www.spamhaus.org/drop/asndrop.txt"
>>> +		asn_lists = [
>>> +			("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
>>>  		]
>>> 
>>> -		for url in ip_urls:
>>> -			# Fetch IP list
>>> +		for list in ip_lists:
>>> +			name = list[0]
>>> +			url = list[1]
>> 
>> It would be more Pythonic if you wrote it like this:
>> 
>> 	for name, url in ip_lists:
>> 		...
>> 
>>> +
>>> +			# Fetch IP list from given URL
>>>  			f = downloader.retrieve(url)
>>> 
>>>  			# Split into lines
>>> @@ -1448,11 +1451,11 @@ class CLI(object):
>>>  			# downloads.
>>>  			if len(fcontent) > 10:
>>>  				self.db.execute("""
>>> -					DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
>>> -					DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
>>> -				""")
>>> +					DELETE FROM network_overrides WHERE source = '%s';
>> 
>> No need for the '' around the %s.
> 
> Yes there is, the SQL command results in an exception otherwise.

Err, you are allowing some SQL command injection here.

You do not want to use "%s" as a string format parameter for Python strings. It is more of a placeholder for the value that has to be escaped first - SQLite is using ? which is probably a better choice for Python.

The database binding is doing this automatically, but the arguments need to be passed after the query.

Under no circumstances whatsoever do you want to change the query dynamically. Never ever.
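To make the difference concrete, here is a small sketch using the sqlite3 module from the standard library; the real script talks to its own database binding, so the table and values here are purely illustrative.

```python
import sqlite3

# Illustrative table standing in for network_overrides in the script.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE network_overrides (network TEXT, source TEXT)")
conn.execute("INSERT INTO network_overrides VALUES ('10.0.0.0/8', 'Spamhaus DROP list')")

# Wrong: splicing the value into the SQL text itself. A crafted name such
# as the one below would then be executed as SQL:
#   conn.execute("DELETE FROM network_overrides WHERE source = '%s'" % name)

# Right: pass the value after the query; the binding escapes it, so a
# hostile value is treated as data, not as SQL:
evil = "x'; DELETE FROM network_overrides; --"
conn.execute("DELETE FROM network_overrides WHERE source = ?", (evil,))
# Nothing matched; the row is still there.
assert conn.execute("SELECT count(*) FROM network_overrides").fetchone()[0] == 1

# The same bound form with the real value deletes exactly that row:
conn.execute("DELETE FROM network_overrides WHERE source = ?", ("Spamhaus DROP list",))
```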
I fixed this here: https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=b56bf4bf065130b38968b580e0bea6db809783d8

Best,
-Michael

> 
> Second version of the patch is coming right up.
> 
> Thanks, and best regards,
> Peter Müller
> 
>> 
>>> +					""" % name,
>>> +				)
>>>  			else:
>>> -				log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>  				continue
>>> 
>>>  			# Iterate through every line, filter comments and add remaining networks to
>>> @@ -1475,8 +1478,8 @@ class CLI(object):
>>> 
>>>  				# Sanitize parsed networks...
>>>  				if not self._check_parsed_network(network):
>>> -					log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>> -						(url, network))
>>> +					log.warning("Skipping bogus network found in %s (%s): %s" % \
>>> +						(name, url, network))
>>>  					continue
>>> 
>>>  				# Conduct SQL statement...
>>> @@ -1488,14 +1491,31 @@ class CLI(object):
>>>  					) VALUES (%s, %s, %s)
>>>  					ON CONFLICT (network) DO UPDATE SET is_drop = True""",
>>>  					"%s" % network,
>>> -					"Spamhaus DROP lists",
>>> +					name,
>>>  					True
>>>  				)
>>> 
>>> -		for url in asn_urls:
>>> +		for list in asn_lists:
>>> +			name = list[0]
>>> +			url = list[1]
>>> +
>> 
>> See above.
>> 
>>>  			# Fetch URL
>>>  			f = downloader.retrieve(url)
>>> 
>>> +			# Split into lines
>>> +			fcontent = f.readlines()
>>> +
>>> +			# Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
>>> +			# downloads.
>>> +			if len(fcontent) > 10:
>>> +				self.db.execute("""
>>> +					DELETE FROM autnum_overrides WHERE source = '%s';
>>> +				""" % name,
>>> +				)
>>> +			else:
>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>> +				continue
>> 
>> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data.
>> 
>> Would it also not be programmatically better to start the import and then count how many lines you are importing?
>> 
>> The "f.readlines()" call is something that I would consider a bit ugly.
>> 
>>> +
>>>  			# Iterate through every line, filter comments and add remaining ASNs to
>>>  			# the override table in case they are valid...
>>>  			with self.db.transaction():
>>> @@ -1518,8 +1538,8 @@ class CLI(object):
>>> 
>>>  				# Filter invalid ASNs...
>>>  				if not self._check_parsed_asn(asn):
>>> -					log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>> -						(url, asn))
>>> +					log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>>> +						(name, url, asn))
>>>  					continue
>>> 
>>>  				# Conduct SQL statement...
>>> @@ -1531,7 +1551,7 @@ class CLI(object):
>>>  					) VALUES (%s, %s, %s)
>>>  					ON CONFLICT (number) DO UPDATE SET is_drop = True""",
>>>  					"%s" % asn,
>>> -					"Spamhaus ASN-DROP list",
>>> +					name,
>>>  					True
>>>  				)
>>> 
>>> -- 
>>> 2.35.3
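The two review points above (running the DELETE inside the same transaction as the INSERTs, and counting lines while iterating instead of slurping the file with f.readlines()) could be combined roughly as follows. A sketch only: sqlite3 stands in for the script's actual database binding, and the function, table, and parsing are illustrative.

```python
import sqlite3

def import_drop_list(conn, name, lines, minimum=10):
    """Sketch: the DELETE and the INSERTs share one transaction, and the
    sanity check counts entries during the import rather than up front."""
    with conn:  # everything below is rolled back as a whole if we raise
        conn.execute("DELETE FROM network_overrides WHERE source = ?", (name,))
        count = 0
        for line in lines:
            line = line.strip()
            if not line or line.startswith(";"):
                continue  # skip blanks and comments
            network = line.split(";")[0].strip()
            conn.execute(
                "INSERT INTO network_overrides (network, source) VALUES (?, ?)",
                (network, name),
            )
            count += 1
        if count < minimum:
            # Likely a bogus (e.g. CDN-mangled) download: raising here also
            # undoes the DELETE, so the previously imported data survives.
            raise ValueError("%s returned likely bogus file" % name)
```

With this shape, a short download never destroys the existing override rows, because the rollback covers the DELETE as well.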