From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
Date: Thu, 14 Oct 2021 19:12:36 +0100

Hello,

> On 13 Oct 2021, at 17:37, Peter Müller wrote:
>
> Hello Michael,
>
> thanks for your reply.
>
>> Hello,
>>
>>> On 10 Oct 2021, at 17:16, Peter Müller wrote:
>>>
>>> A while ago, it was discussed whether or not libloc should become an
>>> "opinionated database", i.e. one that includes information on a
>>> network's reputation.
>>>
>>> In general, this idea was dismissed as libloc is neither intended nor
>>> suitable for such tasks, and we do not want to make (political?)
>>> decisions like these for various reasons. All we do is provide a
>>> useful location database in a neutral way, and leave it up to our
>>> users how to react to certain results.
>>>
>>> However, there is a problematic area. Take AS55303 as an example: We
>>> _know_ this to be a dirty network, tampering with RIR data and
>>> hijacking IP space, and strongly recommend against processing any
>>> connection originating from or directed to it.
>>>
>>> Since it appears to be loaded with proxies used by miscreants for
>>> abusive purposes, all we can do at the time of writing is to flag it
>>> as "anonymous proxy", but we lack the possibility of telling our users
>>> something like "this is not a safe area". The very same goes for known
>>> bulletproof ISPs, IP hijackers, and so forth.
>>>
>>> This patch therefore suggests populating the "is_drop" flag introduced
>>> in libloc 0.9.8 (albeit currently unused in production) with the
>>> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
>>> have at least the baddest of the bad covered. The very same lists are,
>>> in fact, included in popular IPS rulesets as well - a decent number of
>>> IPFire users are therefore likely to have them enabled already, but in
>>> a very costly way.
>>>
>>> It is not planned to go further, partly because there is no other
>>> publicly available feed that comes with the same intention,
>>> volatility, and FP rate.
>>>
>>> The second version of this patch makes use of an auxiliary function to
>>> sanitise ASNs, hence avoiding boilerplate code, and treats any line
>>> starting with a semicolon as a comment, which should be sufficient.
>>>
>>> Signed-off-by: Peter Müller
>>> ---
>>> src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>>> 1 file changed, 104 insertions(+)
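Before I get to the code itself: just to make sure we are talking about the
same end result, this is roughly how a consumer would eventually use the flag
once it is exported. A sketch only - I am assuming here that the Python
bindings expose lookup(), has_flag() and a NETWORK_FLAG_DROP constant for the
new flag, and the database path is merely an example:

	import location

	# Open a compiled location database (path is an example)
	db = location.Database("/var/lib/location/database.db")

	# Look up an address and check the DROP flag on the returned network
	# (flag/method names assumed from the 0.9.8 bindings)
	network = db.lookup("192.0.2.1")
	if network and network.has_flag(location.NETWORK_FLAG_DROP):
		print("Part of a Spamhaus DROP listing - best not to talk to this one")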
>>>
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index c2b3e41..1d84c15 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -1075,6 +1075,9 @@ class CLI(object):
>>>  		# network allocation lists in a machine-readable format...
>>>  		self._update_overrides_for_aws()
>>>
>>> +		# Update overrides for Spamhaus DROP feeds...
>>> +		self._update_overrides_for_spamhaus_drop()
>>> +
>>>  		for file in ns.files:
>>>  			log.info("Reading %s..." % file)
>>>
>>> @@ -1259,6 +1262,107 @@ class CLI(object):
>>>  		)
>>>
>>>
>>> +	def _update_overrides_for_spamhaus_drop(self):
>>> +		downloader = location.importer.Downloader()
>>> +
>>> +		ip_urls = [
>>> +			"https://www.spamhaus.org/drop/drop.txt",
>>> +			"https://www.spamhaus.org/drop/edrop.txt",
>>> +			"https://www.spamhaus.org/drop/dropv6.txt"
>>> +		]
>>> +
>>> +		asn_urls = [
>>> +			"https://www.spamhaus.org/drop/asndrop.txt"
>>> +		]
>>> +
>>> +		for url in ip_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining networks to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract network and ignore anything afterwards...
>>> +					try:
>>> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
>>> +					except ValueError:
>>> +						log.error("Unable to parse line: %s" % sline)
>>> +						continue
>>> +
>>> +					# Sanitize parsed networks...
>>> +					if not self._check_parsed_network(network):
>>> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, network))
>>> +						continue
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO network_overrides(
>>> +							network,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (network) DO NOTHING""",
>>> +						"%s" % network,
>>> +						"Spamhaus DROP lists",
>>> +						True
>>> +					)
>>> +
>>> +		for url in asn_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining ASNs to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract ASN and ignore anything afterwards...
>>> +					asn = int(sline.split()[0].strip("AS"))
>>
>> I kind of like stuff like this being split over multiple lines. Performance
>> is the same, but it is easier to understand, and debugging lines can be
>> added more easily.
>
> I see.
>
>>
>>> +
>>> +					# Filter invalid ASNs...
>>> +					if not self._check_parsed_asn(asn):
>>> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, asn))
>>> +						continue
>>
>> This answers my question from my other email about whether a non-integer
>> could be passed to the check function. It cannot happen here. I would
>> prefer to drop the isinstance() check for performance reasons.
>
> True, it can't happen, but I just noticed I would not catch a ValueError in
> case something other than an integer were left over after all these split()
> and strip() calls. In the _import_as_names_from_arin() function, this is
> covered, hence I will add it here as well.

Yeah, that might be true, but you don't want to catch all exceptions. They
are called exceptions for a reason.

When you parse stuff, you want to check this, but I would normally expect
that the caller of a function has sufficiently typed their arguments.
Otherwise you want to use a typed language like C :)
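To make it concrete: I would keep the try block as narrow as the parse
itself, and only catch what that parse can actually raise. Roughly like
this - just a sketch, the helper name is made up:

	def parse_drop_asn(line):
		# ASN-DROP entries start with something like "AS55303"; anything
		# after the first whitespace is a comment we do not care about.
		fields = line.split()

		# Only catch what this parse can raise: IndexError for empty
		# lines, ValueError for non-numeric garbage.
		try:
			return int(fields[0].strip("AS"))
		except (IndexError, ValueError):
			return None

Anything beyond that should be allowed to blow up, because at that point it
is a genuine bug rather than bad input.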
-Michael

>
> Thanks, and best regards,
> Peter Müller
>
>>
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO autnum_overrides(
>>> +							number,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (number) DO NOTHING""",
>>> +						"%s" % asn,
>>> +						"Spamhaus ASN-DROP list",
>>> +						True
>>> +					)
>>> +
>>>  	@staticmethod
>>>  	def _parse_bool(block, key):
>>>  		val = block.get(key)
>>> -- 
>>> 2.26.2