Hello,

> On 13 Oct 2021, at 17:37, Peter Müller wrote:
>
> Hello Michael,
>
> thanks for your reply.
>
>> Hello,
>>
>>> On 10 Oct 2021, at 17:16, Peter Müller wrote:
>>>
>>> A while ago, it was discussed whether or not libloc should become an
>>> "opinionated database", i.e. include any information on a network's
>>> reputation.
>>>
>>> In general, this idea was dismissed as libloc is neither intended nor
>>> suitable for such tasks, and we do not want to make (political?)
>>> decisions like these for various reasons. All we do is provide a
>>> useful location database in a neutral way, and leave it up to our
>>> users how to react to certain results.
>>>
>>> However, there is a problematic area. Take AS55303 as an example: We
>>> _know_ this to be a dirty network, tampering with RIR data and
>>> hijacking IP space, and strongly recommend against processing any
>>> connection originating from or directed to it.
>>>
>>> Since it appears to be loaded with proxies used by miscreants for
>>> abusive purposes, all we can do at the time of writing is to flag it
>>> as "anonymous proxy", but we lack the possibility of telling our users
>>> something like "this is not a safe area". The very same goes for known
>>> bulletproof ISPs, IP hijackers, and so forth.
>>>
>>> This patch therefore suggests populating the "is_drop" flag introduced
>>> in libloc 0.9.8 (albeit currently unused in production) with the
>>> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
>>> have at least the baddest of the bad covered. The very same lists are,
>>> in fact, included in popular IPS rulesets as well - a decent number of
>>> IPFire users are therefore likely to have them enabled already, but in
>>> a very costly way.
>>>
>>> It is not planned to go further, partly because there is no other
>>> publicly available feed that comes with the same intention,
>>> volatility, and FP rate.
>>>
>>> The second version of this patch makes use of an auxiliary function to
>>> sanitise ASNs, hence avoiding boilerplate code, and treats any line
>>> starting with a semicolon as a comment, which should be sufficient.
>>>
>>> Signed-off-by: Peter Müller
>>> ---
>>>  src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>>>  1 file changed, 104 insertions(+)
>>>
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index c2b3e41..1d84c15 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -1075,6 +1075,9 @@ class CLI(object):
>>>  			# network allocation lists in a machine-readable format...
>>>  			self._update_overrides_for_aws()
>>>
>>> +			# Update overrides for Spamhaus DROP feeds...
>>> +			self._update_overrides_for_spamhaus_drop()
>>> +
>>>  		for file in ns.files:
>>>  			log.info("Reading %s..." % file)
>>>
>>> @@ -1259,6 +1262,107 @@ class CLI(object):
>>>  			)
>>>
>>>
>>> +	def _update_overrides_for_spamhaus_drop(self):
>>> +		downloader = location.importer.Downloader()
>>> +
>>> +		ip_urls = [
>>> +			"https://www.spamhaus.org/drop/drop.txt",
>>> +			"https://www.spamhaus.org/drop/edrop.txt",
>>> +			"https://www.spamhaus.org/drop/dropv6.txt"
>>> +		]
>>> +
>>> +		asn_urls = [
>>> +			"https://www.spamhaus.org/drop/asndrop.txt"
>>> +		]
>>> +
>>> +		for url in ip_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining networks to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract network and ignore anything afterwards...
>>> +					try:
>>> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
>>> +					except ValueError:
>>> +						log.error("Unable to parse line: %s" % sline)
>>> +						continue
>>> +
>>> +					# Sanitize parsed networks...
>>> +					if not self._check_parsed_network(network):
>>> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, network))
>>> +						continue
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO network_overrides(
>>> +							network,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (network) DO NOTHING""",
>>> +						"%s" % network,
>>> +						"Spamhaus DROP lists",
>>> +						True
>>> +					)
>>> +
>>> +		for url in asn_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining ASNs to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract ASN and ignore anything afterwards...
>>> +					asn = int(sline.split()[0].strip("AS"))
>>
>> I kind of like stuff like this being split over multiple lines. Performance
>> is the same, but it is easier to understand, and debugging lines can be
>> added more easily.
>
> I see.
>
>>
>>> +
>>> +					# Filter invalid ASNs...
>>> +					if not self._check_parsed_asn(asn):
>>> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, asn))
>>> +						continue
>>
>> This answers my question from my other email whether it is possible that a
>> non-integer is being passed to the check function. It can't happen here. I
>> would prefer to drop the isinstance() check for performance reasons.
>
> True, it can't happen, but I just noticed I would not catch a ValueError in
> case something other than an integer is left after all these split() and
> strip() calls. In the _import_as_names_from_arin() function, this is covered,
> hence I will add this here as well.

Yeah, that might be true, but you don't want to catch all exceptions. They are
called exceptions for a reason.

When you parse stuff, you want to check this, but I would normally expect that
the caller of a function has sufficiently typed their arguments. Otherwise you
want to use a typed language like C :)
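Just to illustrate what I mean - an untested sketch, meant as a drop-in for the
loop body above: split the parsing into separate steps and catch nothing but
the ValueError from the integer conversion itself:

	# Untested sketch - would replace the single asn = int(...) line.
	# Extract the first column and strip the "AS" prefix in separate
	# steps, so debugging output can easily be added in between...
	fields = sline.split()
	asn = fields[0].strip("AS")

	# Only the integer conversion is expected to fail here, so catch
	# ValueError alone instead of a broad except clause...
	try:
		asn = int(asn)
	except ValueError:
		log.warning("Unable to parse ASN from line: %s" % sline)
		continue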
-Michael

>
> Thanks, and best regards,
> Peter Müller
>
>>
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO autnum_overrides(
>>> +							number,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (number) DO NOTHING""",
>>> +						"%s" % asn,
>>> +						"Spamhaus ASN-DROP list",
>>> +						True
>>> +					)
>>> +
>>>  	@staticmethod
>>>  	def _parse_bool(block, key):
>>>  		val = block.get(key)
>>> --
>>> 2.26.2