Hello,

> On 10 Oct 2021, at 17:16, Peter Müller wrote:
>
> A while ago, it was discussed whether or not libloc should become an
> "opinionated database", i.e. one that includes information on a
> network's reputation.
>
> In general, this idea was dismissed as libloc is neither intended nor
> suitable for such tasks, and we do not want to make (political?)
> decisions like these for various reasons. All we do is provide a
> useful location database in a neutral way, and leave it up to our users
> how to react to certain results.
>
> However, there is a problematic area. Take AS55303 as an example: We
> _know_ this to be a dirty network, tampering with RIR data and
> hijacking IP space, and strongly recommend against processing any
> connection originating from or directed to it.
>
> Since it appears to be loaded with proxies used by miscreants for
> abusive purposes, all we can do at the time of writing is to flag it
> as "anonymous proxy", but we lack the ability to tell our users
> something like "this is not a safe area". The very same goes for known
> bulletproof ISPs, IP hijackers, and so forth.
>
> This patch therefore suggests populating the "is_drop" flag introduced
> in libloc 0.9.8 (albeit currently unused in production) with the
> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
> have at least the baddest of the bad covered. The very same lists are,
> in fact, included in popular IPS rulesets as well - a decent number of
> IPFire users are therefore likely to have them enabled already, but in
> a very costly way.
>
> It is not planned to go further, partly because there is no other
> publicly available feed that comes with the same intention, volatility,
> and FP rate.
>
> The second version of this patch makes use of an auxiliary function to
> sanitise ASNs, hence avoiding boilerplate code, and treats any line
> starting with a semicolon as a comment, which should be sufficient.
>
> Signed-off-by: Peter Müller
> ---
>  src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>  1 file changed, 104 insertions(+)
>
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index c2b3e41..1d84c15 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -1075,6 +1075,9 @@ class CLI(object):
> 		# network allocation lists in a machine-readable format...
> 		self._update_overrides_for_aws()
>
> +		# Update overrides for Spamhaus DROP feeds...
> +		self._update_overrides_for_spamhaus_drop()
> +
> 		for file in ns.files:
> 			log.info("Reading %s..." % file)
>
> @@ -1259,6 +1262,107 @@ class CLI(object):
> 		)
>
>
> +	def _update_overrides_for_spamhaus_drop(self):
> +		downloader = location.importer.Downloader()
> +
> +		ip_urls = [
> +			"https://www.spamhaus.org/drop/drop.txt",
> +			"https://www.spamhaus.org/drop/edrop.txt",
> +			"https://www.spamhaus.org/drop/dropv6.txt"
> +		]
> +
> +		asn_urls = [
> +			"https://www.spamhaus.org/drop/asndrop.txt"
> +		]
> +
> +		for url in ip_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining networks to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# The response is assumed to be encoded in UTF-8...
> +					sline = sline.decode("utf-8")
> +
> +					# Comments start with a semicolon...
> +					if sline.startswith(";"):
> +						continue
> +
> +					# Extract network and ignore anything afterwards...
> +					try:
> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
> +					except ValueError:
> +						log.error("Unable to parse line: %s" % sline)
> +						continue
> +
> +					# Sanitize parsed networks...
> +					if not self._check_parsed_network(network):
> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
> +							(url, network))
> +						continue
> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO network_overrides(
> +							network,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (network) DO NOTHING""",
> +						"%s" % network,
> +						"Spamhaus DROP lists",
> +						True
> +					)
> +
> +		for url in asn_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining ASNs to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# The response is assumed to be encoded in UTF-8...
> +					sline = sline.decode("utf-8")
> +
> +					# Comments start with a semicolon...
> +					if sline.startswith(";"):
> +						continue
> +
> +					# Extract ASN and ignore anything afterwards...
> +					asn = int(sline.split()[0].strip("AS"))

I kind of like stuff like this being split over multiple lines. Performance
is the same, but it is easier to understand, and debug statements can be
added more easily.
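Something along these lines, for example (an untested sketch only; the
input line is a made-up example, the parsing steps are the same as above):

	# A hypothetical line from asndrop.txt...
	sline = "AS55303 ; some comment"

	# Extract the first field of the line and ignore anything afterwards...
	fields = sline.split()

	# Strip the "AS" prefix and convert the remainder to an integer...
	asn = fields[0].strip("AS")
	asn = int(asn)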
> +
> +					# Filter invalid ASNs...
> +					if not self._check_parsed_asn(asn):
> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
> +							(url, asn))
> +						continue

This answers my question from my other email about whether a non-integer
could be passed to the check function: it cannot happen here, since int()
either returns an integer or raises an exception. I would prefer to drop
the isinstance() check for performance reasons.

> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO autnum_overrides(
> +							number,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (number) DO NOTHING""",
> +						"%s" % asn,
> +						"Spamhaus ASN-DROP list",
> +						True
> +					)
> +
> 	@staticmethod
> 	def _parse_bool(block, key):
> 		val = block.get(key)
> --
> 2.26.2
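One more note on the isinstance() discussion above: with that check
dropped, a plain range check should be all that remains. Roughly what I
have in mind (a hypothetical sketch only - names and bounds are my
assumptions, not the actual code in location-importer.in):

	def _check_parsed_asn(self, asn):
		# The caller guarantees an integer (int() either succeeds or
		# raises), so a plain range check is sufficient: any valid ASN
		# fits into the 32-bit space, excluding the reserved values
		# AS0 and AS4294967295...
		if 0 < asn < 4294967295:
			return True

		return False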