From: Peter Müller
To: location@lists.ipfire.org
Subject: Re: [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
Date: Wed, 13 Oct 2021 18:37:08 +0200

Hello Michael,

thanks for your reply.

> Hello,
> 
>> On 10 Oct 2021, at 17:16, Peter Müller wrote:
>>
>> A while ago, it was discussed whether or not libloc should become an
>> "opinionated database", i.e. include any information on a network's
>> reputation.
>>
>> In general, this idea was dismissed as libloc is neither intended nor
>> suitable for such tasks, and we do not want to make (political?)
>> decisions like these for various reasons. All we do is provide a
>> useful location database in a neutral way, and leave it up to our users
>> how to react to certain results.
>>
>> However, there is a problematic area. Take AS55303 as an example: We
>> _know_ this to be a dirty network, tampering with RIR data and
>> hijacking IP space, and strongly recommend against processing any
>> connection originating from or directed to it.
>>
>> Since it appears to be loaded with proxies used by miscreants for
>> abusive purposes, all we can do at the time of writing is to flag it
>> as "anonymous proxy", but we lack the possibility of telling our users
>> something like "this is not a safe area". The very same goes for known
>> bulletproof ISPs, IP hijackers, and so forth.
>>
>> This patch therefore suggests populating the "is_drop" flag introduced
>> in libloc 0.9.8 (albeit currently unused in production) with the
>> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
>> have at least the baddest of the bad covered. The very same lists are,
>> in fact, included in popular IPS rulesets as well - a decent number of
>> IPFire users are therefore likely to have them enabled already, but in
>> a very costly way.
>>
>> It is not planned to go further, partly because there is no other feed
>> publicly available which would come with the same intention,
>> volatility, and FP rate.
>>
>> The second version of this patch makes use of an auxiliary function to
>> sanitise ASNs, hence avoiding boilerplate code, and treats any line
>> starting with a semicolon as a comment, which should be sufficient.
>>
>> Signed-off-by: Peter Müller
>> ---
>>  src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>>  1 file changed, 104 insertions(+)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index c2b3e41..1d84c15 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -1075,6 +1075,9 @@ class CLI(object):
>>  		# network allocation lists in a machine-readable format...
>>  		self._update_overrides_for_aws()
>>
>> +		# Update overrides for Spamhaus DROP feeds...
>> +		self._update_overrides_for_spamhaus_drop()
>> +
>>  		for file in ns.files:
>>  			log.info("Reading %s..." % file)
>>
>> @@ -1259,6 +1262,107 @@ class CLI(object):
>>  			)
>>
>>
>> +	def _update_overrides_for_spamhaus_drop(self):
>> +		downloader = location.importer.Downloader()
>> +
>> +		ip_urls = [
>> +			"https://www.spamhaus.org/drop/drop.txt",
>> +			"https://www.spamhaus.org/drop/edrop.txt",
>> +			"https://www.spamhaus.org/drop/dropv6.txt"
>> +		]
>> +
>> +		asn_urls = [
>> +			"https://www.spamhaus.org/drop/asndrop.txt"
>> +		]
>> +
>> +		for url in ip_urls:
>> +			try:
>> +				with downloader.request(url, return_blocks=False) as f:
>> +					fcontent = f.body.readlines()
>> +			except Exception as e:
>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>> +				return
>> +
>> +			# Iterate through every line, filter comments and add remaining networks to
>> +			# the override table in case they are valid...
>> +			with self.db.transaction():
>> +				for sline in fcontent:
>> +
>> +					# The response is assumed to be encoded in UTF-8...
>> +					sline = sline.decode("utf-8")
>> +
>> +					# Comments start with a semicolon...
>> +					if sline.startswith(";"):
>> +						continue
>> +
>> +					# Extract network and ignore anything afterwards...
>> +					try:
>> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
>> +					except ValueError:
>> +						log.error("Unable to parse line: %s" % sline)
>> +						continue
>> +
>> +					# Sanitize parsed networks...
>> +					if not self._check_parsed_network(network):
>> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>> +							(url, network))
>> +						continue
>> +
>> +					# Conduct SQL statement...
>> +					self.db.execute("""
>> +						INSERT INTO network_overrides(
>> +							network,
>> +							source,
>> +							is_drop
>> +						) VALUES (%s, %s, %s)
>> +						ON CONFLICT (network) DO NOTHING""",
>> +						"%s" % network,
>> +						"Spamhaus DROP lists",
>> +						True
>> +					)
>> +
>> +		for url in asn_urls:
>> +			try:
>> +				with downloader.request(url, return_blocks=False) as f:
>> +					fcontent = f.body.readlines()
>> +			except Exception as e:
>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>> +				return
>> +
>> +			# Iterate through every line, filter comments and add remaining ASNs to
>> +			# the override table in case they are valid...
>> +			with self.db.transaction():
>> +				for sline in fcontent:
>> +
>> +					# The response is assumed to be encoded in UTF-8...
>> +					sline = sline.decode("utf-8")
>> +
>> +					# Comments start with a semicolon...
>> +					if sline.startswith(";"):
>> +						continue
>> +
>> +					# Extract ASN and ignore anything afterwards...
>> +					asn = int(sline.split()[0].strip("AS"))
> 
> I kind of like stuff like this being split over multiple lines. Performance is
> the same, but it is easier to understand, and debugging lines can be added more
> easily.

I see.
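
Something along these lines, I suppose - just a quick standalone sketch (the
variable names and the sample line are made up for illustration, this is not
the actual hunk):

# Quick standalone sketch (made-up sample input, not the actual patch hunk):
# the one-liner from above, split up so each step can be inspected and a
# debugging line can be dropped in between.
sline = "AS55303 ; example entry from asndrop.txt"

fields = sline.split()               # ["AS55303", ";", "example", ...]
asn_string = fields[0]               # "AS55303"
asn_string = asn_string.strip("AS")  # "55303"
asn = int(asn_string)                # 55303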

> 
>> +
>> +					# Filter invalid ASNs...
>> +					if not self._check_parsed_asn(asn):
>> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>> +							(url, asn))
>> +						continue
> 
> This answers my question from my other email whether it is possible that a
> non-integer is being passed to the check function. It can’t happen here. I
> would prefer to drop the isinstance() check for performance reasons.

True, it can't happen, but I just noticed that I would not catch a ValueError in
case something other than an integer were left over after all these split() and
strip() calls. In the _import_as_names_from_arin() function, this is covered,
hence I will add it here as well.
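
Roughly like the following, I would think - an untested standalone sketch of
that pattern (the helper name and the sample lines are made up for
illustration, this is not the actual hunk):

import logging

log = logging.getLogger(__name__)

# Untested sketch: wrap the integer conversion in a try/except ValueError, as
# _import_as_names_from_arin() does, so a malformed line gets skipped instead
# of aborting the whole import run.
def parse_drop_asn(sline):
	# Extract ASN and ignore anything afterwards...
	try:
		return int(sline.split()[0].strip("AS"))
	except ValueError:
		log.debug("Skipping line with invalid ASN: %s" % sline)
		return None

parse_drop_asn("AS55303 ; example entry")    # -> 55303
parse_drop_asn("AS-bogus ; malformed entry") # -> None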

Thanks, and best regards,
Peter Müller

> 
>> +
>> +					# Conduct SQL statement...
>> +					self.db.execute("""
>> +						INSERT INTO autnum_overrides(
>> +							number,
>> +							source,
>> +							is_drop
>> +						) VALUES (%s, %s, %s)
>> +						ON CONFLICT (number) DO NOTHING""",
>> +						"%s" % asn,
>> +						"Spamhaus ASN-DROP list",
>> +						True
>> +					)
>> +
>>  	@staticmethod
>>  	def _parse_bool(block, key):
>>  		val = block.get(key)
>> -- 
>> 2.26.2
> 