From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: location@lists.ipfire.org Subject: Re: [PATCH v2] location-importer.in: skip networks with unknown country codes Date: Thu, 01 Apr 2021 10:51:20 +0100 Message-ID: <02D5FE70-F986-4524-9F54-D1CFC0678777@ipfire.org> In-Reply-To: <5d6a3267-6b61-fa2e-b35c-0fc42c713e33@ipfire.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6742893914875294518==" List-Id: --===============6742893914875294518== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hello, I merged this patch, but it has some unwanted side-effects: Technically it works as designed as we are successfully dropping any countrie= s that are not part of the imported list. I changed our scripts that these wi= ll always be imported first now. I ran a manual import which dropped CS which is Serbia and Montenegro. This u= sed to be a valid country code, but Serbia and Montenegro is not a single cou= ntry any more. I decided to add it because we would have dropped too many net= works without it. Now we are dropping a few networks with country code YU - Y= ugoslavia. Montenegro became independent from Serbia in 2006, Yugoslavia became the Stat= e Union of Serbia and Montenegro in 2003. For some reasons (probably because = I didn=E2=80=99t do research) I thought these events were closer together and= therefore thought that all networks with country code CS simply =E2=80=9Cfor= got=E2=80=9D to update this, but there never were any that actually existing = during the time of Yugoslavia. Long story short: Would anybody object to add YU to the database although it = doesn=E2=80=99t exist as a country any more? I guess we cannot just =E2=80=9C= rewrite=E2=80=9D it because the situation is way too complicated. However, we= wanted to give people an idea where some IP address is located and that is k= ind of does not work if the country does not exist any more. Returning nothin= g instead is not a great solution either because we are then simply hiding ne= tworks that exist. Or did I overlook an ever better option? -Michael > On 30 Mar 2021, at 16:47, Peter M=C3=BCller wr= ote: >=20 > There is no sense in parsing and storting networks whose country codes > cannot be found in the ISO-3166-x country code table. This avoids side > effects in applications using the location database, and introduces > another sanity check to compensate bogus RIR data. >=20 > On location02, this affects some networks from APNIC (country code: ZZ) > as well as a bunch of smaller allocations within the RIPE region still > tagged to CS or YU (Yugoslavia). To my surprise, no network tagged as SU > (Soviet Union) was found - while the NIC for .su TLD is still > operational. :-) >=20 > Applying this patch causes the countries to be processed before > update_whois() is called. In case no countries are present in the SQL > table, this check is silently omitted. >=20 > Fixes: #12510 >=20 > Signed-off-by: Peter M=C3=BCller > --- > src/python/location-importer.in | 38 ++++++++++++++++++++++----------- > 1 file changed, 26 insertions(+), 12 deletions(-) >=20 > diff --git a/src/python/location-importer.in b/src/python/location-importer= .in > index e2f201b..1e08458 100644 > --- a/src/python/location-importer.in > +++ b/src/python/location-importer.in > @@ -388,10 +388,17 @@ class CLI(object): > TRUNCATE TABLE networks; > """) >=20 > + # Fetch all valid country codes to check parsed networks aganist... > + rows =3D self.db.query("SELECT * FROM countries ORDER BY country_code") > + validcountries =3D [] > + > + for row in rows: > + validcountries.append(row.country_code) > + > for source in location.importer.WHOIS_SOURCES: > with downloader.request(source, return_blocks=3DTrue) as f: > for block in f: > - self._parse_block(block) > + self._parse_block(block, validcountries) >=20 > # Process all parsed networks from every RIR we happen to have access to, > # insert the largest network chunks into the networks table immediately.= .. > @@ -467,7 +474,7 @@ class CLI(object): > # Download data > with downloader.request(source) as f: > for line in f: > - self._parse_line(line) > + self._parse_line(line, validcountries) >=20 > def _check_parsed_network(self, network): > """ > @@ -532,7 +539,7 @@ class CLI(object): > # be suitable for libloc consumption... > return True >=20 > - def _parse_block(self, block): > + def _parse_block(self, block, validcountries =3D None): > # Get first line to find out what type of block this is > line =3D block[0] >=20 > @@ -542,7 +549,7 @@ class CLI(object): >=20 > # inetnum > if line.startswith("inet6num:") or line.startswith("inetnum:"): > - return self._parse_inetnum_block(block) > + return self._parse_inetnum_block(block, validcountries) >=20 > # organisation > elif line.startswith("organisation:"): > @@ -573,7 +580,7 @@ class CLI(object): > autnum.get("asn"), autnum.get("org"), > ) >=20 > - def _parse_inetnum_block(self, block): > + def _parse_inetnum_block(self, block, validcountries =3D None): > log.debug("Parsing inetnum block:") >=20 > inetnum =3D {} > @@ -616,10 +623,10 @@ class CLI(object): > if not inetnum or not "country" in inetnum: > return >=20 > - # Skip objects with bogus country code 'ZZ' > - if inetnum.get("country") =3D=3D "ZZ": > - log.warning("Skipping network with bogus country 'ZZ': %s" % \ > - (inetnum.get("inet6num") or inetnum.get("inetnum"))) > + # Skip objects with unknown country codes > + if validcountries and inetnum.get("country") not in validcountries: > + log.warning("Skipping network with bogus country '%s': %s" % \ > + (inetnum.get("country"), inetnum.get("inet6num") or inetnum.get("inetn= um"))) > return >=20 > # Iterate through all networks enumerated from above, check them for plau= sibility and insert > @@ -652,7 +659,7 @@ class CLI(object): > org.get("organisation"), org.get("org-name"), > ) >=20 > - def _parse_line(self, line): > + def _parse_line(self, line, validcountries =3D None): > # Skip version line > if line.startswith("2"): > return > @@ -667,8 +674,15 @@ class CLI(object): > log.warning("Could not parse line: %s" % line) > return >=20 > - # Skip any lines that are for stats only > - if country_code =3D=3D "*": > + # Skip any lines that are for stats only or do not have a country > + # code at all (avoids log spam below) > + if not country_code or country_code =3D=3D '*': > + return > + > + # Skip objects with unknown country codes > + if validcountries and country_code not in validcountries: > + log.warning("Skipping line with bogus country '%s': %s" % \ > + (country_code, line)) > return >=20 > if type in ("ipv6", "ipv4"): > --=20 > 2.26.2 --===============6742893914875294518==--