From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Müller
To: location@lists.ipfire.org
Subject: Re: [PATCH v2] location-importer.in: skip networks with unknown country codes
Date: Fri, 02 Apr 2021 21:58:51 +0200
Message-ID:
In-Reply-To: <02D5FE70-F986-4524-9F54-D1CFC0678777@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1685820655397445619=="
List-Id:

--===============1685820655397445619==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Michael,

thank you for your reply and merging this.

On location02, the networks being ignored because their country is set to "YU"
are (please excuse the crappy "sort" output - it is really useless when it
comes to IP addresses):

> 194.106.185.0/26
> 194.106.185.144/28
> 194.106.185.96/28
> 194.194.158.0/25
> 194.194.158.128/26
> 194.247.200.160/28
> 194.247.207.224/27
> 194.247.223.12/30
> 194.247.223.16/28
> 194.247.223.32/28
> 194.247.223.48/28
> 194.247.223.72/29
> 194.247.223.80/28
> 194.247.223.96/28
> 195.178.61.128/26
> 195.178.62.192/27
> 195.178.62.64/28
> 195.178.63.0/29
> 195.178.63.128/28
> 195.178.63.144/28
> 195.178.63.32/27
> 195.178.63.8/29
> 195.250.104.135/32
> 195.250.104.140/32
> 195.250.104.145/32
> 195.250.104.224/28
> 195.250.104.240/28
> 195.250.113.144/28
> 195.250.113.192/27
> 195.250.114.224/27
> 195.250.116.0/26
> 195.250.116.64/26
> 195.252.107.128/29
> 195.252.110.192/26
> 195.252.111.128/26
> 195.252.111.192/26
> 195.252.115.0/24
> 195.252.118.0/26
> 195.252.118.64/26
> 195.252.120.0/24
> 195.66.165.0/24
> 217.26.77.64/26
> 217.26.79.0/27
> 62.108.115.0/28
> 62.108.117.16/28
> 62.108.117.96/28
> 62.193.141.192/28
> 62.193.141.224/28
> 62.193.141.240/28
> 62.193.141.56/29

Since we currently ignore anything more specific than a /24, only these
networks are actually relevant in this discussion, as we would have discarded
the others anyway:

> 195.66.165.0/24
> 195.252.115.0/24
> 195.252.120.0/24
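
As a side note on the "sort" problem and the /24 cut-off mentioned above: a
minimal sketch using Python's standard ipaddress module, which orders networks
numerically instead of as text. The list is only a sample of the networks
quoted above, and the variable names are made up for illustration; this is not
how the importer itself does it.

```python
import ipaddress

# A sample of the networks listed above, in the order plain "sort" printed them.
raw = [
    "194.106.185.0/26",
    "194.106.185.144/28",
    "194.106.185.96/28",
    "195.252.115.0/24",
    "195.252.120.0/24",
    "195.66.165.0/24",
    "62.108.115.0/28",
]

networks = [ipaddress.ip_network(n) for n in raw]

# Network objects of the same IP version compare by network address first,
# then by netmask, so sorted() yields a numeric order.
ordered = sorted(networks)

# Keep only networks that are a /24 or less specific (IPv4 only in this sample).
relevant = [n for n in ordered if n.prefixlen <= 24]

for n in relevant:
    print(n)
```

Mixing IPv4 and IPv6 networks in one sorted() call raises a TypeError, so a
real script would have to sort each address family separately.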

Digging deeper into them, the first one dead-ends somewhere in the vicinity of
Croatia, having a RIPE database entry dated before 2003:

> inetnum:        195.66.165.0 - 195.66.165.255
> netname:        Posta_Crne_Gore
> descr:          Posta Crne Gore
> country:        YU
> admin-c:        MM609-RIPE
> tech-c:         MM609-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS8585-MNT
> created:        2001-10-04T08:34:51Z
> last-modified:  2002-10-30T09:36:47Z
> source:         RIPE
>
> person:         Martinovic Milan
> address:        Posta Crne Gore
> address:        Slobode 1
> address:        81000 Podgorica
> address:        Montenegro, Yugoslavia
> phone:          +381 81 225 181
> nic-hdl:        MM609-RIPE
> created:        1970-01-01T00:00:00Z
> last-modified:  2020-06-03T10:52:16Z
> source:         RIPE # Filtered
> mnt-by:         AS8585-MNT
>
> route:          195.66.160.0/19
> descr:          Internet Crna Gora
> origin:         AS8585
> mnt-by:         AS8585-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-22T09:33:48Z
> source:         RIPE # Filtered

The second is routed by AS6700 into a residential dial-up network pool
somewhere in Serbia, while its RIPE DB entry shows:

> inetnum:        195.252.115.0 - 195.252.115.255
> netname:        DRENIK
> descr:          Drenik ISP
> descr:          Beograd, Deligradska 19
> country:        YU
> admin-c:        DR47-RIPE
> tech-c:         DR47-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS6700-MNT
> created:        2002-04-11T08:21:26Z
> last-modified:  2002-04-11T08:21:26Z
> source:         RIPE
>
> person:         Nenad Repac
> address:        D.D. TELEFONIJA
> address:        Marsala Tolbuhina 56
> address:        11000 Beograd
> address:        Yugoslavia
> phone:          +381 11 444 11 44 Ext. 381
> fax-no:         +381 11 3248 953
> nic-hdl:        DR47-RIPE
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-21T23:28:31Z
> source:         RIPE # Filtered
>
> route:          195.252.96.0/19
> descr:          BeotelNet ISP, Belgrade, RS
> origin:         AS6700
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2019-07-15T09:12:36Z
> source:         RIPE

The same goes for the third network, which has a RIPE DB entry maintained by
the same organisation:

> inetnum:        195.252.120.0 - 195.252.120.255
> netname:        ABSOFT
> descr:          AB SOFT
> descr:          Kneza Milosa 82, Beograd
> country:        YU
> admin-c:        DR47-RIPE
> tech-c:         DR47-RIPE
> status:         ASSIGNED PA
> mnt-by:         AS6700-MNT
> created:        2002-04-10T16:54:48Z
> last-modified:  2002-04-10T16:54:48Z
> source:         RIPE
>
> person:         Nenad Repac
> address:        D.D. TELEFONIJA
> address:        Marsala Tolbuhina 56
> address:        11000 Beograd
> address:        Yugoslavia
> phone:          +381 11 444 11 44 Ext. 381
> fax-no:         +381 11 3248 953
> nic-hdl:        DR47-RIPE
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2001-09-21T23:28:31Z
> source:         RIPE # Filtered
>
> route:          195.252.96.0/19
> descr:          BeotelNet ISP, Belgrade, RS
> origin:         AS6700
> mnt-by:         AS6700-MNT
> created:        1970-01-01T00:00:00Z
> last-modified:  2019-07-15T09:12:36Z
> source:         RIPE

Since we are only dealing with three networks here and their actual location
seems to be pretty clear to me, I suggest _not_ adding YU as a legitimate
country. Instead, I would just write overrides for these networks.

Would you be fine with that?

Thanks, and best regards,
Peter Müller

> Hello,
>
> I merged this patch, but it has some unwanted side-effects:
>
> Technically it works as designed, as we are successfully dropping any countries that are not part of the imported list. I changed our scripts so that these will always be imported first now.
>
> I ran a manual import which dropped CS, which is Serbia and Montenegro.
> This used to be a valid country code, but Serbia and Montenegro is not a single country any more. I decided to add it because we would have dropped too many networks without it. Now we are dropping a few networks with country code YU - Yugoslavia.
>
> Montenegro became independent from Serbia in 2006, Yugoslavia became the State Union of Serbia and Montenegro in 2003. For some reason (probably because I didn’t do research) I thought these events were closer together and therefore thought that all networks with country code CS simply “forgot” to update this, but there never were any that actually existed during the time of Yugoslavia.
>
> Long story short: Would anybody object to adding YU to the database although it doesn’t exist as a country any more? I guess we cannot just “rewrite” it because the situation is way too complicated. However, we wanted to give people an idea where some IP address is located, and that kind of does not work if the country does not exist any more. Returning nothing instead is not a great solution either because we are then simply hiding networks that exist.
>
> Or did I overlook an even better option?
>
> -Michael
>
>> On 30 Mar 2021, at 16:47, Peter Müller wrote:
>>
>> There is no sense in parsing and storing networks whose country codes
>> cannot be found in the ISO-3166-x country code table. This avoids side
>> effects in applications using the location database, and introduces
>> another sanity check to compensate for bogus RIR data.
>>
>> On location02, this affects some networks from APNIC (country code: ZZ)
>> as well as a bunch of smaller allocations within the RIPE region still
>> tagged to CS or YU (Yugoslavia). To my surprise, no network tagged as SU
>> (Soviet Union) was found - while the NIC for .su TLD is still
>> operational. :-)
>>
>> Applying this patch causes the countries to be processed before
>> update_whois() is called. In case no countries are present in the SQL
>> table, this check is silently omitted.
>>
>> Fixes: #12510
>>
>> Signed-off-by: Peter Müller
>> ---
>>  src/python/location-importer.in | 38 ++++++++++++++++++++++-----------
>>  1 file changed, 26 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index e2f201b..1e08458 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -388,10 +388,17 @@ class CLI(object):
>>  				TRUNCATE TABLE networks;
>>  			""")
>>
>> +			# Fetch all valid country codes to check parsed networks aganist...
>> +			rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
>> +			validcountries = []
>> +
>> +			for row in rows:
>> +				validcountries.append(row.country_code)
>> +
>>  			for source in location.importer.WHOIS_SOURCES:
>>  				with downloader.request(source, return_blocks=True) as f:
>>  					for block in f:
>> -						self._parse_block(block)
>> +						self._parse_block(block, validcountries)
>>
>>  			# Process all parsed networks from every RIR we happen to have access to,
>>  			# insert the largest network chunks into the networks table immediately...
>> @@ -467,7 +474,7 @@ class CLI(object):
>>  				# Download data
>>  				with downloader.request(source) as f:
>>  					for line in f:
>> -						self._parse_line(line)
>> +						self._parse_line(line, validcountries)
>>
>>  	def _check_parsed_network(self, network):
>>  		"""
>> @@ -532,7 +539,7 @@ class CLI(object):
>>  		# be suitable for libloc consumption...
>>  		return True
>>
>> -	def _parse_block(self, block):
>> +	def _parse_block(self, block, validcountries = None):
>>  		# Get first line to find out what type of block this is
>>  		line = block[0]
>>
>> @@ -542,7 +549,7 @@ class CLI(object):
>>
>>  		# inetnum
>>  		if line.startswith("inet6num:") or line.startswith("inetnum:"):
>> -			return self._parse_inetnum_block(block)
>> +			return self._parse_inetnum_block(block, validcountries)
>>
>>  		# organisation
>>  		elif line.startswith("organisation:"):
>> @@ -573,7 +580,7 @@ class CLI(object):
>>  			autnum.get("asn"), autnum.get("org"),
>>  		)
>>
>> -	def _parse_inetnum_block(self, block):
>> +	def _parse_inetnum_block(self, block, validcountries = None):
>>  		log.debug("Parsing inetnum block:")
>>
>>  		inetnum = {}
>> @@ -616,10 +623,10 @@ class CLI(object):
>>  		if not inetnum or not "country" in inetnum:
>>  			return
>>
>> -		# Skip objects with bogus country code 'ZZ'
>> -		if inetnum.get("country") == "ZZ":
>> -			log.warning("Skipping network with bogus country 'ZZ': %s" % \
>> -				(inetnum.get("inet6num") or inetnum.get("inetnum")))
>> +		# Skip objects with unknown country codes
>> +		if validcountries and inetnum.get("country") not in validcountries:
>> +			log.warning("Skipping network with bogus country '%s': %s" % \
>> +				(inetnum.get("country"), inetnum.get("inet6num") or inetnum.get("inetnum")))
>>  			return
>>
>>  		# Iterate through all networks enumerated from above, check them for plausibility and insert
>> @@ -652,7 +659,7 @@ class CLI(object):
>>  			org.get("organisation"), org.get("org-name"),
>>  		)
>>
>> -	def _parse_line(self, line):
>> +	def _parse_line(self, line, validcountries = None):
>>  		# Skip version line
>>  		if line.startswith("2"):
>>  			return
>> @@ -667,8 +674,15 @@ class CLI(object):
>>  			log.warning("Could not parse line: %s" % line)
>>  			return
>>
>> -		# Skip any lines that are for stats only
>> -		if country_code == "*":
>> +		# Skip any lines that are for stats only or do not have a country
>> +		# code at all (avoids log spam below)
>> +		if not country_code or country_code == '*':
>> +			return
>> +
>> +		# Skip objects with unknown country codes
>> +		if validcountries and country_code not in validcountries:
>> +			log.warning("Skipping line with bogus country '%s': %s" % \
>> +				(country_code, line))
>>  			return
>>
>>  		if type in ("ipv6", "ipv4"):
>> --
>> 2.26.2
>

--===============1685820655397445619==--
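
For readers following the quoted patch, the check it introduces can be
sketched in isolation roughly like this. The function name is_known_country
and the sample country codes are made up for illustration, and a set is used
here where the patch builds a list; this is a sketch of the idea, not the
importer's actual code.

```python
import logging

log = logging.getLogger("importer-sketch")

def is_known_country(country_code, validcountries):
    """Return True if a record with this country code should be kept.

    Hypothetical helper: the quoted patch performs this check inline.
    """
    # No countries imported yet (empty or None): skip the check silently,
    # which matches the behaviour described in the commit message.
    if not validcountries:
        return True

    if country_code not in validcountries:
        log.warning("Skipping network with bogus country '%s'", country_code)
        return False

    return True

# Sample data only; the importer reads these codes from its countries table.
validcountries = {"DE", "ME", "RS"}

print(is_known_country("RS", validcountries))
print(is_known_country("YU", validcountries))
print(is_known_country("YU", set()))
```

With the sample set, "YU" is rejected while "RS" passes, and an empty set lets
everything through, which is why the countries have to be imported first.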