From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: How should location-importer.in deal with RIR objects having multiple distinct "country" fields?
Date: Tue, 04 May 2021 09:07:57 +0100
Message-ID: <6CC1F21D-097B-4948-8FF7-83964684212F@ipfire.org>
In-Reply-To: <642234e4-c993-5c2d-199c-a1afed0d255b@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3730286551356983788=="
List-Id:

--===============3730286551356983788==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hey Peter,

> On 3 May 2021, at 21:56, Peter Müller wrote:
>
> Hello Michael,
> hello location folks (CC'ed),
>
> unfortunately, another problem surfaces when processing inetnum and inet6num feeds from RIRs
> which provide that kind of more precise data: A decent amount of network objects have multiple
> distinct "country" fields.

First of all, don’t panic. We noticed this before and we decided to go with an easy solution for now.

It probably is now time to revisit this and see if we can improve. Although you might have seen many hits when checking for multiple countries, I am sure that there is only a small number of networks (relatively speaking).

> Here is an example:
>
>> inetnum:        178.79.192.0 - 178.79.255.255
>> netname:        EU-LLNW-20100512
>> country:        EU
>> country:        SE
>> country:        DE
>> country:        NL
>> country:        GB
>> country:        ES
>> country:        FR
>> country:        IT
>> org:            ORG-LNI1-RIPE
>> admin-c:        GU2143-RIPE
>> tech-c:         GU2143-RIPE
>> status:         ALLOCATED PA
>> remarks:        ****************** ABUSE COMPLAINTS TO: abuse(a)limelightnetworks.com
>> mnt-by:         RIPE-NCC-HM-MNT
>> mnt-by:         LLNW-MNT
>> mnt-domains:    LLNW-MNT
>> mnt-routes:     LLNW-MNT
>> created:        2010-05-12T16:20:38Z
>> last-modified:  2017-09-01T17:39:08Z
>> source:         RIPE # Filtered
>
> Currently, the last country item is made persistent via the SQL INSERT statement.
> Since these do not appear to be sorted in any way, this makes things completely nondeterministic.

How do you know that they are not sorted?

Do you expect them to be sorted alphabetically? That wouldn’t make sense.

If the order in which they are being put in would be preserved, we can already change and use the first country code, hoping that they would have been put in in order of precedence. In this example, EU is probably the best way to say “SE, DE, NL, GB, ES, FR, IT”.

> The network above would be, however, recoverable: If we do not interpret "EU" as the European Union,
> but rather as the European country, all other country codes given here would be covered by it.

European country? Did you mean continent? Last time I checked a map, the borders were still there.

> Alas, this is not helping in cases such as these two:
>
>> Country of network [IPv4Network('77.74.172.0/23')] already set to 'CH', omitting 'FI' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('185.253.140.0/24')] already set to 'GB', omitting 'NL' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('185.253.140.0/24')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'JP' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'SG' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'AU' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'NL' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'FR' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'DE' (multiple country lines in RIR data?)
>> Country of network [IPv4Network('193.109.168.0/22')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
>
> There are _plenty_ of such networks, I believe RIPE IPv4 only fills several screen pages. Nothing
> in life is ever easy, and parsing RIR data definitely isn't... :-/
>
> Delegating the task of handling such situations to the application using libloc does not make sense
> to me, as people are _expecting_ precise answers from it - if we can use the term of preciseness here
> at all -, otherwise, they could simply parse RIR data on their own. Therefore, we have to somehow make
> do with this. Possible options would be as follows:
>
> (a) We do not process such networks entirely. If a network operator wants to have his/her network
> covered by libloc, he/she/it should kindly fix it's RIR data.
>
> That would not prevent us from obtaining announcements for such networks, but we would not label
> them with any country anymore.

This is a very bad proposal. Just because we do not have 100% confidence in the data doesn’t mean we have to drop it.

We generally can only trust the people who put the data in and that for me does not have 100% confidence. With that logic, the database would be empty.

I would also assume that someone tried to do a good job here and list all countries where infrastructure for this network is located in. That is sometimes difficult to say when you have a CDN because there are many POPs and they are probably all organised as anycasts - a very common method these days.

> (b) We try to automatically determine meaningful codes in each case.
>
> This is tricky and not very deterministic.
> What about a network having "CY" and "TR" set? Would that be covered by "EU"?

Good question. I would say yes. A continent is a good approximation.

Worse would be DE and JP. Or CN and CH. You simply cannot group them together with this logic. But I suppose there wouldn’t be too many examples like this. That there are networks spread over Europe is much more common because Europe is densely populated and because of a unified legal system and easy trade, putting servers into many countries isn’t an issue at all. We do this without even thinking about it. We just put them where our users are or where it is cheapest.

> 213.230.255.0/24 seems to be used worldwide, but in my point of view, this is not sufficient to
> classify it as an anycast network. Worse, we have or should assign a country code to anycast networks
> as well.

What is it, if not that? It technically could not be split into smaller networks. It is already /24.

> (c) We try to determine the jurisdiction of a networks' organisation handle.
>
> Frankly, I have no idea what problems would arise in this case. If an organisation fails to provide
> accurate and meaningful RIR data, what will their organisation handle possibly look like?

In the Limelight example from above, I would say US is what I would almost expect with the anycast bit set.

> Trying to keep things deterministic, (a) is my current favorite - it is the most brutal, though.

No, this is not acceptable at all.

I would be much happier with either deciding this with an override on a case by case basis, but that would be a lot of work.

> Do you see a better way of dealing with such networks?

Another automated way is to mark these networks with a “confidence” value - however that is being determined. We would then require applications to consider this value.
The downside is that there cannot be much done apart from dropping the network when it is below a certain threshold or use it when confidence is high enough. That could be different for different applications and make it more difficult to implement libloc.

> @All: Thoughts? Comments? Opinions?
>
> Thanks, and best regards,
> Peter Müller

-Michael
--===============3730286551356983788==--
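
To make the ideas from this thread a bit more concrete, here is a minimal sketch of how "first listed country wins", "collapse to EU when everything is European" and a "confidence" value could fit together. Everything in it is an assumption for illustration: the function name `resolve_country`, the `EUROPE` set (a hand-picked, incomplete subset of codes taken from the examples above) and the confidence numbers are hypothetical and are not part of location-importer.in or libloc.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical, deliberately incomplete set of European country codes.
# It only covers codes that show up in the examples in this thread.
EUROPE = frozenset({"SE", "DE", "NL", "GB", "ES", "FR", "IT", "CH", "FI"})


def resolve_country(codes: Iterable[str]) -> Tuple[Optional[str], float]:
    """Pick one code from the "country" lines of a single RIR object.

    Returns (code, confidence). The confidence values are arbitrary
    placeholders, not something defined anywhere in libloc.
    """
    # De-duplicate while preserving the order found in the RIR object.
    unique = list(dict.fromkeys(code.upper() for code in codes))

    if not unique:
        return None, 0.0

    # A single country line is unambiguous.
    if len(unique) == 1:
        return unique[0], 1.0

    # If every specific code is European, "EU" is a fair approximation.
    specific = [code for code in unique if code != "EU"]
    if all(code in EUROPE for code in specific):
        return "EU", 0.5

    # Otherwise fall back to the first listed code, hoping the order
    # reflects precedence, and flag the result as low confidence.
    return unique[0], 0.25


if __name__ == "__main__":
    print(resolve_country(["EU", "SE", "DE", "NL", "GB", "ES", "FR", "IT"]))
    print(resolve_country(["GB", "US", "JP", "SG", "AU", "NL", "FR", "DE"]))
```

Returning the confidence next to the code keeps the importer deterministic while leaving the drop-or-keep decision to a threshold, which could live in the importer or in the application.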