Hey Peter,
On 3 May 2021, at 21:56, Peter Müller <peter.mueller@ipfire.org> wrote:
Hello Michael, hello location folks (CC'ed),
unfortunately, another problem surfaces when processing inetnum and inet6num feeds from RIRs that provide this kind of more precise data: a considerable number of network objects have multiple distinct "country" fields.
First of all, don’t panic. We noticed this before and we decided to go with an easy solution for now.
It is probably time now to revisit this and see whether we can improve on it. Although you might have seen many hits when checking for multiple countries, I am sure that only a small number of networks (relatively speaking) are affected.
Here is an example:
inetnum:        178.79.192.0 - 178.79.255.255
netname:        EU-LLNW-20100512
country:        EU
country:        SE
country:        DE
country:        NL
country:        GB
country:        ES
country:        FR
country:        IT
org:            ORG-LNI1-RIPE
admin-c:        GU2143-RIPE
tech-c:         GU2143-RIPE
status:         ALLOCATED PA
remarks:        ****************** ABUSE COMPLAINTS TO: abuse@limelightnetworks.com
mnt-by:         RIPE-NCC-HM-MNT
mnt-by:         LLNW-MNT
mnt-domains:    LLNW-MNT
mnt-routes:     LLNW-MNT
created:        2010-05-12T16:20:38Z
last-modified:  2017-09-01T17:39:08Z
source:         RIPE # Filtered
Currently, whichever country line comes last is the one persisted by the SQL INSERT statement. Since these lines do not appear to be sorted in any way, this makes things completely nondeterministic.
How do you know that they are not sorted?
Do you expect them to be sorted alphabetically? That wouldn’t make sense.
If the order in which they were entered is preserved, we could simply switch to using the first country code, hoping that they were entered in order of precedence. In this example, EU is probably the best way to say “SE, DE, NL, GB, ES, FR, IT”.
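To make the "first country wins" idea concrete, here is a minimal Python sketch. It assumes the importer sees the attributes of an RIR object in the order the RIR published them; parse_rpsl, countries and pick_first_country are hypothetical helpers for illustration, not functions of the actual libloc importer.

```python
from typing import List, Optional, Tuple

Attr = Tuple[str, str]


def parse_rpsl(text: str) -> List[Attr]:
    """Split one RPSL object into (attribute, value) pairs, preserving their order."""
    attrs: List[Attr] = []
    for line in text.splitlines():
        # Skip blank lines and whois comments such as "% ..." or "# Filtered".
        if not line.strip() or line.startswith(("#", "%")):
            continue
        # Continuation lines start with whitespace or "+": append to the previous value.
        if line[0] in " \t+" and attrs:
            key, value = attrs[-1]
            attrs[-1] = (key, (value + " " + line[1:].strip()).strip())
            continue
        key, _, value = line.partition(":")
        attrs.append((key.strip().lower(), value.strip()))
    return attrs


def countries(attrs: List[Attr]) -> List[str]:
    """All distinct country codes, in the order in which they appear in the object."""
    seen: List[str] = []
    for key, value in attrs:
        if key == "country":
            code = value.split("#")[0].strip().upper()  # drop trailing comments
            if code and code not in seen:
                seen.append(code)
    return seen


def pick_first_country(attrs: List[Attr]) -> Optional[str]:
    """'First wins' instead of the current 'last INSERT wins' behaviour."""
    codes = countries(attrs)
    return codes[0] if codes else None


obj = """inetnum: 178.79.192.0 - 178.79.255.255
netname: EU-LLNW-20100512
country: EU
country: SE
country: DE
"""
print(pick_first_country(parse_rpsl(obj)))  # EU
```

This only helps if the first code really is the most meaningful one; it makes the result deterministic, not necessarily correct.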
The network above would, however, be recoverable: if we interpret "EU" not as the European Union but as the European country, all other country codes given here would be covered by it.
European country? Did you mean continent? Last time I checked a map, the borders were still there.
Alas, this is not helping in cases such as these two:
Country of network [IPv4Network('77.74.172.0/23')] already set to 'CH', omitting 'FI' (multiple country lines in RIR data?)
Country of network [IPv4Network('185.253.140.0/24')] already set to 'GB', omitting 'NL' (multiple country lines in RIR data?)
Country of network [IPv4Network('185.253.140.0/24')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'JP' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'SG' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'AU' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'NL' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'FR' (multiple country lines in RIR data?)
Country of network [IPv4Network('213.230.255.0/24')] already set to 'GB', omitting 'DE' (multiple country lines in RIR data?)
Country of network [IPv4Network('193.109.168.0/22')] already set to 'GB', omitting 'US' (multiple country lines in RIR data?)
There are _plenty_ of such networks; I believe the RIPE IPv4 data alone fills several screen pages. Nothing in life is ever easy, and parsing RIR data definitely isn't... :-/
Delegating the handling of such situations to applications using libloc does not make sense to me, as people _expect_ precise answers from it (if we can speak of precision here at all); otherwise, they could simply parse the RIR data on their own. Therefore, we have to make do with this somehow. Possible options are as follows:
(a) We do not process such networks at all. If a network operator wants his/her network to be covered by libloc, he/she/it should kindly fix its RIR data.
That would not prevent us from obtaining announcements for such networks, but we would not label them with any country anymore.
This is a very bad proposal. Just because we do not have 100% confidence in the data doesn’t mean we have to drop it.
In general, we can only trust the people who entered the data, and I do not have 100% confidence in them either. By that logic, the database would be empty.
I would also assume that someone tried to do a good job here and listed all countries in which infrastructure for this network is located. That is sometimes hard to say for a CDN, because there are many POPs and they are probably all organised as anycast networks, a very common method these days.
(b) We try to automatically determine meaningful codes in each case.
This is tricky and not very deterministic. What about a network having "CY" and "TR" set? Would that be covered by "EU"?
Good question. I would say yes. A continent is a good approximation.
Worse would be DE and JP. Or CN and CH. You simply cannot group them together with this logic. But I suppose there wouldn’t be too many examples like this. Networks spread across Europe are much more common, because Europe is densely populated and, with a unified legal system and easy trade, putting servers into many countries isn’t an issue at all. We do this without even thinking about it. We just put them where our users are or where it is cheapest.
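The "a continent is a good approximation" idea from option (b) can be sketched like this in Python. The CONTINENTS table is a hypothetical, deliberately tiny mapping for illustration (a real one would need all ISO 3166-1 codes), and placing CY and TR in Europe reflects the judgement call made in this thread, not an authoritative assignment.

```python
from typing import Dict, Iterable, Optional

# Hypothetical, deliberately tiny country -> continent table for illustration only.
# A real implementation would need a complete ISO 3166-1 mapping.
CONTINENTS: Dict[str, str] = {
    "EU": "EU",  # the "EU" code from the RIPE object, read as "somewhere in Europe"
    "SE": "EU", "DE": "EU", "NL": "EU", "GB": "EU", "ES": "EU",
    "FR": "EU", "IT": "EU", "CH": "EU", "FI": "EU",
    "CY": "EU", "TR": "EU",  # judgement call, as discussed in this thread
    "JP": "AS", "CN": "AS", "SG": "AS",
    "US": "NA", "AU": "OC",
}


def common_continent(codes: Iterable[str]) -> Optional[str]:
    """Return the continent shared by all codes, or None if the codes span
    several continents, contain an unknown code, or are empty."""
    found = set()
    for code in codes:
        continent = CONTINENTS.get(code.upper())
        if continent is None:
            return None
        found.add(continent)
    return found.pop() if len(found) == 1 else None


print(common_continent(["EU", "SE", "DE", "NL", "GB", "ES", "FR", "IT"]))  # EU
print(common_continent(["DE", "JP"]))  # None
```

Networks such as DE/JP or CN/CH fall through to None, which matches the point above that they cannot be grouped with this logic.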
213.230.255.0/24 seems to be used worldwide, but from my point of view, this is not sufficient to classify it as an anycast network. Worse, we have to (or should) assign a country code to anycast networks as well.
What is it, if not that? Technically, it could not be split into smaller networks. It is already a /24.
(c) We try to determine the jurisdiction of a network's organisation handle.
Frankly, I have no idea what problems would arise in this case. If an organisation fails to provide accurate and meaningful RIR data, what will their organisation handle possibly look like?
In the Limelight example above, US with the anycast bit set is roughly what I would expect.
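Option (c) could be sketched as a fallback, assuming the RIR's organisation objects carry a country of their own that the importer collects into a handle-to-country map (an assumption to verify per RIR). resolve_country and org_countries are hypothetical names, and ORG-EXAMPLE-RIPE is a made-up handle.

```python
from typing import Dict, Iterable, Optional


def resolve_country(
    network_countries: Iterable[str],
    org_handle: Optional[str],
    org_countries: Dict[str, str],
) -> Optional[str]:
    """Pick one country for a network.

    A single distinct country is used as-is. Otherwise (none or several), fall
    back to the country of the organisation referenced by the "org:" attribute,
    if one is known; org_countries is a hypothetical handle -> country map built
    from the RIR's organisation objects.
    """
    distinct = list(dict.fromkeys(code.upper() for code in network_countries))
    if len(distinct) == 1:
        return distinct[0]
    if org_handle is None:
        return None
    return org_countries.get(org_handle)


orgs = {"ORG-EXAMPLE-RIPE": "GB"}  # hypothetical organisation data
print(resolve_country(["GB", "US", "JP"], "ORG-EXAMPLE-RIPE", orgs))  # GB
```

Combined with the anycast flag, this would yield something like "US, anycast" for a CDN whose organisation is registered in the US.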
Trying to keep things deterministic, (a) is my current favorite - it is the most brutal, though.
No, this is not acceptable at all.
I would be much happier deciding this with overrides on a case-by-case basis, but that would be a lot of work.
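A case-by-case override could be checked before any automatic logic. The following Python sketch uses a hypothetical in-memory OVERRIDES table with the most specific matching prefix winning; the entry shown is an example only, and neither the table nor the function names reflect an existing libloc format.

```python
import ipaddress
from typing import Dict, List, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical, manually maintained overrides: network -> country code.
OVERRIDES: Dict[Network, str] = {
    ipaddress.ip_network("178.79.192.0/18"): "EU",  # example entry only
}


def lookup_override(network: Network, overrides: Dict[Network, str]) -> Optional[str]:
    """Return the country of the most specific override network covering `network`."""
    best: Optional[Network] = None
    for candidate in overrides:
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first.
        if candidate.version == network.version and network.subnet_of(candidate):
            if best is None or candidate.prefixlen > best.prefixlen:
                best = candidate
    return overrides[best] if best is not None else None


def choose_country(network: Network, countries: List[str],
                   overrides: Dict[Network, str]) -> Optional[str]:
    """Override first; otherwise accept the RIR data only if it is unambiguous."""
    override = lookup_override(network, overrides)
    if override is not None:
        return override
    distinct = list(dict.fromkeys(countries))
    return distinct[0] if len(distinct) == 1 else None


print(choose_country(ipaddress.ip_network("178.79.200.0/24"), ["EU", "SE"], OVERRIDES))  # EU
print(choose_country(ipaddress.ip_network("185.253.140.0/24"), ["GB", "NL", "US"], OVERRIDES))  # None
```

The linear scan is fine for a handful of entries; a large override list would want a radix tree instead.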
Do you see a better way of dealing with such networks?
Another automated approach would be to mark these networks with a “confidence” value, however that is determined. We would then require applications to take this value into account. The downside is that not much can be done with it apart from dropping the network when its confidence is below a certain threshold, or using it when confidence is high enough. That threshold could differ between applications and would make libloc more difficult to integrate.
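A minimal Python sketch of the confidence idea, assuming a purely hypothetical scoring rule (one country scores 1.0, n distinct countries score 1/n). Assignment, score and country_for_app are illustrative names; the sketch mainly shows that each application ends up choosing its own threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assignment:
    country: Optional[str]
    confidence: float  # hypothetical score between 0.0 and 1.0


def score(countries: List[str]) -> Assignment:
    """Hypothetical scoring: one country -> 1.0, n distinct countries -> 1/n."""
    distinct = list(dict.fromkeys(c.upper() for c in countries))
    if not distinct:
        return Assignment(None, 0.0)
    return Assignment(distinct[0], 1.0 / len(distinct))


def country_for_app(a: Assignment, threshold: float) -> Optional[str]:
    """What an application would do: use the country only at or above its threshold."""
    return a.country if a.confidence >= threshold else None


a = score(["GB", "NL", "US"])
print(country_for_app(a, 0.5))  # None: a strict application drops the network
print(country_for_app(a, 0.2))  # GB: a lenient application keeps it
```

The same network yielding different answers in different applications is exactly the extra burden described above.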
@All: Thoughts? Comments? Opinions?
Thanks, and best regards,
Peter Müller
-Michael