Hello location folks,
since I happen to be the "master of disaster" when it comes to the actual contents of the location database, I observe networks whose country code is inaccurate on an almost daily basis.
Once a week (or at a similar interval), I batch them into a single commit and send them to this list, as you have probably all noticed by now.
Besides properties such as being an anonymous proxy, a satellite dial-up network, or an anycasted network service (none of which can be determined very reliably by automated means), a significant number of networks carry an inaccurate country code, either by chance or on purpose, for example to bypass location-based firewall rules.
Since it has never been clarified whether that country refers to the actual physical location of the machines behind that network or its jurisdiction, we just take the country information we are able to obtain, and do not make any legal/political/... decisions there on our own. We simply have no alternative.
However, there are some examples of forging country information (let's put it that way) I would like to put a stop to:

(a) Country codes of unpopulated islands
Bouvet Island in the Atlantic Ocean is a notable example of an ISO-3166-x country code (BV) which should not show up in RIR data: Literally nobody is using IP networks on that island, since it has no inhabitants who might use them. The same goes for AQ (Antarctica): While there might be some scientists there, they would probably not delegate a /24 or a /48 to this country - in fact, I would be surprised to hear that anything other than a satellite uplink is possible there.

(b) Networks registered to offshore letterbox companies
Especially folks with a higher need for privacy (not necessarily malicious, although there are some bulletproof ISPs or professional IP hijackers in this group) register IP networks to letterbox companies in offshore locations, such as the Seychelles (SC). What annoys me here is that some of those operators put obviously unhelpful country information into their networks, so we end up with things like a network that is used in Romania (RO) but, according to its RIR data, is "located" in an offshore jurisdiction. We have agreed to correct such data manually, as it does not provide helpful information to our users. This usually involves running "mtr" against that network and trying to find out which country we end up in - thanks to all backbone operators who care about informative PTRs for their core infrastructure. :-)

(c) Highly obvious examples of forged country information
A popular example of this category is a Russian ISP tagging its networks as being located in the United States (US), since there is no reasonable way of blocking traffic from or to this country entirely. I am currently not aware of an opposite example. While this is not that easy to spot if the original and forged country are geographically close to each other (e.g. Russia and Finland), we might be able to automate this by running traceroutes into those networks and trying to find out which countries we traverse - if they are located on a completely different continent than we expect the network to be located on, things start to look suspicious.
At the time of writing, my ideas about detecting those cases automatically are as follows:
(a) Reject or flag networks located in certain countries automatically
In my humble opinion, it is safe to consider any network located in BV as being suspicious. We could flag those to indicate they probably need further investigation (preferred), or reject them entirely while building our database.
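To illustrate the flagging variant of (a), here is a minimal Python sketch. The function name, the tuple layout and the inclusion of AQ next to BV are my assumptions for illustration only; this is not how the database build currently works.

```python
import ipaddress

# Assumption: BV as proposed above; AQ is added here for the same reason.
SUSPICIOUS_COUNTRIES = frozenset({"AQ", "BV"})


def flag_suspicious(networks, suspicious=SUSPICIOUS_COUNTRIES):
    """Return (network, country, flagged) tuples; nothing is rejected."""
    results = []
    for network, country in networks:
        # Raises ValueError on malformed input instead of passing it on.
        parsed = ipaddress.ip_network(network)
        code = country.upper()
        results.append((str(parsed), code, code in suspicious))
    return results


if __name__ == "__main__":
    sample = [
        ("185.193.126.0/23", "BV"),
        ("2a0c:3b80:4256::/48", "BV"),
        ("192.0.2.0/24", "DE"),  # documentation prefix, made-up country
    ]
    for network, country, flagged in flag_suspicious(sample):
        print(network, country, "FLAG" if flagged else "ok")
```

Flagging instead of rejecting keeps the data available for manual review, which matches the preferred option above.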
(b) Determine contact details for networks in certain countries and flag known letterbox companies
At least for some offshore locations (such as the Seychelles [SC] or Panama [PA]), some networks located there are indeed used by local telcos, so we cannot apply (a) as such to those countries. However, a bunch of letterbox companies tend to trace back to the same postal address, so we might be able to detect those networks with reasonable sensitivity by checking their contact details against lists of known letterboxes (the ICIJ offshore leaks come to mind). This, however, is time-consuming and does not work for regions covered by LACNIC and ARIN.
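The address comparison in (b) could be sketched roughly like this in Python. Everything here is hypothetical: the function names, the normalisation rules and the placeholder address are mine, and a real list of letterbox addresses would have to come from an external source.

```python
import re


def normalize_address(address):
    """Casefold, drop punctuation and collapse whitespace, so that trivially
    different spellings of the same postal address compare equal."""
    cleaned = re.sub(r"[^\w\s]", " ", address.casefold())
    return " ".join(cleaned.split())


def is_known_letterbox(contact_address, known_letterboxes):
    """True if the contact address matches a listed letterbox address."""
    known = {normalize_address(entry) for entry in known_letterboxes}
    return normalize_address(contact_address) in known


if __name__ == "__main__":
    # Placeholder entry, not a real letterbox address.
    known = ["Suite 1, Example House, Example Street, Victoria, Mahe, Seychelles"]
    contact = "SUITE 1 Example House;  Example Street\nVictoria, Mahe, Seychelles"
    print(is_known_letterbox(contact, known))
```

Exact matching after normalisation is deliberately conservative: it will miss abbreviated or misspelled addresses, but it should not flag a local telco by accident.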
(c) Try to detect traceroute anomalies
I've discussed this idea with Michael several times on the phone, and while it is certainly not an elegant solution, I have not come up with a better one at the time of writing. The technical setup would be as follows:
- An unmetered server runs a traceroute against a given network.
- The hops are parsed into a list of IP addresses.
- The location results for the last two or three hops are resolved. In case they are not part of the network in question (possibly not even the same AS) and differ completely from what we are expecting (i.e. a different country), the network is flagged.

For the record, I wanted to document those ideas here. In case anybody is willing to discuss them, has a better idea or some comments of their own, I would be delighted to hear them.

Thanks, and best regards,
Peter Müller
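A rough Python sketch of the parsing and comparison steps of (c) follows. It assumes numeric `traceroute -n` style output; the function names, the `tail` parameter and the `lookup_country` callable (standing in for a location database lookup) are hypothetical, and the sample addresses and countries are made up from documentation prefixes. Actually running the traceroute is left out.

```python
import ipaddress
import re

# A hop line starts with the hop number, followed by an address or "*".
HOP_RE = re.compile(r"^\s*\d+\s+(\S+)")


def parse_hops(traceroute_output):
    """Extract one IP address per answered hop; skip everything else."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header line or blank line
        try:
            hops.append(ipaddress.ip_address(match.group(1)))
        except ValueError:
            continue  # unanswered hop ("*") or unparsable token
    return hops


def looks_suspicious(hops, network, expected_country, lookup_country, tail=3):
    """Flag if the last `tail` hops outside the network all resolve to a
    country other than the expected one."""
    net = ipaddress.ip_network(network)
    outside = [h for h in hops if h.version != net.version or h not in net]
    countries = [lookup_country(hop) for hop in outside[-tail:]]
    resolved = [cc for cc in countries if cc]
    return bool(resolved) and all(cc != expected_country for cc in resolved)


SAMPLE_OUTPUT = """\
traceroute to 192.0.2.1 (192.0.2.1), 30 hops max, 60 byte packets
 1  198.51.100.1  0.412 ms  0.398 ms  0.377 ms
 2  * * *
 3  203.0.113.7  31.201 ms  31.188 ms  31.170 ms
 4  203.0.113.9  31.954 ms  31.902 ms  31.881 ms
 5  192.0.2.1  32.410 ms  32.395 ms  32.380 ms
"""

# Made-up lookup table standing in for the real location database.
FAKE_LOCATIONS = {"198.51.100.1": "DE", "203.0.113.7": "RU", "203.0.113.9": "RU"}


def fake_lookup(address):
    return FAKE_LOCATIONS.get(str(address))


if __name__ == "__main__":
    hops = parse_hops(SAMPLE_OUTPUT)
    print(looks_suspicious(hops, "192.0.2.0/24", "US", fake_lookup, tail=2))
```

Comparing countries only is the simplest reading of the idea; a per-continent comparison would reduce false positives for neighbouring countries, at the cost of missing cases such as Russia and Finland.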
P.S.: At the time of writing, the "AQ" and "BV" networks in question are as follows:
location=# SELECT * FROM networks WHERE country = 'AQ' ORDER BY family(network), masklen(network);
       network       | country
---------------------+---------
 185.192.56.0/22     | AQ
 202.144.198.0/23    | AQ
 139.28.204.0/23     | AQ
 156.0.201.0/24      | AQ
 185.121.177.0/24    | AQ
 2a0a:2840::/30      | AQ
 2a0a:2844::/30      | AQ
 2a0e:46c6::/40      | AQ
 2a07:1c44:1000::/40 | AQ
 2a07:1c44:6700::/40 | AQ
 2a0a:2846:230::/44  | AQ
 2a0e:b107:d10::/44  | AQ
 2a0c:3b80:4151::/48 | AQ
 2a0c:3b80:6171::/48 | AQ
 2a0c:b641:21e::/48  | AQ
 2a0d:1a40:cafe::/48 | AQ
 2a0d:1a45:128::/48  | AQ
 2a0d:1a45:3213::/48 | AQ
 2a0e:b107:b7f::/48  | AQ
 2a05:dfc7:5::/48    | AQ
 2a05:dfc7:5353::/48 | AQ
 2a05:dfc7:beef::/48 | AQ
 2a05:dfc7:dfc7::/48 | AQ
 2a05:dfc7:dfc8::/48 | AQ
 2a06:1e86:5555::/48 | AQ
 2a07:a905:ffe8::/48 | AQ
 2a0e:b107:198::/48  | AQ
(27 rows)

location=# SELECT * FROM networks WHERE country = 'BV' ORDER BY family(network), masklen(network);
       network       | country
---------------------+---------
 185.193.126.0/23    | BV
 80.78.16.0/24       | BV
 185.193.124.0/24    | BV
 185.193.125.0/24    | BV
 2a0c:3b80:4256::/48 | BV
 2a0c:3b80:6276::/48 | BV
(6 rows)