Hi,
On Tue, 2018-01-23 at 17:33 +0100, Peter Müller wrote:
Hello,
just a quick status update about the location database ("libloc"). I did some research about how (and where) to gather the data for it the other day.
Cool.
Basically, we just need these information:
- IPv4/IPv6 network range
- assigned country code
- Autonomous System Number (ASN) the network belongs to
- description of the network [helpful, but optional in first place]
All in all not really difficult, but putting it all together is a bit.
The RIPE provides a documentation about the different files on their FTP server here: https://www.ripe.net/manage-ips-and-asns/db/support/documentation/update-ref... The database (either split or en bloc) can be downloaded here: https://ftp.ripe.net/ripe/dbase/
I am still a bit unsure how you will put the two things together here since you only have a list of prefixes, but the ASN is not next to it.
There is another list with all the ASes but where is the connection to put this together?
According to the first link, the files ripe.db.inetnum and ripe.db.inet6num contain the allocations of the IP address space. This does not mean that they are actually announced via BGP at the moment, but we agreed not to care about that since a network might be down for maintenance or some other reasons.
We don't care at all what has been announced over BGP and what not.
However, only the network ranges and country codes can be extracted from inet(6)num directly. It seems like we have to look at the route database for anything else (ASN, description), which is located at ripe.db.route.
Yes, that is where the connection is. Or at least some.
The network range needs to be converted to CIDR first, since the formats differ between inet(6)num and route. But since an AS might announce bigger ranges (say a /15) than the network's size itself (i.e. /29), it might make sense to extract the route announcements in fist place. We already know ASN and some descriptions then.
We don't need to take care about aggregating prefixes for now. I suppose that is something that would make the database a little bit smaller but shouldn't be our priority for now.
The BGP routers will try to aggregate routes as well before they push them out to another neighbor which is supposed to keep the routing table as small as possible.
After that, inet(6)num can be parsed to gather networks in that range, and to extract their actual network range and country code. I think it makes sense to keep the description of the AS set here.
Keep the description of the AS with the network?
The last two steps consist in checking if either the complete ASN or a certain network is listed in one of the override files. Finally, the stuff can be dumped into the libloc database.
That is not necessary I think. We can just add those as their own network objects and they will either overlay a network or be a smaller network in a bigger one. We will be able to find them because their country is A1, A2 or A3, etc. That should do I suppose.
If not, we shall worry about that when we are ready to import the RIR databases :)
Apart from LACNIC (Latin America and Caribbean), every RIR offers a complete database for download:
- ARIN: https://ftp.arin.net/pub/rr/arin.db
- AFRINIC: https://ftp.afrinic.net/pub/dbase/afrinic.db.gz
- APNIC: https://ftp.apnic.net/apnic/whois/ (split DB only)
- RIPE: https://ftp.ripe.net/ripe/dbase/ripe.db.gz
LACNIC only provides a list of IP ranges (and decimal encoded subnets - *ugh*) and country codes over here: http://ftp.lacnic.net/pub/stats/lacnic/delegated-lacnic-extended-latest I have no idea at the moment how to enumerate ASN and ASdescr for them...
That's not very nice to not follow the standard here. I suppose we can only walk through the list and query their whois server for each line. That might takes ages, but it is the only option I can see at the moment.
Overall, the RIPE DB looks quite tidy, but we always have a huge number of side effects (one network range is announced by two ASNs, ...) making thinks ugly.
We don't care if it is being announced by multiple AS. It will only be in the RIR databases once, unless there are mistakes.
I tried to write a Python script for the jobs listed above, but eventually failed since my programming skills are very limited and I needed some layer (perhaps a SQLite DB?) to store partial information. Unfortunately, it seems like that job is much more complicated than I initially thought.
You might be fine with python dictionaries, but it might be that it will take some GBs of memory. The sqlite database will solve this problem but will make writing things to it a bit harder because it doesn't have a data type for IP addresses and networks.
Does anybody have an idea about how to store incomplete information here? Let's say, we already know ASN, description and CIDR, but neither all possible networks in that range nor their country...
?
Example?
Any help is highly appreciated. Thanks in advance.
Best regards, Peter Müller
Best, -Michael