It seems there has been a significant reduction in the visibility of certain routes in the IPFire dataset recently, and/or spurious routes are taking the place of legitimate announcements.
For example, 209.38.204.47 is originated by AS14061 and announced in 209.38.192.0/19, but IPFire erroneously sees 209.38.0.0/16 and locates no corresponding AS.
Here are a few more prefixes with the same issue:
+----------+------------------+-----------------+------------+
| ASN      | Prefix           | IPFire Prefix   | IPFire ASN |
+----------+------------------+-----------------+------------+
| AS64425  | 45.148.122.0/24  | 45.148.120.0/22 | None       |
| AS197540 | 94.16.116.0/22   | 94.16.96.0/19   | None       |
| AS36352  | 198.23.234.0/23  | 198.23.128.0/17 | None       |
| AS63949  | 172.105.192.0/19 | 172.104.0.0/15  | None       |
+----------+------------------+-----------------+------------+
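The pattern in the examples above is that the prefix IPFire returns is always a less specific prefix covering the announced one. A minimal sketch of that check, using only Python's standard `ipaddress` module; the `pairs` list and the `is_covering` helper are illustrative names for this example, not part of any IPFire tooling:

```python
import ipaddress

# (announced prefix, prefix reported by the dataset), copied from the report above
pairs = [
    ("209.38.192.0/19", "209.38.0.0/16"),
    ("45.148.122.0/24", "45.148.120.0/22"),
    ("94.16.116.0/22", "94.16.96.0/19"),
    ("198.23.234.0/23", "198.23.128.0/17"),
    ("172.105.192.0/19", "172.104.0.0/15"),
]


def is_covering(announced, reported):
    """Return True if `reported` is a strictly less specific prefix containing `announced`."""
    a = ipaddress.ip_network(announced)
    r = ipaddress.ip_network(reported)
    return a != r and a.subnet_of(r)


results = {announced: is_covering(announced, reported) for announced, reported in pairs}

for announced, covered in results.items():
    print(f"{announced}: covered by reported prefix = {covered}")
```

Every pair evaluates to `True`, which is consistent with the more specific entries (and their AS information) having gone missing while a broader entry remained.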
Thanks so much for all your hard work,
-- Jordan
Hello Jordan,
Thank you very much for your email. This has really helped me.
Recently I have been rolling out a lot of changes that massively improve the database. I have not been 100% confident that all problems have been solved; however, I did not have any indication that things went wrong.
One of the changes was a deduplication algorithm that is supposed to remove any subnets from the database we don’t need. That works well.
The second change is to merge neighbouring subnets (e.g. 10.0.0.0/24 and 10.0.1.0/24 could be stored as 10.0.0.0/23). That would save a lot of space, too. That algorithm relied on a function to count the bit length of an IP address (i.e. the minimum number of bits I need to represent a certain IP address). That function was total garbage. No idea how that didn’t cause bigger damage.
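To make the two ideas in the paragraph above concrete, here is a small sketch using Python's standard `ipaddress` module. It is not the libloc code and does not reproduce the bug or the fix; it only shows what merging neighbouring subnets and the "bit length" of an address mean. The helper names `merge` and `bit_length` are made up for this example.

```python
import ipaddress


def merge(networks):
    """Collapse adjacent or overlapping networks into the fewest possible prefixes."""
    nets = [ipaddress.ip_network(n) for n in networks]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


def bit_length(address):
    """Minimum number of bits needed to represent the address as an integer."""
    return int(ipaddress.ip_address(address)).bit_length()


# The example from the text: two neighbouring /24s become one /23.
merged = merge(["10.0.0.0/24", "10.0.1.0/24"])
print(merged)  # ['10.0.0.0/23']

# Neighbours that do not share a /23 boundary cannot be merged.
unaligned = merge(["10.0.1.0/24", "10.0.2.0/24"])
print(unaligned)  # ['10.0.1.0/24', '10.0.2.0/24']

# 10.0.0.0 is 10 * 2**24 as an integer, so it needs 28 bits.
print(bit_length("10.0.0.0"))  # 28
```

The second case shows why a merge step depends on getting the bit arithmetic right: two subnets can be numerically adjacent without being mergeable into a single valid prefix.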
The fix is now here:
https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=edbf280e6e043ea...
I have published an updated version of the database with the fix:
https://git.ipfire.org/?p=location/location-database.git;a=commitdiff;h=16e9...
We now have data for the networks that you have listed (as expected):
root@michael:~# location lookup 209.38.204.47 45.148.122.0 94.16.116.0 198.23.234.0 172.105.192.0
209.38.204.47:
  Network : 209.38.192.0/18
  Country : United States of America
  Autonomous System : AS14061 - DigitalOcean, LLC
45.148.122.0:
  Network : 45.148.122.0/24
  Country : Netherlands
  Autonomous System : AS64425 - SKB Enterprise B.V.
  Hostile Network safe to drop: yes
94.16.116.0:
  Network : 94.16.112.0/21
  Country : Germany
  Autonomous System : AS197540 - netcup GmbH
198.23.234.0:
  Network : 198.23.224.0/20
  Country : Canada
  Autonomous System : AS36352 - HostPapa
172.105.192.0:
  Network : 172.105.0.0/16
  Country : United States of America
  Autonomous System : AS63949 - Akamai Technologies, Inc.
This all matches the table that you have drawn.
So, thank you very much for helping me find this bug. It is solved!
Best,
-Michael
P.S. Out of my own curiosity, may I ask what your application for this database is?
On 22 Mar 2024, at 04:03, Jordan Savoca jsavoca@posteo.net wrote:
> It seems there has been a significant reduction in the visibility of certain routes in the IPFire dataset recently, and/or spurious routes are taking the place of legitimate announcements.
> For example, 209.38.204.47 is originated by AS14061 and announced in 209.38.192.0/19, but IPFire erroneously sees 209.38.0.0/16 and locates no corresponding AS.
> Here are a few more prefixes with the same issue:
> +----------+------------------+-----------------+------------+
> | ASN      | Prefix           | IPFire Prefix   | IPFire ASN |
> +----------+------------------+-----------------+------------+
> | AS64425  | 45.148.122.0/24  | 45.148.120.0/22 | None       |
> | AS197540 | 94.16.116.0/22   | 94.16.96.0/19   | None       |
> | AS36352  | 198.23.234.0/23  | 198.23.128.0/17 | None       |
> | AS63949  | 172.105.192.0/19 | 172.104.0.0/15  | None       |
> +----------+------------------+-----------------+------------+
> Thanks so much for all your hard work,
> -- Jordan
On Fri Mar 22, 2024 at 9:09 AM MST, Michael Tremer wrote:
> Hello Jordan,
> Thank you very much for your email. This has really helped me.
> Recently I have been rolling out a lot of changes that massively improve the database. I have not been 100% confident that all problems have been solved; however, I did not have any indication that things went wrong.
> One of the changes was a deduplication algorithm that is supposed to remove any subnets from the database we don’t need. That works well.
> The second change is to merge neighbouring subnets (e.g. 10.0.0.0/24 and 10.0.1.0/24 could be stored as 10.0.0.0/23). That would save a lot of space, too. That algorithm relied on a function to count the bit length of an IP address (i.e. the minimum number of bits I need to represent a certain IP address). That function was total garbage. No idea how that didn’t cause bigger damage.
> The fix is now here:
> https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=edbf280e6e043ea...
> I have published an updated version of the database with the fix:
> https://git.ipfire.org/?p=location/location-database.git;a=commitdiff;h=16e9...
> We now have data for the networks that you have listed (as expected):
Awesome, thank you so much for the quick turnaround on a fix, much appreciated! :)
I hope you have a great weekend!
-- Jordan
On Fri Mar 22, 2024 at 9:09 AM MST, Michael Tremer wrote:
> P.S. Out of my own curiosity, may I ask what your application for this database is?
Oops, I hadn't replied to your question! I've written a tiny WHOIS[1] server which returns AS information for addresses and hostnames using a dataset (presently) derived from the IPFire git sets; I hope to use the location libraries in future, however. ^^
The second application[2] is a set of HTML documents generated from onionoo[3] over at the Tor Project, which relies on IPFire as well. I'd noticed some historical discrepancies in announcement information, which was the impetus for my reaching out.
Thank you again for the quick fix!
[1]: https://git.jordan.im/asn/
[2]: https://git.jordan.im/allium/
[3]: https://metrics.torproject.org/onionoo.html
-- Jordan
Hello Jordan,
On 22 Mar 2024, at 16:27, Jordan Savoca jsavoca@posteo.net wrote:
> On Fri Mar 22, 2024 at 9:09 AM MST, Michael Tremer wrote:
>> P.S. Out of my own curiosity, may I ask what your application for this database is?
> Oops, I hadn't replied to your question! I've written a tiny WHOIS[1] server which returns AS information for addresses and hostnames using a dataset (presently) derived from the IPFire git sets; I hope to use the location libraries in future, however. ^^
Very cool.
Any reason why you are parsing the text file instead of using our Python bindings?
The text file isn’t flat, so you cannot simply stop at the first match. The binary database, however, is organised as a tree, so a search will be a lot faster and more accurate. The bindings are packaged for Fedora, Debian and a couple of others.
If you want to have all networks that belong to a specific AS, there is a way to search for them having the library walk through the entire tree. That should be super fast.
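To illustrate the difference between stopping at the first match in a list and walking a tree, here is a simplified sketch in plain Python. It is not libloc's data structure or its Python bindings: `PrefixTree`, `first_match`, the entry order and the `COVERING` placeholder label are all assumptions made up for this example, and it handles IPv4 only. The two prefixes are taken from earlier in this thread.

```python
import ipaddress

COVERING = "covering /16 (placeholder label)"  # hypothetical label, not real data


class PrefixTree:
    """Binary trie keyed on address bits; a simplified stand-in, IPv4 only."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, label):
        net = ipaddress.ip_network(prefix)
        # Keep only the bits covered by the prefix length.
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["label"] = label

    def lookup(self, address):
        """Return the label of the longest (most specific) matching prefix."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node, best = self.root, self.root.get("label")
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("label", best)
        return best


def first_match(entries, address):
    """Naive scan that stops at the first entry containing the address."""
    addr = ipaddress.ip_address(address)
    for prefix, label in entries:
        if addr in ipaddress.ip_network(prefix):
            return label
    return None


entries = [("209.38.0.0/16", COVERING), ("209.38.192.0/18", "AS14061")]

tree = PrefixTree()
for prefix, label in entries:
    tree.insert(prefix, label)

print(first_match(entries, "209.38.204.47"))  # the less specific /16 entry
print(tree.lookup("209.38.204.47"))           # AS14061
```

The tree lookup visits at most 32 nodes per IPv4 address regardless of how many prefixes are stored, and it naturally ends on the most specific entry; the first-match scan does neither.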
> The second application[2] is a set of HTML documents generated from onionoo[3] over at the Tor Project, which relies on IPFire as well. I'd noticed some historical discrepancies in announcement information, which was the impetus for my reaching out.
Yeah, that probably makes sense since I recently added the randomiser :)
> Thank you again for the quick fix!
No worries. As mentioned, we had a couple of outstanding issues, and I believe that they are now all solved, which will help us pave the way for a 1.0 release.
Great to see that the database is making its way into projects everywhere :)
> -- Jordan
On Fri Mar 22, 2024 at 9:34 AM MST, Michael Tremer wrote:
> Very cool.
> Any reason why you are parsing the text file instead of using our Python bindings?
No reason beyond a highly unproductive preference for few dependencies and regrettable fondness for quick parsing scripts. :P
My production systems are Alpine-based and I tend to use shared systems for development where there's considerable resistance to installing dependencies system-wide, so it's usually easiest to use the packages already available on the system or those easily installable in userland.
> The text file isn’t flat, so you cannot simply stop at the first match. The binary database, however, is organised as a tree, so a search will be a lot faster and more accurate. The bindings are packaged for Fedora, Debian and a couple of others.
> If you want to have all networks that belong to a specific AS, there is a way to search for them having the library walk through the entire tree. That should be super fast.
This is a good point, though. Ideally I wouldn't be systematically discarding swaths of announcement information and incurring performance costs during queries for more or less no reason.
> No worries. As mentioned, we had a couple of outstanding issues, and I believe that they are now all solved, which will help us pave the way for a 1.0 release.
> Great to see that the database is making its way into projects everywhere :)
Definitely! Looking forward to the 1.0 release. :) Thanks again.
-- Jordan
Hey,
On 22 Mar 2024, at 16:54, Jordan Savoca jsavoca@posteo.net wrote:
> On Fri Mar 22, 2024 at 9:34 AM MST, Michael Tremer wrote:
>> Very cool.
>> Any reason why you are parsing the text file instead of using our Python bindings?
> No reason beyond a highly unproductive preference for few dependencies and regrettable fondness for quick parsing scripts. :P
> My production systems are Alpine-based and I tend to use shared systems for development where there's considerable resistance to installing dependencies system-wide, so it's usually easiest to use the packages already available on the system or those easily installable in userland.
>> The text file isn’t flat, so you cannot simply stop at the first match. The binary database, however, is organised as a tree, so a search will be a lot faster and more accurate. The bindings are packaged for Fedora, Debian and a couple of others.
>> If you want to have all networks that belong to a specific AS, there is a way to search for them having the library walk through the entire tree. That should be super fast.
> This is a good point, though. Ideally I wouldn't be systematically discarding swaths of announcement information and incurring performance costs during queries for more or less no reason.
Yeah, I have spent a lot of time making the search really, really fast. So it would be good to see that code in hefty use :)
>> No worries. As mentioned, we had a couple of outstanding issues, and I believe that they are now all solved, which will help us pave the way for a 1.0 release.
>> Great to see that the database is making its way into projects everywhere :)
> Definitely! Looking forward to the 1.0 release. :) Thanks again.
> -- Jordan