Hello,
On 18 Jan 2022, at 21:10, Peter Müller <peter.mueller@ipfire.org> wrote:
Hello nusenu, hello Michael,
Since you apparently don't like this data source and I always thought RIPEstat has pretty good data quality: Would you mind sharing your opinion on this?
Sorry for not replying to this sooner. Actually, I neither like nor dislike RIPEstat; I just have not had enough time to form an educated opinion on it.
At the moment, things are quite packed on my end, but that will hopefully be over by the beginning of February. So this has not been forgotten or silently discarded; it is just a very tardy reply due to my "load average"...
@Peter: Do you want to look into extracting information from this?
Yes.
Without having looked at the number of queries we would probably need to make: do you think this makes sense to do while running the location-importer, or should it become a dedicated script that we can run in the background all the time, so that it does not slow down the daily generation of the actual database?
In case of the latter, we could also do some scraping for the ARIN AS names. Since we keep track of their source, this should not be too hard, and if we get more human-readable names for some of them, it might be worth the effort.
I thought we were talking about parsing an HTML table. That should be neither a complicated process nor something I would call scraping.
What I would call scraping is sending one request per piece of information you want to obtain, and that is always a bad idea. It is slow and has a lot of overhead on our side and, of course, on the server side; this is not what a good citizen of the Internet would do. So I would be against it. Most sites explicitly prohibit this in their T&Cs for exactly this reason.
-Michael
Thanks, and best regards, Peter Müller