Hello Peter,

> On 28 Oct 2022, at 21:29, Peter Müller wrote:
>
> Hello Michael,
>
> above all, thank you very much for the patchset and all the work behind it.
>
> Unfortunately, as briefly discussed via the phone already, I have some general
> concerns regarding geofeeds:
>
> (a) In contrast to RIRs, I do not see geofeed providers as a trustworthy source.
> While the former are not trustworthy in terms of the data they provide (since
> no vetting or QA of database changes is usually conducted, and it does not look
> to me like this is going to change soon), at least their infrastructure is:
> It seems reasonable to me to trust, for example, RIPE's FTP server to serve
> the same database files regardless of the client requesting it. For some of
> them, we could even verify that through file signature validation, assuming that
> it is too costly to do live GPG-signing at scale.
>
> Geofeed URLs, in contrast, can lead to anywhere, and I would not be surprised
> at all to see dubious ISPs serving different geofeeds to different clients.
> Given that our IP address ranges are public and static, and libloc reveals itself
> through the User-Agent HTTP header, it would be quite easy to serve us a geofeed
> that tampers with data, while playing innocent to other clients.
>
> In addition, many of the 215 geofeed URLs that are currently live (attached) point
> to services such as Google Docs or GitHub - both don't strike me as reliable sources
> in terms of persistence. Generally, we have the full problem of URL/domain rot again. :-(
>
> One could argue that these points (to a certain extent) hold true for RIRs as
> well. However, if we cannot trust them, it's curtains for libloc either way. :-)
> Some random ISPs trying to make us consume geolocation data from random URLs,
> on the other hand, poses a greater risk than benefit to the quality of the
> location database.

I see your point, but I disagree. The RIR databases are based on self-assessment, too.
People can put whatever they want in there and it is not being checked by anyone.
The only thing that you might have in favour of your argument is that there is a
better paper trail of any changes than with the geofeeds. Those can be changed, or
even randomly generated. But I believe that in both cases we have no chance to
verify any data. Malicious players will fake their location even in the RIR
databases.

What I would suggest as a minimum is to select at least a couple of “trusted” or
very large sources that we maintain manually. There are a couple of cloud providers
which use geofeeds and we would quite likely improve the quality of the data for
them.

> Which brings me directly to the next point...
>
> (b) Presuming we still agree on not being more precise than /24 or /48, all
> the information geofeeds provide could (should?) have been in the RIR databases
> as well.
>
> The only exception is ARIN, but since we do not get their raw database, we won't
> be able to consume any geofeed URLs in it. So, for the area where we lack accuracy
> of geolocation information most, geofeed won't help us. And for all the other RIRs
> (LACNIC included, for which we process an additional geolocation database feed
> already), the geofeeds ideally should not contain any new information to us.

Why should we not process anything smaller than those prefixes? It wouldn’t hurt us
at all.

> Earlier today, I created a location database text dump on location02 with and without
> the geofeed patchset applied. The diff can be retrieved from
> https://people.ipfire.org/~pmueller/location-database-geofeed-diff.tar.gz,
> and is rather massive, partly because CIDRs smaller than /24 or /48, respectively,
> are yet to be ignored by the geofeed processing routines.
>
> I have yet to assess the diff closely, but from a superficial analysis, it appears
> that geofeed introduces a lot of changes that could have been in the respective RIR
> databases as well. The fact that they are not there does not inspire confidence.
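
For illustration, the prefix-length cutoff mentioned here could look roughly like the sketch below. It is not code from the patchset or from libloc: the function name `filter_geofeed` and the `MAX_PREFIXLEN` table are hypothetical, and it assumes the usual geofeed CSV layout (prefix, country, region, city, postal code, with `#` starting a comment).

```python
import ipaddress

# Assumed cutoffs taken from the discussion: ignore anything more
# specific than /24 (IPv4) or /48 (IPv6).
MAX_PREFIXLEN = {4: 24, 6: 48}

def filter_geofeed(lines):
    """Yield (network, country) for entries within the cutoff (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        fields = line.split(",")
        try:
            network = ipaddress.ip_network(fields[0].strip())
        except ValueError:
            # Not a valid prefix (or host bits set): skip the entry
            continue
        # Drop prefixes that are more specific than the cutoff
        if network.prefixlen > MAX_PREFIXLEN[network.version]:
            continue
        country = fields[1].strip().upper() if len(fields) > 1 else ""
        yield network, country

feed = [
    "# example feed using documentation prefixes",
    "192.0.2.0/24,DE,DE-BE,Berlin,",
    "192.0.2.128/25,FR,,,",
    "2001:db8::/32,NL,,,",
    "2001:db8:1:2::/64,US,,,",
    "not-a-prefix,XX,,,",
]
kept = list(filter_geofeed(feed))
```

With the sample feed, only the /24 and the /32 entries survive; the /25, the /64 and the malformed line are dropped. Whether such a cutoff should be applied at all is exactly the open question in this thread.
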
>
> Apologies for this rather disappointing feedback, and best regards,
> Peter Müller
> <20221028_live_geofeeds.txt>

Well, I don’t think this is disappointing. Technically I suspect that you are happy
with the code. We now just need to figure out where to use it and where to not use
it.

Best,
-Michael