From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: [PATCH 01/10] importer: Store geofeed URLs from RIR data
Date: Sat, 29 Oct 2022 12:43:59 +0100
Message-ID: <4F03F877-7419-4CB7-9E7E-A8C1A822CE34@ipfire.org>

Hello Peter,

> On 28 Oct 2022, at 21:29, Peter Müller wrote:
>
> Hello Michael,
>
> above all, thank you very much for the patchset and all the work behind it.
>
> Unfortunately, as briefly discussed via the phone already, I have some general
> concerns regarding geofeeds:
>
> (a) In contrast to RIRs, I do not see geofeed providers as a trustworthy source.
> While the former are not trustworthy in terms of the data they provide (since
> no vetting or QA of database changes is usually conducted, and it does not look
> to me like this is going to change soon), at least their infrastructure is:
> It seems reasonable to me to trust, for example, RIPE's FTP server to serve
> the same database files regardless of the client requesting it. For some of
> them, we could even verify that through file signature validation, assuming that
> it is too costly to do live GPG-signing at scale.
>
> Geofeed URLs, in contrast, can lead to anywhere, and I would not be surprised
> at all to see dubious ISPs serving different geofeeds to different clients.
> Given that our IP address ranges are public and static, and libloc reveals itself
> through the User-Agent HTTP header, it would be quite easy to serve us a geofeed
> that tampers with data, while playing innocent to other clients.
>
> In addition, many of the 215 geofeed URLs that are currently live (attached) point
> to services such as Google Docs or GitHub - both don't strike me as reliable sources
> in terms of persistence. Generally, we have the full problem of URL/domain rot again. :-(
>
> One could argue that these points (to a certain extent) hold true for RIRs as
> well. However, if we cannot trust them, it's curtains for libloc either way. :-)
> Some random ISPs trying to make us consume geolocation data from random URLs,
> on the other hand, pose a greater risk than benefit to the quality of the
> location database.

I see your point, but I disagree.

The RIR databases are self-assessment, too. People can put whatever they want in there and it is not being checked by anyone.

The only thing that you might have in favour of your argument is that there is a better paper trail of any changes than with the geofeeds. Those can be changed - even randomly generated. But I believe that we have in both cases no chance to verify any data.

Malicious players will fake their location even in the RIR databases.

What I would suggest as a minimum is to select at least a couple of “trusted” or very large sources that we maintain manually. There are a couple of cloud providers which use geofeeds and we would quite likely improve the quality of the data for them.

> Which brings me directly to the next point...
>
> (b) Presuming we still agree on not being more precise than /24 or /48, all
> the information geofeeds provide could (should?) have been in the RIR databases
> as well.
>
> The only exception is ARIN, but since we do not get their raw database, we won't
> be able to consume any geofeed URLs in it. So, for the area where we lack accuracy
> of geolocation information most, geofeed won't help us. And for all the other RIRs
> (LACNIC included, for which we process an additional geolocation database feed
> already), the geofeeds ideally should not contain any new information to us.

Why should we not process anything smaller than those prefixes? It wouldn’t hurt us at all.

> Earlier today, I created a location database text dump on location02 with and without
> the geofeed patchset applied. The diff can be retrieved from https://people.ipfire.org/~pmueller/location-database-geofeed-diff.tar.gz,
> and is rather massive, partly because CIDRs smaller than /24 resp. /48 are yet to
> be ignored by the geofeed processing routines.
>
> I have yet to assess the diff closely, but from a superficial analysis, it appears
> like geofeed introduces a lot of changes that could have been in the respective RIR
> databases as well. The fact that they are not there does not inspire confidence.
>
> Apologies for this rather disappointing feedback, and best regards,
> Peter Müller
> <20221028_live_geofeeds.txt>

Well, I don’t think this is disappointing. Technically I suspect that you are happy with the code.

We now just need to figure out where to use it and where to not use it.

Best,
-Michael