From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH 01/10] importer: Store geofeed URLs from RIR data
Date: Sat, 29 Oct 2022 12:43:59 +0100
Message-ID: <4F03F877-7419-4CB7-9E7E-A8C1A822CE34@ipfire.org>
In-Reply-To: <cfe8e9b0-1cf3-692b-96da-fb012a983aca@ipfire.org>
Hello Peter,
> On 28 Oct 2022, at 21:29, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> Hello Michael,
>
> above all, thank you very much for the patchset and all the work behind it.
>
> Unfortunately, as briefly discussed via the phone already, I have some general
> concerns regarding geofeeds:
>
> (a) In contrast to RIRs, I do not see geofeed providers as a trustworthy source.
> While the former are not trustworthy in terms of the data they provide (since
> no vetting or QA of database changes is usually conducted, and it does not look
> to me like this is going to change soon), at least their infrastructure is:
> It seems reasonable to me to trust, for example, RIPE's FTP server to serve
> the same database files regardless of the client requesting it. For some of
> them, we could even verify that through file signature validation, assuming that
> it is too costly to do live GPG-signing at scale.
>
> Geofeed URLs, in contrast, can lead to anywhere, and I would not be surprised
> at all to see dubious ISPs serving different geofeeds to different clients.
> Given that our IP address ranges are public and static, and libloc reveals itself
> through the User-Agent HTTP header, it would be quite easy to serve us a geofeed
> that tampers with data, while playing innocent to other clients.
>
> In addition, many of the 215 geofeed URLs that are currently live (attached) point
> to services such as Google Docs or GitHub - both don't strike me as reliable sources
> in terms of persistence. Generally, we have the full problem of URL/domain rot again. :-(
>
> One could argue that these points (to a certain extent) hold true for RIRs as
> well. However, if we cannot trust them, it's curtains for libloc either way. :-)
> Some random ISPs trying to make us consume geolocation data from random URLs,
> on the other hand, poses a greater risk than benefit to the quality of the
> location database.
I see your point, but I disagree.
The RIR databases are self-reported, too. People can put whatever they want in there, and nobody checks it.
The only point in favour of your argument is that the RIR databases have a better paper trail of changes than geofeeds. Geofeeds can be changed at any time, or even be randomly generated. But I believe that in both cases we have no way to verify any of the data.
Malicious players will fake their location even in the RIR databases.
What I would suggest as a minimum is to select at least a couple of “trusted” or very large sources that we maintain manually. There are a couple of cloud providers that use geofeeds, and we would quite likely improve the quality of the data for them.
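To make the idea of a manually maintained list concrete, here is a minimal sketch of what such a check could look like. It is not taken from the patchset: the function name, the set name and the hostnames are all hypothetical placeholders, and matching on the exact hostname is just one possible policy.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of geofeed hosts we would maintain by hand.
# These hostnames are placeholders, not actual decisions.
TRUSTED_GEOFEED_HOSTS = {"geofeed.example.com", "cloud.example.net"}

def is_trusted_geofeed(url, trusted_hosts=TRUSTED_GEOFEED_HOSTS):
    """Return True if url is an HTTPS URL whose host is on the allowlist."""
    parts = urlsplit(url)

    # Only HTTPS URLs are considered at all
    if parts.scheme != "https":
        return False

    # hostname is None for URLs without a network location
    host = (parts.hostname or "").lower()

    return host in trusted_hosts
```

Anything not on the list would simply be skipped, so the RIR data would remain the only source for those networks.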
> Which brings me directly to the next point...
>
> (b) Presumed we still agree on not being more precise than /24 or /48, all
> the information geofeeds provide could (should?) have been in the RIR databases
> as well.
>
> The only exception is ARIN, but since we do not get their raw database, we won't
> be able to consume any geofeed URLs in it. So, for the area where we lack accuracy
> of geolocation information most, geofeed won't help us. And for all the other RIRs
> (LACNIC included, for which we process an additional geolocation database feed
> already), the geofeeds ideally should not contain any new information to us.
Why should we not process anything smaller than those prefixes? It wouldn’t hurt us at all.
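For reference, the cutoff we are talking about would amount to a check like the following. This is only a sketch using Python's standard ipaddress module; the function name and the default limits are assumptions for illustration and not what the importer currently does.

```python
import ipaddress

def within_precision(network, max_v4=24, max_v6=48):
    """Return True if network is no more specific than /24 (IPv4) or /48 (IPv6)."""
    # strict=False tolerates entries that have host bits set
    net = ipaddress.ip_network(network, strict=False)

    limit = max_v4 if net.version == 4 else max_v6

    return net.prefixlen <= limit
```

With the limits as parameters, raising or dropping the cutoff would be a one-line change, e.g. within_precision("192.0.2.128/25", max_v4=32).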
> Earlier today, I created a location database text dump on location02 with and without
> the geofeed patchset applied. The diff can be retrieved from https://people.ipfire.org/~pmueller/location-database-geofeed-diff.tar.gz,
> and is rather massive, partly because CIDRs smaller than /24 resp. /48 are yet to
> be ignored by the geofeed processing routines.
>
> I have yet to assess the diff closely, but for a superficial analysis, it appears
> like geofeed introduces a lot of changes that could have been in the respective RIR
> databases as well. The fact that they are not there does not inspire confidence.
>
> Apologies for this rather disappointing feedback, and best regards,
> Peter Müller
> <20221028_live_geofeeds.txt>
Well, I don’t think this is disappointing. Technically, I suspect that you are happy with the code.
We now just need to figure out where to use it and where not to.
Best,
-Michael
Thread overview: 12+ messages
2022-09-27 16:48 Michael Tremer
2022-09-27 16:48 ` [PATCH 02/10] importer: Add command to import geofeeds into the database Michael Tremer
2022-09-27 16:48 ` [PATCH 03/10] importer: Just fetch any exception from the executor Michael Tremer
2022-09-27 16:48 ` [PATCH 04/10] importer: Sync geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 05/10] importer: Use geofeeds for country assignment Michael Tremer
2022-09-27 16:48 ` [PATCH 06/10] importer: Use a GIST index for networks from geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 07/10] importer: Add a search index match geofeed networks quicker Michael Tremer
2022-09-27 16:48 ` [PATCH 08/10] importer: Fix reading Geofeeds from remarks Michael Tremer
2022-09-27 16:48 ` [PATCH 09/10] importer: Ensure that we only use HTTPS URLs for Geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 10/10] importer: Validate country codes from Geofeeds Michael Tremer
2022-10-28 20:29 ` [PATCH 01/10] importer: Store geofeed URLs from RIR data Peter Müller
2022-10-29 11:43 ` Michael Tremer [this message]