public inbox for location@lists.ipfire.org
 help / color / mirror / Atom feed
* [PATCH 01/10] importer: Store geofeed URLs from RIR data
@ 2022-09-27 16:48 Michael Tremer
  2022-09-27 16:48 ` [PATCH 02/10] importer: Add command to import geofeeds into the database Michael Tremer
                   ` (9 more replies)
  0 siblings, 10 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 2657 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 40 +++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 9faf23b..5bd5da3 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -182,6 +182,11 @@ class CLI(object):
 				CREATE INDEX IF NOT EXISTS networks_family ON networks USING BTREE(family(network));
 				CREATE INDEX IF NOT EXISTS networks_search ON networks USING GIST(network inet_ops);
 
+				-- geofeeds
+				CREATE TABLE IF NOT EXISTS network_geofeeds(network inet, url text);
+				CREATE UNIQUE INDEX IF NOT EXISTS network_geofeeds_unique
+					ON network_geofeeds(network);
+
 				-- overrides
 				CREATE TABLE IF NOT EXISTS autnum_overrides(
 					number bigint NOT NULL,
@@ -799,6 +804,16 @@ class CLI(object):
 
 				inetnum[key].append(val)
 
+			# Parse the geofeed attribute
+			elif key == "geofeed":
+				inetnum["geofeed"] = val
+
+			# Parse geofeed when used as a remark
+			elif key == "remark":
+				m = re.match(r"^(?:geofeed|Geofeed)\s+(https://.*)", val)
+				if m:
+					inetnum["geofeed"] = m.group(1)
+
 		# Skip empty objects
 		if not inetnum or not "country" in inetnum:
 			return
@@ -810,7 +825,6 @@ class CLI(object):
 		# them into the database, if _check_parsed_network() succeeded
 		for single_network in inetnum.get("inet6num") or inetnum.get("inetnum"):
 			if self._check_parsed_network(single_network):
-
 				# Skip objects with unknown country codes if they are valid to avoid log spam...
 				if validcountries and invalidcountries:
 					log.warning("Skipping network with bogus countr(y|ies) %s (original countries: %s): %s" % \
@@ -823,6 +837,30 @@ class CLI(object):
 					"%s" % single_network, inetnum.get("country")[0], inetnum.get("country"), source_key,
 				)
 
+				# Update any geofeed information
+				geofeed = inetnum.get("geofeed", None)
+
+				# Store/update any geofeeds
+				if geofeed:
+					self.db.execute("""
+						INSERT INTO
+							network_geofeeds(
+								network,
+								url
+							)
+						VALUES(
+							%s, %s
+						)
+						ON CONFLICT (network) DO
+							UPDATE SET url = excluded.url""",
+						"%s" % single_network, geofeed,
+					)
+
+				# Delete any previous geofeeds
+				else:
+					self.db.execute("DELETE FROM network_geofeeds WHERE network = %s",
+						"%s" % single_network)
+
 	def _parse_org_block(self, block, source_key):
 		org = {}
 		for line in block:
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 02/10] importer: Add command to import geofeeds into the database
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 03/10] importer: Just fetch any exception from the executor Michael Tremer
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 5281 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 143 +++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 5bd5da3..ddec376 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -18,6 +18,7 @@
 ###############################################################################
 
 import argparse
+import concurrent.futures
 import ipaddress
 import json
 import logging
@@ -95,6 +96,11 @@ class CLI(object):
 		update_announcements.add_argument("server", nargs=1,
 			help=_("Route Server to connect to"), metavar=_("SERVER"))
 
+		# Update geofeeds
+		update_geofeeds = subparsers.add_parser("update-geofeeds",
+			help=_("Update Geofeeds"))
+		update_geofeeds.set_defaults(func=self.handle_update_geofeeds)
+
 		# Update overrides
 		update_overrides = subparsers.add_parser("update-overrides",
 			help=_("Update overrides"),
@@ -183,6 +189,25 @@ class CLI(object):
 				CREATE INDEX IF NOT EXISTS networks_search ON networks USING GIST(network inet_ops);
 
 				-- geofeeds
+				CREATE TABLE IF NOT EXISTS geofeeds(
+					id serial primary key,
+					url text,
+					status integer default null,
+					updated_at timestamp without time zone default null
+				);
+				CREATE UNIQUE INDEX IF NOT EXISTS geofeeds_unique
+					ON geofeeds(url);
+				CREATE TABLE IF NOT EXISTS geofeed_networks(
+					geofeed_id integer references geofeeds(id) on delete cascade,
+					network inet,
+					country text,
+					region text,
+					city text
+				);
+				CREATE INDEX IF NOT EXISTS geofeed_networks_geofeed_id
+					ON geofeed_networks(geofeed_id);
+				CREATE INDEX IF NOT EXISTS geofeed_networks_search
+					ON geofeed_networks(network);
 				CREATE TABLE IF NOT EXISTS network_geofeeds(network inet, url text);
 				CREATE UNIQUE INDEX IF NOT EXISTS network_geofeeds_unique
 					ON network_geofeeds(network);
@@ -1253,6 +1278,124 @@ class CLI(object):
 				# Otherwise return the line
 				yield line
 
+	def handle_update_geofeeds(self, ns):
+		# Fetch all Geofeeds that require an update
+		geofeeds = self.db.query("""
+			SELECT
+				id,
+				url
+			FROM
+				geofeeds
+			WHERE
+				updated_at IS NULL
+			OR
+				updated_at <= CURRENT_TIMESTAMP - INTERVAL '1 week'
+			ORDER BY
+				id
+		""")
+
+		with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
+			results = executor.map(self._fetch_geofeed, geofeeds)
+
+			for result in results:
+				print(result)
+
+	def _fetch_geofeed(self, geofeed):
+		log.debug("Fetching Geofeed %s" % geofeed.url)
+
+		with self.db.transaction():
+			# Open the URL
+			try:
+				req = urllib.request.Request(geofeed.url, headers={
+					"User-Agent" : "location/%s" % location.__version__,
+
+					# We expect some plain text file in CSV format
+					"Accept" : "text/csv, text/plain",
+				})
+
+				# XXX set proxy
+
+				# Send the request
+				with urllib.request.urlopen(req, timeout=10) as f:
+					# Remove any previous data
+					self.db.execute("DELETE FROM geofeed_networks \
+						WHERE geofeed_id = %s", geofeed.id)
+
+					# Read the output line by line
+					for line in f:
+						line = line.decode()
+
+						# Strip any newline
+						line = line.rstrip()
+
+						# Skip empty lines
+						if not line:
+							continue
+
+						# Try to parse the line
+						try:
+							fields = line.split(",", 5)
+						except ValueError:
+							log.debug("Could not parse line: %s" % line)
+							continue
+
+						# Check if we have enough fields
+						if len(fields) < 4:
+							log.debug("Not enough fields in line: %s" % line)
+							continue
+
+						# Fetch all fields
+						network, country, region, city, = fields[:4]
+
+						# Try to parse the network
+						try:
+							network = ipaddress.ip_network(network, strict=False)
+						except ValueError:
+							log.debug("Could not parse network: %s" % network)
+							continue
+
+						# XXX Check the country code
+
+						# Write this into the database
+						self.db.execute("""
+							INSERT INTO
+								geofeed_networks (
+									geofeed_id,
+									network,
+									country,
+									region,
+									city
+								)
+							VALUES (%s, %s, %s, %s, %s)""",
+							geofeed.id,
+							"%s" % network,
+							country,
+							region,
+							city,
+						)
+
+			# Catch any HTTP errors
+			except urllib.request.HTTPError as e:
+				self.db.execute("UPDATE geofeeds SET status = %s \
+					WHERE id = %s", e.code, geofeed.id)
+
+			# Catch any other errors
+			except urllib.request.URLError as e:
+				log.error("Could not fetch URL %s: %s" % (geofeed.url, e))
+
+			# Mark the geofeed as updated
+			else:
+				self.db.execute("""
+					UPDATE
+						geofeeds
+					SET
+						updated_at = CURRENT_TIMESTAMP,
+						status = NULL
+					WHERE
+						id = %s""",
+					geofeed.id,
+				)
+
 	def handle_update_overrides(self, ns):
 		with self.db.transaction():
 			# Only drop manually created overrides, as we can be reasonably sure to have them,
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 03/10] importer: Just fetch any exception from the executor
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
  2022-09-27 16:48 ` [PATCH 02/10] importer: Add command to import geofeeds into the database Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 04/10] importer: Sync geofeeds Michael Tremer
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 902 bytes --]

Any exceptions will only be raised in the main thread when they are
fetched from the executor. We do not need to print any other return
values here.

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index ddec376..014f3b6 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -1297,8 +1297,9 @@ class CLI(object):
 		with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
 			results = executor.map(self._fetch_geofeed, geofeeds)
 
+			# Fetch all results to raise any exceptions
 			for result in results:
-				print(result)
+				pass
 
 	def _fetch_geofeed(self, geofeed):
 		log.debug("Fetching Geofeed %s" % geofeed.url)
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 04/10] importer: Sync geofeeds
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
  2022-09-27 16:48 ` [PATCH 02/10] importer: Add command to import geofeeds into the database Michael Tremer
  2022-09-27 16:48 ` [PATCH 03/10] importer: Just fetch any exception from the executor Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 05/10] importer: Use geofeeds for country assignment Michael Tremer
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 1300 bytes --]

Geofeeds are kept in a separate table to only fetch them once per URL.
This needs to be kept in sync which is done before we update any feeds.

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 014f3b6..12035f1 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -1279,6 +1279,38 @@ class CLI(object):
 				yield line
 
 	def handle_update_geofeeds(self, ns):
+		# Sync geofeeds
+		with self.db.transaction():
+			# Delete all geofeeds which are no longer linked
+			self.db.execute("""
+				DELETE FROM
+					geofeeds
+				WHERE
+					NOT EXISTS (
+						SELECT
+							1
+						FROM
+							network_geofeeds
+						WHERE
+							geofeeds.url = network_geofeeds.url
+					)""",
+			)
+
+			# Copy all geofeeds
+			self.db.execute("""
+				INSERT INTO
+					geofeeds(
+						url
+					)
+				SELECT
+					url
+				FROM
+					network_geofeeds
+				ON CONFLICT (url)
+					DO NOTHING
+				""",
+			)
+
 		# Fetch all Geofeeds that require an update
 		geofeeds = self.db.query("""
 			SELECT
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 05/10] importer: Use geofeeds for country assignment
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (2 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 04/10] importer: Sync geofeeds Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 06/10] importer: Use a GIST index for networks from geofeeds Michael Tremer
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 1453 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 12035f1..f8d2dc8 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -291,6 +291,8 @@ class CLI(object):
 				SELECT network FROM networks
 				UNION
 				SELECT network FROM network_overrides
+				UNION
+				SELECT network FROM geofeed_networks
 			),
 
 			ordered_networks AS (
@@ -333,6 +335,29 @@ class CLI(object):
 						SELECT country FROM autnum_overrides overrides
 							WHERE networks.autnum = overrides.number
 					),
+					(
+						SELECT
+							geofeed_networks.country AS country
+						FROM
+							network_geofeeds
+
+						-- Join the data from the geofeeds
+						LEFT JOIN
+							geofeeds ON network_geofeeds.url = geofeeds.url
+						LEFT JOIN
+							geofeed_networks ON geofeeds.id = geofeed_networks.geofeed_id
+
+						-- Check whether we have a geofeed for this network
+						WHERE
+							networks.network <<= network_geofeeds.network
+						AND
+							networks.network <<= geofeed_networks.network
+
+						-- Filter for the best result
+						ORDER BY
+							masklen(geofeed_networks.network) DESC
+						LIMIT 1
+					),
 					networks.country
 				) AS country,
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 06/10] importer: Use a GIST index for networks from geofeeds
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (3 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 05/10] importer: Use geofeeds for country assignment Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 07/10] importer: Add a search index match geofeed networks quicker Michael Tremer
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 834 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index f8d2dc8..5759e0c 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -207,7 +207,7 @@ class CLI(object):
 				CREATE INDEX IF NOT EXISTS geofeed_networks_geofeed_id
 					ON geofeed_networks(geofeed_id);
 				CREATE INDEX IF NOT EXISTS geofeed_networks_search
-					ON geofeed_networks(network);
+					ON geofeed_networks USING GIST(network inet_ops);
 				CREATE TABLE IF NOT EXISTS network_geofeeds(network inet, url text);
 				CREATE UNIQUE INDEX IF NOT EXISTS network_geofeeds_unique
 					ON network_geofeeds(network);
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 07/10] importer: Add a search index match geofeed networks quicker
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (4 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 06/10] importer: Use a GIST index for networks from geofeeds Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 08/10] importer: Fix reading Geofeeds from remarks Michael Tremer
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 845 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 5759e0c..7e2136e 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -211,6 +211,10 @@ class CLI(object):
 				CREATE TABLE IF NOT EXISTS network_geofeeds(network inet, url text);
 				CREATE UNIQUE INDEX IF NOT EXISTS network_geofeeds_unique
 					ON network_geofeeds(network);
+				CREATE INDEX IF NOT EXISTS network_geofeeds_search
+					ON network_geofeeds USING GIST(network inet_ops);
+				CREATE INDEX IF NOT EXISTS network_geofeeds_url
+					ON network_geofeeds(url);
 
 				-- overrides
 				CREATE TABLE IF NOT EXISTS autnum_overrides(
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 08/10] importer: Fix reading Geofeeds from remarks
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (5 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 07/10] importer: Add a search index match geofeed networks quicker Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 09/10] importer: Ensure that we only use HTTPS URLs for Geofeeds Michael Tremer
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]

Only RIPE has a dedicated geofeed field. For others, this data needs to
be read from the remarks section.

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 7e2136e..0e2764e 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -863,8 +863,8 @@ class CLI(object):
 				inetnum["geofeed"] = val
 
 			# Parse geofeed when used as a remark
-			elif key == "remark":
-				m = re.match(r"^(?:geofeed|Geofeed)\s+(https://.*)", val)
+			elif key == "remarks":
+				m = re.match(r"^(?:Geofeed)\s+(https://.*)", val)
 				if m:
 					inetnum["geofeed"] = m.group(1)
 
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 09/10] importer: Ensure that we only use HTTPS URLs for Geofeeds
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (6 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 08/10] importer: Fix reading Geofeeds from remarks Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-09-27 16:48 ` [PATCH 10/10] importer: Validate country codes from Geofeeds Michael Tremer
  2022-10-28 20:29 ` [PATCH 01/10] importer: Store geofeed URLs from RIR data Peter Müller
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 754 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 0e2764e..d0384b5 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -894,6 +894,11 @@ class CLI(object):
 				# Update any geofeed information
 				geofeed = inetnum.get("geofeed", None)
 
+				# Make sure that this is a HTTPS URL
+				if geofeed and not geofeed.startswith("https://"):
+					log.warning("Geofeed URL is not using HTTPS: %s" % geofeed)
+					geofeed = None
+
 				# Store/update any geofeeds
 				if geofeed:
 					self.db.execute("""
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 10/10] importer: Validate country codes from Geofeeds
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (7 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 09/10] importer: Ensure that we only use HTTPS URLs for Geofeeds Michael Tremer
@ 2022-09-27 16:48 ` Michael Tremer
  2022-10-28 20:29 ` [PATCH 01/10] importer: Store geofeed URLs from RIR data Peter Müller
  9 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-09-27 16:48 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 929 bytes --]

Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
---
 src/scripts/location-importer.in | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index d0384b5..d8f9cbf 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -1421,7 +1421,15 @@ class CLI(object):
 							log.debug("Could not parse network: %s" % network)
 							continue
 
-						# XXX Check the country code
+						# Strip any excess whitespace from country codes
+						if country:
+							country = country.strip()
+
+						# Check the country code
+						if not location.country_code_is_valid(country):
+							log.warning("Invalid country code in Geofeed %s: %s" \
+								% (geofeed.url, country))
+							continue
 
 						# Write this into the database
 						self.db.execute("""
-- 
2.30.2


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 01/10] importer: Store geofeed URLs from RIR data
  2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
                   ` (8 preceding siblings ...)
  2022-09-27 16:48 ` [PATCH 10/10] importer: Validate country codes from Geofeeds Michael Tremer
@ 2022-10-28 20:29 ` Peter Müller
  2022-10-29 11:43   ` Michael Tremer
  9 siblings, 1 reply; 12+ messages in thread
From: Peter Müller @ 2022-10-28 20:29 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 3134 bytes --]

Hello Michael,

above all, thank you very much for the patchset and all the work behind it.

Unfortunately, as briefly discussed via the phone already, I have some general
concerns regarding geofeeds:

(a) In contrast to RIRs, I do not see geofeed providers as trustworthy source.
While the former are not trustworthy in terms of the data they provide (since
no vetting or QA of database changes is usually conducted, and it does not look
to me like this is going to change soon), at least their infrastructure is:
It seems reasonable to me to trust, for example, RIPE's FTP server to serve
the same database files regardless of the client requesting it. For some of
them, we could even verify that through file signature validation, assuming that
it is too costly to do live GPG-signing at scale.

Geofeed URLs, in contrast, can lead to anywhere, and I would not be surprised
at all to see dubious ISPs serving different geofeeds to different clients.
Given that our IP address ranges are public and static, and libloc reveals itself
through the User-Agent HTTP header, it would be quite easy to serve us a geofeed
that tampers with data, while playing innocent to other clients.

In addition, many of the 215 geofeed URLs that are currently live (attached) point
to services such as Google Docs or GitHub - both don't strike me as reliable sources
in terms of persistence. Generally, we have the full problem of URL/domain rot again. :-(

One could argue that these points (to a certain extend) hold true for RIRs as
well. However, if we cannot trust them, it's curtains for libloc either way. :-)
Some random ISPs trying to make us consuming geolocation data from random URLs,
on the other hand, poses a greater risk than benefit to the quality of the
location database.

Which brings me directly to the next point...

(b) Presumed we still agree on not being more precise than /24 or /48, all
the information geofeeds provide could (should?) have been in the RIR databases
as well.

The only exception is ARIN, but since we do not get their raw database, we won't
be able to consume any geofeed URLs in it. So, for the area where we lack accuracy
of geolocation information most, geofeed won't help us. And for all the other RIRs
(LACNIC included, for which we process an additional geolocation database feed
already), the geofeeds ideally should not contain any new information to us.


Earlier today, I created a location database text dump on location02 with and without
the geofeed patchset applied. The diff can be retrieved from https://people.ipfire.org/~pmueller/location-database-geofeed-diff.tar.gz,
and is rather massive, partly because CIDRs smaller than /24 resp. /48 are yet to
be ignored by the geofeed processing routines.

I have yet to assess the diff closely, but for a superficial analysis, it appears
like geofeed introduces a lot of changes that could have been in the respective RIR
databases as well. The fact that they are not there does not inspire confidence.

Apologies for this rather disappointing feedback, and best regards,
Peter Müller



[-- Attachment #2: 20221028_live_geofeeds.txt --]
[-- Type: text/plain, Size: 44446 bytes --]

  id  |                                                                               url                                                                                | status |         updated_at         
------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+----------------------------
  803 | https://128bit.ee/geofeed.csv                                                                                                                                    |        | 2022-09-27 08:48:26.590283
  570 | https://1581710f-1ced-4a06-8390-7cc61076f103.selcdn.net/geofeed.csv                                                                                              |        | 2022-09-27 08:48:25.051701
  132 | https://207941.xyz/static/geo.csv                                                                                                                                |        | 2022-09-27 08:48:20.414238
  317 | https://39office.co.uk/geofeed/geofeed.csv                                                                                                                       |    406 | 
  555 | https://abramcevo.net/geofeeds.csv                                                                                                                               |        | 2022-09-27 08:48:24.871628
 1053 | https://akrn.net/geofeed/                                                                                                                                        |        | 2022-09-27 08:48:30.103967
  357 | https://altairline.net/ipgeofeed.csv                                                                                                                             |        | 2022-09-27 08:48:24.184789
  246 | https://api.ewpratten.com/network/geofeed.csv                                                                                                                    |        | 2022-09-27 08:48:22.676312
  278 | https://as203333.jonasbusch.de/geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:23.171395
  256 | https://as204406.net/geofeed.csv                                                                                                                                 |        | 2022-09-27 08:48:23.042018
   52 | https://as205479.net/geofeed.csv                                                                                                                                 |    404 | 
  207 | https://as206016.ezdomain.ru/geofeed.csv                                                                                                                         |        | 2022-09-27 08:48:21.961518
  191 | https://as207941.net/static/geo.csv                                                                                                                              |        | 2022-09-27 08:48:29.101466
  128 | https://as209114.net/geofeed.csv                                                                                                                                 |        | 2022-09-27 08:48:20.541015
  142 | https://as211398.net/geofeed.csv                                                                                                                                 |        | 2022-09-27 08:48:20.695646
    8 | https://as47869.net/geofeed.csv                                                                                                                                  |        | 2022-09-27 08:48:16.969164
  459 | https://as56655.net/geofeed.csv                                                                                                                                  |        | 2022-09-27 08:48:24.426049
  842 | https://assets.ip-only.net/documents/geoipfeed/AS12552.txt                                                                                                       |        | 2022-09-27 08:48:47.301489
  206 | https://assigned.network/geofeed.csv                                                                                                                             |        | 2022-09-27 08:48:21.642988
  163 | https://august.tw/rfc8805.csv                                                                                                                                    |        | 2022-09-27 08:48:21.286217
  133 | https://b4rn.org.uk/geofeed.csv                                                                                                                                  |        | 2022-09-27 08:48:21.189017
  318 | https://bakergroup.uk/geofeed.csv                                                                                                                                |        | 2022-09-27 08:48:23.679303
  185 | https://bisv.ru/geofeed.csv                                                                                                                                      |        | 2022-09-27 08:48:21.167518
  493 | https://blade-public-static.blade-group.com/network/geofeed.csv                                                                                                  |        | 2022-09-27 08:48:25.0396
   91 | https://chesskuo.tw/geofeeds.csv                                                                                                                                 |        | 2022-09-27 08:48:20.384467
 1192 | https://cilix.co.uk/geofeed.csv                                                                                                                                  |        | 2022-09-27 08:48:31.21659
   76 | https://city-telekom.ru/geo/google.csv                                                                                                                           |        | 2022-09-27 08:48:19.094937
  408 | https://clouvider.com/geoipfeed.csv                                                                                                                              |        | 2022-09-27 08:48:24.893566
    4 | https://connectivia.it/wp-content/uploads/2022/08/geofeed.csv                                                                                                    |        | 2022-09-27 08:48:27.052104
  824 | https://convergenze.it/geofeeds.csv                                                                                                                              |        | 2022-09-27 08:48:26.868773
 1122 | https://cposo.ru/geofeed.csv                                                                                                                                     |        | 2022-09-27 08:48:31.070979
   24 | https://data-source.elastx.cloud/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:17.833537
  326 | https://deltatelesystems.ru/geofeed.csv                                                                                                                          |        | 2022-09-27 08:48:24.164182
  902 | https://docs.google.com/spreadsheets/d/1mPvsx7AzQx8yqFhyoYK6K9viEJNEpAZ3ICGKXG3Us5o/edit?usp=sharing/geofeed.csv                                                 |        | 2022-09-27 08:48:28.139785
  833 | https://docs.google.com/spreadsheets/d/e/2PACX-1vR2cb06ptOKwF8GLQwz0AAPJDjEi3L1Y0cWGlbB5pfyHFc_CFiW3tYTbrI5q3_Iuak2iiMB0htwLZPE/pub?output=csv                   |        | 2022-09-27 08:48:27.655495
 1015 | https://docs.google.com/spreadsheets/d/e/2PACX-1vSrczJhjW18HlapiISgmf6_52-Qc3fDA280OxzxjEEUAXnZxZgPEvh_qehOE_tvq-ZOVmICiYDr544P/pub?gid=0&single=true&output=csv |        | 2022-09-27 08:48:29.876872
  673 | https://docs.google.com/spreadsheets/d/e/2PACX-1vTfAIimGjXRLgfJd85VvVj5-fLa6a8kREezGUE8lLffEyW1i-mQCV8qPuqxIeyRzen1iYbaIDcEazaK/pub?output=csv                   |        | 2022-09-27 08:48:26.520855
  674 | https://docs.google.com/spreadsheets/d/e/2PACX-1vTfJX6jAMdyLkgqFPj9aoO5k9TEsFegn0KDWX4gfnK2XGbC1S3hZ1EHZOMMPJ0kchky7ElZr_Uh2vI6/pub?output=csv                   |        | 2022-09-27 08:48:26.805053
  197 | https://docs.google.com/spreadsheets/d/e/2PACX-1vTm5F9a_HLGR-FPR7LWut672RX4nihGTovrmtVd8asQq4N81_JRJEoVQzF9YWQdYBAccsc_yLp9Pi9K/pub?gid=0&single=true&output=csv |        | 2022-09-27 08:48:21.869312
  305 | https://etelecom.ru/files/geofeed.csv                                                                                                                            |        | 2022-09-27 08:48:23.672911
   53 | https://exoscale-prefixes.sos.exo.io/exoscale_geofeed                                                                                                            |        | 2022-09-27 08:48:19.077892
   43 | https://files.as209861.net/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:18.536718
  998 | https://files.delta.nl/geofeed-dfn.csv                                                                                                                           |        | 2022-09-27 08:48:28.699714
  636 | https://fj38gd6m2pd73464fjqw.maloco.ru/geofeed.csv                                                                                                               |        | 2022-09-27 08:48:25.445304
  245 | https://fr89.uk/geofeed.csv                                                                                                                                      |        | 2022-09-27 08:48:22.658833
  328 | https://gblnet.net/ipaddress/geofeed.csv                                                                                                                         |        | 2022-09-27 08:48:24.145082
 1069 | https://geo.blacknight.ie/ripe-geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:30.140024
   39 | https://geo.daknob.net/as210312.csv                                                                                                                              |        | 2022-09-27 08:48:18.423616
  204 | https://geofeed.186526.net/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:21.427929
  106 | https://geofeed.as207111.net/geofeed.csv                                                                                                                         |        | 2022-09-27 08:48:20.243992
  101 | https://geofeed.as210546.net/subnet.csv                                                                                                                          |        | 2022-09-27 08:48:19.671305
  303 | https://geofeed.azedunet.com/geofeed.csv                                                                                                                         |        | 2022-09-27 08:48:23.662716
  236 | https://geofeed.b00b.eu/feed.txt                                                                                                                                 |        | 2022-09-27 08:48:22.48175
   74 | https://geofeed.bluevps.com/geofeed.csv                                                                                                                          |        | 2022-09-27 08:48:18.806875
  482 | https://geofeed.bsonetwork.net/geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:25.421007
   15 | https://geofeed.constant.com/                                                                                                                                    |        | 2022-09-27 08:48:19.238399
  125 | https://geofeed.cynthia.re/geofeed/primary.csv                                                                                                                   |        | 2022-09-27 08:48:20.07647
  669 | https://geofeeddata.s3.eu-west-2.amazonaws.com/geofeed.csv                                                                                                       |        | 2022-09-27 08:48:25.590025
 1112 | https://geofeed.disney.com                                                                                                                                       |        | 2022-09-27 08:48:32.368274
 1085 | https://geofeed.disney.com/                                                                                                                                      |        | 2022-09-27 08:48:32.455889
   10 | https://geofeed.hostzealot.com/geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:17.675103
   90 | https://geofeed.huize.asia/geofeed.csv                                                                                                                           |    523 | 
  354 | https://geofeed.ip-max.net/ip-max.csv                                                                                                                            |        | 2022-09-27 08:48:24.150327
  695 | https://geofeed.keepit.com/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:25.840645
   18 | https://geofeed.kviknet.dk/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:17.255633
    6 | https://geofeed.llnw.net/                                                                                                                                        |        | 2022-09-27 08:48:18.41984
  162 | https://geofeed.netfiretec.com/                                                                                                                                  |        | 2022-09-27 08:48:22.236797
  244 | https://geofeed.noc.vanvik.ax/geofeed.csv                                                                                                                        |        | 2022-09-27 08:48:22.385053
    1 | https://geofeed.rapidseedbox.com/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:17.381663
  211 | https://geofeed.rejecty.com/geofeed.csv                                                                                                                          |        | 2022-09-27 08:48:21.839197
  697 | https://geofeed.servercore.com/prefixes.csv                                                                                                                      |    404 | 
  796 | https://geofeed.servercore.com/subnets.csv                                                                                                                       |        | 2022-09-27 08:48:26.582391
  111 | https://geofeeds.ihcb-group.com/                                                                                                                                 |    526 | 
   23 | https://geofeed.snapserv.net/v1.csv                                                                                                                              |        | 2022-09-27 08:48:17.633081
  103 | https://geofeeds.speedypage.com/geofeed.csv                                                                                                                      |        | 2022-09-27 08:48:19.843224
  621 | https://geofeeds.surfshark.com/geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:25.237156
  330 | https://geofeed.timeweb.net/geofeed.csv                                                                                                                          |        | 2022-09-27 08:48:24.032876
   78 | https://geofeed.transatel.com/geoloc.csv                                                                                                                         |        | 2022-09-27 08:48:19.238399
   20 | https://geofeed.tunenet.dk/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:17.656902
  706 | https://geofeed.wgtwo.com/geofeed.csv                                                                                                                            |        | 2022-09-27 08:48:26.116829
  117 | https://geofeed.zhiccc.net/2a0d-2587-geofeed.csv                                                                                                                 |        | 2022-09-27 08:48:20.163871
  127 | https://geofeed.zhiccc.net/2a0e-b107-geofeed.csv                                                                                                                 |        | 2022-09-27 08:48:20.147434
   44 | https://geo.imaster.ru/geofeeds.csv                                                                                                                              |        | 2022-09-27 08:48:18.936465
   80 | https://geoip.51178.ru/geofeed.csv                                                                                                                               |        | 2022-09-27 08:48:19.535356
   12 | https://geolocation.itm8.com                                                                                                                                     |        | 2022-09-27 08:48:17.612405
  658 | https://geolocation.misaka.io/as969/r4g.csv                                                                                                                      |        | 2022-09-27 08:48:25.610238
   31 | https://geolocation.misaka.io/as969/r6g.csv                                                                                                                      |        | 2022-09-27 08:48:18.054779
  295 | https://geo.telehouse-rechenzentrum.de/geofeed.txt                                                                                                               |        | 2022-09-27 08:48:23.186858
  198 | https://gf.hrvoje.org                                                                                                                                            |        | 2022-09-27 08:48:20.99458
  164 | https://gf.hrvoje.org/                                                                                                                                           |        | 2022-09-27 08:48:20.682033
  397 | https://github.com/aossia/geolocation/blob/main/geofeed.csv                                                                                                      |        | 2022-09-27 08:48:24.52641
    3 | https://github.com/datis-geoip/geofeed/blob/main/geofeed.csv                                                                                                     |        | 2022-09-27 08:48:17.461629
 1062 | https://github.com/geocheck/geofeed/blob/main/185.108.207.0-24_madrid.csv                                                                                        |        | 2022-09-27 08:48:29.918208
  996 | https://github.com/geocheck/geofeed/blob/main/185.167.234.0-24_zurich.csv                                                                                        |        | 2022-09-27 08:48:28.924989
  689 | https://github.com/geocheck/geofeed/blob/main/193.31.62.0-24_helsinki.csv                                                                                        |        | 2022-09-27 08:48:25.967151
 1077 | https://github.com/geocheck/geofeed/blob/main/193.37.196.0-24_kishinev.csv                                                                                       |        | 2022-09-27 08:48:30.292457
 1002 | https://github.com/geocheck/geofeed/blob/main/194.150.210.0-24_ankara.csv                                                                                        |        | 2022-09-27 08:48:28.970679
 1003 | https://github.com/geocheck/geofeed/blob/main/194.150.211.0-24_bratislava.csv                                                                                    |        | 2022-09-27 08:48:29.029244
 1065 | https://github.com/geocheck/geofeed/blob/main/194.233.8.0-22_kiev.csv                                                                                            |        | 2022-09-27 08:48:30.061457
  828 | https://github.com/geocheck/geofeed/blob/main/195.34.78.0-24_athens.csv                                                                                          |        | 2022-09-27 08:48:27.226006
 1049 | https://github.com/geocheck/geofeed/blob/main/212.69.18.0-24_luxembourg.csv                                                                                      |        | 2022-09-27 08:48:29.631977
 1114 | https://github.com/geocheck/geofeed/blob/main/213.139.64.0-22_paris.csv                                                                                          |        | 2022-09-27 08:48:30.552134
 1116 | https://github.com/geocheck/geofeed/blob/main/213.225.238.0-24_riga.csv                                                                                          |        | 2022-09-27 08:48:30.737163
 1125 | https://github.com/geocheck/geofeed/blob/main/45.140.195.0-24_oslo.csv                                                                                           |        | 2022-09-27 08:48:31.092659
 1118 | https://github.com/geocheck/geofeed/blob/main/45.153.125.0-24_warsaw.csv                                                                                         |        | 2022-09-27 08:48:30.836871
 1094 | https://github.com/geocheck/geofeed/blob/main/45.88.10.0-24_jerusalem.csv                                                                                        |        | 2022-09-27 08:48:30.320103
 1063 | https://github.com/geocheck/geofeed/blob/main/5.182.34.0-24_roma.csv                                                                                             |        | 2022-09-27 08:48:30.073939
 1044 | https://github.com/geocheck/geofeed/blob/main/5.249.188.0-22_amsterdam.csv                                                                                       |        | 2022-09-27 08:48:29.435688
 1117 | https://github.com/geocheck/geofeed/blob/main/62.72.179.0-24_stockholm.csv                                                                                       |        | 2022-09-27 08:48:30.650348
 1026 | https://github.com/geocheck/geofeed/blob/main/88.216.184.0-24_london.csv                                                                                         |        | 2022-09-27 08:48:29.425145
  990 | https://github.com/geocheck/geofeed/blob/main/89.116.172.0-23_vilnius.csv                                                                                        |        | 2022-09-27 08:48:28.647311
  823 | https://github.com/geocheck/geofeed/blob/main/89.116.200.0-24_vilnius.csv                                                                                        |        | 2022-09-27 08:48:27.109292
  991 | https://github.com/geocheck/geofeed/blob/main/89.117.36.0-23_vilnius.csv                                                                                         |        | 2022-09-27 08:48:28.668863
  932 | https://github.com/geocheck/geofeed/blob/main/91.208.73.0-24_dublin.csv                                                                                          |        | 2022-09-27 08:48:28.595083
 1078 | https://github.com/geocheck/geofeed/blob/main/91.228.168.0-24_tbilisi.csv                                                                                        |        | 2022-09-27 08:48:30.180117
   27 | https://github.com/Gnetwork-networkteam/GeoLocation_RIPE/blob/main/Geoloc.csv                                                                                    |        | 2022-09-27 08:48:17.995401
  911 | https://github.com/luakst/geofeed/blob/main/45.140.244.0-23_edirne.csv                                                                                           |        | 2022-09-27 08:48:27.925962
 1023 | https://github.com/luakst/geofeed/blob/main/91.186.194.0-23_edirne.csv                                                                                           |        | 2022-09-27 08:48:29.30541
  874 | https://github.com/luakst/geofeed/blob/main/91.186.212.0-22_Helsinki.csv                                                                                         |        | 2022-09-27 08:48:27.476827
  903 | https://github.com/tognetwork/Geofeed/blob/6b6f7b1a0dd05b779b84c3a6dec29c41d2581335/GNetwork_geofeed.csv                                                         |    404 | 
  339 | https://github.com/welltelecom/geofeeds/blob/main/geofeeds.csv                                                                                                   |        | 2022-09-27 08:48:24.051408
  853 | https://globalsecurelayer.com/google-data/GSL-geoip-feed.csv                                                                                                     |        | 2022-09-27 08:48:27.234621
  238 | https://hatsnet.work/geofeed/2a0d-2587-8800-geofeed.csv                                                                                                          |        | 2022-09-27 08:48:22.339861
   77 | https://hellomouse.net/api/geofeed                                                                                                                               |        | 2022-09-27 08:48:19.656768
    5 | https://info.net.deic.dk/deic-geofeed.csv                                                                                                                        |        | 2022-09-27 08:48:17.481572
  783 | https://ipbroker.info/geofeeds/ipbroker.csv                                                                                                                      |        | 2022-09-27 08:48:26.303243
  296 | https://ip-geolocation.fastly.com/                                                                                                                               |        | 2022-09-27 08:48:57.027704
  280 | https://ip-geolocation.xenode.app/                                                                                                                               |        | 
 1045 | https://ipocean.ru/geofeed.csv                                                                                                                                   |        | 2022-09-27 08:48:29.825817
  129 | https://itldc.com/ipgeo.csv                                                                                                                                      |        | 2022-09-27 08:48:20.404382
  322 | https://jcs.jo/geotag.csv                                                                                                                                        |        | 2022-09-27 08:48:23.948845
  220 | https://kagl.me/rfc8805.csv                                                                                                                                      |        | 2022-09-27 08:48:22.776661
    7 | https://keanu.bahnhof.net/geofeed.csv                                                                                                                            |        | 2022-09-27 08:48:17.234445
  194 | https://kitten.network/geofeed.csv                                                                                                                               |        | 2022-09-27 08:48:20.812368
  690 | https://kreationnext.com/location/geofeed.csv                                                                                                                    |        | 2022-09-27 08:48:25.805146
   35 | https://lds.online/geofeed.csv                                                                                                                                   |        | 2022-09-27 08:48:18.375832
  757 | https://massresponse.com/geofeed.csv                                                                                                                             |        | 2022-09-27 08:48:26.131257
  248 | https://minicdn.as211233.net/geofeed.csv                                                                                                                         |        | 2022-09-27 08:48:23.154443
 1140 | https://n.ceisn.co/rfc8805.csv                                                                                                                                   |        | 2022-09-27 08:48:31.158014
  662 | https://netspeed.com.tr/geofeed.csv                                                                                                                              |    406 | 
  102 | https://noc.livecomm.net/geofeed.csv                                                                                                                             |        | 2022-09-27 08:48:19.827172
  281 | https://northlayer.com/geofeed.csv                                                                                                                               |        | 2022-09-27 08:48:23.679303
  384 | https://openfactory.net/geofeed.csv                                                                                                                              |        | 2022-09-27 08:48:24.293772
   89 | https://opengeofeed.org/feed/as142289.csv                                                                                                                        |        | 2022-09-27 08:48:19.344074
  255 | https://opengeofeed.org/feed/as203145.csv                                                                                                                        |        | 2022-09-27 08:48:23.049411
  279 | https://opengeofeed.org/feed/as203199.csv                                                                                                                        |        | 2022-09-27 08:48:23.062945
   40 | https://opengeofeed.org/feed/as208175.csv                                                                                                                        |        | 2022-09-27 08:48:18.567422
   41 | https://opengeofeed.org/feed/as208187.csv                                                                                                                        |        | 2022-09-27 08:48:18.567422
   42 | https://opengeofeed.org/feed/as212710.csv                                                                                                                        |        | 2022-09-27 08:48:18.554637
    9 | https://opengeofeed.org/feed/as49605.csv                                                                                                                         |        | 2022-09-27 08:48:17.210073
  112 | https://open.okita.network/feed.csv                                                                                                                              |    521 | 
  917 | https://packetwall.org/geofeed.csv                                                                                                                               |        | 2022-09-27 08:48:28.620797
   95 | https://peering.pudu.be/geofeed.csv                                                                                                                              |        | 2022-09-27 08:48:19.71252
   30 | https://profitserver.ru/php/geofeed.php                                                                                                                          |        | 2022-09-27 08:48:18.375832
 1144 | https://quickhost.uk/assets/geoip/geofeed.csv                                                                                                                    |        | 2022-09-27 08:48:30.927672
 1086 | https://raw.githubusercontent.com/ahdpik1/geofeeds/main/geofeed.csv                                                                                              |        | 2022-09-27 08:48:30.298753
  214 | https://raw.githubusercontent.com/Alexander-Berry-Roe/geofeed/main/geofeed.csv                                                                                   |        | 2022-09-27 08:48:21.978083
  136 | https://raw.githubusercontent.com/ChrisMacNaughton/geofeed-as207420/main/geofeed.csv                                                                             |        | 2022-09-27 08:48:20.597619
   88 | https://raw.githubusercontent.com/c-nico/AS39792/main/geofeed.csv                                                                                                |        | 2022-09-27 08:48:19.524228
  811 | https://raw.githubusercontent.com/evoxt/geofeed/main/geofeed.csv                                                                                                 |        | 2022-09-27 08:48:26.781036
 1173 | https://raw.githubusercontent.com/HeyKuxo/geofeed/main/geofeed.csv                                                                                               |        | 2022-09-27 08:48:31.358659
 1005 | https://raw.githubusercontent.com/Hoasted/geofeed/master/geofeed.csv                                                                                             |        | 2022-09-27 08:48:28.768182
  337 | https://raw.githubusercontent.com/jppol-noc/geoip/main/geoip.txt                                                                                                 |        | 2022-09-27 08:48:23.720605
  205 | https://raw.githubusercontent.com/leo10ui/ripe/main/geofeeed.csv                                                                                                 |        | 2022-09-27 08:48:21.596527
  209 | https://raw.githubusercontent.com/MrMoreira/geofeed/main/geofeed.csv                                                                                             |        | 2022-09-27 08:48:21.699892
  431 | https://raw.githubusercontent.com/navarino/geofeed/main/geofeed.csv                                                                                              |        | 2022-09-27 08:48:24.338294
  202 | https://raw.githubusercontent.com/ngarafol/geofeed/main/geofeed.csv                                                                                              |        | 2022-09-27 08:48:21.353898
  786 | https://raw.githubusercontent.com/notyourcommy/veesp-geo/main/geofeed.csv                                                                                        |        | 2022-09-27 08:48:26.749152
  216 | https://raw.githubusercontent.com/null31/geofeed/master/geofeed.csv                                                                                              |        | 2022-09-27 08:48:22.118851
  235 | https://raw.githubusercontent.com/rapdodge/AS203868-Geofeeds/main/geofeeds.csv                                                                                   |        | 2022-09-27 08:48:22.264681
  430 | https://raw.githubusercontent.com/servinga/geofeed/main/geofeed.csv                                                                                              |        | 2022-09-27 08:48:24.411605
 1033 | https://raw.githubusercontent.com/Simonadascalu/Freedomtech-Geofeed/main/Freedomtech%20solutions%20-%20ALL?token=GHSAT0AAAAAABQKY2PBBELKSXZL6TVYIS7SYP3WQZA      |        | 2022-09-27 08:48:29.60595
 1149 | https://raw.githubusercontent.com/supplierstechpay/geofeed/main/geofeed.csv                                                                                      |        | 2022-09-27 08:48:31.226972
  224 | https://raw.githubusercontent.com/tomas347/geofeed/main/geofeed.csv                                                                                              |        | 2022-09-27 08:48:22.217225
 1035 | https://raw.githubusercontent.com/visnetwork/geofeed/main/geofeed.csv                                                                                            |        | 2022-09-27 08:48:29.131521
 1013 | https://raw.githubusercontent.com/vtainc/geofeeds/main/geofeeds.csv                                                                                              |        | 2022-09-27 08:48:28.970679
   87 | https://raw.githubusercontent.com/Web1-Oy/geofeed/main/geofeed.csv                                                                                               |        | 2022-09-27 08:48:19.494054
  114 | https://red-panda.be/geofeed.csv                                                                                                                                 |        | 2022-09-27 08:48:19.908639
  213 | https://ripe-ariutk.onrender.com/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:22.366702
  645 | https://ripe.unyc.io                                                                                                                                             |        | 2022-09-27 08:48:25.476598
  758 | https://rose.dsh-mirror.de/geofeed/geofeed.csv                                                                                                                   |        | 2022-09-27 08:48:26.241661
   25 | https://s3.wifirst.net/geofeed/AS52075_Geofeed.csv                                                                                                               |        | 2022-09-27 08:48:17.805775
  712 | https://secure.wireline.com.au/geo/feed.csv                                                                                                                      |        | 2022-09-27 08:48:28.120107
   26 | https://self.bbanda.it/site.cgi?action=download_document&docnum=92120                                                                                            |        | 2022-09-27 08:48:18.34809
   96 | https://server-factory.com/geofeed.csv                                                                                                                           |        | 2022-09-27 08:48:19.6206
  804 | https://service.wienenergie.at/media/files/geoip.csv                                                                                                             |        | 2022-09-27 08:48:26.856748
  190 | https://smishcraft.com/geofeed.csv                                                                                                                               |        | 2022-09-27 08:48:20.704988
  554 | https://static.cloud.konicaminolta.eu/geofeed/204839.csv                                                                                                         |        | 2022-09-27 08:48:24.75275
  532 | https://static.cloud.konicaminolta.eu/geofeed/205287.csv                                                                                                         |        | 2022-09-27 08:48:24.659096
  201 | https://storage.pwn.blue/assets/geofeed.csv                                                                                                                      |        | 2022-09-27 08:48:21.5807
  653 | https://telecu.net/geofeed.csv                                                                                                                                   |        | 2022-09-27 08:48:26.055104
 1080 | https://teploset.org/geofeed.csv                                                                                                                                 |        | 2022-09-27 08:48:30.760928
   45 | https://tktelecom.ru/geoloc.csv                                                                                                                                  |        | 2022-09-27 08:48:19.215348
  159 | https://v6only.host/geofeed.csv                                                                                                                                  |        | 2022-09-27 08:48:20.422941
  475 | https://webhost1.ru/upload/geoip/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:24.659096
  763 | https://www.alwyzon.com/feeds/geoip.csv                                                                                                                          |        | 2022-09-27 08:48:26.261211
  257 | https://www.bnb.host/geo.csv                                                                                                                                     |        | 2022-09-27 08:48:23.028548
 1240 | https://www.daryllswer.com/geofeed/                                                                                                                              |        | 2022-09-27 08:48:31.36456
  769 | https://www.garrison.com/geolocation/KL-DC1.csv                                                                                                                  |    403 | 
  639 | https://www.garrison.com/geolocation/SG-DC1.csv                                                                                                                  |    403 | 
  634 | https://www.garrison.com/geolocation/UK-DC1.csv                                                                                                                  |    403 | 
  768 | https://www.garrison.com/geolocation/UK-DC2.csv                                                                                                                  |    403 | 
  637 | https://www.garrison.com/geolocation/US-DC1.csv                                                                                                                  |    403 | 
  638 | https://www.garrison.com/geolocation/US-DC2.csv                                                                                                                  |    403 | 
   47 | https://www.it-df.net/geofeed.csv                                                                                                                                |        | 2022-09-27 08:48:18.644533
  310 | https://www.iunxi.com/nl/ext/csv/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:23.701211
  131 | https://www.ncryptd.net/geo/feed/geofeed.csv                                                                                                                     |        | 2022-09-27 08:48:20.243992
  948 | https://www.onvi.nl/geofeed.txt                                                                                                                                  |        | 2022-09-27 08:48:28.245746
  861 | https://www.paltel.ps/ip_geopaltel/                                                                                                                              |        | 
  100 | https://www.rkshosting.com/geofeed.csv                                                                                                                           |        | 
   28 | https://www.siportal.it/csv/geofeed.csv                                                                                                                          |        | 2022-09-27 08:48:17.86757
   13 | https://www.teledata.de/as21263_geofeed.csv                                                                                                                      |        | 2022-09-27 08:48:17.633081
  882 | https://www.transmost.ru/files/geofeed.csv                                                                                                                       |        | 2022-09-27 08:48:27.635654
   34 | https://xiaoyu.net/BGP/geofeed.csv                                                                                                                               |        | 
   19 | https://zappiehost.com/geofeeds.csv                                                                                                                              |        | 2022-09-27 08:48:20.556104
(215 rows)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 01/10] importer: Store geofeed URLs from RIR data
  2022-10-28 20:29 ` [PATCH 01/10] importer: Store geofeed URLs from RIR data Peter Müller
@ 2022-10-29 11:43   ` Michael Tremer
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Tremer @ 2022-10-29 11:43 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 4411 bytes --]

Hello Peter,

> On 28 Oct 2022, at 21:29, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> Hello Michael,
> 
> above all, thank you very much for the patchset and all the work behind it.
> 
> Unfortunately, as briefly discussed via the phone already, I have some general
> concerns regarding geofeeds:
> 
> (a) In contrast to RIRs, I do not see geofeed providers as trustworthy source.
> While the former are not trustworthy in terms of the data they provide (since
> no vetting or QA of database changes is usually conducted, and it does not look
> to me like this is going to change soon), at least their infrastructure is:
> It seems reasonable to me to trust, for example, RIPE's FTP server to serve
> the same database files regardless of the client requesting it. For some of
> them, we could even verify that through file signature validation, assuming that
> it is too costly to do live GPG-signing at scale.
> 
> Geofeed URLs, in contrast, can lead to anywhere, and I would not be surprised
> at all to see dubious ISPs serving different geofeeds to different clients.
> Given that our IP address ranges are public and static, and libloc reveals itself
> through the User-Agent HTTP header, it would be quite easy to serve us a geofeed
> that tampers with data, while playing innocent to other clients.
> 
> In addition, many of the 215 geofeed URLs that are currently live (attached) point
> to services such as Google Docs or GitHub - both don't strike me as reliable sources
> in terms of persistence. Generally, we have the full problem of URL/domain rot again. :-(
> 
> One could argue that these points (to a certain extend) hold true for RIRs as
> well. However, if we cannot trust them, it's curtains for libloc either way. :-)
> Some random ISPs trying to make us consuming geolocation data from random URLs,
> on the other hand, poses a greater risk than benefit to the quality of the
> location database.

I see your point, but I disagree.

The RIR databases are self-assessment, too. People can put whatever they want in there and it is not being checked by anyone.

The only thing that you might have in favour of your argument is that there is a better paper trail of any changes than the geo feeds. Those can be changed - even randomly generated. But I believe that we have in both cases no chance to verify any data.

Malicious players will fake their location even in the RIR databases.

What I would suggest as a minimum is to select at least a couple of “trusted” or very large sources that we maintain manually. There are a couple of cloud providers which use Geofeeds and we would quite likely improve the quality of the data for them.

> Which brings me directly to the next point...
> 
> (b) Presumed we still agree on not being more precise than /24 or /48, all
> the information geofeeds provide could (should?) have been in the RIR databases
> as well.
> 
> The only exception is ARIN, but since we do not get their raw database, we won't
> be able to consume any geofeed URLs in it. So, for the area where we lack accuracy
> of geolocation information most, geofeed won't help us. And for all the other RIRs
> (LACNIC included, for which we process an additional geolocation database feed
> already), the geofeeds ideally should not contain any new information to us.

Why should we not process anything smaller than those prefixes? It wouldn’t hurt us at all.

> Earlier today, I created a location database text dump on location02 with and without
> the geofeed patchset applied. The diff can be retrieved from https://people.ipfire.org/~pmueller/location-database-geofeed-diff.tar.gz,
> and is rather massive, partly because CIDRs smaller than /24 resp. /48 are yet to
> be ignored by the geofeed processing routines.
> 
> I have yet to assess the diff closely, but for a superficial analysis, it appears
> like geofeed introduces a lot of changes that could have been in the respective RIR
> databases as well. The fact that they are not there does not inspire confidence.
> 
> Apologies for this rather disappointing feedback, and best regards,
> Peter Müller<20221028_live_geofeeds.txt>

Well, I don’t think this is disappointing. Technically I suspect that you are happy with the code.

We now just need to figure out where to use it and where to not use it.

Best,
-Michael

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-10-29 11:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-27 16:48 [PATCH 01/10] importer: Store geofeed URLs from RIR data Michael Tremer
2022-09-27 16:48 ` [PATCH 02/10] importer: Add command to import geofeeds into the database Michael Tremer
2022-09-27 16:48 ` [PATCH 03/10] importer: Just fetch any exception from the executor Michael Tremer
2022-09-27 16:48 ` [PATCH 04/10] importer: Sync geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 05/10] importer: Use geofeeds for country assignment Michael Tremer
2022-09-27 16:48 ` [PATCH 06/10] importer: Use a GIST index for networks from geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 07/10] importer: Add a search index match geofeed networks quicker Michael Tremer
2022-09-27 16:48 ` [PATCH 08/10] importer: Fix reading Geofeeds from remarks Michael Tremer
2022-09-27 16:48 ` [PATCH 09/10] importer: Ensure that we only use HTTPS URLs for Geofeeds Michael Tremer
2022-09-27 16:48 ` [PATCH 10/10] importer: Validate country codes from Geofeeds Michael Tremer
2022-10-28 20:29 ` [PATCH 01/10] importer: Store geofeed URLs from RIR data Peter Müller
2022-10-29 11:43   ` Michael Tremer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox