From: "Peter Müller" <peter.mueller@ipfire.org>
To: location@lists.ipfire.org
Subject: [PATCH] location-importer: Replace ARIN AS names source with one that offers human-readable names
Date: Sun, 10 Dec 2023 19:37:00 +0000
Message-ID: <beef74a6-6aff-4f40-be53-f8acbb0a1055@ipfire.org>

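This patch replaces our previous source for AS names in ARIN's service
region (https://ftp.arin.net/info/asn.txt) with another file provided by
ARIN (https://ftp.arin.net/pub/resource_registry_service/asns.csv), which
contains human-readable names of the organizations that ASNs have been
allocated to.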

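Please note that a

	TRUNCATE autnums;

is necessary on machines that previously ran the old version of
location-importer in order to make use of the new data source: AS names
are inserted with ON CONFLICT (number) DO NOTHING, so names already
stored from the old source would otherwise be kept.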

Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
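To illustrate why the TRUNCATE is needed, here is a minimal sketch of how
ON CONFLICT (number) DO NOTHING treats rows that already exist. It uses an
in-memory SQLite database purely as a stand-in for the real PostgreSQL
database. The three-column table layout, the ASN (taken from the
documentation range) and both names are assumptions made up for this
example, not the real schema or real data.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")

# Assumed, simplified table layout (hypothetical, not the real schema).
conn.execute(
    "CREATE TABLE autnums (number INTEGER PRIMARY KEY, name TEXT, source TEXT)"
)

INSERT = """
    INSERT INTO autnums (number, name, source) VALUES (?, ?, ?)
    ON CONFLICT (number) DO NOTHING
"""

# A row as the old importer might have stored it (made-up technical name).
conn.execute(INSERT, (64500, "EXAMPLE-AS", "ARIN"))

# The new importer offers a human-readable name for the same ASN,
# but the existing row wins and the INSERT is silently skipped.
conn.execute(INSERT, (64500, "Example Networks, Inc.", "ARIN"))
name_before = conn.execute(
    "SELECT name FROM autnums WHERE number = 64500"
).fetchone()[0]

# SQLite has no TRUNCATE; DELETE without WHERE empties the table instead.
conn.execute("DELETE FROM autnums")
conn.execute(INSERT, (64500, "Example Networks, Inc.", "ARIN"))
name_after = conn.execute(
    "SELECT name FROM autnums WHERE number = 64500"
).fetchone()[0]

print(name_before)
print(name_after)
```

Only after the table has been emptied does the human-readable name end up
in the database.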
 src/scripts/location-importer.in | 47 ++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 20 deletions(-)
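
For reviewers, the per-line filtering introduced by the diff can be
sketched as a standalone function. This is a simplified illustration, not
the code being committed: the function name parse_arin_asn_line is
hypothetical, the _check_parsed_asn() call is left out, and a guard for
short rows is added that the patch itself does not have. The column
positions (0, 1, 3, 4) are taken from the diff; the sample lines in the
usage part are made up. Since csv.reader already removes the surrounding
quotes, the sketch does not strip them again.

```python
import csv

# Organisation handles that stand for other RIRs, as listed in the patch.
OTHER_RIRS = ("AFRINIC", "APNIC", "LACNIC", "RIPE")

def parse_arin_asn_line(line):
    """Return (asn, orgname) for a usable line, or None if it is skipped."""
    # Data lines are expected to start with a double quote.
    if not line.startswith('"'):
        return None

    # csv.reader copes with quoting, including commas inside names.
    row = next(csv.reader([line]), None)
    if row is None or len(row) < 5:
        return None

    orgname, orghandle, firstasn, lastasn = row[0], row[1], row[3], row[4]

    try:
        firstasn, lastasn = int(firstasn), int(lastasn)
    except ValueError:
        return None

    # Anything other than a single ASN is a range (bulk assignment).
    if firstasn != lastasn:
        return None

    # Placeholder entries for other RIRs are covered by better sources.
    if orghandle in OTHER_RIRS:
        return None

    return firstasn, orgname

# Made-up sample lines; the third column is a placeholder.
print(parse_arin_asn_line('"Example Networks, Inc.","EXNET","x","64500","64500"'))
print(parse_arin_asn_line('"Some Registry","RIPE","x","64496","64511"'))
```

Returning None for every skipped line keeps the caller to a single check
before running the INSERT statement.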

diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
index 28a4f6c..96b3a20 100644
--- a/src/scripts/location-importer.in
+++ b/src/scripts/location-importer.in
@@ -3,7 +3,7 @@
 #                                                                             #
 # libloc - A library to determine the location of someone on the Internet     #
 #                                                                             #
-# Copyright (C) 2020-2022 IPFire Development Team <info(a)ipfire.org>           #
+# Copyright (C) 2020-2023 IPFire Development Team <info(a)ipfire.org>           #
 #                                                                             #
 # This library is free software; you can redistribute it and/or               #
 # modify it under the terms of the GNU Lesser General Public                  #
@@ -19,6 +19,7 @@
 
 import argparse
 import concurrent.futures
+import csv
 import http.client
 import ipaddress
 import json
@@ -1033,36 +1034,42 @@ class CLI(object):
 	def _import_as_names_from_arin(self):
 		downloader = location.importer.Downloader()
 
-		# XXX: Download AS names file from ARIN (note that these names appear to be quite
-		# technical, not intended for human consumption, as description fields in
-		# organisation handles for other RIRs are - however, this is what we have got,
-		# and in some cases, it might be still better than nothing)
-		for line in downloader.request_lines("https://ftp.arin.net/info/asn.txt"):
-			# Valid lines start with a space, followed by the number of the Autonomous System ...
-			if not line.startswith(" "):
+		# Download AS names file from ARIN and load it into CSV parser
+		for line in downloader.request_lines("https://ftp.arin.net/pub/resource_registry_service/asns.csv"):
+
+			# Valid lines start with a " ...
+			if not line.startswith("\""):
 				continue
 
 			# Split line and check if there is a valid ASN in it...
-			asn, name = line.split()[0:2]
+			for row in csv.reader([line]):
+				orgname = row[0]
+				orghandle = row[1]
+				firstasn = row[3]
+				lastasn = row[4]
 
 			try:
-				asn = int(asn)
+				firstasn = int(firstasn.strip("\""))
+				lastasn = int(lastasn.strip("\""))
 			except ValueError:
-				log.debug("Skipping ARIN AS names line not containing an integer for ASN")
+				log.debug("Skipping ARIN AS names line not containing valid integers for ASN")
 				continue
 
 			# Filter invalid ASNs...
-			if not self._check_parsed_asn(asn):
+			if not self._check_parsed_asn(firstasn):
 				continue
 
-			# Skip any AS name that appears to be a placeholder for a different RIR or entity...
-			if re.match(r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|LACNIC|RIPE|IANA)(?:\d?$|\-)", name):
+			if firstasn > lastasn:
+				continue
+
+			# Filter any bulk AS assignments, since these are present for other RIRs where
+			# we get better data from elsewhere.
+			if not firstasn == lastasn:
 				continue
 
-			# Bail out in case the AS name contains anything we do not expect here...
-			if re.search(r"[^a-zA-Z0-9-_]", name):
-				log.debug("Skipping ARIN AS name for %s containing invalid characters: %s" % \
-						(asn, name))
+			# Skip any AS name that appears to be a placeholder for a different RIR or entity...
+			if re.match(r"^(AFRINIC|APNIC|LACNIC|RIPE)$", orghandle.strip("\"")):
+				continue
 
 			# Things look good here, run INSERT statement and skip this one if we already have
 			# a (better?) name for this Autonomous System...
@@ -1073,8 +1080,8 @@ class CLI(object):
 					source
 				) VALUES (%s, %s, %s)
 				ON CONFLICT (number) DO NOTHING""",
-				asn,
-				name,
+				firstasn,
+				orgname.strip("\""),
 				"ARIN",
 			)
 
-- 
2.35.3
