public inbox for location@lists.ipfire.org
From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH v2] location-importer.in: Import (technical) AS names from ARIN
Date: Thu, 10 Jun 2021 09:52:18 +0100
Message-ID: <5B56B4C5-0AB7-4CAA-9827-9EF0DD984467@ipfire.org>
In-Reply-To: <20210608170307.623-1-peter.mueller@ipfire.org>

Hello,

> On 8 Jun 2021, at 18:03, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> ARIN and LACNIC, unfortunately, do not seem to publish data containing
> human readable AS names. For the former, we at least have a list of
> technical names, which this patch fetches and inserts into the autnums
> table.
> 
> While some of them do not seem to be suitable for human consumption
> (i.e. being very cryptic), providing these data might be helpful
> nevertheless.
> 
> The second version of this patch contains some additional remarks on
> efficient Python coding style from Michael, doing things more "pythonic".
> 
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/location-importer.in | 55 +++++++++++++++++++++++++++++++++
> 1 file changed, 55 insertions(+)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index aa3b8f7..6ccee3b 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -505,6 +505,9 @@ class CLI(object):
> 						for line in f:
> 							self._parse_line(line, source_key, validcountries)
> 
> +		# Download and import (technical) AS names from ARIN
> +		self._import_as_names_from_arin()
> +
> 	def _check_parsed_network(self, network):
> 		"""
> 			Assistive function to detect and subsequently sort out parsed
> @@ -775,6 +778,58 @@ class CLI(object):
> 			"%s" % network, country, [country], source_key,
> 		)
> 
> +	def _import_as_names_from_arin(self):
> +		downloader = location.importer.Downloader()
> +
> +		# XXX: Download AS names file from ARIN (note that these names appear to be quite
> +		# technical, not intended for human consumption, as description fields in
> +		# organisation handles for other RIRs are - however, this is what we have got,
> +		# and in some cases, it might be still better than nothing)
> +		with downloader.request("https://ftp.arin.net/info/asn.txt", return_blocks=False) as f:
> +			for line in f:
> +				# Convert binary line to string...
> +				line = str(line)
> +
> +				# ... valid lines start with a space, followed by the number of the Autonomous System ...
> +				if not line.startswith(" "):
> +					continue
> +
> +				# Split line and check if there is a valid ASN in it...
> +				asn, name = line.split()[0:2]
> +
> +				try:
> +					asn = int(asn)
> +				except ValueError:
> +					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
> +					continue
> +
> +				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
> +					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
> +					continue
> +
> +				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
> +				if re.match(r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(\d?$|\-.*)", name):
> +					continue

This is still not entirely optimal. It doesn’t matter too much, so I will merge it, but…

* You added a capturing group which you do not need, so you could have written (?:…) instead of (…).

* \-.* matches a literal dash and then anything after it. You do not care about what comes after, so you could have just had \- and that is it. It would have saved a couple of CPU cycles because the regex engine does not have to read the entire rest of the string.
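
To illustrate the two points above, here is a minimal sketch that compares the pattern from the patch with a simplified variant. The names ORIGINAL, SIMPLIFIED, is_placeholder and SAMPLES are made up for this example and are not part of location-importer; the sample strings are invented as well. Since re.match() already anchors at the start of the string, the leading ^ is dropped too.

```python
import re

# Pattern as submitted in the patch: three capturing groups and a trailing ".*".
ORIGINAL = re.compile(
    r"^(ASN-BLK|)(AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(\d?$|\-.*)"
)

# Hypothetical simplification: non-capturing groups, and the match stops at
# the dash instead of consuming the rest of the string.
SIMPLIFIED = re.compile(
    r"(?:ASN-BLK|)(?:AFCONC|AFRINIC|APNIC|ASNBLK|DNIC|LACNIC|RIPE|IANA)(?:\d?$|-)"
)

def is_placeholder(name, pattern=SIMPLIFIED):
    """Return True if the AS name looks like a placeholder for another RIR."""
    return pattern.match(name) is not None

# Invented sample names, only meant to exercise both patterns.
SAMPLES = ["RIPE-ASNBLOCK-11", "APNIC", "IANA2", "ASN-BLKLACNIC-1", "LVLT-1", "RIPENCC"]

for sample in SAMPLES:
    # Both patterns are expected to agree on every sample.
    print(sample, bool(ORIGINAL.match(sample)), is_placeholder(sample))
```

Both patterns should accept and reject the same names here, because nothing follows the last group in the original expression.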

> +
> +				# Bail out in case the AS name contains anything we do not expect here...
> +				if re.search(r"[^a-zA-Z0-9-_]", name):
> +					log.debug("Skipping ARIN AS name for %s containing invalid characters: %s" % \
> +							(asn, name))
> +
> +				# Things look good here, run INSERT statement and skip this one if we already have
> +				# a (better?) name for this Autonomous System...
> +				self.db.execute("""
> +					INSERT INTO autnums(
> +						number,
> +						name,
> +						source
> +					) VALUES (%s, %s, %s)
> +					ON CONFLICT (number) DO NOTHING""",
> +					asn,
> +					name,
> +					"ARIN",
> +				)
> +
> 	def handle_update_announcements(self, ns):
> 		server = ns.server[0]
> 
> -- 
> 2.20.1
> 
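
The per-line checks in the hunk above can also be sketched as a standalone function, which makes them easy to try without a database. This is only a sketch under a few assumptions: parse_asn_line is a hypothetical helper that does not exist in location-importer, lines are assumed to arrive as bytes (as the "Convert binary line to string" comment suggests), and names with unexpected characters are assumed to be skipped, as the "Bail out" comment suggests, although the quoted hunk only logs them. Note that str() on a bytes object returns its repr (for example "b' 1 X'"), so the sketch decodes the line instead.

```python
import re

def parse_asn_line(raw):
    """
    Hypothetical helper: return (asn, name) for a usable line, or None.
    """
    # Decode instead of calling str(), which would yield "b'...'" for bytes
    line = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else raw

    # Valid lines start with a space, followed by the AS number
    if not line.startswith(" "):
        return None

    # Guard against lines with fewer than two fields before unpacking
    fields = line.split()
    if len(fields) < 2:
        return None
    asn, name = fields[:2]

    try:
        asn = int(asn)
    except ValueError:
        return None

    # Same ranges as in the patch, written as chained comparisons
    if not (1 <= asn <= 23455 or 23457 <= asn <= 64495 or 131072 <= asn <= 4199999999):
        return None

    # Assumption: names with unexpected characters are skipped entirely
    if re.search(r"[^a-zA-Z0-9_-]", name):
        return None

    return asn, name

# Invented input lines, not taken from ARIN's file
print(parse_asn_line(b" 1234 EXAMPLE-AS ORG-HANDLE\n"))
print(parse_asn_line(b"Number Name\n"))
```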

-Michael


