public inbox for location@lists.ipfire.org
 help / color / mirror / Atom feed
From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH] Process LACNIC geofeed as well
Date: Mon, 13 Dec 2021 18:54:07 +0000	[thread overview]
Message-ID: <B8605BFC-E386-410A-830C-1C830028A3BC@ipfire.org> (raw)
In-Reply-To: <bdc5ec85-963b-25b4-3c0a-a420a263a883@ipfire.org>

[-- Attachment #1: Type: text/plain, Size: 4625 bytes --]

Hello Peter,

I merged this patch, but I wanted to give you some guidance on how I would have written the Python code.

I am not saying my way is the better way. I just find it a lot easier to read.

> On 11 Dec 2021, at 21:59, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> This improves country code accurarcy for suballocations within IP space
> managed by LACNIC, as the delegated-extended-latest file only provides
> country code information at the top level of an allocated network.
> 
> Sadly, lacnic.db.gz does not contain descriptions or names of Autonomous
> Systems within the space maintained by LACNIC.
> 
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/importer.py          |  4 ++-
> src/python/location-importer.in | 43 +++++++++++++++++++++++++++------
> 2 files changed, 39 insertions(+), 8 deletions(-)
> 
> diff --git a/src/python/importer.py b/src/python/importer.py
> index de340b4..dee36ed 100644
> --- a/src/python/importer.py
> +++ b/src/python/importer.py
> @@ -53,7 +53,9 @@ WHOIS_SOURCES = {
> 		],
> 
> 	# Latin America and Caribbean Network Information Centre
> -	# XXX ???
> +	"LACNIC": [
> +		"https://ftp.lacnic.net/lacnic/dbase/lacnic.db.gz"
> +		],
> 
> 	# Réseaux IP Européens
> 	"RIPE": [
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index b791b4d..78bb846 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -681,13 +681,42 @@ class CLI(object):
> 				# Strip any excess space
> 				start_address, end_address = start_address.rstrip(), end_address.strip()
> 
> -				# Convert to IP address
> -				try:
> -					start_address = ipaddress.ip_address(start_address)
> -					end_address   = ipaddress.ip_address(end_address)
> -				except ValueError:
> -					log.warning("Could not parse line: %s" % line)
> -					return
> +				# Handle "inetnum" formatting in LACNIC DB (e.g. "24.152.8/22" instead of "24.152.8.0/22")

I would have checked in the beginning whether the string contains “/“ and then branched off to somewhere that handles this.

That would have been more intuitive to read than the following if statement:

> +				if start_address and not (delim or end_address):
> +					try:
> +						start_address = ipaddress.ip_network(start_address, strict=False)
> +					except ValueError:
> +						start_address = start_address.split("/")
> +						ldigits = len(start_address[0].split("."))

You can use start_address[0].count(“.”) here which is shorter and easier to read. Hopefully it is implemented faster, too.

> +						# How many octets do we need to add?
> +						# (LACNIC does not seem to have a /8 or greater assigned, so the following should suffice.)
> +						if ldigits == 2:
> +							start_address = start_address[0] + ".0.0/" + start_address[1]
> +						elif ldigits == 3:
> +							start_address = start_address[0] + ".0/" + start_address[1]
> +						else:
> +							log.warning("Could not recover IPv4 address from line in LACNIC DB format: %s" % line)
> +							return

I would consider these inputs (e.g. 24.152.8/22) valid IP addresses. They are not very popular, confusing and very easy to get wrong, but inet_ntop should parse this correctly.

However, since we are in this mess already, maybe a regular expression or substitution would have been more simple to write.

In the end, this solution seems to work. The reason why I am going on about this is that we run through this millions of times. Even adding an extra if statement is becoming a performance penalty which I would like to avoid as much as possible. The other factor is that code should be easy to audit and that is why I would like to pass on my patterns because I believe they are good :)

-Michael

> +
> +						try:
> +							start_address = ipaddress.ip_network(start_address, strict=False)
> +						except ValueError:
> +							log.warning("Could not parse line in LACNIC DB format: %s" % line)
> +							return
> +
> +					# Enumerate first and last IP address of this network
> +					end_address = start_address[-1]
> +					start_address = start_address[0]
> +
> +				else:
> +					# Convert to IP address
> +					try:
> +						start_address = ipaddress.ip_address(start_address)
> +						end_address   = ipaddress.ip_address(end_address)
> +					except ValueError:
> +						log.warning("Could not parse line: %s" % line)
> +						return
> 
> 				inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
> 
> -- 
> 2.26.2


      reply	other threads:[~2021-12-13 18:54 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-11 21:59 Peter Müller
2021-12-13 18:54 ` Michael Tremer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B8605BFC-E386-410A-830C-1C830028A3BC@ipfire.org \
    --to=michael.tremer@ipfire.org \
    --cc=location@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox