From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH] Process LACNIC geofeed as well
Date: Mon, 13 Dec 2021 18:54:07 +0000 [thread overview]
Message-ID: <B8605BFC-E386-410A-830C-1C830028A3BC@ipfire.org> (raw)
In-Reply-To: <bdc5ec85-963b-25b4-3c0a-a420a263a883@ipfire.org>
[-- Attachment #1: Type: text/plain, Size: 4625 bytes --]
Hello Peter,
I merged this patch, but I wanted to give you some guidance on how I would have written the Python code.
I am not saying my way is the better way. I just find it a lot easier to read.
> On 11 Dec 2021, at 21:59, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> This improves country code accurarcy for suballocations within IP space
> managed by LACNIC, as the delegated-extended-latest file only provides
> country code information at the top level of an allocated network.
>
> Sadly, lacnic.db.gz does not contain descriptions or names of Autonomous
> Systems within the space maintained by LACNIC.
>
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/importer.py | 4 ++-
> src/python/location-importer.in | 43 +++++++++++++++++++++++++++------
> 2 files changed, 39 insertions(+), 8 deletions(-)
>
> diff --git a/src/python/importer.py b/src/python/importer.py
> index de340b4..dee36ed 100644
> --- a/src/python/importer.py
> +++ b/src/python/importer.py
> @@ -53,7 +53,9 @@ WHOIS_SOURCES = {
> ],
>
> # Latin America and Caribbean Network Information Centre
> - # XXX ???
> + "LACNIC": [
> + "https://ftp.lacnic.net/lacnic/dbase/lacnic.db.gz"
> + ],
>
> # Réseaux IP Européens
> "RIPE": [
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index b791b4d..78bb846 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -681,13 +681,42 @@ class CLI(object):
> # Strip any excess space
> start_address, end_address = start_address.rstrip(), end_address.strip()
>
> - # Convert to IP address
> - try:
> - start_address = ipaddress.ip_address(start_address)
> - end_address = ipaddress.ip_address(end_address)
> - except ValueError:
> - log.warning("Could not parse line: %s" % line)
> - return
> + # Handle "inetnum" formatting in LACNIC DB (e.g. "24.152.8/22" instead of "24.152.8.0/22")
I would have checked in the beginning whether the string contains “/“ and then branched off to somewhere that handles this.
That would have been more intuitive to read than the following if statement:
> + if start_address and not (delim or end_address):
> + try:
> + start_address = ipaddress.ip_network(start_address, strict=False)
> + except ValueError:
> + start_address = start_address.split("/")
> + ldigits = len(start_address[0].split("."))
You can use start_address[0].count(“.”) here which is shorter and easier to read. Hopefully it is implemented faster, too.
> + # How many octets do we need to add?
> + # (LACNIC does not seem to have a /8 or greater assigned, so the following should suffice.)
> + if ldigits == 2:
> + start_address = start_address[0] + ".0.0/" + start_address[1]
> + elif ldigits == 3:
> + start_address = start_address[0] + ".0/" + start_address[1]
> + else:
> + log.warning("Could not recover IPv4 address from line in LACNIC DB format: %s" % line)
> + return
I would consider these inputs (e.g. 24.152.8/22) valid IP addresses. They are not very popular, confusing and very easy to get wrong, but inet_ntop should parse this correctly.
However, since we are in this mess already, maybe a regular expression or substitution would have been more simple to write.
In the end, this solution seems to work. The reason why I am going on about this is that we run through this millions of times. Even adding an extra if statement is becoming a performance penalty which I would like to avoid as much as possible. The other factor is that code should be easy to audit and that is why I would like to pass on my patterns because I believe they are good :)
-Michael
> +
> + try:
> + start_address = ipaddress.ip_network(start_address, strict=False)
> + except ValueError:
> + log.warning("Could not parse line in LACNIC DB format: %s" % line)
> + return
> +
> + # Enumerate first and last IP address of this network
> + end_address = start_address[-1]
> + start_address = start_address[0]
> +
> + else:
> + # Convert to IP address
> + try:
> + start_address = ipaddress.ip_address(start_address)
> + end_address = ipaddress.ip_address(end_address)
> + except ValueError:
> + log.warning("Could not parse line: %s" % line)
> + return
>
> inetnum["inetnum"] = list(ipaddress.summarize_address_range(start_address, end_address))
>
> --
> 2.26.2
prev parent reply other threads:[~2021-12-13 18:54 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-11 21:59 Peter Müller
2021-12-13 18:54 ` Michael Tremer [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=B8605BFC-E386-410A-830C-1C830028A3BC@ipfire.org \
--to=michael.tremer@ipfire.org \
--cc=location@lists.ipfire.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox