public inbox for location@lists.ipfire.org
 help / color / mirror / Atom feed
From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: Conduct sanity checks per DROP list
Date: Tue, 04 Oct 2022 09:39:39 +0100	[thread overview]
Message-ID: <DAB7F09F-6EC4-40F6-8564-E81713A5C58F@ipfire.org> (raw)
In-Reply-To: <397a7f14-d0a4-743d-5c5c-2f05ccb8259f@ipfire.org>

[-- Attachment #1: Type: text/plain, Size: 7376 bytes --]

This is the reason why we are doing code review...

> On 2 Oct 2022, at 12:04, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> Hello Michael,
> 
>> Hello,
>> 
>>> On 26 Sep 2022, at 19:26, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>> 
>>> Hello Michael,
>>> 
>>> better late than never: Thanks for your reply and the comments.
>>> 
>>>> Hello Peter,
>>>> 
>>>>> On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>> 
>>>>> Previously, the lack of distinction between different DROP lists caused
>>>>> only the last one to be persisted.
>>>>> 
>>>>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>>>>> ---
>>>>> src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
>>>>> 1 file changed, 39 insertions(+), 19 deletions(-)
>>>>> 
>>>>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>>>>> index 8d47497..e4a9ab0 100644
>>>>> --- a/src/scripts/location-importer.in
>>>>> +++ b/src/scripts/location-importer.in
>>>>> @@ -1427,18 +1427,21 @@ class CLI(object):
>>>>> 	def _update_overrides_for_spamhaus_drop(self):
>>>>> 		downloader = location.importer.Downloader()
>>>>> 
>>>>> -		ip_urls = [
>>>>> -					"https://www.spamhaus.org/drop/drop.txt",
>>>>> -					"https://www.spamhaus.org/drop/edrop.txt",
>>>>> -					"https://www.spamhaus.org/drop/dropv6.txt"
>>>>> +		ip_lists = [
>>>>> +					("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
>>>>> +					("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
>>>>> +					("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
>>>>> 				]
>>>> 
>>>> Would it not be better to have this as a dict?
>>>> 
>>>> Or if you prefer something to iterate over, a tuple?
>>>> 
>>>> Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted?
>>> 
>>> Hm, good point. I have changed that in the upcoming second version of the patch.
>>> 
>>>> 
>>>>> 
>>>>> -		asn_urls = [
>>>>> -					"https://www.spamhaus.org/drop/asndrop.txt"
>>>>> +		asn_lists = [
>>>>> +					("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
>>>>> 				]
>>>>> 
>>>>> -		for url in ip_urls:
>>>>> -			# Fetch IP list
>>>>> +		for list in ip_lists:
>>>>> +			name = list[0]
>>>>> +			url = list[1]
>>>> 
>>>> It would be more Pythonic if you wrote it like this:
>>>> 
>>>> for name, url in ip_lists:
>>>> ...
>>>> 
>>>>> +
>>>>> +			# Fetch IP list from given URL
>>>>> 			f = downloader.retrieve(url)
>>>>> 
>>>>> 			# Split into lines
>>>>> @@ -1448,11 +1451,11 @@ class CLI(object):
>>>>> 			# downloads.
>>>>> 			if len(fcontent) > 10:
>>>>> 				self.db.execute("""
>>>>> -					DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
>>>>> -					DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
>>>>> -				""")
>>>>> +					DELETE FROM network_overrides WHERE source = '%s';
>>>> 
>>>> No need for the ‘’ around the %s.
>>> 
>>> Yes there is, the SQL command results in an exception otherwise.
>> 
>> Err, you are allowing some SQL command injection here.
>> 
>> You do not want to use “%s” as a string format parameter for Python strings. It is more of a placeholder for the value that has to be escaped first - SQLite is using ? which is probably a better choice for Python.
>> 
>> The database binding is doing this automatically, but the arguments need to be passed after the query.
>> 
>> Under no circumstances whatsoever you want to chance the query dynamically. Never ever.
>> 
>> I fixed this here: https://git.ipfire.org/?p=location/libloc.git;a=commitdiff;h=b56bf4bf065130b38968b580e0bea6db809783d8
> 
> better late than never: Many thanks for spotting and fixing this before it went into
> anything stable or productive! Guess I'm not a good programmer after all...
> 
> All the best,
> Peter Müller
> 
>> 
>> Best,
>> -Michael
>> 
>>> 
>>> Second version of the patch is coming right up.
>>> 
>>> Thanks, and best regards,
>>> Peter Müller
>>> 
>>>> 
>>>>> +				""" % name,
>>>>> +				)
>>>>> 			else:
>>>>> -				log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
>>>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>>> 				continue
>>>>> 
>>>>> 			# Iterate through every line, filter comments and add remaining networks to
>>>>> @@ -1475,8 +1478,8 @@ class CLI(object):
>>>>> 
>>>>> 					# Sanitize parsed networks...
>>>>> 					if not self._check_parsed_network(network):
>>>>> -						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>>>> -							(url, network))
>>>>> +						log.warning("Skipping bogus network found in %s (%s): %s" % \
>>>>> +							(name, url, network))
>>>>> 						continue
>>>>> 
>>>>> 					# Conduct SQL statement...
>>>>> @@ -1488,14 +1491,31 @@ class CLI(object):
>>>>> 						) VALUES (%s, %s, %s)
>>>>> 						ON CONFLICT (network) DO UPDATE SET is_drop = True""",
>>>>> 						"%s" % network,
>>>>> -						"Spamhaus DROP lists",
>>>>> +						name,
>>>>> 						True
>>>>> 					)
>>>>> 
>>>>> -		for url in asn_urls:
>>>>> +		for list in asn_lists:
>>>>> +			name = list[0]
>>>>> +			url = list[1]
>>>>> +
>>>> 
>>>> See above.
>>>> 
>>>>> 			# Fetch URL
>>>>> 			f = downloader.retrieve(url)
>>>>> 
>>>>> +			# Split into lines
>>>>> +			fcontent = f.readlines()
>>>>> +
>>>>> +			# Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
>>>>> +			# downloads.
>>>>> +			if len(fcontent) > 10:
>>>>> +				self.db.execute("""
>>>>> +					DELETE FROM autnum_overrides WHERE source = '%s';
>>>>> +				""" % name,
>>>>> +				)
>>>>> +			else:
>>>>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>>>>> +				continue
>>>> 
>>>> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data.
>>>> 
>>>> Would it also not be programmatically better to start the import and then count how many lines you are importing?
>>>> 
>>>> The “f.readlines()” call is something that I would consider a bit ugly.
>>>> 
>>>>> +
>>>>> 			# Iterate through every line, filter comments and add remaining ASNs to
>>>>> 			# the override table in case they are valid...
>>>>> 			with self.db.transaction():
>>>>> @@ -1518,8 +1538,8 @@ class CLI(object):
>>>>> 
>>>>> 					# Filter invalid ASNs...
>>>>> 					if not self._check_parsed_asn(asn):
>>>>> -						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>>>> -							(url, asn))
>>>>> +						log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>>>>> +							(name, url, asn))
>>>>> 						continue
>>>>> 
>>>>> 					# Conduct SQL statement...
>>>>> @@ -1531,7 +1551,7 @@ class CLI(object):
>>>>> 						) VALUES (%s, %s, %s)
>>>>> 						ON CONFLICT (number) DO UPDATE SET is_drop = True""",
>>>>> 						"%s" % asn,
>>>>> -						"Spamhaus ASN-DROP list",
>>>>> +						name,
>>>>> 						True
>>>>> 					)
>>>>> 
>>>>> -- 
>>>>> 2.35.3


      reply	other threads:[~2022-10-04  8:39 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-18 21:42 Peter Müller
2022-08-19  8:21 ` Matthias Fischer
2022-08-19  8:25   ` Michael Tremer
2022-08-19  8:31 ` Michael Tremer
2022-09-26 18:26   ` Peter Müller
2022-09-26 18:26     ` [PATCH v2] " Peter Müller
2022-09-27  9:17       ` Michael Tremer
2022-09-27  9:22     ` [PATCH] " Michael Tremer
2022-10-02 11:04       ` Peter Müller
2022-10-04  8:39         ` Michael Tremer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=DAB7F09F-6EC4-40F6-8564-E81713A5C58F@ipfire.org \
    --to=michael.tremer@ipfire.org \
    --cc=location@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox