Hello Michael,

better late than never: Thanks for your reply and the comments.

> Hello Peter,
> 
>> On 18 Aug 2022, at 22:42, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>
>> Previously, the lack of distinction between different DROP lists caused
>> only the last one to be persisted.
>>
>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>> ---
>> src/scripts/location-importer.in | 58 +++++++++++++++++++++-----------
>> 1 file changed, 39 insertions(+), 19 deletions(-)
>>
>> diff --git a/src/scripts/location-importer.in b/src/scripts/location-importer.in
>> index 8d47497..e4a9ab0 100644
>> --- a/src/scripts/location-importer.in
>> +++ b/src/scripts/location-importer.in
>> @@ -1427,18 +1427,21 @@ class CLI(object):
>> 	def _update_overrides_for_spamhaus_drop(self):
>> 		downloader = location.importer.Downloader()
>>
>> -		ip_urls = [
>> -					"https://www.spamhaus.org/drop/drop.txt",
>> -					"https://www.spamhaus.org/drop/edrop.txt",
>> -					"https://www.spamhaus.org/drop/dropv6.txt"
>> +		ip_lists = [
>> +					("Spamhaus DROP list", "https://www.spamhaus.org/drop/drop.txt"),
>> +					("Spamhaus EDROP list", "https://www.spamhaus.org/drop/edrop.txt"),
>> +					("Spamhaus DROPv6 list", "https://www.spamhaus.org/drop/dropv6.txt")
>> 				]
> 
> Would it not be better to have this as a dict?
> 
> Or if you prefer something to iterate over, a tuple?
> 
> Would it not be better/easier if we would use a handle like “SPAMHAUS-DROP”, “SPAMHAUS-EDROP”, … instead of a string with spaces? It does not matter that much to me, but since this is a machine-readable string, should it not be less human formatted?

Hm, good point. I have changed that in the upcoming second version of the patch.

> 
>>
>> -		asn_urls = [
>> -					"https://www.spamhaus.org/drop/asndrop.txt"
>> +		asn_lists = [
>> +					("Spamhaus ASN-DROP list", "https://www.spamhaus.org/drop/asndrop.txt")
>> 				]
>>
>> -		for url in ip_urls:
>> -			# Fetch IP list
>> +		for list in ip_lists:
>> +			name = list[0]
>> +			url = list[1]
> 
> It would be more Pythonic if you wrote it like this:
> 
>   for name, url in ip_lists:
>      ...
> 
>> +
>> +			# Fetch IP list from given URL
>> 			f = downloader.retrieve(url)
>>
>> 			# Split into lines
>> @@ -1448,11 +1451,11 @@ class CLI(object):
>> 			# downloads.
>> 			if len(fcontent) > 10:
>> 				self.db.execute("""
>> -					DELETE FROM autnum_overrides WHERE source = 'Spamhaus ASN-DROP list';
>> -					DELETE FROM network_overrides WHERE source = 'Spamhaus DROP lists';
>> -				""")
>> +					DELETE FROM network_overrides WHERE source = '%s';
> 
> No need for the ‘’ around the %s.

Yes there is, the SQL command results in an exception otherwise.

Second version of the patch is coming right up.

Thanks, and best regards,
Peter Müller

> 
>> +				""" % name,
>> +				)
>> 			else:
>> -				log.error("Spamhaus DROP URL %s returned likely bogus file, ignored" % url)
>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>> 				continue
>>
>> 			# Iterate through every line, filter comments and add remaining networks to
>> @@ -1475,8 +1478,8 @@ class CLI(object):
>>
>> 					# Sanitize parsed networks...
>> 					if not self._check_parsed_network(network):
>> -						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>> -							(url, network))
>> +						log.warning("Skipping bogus network found in %s (%s): %s" % \
>> +							(name, url, network))
>> 						continue
>>
>> 					# Conduct SQL statement...
>> @@ -1488,14 +1491,31 @@ class CLI(object):
>> 						) VALUES (%s, %s, %s)
>> 						ON CONFLICT (network) DO UPDATE SET is_drop = True""",
>> 						"%s" % network,
>> -						"Spamhaus DROP lists",
>> +						name,
>> 						True
>> 					)
>>
>> -		for url in asn_urls:
>> +		for list in asn_lists:
>> +			name = list[0]
>> +			url = list[1]
>> +
> 
> See above.
> 
>> 			# Fetch URL
>> 			f = downloader.retrieve(url)
>>
>> +			# Split into lines
>> +			fcontent = f.readlines()
>> +
>> +			# Conduct a very basic sanity check to rule out CDN issues causing bogus DROP
>> +			# downloads.
>> +			if len(fcontent) > 10:
>> +				self.db.execute("""
>> +					DELETE FROM autnum_overrides WHERE source = '%s';
>> +				""" % name,
>> +				)
>> +			else:
>> +				log.error("%s (%s) returned likely bogus file, ignored" % (name, url))
>> +				continue
> 
> Is there a reason why this DELETE statement is not part of the transaction? If something goes wrong later on, you would have lost the previous data.
> 
> Would it also not be programmatically better to start the import and then count how many lines you are importing?
> 
> The “f.readlines()” call is something that I would consider a bit ugly.
> 
>> +
>> 			# Iterate through every line, filter comments and add remaining ASNs to
>> 			# the override table in case they are valid...
>> 			with self.db.transaction():
>> @@ -1518,8 +1538,8 @@ class CLI(object):
>>
>> 					# Filter invalid ASNs...
>> 					if not self._check_parsed_asn(asn):
>> -						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>> -							(url, asn))
>> +						log.warning("Skipping bogus ASN found in %s (%s): %s" % \
>> +							(name, url, asn))
>> 						continue
>>
>> 					# Conduct SQL statement...
>> @@ -1531,7 +1551,7 @@ class CLI(object):
>> 						) VALUES (%s, %s, %s)
>> 						ON CONFLICT (number) DO UPDATE SET is_drop = True""",
>> 						"%s" % asn,
>> -						"Spamhaus ASN-DROP list",
>> +						name,
>> 						True
>> 					)
>>
>> -- 
>> 2.35.3
>