public inbox for location@lists.ipfire.org
* [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs
@ 2021-10-10 16:16 Peter Müller
  2021-10-10 16:16 ` [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists Peter Müller
  2021-10-12 11:23 ` [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Michael Tremer
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Müller @ 2021-10-10 16:16 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
---
 src/python/location-importer.in | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index da058d3..c2b3e41 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -574,6 +574,22 @@ class CLI(object):
 		# be suitable for libloc consumption...
 		return True
 
+	def _check_parsed_asn(self, asn):
+		"""
+			Assistive function to filter Autonomous System Numbers not being suitable
+			for adding to our database. Returns False in such cases, and True otherwise.
+		"""
+
+		if not asn or not isinstance(asn, int):
+			return False
+
+		if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
+			log.debug("Skipping invalid ASN: %s" % asn)
+			return False
+
+		# ASN is fine if we made it here...
+		return True
+
 	def _parse_block(self, block, source_key, validcountries = None):
 		# Get first line to find out what type of block this is
 		line = block[0]
@@ -829,8 +845,8 @@ class CLI(object):
 					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
 					continue
 
-				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
-					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
+				# Filter invalid ASNs...
+				if not self._check_parsed_asn(asn):
 					continue
 
 				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
-- 
2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
  2021-10-10 16:16 [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Peter Müller
@ 2021-10-10 16:16 ` Peter Müller
  2021-10-12 11:26   ` Michael Tremer
  2021-10-12 11:23 ` [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Michael Tremer
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Müller @ 2021-10-10 16:16 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 5715 bytes --]

A while ago, it was discussed whether or not libloc should become an
"opinionated database", i.e. include any information on a network's
reputation.

In general, this idea was dismissed as libloc is neither intended nor
suitable for such tasks, and we do not want to make (political?)
decisions like these for various reasons. All we do is provide a
useful location database in a neutral way, and leave it up to our users
how to react to certain results.

However, there is a problematic area. Take AS55303 as an example: We
_know_ this to be a dirty network, tampering with RIR data and
hijacking IP space, and strongly recommend against processing any
connection originating from or directed to it.

Since it appears to be loaded with proxies used by miscreants for
abusive purposes, all we can do at the time of writing is to flag it
as "anonymous proxy", but we lack the possibility of telling our users
something like "this is not a safe area". The very same goes for known
bulletproof ISPs, IP hijackers, and so forth.

This patch therefore suggests populating the "is_drop" flag introduced
in libloc 0.9.8 (albeit currently unused in production) with the
contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
have at least the baddest of the bad covered. The very same lists are,
in fact, included in popular IPS rulesets as well - a decent number of
IPFire users are therefore likely to have them enabled already, but in a
very costly way.

There are no plans to go further, partly because no other publicly
available feed comes with the same intention, volatility, and FP
(false positive) rate.

The second version of this patch makes use of an auxiliary function to
sanitise ASNs, hence avoiding boilerplate code, and treats any line
starting with a semicolon as a comment, which should be sufficient.
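To illustrate the parsing approach described above (a hypothetical sketch, not part of the patch itself): a DROP entry looks roughly like "224.0.0.0/8 ; SBL123456", with comments introduced by a semicolon, so a single line can be handled like this:

```python
import ipaddress

def parse_drop_line(line):
	"""Parse one line of a Spamhaus DROP file.

	Returns an ip_network object, or None for comments,
	blank lines, and unparseable input.
	"""
	# Comments start with a semicolon; skip blank lines as well
	if not line.strip() or line.startswith(";"):
		return None

	try:
		# The network is the first token; anything after it
		# (usually the SBL reference) is ignored
		return ipaddress.ip_network(line.split()[0], strict=False)
	except ValueError:
		return None
```

This mirrors the patch's logic: comment filtering first, then extraction of the first token, then validation via the ipaddress module.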

Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
---
 src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index c2b3e41..1d84c15 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -1075,6 +1075,9 @@ class CLI(object):
 			# network allocation lists in a machine-readable format...
 			self._update_overrides_for_aws()
 
+			# Update overrides for Spamhaus DROP feeds...
+			self._update_overrides_for_spamhaus_drop()
+
 			for file in ns.files:
 				log.info("Reading %s..." % file)
 
@@ -1259,6 +1262,107 @@ class CLI(object):
 				)
 
 
+	def _update_overrides_for_spamhaus_drop(self):
+		downloader = location.importer.Downloader()
+
+		ip_urls = [
+					"https://www.spamhaus.org/drop/drop.txt",
+					"https://www.spamhaus.org/drop/edrop.txt",
+					"https://www.spamhaus.org/drop/dropv6.txt"
+				]
+
+		asn_urls = [
+					"https://www.spamhaus.org/drop/asndrop.txt"
+				]
+
+		for url in ip_urls:
+			try:
+				with downloader.request(url, return_blocks=False) as f:
+					fcontent = f.body.readlines()
+			except Exception as e:
+				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
+				return
+
+			# Iterate through every line, filter comments and add remaining networks to
+			# the override table in case they are valid...
+			with self.db.transaction():
+				for sline in fcontent:
+
+					# The response is assumed to be encoded in UTF-8...
+					sline = sline.decode("utf-8")
+
+					# Comments start with a semicolon...
+					if sline.startswith(";"):
+						continue
+
+					# Extract network and ignore anything afterwards...
+					try:
+						network = ipaddress.ip_network(sline.split()[0], strict=False)
+					except ValueError:
+						log.error("Unable to parse line: %s" % sline)
+						continue
+
+					# Sanitize parsed networks...
+					if not self._check_parsed_network(network):
+						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
+							(url, network))
+						continue
+
+					# Conduct SQL statement...
+					self.db.execute("""
+						INSERT INTO network_overrides(
+							network,
+							source,
+							is_drop
+						) VALUES (%s, %s, %s)
+						ON CONFLICT (network) DO NOTHING""",
+						"%s" % network,
+						"Spamhaus DROP lists",
+						True
+					)
+
+		for url in asn_urls:
+			try:
+				with downloader.request(url, return_blocks=False) as f:
+					fcontent = f.body.readlines()
+			except Exception as e:
+				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
+				return
+
+			# Iterate through every line, filter comments and add remaining ASNs to
+			# the override table in case they are valid...
+			with self.db.transaction():
+				for sline in fcontent:
+
+					# The response is assumed to be encoded in UTF-8...
+					sline = sline.decode("utf-8")
+
+					# Comments start with a semicolon...
+					if sline.startswith(";"):
+						continue
+
+					# Extract ASN and ignore anything afterwards...
+					asn = int(sline.split()[0].strip("AS"))
+
+					# Filter invalid ASNs...
+					if not self._check_parsed_asn(asn):
+						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
+							(url, asn))
+						continue
+
+					# Conduct SQL statement...
+					self.db.execute("""
+						INSERT INTO autnum_overrides(
+							number,
+							source,
+							is_drop
+						) VALUES (%s, %s, %s)
+						ON CONFLICT (number) DO NOTHING""",
+						"%s" % asn,
+						"Spamhaus ASN-DROP list",
+						True
+					)
+
 	@staticmethod
 	def _parse_bool(block, key):
 		val = block.get(key)
-- 
2.26.2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs
  2021-10-10 16:16 [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Peter Müller
  2021-10-10 16:16 ` [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists Peter Müller
@ 2021-10-12 11:23 ` Michael Tremer
  2021-10-13 16:33   ` Peter Müller
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Tremer @ 2021-10-12 11:23 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 2274 bytes --]

Hello,

> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/location-importer.in | 20 ++++++++++++++++++--
> 1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index da058d3..c2b3e41 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -574,6 +574,22 @@ class CLI(object):
> 		# be suitable for libloc consumption...
> 		return True
> 
> +	def _check_parsed_asn(self, asn):
> +		"""
> +			Assistive function to filter Autonomous System Numbers not being suitable
> +			for adding to our database. Returns False in such cases, and True otherwise.
> +		"""
> +
> +		if not asn or not isinstance(asn, int):
> +			return False

Can it happen that a non-integer is passed to this function?

You also return False for zero without logging the message.

I would suggest dropping the check above.

> +
> +		if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
> +			log.debug("Skipping invalid ASN: %s" % asn)
> +			return False

This works, but I do not consider this very Pythonic.

I would have written a tuple which contains one tuple for each range and then iterated over that until you find a match.
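The tuple-of-ranges approach suggested here might look roughly like this (a sketch under the reviewer's suggestion, not code from the patch):

```python
# Valid public ASN ranges: 16-bit and 32-bit space, excluding
# AS0, AS23456 (AS_TRANS), and private/reserved ranges
VALID_ASN_RANGES = (
	(1, 23455),
	(23457, 64495),
	(131072, 4199999999),
)

def check_parsed_asn(asn):
	# Accept the ASN as soon as it falls into any valid range
	for first, last in VALID_ASN_RANGES:
		if first <= asn <= last:
			return True

	return False
```

Adding or adjusting a range then only touches the data, not the control flow.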

> +
> +		# ASN is fine if we made it here...
> +		return True

Ellipses in comments are sometimes weird...

> +
> 	def _parse_block(self, block, source_key, validcountries = None):
> 		# Get first line to find out what type of block this is
> 		line = block[0]
> @@ -829,8 +845,8 @@ class CLI(object):
> 					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
> 					continue
> 
> -				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
> -					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
> +				# Filter invalid ASNs...
> +				if not self._check_parsed_asn(asn):
> 					continue
> 
> 				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
> -- 
> 2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
  2021-10-10 16:16 ` [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists Peter Müller
@ 2021-10-12 11:26   ` Michael Tremer
  2021-10-13 16:37     ` Peter Müller
  0 siblings, 1 reply; 8+ messages in thread
From: Michael Tremer @ 2021-10-12 11:26 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 6521 bytes --]

Hello,

> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> A while ago, it was discussed whether or not libloc should become an
> "opinionated database", i. e. including any information on a network's
> reputation.
> 
> In general, this idea was dismissed as libloc is neither intended nor
> suitable for such tasks, and we do not want to make (political?)
> decisions like these for various reasons. All we do is to provide a
> useful location database in a neutral way, and leave it up to our users
> on how to react on certain results.
> 
> However, there is a problematic area. Take AS55303 as an example: We
> _know_ this is to be a dirty network, tampering with RIR data and
> hijacking IP space, and strongly recommend against processing any
> connection originating from or directed to it.
> 
> Since it appears to be loaded with proxies used by miscreants for
> abusive purposes, all we can do at the time of writing is to flag it
> as "anonymous proxy", but we lack possibility of telling our users
> something like "this is not a safe area". The very same goes for known
> bulletproof ISPs, IP hijackers, and so forth.
> 
> This patch therefore suggests to populate the "is_drop" flag introduced
> in libloc 0.9.8 (albeit currently unused in production) with the
> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
> have at least the baddest of the bad covered. The very same lists are,
> in fact, included in popular IPS rulesets as well - a decent amount of
> IPFire users is therefore likely to have them already enabled, but in a
> very costly way.
> 
> It is not planned to go further, partly because there is no other feed
> publicly available, which would come with the same intention,
> volatility, and FP rate.
> 
> The second version of this patch makes use of an auxiliary function to
> sanitise ASNs, hence avoiding boilerplate code, and treats any line
> starting with a semicolon as a comment, which should be sufficient.
> 
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
> 1 file changed, 104 insertions(+)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index c2b3e41..1d84c15 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -1075,6 +1075,9 @@ class CLI(object):
> 			# network allocation lists in a machine-readable format...
> 			self._update_overrides_for_aws()
> 
> +			# Update overrides for Spamhaus DROP feeds...
> +			self._update_overrides_for_spamhaus_drop()
> +
> 			for file in ns.files:
> 				log.info("Reading %s..." % file)
> 
> @@ -1259,6 +1262,107 @@ class CLI(object):
> 				)
> 
> 
> +	def _update_overrides_for_spamhaus_drop(self):
> +		downloader = location.importer.Downloader()
> +
> +		ip_urls = [
> +					"https://www.spamhaus.org/drop/drop.txt",
> +					"https://www.spamhaus.org/drop/edrop.txt",
> +					"https://www.spamhaus.org/drop/dropv6.txt"
> +				]
> +
> +		asn_urls = [
> +					"https://www.spamhaus.org/drop/asndrop.txt"
> +				]
> +
> +		for url in ip_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining networks to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# The response is assumed to be encoded in UTF-8...
> +					sline = sline.decode("utf-8")
> +
> +					# Comments start with a semicolon...
> +					if sline.startswith(";"):
> +						continue
> +
> +					# Extract network and ignore anything afterwards...
> +					try:
> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
> +					except ValueError:
> +						log.error("Unable to parse line: %s" % sline)
> +						continue
> +
> +					# Sanitize parsed networks...
> +					if not self._check_parsed_network(network):
> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
> +							(url, network))
> +						continue
> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO network_overrides(
> +							network,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (network) DO NOTHING""",
> +						"%s" % network,
> +						"Spamhaus DROP lists",
> +						True
> +					)
> +
> +		for url in asn_urls:
> +			try:
> +				with downloader.request(url, return_blocks=False) as f:
> +					fcontent = f.body.readlines()
> +			except Exception as e:
> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
> +				return
> +
> +			# Iterate through every line, filter comments and add remaining ASNs to
> +			# the override table in case they are valid...
> +			with self.db.transaction():
> +				for sline in fcontent:
> +
> +					# The response is assumed to be encoded in UTF-8...
> +					sline = sline.decode("utf-8")
> +
> +					# Comments start with a semicolon...
> +					if sline.startswith(";"):
> +						continue
> +
> +					# Extract ASN and ignore anything afterwards...
> +					asn = int(sline.split()[0].strip("AS"))

I kind of like stuff like this being split over multiple lines. Performance is the same, but it is easier to understand and debugging lines can be added more easily.

> +
> +					# Filter invalid ASNs...
> +					if not self._check_parsed_asn(asn):
> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
> +							(url, asn))
> +						continue

This answers my question from my other email whether it is possible that a non-integer is being passed to the check function. It can’t happen here. I would prefer to drop the isinstance() check for performance reasons.

> +
> +					# Conduct SQL statement...
> +					self.db.execute("""
> +						INSERT INTO autnum_overrides(
> +							number,
> +							source,
> +							is_drop
> +						) VALUES (%s, %s, %s)
> +						ON CONFLICT (number) DO NOTHING""",
> +						"%s" % asn,
> +						"Spamhaus ASN-DROP list",
> +						True
> +					)
> +
> 	@staticmethod
> 	def _parse_bool(block, key):
> 		val = block.get(key)
> -- 
> 2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs
  2021-10-12 11:23 ` [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Michael Tremer
@ 2021-10-13 16:33   ` Peter Müller
  2021-10-14 18:19     ` Michael Tremer
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Müller @ 2021-10-13 16:33 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 2982 bytes --]

Hello Michael,

thanks for your reply.

> Hello,
> 
>> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>
>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>> ---
>> src/python/location-importer.in | 20 ++++++++++++++++++--
>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index da058d3..c2b3e41 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -574,6 +574,22 @@ class CLI(object):
>> 		# be suitable for libloc consumption...
>> 		return True
>>
>> +	def _check_parsed_asn(self, asn):
>> +		"""
>> +			Assistive function to filter Autonomous System Numbers not being suitable
>> +			for adding to our database. Returns False in such cases, and True otherwise.
>> +		"""
>> +
>> +		if not asn or not isinstance(asn, int):
>> +			return False
> 
> Does this happen that a non-integer is being passed to this function?

What's wrong with input validation? I _like_ input validation. :-)

Seriously: Anything other than an integer does not make sense for an ASN. Sure, this
function is not intended to get anything else, but we will never know. Better to be
safe than sorry.

> You also return False for zero without logging the message.

True. Since there will probably be a second version of this patchset, I will ensure it
logs something useful in this case.

> I would suggest to drop the check above.

Frankly, I don't see why.

>> +
>> +		if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
>> +			log.debug("Skipping invalid ASN: %s" % asn)
>> +			return False
> 
> This works, but I do not consider this very Pythonic.
> 
> I would have written a tuple which conatins one tuple for each range and then iterate over that until you find a match.

As I am far from being a Python developer, this wouldn't have come to my mind. But if it's
Pythonic, I'll do so. When in Rome...

> 
>> +
>> +		# ASN is fine if we made it here...
>> +		return True
> 
> Ellipses in comments are sometimes weird...

???

Thanks, and best regards,
Peter Müller

> 
>> +
>> 	def _parse_block(self, block, source_key, validcountries = None):
>> 		# Get first line to find out what type of block this is
>> 		line = block[0]
>> @@ -829,8 +845,8 @@ class CLI(object):
>> 					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
>> 					continue
>>
>> -				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
>> -					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
>> +				# Filter invalid ASNs...
>> +				if not self._check_parsed_asn(asn):
>> 					continue
>>
>> 				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
>> -- 
>> 2.26.2
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
  2021-10-12 11:26   ` Michael Tremer
@ 2021-10-13 16:37     ` Peter Müller
  2021-10-14 18:12       ` Michael Tremer
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Müller @ 2021-10-13 16:37 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 7065 bytes --]

Hello Michael,

thanks for your reply.

> Hello,
> 
>> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>
>> A while ago, it was discussed whether or not libloc should become an
>> "opinionated database", i. e. including any information on a network's
>> reputation.
>>
>> In general, this idea was dismissed as libloc is neither intended nor
>> suitable for such tasks, and we do not want to make (political?)
>> decisions like these for various reasons. All we do is to provide a
>> useful location database in a neutral way, and leave it up to our users
>> on how to react on certain results.
>>
>> However, there is a problematic area. Take AS55303 as an example: We
>> _know_ this is to be a dirty network, tampering with RIR data and
>> hijacking IP space, and strongly recommend against processing any
>> connection originating from or directed to it.
>>
>> Since it appears to be loaded with proxies used by miscreants for
>> abusive purposes, all we can do at the time of writing is to flag it
>> as "anonymous proxy", but we lack possibility of telling our users
>> something like "this is not a safe area". The very same goes for known
>> bulletproof ISPs, IP hijackers, and so forth.
>>
>> This patch therefore suggests to populate the "is_drop" flag introduced
>> in libloc 0.9.8 (albeit currently unused in production) with the
>> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
>> have at least the baddest of the bad covered. The very same lists are,
>> in fact, included in popular IPS rulesets as well - a decent amount of
>> IPFire users is therefore likely to have them already enabled, but in a
>> very costly way.
>>
>> It is not planned to go further, partly because there is no other feed
>> publicly available, which would come with the same intention,
>> volatility, and FP rate.
>>
>> The second version of this patch makes use of an auxiliary function to
>> sanitise ASNs, hence avoiding boilerplate code, and treats any line
>> starting with a semicolon as a comment, which should be sufficient.
>>
>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>> ---
>> src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>> 1 file changed, 104 insertions(+)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index c2b3e41..1d84c15 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -1075,6 +1075,9 @@ class CLI(object):
>> 			# network allocation lists in a machine-readable format...
>> 			self._update_overrides_for_aws()
>>
>> +			# Update overrides for Spamhaus DROP feeds...
>> +			self._update_overrides_for_spamhaus_drop()
>> +
>> 			for file in ns.files:
>> 				log.info("Reading %s..." % file)
>>
>> @@ -1259,6 +1262,107 @@ class CLI(object):
>> 				)
>>
>>
>> +	def _update_overrides_for_spamhaus_drop(self):
>> +		downloader = location.importer.Downloader()
>> +
>> +		ip_urls = [
>> +					"https://www.spamhaus.org/drop/drop.txt",
>> +					"https://www.spamhaus.org/drop/edrop.txt",
>> +					"https://www.spamhaus.org/drop/dropv6.txt"
>> +				]
>> +
>> +		asn_urls = [
>> +					"https://www.spamhaus.org/drop/asndrop.txt"
>> +				]
>> +
>> +		for url in ip_urls:
>> +			try:
>> +				with downloader.request(url, return_blocks=False) as f:
>> +					fcontent = f.body.readlines()
>> +			except Exception as e:
>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>> +				return
>> +
>> +			# Iterate through every line, filter comments and add remaining networks to
>> +			# the override table in case they are valid...
>> +			with self.db.transaction():
>> +				for sline in fcontent:
>> +
>> +					# The response is assumed to be encoded in UTF-8...
>> +					sline = sline.decode("utf-8")
>> +
>> +					# Comments start with a semicolon...
>> +					if sline.startswith(";"):
>> +						continue
>> +
>> +					# Extract network and ignore anything afterwards...
>> +					try:
>> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
>> +					except ValueError:
>> +						log.error("Unable to parse line: %s" % sline)
>> +						continue
>> +
>> +					# Sanitize parsed networks...
>> +					if not self._check_parsed_network(network):
>> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>> +							(url, network))
>> +						continue
>> +
>> +					# Conduct SQL statement...
>> +					self.db.execute("""
>> +						INSERT INTO network_overrides(
>> +							network,
>> +							source,
>> +							is_drop
>> +						) VALUES (%s, %s, %s)
>> +						ON CONFLICT (network) DO NOTHING""",
>> +						"%s" % network,
>> +						"Spamhaus DROP lists",
>> +						True
>> +					)
>> +
>> +		for url in asn_urls:
>> +			try:
>> +				with downloader.request(url, return_blocks=False) as f:
>> +					fcontent = f.body.readlines()
>> +			except Exception as e:
>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>> +				return
>> +
>> +			# Iterate through every line, filter comments and add remaining ASNs to
>> +			# the override table in case they are valid...
>> +			with self.db.transaction():
>> +				for sline in fcontent:
>> +
>> +					# The response is assumed to be encoded in UTF-8...
>> +					sline = sline.decode("utf-8")
>> +
>> +					# Comments start with a semicolon...
>> +					if sline.startswith(";"):
>> +						continue
>> +
>> +					# Extract ASN and ignore anything afterwards...
>> +					asn = int(sline.split()[0].strip("AS"))
> 
> I kind of like stuff like this being split over multiple lines. Performance is the same, but it is easier to understand and debugging lines can be added easier.

I see.

> 
>> +
>> +					# Filter invalid ASNs...
>> +					if not self._check_parsed_asn(asn):
>> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>> +							(url, asn))
>> +						continue
> 
> This answers my question from my other email whether it is possible that a non-integer is being passed to the check function. It can’t happen  here. I would prefer to drop the isinstance() check for performance reasons.

True, it can't happen, but I just noticed I would not catch a ValueError in case something
other than an integer were left after all these split() and strip() calls. In the
_import_as_names_from_arin() function, this is covered, so I will add it here as well.
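Split over multiple lines with a ValueError guard, the extraction could look roughly like this (a hypothetical sketch of the discussed revision, not the actual patch; note that str.strip("AS") removes any 'A'/'S' characters from both ends, so an explicit prefix removal is used here instead):

```python
def parse_asndrop_line(line):
	"""Extract an ASN from an ASN-DROP line such as "AS55303 ; ...".

	Returns an int, or None for comments, blank lines, and
	anything that does not parse as a number.
	"""
	# Comments start with a semicolon; skip blank lines as well
	if not line.strip() or line.startswith(";"):
		return None

	# Take the first token only; the rest of the line is metadata
	token = line.split()[0]

	# Remove the "AS" prefix explicitly rather than via strip("AS"),
	# which would also eat trailing 'A'/'S' characters
	if token.startswith("AS"):
		token = token[2:]

	# Guard against non-integer leftovers, as discussed above
	try:
		return int(token)
	except ValueError:
		return None
```

The ValueError guard makes the importer skip malformed feed lines instead of aborting the whole run.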

Thanks, and best regards,
Peter Müller

> 
>> +
>> +					# Conduct SQL statement...
>> +					self.db.execute("""
>> +						INSERT INTO autnum_overrides(
>> +							number,
>> +							source,
>> +							is_drop
>> +						) VALUES (%s, %s, %s)
>> +						ON CONFLICT (number) DO NOTHING""",
>> +						"%s" % asn,
>> +						"Spamhaus ASN-DROP list",
>> +						True
>> +					)
>> +
>> 	@staticmethod
>> 	def _parse_bool(block, key):
>> 		val = block.get(key)
>> -- 
>> 2.26.2
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists
  2021-10-13 16:37     ` Peter Müller
@ 2021-10-14 18:12       ` Michael Tremer
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Tremer @ 2021-10-14 18:12 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 7704 bytes --]

Hello,

> On 13 Oct 2021, at 17:37, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> Hello Michael,
> 
> thanks for your reply.
> 
>> Hello,
>> 
>>> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>> 
>>> A while ago, it was discussed whether or not libloc should become an
>>> "opinionated database", i. e. including any information on a network's
>>> reputation.
>>> 
>>> In general, this idea was dismissed as libloc is neither intended nor
>>> suitable for such tasks, and we do not want to make (political?)
>>> decisions like these for various reasons. All we do is to provide a
>>> useful location database in a neutral way, and leave it up to our users
>>> on how to react on certain results.
>>> 
>>> However, there is a problematic area. Take AS55303 as an example: We
>>> _know_ this is to be a dirty network, tampering with RIR data and
>>> hijacking IP space, and strongly recommend against processing any
>>> connection originating from or directed to it.
>>> 
>>> Since it appears to be loaded with proxies used by miscreants for
>>> abusive purposes, all we can do at the time of writing is to flag it
>>> as "anonymous proxy", but we lack possibility of telling our users
>>> something like "this is not a safe area". The very same goes for known
>>> bulletproof ISPs, IP hijackers, and so forth.
>>> 
>>> This patch therefore suggests to populate the "is_drop" flag introduced
>>> in libloc 0.9.8 (albeit currently unused in production) with the
>>> contents of Spamhaus' DROP lists (https://www.spamhaus.org/drop/), to
>>> have at least the baddest of the bad covered. The very same lists are,
>>> in fact, included in popular IPS rulesets as well - a decent amount of
>>> IPFire users is therefore likely to have them already enabled, but in a
>>> very costly way.
>>> 
>>> It is not planned to go further, partly because there is no other feed
>>> publicly available, which would come with the same intention,
>>> volatility, and FP rate.
>>> 
>>> The second version of this patch makes use of an auxiliary function to
>>> sanitise ASNs, hence avoiding boilerplate code, and treats any line
>>> starting with a semicolon as a comment, which should be sufficient.
>>> 
>>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>>> ---
>>> src/python/location-importer.in | 104 ++++++++++++++++++++++++++++++++
>>> 1 file changed, 104 insertions(+)
>>> 
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index c2b3e41..1d84c15 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -1075,6 +1075,9 @@ class CLI(object):
>>> 			# network allocation lists in a machine-readable format...
>>> 			self._update_overrides_for_aws()
>>> 
>>> +			# Update overrides for Spamhaus DROP feeds...
>>> +			self._update_overrides_for_spamhaus_drop()
>>> +
>>> 			for file in ns.files:
>>> 				log.info("Reading %s..." % file)
>>> 
>>> @@ -1259,6 +1262,107 @@ class CLI(object):
>>> 				)
>>> 
>>> 
>>> +	def _update_overrides_for_spamhaus_drop(self):
>>> +		downloader = location.importer.Downloader()
>>> +
>>> +		ip_urls = [
>>> +					"https://www.spamhaus.org/drop/drop.txt",
>>> +					"https://www.spamhaus.org/drop/edrop.txt",
>>> +					"https://www.spamhaus.org/drop/dropv6.txt"
>>> +				]
>>> +
>>> +		asn_urls = [
>>> +					"https://www.spamhaus.org/drop/asndrop.txt"
>>> +				]
>>> +
>>> +		for url in ip_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining networks to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract network and ignore anything afterwards...
>>> +					try:
>>> +						network = ipaddress.ip_network(sline.split()[0], strict=False)
>>> +					except ValueError:
>>> +						log.error("Unable to parse line: %s" % sline)
>>> +						continue
>>> +
>>> +					# Sanitize parsed networks...
>>> +					if not self._check_parsed_network(network):
>>> +						log.warning("Skipping bogus network found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, network))
>>> +						continue
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO network_overrides(
>>> +							network,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (network) DO NOTHING""",
>>> +						"%s" % network,
>>> +						"Spamhaus DROP lists",
>>> +						True
>>> +					)
>>> +
>>> +		for url in asn_urls:
>>> +			try:
>>> +				with downloader.request(url, return_blocks=False) as f:
>>> +					fcontent = f.body.readlines()
>>> +			except Exception as e:
>>> +				log.error("Unable to download Spamhaus DROP URL %s: %s" % (url, e))
>>> +				return
>>> +
>>> +			# Iterate through every line, filter comments and add remaining ASNs to
>>> +			# the override table in case they are valid...
>>> +			with self.db.transaction():
>>> +				for sline in fcontent:
>>> +
>>> +					# The response is assumed to be encoded in UTF-8...
>>> +					sline = sline.decode("utf-8")
>>> +
>>> +					# Comments start with a semicolon...
>>> +					if sline.startswith(";"):
>>> +						continue
>>> +
>>> +					# Extract ASN and ignore anything afterwards...
>>> +					asn = int(sline.split()[0].strip("AS"))
>> 
>> I kind of like stuff like this being split over multiple lines. Performance is the same, but it is easier to understand and debugging lines can be added easier.
> 
> I see.
> 
>> 
>>> +
>>> +					# Filter invalid ASNs...
>>> +					if not self._check_parsed_asn(asn):
>>> +						log.warning("Skipping bogus ASN found in Spamhaus DROP URL %s: %s" % \
>>> +							(url, asn))
>>> +						continue
>> 
>> This answers my question from my other email whether it is possible that a non-integer is being passed to the check function. It can’t happen here. I would prefer to drop the isinstance() check for performance reasons.
> 
> True, it can't happen, but I just noticed I would not catch a ValueError in case something
> other than an integer is left after all these split() and strip() calls. In the
> _import_as_names_from_arin() function, this is covered, hence I will add it here as well.

Yeah that might be true, but you don’t want to catch all exceptions. They are called exceptions for a reason.

When you parse stuff, you want to check this, but I would normally expect that the caller of a function has sufficiently typed their arguments. Otherwise you want to use a typed language like C :)
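Catching only the specific exception at the parse site, as suggested, might look like this (a hedged sketch; `parse_asn` is a made-up helper, not part of the patch):

```python
def parse_asn(token):
    """Parse a token like 'AS12345'; return None if it is not an integer."""
    try:
        # strip("AS") removes the "AS" prefix characters; int() raises
        # ValueError for anything that is not a number
        return int(token.strip("AS"))
    except ValueError:
        return None
```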

-Michael

> 
> Thanks, and best regards,
> Peter Müller
> 
>> 
>>> +
>>> +					# Conduct SQL statement...
>>> +					self.db.execute("""
>>> +						INSERT INTO autnum_overrides(
>>> +							number,
>>> +							source,
>>> +							is_drop
>>> +						) VALUES (%s, %s, %s)
>>> +						ON CONFLICT (number) DO NOTHING""",
>>> +						"%s" % asn,
>>> +						"Spamhaus ASN-DROP list",
>>> +						True
>>> +					)
>>> +
>>> 	@staticmethod
>>> 	def _parse_bool(block, key):
>>> 		val = block.get(key)
>>> -- 
>>> 2.26.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs
  2021-10-13 16:33   ` Peter Müller
@ 2021-10-14 18:19     ` Michael Tremer
  0 siblings, 0 replies; 8+ messages in thread
From: Michael Tremer @ 2021-10-14 18:19 UTC (permalink / raw)
  To: location

[-- Attachment #1: Type: text/plain, Size: 3809 bytes --]

Hi,

> On 13 Oct 2021, at 17:33, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> Hello Michael,
> 
> thanks for your reply.
> 
>> Hello,
>> 
>>> On 10 Oct 2021, at 17:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>> 
>>> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
>>> ---
>>> src/python/location-importer.in | 20 ++++++++++++++++++--
>>> 1 file changed, 18 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index da058d3..c2b3e41 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -574,6 +574,22 @@ class CLI(object):
>>> 		# be suitable for libloc consumption...
>>> 		return True
>>> 
>>> +	def _check_parsed_asn(self, asn):
>>> +		"""
>>> +			Assistive function to filter Autonomous System Numbers not being suitable
>>> +			for adding to our database. Returns False in such cases, and True otherwise.
>>> +		"""
>>> +
>>> +		if not asn or not isinstance(asn, int):
>>> +			return False
>> 
>> Does this happen that a non-integer is being passed to this function?
> 
> What's wrong with input validation? I _like_ input validation. :-)

There is nothing wrong with that. You are just checking the developer here and I am not sure whether you want that or not.

> Seriously: Anything else than an integer does not make sense for an ASN. Sure, this
> function is not intended to get anything else, but we will never know. Better to be
> safe than sorry.

Not entirely. You want code to perform. If you want to be 100% sure, Python isn’t the language this parser should be written in.

Nothing else but an integer makes sense. The question is how do you want to treat zero?

>> You also return False for zero without logging the message.
> 
> True. Since there will probably be a second version of this patchset, I will ensure it
> logs something useful in this case.
> 
>> I would suggest to drop the check above.
> 
> Frankly, I don't see why.
> 
>>> +
>>> +		if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
>>> +			log.debug("Skipping invalid ASN: %s" % asn)
>>> +			return False
>> 
>> This works, but I do not consider this very Pythonic.
>> 
>> I would have written a tuple which contains one tuple for each range, and then iterated over that until you find a match.
> 
> Far from being a Python developer, this wouldn't have come to my mind. But if it's
> Pythonic, I'll do so. When in Rome...

I don’t make the rules. That is just how I would do it:

* Data in one place

* A short algorithm that works on the data

In C I would hope that the compiler makes it fast.
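The approach described here — the ranges as data in one place, plus a short loop over them — might look like this in Python (hypothetical names, mirroring the ranges from the patch):

```python
# Valid public ASN ranges (16-bit and 32-bit, excluding reserved values
# such as 0 and 23456) as plain data, per the ranges used in the patch
VALID_ASN_RANGES = (
    (1, 23455),
    (23457, 64495),
    (131072, 4199999999),
)

def check_parsed_asn(asn):
    """Return True if the given ASN falls into any valid range."""
    for first, last in VALID_ASN_RANGES:
        if first <= asn <= last:
            return True
    return False
```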

> 
>> 
>>> +
>>> +		# ASN is fine if we made it here...
>>> +		return True
>> 
>> Ellipses in comments are sometimes weird...
> 
> ???

This one left the comment kind of open-ended, making it sound kind of unlikely.

> 
> Thanks, and best regards,
> Peter Müller
> 
>> 
>>> +
>>> 	def _parse_block(self, block, source_key, validcountries = None):
>>> 		# Get first line to find out what type of block this is
>>> 		line = block[0]
>>> @@ -829,8 +845,8 @@ class CLI(object):
>>> 					log.debug("Skipping ARIN AS names line not containing an integer for ASN")
>>> 					continue
>>> 
>>> -				if not ((1 <= asn and asn <= 23455) or (23457 <= asn and asn <= 64495) or (131072 <= asn and asn <= 4199999999)):
>>> -					log.debug("Skipping ARIN AS names line not containing a valid ASN: %s" % asn)
>>> +				# Filter invalid ASNs...
>>> +				if not self._check_parsed_asn(asn):
>>> 					continue
>>> 
>>> 				# Skip any AS name that appears to be a placeholder for a different RIR or entity...
>>> -- 
>>> 2.26.2



end of thread, other threads:[~2021-10-14 18:19 UTC | newest]

Thread overview: 8+ messages
2021-10-10 16:16 [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Peter Müller
2021-10-10 16:16 ` [PATCH 2/2] location-importer.in: Add Spamhaus DROP lists Peter Müller
2021-10-12 11:26   ` Michael Tremer
2021-10-13 16:37     ` Peter Müller
2021-10-14 18:12       ` Michael Tremer
2021-10-12 11:23 ` [PATCH 1/2] location-importer: Introduce auxiliary function to sanitise ASNs Michael Tremer
2021-10-13 16:33   ` Peter Müller
2021-10-14 18:19     ` Michael Tremer
