From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter =?utf-8?q?M=C3=BCller?=
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: import additional IP information for Amazon AWS IP networks
Date: Mon, 12 Apr 2021 19:48:00 +0200
Message-ID: <523d834c-9104-8df3-a3b9-7c8be18630ba@ipfire.org>
In-Reply-To: <80063B39-D1F9-45BF-A57B-9B9BAE239D31@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8063777577212952708=="
List-Id:

--===============8063777577212952708==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Michael,

thanks for your reply.

Frankly, the longer I think about this patch's approach, the more I become unhappy with it:

(a) We are processing the Amazon AWS IP range feed too credulously: It comes without being digitally
    signed in any way, over an HTTPS connection - at least _I_ don't trust PKI, and should probably
    finally write that blog post about it that has been planned for quite some time now :-/ - from a
    CDN. ip-ranges.amazonaws.com is not even DNSSEC-signed, not to mention DANE for their web service.

    Worse, my patch lacks additional safeguards. At the moment, the feed's content is only checked for
    prefixes that are too big or too small, anything not globally routable, and similar oddities.
    Amazon, however, must not publish any information regarding IP space they do not own - and if
    they do, we should not process it.

    While this does not eliminate the possible attack of somebody tampering with their feed on their
    server(s), the CDN, or anywhere in between, it would prevent a hostile actor from abusing that
    feed for arbitrarily spoofing the contents of a libloc database generated by us.

    Unfortunately, I have no elegant idea how to do this at the moment. The most basic approach would
    be to reject any network not announced by ASNs we know are owned or maintained by Amazon - not
    sure how volatile this list would be.
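A minimal sketch of the ASN-based check mentioned above, in Python. It assumes that a mapping from
announced prefixes to their origin ASNs is already available from some routing data source. The names
`AMAZON_ASNS`, `origin_asn` and `filter_feed` are hypothetical and not part of location-importer, the
ASN set is illustrative only and not a complete or maintained list, and the example prefixes are
documentation ranges rather than real announcements.

```python
import ipaddress

# Assumption: ASNs believed to be operated by Amazon (illustrative, not exhaustive).
AMAZON_ASNS = {16509, 14618}


def origin_asn(network, announcements):
    """Return the origin ASN of the most specific announcement covering
    `network`, or None if no announcement covers it."""
    best = None
    for prefix, asn in announcements.items():
        # subnet_of() raises TypeError when IP versions differ, so skip those.
        if prefix.version != network.version:
            continue
        if network.subnet_of(prefix):
            if best is None or prefix.prefixlen > best[0].prefixlen:
                best = (prefix, asn)
    return best[1] if best else None


def filter_feed(prefixes, announcements, trusted_asns=AMAZON_ASNS):
    """Keep only feed prefixes whose covering announcement originates
    from one of the trusted ASNs."""
    accepted = []
    for entry in prefixes:
        network = ipaddress.ip_network(entry, strict=False)
        if origin_asn(network, announcements) in trusted_asns:
            accepted.append(network)
    return accepted


# Hypothetical routing data: prefix -> origin ASN.
announcements = {
    ipaddress.ip_network("198.51.100.0/24"): 16509,
    ipaddress.ip_network("203.0.113.0/24"): 64500,
}

# Hypothetical feed entries: only the first one is covered by a trusted ASN.
feed = ["198.51.100.0/25", "203.0.113.0/24", "192.0.2.0/24"]
accepted = filter_feed(feed, announcements)
```

A feed entry is kept only if its most specific covering announcement originates from a trusted ASN.
Networks that are not announced at all are dropped as well.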
    Only accepting information for networks whose RIR data proves ownership or maintenance by Amazon
    would be a more thorough approach, though. However, that involves bulk queries to the Whois, as a
    decent chunk of their IP space is assigned by ARIN. In case of RIPE et al., we might parse our
    way through the databases we already have, but this is laborious, and we have no routines for
    enumerating maintainer data yet.

(b) I honestly dislike non-transparent changes here. Since we fill the override SQL table on demand
    every time, retracing the content of generated location databases will be quite tricky if it did
    not originate from our own override files.

    On the other hand, we do not store the contents of the RIR databases downloaded, either. Simply
    dumping the Amazon AWS IP range feed into our Git repository would solve the transparency issue,
    but results in unnecessary bloat - unless we really need it someday.

    Do you have a particular idea about how to solve this issue in mind?

Regarding (a), the RIRs' FTP server FQDNs are at least DNSSEC-signed, but we do not enforce this.
While I vaguely remember having seen signatures for the RIPE database, we currently do not validate
them, either. Although this would increase complexity and affect performance when generating a
database at our end, I would propose to do so whenever possible. Thoughts?

Sorry for this lengthy and not very optimistic answer. If you ask me, you'll always get the
worst-case scenario. :-) After all, we are doing security here...

Thanks, and best regards,
Peter Müller

> Hello Peter,
>
> Thanks for this, I guess this would affect quite a few people out there…
>
> However, is it a good idea to use the overrides table for this? Should that not be reserved for
> the pure overrides?
>
> There is no way to view these changes. Is that something we can live with?
>
> -Michael
>
>> On 10 Apr 2021, at 13:28, Peter Müller wrote:
>>
>> Amazon publishes information regarding some of their IP networks
>> primarily used for AWS cloud services in a machine-readable format. To
>> improve libloc lookup results for these, we have little choice other
>> than importing and parsing them.
>>
>> Unfortunately, there seems to be no machine-readable list of the
>> locations of their data centers or availability zones available. If
>> there _is_ any, please let the author know.
>>
>> Fixes: #12594
>>
>> Signed-off-by: Peter Müller
>> ---
>>  src/python/location-importer.in | 110 ++++++++++++++++++++++++++++++++
>>  1 file changed, 110 insertions(+)
>>
>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>> index 1e08458..5be1d61 100644
>> --- a/src/python/location-importer.in
>> +++ b/src/python/location-importer.in
>> @@ -19,6 +19,7 @@
>>
>>  import argparse
>>  import ipaddress
>> +import json
>>  import logging
>>  import math
>>  import re
>> @@ -931,6 +932,10 @@ class CLI(object):
>>                  TRUNCATE TABLE network_overrides;
>>              """)
>>
>> +            # Update overrides for various cloud providers big enough to publish their own IP
>> +            # network allocation lists in a machine-readable format...
>> +            self._update_overrides_for_aws()
>> +
>>              for file in ns.files:
>>                  log.info("Reading %s..." % file)
>>
>> @@ -998,6 +1003,111 @@ class CLI(object):
>>                          else:
>>                              log.warning("Unsupported type: %s" % type)
>>
>> +    def _update_overrides_for_aws(self):
>> +        # Download Amazon AWS IP allocation file to create overrides...
>> +        downloader = location.importer.Downloader()
>> +
>> +        try:
>> +            with downloader.request("https://ip-ranges.amazonaws.com/ip-ranges.json", return_blocks=False) as f:
>> +                aws_ip_dump = json.load(f.body)
>> +        except Exception as e:
>> +            log.error("unable to preprocess Amazon AWS IP ranges: %s" % e)
>> +            return
>> +
>> +        # XXX: Set up a dictionary for mapping a region name to a country.
>> +        # Unfortunately, there seems to be no machine-readable version available of this other than
>> +        # https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
>> +        # (worse, it seems to be incomplete :-/ ); https://www.cloudping.cloud/endpoints
>> +        # was helpful here as well.
>> +        aws_region_country_map = {
>> +            "af-south-1": "ZA",
>> +            "ap-east-1": "HK",
>> +            "ap-south-1": "IN",
>> +            "ap-south-2": "IN",
>> +            "ap-northeast-3": "JP",
>> +            "ap-northeast-2": "KR",
>> +            "ap-southeast-1": "SG",
>> +            "ap-southeast-2": "AU",
>> +            "ap-southeast-3": "MY",
>> +            "ap-northeast-1": "JP",
>> +            "ca-central-1": "CA",
>> +            "eu-central-1": "DE",
>> +            "eu-central-2": "CH",
>> +            "eu-west-1": "IE",
>> +            "eu-west-2": "GB",
>> +            "eu-south-1": "IT",
>> +            "eu-south-2": "ES",
>> +            "eu-west-3": "FR",
>> +            "eu-north-1": "SE",
>> +            "me-south-1": "BH",
>> +            "sa-east-1": "BR"
>> +        }
>> +
>> +        # Fetch all valid country codes to check parsed networks against...
>> +        rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
>> +        validcountries = []
>> +
>> +        for row in rows:
>> +            validcountries.append(row.country_code)
>> +
>> +        with self.db.transaction():
>> +            for snetwork in aws_ip_dump["prefixes"] + aws_ip_dump["ipv6_prefixes"]:
>> +                try:
>> +                    network = ipaddress.ip_network(snetwork.get("ip_prefix") or snetwork.get("ipv6_prefix"), strict=False)
>> +                except ValueError:
>> +                    log.warning("Unable to parse line: %s" % snetwork)
>> +                    continue
>> +
>> +                # Sanitize parsed networks...
>> +                if not self._check_parsed_network(network):
>> +                    continue
>> +
>> +                # Determine region of this network...
>> +                region = snetwork["region"]
>> +                cc = None
>> +                is_anycast = False
>> +
>> +                # Any region name starting with "us-" will get "US" country code assigned straight away...
>> +                if region.startswith("us-"):
>> +                    cc = "US"
>> +                elif region.startswith("cn-"):
>> +                    # ... same goes for China ...
>> +                    cc = "CN"
>> +                elif region == "GLOBAL":
>> +                    # ... funny region name for anycast-like networks ...
>> +                    is_anycast = True
>> +                elif region in aws_region_country_map:
>> +                    # ... assign looked up country code otherwise ...
>> +                    cc = aws_region_country_map[region]
>> +                else:
>> +                    # ... and bail out if we are missing something here
>> +                    log.warning("Unable to determine country code for line: %s" % snetwork)
>> +                    continue
>> +
>> +                # Skip networks with unknown country codes
>> +                if not is_anycast and validcountries and cc not in validcountries:
>> +                    log.warning("Skipping Amazon AWS network with bogus country '%s': %s" % \
>> +                        (cc, network))
>> +                    continue
>> +
>> +                # Conduct SQL statement...
>> +                self.db.execute("""
>> +                    INSERT INTO network_overrides(
>> +                        network,
>> +                        country,
>> +                        is_anonymous_proxy,
>> +                        is_satellite_provider,
>> +                        is_anycast
>> +                    ) VALUES (%s, %s, %s, %s, %s)
>> +                    ON CONFLICT (network) DO NOTHING""",
>> +                    "%s" % network,
>> +                    cc,
>> +                    None,
>> +                    None,
>> +                    is_anycast,
>> +                )
>> +
>> +
>>      @staticmethod
>>      def _parse_bool(block, key):
>>          val = block.get(key)
>> --
>> 2.26.2
>
--===============8063777577212952708==--