From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: import additional IP information for Amazon AWS IP networks
Date: Wed, 14 Apr 2021 10:21:15 +0100
Message-ID: <4EEE5EFF-E912-4421-AC01-2B3613373CF5@ipfire.org>
In-Reply-To: <523d834c-9104-8df3-a3b9-7c8be18630ba@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0949522727001414182=="
List-Id:

--===============0949522727001414182==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello,

> On 12 Apr 2021, at 18:48, Peter Müller wrote:
>
> Hello Michael,
>
> thanks for your reply.
>
> Frankly, the longer I think about this patches' approach, the more I become unhappy with it:

Oh no. Don’t overthink it :)

> (a) We are processing the Amazon AWS IP range feed overcredulous: It comes without being digitally signed
> in any way over a HTTPS connection - at least _I_ don't trust PKI, and should probably finally write that
> blog post about it planned for quite some time now :-/ - from a CDN. ip-ranges.amazonaws.com is not even
> DNSSEC-signed, not to mention DANE for their web service.

There would be no other way how we can authenticate this data. We do exactly the same with data from the RIRs.

> Worse, my patch lacks additional safeguards. At the moment, the feeds' content is only checked for too big
> to too small prefixes, or anything not globally routable, and similar oddities. Amazon, however, must not
> publish any information regarding IP space they do not own - and if they do, we should not process it.

Do we not automatically filter those out later? Should we apply the same DELETE FROM … statements to the overrides table that we apply to the imported RIR data?
https://git.ipfire.org/?p=location/libloc.git;a=blob;f=src/python/location-importer.in;h=1e08458223bad810d133c2f08703c7b3ee84fc72;hb=HEAD#l744

> While this does not eliminate the possible attack of somebody tampering with their feed on their server(s),
> the CDN, or anywhere in between, it would prevent a hostile actor to abuse that feed for arbitrarily spoofing
> the contents of a libloc database generated by us.
>
> Unfortunately, I have no elegant idea how to do this at the moment. A most basic approach would consist in
> rejecting any network not announced by ASNs we know are owned or maintained by Amazon - not sure how volatile
> this list would be.

Probably single networks won’t be moved at all, but at the size of AWS I assume that new networks are added very often.

> Only accepting information for networks whose RIR data proof ownership or maintenance by Amazon would be a
> more thorough approach, though. However, that involves bulk queries to the Whois, as a decent chunk of their
> IP space is assigned by ARIN. In case of RIPE et al., we might parse our way through the databases we already
> have, but this is laborious, and we have no routines for enumerating maintainer data yet.

That would be a rather complicated process and I am not sure if it is worth it.

IP address space that has been acquired and is transitioning to AWS might not show up as owned by the right entity/entities and we might reject it. We simply cannot check this automatically as we cannot check any other IP network being owned by who ever it says.

> (b) I honestly dislike intransparent changes here. Since we fill the override SQL table on demand every time,
> retracing content of generated location databases will be quite tricky if they did not originate from our own
> override files.

I am a little bit unhappy with this as well. The overrides table also takes precedence. That is why I would have expected this in the networks table.
In a way, the RIRs are not transparent to us and we just import their data, do something with it and put it into our database. AWS is just another source of data just like the RIRs.

Although it isn’t perfect, I could live a lot better with this solution.

> On the other hand, we do not store the contents of the RIR databases downloaded, either. Simply dumping the
> Amazon AWS IP range feed into our Git repository would solve the transparency issue, but results in unnecessary
> bloat - unless we really need it someday.
>
> Do you have a particular idea about how to solve this issue in mind?

See above.

> Regarding (a), the RIRs' FTP server FQDNs are at least DNSSEC-signed, but we do not enforce this. While I vaguely
> remember to have seen signatures for the RIPE database, we currently do not validate it, either. Although this
> would increase complexity and affects performance when generating a database at our end, I would propose to do so
> whenever possible. Thoughts?

Yes, we *should* do this, but I currently do not have any free time to work on it. Would be happy to support you on this.

> Sorry for this length and not very optimistic answer. If you ask me, you'll always get the worst-case scenario. :-)
>
> After all, we are doing security here...

:)

-Michael

>
> Thanks, and best regards,
> Peter Müller
>
>
>> Hello Peter,
>>
>> Thanks for this, I guess this would affect quite a few people out there…
>>
>> However, is it a good idea to use the overrides table for this? Should that not be reserved for the pure overrides?
>>
>> There is no way to view these changes. Is that something we can live with?
>>
>> -Michael
>>
>>> On 10 Apr 2021, at 13:28, Peter Müller wrote:
>>>
>>> Amazon publishes information regarding some of their IP networks
>>> primarily used for AWS cloud services in a machine-readable format.
>>> To improve libloc lookup results for these, we have little choice other
>>> than importing and parsing them.
>>>
>>> Unfortunately, there seems to be no machine-readable list of the
>>> locations of their data centers or availability zones available. If
>>> there _is_ any, please let the author know.
>>>
>>> Fixes: #12594
>>>
>>> Signed-off-by: Peter Müller
>>> ---
>>> src/python/location-importer.in | 110 ++++++++++++++++++++++++++++++++
>>> 1 file changed, 110 insertions(+)
>>>
>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>> index 1e08458..5be1d61 100644
>>> --- a/src/python/location-importer.in
>>> +++ b/src/python/location-importer.in
>>> @@ -19,6 +19,7 @@
>>>
>>>  import argparse
>>>  import ipaddress
>>> +import json
>>>  import logging
>>>  import math
>>>  import re
>>> @@ -931,6 +932,10 @@ class CLI(object):
>>>  				TRUNCATE TABLE network_overrides;
>>>  			""")
>>>
>>> +			# Update overrides for various cloud providers big enough to publish their own IP
>>> +			# network allocation lists in a machine-readable format...
>>> +			self._update_overrides_for_aws()
>>> +
>>>  			for file in ns.files:
>>>  				log.info("Reading %s..." % file)
>>>
>>> @@ -998,6 +1003,111 @@ class CLI(object):
>>>  						else:
>>>  							log.warning("Unsupported type: %s" % type)
>>>
>>> +	def _update_overrides_for_aws(self):
>>> +		# Download Amazon AWS IP allocation file to create overrides...
>>> +		downloader = location.importer.Downloader()
>>> +
>>> +		try:
>>> +			with downloader.request("https://ip-ranges.amazonaws.com/ip-ranges.json", return_blocks=False) as f:
>>> +				aws_ip_dump = json.load(f.body)
>>> +		except Exception as e:
>>> +			log.error("unable to preprocess Amazon AWS IP ranges: %s" % e)
>>> +			return
>>> +
>>> +		# XXX: Set up a dictionary for mapping a region name to a country.
>>> +		# Unfortunately,
>>> +		# there seems to be no machine-readable version available of this other than
>>> +		# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
>>> +		# (worse, it seems to be incomplete :-/ ); https://www.cloudping.cloud/endpoints
>>> +		# was helpful here as well.
>>> +		aws_region_country_map = {
>>> +			"af-south-1": "ZA",
>>> +			"ap-east-1": "HK",
>>> +			"ap-south-1": "IN",
>>> +			"ap-south-2": "IN",
>>> +			"ap-northeast-3": "JP",
>>> +			"ap-northeast-2": "KR",
>>> +			"ap-southeast-1": "SG",
>>> +			"ap-southeast-2": "AU",
>>> +			"ap-southeast-3": "MY",
>>> +			"ap-northeast-1": "JP",
>>> +			"ca-central-1": "CA",
>>> +			"eu-central-1": "DE",
>>> +			"eu-central-2": "CH",
>>> +			"eu-west-1": "IE",
>>> +			"eu-west-2": "GB",
>>> +			"eu-south-1": "IT",
>>> +			"eu-south-2": "ES",
>>> +			"eu-west-3": "FR",
>>> +			"eu-north-1": "SE",
>>> +			"me-south-1": "BH",
>>> +			"sa-east-1": "BR"
>>> +		}
>>> +
>>> +		# Fetch all valid country codes to check parsed networks aganist...
>>> +		rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
>>> +		validcountries = []
>>> +
>>> +		for row in rows:
>>> +			validcountries.append(row.country_code)
>>> +
>>> +		with self.db.transaction():
>>> +			for snetwork in aws_ip_dump["prefixes"] + aws_ip_dump["ipv6_prefixes"]:
>>> +				try:
>>> +					network = ipaddress.ip_network(snetwork.get("ip_prefix") or snetwork.get("ipv6_prefix"), strict=False)
>>> +				except ValueError:
>>> +					log.warning("Unable to parse line: %s" % snetwork)
>>> +					continue
>>> +
>>> +				# Sanitize parsed networks...
>>> +				if not self._check_parsed_network(network):
>>> +					continue
>>> +
>>> +				# Determine region of this network...
>>> +				region = snetwork["region"]
>>> +				cc = None
>>> +				is_anycast = False
>>> +
>>> +				# Any region name starting with "us-" will get "US" country code assigned straight away...
>>> +				if region.startswith("us-"):
>>> +					cc = "US"
>>> +				elif region.startswith("cn-"):
>>> +					# ...
>>> +					# same goes for China ...
>>> +					cc = "CN"
>>> +				elif region == "GLOBAL":
>>> +					# ... funny region name for anycast-like networks ...
>>> +					is_anycast = True
>>> +				elif region in aws_region_country_map:
>>> +					# ... assign looked up country code otherwise ...
>>> +					cc = aws_region_country_map[region]
>>> +				else:
>>> +					# ... and bail out if we are missing something here
>>> +					log.warning("Unable to determine country code for line: %s" % snetwork)
>>> +					continue
>>> +
>>> +				# Skip networks with unknown country codes
>>> +				if not is_anycast and validcountries and cc not in validcountries:
>>> +					log.warning("Skipping Amazon AWS network with bogus country '%s': %s" % \
>>> +						(cc, network))
>>> +					return
>>> +
>>> +				# Conduct SQL statement...
>>> +				self.db.execute("""
>>> +					INSERT INTO network_overrides(
>>> +						network,
>>> +						country,
>>> +						is_anonymous_proxy,
>>> +						is_satellite_provider,
>>> +						is_anycast
>>> +					) VALUES (%s, %s, %s, %s, %s)
>>> +					ON CONFLICT (network) DO NOTHING""",
>>> +					"%s" % network,
>>> +					cc,
>>> +					None,
>>> +					None,
>>> +					is_anycast,
>>> +				)
>>> +
>>> +
>>>  	@staticmethod
>>>  	def _parse_bool(block, key):
>>>  		val = block.get(key)
>>> --
>>> 2.26.2
>>

--===============0949522727001414182==--
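
A minimal standalone sketch of the region handling in the quoted patch, so it can be tried without the database or the downloader. It is not part of the posted patch. The function names classify_region() and resolve_prefixes(), the shortened region map, and the sample entries (documentation prefixes, not real AWS ranges) are assumptions made for illustration. The sketch skips an unusable entry and carries on with the next one, whereas the patch as posted uses `return` at the bogus-country check, which would end the import at that network.

```python
import ipaddress

# Illustrative subset of the region map from the patch; not a complete list.
REGION_COUNTRY_MAP = {
    "af-south-1": "ZA",
    "eu-central-1": "DE",
    "eu-west-1": "IE",
    "sa-east-1": "BR",
}


def classify_region(region):
    """Return (country_code, is_anycast), or None if the region is unknown."""
    if region.startswith("us-"):
        return "US", False
    if region.startswith("cn-"):
        return "CN", False
    if region == "GLOBAL":
        # Anycast-like networks carry no country code.
        return None, True
    if region in REGION_COUNTRY_MAP:
        return REGION_COUNTRY_MAP[region], False
    return None


def resolve_prefixes(entries, valid_countries):
    """Yield (network, country_code, is_anycast) for every usable feed entry.

    Unusable entries are skipped one at a time, so a single bad entry does
    not stop the remaining ones from being processed.
    """
    for entry in entries:
        prefix = entry.get("ip_prefix") or entry.get("ipv6_prefix")
        if not prefix:
            continue
        try:
            network = ipaddress.ip_network(prefix, strict=False)
        except ValueError:
            continue

        result = classify_region(entry.get("region", ""))
        if result is None:
            continue

        cc, is_anycast = result
        if not is_anycast and cc not in valid_countries:
            continue

        yield network, cc, is_anycast


# Hypothetical feed entries using documentation prefixes.
sample_entries = [
    {"ip_prefix": "192.0.2.0/24", "region": "eu-west-1"},
    {"ip_prefix": "not-a-network", "region": "eu-west-1"},
    {"ip_prefix": "198.51.100.0/24", "region": "xx-unknown-1"},
    {"ip_prefix": "198.51.100.0/24", "region": "sa-east-1"},
    {"ipv6_prefix": "2001:db8::/32", "region": "us-east-1"},
    {"ip_prefix": "203.0.113.0/24", "region": "GLOBAL"},
]

resolved = list(resolve_prefixes(sample_entries, valid_countries={"IE", "US"}))
for network, cc, is_anycast in resolved:
    print(network, cc, is_anycast)
```

Returning None from classify_region() for unknown regions keeps the "no country, but anycast" case distinct from the "nothing known" case, which is the distinction the patch makes with its is_anycast flag.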