From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Müller
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: import additional IP information for Amazon AWS IP networks
Date: Fri, 14 May 2021 18:22:07 +0200
Message-ID: <983baf3b-59e9-88e3-89c3-0f1dca3e4a9e@ipfire.org>
In-Reply-To: <4EEE5EFF-E912-4421-AC01-2B3613373CF5@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4353919803484477676=="
List-Id:

--===============4353919803484477676==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Michael,

thanks for your answer. I guess I'll reply better late than never...

> Hello,
>
>> On 12 Apr 2021, at 18:48, Peter Müller wrote:
>>
>> Hello Michael,
>>
>> thanks for your reply.
>>
>> Frankly, the longer I think about this patch's approach, the more I become unhappy with it:
>
> Oh no. Don’t overthink it :)

Déformation professionnelle. Sorry.

>
>> (a) We are processing the Amazon AWS IP range feed too credulously: it comes without being digitally signed
>>     in any way over an HTTPS connection - at least _I_ don't trust PKI, and should probably finally write that
>>     blog post about it planned for quite some time now :-/ - from a CDN. ip-ranges.amazonaws.com is not even
>>     DNSSEC-signed, not to mention DANE for their web service.
>
> There is no other way for us to authenticate this data. We do exactly the same with data from the RIRs.

Well, we at least could rely on RIR data being signed. Amazon did not bless us with that.

>
>>     Worse, my patch lacks additional safeguards. At the moment, the feed's content is only checked for prefixes
>>     that are too big or too small, for anything not globally routable, and for similar oddities. Amazon, however,
>>     must not publish any information regarding IP space they do not own - and if they do, we should not process it.
>
> Do we not automatically filter those out later?
> Should we apply the same DELETE FROM … statements to the overrides table that we apply to the imported RIR data?
>
> https://git.ipfire.org/?p=location/libloc.git;a=blob;f=src/python/location-importer.in;h=1e08458223bad810d133c2f08703c7b3ee84fc72;hb=HEAD#l744

No, that DELETE FROM statement block covers announcements, not network objects parsed from RIRs.

>
>>     While this does not eliminate the possible attack of somebody tampering with their feed on their server(s),
>>     the CDN, or anywhere in between, it would prevent a hostile actor from abusing that feed for arbitrarily spoofing
>>     the contents of a libloc database generated by us.
>>
>>     Unfortunately, I have no elegant idea how to do this at the moment. The most basic approach would consist of
>>     rejecting any network not announced by ASNs we know are owned or maintained by Amazon - not sure how volatile
>>     this list would be.
>
> Probably single networks won’t be moved at all, but at the size of AWS I assume that new networks are added very often.

Possibly, but hopefully not new Autonomous Systems. Restricting Amazon to those would, however, mean an additional *.txt file, but I am fine with that.

>
>>     Only accepting information for networks whose RIR data prove ownership or maintenance by Amazon would be a
>>     more thorough approach, though. However, that involves bulk queries to the Whois, as a decent chunk of their
>>     IP space is assigned by ARIN. In case of RIPE et al., we might parse our way through the databases we already
>>     have, but this is laborious, and we have no routines for enumerating maintainer data yet.
>
> That would be a rather complicated process and I am not sure if it is worth it.

Probably not, thanks to heavy rate limits on their Whois servers.

>
> IP address space that has been acquired and is transitioning to AWS might not show up as owned by the right entity/entities and we might reject it.
> We simply cannot check this automatically, just as we cannot check that any other IP network is owned by whoever it says.
>
>> (b) I honestly dislike non-transparent changes here. Since we fill the override SQL table on demand every time,
>>     retracing content of generated location databases will be quite tricky if they did not originate from our own
>>     override files.
>
> I am a little bit unhappy with this as well. The overrides table also takes precedence. That is why I would have expected this in the networks table.
>
> In a way, the RIRs are not transparent to us and we just import their data, do something with it and put it into our database. AWS is just another source of data just like the RIRs.
>
> Although it isn’t perfect, I could live a lot better with this solution.

Me too. However, I would like to have a "source" column in the networks table then, so we could at least filter those networks out easily, if we want or need to.

>
>>     On the other hand, we do not store the contents of the RIR databases downloaded, either. Simply dumping the
>>     Amazon AWS IP range feed into our Git repository would solve the transparency issue, but results in unnecessary
>>     bloat - unless we really need it someday.
>>
>>     Do you have a particular idea about how to solve this issue in mind?
>
> See above.
>
>> Regarding (a), the RIRs' FTP server FQDNs are at least DNSSEC-signed, but we do not enforce this. While I vaguely
>> remember having seen signatures for the RIPE database, we currently do not validate them, either. Although this
>> would increase complexity and affect performance when generating a database at our end, I would propose to do so
>> whenever possible. Thoughts?
>
> Yes, we *should* do this, but I currently do not have any free time to work on it. Would be happy to support you on this.

I see, this will be the next item on my to-do list then...

Thanks, and best regards,
Peter Müller

>
>> Sorry for this lengthy and not very optimistic answer. If you ask me, you'll always get the worst-case scenario. :-)
>>
>> After all, we are doing security here...
>
> :)
>
> -Michael
>
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hello Peter,
>>>
>>> Thanks for this, I guess this would affect quite a few people out there…
>>>
>>> However, is it a good idea to use the overrides table for this? Should that not be reserved for the pure overrides?
>>>
>>> There is no way to view these changes. Is that something we can live with?
>>>
>>> -Michael
>>>
>>>> On 10 Apr 2021, at 13:28, Peter Müller wrote:
>>>>
>>>> Amazon publishes information regarding some of their IP networks
>>>> primarily used for AWS cloud services in a machine-readable format. To
>>>> improve libloc lookup results for these, we have little choice other
>>>> than importing and parsing them.
>>>>
>>>> Unfortunately, there seems to be no machine-readable list of the
>>>> locations of their data centers or availability zones available. If
>>>> there _is_ any, please let the author know.
>>>>
>>>> Fixes: #12594
>>>>
>>>> Signed-off-by: Peter Müller
>>>> ---
>>>> src/python/location-importer.in | 110 ++++++++++++++++++++++++++++++++
>>>> 1 file changed, 110 insertions(+)
>>>>
>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>> index 1e08458..5be1d61 100644
>>>> --- a/src/python/location-importer.in
>>>> +++ b/src/python/location-importer.in
>>>> @@ -19,6 +19,7 @@
>>>>
>>>>  import argparse
>>>>  import ipaddress
>>>> +import json
>>>>  import logging
>>>>  import math
>>>>  import re
>>>> @@ -931,6 +932,10 @@ class CLI(object):
>>>>  				TRUNCATE TABLE network_overrides;
>>>>  			""")
>>>>
>>>> +			# Update overrides for various cloud providers big enough to publish their own IP
>>>> +			# network allocation lists in a machine-readable format...
>>>> +			self._update_overrides_for_aws()
>>>> +
>>>>  			for file in ns.files:
>>>>  				log.info("Reading %s..." % file)
>>>>
>>>> @@ -998,6 +1003,111 @@ class CLI(object):
>>>>  						else:
>>>>  							log.warning("Unsupported type: %s" % type)
>>>>
>>>> +	def _update_overrides_for_aws(self):
>>>> +		# Download Amazon AWS IP allocation file to create overrides...
>>>> +		downloader = location.importer.Downloader()
>>>> +
>>>> +		try:
>>>> +			with downloader.request("https://ip-ranges.amazonaws.com/ip-ranges.json", return_blocks=False) as f:
>>>> +				aws_ip_dump = json.load(f.body)
>>>> +		except Exception as e:
>>>> +			log.error("unable to preprocess Amazon AWS IP ranges: %s" % e)
>>>> +			return
>>>> +
>>>> +		# XXX: Set up a dictionary for mapping a region name to a country. Unfortunately,
>>>> +		# there seems to be no machine-readable version available of this other than
>>>> +		# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
>>>> +		# (worse, it seems to be incomplete :-/ ); https://www.cloudping.cloud/endpoints
>>>> +		# was helpful here as well.
>>>> +		aws_region_country_map = {
>>>> +			"af-south-1": "ZA",
>>>> +			"ap-east-1": "HK",
>>>> +			"ap-south-1": "IN",
>>>> +			"ap-south-2": "IN",
>>>> +			"ap-northeast-3": "JP",
>>>> +			"ap-northeast-2": "KR",
>>>> +			"ap-southeast-1": "SG",
>>>> +			"ap-southeast-2": "AU",
>>>> +			"ap-southeast-3": "ID",
>>>> +			"ap-northeast-1": "JP",
>>>> +			"ca-central-1": "CA",
>>>> +			"eu-central-1": "DE",
>>>> +			"eu-central-2": "CH",
>>>> +			"eu-west-1": "IE",
>>>> +			"eu-west-2": "GB",
>>>> +			"eu-south-1": "IT",
>>>> +			"eu-south-2": "ES",
>>>> +			"eu-west-3": "FR",
>>>> +			"eu-north-1": "SE",
>>>> +			"me-south-1": "BH",
>>>> +			"sa-east-1": "BR"
>>>> +		}
>>>> +
>>>> +		# Fetch all valid country codes to check parsed networks against...
>>>> +		rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
>>>> +		validcountries = []
>>>> +
>>>> +		for row in rows:
>>>> +			validcountries.append(row.country_code)
>>>> +
>>>> +		with self.db.transaction():
>>>> +			for snetwork in aws_ip_dump["prefixes"] + aws_ip_dump["ipv6_prefixes"]:
>>>> +				try:
>>>> +					network = ipaddress.ip_network(snetwork.get("ip_prefix") or snetwork.get("ipv6_prefix"), strict=False)
>>>> +				except ValueError:
>>>> +					log.warning("Unable to parse line: %s" % snetwork)
>>>> +					continue
>>>> +
>>>> +				# Sanitize parsed networks...
>>>> +				if not self._check_parsed_network(network):
>>>> +					continue
>>>> +
>>>> +				# Determine region of this network...
>>>> +				region = snetwork["region"]
>>>> +				cc = None
>>>> +				is_anycast = False
>>>> +
>>>> +				# Any region name starting with "us-" will get "US" country code assigned straight away...
>>>> +				if region.startswith("us-"):
>>>> +					cc = "US"
>>>> +				elif region.startswith("cn-"):
>>>> +					# ... same goes for China ...
>>>> +					cc = "CN"
>>>> +				elif region == "GLOBAL":
>>>> +					# ... funny region name for anycast-like networks ...
>>>> +					is_anycast = True
>>>> +				elif region in aws_region_country_map:
>>>> +					# ... assign looked up country code otherwise ...
>>>> +					cc = aws_region_country_map[region]
>>>> +				else:
>>>> +					# ... and bail out if we are missing something here
>>>> +					log.warning("Unable to determine country code for line: %s" % snetwork)
>>>> +					continue
>>>> +
>>>> +				# Skip networks with unknown country codes
>>>> +				if not is_anycast and validcountries and cc not in validcountries:
>>>> +					log.warning("Skipping Amazon AWS network with bogus country '%s': %s" % \
>>>> +						(cc, network))
>>>> +					continue
>>>> +
>>>> +				# Conduct SQL statement...
>>>> +				self.db.execute("""
>>>> +					INSERT INTO network_overrides(
>>>> +						network,
>>>> +						country,
>>>> +						is_anonymous_proxy,
>>>> +						is_satellite_provider,
>>>> +						is_anycast
>>>> +					) VALUES (%s, %s, %s, %s, %s)
>>>> +					ON CONFLICT (network) DO NOTHING""",
>>>> +					"%s" % network,
>>>> +					cc,
>>>> +					None,
>>>> +					None,
>>>> +					is_anycast,
>>>> +				)
>>>> +
>>>> +
>>>>  	@staticmethod
>>>>  	def _parse_bool(block, key):
>>>>  		val = block.get(key)
>>>> --
>>>> 2.26.2
>>>
>

--===============4353919803484477676==--