From: Michael Tremer
To: location@lists.ipfire.org
Subject: Re: [PATCH] location-importer.in: import additional IP information for Amazon AWS IP networks
Date: Tue, 18 May 2021 11:43:22 +0100
Message-ID:
In-Reply-To: <983baf3b-59e9-88e3-89c3-0f1dca3e4a9e@ipfire.org>

Hello,

> On 14 May 2021, at 17:22, Peter Müller wrote:
>
> Hello Michael,
>
> thanks for your answer. I guess I'll reply better late than never...
>
>> Hello,
>>
>>> On 12 Apr 2021, at 18:48, Peter Müller wrote:
>>>
>>> Hello Michael,
>>>
>>> thanks for your reply.
>>>
>>> Frankly, the longer I think about this patch's approach, the more I become unhappy with it:
>>
>> Oh no. Don't overthink it :)
>
> Déformation professionnelle. Sorry.
>
>>
>>> (a) We are processing the Amazon AWS IP range feed overcredulously: It comes over an HTTPS connection,
>>> without being digitally signed in any way - at least _I_ don't trust PKI, and should probably finally
>>> write that blog post about it planned for quite some time now :-/ - and from a CDN. ip-ranges.amazonaws.com
>>> is not even DNSSEC-signed, not to mention DANE for their web service.
>>
>> There would be no other way for us to authenticate this data. We do exactly the same with data from the RIRs.
>
> Well, we could at least rely on RIR data being signed. Amazon did not bless us with that.

Again, nobody does. Maybe this is something you should raise at the next RIR meeting :)

>>
>>> Worse, my patch lacks additional safeguards. At the moment, the feed's content is only checked for
>>> prefixes that are too big or too small, anything not globally routable, and similar oddities.
>>> Amazon, however, must not publish any information regarding IP space they do not own - and if they do,
>>> we should not process it.
>>
>> Do we not automatically filter those out later? Should we apply the same DELETE FROM … statements to the
>> overrides table that we apply to the imported RIR data?
>>
>> https://git.ipfire.org/?p=location/libloc.git;a=blob;f=src/python/location-importer.in;h=1e08458223bad810d133c2f08703c7b3ee84fc72;hb=HEAD#l744
>
> No, that DELETE FROM statement block covers announcements, not network objects parsed from RIRs.

I know, and I was suggesting to run it on networks, too.

>>> While this does not eliminate the possible attack of somebody tampering with their feed on their server(s),
>>> the CDN, or anywhere in between, it would prevent a hostile actor from abusing that feed to arbitrarily spoof
>>> the contents of a libloc database generated by us.
>>>
>>> Unfortunately, I have no elegant idea how to do this at the moment. A most basic approach would consist of
>>> rejecting any network not announced by ASNs we know to be owned or maintained by Amazon - I am not sure how
>>> volatile that list would be.
>>
>> Probably single networks won't be moved at all, but at the size of AWS I assume that new networks are added very often.
>
> Possibly, but hopefully not new Autonomous Systems. Restricting Amazon to those would, however, mean an additional *.txt
> file, but I am fine with that.
>
>>
>>> Only accepting information for networks whose RIR data proves ownership or maintenance by Amazon would be a
>>> more thorough approach, though. However, that involves bulk queries to the Whois, as a decent chunk of their
>>> IP space is assigned by ARIN. In case of RIPE et al., we might parse our way through the databases we already
>>> have, but this is laborious, and we have no routines for enumerating maintainer data yet.
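Just so we are talking about the same thing: the kind of sanitation mentioned above boils down to a plausibility check like the following - a rough standalone sketch with made-up prefix-length limits, not the actual _check_parsed_network() implementation from location-importer.in:

```python
import ipaddress

def plausible_network(candidate):
	# Rough standalone sketch with made-up prefix-length limits;
	# not the actual _check_parsed_network() implementation.
	try:
		network = ipaddress.ip_network(candidate, strict=False)
	except ValueError:
		return False

	# Anything that is not globally routable is useless for a location database
	if not network.is_global:
		return False

	# Reject suspiciously large or small prefixes
	if network.version == 4 and not 8 <= network.prefixlen <= 29:
		return False
	if network.version == 6 and not 19 <= network.prefixlen <= 64:
		return False

	return True

print(plausible_network("52.94.76.0/22"))   # True - a public AWS prefix
print(plausible_network("192.168.0.0/16"))  # False - not globally routable
print(plausible_network("10.0.0.0/7"))      # False - RFC 1918 space, far too large
```

Running checks like these over everything we copy into the overrides table would at least keep obvious garbage out, regardless of which feed it came from.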
>>
>> That would be a rather complicated process and I am not sure it is worth it.
>
> Probably not, thanks to heavy rate limits on their Whois servers.
>
>>
>> IP address space that has been acquired and is transitioning to AWS might not show up as owned by the right
>> entity/entities, and we might reject it. We simply cannot check this automatically, just as we cannot verify
>> the claimed owner of any other IP network.
>>
>>> (b) I honestly dislike intransparent changes here. Since we fill the override SQL table on demand every time,
>>> retracing the content of generated location databases will be quite tricky if it did not originate from our
>>> own override files.
>>
>> I am a little bit unhappy with this as well. The overrides table also takes precedence. That is why I would
>> have expected this in the networks table.
>>
>> In a way, the RIRs are not transparent to us either: we just import their data, do something with it, and put
>> it into our database. AWS is just another source of data, like the RIRs.
>>
>> Although it isn't perfect, I could live a lot better with this solution.
>
> Me too. However, I would like to have a "source" column in the networks table then, so we could at least filter those
> networks out easily, if we want or need to.

Agreed. Sadly we won't be able to have this in the text dump of the database that we commit to the Git repository, which makes debugging more complicated.

>>
>>> On the other hand, we do not store the contents of the downloaded RIR databases, either. Simply dumping the
>>> Amazon AWS IP range feed into our Git repository would solve the transparency issue, but would result in
>>> unnecessary bloat - unless we really need it someday.
>>>
>>> Do you have a particular idea in mind for how to solve this issue?
>>
>> See above.
>>
>>> Regarding (a), the RIRs' FTP server FQDNs are at least DNSSEC-signed, but we do not enforce this.
>>> While I vaguely remember having seen signatures for the RIPE database, we currently do not validate it,
>>> either. Although this would increase complexity and affect performance when generating a database at our
>>> end, I would propose to do so whenever possible. Thoughts?
>>
>> Yes, we *should* do this, but I currently do not have any free time to work on it. I would be happy to
>> support you on this.
>
> I see, this will be the next item on my to-do list then...
>
> Thanks, and best regards,
> Peter Müller
>
>>
>>> Sorry for this lengthy and not very optimistic answer. If you ask me, you'll always get the worst-case scenario. :-)
>>>
>>> After all, we are doing security here...
>>
>> :)
>>
>> -Michael
>>
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>
>>>> Hello Peter,
>>>>
>>>> Thanks for this, I guess this would affect quite a few people out there…
>>>>
>>>> However, is it a good idea to use the overrides table for this? Should that not be reserved for the pure overrides?
>>>>
>>>> There is no way to view these changes. Is that something we can live with?
>>>>
>>>> -Michael
>>>>
>>>>> On 10 Apr 2021, at 13:28, Peter Müller wrote:
>>>>>
>>>>> Amazon publishes information regarding some of their IP networks
>>>>> primarily used for AWS cloud services in a machine-readable format. To
>>>>> improve libloc lookup results for these, we have little choice other
>>>>> than importing and parsing them.
>>>>>
>>>>> Unfortunately, there seems to be no machine-readable list of the
>>>>> locations of their data centers or availability zones available. If
>>>>> there _is_ any, please let the author know.
>>>>>
>>>>> Fixes: #12594
>>>>>
>>>>> Signed-off-by: Peter Müller
>>>>> ---
>>>>>  src/python/location-importer.in | 110 ++++++++++++++++++++++++++++++++
>>>>>  1 file changed, 110 insertions(+)
>>>>>
>>>>> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
>>>>> index 1e08458..5be1d61 100644
>>>>> --- a/src/python/location-importer.in
>>>>> +++ b/src/python/location-importer.in
>>>>> @@ -19,6 +19,7 @@
>>>>>
>>>>>  import argparse
>>>>>  import ipaddress
>>>>> +import json
>>>>>  import logging
>>>>>  import math
>>>>>  import re
>>>>> @@ -931,6 +932,10 @@ class CLI(object):
>>>>>  				TRUNCATE TABLE network_overrides;
>>>>>  			""")
>>>>>
>>>>> +			# Update overrides for various cloud providers big enough to publish their own IP
>>>>> +			# network allocation lists in a machine-readable format...
>>>>> +			self._update_overrides_for_aws()
>>>>> +
>>>>>  			for file in ns.files:
>>>>>  				log.info("Reading %s..." % file)
>>>>>
>>>>> @@ -998,6 +1003,111 @@ class CLI(object):
>>>>>  			else:
>>>>>  				log.warning("Unsupported type: %s" % type)
>>>>>
>>>>> +	def _update_overrides_for_aws(self):
>>>>> +		# Download Amazon AWS IP allocation file to create overrides...
>>>>> +		downloader = location.importer.Downloader()
>>>>> +
>>>>> +		try:
>>>>> +			with downloader.request("https://ip-ranges.amazonaws.com/ip-ranges.json", return_blocks=False) as f:
>>>>> +				aws_ip_dump = json.load(f.body)
>>>>> +		except Exception as e:
>>>>> +			log.error("unable to preprocess Amazon AWS IP ranges: %s" % e)
>>>>> +			return
>>>>> +
>>>>> +		# XXX: Set up a dictionary for mapping a region name to a country. Unfortunately,
>>>>> +		# there seems to be no machine-readable version of this available other than
>>>>> +		# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html
>>>>> +		# (worse, it seems to be incomplete :-/ ); https://www.cloudping.cloud/endpoints
>>>>> +		# was helpful here as well.
>>>>> +		aws_region_country_map = {
>>>>> +			"af-south-1": "ZA",
>>>>> +			"ap-east-1": "HK",
>>>>> +			"ap-south-1": "IN",
>>>>> +			"ap-south-2": "IN",
>>>>> +			"ap-northeast-3": "JP",
>>>>> +			"ap-northeast-2": "KR",
>>>>> +			"ap-southeast-1": "SG",
>>>>> +			"ap-southeast-2": "AU",
>>>>> +			"ap-southeast-3": "MY",
>>>>> +			"ap-northeast-1": "JP",
>>>>> +			"ca-central-1": "CA",
>>>>> +			"eu-central-1": "DE",
>>>>> +			"eu-central-2": "CH",
>>>>> +			"eu-west-1": "IE",
>>>>> +			"eu-west-2": "GB",
>>>>> +			"eu-south-1": "IT",
>>>>> +			"eu-south-2": "ES",
>>>>> +			"eu-west-3": "FR",
>>>>> +			"eu-north-1": "SE",
>>>>> +			"me-south-1": "BH",
>>>>> +			"sa-east-1": "BR"
>>>>> +		}
>>>>> +
>>>>> +		# Fetch all valid country codes to check parsed networks against...
>>>>> +		rows = self.db.query("SELECT * FROM countries ORDER BY country_code")
>>>>> +		validcountries = []
>>>>> +
>>>>> +		for row in rows:
>>>>> +			validcountries.append(row.country_code)
>>>>> +
>>>>> +		with self.db.transaction():
>>>>> +			for snetwork in aws_ip_dump["prefixes"] + aws_ip_dump["ipv6_prefixes"]:
>>>>> +				try:
>>>>> +					network = ipaddress.ip_network(snetwork.get("ip_prefix") or snetwork.get("ipv6_prefix"), strict=False)
>>>>> +				except ValueError:
>>>>> +					log.warning("Unable to parse line: %s" % snetwork)
>>>>> +					continue
>>>>> +
>>>>> +				# Sanitize parsed networks...
>>>>> +				if not self._check_parsed_network(network):
>>>>> +					continue
>>>>> +
>>>>> +				# Determine region of this network...
>>>>> +				region = snetwork["region"]
>>>>> +				cc = None
>>>>> +				is_anycast = False
>>>>> +
>>>>> +				# Any region name starting with "us-" will get the "US" country code assigned straight away...
>>>>> +				if region.startswith("us-"):
>>>>> +					cc = "US"
>>>>> +				elif region.startswith("cn-"):
>>>>> +					# ... same goes for China ...
>>>>> +					cc = "CN"
>>>>> +				elif region == "GLOBAL":
>>>>> +					# ... funny region name for anycast-like networks ...
>>>>> +					is_anycast = True
>>>>> +				elif region in aws_region_country_map:
>>>>> +					# ... assign looked-up country code otherwise ...
>>>>> +					cc = aws_region_country_map[region]
>>>>> +				else:
>>>>> +					# ... and bail out if we are missing something here
>>>>> +					log.warning("Unable to determine country code for line: %s" % snetwork)
>>>>> +					continue
>>>>> +
>>>>> +				# Skip networks with unknown country codes
>>>>> +				if not is_anycast and validcountries and cc not in validcountries:
>>>>> +					log.warning("Skipping Amazon AWS network with bogus country '%s': %s" % \
>>>>> +						(cc, network))
>>>>> +					continue
>>>>> +
>>>>> +				# Conduct SQL statement...
>>>>> +				self.db.execute("""
>>>>> +					INSERT INTO network_overrides(
>>>>> +						network,
>>>>> +						country,
>>>>> +						is_anonymous_proxy,
>>>>> +						is_satellite_provider,
>>>>> +						is_anycast
>>>>> +					) VALUES (%s, %s, %s, %s, %s)
>>>>> +					ON CONFLICT (network) DO NOTHING""",
>>>>> +					"%s" % network,
>>>>> +					cc,
>>>>> +					None,
>>>>> +					None,
>>>>> +					is_anycast,
>>>>> +				)
>>>>> +
>>>>> +
>>>>>  	@staticmethod
>>>>>  	def _parse_bool(block, key):
>>>>>  		val = block.get(key)
>>>>> --
>>>>> 2.26.2
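For anyone skimming this thread later: the region handling in the patch boils down to the following - restated as a standalone sketch with a trimmed-down region map (the function name and the map contents here are illustrative; the real code is _update_overrides_for_aws() above):

```python
# Trimmed-down illustration of the patch's region-to-country mapping;
# the full map in the patch covers all non-US, non-CN regions.
AWS_REGION_COUNTRY_MAP = {
	"eu-central-1": "DE",
	"eu-west-2": "GB",
	"sa-east-1": "BR",
}

def classify_region(region):
	"""Return (country_code, is_anycast) for an AWS region name."""
	if region.startswith("us-"):
		return "US", False
	if region.startswith("cn-"):
		return "CN", False
	if region == "GLOBAL":
		# Anycast-like networks carry no usable country information
		return None, True
	if region in AWS_REGION_COUNTRY_MAP:
		return AWS_REGION_COUNTRY_MAP[region], False
	# Unknown region: the caller should log a warning and skip the prefix
	return None, False

print(classify_region("us-east-1"))    # ('US', False)
print(classify_region("GLOBAL"))       # (None, True)
print(classify_region("eu-central-1")) # ('DE', False)
```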