From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Tremer To: location@lists.ipfire.org Subject: Re: [PATCH v2 3/3] importer: Purge any redundant entries Date: Thu, 24 Sep 2020 11:22:18 +0100 Message-ID: In-Reply-To: <8a59a82a-ba19-d1d2-0a1d-87fc72195f16@ipfire.org> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="===============7929578681117900009==" List-Id: --===============7929578681117900009== Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hello, I had to revert this patch and the previous one. For some reason, it is impossible to generate a new database although I made = more processor cores and memory available to the system. Please review this and try to find another approach that has less of an impac= t on the database system and that enables us to run a database update within = about an hour. Best, -Michael > On 20 Sep 2020, at 20:21, Peter M=C3=BCller wr= ote: >=20 > When importing inetnums, we might import various small networks > which are not relevant for us as long as they do not have a > different country code than their parent network. >=20 > Therefore we delete all these entries to keep the database > smaller without losing any information. The second version of this > patch introduces a SQL statement parallelised across all CPUs > available, while the DELETE-statement of the first version literally > took ages to complete. >=20 > However, cleaning up those data still takes about 26 hours (!) on > our location02 testing machine, making daily updates of the location > database impossible to the current knowledge. >=20 > real 1521m30.620s > user 38m45.521s > sys 9m6.027s >=20 > Special thanks goes to Michael for spending numerous hours > on this, setting up a testing environment, doing PostgreSQL magic > and providing helpful advice while debugging. >=20 > Partially fixes: #12458 >=20 > Cc: Michael Tremer > Signed-off-by: Peter M=C3=BCller > --- > src/python/location-importer.in | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) >=20 > diff --git a/src/python/location-importer.in b/src/python/location-importer= .in > index e3a07a0..1467923 100644 > --- a/src/python/location-importer.in > +++ b/src/python/location-importer.in > @@ -374,7 +374,27 @@ class CLI(object): > INSERT INTO autnums(number, name) > SELECT _autnums.number, _organizations.name FROM _autnums > JOIN _organizations ON _autnums.organization =3D _organizations.handle > - ON CONFLICT (number) DO UPDATE SET name =3D excluded.name; > + ON CONFLICT (number) DO UPDATE SET name =3D excluded.name > + """) > + > + self.db.execute(""" > + --- Purge any redundant entries > + CREATE TEMPORARY TABLE _garbage ON COMMIT DROP > + AS > + SELECT network FROM networks candidates > + WHERE EXISTS ( > + SELECT FROM networks > + WHERE > + networks.network << candidates.network > + AND > + networks.country =3D candidates.country > + ); > + > + CREATE UNIQUE INDEX _garbage_search ON _garbage USING BTREE(network); > + > + DELETE FROM networks WHERE EXISTS ( > + SELECT FROM _garbage WHERE networks.network =3D _garbage.network > + ); > """) >=20 > # Download all extended sources > --=20 > 2.26.2 --===============7929578681117900009==--