When importing inetnums, we might import various small networks which are not relevant for us as long as they do not have a different country code than their parent network.
Therefore, we delete all of these entries to keep the database smaller without losing any information. The second version of this patch introduces an SQL statement that is parallelised across all available CPUs, whereas the DELETE statement of the first version took far too long to complete.
However, cleaning up this data still takes more than 25 hours (!) on our location02 testing machine, which, as far as we currently know, makes daily updates of the location database impossible:
real    1521m30.620s
user    38m45.521s
sys     9m6.027s
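To illustrate the redundancy rule described above, here is a minimal Python sketch. It is not the SQL from this patch: the function name `find_redundant` and the sample data are made up for illustration, "parent" is assumed to mean the nearest enclosing network, and the quadratic scan is only suitable for tiny inputs.

```python
import ipaddress


def find_redundant(networks):
    """Return the CIDR strings whose nearest enclosing network has the same country.

    `networks` maps CIDR strings to country codes (hypothetical input format).
    """
    parsed = {ipaddress.ip_network(cidr): cc for cidr, cc in networks.items()}
    redundant = set()

    for net, country in parsed.items():
        # All strictly larger networks of the same IP version that contain `net`
        parents = [
            other for other in parsed
            if other.version == net.version
            and other.prefixlen < net.prefixlen
            and net.subnet_of(other)
        ]
        if not parents:
            continue

        # Assumption: the parent is the enclosing network with the longest prefix
        parent = max(parents, key=lambda n: n.prefixlen)
        if parsed[parent] == country:
            redundant.add(str(net))

    return redundant


example = {
    "10.0.0.0/8": "DE",
    "10.1.0.0/16": "DE",  # same country as its parent: redundant
    "10.2.0.0/16": "FR",  # differs from its parent: kept
    "10.2.3.0/24": "DE",  # nearest parent is FR: kept
}
print(find_redundant(example))  # {'10.1.0.0/16'}
```

Comparing against the nearest parent rather than any enclosing network keeps 10.2.3.0/24 in this example, since dropping it would make lookups in that range fall back to the FR entry.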
Special thanks go to Michael for spending numerous hours on this, setting up a testing environment, doing PostgreSQL magic and providing helpful advice while debugging.
Partially fixes: #12458
Cc: Michael Tremer <michael.tremer@ipfire.org>
Signed-off-by: Peter Müller <peter.mueller@ipfire.org>
---
 src/python/location-importer.in | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/src/python/location-importer.in b/src/python/location-importer.in
index e3a07a0..1467923 100644
--- a/src/python/location-importer.in
+++ b/src/python/location-importer.in
@@ -374,7 +374,27 @@ class CLI(object):
 				INSERT INTO autnums(number, name)
 					SELECT _autnums.number, _organizations.name FROM _autnums
 						JOIN _organizations ON _autnums.organization = _organizations.handle
-				ON CONFLICT (number) DO UPDATE SET name = excluded.name;
+				ON CONFLICT (number) DO UPDATE SET name = excluded.name
+			""")
+
+			self.db.execute("""
+				--- Purge any redundant entries
+				CREATE TEMPORARY TABLE _garbage ON COMMIT DROP
+				AS
+				SELECT network FROM networks candidates
+				WHERE EXISTS (
+					SELECT FROM networks
+					WHERE
+						networks.network << candidates.network
+					AND
+						networks.country = candidates.country
+				);
+
+				CREATE UNIQUE INDEX _garbage_search ON _garbage USING BTREE(network);
+
+				DELETE FROM networks WHERE EXISTS (
+					SELECT FROM _garbage WHERE networks.network = _garbage.network
+				);
 			""")
 
 		# Download all extended sources