public inbox for location@lists.ipfire.org
 help / color / mirror / Atom feed
From: Michael Tremer <michael.tremer@ipfire.org>
To: location@lists.ipfire.org
Subject: Re: [PATCH v2 3/3] importer: Purge any redundant entries
Date: Thu, 24 Sep 2020 11:22:18 +0100	[thread overview]
Message-ID: <FD74922B-981B-439C-9081-D45ABB5C0885@ipfire.org> (raw)
In-Reply-To: <8a59a82a-ba19-d1d2-0a1d-87fc72195f16@ipfire.org>

[-- Attachment #1: Type: text/plain, Size: 2903 bytes --]

Hello,

I had to revert this patch and the previous one.

For some reason, it is impossible to generate a new database although I made more processor cores and memory available to the system.

Please review this and try to find another approach that has less of an impact on the database system and that enables us to run a database update within about an hour.

Best,
-Michael

> On 20 Sep 2020, at 20:21, Peter Müller <peter.mueller(a)ipfire.org> wrote:
> 
> When importing inetnums, we might import various small networks
> which are not relevant for us as long as they do not have a
> different country code than their parent network.
> 
> Therefore we delete all these entries to keep the database
> smaller without losing any information. The second version of this
> patch introduces a SQL statement parallelised across all CPUs
> available, while the DELETE-statement of the first version literally
> took ages to complete.
> 
> However, cleaning up those data still takes about 26 hours (!) on
> our location02 testing machine, making daily updates of the location
> database impossible to the current knowledge.
> 
> real    1521m30.620s
> user    38m45.521s
> sys     9m6.027s
> 
> Special thanks goes to Michael for spending numerous hours
> on this, setting up a testing environment, doing PostgreSQL magic
> and providing helpful advice while debugging.
> 
> Partially fixes: #12458
> 
> Cc: Michael Tremer <michael.tremer(a)ipfire.org>
> Signed-off-by: Peter Müller <peter.mueller(a)ipfire.org>
> ---
> src/python/location-importer.in | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/src/python/location-importer.in b/src/python/location-importer.in
> index e3a07a0..1467923 100644
> --- a/src/python/location-importer.in
> +++ b/src/python/location-importer.in
> @@ -374,7 +374,27 @@ class CLI(object):
> 				INSERT INTO autnums(number, name)
> 					SELECT _autnums.number, _organizations.name FROM _autnums
> 						JOIN _organizations ON _autnums.organization = _organizations.handle
> -				ON CONFLICT (number) DO UPDATE SET name = excluded.name;
> +				ON CONFLICT (number) DO UPDATE SET name = excluded.name
> +			""")
> +
> +			self.db.execute("""
> +				--- Purge any redundant entries
> +				CREATE TEMPORARY TABLE _garbage ON COMMIT DROP
> +				AS
> +				SELECT network FROM networks candidates
> +				WHERE EXISTS (
> +					SELECT FROM networks
> +					WHERE
> +						networks.network << candidates.network
> +					AND
> +						networks.country = candidates.country
> +				);
> +
> +				CREATE UNIQUE INDEX _garbage_search ON _garbage USING BTREE(network);
> +
> +				DELETE FROM networks WHERE EXISTS (
> +					SELECT FROM _garbage WHERE networks.network = _garbage.network
> +				);
> 			""")
> 
> 		# Download all extended sources
> -- 
> 2.26.2


      reply	other threads:[~2020-09-24 10:22 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-20 19:19 [PATCH v2 1/3] location-importer.in: avoid violating NOT NULL constraints during JOIN Peter Müller
2020-09-20 19:20 ` [PATCH v2 2/3] importer: Import raw sources for inetnum's again Peter Müller
2020-09-20 19:21   ` [PATCH v2 3/3] importer: Purge any redundant entries Peter Müller
2020-09-24 10:22     ` Michael Tremer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FD74922B-981B-439C-9081-D45ABB5C0885@ipfire.org \
    --to=michael.tremer@ipfire.org \
    --cc=location@lists.ipfire.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox