From: "Peter Müller" <peter.mueller@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Stale pakfire lock-file causing pakfire to no longer work
Date: Thu, 15 Sep 2022 07:39:03 +0000
Message-ID: <f098b7ca-3fa4-b914-85eb-2c6924618522@ipfire.org>
In-Reply-To: <f65f8c8ebced8570c7ee30c1a9eaa285b71b3894.camel@sicho.home>
Hello Robin,
thank you for your detailed e-mail.
Just to ensure I did not misunderstand or overlook anything: Is this bug a
show-stopper for the release of Core Update 170? I.e., does it prevent
(some) IPFire installations from conducting further Pakfire tasks?
Thanks, and best regards,
Peter Müller
> Hi all
>
> Since the introduction of the /tmp/pakfire_lock-file in pakfire, I have
> a problem with monitoring 'pakfire status' using Zabbix.
>
> Every 10 minutes, I execute "sudo /opt/pakfire/pakfire status" using
> the Zabbix Agent (which runs as user 'zabbix'); (this check was
> actually implemented by Alex back when he maintained the zabbix_agent
> addon)
> This works correctly for a while until pakfire suddenly refuses to
> start because /tmp/pakfire_lock is still present. But there is no (old)
> pakfire process active anymore and the lockfile is never cleared. I have
> to delete it manually to get pakfire working again for a while.
>
> Zabbix agent has a built-in timeout of 30s waiting for output of a
> called process; and if by then the process has not exited, it will get
> killed.
> At first I thought that that could be the problem, so I modified the
> check so that instead of Zabbix agent calling pakfire, it calls a
> custom script which in turn spawns a background process for pakfire,
> with the output redirected to zabbix_sender (a utility to send data
> directly to Zabbix, bypassing the agent). This way the agent won't kill the
> pakfire process as the custom script finishes almost instantly and the
> agent itself does not know of the spawned pakfire process.
> Then when the background pakfire process finishes, zabbix_sender just
> sends the output to Zabbix and this works without any timeout. So if
> pakfire were to hang, it would stay that way.
> But with this method I get the exact same result: it works
> correctly for a while until suddenly the lockfile is not cleared and
> pakfire won't start anymore.
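
To make the wrapper described above concrete, here is a rough Python sketch of the idea. It is not Robin's actual script: the function names are made up for illustration, and the server, host and item key passed to zabbix_sender are placeholders. It only shows the two halves, a launcher that returns at once and a worker that waits for the check without any timeout.

```python
import subprocess
from subprocess import DEVNULL


def sender_command(server, host, key, value):
    """Build the zabbix_sender argv; server, host and key are placeholders."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", value]


def run_and_forward(check_cmd, make_sender_cmd):
    """Worker: run the check to completion (no timeout), then forward its output."""
    result = subprocess.run(check_cmd, capture_output=True, text=True)
    forward = subprocess.run(make_sender_cmd(result.stdout.strip()))
    return forward.returncode


def spawn_detached(argv):
    """Launcher: start argv in its own session and return without waiting."""
    return subprocess.Popen(
        argv,
        stdin=DEVNULL,
        stdout=DEVNULL,
        stderr=DEVNULL,
        start_new_session=True,  # child is no longer in the caller's session
    )
```

The agent would call a script that uses spawn_detached() to start a second script, and that second script would call run_and_forward() with ["sudo", "/opt/pakfire/pakfire", "status"] as the check command. Because the launcher exits immediately, the agent's 30s timeout never applies to pakfire itself.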
>
> I have tried to reproduce this behaviour manually by killing pakfire
> aggressively while it is busy, and by executing pakfire many times in
> quick succession and in parallel. But I fail to reproduce it.
> So I have no idea why this behaviour happens when pakfire is called
> unattended by Zabbix.
>
> The only possible clue I found is this line in the agent logfile (when
> still using the 'normal' method of letting the agent call pakfire
> directly):
> failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not
> permitted
> which, according to some Chinese blogs I found, could be caused by sudo bug
> 447:
> https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
> https://bugzilla.sudo.ws/show_bug.cgi?id=447
> However, that bug should no longer be present in sudo 1.9 which is
> currently shipped with IPFire.
> Despite that, I currently do suspect sudo to be the culprit.
>
> So I would like to propose a change to pakfire and its permissions, to
> allow for a non-root user to execute pakfire, and then within pakfire
> itself, check if the current user is root or not, and allow
> informational commands like 'status' to be executed by a non-root user
> (all db files are world-readable anyway).
> This way, sudo is no longer required for Zabbix to call 'pakfire
> status'. Hoping this would fix the problem.
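
The check proposed above could look roughly like the following Python sketch. It is only an illustration and not Pakfire code: READ_ONLY_COMMANDS and may_run() are hypothetical names, and 'status' is the only informational command named in this thread, so the set contains nothing else.

```python
import os

# Hypothetical allow-list; 'status' is the only command mentioned in the mail.
READ_ONLY_COMMANDS = {"status"}


def may_run(command, euid=None):
    """Return True if the caller may run this command.

    root (effective UID 0) may run everything; any other user is limited
    to the informational commands in READ_ONLY_COMMANDS.
    """
    if euid is None:
        euid = os.geteuid()
    return euid == 0 or command in READ_ONLY_COMMANDS
```

With such a check in place, the 'zabbix' user could call 'pakfire status' directly, while every other command would still be refused unless run as root.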
>
> Alternatively we could record the pid of the current process during
> lock-file creation, and have a new pakfire process check if that pid
> still exists; if not, dump its own pid in the lockfile and continue
> work instead of bailing out. But I'm not sure how to implement this
> without again having a chance for some race conditions when multiple
> pakfire executions are performed in parallel.
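
A minimal Python sketch of the PID idea, to show where the race sits. This is an assumption-laden illustration, not Pakfire code: all function names are made up, and the temporary file name next to the lock is chosen for brevity only.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_pid(path):
    """Return the PID stored in the lock file, or None if it is unreadable."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def acquire_lock(path):
    """Return True if we now own the lock, False if a live process holds it."""
    # Write our PID to a private file first, then link it into place, so the
    # lock file never exists without a PID in it. A real implementation
    # should not use a predictable name in a world-writable directory.
    tmp = f"{path}.{os.getpid()}"
    with open(tmp, "w") as fh:
        fh.write(str(os.getpid()))
    try:
        for _ in range(2):  # the second pass runs only after removing a stale lock
            try:
                os.link(tmp, path)  # atomic; fails if the lock already exists
                return True
            except FileExistsError:
                owner = read_pid(path)
                if owner is not None and pid_alive(owner):
                    return False
                # Stale lock. The race is here: two processes can both reach
                # this point, and the slower one may remove the lock that the
                # faster one has just created.
                try:
                    os.unlink(path)
                except FileNotFoundError:
                    pass
        return False
    finally:
        os.unlink(tmp)
```

Creating the lock is atomic, but taking over a stale lock is not, which is the race mentioned above. A recycled PID can also make a dead owner look alive. An advisory lock such as flock(2), held for the lifetime of the process, would be released by the kernel when the process exits and might avoid both problems, but that is a different technique from the one proposed here.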
>
> Or does anyone have better ideas to (try to) fix this?
>
> Regards
> Robin
>