public inbox for development@lists.ipfire.org
* Stale pakfire lock-file causing pakfire to no longer work
@ 2022-09-14 19:48 Robin Roevens
  2022-09-15  7:39 ` Peter Müller
  2022-09-15 11:48 ` Bernhard Bitsch
  0 siblings, 2 replies; 10+ messages in thread
From: Robin Roevens @ 2022-09-14 19:48 UTC (permalink / raw)
  To: development


Hi all

Since the introduction of the /tmp/pakfire_lock file in pakfire, I have
had a problem with monitoring 'pakfire status' using Zabbix.

Every 10 minutes, the Zabbix agent (which runs as user 'zabbix')
executes "sudo /opt/pakfire/pakfire status". (This check was actually
implemented by Alex back when he maintained the zabbix_agent addon.)
This works correctly for a while, until pakfire suddenly refuses to
start because /tmp/pakfire_lock is still present. But no (old) pakfire
process is active anymore, and the lockfile is never cleared. I have
to delete it manually to get pakfire working again for a while.

The Zabbix agent has a built-in timeout of 30s when waiting for the
output of a called process; if the process has not exited by then, it
gets killed.
At first I thought that this could be the problem, so I modified the
check: instead of the Zabbix agent calling pakfire, it now calls a
custom script, which in turn spawns a background process for pakfire
with its output redirected to zabbix_sender (a utility that sends data
directly to Zabbix, bypassing the agent). This way the agent won't kill
the pakfire process, as the custom script finishes almost instantly and
the agent itself does not know about the spawned pakfire process.
When the background pakfire process finishes, zabbix_sender just sends
the output to Zabbix, and this works without any timeout. So if pakfire
were to hang, it would simply stay hung.
But with this method I get the exact same result: it works correctly
for a while, until suddenly the lockfile is not cleared and pakfire
won't start anymore.
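
For illustration, the wrapper approach described above could look
roughly like the following. This is only a sketch, not the script
actually in use: the item key 'pakfire.status', the agent config path
and both function names are assumptions made up for this example. It
relies on zabbix_sender's -c (config file), -k (item key) and -o (value)
options.

```python
import shlex
import subprocess

# Command taken from the mail; everything below it is hypothetical.
CHECK_CMD = ["sudo", "/opt/pakfire/pakfire", "status"]
ITEM_KEY = "pakfire.status"                    # assumed item key
AGENT_CONF = "/etc/zabbix/zabbix_agentd.conf"  # assumed config path


def build_worker_command(check_cmd, item_key, agent_conf):
    """Build a shell command that runs the check, then sends its output."""
    check = shlex.join(check_cmd)
    return (
        f"out=$({check} 2>&1); "
        f"zabbix_sender -c {shlex.quote(agent_conf)} "
        f"-k {shlex.quote(item_key)} -o \"$out\""
    )


def spawn_detached(shell_cmd):
    """Start shell_cmd in its own session and return without waiting."""
    return subprocess.Popen(
        ["sh", "-c", shell_cmd],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the agent's process group
    )


if __name__ == "__main__":
    # The wrapper exits immediately; the worker keeps running on its own.
    spawn_detached(build_worker_command(CHECK_CMD, ITEM_KEY, AGENT_CONF))
```

Because the worker runs in its own session, the agent's timeout only
ever applies to the short-lived wrapper.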

I have tried to reproduce this behaviour manually by aggressively
killing pakfire while it is busy, and by executing pakfire many times
in quick succession and in parallel, but I failed. So I have no idea
why this happens when pakfire is called unattended by Zabbix.

The only possible clue I found is this line in the agent logfile (from
when the agent was still calling pakfire directly, the 'normal'
method):
failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not permitted
which, according to some Chinese blogs I found, could be caused by sudo
bug 447:
https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
https://bugzilla.sudo.ws/show_bug.cgi?id=447
However, that bug should no longer be present in sudo 1.9, which is
currently shipped with IPFire.
Despite that, I do currently suspect sudo to be the culprit.

So I would like to propose a change to pakfire and its permissions, to
allow for a non-root user to execute pakfire, and then within pakfire
itself, check if the current user is root or not, and allow
informational commands like 'status' to be executed by a non-root user
(all db files are world-readable anyway).
This way, sudo would no longer be required for Zabbix to call 'pakfire
status'. I hope this would fix the problem.
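
To make the idea concrete, here is a minimal sketch of such a check.
It is not pakfire code; the names READ_ONLY_COMMANDS and may_run are
hypothetical, and which commands count as informational would still
have to be decided ('status' is the only one taken from this mail).

```python
import os
import sys

# Commands considered safe for non-root users (assumption: only 'status').
READ_ONLY_COMMANDS = {"status"}


def may_run(command, euid):
    """Root may run everything; other users only informational commands."""
    return euid == 0 or command in READ_ONLY_COMMANDS


def main(argv):
    command = argv[1] if len(argv) > 1 else ""
    if not may_run(command, os.geteuid()):
        print(f"pakfire: '{command}' must be run as root", file=sys.stderr)
        return 1
    # ... dispatch to the actual command here ...
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The check takes the effective UID as a parameter so that the decision
itself can be tested without actually switching users.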

Alternatively we could record the pid of the current process during
lock-file creation, and have a new pakfire process check if that pid
still exists; if not, dump its own pid in the lockfile and continue
work instead of bailing out. But I'm not sure how to implement this
without again having a chance for some race conditions when multiple
pakfire executions are performed in parallel. 
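
A rough sketch of the PID idea, to show where the race sits. This is
an illustration under assumptions, not a proposed patch: the function
names are made up, and only the path /tmp/pakfire_lock comes from
pakfire. The lock is created with link(2), which is atomic and fails if
the target exists, so the lockfile never exists without a PID in it.
Taking over a stale lock is still not fully race-free: two processes
that both see the same dead PID could each remove the file, and the
slower one could then remove the faster one's fresh lock.

```python
import os

LOCK_PATH = "/tmp/pakfire_lock"  # path from the mail


def pid_alive(pid):
    """Return True if a process with this PID exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def read_owner(path):
    """Return the PID stored in the lockfile, or 0 if missing/garbled."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return 0


def acquire_lock(path=LOCK_PATH, attempts=2):
    """Try to take the lock; remove it first if its owner is dead."""
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(str(os.getpid()))
    try:
        for _ in range(attempts):
            try:
                os.link(tmp, path)  # atomic: fails if path already exists
                return True
            except FileExistsError:
                pass
            if pid_alive(read_owner(path)):
                return False  # a live pakfire holds the lock
            try:
                os.unlink(path)  # stale lock; the race window is here
            except FileNotFoundError:
                pass
        return False
    finally:
        os.unlink(tmp)


def release_lock(path=LOCK_PATH):
    """Remove the lockfile, but only if this process owns it."""
    if read_owner(path) == os.getpid():
        os.unlink(path)
```

An advisory lock taken with flock(2) would avoid the stale-lock question
altogether, since the kernel drops it when the process exits, but that
is a different mechanism from the PID check described above.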

Or does anyone have better ideas to (try to) fix this?

Regards
Robin




end of thread, other threads:[~2022-09-17 21:56 UTC | newest]

Thread overview: 10+ messages
2022-09-14 19:48 Stale pakfire lock-file causing pakfire to no longer work Robin Roevens
2022-09-15  7:39 ` Peter Müller
2022-09-15 19:01   ` Robin Roevens
2022-09-15 19:09     ` Bernhard Bitsch
2022-09-15 11:48 ` Bernhard Bitsch
2022-09-15 19:43   ` Robin Roevens
2022-09-15 20:03     ` Bernhard Bitsch
2022-09-15 20:30       ` Robin Roevens
2022-09-15 23:27         ` Bernhard Bitsch
2022-09-17 21:56           ` Robin Roevens
