public inbox for development@lists.ipfire.org
From: Robin Roevens <robin.roevens@disroot.org>
To: development@lists.ipfire.org
Subject: Re: Stale pakfire lock-file causing pakfire to no longer work
Date: Thu, 15 Sep 2022 21:01:31 +0200
Message-ID: <bab7eed23f43b42cb973a46aa033e44aa1306de0.camel@sicho.home>
In-Reply-To: <f098b7ca-3fa4-b914-85eb-2c6924618522@ipfire.org>

Hi Peter

This is definitely _not_ a show-stopper for CU 170, as the problem has
been present in pakfire ever since the lock file was introduced in commit
https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=d6c2e6715575c4d531f1302ab6c7368329da8bd4
(24/05/21).

I noticed this problem back then but didn't investigate it properly
until now. In the meantime nobody else seems to have noticed or
reported it: not here, not in bugzilla, not on the forum, and not on
the github page for my zabbix template.
So I can only assume it is quite obscure, and possibly more easily
triggered on an IPFire mini appliance (which is where I see the
problem) than on higher-end HW.

So I see no reason to delay CU 170 for this, as it has been present
since CU 158.

Regards
Robin

Peter Müller wrote on Thu 15-09-2022 at 07:39 [+0000]:
> Hello Robin,
> 
> thank you for your detailed e-mail.
> 
> Just to ensure I did not misunderstand/overlook anything: Is this bug a
> show-stopper for the release of Core Update 170? I.e., does it prevent
> (some) IPFire installations from conducting further Pakfire tasks?
> 
> Thanks, and best regards,
> Peter Müller
> 
> 
> > Hi all
> > 
> > Since the introduction of the /tmp/pakfire_lock file in pakfire, I have
> > a problem with monitoring 'pakfire status' using Zabbix.
> > 
> > Every 10 minutes, I execute "sudo /opt/pakfire/pakfire status" using
> > the Zabbix Agent (which runs as user 'zabbix'). (This check was
> > actually implemented by Alex back when he maintained the zabbix_agent
> > addon.)
> > This works correctly for a while, until pakfire suddenly refuses to
> > start because /tmp/pakfire_lock is still present. But there is no
> > (old) pakfire process active anymore, and the lock file is never
> > cleared. I have to delete it manually to get pakfire working again
> > for a while.
> > 
> > Zabbix agent has a built-in timeout of 30s waiting for output of a
> > called process; if the process has not exited by then, it gets killed.
> > At first I thought that could be the problem, so I modified the
> > check so that instead of Zabbix agent calling pakfire, it calls a
> > custom script which in turn spawns a background process for pakfire,
> > with the output redirected to zabbix_sender (a utility to send data
> > directly to Zabbix, bypassing the agent). This way the agent won't
> > kill the pakfire process, as the custom script finishes almost
> > instantly and the agent itself does not know of the spawned pakfire
> > process.
> > Then, when the background pakfire process finishes, zabbix_sender just
> > sends the output to Zabbix, and this works without any timeout. So if
> > pakfire were to hang, it would stay hung.
> > But with this method I get the exact same result: it works
> > correctly for a while, until suddenly the lock file is not cleared
> > and pakfire won't start anymore.
> > 
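
A rough sketch of the detached-wrapper idea described above, in Python
purely for illustration. The actual custom script is not shown in this
thread, so the function name `spawn_detached` and the pipeline passed to
it are assumptions. The point is only that the child runs in its own
session and the wrapper returns at once, so the agent's timeout never
applies to the long-running command.

```python
import subprocess


def spawn_detached(shell_pipeline: str) -> subprocess.Popen:
    """Start a shell pipeline in its own session and return immediately.

    The caller does not wait for the pipeline, so a supervising process
    with a timeout (such as a monitoring agent) only sees this wrapper
    finish, not the pipeline it started.
    """
    return subprocess.Popen(
        ["/bin/sh", "-c", shell_pipeline],
        stdin=subprocess.DEVNULL,    # no terminal or pipe back to the caller
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # setsid(): detach from the caller's session
    )
```

A wrapper would call something like
`spawn_detached("sudo /opt/pakfire/pakfire status | my_sender_wrapper")`,
where `my_sender_wrapper` is a hypothetical stand-in for whatever feeds
the output to zabbix_sender.
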
> > I have tried to emulate this behaviour manually, by killing pakfire
> > aggressively while it is busy and by executing pakfire many times in
> > quick succession and in parallel, but I fail to reproduce it. So I
> > have no idea why this happens when pakfire is called unattended by
> > Zabbix.
> > 
> > The only possible clue I found is this line in the agent logfile (when
> > still using the 'normal' method of letting the agent call pakfire
> > directly):
> > failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not permitted
> > which, according to some Chinese blogs I found, could be caused by
> > sudo bug 447:
> > https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
> > https://bugzilla.sudo.ws/show_bug.cgi?id=447
> > However, that bug should no longer be present in sudo 1.9, which is
> > currently shipped with IPFire.
> > Despite that, I currently do suspect sudo to be the culprit.
> > 
> > So I would like to propose a change to pakfire and its permissions to
> > allow a non-root user to execute pakfire, and then within pakfire
> > itself check whether the current user is root, and allow
> > informational commands like 'status' to be executed by a non-root
> > user (all db files are world-readable anyway).
> > This way, sudo is no longer required for Zabbix to call 'pakfire
> > status', which will hopefully fix the problem.
> > 
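
The proposed root check could look roughly like this. It is a minimal
sketch in Python for illustration only, not pakfire's actual code; the
names `READ_ONLY_COMMANDS` and `may_run` are hypothetical, and only
'status' is taken from the proposal above as an informational command.

```python
import os
from typing import Optional

# Assumption: the set of commands considered safe for non-root users.
# Only 'status' is named in the proposal; anything else would need review.
READ_ONLY_COMMANDS = {"status"}


def may_run(command: str, euid: Optional[int] = None) -> bool:
    """Return True if the given user may run the given command.

    root (effective UID 0) may run everything; any other user may only
    run commands listed in READ_ONLY_COMMANDS.
    """
    if euid is None:
        euid = os.geteuid()  # effective UID of the calling process
    return euid == 0 or command in READ_ONLY_COMMANDS
```

With this in place, the 'zabbix' user could call 'status' directly, while
any command that changes the system would still be refused without root.
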
> > Alternatively, we could record the PID of the current process during
> > lock-file creation, and have a new pakfire process check whether that
> > PID still exists; if not, write its own PID to the lock file and
> > continue work instead of bailing out. But I'm not sure how to
> > implement this without again risking race conditions when multiple
> > pakfire executions run in parallel.
> > 
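
A minimal sketch of the PID-based alternative, again in Python for
illustration only and not pakfire's actual code. The function names are
hypothetical. The lock is created atomically with link(2), so it never
exists without a PID in it. The step that removes a stale lock is still
open to the race mentioned above: two processes that both find the same
dead PID can each remove and recreate the lock.

```python
import os
import tempfile


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_owner(lock_path: str) -> int:
    """Return the PID stored in the lock file, or 0 if none can be read."""
    try:
        with open(lock_path) as fh:
            return int(fh.read().strip())
    except (FileNotFoundError, ValueError):
        return 0  # a missing, empty, or garbled lock counts as stale


def acquire_lock(lock_path: str) -> bool:
    """Take the lock, reclaiming it once if its recorded owner is gone."""
    directory = os.path.dirname(lock_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(f"{os.getpid()}\n")
        for _ in range(2):
            try:
                os.link(tmp_path, lock_path)  # atomic; fails if lock exists
                return True
            except FileExistsError:
                if pid_alive(read_owner(lock_path)):
                    return False  # a live process holds the lock
                try:
                    os.unlink(lock_path)  # stale lock: remove, then retry
                except FileNotFoundError:
                    pass
        return False
    finally:
        os.unlink(tmp_path)
```

An advisory lock taken with flock(2) on the lock file would sidestep the
problem altogether, since the kernel drops such a lock when the holding
process exits, however it dies.
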
> > Or does anyone have better ideas to (try to) fix this?
> > 
> > Regards
> > Robin
> > 
> 
