From: Bernhard Bitsch <bbitsch@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Stale pakfire lock-file causing pakfire to no longer work
Date: Thu, 15 Sep 2022 21:09:30 +0200
Message-ID: <1fa562fd-03fb-458b-07c6-3a2558e5a310@ipfire.org>
In-Reply-To: <bab7eed23f43b42cb973a46aa033e44aa1306de0.camel@sicho.home>
Agreed (see my other post).
On 15.09.2022 at 21:01, Robin Roevens wrote:
> Hi Peter
>
> This is definitely _not_ a show-stopper for CU 170, as this problem has
> been present in pakfire since the lock-file was introduced in commit
> https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=d6c2e6715575c4d531f1302ab6c7368329da8bd4
> (24/05/21).
>
> I noticed this problem back then but didn't investigate it properly
> until now. Since in the meantime nobody else seems to have noticed or
> reported this problem, whether here, in Bugzilla, on the forum, or on
> the GitHub page for my Zabbix template, I can only assume it is quite
> obscure and possibly more easily triggered on an IPFire mini appliance
> (which is where I see the problem) than on higher-end hardware.
>
And yes, like all race conditions, it depends on speed/performance.
> So I see no reason to delay CU 170 for this, as it has already been
> present since CU 158.
>
> Regards
> Robin
>
Regards
Bernhard
> On Thu 15-09-2022 at 07:39 [+0000], Peter Müller wrote:
>> Hello Robin,
>>
>> thank you for your detailed e-mail.
>>
>> Just to ensure I did not misunderstand/overlook anything: Is this bug
>> a show-stopper for the release of Core Update 170? I.e., does it
>> prevent (some) IPFire installations from conducting further Pakfire
>> tasks?
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hi all
>>>
>>> Since the introduction of the /tmp/pakfire_lock file in pakfire, I
>>> have had a problem with monitoring 'pakfire status' using Zabbix.
>>>
>>> Every 10 minutes, I execute "sudo /opt/pakfire/pakfire status" using
>>> the Zabbix Agent (which runs as user 'zabbix'). (This check was
>>> actually implemented by Alex back when he maintained the zabbix_agent
>>> addon.)
>>> This works correctly for a while until pakfire suddenly refuses to
>>> start because /tmp/pakfire_lock is still present. But there is no
>>> (old) pakfire process active anymore, and the lock-file is never
>>> cleared. I have to delete it manually to get pakfire working again
>>> for a while.
>>>
>>> The Zabbix agent has a built-in timeout of 30s when waiting for the
>>> output of a called process; if the process has not exited by then,
>>> it gets killed.
>>> At first I thought that could be the problem, so I modified the check
>>> so that instead of the Zabbix agent calling pakfire, it calls a
>>> custom script which in turn spawns a background pakfire process, with
>>> the output redirected to zabbix_sender (a utility to send data
>>> directly to Zabbix, bypassing the agent). This way the agent won't
>>> kill the pakfire process, as the custom script finishes almost
>>> instantly and the agent itself does not know about the spawned
>>> pakfire process.
>>> When the background pakfire process finishes, zabbix_sender simply
>>> sends the output to Zabbix, without any timeout. So if pakfire were
>>> to hang, it would simply stay hung.
>>> But with this method too, I get exactly the same result: it works
>>> correctly for a while until suddenly the lock-file is not cleared and
>>> pakfire won't start anymore.
>>>
>>> I have tried to reproduce this behaviour manually by aggressively
>>> killing pakfire while it is busy, and by executing pakfire many times
>>> in quick succession and in parallel, but I have failed to reproduce
>>> it. So I have no idea why this happens when pakfire is called
>>> unattended by Zabbix.
>>>
>>> The only possible clue I found is this line in the agent logfile
>>> (when still using the 'normal' method of letting the agent call
>>> pakfire directly):
>>> failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not permitted
>>> which, according to some Chinese blogs I found, could be caused by
>>> sudo bug 447:
>>> https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
>>> https://bugzilla.sudo.ws/show_bug.cgi?id=447
>>> However, that bug should no longer be present in sudo 1.9, which is
>>> currently shipped with IPFire.
>>> Despite that, I currently do suspect sudo to be the culprit.
>>>
>>> So I would like to propose a change to pakfire and its permissions to
>>> allow a non-root user to execute pakfire, and then, within pakfire
>>> itself, check whether the current user is root, and allow
>>> informational commands like 'status' to be executed by a non-root
>>> user (all db files are world-readable anyway).
>>> This way, sudo would no longer be required for Zabbix to call
>>> 'pakfire status', which hopefully would fix the problem.
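A minimal sketch of the in-pakfire permission check proposed above, written in Python purely for illustration (pakfire itself is a Perl script). The set of read-only subcommands and the `may_run` helper are hypothetical assumptions, not pakfire's actual code:

```python
import os
from typing import Optional

# Hypothetical: subcommands assumed to only read the (world-readable) db files.
READ_ONLY_COMMANDS = frozenset({"status", "list"})


def may_run(command: str, euid: Optional[int] = None) -> bool:
    """Return True if `command` may run under effective UID `euid`.

    root (UID 0) may run every subcommand; any other user is limited to
    the read-only subcommands, so a monitoring user would not need sudo.
    """
    if euid is None:
        euid = os.geteuid()  # defaults to the UID of the calling process
    return euid == 0 or command in READ_ONLY_COMMANDS
```

For example, `may_run("status", euid=1000)` is True while `may_run("install", euid=1000)` is False; a non-root caller would then be refused before any package operation starts.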
>>>
>>> Alternatively, we could record the PID of the current process when
>>> creating the lock-file, and have a new pakfire process check whether
>>> that PID still exists; if not, it writes its own PID to the lock-file
>>> and continues working instead of bailing out. But I'm not sure how to
>>> implement this without again introducing a chance of race conditions
>>> when multiple pakfire executions are performed in parallel.
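The race mentioned above comes from the check-then-act sequence: two processes can both see a dead PID and both take over the lock-file. One way to sidestep it, swapping in a different technique than the PID check, is an advisory flock(2) lock. Acquiring it is atomic, and the kernel releases it automatically when the holding process exits or is killed, so a file left behind on disk cannot block later runs. The following is a minimal Python sketch under those assumptions (Perl's built-in `flock` offers the same primitive); the function names are hypothetical, and the PID is written only for diagnostics:

```python
import errno
import fcntl
import os
from typing import Optional

# Path taken from the thread; everything else here is illustrative.
LOCK_PATH = "/tmp/pakfire_lock"


def acquire_lock(path: str = LOCK_PATH) -> Optional[int]:
    """Try to take an exclusive, non-blocking flock() on `path`.

    Returns the open file descriptor on success (keep it open for as
    long as the lock is needed), or None if another holder has the lock.
    The kernel drops the lock when the descriptor is closed or the
    process dies, so no stale-lock cleanup is ever needed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)  # closing our own descriptor does not affect the holder
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return None
        raise
    # Record our PID for humans only; correctness relies on flock().
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())
    return fd


def release_lock(fd: int) -> None:
    """Release the lock by closing the descriptor; the file stays on disk."""
    os.close(fd)
```

Leaving the file in place, rather than unlinking it on exit, avoids a separate race in which two processes end up holding locks on different inodes of the same path.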
>>>
>>> Or does anyone have better ideas to (try to) fix this?
>>>
>>> Regards
>>> Robin
>>>
>>
>