From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Müller
To: development@lists.ipfire.org
Subject: Re: Stale pakfire lock-file causing pakfire to no longer work
Date: Thu, 15 Sep 2022 07:39:03 +0000
Message-ID:
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3853944173476788416=="
List-Id:

--===============3853944173476788416==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Robin,

thank you for your detailed e-mail.

Just to ensure I did not misunderstand/overlook anything: Is this bug a
show-stopper to the release of Core Update 170? I.e., does it prevent (some)
IPFire installations from conducting further Pakfire tasks?

Thanks, and best regards,
Peter Müller

> Hi all
>
> Since the introduction of the /tmp/pakfire_lock-file in pakfire, I have
> a problem with monitoring 'pakfire status' using Zabbix.
>
> Every 10 minutes, I execute "sudo /opt/pakfire/pakfire status" using
> the Zabbix Agent (which runs as user 'zabbix'); (this check was
> actually implemented by Alex back when he maintained the zabbix_agent
> addon)
> This works correctly for a while until pakfire suddenly refuses to
> start because /tmp/pakfire_lock is still present. But there is no (old)
> pakfire process active anymore and the lockfile is never cleared. I have
> to manually delete it, to have pakfire work again for a while.
>
> Zabbix agent has a built-in timeout of 30s waiting for output of a
> called process; and if by then the process has not exited, it will get
> killed.
> At first I thought that that could be the problem, so I modified the
> check so that instead of Zabbix agent calling pakfire, it calls a
> custom script which in turn spawns a background process for pakfire,
> with the output redirected to zabbix_sender (a utility to directly send
> data to Zabbix bypassing the agent).
> This way the agent won't kill the
> pakfire process as the custom script finishes almost instantly and the
> agent itself does not know of the spawned pakfire process.
> Then when the background pakfire process finishes, zabbix_sender just
> sends the output to Zabbix and this works without any timeout. So if it
> would happen that pakfire hangs, it would stay so..
> But also using this method.. I get the exact same result. This works
> correctly for a while until suddenly the lockfile is not cleared and
> pakfire won't start anymore.
>
> I have tried to emulate this behaviour manually trying to kill pakfire
> aggressively while it is busy and executing pakfire many times shortly
> after each other and in parallel.. But I fail to reproduce this
> behaviour. So I have no idea why this behaviour happens when called
> unattended by Zabbix.
>
> The only possible clue I found is this line in the agent logfile (when
> still using the 'normal' method of letting the agent call pakfire
> directly):
> failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not permitted
> which according to some Chinese blogs I found, could be caused by sudo bug
> 447:
> https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
> https://bugzilla.sudo.ws/show_bug.cgi?id=447
> However, that bug should no longer be present in sudo 1.9 which is
> currently shipped with IPFire.
> Despite that, I currently do suspect sudo to be the culprit.
>
> So I would like to propose a change to pakfire and its permissions, to
> allow for a non-root user to execute pakfire, and then within pakfire
> itself, check if the current user is root or not, and allow
> informational commands like 'status' to be executed by a non-root user
> (all db files are world-readable anyway).
> This way, sudo is no longer required for Zabbix to call 'pakfire
> status'. Hoping this would fix the problem.
>
> Alternatively we could record the pid of the current process during
> lock-file creation, and have a new pakfire process check if that pid
> still exists; if not, dump its own pid in the lockfile and continue
> work instead of bailing out. But I'm not sure how to implement this
> without again having a chance for some race conditions when multiple
> pakfire executions are performed in parallel.
>
> Or if anyone has better ideas to (try to) fix this?
>
> Regards
> Robin
>

--===============3853944173476788416==--
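
Regarding the race condition Robin mentions with the pid-in-lockfile idea: one possible way around it is to not decide staleness from the pid at all, but to take an advisory flock() on the lock file. The kernel drops such a lock when the holding process exits, however it died, so a leftover file from a killed run no longer blocks the next one. Below is a minimal sketch of that idea in Python, for illustration only; it is not pakfire's actual code. The function names try_lock and release are hypothetical, the path is the one named in this thread, and the pid is written purely as a diagnostic hint.

```python
import fcntl
import os

# Path taken from the thread; everything else here is an assumption.
LOCK_PATH = "/tmp/pakfire_lock"


def try_lock(path=LOCK_PATH):
    """Return a file descriptor holding the lock, or None if it is busy.

    The file's mere existence means nothing: only a live flock() counts,
    and the kernel releases that automatically when the holder exits.
    """
    # Open without truncating, so a current holder's pid is not wiped.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: fails at once if someone holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # We own the lock now; record our pid for anyone inspecting the file.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def release(fd):
    """Give the lock up; closing the descriptor drops the flock()."""
    os.close(fd)
```

A caller would bail out when try_lock() returns None and otherwise call release() when done. Whether this fits pakfire would depend on how it creates and checks the lock today, which this sketch does not assume.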