Agreed (see my other post).

On 15.09.2022 at 21:01, Robin Roevens wrote:
> Hi Peter
>
> This is definitely _not_ a show-stopper for CU 170, as this has been
> present in pakfire since the lock-file was introduced in commit
> https://git.ipfire.org/?p=ipfire-2.x.git;a=commit;h=d6c2e6715575c4d531f1302ab6c7368329da8bd4
> (24/05/21)
>
> I noticed this problem back then but didn't investigate it properly
> until now. In the meantime nobody else seems to have noticed or
> reported this problem here, in Bugzilla, on the forum, or on my GitHub
> page for my Zabbix template.
> So I can only assume it is quite obscure and possibly more easily
> triggered on an IPFire mini appliance (which is where I see the
> problem) than on higher-end HW.
> And yes, it depends on speed/performance, as all race conditions do.
> So I see no reason to delay CU 170 for this, as it has been present
> since CU 158.
>
> Regards
> Robin
>

Regards
Bernhard

> On Thu, 15-09-2022 at 07:39 [+0000], Peter Müller wrote:
>> Hello Robin,
>>
>> thank you for your detailed e-mail.
>>
>> Just to ensure I did not misunderstand/overlook anything: Is this bug
>> a show-stopper to the release of Core Update 170? I.e., does it
>> prevent (some) IPFire installations from conducting further Pakfire
>> tasks?
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hi all
>>>
>>> Since the introduction of the /tmp/pakfire_lock file in pakfire, I
>>> have a problem with monitoring 'pakfire status' using Zabbix.
>>>
>>> Every 10 minutes, I execute "sudo /opt/pakfire/pakfire status" using
>>> the Zabbix Agent (which runs as user 'zabbix'). (This check was
>>> actually implemented by Alex back when he maintained the
>>> zabbix_agent addon.)
>>> This works correctly for a while until pakfire suddenly refuses to
>>> start because /tmp/pakfire_lock is still present. But there is no
>>> (old) pakfire process active anymore and the lockfile is never
>>> cleared.
>>> I have to manually delete it to have pakfire work again for a while.
>>>
>>> Zabbix agent has a built-in timeout of 30s waiting for output of a
>>> called process; if the process has not exited by then, it gets
>>> killed.
>>> At first I thought that could be the problem, so I modified the
>>> check so that instead of Zabbix agent calling pakfire, it calls a
>>> custom script which in turn spawns a background process for pakfire,
>>> with the output redirected to zabbix_sender (a utility to send data
>>> directly to Zabbix, bypassing the agent). This way the agent won't
>>> kill the pakfire process, as the custom script finishes almost
>>> instantly and the agent itself does not know of the spawned pakfire
>>> process.
>>> Then when the background pakfire process finishes, zabbix_sender
>>> just sends the output to Zabbix, and this works without any timeout.
>>> So if pakfire were to hang, it would stay hung.
>>> But with this method I get the exact same result: it works
>>> correctly for a while until suddenly the lockfile is not cleared and
>>> pakfire won't start anymore.
>>>
>>> I have tried to emulate this behaviour manually by killing pakfire
>>> aggressively while it is busy, and by executing pakfire many times
>>> in quick succession and in parallel, but I fail to reproduce it.
>>> So I have no idea why this happens when pakfire is called
>>> unattended by Zabbix.
>>>
>>> The only possible clue I found is this line in the agent logfile
>>> (when still using the 'normal' method of letting the agent call
>>> pakfire directly):
>>> failed to kill [sudo /opt/pakfire/pakfire status]: [1] Operation not permitted
>>> which, according to some Chinese blogs I found, could be caused by
>>> sudo bug 447:
>>> https://blog.famzah.net/2010/11/01/sudo-hangs-and-leaves-the-executed-program-as-zombie/
>>> https://bugzilla.sudo.ws/show_bug.cgi?id=447
>>> However, that bug should no longer be present in sudo 1.9, which is
>>> currently shipped with IPFire.
>>> Despite that, I currently do suspect sudo to be the culprit.
>>>
>>> So I would like to propose a change to pakfire and its permissions
>>> to allow a non-root user to execute pakfire, and then within pakfire
>>> itself check whether the current user is root, and allow
>>> informational commands like 'status' to be executed by a non-root
>>> user (all db files are world-readable anyway).
>>> This way, sudo is no longer required for Zabbix to call 'pakfire
>>> status', which will hopefully fix the problem.
>>>
>>> Alternatively, we could record the pid of the current process during
>>> lock-file creation, and have a new pakfire process check whether
>>> that pid still exists; if not, write its own pid to the lockfile and
>>> continue work instead of bailing out. But I'm not sure how to
>>> implement this without again risking race conditions when multiple
>>> pakfire executions are performed in parallel.
>>>
>>> Or does anyone have better ideas to (try to) fix this?
>>>
>>> Regards
>>> Robin
>>>
>>
>
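
On the pid-in-lockfile idea quoted above, here is a minimal sketch of one way to avoid both the stale lock and the race between parallel runs. It is in Python for brevity (pakfire itself is Perl), so it shows the approach and is not a patch. Note that it swaps the "does the pid still exist" check for an advisory flock(2) on /tmp/pakfire_lock: the kernel drops that lock when the holder exits or is killed, so a leftover file no longer blocks anything. The pid is still written, but only for diagnostics. The names acquire_lock and release_lock are hypothetical, not existing pakfire functions.

```python
import fcntl
import os

LOCK_PATH = "/tmp/pakfire_lock"  # lock file path named in the thread


def acquire_lock(path=LOCK_PATH):
    """Return a file descriptor holding the lock, or None if another
    process currently holds it. (Hypothetical helper, not pakfire code.)"""
    # Open or create the file WITHOUT truncating, so we do not wipe the
    # current holder's pid before we know whether we got the lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive advisory lock. The kernel releases it
        # automatically when the holding process exits or is killed.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # We own the lock now: record our pid, for diagnostics only.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def release_lock(fd):
    """Release the lock; the file itself may stay in place harmlessly."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

A caller would do `fd = acquire_lock()`, bail out if it gets `None`, and call `release_lock(fd)` when done. The file is deliberately not deleted on release, because its mere existence no longer means "locked". Whether the same can be done cleanly with Perl's flock in pakfire is something I have not checked.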