Hi Robin,
thanks for your suggestions. I was just 'playing' with the flock() solution. Looks good so far. With a little program I
try to get the lock ( open(), flock() ), do the job if successful ( just sleep 60s ), and release the lock ( close() ).
Starting a couple of instances of this program shows that only one instance is active at a time.
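In essence the test looks like this (just a sketch; the lockfile path and the non-blocking flag are only assumptions for the example, not the final pakfire code):

#!/usr/bin/perl
# Sketch of the test: try to get the lock, do the job if successful,
# release the lock afterwards.
use strict;
use warnings;
use Fcntl ':flock';

open(my $lock, '>', '/tmp/flock-test.lock') or die "Cannot open lockfile: $!";
if (flock($lock, LOCK_EX | LOCK_NB)) {
    sleep 60;          # 'the job'
    close($lock);      # releases the lock
} else {
    print "Lock is held by another instance, exiting.\n";
}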
I would prefer this solution, because the flock() functionality is close to the theoretical 'semaphore' of Dijkstra and Hoare.
I hope to be able to integrate this into the pakfire program tomorrow. I'll send you a copy for testing. If it really is 'only' a race condition problem, you should be able to verify that the issue is gone.
The further steps will be to present a patch and integrate it into the system.
If we don't succeed, we should create a ticket in bugzilla to discuss it further.
Am 15.09.2022 um 22:30 schrieb Robin Roevens:
Hi Bernhard
Bernhard Bitsch schreef op do 15-09-2022 om 22:03 [+0200]:
Hi Robin,
Am 15.09.2022 um 21:43 schrieb Robin Roevens:
Hi Bernhard
Bernhard Bitsch schreef op do 15-09-2022 om 13:48 [+0200]:
Hi all,
as an 'old real time programmer' this reminds me deeply of Dijkstra/Hoare's "Dining philosophers problem".
The check for the presence of the lockfile and its creation are not 'atomic'. That means two programs can run in parallel.
Indeed.. In a shell script, a more atomic approach would be to use a lock directory instead of a lockfile: 'mkdir' creates the directory only if it does not already exist, and if it does already exist, it returns a non-zero exit code. So here we have both the check and the creation in one atomic operation. This is better explained here: https://wiki.bash-hackers.org/howto/mutex
Not sure if this can be translated to Perl in an atomic way.. I did find this Perl code snippet, however:
use strict;
use warnings;
use Fcntl ':flock';

flock(DATA, LOCK_EX|LOCK_NB) or die "There can be only one! [$0]";

# mandatory line, flocking depends on the DATA file handle
__DATA__
Which could be a possible solution, I think.
Looks promising. Will look into this.
I also found this, which seems quite promising: https://metacpan.org/pod/Script::Singleton. It performs locking by using shared memory.
Maybe yet another approach (idea from here: https://unix.stackexchange.com/a/594126 ) could be to actually check if another process named 'pakfire' is active (using Proc::ProcessTable ?) instead of using a lock(file). As pakfire is single-threaded, I think this may just do the job?
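Something like this, perhaps (untested sketch; I'm assuming the process shows up with fname 'pakfire', which may differ per platform, and we have to skip our own PID):

use strict;
use warnings;
use Proc::ProcessTable;

# Untested sketch: look for another running process called 'pakfire'.
my $table = Proc::ProcessTable->new;
for my $proc (@{ $table->table }) {
    next if $proc->pid == $$;                 # skip ourselves
    if (($proc->fname // '') eq 'pakfire') {
        die "pakfire is already running (PID " . $proc->pid . ")\n";
    }
}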
I suspect that only looking at the process table just introduces another race condition.
Regards, Bernhard
I'll investigate this further. But the deletion of the lockfile should happen anyway, as far as I've seen so far.
True, it should always be deleted, and as said before, I could not reproduce this manually.. but my Zabbix agent seems to be able to trigger this problem at least once every 24h on my IPFire mini appliance, just by executing pakfire every 10 minutes. That is why I suspect that the abnormal termination of pakfire, leaving the lockfile in place, is actually caused by sudo.
On the other hand, this can also happen when pakfire is running and the power is suddenly cut.. then the lockfile will still be present when the machine is back up. So I think, if we stay with the lockfile, we at least need some check for a stale lockfile, like checking whether the process that created the lockfile still exists and removing the lockfile if it does not.
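Roughly something like this (just a sketch; it assumes the lockfile would contain the PID of the process that created it, which is not how the current lockfile works as far as I know, and the path is only an example):

use strict;
use warnings;

my $lockfile = '/tmp/pakfire.lock';   # assumed path, just for illustration

if (-e $lockfile) {
    open(my $fh, '<', $lockfile) or die "Cannot read $lockfile: $!";
    chomp(my $pid = <$fh> // '');
    close($fh);

    # kill 0 does not send a signal; it only checks whether the process exists.
    if ($pid =~ /^\d+$/ && kill(0, $pid)) {
        die "pakfire (PID $pid) is already running\n";
    }

    # The creating process is gone, so the lockfile is stale.
    unlink $lockfile or die "Cannot remove stale lockfile $lockfile: $!";
}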
Because the lockfile is located in /tmp, I don't think it survives a reboot.
Right, I missed that for a moment :-).
Regards Robin
Regards Bernhard
Regards Robin
Regards, Bernhard