Hi Robin,

thanks for your suggestions.

I am just 'playing' with the flock() solution. It looks good so far.
A little test program tries to get the lock ( open(), flock() ); if successful, it does the job ( just sleep 60s ) and releases the lock ( close() ).
Starting a couple of instances of this program shows that only one instance is active at any time.

I would prefer this solution, because the flock() functionality is close to the theoretical 'semaphore' by Dijkstra and Hoare.

I hope to be able to integrate this into the pakfire program tomorrow. I'll send you a copy for testing.
If it is really 'only' a race condition problem, you should be able to verify that the issue is gone.
The next steps will be to present a patch and integrate it into the system. If we don't succeed, we should create a ticket in Bugzilla to discuss it further.

On 15.09.2022 at 22:30, Robin Roevens wrote:
> Hi Bernhard
>
>
> On Thu, 15-09-2022 at 22:03 [+0200], Bernhard Bitsch wrote:
>> Hi Robin,
>>
>>
>> On 15.09.2022 at 21:43, Robin Roevens wrote:
>>> Hi Bernhard
>>>
>>> On Thu, 15-09-2022 at 13:48 [+0200], Bernhard Bitsch wrote:
>>>> Hi all,
>>>>
>>>> as an 'old real time programmer' this reminds me deeply of
>>>> Dijkstra/Hoare's "Dining philosophers problem".
>>>>
>>>> The check for the presence of the lockfile and the generation of it
>>>> are not 'atomic'. This means two programs can run in parallel.
>>> Indeed..
>>> In a shell script, a more atomic approach would be to use a
>>> lock-directory instead of a lockfile:
>>> 'mkdir' creates a directory only if it does not already exist, and if
>>> it does already exist, it will return a non-zero exit code. So here we
>>> have both checking and generating in one atomic operation.
>>> This is better explained here:
>>> https://wiki.bash-hackers.org/howto/mutex
>>>
>>> Not sure if this can be translated to Perl in an atomic way..
>>> I did find this Perl code snippet however:
>>> ---
>>> use strict;
>>> use warnings;
>>> use Fcntl ':flock';
>>>
>>> flock(DATA, LOCK_EX|LOCK_NB) or die "There can be only one! [$0]";
>>>
>>>
>>> # mandatory line, flocking depends on DATA file handle
>>> __DATA__
>>> ---
>>> Which could be a possible solution, I think.
>>>
>>
>> Looks promising. Will look into this.
>>
>>> I also found this, which seems quite promising:
>>> https://metacpan.org/pod/Script::Singleton
>>> to perform locking by using shared memory.
>
>
> Maybe yet another approach (idea from here:
> https://unix.stackexchange.com/a/594126 ) could be to actually check if
> another process named 'pakfire' is active (using Proc::ProcessTable ?)
> instead of using a lock(file). As pakfire is single-threaded, I think
> this may just do the job?
>

I suspect that only looking at the process table just introduces another race condition.

Regards,
Bernhard

>>>
>>>>
>>>> I'll investigate this further. But the deletion of the lock
>>>> should happen anyway, as far as I've seen till now.
>>> True, it should always be deleted and, as said before, I could not
>>> reproduce this manually .. but my Zabbix agent seems to be able to
>>> trigger this problem at least once every 24h on my IPFire mini
>>> appliance, only by executing pakfire every 10 minutes. That is why
>>> I'm suspecting the abnormal termination of pakfire, leaving the
>>> lockfile in place, is actually caused by sudo.
>>>
>>> On the other hand.. this can also happen when pakfire is running
>>> and suddenly the power is cut.. then the lockfile will still be
>>> present when the machine is back up.. So I think, if we stay with the
>>> lockfile, we at least need some check for a stale lockfile, like
>>> checking if the process that created the lockfile still exists or
>>> not and removing it if not.
>>>
>>
>> Because the lockfile is located in /tmp, I don't think it survives a
>> reboot.
>
> Right, I missed that for a moment :-).
>
> Regards
> Robin
>
>>
>> Regards
>> Bernhard
>>
>>> Regards
>>> Robin
>>>
>>>>
>>>> Regards,
>>>> Bernhard
>>>>
>>>
>>
>
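
For reference, a minimal sketch of the kind of little test program described at the top of this mail ( open(), flock(), sleep, close() ). This is only an illustration, not the pakfire code: the lock path /tmp/flock-demo.lock, the sub names acquire_lock() and release_lock(), and the LOCKFILE / WORK_SECONDS environment variables are assumptions made up for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical lock path, chosen for this sketch only.
my $lockfile = $ENV{LOCKFILE} // '/tmp/flock-demo.lock';

# Try to take an exclusive, non-blocking lock on $path.
# Returns the open filehandle on success, undef if the lock is held elsewhere.
sub acquire_lock {
    my ($path) = @_;
    # '>>' creates the file if needed and never truncates it.
    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    return $fh if flock($fh, LOCK_EX | LOCK_NB);
    close($fh);
    return;
}

# Closing the handle releases the lock. The lock also goes away when the
# process ends, even though the file itself stays in place.
sub release_lock {
    my ($fh) = @_;
    close($fh);
}

# Run the demo only when executed as a script, not when loaded from elsewhere.
unless (caller) {
    my $fh = acquire_lock($lockfile);
    unless ($fh) {
        print "Another instance holds the lock, exiting.\n";
        exit 1;
    }
    print "Got the lock, working...\n";
    sleep($ENV{WORK_SECONDS} // 60);    # stands in for the real job
    release_lock($fh);
}
```

Starting this a few times in parallel (e.g. from several shells) should show only one instance printing "Got the lock" at a time; the others exit immediately. A blocking variant would simply drop LOCK_NB, so that later instances wait for the lock instead of exiting.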