Hey David.

Thanks for your elaborate mail.

On Wed, 2024-04-17 at 21:55 +0200, David Kalnischkies wrote:
> > Unpacking util-linux-extra (2.38.1-5+deb12u1) over (2.38.1-5+b1)
> > ...
> > Setting up util-linux-extra (2.38.1-5+deb12u1) ...
> > dpkg: error: dpkg frontend lock was locked by another process with pid 1064194
> > Note: removing the lock file is always wrong, can damage the locked area
> > and the entire system. See <https://wiki.debian.org/Teams/Dpkg/FAQ#db-lock>.
> > E: Sub-process /usr/bin/dpkg returned an error code (2)
>
> I am assuming here that unpacking and setting up of util-linux-extra
> worked fine and that dpkg run ended. The dpkg run after that, which
> would probably have installed other things failed due to a lock being
> held by something else…

I'd have expected the same, with perhaps the difference that my blind
assumption was that *everything* is done in one dpkg run, and thus it
would hold the lock over everything.

>
> > Scanning processes...
> > Scanning processor microcode...
> > Scanning linux images...
> > Running kernel seems to be up-to-date.
> > The processor microcode seems to be up-to-date.
> > No services need to be restarted.
> > No containers need to be restarted.
> > No user sessions are running outdated binaries.
> > No VM guests are running outdated hypervisor (qemu) binaries on this host.
>
> This output (that I trimmed slightly) is from needrestart

Sure, that output was all clear (also apt's retry). I merely included
it to show that when starting over, it simply moves on with the next
packages.
That is, I also don't think that this issue ever left a single package
behind in a half-configured state or so. Actually, I cannot even
remember ever having seen broken packages because of this (e.g. because
other deps were no longer fulfilled).

> > Unfortunately it doesn't tell the name of pid 1064194 and the
> > offending process is typically always already gone by then.
>
> (Maybe report that as a feature request for dpkg to show some info
> about the pid instead of just the number, but that might be hard to
> implement.)

You think so? I'd have thought that if it already knows the PID, it
could just print the `comm` line of that?

>
> > But in any case, shouldn't apitude/apt/dpkg just permantenly hold
> > the lock once the process has started until it finishes?
>
> That is how it is supposed to be, but I think aptitude was never
> changed to make full use of the frontend lock. Probably unrelated to
> this issue, but a quick grep on aptitude shows me:
> > $ git grep -A 2 -- '->ReleaseLock' src/generic/apt/aptcache.cc
> > :1006: apt_cache_file->ReleaseLock();
> > -1007- bool dpkg_selections_saved = dpkg_selections.save_selections();
> > -1008- if (! apt_cache_file->GainLock())
> which is the old pattern of releasing the lock and calling dpkg in the
> hopes that nothing grabs it in the meantime, which was the practice
> before dpkg gained the frontend lock (these are aptitudes own methods
> that wrap _system->Lock() from libapt that does acquire the frontend
> and the dpkg lock – and also releases both if told so).
>
> The solution here should be to hold onto the frontend lock for the
> entire run and do the lock&unlock dance for compatibility with the
> dpkg lock only. _system->LockInner() is part of that and grep has no
> hits for it in aptitude.
>
> So, my suspicion is that aptitude doesn't use the frontend lock and is
> hence prune to other front ends grabbing the dpkg (and front end) lock
> the moment it releases the dpkg lock for dpkg. Hence the two fails and
> the run of needrestart takes long enough for the other front end to
> finish so that the last dpkg call aptitude makes succeeds again.

Well, that sounds like a probable cause then. Though I still don't know
*what* then steals the lock.
I can only think of the Icinga/Prometheus probes.
There should be nothing else on my systems beyond what comes out of the
box (like apt's systemd timers or so).

Thanks,
Chris.
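
P.S. To illustrate what I meant by printing the `comm` line for the PID:
a minimal sketch in Python, purely for illustration and not how dpkg
does (or would do) it. The helper name describe_pid and its proc_root
parameter are made up here. It assumes a Linux-style /proc, where
/proc/<pid>/comm holds the process name, and falls back to the bare
number if the process is already gone or not readable.

```python
from pathlib import Path


def describe_pid(pid: int, proc_root: str = "/proc") -> str:
    """Return 'PID (comm)' if the process name is readable, else just 'PID'.

    describe_pid and proc_root are hypothetical names for this sketch;
    proc_root only exists so the lookup can be pointed at a test directory.
    """
    comm_file = Path(proc_root, str(pid), "comm")
    try:
        # comm is a single short line; tolerate odd bytes in process names.
        comm = comm_file.read_text(errors="replace").strip()
    except OSError:
        # Process already exited, or we lack permission: keep only the number.
        return str(pid)
    return f"{pid} ({comm})" if comm else str(pid)
```

Something like describe_pid(1064194) would then give either the number
plus the name, or just the number. The lookup is inherently racy: if the
lock holder exits between the failed lock attempt and the read, only the
number is left, which is probably the hard part.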