On 2023-01-09 14:08:06, Daniel Swarbrick wrote:
> Hi Eric,
>
> Thanks for the detailed bug report. As this is something which can 
> theoretically affect _any_ apt-based distribution (i.e., derivatives of
> Debian), I feel that it should ideally be reported upstream.

I'm curious here, actually: which upstream are you thinking of? Because
I have the suspicion this is actually a python3-apt bug rather than
specific to this exporter...

> I personally run this textfile collector on a Debian bookworm system, as 
> well as apticron - so this is (I think) a similar scenario where two 
> independent processes are periodically updating the apt cache, and I 
> wondered whether that was wise or not. I have seen the textfile 
> collector block only once so far.

We're seeing repeated problems with this here. We manage a fleet of
about 90 Debian installations, out of which 42 have been upgraded to
bookworm and are showing symptoms. Those machines have an hourly legacy
cron job that updates the apt cache for another monitoring system, with
`apt update -qq`. Since we upgraded to bookworm, we have had 95 warnings
from cron, some of them repeating for hours on end.

One box in particular hung on that lock for over *two days*.

> The apt.sh script which apt_info.py replaces only executed "apt-get 
> --just-print" - so even if executed as root, it never tried to update 
> the apt cache. In fact, unless you had something else like apticron to 
> periodically update the apt cache, apt.sh would return stale information.

That does seem suboptimal, though. :)

> I guess that a simple workaround would be to tweak the systemd service 
> so that apt_info.py is executed as an unprivileged user, which would be 
> unable to update the cache, and theoretically avoid any potential for a 
> deadlock. Perhaps a recommendation to the upstream developer could be 
> made, e.g. to add a command-line argument to the script so that it 
> wouldn't try to update the cache even when executed as root.
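
For what it's worth, the command-line argument idea could be quite small.
Here is a rough sketch, not the actual apt_info.py code: the `--no-update`
flag name and both helper functions are made up for illustration, and I'm
assuming the script currently refreshes the cache whenever it runs as root.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="apt textfile collector (sketch)")
    # Hypothetical flag: the real script may not offer anything like this.
    parser.add_argument(
        "--no-update",
        action="store_true",
        help="never refresh the apt cache, even when running as root",
    )
    return parser.parse_args(argv)


def should_update_cache(args, euid=None):
    """Refresh only when running as root and not told otherwise."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0 and not args.no_update
```

The script would then only touch the cache when `should_update_cache()`
returns true, leaving refreshes to whatever other job already does them.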

Surely this script should have a timeout or something? Why does
it hang in the first place? I'll investigate a little bit further.
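
To illustrate what I mean by a timeout, here is a minimal sketch of an
external wrapper using only the Python standard library. It does not fix
whatever is blocking, it only bounds how long a run may take; the function
name, the 60 second limit and the script path are placeholders I picked.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it ran longer than timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising this.
        return None


if __name__ == "__main__":
    # Hypothetical path and limit; adjust to wherever the collector lives.
    code = run_with_timeout(["/usr/local/bin/apt_info.py"], 60)
    sys.exit(1 if code is None else code)
```

The coreutils `timeout` command would do the same job from a cron line or
a unit file without any extra code.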
