Package: unattended-upgrades,needrestart
Severity: important
Control: affects -1 + cron openssh-server systemd

Hi,

I hit a funky interaction bug with the last bookworm stable point release upgrade. In isolation, each component behaves reasonably, but their combination may result in unexpected service failure. I seek feedback on improving the situation.

1. Services such as cron and ssh may leave processes behind after a restart. A long-running cron job and an existing ssh connection are example situations where this happens.

2. By default, unattended-upgrades sets Unattended-Upgrade::MinimalSteps to true and therefore upgrades one source package at a time. As a consequence, it invokes needrestart a number of times.

3. Every time needrestart is invoked, it examines all running services and treats those left-over cron jobs or ssh connections as a reason to restart the service, even if the main daemon process is no longer using an outdated copy.

4. systemd imposes a limit on restarting services too frequently. If you restart a service 10 times within a minute, it temporarily ignores start requests and leaves the service in a failed state.

The end result is that a stable point release may upgrade glibc rather early, and then each of the subsequent minimal steps restarts your service until it fails. A stable point release has sufficiently many updates to trigger systemd's limit if you operate on fast storage. Terminating ssh in an unattended upgrade is a significant problem justifying important severity. I hope you agree.

Now the question arises what could be done to improve the situation. The default of Unattended-Upgrade::MinimalSteps is set to true on the argument that this is safer. Arguably, setting it to false also provides a kind of safety against unattended-upgrades terminating your ssh server.

Another way to look at this is that needrestart should perhaps recognize that restarting cron or ssh is not going to help in this situation and skip doing so.

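To make the failure mode from point 4 concrete, here is a minimal Python sketch of a start rate limit. It is a simplified sliding-window model, not systemd's actual algorithm. The function name `first_refused_restart` and the sample timings are made up for illustration, and the defaults of 10 starts per 60 seconds are simply the figures quoted above. The effective values on a given system depend on the unit's StartLimitBurst= and StartLimitIntervalSec= settings.

```python
from collections import deque


def first_refused_restart(restart_times, burst=10, interval=60.0):
    """Return the index of the first restart the limit would refuse, or None.

    restart_times: ascending timestamps (seconds) of restart requests.
    burst/interval: hypothetical limit, taken from the figures in this report.
    """
    window = deque()  # timestamps of accepted starts still inside the interval
    for index, now in enumerate(restart_times):
        # Forget starts that are older than the rate-limit interval.
        while window and now - window[0] >= interval:
            window.popleft()
        if len(window) >= burst:
            return index  # this request would leave the service failed
        window.append(now)
    return None


# Fast storage: one minimal step (and thus one restart) every 2 seconds.
fast = [i * 2.0 for i in range(30)]
# Slower pace: one restart every 10 seconds.
slow = [i * 10.0 for i in range(30)]

print(first_refused_restart(fast))  # 10, i.e. the eleventh restart is refused
print(first_refused_restart(slow))  # None, the limit is never reached
```

In this model the outcome depends only on how quickly the minimal steps follow each other, which matches the observation that the problem shows up on fast storage.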
Yet another way of looking at it is that unattended-upgrades should perhaps interact with needrestart more closely and batch up needrestart invocations even in the face of Unattended-Upgrade::MinimalSteps. Maybe it could temporarily disable needrestart somehow and then run it once after doing its thing? That would also speed things up.

We're not yet at the end of options. Skipping restarts of young processes is also a possible avenue, suggested by Paul Wise via #889552. Last but not least, having unattended-upgrades sleep between the upgrade operations would make it slow enough to not trigger systemd's limit.

As we can see, there are lots of options to twist the current behavior into something that avoids this particular failure mode. On the flip side, each of them has other subtle consequences, so it is not clear to me what the best option is. I would appreciate some feedback from the relevant package maintainers.

Helmut