Package: unattended-upgrades,needrestart
Severity: important
Control: affects -1 + cron openssh-server systemd

Hi,

I hit a funky interaction bug with the last bookworm stable point
release upgrade. In isolation, each component behaves reasonably, but
their combination may result in unexpected service failure. I seek
feedback on improving the situation.

1. Services such as cron and ssh may leave processes behind after a
   restart. A long cron job and an existing ssh connection are example
   situations where this happens.
2. By default, unattended-upgrades sets Unattended-Upgrade::MinimalSteps
   to true and therefore upgrades one source package at a time. As a
   consequence it invokes needrestart a number of times.
3. Every time needrestart is invoked, it examines all running services
   and treats those left-over cron jobs or ssh connections as a reason
   to restart the service, even if the main daemon process is no longer
   using an outdated copy of a library.
4. systemd imposes a limit on restarting services too frequently. If you
   restart a service 10 times within a minute, it temporarily ignores
   start requests and leaves the service in a failed state.

The end result is that a stable point release may upgrade glibc rather
early, then each of the minimal steps will restart your service until it
fails. A stable point release has sufficiently many updates to trigger
systemd's limit if you operate on fast storage.
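
To make the arithmetic concrete, here is a rough sketch of the
interaction in Python. It is a simplified sliding-window model, not
systemd's actual implementation. The limit of 10 starts per 60 seconds
is simply the figure quoted above; the real values come from the unit's
StartLimitBurst= and StartLimitIntervalSec= settings and may differ.
The step count and the per-step durations are made up.

```python
from collections import deque


def simulate_restarts(steps, seconds_per_step, burst=10, interval=60.0):
    """Return the 1-based step whose restart is refused, or None.

    Assumes every minimal step triggers exactly one restart of the
    same service (the left-over process never goes away).
    """
    recent = deque()  # timestamps of start requests still in the window
    for step in range(1, steps + 1):
        now = (step - 1) * seconds_per_step
        # Forget start requests that have left the rate-limit window.
        while recent and now - recent[0] >= interval:
            recent.popleft()
        if len(recent) >= burst:
            return step  # start refused, service is left in failed state
        recent.append(now)
    return None


# Fast storage: one upgrade step every 2 seconds.
print(simulate_restarts(steps=40, seconds_per_step=2.0))   # 11
# Slow storage: one upgrade step every 10 seconds.
print(simulate_restarts(steps=40, seconds_per_step=10.0))  # None
```

In this model the eleventh restart is refused on the fast machine,
while the slow machine never collects enough restarts in one window.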

Terminating ssh in an unattended-upgrade is a significant problem
justifying important severity. Hope you agree.

Now the question arises what could be done to improve the situation.

The default of Unattended-Upgrade::MinimalSteps is true, on the argument
that this is safer. Arguably, setting it to false also provides a kind
of safety, namely against unattended-upgrades terminating your ssh
server.

Another way to look at this would be that needrestart maybe should
recognize that restarting cron or ssh is not going to help in this
situation and skip doing that.
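
As an illustration of what such a check could look like, the following
Python sketch decides based on the main daemon process only. It is a
hypothetical policy, not how needrestart works today. It relies on the
Linux convention that /proc/<pid>/maps marks mappings of replaced files
with a " (deleted)" suffix. The function names and the sample lines are
invented for the example.

```python
def has_outdated_mapping(maps_text):
    """True if the given /proc/<pid>/maps text shows an executable
    mapping whose backing file has been deleted or replaced."""
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.rstrip().split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        perms, path = fields[1], fields[5]
        if "x" in perms and path.endswith(" (deleted)"):
            return True
    return False


def restart_would_help(main_pid_maps):
    """Hypothetical policy: restart only if the main daemon process
    itself still maps an outdated file. Left-over children (cron jobs,
    ssh sessions) are ignored, since a restart does not replace them."""
    return has_outdated_mapping(main_pid_maps)


libc = "/usr/lib/x86_64-linux-gnu/libc.so.6"
fresh = "7f0000000000-7f0000100000 r-xp 00000000 08:01 123 " + libc
stale = fresh + " (deleted)"

print(restart_would_help(stale))  # True
print(restart_would_help(fresh))  # False
```

The main PID of a service could be obtained with something like
"systemctl show -p MainPID --value ssh.service", but that part is left
out here to keep the sketch self-contained.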

Yet another way of looking at it is that unattended-upgrades maybe
should interact with needrestart more closely and batch up needrestart
even in the face of Unattended-Upgrade::MinimalSteps. Maybe it could
temporarily disable needrestart somehow and then run it once after doing
its thing? That would also speed things up.

We're not yet at the end of options. Skipping restarts of young
processes is also a possible avenue, as suggested by Paul Wise in
#889552.

Last but not least, having unattended-upgrades perform a sleep between
the upgrade operations would make it slow enough to not trigger
systemd's limit.
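
A back-of-the-envelope calculation of the required sleep, again in
Python and again assuming the limit of 10 starts per 60 seconds quoted
above (the effective values depend on StartLimitBurst= and
StartLimitIntervalSec=). The step count and step duration are
hypothetical, and the exact boundary behaviour of systemd's limiter is
not modelled, so a real implementation would want some margin.

```python
def min_pause(burst=10, interval=60.0, step_duration=0.0):
    """Seconds to sleep after each upgrade step so that consecutive
    restarts are at least interval / burst seconds apart."""
    return max(0.0, interval / burst - step_duration)


steps = 40  # hypothetical number of minimal steps in a point release
pause = min_pause(step_duration=2.0)
print(pause)          # 4.0 seconds of sleep per step
print(steps * pause)  # 160.0 seconds added to the whole run
```

This also shows the cost: the upgrade run gets longer in proportion to
the number of steps.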

As we can see, there are lots of options to twist the current behavior
into something that avoids this particular failure mode. On the flip
side, each of them has other subtle consequences, so it is not clear to
me what the best option is. I would appreciate some feedback from the
relevant package maintainers.

Helmut
