Bug#904558: What should happen when maintscripts fail to restart a service

Wouter Verhelst Wed, 19 Sep 2018 01:15:18 -0700

On Tue, Sep 18, 2018 at 10:04:26PM +0200, Tollef Fog Heen wrote:
> ]] Ian Jackson 
> 
> Hi,
> 
> > There may be good reasons not to treat daemon startup failure as a
> > postinst failure, but the argument above is not one of them.
> 
> I think this is the core question.  I largely agree with Ian here that
> having postinsts fail is not that big a deal if they can't make forward
> progress, but also we're being asked to advise on what happens when a
> maintainer script fails to restart a service.  I disagree with him on
> whether failure to start/restart a service should be considered a
> configuration failure.


I'm not sure why that position is even being considered valid.

> The API provided by a package being in the configured state is not
> whether the relevant daemon is running or not; that is runtime and can
> and will change many times while the package is in the configured state,
> so dpkg dependencies are not useful for expressing «this service must be
> running».

No. But it *is* a useful way to express "this service must be able to
run".

Additionally, if something fails to restart, then that is a serious
problem that I, as a system administrator, would like to know about.
Failure to configure a package signals that there is a serious problem
that I need to fix, so that informs me.

> (There's also the case where the service is running on a
> separate host, which is often the case for services such as databases
> and where the use of Depends is inappropriate.)
> 
> I think the general rule should be that the success/failure of the
> postinst script should signal whether the package considers itself ready
> to provide whatever API it exists to provide (disregarding the case of
> Essential packages here, since those are special).
> 
> This means that failure to start a daemon should generally not cause the
> postinst to fail.

I think it should.

If the daemon fails to restart, that means its configuration is
incomplete or incorrect, which means the package failed to configure
correctly. The failure to restart is just a symptom; the actual problem
is the broken configuration, which may have further effects beyond just
"the daemon won't restart". As such, in the general case, I think
failure to restart is something that should cause failure to configure.
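
To make that concrete, here is a rough sketch of a postinst that lets a
failed restart fail the configure step. It is not taken from any real
package: the service name "food" and the helper restart_or_fail are made
up for illustration, and a real script would normally carry
debhelper-generated code as well.

```sh
#!/bin/sh
# Hypothetical postinst sketch for an imaginary package shipping a daemon
# called "food"; the name and the helper below are assumptions.
set -e

# Run the given restart command and return non-zero if it fails.
# The command is passed in as arguments only so that the helper can be
# exercised without a real init system.
restart_or_fail() {
    if "$@"; then
        return 0
    fi
    echo "postinst: service restart failed, not marking package configured" >&2
    return 1
}

case "${1:-}" in
    configure)
        # Because of "set -e", a non-zero return here makes the postinst
        # exit non-zero, and dpkg does not mark the package as configured.
        restart_or_fail invoke-rc.d food restart
        ;;
esac
```

The opposite policy would append "|| true" to the restart line, at which
point the admin only finds out when something else notices the daemon is
down.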

There are really only two[1] reasons why a daemon could fail to restart:

- The maintainer made a mistake in the default configuration, and the
  user didn't make any changes so the old conffiles are being replaced
  by the new ones, or the package is being newly installed; now the
  daemon encounters a syntax error. This is a bug, plain and simple, and
  catching bugs earlier rather than later is a good idea, which will
  happen if the daemon restart failure causes a postinst failure.
- The maintainer made no mistake, but the upgrading user made some local
  changes, so the conffile system keeps the locally modified file, the
  syntactic changes in the new configuration are not incorporated, and
  the daemon fails to restart. As a system administrator, I would want
  to know when something like that happens sooner rather than later, so
  that I can fix it (also sooner rather than later). Failing the
  postinst ensures that I do find out.

This is now being countered by "but some people use tools that don't
show failures to system administrators", from which the (wrong)
conclusion is drawn "so we shouldn't fail anymore". It would be awesome
if we lived in a world where we could avoid bugs in code and thus avoid
all possible failures, but alas, we don't. So, given that failures
*will* happen, even if we don't fail when daemons fail to restart, the
correct conclusion would be "so those tools should be fixed to do their
utter best to inform the system administrator when something failed".
When those tools do that, failure to restart a service is no longer a
problem for them, and we can continue to do the right thing.
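
As a sketch of what "inform the system administrator" could look like
for such a tool, the following filters dpkg's status listing for
packages whose configuration did not complete. The function names are
made up for illustration, and how the result is delivered (mail, log,
monitoring) is left open.

```sh
#!/bin/sh
# Hypothetical helpers; the names are assumptions, not part of any tool.

# Read lines of the form "<want> <flag> <status> <package>" on stdin, as
# printed by: dpkg-query -W -f='${Status} ${Package}\n'
# and print the names of packages whose status is half-configured.
failed_config() {
    awk '$3 == "half-configured" { print $4 }'
}

# Print a one-line report and return non-zero if any package is left
# half-configured; return zero and print nothing otherwise.
report_failed_config() {
    failed=$(failed_config)
    if [ -n "$failed" ]; then
        echo "packages left half-configured:" $failed
        return 1
    fi
    return 0
}

# Intended use after an unattended upgrade run:
#   dpkg-query -W -f='${Status} ${Package}\n' | report_failed_config
```

"dpkg --audit" reports similar information in a more human-readable
form, so a tool could also simply forward its output.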

[1] There is also the possibility of "the package ships with incomplete
    configuration on purpose, because there are no sane defaults to use
    and installing the package requires manual steps from the system
    administrator before it can be made to work", but (a) our best
    practices recommend against doing that if at all possible, and (b) in
    that case starting the daemon shouldn't even be attempted from
    postinst, and so failure to start can't be a consideration in the
    exit state of postinst.

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
     Hacklab
