Public bug reported:

We have noticed that after the upgrade from 12.04 to 14.04, daemons that
are started from rc.d scripts are sometimes being run twice.

We've tracked this down to a race condition in the failsafe script.
Here is the normal sequence of events:

Mar 16 10:39:37 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:39:37 failsafe: net-device-up start event emitted
Mar 16 10:39:37 failsafe: starting failsafe script
Mar 16 10:39:37 failsafe: sleeping in failsafe script
Mar 16 10:39:37 failsafe: static-network-up start event emitted
Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted
Mar 16 10:39:37 kernel: [    2.056689] init: failsafe main process (642) killed by TERM signal

(The "Failsafe of 120 seconds reached" message is misleading: it is
actually logged immediately on boot, not after 120 seconds, so it is
best ignored.  The TERM warning is also harmless; it is the normal
result.)

Here is what we see on a bad boot, where the rc.d scripts are started
twice:

Mar 16 10:24:47 failsafe: static-network-up start event emitted
Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:24:47 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:24:47 failsafe: net-device-up start event emitted
Mar 16 10:24:47 failsafe: starting failsafe script
Mar 16 10:24:47 failsafe: sleeping in failsafe script
Mar 16 10:26:47 failsafe: emitting from failsafe script
Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:26:47 kernel: [  122.229597] init: failsafe main process (797) killed by TERM signal

The rc-sysinit starting event has been emitted twice, 120 seconds apart.

Note that the first rc-sysinit starting event was emitted before the
failsafe script had started, because in this boot the
static-network-up event happened to be emitted before the
net-device-up event.

As a result, the normal stop on "starting rc-sysinit" rule in the
failsafe job definition has no effect, because the failsafe job is not
yet running when rc-sysinit starts.

Another way to look at the issue is that the rc-sysinit job definition's
"start on (filesystem and static-network-up) or failsafe-boot" means
that rc-sysinit will always start twice if its first run finishes
before the failsafe script emits failsafe-boot.
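
The ordering dependence described above can be illustrated with a small
toy model. This is only a sketch and not Upstart's real event engine:
the function name rc_sysinit_starts and the (time, name) event tuples
are made up for illustration, and it assumes that the filesystem event
has already happened and that rc-sysinit finishes quickly enough for a
later failsafe-boot event to start it again.

```python
# Toy model of the race; NOT Upstart's real event engine.
FAILSAFE_DELAY = 120  # seconds, taken from the failsafe log message


def rc_sysinit_starts(events):
    """Return the times at which rc-sysinit would start.

    events is a list of (time, name) tuples in emission order.
    Assumptions: filesystem is already up, and rc-sysinit finishes
    quickly, so a later failsafe-boot event can start it again.
    """
    starts = []
    failsafe_started_at = None  # None means the failsafe job is not running
    for when, name in events:
        if name == "net-device-up":
            # The failsafe job starts and begins its long sleep.
            failsafe_started_at = when
        elif name == "static-network-up":
            # rc-sysinit starts; a running failsafe job is stopped.
            # If failsafe is not running yet, there is nothing to stop.
            starts.append(when)
            failsafe_started_at = None
    if failsafe_started_at is not None:
        # Nothing stopped the failsafe job, so it emits failsafe-boot
        # after its sleep and rc-sysinit starts a second time.
        starts.append(failsafe_started_at + FAILSAFE_DELAY)
    return starts


good_boot = [(0, "net-device-up"), (0, "static-network-up")]
bad_boot = [(0, "static-network-up"), (0, "net-device-up")]

print(rc_sysinit_starts(good_boot))  # [0]
print(rc_sysinit_starts(bad_boot))   # [0, 120]
```

In this model the only difference between the two boots is the order of
the two network events, which matches the two log excerpts: one start
in the normal boot, and a second start 120 seconds later in the bad
boot.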

** Affects: upstart (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1557761

Title:
  rc-sysinit run twice due to failsafe race condition

Status in upstart package in Ubuntu:
  New
