** Description changed:

- We're seeing a race between if-up.d/ntpdate and the ntp startup script.
+ [Impact]

+  * Hardware clocks are not stepped at boot, which can prevent NTP from ever
+    syncing the clock.
+    Incorrect clocks can cause serious issues in distributed systems.

- 1) if-up.d/ntpdate starts.
- 2) if-up.d/ntpdate acquires the lock "/var/lock/ntpdate-ifup".
- 3) if-up.d/ntpdate stops the ntp service [which isn't running anyway].
- 4) if-up.d/ntpdate starts running ntpdate, which binds UDP *.ntp
- 5) /etc/init.d/rc 2 executes "/etc/rc2.d/S20ntp start"
- 6) /etc/init.d/ntp acquires the lock "/var/lock/ntpdate".
- 7) /etc/init.d/ntp starts the ntp daemon.
- 8) The ntp daemon logs an error, complaining that it cannot bind UDP *.ntp.
- 9) if-up.d/ntpdate now starts the ntp service.
+  * Upstream originally added a lock file to eliminate a race between the ntp
+    service (which keeps the clock synchronized during normal operation) and
+    ntpdate (which is used to step the clock by large intervals at boot time).
+    That change had a flaw which introduced a deadlock. An Ubuntu patch was
+    applied which broke the locking mechanism entirely, reintroducing the race
+    condition.

- The result is a weird churn, though ntpd does end up running at the end.
+  * This change undoes the Ubuntu patch and fixes the deadlock by unlocking
+    before attempting to start the ntp service.

- Should these not be using the same lock file?
+ [Test Case]
+
+  * There are two bugs: the race and the deadlock. To reproduce the race more
+    consistently:
+    - Add 'sleep 30' to '/etc/network/if-up.d/ntpdate' on the line preceding
+      '/usr/sbin/ntpdate-debian -s $OPTS 2>/dev/null || :', and comment out
+      'invoke-rc.d --quiet $service stop >/dev/null 2>&1 || true'. This will
+      reproduce the case where the ntp service starts between the stop command
+      and the ntpdate command.
+      The result will be that the ntpdate command fails. There will be a
+      message in syslog like:
+      'ntpdate[17660]: the NTP socket is in use, exiting'
+    - Reintroducing the lock brings back the deadlock issue. Both the ntpdate
+      if-up.d script and the ntp init script check the lock file, but the
+      ntpdate script attempted to start the ntp init script before unlocking
+      the lock. Moving the unlock before the init script invocation fixes
+      the deadlock. The original deadlock behavior is described here:
+      https://bugs.launchpad.net/ubuntu/+source/ntp/+bug/246203
+
+ [Regression Potential]
+
+  * Low. Out-of-sync clocks could be changed by a large amount at boot time,
+    but only for machines with static IPs. The clock is only likely to be in
+    this state if the clock was very skewed at boot time, which is also
+    unlikely since NTP usually keeps the software clock in sync during
+    operation and the hardware clock is updated at shutdown.
--
https://bugs.launchpad.net/bugs/1125726

Title:
  boot-time race between /etc/network/if-up.d/ntpdate and "/etc/init.d/ntp start"