On Sat, 8 Jun 2013 19:51:12 +0200, Kurt Roeckx <k...@roeckx.be> wrote:
> On Sat, Jun 08, 2013 at 02:45:32PM +0200, Sergio Gelato wrote:
>> On Fri, 7 Jun 2013 22:11:30 +0200, Kurt Roeckx <k...@roeckx.be> wrote:
>> > But you started a new one, which wrote a PID file, and then it
>> > died because it detected that an other ntpd was still running,
>> > and you really [only] want 1 running. It probably shouldn't have
>> > written the pid file in that case.
>>
>> I now have an instance of the problem occurring naturally on a squeeze
>> system (so the trigger mechanism isn't Ubuntu-only, one can't blame it on
>> upstart in this case), and I can confirm that it is associated with
>> attempts by the system to start two ntpd processes concurrently.
>> Arranging for the instance that loses the race not to have its PID
>> written to the file should be very helpful, I think.
>>
>> Here are some relevant logs about the incident, lightly sanitised:
>>
>> Jun 7 08:17:18 <HOST> dhclient: DHCPACK from <SERVERIP>
>> Jun 7 08:17:18 <HOST> ntpd[1576]: ntpd exiting on signal 15
>> Jun 7 08:17:20 <HOST> ntpd[1904]: ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)
>> Jun 7 08:17:20 <HOST> ntpd[1905]: ntpd 4.2.6p2@1.2194-o Sun Oct 17 13:35:13 UTC 2010 (1)
>
> So you're starting it twice at the same time? Of course there is
> no PID file yet at the time the 2nd gets started.

A race, as I said. And the problem isn't so much that multiple instances
get started at the same time (only one of them will survive, at least for
typical configurations) but that the init script can't always find the
surviving one afterwards.

> Looking at the init script, "status" doesn't use the pid file
> currently.

I beg to differ. It calls status_of_proc, which is defined in
/lib/lsb/init-functions. status_of_proc in turn calls pidofproc, which has

    if [ ! "$specified" ]; then
        pidfile="/var/run/$base.pid"
    fi

and only falls back on /bin/pidof if the pidfile doesn't exist. This
matches what I've seen in testing (different behaviour in the case of a
missing pidfile vs. an existing one with incorrect contents). Verification
with strace is left as an exercise for the non-believer.

> So it's just going to look at the processes.

No, it's going to "kill -0" the pid named in the pidfile, and return 0 if
that succeeds, 1 if there is no such process.

> So
> I don't see how status is going to react differently that puppet.
> Note also how it did properly say it's running in your example,
> even when the PID file is wrong.

Only if there happens to be a running process with that pid. The
implementation of pidofproc in /lib/lsb/init-functions doesn't check that
the pid is that of an ntpd instance. The more common case, illustrated in
the second half of my example, is for "status" to return 1 ("program is
dead and /var/run pid file exists") if the pidfile is bogus.
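
For illustration, here is a rough sketch of the lookup described above. It is
not the actual code from /lib/lsb/init-functions: sketch_status is a
hypothetical name, the function is deliberately simplified (it ignores
permission errors from kill, for instance), and the exit code 3 in the
no-pidfile branch is assumed from the LSB convention for "program is not
running". It should only be read as a model of the pidfile-first,
pidof-fallback behaviour.

```shell
#!/bin/sh
# Simplified model of the status lookup; NOT the real pidofproc.
# sketch_status is a hypothetical helper: $1 = pidfile, $2 = program name.
sketch_status() {
    pidfile=$1
    name=$2
    if [ -e "$pidfile" ]; then
        pid=$(cat "$pidfile" 2>/dev/null)
        # kill -0 only tests that *some* process with this pid exists;
        # nothing here checks that the process is really $name.
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            return 0    # reported as running
        fi
        return 1        # "program is dead and /var/run pid file exists"
    fi
    # Only when the pidfile is missing do we search the process table.
    if pidof "$name" >/dev/null 2>&1; then
        return 0
    fi
    return 3            # assumed LSB code for "program is not running"
}

tmp=$(mktemp -d)

# Case 1: pidfile names a live process that is not ntpd (this shell).
echo "$$" > "$tmp/live.pid"
sketch_status "$tmp/live.pid" ntpd && rc_live=0 || rc_live=$?

# Case 2: pidfile names a process that has already exited.
true &
dead=$!
wait "$dead"
echo "$dead" > "$tmp/stale.pid"
sketch_status "$tmp/stale.pid" ntpd && rc_stale=0 || rc_stale=$?

echo "live=$rc_live stale=$rc_stale"
rm -rf "$tmp"
```

Barring pid reuse, this should report 0 for the pidfile that points at an
unrelated live process and 1 for the stale one, which are the two outcomes
discussed above for a bogus pidfile.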