On Tue, Aug 01, 2006 at 11:18:08AM +0200, Thomas Schwinge wrote:
> Package: openntpd
> Version: 3.9p1-3
> Severity: important
> 
> Hello!
> 
> I tried to use openntpd to keep this system's clock synchronized, but it
> simply doesn't work.  Starting from a synchronized clock, it drifts away
> more and more, just have a look at the following summary.
> 
> I also straced the `root' and `ntpd' processes and attached these logs:
> ntpd_root.strace.gz and ntpd_ntpd.strace.gz.
> 
> Might this be a problem with the kernel not fulfilling the adjtimex()
> calls?

I see 2 strange things here:
> Jul 31 18:18:32 taylor ntpd[5443]: adjusting local clock by -1.820363s

strace shows:
send(3, "<30>Jul 31 18:18:32 ntpd[5443]: "..., 67, MSG_NOSIGNAL) = 67
adjtimex({modes=32769, offset=0, freq=0, maxerror=16384000, esterror=16384000, 
status=64, constant=2, precision=1, tolerance=33554432, time={1154362712, 
304831}}) = 5

> Jul 31 19:24:14 taylor ntpd[5443]: adjusting local clock by -4.464328s

strace:
send(3, "<30>Jul 31 19:24:14 ntpd[5443]: "..., 67, MSG_NOSIGNAL) = 67
adjtimex({modes=32769, offset=0, freq=0, maxerror=16384000, esterror=16384000, 
status=64, constant=2, precision=1, tolerance=33554432, time={1154366654, 
308473}}) = 5

[...]
> Aug  1 09:19:25 taylor ntpd[5443]: adjusting local clock by -19.839407s

strace:
send(3, "<30>Aug  1 09:19:25 ntpd[5443]: "..., 68, MSG_NOSIGNAL) = 68
adjtimex({modes=32769, offset=-6581987, freq=0, maxerror=16384000, 
esterror=16384000, status=64, constant=2, precision=1, tolerance=33554432, 
time={1154416765,
936878}}) = 5

In all the adjtimex() execpt for the last one, offset is 0.

This is all in the ntpd_adjtime() function, which looks like:

        log_debug("adjusting local clock by %fs", d);
        d_to_tv(d, &tv);
        if (adjtime(&tv, &olddelta) == -1)
                log_warn("adjtime failed");

And:
void
d_to_tv(double d, struct timeval *tv)
{
        tv->tv_sec = (long)d;
        tv->tv_usec = (d - tv->tv_sec) * 1000000;
}

Which should put a negative number in both tv_sec and
tv_usec, for which the sum is the time you want to adjust.
In case I try with -1.2, I end up with -1 and -199999,
this all looks good to me.

The adjtimex() call only shows an offset of "0" in
all cases except for the last one.  And that doesn't make
any sense to me.

The code in glibc looks simple enough too:
        tmp.tv_sec = itv->tv_sec + itv->tv_usec / 1000000L;
        tmp.tv_usec = itv->tv_usec % 1000000L;
        tntx.offset = tmp.tv_usec + tmp.tv_sec * 1000000L;
        tntx.modes = ADJ_OFFSET_SINGLESHOT;

        if (ADJTIMEX (&tntx) < 0)
                return -1;

So that looks good too.

So it's not really making any sense to me why offset is 0
in all but the last cases.  I'm not sure which value
strace shows, the one we're trying to set, or the one the
kernel returns, but I assume both should be the same in
this case.

So this smells like a kernel bug to me.


The other weird thing:
> Jul 31 17:15:01 taylor ntpd[5444]: reply from 192.83.249.31: negative delay 
> -0.001068
> Jul 31 17:15:07 taylor ntpd[5444]: reply from 192.36.143.151: negative delay 
> -0.000159
> Jul 31 17:15:09 taylor ntpd[5444]: reply from 64.52.111.11: negative delay 
> -0.000107
> Jul 31 17:53:28 taylor ntpd[5444]: reply from 64.5.1.130: negative delay 
> -0.000125

All your 4 peers seem to have "negative delay".  This
basicly means that you got a reply from the peer before
you've even send the request.  The code looks like this:

         *     Timestamp Name          ID   When Generated
         *     ------------------------------------------------------------
         *     Originate Timestamp     T1   time request sent by client
         *     Receive Timestamp       T2   time request received by server
         *     Transmit Timestamp      T3   time reply sent by server
         *     Destination Timestamp   T4   time reply received by client

        p->reply[p->shift].delay = (T4 - T1) - (T3 - T2);
        if (p->reply[p->shift].delay < 0) {
                log_info("reply from %s: negative delay %f",
                    log_sockaddr((struct sockaddr *)&p->addr->ss),
                    p->reply[p->shift].delay);
                return (0);
        }

Since I doubt that T1 and T4 should be close to each
other for all those peers, this looks like some kind of
bug somewhere, but I have no idea where.  I'll forward
this upstream.



Kurt



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

Reply via email to