I was trying to understand how "discard monitor N" works in ntpd 4.2.8p15. I
don't have a high-volume server -- this was purely academic interest. In the
process I think I ran across a bug in how it works. Please correct me where I'm
wrong.
The academic question was: what are the valid range and units of N?
The documentation says "discard monitor N" determines the "probability of being
recorded for packets that overflow the MRU list size limit". Similarly Dr Mills
described it as the "probability that a packet that overflows the internal LRU
list is discarded". Naively I would have expected a probability to be expressed
as [0..1) or [0..100), but actually it is expressed in seconds. Internally it
is mon_age, and the default is 3000.
ntp_monitor.c:
    int mon_age = 3000; /* preemption limit */
When a packet comes in from a new client that would overflow the MRU list, that
means ntpd has already checked that the MRU list is full, can't be extended,
and all the entries are too young to age out (mru maxage). In this "dire"
situation, the new information can be discarded, or it can be recorded over the
oldest list entry. The choice is made by chance. The probability that the
oldest entry will be recorded over depends on the age of the oldest entry, and
is calculated as:
    oldest_age / mon_age
This means the probability of being recorded over is very low when the oldest
entry is only 1 second old, and very high when the oldest entry is nearly 3000
seconds old. And an oldest entry older than the threshold will always be
recorded over.
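To make the intended behavior concrete, here is a minimal sketch in C of the probability described above. It assumes the random draw is uniform on [0..1). The function name preempt_probability is my own and does not exist in ntpd; only oldest_age and mon_age come from the source.

```c
#include <assert.h>

/*
 * Hypothetical helper, not an ntpd function: probability that the
 * oldest MRU entry is recorded over, assuming a uniform [0..1) draw
 * compared against oldest_age / mon_age.
 */
static double preempt_probability(int oldest_age, int mon_age)
{
    assert(mon_age > 0);

    if (oldest_age <= 0)
        return 0.0;             /* a brand-new entry is never preempted */
    if (oldest_age >= mon_age)
        return 1.0;             /* at or past the threshold: always preempted */
    return (double)oldest_age / mon_age;
}
```

With the default mon_age of 3000, an oldest entry aged 1500 s would be recorded over about half the time, and one aged 1 s almost never.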
The relevant code is:
ntp_monitor.c:
        /* Preempt from the MRU list if old enough. */
        } else if (ntp_random() / (2. * FRAC) >
                   (double)oldest_age / mon_age) {
                return ~(RES_LIMITED | RES_KOD) & flags;
        } else {
                mon_reclaim_entry(oldest);
Here ntpd generates a random real number with:
    ntp_random() / (2. * FRAC)
This looks like an arithmetic error to me. It yields a random number in
[0..0.25) where you would expect one in [0..1). FRAC represents 2^32, and
ntp_random() returns a random integer in the range 0 .. 2^31 - 1, so the
largest possible quotient is just under 2^31 / 2^33 = 0.25. To get a [0..1)
random number, the value must be doubled (not halved) before dividing by FRAC:
    ntp_random() * 2. / FRAC
and you do find that form elsewhere in the code. So contrary to what I
believe is the intent, "discard monitor 3000" currently sets the age threshold
to 3000 ÷ 4 = 750 s, beyond which the oldest MRU list entry is always recorded
over in case of overflow.
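The two scalings can be compared directly with a small sketch in C. The value of FRAC and the maximum of ntp_random() are taken from the description above. The function names and the NTP_RANDOM_MAX macro are hypothetical, and the last function treats the draw as continuous, which is an approximation.

```c
#include <assert.h>

#define FRAC            4294967296.0    /* 2^32, as described above */
#define NTP_RANDOM_MAX  2147483647L     /* 2^31 - 1, hypothetical macro name */

/* Scaling used in the preemption test: r / 2^33, which stays below 0.25. */
static double scale_current(long r)
{
    return r / (2. * FRAC);
}

/* Scaling that covers [0..1): r * 2 / 2^32. */
static double scale_expected(long r)
{
    return r * 2. / FRAC;
}

/*
 * Probability that the oldest entry is recorded over when the draw is
 * uniform on [0..0.25) instead of [0..1): four times the intended
 * ratio, capped at 1.
 */
static double effective_preempt_probability(int oldest_age, int mon_age)
{
    double p;

    assert(mon_age > 0);
    p = 4.0 * oldest_age / mon_age;
    if (p < 0.0)
        p = 0.0;
    return p > 1.0 ? 1.0 : p;
}
```

Under these assumptions, effective_preempt_probability(750, 3000) is already 1, and an entry aged 375 s is recorded over about half the time, where the intended ratio would give 0.125.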
Related Bugzilla bug 3640: ntp.conf: missing documentation for "discard
monitor" default value <https://bugs.ntp.org/show_bug.cgi?id=3640>. I did not
find a bug describing wrong behavior.
Cheers!
Edward