Ensuring wall_to_monotonic is not positive breaks use case

Rick Ratzel Wed, 05 Sep 2018 14:06:29 -0700

Hello,

We have a use case that was broken by the commit e1d7ba873555 (time: Always
make sure wall_to_monotonic isn't positive). We've been reverting the commit
in our builds, but we'd greatly prefer a solution consistent with the mainline.
We also think our use case isn't unique to us, and may become more common in
the near future.

Our use case is as follows: we have devices that have no notion of traceable
time and often boot up with a time value of 0 (the Epoch). These devices are
networked and share time using protocols such as IEEE 1588 (PTP) or IEEE
802.1AS. These protocols automatically elect one device (the "grandmaster" in
PTP speak) to act as the source of time and transmit its time to the other
"slave" devices. This common shared time is used to synchronize I/O operations
across all devices to create a distributed measurement or control system. The
devices often
interoperate with other 3rd party devices that also share time using the same
protocol, and may also boot up with a time very near the Epoch. We have no
control over the 3rd party devices and cannot change the time that they boot up
with, or the standardized algorithm they use to elect a common grandmaster.

In this case, time is used only as a means to synchronize periodic operations,
where stable monotonically-increasing counts (this also implies no leap
seconds!) are all that's needed and traceability to a standardized timescale is
not necessary.

The problem arises when a device that's been elected grandmaster is sending out
time at or very near (maybe only a few seconds past) the Epoch, and a slave
device has already been up for, say, several minutes. The slave device will
never be able to synchronize to the master in this situation: with the commit
applied, the slave's clock cannot be set below its Epoch+uptime lower bound,
and the master is sending out time values lower than that bound.
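
To make the constraint concrete, here is a minimal userspace sketch of the
rule as we understand it. It is not the kernel code; settime_allowed() is a
hypothetical helper, and it works in whole seconds only, assuming
wall_to_monotonic is simply (monotonic time - wall time).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the restriction, not kernel code.
 *
 * Assumption: wall_to_monotonic = monotonic - wall. Requiring that value
 * to be non-positive is the same as requiring the requested wall time to
 * be at or above the current monotonic time (roughly, uptime).
 */
static bool settime_allowed(int64_t new_wall_sec, int64_t monotonic_sec)
{
    int64_t wall_to_monotonic = monotonic_sec - new_wall_sec;

    return wall_to_monotonic <= 0;
}
```

With this model, a slave that has been up for 300 seconds cannot accept a
grandmaster time of 5 seconds past the Epoch: settime_allowed(5, 300) is
false, while settime_allowed(300, 300) is true.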

The presence of an RTC helps mitigate this situation, but only if the RTC has
been set accordingly and its batteries have not failed. We cannot guarantee
these conditions, and many of the networked devices participating will not even
have RTCs.

We're looking for suggestions on how best to proceed with a new change that
ideally both supports the use case described above and addresses the symptoms
brought up in the initial commit (negative boot time causes get_expiry() to
overflow time_t, and show_stat() uses "unsigned long" to print negative
btime). Any thoughts on this would be greatly appreciated.
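
For reference, the btime symptom can be reproduced in userspace with a small
sketch. This is only an illustration under assumptions: boot_time_sec() and
the two format helpers are hypothetical names, boot time is modeled as (wall
time - uptime) in whole seconds, and a fixed 64-bit type stands in for
"unsigned long".

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model: boot time is the wall time minus the uptime. */
static int64_t boot_time_sec(int64_t wall_sec, int64_t uptime_sec)
{
    return wall_sec - uptime_sec;
}

/* Formats btime through an unsigned type, so a negative value wraps. */
static void format_btime_unsigned(char *buf, size_t len, int64_t btime)
{
    snprintf(buf, len, "btime %" PRIu64, (uint64_t)btime);
}

/* Formats btime as a signed value, so a negative value stays readable. */
static void format_btime_signed(char *buf, size_t len, int64_t btime)
{
    snprintf(buf, len, "btime %" PRId64, btime);
}
```

For a wall time of 5 seconds and an uptime of 300 seconds, the boot time is
-295. The unsigned helper produces "btime 18446744073709551321", while the
signed helper produces "btime -295".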

Link to initial commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1d7ba8735551ed79c7a0463a042353574b96da3

Thanks,
Rick Ratzel - National Instruments
