Hello,

We have a use case that was broken by commit e1d7ba873555 ("time: Always 
make sure wall_to_monotonic isn't positive").  We've been reverting the commit 
in our builds, but we'd greatly prefer a solution consistent with mainline.  
We also think our use case isn't unique to us and may become more common in 
the near future.

Our use case is as follows: we have devices that have no notion of traceable 
time and often boot up with a time value of 0 (the Epoch).  These devices are 
networked and share time using protocols such as IEEE 1588 (PTP) or IEEE 
802.1AS.  These protocols automatically elect one device (the "grandmaster" in 
PTP speak) to act as the source of time for the network and to transmit its 
time to the other "slave" devices.  This common shared 
time is used as a means to synchronize I/O operations across all devices to 
create a distributed measurement or control system.  The devices often 
interoperate with other 3rd party devices that also share time using the same 
protocol, and may also boot up with a time very near the Epoch.  We have no 
control over the 3rd party devices and cannot change the time that they boot up 
with, or the standardized algorithm they use to elect a common grandmaster.

In this case, time is used only as a means to synchronize periodic operations, 
where stable monotonically-increasing counts (this also implies no leap 
seconds!) are all that's needed and traceability to a standardized timescale is 
not necessary.

The problem arises when a device that's been elected grandmaster is sending out 
time at or very near (maybe only a few seconds past) the Epoch, and a slave 
device has already been up for, say, several minutes.  With the commit applied, 
the slave's clock cannot be set to a value earlier than Epoch+uptime, because 
doing so would make wall_to_monotonic positive.  The slave will therefore never 
be able to synchronize to the master in this situation, since the master is 
sending out time values below that lower bound.
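
To make the failure mode concrete, here is a minimal user-space sketch of the 
constraint as we understand it.  It is not the kernel code: the helper names 
are hypothetical, times are reduced to whole seconds, and it assumes the 
monotonic clock is roughly equal to uptime, so that 
monotonic = wall + wall_to_monotonic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model, NOT kernel code.  Assumes whole seconds and that the
 * monotonic clock equals uptime, with the relation:
 *     monotonic = wall + wall_to_monotonic
 */

/* Hypothetical helper: the offset that setting the wall clock would yield. */
static int64_t wall_to_monotonic_after_set(int64_t new_wall_sec,
                                           int64_t uptime_sec)
{
    assert(uptime_sec >= 0);
    return uptime_sec - new_wall_sec;
}

/* Hypothetical helper: true if the request keeps the offset non-positive. */
static bool settime_allowed(int64_t new_wall_sec, int64_t uptime_sec)
{
    return wall_to_monotonic_after_set(new_wall_sec, uptime_sec) <= 0;
}
```

In this model, a slave that has been up for 300 seconds and receives a 
grandmaster time of Epoch+5 gets settime_allowed(5, 300) == false, because the 
resulting offset would be +295.  Any grandmaster time at or above Epoch+300 
would be accepted.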

The presence of an RTC helps mitigate this situation, but only if the RTC has 
been set accordingly and its batteries have not failed.  We cannot guarantee 
these conditions, and many of the networked devices participating will not even 
have RTCs.

We're looking for suggestions on how best to proceed with a new change that 
ideally both supports the use case described above and addresses the 
symptoms described in the initial commit (negative boot time causes 
get_expiry() to overflow time_t, and show_stat() uses "unsigned long" to print 
negative btime).  Any thoughts on this would be greatly appreciated.

Link to initial commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1d7ba8735551ed79c7a0463a042353574b96da3

Thanks,
Rick Ratzel - National Instruments
