rmuir commented on issue #7687:
URL: https://github.com/apache/lucene/issues/7687#issuecomment-1225628331

   This popped up again on the dev list. I think it's worth considering that this 
could be caused by a "step" to the system time (a time correction by NTP or a VM 
utility).
   
   From what I can tell, this timeout is currently implemented with 
`System.currentTimeMillis()` which uses wall-clock time:
   ```
   clock_gettime(CLOCK_REALTIME, &ts);
   ```
   
   That means it is very prone to such disturbances in the system time.
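   
   To illustrate the failure mode, here is a minimal sketch with a simulated 
clock. It is not the actual randomizedtesting code: `WallClockStepDemo`, 
`timedOut`, and the fake clock are hypothetical names made up for this example. 
It only shows how a forward step makes a wall-clock-based timeout fire even 
though very little real time has passed:

```java
import java.util.function.LongSupplier;

// Hypothetical demo class; not taken from randomizedtesting or Lucene.
class WallClockStepDemo {
    // Naive timeout check against a millisecond clock, in the style of
    // code built on System.currentTimeMillis().
    static boolean timedOut(LongSupplier clockMillis, long startMillis, long timeoutMillis) {
        return clockMillis.getAsLong() - startMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        // Simulated wall clock (assumption: stands in for System.currentTimeMillis()).
        long[] now = {1_000_000L};
        LongSupplier fakeWallClock = () -> now[0];

        long start = fakeWallClock.getAsLong();
        long timeoutMillis = 60_000L; // 60 second timeout

        now[0] += 5_000L; // 5 seconds of real elapsed time
        System.out.println(timedOut(fakeWallClock, start, timeoutMillis)); // false

        now[0] += 3_600_000L; // simulated one-hour forward "step" (e.g. NTP or VM correction)
        System.out.println(timedOut(fakeWallClock, start, timeoutMillis)); // true
    }
}
```

   A backward step has the opposite effect: the computed elapsed time goes 
negative and the timeout fires much later than intended.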
   
   On the other hand, `System.nanoTime()` uses a monotonic clock (backed by a 
source such as TSC/HPET), at least on POSIX, which won't be impacted by such corrections:
   ```
   clock_gettime(CLOCK_MONOTONIC, &tp);
   ```
   
   See 
https://github.com/openjdk/jdk/blob/b6b0317f832985470ccf4bc1e2abf9015ce5bd54/src/hotspot/os/posix/os_posix.cpp#L1372
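   
   As a rough sketch of what a monotonic timeout could look like (an 
assumption on my part, not how randomizedtesting implements it; 
`MonotonicDeadline` and its methods are hypothetical names), the elapsed time 
is computed only from differences of `System.nanoTime()` values:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of randomizedtesting or Lucene.
final class MonotonicDeadline {
    private final long startNanos;
    private final long timeoutNanos;

    private MonotonicDeadline(long startNanos, long timeoutNanos) {
        this.startNanos = startNanos;
        this.timeoutNanos = timeoutNanos;
    }

    // Starts a deadline that expires after the given timeout.
    static MonotonicDeadline after(long timeout, TimeUnit unit) {
        return new MonotonicDeadline(System.nanoTime(), unit.toNanos(timeout));
    }

    // Compare differences, not absolute values: nanoTime() has an arbitrary
    // origin and may wrap, so only (now - start) is meaningful.
    boolean isExpired() {
        return System.nanoTime() - startNanos >= timeoutNanos;
    }

    // Elapsed time since the deadline was created, in milliseconds.
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicDeadline deadline = MonotonicDeadline.after(50, TimeUnit.MILLISECONDS);
        System.out.println("expired immediately? " + deadline.isExpired());
        Thread.sleep(100);
        System.out.println("expired after sleeping? " + deadline.isExpired()
                + " (elapsed ~" + deadline.elapsedMillis() + " ms)");
    }
}
```

   Since `nanoTime()` values are only meaningful relative to each other, any 
human-readable timestamps in logs would still have to come from the wall clock.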
   
   At the least, I think we could avoid problems by switching the timeout 
implementation in randomizedtesting? At work, I do a ton of testing with VMs 
and I've spent a lot of time battling these issues. I feel like you only have 
two choices for VMs (especially using VirtualBox) to avoid such issues:
   1) use monotonic time everywhere and don't rely on wall-clock time (such as 
gettimeofday/CLOCK_REALTIME)
   2) configure time synchronization to force any "step" correction before the 
tests start running, and then only "slew" in the background so as not to disturb 
code that relies on wall-clock time.
   
   cc: @dweiss


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

