rmuir commented on issue #7687: URL: https://github.com/apache/lucene/issues/7687#issuecomment-1225628331
This popped up again on the dev list. I think it's worth considering that this could be caused by a "step" to the system time (a time correction by NTP or a VM utility). From what I can tell, this timeout is currently implemented with `System.currentTimeMillis()`, which uses wall-clock time:

```
clock_gettime(CLOCK_REALTIME, &ts);
```

That means it is very prone to such disturbances in the system time. On the other hand, `System.nanoTime()` uses a monotonic clock (such as TSC/HPET), at least on POSIX, which won't be impacted by such corrections:

```
clock_gettime(CLOCK_MONOTONIC, &tp);
```

See https://github.com/openjdk/jdk/blob/b6b0317f832985470ccf4bc1e2abf9015ce5bd54/src/hotspot/os/posix/os_posix.cpp#L1372

At the least, I think we could avoid problems by switching the timeout implementation in randomizedtesting?

At work, I do a ton of testing with VMs and I've spent a lot of time battling these issues. I feel like you only have two choices for VMs (especially with VirtualBox) to avoid such issues:

1) use monotonic time everywhere and don't rely on wall-clock time (such as gettimeofday/CLOCK_REALTIME)
2) configure time synchronization to force any "step" correction before the tests start running, and then only "slew" in the background so as not to disturb code that relies on wall-clock time.

cc: @dweiss
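
To illustrate option 1, here is a minimal sketch of a timeout check built on `System.nanoTime()` instead of `System.currentTimeMillis()`. The class `MonotonicDeadline` and its methods are hypothetical names made up for this example; this is not how randomizedtesting is implemented, only one way the idea could look. The clock is passed in as a `LongSupplier` purely so the logic can be exercised without sleeping.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a randomizedtesting class): a timeout measured on a monotonic clock. */
final class MonotonicDeadline {
  private final LongSupplier nanoClock; // injectable clock; assumption made for testability
  private final long startNanos;
  private final long timeoutNanos;

  MonotonicDeadline(long timeout, TimeUnit unit, LongSupplier nanoClock) {
    this.nanoClock = nanoClock;
    this.timeoutNanos = unit.toNanos(timeout);
    this.startNanos = nanoClock.getAsLong();
  }

  /** Convenience factory that reads System.nanoTime(), which is unaffected by wall-clock steps. */
  static MonotonicDeadline startingNow(long timeout, TimeUnit unit) {
    return new MonotonicDeadline(timeout, unit, System::nanoTime);
  }

  /** Nanoseconds elapsed since construction. */
  private long elapsedNanos() {
    // nanoTime values have an arbitrary origin and may wrap around, so only
    // differences are meaningful; never compare two raw readings directly.
    return nanoClock.getAsLong() - startNanos;
  }

  boolean expired() {
    return elapsedNanos() >= timeoutNanos;
  }

  long remainingNanos() {
    return Math.max(0L, timeoutNanos - elapsedNanos());
  }
}
```

A caller would create the deadline with `MonotonicDeadline.startingNow(30, TimeUnit.SECONDS)` and poll `expired()`. An NTP or VM time step changes what `System.currentTimeMillis()` returns but should not change the elapsed time computed here.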