mikemccand commented on code in PR #13190:
URL: https://github.com/apache/lucene/pull/13190#discussion_r1531945845
##########
lucene/core/src/java/org/apache/lucene/index/MergeRateLimiter.java:
##########
@@ -118,24 +118,32 @@ private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedE
throw new MergePolicy.MergeAbortedException("Merge aborted.");
}
- double rate = mbPerSec; // read from volatile rate once.
- double secondsToPause = (bytes / 1024. / 1024.) / rate;
-
- // Time we should sleep until; this is purely instantaneous
- // rate (just adds seconds onto the last time we had paused to);
- // maybe we should also offer decayed recent history one?
- long targetNS = lastNS + (long) (1000000000 * secondsToPause);
-
- long curPauseNS = targetNS - curNS;
-
- // We don't bother with thread pausing if the pause is smaller than 2 msec.
- if (curPauseNS <= MIN_PAUSE_NS) {
- // Set to curNS, not targetNS, to enforce the instant rate, not
- // the "averaged over all history" rate:
- lastNS = curNS;
+ final double rate = mbPerSec; // read from volatile rate once.
+ final double secondsToPause = (bytes / 1024. / 1024.) / rate;
+
+ AtomicLong curPauseNSSetter = new AtomicLong();
+ lastNS.updateAndGet(
+ last -> {
+ // Time we should sleep until; this is purely instantaneous
+ // rate (just adds seconds onto the last time we had paused to);
+ // maybe we should also offer decayed recent history one?
+ long targetNS = last + (long) (1000000000 * secondsToPause);
+ long curPauseNS = targetNS - curNS;
+ // We don't bother with thread pausing if the pause is smaller than 2 msec.
+ if (curPauseNS <= MIN_PAUSE_NS) {
+ // Set to curNS, not targetNS, to enforce the instant rate, not
+ // the "averaged over all history" rate:
+ curPauseNSSetter.set(0);
+ return curNS;
+ }
Review Comment:
Sorry I'm trying to catch up here and will probably ask a bunch of stupid
questions :) Thank you @benwtrent for persisting in this hairy logic!

> (I'm considering making `pause` synchronized rather than `maybePause` so
> that `System.nanoTime()` is computed within the lock and the pausing logic
> accounts for the fact that some time may have been spent waiting on the lock
> already.)

Couldn't we move the `curNS = System.nanoTime()` inside the
locked/updateAndGet'd part of `maybePause`? I like the thread safety that the
`updateAndGet` is giving us here, ensuring we accurately account for all bytes
written by N threads within a single merge. Also, I don't expect the added
sync required here will hurt performance much: Lucene is already doing a lot
of work to produce the bytes being written, so contention should be rare-ish.
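
To make the suggestion concrete, here is a minimal sketch of reading the clock inside the `updateAndGet` lambda. It is not the actual `MergeRateLimiter` code: the class name `RateLimiterSketch`, the method `computePauseNS`, and the injected `LongSupplier` clock (standing in for `System::nanoTime`) are hypothetical. The branch that returns `targetNS` is an assumption, since the diff is cut off before that point.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the pause computation; not the real MergeRateLimiter. */
class RateLimiterSketch {
  // Mirrors the 2 msec threshold mentioned in the diff comment.
  static final long MIN_PAUSE_NS = 2_000_000L;

  private final AtomicLong lastNS;
  private volatile double mbPerSec;

  RateLimiterSketch(double mbPerSec, long startNS) {
    this.mbPerSec = mbPerSec;
    this.lastNS = new AtomicLong(startNS);
  }

  /** Returns the nanoseconds the caller should pause, or 0 if no pause is needed. */
  long computePauseNS(long bytes, LongSupplier clock) {
    final double rate = mbPerSec; // read the volatile rate once
    final double secondsToPause = (bytes / 1024. / 1024.) / rate;
    final long[] pauseNS = new long[1]; // written only by the calling thread

    lastNS.updateAndGet(
        last -> {
          // The clock is read inside the update, so time spent contending with
          // other threads is already reflected in curNS. The lambda may be
          // retried under contention; each retry overwrites pauseNS[0].
          long curNS = clock.getAsLong();
          long targetNS = last + (long) (1_000_000_000 * secondsToPause);
          long curPauseNS = targetNS - curNS;
          if (curPauseNS <= MIN_PAUSE_NS) {
            // Enforce the instantaneous rate, as in the diff above.
            pauseNS[0] = 0;
            return curNS;
          }
          // Assumption: otherwise advance to targetNS and report the pause.
          pauseNS[0] = curPauseNS;
          return targetNS;
        });
    return pauseNS[0];
  }
}
```

A caller would pass `System::nanoTime` as the clock and then sleep for the returned duration outside of the atomic update, so no thread ever sleeps while holding a lock.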

Also, if we make `pause` sync'd, won't it cause scary-looking `jstack` thread
dumps? It would look like one thread is sleeping while holding a lock and
blocking other threads (which indeed is what it'd be doing), versus the thread
stacks we'd see w/ the current approach, which make it quite clear that all N
threads are intentionally stalling in `mergeProgress.pauseNanos`. The current
approach might reduce the horror reaction we'd see when peeking at thread
dumps ...