mikemccand commented on a change in pull request #1552: URL: https://github.com/apache/lucene-solr/pull/1552#discussion_r439492017
########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ########## @@ -4483,6 +4593,7 @@ public int length() { // Merge would produce a 0-doc segment, so we do nothing except commit the merge to remove all the 0-doc segments that we "merged": assert merge.info.info.maxDoc() == 0; commitMerge(merge, mergeState); + success = true; Review comment: I think this was a small pre-existing bug. I.e. the merge has in fact succeeded on this path. Before this change we are calling `closeMergeReaders` twice (once in the line above this, then again on line 4720 below. Maybe that is harmless, but code-wise I think this path did succeed. If necessary, we could pull this out into its own PR? But I think it's a good, if subtle, catch. The merge did succeed in this path. ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ########## @@ -3228,15 +3268,38 @@ private long prepareCommitInternal() throws IOException { // sneak into the commit point: toCommit = segmentInfos.clone(); + if (anyChanges) { + // Find any merges that can execute on commit (per MergePolicy). + MergePolicy.MergeSpecification mergeSpec = Review comment: I think what makes this tricky is that this is a combination of `MergePolicy` (to pick the small merges) and `MergeScheduler` (to run them and await their completion, subject to a time limit) purposes. I do not think you can achieve this by just wrapping in `MergePolicy`, but I agree it would be better if we could find a simpler way to achieve it. ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java ########## @@ -459,6 +463,31 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) { return this; } + /** + * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...). + * If this time is reached, we proceed with the commit based on segments merged up to that point. + * The merges are not cancelled, and may still run to completion independent of the commit. + */ + public IndexWriterConfig setMaxCommitMergeWaitSeconds(double maxCommitMergeWaitSeconds) { Review comment: I think @msfroh had considered a separate `IndexWriter` method before but something went wrong with that approach? I don't think this should be a separate method, actually. We have a `MergePolicy` that governs which merges should happen upon which events/triggers and what this change is adding is a new trigger (on commit) at which merging could conceivably occur. If we added this method, the implication to fresh eyes would be that the existing `prepareCommit` will also wait for merges with some default parameter, while this new method lets you change the default. Anyway, let's hear from @msfroh if there was some wrinkle on making a dedicated method for this, but I still think that's a messy API. We should rather use our existing `MergePolicy` API correctly. ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ########## @@ -3257,6 +3320,52 @@ private long prepareCommitInternal() throws IOException { } finally { maybeCloseOnTragicEvent(); } + + if (mergeAwaitLatchRef != null) { + CountDownLatch mergeAwaitLatch = mergeAwaitLatchRef.get(); + // If we found and registered any merges above, within the flushLock, then we want to ensure that they + // complete execution. Note that since we released the lock, other merges may have been scheduled. We will + // block until the merges that we registered complete. As they complete, they will update toCommit to + // replace merged segments with the result of each merge. + config.getIndexWriterEvents().beginMergeOnCommit(); + mergeScheduler.merge(mergeSource, MergeTrigger.COMMIT); + long mergeWaitStart = System.nanoTime(); + int abandonedCount = 0; + long waitTimeMillis = (long) (config.getMaxCommitMergeWaitSeconds() * 1000.0); + try { + if (mergeAwaitLatch.await(waitTimeMillis, TimeUnit.MILLISECONDS) == false) { + synchronized (this) { + // Need to do this in a synchronized block, to make sure none of our commit merges are currently + // executing mergeFinished (since mergeFinished itself is called from within the IndexWriter lock). + // After we clear the value from mergeAwaitLatchRef, the merges we schedule will still execute as + // usual, but when they finish, they won't attempt to update toCommit or modify segment reference + // counts. + mergeAwaitLatchRef.set(null); + for (MergePolicy.OneMerge commitMerge : commitMerges) { + if (runningMerges.contains(commitMerge) || pendingMerges.contains(commitMerge)) { + abandonedCount++; + } + } + } + } + } catch (InterruptedException ie) { + throw new ThreadInterruptedException(ie); + } finally { + if (infoStream.isEnabled("IW")) { + infoStream.message("IW", String.format(Locale.ROOT, "Waited %.1f ms for commit merges", + (System.nanoTime() - mergeWaitStart)/1_000_000.0)); + infoStream.message("IW", "After executing commit merges, had " + toCommit.size() + " segments"); + if (abandonedCount > 0) { + infoStream.message("IW", "Abandoned " + abandonedCount + " commit merges after " + waitTimeMillis + " ms"); + } + } + if (abandonedCount > 0) { + config.getIndexWriterEvents().abandonedMergesOnCommit(abandonedCount); Review comment: The idea with this is to be able to track externally to `IndexWriter` how many merges completed during the commit window, how many did not, how long the commit waited for small merges to finish, etc. This is useful telemetry for understanding how the feature is actually working in your production cluster. I don't think we can get the equivalent telemetry by e.g. wrapping `MergePolicy` or `MergeScheduler`, because it is `IndexWriter` that knows how long it will wait and knows which merges made it and which did not. I think `testPoints` only run under assertion, and are really designed for unit tests to do interesting things. But maybe we could somehow abuse it for this use case? I agree it should be tested better if we want to include it here. If it is really controversial then +1 to remove the `IndexWriterEvents` entirely here and pursue it as a separate issue. It was useful for us (Amazon product search) to understand how this feature impacts our production clusters, and to help us tune a reasonable timeout value. ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java ########## @@ -109,6 +110,9 @@ /** Default value for whether calls to {@link IndexWriter#close()} include a commit. */ public final static boolean DEFAULT_COMMIT_ON_CLOSE = true; + + /** Default value for time to wait for merges on commit (when using a {@link MergePolicy} that implements findFullFlushMerges). */ + public static final double DEFAULT_MAX_COMMIT_MERGE_WAIT_SECONDS = 30.0; Review comment: Maybe 5.0 or 10.0 seconds instead? A "typical" commit, writing new Lucene segments and `fsync`ing many files can often take several seconds, but 30 seems high. ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ########## @@ -3152,6 +3154,42 @@ public final boolean flushNextBuffer() throws IOException { } } + private MergePolicy.OneMerge updateSegmentInfosOnMergeFinish(MergePolicy.OneMerge merge, final SegmentInfos toCommit, + AtomicReference<CountDownLatch> mergeLatchRef) { + return new MergePolicy.OneMerge(merge.segments) { + public void mergeFinished() throws IOException { + super.mergeFinished(); + CountDownLatch mergeAwaitLatch = mergeLatchRef.get(); + if (mergeAwaitLatch == null) { + // Commit thread timed out waiting for this merge and moved on. No need to manipulate toCommit. + return; + } + if (committed) { + deleter.incRef(this.info.files()); Review comment: I think @msfroh might remember some context on why he settled on this approach. This is inherently a complex problem: we want to let `MergePolicy` pick the "right" merges to do on commit (the smallish ones). But, it may pick poorly, or maybe the box is IO starved currently, and so we want to 1) ask `MergeScheduler` to kick off the merges it had requested, but 2) those merges that complete it time will make it into the commit point, while those that do not will still be allowed to run to completion and be switched in the live `SegmentInfos` (no wasted work), but will not be in the current commit point. Anyway, +1 if we can re-use existing code that "merges the merged segments" into a live or the pending commit `SegmentInfos`. Maybe we can factor out the common code to reduce the added complexity here? ########## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriterConfig.java ########## @@ -459,6 +463,31 @@ public IndexWriterConfig setCommitOnClose(boolean commitOnClose) { return this; } + /** + * Expert: sets the amount of time to wait for merges returned by MergePolicy.findFullFlushMerges(...). + * If this time is reached, we proceed with the commit based on segments merged up to that point. + * The merges are not cancelled, and may still run to completion independent of the commit. + */ + public IndexWriterConfig setMaxCommitMergeWaitSeconds(double maxCommitMergeWaitSeconds) { + this.maxCommitMergeWaitSeconds = maxCommitMergeWaitSeconds; + return this; + } + + /** + * Set the callback that gets invoked when IndexWriter performs various actions. + */ + public IndexWriterConfig setIndexWriterEvents(IndexWriterEvents indexWriterEvents) { Review comment: Yeah for sure if we include it in this PR it should be tested by the PR. See my comment above about this. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org