zhaih commented on code in PR #935: URL: https://github.com/apache/lucene/pull/935#discussion_r891966944
########## lucene/core/src/java/org/apache/lucene/index/LogMergePolicy.java: ########## @@ -568,23 +568,41 @@ public MergeSpecification findMerges( // Finally, record all merges that are viable at this level: int end = start + mergeFactor; while (end <= 1 + upto) { - boolean anyTooLarge = false; boolean anyMerging = false; + long mergeSize = 0; + long mergeDocs = 0; for (int i = start; i < end; i++) { final SegmentInfoAndLevel segLevel = levels.get(i); final SegmentCommitInfo info = segLevel.info; - anyTooLarge |= - (size(info, mergeContext) >= maxMergeSize - || sizeDocs(info, mergeContext) >= maxMergeDocs); if (mergingSegments.contains(info)) { anyMerging = true; break; } + long segmentSize = size(info, mergeContext); + long segmentDocs = sizeDocs(info, mergeContext); + if (mergeSize + segmentSize > maxMergeSize || mergeDocs + segmentDocs > maxMergeDocs) { + // This merge is full, stop adding more segments to it + if (i == start) { + // This segment alone is too large, return a singleton merge + if (verbose(mergeContext)) { + message( + " " + i + " is larger than the max merge size/docs; ignoring", mergeContext); + } + end = i + 1; + } else { + // Previous segments are under the max merge size, return them + end = i; + } + break; + } + mergeSize += segmentSize; + mergeDocs += segmentDocs; } - if (anyMerging) { - // skip - } else if (!anyTooLarge) { + if (anyMerging || end - start <= 1) { + // skip: there is an ongoing merge at the current level or the computed merge has a single + // segment and this merge policy doesn't do singleton merges + } else { Review Comment: So basically for a level which meets the merge factor we're merging every segments slice that is not exceed the max size specified? ########## lucene/core/src/java/org/apache/lucene/index/LogMergePolicy.java: ########## @@ -568,23 +568,41 @@ public MergeSpecification findMerges( // Finally, record all merges that are viable at this level: Review Comment: I'm confused by the level quantization process above actually: 1. From L527 it determines the max level, and then a level bottom based on that 2. Then it find the right boundary by search backwards for the first qualified segment in L556- This seems assuming the `levels` are sorted, but I can't find the sorting anywhere. Or otherwise how could it guarantee that the level decided in range of `[start,end)` won't contain segments that have lower level than `levelBottom`? Sorry the question is not quite related to the change itself -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org