carlosdelest commented on code in PR #13430: URL: https://github.com/apache/lucene/pull/13430#discussion_r1620451379
########## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ########## @@ -93,6 +93,7 @@ public class TieredMergePolicy extends MergePolicy { private double segsPerTier = 10.0; private double forceMergeDeletesPctAllowed = 10.0; private double deletesPctAllowed = 20.0; + private int targetSearchConcurrency = -1; Review Comment: Yes - I iterated a bit on this but now it's equivalent. Thanks! ########## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ########## @@ -522,21 +550,28 @@ private MergeSpecification doFindMerges( final List<SegmentCommitInfo> candidate = new ArrayList<>(); Review Comment: I see some test failures, failing to merge segments that are eligible both from a bytes and doc perspective. It seems related to [this line here](https://github.com/apache/lucene/pull/13430/files#diff-da7b621934abf1e7bc74d6bf15f17ddd95cffa4f374776125d8de5cdc0054b2dL507): ```java if (mergeType == MERGE_TYPE.NATURAL && sortedEligible.size() <= allowedSegCount && remainingDelCount <= allowedDelCount) { return spec; } ``` Here we're bailing out if we don't have enough eligible segments as the allowed seg count. When adding more allowed segment counts with the target search concurrency, this optimization makes it impossible to merge segments that are available for merging, and thus get over budget and fail the tests. I don't get this optimization - why shouldn't we try to merge if we have less sorted eligibles? What am I missing to change this so we can take into account the bigger number of `allowedSegCount` that comes with adding target search concurrency? ########## lucene/core/src/test/org/apache/lucene/index/TestTieredMergePolicy.java: ########## @@ -431,6 +447,128 @@ public void testForcedMergesRespectSegSize() throws Exception { dir.close(); } + public void testForcedMergesRespectsTargetSearchConcurrency() throws Exception { Review Comment: I've given it a try - LMKWYT! ########## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ########## @@ -409,6 +433,8 @@ public MergeSpecification findMerges( // allowedSegCount may occasionally be less than segsPerTier // if segment sizes are below the floor size allowedSegCount = Math.max(allowedSegCount, segsPerTier); + // Override with the targetSearchConcurrency if it is greater + allowedSegCount = Math.max(allowedSegCount, targetSearchConcurrency); Review Comment: Correct, now the highest tier needs to take that into account. Thanks! 👍 ########## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ########## @@ -257,6 +258,25 @@ public double getSegmentsPerTier() { return segsPerTier; } + /** + * Sets the target search concurrency. This allows merging to ensure that there are at least + * targetSearchConcurrency segments on the top tier. This setting can be overriden by force Review Comment: Makes sense, updating. ########## lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java: ########## @@ -581,11 +621,14 @@ private MergeSpecification doFindMerges( // whose length is less than the merge factor, it means we are reaching // the tail of the list of segments and will only find smaller merges. // Stop here. - if (bestScore != null && hitTooLarge == false && candidate.size() < mergeFactor) { + if (bestScore != null + && hitTooLarge == false + && hitMaxDocs == false + && candidate.size() < mergeFactor) { break; } - final MergeScore score = score(candidate, hitTooLarge, segInfosSizes); + final MergeScore score = score(candidate, hitTooLarge || hitMaxDocs, segInfosSizes); Review Comment: I see - I was thinking in doing the same with docs than with bytes, but I see they're quite different. Docs is more a hint to maintain the number of segments we want, but bytes provides a hard limit on size and also score guidance on segments to merge. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org