Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

via GitHub Fri, 31 May 2024 02:34:43 -0700


carlosdelest commented on code in PR #13430:
URL: https://github.com/apache/lucene/pull/13430#discussion_r1620451379



##########
lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java:
##########
@@ -93,6 +93,7 @@ public class TieredMergePolicy extends MergePolicy {
   private double segsPerTier = 10.0;
   private double forceMergeDeletesPctAllowed = 10.0;
   private double deletesPctAllowed = 20.0;
+  private int targetSearchConcurrency = -1;

Review Comment:
   Yes - I iterated a bit on this but now it's equivalent. Thanks!



##########
lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java:
##########
@@ -522,21 +550,28 @@ private MergeSpecification doFindMerges(
         final List<SegmentCommitInfo> candidate = new ArrayList<>();

Review Comment:
   I see some test failures, failing to merge segments that are eligible both 
from a bytes and doc perspective.
   
   It seems related to [this line 
here](https://github.com/apache/lucene/pull/13430/files#diff-da7b621934abf1e7bc74d6bf15f17ddd95cffa4f374776125d8de5cdc0054b2dL507):
   
   ```java
         if (mergeType == MERGE_TYPE.NATURAL
             && sortedEligible.size() <= allowedSegCount
             && remainingDelCount <= allowedDelCount) {
           return spec;
         } 
   ```
   
   Here we're bailing out if we don't have enough eligible segments as the 
allowed seg count.
   
   When adding more allowed segment counts with the target search concurrency, 
this optimization makes it impossible to merge segments that are available for 
merging, and thus get over budget and fail the tests.
   
   I don't get this optimization - why shouldn't we try to merge if we have 
less sorted eligibles? What am I missing to change this so we can take into 
account the bigger number of `allowedSegCount` that comes with adding target 
search concurrency?
   
   



##########
lucene/core/src/test/org/apache/lucene/index/TestTieredMergePolicy.java:
##########
@@ -431,6 +447,128 @@ public void testForcedMergesRespectSegSize() throws 
Exception {
     dir.close();
   }
 
+  public void testForcedMergesRespectsTargetSearchConcurrency() throws 
Exception {

Review Comment:
   I've given it a try - LMKWYT!



##########
lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java:
##########
@@ -409,6 +433,8 @@ public MergeSpecification findMerges(
     // allowedSegCount may occasionally be less than segsPerTier
     // if segment sizes are below the floor size
     allowedSegCount = Math.max(allowedSegCount, segsPerTier);
+    // Override with the targetSearchConcurrency if it is greater
+    allowedSegCount = Math.max(allowedSegCount, targetSearchConcurrency);

Review Comment:
   Correct, now the highest tier needs to take that into account. Thanks! 👍 



##########
lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java:
##########
@@ -257,6 +258,25 @@ public double getSegmentsPerTier() {
     return segsPerTier;
   }
 
+  /**
+   * Sets the target search concurrency. This allows merging to ensure that 
there are at least
+   * targetSearchConcurrency segments on the top tier. This setting can be 
overriden by force

Review Comment:
   Makes sense, updating.



##########
lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java:
##########
@@ -581,11 +621,14 @@ private MergeSpecification doFindMerges(
         // whose length is less than the merge factor, it means we are reaching
         // the tail of the list of segments and will only find smaller merges.
         // Stop here.
-        if (bestScore != null && hitTooLarge == false && candidate.size() < 
mergeFactor) {
+        if (bestScore != null
+            && hitTooLarge == false
+            && hitMaxDocs == false
+            && candidate.size() < mergeFactor) {
           break;
         }
 
-        final MergeScore score = score(candidate, hitTooLarge, segInfosSizes);
+        final MergeScore score = score(candidate, hitTooLarge || hitMaxDocs, 
segInfosSizes);

Review Comment:
   I see - I was thinking in doing the same with docs than with bytes, but I 
see they're quite different. Docs is more a hint to maintain the number of 
segments we want, but bytes provides a hard limit on size and also score 
guidance on segments to merge.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] WIP - Add minimum number of segments to TieredMergePolicy [lucene]

Reply via email to