khushbr opened a new issue, #13226:
URL: https://github.com/apache/lucene/issues/13226

   ### Description
   We have a cluster running Lucene v8.7.0, configured with `TieredMergePolicy`. We are seeing peculiar behavior: segments with heavy deletes are not picked up by background merges, nor by a force merge with expunge deletes.
   
   As the segment info below shows, the large segments are all close to the 5GB `maxMergedSegmentBytes` limit and their `segDelPct` is ~99.9%, far above the 20% threshold defined by `deletesPctAllowed`.
   
   ```
   index     shard prirep segment generation docs.count docs.deleted    size size.memory committed searchable version compound
   index-1   0     p      _37pfj     5398399         10            0  20.9kb           0 true      false      8.7.0   true
   index-1   0     r      _1pzoc     2892252         37            0  26.1kb           0 true      false      8.7.0   true
   ...
   index-1   0     p      _2hr8l     4187685         40     28834069   4.8gb       10408 true      true       8.7.0   false
   index-1   0     p      _1voeo     3157584          0     29902767   4.9gb       10520 true      true       8.7.0   false
   index-1   0     r      _1wtuc     3211284          0     29948777   4.9gb       10520 true      true       8.7.0   false
   index-1   0     r      _2jchf     4261875         40     30082926     5gb       10584 true      true       8.7.0   false
   ```
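
To make the ~99.9% figure concrete, here is a minimal Python sketch that recomputes the deleted percentage from the `docs.count` and `docs.deleted` columns above. It assumes `docs.count` is the number of live documents and `docs.deleted` the number of deleted ones. Comparing each segment against the 20% figure is a simplification for illustration, not necessarily how `TieredMergePolicy` applies `deletesPctAllowed`.

```python
# Deleted-document percentage per segment, from the listing above.
# Assumption: docs.count = live docs, docs.deleted = deleted docs.
DELETES_PCT_ALLOWED = 20.0  # threshold quoted in this report

# segment name -> (docs.count, docs.deleted)
segments = {
    "_37pfj": (10, 0),
    "_1pzoc": (37, 0),
    "_2hr8l": (40, 28834069),
    "_1voeo": (0, 29902767),
    "_1wtuc": (0, 29948777),
    "_2jchf": (40, 30082926),
}


def deleted_pct(live_docs, deleted_docs):
    """Return deleted docs as a percentage of all docs in the segment."""
    total = live_docs + deleted_docs
    return 100.0 * deleted_docs / total if total else 0.0


for name, (live, deleted) in segments.items():
    pct = deleted_pct(live, deleted)
    flag = "over threshold" if pct > DELETES_PCT_ALLOWED else "ok"
    print(f"{name}: {pct:.4f}% deleted ({flag})")
```

The four large segments all come out above 99.9%, while the two small, non-searchable ones have no deletes at all.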
   
   
   1. In an attempt to influence the [MergeScore](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java#L588) for these segments, I increased `reclaim_deletes_weight` to a ridiculously high value of `50`, but the segment was still skipped, with a score of `~2.22` and a skew of `0.713`.
   ```
   [2024-03-20T15:46:36,015][TRACE][o.e.i.e.E.MP ]: Lucene Merge Thread 
#403832] MP:   maybe=_1wtuc(8.7.0):C29948777/29948777:[diagnostics={os=Linux, 
java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, 
source=merge, ... :softDel=12100433 :id=i9yc9l5c6qvt26u9srmz8umo 
score=2.220052984489907 skew=0.713 nonDelRatio=1.000 tooLarge=false 
size=7083.170 MB
   
   ```
   2. I also tried increasing and decreasing `max_merged_segment`. Decreasing it to 3GB resulted in segment `_1pa38` being picked for a merge, but the deletes were not expunged after the merge finished.
   ```
   [2024-03-27T08:33:42,666][TRACE][o.e.i.e.E.MS] [refresh][T#1] MS:     launch 
new thread
   ..
   [2024-03-27T08:33:42,667][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] 
IW: now apply deletes for 10 merging segments
   ...
   [2024-03-27T08:33:42,667][TRACE][o.e.i.e.E.IW ]: Lucene Merge Thread #145644] 
IW: now merge ...  index=_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, 
java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, 
source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com 
Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, 
mergeMaxNumSegments=40, mergeFactor=10, 
timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886
 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8...
   [2024-03-27T08:33:42,848][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] 
IW: merge codec=Lucene87 maxDoc=6985; merged segment has no vectors; norms; 
docValues; prox; freqs; points; 0.2 sec to merge segment [1.19 MB, 6.64 MB/sec]
   
   [2024-03-27T08:33:44,699][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] 
IW: commitMerge: ... index=_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, 
java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, 
source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com 
Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, 
mergeMaxNumSegments=40, mergeFactor=10, 
timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886
 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8
   ...
   [2024-03-27T08:33:44,700][TRACE][o.e.i.e.E.IFD]: Lucene Merge Thread 
#145644] IFD: now checkpoint "_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, 
java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, 
source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com 
Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, 
mergeMaxNumSegments=40, mergeFactor=10, 
timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886
 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8 ...
   [2024-03-27T08:33:44,701][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] 
IW: after commitMerge: _1pa38(8.7.0):C60121906:[diagnostics={os=Linux, 
java.version=11.0.17...
   ```
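
The `score=... skew=... nonDelRatio=...` values logged in step 1 can be roughly reproduced with the sketch below. The formula is an assumption based on my reading of the linked `TieredMergePolicy` scoring code (skew, times a small size penalty, times `nonDelRatio` raised to an exponent); `merge_score` and `deletes_exponent` are hypothetical names, not Lucene API.

```python
# Sketch of the merge score, ASSUMING it has the shape
#   skew * totalBytes^0.05 * nonDelRatio^exponent
# Lower scores are better. Names here are illustrative only.
def merge_score(skew, total_after_merge_bytes, non_del_ratio, deletes_exponent=2.0):
    score = skew
    score *= total_after_merge_bytes ** 0.05   # gentle bias toward smaller merges
    score *= non_del_ratio ** deletes_exponent  # reward for reclaiming deletes
    return score


# Values taken from the TRACE line in step 1: skew=0.713, size=7083.170 MB,
# nonDelRatio=1.000. The log reports MB, so convert back to bytes.
size_bytes = 7083.170 * 1024 * 1024

default_score = merge_score(0.713, size_bytes, 1.000)
boosted_score = merge_score(0.713, size_bytes, 1.000, deletes_exponent=50.0)

print(round(default_score, 2), round(boosted_score, 2))
```

If this shape is right, it would explain why raising the weight to `50` changed nothing: with `nonDelRatio=1.000` the deletes term is `1.0` for any exponent, so the candidate merge is scored as if it reclaimed no deletes at all.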
   
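
For reading the segment descriptors in these logs, a small parser along the following lines may help. It assumes the descriptor layout is `name(version):C<maxDoc>[/<delCount>]` followed later by an optional `:softDel=<n>`; that layout is my reading of the log lines above, and `parse_segment` is a hypothetical helper. The diagnostics map in the sample string is shortened for brevity.

```python
import re

# Assumed layout: _name(version):C<maxDoc>[/<delCount>] ... :softDel=<n>
SEGMENT_RE = re.compile(
    r"(?P<name>_\w+)\((?P<version>[\d.]+)\):[cC](?P<max_doc>\d+)(?:/(?P<del_count>\d+))?"
)
SOFT_DEL_RE = re.compile(r":softDel=(\d+)")


def parse_segment(descriptor):
    """Extract maxDoc, delCount and softDel from one unwrapped log descriptor."""
    head = SEGMENT_RE.search(descriptor)
    if head is None:
        raise ValueError("no segment descriptor found")
    soft = SOFT_DEL_RE.search(descriptor)
    max_doc = int(head.group("max_doc"))
    soft_del = int(soft.group(1)) if soft else 0
    return {
        "name": head.group("name"),
        "max_doc": max_doc,
        "del_count": int(head.group("del_count") or 0),
        "soft_del": soft_del,
        "not_soft_deleted": max_doc - soft_del,
    }


# Descriptor for _1pa38 from step 2, with the diagnostics map shortened.
line = ("_1pa38(8.7.0):C60121906:[diagnostics={os=Linux}]"
        ":fieldInfosGen=1886:dvGen=1886 :softDel=60117820")
info = parse_segment(line)
print(info)
```

Under that reading, `_1pa38` holds 60121906 documents of which 60117820 are soft-deleted, leaving 4086 that are not.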
   
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

