khushbr opened a new issue, #13226: URL: https://github.com/apache/lucene/issues/13226
### Description

We have a cluster running Lucene v8.7.0, configured with `TieredMergePolicy`. We are seeing peculiar behavior: segments with heavy deletes are not picked up by background merges, nor by a force merge with expunge deletes. As the segment info below shows, all of these segments are close to the `maxMergedSegmentBytes` value of 5GB, and their `segDelPct` is ~99.9%, far above the 20% threshold defined by `deletesPctAllowed`.

```
index   shard prirep segment generation docs.count docs.deleted size   size.memory committed searchable version compound
index-1 0     p      _37pfj  5398399    10         0            20.9kb 0           true      false      8.7.0   true
index-1 0     r      _1pzoc  2892252    37         0            26.1kb 0           true      false      8.7.0   true
...
index-1 0     p      _2hr8l  4187685    40         28834069     4.8gb  10408       true      true       8.7.0   false
index-1 0     p      _1voeo  3157584    0          29902767     4.9gb  10520       true      true       8.7.0   false
index-1 0     r      _1wtuc  3211284    0          29948777     4.9gb  10520       true      true       8.7.0   false
index-1 0     r      _2jchf  4261875    40         30082926     5gb    10584       true      true       8.7.0   false
```

1. In an attempt to influence the [MergeScore](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java#L588) for these segments, I increased `reclaim_deletes_weight` to a ridiculously high value of `50`, but the segment was still skipped, with a score of `~2.22` and a skew of `0.713`.

```
[2024-03-20T15:46:36,015][TRACE][o.e.i.e.E.MP ]: Lucene Merge Thread #403832] MP: maybe=_1wtuc(8.7.0):C29948777/29948777:[diagnostics={os=Linux, java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, source=merge, ... :softDel=12100433 :id=i9yc9l5c6qvt26u9srmz8umo score=2.220052984489907 skew=0.713 nonDelRatio=1.000 tooLarge=false size=7083.170 MB
```

2. I also tried increasing and decreasing the `max_merged_segment` threshold.
Decreasing the value to 3GB caused segment `_1pa38` to be picked for a merge, but the deletes were not expunged once the merge finished.

```
[2024-03-27T08:33:42,666][TRACE][o.e.i.e.E.MS] [refresh][T#1] MS: launch new thread ..
[2024-03-27T08:33:42,667][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] IW: now apply deletes for 10 merging segments ...
2024-03-27T08:33:42,667][TRACE][o.e.i.e.E.IW ]: Lucene Merge Thread #145644] IW: now merge ... index=_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, mergeMaxNumSegments=40, mergeFactor=10, timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8...
[2024-03-27T08:33:42,848][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] IW: merge codec=Lucene87 maxDoc=6985; merged segment has no vectors; norms; docValues; prox; freqs; points; 0.2 sec to merge segment [1.19 MB, 6.64 MB/sec]
[2024-03-27T08:33:44,699][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] IW: commitMerge: ... index=_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, mergeMaxNumSegments=40, mergeFactor=10, timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8 ...
[2024-03-27T08:33:44,700][TRACE][o.e.i.e.E.IFD]: Lucene Merge Thread #145644] IFD: now checkpoint "_1pa38(8.7.0):C60121906:[diagnostics={os=Linux, java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, source=merge, os.version=5.10.149-133.644.amzn2.x86_64, java.vendor=Amazon.com Inc., java.vm.version=11.0.17+9-LTS, lucene.version=8.7.0, mergeMaxNumSegments=40, mergeFactor=10, timestamp=1711309532883}]:[attributes={Lucene87StoredFieldsFormat.mode=BEST_SPEED}]:fieldInfosGen=1886:dvGen=1886 :softDel=60117820 :id=4v1w9rn7oj6c0d78gy0t6lih8 ...
[2024-03-27T08:33:44,701][TRACE][o.e.i.e.E.IW]: Lucene Merge Thread #145644] IW: after commitMerge: _1pa38(8.7.0):C60121906:[diagnostics={os=Linux, java.version=11.0.17...
```

### Version and environment details

_No response_
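
The figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is not taken from the issue. It recomputes the deleted-document percentage from the `docs.count` and `docs.deleted` columns, and it recomputes the logged merge score on the assumption that the score has the shape `skew * size^0.05 * nonDelRatio^weight`. That shape is one reading of `TieredMergePolicy`, not something the logs confirm. The helper names `deleted_pct` and `merge_score` are hypothetical, and `MB` is assumed to mean 1024 * 1024 bytes.

```python
import math

# Rows copied from the segment listing in the description:
# (segment, docs.count, docs.deleted)
SEGMENTS = [
    ("_2hr8l", 40, 28834069),
    ("_1voeo", 0, 29902767),
    ("_1wtuc", 0, 29948777),
    ("_2jchf", 40, 30082926),
]


def deleted_pct(live_docs, deleted_docs):
    """Percentage of a segment's documents that are marked deleted."""
    total = live_docs + deleted_docs
    return 100.0 * deleted_docs / total if total else 0.0


def merge_score(skew, size_bytes, non_del_ratio, deletes_weight):
    """ASSUMED score shape: skew * size^0.05 * nonDelRatio^weight."""
    return skew * math.pow(size_bytes, 0.05) * math.pow(non_del_ratio, deletes_weight)


# Values from the "maybe=_1wtuc" trace line.
# Assumption: the logged "MB" means 1024 * 1024 bytes.
SKEW = 0.713
SIZE_BYTES = 7083.170 * 1024 * 1024
NON_DEL_RATIO = 1.000

if __name__ == "__main__":
    for name, live, deleted in SEGMENTS:
        print(f"{name}: {deleted_pct(live, deleted):.4f}% deleted")
    # Compare an ordinary weight with the very high value tried in the issue.
    for weight in (2.0, 50.0):
        score = merge_score(SKEW, SIZE_BYTES, NON_DEL_RATIO, weight)
        print(f"weight={weight}: score={score:.3f}")
```

If the score really has this shape, a `nonDelRatio` of 1.000 raised to any power is still 1, so raising the deletes weight cannot change the score. The logged `nonDelRatio=1.000` would then suggest that the policy counted no reclaimable deletes for `_1wtuc`, despite `softDel=12100433` on the same line.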