vigyasharma commented on issue #13226: URL: https://github.com/apache/lucene/issues/13226#issuecomment-2033392249
`TieredMergePolicy` prefers merges that have less skew across segment sizes, a smaller total size, and a higher number of expunged deletes. Each merge here is a set of segments that will be merged into a single segment (eventually this becomes a `OneMerge` object). To do this curation, the policy assigns a *merge score* to each candidate merge, and merges with lower scores are preferred.

```java
[2024-03-20T15:46:36,015][TRACE][o.e.i.e.E.MP ]: Lucene Merge Thread #403832] MP: maybe=_1wtuc(8.7.0):C29948777/29948777:[diagnostics={os=Linux, java.version=11.0.17, os.arch=amd64, java.runtime.version=11.0.17+9-LTS, source=merge, ... :softDel=12100433 :id=i9yc9l5c6qvt26u9srmz8umo score=2.220052984489907 skew=0.713 nonDelRatio=1.000 tooLarge=false size=7083.170 MB
```

From the log above, `_1wtuc` seems to have a high skew value (skew ranges from `1/mergeFactor = 0.1` (best) to 1 (worst)), but what stands out is the high value of `nonDelRatio = 1.000`.

**nonDelRatio** is [calculated](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/TieredMergePolicy.java#L700) as `totalBytesAfterMerge / totalBytesBeforeMerge`, and gives a sense of how many deletes the merge would expunge. A high value (1 being the highest) indicates that the merge will not reclaim any deletes!

The value of `totalBytesAfterMerge` comes from summing the post-merge size of each segment, which is [computed](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/MergePolicy.java#L753-L765) by prorating the segment size by the fraction of its documents that are reclaimable deletes: `segmentSize * (1 - reclaimableDeletes/maxDoc)`. The number of reclaimable deletes is fetched from `numDeletesToMerge()` in the merge policy, which can be overridden by implementations like `SoftDeletesRetentionMergePolicy` to retain soft-deleted documents in the segment post merge.

It is likely that for this segment, even though we have a high no.
of deletes, `SoftDeletesRetentionMergePolicy` is retaining all of them, causing `nonDelRatio` to be 1. It would help to look at your **SoftDeletesRetentionMergePolicy** implementation.

...

As a side note, is the log line above truncated? Going by `C29948777/29948777` and `:softDel=12100433`, the number of pending deletes in the segment is `29948777` (same as total docs), while the number of soft deletes is `12100433` (only about 40% of the total pending deletes). Even if all of the soft deletes are retained by the merge policy, there should still be about 60% of the deletes that the merge can reclaim. I wonder if some info, like details of other segments in the merge, got truncated from the log line.
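To make the `nonDelRatio` arithmetic concrete, here is a minimal sketch of the two formulas quoted in this comment. It is not Lucene's code: the class `NonDelRatioSketch`, the `Segment` holder, and the method names are hypothetical stand-ins, and `reclaimableDeletes` is assumed to be whatever `numDeletesToMerge()` would report for the segment.

```java
import java.util.List;

// Illustrative sketch only; names and types are hypothetical, not Lucene's API.
final class NonDelRatioSketch {

    /** Hypothetical stand-in for a segment's size and delete statistics. */
    static final class Segment {
        final long sizeInBytes;
        final int maxDoc;
        final int reclaimableDeletes; // assumed: what numDeletesToMerge() reports

        Segment(long sizeInBytes, int maxDoc, int reclaimableDeletes) {
            this.sizeInBytes = sizeInBytes;
            this.maxDoc = maxDoc;
            this.reclaimableDeletes = reclaimableDeletes;
        }
    }

    /** segmentSize * (1 - reclaimableDeletes / maxDoc) */
    static double proratedSize(Segment s) {
        if (s.maxDoc <= 0) {
            return s.sizeInBytes; // empty segment: nothing to prorate
        }
        double delRatio = (double) s.reclaimableDeletes / s.maxDoc;
        return s.sizeInBytes * (1.0 - delRatio);
    }

    /** totalBytesAfterMerge / totalBytesBeforeMerge over all segments in one merge. */
    static double nonDelRatio(List<Segment> merge) {
        double before = 0.0;
        double after = 0.0;
        for (Segment s : merge) {
            before += s.sizeInBytes;
            after += proratedSize(s);
        }
        // Treat an empty merge as reclaiming nothing.
        return before == 0.0 ? 1.0 : after / before;
    }
}
```

In this sketch, a segment whose deletes are all retained (`reclaimableDeletes = 0`) keeps its full size after the merge, so a merge made up only of such segments gets a `nonDelRatio` of 1, regardless of how many deleted documents the segments actually contain.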