[ https://issues.apache.org/jira/browse/LUCENE-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545487#comment-17545487 ]
Michael Sokolov commented on LUCENE-10599:
------------------------------------------
I don't have any deep understanding of the log merge policy, but this grouping
operation you describe sounds buggy; +1 to improve it to be less jagged.
> Improve LogMergePolicy's handling of maxMergeSize
> -------------------------------------------------
>
> Key: LUCENE-10599
> URL: https://issues.apache.org/jira/browse/LUCENE-10599
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> LogMergePolicy excludes from merging segments whose size is greater than or
> equal to maxMergeSize. Since a segment whose size is maxMergeSize-1 is still
> considered for merging, segments will effectively reach a size somewhere
> between maxMergeSize and mergeFactor*maxMergeSize before they are not
> considered for merging anymore.
> At least this is what I thought. When LogMergePolicy ignores a segment that
> is too large for merging, it also ignores other segments that are in the same
> window of mergeFactor segments for merging if they are on the same tier. So
> actually segments might reach a size that is somewhere between maxMergeSize /
> mergeFactor^0.75 and maxMergeSize * mergeFactor before they are not
> considered for merging anymore.
> Assuming a merge factor of 10 and a max merge size of 1,000, this means that
> segments will reach their maximum size somewhere between 178
> (1,000 / 10^0.75) and 10,000 (1,000 * 10). This range is too large and makes
> maxMergeSize too hard to reason about.
> Specifically, if you have 10 segments of 999 docs each, then LogDocMergePolicy
> will happily merge them into a single 9,990-doc segment. However, if you have
> one 1,000-doc segment and 9 segments of 180 docs each, then the 180-doc
> segments will not get merged with any other segment, even if you keep adding
> segments to the index.
> I propose to change this behavior so that when a large segment is
> encountered, we no longer skip the entire window of mergeFactor segments,
> but only the segments that are too large.
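
The numbers in the description can be checked with a small standalone sketch.
This is not Lucene code: the class and method names below are hypothetical, and
the window handling is a deliberately simplified model that treats every
segment in the window as being on the same tier. It only illustrates the
difference between skipping a whole window and skipping just the oversized
segments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Lucene.
class MaxMergeSizeSketch {

    // Lower bound quoted in the issue: maxMergeSize / mergeFactor^0.75.
    static double lowerBound(double maxMergeSize, int mergeFactor) {
        return maxMergeSize / Math.pow(mergeFactor, 0.75);
    }

    // Upper bound quoted in the issue: maxMergeSize * mergeFactor.
    static double upperBound(double maxMergeSize, int mergeFactor) {
        return maxMergeSize * mergeFactor;
    }

    // Simplified model of the behavior described in the issue: one oversized
    // segment causes the whole window to be skipped (all segments are assumed
    // to share a tier).
    static List<Integer> skipWholeWindow(List<Integer> window, int maxMergeSize) {
        for (int size : window) {
            if (size >= maxMergeSize) {
                return new ArrayList<>();
            }
        }
        return new ArrayList<>(window);
    }

    // Simplified model of the proposal: drop only the oversized segments and
    // keep the rest of the window as merge candidates.
    static List<Integer> skipOnlyLarge(List<Integer> window, int maxMergeSize) {
        List<Integer> candidates = new ArrayList<>();
        for (int size : window) {
            if (size < maxMergeSize) {
                candidates.add(size);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        int mergeFactor = 10;
        int maxMergeSize = 1000;

        System.out.println("lower bound (rounded): "
            + Math.round(lowerBound(maxMergeSize, mergeFactor)));
        System.out.println("upper bound (rounded): "
            + Math.round(upperBound(maxMergeSize, mergeFactor)));

        // One 1,000-doc segment followed by nine 180-doc segments.
        List<Integer> window = new ArrayList<>();
        window.add(1000);
        for (int i = 0; i < 9; i++) {
            window.add(180);
        }

        System.out.println("whole-window skip keeps: "
            + skipWholeWindow(window, maxMergeSize).size() + " segments");
        System.out.println("skip-only-large keeps:   "
            + skipOnlyLarge(window, maxMergeSize).size() + " segments");
    }
}
```

In this model the whole-window variant leaves the nine 180-doc segments
unmerged, while the proposed variant keeps them as candidates for a merge of
1,620 docs.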