[ 
https://issues.apache.org/jira/browse/LUCENE-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552278#comment-17552278
 ] 

ASF subversion and git services commented on LUCENE-10599:
----------------------------------------------------------

Commit 26e3bbc7d21e36f51cee81748d7c5e0222c3e86d in lucene's branch 
refs/heads/branch_9x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=26e3bbc7d21 ]

LUCENE-10599: Improve LogMergePolicy's handling of maxMergeSize. (#935)

With this change, segments are more likely to be considered for merging until
they reach the max merge size. Before this change, LogMergePolicy would exclude
an entire window of `mergeFactor` segments from merging if the window contained
a segment that was too large and the other segments were on the same tier.
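
The difference can be illustrated with a toy model. This is only a sketch of
the behavior described in the commit message, not Lucene's actual code: the
class and method names are hypothetical, sizes are plain doc counts, and all
segments in the window are assumed to be on the same tier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; none of these names exist in Lucene.
final class WindowSelectionSketch {

    // Behavior before the change, as described: a single segment at or above
    // maxMergeSize excludes the whole window from merging.
    static List<Long> eligibleBefore(List<Long> window, long maxMergeSize) {
        for (long size : window) {
            if (size >= maxMergeSize) {
                return new ArrayList<>();
            }
        }
        return new ArrayList<>(window);
    }

    // Behavior after the change, as described: only the segments that are
    // too large are skipped; the rest stay eligible.
    static List<Long> eligibleAfter(List<Long> window, long maxMergeSize) {
        List<Long> eligible = new ArrayList<>();
        for (long size : window) {
            if (size < maxMergeSize) {
                eligible.add(size);
            }
        }
        return eligible;
    }
}
```

In this model, a window holding one 1,000-doc segment and nine 180-doc
segments with a max merge size of 1,000 yields no candidates under the old
behavior and nine candidates under the new one.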

> Improve LogMergePolicy's handling of maxMergeSize
> -------------------------------------------------
>
>                 Key: LUCENE-10599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10599
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> LogMergePolicy excludes from merging segments whose size is greater than or 
> equal to maxMergeSize. Since a segment whose size is maxMergeSize-1 is still 
> considered for merging, segments will effectively reach a size somewhere 
> between maxMergeSize and mergeFactor*maxMergeSize before they are no longer 
> considered for merging.
>
> At least this is what I thought. When LogMergePolicy ignores a segment that 
> is too large for merging, it also ignores the other segments in the same 
> window of mergeFactor segments if they are on the same tier. So segments 
> might actually reach a size that is somewhere between maxMergeSize / 
> mergeFactor^0.75 and maxMergeSize * mergeFactor before they are no longer 
> considered for merging.
>
> Assuming a merge factor of 10 and a max merge size of 1,000, this means that 
> segments will reach their maximum size somewhere between 178 and 10,000. This 
> range is too large and makes maxMergeSize too hard to reason about.
>
> Specifically, if you have ten 999-doc segments, then LogDocMergePolicy will 
> happily merge them into a single 9,990-doc segment. However, if you have one 
> 1,000-doc segment and nine 180-doc segments, then the 180-doc segments will 
> not get merged with any other segment, even if you keep adding segments to 
> the index.
>
> I propose to change this behavior so that when a large segment is 
> encountered, we don't skip the entire window of mergeFactor segments, but 
> only the segments that are too large.
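
The range quoted in the description can be checked with a few lines of Java.
This is a minimal sketch that only evaluates the two formulas given above; the
class and method names are hypothetical and are not part of Lucene.

```java
// Hypothetical helper; evaluates the bounds stated in the issue description.
final class MergeSizeBounds {

    // Lower end of the range: maxMergeSize / mergeFactor^0.75, rounded.
    static long lowerBound(long maxMergeSize, int mergeFactor) {
        return Math.round(maxMergeSize / Math.pow(mergeFactor, 0.75));
    }

    // Upper end of the range: maxMergeSize * mergeFactor.
    static long upperBound(long maxMergeSize, int mergeFactor) {
        return maxMergeSize * mergeFactor;
    }

    public static void main(String[] args) {
        // Merge factor 10 and max merge size 1,000, as in the description.
        System.out.println(lowerBound(1000, 10) + " .. " + upperBound(1000, 10));
        // prints "178 .. 10000"
    }
}
```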



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
