[ https://issues.apache.org/jira/browse/LUCENE-10599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17545487#comment-17545487 ]
Michael Sokolov commented on LUCENE-10599:
------------------------------------------

I don't have any deep understanding of the log merge policy, but this grouping operation you describe sounds buggy; +1 to improve it to be less jagged.

> Improve LogMergePolicy's handling of maxMergeSize
> -------------------------------------------------
>
>                 Key: LUCENE-10599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10599
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> LogMergePolicy excludes from merging segments whose size is greater than or
> equal to maxMergeSize. Since a segment whose size is maxMergeSize-1 is still
> considered for merging, segments will effectively reach a size somewhere
> between maxMergeSize and mergeFactor*maxMergeSize before they are no longer
> considered for merging.
> At least this is what I thought. When LogMergePolicy ignores a segment that
> is too large for merging, it also ignores the other segments in the same
> window of mergeFactor segments if they are on the same tier. So segments
> might actually reach a size somewhere between
> maxMergeSize / mergeFactor^0.75 and maxMergeSize * mergeFactor before they
> are no longer considered for merging.
> Assuming a merge factor of 10 and a max merge size of 1,000, this means that
> segments will reach their maximum size somewhere between 178 and 10,000.
> This range seems too large and makes maxMergeSize hard to reason about.
> Specifically, if you have ten 999-doc segments, then LogDocMergePolicy will
> happily merge them into a single 9,990-doc segment. However, if you have one
> 1,000-doc segment and nine 180-doc segments, then the 180-doc segments will
> not get merged with any other segment, even if you keep adding segments to
> the index.
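For readers who want to check the 178 and 10,000 figures quoted above, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the 0.75 exponent is taken from the issue text rather than read out of the Lucene source; this is not LogMergePolicy code.

```java
public class MergeSizeBounds {
    // Exponent quoted in the issue description (assumption: it is the tier width
    // expressed in log-base-mergeFactor units).
    static final double LEVEL_LOG_SPAN = 0.75;

    // Smallest size at which a segment may stop being merged, per the issue:
    // maxMergeSize / mergeFactor^0.75, rounded to the nearest integer.
    static long lowerBound(long maxMergeSize, int mergeFactor) {
        return Math.round(maxMergeSize / Math.pow(mergeFactor, LEVEL_LOG_SPAN));
    }

    // Upper end of the range given in the issue: maxMergeSize * mergeFactor.
    static long upperBound(long maxMergeSize, int mergeFactor) {
        return maxMergeSize * mergeFactor;
    }

    public static void main(String[] args) {
        // Merge factor 10 and max merge size 1,000, as in the issue.
        System.out.println(lowerBound(1000, 10) + " .. " + upperBound(1000, 10));
        // prints "178 .. 10000"
    }
}
```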
> I propose to change this behavior so that when a large segment is
> encountered, we skip only the segments that are too large rather than the
> entire window of mergeFactor segments.
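To make the difference between the two behaviors concrete, here is a toy model of window selection on a single tier. It is a sketch under stated assumptions, not the LogMergePolicy implementation: segments are plain document counts, tiers are ignored, and the names skipWholeWindow and skipOnlyLargeSegments are hypothetical. The example uses ten 180-doc segments rather than the nine in the issue, so that the proposed variant has a full window to merge.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSkipSketch {

    // Models the behavior described in the issue: a window of mergeFactor
    // segments is dropped entirely if any segment in it is >= maxMergeSize.
    static List<List<Integer>> skipWholeWindow(List<Integer> sizes, int mergeFactor, int maxMergeSize) {
        List<List<Integer>> merges = new ArrayList<>();
        for (int upto = 0; upto + mergeFactor <= sizes.size(); upto += mergeFactor) {
            List<Integer> window = sizes.subList(upto, upto + mergeFactor);
            boolean anyTooLarge = false;
            for (int size : window) {
                anyTooLarge |= size >= maxMergeSize;
            }
            if (!anyTooLarge) {
                merges.add(new ArrayList<>(window));
            }
        }
        return merges;
    }

    // Models the proposal: only the oversized segments are skipped, and
    // windows are filled from the segments that remain.
    static List<List<Integer>> skipOnlyLargeSegments(List<Integer> sizes, int mergeFactor, int maxMergeSize) {
        List<Integer> eligible = new ArrayList<>();
        for (int size : sizes) {
            if (size < maxMergeSize) {
                eligible.add(size);
            }
        }
        List<List<Integer>> merges = new ArrayList<>();
        for (int upto = 0; upto + mergeFactor <= eligible.size(); upto += mergeFactor) {
            merges.add(new ArrayList<>(eligible.subList(upto, upto + mergeFactor)));
        }
        return merges;
    }

    public static void main(String[] args) {
        // One 1,000-doc segment followed by ten 180-doc segments.
        List<Integer> sizes = new ArrayList<>();
        sizes.add(1000);
        for (int i = 0; i < 10; i++) {
            sizes.add(180);
        }
        System.out.println(skipWholeWindow(sizes, 10, 1000).size());       // prints 0
        System.out.println(skipOnlyLargeSegments(sizes, 10, 1000).size()); // prints 1
    }
}
```

In this model the single oversized segment blocks the only full window under the first strategy, while the second strategy still finds one merge of ten 180-doc segments.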