[ https://issues.apache.org/jira/browse/LUCENE-10574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17538183#comment-17538183 ]
Adrien Grand commented on LUCENE-10574:
---------------------------------------

I was assuming we wanted to have strong guarantees about the number of segments in the index at search time, but it's a fair point that degrading to O(n^2) merging to meet this guarantee is not a good trade-off.

I tried to think of ways we could do this. One obvious option is to remove {{floorSegmentBytes}}, but this might be a bit too extreme as it would allow any index to have a long tail of small segments?

One idea I started playing with consists of ensuring that every merge grows the largest input segment by at least some fraction, e.g. 50%. It tries to strike a balance between avoiding pathological merging and still trying to keep the number of segments contained at search time. I quickly hacked this into TieredMergePolicy and this made the StoredFieldsBenchmark more than 2x faster.

I wonder if there are other approaches we should consider.

> Remove O(n^2) from TieredMergePolicy or change defaults to one that doesn't do this
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-10574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10574
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Priority: Major
>
> Remove {{floorSegmentBytes}} parameter, or change lucene's default to a merge policy that doesn't merge in an O(n^2) way.
> I have the feeling it might have to be the latter, as folks seem really wed to this crazy O(n^2) behavior.
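
To make the growth rule from the comment concrete, here is a minimal sketch of what such a check might look like. It is not the actual change hacked into TieredMergePolicy; the class {{MergeGrowthCheck}}, the method {{growsLargestInputEnough}} and the parameter {{minGrowthFraction}} are hypothetical names, and the sketch assumes the merged segment is simply as large as the sum of its inputs (ignoring deletes and compression).

```java
/**
 * Hypothetical helper, NOT part of Lucene: illustrates the rule
 * "every merge must grow its largest input segment by some fraction".
 */
public final class MergeGrowthCheck {

  private MergeGrowthCheck() {}

  /**
   * Returns true if merging the given segments would produce a segment at
   * least (1 + minGrowthFraction) times the size of the largest input.
   *
   * Assumption: merged size == sum of input sizes.
   */
  public static boolean growsLargestInputEnough(
      long[] segmentSizesBytes, double minGrowthFraction) {
    if (segmentSizesBytes.length < 2) {
      return false; // a merge needs at least two inputs to grow anything
    }
    long total = 0;
    long largest = 0;
    for (long size : segmentSizesBytes) {
      total += size;
      largest = Math.max(largest, size);
    }
    // Reject merges that mostly rewrite one big segment to absorb a few tiny ones.
    return total >= (1.0 + minGrowthFraction) * largest;
  }
}
```

With a fraction of 0.5, merging a segment of size 100 with a segment of size 1 would be rejected (101 < 150), while merging sizes 100, 30 and 30 would be allowed (160 >= 150). Rejecting the first kind of merge is what avoids rewriting the same large segment over and over, which is where the quadratic cost comes from.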