[ https://issues.apache.org/jira/browse/LUCENE-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir resolved LUCENE-10569.
----------------------------------
    Resolution: Won't Fix

I'm closing this as won't fix because it isn't enough to change a default or alter a parameter. Now starting to see hacky solutions to dodge around this problem instead of actually fixing it. The O(n^2) behavior needs to go: LUCENE-10574

> Think again about the floor segment size?
> -----------------------------------------
>
>                 Key: LUCENE-10569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10569
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> TieredMergePolicy has a floor segment size that it uses to prevent indexes
> from having a long tail of small segments, which would be very inefficient at
> search time. It is 2MB by default.
> While this floor segment size is good for searches, it also has the side
> effect of making merges run in quadratic time when segments are below this
> size. This caught me by surprise several times when working on datasets that
> have few fields or that are extremely space-efficient: even segments that are
> not so small from a doc count perspective could be considered too small and
> trigger quadratic merging because of this floor segment size.
> We came up with this 2MB floor segment size many years ago when Lucene was less
> space-efficient. I think we should consider lowering it at a minimum, and
> maybe move to a threshold on the document count rather than on the byte size
> of the segment to better work with datasets of small or highly-compressible
> documents.
> Separately, we should enable merge-on-refresh by default (LUCENE-10078) to
> make sure that searches actually take advantage of this quadratic merging of
> small segments, which only exists to make searches more efficient.
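To make the quadratic effect described above concrete, here is a toy simulation. It is a deliberately simplified model under stated assumptions, not TieredMergePolicy's actual merge selection: the function name `simulate_floor_merges`, the `merge_factor` of 10, and the rule that sub-floor segments are always merged together as equal peers are all hypothetical. The key assumption is the one from the issue: every segment below the floor is treated as if it had the floor size, so a merged segment that is still below the floor keeps getting picked up and rewritten.

```python
def simulate_floor_merges(num_flushes: int, flush_bytes: int,
                          floor_bytes: int, merge_factor: int = 10) -> int:
    """Return the total bytes rewritten by merges in a toy model.

    Hypothetical, simplified model (NOT TieredMergePolicy's real logic):
      * every flush adds one segment of `flush_bytes`;
      * all segments below `floor_bytes` are treated as if they had the floor
        size, so they look like equal peers and any `merge_factor` of them
        get merged together, rewriting every byte they hold;
      * a merged segment still below the floor rejoins the pool of small
        segments, so its bytes are rewritten again by the next merge;
      * segments at or above the floor are left alone (higher tiers are
        not modeled).
    """
    small: list[int] = []  # sizes of segments currently below the floor
    bytes_merged = 0       # total bytes written by merges so far
    for _ in range(num_flushes):
        small.append(flush_bytes)
        if len(small) >= merge_factor:
            merged = sum(small)
            bytes_merged += merged
            # Still below the floor: the merged segment stays "small" and is
            # rewritten again later, which is where the quadratic cost arises.
            small = [merged] if merged < floor_bytes else []
    return bytes_merged


# Floor never reached: 10x more flushes -> about 93x more bytes merged.
print(simulate_floor_merges(100, 1, 10**9))   # 605
print(simulate_floor_merges(1000, 1, 10**9))  # 56055
# Floor cleared by a single merge: the cost stays linear.
print(simulate_floor_merges(1000, 1, 10))     # 1000
```

In this toy model, shrinking `flush_bytes` relative to `floor_bytes` (as with highly compressible documents) lengthens the phase in which the same bytes are rewritten over and over, while a floor that a single merge already clears keeps merge work proportional to the data flushed.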