[ 
https://issues.apache.org/jira/browse/LUCENE-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-10569.
----------------------------------
    Resolution: Won't Fix

I'm closing this as won't fix because it isn't enough to change a default or 
alter a parameter.

I'm now starting to see hacky solutions that dodge this problem instead of 
actually fixing it.

The O(n^2) behavior needs to go: LUCENE-10574

> Think again about the floor segment size?
> -----------------------------------------
>
>                 Key: LUCENE-10569
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10569
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> TieredMergePolicy has a floor segment size that it uses to prevent indexes 
> from having a long tail of small segments, which would be very inefficient at 
> search time. It is 2MB by default.
> While this floor segment size is good for searches, it also has the side 
> effect of making merges run in quadratic time when segments are below this 
> size. This caught me by surprise several times when working on datasets that 
> have few fields or that are extremely space-efficient: even segments that are 
> not so small from a doc count perspective could be considered too small and 
> trigger quadratic merging because of this floor segment size.
> We came up with this 2MB floor segment size many years ago when Lucene was less 
> space-efficient. I think we should consider lowering it at a minimum, and 
> maybe move to a threshold on the document count rather than the byte size 
> of the segment to better work with datasets of small or highly-compressible 
> documents.
> Separately, we should enable merge-on-refresh by default (LUCENE-10078) to 
> make sure that searches actually take advantage of this quadratic merging of 
> small segments, which only exists to make searches more efficient.
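
To make the quadratic cost described above concrete, here is a rough toy model in Python. It is not Lucene's TieredMergePolicy code: the function names, the "merge every segment below the floor once there are merge_factor of them" rule, and the level-based comparison policy are all simplifying assumptions chosen for illustration. It only counts the bytes rewritten by merges when many tiny segments are flushed.

```python
def merge_cost_with_floor(num_flushes, flush_bytes, floor_bytes, merge_factor=10):
    """Toy model (assumption): every segment below floor_bytes is treated as
    equally sized, so a merge picks all of them, including earlier merge
    outputs that are still below the floor."""
    segments = []
    bytes_written = 0
    for _ in range(num_flushes):
        segments.append(flush_bytes)
        small = [s for s in segments if s < floor_bytes]
        if len(small) >= merge_factor:
            merged = sum(small)
            bytes_written += merged  # the accumulated segment is rewritten again
            segments = [s for s in segments if s >= floor_bytes] + [merged]
    return bytes_written


def merge_cost_by_level(num_flushes, flush_bytes, merge_factor=10):
    """Toy model (assumption): only segments of the same level (similar real
    size) are merged together, as in a plain log-structured policy."""
    levels = [0]  # levels[i] = segments of size flush_bytes * merge_factor**i
    bytes_written = 0
    for _ in range(num_flushes):
        levels[0] += 1
        i = 0
        while levels[i] >= merge_factor:
            levels[i] -= merge_factor
            bytes_written += flush_bytes * merge_factor ** (i + 1)
            if i + 1 == len(levels):
                levels.append(0)
            levels[i + 1] += 1
            i += 1
    return bytes_written


if __name__ == "__main__":
    # 1-byte flushes with a floor that is never reached, to isolate the effect.
    for n in (100, 1000):
        print(n, merge_cost_with_floor(n, 1, 10**9), merge_cost_by_level(n, 1))
```

In this model, 100 flushes rewrite 605 bytes with the floor versus 200 bytes with level-based merging, and growing the flush count tenfold grows the floored cost by roughly two orders of magnitude, while the level-based cost grows only slightly faster than linearly. The real policy is more involved, so the numbers are only indicative of the shape of the problem.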



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
