jpountz opened a new issue, #14004:
URL: https://github.com/apache/lucene/issues/14004

   ### Description
   
   I have been experimenting with configuring `TieredMergePolicy` to keep the 
segment count very low:
    - segsPerTier = 2
    - floorSegmentSize = 512MB
   
   This typically helps if you run queries that have a high per-segment 
overhead (vector search, multi-term queries) and have a low indexing throughput 
(especially if indexing and search run on separate hardware so that merges 
don't disturb searches).
   
   Interestingly, an index that is smaller than 1GB can still have 10 segments 
with the above merge policy, because of the constraint that rejects merges where 
the resulting segment is less than 50% bigger than the biggest input segment. 
E.g. consider the following segment sizes: 100kB, 300kB, 800kB, 2MB, 5MB, 12MB, 
30MB, 70MB, 150MB, 400MB. There is no pair of segments whose combined size is 
at least 50% bigger than the larger segment of the pair.
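
A quick way to check the example is to test every pair against the rule. The sketch below is a simplified model of the constraint as described above, not Lucene's actual candidate selection (it ignores the floor size, deletes, and merge scoring). The helper name `is_allowed` and the decimal unit conversion (1MB = 1000kB) are assumptions made for illustration.

```python
from itertools import combinations

# Segment sizes from the example, in kB (assuming 1 MB = 1000 kB).
SIZES_KB = [100, 300, 800, 2_000, 5_000, 12_000, 30_000, 70_000, 150_000, 400_000]


def is_allowed(segments, min_growth=1.5):
    """Simplified rule: the merged size must be at least
    min_growth times the biggest input segment."""
    return sum(segments) >= min_growth * max(segments)


# Collect every pair of segments that would pass the constraint.
allowed_pairs = [pair for pair in combinations(SIZES_KB, 2) if is_allowed(pair)]
total_kb = sum(SIZES_KB)

print("allowed pairs:", allowed_pairs)
print("total index size (kB):", total_kb)
```

Under these assumptions no pair passes, and the ten segments add up to about 670MB, which matches the "less than 1GB" observation.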
   
   I have a bias against removing this constraint, since containing write 
amplification is important to avoid running into quadratic merging, but I wonder 
if there are other ways we could further reduce the number of segments.
   
   For instance, `TieredMergePolicy` automatically takes the min of 
`maxMergeAtOnce` and `segsPerTier` as its merge factor, but it's not clear to 
me why this is important. If the merge policy allowed merges of between 2 
and 10 segments in the above example, it could find merges in the described 
segment structure, and this would likely give lower write amplification 
for the same segment count?
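
To illustrate the idea, the sketch below applies the same simplified 50% rule to runs of size-adjacent segments, for hypothetical merge widths from 2 to 10. It is a toy model rather than Lucene's selection logic, and the names `is_allowed` and `allowed_windows` are made up for this example.

```python
# Segment sizes from the example, in kB (assuming 1 MB = 1000 kB), largest first.
SIZES_KB = sorted(
    [100, 300, 800, 2_000, 5_000, 12_000, 30_000, 70_000, 150_000, 400_000],
    reverse=True,
)


def is_allowed(segments, min_growth=1.5):
    """Simplified rule: the merged size must be at least
    min_growth times the biggest input segment."""
    return sum(segments) >= min_growth * max(segments)


def allowed_windows(sizes, width):
    """Return the runs of `width` size-adjacent segments that pass the rule."""
    windows = (sizes[i:i + width] for i in range(len(sizes) - width + 1))
    return [w for w in windows if is_allowed(w)]


# Hypothetical merge widths from 2 to 10 segments.
allowed_by_width = {w: allowed_windows(SIZES_KB, w) for w in range(2, 11)}
smallest_width = min(w for w, found in allowed_by_width.items() if found)

print("smallest width with an allowed merge:", smallest_width)
print("allowed merges at that width:", len(allowed_by_width[smallest_width]))
```

In this model, width 2 finds nothing, while every run of three adjacent segments passes the rule, so even a slightly larger merge factor would unblock merging for this segment structure.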
   
   Other ideas?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

