jpountz opened a new issue, #14004: URL: https://github.com/apache/lucene/issues/14004
### Description

I have been experimenting with configuring `TieredMergePolicy` to keep the segment count very low:

- segsPerTier = 2
- floorSegmentSize = 512MB

This typically helps if you run queries that have a high per-segment overhead (vector search, multi-term queries) and have a low indexing throughput (especially if indexing and search run on separate hardware so that merges don't disturb searches).

Interestingly, an index that is less than 1GB can still have 10 segments with the above merge policy, because of the constraint to not run merges where the resulting segment is less than 50% bigger than the biggest input segment. E.g. consider the following segment sizes: 100kB, 300kB, 800kB, 2MB, 5MB, 12MB, 30MB, 70MB, 150MB, 400MB. There is no pair of segments where the sum is more than 50% bigger than the max input segment.

I am biased against removing this constraint, since containing write amplification is important to avoid running into quadratic merging, but I wonder if there are other ways we could further reduce the number of segments. For instance, `TieredMergePolicy` automatically takes the min of `maxMergeAtOnce` and `numSegsPerTier` as a merge factor, but it's not clear to me why this is important. If the merge policy allowed merges to have between 2 and 10 segments in the above example, it could find merges in the described segment structure, and this would likely help achieve lower write amplification for the same segment count?

Other ideas?
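To make the example concrete, here is a minimal standalone sketch that checks the segment sizes listed above against the 50% rule. It does not use Lucene's code: `MergeSkewSketch`, `isAllowed`, `anyPairAllowed` and `windowAllowed` are hypothetical names, the rule is assumed to be "merged size must be at least 1.5x the biggest input", 1MB is taken as 1000kB, and the effect of `floorSegmentSize` is ignored.

```java
import java.util.Arrays;

public class MergeSkewSketch {
    // Segment sizes from the example, in kB (assumption: 1 MB = 1000 kB).
    static final long[] SIZES_KB = {
        100, 300, 800, 2_000, 5_000, 12_000, 30_000, 70_000, 150_000, 400_000
    };

    // Assumed form of the constraint: a merge is allowed only if the merged
    // size is at least 50% bigger than the biggest input (total >= 1.5 * max).
    static boolean isAllowed(long totalKb, long maxKb) {
        return 2 * totalKb >= 3 * maxKb;
    }

    // True if at least one pair of segments satisfies the constraint.
    static boolean anyPairAllowed(long[] sizes) {
        for (int i = 0; i < sizes.length; i++) {
            for (int j = i + 1; j < sizes.length; j++) {
                long total = sizes[i] + sizes[j];
                if (isAllowed(total, Math.max(sizes[i], sizes[j]))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Checks the contiguous window [from, to) of an ascending-sorted array,
    // so the biggest input is the last element of the window.
    static boolean windowAllowed(long[] sorted, int from, int to) {
        long total = 0;
        for (int i = from; i < to; i++) {
            total += sorted[i];
        }
        return isAllowed(total, sorted[to - 1]);
    }

    public static void main(String[] args) {
        long[] sorted = SIZES_KB.clone();
        Arrays.sort(sorted);
        System.out.println("any pair allowed: " + anyPairAllowed(sorted));
        // List every contiguous window of 2 to 10 segments that passes.
        for (int width = 2; width <= sorted.length; width++) {
            for (int from = 0; from + width <= sorted.length; from++) {
                if (windowAllowed(sorted, from, from + width)) {
                    System.out.println("width " + width + " starting at index " + from);
                }
            }
        }
    }
}
```

Under these assumptions no pair qualifies, while wider merges do: merging all ten segments gives 670.2MB, above the 600MB threshold implied by the 400MB segment.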