jpountz commented on PR #13430: URL: https://github.com/apache/lucene/pull/13430#issuecomment-2137276912
> There are at most 2 segment tiers.

Well, there can be more tiers, but since tiers have exponential sizes (e.g. if your merge factor is 10, each tier has segments that are 10x bigger than the previous tier), it's almost certainly fine to ignore segments on tiers beyond the 2nd highest tier; they would account for very few documents compared with the first 2 tiers.

> segsPerTier is not taken into account if minNumSegments is specified, correct?

If `minNumSegments` is bigger than `segsPerTier`, then the highest tier should indeed allow for up to `minNumSegments` (or maybe `minNumSegments-1` if there are multiple tiers, though that may introduce complexity for little value; maybe it's better/easier to keep things simple and allow for `minNumSegments` on the highest tier all the time). But lower tiers should still aim for `segsPerTier` segments.

> In case segments reach maxMergedSegmentBytes, should we fall back to the previous segsPerTier behaviour?

I can understand this question in different ways, so I'll try to clarify how I think these two parameters should interact:

- Even if `minNumSegments` is configured, we should still honor `maxMergedSegmentBytes` and not create segments that are bigger than that.
- If your merge policy has segsPerTier = 10, maxMergedSegmentMB = 5GB, floorSegmentBytes = 100MB, minNumSegments = 12, and the total index size is 15GB, it should aim for allowedSegCount=23 segments (10 100MB segments ~=1GB, 14 1GB segments). So even though we could have created a segment of the maximum merged size by merging 1GB segments, we would not do it, because we would then end up with a segment that has more than maxDoc/12 documents (assuming all docs contribute equally to the index size).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org