jpountz commented on PR #13430:
URL: https://github.com/apache/lucene/pull/13430#issuecomment-2137276912

   > There are at most 2 segment tiers.
   
   Well, there can be more tiers, but since tier sizes grow exponentially (e.g. 
if your merge factor is 10, each tier has segments that are 10x bigger than 
those of the previous tier), it's almost certainly fine to ignore segments on 
tiers beyond the 2nd highest tier: they would account for very few documents 
compared with the first 2 tiers.
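   
   To put a rough number on this, here is a minimal sketch (not Lucene code). 
It assumes every tier holds the same number of segments and that segment sizes 
shrink by exactly the merge factor from one tier to the next lower one; the 
class and method names are made up for illustration.

```java
final class TierShareSketch {
    /**
     * Fraction of the index held by the `topTiers` highest tiers, assuming each
     * tier holds the same number of segments and each lower tier's segments are
     * `mergeFactor` times smaller (a simplifying assumption).
     */
    static double topTiersShare(double mergeFactor, int numTiers, int topTiers) {
        double total = 0;
        double top = 0;
        double tierSize = 1.0; // relative size of the highest tier
        for (int tier = 0; tier < numTiers; tier++) {
            total += tierSize;
            if (tier < topTiers) {
                top += tierSize;
            }
            tierSize /= mergeFactor; // next lower tier is mergeFactor times smaller
        }
        return top / total;
    }

    public static void main(String[] args) {
        // Merge factor 10, four tiers, share held by the two highest tiers.
        System.out.println(topTiersShare(10.0, 4, 2));
    }
}
```

   Under these assumptions, with a merge factor of 10 and four tiers, the two 
highest tiers hold about 99% of the index, so dropping the lower tiers from the 
estimate changes very little.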
   
   > segsPerTier is not taken into account if minNumSegments is specified, 
correct?
   
   If `minNumSegments` is bigger than `segsPerTier`, then the highest tier 
should indeed allow for up to `minNumSegments` segments (or maybe 
`minNumSegments-1` if there are multiple tiers, though that may introduce 
complexity for little value; maybe it's better/easier to keep things simple 
and allow for `minNumSegments` on the highest tier all the time). But lower 
tiers should still aim for `segsPerTier` segments.
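   
   As a sketch of the "keep things simple" option described above (not the 
actual patch; the class and method names are hypothetical), the per-tier budget 
could look like this, with tiers indexed from lowest to highest:

```java
import java.util.Arrays;

final class TierBudgetSketch {
    /**
     * Allowed segment count per tier, index 0 being the lowest tier.
     * Lower tiers aim for segsPerTier; the highest tier allows up to
     * minNumSegments when that is the larger value.
     */
    static int[] allowedPerTier(int numTiers, int segsPerTier, int minNumSegments) {
        int[] allowed = new int[numTiers];
        Arrays.fill(allowed, segsPerTier);
        if (numTiers > 0) {
            allowed[numTiers - 1] = Math.max(segsPerTier, minNumSegments);
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Three tiers, segsPerTier = 10, minNumSegments = 12.
        System.out.println(Arrays.toString(allowedPerTier(3, 10, 12)));
    }
}
```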
   
   > In case segments reach maxMergedSegmentBytes, should we fall back to the 
previous segsPerTier behaviour?
   
   I can understand this question in different ways, so I'll try to clarify how 
I think these two parameters should interact:
    - Even if `minNumSegments` is configured, we should still honor 
`maxMergedSegmentBytes` and not create segments that are bigger than that.
    - If your merge policy has `segsPerTier` = 10, `maxMergedSegmentBytes` = 
5GB, `floorSegmentBytes` = 100MB, `minNumSegments` = 12, and the total index 
size is 15GB, it should aim for `allowedSegCount`=24 segments (10 100MB 
segments ~= 1GB, plus 14 1GB segments). So even though we could have created a 
segment of the maximum merged size by merging 10 1GB segments, we do not do it 
because we would then end up with a segment that has more than maxDoc/12 
documents (assuming all docs contribute equally to the index size).
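   
   The arithmetic in this example can be sketched as follows. This is not the 
`TieredMergePolicy` implementation, only an illustration under stated 
assumptions: sizes use decimal units, tiers grow by exactly `mergeFactor` 
starting from `floorSegmentBytes`, a tier is only used if its segment size fits 
under min(`maxMergedSegmentBytes`, totalBytes / `minNumSegments`), and the 
highest tier absorbs whatever bytes are left. The class, method, and 
`mergeFactor` parameter names are made up for illustration.

```java
final class AllowedSegCountSketch {
    static long allowedSegCount(long totalBytes, long floorSegmentBytes,
                                long maxMergedSegmentBytes, int segsPerTier,
                                int mergeFactor, int minNumSegments) {
        // No segment may exceed the configured maximum, nor 1/minNumSegments of the index.
        long maxSegmentBytes = Math.min(maxMergedSegmentBytes, totalBytes / minNumSegments);
        long levelSize = floorSegmentBytes;
        long bytesLeft = totalBytes;
        long allowed = 0;
        while (true) {
            long nextLevelSize = levelSize * mergeFactor;
            // This is the highest tier if the next tier's segments would be too big,
            // or if the remaining bytes already fit within this tier's budget.
            boolean highestTier = nextLevelSize > maxSegmentBytes
                || bytesLeft <= (long) segsPerTier * levelSize;
            if (highestTier) {
                allowed += (bytesLeft + levelSize - 1) / levelSize; // ceiling division
                return allowed;
            }
            allowed += segsPerTier; // lower tiers aim for segsPerTier segments
            bytesLeft -= (long) segsPerTier * levelSize;
            levelSize = nextLevelSize;
        }
    }

    public static void main(String[] args) {
        long mb = 1_000_000L;
        long gb = 1_000L * mb;
        // Values from the example: 15GB index, 100MB floor, 5GB max merged size,
        // segsPerTier = 10, merge factor 10 (assumed), minNumSegments = 12.
        System.out.println(allowedSegCount(15 * gb, 100 * mb, 5 * gb, 10, 10, 12)); // prints 24
    }
}
```

   With the values from the example, the cap is min(5GB, 15GB/12) = 1.25GB, so 
the 1GB tier is the highest one and the sketch returns the sum of the two tiers 
listed above, 10 + 14.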

