On 4/11/2013 7:46 AM, Michael Ryan wrote:
In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of
10. We would see the worst case happen when there were exactly 20 segments (or
some other multiple of 10, I believe) at the start of the optimize. IIRC, it
would merge those 20 segments down to 2 segments, and then merge those 2
segments down to 1 segment. 1*indexSize space was used by the original index
(because there is still a reader open on it), 1*indexSize space was used by the 2
intermediate segments, and 1*indexSize space was used by the 1 final segment. This
is the worst case because there are two full additional copies of the index on disk.
Normally, when the number of segments is not a multiple of the mergeFactor,
some part of the index will not take part in both merges (and the part that is
excluded is usually the largest segments).
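That pass behavior can be sketched with a toy model (an assumption for illustration only, not LogByteSizeMergePolicy's actual algorithm): each optimize pass merges groups of up to mergeFactor segments, so 20 segments with a mergeFactor of 10 go from 20 down to 2 and then to 1:

```java
import java.util.ArrayList;
import java.util.List;

public class MergePasses {
    // Returns the segment count after each optimize pass, starting with the
    // initial count, assuming each pass merges groups of up to mergeFactor
    // segments (a simplification of the real policy).
    static List<Integer> passes(int segments, int mergeFactor) {
        List<Integer> counts = new ArrayList<>();
        counts.add(segments);
        while (segments > 1) {
            segments = (segments + mergeFactor - 1) / mergeFactor; // ceiling division
            counts.add(segments);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 20 segments, mergeFactor 10: two passes, 20 -> 2 -> 1
        System.out.println(passes(20, 10));
    }
}
```

Because every segment is rewritten in the first pass and again in the second, both full copies of the index exist transiently alongside the original.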
We worked around this by doing multiple optimize passes, where the first pass
merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip
from Lance Norskog on the mailing list a couple years ago).
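A minimal sketch of that two-pass workaround against Lucene's IndexWriter (forceMerge is the post-3.5 name for optimize; the intermediate target of mergeFactor+1 is just one possible value in the 2 to 2*mergeFactor-1 range, and the helper name is hypothetical):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;

// Hypothetical helper illustrating the two-pass optimize described above.
static void twoPassOptimize(IndexWriter writer, int mergeFactor) throws IOException {
    // First pass: merge down to an intermediate count in the
    // 2 .. 2*mergeFactor-1 range so the largest segments are not
    // rewritten twice in a row.
    writer.forceMerge(mergeFactor + 1);
    // Second pass: merge the survivors down to a single segment.
    writer.forceMerge(1);
}
```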
For optimizes that take multiple passes instead of building one segment
from the start, TieredMergePolicy offers the maxMergeAtOnceExplicit
parameter. In most situations, setting it to three times the value of
maxMergeAtOnce and segmentsPerTier (which are usually set to the same
value) will result in all optimizes completing in one pass.
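As a sketch of that configuration (the specific numbers are only an example of the 3x relationship, not recommendations):

```java
import org.apache.lucene.index.TieredMergePolicy;

TieredMergePolicy tmp = new TieredMergePolicy();
tmp.setMaxMergeAtOnce(10);          // width of normal (non-forced) merges
tmp.setSegmentsPerTier(10.0);       // usually kept equal to maxMergeAtOnce
tmp.setMaxMergeAtOnceExplicit(30);  // 3x the above, so forced merges finish in one pass
```

The policy is then installed on the writer via IndexWriterConfig.setMergePolicy.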
Thanks,
Shawn