On 4/11/2013 7:46 AM, Michael Ryan wrote:
In our system, we use LogByteSizeMergePolicy, and used to have a mergeFactor of 
10. We would see the worst case happen when there were exactly 20 segments (or 
some other multiple of 10, I believe) at the start of the optimize. IIRC, it 
would merge those 20 segments down to 2 segments, and then merge those 2 
segments down to 1 segment. 1*indexSize of space was used by the original index 
(because there was still a reader open on it), 1*indexSize was used by the 2 
intermediate segments, and 1*indexSize was used by the final 1 segment. This is 
the worst case because there are two full additional copies of the index on 
disk. Normally, when the number of segments is not a multiple of the 
mergeFactor, some part of the index is not involved in both merge passes (and 
the part that is excluded is usually the largest segments).
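The arithmetic above can be sketched with a toy model (not Lucene code; it assumes equal-sized segments, that the original index stays pinned on disk by an open reader until the optimize finishes, and that every pass rewrites the entire index, as in the 20-segment case):

```python
import math

def optimize_passes(num_segments, merge_factor):
    """Merge passes needed to reach 1 segment when each pass merges
    groups of up to merge_factor segments at a time."""
    passes = 0
    while num_segments > 1:
        num_segments = math.ceil(num_segments / merge_factor)
        passes += 1
    return passes

def worst_case_disk_usage(num_segments, merge_factor):
    """Peak disk usage as a multiple of index size: the pinned original
    plus one full extra copy per pass (worst case, where every pass
    rewrites the whole index and nothing is freed until the end)."""
    return 1 + optimize_passes(num_segments, merge_factor)

print(worst_case_disk_usage(20, 10))  # 20 -> 2 -> 1: two passes, 3x disk
print(worst_case_disk_usage(10, 10))  # 10 -> 1: one pass, 2x disk
```

With 20 segments and a mergeFactor of 10, the model gives the 3*indexSize peak described above; an index that starts at exactly mergeFactor segments only needs one pass and peaks at 2*indexSize.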

We worked around this by doing multiple optimize passes, where the first pass 
merges down to between 2 and 2*mergeFactor-1 segments (based on a great tip 
from Lance Norskog on the mailing list a couple of years ago).
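In Solr terms, that two-pass workaround amounts to two optimize requests with a maxSegments target; a minimal sketch, assuming a local core (the host, port, and core name are placeholders, not from the original message):

```python
# Hypothetical sketch of the two-pass optimize, expressed as Solr update
# request URLs. maxSegments is Solr's standard optimize parameter; the
# base URL below is a placeholder.
MERGE_FACTOR = 10
FIRST_PASS_TARGET = 2 * MERGE_FACTOR - 1  # top of the 2..2*mergeFactor-1 range

def optimize_url(base_url, max_segments):
    """Build an optimize request that merges down to max_segments segments."""
    return f"{base_url}/update?optimize=true&maxSegments={max_segments}"

base = "http://localhost:8983/solr/mycore"
first_pass = optimize_url(base, FIRST_PASS_TARGET)  # pass 1: down to 19 segments
final_pass = optimize_url(base, 1)                  # pass 2: down to 1 segment
print(first_pass)
print(final_pass)
```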

For optimizes that take multiple passes instead of building one segment from the start, TieredMergePolicy offers the maxMergeAtOnceExplicit parameter. In most situations, setting it to three times the value of maxMergeAtOnce and segmentsPerTier (which are usually set to the same value) will let the optimize complete in a single pass.
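As an illustration, on a Solr install of that era this could go in the indexConfig section of solrconfig.xml; a sketch, assuming the 10/10/30 values follow the three-times ratio described above (they are an example, not a tested recommendation):

```xml
<!-- Sketch: TieredMergePolicy with maxMergeAtOnceExplicit set to 3x the
     other two values, so an explicit optimize can finish in one pass. -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>
</indexConfig>
```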

Thanks,
Shawn
