I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured, 
thus using the default LogByteSizeMergePolicy.  Before I do an optimize, 
typically the largest segment will be about 90% of the total index size.

When I do an optimize, the total disk space required is usually about 2x the 
index size.  But about 10% of the time, the disk space required is about 3x the 
index size.  When this happens, I see a very large segment created, roughly the 
size of the original index, followed by another slightly larger segment.

After some investigating, I found that this would happen when there were 
exactly 20 segments in the index when the optimize started.  My hypothesis is 
that this is a side-effect of the 20 segments being evenly divisible by the 
mergeFactor of 10.  I'm thinking that when there are 20 segments, the largest 
segment is being merged twice - first when merging the 20 segments down to 2, 
then again when merging from 2 to 1.
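
To make the hypothesis concrete, here is a rough Python model of it.  This is 
only a sketch, not Lucene's actual code: it assumes optimize repeatedly merges 
groups of mergeFactor adjacent segments starting from the newest end, and does 
one final merge of whatever is left once fewer than mergeFactor segments 
remain.  The function name and the segment sizes are made up for illustration.

```python
def simulate_optimize(sizes, merge_factor=10):
    """Model an optimize down to one segment.

    sizes: segment sizes, oldest first (arbitrary units).
    Returns (times the largest segment was rewritten, total units written).
    Assumption: merges take merge_factor adjacent segments from the newest
    end; this is a simplified model, not the real merge policy.
    """
    largest = max(range(len(sizes)), key=lambda i: sizes[i])
    # Each entry is (size, whether it contains the original largest segment).
    segs = [(s, i == largest) for i, s in enumerate(sizes)]
    rewrites = 0
    written = 0
    while len(segs) > 1:
        merges = []
        last = len(segs)
        # Take full groups of merge_factor segments from the newest end.
        while last >= merge_factor:
            merges.append((last - merge_factor, last))
            last -= merge_factor
        # No full group left: merge everything that remains in one pass.
        if not merges:
            merges.append((0, len(segs)))
        # Groups are listed newest first, so earlier indexes stay valid.
        for start, end in merges:
            group = segs[start:end]
            size = sum(s for s, _ in group)
            has_largest = any(flag for _, flag in group)
            rewrites += has_largest
            written += size
            segs[start:end] = [(size, has_largest)]
    return rewrites, written


# One large segment (about 90% of the index) plus many small ones.
for count in (19, 20, 21):
    print(count, simulate_optimize([900] + [5] * (count - 1)))
```

Under these assumptions, the 19- and 21-segment cases rewrite the large 
segment once, while the 20-segment case rewrites it twice and writes roughly 
twice as many bytes, which would be consistent with the extra disk space and 
time I am seeing.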

I would like to avoid this if at all possible, as it requires 50% more disk 
space and takes almost twice as long to optimize.  Would using 
TieredMergePolicy help me here, or is there some other config I can change?

-Michael
