I'm using Solr 3.2 with a mergeFactor of 10 and no merge policy configured, so the default LogByteSizeMergePolicy is in effect. Before an optimize, the largest segment is typically about 90% of the total index size.
When I do an optimize, the total disk space required is usually about 2x the index size. But about 10% of the time, the disk space required is about 3x the index size. When this happens, I see a very large segment created, roughly the size of the original index, followed by another slightly larger segment.
After some investigating, I found that this happens when there are exactly 20 segments in the index when the optimize starts. My hypothesis is that this is a side effect of 20 segments being evenly divisible by the mergeFactor of 10: with 20 segments, I think the largest segment is being merged twice, first when merging the 20 segments down to 2, then again when merging from 2 down to 1.
I would like to avoid this if at all possible, as it requires 50% more disk space and takes almost twice as long to optimize. Would using TieredMergePolicy help me here, or is there some other config I can change?
-Michael
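P.S. To make the hypothesis concrete, here is a rough Python sketch of the arithmetic. It is only a model of what I think is happening, not Lucene's actual code. It assumes that optimize repeatedly merges the newest mergeFactor segments until fewer than mergeFactor are left untouched, then merges whatever remains in a final pass, that the original segments stay on disk until the optimize finishes, and that the largest segment is the oldest one. The names `Segment` and `simulate_optimize` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    size: float
    original: bool      # existed before the optimize started
    has_largest: bool   # contains the data of the largest original segment


def simulate_optimize(sizes, merge_factor=10):
    """Return (times the largest segment is rewritten, peak disk / index size).

    Simplified model (an assumption, not Lucene's implementation);
    merge_factor must be at least 2.
    """
    total = sum(sizes)
    biggest = max(range(len(sizes)), key=lambda i: sizes[i])
    segs = [Segment(s, True, i == biggest) for i, s in enumerate(sizes)]
    rewrites = 0
    peak = total
    intermediates = 0.0  # merged (non-original) segments currently on disk

    def merge(start, end):
        nonlocal rewrites, peak, intermediates
        group = segs[start:end]
        merged = Segment(sum(s.size for s in group), False,
                         any(s.has_largest for s in group))
        # While a merge runs, its inputs and its output coexist on disk.
        peak = max(peak, total + intermediates + merged.size)
        intermediates += merged.size
        # Intermediate inputs can be deleted afterwards; originals are kept.
        intermediates -= sum(s.size for s in group if not s.original)
        rewrites += merged.has_largest
        segs[start:end] = [merged]

    while len(segs) > 1:
        last = len(segs)
        merged_any = False
        # Merge the newest merge_factor segments, working backwards.
        while last >= merge_factor:
            merge(last - merge_factor, last)
            last -= merge_factor
            merged_any = True
        if not merged_any:
            merge(0, len(segs))  # final pass over whatever is left
    return rewrites, peak / total


# Largest (oldest) segment holds 90% of the index; the rest share 10%.
print(simulate_optimize([90.0] + [10.0 / 19] * 19))  # 20 segments
print(simulate_optimize([90.0] + [10.0 / 18] * 18))  # 19 segments
```

Under these assumptions, the 20-segment case rewrites the largest segment twice and peaks at about 3x the index size, while the 19-segment case rewrites it once and peaks at just over 2x, which matches what I'm seeing on disk.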
