On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
> I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
> 'out of memory' error. Is optimize really necessary, since I read that
> lucene is able to handle multiple segments well now?
I have had indexes with more than 45 segments, because of the merge settings that I use. My large index shards are about 16GB at the moment. Out of memory errors are very rare because I use a fairly large heap -- 6GB for a machine that hosts three of these large shards. When I was still experimenting with my memory settings, I did see occasional out of memory errors during normal segment merging. Increasing your heap size is pretty much required at this point. I've condensed some very basic information about heap sizing here:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

As for whether optimizing on 4.x is necessary: I do not have any hard numbers for you, but I can tell you that an optimized index does seem noticeably faster than one that is freshly built and has a large number of relatively large segments. I optimize my index shards on a schedule, but it is relatively infrequent -- one large shard per night. Most of the time what I have is one really large segment and a bunch of super-small segments, and that does not seem to suffer from performance issues compared to a fully optimized index. The situation is different right after a fresh rebuild, which produces a handful of very large segments and a bunch of smaller segments of varying sizes.

Interesting but probably irrelevant details: although I don't use mergeFactor any more, the TieredMergePolicy settings that I use are equivalent to a mergeFactor of 35. I chose this number back in the 1.4.1 days because it resulted in synchronicity between merges and Lucene segment names when LogByteSizeMergePolicy was still in use. Segments _0 through _z would be merged into segment _10, and so on.

Thanks,
Shawn
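P.S. For anyone who wants to try similar merge settings: TieredMergePolicy configuration roughly equivalent to a mergeFactor of 35 might look something like this in a 4.x solrconfig.xml (a sketch only -- the specific values, including maxMergeAtOnceExplicit, are illustrative and should be tuned for your own setup):

```xml
<!-- Inside the <indexConfig> section of solrconfig.xml.
     TieredMergePolicy settings approximating mergeFactor=35.
     Values here are examples, not a recommendation. -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- how many segments are merged at once during normal merging -->
  <int name="maxMergeAtOnce">35</int>
  <!-- how many segments are allowed per tier before a merge triggers -->
  <int name="segmentsPerTier">35</int>
  <!-- how many segments may be merged at once for an explicit optimize -->
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>
```

Keeping maxMergeAtOnce and segmentsPerTier at the same value is what makes the behavior resemble the old mergeFactor setting.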