Dear Shawn,

Thanks for your reply. For now, I merged in steps using the maxSegments parameter (HOST:PORT/CORE/update?optimize=true&maxSegments=10). First I merged the 45 segments down to 10, and then from 10 to 5. (Merging from 5 to 2 again caused an out-of-memory exception.) I now have a 5-segment index with all segments of roughly equal size. I will try using that and see whether it is good enough for us.
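In case it is useful to anyone else, here is a rough Python sketch of how the stepwise merge requests could be built. The only parts taken from what I actually ran are the update?optimize=true&maxSegments=N query string and the 10-then-5 sequence; the function names, the base URL and the core name are placeholders I made up for illustration, so substitute your own.

```python
from urllib.parse import urlencode


def optimize_url(base_url, core, max_segments):
    """Build an optimize request URL that merges down to max_segments."""
    query = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/{core}/update?{query}"


def stepwise_optimize_urls(base_url, core, targets):
    """Return one URL per merge step; targets must strictly decrease."""
    for earlier, later in zip(targets, targets[1:]):
        if later >= earlier:
            raise ValueError("segment targets must strictly decrease")
    return [optimize_url(base_url, core, n) for n in targets]


if __name__ == "__main__":
    # Placeholder host and core name (assumptions, not my real setup).
    base = "http://localhost:8983/solr"
    for url in stepwise_optimize_urls(base, "collection1", [10, 5]):
        # Print only; issue each request yourself (browser, curl, etc.)
        # and wait for one merge to finish before starting the next.
        print(url)
```

Each step rewrites fewer segments at once than a single full optimize, which is presumably why 45 to 10 and 10 to 5 went through for me while 5 to 2 still ran out of memory.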
On Sun, Feb 9, 2014 at 11:22 PM, Shawn Heisey <s...@elyograg.org> wrote:

> On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
> > I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
> > 'out of memory' error. Is optimize really necessary, since I read that
> > lucene is able to handle multiple segments well now?
>
> I have had indexes with more than 45 segments, because of the merge
> settings that I use. My large index shards are about 16GB at the
> moment. Out of memory errors are very rare because I use a fairly large
> heap, at 6GB for a machine that hosts three of these large shards. When
> I was still experimenting with my memory settings, I did see occasional
> out of memory errors during normal segment merging.
>
> Increasing your heap size is pretty much required at this point. I've
> condensed some very basic information about heap sizing here:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> As for whether optimizing on 4.x is necessary: I do not have any hard
> numbers for you, but I can tell you that an optimized index does seem
> noticeably faster than one that is freshly built and has a large
> number of relatively large segments.
>
> I optimize my index shards on a schedule, but it is relatively
> infrequent -- one large shard per night. Most of the time what I have
> is one really large segment and a bunch of super-small segments, and
> that does not seem to suffer from performance issues compared to a fully
> optimized index. The situation is different right after a fresh
> rebuild, which produces a handful of very large segments and a bunch of
> smaller segments of varying sizes.
>
> Interesting but probably irrelevant details:
>
> Although I don't use mergeFactor any more, the TieredMergePolicy
> settings that I use are equivalent to a mergeFactor of 35. I chose this
> number back in the 1.4.1 days because it resulted in synchronicity
> between merges and lucene segment names when LogByteSizeMergePolicy was
> still in use. Segments _0 through _z would be merged into segment _10,
> and so on.
>
> Thanks,
> Shawn