On 1/18/2017 6:51 AM, Kelly, Frank wrote:
> We’re investigating a strange spike in Heap memory usage in our
> Production Solr.
> Heap is stable for days ~ 1.6GB and then suddenly spikes to 3.9 GB and
> we get an OOM.
>
> Our app server behavior using Solr appears to unchanged (no new schema
> updates, no additional indexing or searching we could see)
> We’re speculating that perhaps segment merges may be contributing to
> the heap size increase?
>
> *Details*
> Solr 5.3.1
> Solr Cloud deployment with 110M+ documents in 2 Collections (72M and
> 28M) each across 3 shards (each with 3 replicas)
> Heavy indexing vs Query load (API calls are 90% Indexing, 10% querying)
>
> Heap Settings
> -Xmx4096m
>
> Some solrconfig.xml settings
>
> <!-- default: 100 -->
> <ramBufferSizeMB>256</ramBufferSizeMB>
> <!-- default: 1000 -->
> <maxBufferedDocs>10000</maxBufferedDocs>
>
> <!-- default: 8 -->
> <maxIndexingThreads>10</maxIndexingThreads>
>
> <mergeFactor>20</mergeFactor>
>
> We turned on InfoStream logging and saw the following
>
> 2017-01-18 13:31:55.368 INFO (Lucene Merge Thread #24)
> [c:prod_us-east-1_here_account s:shard1 r:core_node30
> x:prod_us-east-1_here_account_shard1_replica4]
> o.a.s.u.LoggingInfoStream [TMP][Lucene Merge Thread #24]:
> seg=_9eac9(5.3.1):C23776249/1714903:delGen=13735 size=4338.599 MB
> [skip: too large]
This "skip: too large" message likely means that the size of this segment, if merged with other segments, would be larger than the max segment size. The max size defaults to 5GB, this segment is 4.3GB in size already. I think you've got an incorrect idea of how Java memory works. You indicated that the heap stays stable at about 1.6GB ... but this is NOT how Java works. When a piece of memory is allocated by a Java program, that memory is not reclaimed when the program no longer needs the object. It is garbage collection, a background process, that frees the memory. A graph of memory usage from a healthy Java program looks like a sawblade -- allocations use up all the memory in one of the heap regions, then garbage collection kicks in and frees up what it can. Java's normal operation involves constant "spikes" in heap usage. The heap usage of Solr will constantly increase as it runs, then garbage collection will kick in when one of the heap regions reaches capacity, reclaiming objects that the program no longer needs and freeing up memory. OOM happens when garbage collection is unable to free any memory because all of it is still in use. There are exactly two ways to deal with OOM: 1) Increase the size of your heap. 2) Make the program use less memory. I have two theories about why your solr install is using up all your heap and still requesting more: 1) Your Solr caches, particularly your filterCache, may be very large. 2) You may be doing a large number of queries that use a lot of memory -- a lot of facets, and/or using a lot of different fields for sorting. Assuming the entire index is on one server, for your 72 million document index, each filterCache entry is 9 million bytes in size. For your 28 million document index, each filterCache entry is 3.5 million bytes. The default size for the filterCache in Solr example configs is 512. If you actually fill that cache up on a 72 million document index, just the one cache would require more than the 4GB of memory that you have allocated to Java. You probably need to decrease the size of the filterCache. If you're doing a lot of facets or sorting, you may need to increase the heap size. Segment merges do use additional memory, but I wouldn't expect that to be anything more than a minor contributor to heap usage. Here's some additional reading on the subject of Solr performance. Most of this page talks about memory, because that's the limiting factor for performance in most cases. The page includes some information about things that can require a lot of heap memory, and steps you may be able to take to reduce the memory required: https://wiki.apache.org/solr/SolrPerformanceProblems Thanks, Shawn