On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> We monitored the index size for a few days and found that it varies
> widely from 11GB to 43GB.

Lucene/Solr indexes consist of segments, each holding a number of documents. When a document is deleted, its bytes are not removed immediately; the document is only marked as deleted. When a document is updated, it is effectively a delete and an add. If you have an index with 3 documents

segment-0 (live docs [0, 1, 2], deleted docs [])

and update documents 0 and 1, you will have

segment-0 (live docs [2], deleted docs [0, 1])
segment-1 (live docs [0, 1], deleted docs [])

If you then update document 1 again, you will have

segment-0 (live docs [2], deleted docs [0, 1])
segment-1 (live docs [0], deleted docs [1])
segment-2 (live docs [1], deleted docs [])

for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents stored, although only 3 of them are live.

The space is reclaimed when segments are merged, but depending on your setup and update pattern that may take some time. Furthermore, merging has a temporary overhead: while the merged segment is being written, the old segments are still present on disk.

4x the minimum size is fairly large, but not unrealistic with enough index updates.

> Recently, we started getting a lot of out of memory errors on the
> master. Everytime, solr becomes unresponsive and we need to restart
> jetty to bring it back up. At the same we observed the variation in
> index size. We are suspecting that these two problems may be linked.

Quick sanity check: Look for "Overlapping onDeckSearchers" in your solr.log to see if your memory problems are caused by multiple open searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F

--
Toke Eskildsen, Royal Danish Library
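
For readers who want to replay the segment bookkeeping from the example above, here is a minimal toy model in Python. It is only a sketch: it does not use Lucene or Solr, and the names `apply_update`, `stored_docs`, `live_docs` and `merge_all` are hypothetical helpers invented for illustration. It assumes an update marks the old copy as deleted in whichever segment holds it and writes the new copy to a fresh segment, and that a merge keeps only live documents.

```python
# Toy model of segment bookkeeping; NOT the Lucene/Solr API.
# All function names here are hypothetical, chosen for illustration only.

def apply_update(segments, doc_ids):
    """Mark doc_ids as deleted where they are live, then add a new segment with them."""
    targets = set(doc_ids)
    for seg in segments:
        moved = seg["live"] & targets      # copies that become stale
        seg["live"] -= moved
        seg["deleted"] |= moved
    segments.append({"live": set(targets), "deleted": set()})
    return segments

def stored_docs(segments):
    """Count every document still occupying space, live or deleted."""
    return sum(len(s["live"]) + len(s["deleted"]) for s in segments)

def live_docs(segments):
    """Count only the documents that are still searchable."""
    return sum(len(s["live"]) for s in segments)

def merge_all(segments):
    """Assumed merge behaviour: one new segment holding only the live documents."""
    merged_live = set()
    for seg in segments:
        merged_live |= seg["live"]
    return [{"live": merged_live, "deleted": set()}]

# Replay the example: 3 documents, update 0 and 1, then update 1 again.
segments = [{"live": {0, 1, 2}, "deleted": set()}]
apply_update(segments, [0, 1])
apply_update(segments, [1])
merged = merge_all(segments)

print(stored_docs(segments), live_docs(segments))  # 6 3
print(stored_docs(merged), live_docs(merged))      # 3 3
```

Before the merge the model holds 6 stored documents for 3 live ones; after merging everything, the stored count drops back to 3. Real merge policies merge only some segments at a time, so the reclaim is more gradual than this sketch suggests.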