On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> We monitored the index size for a few days and found that it varies
> widely from 11GB to 43GB. 

Lucene/Solr indexes consist of segments, each holding a number of
documents. When a document is deleted, its bytes are not removed
immediately; the document is only marked as deleted. When a document is
updated, it is effectively a delete and an add.

If you have an index with 3 documents
  segment-0 (live docs [0, 1, 2], deleted docs [])
and update documents 0 and 1, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0, 1], deleted docs [])
If you then update document 1 again, you will have
  segment-0 (live docs [2], deleted docs [0, 1])
  segment-1 (live docs [0], deleted docs [1])
  segment-2 (live docs [1], deleted docs [])

for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents
on disk, of which only 3 are live.
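
The bookkeeping above can be replayed with a toy model. This is only a
rough Python sketch of the idea, not how Lucene stores segments
internally; the dict layout and the helper names (new_index, update,
stored_docs, live_docs) are made up for illustration.

```python
# Toy model only: a segment is a dict with "live" and "deleted" doc ids.
# Real Lucene segments are immutable files plus a deletion bitmap.

def new_index(doc_ids):
    """Create an index with a single segment holding doc_ids."""
    return [{"live": list(doc_ids), "deleted": []}]

def update(segments, doc_ids):
    """Update = mark the old copy deleted, write the new copy to a new segment."""
    for doc_id in doc_ids:
        for seg in segments:
            if doc_id in seg["live"]:
                seg["live"].remove(doc_id)
                seg["deleted"].append(doc_id)
    segments.append({"live": list(doc_ids), "deleted": []})

def stored_docs(segments):
    """Documents taking up space, live or deleted."""
    return sum(len(s["live"]) + len(s["deleted"]) for s in segments)

def live_docs(segments):
    """Documents that are still searchable."""
    return sum(len(s["live"]) for s in segments)

index = new_index([0, 1, 2])
update(index, [0, 1])   # creates segment-1
update(index, [1])      # creates segment-2
print(stored_docs(index), live_docs(index))  # 6 3
```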

The space is reclaimed when segments are merged, but depending on your setup
and update pattern that may take some time. Furthermore, there is a temporary
overhead while merging: the merged segment is being written while the old
segments are still on disk. 4x the minimum size is fairly large, but not
unrealistic with enough index updates.
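
To put rough numbers on the merge overhead, here is a small Python
sketch that counts documents instead of bytes. It assumes all three
segments from the example are merged in one go, which is a
simplification; real merges are picked by the merge policy and usually
cover only some of the segments.

```python
# Each tuple is (live_count, deleted_count) for one segment,
# taken from the three-segment example above.
segments = [(1, 2), (1, 1), (1, 0)]

# Before the merge: every live and deleted document takes up space.
before = sum(live + deleted for live, deleted in segments)

# The merged segment holds only the live documents.
merged = sum(live for live, _ in segments)

# While the merge is running, old segments and the new one coexist.
peak = before + merged

print(before, merged, peak)  # 6 3 9
```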

> Recently, we started getting a lot of out of memory errors on the
> master. Everytime, solr becomes unresponsive and we need to restart
> jetty to bring it back up. At the same we observed the variation in
> index size. We are suspecting that these two problems may be linked.

Quick sanity check: Look for "Overlapping onDeckSearchers" in your
solr.log to see if your memory problems are caused by multiple open
searchers:
https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
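
If grep is not at hand, the same check can be sketched in Python. The
function name and the log path in the usage comment are placeholders,
so adjust them to your installation.

```python
def count_overlapping_searchers(lines):
    """Count log lines that mention overlapping on-deck searchers."""
    needle = "Overlapping onDeckSearchers"
    return sum(1 for line in lines if needle in line)

# Usage (the path is a placeholder for wherever your solr.log lives):
#   with open("/path/to/solr.log", errors="replace") as log:
#       print(count_overlapping_searchers(log))
```

A count above zero means that new searchers were opened while earlier
ones were still warming, which is worth looking into before anything
else.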
-- 
Toke Eskildsen, Royal Danish Library
