Hi,
I'm getting some OutOfMemoryError (heap space) errors from my Solr instance, and after investigating a bit I found several threads about sorting behaviour in Solr.

First, some information about the environment:

- I'm using Solr 3.6.1 in a master/slave architecture with 1 master and 2 slaves.
- All of them have -Xms and -Xmx set to 4GB; the index is about 10GB for about 1,800,000 documents.
- Indexes are updated (and therefore replicated) once a day.

After the first OOM I opened the corresponding dump in Memory Analyzer and found a BIG org.apache.lucene.search.FieldCacheImpl instance (more than 2GB)...I exploded its internal structure and realized that I had a lot of very long sort field values (book titles composed of title + subtitle + author concatenated)...so, what did I do? Basically I reduced the length of that field (it is now composed only of the first title), so now I have a more limited number of unique values.
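For context, this is how I understand the cache gets populated: the first query sorted on a field un-inverts the whole field and keeps every value in memory for the lifetime of the reader. A minimal sketch against the Lucene 3.6 FieldCache API (the loadSortCache method and its reader parameter are just placeholders of mine, not real Solr code):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class SortCacheSketch {
    // Roughly what Lucene does on the first query sorted by "title_sort":
    // it un-inverts the field and caches all its values against the reader.
    static void loadSortCache(IndexReader reader) throws IOException {
        FieldCache.StringIndex index =
                FieldCache.DEFAULT.getStringIndex(reader, "title_sort");

        // order[doc] points into lookup[]; lookup[] holds one String per
        // unique value, so long concatenated titles (title + subtitle +
        // author) inflate the cache with both more and bigger Strings.
        int ord = index.order[0];          // ordinal for document 0
        System.out.println("doc 0 sorts as: " + index.lookup[ord]);
    }
}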

Now, what I did starting 5 hours ago:

- I took the production Solr log and extracted about 20,000 (real) queries.
- I started the master and the slaves and reindexed all documents; after a while the index was replicated to the slaves.
- I started solrmeter, which randomly queries the slaves (using the extracted queries).
- After two hours the memory consumption peak was (per jvisualvm) about 2GB, and every 5 minutes (more or less) GC freed about 500MB...constantly.
- I indexed 4,000 documents; 10 minutes after replication the whole memory consumption had shifted upwards...2GB min, 2.6GB max.
- After two hours I indexed another 4,000 documents and now I have a min of 2.6GB and a max of 3.4GB...and it is still slowly growing...

Note that the number of newly indexed documents is not that relevant (4,000 out of a total of 1,800,000).

Now, using JConsole I see (the same pools can also be read programmatically, as in the sketch after this list):

- a PS Eden space which is periodically cleaned (it's responsible for the wave between the min and the max usage)
- a PS Survivor space which is very low (16MB)
- a PS Old Gen which sits at 2.6GB and is growing, very slowly, but still growing...
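A plain sketch over the standard java.lang.management API (the pool names are the ones the Parallel Scavenge collector reports):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapPoolLogger {
    public static void main(String[] args) {
        // Prints used/max for every JVM memory pool; with the parallel
        // collector the heap pools are PS Eden Space, PS Survivor Space
        // and PS Old Gen. An Old Gen that keeps creeping up between
        // full GCs is exactly what I'm seeing.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%-20s used=%dMB max=%dMB%n",
                    pool.getName(),
                    usage.getUsed() / (1024 * 1024),
                    usage.getMax() / (1024 * 1024));
        }
    }
}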

Now, the question...

I generated another dump and, as expected, most of the usage is still in org.apache.lucene.search.FieldCacheImpl. This time the size is about 980MB (initially it was more than 2GB), which seems good (at least better than the initial situation). Most of those 980MB are still occupied by sort fields.

What I'm not understanding is how sort fields are loaded into memory...
I mean, I read that in order to optimize sorting, Solr needs to load all values of the sort fields; ok, that's fine. But why do I see several WeakHashMaps that contain different Entry references with the same sort field (and its values)?

For example, for title_sort (1,432,000 unique values) I have two (different, not the same reference) Entry objects, each with:

- a key "title_sort"
- a value (org.apache.lucene.search.FieldCache$StringIndex) which has an int array of length 1,432,000 and a String array of more or less the same size

So the memory usage (in this case) is doubled...are sort field values loaded into memory more than once? How many times?
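If I'm reading FieldCacheImpl correctly, its inner WeakHashMap is keyed by the IndexReader, so my working hypothesis is that two live readers (e.g. the old searcher still serving queries plus the new one opened after replication) each hold their own StringIndex for the same field. A hypothetical sketch of the effect (the method and both reader parameters are placeholders of mine, reader setup elided):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class DoubleCacheSketch {
    // Two distinct readers over the same index each get their own cache
    // entry, even though the field and its values are identical; the
    // arrays are freed only when the owning reader becomes unreachable.
    static void showDoubleLoad(IndexReader oldReader, IndexReader newReader)
            throws IOException {
        FieldCache.StringIndex a =
                FieldCache.DEFAULT.getStringIndex(oldReader, "title_sort");
        FieldCache.StringIndex b =
                FieldCache.DEFAULT.getStringIndex(newReader, "title_sort");

        // Different Entry objects, different int[]/String[] arrays:
        // memory is doubled until the old reader is closed and collected.
        System.out.println(a == b); // false for distinct readers
    }
}

If that is right, the doubling I see right after replication would just be the old and new searcher overlapping, but I'd like to confirm it.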

Best, and as usual sorry for the long email.
Andrea
