Hi,
I'm getting some OutOfMemoryError (heap space) errors from my Solr instance, and after investigating a bit I found several threads about sorting behaviour in Solr.

First, some information about the environment:

- I'm using Solr 3.6.1 in a master/slave architecture with 1 master and 2 slaves.
- All of them have -Xms and -Xmx set to 4GB; the index is about 10GB for about 1,800,000 documents.
- Indexes are updated (and therefore replicated) once a day.

After the first OOM I opened the corresponding dump in Memory Analyzer and found a BIG org.apache.lucene.search.FieldCacheImpl instance (more than 2GB)...I exploded its internal structure and realized that I had a lot of very long sort field values (book titles composed of title + subtitle + author concatenated)...so, what did I do? Basically I reduced the length of that field (it is now composed only of the first title), so now I have a more limited number of unique values.
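For context, this is how I understand the cache gets populated: the first query sorted on a field un-inverts the whole field and keeps every value in memory for the lifetime of the reader. A minimal sketch against the Lucene 3.6 FieldCache API (the loadSortCache method and its reader parameter are just placeholders of mine, not real Solr code):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class SortCacheSketch {
    // Roughly what Lucene does on the first query sorted by "title_sort":
    // it un-inverts the field and caches all its values against the reader.
    static void loadSortCache(IndexReader reader) throws IOException {
        FieldCache.StringIndex index =
                FieldCache.DEFAULT.getStringIndex(reader, "title_sort");

        // order[doc] points into lookup[]; lookup[] holds one String per
        // unique value, so long concatenated titles (title + subtitle +
        // author) inflate the cache with both more and bigger Strings.
        int ord = index.order[0];          // ordinal for document 0
        System.out.println("doc 0 sorts as: " + index.lookup[ord]);
    }
}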

Now, what I did starting 5 hours ago:

- I took the production Solr log and extracted about 20,000 (real) queries.
- I started the master and the slaves and reindexed all documents; after a while the index was replicated to the slaves.
- I started solrmeter, which randomly queries the slaves (using the extracted queries).
- After two hours the memory consumption peak was (per jvisualvm) about 2GB, and every 5 minutes (more or less) GC freed about 500MB...constantly.
- I indexed 4,000 documents; 10 minutes after replication the whole memory consumption had shifted upwards...2GB min, 2.6GB max.
- After two hours I indexed another 4,000 documents and now I have a min of 2.6GB and a max of 3.4GB...and it is still slowly growing...

Note that the number of newly indexed documents is not that relevant (4,000 out of a total of 1,800,000).

Now, using JConsole I see (the same pools can also be read programmatically, as in the sketch after this list):

- a PS Eden space which is periodically cleaned (it's responsible for the wave between the min and the max usage)
- a PS Survivor space which is very low (16MB)
- a PS Old Gen which sits at 2.6GB and is growing, very slowly, but still growing...
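A plain sketch over the standard java.lang.management API (the pool names are the ones the Parallel Scavenge collector reports):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class HeapPoolLogger {
    public static void main(String[] args) {
        // Prints used/max for every JVM memory pool; with the parallel
        // collector the heap pools are PS Eden Space, PS Survivor Space
        // and PS Old Gen. An Old Gen that keeps creeping up between
        // full GCs is exactly what I'm seeing.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%-20s used=%dMB max=%dMB%n",
                    pool.getName(),
                    usage.getUsed() / (1024 * 1024),
                    usage.getMax() / (1024 * 1024));
        }
    }
}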

Now, the question...

I generated another dump and, as expected, most of the usage is still in org.apache.lucene.search.FieldCacheImpl. This time the size is about 980MB (initially it was more than 2GB), which seems good (at least better than the initial situation). Most of those 980MB are still occupied by sort fields.

What I'm not understanding is how sort fields are loaded into memory...
I mean, I read that in order to optimize sorting, Solr needs to load all values of the sort fields; ok, that's fine. But why do I see several WeakHashMaps that contain different Entry references with the same sort field (and its values)?

For example, for title_sort (1,432,000 unique values) I have two (different, not the same reference) Entry objects, each with:

- a key "title_sort"
- a value (org.apache.lucene.search.FieldCache$StringIndex) which has an int array of length 1,432,000 and a String array of more or less the same size

So the memory usage (in this case) is doubled...are sort field values loaded into memory more than once? How many times?
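If I'm reading FieldCacheImpl correctly, its inner WeakHashMap is keyed by the IndexReader, so my working hypothesis is that two live readers (e.g. the old searcher still serving queries plus the new one opened after replication) each hold their own StringIndex for the same field. A hypothetical sketch of the effect (the method and both reader parameters are placeholders of mine, reader setup elided):

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

public class DoubleCacheSketch {
    // Two distinct readers over the same index each get their own cache
    // entry, even though the field and its values are identical; the
    // arrays are freed only when the owning reader becomes unreachable.
    static void showDoubleLoad(IndexReader oldReader, IndexReader newReader)
            throws IOException {
        FieldCache.StringIndex a =
                FieldCache.DEFAULT.getStringIndex(oldReader, "title_sort");
        FieldCache.StringIndex b =
                FieldCache.DEFAULT.getStringIndex(newReader, "title_sort");

        // Different Entry objects, different int[]/String[] arrays:
        // memory is doubled until the old reader is closed and collected.
        System.out.println(a == b); // false for distinct readers
    }
}

If that is right, the doubling I see right after replication would just be the old and new searcher overlapping, but I'd like to confirm it.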

Best, and as usual sorry for the long email.
Andrea
