On 1/25/2013 4:49 AM, Harish Verma wrote:
We are testing Solr 4.1 running inside Tomcat 7 on Java 7 with the following
options:

JAVA_OPTS="-Xms256m -Xmx2048m -XX:MaxPermSize=1024m -XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode -XX:+ParallelRefProcEnabled
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/ubuntu/OOM_HeapDump"

Our source code looks like the following:
/**** START *****/
int noOfSolrDocumentsInBatch = 0;
for (int i = 0; i < 5000; i++) {
    SolrInputDocument solrInputDocument = getNextSolrInputDocument();
    server.add(solrInputDocument);
    noOfSolrDocumentsInBatch += 1;
    if (noOfSolrDocumentsInBatch == 10) {
        server.commit();
        noOfSolrDocumentsInBatch = 0;
    }
}
/**** END *****/
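[Editor's note: a common way to reduce the overhead of a loop like the one above is to accumulate documents and send them in larger batches, committing once at the end (or relying on autoCommit) instead of calling commit() every 10 documents. The sketch below shows only the batching logic; the batch size of 500 is an illustrative assumption, and the commented-out server calls stand in for SolrJ's add(Collection) and commit().]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (not the original code): accumulate documents and send them in
// larger batches, committing once at the end instead of every 10 documents.
// The batch size of 500 is an illustrative assumption.
public class BatchingSketch {

    // Number of batches needed to cover `total` items at `batchSize` per batch.
    static int numBatches(int total, int batchSize) {
        return (total + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int total = 5000;
        int batchSize = 500;
        List<Object> batch = new ArrayList<>();
        int batchesSent = 0;
        for (int i = 0; i < total; i++) {
            batch.add(new Object());      // stands in for getNextSolrInputDocument()
            if (batch.size() == batchSize) {
                // server.add(batch);     // SolrJ supports add(Collection<SolrInputDocument>)
                batch.clear();
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) {
            // server.add(batch);         // flush any partial final batch
            batchesSent++;
        }
        // server.commit();               // one commit, or rely on autoCommit
        System.out.println(batchesSent);  // 10
    }
}
```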

The method "getNextSolrInputDocument()" generates a Solr document with 100
fields on average. Around 50 of the fields are of type "text_general".
Some of the "text_general" fields contain approximately 1000 words; the rest
contain only a few. Of the total fields, around 35-40 are multivalued
(not of type "text_general").
We are indexing all of the fields but storing only 8 of them. Of these 8
fields, two are string type, five are long, and one is boolean. So our index
size is only 394 MB, but the RAM occupied at the time of the OOM is around 2.5 GB.
Why is memory usage so high even though the index size is small?
What is being stored in memory? Our understanding is that after every
commit, documents are flushed to disk, so nothing should remain in RAM
after a commit.

We are using the following settings:

server.commit() is called with waitFlush=true and waitSearcher=true
solrconfig.xml has the following properties set:
directoryFactory = solr.MMapDirectoryFactory
maxWarmingSearchers = 1
The text_general field type is used as supplied in the schema.xml that
ships with Solr.
maxIndexingThreads = 8 (default)
<autoCommit><maxTime>15000</maxTime><openSearcher>false</openSearcher></autoCommit>
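[Editor's note: for reference, a common solrconfig.xml pattern in Solr 4.x pairs a hard autoCommit with openSearcher=false, as above, with a soft commit for search visibility, which removes the need for per-batch commit() calls from the client. The soft-commit interval below is an illustrative value, not a recommendation from this thread.]

```xml
<!-- Hard commit: flush to disk without opening a searcher (as in the settings above) -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- Soft commit: make new documents visible to searches; interval is illustrative -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>
```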

We get a Java heap Out Of Memory error after committing around 3990 Solr
documents. Some snapshots of the memory dump from the profiler are uploaded
at the following links:
http://s9.postimage.org/w7589t9e7/memorydump1.png
http://s7.postimage.org/p3abs6nuj/memorydump2.png

Can somebody please suggest what we should do to minimize/optimize memory
consumption in our case, with reasons?
Also, what would be optimal values (and why) for the following
solrconfig.xml parameters?
useColdSearcher - true/false?
     maxWarmingSearchers - number?
     spellcheck - on/off?
     omitNorms - true/false?
     omitTermFreqAndPositions?
     mergeFactor? (we are using the default value of 10)
     Java garbage collection tuning parameters?

Additional information is needed. What OS platform? Is the OS 64-bit? Is Java 64-bit? How much total RAM? We'll need your solrconfig.xml file, in particular the query and indexConfig sections. Use your favorite paste site (pastie.org, pastebin.com for example) to link the solrconfig.xml file.

General thoughts without the above information:

You are allowing a permanent generation half the size of your max heap. I have a Solr installation where Java has a max heap of 8GB, about 5GB of which is currently committed - actually allocated at the OS level. My perm gen space is 65908KB. This server handles a total index size of nearly 70GB. I doubt you need 1GB for your perm gen size.

A 2GB heap is fairly small in the Solr world. If you are using a 32-bit Java, that's about the biggest heap you can create, so 64-bit on both Java and the OS is the way to go. You can reduce memory requirements a small amount by using Jetty instead of Tomcat, but the difference is probably not big enough to really matter.

For the questions you asked at the end, most of them are personal preference, but maxWarmingSearchers should normally be kept low. I think I have a value of 2 in my config. Here are the GC tuning parameters that I am currently testing. I have been having problems with long GC pauses that I am trying to fix:

-Xms1024M
-Xmx8192M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:NewRatio=3
-XX:MaxTenuringThreshold=8
-XX:+CMSParallelRemarkEnabled

You should only use CMSIncrementalMode if you have just one or two processor cores. My reading has suggested that it is not beneficial when you have more.

So far my GC parameters seem to be working really well, but I need to do a full reindex which should force usage of the entire 8GB heap and push garbage collection to its limits.

I have a question of my own for someone familiar with the code. Does Solr extensively use weak references? If so, ParallelRefProcEnabled might be a win.
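[Editor's note: for context on that question, the following is a minimal illustration of weak-reference behavior - not Solr code. Each Reference object must be processed by the collector during GC, and -XX:+ParallelRefProcEnabled parallelizes that phase, which is why heavy use of weak references would make the flag relevant.]

```java
import java.lang.ref.WeakReference;

// Minimal illustration (not Solr code): a weakly referenced object becomes
// collectible once no strong references remain, and the GC must process
// each Reference object during collection.
public class WeakRefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);

        // While a strong reference exists, get() returns the object.
        System.out.println(ref.get() == strong);  // true

        strong = null;  // drop the strong reference
        System.gc();    // a hint only; after an actual collection,
                        // ref.get() may return null
        System.out.println(ref.get());  // often null after GC, but not guaranteed
    }
}
```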

Thanks,
Shawn
