We have noticed that when the first query hits Solr after startup, memory 
use increases significantly, from about 1GB to about 16GB. As further queries 
are received it climbs to about 19GB, at which point there is a full garbage 
collection. The full GC takes about 30 seconds, after which memory use drops 
back down to 16GB. Under a relatively heavy load, the full GC happens about 
every 10-20 minutes.
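
For anyone who wants to compare numbers, here is a minimal sketch of how collection counts and accumulated GC time could be sampled, assuming the standard java.lang.management API. The class name GcSnapshot is made up for illustration, and the code only reports on the JVM it runs in, so to see Solr's figures it would have to run inside the Tomcat JVM (or the same beans would have to be read over JMX).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Solr or Tomcat.
final class GcSnapshot {

    /** Sum of accumulated collection time (ms) across all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One line per collector (names depend on the JVM and the GC in use).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", total time ms=" + gc.getCollectionTime());
        }
        System.out.println("all collectors, total ms=" + totalGcMillis());
    }
}
```

Sampling this twice and taking the difference would give the GC time spent in between, which is one way to confirm the 30-second figure independently of the logs.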

We are running 3 Solr shards under one Tomcat with 20GB allocated to the JVM. 
Each shard has a total index size of about 400GB and a tii size of about 
600MB, and indexes about 650,000 full-text books. (The server has a total of 
72GB of memory, so we are leaving quite a bit of memory for the OS disk cache.)

Is there some argument we could give the JVM so that it would collect garbage 
more frequently? Or some other JVM tuning action that might reduce the amount 
of time Solr spends waiting on GC?

If we could get each GC to take under a second, with the trade-off being 
that GC would occur much more frequently, that would help us avoid the 
occasional query taking more than 30 seconds, at the cost of a larger number 
of queries taking at least a second.
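
To put rough numbers on that trade-off, here is a small arithmetic sketch. It assumes, purely for illustration, that the total time spent paused stays the same when the pauses get shorter, which a real collector will not guarantee. The class and method names (PauseBudget, pauseFraction, spacingForSameBudget) are made up for this example.

```java
// Hypothetical back-of-the-envelope helper; not a model of any real collector.
final class PauseBudget {

    /** Fraction of wall-clock time spent paused: one pause of pauseSeconds per intervalSeconds. */
    static double pauseFraction(double pauseSeconds, double intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return pauseSeconds / intervalSeconds;
    }

    /**
     * Seconds between short pauses if the long pause were split into short ones
     * while keeping the total paused time unchanged (an assumption, not a fact).
     */
    static double spacingForSameBudget(double shortPauseSeconds,
                                       double longPauseSeconds,
                                       double intervalSeconds) {
        if (longPauseSeconds <= 0) {
            throw new IllegalArgumentException("long pause must be positive");
        }
        return intervalSeconds * shortPauseSeconds / longPauseSeconds;
    }

    public static void main(String[] args) {
        // Figures from this post: a 30 s full GC every 10 to 20 minutes.
        System.out.println("paused fraction, 10 min interval: " + pauseFraction(30, 600));
        System.out.println("paused fraction, 20 min interval: " + pauseFraction(30, 1200));
        System.out.println("1 s pause spacing, 10 min case: " + spacingForSameBudget(1, 30, 600));
        System.out.println("1 s pause spacing, 20 min case: " + spacingForSameBudget(1, 30, 1200));
    }
}
```

With the figures above, Solr is paused for roughly 2.5% to 5% of the time, and under the equal-budget assumption a one-second pause would occur about every 20 to 40 seconds.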


Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search

