On 2/5/2013 12:51 PM, sausarkar wrote:
We have a 96GB RAM machine with 16 processors. The JVM is set to use 60 GB.
The tests that we are running are purely queries; there is no indexing going on.
I don't see garbage collection when I attach VisualVM, but I see frequent CPU
spikes about once every minute.

A previous message from you indicates that your index is 12GB. I agree with Erick that this is not very large. The pauses that you have described sound a lot like stop-the-world garbage collection. I've seen very long pauses on an 8GB heap ... I don't even want to think about what could happen on 60GB.

Do you really need a 60GB heap? My dev server handles seven index shards with a 7GB heap and 16GB total RAM. On 4.1 the total index size is over 100GB. On 4.2-SNAPSHOT the total index size is about 83GB. Query performance isn't stellar, but it works perfectly. My production servers (running 3.5) have tons of RAM and each one only gets half the index, but they only run with the heap at 8GB. My queries are pretty low volume and not HUGELY complex. Median query time is about 26 milliseconds and 95th percentile is about 950 milliseconds.

Looking at the GC stats in jconsole/jvisualvm, I didn't think I had a GC pause problem, but I was proven wrong when I started correlating all the various logs in my system to load balancer "DOWN" incidents. I saw a pause of 12 seconds once in the GC log - on an 8GB heap.
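
If you want to check your own GC log for pauses like that, here is a rough sketch in Python. It assumes the JVM was started with -XX:+PrintGCApplicationStoppedTime (plus -Xloggc:<file> to write the log), which makes HotSpot print a "Total time for which application threads were stopped" line for each pause. The function name and the one-second threshold are my own choices for illustration, not anything standard.

```python
import re

# Line format written by HotSpot when -XX:+PrintGCApplicationStoppedTime is set.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)

def long_pauses(lines, threshold_seconds=1.0):
    """Return (line_number, pause_seconds) for each pause at or above the threshold."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                hits.append((number, seconds))
    return hits
```

Calling long_pauses(open("gc.log")) (the file name is just an example) gives a list of long stops. The timestamps on those log lines can then be matched against load balancer or application logs.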

I was introduced to a very cool program that tracks any kind of pause that's caused by factors outside the Java program, like GC pauses in the JVM or something happening in the OS. This is much easier to interpret than Java's GC logging, and you can get a nice graph from the data.

http://www.azulsystems.com/jHiccup
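
To give a feel for the idea behind that kind of tool, here is a minimal sketch in Python. It is not jHiccup's code, and jHiccup itself runs inside the JVM; this only illustrates the general technique as I understand it: sleep for a short fixed interval over and over, and record how much later than requested each wake-up happens. The function names and the interval are assumptions made for the example.

```python
import time

def measure_hiccups(duration_seconds=1.0, interval_seconds=0.001):
    """Sleep in short intervals and record how late each wake-up was, in seconds."""
    delays = []
    deadline = time.monotonic() + duration_seconds
    while time.monotonic() < deadline:
        start = time.monotonic()
        time.sleep(interval_seconds)
        elapsed = time.monotonic() - start
        # Extra time beyond the requested sleep: scheduler jitter plus any stall.
        delays.append(max(0.0, elapsed - interval_seconds))
    return delays

def percentile(sorted_values, fraction):
    """Simple percentile lookup on an already sorted, non-empty list."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]

delays = sorted(measure_hiccups(duration_seconds=0.2))
print(f"samples={len(delays)} "
      f"max={delays[-1] * 1000:.2f}ms "
      f"p99={percentile(delays, 0.99) * 1000:.2f}ms")
```

On a healthy, idle machine the delays stay small. A stall of the whole process shows up as one very late wake-up, which is why the maximum and the high percentiles are the interesting numbers.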

Using jHiccup, I was able to do a little bit of comparison between different runs. That helped me find some GC tuning parameters that have almost gotten rid of my GC pause problem. I'm constantly working on those parameters. The current values are:

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:NewRatio=3
-XX:MaxTenuringThreshold=8
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:+UseLargePages
-XX:+AggressiveOpts
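
To make two of those numbers concrete, here is a small Python sketch of the arithmetic. It assumes a fixed 8GB heap purely as an example and ignores the rounding and alignment the JVM applies to generation sizes. NewRatio=3 means the old generation is three times the size of the young generation, and CMSInitiatingOccupancyFraction=75 asks CMS to start a cycle when the old generation is about 75 percent full. As far as I know, the JVM treats that fraction only as a starting hint unless -XX:+UseCMSInitiatingOccupancyOnly is also set.

```python
def cms_layout(heap_gb, new_ratio, initiating_occupancy_percent):
    """Approximate young/old sizes and the old-gen occupancy that triggers CMS."""
    young = heap_gb / (new_ratio + 1)   # NewRatio is old:young, so young is 1/(ratio+1)
    old = heap_gb - young
    trigger = old * initiating_occupancy_percent / 100
    return young, old, trigger

# Example values only: an 8GB heap with the NewRatio and occupancy settings above.
young, old, trigger = cms_layout(heap_gb=8, new_ratio=3, initiating_occupancy_percent=75)
print(f"young={young}GB old={old}GB CMS starts near {trigger}GB of old gen in use")
```

With those inputs the young generation comes out at 2GB, the old generation at 6GB, and the CMS trigger at roughly 4.5GB of old generation in use.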

The Zing JVM (made by Azul Systems, the company that created jHiccup) apparently has extremely low GC pause characteristics even with giant heaps like yours. I'm not using it, and I don't know how much it costs.

Thanks,
Shawn
