On 2/5/2013 12:51 PM, sausarkar wrote:
> We have a 96GB RAM machine with 16 processors. The JVM is set to use 60GB.
> The tests that we are running are purely queries; there is no indexing going on.
> I don't see garbage collection when I attach VisualVM, but I see frequent CPU
> spikes, about once every minute.
A previous message from you indicates that your index is 12GB. I agree
with Erick that this is not very large. The pauses that you have
described sound a lot like stop-the-world garbage collection. I've seen
very long pauses on an 8GB heap ... I don't even want to think about
what could happen on 60GB.
Do you really need a 60GB heap? My dev server handles seven index
shards with a 7GB heap and 16GB total RAM. On 4.1 the total index size
is over 100GB. On 4.2-SNAPSHOT the total index size is about 83GB.
Query performance isn't stellar, but it works perfectly. My production
servers (running 3.5) have tons of RAM and each one only gets half the
index, but they only run with the heap at 8GB. My queries are pretty
low volume and not HUGELY complex. Median query time is about 26
milliseconds and 95th percentile is about 950 milliseconds.
Looking at the GC stats in jconsole/jvisualvm, I didn't think I had a GC
pause problem, but I was proven wrong when I started correlating all the
various logs in my system to load balancer "DOWN" incidents. I saw a
pause of 12 seconds once in the GC log - on an 8GB heap.
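For anyone who wants to do the same kind of correlation, here is a rough sketch of how you could pull the long pauses out of a GC log. It assumes the log was written with -XX:+PrintGCDetails, so that each collection ends with a "real=N.NN secs" timing, and the class name, method name and the 1.0 second threshold are just made up for the example. This is not the tool I used, only an illustration of the idea.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongPauseFinder {
    // Assumption: lines carry a wall-clock timing such as "real=0.02 secs",
    // which is what HotSpot prints with -XX:+PrintGCDetails.
    private static final Pattern REAL = Pattern.compile("real=(\\d+\\.\\d+) secs");

    /** Returns the log lines whose wall-clock time is at least thresholdSecs. */
    static List<String> longPauses(List<String> lines, double thresholdSecs) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            Matcher m = REAL.matcher(line);
            if (m.find() && Double.parseDouble(m.group(1)) >= thresholdSecs) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the GC log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (String hit : longPauses(lines, 1.0)) {
            System.out.println(hit);
        }
    }
}
```

The lines it prints can then be compared against the timestamps of the incidents in the other logs.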
I was introduced to a very cool program that tracks any kind of pause
that's caused by factors outside the Java program, like GC pauses in the
JVM or something happening in the OS. This is much easier to interpret
than Java's GC logging, and you can get a nice graph from the data.
http://www.azulsystems.com/jHiccup
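To give an idea of how this kind of measurement works, here is a minimal sketch of the general technique as I understand it: a thread sleeps for a short fixed interval and records how much later than expected it wakes up. Anything that stalls the whole JVM (GC, the OS) shows up as overshoot. This is NOT jHiccup's actual code, and the class and method names are invented for the example.

```java
import java.util.concurrent.TimeUnit;

public class HiccupSketch {
    /**
     * Sleeps intervalMillis at a time, 'samples' times, and returns the
     * largest observed overshoot (extra delay beyond the requested sleep)
     * in milliseconds.
     */
    static long worstHiccupMillis(long intervalMillis, int samples) {
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        long worstNanos = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
            long overshoot = (System.nanoTime() - start) - intervalNanos;
            if (overshoot > worstNanos) {
                worstNanos = overshoot;
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(worstNanos);
    }

    public static void main(String[] args) {
        // Sample for roughly ten seconds at 1 ms resolution.
        System.out.println("worst hiccup (ms): " + worstHiccupMillis(1, 10000));
    }
}
```

The real tool records a full histogram over time rather than a single worst value, which is what makes the graphs possible.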
Using jHiccup, I was able to do a little bit of comparison between
different runs. That helped me find some GC tuning parameters that have
almost gotten rid of my GC pause problem. I'm constantly working on
those parameters. The current values are:
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:NewRatio=3
-XX:MaxTenuringThreshold=8
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
-XX:+UseLargePages
-XX:+AggressiveOpts
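If you want to confirm that options like these actually reached the JVM, the standard java.lang.management beans can report the startup arguments and the collectors in use. The following is only a sketch (the class name is made up); run inside Solr's JVM it would show Solr's settings, run standalone it shows its own.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

public class GcFlagCheck {
    /** Returns the arguments the JVM was started with, e.g. the -XX options. */
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Sums the accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArgs());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total GC time (ms): " + totalGcMillis());
    }
}
```

Keep in mind that these totals are accumulated times, so they will not show you a single long pause. That is the same reason the summary stats fooled me.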
The Zing JVM (made by Azul, the company that created jHiccup) apparently has
extremely low GC pause characteristics even with giant heaps like yours.
I'm not using it, and I don't know how much it costs.
Thanks,
Shawn