On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote:
> I was wondering if I could get some JVM/GC tuning advice to resolve an
> issue that we are experiencing.
>
> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
> render would be greatly appreciated.
>
> Our Solr cloud nodes are having issues throwing OOM exceptions under load.
> This issue has only started manifesting itself over the last few months
> during which time the only change I can discern is an increase in index
> size. They are running Solr 5.5.2 on OpenJDK version "1.8.0_101". The
> index is currently 58G and the server has 46G of physical RAM and runs
> nothing other than the Solr node.
The advice I see about tuning your garbage collection won't help you. GC tuning can do absolutely nothing about OutOfMemoryError problems. Better tuning might *delay* the OOM, but it can't prevent it.

You need to figure out exactly which resource is running out. Hopefully one of the Solr logfiles will have the actual OutOfMemoryError exception information. It might not be the heap. Once you know which resource is running out and causing the OOM, we can look deeper.

A side note: the OOM is not *technically* causing a crash, even though that might be the visible behavior. When Solr is started on a non-Windows system with the included scripts, it runs with a parameter that calls a script on OOM. That script *very intentionally* kills Solr. This is done because program operation after an OOM is unpredictable, and there's a decent chance that if it keeps running, your index will get corrupted. That could happen anyway, but with quick action to kill the program, it's less likely.

> The JVM is invoked with the following JVM options:
> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> -XX:NewRatio=3 -XX:OldPLABSize=16
> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
> -XX:ParallelGCThreads=4
> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> -XX:TargetSurvivorRatio=90
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

Solr 5.5.2 includes GC tuning options in its default configuration.
Unless you'd like to switch to G1, you might want to let Solr's start script handle that for you instead of overriding the options. The defaults are substantially similar to what you have defined.

> I have imported the GC logs into GCViewer and attached a link to a
> screenshot showing the lead up to a OOM crash. Interestingly the young
> generation space is almost empty before the repeated GC's and subsequent
> crash.
> https://imgur.com/a/Wtlez

Can you share the actual GC logfile? You'll need to use a file sharing site to do that; attachments almost never work on the mailing list.

The info in the summary to the right of the graph seems to support your contention that there is plenty of heap, so the OutOfMemoryError is probably not related to heap memory. You're going to have to look at your logfiles to see what the root cause is.

Thanks,
Shawn
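For the logfile search described above, here is a minimal Java sketch under stated assumptions, not a Solr tool. It picks out lines containing `java.lang.OutOfMemoryError` and guesses which resource ran out from the usual HotSpot (Java 8) message text. The category labels, the `OomScan` class name and the default filename `solr.log` are illustrative assumptions. A message it doesn't recognize is reported as "unknown", and either way the full stack trace is what actually tells you the root cause.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class OomScan {
    static final String MARKER = "java.lang.OutOfMemoryError";

    // Guess which resource ran out from the text after "java.lang.OutOfMemoryError".
    // The labels are illustrative; always read the full stack trace as well.
    static String classify(String detail) {
        if (detail.contains("Java heap space")) return "heap";
        if (detail.contains("GC overhead limit exceeded")) return "heap (GC overhead limit)";
        if (detail.contains("unable to create new native thread")) return "native threads";
        if (detail.contains("Metaspace") || detail.contains("Compressed class space")) return "metaspace";
        if (detail.contains("Direct buffer memory")) return "direct buffers (off-heap)";
        if (detail.contains("Map failed")) return "mmap (virtual memory / map count)";
        return "unknown";
    }

    // Returns one "category <- original line" entry per OutOfMemoryError line.
    static List<String> scan(Stream<String> lines) {
        return lines
                .filter(line -> line.contains(MARKER))
                .map(line -> classify(line.substring(line.indexOf(MARKER))) + " <- " + line.trim())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // "solr.log" is only a placeholder default; pass the logfile you want to check.
        Path log = Paths.get(args.length > 0 ? args[0] : "solr.log");
        // ISO-8859-1 decodes any byte sequence, so odd bytes in a log won't abort the scan.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
            scan(lines).forEach(System.out::println);
        }
    }
}
```

Compile it and run it as `java OomScan <path-to-logfile>` against each Solr logfile in turn. A "native threads" or "mmap" result would point away from the heap, which fits what the GCViewer summary suggests.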