Just as a side note: when Solr goes OOM and kills itself and you're running HDFS, you are guaranteed to have write.lock files left over.  If you're running lots of shards/replicas, you may have many lock files that you need to go into HDFS and delete before restarting.
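
If it helps, something along these lines will find and clear them out.  The
index path is just an example; adjust it to wherever your HDFS index
directories actually live:

  # list any leftover lock files under the (example) Solr index root in HDFS
  hdfs dfs -ls -R /solr | grep write.lock

  # remove a specific leftover lock file before restarting the node
  hdfs dfs -rm /solr/mycollection/core_node1/data/index/write.lock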

-Joe


On 4/11/2018 10:46 AM, Shawn Heisey wrote:
On 4/11/2018 4:01 AM, Adam Harrison-Fuller wrote:
I was wondering if I could get some JVM/GC tuning advice to resolve an
issue that we are experiencing.

Full disclaimer: I am in no way a JVM/Solr expert, so any advice you can
render would be greatly appreciated.

Our SolrCloud nodes are throwing OOM exceptions under load.
This issue has only started manifesting itself over the last few months,
during which time the only change I can discern is an increase in index
size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".  The
index is currently 58G and the server has 46G of physical RAM and runs
nothing other than the Solr node.
The advice I see about tuning your garbage collection won't help you.
GC tuning can do absolutely nothing about OutOfMemoryError problems.
Better tuning might *delay* the OOM, but it can't prevent it.

You need to figure out exactly what resource is running out.  Hopefully
one of the solr logfiles will have the actual OutOfMemoryError exception
information.  It might not be the heap.
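
For example, something like this (the log location is just an example;
adjust it to your install) will usually show which flavor of OOM you hit,
such as "Java heap space", "GC overhead limit exceeded", or "unable to
create new native thread":

  grep -A 3 OutOfMemoryError /var/solr/logs/solr.log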

Once you know what resource is running out and causing the OOM, then we
can look deeper.

A side note:  The OOM is not *technically* causing a crash, even though
that might be the visible behavior.  When Solr is started on a
non-Windows system with the included scripts, it runs with a parameter
that calls a script on OOM. That script *very intentionally* kills
Solr.  This is done because program operation when OOM hits is
unpredictable, and there's a decent chance that if it keeps running,
your index will get corrupted.  That could happen anyway, but with quick
action to kill the program, it's less likely.
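
To illustrate, here is a stripped-down sketch of what such an OOM hook does.
This is not the exact contents of the bundled bin/oom_solr.sh, just the
general idea; the port and log-directory arguments match how the option is
passed in your flags below:

  #!/bin/bash
  # simplified sketch of an OOM hook: log the event, then kill the Solr JVM hard
  SOLR_PORT=$1
  SOLR_LOGS_DIR=$2
  # find the PID of the Solr JVM listening on the given port (assumes a Jetty-based start)
  SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
  echo "$(date) OutOfMemoryError on port $SOLR_PORT, killing PID $SOLR_PID" \
    >> "$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT.log"
  kill -9 "$SOLR_PID"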

The JVM is invoked with the following JVM options:
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
-XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
-XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
-XX:NewRatio=3 -XX:OldPLABSize=16
-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
-XX:ParallelGCThreads=4
-XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
-XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Solr 5.5.2 includes GC tuning options in its default configuration.
Unless you'd like to switch to G1, you might want to let Solr's start
script handle that for you instead of overriding the options.  The
defaults are substantially similar to what you have defined.
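
For example, if you do want to try G1, the usual place for that is the
GC_TUNE variable in solr.in.sh rather than hard-coded JVM flags.  The values
below are only illustrative, not a recommendation:

  # in solr.in.sh (location varies by install, e.g. /etc/default/solr.in.sh)
  SOLR_HEAP="12g"
  # setting GC_TUNE replaces Solr's default CMS tuning entirely
  GC_TUNE="-XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250"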

I have imported the GC logs into GCViewer and attached a link to a
screenshot showing the lead-up to an OOM crash.  Interestingly, the young
generation space is almost empty before the repeated GCs and subsequent
crash.
https://imgur.com/a/Wtlez
Can you share the actual GC logfile?  You'll need to use a file-sharing
site to do that; attachments almost never work on the mailing list.

The info in the summary to the right of the graph seems to support your
contention that there is plenty of heap, so the OutOfMemoryError is
probably not related to heap memory.  You're going to have to look at
your logfiles to see what the root cause is.

Thanks,
Shawn

