One other note on the JVM options, even though those aren’t the cause of the 
problem.

Don’t run four GC threads when you have four processors. That can use 100% of 
CPU just doing GC.

With four processors, I’d run one thread.
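
In flag terms, the change is just the two thread settings already present in
the flag list quoted below (a sketch of what I mean, not something I've tested
on this workload):

-XX:ParallelGCThreads=1 -XX:ConcGCThreads=1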

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 11, 2018, at 7:49 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. 
> Also, I always use a start size the same as the max size, since servers will 
> eventually grow to the max size. So:
> 
> -Xmx12G -Xms12G
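> 
> (Same size either way: 12884901888 bytes = 12 × 1024³ bytes = 12 GiB, which
> is exactly what -Xmx12G means.)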
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <sujaybawas...@gmail.com> wrote:
>> 
>> Which directory factory is defined in solrconfig.xml? Your JVM heap should
>> be tuned with respect to that.
>> How is Solr being used: is it more updates and fewer queries, or fewer
>> updates and more queries?
>> What is the OOM error exactly? Is it frequent GC (GC overhead limit
>> exceeded), or error 12 (presumably errno 12, ENOMEM: cannot allocate
>> memory)?
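>> 
>> For context, the directory factory is declared in solrconfig.xml; it matters
>> because the MMapDirectory-backed factories serve the index from the OS page
>> cache, i.e. RAM outside the JVM heap. A typical declaration looks like this
>> (the class shown is only the common default, not necessarily yours):
>> 
>>   <directoryFactory name="DirectoryFactory"
>>                     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>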
>> 
>> On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller <aharrison-ful...@mintel.com> wrote:
>> 
>>> Hey Jesus,
>>> 
>>> Thanks for the suggestions.  The Solr nodes have 4 CPUs assigned to them.
>>> 
>>> Cheers!
>>> Adam
>>> 
>>> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> wrote:
>>> 
>>>> Hi Adam,
>>>> 
>>>> IMHO you could try increasing the heap to 20 GB (with 46 GB of physical
>>>> RAM, your JVM can afford a larger heap without paging penalties from
>>>> starving the memory outside the heap); see the combined flag sketch at the
>>>> end of this message.
>>>> 
>>>> Another good change would be to increase -XX:CMSInitiatingOccupancyFraction
>>>> from 50 to 75. I think the CMS collector works better when the old
>>>> generation space is more populated.
>>>> 
>>>> I usually set the survivor spaces to a smaller size. If you try
>>>> -XX:SurvivorRatio=6, I think performance would improve.
>>>> 
>>>> Another good practice, in my experience, is to set a static new-generation
>>>> size instead of -XX:NewRatio=3. You could try -XX:NewSize=7000m and
>>>> -XX:MaxNewSize=7000m (one third of the total heap space is the usual
>>>> recommendation).
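>>>> (Sanity check, assuming the 20 GB heap suggested above: 20480 MB / 3 ≈
>>>> 6827 MB, so 7000m is indeed roughly one third.)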
>>>> 
>>>> Finally, my best results after deep JVM R&D related to Solr came from
>>>> removing the -XX:+CMSScavengeBeforeRemark flag and adding
>>>> -XX:ParGCCardsPerStrideChunk.
>>>> 
>>>> It would also be good to set ParallelGCThreads and ConcGCThreads to their
>>>> optimal values, but we need your system's CPU count to work those out. Can
>>>> you provide that, please?
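>>>> 
>>>> Putting these suggestions together, the delta against the current flags
>>>> would look roughly like this (illustrative, untested values;
>>>> ParGCCardsPerStrideChunk is a diagnostic flag, so it needs
>>>> -XX:+UnlockDiagnosticVMOptions, and 32768 is just a commonly cited value):
>>>> 
>>>> -Xms20g -Xmx20g
>>>> -XX:CMSInitiatingOccupancyFraction=75
>>>> -XX:SurvivorRatio=6
>>>> -XX:NewSize=7000m -XX:MaxNewSize=7000m
>>>> -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768
>>>> 
>>>> plus removing -XX:+CMSScavengeBeforeRemark.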
>>>> 
>>>> Regards
>>>> 
>>>> 
>>>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller <aharrison-ful...@mintel.com>:
>>>> 
>>>>> Hey all,
>>>>> 
>>>>> I was wondering if I could get some JVM/GC tuning advice to resolve an
>>>>> issue that we are experiencing.
>>>>> 
>>>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you can
>>>>> render would be greatly appreciated.
>>>>> 
>>>>> Our Solr cloud nodes are having issues throwing OOM exceptions under
>>>>> load. This issue has only started manifesting itself over the last few
>>>>> months, during which time the only change I can discern is an increase in
>>>>> index size.  They are running Solr 5.5.2 on OpenJDK version "1.8.0_101".
>>>>> The index is currently 58G and the server has 46G of physical RAM and
>>>>> runs nothing other than the Solr node.
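>>>>> 
>>>>> (For scale: 46G of RAM minus a 12G heap leaves roughly 34G for the OS
>>>>> page cache, against a 58G index, so the index no longer fits in memory.)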
>>>>> 
>>>>> The JVM is invoked with the following JVM options:
>>>>> -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>>>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>>>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
>>>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
>>>>> -XX:NewRatio=3 -XX:OldPLABSize=16
>>>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
>>>>> -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
>>>>> -XX:PretenureSizeThreshold=67108864
>>>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>>>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>>>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
>>>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>>>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>> 
>>>>> These values were decided upon several years ago by a colleague, based
>>>>> upon suggestions from this mailing list, when the index size was ~25G.
>>>>> 
>>>>> I have imported the GC logs into GCViewer and attached a link to a
>>>>> screenshot showing the lead-up to an OOM crash.  Interestingly, the young
>>>>> generation space is almost empty before the repeated GCs and the
>>>>> subsequent crash.
>>>>> https://imgur.com/a/Wtlez
>>>>> 
>>>>> I was considering slowly increasing the amount of heap available to the
>>>>> JVM until the crashes stop. Any other suggestions? I'm trying to get the
>>>>> nodes stable without the GC taking forever to run.
>>>>> 
>>>>> Additional information can be provided on request.
>>>>> 
>>>>> Cheers!
>>>>> Adam
>>>>> 
>>>> 
>>> 
>> 
>> 
>> -- 
>> Thanks,
>> Sujay P Bawaskar
>> M:+91-77091 53669
> 
