A few observations:

1. The Old Gen heap on 9th April is about 6GB occupied, which then runs up to 9+GB on 10th April (it steadily increases throughout the day).
2. The Old Gen GC is never able to reclaim any free memory.
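If you want to watch this from the shell as well as in GCViewer, jstat can sample the old generation directly. A minimal sketch, assuming <solr-pid> is a placeholder for the Solr process id:

    # Sample old-gen capacity (OC) and utilisation (OU, both in KB) every 5000 ms.
    # If OU keeps climbing and the full GC count (FGC) rises without OU ever
    # dropping, that matches observation 2 above.
    jstat -gcold <solr-pid> 5000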
Deepak "Please stop cruelty to Animals, help by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in/deicool "Plant a Tree, Go Green" On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller < aharrison-ful...@mintel.com> wrote: > In addition, here is the GC log leading up to the crash. > > https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_ > 20180410_1009.zip?dl=0 > > Thanks! > > Adam > > On 11 April 2018 at 16:18, Adam Harrison-Fuller < > aharrison-ful...@mintel.com > > wrote: > > > Thanks for the advice so far. > > > > The directoryFactory is set to ${solr.directoryFactory:solr. > NRTCachingDirectoryFactory}. > > > > > > The servers workload is predominantly queries with updates taking place > > once a day. It seems the servers are more likely to go down whilst the > > servers are indexing but not exclusively so. > > > > I'm having issues locating the actual out of memory exception. I can > tell > > that it has ran out of memory as its called the oom_killer script which > as > > left a log file in the logs directory. I cannot find the actual > exception > > in the solr.log or our solr_gc.log, any suggestions? > > > > Cheers, > > Adam > > > > > > On 11 April 2018 at 15:49, Walter Underwood <wun...@wunderwood.org> > wrote: > > > >> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. > >> Also, I always use a start size the same as the max size, since servers > >> will eventually grow to the max size. So: > >> > >> -Xmx12G -Xms12G > >> > >> wunder > >> Walter Underwood > >> wun...@wunderwood.org > >> http://observer.wunderwood.org/ (my blog) > >> > >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <sujaybawas...@gmail.com> > >> wrote: > >> > > >> > What is directory factory defined in solrconfig.xml? Your JVM heap > >> should > >> > be tuned up with respect to that. > >> > How solr is being use, is it more updates and less query or less > >> updates > >> > more queries? > >> > What is OOM error? Is it frequent GC or Error 12? > >> > > >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller < > >> > aharrison-ful...@mintel.com> wrote: > >> > > >> >> Hey Jesus, > >> >> > >> >> Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to > >> them. > >> >> > >> >> Cheers! > >> >> Adam > >> >> > >> >> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> > >> wrote: > >> >> > >> >>> Hi Adam, > >> >>> > >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical > >> RAM, > >> >>> your JVM can afford more RAM without threading penalties due to > >> outside > >> >>> heap RAM lacks. > >> >>> > >> >>> Another good one would be to increase -XX:CMSInitiatingOccupancyFrac > >> tion > >> >>> =50 > >> >>> to 75. I think that CMS collector works better when Old generation > >> space > >> >> is > >> >>> more populated. > >> >>> > >> >>> I usually use to set Survivor spaces to lesser size. If you want to > >> try > >> >>> SurvivorRatio to 6, i think performance would be improved. > >> >>> > >> >>> Another good practice for me would be to set an static NewSize > instead > >> >>> of -XX:NewRatio=3. > >> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb > (one > >> >> third > >> >>> of total heap space is recommended). > >> >>> > >> >>> Finally, my best results after a deep JVM I+D related to Solr, came > >> >>> removing ScavengeBeforeRemark flag and applying this new one: + > >> >>> ParGCCardsPerStrideChunk. 
> >> >>> > >> >>> However, It would be a good one to set ParallelGCThreads and > >> >>> *ConcGCThreads *to their optimal value, and we need you system CPU > >> number > >> >>> to know it. Can you provide this data, please? > >> >>> > >> >>> Regards > >> >>> > >> >>> > >> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller < > >> >>> aharrison-ful...@mintel.com > >> >>>> : > >> >>> > >> >>>> Hey all, > >> >>>> > >> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve > >> an > >> >>>> issue that we are experiencing. > >> >>>> > >> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you > >> can > >> >>>> render would be greatly appreciated. > >> >>>> > >> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions > under > >> >>> load. > >> >>>> This issue has only started manifesting itself over the last few > >> months > >> >>>> during which time the only change I can discern is an increase in > >> index > >> >>>> size. They are running Solr 5.5.2 on OpenJDK version "1.8.0_101". > >> The > >> >>>> index is currently 58G and the server has 46G of physical RAM and > >> runs > >> >>>> nothing other than the Solr node. > >> >>>> > >> >>>> The JVM is invoked with the following JVM options: > >> >>>> -XX:CMSInitiatingOccupancyFraction=50 > -XX:CMSMaxAbortablePrecleanTim > >> e= > >> >>> 6000 > >> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark > >> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 > >> >>> -XX:+ManagementServer > >> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8 > >> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16 > >> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 > >> >>>> /data/gnpd/solr/logs > >> >>>> -XX:ParallelGCThreads=4 > >> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864 > >> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime > >> -XX:+PrintGCDateStamps > >> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC > >> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 > >> >>>> -XX:TargetSurvivorRatio=90 > >> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers > >> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > >> >>>> > >> >>>> These values were decided upon serveral years by a colleague based > >> upon > >> >>>> some suggestions from this mailing group with an index size ~25G. > >> >>>> > >> >>>> I have imported the GC logs into GCViewer and attached a link to a > >> >>>> screenshot showing the lead up to a OOM crash. Interestingly the > >> young > >> >>>> generation space is almost empty before the repeated GC's and > >> >> subsequent > >> >>>> crash. > >> >>>> https://imgur.com/a/Wtlez > >> >>>> > >> >>>> I was considering slowly increasing the amount of heap available to > >> the > >> >>> JVM > >> >>>> slowly until the crashes, any other suggestions? I'm looking at > >> trying > >> >>> to > >> >>>> get the nodes stable without having issues with the GC taking > forever > >> >> to > >> >>>> run. > >> >>>> > >> >>>> Additional information can be provided on request. > >> >>>> > >> >>>> Cheers! > >> >>>> Adam > >> >>>> > >> >>>> -- > >> >>>> > >> >>>> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > >> >>>> Registered in > >> >>>> England: Number 1475918. | VAT Number: GB 232 9342 72 > >> >>>> > >> >>>> Contact details for > >> >>>> our other offices can be found at http://www.mintel.com/office- > >> >> locations > >> >>>> <http://www.mintel.com/office-locations>. 
> >> >>>> > >> >>>> This email and any attachments > >> >>>> may include content that is confidential, privileged > >> >>>> or otherwise > >> >>>> protected under applicable law. Unauthorised disclosure, copying, > >> >>>> distribution > >> >>>> or use of the contents is prohibited and may be unlawful. If > >> >>>> you have received this email in error, > >> >>>> including without appropriate > >> >>>> authorisation, then please reply to the sender about the error > >> >>>> and delete > >> >>>> this email and any attachments. > >> >>>> > >> >>>> > >> >>> > >> >> > >> >> -- > >> >> > >> >> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > >> >> Registered in > >> >> England: Number 1475918. | VAT Number: GB 232 9342 72 > >> >> > >> >> Contact details for > >> >> our other offices can be found at http://www.mintel.com/office-l > >> ocations > >> >> <http://www.mintel.com/office-locations>. > >> >> > >> >> This email and any attachments > >> >> may include content that is confidential, privileged > >> >> or otherwise > >> >> protected under applicable law. Unauthorised disclosure, copying, > >> >> distribution > >> >> or use of the contents is prohibited and may be unlawful. If > >> >> you have received this email in error, > >> >> including without appropriate > >> >> authorisation, then please reply to the sender about the error > >> >> and delete > >> >> this email and any attachments. > >> >> > >> >> > >> > > >> > > >> > -- > >> > Thanks, > >> > Sujay P Bawaskar > >> > M:+91-77091 53669 > >> > >> > > > > -- > > Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > Registered in > England: Number 1475918. | VAT Number: GB 232 9342 72 > > Contact details for > our other offices can be found at http://www.mintel.com/office-locations > <http://www.mintel.com/office-locations>. > > This email and any attachments > may include content that is confidential, privileged > or otherwise > protected under applicable law. Unauthorised disclosure, copying, > distribution > or use of the contents is prohibited and may be unlawful. If > you have received this email in error, > including without appropriate > authorisation, then please reply to the sender about the error > and delete > this email and any attachments. > >