A few observations:

1. The Old Gen heap on 9th April is about 6GB occupied, which then runs up to 9+GB on 10th April (it steadily increases throughout the day).
2. The Old Gen GC is never able to reclaim any free memory.
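If you want to watch this from the shell as well as in GCViewer, jstat can sample the old generation directly. A minimal sketch, assuming <solr-pid> is a placeholder for the Solr process id:

    # Sample old-gen capacity (OC) and utilisation (OU, both in KB) every 5000 ms.
    # If OU keeps climbing and the full GC count (FGC) rises without OU ever
    # dropping, that matches observation 2 above.
    jstat -gcold <solr-pid> 5000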
Deepak "Please stop cruelty to Animals, help by becoming a Vegan" +91 73500 12833 deic...@gmail.com Facebook: https://www.facebook.com/deicool LinkedIn: www.linkedin.com/in/deicool "Plant a Tree, Go Green" On Wed, Apr 11, 2018 at 8:53 PM, Adam Harrison-Fuller < aharrison-ful...@mintel.com> wrote: > In addition, here is the GC log leading up to the crash. > > https://www.dropbox.com/s/sq09d6hbss9b5ov/solr_gc_log_ > 20180410_1009.zip?dl=0 > > Thanks! > > Adam > > On 11 April 2018 at 16:18, Adam Harrison-Fuller < > aharrison-ful...@mintel.com > > wrote: > > > Thanks for the advice so far. > > > > The directoryFactory is set to ${solr.directoryFactory:solr. > NRTCachingDirectoryFactory}. > > > > > > The servers workload is predominantly queries with updates taking place > > once a day. It seems the servers are more likely to go down whilst the > > servers are indexing but not exclusively so. > > > > I'm having issues locating the actual out of memory exception. I can > tell > > that it has ran out of memory as its called the oom_killer script which > as > > left a log file in the logs directory. I cannot find the actual > exception > > in the solr.log or our solr_gc.log, any suggestions? > > > > Cheers, > > Adam > > > > > > On 11 April 2018 at 15:49, Walter Underwood <wun...@wunderwood.org> > wrote: > > > >> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888. > >> Also, I always use a start size the same as the max size, since servers > >> will eventually grow to the max size. So: > >> > >> -Xmx12G -Xms12G > >> > >> wunder > >> Walter Underwood > >> wun...@wunderwood.org > >> http://observer.wunderwood.org/ (my blog) > >> > >> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <sujaybawas...@gmail.com> > >> wrote: > >> > > >> > What is directory factory defined in solrconfig.xml? Your JVM heap > >> should > >> > be tuned up with respect to that. > >> > How solr is being use, is it more updates and less query or less > >> updates > >> > more queries? > >> > What is OOM error? Is it frequent GC or Error 12? > >> > > >> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller < > >> > aharrison-ful...@mintel.com> wrote: > >> > > >> >> Hey Jesus, > >> >> > >> >> Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to > >> them. > >> >> > >> >> Cheers! > >> >> Adam > >> >> > >> >> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> > >> wrote: > >> >> > >> >>> Hi Adam, > >> >>> > >> >>> IMHO you could try increasing heap to 20 Gb (with 46 Gb of physical > >> RAM, > >> >>> your JVM can afford more RAM without threading penalties due to > >> outside > >> >>> heap RAM lacks. > >> >>> > >> >>> Another good one would be to increase -XX:CMSInitiatingOccupancyFrac > >> tion > >> >>> =50 > >> >>> to 75. I think that CMS collector works better when Old generation > >> space > >> >> is > >> >>> more populated. > >> >>> > >> >>> I usually use to set Survivor spaces to lesser size. If you want to > >> try > >> >>> SurvivorRatio to 6, i think performance would be improved. > >> >>> > >> >>> Another good practice for me would be to set an static NewSize > instead > >> >>> of -XX:NewRatio=3. > >> >>> You could try to set -XX:NewSize=7000m and -XX:MaxNewSize=7000Mb > (one > >> >> third > >> >>> of total heap space is recommended). > >> >>> > >> >>> Finally, my best results after a deep JVM I+D related to Solr, came > >> >>> removing ScavengeBeforeRemark flag and applying this new one: + > >> >>> ParGCCardsPerStrideChunk. 
> >> >>> > >> >>> However, It would be a good one to set ParallelGCThreads and > >> >>> *ConcGCThreads *to their optimal value, and we need you system CPU > >> number > >> >>> to know it. Can you provide this data, please? > >> >>> > >> >>> Regards > >> >>> > >> >>> > >> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller < > >> >>> aharrison-ful...@mintel.com > >> >>>> : > >> >>> > >> >>>> Hey all, > >> >>>> > >> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve > >> an > >> >>>> issue that we are experiencing. > >> >>>> > >> >>>> Full disclaimer, I am in no way a JVM/Solr expert so any advice you > >> can > >> >>>> render would be greatly appreciated. > >> >>>> > >> >>>> Our Solr cloud nodes are having issues throwing OOM exceptions > under > >> >>> load. > >> >>>> This issue has only started manifesting itself over the last few > >> months > >> >>>> during which time the only change I can discern is an increase in > >> index > >> >>>> size. They are running Solr 5.5.2 on OpenJDK version "1.8.0_101". > >> The > >> >>>> index is currently 58G and the server has 46G of physical RAM and > >> runs > >> >>>> nothing other than the Solr node. > >> >>>> > >> >>>> The JVM is invoked with the following JVM options: > >> >>>> -XX:CMSInitiatingOccupancyFraction=50 > -XX:CMSMaxAbortablePrecleanTim > >> e= > >> >>> 6000 > >> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark > >> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 > >> >>> -XX:+ManagementServer > >> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8 > >> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16 > >> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 > >> >>>> /data/gnpd/solr/logs > >> >>>> -XX:ParallelGCThreads=4 > >> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864 > >> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime > >> -XX:+PrintGCDateStamps > >> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC > >> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4 > >> >>>> -XX:TargetSurvivorRatio=90 > >> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers > >> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC > >> >>>> > >> >>>> These values were decided upon serveral years by a colleague based > >> upon > >> >>>> some suggestions from this mailing group with an index size ~25G. > >> >>>> > >> >>>> I have imported the GC logs into GCViewer and attached a link to a > >> >>>> screenshot showing the lead up to a OOM crash. Interestingly the > >> young > >> >>>> generation space is almost empty before the repeated GC's and > >> >> subsequent > >> >>>> crash. > >> >>>> https://imgur.com/a/Wtlez > >> >>>> > >> >>>> I was considering slowly increasing the amount of heap available to > >> the > >> >>> JVM > >> >>>> slowly until the crashes, any other suggestions? I'm looking at > >> trying > >> >>> to > >> >>>> get the nodes stable without having issues with the GC taking > forever > >> >> to > >> >>>> run. > >> >>>> > >> >>>> Additional information can be provided on request. > >> >>>> > >> >>>> Cheers! > >> >>>> Adam > >> >>>> > >> >>>> -- > >> >>>> > >> >>>> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > >> >>>> Registered in > >> >>>> England: Number 1475918. | VAT Number: GB 232 9342 72 > >> >>>> > >> >>>> Contact details for > >> >>>> our other offices can be found at http://www.mintel.com/office- > >> >> locations > >> >>>> <http://www.mintel.com/office-locations>. 
> >> >>>> > >> >>>> This email and any attachments > >> >>>> may include content that is confidential, privileged > >> >>>> or otherwise > >> >>>> protected under applicable law. Unauthorised disclosure, copying, > >> >>>> distribution > >> >>>> or use of the contents is prohibited and may be unlawful. If > >> >>>> you have received this email in error, > >> >>>> including without appropriate > >> >>>> authorisation, then please reply to the sender about the error > >> >>>> and delete > >> >>>> this email and any attachments. > >> >>>> > >> >>>> > >> >>> > >> >> > >> >> -- > >> >> > >> >> Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > >> >> Registered in > >> >> England: Number 1475918. | VAT Number: GB 232 9342 72 > >> >> > >> >> Contact details for > >> >> our other offices can be found at http://www.mintel.com/office-l > >> ocations > >> >> <http://www.mintel.com/office-locations>. > >> >> > >> >> This email and any attachments > >> >> may include content that is confidential, privileged > >> >> or otherwise > >> >> protected under applicable law. Unauthorised disclosure, copying, > >> >> distribution > >> >> or use of the contents is prohibited and may be unlawful. If > >> >> you have received this email in error, > >> >> including without appropriate > >> >> authorisation, then please reply to the sender about the error > >> >> and delete > >> >> this email and any attachments. > >> >> > >> >> > >> > > >> > > >> > -- > >> > Thanks, > >> > Sujay P Bawaskar > >> > M:+91-77091 53669 > >> > >> > > > > -- > > Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN > Registered in > England: Number 1475918. | VAT Number: GB 232 9342 72 > > Contact details for > our other offices can be found at http://www.mintel.com/office-locations > <http://www.mintel.com/office-locations>. > > This email and any attachments > may include content that is confidential, privileged > or otherwise > protected under applicable law. Unauthorised disclosure, copying, > distribution > or use of the contents is prohibited and may be unlawful. If > you have received this email in error, > including without appropriate > authorisation, then please reply to the sender about the error > and delete > this email and any attachments. > >