Thanks for the advice so far. The directoryFactory is set to ${solr.directoryFactory:solr.NRTCachingDirectoryFactory}.
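For reference, a minimal way one might confirm that setting from the command line, assuming a typical install layout (the solrconfig.xml path below is illustrative, not our actual one):

    grep -n "directoryFactory" /opt/solr/server/solr/mycore/conf/solrconfig.xml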
The servers' workload is predominantly queries, with updates taking place once a day. The servers seem more likely to go down while they are indexing, but not exclusively so.

I'm having trouble locating the actual out-of-memory exception. I can tell it has run out of memory because it has called the oom_killer script, which has left a log file in the logs directory, but I cannot find the exception itself in solr.log or in our solr_gc.log. Any suggestions?

Cheers,
Adam
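A minimal sketch of where that evidence might turn up, assuming the stock oom_solr.sh behaviour and the log directory passed to -XX:OnOutOfMemoryError in the options quoted below (the marker file name pattern is an assumption):

    # Marker log written by the kill script; the name pattern is an assumption
    ls -lt /data/gnpd/solr/logs/solr_oom_killer-*.log

    # Search every log for the actual exception; the GC log only records
    # collections, so the stack trace will not be in solr_gc.log
    grep -ril "OutOfMemoryError" /data/gnpd/solr/logs/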
On 11 April 2018 at 15:49, Walter Underwood <wun...@wunderwood.org> wrote:

> For readability, I’d use -Xmx12G instead of -XX:MaxHeapSize=12884901888.
> Also, I always use a start size the same as the max size, since servers
> will eventually grow to the max size. So:
>
> -Xmx12G -Xms12G
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Apr 11, 2018, at 6:29 AM, Sujay Bawaskar <sujaybawas...@gmail.com> wrote:
> >
> > What is the directory factory defined in solrconfig.xml? Your JVM heap
> > should be tuned with respect to that.
> > How is Solr being used: is it more updates and fewer queries, or fewer
> > updates and more queries?
> > What is the OOM error? Is it frequent GC, or Error 12?
> >
> > On Wed, Apr 11, 2018 at 6:05 PM, Adam Harrison-Fuller
> > <aharrison-ful...@mintel.com> wrote:
> >
> >> Hey Jesus,
> >>
> >> Thanks for the suggestions. The Solr nodes have 4 CPUs assigned to them.
> >>
> >> Cheers!
> >> Adam
> >>
> >> On 11 April 2018 at 11:22, Jesus Olivan <jesus.oli...@letgo.com> wrote:
> >>
> >>> Hi Adam,
> >>>
> >>> IMHO you could try increasing the heap to 20 GB (with 46 GB of physical
> >>> RAM, your JVM can afford more heap without penalties caused by a lack
> >>> of RAM outside the heap).
> >>>
> >>> Another good change would be to increase
> >>> -XX:CMSInitiatingOccupancyFraction from 50 to 75. I think the CMS
> >>> collector works better when the old generation space is more populated.
> >>>
> >>> I usually set the survivor spaces to a smaller size. If you try
> >>> SurvivorRatio=6, I think performance would improve.
> >>>
> >>> Another good practice, in my experience, is to set a static NewSize
> >>> instead of -XX:NewRatio=3. You could try -XX:NewSize=7000m and
> >>> -XX:MaxNewSize=7000m (one third of the total heap space is recommended).
> >>>
> >>> Finally, my best results after deep JVM R&D related to Solr came from
> >>> removing the ScavengeBeforeRemark flag and adding a new one:
> >>> ParGCCardsPerStrideChunk.
> >>>
> >>> However, it would be good to set ParallelGCThreads and ConcGCThreads to
> >>> their optimal values, and we need your system's CPU count to work those
> >>> out. Can you provide this data, please?
> >>>
> >>> Regards
> >>>
> >>> 2018-04-11 12:01 GMT+02:00 Adam Harrison-Fuller
> >>> <aharrison-ful...@mintel.com>:
> >>>
> >>>> Hey all,
> >>>>
> >>>> I was wondering if I could get some JVM/GC tuning advice to resolve an
> >>>> issue that we are experiencing.
> >>>>
> >>>> Full disclaimer, I am in no way a JVM/Solr expert, so any advice you
> >>>> can render would be greatly appreciated.
> >>>>
> >>>> Our SolrCloud nodes are having issues throwing OOM exceptions under
> >>>> load. This issue has only started manifesting itself over the last few
> >>>> months, during which time the only change I can discern is an increase
> >>>> in index size. They are running Solr 5.5.2 on OpenJDK version
> >>>> "1.8.0_101". The index is currently 58 GB and the server has 46 GB of
> >>>> physical RAM and runs nothing other than the Solr node.
> >>>>
> >>>> The JVM is invoked with the following JVM options:
> >>>> -XX:CMSInitiatingOccupancyFraction=50
> >>>> -XX:CMSMaxAbortablePrecleanTime=6000
> >>>> -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
> >>>> -XX:ConcGCThreads=4 -XX:InitialHeapSize=12884901888 -XX:+ManagementServer
> >>>> -XX:MaxHeapSize=12884901888 -XX:MaxTenuringThreshold=8
> >>>> -XX:NewRatio=3 -XX:OldPLABSize=16
> >>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 30000 /data/gnpd/solr/logs
> >>>> -XX:ParallelGCThreads=4
> >>>> -XX:+ParallelRefProcEnabled -XX:PretenureSizeThreshold=67108864
> >>>> -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> >>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> >>>> -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
> >>>> -XX:TargetSurvivorRatio=90
> >>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
> >>>> -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> >>>>
> >>>> These values were decided upon several years ago by a colleague, based
> >>>> on suggestions from this mailing list, when the index size was ~25 GB.
> >>>>
> >>>> I have imported the GC logs into GCViewer and attached a link to a
> >>>> screenshot showing the lead-up to an OOM crash. Interestingly, the
> >>>> young generation space is almost empty before the repeated GCs and the
> >>>> subsequent crash.
> >>>> https://imgur.com/a/Wtlez
> >>>>
> >>>> I was considering slowly increasing the amount of heap available to
> >>>> the JVM until the crashes stop; any other suggestions? I'm trying to
> >>>> get the nodes stable without the GC taking forever to run.
> >>>>
> >>>> Additional information can be provided on request.
> >>>>
> >>>> Cheers!
> >>>> Adam
> >>>>
> >
> > --
> > Thanks,
> > Sujay P Bawaskar
> > M:+91-77091 53669
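Pulling the suggestions in the replies above together, a rough sketch of what the revised settings might look like, assuming they are applied through the standard solr.in.sh variables; the ParGCCardsPerStrideChunk value is an assumption (it is not given in the thread) and the flag is diagnostic, so it is unlocked first:

    # Heap: Jesus's 20 GB suggestion, with start size == max size per Walter
    SOLR_JAVA_MEM="-Xms20g -Xmx20g"

    # GC: static new gen of roughly 1/3 of heap, smaller survivor spaces,
    # later CMS start, CMSScavengeBeforeRemark removed; ParallelGCThreads
    # follows the 4 CPUs mentioned above
    GC_TUNE="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
      -XX:NewSize=7000m -XX:MaxNewSize=7000m \
      -XX:SurvivorRatio=6 \
      -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
      -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled \
      -XX:ParallelGCThreads=4 \
      -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768"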
--
Mintel Group Ltd | 11 Pilgrim Street | London | EC4V 6RN
Registered in England: Number 1475918. | VAT Number: GB 232 9342 72

Contact details for our other offices can be found at http://www.mintel.com/office-locations.

This email and any attachments may include content that is confidential, privileged or otherwise protected under applicable law. Unauthorised disclosure, copying, distribution or use of the contents is prohibited and may be unlawful. If you have received this email in error, including without appropriate authorisation, then please reply to the sender about the error and delete this email and any attachments.