Not sure if it helps anyone, but I am seeing decent results with the following.
It was mostly the result of trial and error; I am not familiar with Java GC or even Java itself. I added my interpretation of what each setting does, but I am not sure it is right, so take it for what it's worth. It'd be nice if someone could provide a better technical explanation. We are about to hit daily peak load, and so far it doesn't look like there is any negative performance impact.

-XX:NewRatio=2 \ #Increases the size of the young generation
-XX:SurvivorRatio=3 \ #Increases the size of the survivor spaces
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+CMSScavengeBeforeRemark \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \ #Increasing these didn't really help
-XX:PretenureSizeThreshold=512m \ # I am not sure what the full impact of this is yet, but I am assuming it will put less stuff in the eden space
-XX:CMSFullGCsBeforeCompaction=1 \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=70 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages \
-XX:+AggressiveOpts \

Here are the GC times over 1 second.
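For anyone who wants to pull the same kind of list out of their own GC log, here is a minimal Python sketch. It assumes the log lines have exactly the shape of the ones in this message (as far as I know this is what the JVM prints with -XX:+PrintGCApplicationStoppedTime plus date and time stamps). The function name long_pauses and the third sample line are made up for illustration; they are not from the real log.

```python
import re

# Pattern for lines of the form:
#   <date stamp>: <uptime>: Total time for which application threads
#   were stopped: <seconds> seconds, Stopping threads took: ...
STOPPED = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>[\d.]+): "
    r"Total time for which application threads were stopped: "
    r"(?P<pause>[\d.]+) seconds"
)


def long_pauses(lines, threshold=1.0):
    """Yield (date stamp, pause in seconds) for pauses longer than threshold."""
    for line in lines:
        match = STOPPED.match(line.strip())
        if match is None:
            continue  # not a stopped-time line, skip it
        pause = float(match.group("pause"))
        if pause > threshold:
            yield match.group("stamp"), pause


sample = [
    # First two lines are copied from this message.
    "2016-04-26T04:30:53.175-0400: 244734.856: Total time for which "
    "application threads were stopped: 12.6587130 seconds, "
    "Stopping threads took: 0.0024770 seconds",
    "2016-04-29T04:30:05.758-0400: 29962.077: Total time for which "
    "application threads were stopped: 1.0823960 seconds, "
    "Stopping threads took: 0.0005840 seconds",
    # Hypothetical short pause, invented to show the filter dropping a line.
    "2016-04-29T04:30:06.000-0400: 29962.300: Total time for which "
    "application threads were stopped: 0.0100000 seconds, "
    "Stopping threads took: 0.0001000 seconds",
]

for stamp, pause in long_pauses(sample):
    print(stamp, pause)
```

To run it against a real log, pass an open file instead of the sample list, for example long_pauses(open("gc.log")), where gc.log is a placeholder path.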

Before:

2016-04-26T04:30:53.175-0400: 244734.856: Total time for which application threads were stopped: 12.6587130 seconds, Stopping threads took: 0.0024770 seconds
2016-04-26T04:31:22.808-0400: 244764.489: Total time for which application threads were stopped: 10.6840330 seconds, Stopping threads took: 0.0004840 seconds
2016-04-26T04:31:48.586-0400: 244790.267: Total time for which application threads were stopped: 10.8198760 seconds, Stopping threads took: 0.0010340 seconds
2016-04-26T04:32:10.095-0400: 244811.777: Total time for which application threads were stopped: 9.5644690 seconds, Stopping threads took: 0.0006750 seconds
2016-04-26T04:32:32.600-0400: 244834.282: Total time for which application threads were stopped: 10.0890420 seconds, Stopping threads took: 0.0009930 seconds
2016-04-26T04:32:55.747-0400: 244857.429: Total time for which application threads were stopped: 10.3426480 seconds, Stopping threads took: 0.0008190 seconds
2016-04-26T04:33:20.522-0400: 244882.203: Total time for which application threads were stopped: 10.7531070 seconds, Stopping threads took: 0.0013280 seconds
2016-04-26T04:33:45.853-0400: 244907.535: Total time for which application threads were stopped: 10.3933700 seconds, Stopping threads took: 0.0013970 seconds
2016-04-26T04:34:15.634-0400: 244937.316: Total time for which application threads were stopped: 10.5744420 seconds, Stopping threads took: 0.0008980 seconds
2016-04-26T04:34:53.802-0400: 244975.484: Total time for which application threads were stopped: 10.4964470 seconds, Stopping threads took: 0.0013830 seconds
2016-04-26T04:35:19.276-0400: 245000.957: Total time for which application threads were stopped: 9.8195470 seconds, Stopping threads took: 0.0016110 seconds
2016-04-26T04:35:43.617-0400: 245025.299: Total time for which application threads were stopped: 9.4856600 seconds, Stopping threads took: 0.0014980 seconds
2016-04-26T04:36:06.540-0400: 245048.222: Total time for which application threads were stopped: 9.5009880 seconds, Stopping threads took: 0.0009080 seconds
2016-04-26T04:36:32.843-0400: 245074.525: Total time for which application threads were stopped: 9.6370000 seconds, Stopping threads took: 0.0011770 seconds
2016-04-26T04:36:57.114-0400: 245098.795: Total time for which application threads were stopped: 10.0064990 seconds, Stopping threads took: 0.0011480 seconds
2016-04-26T04:37:21.074-0400: 245122.755: Total time for which application threads were stopped: 9.7061140 seconds, Stopping threads took: 0.0009760 seconds
2016-04-26T04:37:45.716-0400: 245147.398: Total time for which application threads were stopped: 9.9133330 seconds, Stopping threads took: 0.0008220 seconds
2016-04-26T04:38:11.412-0400: 245173.094: Total time for which application threads were stopped: 10.6839560 seconds, Stopping threads took: 0.0015370 seconds
2016-04-26T04:38:37.177-0400: 245198.859: Total time for which application threads were stopped: 10.0646910 seconds, Stopping threads took: 0.0013740 seconds
2016-04-26T04:39:00.516-0400: 245222.197: Total time for which application threads were stopped: 9.8280250 seconds, Stopping threads took: 0.0008900 seconds
2016-04-26T04:39:25.255-0400: 245246.937: Total time for which application threads were stopped: 10.8429080 seconds, Stopping threads took: 0.0007120 seconds
2016-04-26T04:41:06.937-0400: 245348.619: Total time for which application threads were stopped: 9.8060420 seconds, Stopping threads took: 0.0006300 seconds
2016-04-26T04:41:43.370-0400: 245385.052: Total time for which application threads were stopped: 10.8144800 seconds, Stopping threads took: 0.0002260 seconds
2016-04-26T04:42:09.479-0400: 245411.161: Total time for which application threads were stopped: 9.4059640 seconds, Stopping threads took: 0.0001340 seconds
2016-04-26T04:42:36.033-0400: 245437.715: Total time for which application threads were stopped: 9.9446430 seconds, Stopping threads took: 0.0007500 seconds
2016-04-26T04:43:02.409-0400: 245464.091: Total time for which application threads were stopped: 10.4197000 seconds, Stopping threads took: 0.0000260 seconds
2016-04-26T04:43:29.559-0400: 245491.241: Total time for which application threads were stopped: 9.6712880 seconds, Stopping threads took: 0.0001080 seconds
2016-04-26T04:43:56.648-0400: 245518.330: Total time for which application threads were stopped: 9.8339590 seconds, Stopping threads took: 0.0011820 seconds
2016-04-26T04:45:35.358-0400: 245617.040: Total time for which application threads were stopped: 9.5853210 seconds, Stopping threads took: 0.0001760 seconds
2016-04-26T04:54:58.764-0400: 246180.446: Total time for which application threads were stopped: 2.9048350 seconds, Stopping threads took: 0.0008180 seconds
2016-04-26T04:55:06.107-0400: 246187.789: Total time for which application threads were stopped: 1.1189760 seconds, Stopping threads took: 0.0011390 seconds

After:

2016-04-29T04:30:05.758-0400: 29962.077: Total time for which application threads were stopped: 1.0823960 seconds, Stopping threads took: 0.0005840 seconds
2016-04-29T04:30:11.349-0400: 29967.668: Total time for which application threads were stopped: 1.4147830 seconds, Stopping threads took: 0.0008980 seconds
2016-04-29T04:30:17.198-0400: 29973.517: Total time for which application threads were stopped: 1.6294590 seconds, Stopping threads took: 0.0009380 seconds
2016-04-29T04:30:22.350-0400: 29978.669: Total time for which application threads were stopped: 1.6787880 seconds, Stopping threads took: 0.0012320 seconds
2016-04-29T04:30:28.230-0400: 29984.549: Total time for which application threads were stopped: 1.6895760 seconds, Stopping threads took: 0.0010270 seconds
2016-04-29T04:30:29.944-0400: 29986.263: Total time for which application threads were stopped: 1.5271500 seconds, Stopping threads took: 0.0009670 seconds
2016-04-29T04:30:35.282-0400: 29991.601: Total time for which application threads were stopped: 1.6575670 seconds, Stopping threads took: 0.0006200 seconds
2016-04-29T04:30:51.011-0400: 30007.329: Total time for which application threads were stopped: 2.0383550 seconds, Stopping threads took: 0.0004640 seconds
2016-04-29T04:31:03.032-0400: 30019.351: Total time for which application threads were stopped: 2.1963570 seconds, Stopping threads took: 0.0004650 seconds
2016-04-29T04:31:07.679-0400: 30023.998: Total time for which application threads were stopped: 1.2220760 seconds, Stopping threads took: 0.0004720 seconds

On Thu, Apr 28, 2016 at 1:02 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> Shawn Heisey’s page is the usual reference guide for GC settings:
> https://wiki.apache.org/solr/ShawnHeisey
> Most of the learnings from that are in the Solr 5.x startup scripts
> already, but your heap is bigger, so your mileage may vary.
>
> Some tools I’ve used while doing GC tuning:
>
> * VisualVM - Comes with the jdk. It has a Visual GC plug-in that’s pretty
> nice for visualizing what’s going on in realtime, but you need to connect
> it via jstatd for that to work.
> * GCViewer - Visualizes a GC log. The UI leaves a lot to be desired, but
> it’s the best tool I’ve found for this purpose. Use this fork for jdk 6+ -
> https://github.com/chewiebug/GCViewer
> * Swiss Java Knife has a bunch of useful features -
> https://github.com/aragozin/jvm-tools
> * YourKit - I’ve been using this lately to analyze where garbage comes
> from. It’s not free though.
> * Eclipse Memory Analyzer - I used this to analyze heap dumps before I got
> a YourKit license: http://www.eclipse.org/mat/
>
> Good luck!
>
>
> On 4/28/16, 9:27 AM, "Yonik Seeley" <ysee...@gmail.com> wrote:
>
> >On Thu, Apr 28, 2016 at 12:21 PM, Nick Vasilyev
> ><nick.vasily...@gmail.com> wrote:
> >> Hi Yonik,
> >>
> >> There are a lot of logistics involved with re-indexing and naturally
> >> upgrading Solr. I was hoping that there is an easier alternative since this
> >> is only a single back end script that is having problems.
> >>
> >> Is there any room for improvement with tweaking GC params?
> >
> >There always is ;-) But I'm not a GC tuning expert. I prefer to
> >attack memory problems more head-on (i.e. with code to use less
> >memory).
> >
> >-Yonik
>