On 4/28/2016 9:43 AM, Nick Vasilyev wrote:
> I forgot to mention that the index is approximately 50 million docs split
> across 4 shards (replication factor 2) on 2 solr replicas.

Later in the thread, Jeff Wartes mentioned my wiki page for GC tuning.

https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

Oracle seems to be putting most of their recent work into the G1 collector. They want to make it the default collector for Java 9. I don't know if this is going to happen, or if they will delay that change until Java 10 ... but I think that the default *is* going to eventually be G1.

Lucene strongly recommends against EVER using G1, but I have not noticed any problems with it on Solr. I have not seen any specific *reason* that G1 is not recommended, at least with a 64-bit JVM. The only G1 issues I can remember seeing in Jira involved a 32-bit JVM -- which has its own limitations, so I don't recommend it anyway. If somebody can point me to specific *OPEN* issues showing current problems with G1 on a 64-bit JVM, I will have an easier time believing the Lucene assertion that it's a bad idea. I have searched Jira and can't find anything relevant.

The best GC results I've seen in testing Solr have been with the G1 collector. I haven't done any testing for quite a while, and almost all of the testing that I've done has been with 4.x versions, not 5.x.

Out of the box, Solr 5.0 and later uses GC tuning with the CMS collector that looks a lot like the CMS config that I came up with. This works pretty well, but if the heap size gets big enough, especially 32GB or larger, I suspect that it will start to have problems with GC performance.

You could give the settings I listed under "Current experiments" a try. You would do this by editing your solr.in.* script to comment out the current GC tuning parameters and substituting the new set of parameters. This is a G1 config, and as I already mentioned, Lucene recommends NEVER using G1.

For your specific situation, I think you should try setting the max heap to 31GB instead of 32GB. That keeps the heap small enough for the JVM to use compressed object pointers, so the pointer sizes are cut in half, without making a huge difference in the total amount of memory available.
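
To put rough numbers on the 31GB suggestion, here is a small Python sketch. It is only back-of-the-envelope arithmetic, not anything taken from the JVM itself. It assumes HotSpot's default 8-byte object alignment and 32-bit compressed pointers scaled by that alignment; the function names and the reference count are made up for illustration.

```python
# Rough arithmetic behind "31GB instead of 32GB".
# Assumption: HotSpot stores compressed object pointers as 32-bit values
# scaled by the object alignment (8 bytes by default).

GIB = 1024 ** 3
OBJECT_ALIGNMENT_BYTES = 8       # assumed HotSpot default
COMPRESSED_POINTER_BITS = 32


def compressed_pointer_limit_bytes(alignment=OBJECT_ALIGNMENT_BYTES):
    """Largest heap that 32-bit scaled pointers can address."""
    return (2 ** COMPRESSED_POINTER_BITS) * alignment


def pointer_bytes(heap_bytes):
    """Approximate size of one object reference for a given max heap."""
    # Below the limit the JVM can use 4-byte compressed pointers;
    # at or above it, every reference is a full 8-byte pointer.
    return 4 if heap_bytes < compressed_pointer_limit_bytes() else 8


def reference_overhead_bytes(heap_bytes, reference_count):
    """Memory spent on the references themselves (not the objects)."""
    return pointer_bytes(heap_bytes) * reference_count


if __name__ == "__main__":
    refs = 500_000_000  # hypothetical number of live references
    for heap_gib in (31, 32):
        heap = heap_gib * GIB
        overhead_gib = reference_overhead_bytes(heap, refs) / GIB
        print(f"{heap_gib}GB heap: {pointer_bytes(heap)}-byte pointers, "
              f"about {overhead_gib:.2f} GiB spent on references")
```

The real cutoff is a little below 32GB and can vary with the JVM version, so treat the comparison in pointer_bytes as an approximation. Running "java -Xmx31g -XX:+PrintFlagsFinal -version" and looking for UseCompressedOops in the output should show what your JVM actually decides.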

The chewiebug version of GCViewer is what I use to take the GC logfile and see how Solr did with particular GC settings. At my request, they have incorporated some really nice statistical metrics into GCViewer, but those metrics are not available in the released versions -- you'll have to compile it yourself until 1.35 comes out.

https://github.com/chewiebug/GCViewer/issues/139

I have also had some good luck with jHiccup. That tool will inform you of pauses that happen for *any* reason, not just because of GC. My experience is that GC is the only *major* cause of pauses.

Thanks,
Shawn