Just to chime in with a few quick thoughts...

My experience so far is that G1 (barring the risk of unknown Lucene bugs)
is actually the lower-risk, easier collector to use. That doesn't
necessarily mean it's better, though. You don't have to set the space sizes
or any of the many other parameters you seem to have to set with CMS, and
it can control pause variability much better than CMS does.

CMS also has the dubious distinction of working well when things are fine
and being a single-threaded full GC disaster on failures. Some of the
settings that Solr uses are really more matters of opinion than one size
fits all. A 50 percent initiating ratio can consume 4 CPUs permanently at
the default settings if you haven't set your heap size correctly. A 90
percent target survivor ratio can actually cause very long minor GCs if the
survivor space fills, which creates much more variable behavior than most
people realize, although for the most part they don't notice. And CMS goes
on and on with settings like these, each of which really requires thorough
analysis and learning to an almost absurd level.

G1 has a very small number of easily understandable settings that control
pauses and variability really well. That does come at some risk to
throughput, but for Solr, pause goals are far more important to me than
throughput.

All that said, I've still typically used and seen CMS in most
circumstances, because I have way more experience with it. I also think a
well-functioning CMS is more likely to have lower pauses and better
throughput. It's just riskier, in that it might work much worse. And I
don't feel like I know all the warts of G1 yet, which has also kept me
reluctant to use it more.
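
To make the difference in tuning surface concrete, here is a rough sketch
of how the two flag sets might look in solr.in.sh. The 50 percent
initiating occupancy and 90 percent target survivor ratio are the CMS
settings mentioned above; ConcGCThreads=4 is my assumption for where the
"4 CPUs" figure comes from. The G1 values are illustrative assumptions,
not recommendations, and CMS_TUNE and G1_TUNE are hypothetical variable
names used only for this example.

```shell
# Sketch only: CMS_TUNE and G1_TUNE are made-up helper variables.
# GC_TUNE is the variable that bin/solr picks up from solr.in.sh.

# CMS: each of these interacts with how the heap and its spaces are sized.
CMS_TUNE="-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:TargetSurvivorRatio=90 \
-XX:ConcGCThreads=4"

# G1: mostly just a pause goal. The 250 ms target is an assumed value
# that would need to be validated against GC logs under real load.
G1_TUNE="-XX:+UseG1GC \
-XX:MaxGCPauseMillis=250 \
-XX:+ParallelRefProcEnabled"

# Pick one set for this node.
GC_TUNE="$G1_TUNE"
```

Switching GC_TUNE between the two sets on a single replica and comparing
the GC logs is probably the least risky way to see how they differ for a
given workload.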

On Jan 24, 2017 5:46 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

> On 1/23/2017 1:00 PM, Walter Underwood wrote:
> > We have a workload with very long queries, and that can drive the CMS
> > collector into using about 20% of the CPU time. So I’m ready to try G1
> > on a couple of replicas and see what happens. I’ve already upgraded to
> > Java 8 update 121.
> >
> > I’ve read these pages:
> >
> > https://wiki.apache.org/solr/ShawnHeisey#G1_.28Garbage_First.29_Collector
> > https://gist.github.com/rockagen/e6d28244e1d540c05144370d6a64ba66
>
> I would really like a concrete reason as to why the Lucene wiki
> recommends NEVER using G1.
>
> https://wiki.apache.org/lucene-java/JavaBugs#Oracle_Java_.2F_Sun_Java_.2F_OpenJDK_Bugs
>
> If that recommendation can be backed up with demonstrable problems, then
> it would make sense.  I took a look through some of the email history
> sent by Jenkins, which runs automated testing of Lucene and Solr using
> various configurations and Java versions.  Problems that were detected
> on tests run with the G1 collector *also* happen on test runs using
> other collectors.  The number of new messages from tests using G1 is a
> very minor portion of the total number of new messages.  If G1 were a
> root cause of big problems, I would expect the number of new failures
> using G1 to be somewhere near half of the total, possibly more.
>
> As many of you know, Solr's essential functionality comes from Lucene,
> so this does matter for Solr.
>
> I myself have never had an issue running Solr with the G1 collector.  I
> haven't found any open and substantiated bugs on Lucene or Solr that
> document real problems with G1 on a 64-bit JVM.  There is one bug that
> happens on a 32-bit JVM ... but most users are NOT limited to 32-bit.
> For those that are limited that way, CMS is probably plenty fast because
> the heap can't go beyond 2GB.
>
> For my production and dev systems, the 4.x versions are running the G1
> collector.  Most of the 5.x and later installs are using the GC tuning
> that Solr contains by default, which is CMS.
>
> Thanks,
> Shawn
>
>
