@Shawn @Erick Thanks for your kind help!

There is no OOM log, and I can confirm that no OOM occurred.

My ZK tickTime is set to 5000, so the maximum session timeout is
5000 * 20 = 100 s, which is greater than my 60 s zkClientTimeout. I also
checked the Solr code: the leader wait time (4000 ms) is a constant and is
not configurable. (Why isn't it a configurable parameter?)
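
To spell out the arithmetic above: as far as I understand, ZooKeeper by
default clamps the session timeout a client requests to the range
[2 * tickTime, 20 * tickTime]. Here is a minimal Python sketch of that check.
The function name is made up for illustration and is not a ZooKeeper or Solr
API, and it assumes minSessionTimeout and maxSessionTimeout are left at their
defaults:

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms):
    """Clamp a requested session timeout to the assumed ZooKeeper defaults."""
    lower_bound_ms = 2 * tick_time_ms    # default minSessionTimeout
    upper_bound_ms = 20 * tick_time_ms   # default maxSessionTimeout
    return max(lower_bound_ms, min(requested_ms, upper_bound_ms))


tick_time_ms = 5000           # my ZK tickTime
zk_client_timeout_ms = 60000  # my Solr zkClientTimeout

effective_ms = negotiated_session_timeout_ms(zk_client_timeout_ms, tick_time_ms)
print(effective_ms)  # 60000: the requested 60 s fits under the 100 s cap
```

So with tickTime = 5000 the 60 s zkClientTimeout should be honored as is,
rather than being reduced by the ZooKeeper server.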

My Solr version is 7.3.1, and Xmx = 30000 MB (according to the Solr UI, peak
memory is 22 GB). I am already using CMS GC tuning (the parameters differ
slightly from those on your wiki page).

I will try the following advice:

  *   lower the heap size
  *   switch to G1 (with the same parameters as the wiki)
  *   try restarting one Solr node when this error happens

Thanks again.

________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Wednesday, September 18, 2019 20:21
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: Re: Question about "No registered leader" error

On 9/18/2019 6:11 AM, Shawn Heisey wrote:
> On 9/17/2019 9:35 PM, Hongxu Ma wrote:
>> My questions:
>>
>>    *   Is this error possible caused by "long gc pause"? my solr
>> zkClientTimeout=60000
>
> It's possible.  I can't say for sure that this is the issue, but it
> might be.

A followup.  I was thinking about the interactions here.  It looks like
Solr only waits four seconds for the leader election, and both of the
pauses you mentioned are longer than that.

Four seconds is probably too short a time to wait, and I do not think
that timeout is configurable anywhere.

> What version of Solr do you have, and what is your max heap?  The CMS
> garbage collection that Solr 5.0 and later incorporate by default is
> pretty good.  My G1 settings might do slightly better, but the
> improvement won't be dramatic unless your existing commandline has
> absolutely no gc tuning at all.

That question will be important.  If you already have our CMS GC tuning,
switching to G1 probably is not going to solve this.  Lowering the max
heap might be the only viable solution in that case, and depending on
what you're dealing with, it will either be impossible or it will
require more servers.

Thanks,
Shawn
