OK, I applied the patch and it is clear the timeout is 15000. solr.xml says 
30000 if ZK_CLIENT_TIMEOUT is not set, and it is unset by default in 
solr.in.sh, but bin/solr sets it to 15000. So it seems Solr's default is still 
15000, not 30000.
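
To make that precedence concrete, here is a small Python model of how the 
effective value appears to be resolved. It is only a sketch of my reading of 
bin/solr and solr.xml, not Solr's actual code; the function name, the constant 
names and the started_via_bin_solr flag are made up for illustration.

```python
# Hypothetical model of the lookup chain described above; this is not Solr's code.
BIN_SOLR_FALLBACK_MS = 15000   # what bin/solr uses when ZK_CLIENT_TIMEOUT is unset
SOLR_XML_FALLBACK_MS = 30000   # the default written in solr.xml


def effective_zk_client_timeout(env, started_via_bin_solr=True):
    """Return the timeout in ms that Solr would end up with under this model."""
    value = env.get("ZK_CLIENT_TIMEOUT")
    if value:
        # An explicit setting (for example from solr.in.sh) always wins.
        return int(value)
    if started_via_bin_solr:
        # bin/solr fills in its own default, so the solr.xml fallback never applies.
        return BIN_SOLR_FALLBACK_MS
    return SOLR_XML_FALLBACK_MS


print(effective_zk_client_timeout({}))
print(effective_zk_client_timeout({"ZK_CLIENT_TIMEOUT": "30000"}))
```

Under this model the 30000 from solr.xml would only be reached when Solr is 
started without bin/solr and without the variable set.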

But, back to my topic. I see we explicitly set it in solr.in.sh to 30000. To be 
sure, I applied your patch to a production machine: all our collections run 
with 30000. So how would that explain this log line?

o.a.z.ClientCnxn Client session timed out, have not heard from server in 22130ms

We also see these with smaller values, such as seven seconds. And is this 
actually an indicator of the problems we have?
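
For anyone who wants to check their own logs for the same thing, here is a 
rough Python sketch that pulls the reported idle times out of lines like the 
one quoted above. The pattern is derived only from that single line, so other 
ZooKeeper versions may word the message differently; the function name and the 
second sample line are made up.

```python
import re

# Pattern based on the single ClientCnxn line quoted above (an assumption, not a spec).
TIMED_OUT = re.compile(
    r"Client session timed out, have not heard from server in (\d+)ms"
)


def idle_times_ms(lines):
    """Return the idle time in ms reported by every matching log line."""
    found = []
    for line in lines:
        match = TIMED_OUT.search(line)
        if match:
            found.append(int(match.group(1)))
    return found


sample = [
    "o.a.z.ClientCnxn Client session timed out, have not heard from server in 22130ms",
    "some unrelated line that should be ignored",  # made-up filler
]
times = idle_times_ms(sample)
print(times)
# Values below the configured zkClientTimeout (30000 here) are the puzzling ones.
print([t for t in times if t < 30000])
```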

Any ideas?

Many thanks,
Markus
 
 
-----Original message-----
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Saturday 27th January 2018 10:03
> To: solr-user@lucene.apache.org
> Subject: RE: 7.2.1 cluster dies within minutes after restart
> 
> Hello,
> 
> I grepped for it yesterday and found nothing but 30000 in the settings, but 
> judging from the weird timeout value, you may be right. Let me apply your 
> patch early next week and check for spurious warnings.
> 
> Another noteworthy observation for those working on cloud stability and 
> recovery: whenever this happens, some nodes are also absolutely sure to run 
> OOM. The leaders usually live longest, the replicas don't; their heap usage 
> peaks every time, consistently. 
> 
> Thanks,
> Markus
>  
> -----Original message-----
> > From:Shawn Heisey <apa...@elyograg.org>
> > Sent: Saturday 27th January 2018 0:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > 
> > On 1/26/2018 10:02 AM, Markus Jelsma wrote:
> > > o.a.z.ClientCnxn Client session timed out, have not heard from server in 
> > > 22130ms (although zkClientTimeOut is 30000).
> > 
> > Are you absolutely certain that there is a setting for zkClientTimeout
> > that is actually getting applied?  The default value in Solr's example
> > configs is 30 seconds, but the internal default in the code (when no
> > configuration is found) is still 15.  I have confirmed this in the code.
> > 
> > Looks like SolrCloud doesn't log the values it's using for things like
> > zkClientTimeout.  I think it should.
> > 
> > https://issues.apache.org/jira/browse/SOLR-11915
> > 
> > Thanks,
> > Shawn
> > 
> > 
> 
