When the leader reaches 99% of physical memory on the box and starts swapping
(and stops replicating), we forcibly bring it down (first kill -15, then
kill -9 if kill -15 doesn't work). At that point we expect the replica to
assume the leader's role, but it never happens.
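
For reference, the kill -15 then kill -9 escalation described above can be sketched in Python as follows. This is a minimal illustration, not the script we actually run: it assumes the process was started by the same script (so a subprocess.Popen handle is available), and the stop_process name and the 10-second grace period are made up for the example.

```python
import signal
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 10.0) -> str:
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL.

    Returns "terminated" if SIGTERM was enough, "killed" if SIGKILL was needed.
    (Hypothetical helper; the grace period is an assumed value.)
    """
    proc.send_signal(signal.SIGTERM)  # same signal as kill -15
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # on POSIX this sends SIGKILL, same as kill -9
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # Stand-in for the real server process: a child that just sleeps.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child, grace_seconds=5.0))
```

A process stopped with SIGKILL gets no chance to close its ZooKeeper session, so in that case the session only goes away once the session timeout expires.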

The ZooKeeper timeout is 45 seconds. We can increase it to 2 minutes and test.

<cores adminPath="/admin/cores" defaultCoreName="collection1" host="${host:}" 
hostPort="${jetty.port:8983}" hostContext="${hostContext:solr}" 
zkClientTimeout="${zkClientTimeout:45000}">
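
The ${name:default} placeholders in this element mean "use the system property if it is set, otherwise the default after the colon". The Python sketch below mimics that rule to show which value ends up in effect; it is only an illustration of the substitution syntax, not Solr's own code, and the resolve function is a hypothetical name. Raising an error for a placeholder with no value and no default is a choice made for this sketch.

```python
import re

# Matches ${name} or ${name:default}; the default may be empty (e.g. ${host:}).
_PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(value: str, props: dict) -> str:
    """Replace each ${name:default} with props[name], else with the default."""

    def _substitute(match):
        name, default = match.group(1), match.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        # No property and no default: this sketch treats it as an error.
        raise KeyError("no value or default for property " + name)

    return _PLACEHOLDER.sub(_substitute, value)


if __name__ == "__main__":
    # Nothing passed on the command line: the default of 45000 ms applies.
    print(resolve("${zkClientTimeout:45000}", {}))
    # Equivalent of starting the JVM with -DzkClientTimeout=120000.
    print(resolve("${zkClientTimeout:45000}", {"zkClientTimeout": "120000"}))
```

So testing a 2-minute timeout should only need the system property to be set at startup, without editing the 45000 default in the file.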

By the definition of zkClientTimeout, after the leader is brought down and
doesn't talk to ZooKeeper for 45 seconds, shouldn't ZK promote the replica to
leader? I am not sure how increasing the ZK timeout will help.
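
One thing worth checking before testing a larger value: a ZooKeeper server bounds the session timeout a client requests between minSessionTimeout and maxSessionTimeout, which default to 2 and 20 times tickTime. The sketch below works through that arithmetic in Python. It assumes tickTime=2000 (the value in the sample zoo.cfg); the zoo.cfg for our ensemble is not shown in this thread, so the numbers are illustrative, and negotiated_session_timeout is a hypothetical helper name.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_ms=None, max_ms=None):
    """Clamp a requested session timeout to the server-side bounds.

    Defaults follow ZooKeeper's documented rule: min = 2 * tickTime,
    max = 20 * tickTime, unless set explicitly in the server config.
    """
    if min_ms is None:
        min_ms = 2 * tick_time_ms
    if max_ms is None:
        max_ms = 20 * tick_time_ms
    return max(min_ms, min(requested_ms, max_ms))


if __name__ == "__main__":
    # Requested 45 s and 2 min, with the assumed tickTime of 2000 ms.
    print(negotiated_session_timeout(45000))
    print(negotiated_session_timeout(120000))
    # Same 2 min request after raising maxSessionTimeout on the servers.
    print(negotiated_session_timeout(120000, max_ms=120000))
```

Under those assumed defaults, both 45 seconds and 2 minutes would be capped at 40 seconds, so raising zkClientTimeout alone may have no effect unless maxSessionTimeout is raised on the ZooKeeper side as well.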

 
-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, January 28, 2015 11:42 AM
To: solr-user@lucene.apache.org
Subject: Re: replica never takes leader role

This is not the desired behavior at all. I know there have been
improvements in this area since 4.8, but can't seem to locate the JIRAs.

I'm curious _why_ the nodes are going down though, is it happening at
random or are you taking it down? One problem has been that the Zookeeper
timeout used to default to 15 seconds, and occasionally a node would be
unresponsive (sometimes due to GC pauses) and exceed the timeout. So upping
the ZK timeout has helped some people avoid this...

FWIW,
Erick

On Wed, Jan 28, 2015 at 7:11 AM, Joshi, Shital <shital.jo...@gs.com> wrote:

> We're using Solr 4.8.0
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Tuesday, January 27, 2015 7:47 PM
> To: solr-user@lucene.apache.org
> Subject: Re: replica never takes leader role
>
> What version of Solr? This is an ongoing area of improvements and several
> are very recent.
>
> Try searching the JIRA for Solr for details.
>
> Best,
> Erick
>
> On Tue, Jan 27, 2015 at 1:51 PM, Joshi, Shital <shital.jo...@gs.com>
> wrote:
>
> > Hello,
> >
> > We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes and three
> > zookeeper instances. We have noticed that when a leader node goes down the
> > replica never takes over as a leader, cloud becomes unusable and we have to
> > bounce entire cloud for replica to assume leader role. Is this default
> > behavior? How can we change this?
> >
> > Thanks.
> >
> >
> >
>
