Re: waitForLeaderToSeeDownState when leader is down

Mark Miller Wed, 16 Apr 2014 16:36:45 -0700

What version are you testing? Thought we had addressed this.
-- 
Mark Miller
about.me/markrmiller


On April 16, 2014 at 6:02:09 PM, Jessica Mallet ([email protected]) wrote:

Hi Furkan,  

Thanks for the reply. I understand the intent. However, in the case I  
described, the follower is blocked on looking for a leader (throws the  
pasted exception because it can't find the leader) before it participates  
in election; therefore, it will never come up while the leader waits for it  
to come up (they're deadlocked waiting for each other). What I'm suggesting  
is that maybe the follower should just skip waitForLeaderToSeeDownState  
when there's no leader (instead of failing with the pasted stacktrace) and  
go ahead and start participating in election. That way the leader will see  
more replicas come up, and they can sync with each other and move on.  
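
Roughly what I have in mind, as a minimal self-contained sketch. The class,
enum, and method names below are made up for illustration and are not the
actual ZkController code; an empty lookup result stands in for the case
where the leader node is missing in zk:

```java
import java.util.Optional;
import java.util.function.Supplier;

public class SkipWaitSketch {

    /** Outcome of the (hypothetical) step that runs before joining the election. */
    enum Outcome { WAITED_FOR_LEADER, SKIPPED_NO_LEADER }

    /**
     * Hypothetical stand-in for the pre-election step.
     * leaderLookup returns the current leader's name, or empty when no
     * leader is registered for the shard.
     */
    static Outcome beforeJoiningElection(Supplier<Optional<String>> leaderLookup) {
        Optional<String> leader = leaderLookup.get();
        if (!leader.isPresent()) {
            // No leader registered: there is nobody to wait on, so go
            // straight to the election instead of failing.
            return Outcome.SKIPPED_NO_LEADER;
        }
        // A leader exists: the real code would wait here until it has seen
        // this replica's down state (omitted in this sketch).
        return Outcome.WAITED_FOR_LEADER;
    }

    public static void main(String[] args) {
        System.out.println(beforeJoiningElection(() -> Optional.empty()));
        System.out.println(beforeJoiningElection(() -> Optional.of("node1")));
    }
}
```

The only point is the early return: with no leader, the follower proceeds
to the election rather than throwing.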

Thanks,  
Jessica  


On Sat, Apr 12, 2014 at 4:14 PM, Furkan KAMACI <[email protected]> wrote:  

> Hi;  
>  
> There is an explanation as follows: "This is meant to protect the case  
> where you stop a shard or it fails and then the first node to get started  
> back up has stale data - you don't want it to just become the leader. So we  
> wait to see everyone we know about in the shard up to 3 or 5 min by  
> default. Then we know all the shards participate in the leader election and  
> the leader will end up with all updates it should have." You can check it  
> from here:  
>  
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccajt9wng_yykcxggentgcxguhhcjhidear-jygpgrnkaedrz...@mail.gmail.com%3E
>   
>  
> Thanks;  
> Furkan KAMACI  
>  
>  
> 2014-04-08 23:51 GMT+03:00 Jessica Mallet <[email protected]>:  
>  
> > To clarify, when I said "leader" and "follower" I meant the old leader and
> > follower before the zookeeper session expiration. When they're recovering
> > there's no leader.
> >  
> >  
> > On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet <[email protected]>  
> > wrote:  
> >  
> > > I'm playing with dropping the cluster's connections to zookeeper and then
> > > reconnecting them, and during recovery, I always see this on the leader's
> > > logs:
> > >
> > > ElectionContext.java (line 361) Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=139902
> > >  
> > > and then on the follower, I see:  
> > > SolrException.java (line 121) There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
> > >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
> > >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
> > >     at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
> > >     at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
> > >     at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> > >     at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
> > >     at org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
> > > Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
> > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> > >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> > >     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> > >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
> > >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
> > >     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
> > >     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
> > >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
> > >     ... 6 more
> > >  
> > > They block each other's progress until leader decides to give up and not
> > > wait for more replicas to come up:
> > >
> > > ElectionContext.java (line 368) Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
> > >  
> > > and then recovery moves forward again.  
> > >  
> > > Should waitForLeaderToSeeDownState move on if there's no leader at the  
> > > moment?  
> > > Thanks,  
> > > Jessica  
> > >  
> >  
>
