waitForLeaderToSeeDownState when leader is down

Jessica Mallet Tue, 08 Apr 2014 13:51:09 -0700

I'm experimenting with dropping the cluster's connections to ZooKeeper and
then reconnecting them. During recovery, I always see this in the leader's
logs:


ElectionContext.java (line 361) Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=139902

and then on the follower, I see:
SolrException.java (line 121) There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
        at org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
        ... 6 more

The two nodes block each other's progress until the leader gives up waiting
for more replicas to come up:

ElectionContext.java (line 368) Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later

and then recovery moves forward again.

Should waitForLeaderToSeeDownState move on if there's no leader at the
moment?
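
To make the question concrete, the behavior being asked about might look
roughly like the sketch below. This is not Solr's actual code: the class,
the enum, and both callbacks are hypothetical stand-ins. leaderLookup
models reading the leader znode (an empty result plays the role of the
NoNodeException above), and leaderSeesDown models asking the leader whether
it sees this replica as down.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names exist in Solr.
final class LeaderWaitSketch {

    enum Outcome { LEADER_SAW_DOWN, NO_LEADER, TIMED_OUT, INTERRUPTED }

    /**
     * Polls until the leader reports that it sees this replica as down,
     * the leader entry is found to be missing, or the timeout expires.
     */
    static Outcome waitForLeaderToSeeDownState(
            Supplier<Optional<String>> leaderLookup, // assumed: empty means no leader znode
            Predicate<String> leaderSeesDown,        // assumed: true once the leader sees us down
            long timeoutMs,
            long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() - deadline < 0) {
            Optional<String> leader = leaderLookup.get();
            if (!leader.isPresent()) {
                // No leader right now: return immediately instead of waiting out the timeout.
                return Outcome.NO_LEADER;
            }
            if (leaderSeesDown.test(leader.get())) {
                return Outcome.LEADER_SAW_DOWN;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Outcome.INTERRUPTED;
            }
        }
        return Outcome.TIMED_OUT;
    }
}
```

With this shape, a replica that finds no leader entry returns NO_LEADER on
the first poll and can carry on with registration, rather than holding up
the election that is itself waiting for replicas.
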
Thanks,
Jessica
