To clarify: when I said "leader" and "follower", I meant the nodes that were
the leader and the follower before the ZooKeeper session expiration. While
they're recovering, there's no leader.


On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet <mewmewb...@gmail.com> wrote:

> I'm experimenting with dropping the cluster's connections to ZooKeeper and
> then reconnecting them. During recovery, I always see this in the leader's
> logs:
>
> ElectionContext.java (line 361) Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=139902
>
> and then on the follower, I see:
> SolrException.java (line 121) There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
>         at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
>         at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
>         at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
>         at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
>         at org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
>         ... 6 more
>
> The two nodes block each other's progress until the leader gives up waiting
> for more replicas to come up:
>
> ElectionContext.java (line 368) Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
>
> and then recovery moves forward again.
>
> Should waitForLeaderToSeeDownState move on if there's no leader at the
> moment?
> Thanks,
> Jessica
>
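A minimal Java sketch of the behaviour the question proposes, under stated assumptions: the class LeaderWaitSketch, its methods and the leader URL are hypothetical and only model the idea. This is not Solr's ZkController code, and whether skipping the wait is safe in Solr is exactly the open question. An in-memory map stands in for the /collections/lc4/leaders/shard1 znode, and an empty lookup plays the role of the NoNodeException in the trace above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical model of "move on if there's no leader"; not Solr's code.
 * The map stands in for the /collections/<collection>/leaders/<shard> znodes.
 */
public class LeaderWaitSketch {
    private final Map<String, String> leaderNodes = new ConcurrentHashMap<>();
    private final List<String> notified = new ArrayList<>();

    void registerLeader(String shardPath, String leaderUrl) {
        leaderNodes.put(shardPath, leaderUrl);
    }

    // Stand-in for reading the leader znode; an empty result plays the role
    // of KeeperException$NoNodeException.
    Optional<String> leaderFor(String shardPath) {
        return Optional.ofNullable(leaderNodes.get(shardPath));
    }

    /** Returns true if a leader was told about our down state, false if we skipped. */
    boolean waitForLeaderToSeeDownState(String shardPath) {
        Optional<String> leader = leaderFor(shardPath);
        if (!leader.isPresent()) {
            // No leader registered (e.g. every session expired at once): skip
            // the wait so this replica can publish "down" and join the election,
            // instead of blocking while the would-be leader waits for it.
            return false;
        }
        notified.add(leader.get()); // stand-in for asking the leader to see "down"
        return true;
    }

    List<String> notifiedLeaders() {
        return notified;
    }

    public static void main(String[] args) {
        LeaderWaitSketch zk = new LeaderWaitSketch();
        String shard = "/collections/lc4/leaders/shard1";
        System.out.println(zk.waitForLeaderToSeeDownState(shard)); // false
        zk.registerLeader(shard, "http://host1:8983/solr/lc4"); // hypothetical URL
        System.out.println(zk.waitForLeaderToSeeDownState(shard)); // true
        System.out.println(zk.notifiedLeaders()); // [http://host1:8983/solr/lc4]
    }
}
```

With no leader registered, the replica returns immediately instead of retrying, so it can go on to join the election rather than holding up the node that is waiting for more replicas.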