I'm playing with dropping the cluster's connections to ZooKeeper and then reconnecting them, and during recovery I always see this in the leader's logs:

ElectionContext.java (line 361) Waiting until we see more replicas up for shard shard1: total=2 found=1 timeoutin=139902

and then on the follower, I see:

SolrException.java (line 121) There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
    at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
    at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
    at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
    at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
    at org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
    at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
    at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
    at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
    ... 6 more

They block each other's progress until the leader decides to give up and stop waiting for more replicas to come up:

ElectionContext.java (line 368) Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later

and then recovery moves forward again.

Should waitForLeaderToSeeDownState move on if there's no leader at the moment?

Thanks,
Jessica