Hi;

This behavior has been explained as follows:

"This is meant to protect the case where you stop a shard or it fails and then the first node to get started back up has stale data - you don't want it to just become the leader. So we wait to see everyone we know about in the shard up to 3 or 5 min by default. Then we know all the shards participate in the leader election and the leader will end up with all updates it should have."

You can read the full message here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccajt9wng_yykcxggentgcxguhhcjhidear-jygpgrnkaedrz...@mail.gmail.com%3E
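To make the wait described above concrete, here is a minimal Java sketch of a "wait for all known replicas, up to a timeout" loop. It is only an illustration and not Solr's actual ElectionContext code: the class name, the method name, the parameters and the polling approach are all assumptions of mine, and the printed message only imitates the log line quoted further down in this thread.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch only; this is NOT the code in Solr's ElectionContext.
class LeaderVoteWaitSketch {

    /**
     * Polls until all known replicas of the shard are live, or the timeout elapses.
     * Returns true if every replica was seen, false if we gave up waiting.
     */
    static boolean waitForReplicas(IntSupplier liveReplicas, int total,
                                   long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            int found = liveReplicas.getAsInt();
            if (found >= total) {
                return true; // everyone we know about can take part in the election
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // replicas are taking too long; stop waiting
            }
            System.out.println("Waiting until we see more replicas up: total=" + total
                    + " found=" + found + " timeoutin=" + remainingMs);
            try {
                Thread.sleep(Math.min(pollMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Only 1 of 2 replicas ever shows up, so this wait ends by timing out.
        boolean allUp = waitForReplicas(() -> 1, 2, 200L, 50L);
        System.out.println(allUp
                ? "All replicas up, running the election"
                : "Timed out, running the election with the replicas we have");
    }
}
```

In this sketch the election goes ahead either way; the timeout only limits how long a node that may have stale data has to wait before it is allowed to become leader.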
Thanks;
Furkan KAMACI

2014-04-08 23:51 GMT+03:00 Jessica Mallet <mewmewb...@gmail.com>:

> To clarify, when I said "leader" and "follower" I meant the old leader and
> follower before the zookeeper session expiration. When they're recovering
> there's no leader.
>
>
> On Tue, Apr 8, 2014 at 1:49 PM, Jessica Mallet <mewmewb...@gmail.com>
> wrote:
>
> > I'm playing with dropping the cluster's connections to zookeeper and then
> > reconnecting them, and during recovery, I always see this on the leader's
> > logs:
> >
> > ElectionContext.java (line 361) Waiting until we see more replicas up for
> > shard shard1: total=2 found=1 timeoutin=139902
> >
> > and then on the follower, I see:
> > SolrException.java (line 121) There was a problem finding the leader in
> > zk:org.apache.solr.common.SolrException: Could not get leader props
> >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:958)
> >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:922)
> >     at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1463)
> >     at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:380)
> >     at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
> >     at org.apache.solr.cloud.ZkController$1.command(ZkController.java:232)
> >     at org.apache.solr.common.cloud.ConnectionManager$2$1.run(ConnectionManager.java:179)
> > Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
> > KeeperErrorCode = NoNode for /collections/lc4/leaders/shard1
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> >     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> >     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:273)
> >     at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:270)
> >     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73)
> >     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:270)
> >     at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:936)
> >     ... 6 more
> >
> > They block each other's progress until leader decides to give up and not
> > wait for more replicas to come up:
> >
> > ElectionContext.java (line 368) Was waiting for replicas to come up, but
> > they are taking too long - assuming they won't come back till later
> >
> > and then recovery moves forward again.
> >
> > Should waitForLeaderToSeeDownState move on if there's no leader at the
> > moment?
> > Thanks,
> > Jessica