Matt:

How are you creating the new replica? Are you giving it an explicit name? And especially, is it the same name as one you've already deleted?
'Cause I can't really imagine why you'd be getting a ZK exception saying the node already exists. Shot in the dark here...

On Wed, Apr 8, 2015 at 4:11 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:
> Found this error, which likely explains my issue with new replicas not coming
> up; I'm not sure of the next step. It almost looks like ZooKeeper's record of a
> shard's leader is not being updated?
>
> 4/8/2015, 4:56:03 PM
> ERROR
> ShardLeaderElectionContext
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
>     at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
>     at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
>     at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
>     at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
>     at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
>     at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
>     at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>     at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
>     at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
>     ... 11 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>     at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
>     at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
>     at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
>     at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
>
> Matt
>
>
> -----Original Message-----
> From: Matt Kuiper [mailto:matt.kui...@issinc.com]
> Sent: Wednesday, April 08, 2015 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Clusterstate - state active
>
> Erick, Anshum,
>
> Thanks for your replies! Yes, it is the replica state that I am looking at, and
> this is the answer I was hoping for.
>
> I am working on a solution that involves moving some replicas to new Solr
> nodes as they are made available. Before deleting the original replicas
> backing the shard, I check the replica state to make sure it is active for the
> new replicas.
>
> Initially it was working pretty well, but with more recent testing I
> regularly see the shard go down.
> The two new replicas go into a failed recovery state after the original
> replicas are deleted, and the logs report that no registered leader was found
> for the shard. Initially I was concerned that maybe the new replicas were not
> fully synced with the leader, even though I checked for the active state.
>
> Now I am wondering if the new replicas are somehow competing (or somehow
> reluctant) to become leader, and thus neither becomes leader. I plan to
> test creating just one new replica on a new Solr node, checking that its state
> is active, then deleting the original replicas, and then creating the second
> new replica.
>
> Any thoughts?
>
> Matt
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, April 08, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Clusterstate - state active
>
> Matt:
>
> In a word, "yes". Depending on the size of the index for that shard, the
> transition from Down->Recovering->Active may be too fast to catch.
> If replicating the index takes a while, though, you should at least see the
> "Recovering" state, during which time there won't be any searches forwarded
> to that node.
>
> Best,
> Erick
>
> On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:
>> Hello,
>>
>> When creating a new replica, and the state is recorded as active within the
>> ZK clusterstate, does that mean that the new replica has synced with the
>> leader replica for the particular shard?
>>
>> Thanks,
>> Matt
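Here is a minimal sketch of the check Matt describes: wait until every new replica reports "active" in clusterstate before deleting the original replicas. It is written in Java to match the stack traces above. The class and method names (ReplicaStateCheck, allActive, waitForActive) are hypothetical and are not part of SolrJ. The snapshot supplier is an assumption that stands in for however clusterstate is actually read, for example through the Collections API CLUSTERSTATUS action. The replica name core_node5 is made up for the demo. As the thread shows, "active" alone did not stop the shard from losing its leader, so treat this as a necessary check rather than a sufficient one.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not a SolrJ API: decides whether the new replicas of a
// shard all report "active" before the original replicas are deleted.
final class ReplicaStateCheck {

    // True only if every named replica is present in the snapshot and "active".
    static boolean allActive(Map<String, String> replicaStates, List<String> newReplicas) {
        for (String replica : newReplicas) {
            if (!"active".equals(replicaStates.get(replica))) {
                return false; // missing, down, recovering, or failed
            }
        }
        return true;
    }

    // Re-reads the snapshot up to maxAttempts times, sleeping between reads,
    // and reports whether all new replicas became active within that budget.
    static boolean waitForActive(Supplier<Map<String, String>> snapshot,
                                 List<String> newReplicas,
                                 int maxAttempts,
                                 long sleepMillis) throws InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (allActive(snapshot.get(), newReplicas)) {
                return true;
            }
            Thread.sleep(sleepMillis);
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated clusterstate reads for one hypothetical new replica,
        // stepping through Down -> Recovering -> Active.
        List<Map<String, String>> reads = List.of(
                Map.of("core_node5", "down"),
                Map.of("core_node5", "recovering"),
                Map.of("core_node5", "active"));
        int[] calls = {0};
        Supplier<Map<String, String>> source =
                () -> reads.get(Math.min(calls[0]++, reads.size() - 1));

        boolean ready = waitForActive(source, List.of("core_node5"), 10, 0L);
        System.out.println("safe to delete originals: " + ready); // prints "safe to delete originals: true"
    }
}
```

The poll uses a bounded number of attempts instead of waiting indefinitely. If the new replicas never become active, a migration script can give up and leave the original replicas in place.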