Matt:

How are you creating the new replica? Are you giving it an explicit
name? And especially is it the same name as one you've already
deleted?

'cause I can't really imagine why you'd be getting a ZK exception
saying the node already exists.

Shot in the dark here......
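If it helps, one way to rule out a name collision is to walk the cluster state and collect every replica/core name currently registered before picking a new one. A rough sketch over clusterstate-style data; the dict shape below only approximates what CLUSTERSTATUS returns, so check the field names against your own output:

```python
def names_in_use(cluster_status, collection):
    """Collect every replica and core name currently registered for a
    collection, from a parsed CLUSTERSTATUS-style dict (shape assumed)."""
    used = set()
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    for shard in shards.values():
        for replica_name, replica in shard["replicas"].items():
            used.add(replica_name)
            used.add(replica.get("core", ""))
    used.discard("")
    return used

# Hypothetical sample mirroring the CLUSTERSTATUS layout:
status = {"cluster": {"collections": {"kla_collection": {"shards": {
    "shard4": {"replicas": {
        "core_node7": {"core": "kla_collection_shard4_replica1",
                       "state": "active"},
    }},
}}}}}

taken = names_in_use(status, "kla_collection")
assert "kla_collection_shard4_replica1" in taken  # don't reuse this name
```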

On Wed, Apr 8, 2015 at 4:11 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:
> Found this error, which likely explains my issue with new replicas not
> coming up; not sure of the next step. It almost looks like ZooKeeper's
> record of a shard's leader is not being updated?
>
> 4/8/2015, 4:56:03 PM
> ERROR
> ShardLeaderElectionContext
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
> There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
>         at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
>         at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
>         at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
>         at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
>         at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>         at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
>         ... 11 more
> Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>         at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
>         at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
>         at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
>         at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
>
> Matt
>
>
> -----Original Message-----
> From: Matt Kuiper [mailto:matt.kui...@issinc.com]
> Sent: Wednesday, April 08, 2015 4:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Clusterstate - state active
>
> Erick, Anshum,
>
> Thanks for your replies!  Yes, it is replica state that I am looking at, and
> this is the answer I was hoping for.
>
> I am working on a solution that involves moving some replicas to new Solr
> nodes as they are made available.  Before deleting the original replicas
> backing the shard, I check the replica state to make sure the new replicas
> are active.
>
> Initially it was working pretty well, but with more recent testing I
> regularly see the shard go down.  The two new replicas go into a failed
> recovery state after the original replicas are deleted, and the logs report
> that a registered leader was not found for the shard.  Initially I was
> concerned that maybe the new replicas were not fully synced with the leader,
> even though I checked for active state.
>
> Now I am wondering if the new replicas are somehow competing (or somehow
> reluctant) to become leader, and thus neither becomes leader.  I plan to
> test creating just one new replica on a new Solr node, checking that its
> state is active, then deleting the original replicas, and then creating the
> second new replica.
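For what it's worth, the guard described above could be tightened to also require that one of the new replicas is the registered leader before the originals are deleted, since "active" alone doesn't guarantee a leader exists. A rough sketch over clusterstate-style data (the field names are assumptions, not the exact API):

```python
def safe_to_delete_originals(shard_replicas, new_replicas):
    """Return True only when every new replica is active AND one of them is
    the registered leader for the shard.

    `shard_replicas` maps replica name -> {"state": ..., "leader": "true"},
    mimicking a clusterstate.json shard entry (shape assumed).
    """
    all_active = all(
        shard_replicas.get(name, {}).get("state") == "active"
        for name in new_replicas
    )
    leader_among_new = any(
        shard_replicas.get(name, {}).get("leader") == "true"
        for name in new_replicas
    )
    return all_active and leader_among_new

# Hypothetical shard entry: two new replicas plus the original.
shard4 = {
    "new_replica_a": {"state": "active", "leader": "true"},
    "new_replica_b": {"state": "active"},
    "old_replica":   {"state": "active"},
}
assert safe_to_delete_originals(shard4, ["new_replica_a", "new_replica_b"])
```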
>
> Any thoughts?
>
> Matt
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, April 08, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Clusterstate - state active
>
> Matt:
>
> In a word, "yes". Depending on the size of the index for that shard, the 
> transition from Down->Recovering->Active may be too fast to catch.
> If replicating the index takes a while, though, you should at least see the 
> "Recovering" state, during which time there won't be any searches forwarded 
> to that node.
>
> Best,
> Erick
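The Down->Recovering->Active transition Erick describes can be watched with a small polling loop. In this sketch, `fetch_state` is a stand-in for however you actually read the replica state (e.g. a CLUSTERSTATUS call), so treat it as an assumption:

```python
import time

def wait_until_active(fetch_state, timeout_s=120.0, interval_s=5.0):
    """Poll `fetch_state()` (returning 'down'/'recovering'/'active') until
    the replica reports active, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if fetch_state() == "active":
            return "active"
        time.sleep(interval_s)
    raise TimeoutError("replica never reached 'active'")

# Fake fetcher standing in for a real clusterstate lookup:
states = iter(["down", "recovering", "active"])
assert wait_until_active(lambda: next(states), interval_s=0) == "active"
```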
>
> On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:
>> Hello,
>>
>> When creating a new replica, once its state is recorded as active within
>> the ZK clusterstate, does that mean the new replica has synced with the
>> leader replica for that particular shard?
>>
>> Thanks,
>> Matt
>>
