I found this error, which likely explains my issue with new replicas not 
coming up, but I am not sure of the next step.  It almost looks like 
ZooKeeper's record of the shard's leader is not being updated?

4/8/2015, 4:56:03 PM
ERROR
ShardLeaderElectionContext
There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
There was a problem trying to register as the leader:org.apache.solr.common.SolrException: Could not register as the leader because creating the ephemeral registration node in ZooKeeper failed
        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
        at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
        at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
        at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
        at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
        at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
        at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
        at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
        ... 11 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
        at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
        at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
        at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
        at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)

Matt


-----Original Message-----
From: Matt Kuiper [mailto:matt.kui...@issinc.com] 
Sent: Wednesday, April 08, 2015 4:36 PM
To: solr-user@lucene.apache.org
Subject: RE: Clusterstate - state active

Erick, Anshum,

Thanks for your replies!  Yes, it is replica state that I am looking at, and 
this is the answer I was hoping for.  

I am working on a solution that involves moving some replicas to new Solr nodes 
as they are made available.  Before deleting the original replicas backing the 
shard, I check the replica state to make sure it is active for the new replicas.  
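
A minimal sketch of the pre-delete check described above, written in Python 
against an already-parsed clusterstate.  It assumes the usual clusterstate.json 
layout (collection -> "shards" -> shard -> "replicas" -> replica name, with a 
"state" field per replica and "leader": "true" on the current leader).  The 
function names and the replica names in the usage note are hypothetical and 
not taken from this thread.

```python
def remaining_all_active(clusterstate, collection, shard, to_delete):
    """True if at least one replica would remain after deleting `to_delete`
    and every remaining replica currently reports state "active"."""
    replicas = clusterstate[collection]["shards"][shard]["replicas"]
    remaining = {name: r for name, r in replicas.items() if name not in to_delete}
    return bool(remaining) and all(
        r.get("state") == "active" for r in remaining.values()
    )


def leader_will_be_deleted(clusterstate, collection, shard, to_delete):
    """True if the replica currently flagged as leader is in `to_delete`,
    meaning the delete would force a new leader election for the shard."""
    replicas = clusterstate[collection]["shards"][shard]["replicas"]
    return any(
        r.get("leader") == "true" and name in to_delete
        for name, r in replicas.items()
    )
```

For example, remaining_all_active(state, "kla_collection", "shard4", 
{"core_node1", "core_node2"}) would return False while any surviving replica 
is still "recovering".  When leader_will_be_deleted(...) is True, the delete 
triggers an election, so it may be worth deleting the non-leader originals 
first.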

Initially it was working pretty well, but with more recent testing I regularly 
see the shard go down.  The two new replicas go into a failed recovery state 
after the original replicas are deleted, and the logs report that a registered 
leader was not found for the shard.  Initially I was concerned that maybe the 
new replicas were not fully synced with the leader, even though I checked for 
active state.

Now I am wondering if the new replicas are somehow competing (or somehow 
reluctant) to become leader, and thus neither becomes leader.  I plan to test 
creating just one new replica on a new Solr node, checking that its state is 
active, then deleting the original replicas, and then creating the second new 
replica.

Any thoughts?

Matt

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, April 08, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Clusterstate - state active

Matt:

In a word, "yes". Depending on the size of the index for that shard, the 
transition from Down->Recovering->Active may be too fast to catch.
If replicating the index takes a while, though, you should at least see the 
"Recovering" state, during which time there won't be any searches forwarded to 
that node.
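
Because the intermediate states can go by too quickly to observe, a polling 
check is safer if it waits for "active" and never requires that "recovering" 
was seen.  A rough Python sketch, assuming a caller-supplied fetch_state 
callable (hypothetical; it would read the replica's current state from the 
clusterstate however you already do):

```python
import time


def wait_until_active(fetch_state, timeout_s=60.0, interval_s=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns "active" or the timeout expires.

    Returns (reached_active, observed), where `observed` lists the distinct
    consecutive states seen.  It may skip states such as "recovering" when
    the transition is faster than the polling interval.
    """
    deadline = clock() + timeout_s
    observed = []
    while True:
        state = fetch_state()
        if not observed or observed[-1] != state:
            observed.append(state)
        if state == "active":
            return True, observed
        if clock() >= deadline:
            return False, observed
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without 
real waiting; the defaults use the standard library.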

Best,
Erick

On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper <matt.kui...@issinc.com> wrote:
> Hello,
>
> When creating a new replica, and the state is recorded as active within the 
> ZK clusterstate, does that mean that the new replica has synced with the 
> leader replica for the particular shard?
>
> Thanks,
> Matt
>
