Found this error, which likely explains my issue with new replicas not coming
up, but I am not sure of the next step. It almost looks like ZooKeeper's record
of a shard's leader is not being updated?
4/8/2015, 4:56:03 PM ERROR ShardLeaderElectionContext
There was a problem trying to register as the
leader:org.apache.solr.common.SolrException: Could not register as the leader
because creating the ephemeral registration node in ZooKeeper failed
    at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:150)
    at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:306)
    at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:163)
    at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:125)
    at org.apache.solr.cloud.LeaderElector.access$200(LeaderElector.java:55)
    at org.apache.solr.cloud.LeaderElector$ElectionWatcher.process(LeaderElector.java:358)
    at org.apache.solr.common.cloud.SolrZkClient$3$1.run(SolrZkClient.java:209)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
    at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:40)
    at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:137)
    ... 11 more
Caused by: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/kla_collection/leaders/shard4
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.solr.common.cloud.SolrZkClient$11.execute(SolrZkClient.java:462)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:74)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:459)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:416)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:403)
    at org.apache.solr.cloud.ShardLeaderElectionContextBase$1.execute(ElectionContext.java:142)
    at org.apache.solr.common.util.RetryUtil.retryOnThrowable(RetryUtil.java:34)
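
In a nested trace like the one above, the innermost "Caused by" entry is the
root cause. The following is a minimal sketch, assuming the trace has been
saved as plain text, that pulls out the last cause and the znode path it names.
The function name root_cause is hypothetical, and the path pattern only covers
the NodeExists message seen here.

```python
import re


def root_cause(trace):
    """Return (exception_class, znode_path) for the last 'Caused by:' entry.

    Whitespace is collapsed first, so a trace whose lines were re-wrapped
    by a mail client is handled the same way as an unwrapped one.
    """
    flat = " ".join(trace.split())
    # Everything after the final "Caused by:" describes the innermost cause.
    last = flat.rsplit("Caused by:", 1)[-1].strip()
    exc_class = last.split(":", 1)[0].strip()
    # Only the NodeExists message format from this trace is recognised.
    match = re.search(r"NodeExists for (\S+)", last)
    return exc_class, (match.group(1) if match else None)
```

Applied to the trace above, this reports a NodeExistsException on
/collections/kla_collection/leaders/shard4, meaning a leader registration node
for shard4 already existed when this replica tried to create its own.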
Matt
-----Original Message-----
From: Matt Kuiper [mailto:[email protected]]
Sent: Wednesday, April 08, 2015 4:36 PM
To: [email protected]
Subject: RE: Clusterstate - state active
Erick, Anshum,
Thanks for your replies! Yes, it is replica state that I am looking at, and
this is the answer I was hoping for.
I am working on a solution that involves moving some replicas to new Solr nodes
as they are made available. Before deleting the original replicas backing the
shard, I check the replica state to make sure it is active for the new replicas.
Initially it was working pretty well, but with more recent testing I regularly
see the shard go down. The two new replicas go into a failed recovery state
after the original replicas are deleted, and the logs report that a registered
leader was not found for the shard. Initially I was concerned that maybe the
new replicas were not fully synced with the leader, even though I checked for
active state.
Now I am wondering if the new replicas are somehow competing (or somehow
reluctant) to become leader, and thus neither becomes leader. I plan to test
creating just one new replica on a new Solr node, checking that its state is
active, then deleting the original replicas, and then creating the second new
replica.
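
A minimal sketch of this kind of pre-delete check, assuming the JSON layout
returned by the Collections API CLUSTERSTATUS action (cluster -> collections ->
shards -> replicas, plus cluster -> live_nodes). The function names, the base
URL argument, and the replica names in the usage note are hypothetical; this is
an illustration, not the exact check described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_cluster_status(base_url, collection):
    """Fetch CLUSTERSTATUS as a dict; base_url is e.g. http://host:8983/solr."""
    query = urlencode({"action": "CLUSTERSTATUS",
                       "collection": collection,
                       "wt": "json"})
    with urlopen(f"{base_url}/admin/collections?{query}") as resp:
        return json.load(resp)


def shard_replicas(status, collection, shard):
    """Return the replica map for one shard of one collection."""
    return status["cluster"]["collections"][collection]["shards"][shard]["replicas"]


def replicas_ready(status, collection, shard, names):
    """True only if every named replica is 'active' AND hosted on a live node.

    The recorded state can be stale when a node is gone, so the node is
    also checked against live_nodes.
    """
    live = set(status["cluster"]["live_nodes"])
    replicas = shard_replicas(status, collection, shard)
    for name in names:
        info = replicas.get(name)
        if info is None or info.get("state") != "active":
            return False
        if info.get("node_name") not in live:
            return False
    return True


def current_leader(status, collection, shard):
    """Return the name of the replica flagged as leader, or None."""
    for name, info in shard_replicas(status, collection, shard).items():
        if str(info.get("leader", "false")).lower() == "true":
            return name
    return None
```

One possible use: call replicas_ready for the new replicas, delete only the
original replicas that current_leader does not return, and re-check
current_leader before removing the last original.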
Any thoughts?
Matt
-----Original Message-----
From: Erick Erickson [mailto:[email protected]]
Sent: Wednesday, April 08, 2015 4:13 PM
To: [email protected]
Subject: Re: Clusterstate - state active
Matt:
In a word, "yes". Depending on the size of the index for that shard, the
transition from Down->Recovering->Active may be too fast to catch.
If replicating the index takes a while, though, you should at least see the
"Recovering" state, during which time there won't be any searches forwarded to
that node.
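
As a rough sketch of watching that transition, the loop below polls a
cluster-status dict until a replica reports "active" or a timeout expires,
recording each distinct state it happens to observe. It assumes the
CLUSTERSTATUS response layout (cluster -> collections -> shards -> replicas);
wait_until_active and the fetch_status callable are hypothetical names, and
with a small index the intermediate states may never be observed at all.

```python
import time


def wait_until_active(fetch_status, collection, shard, replica,
                      timeout=300.0, interval=2.0, sleep=time.sleep):
    """Poll until `replica` is 'active'; return (reached_active, states_seen).

    fetch_status is any zero-argument callable returning the parsed
    cluster status. states_seen lists each distinct state in the order
    observed (None if the replica is not listed yet).
    """
    deadline = time.monotonic() + timeout
    seen = []
    while True:
        status = fetch_status()
        replicas = (status["cluster"]["collections"][collection]
                    ["shards"][shard]["replicas"])
        state = replicas.get(replica, {}).get("state")
        if not seen or seen[-1] != state:
            seen.append(state)
        if state == "active":
            return True, seen
        if time.monotonic() >= deadline:
            return False, seen
        sleep(interval)
```

The sleep parameter is injectable only so the loop can be exercised without
real delays.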
Best,
Erick
On Wed, Apr 8, 2015 at 2:58 PM, Matt Kuiper <[email protected]> wrote:
> Hello,
>
> When creating a new replica, and the state is recorded as active within the
> ZK clusterstate, does that mean that the new replica has synced with the
> leader replica for the particular shard?
>
> Thanks,
> Matt
>