Hi Mark; I have continued my tests and noticed this issue. I have 1 leader and 1 replica at each shard. I killed the leader:
* The Cloud graph says that the leader has gone (which I expect). However, the previous non-leader still is not a leader (which I didn't expect).
* Zookeeper clusterstate.json says that the node that has gone is still active and leader (which I didn't expect).
* At the Cloud tree link, /collections/collection1/leaders, shard1 says that the previous non-leader is leader (which I expect).

Is there any contradiction here or am I missing anything?

PS: I have reloaded the core at the replica and I got an error that no registered leader was found, and an error getting the leader from zk for shard shard1. May this be an issue with Zookeeper too?

2013/5/14 Mark Miller <markrmil...@gmail.com>

> The actual state is a mix of the clusterstate.json and the ephemeral live
> nodes - a node may be listed as active or whatever, and if its live node
> is not up, it doesn't matter - it's considered down.
>
> - Mark
>
> On May 14, 2013, at 8:08 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> > The node is shown as down at the admin page. It says there is one replica
> > for that shard but the leader is dead (no new leader is selected!).
> > However, when I check the zookeeper information from /clusterstate.json
> > at the admin page I see that:
> >
> > "shard2":{
> >   "range":"b3330000-e665ffff",
> >   "state":"active",
> >   "replicas":{
> >     "10.***.**.*1:8983_solr_collection1":{
> >       "shard":"shard2",
> >       "state":"active",
> >       "core":"collection1",
> >       "collection":"collection1",
> >       "node_name":"10.***.**.*1:8983_solr",
> >       "base_url":"http://10.***.**.*1:8983/solr",
> >       "leader":"true"},
> >     "10.***.**.**2:8983_solr_collection1":{
> >       "shard":"shard2",
> >       "state":"active",
> >       "core":"collection1",
> >       "collection":"collection1",
> >       "node_name":"10.***.***.**2:8983_solr",
> >       "base_url":"http://10.***.***.**2:8983/solr"}}},
> >
> > I mean the dead node is still listed as active!
> >
> > I have exceptions and warnings in my solr log:
> >
> > ...
> > INFO: Updating cluster state from ZooKeeper...
> > May 14, 2013 2:31:12 PM org.apache.solr.cloud.ZkController publishAndWaitForDownStates
> > WARNING: Timed out waiting to see all nodes published as DOWN in our cluster
> > ...
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController getLeader
> > SEVERE: Error getting leader from zk
> > org.apache.solr.common.SolrException: There is conflicting information
> > about the leader of shard: shard2 our state
> > says:http://10.***.***.*1:8983/solr/collection1/
> > but zookeeper says:http://10.***.***.**2:8983/solr/collection1/
> >   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:849)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
> >   at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
> >   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
> >   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:722)
> >
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> > INFO: publishing core=collection1 state=down
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> > INFO: numShards not found on descriptor - reading it from system property
> > May 14, 2013 2:32:14 PM org.apache.solr.common.SolrException log
> > SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
> > for shard shard2
> >   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
> >   at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
> >   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
> >   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:722)
> >
> > and after that it closes the main searcher.
> >
> > How can I get rid of this error, and why is there a mismatch between the
> > admin page's graph and the clusterstate?
> >
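By the way, Mark's point (the effective state is the clusterstate.json entry combined with the ephemeral /live_nodes list) can be sketched roughly like this. This is just my understanding, not Solr's actual code; the function name and host names are made up for illustration:

```python
def effective_state(replica, live_nodes):
    """Combine a replica's recorded state from clusterstate.json with the
    ephemeral /live_nodes list: if the replica's node is not live, it is
    considered down no matter what clusterstate.json says."""
    if replica.get("node_name") not in live_nodes:
        return "down"
    return replica.get("state", "down")

# A replica entry shaped like the clusterstate.json excerpt above
# (host names are placeholders, not the masked IPs from the log).
replica = {
    "shard": "shard2",
    "state": "active",  # stale: written before the node was killed
    "node_name": "host1:8983_solr",
    "leader": "true",
}

# When the leader is killed, its ephemeral znode drops out of /live_nodes,
# so even though clusterstate.json still says "active", it counts as down.
print(effective_state(replica, live_nodes=["host2:8983_solr"]))  # down
print(effective_state(replica, live_nodes=["host1:8983_solr", "host2:8983_solr"]))  # active
```

So the stale "active" + "leader":"true" entry alone doesn't mean the node is up; the graph and the raw clusterstate.json answer different questions.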