Hi Mark; I have continued my tests and noticed this issue. I have 1 leader and 1 replica at each shard. I killed the leader:
* The Cloud graph says that the leader has gone (which I expect). However, the previous non-leader still is not a leader (which I didn't expect).
* Zookeeper clusterstate.json says that the node that has gone is still active and leader (which I didn't expect).
* At the Cloud tree link, /collections/collection1/leaders, shard1 says that the previous non-leader is leader (which I expect).

Is there any contradiction here or am I missing anything?

PS: I have reloaded the core at the replica and I got an error that no registered leader was found, and an error getting the leader from zk for shard shard1. May this be an issue with Zookeeper too?

2013/5/14 Mark Miller <markrmil...@gmail.com>

> The actual state is a mix of the clusterstate.json and the ephemeral live
> nodes - a node may be listed as active or whatever, and if its live node
> is not up, it doesn't matter - it's considered down.
>
> - Mark
>
> On May 14, 2013, at 8:08 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
> > The node is shown as down at the admin page. It says there is one replica
> > for that shard but the leader is dead (no new leader is selected!).
> > However, when I check the zookeeper information from /clusterstate.json
> > at the admin page I see that:
> >
> > "shard2":{
> >   "range":"b3330000-e665ffff",
> >   "state":"active",
> >   "replicas":{
> >     "10.***.**.*1:8983_solr_collection1":{
> >       "shard":"shard2",
> >       "state":"active",
> >       "core":"collection1",
> >       "collection":"collection1",
> >       "node_name":"10.***.**.*1:8983_solr",
> >       "base_url":"http://10.***.**.*1:8983/solr",
> >       "leader":"true"},
> >     "10.***.**.**2:8983_solr_collection1":{
> >       "shard":"shard2",
> >       "state":"active",
> >       "core":"collection1",
> >       "collection":"collection1",
> >       "node_name":"10.***.***.**2:8983_solr",
> >       "base_url":"http://10.***.***.**2:8983/solr"}}},
> >
> > I mean the dead node is still listed as active!
> >
> > I have exceptions and warnings in my solr log:
> >
> > ...
> > INFO: Updating cluster state from ZooKeeper...
> > May 14, 2013 2:31:12 PM org.apache.solr.cloud.ZkController publishAndWaitForDownStates
> > WARNING: Timed out waiting to see all nodes published as DOWN in our cluster
> > ...
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController getLeader
> > SEVERE: Error getting leader from zk
> > org.apache.solr.common.SolrException: There is conflicting information
> > about the leader of shard: shard2 our state
> > says:http://10.***.***.*1:8983/solr/collection1/
> > but zookeeper says:http://10.***.***.**2:8983/solr/collection1/
> >   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:849)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
> >   at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
> >   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
> >   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:722)
> >
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> > INFO: publishing core=collection1 state=down
> > May 14, 2013 2:32:14 PM org.apache.solr.cloud.ZkController publish
> > INFO: numShards not found on descriptor - reading it from system property
> > May 14, 2013 2:32:14 PM org.apache.solr.common.SolrException log
> > SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
> > for shard shard2
> >   at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:864)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:776)
> >   at org.apache.solr.cloud.ZkController.register(ZkController.java:727)
> >   at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:908)
> >   at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:892)
> >   at org.apache.solr.core.CoreContainer.register(CoreContainer.java:841)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:638)
> >   at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> >   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> >   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> >   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >   at java.lang.Thread.run(Thread.java:722)
> >
> > and after that it closes the main searcher.
> >
> > How can I get rid of this error, and why is there a mismatch between the
> > admin page's graph and the clusterstate?
> >
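By the way, Mark's point (the effective state is the clusterstate.json entry combined with the ephemeral /live_nodes list) can be sketched roughly like this. This is just my understanding, not Solr's actual code; the function name and host names are made up for illustration:

```python
def effective_state(replica, live_nodes):
    """Combine a replica's recorded state from clusterstate.json with the
    ephemeral /live_nodes list: if the replica's node is not live, it is
    considered down no matter what clusterstate.json says."""
    if replica.get("node_name") not in live_nodes:
        return "down"
    return replica.get("state", "down")

# A replica entry shaped like the clusterstate.json excerpt above
# (host names are placeholders, not the masked IPs from the log).
replica = {
    "shard": "shard2",
    "state": "active",  # stale: written before the node was killed
    "node_name": "host1:8983_solr",
    "leader": "true",
}

# When the leader is killed, its ephemeral znode drops out of /live_nodes,
# so even though clusterstate.json still says "active", it counts as down.
print(effective_state(replica, live_nodes=["host2:8983_solr"]))  # down
print(effective_state(replica, live_nodes=["host1:8983_solr", "host2:8983_solr"]))  # active
```

So the stale "active" + "leader":"true" entry alone doesn't mean the node is up; the graph and the raw clusterstate.json answer different questions.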