Dear Mr. Shalin,
Yes, I mean the "state" shown in the Cluster State API and in the UI.
Let me explain in detail what happened over the previous days:
Suppose I have Collection A distributed across node 1 (the leader), node 2
and node 3.
I used the following command to block the Solr and ZooKeeper ports of node 1
so that they could no longer be reached
(the ports are 2888, 3888, 2181 and 4239):
firewall-cmd --remove-port=<node1Port>/tcp --permanent
After that, the state of node 1 is still "active" and leader is still "true"
in the response of the Cluster State API.
The Solr log on node 1 looks like this:
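One way to double-check the Cluster State API output is to compare each replica's recorded state against the live_nodes list, since a replica can still be recorded as "active" after its node has become unreachable. Below is a minimal sketch in Python, assuming the usual CLUSTERSTATUS JSON layout (cluster -> collections -> shards -> replicas, plus cluster -> live_nodes). The function name stale_replicas and all sample values are hypothetical and are not taken from my cluster; whether node 1 actually drops out of live_nodes in the situation described here is not something this sketch establishes.

```python
def stale_replicas(cluster_status):
    """List replicas recorded as 'active' whose node is not in live_nodes.

    cluster_status is the parsed JSON of a CLUSTERSTATUS response
    (layout assumed as described above).
    Returns tuples of (collection, shard, replica, is_leader).
    """
    cluster = cluster_status["cluster"]
    live = set(cluster.get("live_nodes", []))
    stale = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                is_active = replica.get("state") == "active"
                node_is_live = replica.get("node_name") in live
                if is_active and not node_is_live:
                    # In the JSON response, "leader" is the string "true".
                    is_leader = replica.get("leader") == "true"
                    stale.append((coll_name, shard_name, replica_name, is_leader))
    return stale


# Hypothetical sample response, for illustration only.
sample = {
    "cluster": {
        "collections": {
            "collectionA": {
                "shards": {
                    "shard2": {
                        "replicas": {
                            "core_node1": {
                                "state": "active",
                                "node_name": "node1:4239_solr",
                                "leader": "true",
                            },
                            "core_node2": {
                                "state": "active",
                                "node_name": "node2:4239_solr",
                            },
                        }
                    }
                }
            }
        },
        "live_nodes": ["node2:4239_solr", "node3:4239_solr"],
    }
}

result = stale_replicas(sample)
print(result)
```

In practice the dictionary would come from parsing the JSON returned by /solr/admin/collections?action=CLUSTERSTATUS&wt=json on any reachable node.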
org.apache.solr.common.SolrException: ClusterState says we are the leader
(<node1IP>:4239/solr/collectionA_shard2_replica1), but locally we don't
think so. Request came from <node2IP>:4239/solr/collectionA_shard4_replica3/
at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:658)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:418)
at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:346)
at ......
The error in the Solr log on node 2 is:
forwarding update to <node1IP>:4239/solr/collection A_shard5_replica1/
failed - retrying ... retries: 24 add{,id=121,commitWithin=1000}
params:update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=node2:4239/solr/collection
A_shard2_replica2/
rsp:503:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at node1IP:4239/solr/collection A_shard5_replica1: Service
Unavailable
The error in the Solr log on node 3 is similar to the one on node 2.
------------------------------------------------------------------------------------------------
Unfortunately, today I found that node 4 and node 5, which serve Collection B
and Collection C, went down. The errors in the logs were like this:
2018-03-01 00:26:46.133 ERROR
(zkCallback-4-thread-28-processing-n:node4IP:4239_solr-EventThread) [ ]
o.a.s.c.ZkController :org.apache.solr.common.SolrException: There was a
problem making a request to the leader
at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1551)
at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:476)
at org.apache.solr.cloud.ZkController.access$500(ZkController.java:121)
at org.apache.solr.cloud.ZkController$1.command(ZkController.java:338)
at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)
and
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/Collection B/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1096)
... 39 more
I think these errors are related to blocking the ports of node 1.
I would appreciate it if you could help me.
Regards,
Zahra