Dear Mr. Shalin,

Yes, I mean the "state" shown in the Cluster State API and in the UI.

Let me explain in detail what happened over the previous days:

Suppose I have Collection A distributed across node 1 (the leader), node 2 and
node 3.

I used the following command to block the Solr and ZooKeeper ports on node 1
(the ports are 2888/3888/2181 and 4239):

firewall-cmd --remove-port=<node1Port>/tcp --permanent
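
One way to confirm that the ports are really blocked is to try a TCP connection to each of them from another node. The following is only a minimal sketch: the function names are hypothetical, the host passed in stands for <node1IP>, and the port list is the one given above. Note also that firewall-cmd changes made with --permanent only affect the running firewall after a reload, so a check like this may be worth running before drawing conclusions.

```python
import socket

# Ports listed above: ZooKeeper (2888/3888/2181) and Solr (4239).
PORTS = [2888, 3888, 2181, 4239]


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_ports(host, ports=PORTS, timeout=2.0):
    """Map each port to whether it is reachable on the given host."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}


# Example (hypothetical host name standing in for <node1IP>):
# print(check_ports("node1.example.com"))
```

A port reported as not reachable may be blocked by the firewall or may simply have no process listening on it; the sketch does not distinguish the two.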

The state of node 1 is still "active", and leader is still "true", in the
Cluster State API response.
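
As far as I understand, the state stored for a replica is only meaningful if its node is also listed in live_nodes, so an "active" replica on a node that is not live should be read as down. The sketch below shows how the CLUSTERSTATUS response could be cross-checked in that way. The function names and the base URL are hypothetical; the JSON layout (cluster, collections, shards, replicas, live_nodes) is assumed to match what the Collections API returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_cluster_status(base_url, collection):
    """Fetch the CLUSTERSTATUS response for one collection as a dict."""
    query = urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    with urlopen(f"{base_url}/admin/collections?{query}", timeout=10) as resp:
        return json.load(resp)


def effective_replica_states(status):
    """List every replica with its reported state and whether its node is live."""
    cluster = status["cluster"]
    live = set(cluster.get("live_nodes", []))
    rows = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica in shard.get("replicas", {}).values():
                rows.append({
                    "collection": coll_name,
                    "shard": shard_name,
                    "core": replica.get("core"),
                    "reported_state": replica.get("state"),
                    # The leader flag is assumed to arrive as the string "true".
                    "is_leader": str(replica.get("leader", "false")).lower() == "true",
                    "node_is_live": replica.get("node_name") in live,
                })
    return rows


# Example (hypothetical URL, queried from a node that is still reachable):
# status = fetch_cluster_status("http://node2:4239/solr", "collectionA")
# for row in effective_replica_states(status):
#     print(row)
```

A replica with reported_state "active" but node_is_live False would be the case described here: the recorded state is stale rather than current.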

The Solr log on node 1 looks like this:


org.apache.solr.common.SolrException: ClusterState says we are the leader
(<node1IP>:4239/solr/collectionA_shard2_replica1), but locally we don't
think so. Request came from <node2IP>:4239/solr/collectionA_shard4_replica3/
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:658)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:418)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:346)
        at ......

The error in the Solr log on node 2 is:

forwarding update to <node1IP>:4239/solr/collection A_shard5_replica1/
failed - retrying ... retries: 24 add{,id=121,commitWithin=1000}
params:update.chain=add-unknown-fields-to-the-schema&update.distrib=TOLEADER&distrib.from=node2:4239/solr/collection
A_shard2_replica2/
rsp:503:org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at node1IP:4239/solr/collection A_shard5_replica1: Service
Unavailable

The error in the Solr log on node 3 is similar to the one on node 2.

------------------------------------------------------------------------------------------------

Unfortunately, today I found that node 4 and node 5, from collections B and
C, went down. The errors in the logs were as follows:

2018-03-01 00:26:46.133 ERROR
(zkCallback-4-thread-28-processing-n:node4IP:4239_solr-EventThread) [   ]
o.a.s.c.ZkController :org.apache.solr.common.SolrException: There was a
problem making a request to the leader
        at
org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1551)
        at
org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:476)
        at org.apache.solr.cloud.ZkController.access$500(ZkController.java:121)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:338)
        at
org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:168)
        at
org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:57)
        at
org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:142)
        at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:505)

and 

Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /collections/Collection B/state.json
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1212)
        at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
        at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
        at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
        at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
        at
org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110)
        at
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1096)
        ... 39 more


I think these errors are related to blocking the ports on node 1.

I would be grateful if you could help me.

Regards,
Zahra








