Hello. In one of our test environments, we have a SolrCloud cluster of 8 Solr nodes and a ZooKeeper ensemble of 5 nodes. We have only 2 collections; all Solr nodes are identical, and each one hosts a single replica of each collection.
I noticed that when I shut down one of the Solr nodes and refresh the SolrCloud admin UI, the Cloud -> Graph view immediately shows that node's shards as Gone/Down (in gray), which is what I expected.

However, when I go to the Tree view in the UI and browse under the individual collections, the state.json file still shows all replicas as "active". I expected the replicas on the stopped node to be shown as "down"; this is the main issue. I also looked at state.json directly in ZooKeeper, and all replicas are marked as active there as well. So it seems the Overseer is not writing the state change to ZooKeeper?

Note that /solr/admin/collections?action=CLUSTERSTATUS gives the expected result, i.e. 1 host is down, and /solr/admin/collections?action=OVERSEERSTATUS shows no failed operations.

So far we have only seen this issue in one of our test environments. When I deploy a local cluster on my machine, I cannot reproduce the stale state.json.

Any idea or hint about what could be causing this would be much appreciated.

Thank you.
Arcadius.
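To make the mismatch easier to check, here is a minimal diagnostic sketch, assuming the usual JSON layout in which each collection has `shards` -> `replicas` -> `node_name`/`state` (the shape found both in a collection's state.json and under `cluster.collections` in the CLUSTERSTATUS response), and assuming `live_nodes` is the list of node names registered under /live_nodes in ZooKeeper. It lists replicas that are recorded as "active" even though their node is not live. The function name `stale_replicas` and the sample node names are hypothetical.

```python
from typing import Iterable, List, Tuple


def stale_replicas(collections: dict, live_nodes: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (collection, shard, replica, node_name) for every replica whose
    recorded state is 'active' but whose node is not in live_nodes.

    Assumes `collections` maps collection name -> {"shards": {shard: {"replicas": {...}}}},
    as in state.json or cluster.collections from CLUSTERSTATUS (hypothetical helper).
    """
    live = set(live_nodes)
    stale = []
    for coll_name, coll in collections.items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                node = replica.get("node_name")
                # Recorded as active, yet its node has no live_nodes entry.
                if replica.get("state") == "active" and node not in live:
                    stale.append((coll_name, shard_name, replica_name, node))
    return stale


if __name__ == "__main__":
    # Hypothetical sample: host2 has been shut down but state.json still says "active".
    sample = {
        "collection1": {
            "shards": {
                "shard1": {
                    "replicas": {
                        "core_node1": {"node_name": "host1:8983_solr", "state": "active"},
                        "core_node2": {"node_name": "host2:8983_solr", "state": "active"},
                    }
                }
            }
        }
    }
    print(stale_replicas(sample, ["host1:8983_solr"]))
```

Running it against the collections and live_nodes from the affected environment would show which replicas are still recorded as active while their node is missing from /live_nodes.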