Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr stop…) it’s fine. If I do a “kill -9” on it, I see the same thing you do on master.
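For reference, a minimal reproduction sketch (the port, collection name, and pid are illustrative, taken from the setup described in the quoted report below):

# graceful: Solr gets a chance to update the replica states in state.json
bin/solr stop -p 8983

# hard kill: state.json is never updated; only the ephemeral live_nodes entry goes away
kill -9 <solr-pid>

# then compare the two views
curl 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS'
curl 'http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll'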
It’s a bit tricky. When a node goes away without a chance to shut down gracefully, it never gets to update the state in the collection’s “state.json” znode. However, the node is removed from the “live_nodes” list, and a replica is not truly active unless its state is “active” in state.json _and_ its node appears in live_nodes. CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently doesn’t (see the ZooKeeper commands sketched after the quoted thread below). I’ll raise a JIRA.

Thanks for letting us know.

Erick

> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>
> colStatus (and clusterStatus) from the Collections API.
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
>
> Running something like this in the browser, where the live Solr node is accessible on port 8983 (pointing at a Docker container which is running the Solr node):
> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
>
> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerick...@gmail.com>
> Date: 10/29/2019 05:39PM
> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1 in a distributed deployment
>
> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands you’re running for us to make any intelligent comments.
>
>> On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>
>> Hi,
>>
>> We're seeing an issue with colStatus in a distributed Solr deployment.
>>
>> Scenario:
>> Collection with:
>> - 1 zk
>> - 2 solr nodes on different boxes (simulated using Docker containers)
>> - replication factor 5
>>
>> When we take down one node, our clusterStatus response is as expected (only listing the live node as live, and anything on the "down" node showing its state as down).
>>
>> Our colStatus response, however, continues to show every shard as "active", with the replica breakdown on every shard reporting "total" == "active" and "down" always zero, i.e.
>>
>> "shards":{
>>   "shard1":{
>>     "state":"active",
>>     "range":"80000000-ffffffff",
>>     "replicas":{
>>       "total":5,
>>       "active":5,
>>       "down":0,
>>       "recovering":0,
>>       "recovery_failed":0},
>>
>> even though we expect the "down" count to be either 3 or 2 depending on the shard (and "active" correspondingly 2 or 3 lower).
>>
>> When testing with both Solr nodes on the same box, the colStatus replica counts are as expected.
>>
>> Thanks!
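To verify the two pieces of state described above, something like the following works against ZooKeeper directly (a sketch assuming the standard znode layout, no chroot, and ZooKeeper on localhost:2181; "coll" is the collection name from the report above):

# ephemeral entries: a hard-killed node disappears from here almost immediately
bin/solr zk ls /live_nodes -z localhost:2181

# last-written replica states: after a kill -9 these can still read "active"
bin/solr zk cp zk:/collections/coll/state.json /tmp/state.json -z localhost:2181
cat /tmp/state.json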