We tried both stopping Solr gracefully and killing the Docker container (not gracefully), and saw the same results in both cases.
That's brilliant, thank you. Could you please send a link to the issue once it's up? We have our clusterStatus and colStatus JSON responses, and our collection graph showing one of the nodes being down, if you'd like us to attach those to the issue.

Also, whenever we've come across this down-node problem, we've also noticed a UI issue on the Cloud/Nodes view where one of the node rows has its column output off by one (we can attach the screenshot to the issue as well if you'd like), i.e. the "Node" value appears in the "Host" column, the "CPU" value in the "Node" column, and so on, leaving the "Replicas" column empty.

-----Erick Erickson <erickerick...@gmail.com> wrote: -----
To: solr-user@lucene.apache.org
From: Erick Erickson <erickerick...@gmail.com>
Date: 10/30/2019 01:37PM
Subject: Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in a distributed deployment

Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr stop…) it's fine. If I do a "kill -9" on it, I see the same thing you do on master.

It's a bit tricky. When a node goes away without a chance to gracefully shut down, there's no chance to set the state in the collection's "state.json" znode. However, the node will be removed from the "live_nodes" list, and a replica is not truly active unless its state is "active" in state.json _and_ the node appears in live_nodes.

CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently doesn't. I'll raise a JIRA.

Thanks for letting us know,
Erick

> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>
> colStatus (and clusterStatus) from the Collections API.
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
>
> Running something like this in the browser, where the live Solr node is
> accessible on port 8983 (but points at a Docker container which is running
> the Solr node):
> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
>
>
> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerick...@gmail.com>
> Date: 10/29/2019 05:39PM
> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1 in
> a distributed deployment
>
> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands
> you're running for us to make any intelligent comments.
>
>> On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>
>> Hi,
>>
>> We're seeing an issue with colStatus in a distributed Solr deployment.
>>
>> Scenario:
>> Collection with:
>> - 1 zk
>> - 2 solr nodes on different boxes (simulated using Docker containers)
>> - replication factor 5
>>
>> When we take down one node, our clusterStatus response is as expected (only
>> listing the live node as live, and anything on the "down" node showing its
>> state as down).
>>
>> Our colStatus response, however, continues to show every shard as being
>> "active", with the replica breakdown on every shard as "total" == "active"
>> and "down" always zero, i.e.
>> "shards":{
>>   "shard1":{
>>     "state":"active",
>>     "range":"80000000-ffffffff",
>>     "replicas":{
>>       "total":5,
>>       "active":5,
>>       "down":0,
>>       "recovering":0,
>>       "recovery_failed":0},
>>
>> Even though we expect the "down" count to be either 3 or 2, depending on the
>> shard (and thus "active" being 2 or 3 less than reported).
>>
>> When testing this situation with both Solr nodes on the same box, the
>> colStatus response is as expected in regards to the replica counts.
>>
>> Thanks!
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
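The liveness rule Erick describes upthread (a replica only truly counts as active if its recorded state is "active" _and_ its node appears in live_nodes) can be sketched roughly as follows. The dict shapes and node names below are simplified stand-ins for illustration, not Solr's exact CLUSTERSTATUS schema:

```python
# Sketch of the liveness rule from the thread: after a hard kill (kill -9 /
# docker kill), state.json still says "active" for replicas on the dead node,
# but the node is gone from live_nodes, so those replicas must count as down.

def effective_state(recorded_state, node, live_nodes):
    """A replica whose node vanished without a graceful shutdown never got its
    state.json entry updated, so trust live_nodes over the recorded state."""
    if node not in live_nodes:
        return "down"
    return recorded_state

def replica_counts(replicas, live_nodes):
    """Aggregate per-replica states into a COLSTATUS-style breakdown."""
    counts = {"total": 0, "active": 0, "down": 0,
              "recovering": 0, "recovery_failed": 0}
    for r in replicas:
        counts["total"] += 1
        state = effective_state(r["state"], r["node_name"], live_nodes)
        counts[state] = counts.get(state, 0) + 1
    return counts

# One shard with replicationFactor=5; three replicas sit on the node that
# was killed, so state.json still records all five as "active".
replicas = [
    {"node_name": "solr1:8983_solr", "state": "active"},
    {"node_name": "solr1:8983_solr", "state": "active"},
    {"node_name": "solr2:8983_solr", "state": "active"},  # dead node
    {"node_name": "solr2:8983_solr", "state": "active"},  # dead node
    {"node_name": "solr2:8983_solr", "state": "active"},  # dead node
]
live_nodes = {"solr1:8983_solr"}  # solr2 dropped out of live_nodes

print(replica_counts(replicas, live_nodes))
# {'total': 5, 'active': 2, 'down': 3, 'recovering': 0, 'recovery_failed': 0}
```

This reproduces the breakdown the reporter expected ("down" of 3 on this shard), which matches what CLUSTERSTATUS reports but COLSTATUS did not.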
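For reference, the two Collections API calls being compared in this thread can be built like this; the host and collection name are just the example values from the thread, and the actual fetch is omitted since it needs a running cluster:

```python
# Build the COLSTATUS and CLUSTERSTATUS request URLs from the thread.
# "localhost:8983" and "coll" are the example values used above.
from urllib.parse import urlencode

def collections_api_url(host, action, **params):
    query = urlencode({"action": action, **params})
    return f"http://{host}/solr/admin/collections?{query}"

colstatus_url = collections_api_url("localhost:8983", "COLSTATUS",
                                    collection="coll")
clusterstatus_url = collections_api_url("localhost:8983", "CLUSTERSTATUS",
                                        collection="coll")

print(colstatus_url)
# http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
```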