https://issues.apache.org/jira/browse/SOLR-13882
Do watch out for browser or other caching. I often use a private window to avoid being fooled; I've had that happen more than once. If you see this problem and then look in the UI at cloud >> tree >> collections >> your_collection >> state.json and see the state of a replica as "down", then it's most probably some kind of outside-of-Solr caching, because that value is exactly what's counted to create the output for COLSTATUS.

Also be aware that the corresponding entry in live_nodes will _NOT_ be removed until ZooKeeper tries to ping the Solr node and times out, so there's a lag between when a node goes away ungracefully and when that node is removed, during which the replica will be counted as active even if live_nodes is checked.

As far as the UI is concerned, please search the JIRA system first to see if it's already been noted; otherwise go ahead and raise a JIRA. All you need is a sign-on. Do include which browser and version, which Solr version, and a screenshot, please.

Best,
Erick

> On Oct 30, 2019, at 10:08 AM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>
> We tried both stopping Solr gracefully, and by killing the Docker container
> (not gracefully), and always had the same results.
>
> That's brilliant, thank you.
> Could you please send a link to the issue once it's up?
> We have our clusterStatus and colStatus JSON responses and our collection
> graph showing one of the nodes being down, if you'd like us to attach those to
> the issue.
>
> Also, whenever we've come across this down-node problem, we've also noticed a
> bit of a UI issue on the cloud/nodes view where one of the node rows has its
> column output off by one (we can attach the screenshot to the issue as well
> if you'd like),
> i.e. the "Node" value would be in the "Host" column, the "CPU" value would be
> in the "Node" column, ... making the "Replicas" column empty.
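[Editor's note] The rule described in the thread (a replica only counts as truly active when its state in state.json is "active" AND its node still appears in live_nodes) can be sketched in Python. This is a minimal sketch assuming a CLUSTERSTATUS-shaped dict; the node names and sample data are illustrative, not taken from this thread.

```python
# Sketch: count "truly active" replicas per shard the way the thread
# describes: state must be "active" in state.json AND the replica's node
# must still be present in live_nodes.

def truly_active(collection_state, live_nodes):
    """Count replicas per shard that are active on a currently live node."""
    counts = {}
    for shard_name, shard in collection_state["shards"].items():
        counts[shard_name] = sum(
            1
            for replica in shard["replicas"].values()
            if replica["state"] == "active"
            and replica["node_name"] in live_nodes
        )
    return counts

# Illustrative sample shaped like a CLUSTERSTATUS collection entry.
sample = {
    "shards": {
        "shard1": {
            "replicas": {
                "core_node1": {"state": "active", "node_name": "box1:8983_solr"},
                "core_node2": {"state": "active", "node_name": "box2:8983_solr"},
            }
        }
    }
}

# box2 died ungracefully: state.json still says "active", but the node has
# dropped out of live_nodes, so only one replica is truly active.
print(truly_active(sample, {"box1:8983_solr"}))  # {'shard1': 1}
```

This mirrors what CLUSTERSTATUS effectively does (cross-checking live_nodes) and what COLSTATUS, per the bug in this thread, does not.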
> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerick...@gmail.com>
> Date: 10/30/2019 01:37PM
> Subject: Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in
> a distributed deployment
>
> Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr
> stop...) it's fine. If I do a "kill -9" on it, I see the same thing you do on
> master.
>
> It's a bit tricky. When a node goes away without a chance to gracefully shut
> down, there's no chance to set the state in the collection's "state.json"
> znode. However, the node will be removed from the "live_nodes" list, and a
> replica is not truly active unless its state is "active" in the state.json
> file _and_ the node appears in live_nodes.
>
> CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently
> doesn't.
>
> I'll raise a JIRA.
>
> Thanks for letting us know.
>
> Erick
>
>> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>
>> colStatus (and clusterStatus) from the Collections API.
>> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
>>
>> Running something like this in the browser, where the live Solr node is
>> accessible on port 8983 (but points at a Docker container which is running
>> the Solr node):
>> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
>>
>> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
>> To: solr-user@lucene.apache.org
>> From: Erick Erickson <erickerick...@gmail.com>
>> Date: 10/29/2019 05:39PM
>> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1
>> in a distributed deployment
>>
>> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands
>> you're running for us to make any intelligent comments.
>>
>>> On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> We're seeing an issue with colStatus in a distributed Solr deployment.
>>>
>>> Scenario:
>>> Collection with:
>>> - 1 zk
>>> - 2 Solr nodes on different boxes (simulated using Docker containers)
>>> - replication factor 5
>>>
>>> When we take down one node, our clusterStatus response is as expected (only
>>> listing the live node as live, and anything on the "down" node shows its
>>> state as down).
>>>
>>> Our colStatus response, however, continues to show every shard as being
>>> "active", with the replica breakdown on every shard as "total" == "active"
>>> and "down" always being zero,
>>> i.e.
>>> "shards":{
>>>   "shard1":{
>>>     "state":"active",
>>>     "range":"80000000-ffffffff",
>>>     "replicas":{
>>>       "total":5,
>>>       "active":5,
>>>       "down":0,
>>>       "recovering":0,
>>>       "recovery_failed":0},
>>>
>>> even though we expect the "down" count to be either 3 or 2, depending on the
>>> shard (and thus "active" being 2 or 3 less than it is).
>>>
>>> When testing this situation with both Solr nodes on the same box, the
>>> colStatus response is as expected with regard to the replica counts.
>>>
>>> Thanks!
>>>
>>> Unless stated otherwise above:
>>> IBM United Kingdom Limited - Registered in England and Wales with number
>>> 741598.
>>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
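[Editor's note] The expected "down" counts of 2 or 3 per shard in the scenario above follow from simple placement arithmetic: 5 replicas spread over 2 nodes means each shard has 3 replicas on one node and 2 on the other. A minimal sketch, assuming plain round-robin replica placement (the node names and the placement policy are illustrative assumptions, not necessarily how Solr assigns replicas):

```python
# Sketch: with replicationFactor=5 across 2 nodes, killing one node takes
# down either 2 or 3 replicas of a shard, depending on how placement
# alternated for that shard.

def expected_down(replication_factor, nodes, dead_node):
    """Replicas of one shard landing on dead_node under round-robin placement."""
    placement = [nodes[i % len(nodes)] for i in range(replication_factor)]
    return placement.count(dead_node)

nodes = ["box1:8983_solr", "box2:8983_solr"]
print(expected_down(5, nodes, "box1:8983_solr"))  # 3
print(expected_down(5, nodes, "box2:8983_solr"))  # 2
```

Either way, a correct COLSTATUS should never report "down": 0 for any shard once a node holding replicas is gone.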