https://issues.apache.org/jira/browse/SOLR-13882

Do watch out for browser or other caching; I often use a private window to 
avoid being fooled, since that has tripped me up more than once. If you see 
this problem, look in the UI at 
cloud>>tree>>collections>>your_collection>>state.json and check the state of 
the replica. If state.json already shows it as “down”, then the discrepancy 
is most probably some kind of outside-of-Solr caching, because that value is 
exactly what gets counted to produce the COLSTATUS output.
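
Just to take the browser completely out of the equation, here is a minimal 
sketch (assuming Solr is reachable on localhost:8983 and the collection is 
named "coll", as in your example) that hits COLSTATUS from a script and 
prints the per-shard replica counts:

import requests

# Hypothetical host/collection; adjust to your deployment.
url = "http://localhost:8983/solr/admin/collections"
resp = requests.get(url,
                    params={"action": "COLSTATUS", "collection": "coll"},
                    headers={"Cache-Control": "no-cache"})
resp.raise_for_status()

# COLSTATUS nests the per-shard summary under the collection name.
for shard_name, shard in resp.json()["coll"]["shards"].items():
    counts = shard["replicas"]
    print(shard_name,
          "total:", counts["total"],
          "active:", counts["active"],
          "down:", counts["down"])

If the counts from the script disagree with what the browser shows, it’s 
almost certainly caching somewhere between you and Solr.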

Also be aware that the corresponding entry in live_nodes will _NOT_ be removed 
until ZooKeeper notices the Solr node is gone and the session times out, so 
there’s a lag between when a node goes away ungracefully and when it disappears 
from live_nodes, during which the replica will be counted as active even if 
live_nodes is checked.
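
If you want to double-check what the cluster really thinks rather than trust 
COLSTATUS, here is a rough sketch (same localhost:8983 and "coll" assumptions 
as above) that pulls CLUSTERSTATUS and only counts a replica as up when its 
state is "active" _and_ its node still appears in live_nodes:

import requests

# Hypothetical host/collection; adjust to your deployment.
url = "http://localhost:8983/solr/admin/collections"
resp = requests.get(url,
                    params={"action": "CLUSTERSTATUS", "collection": "coll"},
                    headers={"Cache-Control": "no-cache"})
resp.raise_for_status()

cluster = resp.json()["cluster"]
live_nodes = set(cluster["live_nodes"])

for shard_name, shard in cluster["collections"]["coll"]["shards"].items():
    for replica_name, replica in shard["replicas"].items():
        # A replica is only truly active if state.json says "active"
        # AND its node is still present in live_nodes.
        up = replica["state"] == "active" and replica["node_name"] in live_nodes
        print(shard_name, replica_name, "up" if up else "down")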

As far as the UI is concerned, please search the JIRA system first to see if 
it’s already been noted; otherwise go ahead and raise a JIRA. All you need is 
a sign-on. Please do include which browser and version, which Solr version, 
and a screenshot.

Best,
Erick


> On Oct 30, 2019, at 10:08 AM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
> 
> We tried both stopping Solr gracefully, and by killing the Docker container 
> (not gracefully) and always had the same results.
> 
> 
> That's brilliant, thank you.
> Could you please send a link to the issue once it's up.
> We have our clusterStatus and colStatus json responses and our collection 
> graph showing one of the nodes being down if you'd like us to attach that to 
> the issue.
> 
> 
> Also, whenever we've come across this down-node problem, we've also noticed a 
> bit of a UI issue on the cloud/nodes view where one of the node rows has its 
> column output off by one (we can attach the screenshot to the issue as well 
> if you'd like), 
> i.e. the "Node" value would be in the "Host" column, the "CPU" value would be 
> in the "Node" column ... leaving the "Replicas" column empty.
> 
> 
> 
> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerick...@gmail.com>
> Date: 10/30/2019 01:37PM
> Subject: Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in 
> a distributed deployment
> 
> 
> Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr 
> stop…) it’s fine. If I do a “kill -9” on it, I see the same thing you do on 
> master.
> 
> It’s a bit tricky. When a node goes away without a chance to gracefully shut 
> down, there’s no chance to set the state in the collection’s “state.json” 
> znode. However, the node will be removed from the “live_nodes” list and a 
> replica is not truly active unless its state is “active” in the state.json 
> file _and_ the node appears in live_nodes.
> 
> CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently 
> doesn’t.
> 
> I’ll raise a JIRA.
> 
> Thanks for letting us know
> 
> Erick
> 
>> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>> 
>> colStatus (and clusterStatus) from the Collections api.
>> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
>> 
>> 
>> Running something like this in the browser where the live solr node is 
>> accessible on port 8983 (but points at a Docker container which is running 
>> the Solr node):
>> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
>> 
>> 
>> 
>> 
>> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
>> To: solr-user@lucene.apache.org
>> From: Erick Erickson <erickerick...@gmail.com>
>> Date: 10/29/2019 05:39PM
>> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1 
>> in a distributed deployment
>> 
>> 
>> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands 
>> you’re running for us to make any intelligent comments.
>> 
>>> On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We're seeing an issue with colStatus in a distributed Solr deployment.
>>> 
>>> Scenario:
>>> Collection with:
>>> - 1 zk
>>> - 2 solr nodes on different boxes (simulated using Docker containers)
>>> - replication factor 5
>>> 
>>> When we take down one node, our clusterStatus response is as expected (only 
>>> listing the live node as live, and anything on the "down" node shows the 
>>> state as down).
>>> 
>>> Our colStatus response, however, continues to show every shard as 
>>> "active", with the replica breakdown on every shard having "total" == 
>>> "active" and "down" always zero.
>>> i.e.
>>> "shards":{
>>> "shard1":{
>>> "state":"active",
>>> "range":"80000000-ffffffff",
>>> "replicas":{
>>> "total":5,
>>> "active":5,
>>> "down":0,
>>> "recovering":0,
>>> "recovery_failed":0},
>>> 
>>> Even though we expect the "down" count to be either 3 or 2 depending on the 
>>> shard (and thus the "active" count to be 3 or 2 lower than reported).
>>> 
>>> When testing this situation with both Solr nodes on the same box, the 
>>> colStatus response is as expected with regard to the replica counts.
>>> 
>>> Thanks!
>>> 
>> 
>> 
> 
> 
