We tried both stopping Solr gracefully and killing the Docker container 
(not gracefully), and got the same results every time.


That's brilliant, thank you.
Could you please send a link to the issue once it's up?
We have our clusterStatus and colStatus JSON responses and our collection graph 
showing one of the nodes as down, if you'd like us to attach them to the 
issue.


Also, whenever we've come across this down-node problem, we've noticed a 
UI issue in the cloud/nodes view where one of the node rows has its 
column output off by one (we can attach a screenshot to the issue as well if 
you'd like), 
i.e. the "Node" value appears in the "Host" column, the "CPU" value appears 
in the "Node" column, and so on, leaving the "Replicas" column empty.



-----Erick Erickson <erickerick...@gmail.com> wrote: -----
To: solr-user@lucene.apache.org
From: Erick Erickson <erickerick...@gmail.com>
Date: 10/30/2019 01:37PM
Subject: Re: [EXTERNAL] colStatus response not as expected with Solr 8.1.1 in a 
distributed deployment


Exactly how did you kill the instance? If I stop Solr gracefully (bin/solr 
stop…) it’s fine. If I do a “kill -9” on it, I see the same thing you do on 
master.

It’s a bit tricky. When a node goes away without a chance to gracefully shut 
down, there’s no chance to set the state in the collection’s “state.json” 
znode. However, the node will be removed from the “live_nodes” list and a 
replica is not truly active unless its state is “active” in the state.json file 
_and_ the node appears in live_nodes.
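
To make that rule concrete, here is a minimal Python sketch that applies it to a parsed CLUSTERSTATUS response. It assumes the usual JSON layout (a "cluster" object holding "collections", then "shards", then "replicas", each replica carrying "state" and "node_name", plus a "live_nodes" list). The function name effective_replica_counts is made up for illustration, and this is not how Solr implements either command.

```python
def effective_replica_counts(cluster_status):
    """Count replicas per shard from a parsed CLUSTERSTATUS response.

    A replica keeps its recorded state only if its node is in live_nodes;
    otherwise it is counted as "down", whatever state.json still says.
    """
    cluster = cluster_status["cluster"]
    live_nodes = set(cluster.get("live_nodes", []))
    result = {}
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            # Same keys as the replica breakdown that COLSTATUS reports.
            counts = {"total": 0, "active": 0, "down": 0,
                      "recovering": 0, "recovery_failed": 0}
            for replica in shard.get("replicas", {}).values():
                counts["total"] += 1
                if replica.get("node_name") in live_nodes:
                    state = replica.get("state", "down")
                else:
                    # Node vanished without updating state.json.
                    state = "down"
                counts[state] = counts.get(state, 0) + 1
            result[(coll_name, shard_name)] = counts
    return result
```

Fed a response in which all five replicas of a shard still say "active" but two of them live on a node that has dropped out of live_nodes, this would report 3 active and 2 down for that shard.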

CLUSTERSTATUS pretty clearly understands this, but COLSTATUS apparently doesn’t.

I’ll raise a JIRA.

Thanks for letting us know.

Erick

> On Oct 29, 2019, at 2:10 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>
> colStatus (and clusterStatus) from the Collections api.
> https://lucene.apache.org/solr/guide/8_1/collections-api.html#colstatus
>  
>
>
> We run something like this in the browser, where the live Solr node is 
> accessible on port 8983 (which points at the Docker container running 
> the Solr node):
> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=coll
>  
>
>
>
>
> -----Erick Erickson <erickerick...@gmail.com> wrote: -----
> To: solr-user@lucene.apache.org
> From: Erick Erickson <erickerick...@gmail.com>
> Date: 10/29/2019 05:39PM
> Subject: [EXTERNAL] Re: colStatus response not as expected with Solr 8.1.1 in 
> a distributed deployment
>
>
> Uhm, what is colStatus? You need to show us _exactly_ what Solr commands 
> you’re running for us to make any intelligent comments.
>
>> On Oct 29, 2019, at 1:12 PM, Elizaveta Golova <egol...@uk.ibm.com> wrote:
>>
>> Hi,
>>
>> We're seeing an issue with colStatus in a distributed Solr deployment.
>>
>> Scenario:
>> Collection with:
>> - 1 zk
>> - 2 solr nodes on different boxes (simulated using Docker containers)
>> - replication factor 5
>>
>> When we take down one node, our clusterStatus response is as expected (only 
>> listing the live node as live, and anything on the "down" node shows the 
>> state as down).
>>
>> Our colStatus response, however, continues to show every shard as being 
>> "active", with the replica breakdown on every shard as "total" == "active" 
>> and "down" always being zero.
>> i.e.
>> "shards":{
>>   "shard1":{
>>     "state":"active",
>>     "range":"80000000-ffffffff",
>>     "replicas":{
>>       "total":5,
>>       "active":5,
>>       "down":0,
>>       "recovering":0,
>>       "recovery_failed":0},
>>
>> We expect the "down" count to be either 3 or 2, depending on the 
>> shard (and thus the "active" count to be 3 or 2 lower than reported).
>>
>> When we test this situation with both Solr nodes on the same box, the 
>> colStatus response is as expected with regard to the replica counts.
>>
>> Thanks!
>>
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number 
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>>
>


