Re: Decommissioned nodes are in UNREACHABLE state

Alain RODRIGUEZ Mon, 25 May 2020 08:09:31 -0700

Hello,

Wow. This is a 1 year old issue. Are we still talking about the same node?
Other than what I wrote in my previous message, I'm not sure how to guide
you on that one.


> I am wondering, where the information is coming from.


Me too! :).


> I checked system.peers for the IP in UNREACHABLE state and it's not present

Have you looked at all the nodes? This system table is *NOT* distributed,
so running 'cqlsh -e "SELECT * FROM system.peers;"' can give different
results on each node. I think it's enough for a single node to still hold
an entry for this node for it to show up as UNREACHABLE in the
'nodetool describecluster' output.
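
A minimal sketch of that per-node check, assuming cqlsh can connect to every
node with default settings. The function name and the IPs in the usage line
are hypothetical placeholders, not values from this thread:

```shell
# Report which nodes still list a given IP in their local system.peers.
# Arguments: the ghost IP, followed by the address of every live node.
find_ghost_in_peers() {
    ghost_ip="$1"; shift
    for node in "$@"; do
        # system.peers is local to each node, so every node is queried in turn.
        if ! peers=$(cqlsh "$node" -e "SELECT peer FROM system.peers;" 2>/dev/null); then
            echo "$node: query failed"
            continue
        fi
        # -w avoids matching 10.0.0.5 inside 10.0.0.50; -F treats dots literally.
        if printf '%s\n' "$peers" | grep -qwF -- "$ghost_ip"; then
            echo "$node: still references $ghost_ip"
        else
            echo "$node: no entry for $ghost_ip"
        fi
    done
}
```

For example: find_ghost_in_peers 10.0.0.5 10.0.0.1 10.0.0.2 10.0.0.3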

Random ideas and questions:
- Does the corresponding instance still exist?
- Are we speaking of the same node as last year? That would be the
longest-lasting ghost node I've heard about :).
- Is it still defined somewhere in your configuration (for example in the
snitch config file cassandra-topology.properties, if you are using
PropertyFileSnitch)?
- What's the C* version you're using? Still 2.1.16?

I somewhat feel you (or I...) might be missing something here: the node has
to be referenced somewhere, otherwise it would disappear on restart.
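
As a rough sketch of where else to look on a given host, the function below
checks the local gossip state and the snitch topology file for the IP. The
function name and the default file path are assumptions (the path depends on
how Cassandra was installed), and a failed nodetool call is reported as
"absent", so treat the result as a hint only:

```shell
# Look for leftover references to a removed node on the local host.
# Arguments: the ghost IP, then optionally the topology file path
# (the default below is an assumed location, adjust it to your install).
find_local_references() {
    ghost_ip="$1"
    topology_file="${2:-/etc/cassandra/cassandra-topology.properties}"

    # Gossip state held by the local node.
    if nodetool gossipinfo 2>/dev/null | grep -qwF -- "$ghost_ip"; then
        echo "gossip: $ghost_ip present"
    else
        echo "gossip: $ghost_ip absent"
    fi

    # Snitch topology file, ignoring commented-out lines.
    if [ -r "$topology_file" ] \
            && grep -v '^[[:space:]]*#' "$topology_file" | grep -qwF -- "$ghost_ip"; then
        echo "topology file: $ghost_ip present"
    else
        echo "topology file: $ghost_ip absent"
    fi
}
```

Running it on each node in turn should show which host, if any, still holds
a reference.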

Good luck with that.

C*heers,
-----------------------
Alain Rodriguez - alain.rodrig...@datastax.com
France / Spain

https://www.datastax.com



On Sat, May 23, 2020 at 19:07, Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> any inputs here?
>
> On Sat, May 2, 2020 at 12:49 PM Jai Bheemsen Rao Dhanwada <
> jaibheem...@gmail.com> wrote:
>
>> Hello Alain,
>>
>> Thanks for your suggestions.
>>
>> Surprisingly, the node which is in unreachable state is not present in
>> any of the system tables. I am wondering where the information is coming
>> from.
>> I checked system.peers for the IP in UNREACHABLE state and it's not
>> present. I tried restarting the Cassandra service as well.
>>
>> On Thu, Jun 20, 2019 at 5:59 AM Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Assuming your nodes have been out for a while and you don't need the
>>> data after 60 days (or cannot get it anyway), the way to fix this is to
>>> force the node out. I would try, in this order:
>>>
>>> - nodetool removenode HOSTID
>>> - nodetool removenode force
>>>
>>> These 2 might really not work at this stage, but if they do, this is a
>>> clean way to do so.
>>> Now, to really push the ghost nodes to the exit door, it often takes:
>>>
>>> - nodetool assassinate
>>>
>>> I think Cassandra 2.1 doesn't have it, so you might have to use JMX (more
>>> details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html
>>> ):
>>>
>>>> echo "run -b org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint $IP_TO_ASSASSINATE" \
>>>>   | java -jar jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199
>>>
>>>
>>> This should really remove the traces of the node, without any safety: no
>>> streaming, no checks, just get rid of it. So use it with a lot of care and
>>> understanding. In your situation I guess this is what will work.
>>>
>>> As a last attempt, you could try removing traces of the dead node(s)
>>> from the 'system.peers' table of all the live nodes. This table is local
>>> to each node, so the DELETE command has to be sent to every node that
>>> still has a trace of an old node.
>>>
>>> - cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"
>>>
>>>> but I see the node IPs in UNREACHABLE state in "nodetool
>>>> describecluster" output. I believe  they appear only for 72 hours, but in
>>>> my case I see those nodes in UNREACHABLE for ever (more than 60 days)
>>>
>>>
>>> To be more accurate, I believe you should never see a leaving node as
>>> unreachable (not even for 72 hours). The 72 hours is the time Gossip
>>> should continue referencing the old nodes. Typically, when you remove the
>>> ghost nodes, they should no longer appear in 'nodetool describecluster'
>>> at all (I would say immediately), but should still appear in 'nodetool
>>> gossipinfo' with a 'left' or 'removed' status.
>>>
>>> I hope that helps and that one of the above will do the trick (I'd bet
>>> on the assassinate :)). Also, sorry it took us a while to answer this
>>> relatively common question :).
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - al...@thelastpickle.com
>>> France / Spain
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>> On Thu, Jun 13, 2019 at 00:55, Jai Bheemsen Rao Dhanwada <
>>> jaibheem...@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have a Cassandra cluster running version 2.1.16, where I have
>>>> decommissioned a few nodes using "nodetool decommission", but I see
>>>> the node IPs in UNREACHABLE state in the "nodetool describecluster"
>>>> output. I believe they should appear only for 72 hours, but in my case
>>>> I see those nodes as UNREACHABLE forever (more than 60 days).
>>>> A rolling restart of the nodes didn't remove them. Any idea what could
>>>> be causing this?
>>>>
>>>> Note: I don't see them in the nodetool status output.
>>>>
>>>
