Even when a node has been stopped, it will still show up in the "nodetool status" output from other running nodes. While a node is starting, the status output from the node itself is not meaningful, because it may not yet have received the status of the other nodes. You should ignore it until the node has fully started.

The time to wait between restarting each node depends on how quickly a node starts. Replaying commit logs can take a very long time (I have seen it take over 10 minutes). You should always check a restarting node's current status, ensure it has finished starting, and then wait for gossip to settle (sleeping for a few minutes should do) before moving on to the next node.
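
The restart order described above (restart, wait until the node has finished starting, let gossip settle, move on) could be sketched roughly as follows. This is only an illustration, not a tested operations script: `rolling_restart`, `restart`, `is_started` and all the timing defaults are hypothetical names and values chosen for the example. The caller supplies `restart` (however the service is restarted on a host) and `is_started` (whatever check is used to confirm that the node has finished starting).

```python
import time


def rolling_restart(nodes, restart, is_started,
                    settle_seconds=180, poll_seconds=10,
                    timeout_seconds=3600,
                    sleep=time.sleep, clock=time.monotonic):
    """Restart nodes one at a time, waiting for each before the next.

    All names and defaults here are illustrative assumptions:
      restart(node)    -- hypothetical callable that restarts one node
      is_started(node) -- hypothetical callable, True once the node has
                          finished starting
      settle_seconds   -- pause for gossip to settle ("a few minutes")
    """
    for node in nodes:
        restart(node)

        # Commit log replay can take a long time, so poll rather than
        # assume a fixed start-up duration.
        deadline = clock() + timeout_seconds
        while not is_started(node):
            if clock() >= deadline:
                raise TimeoutError(
                    f"{node} did not finish starting within "
                    f"{timeout_seconds}s"
                )
            sleep(poll_seconds)

        # The node is up; give gossip time to settle before moving on.
        sleep(settle_seconds)
```

Passing `sleep` and `clock` in as parameters is only there so the loop can be exercised without real waiting; in normal use the defaults apply.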

On 23/09/2022 15:07, Marc Hoppins wrote:
I restarted 48 nodes and every one came up fine. I was just wondering why the 
status run on the restarted node has no ID until it has finished dealing with 
whatever it does when starting up but it shows up immediately when status is 
run on any other node.

I guess it prompts the question: how much time should elapse between restarting 
each node? It seems to be something <60 seconds but I suppose it would depend 
on whatever was lingering in the commit directory.

-----Original Message-----
From: Bowen Song via user <user@cassandra.apache.org>
Sent: Friday, September 23, 2022 3:47 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra

Did the node finish starting when you checked the "nodetool status" output? Try "nodetool netstats" on the starting node; the output will show "Mode: NORMAL" if it has finished starting. It's also worth checking the "nodetool info" output and making sure "Gossip active" and "Native Transport active" (unless you have disabled it) are "true".
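
As a rough sketch of how these checks might be automated, the functions below take the text printed by "nodetool netstats" and "nodetool info" and report whether the node looks fully started. The function names are made up for this example, and the exact line layout (a "Mode:" line, and "key : value" lines in the info output) is an assumption that should be verified against your Cassandra version.

```python
import re


def parse_mode(netstats_output):
    """Return the value of the 'Mode:' line, or None if it is absent."""
    match = re.search(r"^Mode:\s*(\S+)", netstats_output,
                      flags=re.MULTILINE)
    return match.group(1) if match else None


def parse_info_flags(info_output):
    """Turn 'key : value' lines into a dict (assumed info layout)."""
    flags = {}
    for line in info_output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            flags[key.strip()] = value.strip()
    return flags


def has_finished_starting(netstats_output, info_output,
                          require_native_transport=True):
    """True if the mode is NORMAL and the expected services are active.

    Set require_native_transport=False if native transport has been
    disabled on purpose.
    """
    if parse_mode(netstats_output) != "NORMAL":
        return False
    flags = parse_info_flags(info_output)
    if flags.get("Gossip active") != "true":
        return False
    if (require_native_transport
            and flags.get("Native Transport active") != "true"):
        return False
    return True
```

The two output strings could be collected with, for example, `subprocess.run([...], capture_output=True, text=True).stdout` run on the node that is starting.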

On 23/09/2022 08:17, Marc Hoppins wrote:
Hi all,

Restarting the service on a node.  Checking status from a remote node, I see:

(prod) marc.hoppins.ipa@ba-cassandra01:~ $ /opt/cassandra/bin/nodetool status -r|grep 03
UN  ba-cassandra09   779.03 GiB   16      ?     1fc8061d-2dd4-4b2c-97fa-e492063da495  SSW09
UN  ba-cassandra20   796.94 GiB   16      ?     c6b43e76-bd5d-4672-a62a-83a06030578d  SSW09
UN  ba-cassandra10   750.84 GiB   16      ?     c03ae9c6-89cb-4e65-a1ef-a56e2efc24da  SSW09
UN  ba-cassandra04   785.97 GiB   16      ?     16dac20f-89fe-435c-8b49-d80a03fe239e  SSW09
DN  ba-cassandra03   729.43 GiB   16      ?     8785b173-6b68-45a4-ad38-e9b4036ffaf5  SSW09
UN  dr1-cassandra18  738.9 GiB    16      ?     84348044-b6c6-44d0-9038-b1d49d39e496  SSW02
UN  dr1-cassandra03  783.04 GiB   16      ?     21dac8e4-b556-48f1-873d-fb2876e2c349  SSW02

But when checking locally, I see:

?N  ba-cassandra08   ?           16      ?     8520bdcd-1cfb-431f-a99c-15b8ca288e96  SSW09
?N  ba-cassandra04   ?           16      ?     16dac20f-89fe-435c-8b49-d80a03fe239e  SSW09
?N  ba-cassandra11   ?           16      ?     cf010de0-657c-4135-beec-7ba37cc3d8f4  SSW09
?N  ba-cassandra18   ?           16      ?     994c67d7-e6f9-4419-a02b-b5296ec92cb0  SSW09
UN  ba-cassandra03   146.03 GiB  16      ?                                           SSW09
?N  ba-cassandra21   ?           16      ?     7a8fc8c4-fb64-4bb7-ad5c-cb5112d9f783  SSW09
?N  ba-cassandra13   ?           16      ?     853f095a-c780-473d-85f0-b8d047d745f1  SSW09

When I recheck it, I notice that the data count increases to the correct amount
after a short time, the node ID appears in the local status, and the remote
status shows as UN.  If the remote status shows the node ID, why is it missing
locally?  Is the node ID only stored on the seeds?
