Even after a node has been stopped, it will still show up in the
"nodetool status" output from the other running nodes. While a node is
starting, the status output from that node itself is unreliable, because
it may not yet have received the status of the other nodes. You should
ignore it until the node has fully started.
The time to wait between restarting each node depends on how quickly a
node starts. Replaying commit logs can take a very long time (I have
seen it take over 10 minutes). You should always check a restarting
node's current status, ensure it has finished starting, and then wait
for gossip to settle (sleeping for a few minutes should do) before
moving on to the next node.
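
To make the procedure above concrete, here is a rough Python sketch of
that loop. It is only an illustration under some assumptions: nodetool
is on the PATH and can reach each node's JMX port with "-h" (by default
JMX often only listens locally), "restart" is a hypothetical callable
you supply yourself, and the 10 second poll, 30 minute timeout and 180
second settle time are placeholder values, not recommendations.

```python
import subprocess
import time


def node_mode(host):
    """Return the 'Mode:' value reported by `nodetool netstats` for host."""
    # Assumes nodetool is on PATH and JMX on the node is reachable from here.
    result = subprocess.run(
        ["nodetool", "-h", host, "netstats"],
        capture_output=True, text=True,
    )
    for line in result.stdout.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None  # nodetool failed, e.g. the node is not answering yet


def wait_until_normal(host, get_mode=node_mode, poll_seconds=10,
                      timeout_seconds=1800, sleep=time.sleep,
                      clock=time.monotonic):
    """Poll until the node reports Mode: NORMAL; return False on timeout."""
    deadline = clock() + timeout_seconds
    while clock() < deadline:
        if get_mode(host) == "NORMAL":
            return True
        sleep(poll_seconds)
    return False


def rolling_restart(hosts, restart, settle_seconds=180,
                    get_mode=node_mode, sleep=time.sleep):
    """Restart hosts one at a time, waiting for each before the next."""
    for host in hosts:
        restart(host)  # hypothetical callable, e.g. wraps ssh + systemctl
        if not wait_until_normal(host, get_mode=get_mode, sleep=sleep):
            raise RuntimeError(f"{host} did not reach Mode: NORMAL in time")
        sleep(settle_seconds)  # let gossip settle before the next node
```

The mode is always asked of the restarting node itself, since that is
the only place that knows whether startup has finished. The get_mode,
sleep and clock parameters exist only so the loop can be tried out
without touching a real cluster.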
On 23/09/2022 15:07, Marc Hoppins wrote:
I restarted 48 nodes and every one came up fine. I was just wondering why the
status run on the restarted node has no ID until it has finished dealing with
whatever it does when starting up but it shows up immediately when status is
run on any other node.
I guess it prompts the question: how much time should elapse between restarting
each node? It seems to be something <60 seconds but I suppose it would depend
on whatever was lingering in the commit directory.
-----Original Message-----
From: Bowen Song via user <user@cassandra.apache.org>
Sent: Friday, September 23, 2022 3:47 PM
To: user@cassandra.apache.org
Subject: Re: Restart Cassandra
Did the node finish starting when you checked the "nodetool status"
output? Try "nodetool netstats" on the starting node; the output will
show "Mode: NORMAL" once it has finished starting.
It's also worth checking the "nodetool info" output to make sure that
"Gossip active" and "Native Transport active" (unless you have disabled
it) are both "true".
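
If you want to script these checks, something like the following Python
sketch could work. It only parses text that you have already captured
from "nodetool netstats" and "nodetool info" on the starting node; the
function names and the native_transport_enabled parameter are made up
for this example, and it assumes the usual "key : value" layout of the
"nodetool info" output.

```python
import re


def parse_mode(netstats_output):
    """Return the value of the 'Mode:' line from `nodetool netstats`."""
    match = re.search(r"^Mode:\s*(\S+)", netstats_output, re.MULTILINE)
    return match.group(1) if match else None


def parse_info_flags(info_output):
    """Turn the 'key : value' lines of `nodetool info` into a dict."""
    flags = {}
    for line in info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon at all
            flags[key.strip()] = value.strip()
    return flags


def is_fully_started(netstats_output, info_output,
                     native_transport_enabled=True):
    """True only if the mode is NORMAL and the expected flags are 'true'."""
    if parse_mode(netstats_output) != "NORMAL":
        return False
    flags = parse_info_flags(info_output)
    if flags.get("Gossip active") != "true":
        return False
    # Skip this check if native transport was disabled on purpose.
    if native_transport_enabled and flags.get("Native Transport active") != "true":
        return False
    return True
```

Pass native_transport_enabled=False for nodes where native transport has
been turned off deliberately, otherwise they would never count as started.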
On 23/09/2022 08:17, Marc Hoppins wrote:
Hi all,
Restarting the service on a node. Checking status from a remote node, I see:
(prod) marc.hoppins.ipa@ba-cassandra01:~ $ /opt/cassandra/bin/nodetool status -r|grep 03
UN  ba-cassandra09   779.03 GiB  16  ?  1fc8061d-2dd4-4b2c-97fa-e492063da495  SSW09
UN  ba-cassandra20   796.94 GiB  16  ?  c6b43e76-bd5d-4672-a62a-83a06030578d  SSW09
UN  ba-cassandra10   750.84 GiB  16  ?  c03ae9c6-89cb-4e65-a1ef-a56e2efc24da  SSW09
UN  ba-cassandra04   785.97 GiB  16  ?  16dac20f-89fe-435c-8b49-d80a03fe239e  SSW09
DN  ba-cassandra03   729.43 GiB  16  ?  8785b173-6b68-45a4-ad38-e9b4036ffaf5  SSW09
UN  dr1-cassandra18  738.9 GiB   16  ?  84348044-b6c6-44d0-9038-b1d49d39e496  SSW02
UN  dr1-cassandra03  783.04 GiB  16  ?  21dac8e4-b556-48f1-873d-fb2876e2c349  SSW02
But when checking locally, I see:
?N  ba-cassandra08  ?           16  ?  8520bdcd-1cfb-431f-a99c-15b8ca288e96  SSW09
?N  ba-cassandra04  ?           16  ?  16dac20f-89fe-435c-8b49-d80a03fe239e  SSW09
?N  ba-cassandra11  ?           16  ?  cf010de0-657c-4135-beec-7ba37cc3d8f4  SSW09
?N  ba-cassandra18  ?           16  ?  994c67d7-e6f9-4419-a02b-b5296ec92cb0  SSW09
UN  ba-cassandra03  146.03 GiB  16  ?                                        SSW09
?N  ba-cassandra21  ?           16  ?  7a8fc8c4-fb64-4bb7-ad5c-cb5112d9f783  SSW09
?N  ba-cassandra13  ?           16  ?  853f095a-c780-473d-85f0-b8d047d745f1  SSW09
When I recheck it, I notice that the data count increases to the correct amount
after a short time, the node ID appears in the local status, and the remote
status shows the node as UN. If the remote status shows the node ID, why is it
missing locally? Is the node ID only stored on the seeds?