Hi Joe,
After a single disk failure, you should not remove the failed disk's
data directory from the cassandra.yaml file. Instead, you should replace
the failed disk with a new, empty disk. See
https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRecoverUsingJBOD.html
for the steps.
Since your node failed to start, I guess it's not too late to restore
the settings in the cassandra.yaml file and then follow the above steps.
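As a rough sketch, restoring the setting means putting the failed
drive's entry back at the top of the list you posted. The /data/1 path
below is only my assumption, based on the numbering of the remaining
entries; use whatever path was originally there.

```yaml
data_file_directories:
    # Assumed path of the failed drive (hypothetical; substitute your original first entry).
    # After the disk is replaced, this should point at the new, empty disk.
    - /data/1/cassandra/data
    # Remaining entries, unchanged from your current list
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data
```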
However, replacing the entire node is always an option if everything
else has failed, as long as you have RF>1 and other nodes in the cluster
are all healthy. If you need to do this, follow the steps here:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html
As for your last question,
> When a drive fails with cassandra, is it common for the node to come
> down?
this depends on the disk_failure_policy setting in your cassandra.yaml
file. Reading the comments next to it in that file will help you
understand the available choices.
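For reference, the setting looks roughly like this. The value shown is
only an example, not a recommendation, and the summaries are my
paraphrase of the comments in the stock file, so please check the
cassandra.yaml shipped with your version.

```yaml
# Policy for data disk failures (summaries paraphrased, verify against your own file):
#   die           - shut down gossip and client transports and kill the JVM on any
#                   file system or single-sstable error, so the node can be replaced
#   stop_paranoid - shut down gossip and client transports even for single-sstable
#                   errors; kill the JVM for errors during startup
#   stop          - shut down gossip and client transports, leaving the node
#                   effectively dead but still inspectable via JMX; kill the JVM
#                   for errors during startup
#   best_effort   - stop using the failed disk and answer requests from the
#                   remaining sstables (obsolete data may be returned at CL.ONE)
#   ignore        - ignore fatal errors and let requests fail
disk_failure_policy: stop
```

With die, stop_paranoid, or stop, a failed drive effectively takes the
node out of service; with best_effort, the node keeps serving from the
remaining disks.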
Cheers,
Bowen
On 06/12/2021 14:11, Joe Obernberger wrote:
Hi All - one node in an 11 node cluster experienced a drive failure on
the first drive in the list. I removed that drive from the list so
that it now reads:
data_file_directories:
- /data/2/cassandra/data
- /data/3/cassandra/data
- /data/4/cassandra/data
- /data/5/cassandra/data
- /data/6/cassandra/data
- /data/8/cassandra/data
- /data/9/cassandra/data
But when I try to start the server, I get:
Exception (java.lang.RuntimeException) encountered during startup: A
node with address /172.16.100.251:7000 already exists, cancelling
join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000
already exists, cancelling join. Use cassandra.replace_address if you
want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 -
Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000
already exists, cancelling join. Use cassandra.replace_address if you
want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-12-05 15:49:48,468
HintsService.java:220 - Paused hints dispatch
WARN [StorageServiceShutdownHook] 2021-12-05 15:49:48,470
Gossiper.java:1993 - No local state, state is in silent shutdown, or
node hasn't joined, not announcing shutdown
Do I need to remove and re-add the node? When a drive fails with
cassandra, is it common for the node to come down?
Thank you!
-Joe Obernberger