Re: Node failed after drive failed

Bowen Song Sat, 11 Dec 2021 12:44:47 -0800

Hi Joe,

In case of a single disk failure, you should not remove the data directory from the cassandra.yaml file. Instead, you should replace the failed disk with a new empty disk. See https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRecoverUsingJBOD.html for the steps.
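Before restarting the node on the replacement disk, it can help to confirm that every path in data_file_directories is actually mounted and writable. Below is a minimal Python sketch of such a pre-start check. `check_data_dirs` is a hypothetical helper, not part of Cassandra. It assumes you run it as the same user Cassandra runs as, because the writability test uses the current user's permissions.

```python
import os
import tempfile

# Hypothetical pre-start check (not part of Cassandra): after swapping in the
# new empty disk, every data_file_directories entry should exist, be a
# directory, and be writable by the user Cassandra runs as.
def check_data_dirs(data_file_directories):
    """Return (path, problem) tuples; an empty list means every path looks usable."""
    problems = []
    for path in data_file_directories:
        if not os.path.exists(path):
            problems.append((path, "missing (is the new disk mounted?)"))
        elif not os.path.isdir(path):
            problems.append((path, "not a directory"))
        elif not os.access(path, os.W_OK):
            problems.append((path, "not writable by the current user"))
    return problems


if __name__ == "__main__":
    # Demo against temporary directories standing in for /data/N/cassandra/data.
    with tempfile.TemporaryDirectory() as root:
        present = os.path.join(root, "2", "cassandra", "data")
        os.makedirs(present)
        absent = os.path.join(root, "1", "cassandra", "data")
        for path, problem in check_data_dirs([absent, present]):
            print(path, "->", problem)
```

A "missing" entry most likely means the new disk is not yet mounted at the old path. The linked JBOD procedure covers the remaining steps.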

Since your node failed to start, I guess it's not too late to restore the settings in the cassandra.yaml file and then follow the above steps. However, replacing the entire node is always an option if everything else has failed, as long as you have RF>1 and the other nodes in the cluster are all healthy. If you need to do this, follow the steps here: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsReplaceNode.html
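The two preconditions for replacing the whole node (RF>1, all other nodes healthy) can be checked mechanically. Below is a minimal Python sketch. It assumes you have already collected each keyspace's replication factor and the text output of `nodetool status`. `parse_nodetool_status` and `safe_to_replace` are hypothetical helpers. RF is simplified to one number per keyspace (with NetworkTopologyStrategy you would check each datacenter). The sample output and the 172.16.100.250 address are made up for illustration.

```python
# Hypothetical helpers encoding the two preconditions above: RF > 1 and all
# other nodes healthy (reported as "UN", i.e. Up/Normal, by nodetool status).

def parse_nodetool_status(output):
    """Map node address -> two-letter state (e.g. "UN", "DN") from `nodetool status` text."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with a state code: U/D (up/down) + N/L/J/M (normal/leaving/joining/moving).
        if len(parts) >= 2 and len(parts[0]) == 2 and parts[0][0] in "UD" and parts[0][1] in "NLJM":
            states[parts[1]] = parts[0]
    return states


def safe_to_replace(replication_factors, node_states, dead_node):
    """True if every keyspace has RF > 1 and every node except dead_node is UN."""
    # Simplification: one RF per keyspace; with NetworkTopologyStrategy check each DC.
    if not replication_factors or min(replication_factors.values()) <= 1:
        return False
    return all(state == "UN" for addr, state in node_states.items() if addr != dead_node)


# Made-up sample output; only the first two columns are used.
sample = """Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Tokens  Owns  Host ID     Rack
UN  172.16.100.250  1.2 TiB   256     ?     host-id-a   rack1
DN  172.16.100.251  1.1 TiB   256     ?     host-id-b   rack1
"""

if __name__ == "__main__":
    states = parse_nodetool_status(sample)
    print(safe_to_replace({"my_keyspace": 3}, states, "172.16.100.251"))
```

If the check passes, the linked procedure has you start the replacement node with the cassandra.replace_address JVM system property, which is also what the startup error in your log points to.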


As for your last question,

> When a drive fails with cassandra, is it common for the node to come down?

This actually depends on the disk_failure_policy setting in your cassandra.yaml file; reading the comments next to it will help you understand the available choices.
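To see which policy a node is running with, you can read the setting straight from cassandra.yaml. Below is a minimal Python sketch. `read_disk_failure_policy` is a hypothetical helper that uses a simple regular expression rather than a full YAML parser. The one-line summaries are paraphrased from the comments in cassandra.yaml, so check the comments in your own version of the file.

```python
import re

# Summaries paraphrased from the comments in cassandra.yaml; verify against your version.
DISK_FAILURE_POLICIES = {
    "die": "shut down gossip and client transports and kill the JVM on any fs or single-sstable error",
    "stop_paranoid": "shut down gossip and client transports even for single-sstable errors",
    "stop": "shut down gossip and client transports, leaving the node effectively dead",
    "best_effort": "stop using the failed disk and serve from the remaining sstables (may return obsolete data at CL.ONE)",
    "ignore": "ignore fatal errors and let requests fail",
}


def read_disk_failure_policy(yaml_text):
    """Return the disk_failure_policy set in cassandra.yaml text, or None if it is not set.

    Hypothetical helper: a line-based regex, not a full YAML parser. Commented-out
    lines ("# disk_failure_policy: ...") are ignored because the key must start the line.
    """
    match = re.search(r"^disk_failure_policy:[ \t]*(\S+)", yaml_text, re.MULTILINE)
    if match is None:
        return None
    policy = match.group(1).strip("'\"")
    if policy not in DISK_FAILURE_POLICIES:
        raise ValueError(f"unknown disk_failure_policy: {policy!r}")
    return policy


if __name__ == "__main__":
    sample = "cluster_name: 'Test Cluster'\ndisk_failure_policy: stop\n"
    policy = read_disk_failure_policy(sample)
    print(policy, "-", DISK_FAILURE_POLICIES[policy])
```

The helper returns None rather than guessing a default when the key is absent, since the effective default depends on your Cassandra version.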


Cheers,
Bowen

On 06/12/2021 14:11, Joe Obernberger wrote:

Hi All - one node in an 11-node cluster experienced a drive failure on the first drive in the list. I removed that drive from the list so that it now reads:
data_file_directories:
    - /data/2/cassandra/data
    - /data/3/cassandra/data
    - /data/4/cassandra/data
    - /data/5/cassandra/data
    - /data/6/cassandra/data
    - /data/8/cassandra/data
    - /data/9/cassandra/data

But when I try to start the server, I get:
Exception (java.lang.RuntimeException) encountered during startup: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-12-05 15:49:48,446 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: A node with address /172.16.100.251:7000 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:659)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-12-05 15:49:48,468 HintsService.java:220 - Paused hints dispatch
WARN [StorageServiceShutdownHook] 2021-12-05 15:49:48,470 Gossiper.java:1993 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
Do I need to remove and re-add the node? When a drive fails with cassandra, is it common for the node to come down?
Thank you!

-Joe Obernberger
