One last update:
After some more kicking, it finally fully joined the cluster. The server
was rebooted a third time, and after that it eventually reached the UN
state. I wish I had kept the link, but I had read that someone had a
similar issue joining a node to a cluster with 4.1.x and the answ
Some updates after getting back to this. I did hardware tests and could
not find any hardware issues. Instead of trying a replace, I went the
route of removing the dead node entirely and then adding in a new node.
The new node is still joining, but I am hitting some oddities in the
log. When j
To add on to what Bowen already wrote, if you cannot find any reason in the
logs at all, I would retry using different hardware.
In the recent past I have seen two cases where strange Cassandra problems were
actually caused by broken hardware (in both cases, a faulty memory module
caused the i
Is it bad to leave the replacement node up and running for hours even
after the cluster has forgotten it in favor of the old node being replaced?
I'll have to set the logging to trace; debug produced nothing. I did stop the
service, which produced errors in the other nodes in the datacenter
since they had ope
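For what it's worth, log levels can usually be raised at runtime without a restart, which avoids the errors other nodes report when the service is stopped. A sketch (the logger package names are assumptions; adjust to the subsystems of interest):

```shell
# Raise logging to TRACE at runtime (no restart, no config edit).
# Logger names below are assumptions; pick the packages you care about.
nodetool setlogginglevel org.apache.cassandra.streaming TRACE
nodetool setlogginglevel org.apache.cassandra.gms TRACE   # gossip

# Show the currently active log levels
nodetool getlogginglevels

# With no arguments, reset all loggers back to the configured defaults
nodetool setlogginglevel
```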
In my experience, a failed bootstrap / node replacement always leaves some
traces in the logs. At the very minimum, there are going to be logs about
streaming sessions failing or aborting. I have never seen one silently
fail or stop without leaving any traces in the log. I can't think of
anything t
I checked all the logs and really couldn't find anything. I couldn't
find any sort of errors in dmesg, system.log, debug.log, gc.log (maybe
I should bump up the log level?), the systemd journal... the logs are totally clean. It
just stops gossiping all of a sudden at 22GB of data each time, then the
old node retu
The dead node being replaced went back to DN state indicating the new
replacement node failed to join the cluster, usually because the
streaming was interrupted (e.g. by network issues, or long STW GC
pauses). I would start looking for red flags in the logs, including
Cassandra's logs, GC logs,
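A quick way to sweep the usual suspects for the red flags mentioned above; the paths assume a package install under /var/log/cassandra, so adjust if yours differ:

```shell
# Failed or aborted streaming sessions in Cassandra's own logs
grep -iE 'stream.*(fail|abort|error)' \
  /var/log/cassandra/system.log /var/log/cassandra/debug.log

# Long stop-the-world GC pauses are typically reported by GCInspector
grep -i 'GCInspector' /var/log/cassandra/system.log

# Kernel-level trouble: OOM killer, disk errors, NIC resets
dmesg -T | grep -iE 'oom|i/o error|reset'
```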
Hello everyone,
I have a cluster with 2 datacenters. I am using
GossipingPropertyFileSnitch as my endpoint snitch. Cassandra version
4.1.8. One datacenter is fully Ubuntu 24.04 and OpenJDK 11 and another
is Ubuntu 20.04 on OpenJDK 8. A seed node died in my second DC running
Ubuntu 20.04 hosts
end goal was to learn more about
> replacing a dead node in a live Cassandra cluster with minimal disruption
> to the existing cluster and figure out a better and faster way of doing the
> same.
>
> I am running a package installation of the following version of Cassandra.
>
>
Hello Cassandra-users,
I was running some tests today. My end goal was to learn more about
replacing a dead node in a live Cassandra cluster with minimal disruption
to the existing cluster and figure out a better and faster way of doing the
same.
I am running a package installation of the
If decommission is used, the data
will stream from the decommissioned node. If removetoken is used, the data will
stream from the remaining replicas.
Hope this helps
Jan/
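The distinction above maps onto two different nodetool commands (removetoken was renamed removenode in later versions):

```shell
# Run ON the node that is leaving: it streams its own data to the
# remaining replicas before exiting the ring.
nodetool decommission

# Run from ANY live node when the leaving node is already dead:
# the surviving replicas stream the missing ranges among themselves.
# The host ID comes from the output of `nodetool status`.
nodetool removenode <host-id-of-dead-node>
```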
On Thu, 4/21/16, Anubhav Kale wrote:
Subject: RE: Problem Replacing a Dead Node
To: "
>
>
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 11:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Problem Replacing a Dead Node
>
>
>
> Here is a bit more detail of the whole situation. I am hoping someo
Replacing a Dead Node
Here is a bit more detail of the whole situation. I am hoping someone can help
me out here.
We have a seven node cluster. One of the nodes started to have issues but it was
still running. We decided to add a new node, and remove the problematic node after
the new node joins
Here is a bit more detail of the whole situation. I am hoping someone can
help me out here.
We have a seven node cluster. One of the nodes started to have issues but it
was still running. We decided to add a new node, and remove the problematic node
after the new node joins. However, the new node did not j
> the byte counters to calculate streaming percentage complete and
> extrapolate.
>
>
>
> From: Mir Tanvir Hossain
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, April 21, 2016 at 10:02 AM
> To: "user@cassandra.apache.org"
> Subject: Problem Replaci
> Is the datastax-agent running fine on the node? What does nodetool status
> and system.log show?
>
>
>
> *From:* Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
> *Sent:* Thursday, April 21, 2016 10:02 AM
> *To:* user@cassandra.apache.org
> *Subject:* Prob
, use
the byte counters to calculate streaming percentage complete and extrapolate.
From: Mir Tanvir Hossain
Reply-To: "user@cassandra.apache.org"
Date: Thursday, April 21, 2016 at 10:02 AM
To: "user@cassandra.apache.org"
Subject: Problem Replacing a Dead Node
Hi, I am
Is the datastax-agent running fine on the node? What does nodetool status and
system.log show?
From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 10:02 AM
To: user@cassandra.apache.org
Subject: Problem Replacing a Dead Node
Hi, I am trying to replace
Hi, I am trying to replace a dead node by following
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html.
It's been 3 full days since the replacement node started, and the node is
still not showing up as part of the cluster on OpsCenter. I was wondering
wheth
In that case, just don't delete the dead node (which I think you should
do anyway; I'm pretty sure it can't be deleted if you're going to
replace it with "-Dcassandra.replace_address=...").
I was speaking about the case that you _do_ want it replaced. You can
just delete it and bootstrap a new node
I think Cassandra gives us control over what we want to do:
a) If we want to replace a dead node then we should specify
"-Dcassandra.replace_address=old_node_ipaddress"
b) If we are adding new nodes (no replacement) then do not specify above
option and tokens would get assigned randomly.
I can think
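A sketch of option (a), assuming a package install where extra JVM options go into cassandra-env.sh (the path and the IP address below are placeholders):

```shell
# Replace a dead node: start the NEW node with the dead node's address.
# Append the flag before the replacement node's first start.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.42"' \
  | sudo tee -a /etc/cassandra/cassandra-env.sh   # 10.0.0.42 is a placeholder

sudo service cassandra start

# Option (b): a plain new node simply omits the flag and bootstraps
# with freshly assigned tokens. Remember to remove the flag after the
# replacement has joined, or a later restart will try to replace again.
```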
I guess Cassandra is aware that it has some replicas not meeting the
replication factor. Wouldn't it be nice if a bootstrapping node would
get those?
Could make things much simpler in the Ops view.
What do you think?
On Fri, Dec 5, 2014 at 8:31 AM, Jaydeep Chovatia
wrote:
> as per my knowledge i
As per my knowledge, if you have NOT explicitly specified
"-Dcassandra.replace_address=old_node_ipaddress", then new tokens would be
assigned randomly to the bootstrapping node instead of the tokens of the dead node.
-jaydeep
On Thu, Dec 4, 2014 at 6:50 AM, Omri Bahumi wrote:
> Hi,
>
> I was wondering, ho
Hi,
I was wondering, how would auto_bootstrap behave in this scenario:
1. I had a cluster with 3 nodes (RF=2)
2. One node died, I deleted it with "nodetool removenode" (+ force)
3. A new node launched with "auto_bootstrap: true"
The question is: will the "right" vnodes go to the new node as if i
test data will at
> least be stored on another node. Now what do I have to do to sync the
> "dead node" again after restoring the VM from the snapshot? Will a
> nodetool repair command be sufficient?
--
View this message in context:
http://cassandra-user-incubator-apache-o
tored on
another node. Now what do I have to do to sync the "dead node" again after
restoring the VM from the snapshot? Will a nodetool repair command be
sufficient?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replacing-a-dead-node-i
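On the question above: assuming the node was down longer than the hint window, a repair on the restored node is the usual way to bring it back in sync. A sketch:

```shell
# Run ON the restored node. The -pr flag repairs only the ranges this
# node is the primary owner of, avoiding redundant work across replicas.
nodetool repair -pr

# Repairs can also be limited to a single keyspace
# (my_keyspace is a placeholder):
nodetool repair -pr my_keyspace
```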
On Tue, Aug 12, 2014 at 4:33 AM, tsi wrote:
> In the datastax documentation there is a description how to replace a dead
> node
> (
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
> ).
> Is the replace_address option required even if the IP addr
a note about the auto
bootstrapping being stored somewhere in the system tables)?
--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Replacing-a-dead-node-in-Cassandra-2-0-8-tp7596245.html
Sent from the cassandra-u...@incubator.apache.org mailing
Repairing the range is an expensive operation and don't forget--just
because a node is down does not mean it's dead. I take nodes down for
maintenance all the time--maybe there was a security update that needed to
be applied, for example, or perhaps a kernel update. There are a multitude
of reaso
>
> Thanks Mongo maven :)
> I understand why you need to do this.
> My question was more from the architecture point of view. Why doesn't
> Cassandra just redistribute the data? Is it because of the gossip protocol?
Sure.. well I've attempted to launch new nodes to redistribute the data on
a tem
Thanks Mongo maven :)
I understand why you need to do this.
My question was more from the architecture point of view. Why doesn't Cassandra
just redistribute the data? Is it because of the gossip protocol?
Thanks,
Prem
On 3 Jun 2014, at 17:35, Curious Patient wrote:
>> Assuming replication
>
> Assuming the replication factor is >2, if a node dies, why does it matter? If
> a new node is added, shouldn't it just take the chunk of data it serves as
> the "primary" node from the other existing nodes?
> Why do we need to worry about replacing the dead node?
The reason this matters is
A dead node is still allocated key ranges, and Cassandra will wait for it
to come back online rather than redistributing its data. It needs to be
decommissioned or replaced by a new node for it to be truly dead as far as
the cluster is concerned.
On Tue, Jun 3, 2014 at 11:12 AM, Prem Yadav wrote
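Concretely, a dead-but-not-removed node keeps showing up as DN in the ring until it is explicitly dealt with; a sketch:

```shell
# The dead node still owns token ranges and is shown as DN (Down/Normal):
nodetool status

# Either replace it (start a new node with
# -Dcassandra.replace_address=<dead-node-ip>), or remove it so its
# ranges are redistributed to the remaining replicas:
nodetool removenode <host-id-of-the-DN-node>
```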
Hi,
in the last week, we saw at least two emails about dead node
replacement. Though I saw the documentation about how to do this, I am not
sure I understand why this is required.
Assuming the replication factor is >2, if a node dies, why does it matter? If
a new node is added, shouldn't
On Mon, Oct 11, 2010 at 03:41, Chen Xinli wrote:
> Hi,
>
> We have a cassandra cluster of 6 nodes with RF=3, read-repair enabled,
> hinted handoff disabled, WRITE with QUORUM, READ with ONE.
> we want to rely on read-repair totally for node failure, as returning
> inconsistent result temporarily i
Hi,
We have a cassandra cluster of 6 nodes with RF=3, read-repair enabled,
hinted handoff disabled, WRITE with QUORUM, READ with ONE.
we want to rely on read-repair totally for node failure, as returning
inconsistent result temporarily is ok for us.
If a node is temporarily dead and returned to