April 16, 2020 at 3:32 AM
To: "user@cassandra.apache.org"
Subject: Re: Cassandra node JVM hang during node repair a table with
materialized view
Thanks a lot. We are working on removing views and controlling the partition
size. I hope the improvements help us.
Best regards
Gb
Erick Ramirez wrote on Thu, Apr 16, 2020 at 2:08 PM:
> The GC collector is G1. I repaired the node after scaling up and the JVM issue
> reproduced. Can I increase the heap to 40 GB on a 64GB VM?
>
I wouldn't recommend going beyond 31GB with G1. You'll hit diminishing
returns, as I mentioned before.
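For reference, a minimal sketch of capping the heap at 31GB with G1. The file and variable shown are the stock cassandra-env.sh mechanism of the 3.x packages, not something stated in this thread, so treat the values as assumptions for a 64GB VM:

    # conf/cassandra-env.sh -- illustrative values for a 64GB VM running G1
    # staying at or below 31GB keeps compressed oops enabled
    MAX_HEAP_SIZE="31G"
    # leave HEAP_NEWSIZE unset with G1; it only applies to CMS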
Do you think the issue is related to the materialized views?
Thanks a lot for sharing.
The node was added recently. The bootstrap failed because of too many tombstones,
so we brought the node up with bootstrap disabled. Some sstables were not
created during bootstrap, so the missing files might be numerous. I have set
the repair thread count to 1. Should I als
Is this the first time you've repaired your cluster? Because it sounds like
it isn't coping. First thing you need to make sure of is to *not* run
repairs in parallel. It can overload your cluster -- only kick off a repair
one node at a time on small clusters. For larger clusters, you might be
able
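A minimal sketch of what "one node at a time" looks like in practice. The hostnames are placeholders and it assumes nodetool can reach each node from a single admin host:

    #!/usr/bin/env bash
    # Kick off repairs strictly one node at a time; abort on the first failure.
    set -euo pipefail
    for host in node1 node2 node3; do
      echo "repairing $host ..."
      nodetool -h "$host" repair --full -pr
    done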
Hello experts
I have a 9-node cluster on AWS. Recently, some nodes went down and I want
to repair the cluster after restarting them. But I found that the repair
operation causes lots of memtable flushes and then JVM GC fails.
Consequently, the node hangs.
I am using Cassandra 3.1.0.
java vers
RF=5 allows you to lose two hosts without losing quorum
Many teams can calculate their hardware failure rate and replacement time. If
you can do both of these things you can pick an RF that meets your durability
and availability SLO. For sufficiently high SLOs you'll need RF > 3.
> On Jun 30,
On Sat, Jun 29, 2019 at 5:49 AM Jeff Jirsa wrote:
If you're at RF=3 and read/write at quorum, you'll have full visibility of all
data if you switch to RF=4 and continue reading at quorum, because quorum of 4
is 3, so you're guaranteed to overlap with at least one of the two nodes that
got all earlier writes.
Going from 3 to 4 to 5 requires a re
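For reference, the quorum arithmetic behind that overlap guarantee (quorum = floor(RF/2) + 1), as a quick shell sketch:

    # quorum(RF) = floor(RF/2) + 1
    for rf in 3 4 5; do
      q=$(( rf / 2 + 1 ))
      echo "RF=$rf  quorum=$q  survives $(( rf - q )) replicas down"
    done
    # prints: RF=3 quorum=2 (1 down), RF=4 quorum=3 (1 down), RF=5 quorum=3 (2 down)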
On Fri, Jun 28, 2019 at 11:29 PM Jeff Jirsa wrote:
> you often have to run repair after each increment - going from 3 -> 5
> means 3 -> 4, repair, 4 -> 5 - just going 3 -> 5 will violate consistency
> guarantees, and is technically unsafe.
>
Jeff,
How is going from 3 -> 4 *not violating* consi
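For concreteness, the stepwise path Jeff describes would look roughly like this. The keyspace and datacenter names are made up, and it assumes NetworkTopologyStrategy with a repair completing cluster-wide before each further bump:

    # Step 1: bump RF 3 -> 4 (keyspace/DC names are hypothetical)
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 4};"
    # Repair the full token range before going further (run serially on every node)
    nodetool repair --full -pr my_ks
    # Step 2: only then bump RF 4 -> 5 and repair again
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 5};"
    nodetool repair --full -pr my_ks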
Yep - not to mention the increased complexity and overhead of going from
ONE to QUORUM, or the increased cost of QUORUM in RF=5 vs RF=3.
If you're in a cloud provider, I've found you're almost always better off
adding a new DC with a higher RF, assuming you're on NTS like Jeff
mentioned.
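A rough sketch of that approach. The keyspace, DC names, and RF values are hypothetical, and the usual extra steps (updating system keyspaces, snitch configuration) are omitted:

    # Once the new DC's nodes have joined with their own dc name in the snitch config:
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc_old': 3, 'dc_new': 5};"
    # Then, on each node in the new DC, stream the existing data in from the old DC:
    nodetool rebuild -- dc_old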
For just changing RF:
You only need to repair the full token range - how you do that is up to
you. Running `repair -pr -full` on each node will do that. Running `repair
-full` will do it multiple times, so it's more work, but technically
correct. The caveat that few people actually appreciate about
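In concrete terms (a sketch; 'my_ks' is a placeholder keyspace name):

    # Repairs the full token range exactly once across the cluster
    # (run on every node, one node at a time):
    nodetool repair -pr -full my_ks
    # Also correct, but repairs every range RF times, so it's more work:
    nodetool repair -full my_ks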
Hi all …
The DataStax & Apache docs are clear: run 'nodetool repair' after you alter a
keyspace to change its RF or replication strategy.
However, the details are all over the place as to what type of repair and on what
nodes it needs to run. None of the above doc authorities are clear, and what you
find on the int
> compatibility break in a bug-fix release.
>
> Just my 2 cents from someone having > 300 Cassandra 2.1 JVMs out there spread
> around the world.
>
> Thanks,
> Thomas
>
> From: kurt greaves [mailto:k...@instaclustr.com]
> Sent: Tuesday, 19 September 2017
changing so often. To me, this is a major flaw in Cassandra.
>
> Sean Durity
>
> From: Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com]
> Sent: Tuesday, September 19, 2017 2:33 AM
> To: user@cassandra.apache.org
> Subject: RE: M
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
In 4.0 anti-compaction is no longer run after full repairs, so we should
probably backport this behavior to 3.0, given there are known limitations with
incremental repair on 3.0 and non-incremental
> From: kurt greaves [mailto:k...@instaclustr.com]
> Sent: Tuesday, 19 September 2017 06:24
> To: User
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
> https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs
> still trigge
>
> From: Jeff Jirsa [mailto:jji...@gmail.com]
> Sent: Monday, 18 September 2017 16:10
> To: user@cassandra.apache.org
> Subject: Re: Multi-node repair fails after upgrading to 3.0.14
>
> Sorry I may be wrong about the cause - didn't see -full
>
Hi Jeff,
understood. That’s quite a change then coming from 2.1 from an operational POV.
Thanks again.
Thomas
From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Monday, 18 September 2017 15:56
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
The command you'
> without printing a stack trace.
>
> The error message and stack trace isn't really useful here. Any further
> ideas/experiences?
>
> Thanks,
> Thomas
>
> From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
> Sent: Friday, 15 September 2017 11:30
> To: u
Sent: Friday, 15 September 2017 11:30
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
Right, you should indeed add the "--full" flag to perform full repairs, and you
can then keep the "-pr" flag.
I'd advise monitoring the status o
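Roughly, on each node in turn (the keyspace name is a placeholder):

    # Full repair restricted to this node's primary ranges, i.e. 2.1-like behaviour on 3.0
    nodetool repair --full -pr my_ks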
A few notes:
- in 3.0 the default changed to incremental repair which will have to
anticompact sstables to allow you to repair the primary ranges you've specified
- since you're starting the repair on all nodes at the same time, you end up
with overlapping anticompactions
Generally you should stag
Alex,
thanks again! We will switch back to the 2.1 behavior for now.
Thomas
From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Friday, 15 September 2017 11:30
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails after upgrading to 3.0.14
Right, you should indeed
with the
partition range (-pr) option, but with 3.0 we additionally have to provide the
--full option, right?
Thanks again,
Thomas
From: Alexander Dejanovski [mailto:a...@thelastpickle.com]
Sent: Friday, 15 September 2017 09:45
To: user@cassandra.apache.org
Subject: Re: Multi-node repair fails
Hi Thomas,
in 2.1.18, the default repair mode was full repair while since 2.2 it is
incremental repair.
So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't
trigger the same operation.
Incremental repair cannot run on more than one node at a time on a cluster,
because you risk to
Hello,
we are currently in the process of upgrading from 2.1.18 to 3.0.14. After
upgrading a few test environments, we started to see some suspicious log entries
regarding repair issues.
We have a cron job on all nodes basically executing the following repair call
on a daily basis:
nodetool rep
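The exact command is cut off above; for illustration only, a staggered daily crontab of this kind might look like the following. The user, paths, and times are assumptions, and each node should get its own time slot so runs don't overlap:

    # /etc/cron.d/cassandra-repair on node1 (give every node a different hour)
    0 2 * * * cassandra /usr/bin/nodetool repair --full -pr > /var/log/cassandra/repair.log 2>&1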
I think you can't, as in previous versions; you might want to look at streams
(nodetool netstats) and validation compactions (nodetool compactionstats).
I won't go into the details as this has already been answered many
times since the 0.x versions of Cassandra.
The only new thing I was able to find
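That is, while a repair runs you can poll these two commands (a sketch):

    # Streams currently in flight (repair streaming sessions show up here):
    nodetool netstats
    # Validation compactions (the merkle tree builds) show up here:
    nodetool compactionstats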
Hi,
I am using incremental repair in Cassandra 2.1.2 right now, I am wondering if
there is any API that I can get the current progress of the current repair job?
That would be a great help. Thanks.
Regards,
-Jieming-
Thanks DuyHai.
From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: November 14, 2014 21:55
To: user@cassandra.apache.org
Subject: Re: Question about node repair
By checking into the source code:
StorageService:
public void forceTerminateAllRepairSessions
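As far as I know there is no nodetool wrapper for that operation in 2.1, but the MBean method can be invoked over JMX. A sketch using the jmxterm CLI; the jar name/version and the default JMX port 7199 are assumptions:

    # Invoke the StorageService MBean operation on the node you connect to
    echo "run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions" \
      | java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n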
Thanks Rob.
From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: November 15, 2014 2:50
To: user@cassandra.apache.org
Subject: Re: Question about node repair
On Thu, Nov 13, 2014 at 7:01 PM, Di, Jieming <jieming...@emc.com> wrote:
Hi There,
I have a question about Cassandra node repair. There is a function called
"forceTerminateAllRepairSessions();", so will the function terminate all the
repair sessions in only one node, or will it terminate all the sessions in a
ring? And when it terminates all repair session
On Mon, Dec 12, 2011 at 3:47 PM, Brian Fleming wrote:
>
> However after the repair completed, we had over 2.5 times the original
> load. Issuing a 'cleanup' reduced this to about 1.5 times the original
> load. We observed an increase in the number of keys via 'cfstats' which is
> obviously accou
Hi,
We simulated a node 'failure' on one of our nodes by deleting the entire
Cassandra installation directory & reconfiguring a fresh instance with the
same token. When we issued a 'repair' it started streaming data back onto
the node as expected.
However after the repair completed, we had over
Yes, the fact that nodes send TreeRequests (and merkle trees) to themselves is
part of the protocol, no problem there.
As for "it has run for many hours without repairing anything", what makes you
think it didn't repair anything?
--
Sylvain
On Mon, Sep 19, 2011 at 4:14 PM, Jason Harvey wrote:
>
Got a response from jbellis in IRC saying that the node will have to
build its own hash tree. The request to itself is normal.
On Mon, Sep 19, 2011 at 7:01 AM, Jason Harvey wrote:
I have a node in my 0.8.5 ring that I'm attempting to repair. I sent
it the repair command and let it run for a few hours. After checking
the logs it didn't appear to have repaired at all. This was the last
repair-related thing in the logs:
INFO [AntiEntropyStage:1] 2011-09-19 05:53:55,823
AntiEn
node must do. IMHO it's not a good idea.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 1 Aug 2011, at 06:09, Yan Chunlu wrote:
I am running 3 nodes and RF=3, Cassandra v0.7.4.
It seems disablegossip and disablethrift can keep a node at pretty low
load. Sometimes when node repair is doing "rebuilding sstable", I
disable gossip and thrift to lower the load. Not sure if I could disable
them in the whole
>> https://issues.apache.org/jira/browse/CASSANDRA-2156
>>
>> but it seems to be only available in 0.8, and people submitted a patch for
>> 0.6. I am using 0.7.4; do I need to dig into the code and make my own patch?
>>
>> does
On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <springri...@gmail.com> wrote:
at the beginning of using Cassandra, I had no idea that I should run "node
repair" frequently, so basically I have 3 nodes with RF=3 and have not run
node repair for months; the data size is 20G.
the problem is that when I start running node repair now, it eats up all disk
io and the s
> The more often you repair, the quicker it will be. The more often your
> nodes go down the longer it will be.
Going to have to disagree a bit here. In most cases the cost of
running through the data and calculating the merkle tree should be
quite significant, and hopefully the differences shoul
(not answering (1) right now, because it's more involved)
> 2. Does a Nodetool Repair block any reads and writes on the node,
> while the repair is going on ? During repair, if I try to do an
> insert, will the insert wait for repair to complete first ?
It doesn't imply any blocking. It's roughly
be able to compare with
other nodes, and if there are differences, it has to send/receive data
from other nodes.
-Original Message-
From: A J [mailto:s5a...@gmail.com]
Sent: Monday, July 11, 2011 2:43 PM
To: user@cassandra.apache.org
Subject: Node repair questions
Hello,
Have the following questions related to nodetool repair:
1. I know that the Nodetool Repair Interval has to be less than
GCGraceSeconds. How do I come up with an exact value for GCGraceSeconds
and the 'Nodetool Repair Interval'? What factors would make me want to change
the default of 10 days of GCGraceSe
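For illustration, in today's CQL syntax (keyspace and table names are placeholders; 864000 seconds is the 10-day default mentioned above):

    # gc_grace_seconds is set per table; every node must be repaired within this window
    cqlsh -e "ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 864000;"   # 10 days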
On Mon, Mar 22, 2010 at 11:53 AM, Todd Burruss wrote:
> it's very possible if i thought it wasn't working. is there a delay between
> compaction and streaming?
yes, it can be a significant one if you have a lot of data.
you can look at the compaction mbean for progress on that side of things.
didn't see any compaction.
From: Stu Hood [stu.h...@rackspace.com]
Sent: Monday, March 22, 2010 7:08 AM
To: user@cassandra.apache.org
Subject: RE: node repair
Hey Todd,
Repair involves 2 major compactions in addition to the streaming. More
information
g for that case.
Thanks,
Stu
-Original Message-
From: "Todd Burruss"
Sent: Sunday, March 21, 2010 3:43pm
To: "user@cassandra.apache.org"
Subject: RE: node repair
while preparing a test to capture logs i decided to not let the data set get
too big and i did see it fin
es below except for read
repair ... i'll keep an eye out for it again and try it again with more data.
thx
From: Stu Hood [stu.h...@rackspace.com]
Sent: Sunday, March 21, 2010 12:08 PM
To: user@cassandra.apache.org
Subject: RE: node repair
If you have debug logs from the run, would you mind opening a JIRA describing
the problem?
-Original Message-
From: "Todd Burruss"
Sent: Sunday, March 21, 2010 1:30pm
To: "Todd Burruss" , "user@cassandra.apache.org"
Subject: RE: node repair
one last co
random
partitioner and assigned a token to each node.
From: Todd Burruss
Sent: Saturday, March 20, 2010 6:48 PM
To: Todd Burruss; user@cassandra.apache.org
Subject: RE: node repair
fyi ... i just compacted and node 105 is definitely not being repaired
From: Todd Burruss
Sent: Saturday, March 20, 2010 12:34 PM
To: user@cassandra.apache.org
Subject: RE: node repair
same IP, same token. i'm trying Handling Failure, #3.
it is ru
05   Up   65.62 GB   170141183460469231731687303715884105728   |-->|
From: Jonathan Ellis [jbel...@gmail.com]
Sent: Saturday, March 20, 2010 11:23 AM
To: user@cassandra.apache.org
Subject: Re: node repair
if you bring up a new node w/ a different ip but the same token, it
will confuse things.
http://wiki.apache.org/cassandra/Operations "handling failure" section
covers best practices here.
On Sat, Mar 20, 2010 at 11:51 AM, Todd Burruss wrote:
i had a node fail, lost all data. so i brought it back up fresh, but assigned
it the same token in storage-conf.xml. then ran nodetool repair.
all compactions have finished, no streams are happening. nothing. so i did it
again. same thing. i don't think it's working. is there a log message