Re: Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-17 Thread Reid Pinchback
April 16, 2020 at 3:32 AM To: "user@cassandra.apache.org" Subject: Re: Cassandra node JVM hangs during repair of a table with a materialized view Message from External Sender Thanks a lot. We are working on removing views and controlling the partition size. I hope the improvements help us. Bes

Re: Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-16 Thread Ben G
Thanks a lot. We are working on removing views and controlling the partition size. I hope the improvements help us. Best regards, Gb. Erick Ramirez wrote on Thursday, April 16, 2020 at 2:08 PM: > GC collector is G1. I repaired the node after scaling up. The JVM issue >> reproduced. Can I increase the heap to 40 GB on

Re: Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-15 Thread Erick Ramirez
> The GC collector is G1. I repaired the node after scaling up. The JVM issue > reproduced. Can I increase the heap to 40 GB on a 64GB VM? > I wouldn't recommend going beyond 31GB on G1. It will be diminishing returns as I mentioned before. Do you think the issue is related to materialized view
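
Stated in config terms, a minimal sketch, assuming the stock conf/cassandra-env.sh variables; 31GB matters because it is roughly the largest heap at which the JVM still uses compressed object pointers:

    # conf/cassandra-env.sh
    # at or below ~31GB the JVM keeps compressed oops enabled;
    # a 40GB heap would lose them and waste much of the extra space
    MAX_HEAP_SIZE="31G"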

Re: Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-15 Thread Ben G
Thanks a lot for sharing. The node was added recently. Its bootstrap failed because of too many tombstones, so we brought the node up with bootstrap disabled. Some sstables were not created during bootstrap, so the missing files might be numerous. I have set the repair thread count to 1. Should I als

Re: Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-15 Thread Erick Ramirez
Is this the first time you've repaired your cluster? Because it sounds like it isn't coping. First thing you need to make sure of is to *not* run repairs in parallel. It can overload your cluster -- only kick off a repair one node at a time on small clusters. For larger clusters, you might be able
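
A minimal sketch of the one-node-at-a-time approach, with hypothetical host names; -pr keeps token ranges from being repaired more than once across the loop:

    # run repairs strictly sequentially, never in parallel
    for host in node1 node2 node3; do
        nodetool -h "$host" repair -pr
    done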

Cassandra node JVM hangs during repair of a table with a materialized view

2020-04-15 Thread Ben G
Hello experts, I have a 9-node cluster on AWS. Recently, some nodes went down, and I want to repair the cluster after restarting them. But I found the repair operation causes lots of memtable flushes, and then JVM GC fails. Consequently, the node hangs. I am using Cassandra 3.1.0. java vers

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-07-01 Thread Jeff Jirsa
RF=5 allows you to lose two hosts without losing quorum. Many teams can calculate their hardware failure rate and replacement time. If you can do both of these things you can pick an RF that meets your durability and availability SLO. For sufficiently high SLOs you’ll need RF > 3 > On Jun 30,

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-30 Thread Oleksandr Shulgin
On Sat, Jun 29, 2019 at 5:49 AM Jeff Jirsa wrote: > If you’re at RF=3 and read/write at quorum, you’ll have full visibility > of all data if you switch to RF=4 and continue reading at quorum, because > quorum of 4 is 3, so you’re guaranteed to overlap with at least one of the > two nodes that got

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Jeff Jirsa
If you’re at RF=3 and read/write at quorum, you’ll have full visibility of all data if you switch to RF=4 and continue reading at quorum, because quorum of 4 is 3, so you’re guaranteed to overlap with at least one of the two nodes that got all earlier writes. Going from 3 to 4 to 5 requires a re
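
A quick check of the arithmetic behind the claims in this thread:

    quorum(RF) = floor(RF/2) + 1
    quorum(3) = 2, quorum(4) = 3, quorum(5) = 3

So at RF=5 two hosts can be lost while a quorum of 3 survives, and going from RF=3 to RF=4 a quorum read touches 3 of 4 replicas, missing at most one, so it always overlaps at least one of the two replicas that acknowledged a pre-change quorum write.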

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Oleksandr Shulgin
On Fri, Jun 28, 2019 at 11:29 PM Jeff Jirsa wrote: > you often have to run repair after each increment - going from 3 -> 5 > means 3 -> 4, repair, 4 -> 5 - just going 3 -> 5 will violate consistency > guarantees, and is technically unsafe. > Jeff, how is going from 3 -> 4 *not violating* consi

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Jon Haddad
Yep - not to mention the increased complexity and overhead of going from ONE to QUORUM, or the increased cost of QUORUM in RF=5 vs RF=3. If you're in a cloud provider, I've found you're almost always better off adding a new DC with a higher RF, assuming you're on NTS like Jeff mentioned. On Fri,

Re: Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Jeff Jirsa
For just changing RF: You only need to repair the full token range - how you do that is up to you. Running `repair -pr -full` on each node will do that. Running `repair -full` will do it multiple times, so it's more work, but technically correct. The caveat that few people actually appreciate about
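
Spelled out, the per-node invocation described here is:

    # run on each node in turn; -pr limits the node to its primary ranges,
    # so one pass over the whole ring covers the full token range exactly once
    nodetool repair -pr -full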

Running Node Repair After Changing RF or Replication Strategy for a Keyspace

2019-06-28 Thread Fd Habash
Hi all … The datastax & apache docs are clear: run ‘nodetool repair’ after you alter a keyspace to change its RF or RS. However, the details are all over the place as to what type of repair is needed and on what nodes it needs to run. None of the above doc authorities are clear, and what you find on the int

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Jeff Jirsa
rd > compatibility break in a bug-fix release. > > Just my 2 cents from someone having > 300 Cassandra 2.1 JVMs out there spread > around the world. > > Thanks, > Thomas > > From: kurt greaves [mailto:k...@instaclustr.com] > Sent: Tuesday, 19. September 20

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread kurt greaves
changing so often. To me, this is a major flaw in Cassandra. > Sean Durity > *From:* Steinmaurer, Thomas [mailto:thomas.steinmau...@dynatrace.com] > *Sent:* Tuesday, September 19, 2017 2:33 AM > *To:* user@cassandra.apache.org > *Subject:* RE: M

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Anthony P. Scism
unsubscribe Anthony P. Scism Info Tech-Risk Mgmt/Client Sys - Capacity Planning Work: 402-544-0361 Mobile: 402-707-4446 From: "Durity, Sean R" To: "user@cassandra.apache.org" Date: 09/19/2017 09:25 AM Subject: RE: Mul

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-19 Thread Steinmaurer, Thomas
:56 To: user@cassandra.apache.org Subject: Re: Multi-node repair fails after upgrading to 3.0.14 In 4.0 anti-compaction is no longer run after full repairs, so we should probably backport this behavior to 3.0, given there are known limitations with incremental repair on 3.0 and non-incremental

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Paulo Motta
> From: kurt greaves [mailto:k...@instaclustr.com] > Sent: Tuesday, 19. September 2017 06:24 > To: User > Subject: Re: Multi-node repair fails after upgrading to 3.0.14 > > https://issues.apache.org/jira/browse/CASSANDRA-13153 implies full repairs > still trigge

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread kurt greaves
> *From:* Jeff Jirsa [mailto:jji...@gmail.com] > *Sent:* Monday, 18. September 2017 16:10 > *To:* user@cassandra.apache.org > *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14 > > Sorry I may be wrong about the cause - didn't see -full

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Jeff Jirsa
ational > POV. > > Thanks again. > > Thomas > > From: Jeff Jirsa [mailto:jji...@gmail.com] > Sent: Monday, 18. September 2017 15:56 > To: user@cassandra.apache.org > Subject: Re: Multi-node repair fails after upgrading to 3.0.14 > > The command you'

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
Hi Jeff, understood. That’s quite a change, coming from 2.1, from an operational POV. Thanks again. Thomas From: Jeff Jirsa [mailto:jji...@gmail.com] Sent: Monday, 18. September 2017 15:56 To: user@cassandra.apache.org Subject: Re: Multi-node repair fails after upgrading to 3.0.14 The

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Jeff Jirsa
> without printing a stack trace. > > The error message and stack trace aren’t really useful here. Any further > ideas/experiences? > > Thanks, > Thomas > > From: Alexander Dejanovski [mailto:a...@thelastpickle.com] > Sent: Friday, 15. September 2017 11:30 > To: u

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Alexander Dejanovski
t; > > > Thanks, > > Thomas > > > > *From:* Alexander Dejanovski [mailto:a...@thelastpickle.com] > *Sent:* Freitag, 15. September 2017 11:30 > > > *To:* user@cassandra.apache.org > *Subject:* Re: Multi-node repair fails after upgrading to 3.0.14 > >

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-18 Thread Steinmaurer, Thomas
: Friday, 15. September 2017 11:30 To: user@cassandra.apache.org Subject: Re: Multi-node repair fails after upgrading to 3.0.14 Right, you should indeed add the "--full" flag to perform full repairs, and you can then keep the "-pr" flag. I'd advise monitoring the status o

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Jeff Jirsa
A few notes: - in 3.0 the default changed to incremental repair, which has to anticompact sstables to allow you to repair the primary ranges you've specified - since you're starting the repair on all nodes at the same time, you end up with overlapping anticompactions. Generally you should stag

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
Alex, thanks again! We will switch back to the 2.1 behavior for now. Thomas From: Alexander Dejanovski [mailto:a...@thelastpickle.com] Sent: Friday, 15. September 2017 11:30 To: user@cassandra.apache.org Subject: Re: Multi-node repair fails after upgrading to 3.0.14 Right, you should indeed

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Alexander Dejanovski
> partition range (-pr) option, but with 3.0 we additionally have to provide > the --full option, right? > > Thanks again, > Thomas > > *From:* Alexander Dejanovski [mailto:a...@thelastpickle.com] > *Sent:* Friday, 15. September 2017 09:45 > *To:* user@

RE: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Steinmaurer, Thomas
with the partition range (-pr) option, but with 3.0 we additionally have to provide the --full option, right? Thanks again, Thomas From: Alexander Dejanovski [mailto:a...@thelastpickle.com] Sent: Friday, 15. September 2017 09:45 To: user@cassandra.apache.org Subject: Re: Multi-node repair fails

Re: Multi-node repair fails after upgrading to 3.0.14

2017-09-15 Thread Alexander Dejanovski
Hi Thomas, in 2.1.18 the default repair mode was full repair, while since 2.2 it is incremental repair. So running "nodetool repair -pr" since your upgrade to 3.0.14 doesn't trigger the same operation. Incremental repair cannot run on more than one node at a time on a cluster, because you risk to
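
Per this thread, reproducing the old 2.1 behavior on 3.0 means asking explicitly for a full repair while keeping the partition-range restriction:

    nodetool repair -full -pr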

Multi-node repair fails after upgrading to 3.0.14

2017-09-14 Thread Steinmaurer, Thomas
Hello, we are currently in the process of upgrading from 2.1.18 to 3.0.14. After upgrading a few test environments, we started to see some suspicious log entries regarding repair issues. We have a cron job on all nodes basically executing the following repair call on a daily basis: nodetool rep
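
A hedged sketch of such a cron entry; the schedule, user, and log path here are hypothetical, and the exact command in the original message is truncated above. As the replies point out, on 3.0 these runs must also be staggered so that no two nodes repair at once:

    # /etc/cron.d/cassandra-repair -- offset the hour differently on each node
    0 2 * * * cassandra nodetool repair -pr > /var/log/cassandra/repair-cron.log 2>&1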

Re: Cassandra 2.1.2 - How to get node repair progress

2015-01-26 Thread Alain RODRIGUEZ
I think you can't, as in previous versions. You might want to look at streams (nodetool netstats) and validation compactions (nodetool compactionstats). I won't go into the details, as this has already been answered many times since the 0.x versions of Cassandra. The only new thing I was able to find
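
The two commands Alain names, side by side; between them they cover the two phases of a repair, merkle-tree validation and then streaming:

    nodetool compactionstats   # validation compactions (merkle tree builds)
    nodetool netstats          # repair streams in flight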

Cassandra 2.1.2 - How to get node repair progress

2015-01-22 Thread Di, Jieming
Hi, I am using incremental repair in Cassandra 2.1.2 right now, and I am wondering if there is any API from which I can get the progress of the current repair job? That would be a great help. Thanks. Regards, -Jieming-

RE: Question about node repair

2014-11-14 Thread Di, Jieming
Thanks DuyHai. From: DuyHai Doan [mailto:doanduy...@gmail.com] Sent: November 14, 2014 21:55 To: user@cassandra.apache.org Subject: Re: Question about node repair By checking into the source code: StorageService: public void forceTerminateAllRepairSessions
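
A sketch of invoking that MBean operation from the shell, assuming the default unauthenticated JMX port 7199 and a jmxterm jar on hand (both the port and the jar name are assumptions; any generic JMX client works the same way):

    # hypothetical jmxterm invocation against the local node
    echo 'run -b org.apache.cassandra.db:type=StorageService forceTerminateAllRepairSessions' | \
        java -jar jmxterm.jar -l localhost:7199 -n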

RE: Question about node repair

2014-11-14 Thread Di, Jieming
Thanks Rob. From: Robert Coli [mailto:rc...@eventbrite.com] Sent: November 15, 2014 2:50 To: user@cassandra.apache.org Subject: Re: Question about node repair On Thu, Nov 13, 2014 at 7:01 PM, Di, Jieming wrote: > I have a question about Cassandra node repair, ther

Re: Question about node repair

2014-11-14 Thread Robert Coli
On Thu, Nov 13, 2014 at 7:01 PM, Di, Jieming wrote: > I have a question about Cassandra node repair, there is a function called > “forceTerminateAllRepairSessions();”, so will the function terminate all > the repair sessions on only one node, or will it terminate all the sessions > in

Re: Question about node repair

2014-11-14 Thread DuyHai Doan
Di, Jieming wrote: > Hi There, > > I have a question about Cassandra node repair, there is a function called > “forceTerminateAllRepairSessions();”, so will the function terminate all > the repair sessions on only one node, or will it terminate all the sessions > in a ring? An

Question about node repair

2014-11-13 Thread Di, Jieming
Hi There, I have a question about Cassandra node repair. There is a function called "forceTerminateAllRepairSessions();", so will the function terminate all the repair sessions on only one node, or will it terminate all the sessions in a ring? And when it terminates all repair session

Re: Node repair: excessive data

2011-12-12 Thread Tyler Hobbs
On Mon, Dec 12, 2011 at 3:47 PM, Brian Fleming wrote: > > However, after the repair completed, we had over 2.5 times the original > load. Issuing a 'cleanup' reduced this to about 1.5 times the original > load. We observed an increase in the number of keys via 'cfstats', which is > obviously accou
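
The sequence implied by this exchange, sketched; repair can over-stream data in coarse ranges, cleanup drops data outside the ranges the node owns, and a major compaction merges the duplicate rows that remain:

    nodetool repair     # may over-stream; load can grow well past the original
    nodetool cleanup    # discard data the node no longer owns
    nodetool compact    # merge duplicated rows left by the streamed ranges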

Node repair: excessive data

2011-12-12 Thread Brian Fleming
Hi, We simulated a node 'failure' on one of our nodes by deleting the entire Cassandra installation directory & reconfiguring a fresh instance with the same token. When we issued a 'repair' it started streaming data back onto the node as expected. However, after the repair completed, we had over

Re: Node repair never progressing past strange AEService request (0.8.5)

2011-09-19 Thread Sylvain Lebresne
Yes, the fact that nodes send TreeRequests (and merkle trees) to themselves is part of the protocol; no problem there. As for "it has ran for many hours without repairing anything", what makes you think it didn't repair anything? -- Sylvain On Mon, Sep 19, 2011 at 4:14 PM, Jason Harvey wrote: >

Re: Node repair never progressing past strange AEService request (0.8.5)

2011-09-19 Thread Jason Harvey
Got a response from jbellis in IRC saying that the node will have to build its own hash tree. The request to itself is normal. On Mon, Sep 19, 2011 at 7:01 AM, Jason Harvey wrote: > I have a node in my 0.8.5 ring that I'm attempting to repair. I sent > it the repair command and let it run for a f

Node repair never progressing past strange AEService request (0.8.5)

2011-09-19 Thread Jason Harvey
I have a node in my 0.8.5 ring that I'm attempting to repair. I sent it the repair command and let it run for a few hours. After checking the logs it didn't appear to have repaired at all. This was the last repair-related thing in the logs: INFO [AntiEntropyStage:1] 2011-09-19 05:53:55,823 AntiEn

Re: Could I run node repair with gossip and thrift disabled?

2011-07-31 Thread Yan Chunlu
node must do. IMHO it's not a good idea. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 1 Aug 2011, at 06:09, Yan Chunlu wrote: > > > I am running 3 nodes and RF=3, cassandra v0.7.4 > > se

Re: Could I run node repair with gossip and thrift disabled?

2011-07-31 Thread aaron morton
, at 06:09, Yan Chunlu wrote: > I am running 3 nodes with RF=3 on Cassandra v0.7.4. > It seems disablegossip and disablethrift can keep a node at pretty low > load. Sometimes when node repair is doing "rebuilding sstable", I > disable gossip and thrift to lower the load. n

Could I run node repair with gossip and thrift disabled?

2011-07-31 Thread Yan Chunlu
I am running 3 nodes with RF=3 on Cassandra v0.7.4. It seems disablegossip and disablethrift can keep a node at pretty low load. Sometimes, when node repair is doing "rebuilding sstable", I disable gossip and thrift to lower the load. Not sure if I could disable them for the whole
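
The sequence the poster describes, sketched below for reference; note that the reply above (aaron morton) argues this is not a good idea, since repair traffic is work the node must do regardless:

    nodetool disablegossip && nodetool disablethrift   # shed client/coordinator load
    nodetool repair                                    # wait for the repair to finish
    nodetool enablethrift && nodetool enablegossip     # rejoin normal service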

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-21 Thread Yan Chunlu
>> https://issues.apache.org/jira/browse/CASSANDRA-2156 >> >> but it seems only available for 0.8, and people submitted a patch for 0.6. I am >> using 0.7.4; do I need to dig into the code and make my own patch? >> >> does

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-21 Thread aaron morton
nks! >> >> On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu wrote: >> when I started using Cassandra, I had no idea that I should run "node >> repair" frequently, so basically I have 3 nodes with RF=3 and have not run >> node repair for months; the da

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-21 Thread Yan Chunlu
e the io problem? thanks! > > On Wed, Jul 20, 2011 at 4:44 PM, Yan Chunlu <springri...@gmail.com> wrote: >> when I started using Cassandra, I had no idea that I should run >> "node repair" frequently, so basically I have 3 nodes with RF=3 and have

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-20 Thread Yan Chunlu
2011 at 4:44 PM, Yan Chunlu <springri...@gmail.com> wrote: >> when I started using Cassandra, I had no idea that I should run >> "node repair" frequently, so basically I have 3 nodes with RF=3 and have >> not run node repair for months; the data

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-20 Thread Aaron Morton
e beginning of using Cassandra, I had no idea that I should run "node > repair" frequently, so basically I have 3 nodes with RF=3 and have not run > node repair for months; the data size is 20G. > > the problem is when I start running node repair now, it eats up all disk

Re: node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-20 Thread Yan Chunlu
4:44 PM, Yan Chunlu wrote: > when I started using Cassandra, I had no idea that I should run "node > repair" frequently, so basically I have 3 nodes with RF=3 and have > not run node repair for months; the data size is 20G. > > the problem is when I start running n

node repair eats up all disk I/O and slows down entire cluster (3 nodes)

2011-07-20 Thread Yan Chunlu
When I started using Cassandra, I had no idea that I should run "node repair" frequently, so basically I have 3 nodes with RF=3 and have not run node repair for months; the data size is 20G. The problem is that when I start running node repair now, it eats up all disk I/O and the s

Re: Node repair questions

2011-07-11 Thread Peter Schuller
> The more often you repair, the quicker it will be. The more often your > nodes go down, the longer it will be. Going to have to disagree a bit here. In most cases the cost of running through the data and calculating the merkle tree should be quite significant, and hopefully the differences shoul

Re: Node repair questions

2011-07-11 Thread Peter Schuller
(not answering (1) right now, because it's more involved) > 2. Does a Nodetool Repair block any reads and writes on the node, > while the repair is going on ? During repair, if I try to do an > insert, will the insert wait for repair to complete first ? It doesn't imply any blocking. It's roughly

RE: Node repair questions

2011-07-11 Thread Jeremiah Jordan
be able to compare with other nodes, and if there are differences, it has to send/receive data from other nodes. -Original Message- From: A J [mailto:s5a...@gmail.com] Sent: Monday, July 11, 2011 2:43 PM To: user@cassandra.apache.org Subject: Node repair questions Hello, Have the

Node repair questions

2011-07-11 Thread A J
Hello, Have the following questions related to nodetool repair: 1. I know that the Nodetool Repair Interval has to be less than GCGraceSeconds. How do I come up with exact values for GCGraceSeconds and the 'Nodetool Repair Interval'? What factors would make me want to change the default of 10 days of GCGraceSe
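
Worked through with the defaults, the constraint in (1) is just an inequality; the exact slack is a judgment call:

    gc_grace_seconds = 864000   (the 10-day default)
    repair interval  < gc_grace_seconds
    e.g. with the default, run a full repair on every node at least every
    9 days or so, leaving slack for overruns, so tombstones are never
    purged before every replica has seen the delete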

RE: node repair

2010-03-22 Thread Todd Burruss
ndra.apache.org Subject: Re: node repair On Mon, Mar 22, 2010 at 11:53 AM, Todd Burruss wrote: > it's very possible if i thought it wasn't working. is there a delay between > compaction and streaming? yes, it can be a significant one if you have a lot of data. you can look at th

Re: node repair

2010-03-22 Thread Jonathan Ellis
On Mon, Mar 22, 2010 at 11:53 AM, Todd Burruss wrote: > it's very possible if i thought it wasn't working. is there a delay between > compaction and streaming? yes, it can be a significant one if you have a lot of data. you can look at the compaction mbean for progress on that side of things.

RE: node repair

2010-03-22 Thread Todd Burruss
didn't see any compaction. From: Stu Hood [stu.h...@rackspace.com] Sent: Monday, March 22, 2010 7:08 AM To: user@cassandra.apache.org Subject: RE: node repair Hey Todd, Repair involves 2 major compactions in addition to the streaming. More information

RE: node repair

2010-03-22 Thread Stu Hood
g for that case. Thanks, Stu -Original Message- From: "Todd Burruss" Sent: Sunday, March 21, 2010 3:43pm To: "user@cassandra.apache.org" Subject: RE: node repair while preparing a test to capture logs i decided to not let the data set get too big and i did see it fin

RE: node repair

2010-03-21 Thread Todd Burruss
es below except for read repair ... i'll keep an eye out for it again and try it again with more data. thx From: Stu Hood [stu.h...@rackspace.com] Sent: Sunday, March 21, 2010 12:08 PM To: user@cassandra.apache.org Subject: RE: node repair If you have

RE: node repair

2010-03-21 Thread Stu Hood
If you have debug logs from the run, would you mind opening a JIRA describing the problem? -Original Message- From: "Todd Burruss" Sent: Sunday, March 21, 2010 1:30pm To: "Todd Burruss" , "user@cassandra.apache.org" Subject: RE: node repair one last co

RE: node repair

2010-03-21 Thread Todd Burruss
random partitioner and assigned a token to each node. From: Todd Burruss Sent: Saturday, March 20, 2010 6:48 PM To: Todd Burruss; user@cassandra.apache.org Subject: RE: node repair fyi ... i just compacted and node 105 is definitely not being repaired

RE: node repair

2010-03-20 Thread Todd Burruss
fyi ... i just compacted and node 105 is definitely not being repaired From: Todd Burruss Sent: Saturday, March 20, 2010 12:34 PM To: user@cassandra.apache.org Subject: RE: node repair same IP, same token. i'm trying Handling Failure, #3. it is ru

RE: node repair

2010-03-20 Thread Todd Burruss
05 Up 65.62 GB 170141183460469231731687303715884105728 |-->| From: Jonathan Ellis [jbel...@gmail.com] Sent: Saturday, March 20, 2010 11:23 AM To: user@cassandra.apache.org Subject: Re: node repair if you bring up a new node w/ a diff

Re: node repair

2010-03-20 Thread Jonathan Ellis
if you bring up a new node w/ a different ip but the same token, it will confuse things. http://wiki.apache.org/cassandra/Operations "handling failure" section covers best practices here. On Sat, Mar 20, 2010 at 11:51 AM, Todd Burruss wrote: > i had a node fail, lost all data.  so i brought it b

node repair

2010-03-20 Thread Todd Burruss
i had a node fail, lost all data. so i brought it back up fresh, but assigned it the same token in storage-conf.xml. then ran nodetool repair. all compactions have finished, no streams are happening. nothing. so i did it again. same thing. i don't think it's working. is there a log message