Cassandra 5.0.2 - Repair error

2025-07-16 Thread Joe Obernberger
Hi all, I'm getting the following error when executing a repair on a table: error: Repair job has failed with the error message: Repair command #4 failed with error Did not get replies from all endpoints.. Check the logs on the repair participants for further details -- Stack

Nodetool repair skipping some token ranges

2025-06-06 Thread varun nalamati
Hi Everyone, 1. We are currently facing a data discrepancy issue where a UDT (User-Defined Type) column is returning different values across multiple data centers. We are running DSE 6.9.6 on Cassandra 3.11. 2. To resolve this, we have already attempted a full repair and a -pr

Repair is failing for some tokens

2025-04-16 Thread Soyal Badkur
Hi Team, When we try to execute a repair with a token range on Cassandra 3.11.13, it fails with the errors below. java.lang.RuntimeException: Repair job has failed with the error message: [2025-04-16 13:18:15,587] Some repair failed at org.apache.cassandra.tools.RepairRunner.progress

Re: Cassandra 4.0.10 snapshot failure during sequential repair

2025-03-27 Thread Miklosovic, Stefan via user
(CassandraTableRepairManager.java:74) If you check this: https://github.com/apache/cassandra/blob/cassandra-4.0/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L72 There is if (force || !cfs.snapshotExists…) So if “force” is “false”, which is the case when the repair is global

Re: Switching to Incremental Repair

2024-02-15 Thread Chris Lohfink
witch into IR sstables with more caveats. Probably worth a jira to add a faster solution On Thu, Feb 15, 2024 at 12:50 PM Kristijonas Zalys wrote: > Hi folks, > > One last question regarding incremental repair. > > What would be a safe approach to temporarily stop running incre

Re: Switching to Incremental Repair

2024-02-15 Thread Bowen Song via user
running out of disk space, and you should address that issue first before even considering upgrading Cassandra. On 15/02/2024 18:49, Kristijonas Zalys wrote: Hi folks, One last question regarding incremental repair. What would be a safe approach to temporarily stop running incremental repair

Re: Switching to Incremental Repair

2024-02-15 Thread Kristijonas Zalys
Hi folks, One last question regarding incremental repair. What would be a safe approach to temporarily stop running incremental repair on a cluster (e.g.: during a Cassandra major version upgrade)? My understanding is that if we simply stop running incremental repair, the cluster's nodes ca

Full repair with -pr option getting stuck on Cassandra 4.0.10

2024-02-08 Thread manish khandelwal
In a two-datacenter cluster (11 nodes each) we are seeing repair getting stuck. The issue is that when repair is triggered on a particular keyspace, the repair session is lost and Cassandra never returns for that particular session. There are no "WARN" or "ERROR" logs in Cassandra logs. No

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
The over-streaming is only problematic for the repaired SSTables, but it can be triggered by inconsistencies within the unrepaired SSTables during an incremental repair session. This is because although an incremental repair will only compare the unrepaired SSTables, it will stream both

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
Thank you very much for your explanation. Streaming happens on the token range level, not the SSTable level, right? So, when running an incremental repair before the full repair, the problem that “some unrepaired SSTables are being marked as repaired on one node but not on another” should not

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Unfortunately repair doesn't compare each partition individually. Instead, it groups multiple partitions together and calculates a hash of them, stores the hash in a leaf of a Merkle tree, and then compares the Merkle trees between replicas during a repair session. If any one of the parti

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> Caution, using the method you described, the amount of data streamed at the > end with the full repair is not the amount of data written between stopping > the first node and the last node, but depends on the table size, the number > of partitions written, their distribution in

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Caution, using the method you described, the amount of data streamed at the end with the full repair is not the amount of data written between stopping the first node and the last node, but depends on the table size, the number of partitions written, their distribution in the ring and the

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> That's a feature we need to implement in Reaper. I think disallowing the > start of the new incremental repair would be easier to manage than pausing > the full repair that's already running. It's also what I think I'd expect as > a user. > > I'l

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> Full repair running for an entire week sounds excessively long. Even if > you've got 1 TB of data per node, 1 week means the repair speed is less than > 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck > of the full repair speed and work on t

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Just one more thing. Make sure you run 'nodetool repair -full' instead of just 'nodetool repair'. That's because the command's default was changed in Cassandra 2.x. The default was full repair before that change, but the new default is incremental repair. O
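
A minimal sketch of the distinction, assuming a hypothetical keyspace named my_keyspace:

    # Explicit full repair; safe on versions where incremental is the default
    nodetool repair -full my_keyspace
    # Bare invocation; on those versions this runs an incremental repair
    nodetool repair my_keyspace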

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Not disabling auto-compaction may result in repaired SSTables getting compacted together with unrepaired SSTables before the repair state is set on them, which leads to a mismatch in the repaired data between nodes, and potentially very expensive over-streaming in a future full repair. You

Re: Switching to Incremental Repair

2024-02-06 Thread Kristijonas Zalys
t in Reaper. I think disallowing the > start of the new incremental repair would be easier to manage than pausing > the full repair that's already running. It's also what I think I'd expect > as a user. > > I'll create an issue to track this. > > On Sat, 3 Feb 2024,

Re: Switching to Incremental Repair

2024-02-04 Thread Alexander DEJANOVSKI
Hi Sebastian, That's a feature we need to implement in Reaper. I think disallowing the start of the new incremental repair would be easier to manage than pausing the full repair that's already running. It's also what I think I'd expect as a user. I'll create an issue

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
Full repair running for an entire week sounds excessively long. Even if you've got 1 TB of data per node, 1 week means the repair speed is less than 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck of the full repair speed and work on that instead. On

Re: Switching to Incremental Repair

2024-02-03 Thread Sebastian Marsching
Hi, > 2. use an orchestration tool, such as Cassandra Reaper, to take care of that > for you. You will still need to monitor and alert to ensure the repairs are run > successfully, but fixing a stuck or failed repair is not very time sensitive, > you can usually leave it till Monday m

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
Hi Kristijonas, It is not possible to run two repairs, regardless of whether they are incremental or full, for the same token range and on the same table concurrently. You have two options: 1. create a schedule that doesn't overlap, e.g. run incremental repair daily except the 1

Re: Switching to Incremental Repair

2024-02-02 Thread manish khandelwal
They (incremental and full repairs) are required to run separately at different times. You need to identify a schedule, for example, running incremental repairs every week for 3 weeks and then running a full repair in the 4th week. Regards Manish On Sat, Feb 3, 2024 at 7:29 AM Kristijonas Zalys wrote
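
As a rough illustration of that cadence, a hedged crontab sketch (the keyspace name and timings are assumed, not from the thread):

    # Incremental repair on Sundays falling in the first three weeks of the month
    0 2 * * 0 [ "$(date +\%d)" -le 21 ] && nodetool repair my_keyspace
    # Full repair on the remaining Sunday(s) of the month
    0 2 * * 0 [ "$(date +\%d)" -gt 21 ] && nodetool repair -full my_keyspace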

Re: Switching to Incremental Repair

2024-02-02 Thread Kristijonas Zalys
Hi Bowen, Thank you for your help! So given that we would need to run both incremental and full repair for a given cluster, is it safe to have both types of repair running for the same token ranges at the same time? Would it not create a race condition? Thanks, Kristijonas On Fri, Feb 2, 2024

Re: Switching to Incremental Repair

2024-02-02 Thread Bowen Song via user
Hi Kristijonas, To answer your questions: 1. It's still necessary to run full repair on a cluster on which incremental repair is run periodically. The frequency of full repair is more of an art than science. Generally speaking, the less reliable the storage media, the more frequently

Switching to Incremental Repair

2024-02-02 Thread Kristijonas Zalys
Hi folks, I am working on switching from full to incremental repair in Cassandra v4.0.6 (soon to be v4.1.3) and I have a few questions. 1. Is it necessary to run regular full repair on a cluster if I already run incremental repair? If yes, what frequency would you recommend for full

Re: Over streaming in one node during repair.

2024-01-24 Thread manish khandelwal
, > "sstablemetadata" and "sstabledump" commands handy. > > > On 23/01/2024 18:07, manish khandelwal wrote: > > In one of our two datacenter setup(3+3), one Cassndra node is getting lot > of data streamed from other nodes during repair to the extent that

Re: Over streaming in one node during repair.

2024-01-24 Thread Bowen Song via user
t; and "sstabledump" commands handy. On 23/01/2024 18:07, manish khandelwal wrote: In one of our two datacenter setup(3+3), one Cassndra node is getting lot of data streamed from other nodes during repair to the extent that it fills up and ends with full disk. I am not able to understan

Re: Over streaming in one node during repair.

2024-01-23 Thread Sebastian Marsching
actually already present – just in the other set of SSTables. > On 23.01.2024 at 19:07, manish khandelwal > wrote: > > In one of our two-datacenter setups (3+3), one Cassandra node is getting a lot of > data streamed from other nodes during repair to the extent that it fills up > and e

Over streaming in one node during repair.

2024-01-23 Thread manish khandelwal
In one of our two-datacenter setups (3+3), one Cassandra node is getting a lot of data streamed from other nodes during repair, to the extent that it fills up and ends with a full disk. I am not able to understand what could be the reason that this node is misbehaving in the cluster. Cassandra version is

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Bowen Song via user
Hi Jeff, Does subrange repair mark the SSTable as repaired? From my memory, it doesn't. Regards, Bowen On 27/11/2023 16:47, Jeff Jirsa wrote: I don’t work for datastax, thats not my blog, and I’m on a phone and potentially missing nuance, but I’d never try to convert a cluster to

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Jeff Jirsa
era. Instead I’d leave compaction running and slowly run incremental repair across parts of the token range, slowing down as pending compactions increase. I’d choose token ranges such that you’d repair 5-10% of the data on each node at a time > On Nov 23, 2023, at 11:31 PM, Sebast

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Bowen Song via user
to run the full repair for your entire cluster, not each node. Depending on the number of nodes in your cluster, each node should take significantly less time than that unless you have RF set to the total number of nodes. Keep in mind that you only need to disable the auto-compaction for the dur

Migrating to incremental repair in C* 4.x

2023-11-23 Thread Sebastian Marsching
period of time (if you are interested in the reasons why, they are at the end of this e-mail). Therefore, I am wondering whether a slightly different process might work better for us: 1. Run a full repair (we periodically run those anyway). 2. Mark all SSTables as repaired, even though they will

Cassandra 4.0.10 snapshot failure during sequential repair

2023-11-20 Thread Panagiotis Melidis via user
: Unable to take a snapshot bec3dba0-7d70-11ee-99d3-7bda513c2b90 on test_keyspace/test1 This behavior is reproduced consistently, when the following are true: * It is a normal sequential repair (--full and --sequential), * It is not a global repair, meaning at least one datacenter is

Re: Repair errors

2023-08-11 Thread Surbhi Gupta
ndra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:221) > ... 46 common frames omitted > Caused by: org.apache.cassandra.io.compress.CorruptBlockException: > (/data/3/cassandra/data/doc/source_correlations-4ce2d9f0912b11edbd6d4d9b3bfd78b2/nb-9816-big-Data.db)

Re: Repair errors

2023-08-11 Thread Joe Obernberger
at 604552 of length 7911.     at org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:209) Ideas? -Joe On 8/7/2023 10:27 PM, manish khandelwal wrote: What logs of /172.16.20.16:7000 <http://172.16.20.16:7000/> say when repair failed. It ind

Re: Repair errors

2023-08-07 Thread manish khandelwal
What do the logs of /172.16.20.16:7000 say when the repair failed? It indicates "validation failed". Can you check system.log for /172.16.20.16:7000 and see what they say. Looks like you have some issue with doc/origdoc, probably some corrupt sstable. Try to run repair for individual table a
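
A hedged sketch of narrowing the repair down to a single table, using the keyspace/table names from this thread (the -full flag is assumed):

    # Full repair of just the suspect table, to isolate the failure
    nodetool repair -full doc origdoc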

Re: Repair errors

2023-08-07 Thread Joe Obernberger
Thank you. I've tried: nodetool repair --full and nodetool repair -pr. They all get to 57% on any of the nodes, and then fail. Interestingly the debug log only has INFO - there are no errors. [2023-08-07 14:02:09,828] Repair command #6 failed with error Incremental repair session 83dc17d0

Re: Repair errors

2023-08-06 Thread Josh McKenzie
Quick drive-by observation: > Did not get replies from all endpoints.. Check the > logs on the repair participants for further details > dropping message of type HINT_REQ due to error > org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The > channel this output str

Re: Repair errors

2023-08-04 Thread Surbhi Gupta
but it has hung. I tried to run: > nodetool repair -pr > on each of the nodes, but they all fail with some form of this error: > > error: Repair job has failed with the error message: Repair command #521 > failed with error Did not get replies from all endpoints.. Check the

Repair errors

2023-08-04 Thread Joe Obernberger
Hi All - been using reaper to do repairs, but it has hung.  I tried to run: nodetool repair -pr on each of the nodes, but they all fail with some form of this error: error: Repair job has failed with the error message: Repair command #521 failed with error Did not get replies from all endpoints

Re: unlimited repair throughput in Cassandra 4.0

2022-06-14 Thread Azamat Hackimov
Hello. I had the same issues on full repair. I've checked various GC settings; the most performant is ZGC on Java 11, but I had some stability issues. I left the G1GC settings from 3.11.x and got the same issues as yours: CPU load over 90%, and a growing count of open file descriptors (up t

Snapshots of repair are not cleared in Cassandra 4

2022-02-07 Thread Muhammad Soliman
Hi Everyone We have migrated some of our clusters from Cassandra 3.11.11 to 4.0.1. We do repairs periodically triggered by some automation. Each time we run repair we do full `-full` sequential `-seq` primary `-pr` repairs for a portion of the full ring range and we finish iterating over the full

Re: Question related to nodetool repair options

2021-09-07 Thread Deepak Sharma
s a [-pr -full] repair. I think > you're confusing the concept of a full repair vs incremental. This document > might help you understand the concepts -- > https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/operations/opsRepairNodesManualRepair.html. > Cheers! > >>

Re: Question related to nodetool repair options

2021-09-07 Thread Erick Ramirez
No, I'm just saying that [-pr] is the same as [-pr -full], NOT the same as just [-full] on its own. Primary range repairs are not compatible with incremental repairs so by definition, -pr is a [-pr -full] repair. I think you're confusing the concept of a full repair vs incremental. Thi

Re: Question related to nodetool repair options

2021-09-07 Thread Deepak Sharma
Thanks Erick for the response. So in option 3, -pr is not taken into consideration, which essentially means option 3 is the same as option 1 (which is the full repair). Right? Just want to be sure. Best, Deepak On Tue, Sep 7, 2021 at 3:41 PM Erick Ramirez wrote: > >1. Will perform

Re: Question related to nodetool repair options

2021-09-07 Thread Erick Ramirez
1. Will perform a full repair, vs incremental, which is the default in some later versions. 2. As you said, will only repair the token range(s) on the node for which it is the primary owner. 3. The -full flag with -pr is redundant -- primary range repairs are always done as a full

Question related to nodetool repair options

2021-09-07 Thread Deepak Sharma
Hi There, We are on Cassandra 3.0.11 and I want to understand the difference between the following three commands 1. nodetool repair -full 2. nodetool repair -pr 3. nodetool repair -full -pr As per my understanding 1. will do the full repair across all keyspaces. 2. with -pr, restricts repair
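
For reference, a hedged summary of the three invocations, with the semantics taken from the replies above:

    # 1. Full repair of every token range this node replicates
    nodetool repair -full
    # 2. Repair restricted to the node's primary token ranges; per the replies,
    #    primary range repairs are always run as full repairs on this version
    nodetool repair -pr
    # 3. Same effect as 2; the -full flag is redundant alongside -pr
    nodetool repair -full -pr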

Re: High memory usage during nodetool repair

2021-08-09 Thread Elliott Sims
aper. >> >> Thanks, >> Jim >> >> On Mon, Aug 2, 2021 at 7:12 PM Amandeep Srivastava < >> amandeep.srivastava1...@gmail.com> wrote: >> >>> Can anyone please help with the above questions? To summarise: >>> >>> 1) What is the i

Re: Long GC pauses during repair

2021-08-04 Thread Jeff Jirsa
High inter-dc latency could make writes more likely not to land, which would make repair do more work. Also true for reads and writes - waiting for the cross-dc request will keep threads around longer, so more concurrent work, so more GC. It may be that the GC is coming from the read/write path, and

Re: Long GC pauses during repair

2021-08-04 Thread manish khandelwal
Can inter-dc latency cause high GC pauses? Other clusters are working fine with the same configuration. Only this particular cluster is giving long GC pauses during repair. Regards Manish On Tue, Aug 3, 2021 at 6:42 PM Jim Shaw wrote: > A CMS heap that is too large will have long GC. You may try reducing the heap

Re: High memory usage during nodetool repair

2021-08-03 Thread Amandeep Srivastava
the above questions? To summarise: >> >> 1) What is the impact of using mmap only for indices besides a >> degradation in read performance? >> 2) Why does the off-heap memory consumed during a Cassandra full repair remain >> occupied 12+ hours after the repair completion and

Re: Long GC pauses during repair

2021-08-03 Thread Jim Shaw
A CMS heap that is too large will have long GC. You may try reducing the heap on 1 node to see, or go G1GC if that is the easy way. Thanks, Jim On Tue, Aug 3, 2021 at 3:33 AM manish khandelwal < manishkhandelwa...@gmail.com> wrote: > Long GC (1 second / 2 seconds) pauses seen during repair on the >

Re: High memory usage during nodetool repair

2021-08-03 Thread Jim Shaw
t of using mmap only for indices besides a degradation > in read performance? > 2) Why does the off-heap memory consumed during a Cassandra full repair remain > occupied 12+ hours after the repair completion and is there a > manual/configuration driven way to clear that earlier? > > Thanks

Long GC pauses during repair

2021-08-03 Thread manish khandelwal
Long GC pauses (1 second / 2 seconds) are seen during repair on the coordinator. Running full repair with the partition range option. GC collector is CMS and heap is 14G. Cluster is 7+7. Cassandra version is 3.11.2. Not much traffic when repair is running. What could be the probable cause of long GC

Re: High memory usage during nodetool repair

2021-08-02 Thread manish khandelwal
mpact of using mmap only for indices besides a >> degradation in read performance? >> 2) Why does the off-heap memory consumed during a Cassandra full repair remain >> occupied 12+ hours after the repair completion and is there a >> manual/configuration driven way to clea

Re: High memory usage during nodetool repair

2021-08-02 Thread manish khandelwal
ease help with the above questions? To summarise: > > 1) What is the impact of using mmap only for indices besides a degradation > in read performance? > 2) Why does the off-heap memory consumed during a Cassandra full repair remain > occupied 12+ hours after the repair completion and is there

Re: High memory usage during nodetool repair

2021-08-02 Thread Amandeep Srivastava
Can anyone please help with the above questions? To summarise: 1) What is the impact of using mmap only for indices besides a degradation in read performance? 2) Why does the off-heap memory consumed during a Cassandra full repair remain occupied 12+ hours after the repair completion, and is there a

Re: High memory usage during nodetool repair

2021-07-29 Thread Amandeep Srivastava
anted to understand the role of the > heap and off-heap memory separately during the process. > > Also, for my case, once the nodes reach the 95% memory usage, it stays > there for almost 10-12 hours after the repair is complete, before falling > back to 65%. Any pointers on what might

Re: High memory usage during nodetool repair

2021-07-29 Thread Amandeep Srivastava
exists one? Wanted to understand the role of the heap and off-heap memory separately during the process. Also, for my case, once the nodes reach the 95% memory usage, it stays there for almost 10-12 hours after the repair is complete, before falling back to 65%. Any pointers on what might be consumin

Re: High memory usage during nodetool repair

2021-07-28 Thread Erick Ramirez
Based on the symptoms you described, it's most likely caused by SSTables being mmap()ed as part of the repairs. Set `disk_access_mode: mmap_index_only` so only index files get mapped and not the data files. I've explained it in a bit more detail in this article -- https://community.datastax.com/qu
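
A minimal cassandra.yaml fragment for that setting (exact placement in the file assumed):

    # Map only index files into memory, not data files
    disk_access_mode: mmap_index_only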

Re: High memory usage during nodetool repair

2021-07-28 Thread Bowen Song
Could it be related to https://issues.apache.org/jira/browse/CASSANDRA-14096 ? On 28/07/2021 13:55, Amandeep Srivastava wrote: Hi team, My Cluster configs: DC1 - 9 nodes, DC2 - 4 nodes Node configs: 12 core x 96GB RAM x 1 TB HDD Repair params: -full -pr -local Cassandra version: 3.11.4 I

High memory usage during nodetool repair

2021-07-28 Thread Amandeep Srivastava
Hi team, My Cluster configs: DC1 - 9 nodes, DC2 - 4 nodes Node configs: 12 core x 96GB RAM x 1 TB HDD Repair params: -full -pr -local Cassandra version: 3.11.4 I'm running a full repair on DC2 nodes - one node and one keyspace at a time. During the repair, RAM usage on all 4 nodes spikes up

Re: TWCS repair and compact help

2021-06-29 Thread Kane Wilson
> > Oh. So our data is all messed up now because of the “nodetool compact” I > ran. > > > > Hi Erick. Thanks for the quick reply. > > > > I just want to be sure about compact. I saw Cassandra will do compaction > by itself even when I do not run “nodetool compact” manually (nodetool > compaction

RE: TWCS repair and compact help

2021-06-29 Thread Eric Wong
ache.org Subject: Re: TWCS repair and compact help You definitely shouldn't perform manual compactions -- you should let the normal compaction tasks take care of it. It is unnecessary to manually run compactions since it creates more problems than it solves as I've exp

Re: TWCS repair and compact help

2021-06-29 Thread Gábor Auth
Hi, On Tue, Jun 29, 2021 at 12:34 PM Erick Ramirez wrote: > You definitely shouldn't perform manual compactions -- you should let the > normal compaction tasks take care of it. It is unnecessary to manually run > compactions since it creates more problems than it solves as I've explained > in th

Re: TWCS repair and compact help

2021-06-29 Thread Erick Ramirez
You definitely shouldn't perform manual compactions -- you should let the normal compaction tasks take care of it. It is unnecessary to manually run compactions since it creates more problems than it solves as I've explained in this post -- https://community.datastax.com/questions/6396/. Cheers!

TWCS repair and compact help

2021-06-29 Thread Eric Wong
Hi: We need some help on Cassandra repair and compact for a table that uses TWCS. We are running Cassandra 4.0-rc1. A database called test_db, with its biggest table "minute_rate" storing time-series data. It has the following configuration: CREATE TABLE test_db.minute_rate ( marke

Re: unable to repair

2021-05-31 Thread Jeff Jirsa
t. You don't want to be issuing simultaneous create >>> statements from different clients. IF NOT EXISTS won't necessarily catch >>> all cases. >>> >>>> As for the schema mismatch, what is the best way of fixing that issue? >>>>

Re: unable to repair

2021-05-30 Thread Sébastien Rebecchi
eate statements from different clients. IF NOT EXISTS won't necessarily >>> catch all cases. >>> >>> >>>> As for the schema mismatch, what is the best way of fixing that issue? >>>> Could Cassandra recover from that on its own or is there a nod

Re: unable to repair

2021-05-30 Thread Bowen Song
it seems a very heavy procedure for that. A rolling restart is usually enough to fix the issue. You might want to repair afterwards, and check that data didn't make it to different versions of the table on different nodes (in which case some more int

Re: unable to repair

2021-05-30 Thread Sébastien Rebecchi
all cases. >> >> >>> As for the schema mismatch, what is the best way of fixing that issue? >>> Could Cassandra recover from that on its own or is there a nodetool command >>> to force schema agreement? I have heard that we have to restart the nodes 1 &g

Re: unable to repair

2021-05-28 Thread Sébastien Rebecchi
t way of fixing that issue? >> Could Cassandra recover from that on its own or is there a nodetool command >> to force schema agreement? I have heard that we have to restart the nodes 1 >> by 1, but it seems a very heavy procedure for that. >> > A rolling restart is usually

Re: unable to repair

2021-05-27 Thread Kane Wilson
ing that issue? > Could Cassandra recover from that on its own or is there a nodetool command > to force schema agreement? I have heard that we have to restart the nodes 1 > by 1, but it seems a very heavy procedure for that. > A rolling restart is usually enough to fix the issue. You

Re: unable to repair

2021-05-27 Thread Sébastien Rebecchi
OK I will check that, thank you! Sébastien On Thu, 27 May 2021 at 11:07, Bowen Song wrote: > Hi Sébastien, > > > The error message you shared came from the repair coordinator node's > log, and it's the result of failures reported by 3 other nodes. If you > coul

Re: unable to repair

2021-05-27 Thread Bowen Song
Hi Sébastien, The error message you shared came from the repair coordinator node's log, and it's the result of failures reported by 3 other nodes. If you could have a look at the 3 nodes listed in the error message - 135.181.222.100, 135.181.217.109 and 135.181.221.180, you shou

Re: unable to repair

2021-05-26 Thread Sébastien Rebecchi
Sorry Kane, I am a little bit confused; we are talking about schema version at the node level. Which client operations could trigger a schema change at the node level? Do you mean that, for example, creating a new table triggers a schema change globally, not only at the single KS/table level? Sébastien On Thu, 27 May

Re: unable to repair

2021-05-26 Thread Sébastien Rebecchi
I don't have schema changes, except keyspace and table creations. But they are done from multiple sources indeed, with a "create if not exists" statement, on demand. Thank you for your answer, I will try to see if I can precreate them then. As for the schema mismatch, what is the best way of

Re: unable to repair

2021-05-26 Thread Kane Wilson
> > I have had that error sometimes when there is a schema mismatch, but also when all > schemas match. So I think this is not the only cause. > Have you checked the logs for errors on 135.181.222.100, 135.181.217.109, and 135.181.221.180? They may give you some better information about why they are sending bad

Re: unable to repair

2021-05-26 Thread Sébastien Rebecchi
chema mismatch or node unavailability might result in this. > > Thanks, > > Dipan Shah > > -- > *From:* Sébastien Rebecchi > *Sent:* Wednesday, May 26, 2021 7:35 PM > *To:* user@cassandra.apache.org > *Subject:* unable to repair > > Hi, >

Re: unable to repair

2021-05-26 Thread Dipan Shah
M To: user@cassandra.apache.org Subject: unable to repair Hi, I have an issue with repairing my Cassandra cluster, that was already the case with Cassandra 3 and the issue is not solved with Cassandra 4 RC1. I run in a for loop, 1 by 1, the following command: nodetool -h THE_NODE -u jTHE_USER -pw THE_PASSW

unable to repair

2021-05-26 Thread Sébastien Rebecchi
Hi, I have an issue with repairing my Cassandra cluster, that was already the case with Cassandra 3 and the issue is not solved with Cassandra 4 RC1. I run in a for loop, 1 by 1, the following command: nodetool -h THE_NODE -u jTHE_USER -pw THE_PASSWORD repair --full -pr and I always get the
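
A hedged sketch of such a loop (the node list and credentials are placeholders mirroring the command above):

    # Run a full primary-range repair on each node, one at a time
    for NODE in node1 node2 node3; do
        nodetool -h "$NODE" -u THE_USER -pw THE_PASSWORD repair --full -pr
    done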

Re: Repair on a slow node (or is it?)

2021-03-31 Thread Lapo Luchini
Thanks for all your suggestions! I'm looking into it and so far it seems to be mainly a problem of disk I/O, as the host is running on spindle disks and being a DR of an entire cluster gives it many changes to follow. First (easy) try will be to add an SSD as ZFS cache (ZIL + L2ARC). Should m

Re: Repair on a slow node (or is it?)

2021-03-29 Thread Kane Wilson
remote disaster recovery copy" 2.7 TiB. > > Doing repairs only on the production cluster takes a semi-decent time > (24h for the biggest keyspace, which takes 90% of the space), but > doing repair across the two DCs takes forever, and segments often fail > even if I increased R

Repair on a slow node (or is it?)

2021-03-29 Thread Lapo Luchini
e), but doing repair across the two DCs takes forever, and segments often fail even if I increased the Reaper segment time limit to 2h. In trying to debug the issue, I noticed that "compactionstats -H" on the DR node shows huge (and very very slow) validations: compaction completed total

Re: Best strategy to run repair

2021-03-23 Thread Bowen Song
Kane also mentioned) for subrange repair. Doing subrange repair yourself may lead to a lot of trouble as calculating correct subranges is not an easy task. On Tue, Mar 23, 2021 at 3:38 AM Kane Wilson wrote: -pr on all nodes takes much longer as you'll do at least

Re: Best strategy to run repair

2021-03-22 Thread manish khandelwal
n Mon, 22 Mar 2021 at 20:28, manish khandelwal < > manishkhandelwa...@gmail.com> wrote: > >> Also try to use Cassandra reaper (as Kane also mentioned) for subrange >> repair. Doing subrange repair yourself may lead to a lot of trouble as >> calculating correct subranges is

Re: Best strategy to run repair

2021-03-22 Thread Surbhi Gupta
Does describering not give the correct subranges for each node? On Mon, 22 Mar 2021 at 20:28, manish khandelwal < manishkhandelwa...@gmail.com> wrote: > Also try to use Cassandra Reaper (as Kane also mentioned) for subrange > repair. Doing subrange repair yourself may lead to a lo

Re: Best strategy to run repair

2021-03-22 Thread manish khandelwal
Also try to use Cassandra Reaper (as Kane also mentioned) for subrange repair. Doing subrange repair yourself may lead to a lot of trouble as calculating correct subranges is not an easy task. On Tue, Mar 23, 2021 at 3:38 AM Kane Wilson wrote: > -pr on all nodes takes much longer as you

Re: Best strategy to run repair

2021-03-22 Thread Kane Wilson
, and managed services On Tue, Mar 23, 2021 at 7:33 AM Surbhi Gupta wrote: > Hi, > > We are on open source 3.11.5 . > We need to repair a production cluster . > We are using num_token as 256 . > What will be a better option to run repair ? > 1. nodetool -pr (Primary rang

Best strategy to run repair

2021-03-22 Thread Surbhi Gupta
Hi, We are on open source 3.11.5. We need to repair a production cluster. We are using num_tokens as 256. What will be a better option to run repair? 1. nodetool -pr (Primary range repair on all nodes, one node at a time) OR 2. nodetool -st -et (Subrange repair, taking the ranges for each
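
For illustration, hedged forms of the two options (the keyspace name and token values are placeholders):

    # Option 1: primary range repair, run on every node in turn
    nodetool repair -pr my_keyspace
    # Option 2: subrange repair over an explicit token window
    nodetool repair -st -9223372036854775808 -et -4611686018427387904 my_keyspace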

Re: Full repair results in uneven data distribution

2021-03-16 Thread Bowen Song
e by one. To avoid this issue in the future, I'd recommend you avoid causing Cassandra to do anti-compaction during repairs. You can achieve that by specifying a DC in the "nodetool repair" command, such as "nodetool repair -full -dc DC1". This will work even if you only

Full repair results in uneven data distribution

2021-03-16 Thread Inquistive allen
Hello Team, Sorry, this might be a simple question. I was working on Cassandra 2.1.14. Node1 -- 4.5 MB data Node2 -- 5.3 MB data Node3 -- 4.9 MB data Node3 had been down for 90 days. I brought it up and it joined the cluster. To sync data I ran nodetool repair --full. Repair was successful

Mutation dropped and Read-Repair performance issue

2020-12-19 Thread sunil pawar
Hi All, We are facing failures of the Read-Repair stage with the error Digest Mismatch, and the count is 300+ per day per node. At the same time, we are experiencing the node getting overloaded for a quick couple of seconds due to long GC pauses (of around 7-8 seconds). We are not running a repair

Re: Anti Compactions while running repair

2020-11-09 Thread manish khandelwal
t;> One more query: are all SSTables (repaired + unrepaired) part of >> anti-compaction? We are using full repair with the -pr option. >> >> Regards >> Manish >> >> On Mon, Nov 9, 2020 at 11:17 AM Alexander DEJANOVSKI < >> adejanov...@gmail.com> wrote: >>

Re: Issue with anti-compaction while running full repair with -pr option

2020-11-09 Thread manish khandelwal
Pushpendra, You can probably read all the data using Spark with consistency level ALL to repair the data. Regards Manish On Mon, Nov 9, 2020 at 11:31 AM Alexander DEJANOVSKI wrote: > Hi, > > You have two options to disable anticompaction when running full repair: > > - add

Re: Anti Compactions while running repair

2020-11-08 Thread Alexander DEJANOVSKI
Only SSTables in the unrepaired state go through anticompaction. On Mon, 9 Nov 2020 at 07:01, manish khandelwal wrote: > Thanks Alex. > > One more query: are all SSTables (repaired + unrepaired) part of > anti-compaction? We are using full repair with the -pr option. > > Regards

Re: Anti Compactions while running repair

2020-11-08 Thread manish khandelwal
Thanks Alex. One more query: are all SSTables (repaired + unrepaired) part of anti-compaction? We are using full repair with the -pr option. Regards Manish On Mon, Nov 9, 2020 at 11:17 AM Alexander DEJANOVSKI wrote: > Hi Manish, > > Anticompaction is the same whether you run full or in

Re: Issue with anti-compaction while running full repair with -pr option

2020-11-08 Thread Alexander DEJANOVSKI
Hi, You have two options to disable anticompaction when running full repair: - add the list of DCs using the --dc flag (even if there's just a single DC in your cluster) - Use subrange repair, which is done by tools such as Reaper (it can be challenging to do it yourself on a vnode cl
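
A hedged example of the first option (DC and keyspace names assumed):

    # Scoping the full repair to one DC skips anticompaction
    nodetool repair -full -dc DC1 my_keyspace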
