Re: Issue replacing a dead node

2025-05-16 Thread Bowen Song via user
n the old node returning to DN state. What is `nodetool bootstrap resume` going to do? Is there a risk to running resume when the replacement node is no longer in the cluster? Could too high of a tombstone ratio cause this? On 5/15/25 5:08 PM, Bowen Song via user wrote: The dead node being

Re: Issue replacing a dead node

2025-05-15 Thread Bowen Song via user
The dead node being replaced went back to DN state indicating the new replacement node failed to join the cluster, usually because the streaming was interrupted (e.g. by network issues, or long STW GC pauses). I would start looking for red flags in the logs, including Cassandra's logs, GC logs,

Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-14 Thread Bowen Song via user
A few suspects: * snapshots, which could've been created automatically, such as by dropping or truncating tables when auto_snapshot is set to true, or compaction when snapshot_before_compaction is set to true * backups, which could've been created automatically, e.g. when incremental_backup

Re: Cassandra Memory Spikes - Tuning Suggestions?

2025-02-26 Thread Bowen Song via user
Hi vignesh, Correlation does not imply causation. I wouldn't work on the assumption that the memory usage spikes are caused by compactions to start with. It's best to prove the causal effect first. There are multiple ways to do this, I'm just throwing in some ideas: 1. taking a heap dump whil

Re: Unexplained stuck memtable flush

2024-11-13 Thread Bowen Song via user
It's interesting how they organised the documentation. So it is guaranteed that the ConcurrentLinkedQueue can be modified and won't break the iterator. But I don't see anything mentioning the reverse. Can an iterator removing items from the middle of a queue (which by definition is FIFO) bre

Re: Unexplained stuck memtable flush

2024-11-12 Thread Bowen Song via user
ndra/db/ReadExecutionController.java#L141C77-L141C97> /            indexController = new ReadExecutionController(command, indexCfs.readOrdering.start(), indexCfs.metadata(), null, null, NO_SAMPLING, false);/ If "/indexCfs.readOrdering.start()/" succeeded but the constructor

Re: Unexplained stuck memtable flush

2024-11-08 Thread Bowen Song via user
atch them properly by itself) --- How many CommitLogSegment objects do you have in your heap dump? What are values for the following fields of CommitLogSegment objects? lastSyncedOffset lastMarkerOffset cdcState Do you have CDC index files written by org.apache.cassandra.db.commitlog.CommitLogSegmen

Re: Unexplained stuck memtable flush

2024-11-07 Thread Bowen Song via user
as a wall clock time? I've found that the syncComplete.queue is empty, meaning the WaitQueue object believes that there's nothing waiting for the signal, yet the "read-hotness-tracker:1" thread is clearly waiting for it. On 06/11/2024 13:49, Bowen Song via user wrote: I

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
06/11/2024 18:36, Bowen Song via user wrote: I can see some similarities and some differences between your thread dump and ours. In your thread dump: * no MemtableFlushWriter thread * the MemtablePostFlush thread is idle * the MemtableReclaimMemory thread is waiting for a barrier, possib

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
onController(command, indexCfs.readOrdering.start(), indexCfs.metadata(), null, null, NO_SAMPLING, false);/ If "/indexCfs.readOrdering.start()/" succeeded but the constructor "/new ReadExecutionController/" failed, then we are not closing "/indexCfs.readOrdering/", which me

Re: Unexplained stuck memtable flush

2024-11-06 Thread Bowen Song via user
ck on the signal.awaitUninterruptibly() Now I know what is blocking the memtable flushing, but what I haven't been able to figure out is why it got stuck waiting for that signal. I would appreciate it if anyone can offer some insight here. On 05/11/2024 17:48, Bowen Song via user wrote: I

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
I will give it a try and see what I can find. I plan to go down the rabbit hole tomorrow. Will keep you updated. On 05/11/2024 17:34, Jeff Jirsa wrote: On Nov 5, 2024, at 4:12 AM, Bowen Song via user wrote: Writes on this node start to time out and fail. But if left untouched, it's

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
e cluster, and we haven't seen a single issue since switching to XFS. Thanks for the advice though, I'll keep it in mind if I encounter it again. Jon On Tue, Nov 5, 2024 at 9:18 AM Bowen Song via user wrote: Hi Jon, That is interesting. We happen to be running Cassandra

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
p (stack traces) is small and does not have sensitive info. Regards, Dmitry On Tue, 5 Nov 2024 at 13:53, Bowen Song via user wrote: It's about 18GB in size and may contain a huge amount of sensitive data (e.g. all the pending writes), so I can't share

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
DC3? I'll extend the hint window (e.g., to one week) and allow the other data centers (DC1 and DC2) to save hints for DC3. Then, when DC3 returns online, it can receive and process the hints. Edi On Tue, Nov 5, 2024 at 2:34 PM Bowen Song via user wrote: You just confirmed my susp

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
contain thread stacks info. Thread dump (stack traces) is small and does not have sensitive info. Regards, Dmitry On Tue, 5 Nov 2024 at 13:53, Bowen Song via user wrote: It's about 18GB in size and may contain a huge amount of sensitive data (e.g. all the pending writes), so I can'

Re: Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
it here. On 05/11/2024 13:01, Dmitry Konstantinov wrote: Hi Bowen, would it be possible to share a full thread dump? Regards, Dmitry On Tue, 5 Nov 2024 at 12:12, Bowen Song via user wrote: Hi all, We have a cluster running Cassandra 4.1.1. We are seeing the memtable flush randomly

Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
half of new nodes) fill be finish from perspective of data synch? Thx for sharing you best practices, regards     Jiri * * * This item's classification is Internal. It was created by and is in property of the EmbedIT. Do not distribute outside of the organization. From:* Bowen Song via

Re: [External]Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
outside of the organization. From:* Bowen Song via user *Sent:* Tuesday, November 5, 2024 1:12 PM *To:* d...@cassandra.apache.org; user@cassandra.apache.org *Cc:* Bowen Song *Subject:* [External]Unexplained stuck memtable flush This message is from an EXTERNAL SENDER - be CAUTIOUS, particularly

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
nks Edi On Tue, Nov 5, 2024 at 1:27 PM Bowen Song via user wrote: From the way you wrote this, I suspect the name DC may have a different meaning here. Are you talking about the physical location (i.e. server rooms), or the Cassandra DC (i.e. group of nodes for replication p

Unexplained stuck memtable flush

2024-11-05 Thread Bowen Song via user
Hi all, We have a cluster running Cassandra 4.1.1. We are seeing the memtable flush randomly getting stuck. This has happened twice in the last 10 days, to two different nodes in the same cluster. This started to happen after we enabled CDC, and each time it got stuck, there was at least one

Re: Migration Cassandra to a new data center

2024-11-05 Thread Bowen Song via user
From the way you wrote this, I suspect the name DC may have a different meaning here. Are you talking about the physical location (i.e. server rooms), or the Cassandra DC (i.e. group of nodes for replication purposes)? On 05/11/2024 11:01, edi mari wrote: Hello, We have a Cassandra cluster deploy

Re: Cross-Node Latency Issues

2024-10-24 Thread Bowen Song via user
tup. Regards, Ashish On Thu, Oct 24, 2024 at 6:32 PM Bowen Song via user wrote: Can you be more explicit about the "latency metrics from Grafana" you looked at? What percentile latencies were you looking at? Any aggregation used? You can post the underlying queries used for the

Re: Cross-Node Latency Issues

2024-10-24 Thread Bowen Song via user
Can you be more explicit about the "latency metrics from Grafana" you looked at? What percentile latencies were you looking at? Any aggregation used? You can post the underlying queries used for the dashboard if that's easier than explaining it. In general you should only care about the max, no

Re: Tombstone Generation in Cassandra 4.1.3 Despite No Update/Delete Operations

2024-10-24 Thread Bowen Song via user
Is one tombstone scanned per query causing any issue? I mean real issues, not the scanning of tombstone itself. On 24/10/2024 04:56, Naman kaushik wrote: Thanks everyone for your responses. We have columns with |list| and |list| types, and after using |sstabledump|, we found that the tombston

Re: Upgrading to Cassandra 5.0

2024-10-03 Thread Bowen Song via user
The supported and recommended route for upgrading 3.x to 5.x is to upgrade from 3.x to 4.x first, and then from 4.x to 5.x. Even if you've tested upgrading from 3.x to 5.x directly and it worked in a test environment, it is still unsupported and not recommended. That's because you may overlook s

Re: Recommend Cassandra consultant

2024-09-27 Thread Bowen Song via user
Hello Jeff, I'm not a consultant, but I do have some experience troubleshooting this type of issue. The first thing in troubleshooting is gathering information. You don't want to troubleshoot issues blindly. Some (but not all) of the important information is CPU usage, network IO, disk IO, JVM

Re: CDC and schema disagreement

2024-09-24 Thread Bowen Song via user
Thank you for reporting this. I may check next week more closely and let you know. On Fri, Sep 20, 2024 at 5:43 PM Bowen Song via user wrote: Hi all, I suspect that I've run into a bug (or two). On Cassandra 4.1.1, when `cdc_enabled` in the cassandra.yaml file is set

CDC and schema disagreement

2024-09-20 Thread Bowen Song via user
Hi all, I suspect that I've run into a bug (or two). On Cassandra 4.1.1, when `cdc_enabled` in the cassandra.yaml file is set to `false` on at least one node in the cluster, and then the `ALTER TABLE ... WITH cdc=...` statement is run against that node, the cluster will end up in the schema

Re: Cassandra Inbound Error Message

2024-08-29 Thread Bowen Song via user
vent these errors? Where can I find more information about these errors, and under what circumstances do these messages appear? Additionally, what does the term "SMALL_MESSAGES" mean in the error message? Edi On Tue, Aug 27, 2024 at 8:04 PM Bowen Song via user wrote: Hello

Re: Cassandra Inbound Error Message

2024-08-27 Thread Bowen Song via user
Hello Edi, Before attempting to prematurely optimise, let's try to understand the situation a bit better. * What's the bandwidth available? (think: total bandwidth and the typical usage) * What's causing the heavy network load? * How much bandwidth is consumed by the heavy network load? * How l

Re: Bootstrap error - Cassandra 4.1.5

2024-08-15 Thread Bowen Song via user
   99% Any ideas? Thank you! -Joe On 8/15/2024 9:03 AM, Bowen Song via user wrote: You may need to look at the zipped log files if the streaming had been running for a while before failing. The error could have happened hours or days before the final failure. If your cluster is al

Re: Bootstrap error - Cassandra 4.1.5

2024-08-15 Thread Bowen Song via user
p for storage.  I'd love a way to double the number of nodes, but sounds like I shouldn't have let it get this far.  We're having some odd performance issues on reads, that I'm diagnosing. -Joe On 8/14/2024 5:07 PM, Bowen Song via user wrote: It looks like all your nodes are in

Re: Bootstrap error - Cassandra 4.1.5

2024-08-14 Thread Bowen Song via user
It looks like all your nodes are in the same DC and the same rack with 256 vnodes each. It's very hard (if not impossible) to add multiple nodes to the same DC concurrently and safely in this setup. You are better off adding one node at a time to this cluster. Try searching for "ERROR" in the log

Re: TWCS Log Warning

2024-05-23 Thread Bowen Song via user
As the log level name "DEBUG" suggests, these are debug messages, not warnings. Is there any reason that made you believe that these messages are warnings? On 23/05/2024 11:10, Isaeed Mohanna wrote: Hi I have a big table (~220GB reported by used space live by tablestats) with time series data

Re: Change num_tokens in a live cluster

2024-05-16 Thread Bowen Song via user
data need to be moved? On 16/05/2024 15:54, Gábor Auth wrote: Hi, On Thu, 16 May 2024, 10:37 Bowen Song via user, wrote: You can also add a new DC with the desired number of nodes and num_tokens on each node with auto bootstrap disabled, then rebuild the new DC from the existing

Re: Change num_tokens in a live cluster

2024-05-16 Thread Bowen Song via user
You can also add a new DC with the desired number of nodes and num_tokens on each node with auto bootstrap disabled, then rebuild the new DC from the existing DC before decommissioning the existing DC. This method only needs to copy data once, and can copy from/to multiple nodes concurrently, ther

Re: compaction trigger after every fix interval

2024-04-28 Thread Bowen Song via user
There are many things that can trigger a compaction; knowing the type of compaction can help narrow it down. Have you looked at the nodetool compactionstats command output when it is happening? What is the compaction type? It can be "compaction", but can also be something else, such as "validati

Re: Trouble with using group commitlog_sync

2024-04-24 Thread Bowen Song via user
, Apr 23, 2024 at 10:24 PM Bowen Song via user wrote: You might have run into the bottleneck of the driver's IO thread. Try increasing the driver's connections-per-server limit to 2 or 3 if you've only got 1 server in the cluster. Or alternatively, run two clie

Re: Mixed Cluster 4.0 and 4.1

2024-04-24 Thread Bowen Song via user
about having a schema mismatch for this long time. Should I be concerned, or have others upgraded in a similar way? Thanks Paul On 24 Apr 2024, at 17:02, Bowen Song via user wrote: Hi Paul, You don't need to plan for or introduce an outage for a rolling upgrade, which is the preferred

Re: Mixed Cluster 4.0 and 4.1

2024-04-24 Thread Bowen Song via user
Hi Paul, You don't need to plan for or introduce an outage for a rolling upgrade, which is the preferred route. It isn't advisable to take down an entire DC to do an upgrade. You should aim to complete upgrading the entire cluster and finish a full repair within the shortest gc_grace_seconds (d

Re: Trouble with using group commitlog_sync

2024-04-24 Thread Bowen Song via user
n Tue, Apr 23, 2024 at 12:46 PM Bowen Song via user wrote: To achieve 10k loop iterations per second, each iteration must take 0.1 milliseconds or less. Considering that each iteration needs to lock and unlock the semaphore (two syscalls) and make network requests (more sy
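
The per-iteration budget quoted above is simple arithmetic. A minimal sketch follows; the function names and the 500 microsecond latency in the usage note are illustrative assumptions, not figures from the thread.

```python
def budget_us_per_iteration(iterations_per_second: int) -> float:
    """Time available for one loop iteration, in microseconds."""
    return 1_000_000 / iterations_per_second


def threads_needed(target_per_second: int, latency_us: int) -> int:
    """Threads required if one iteration blocks for latency_us microseconds.

    Uses integer ceiling division to avoid floating point rounding surprises.
    """
    return -(-target_per_second * latency_us // 1_000_000)


# 10k iterations per second leaves 100 microseconds (0.1 ms) per iteration.
print(budget_us_per_iteration(10_000))
# Hypothetical: if one iteration really takes 500 microseconds, a single
# thread cannot keep up and several threads are needed.
print(threads_needed(10_000, 500))
```

If a single synchronous iteration takes longer than the budget, the target rate can only be reached by running iterations concurrently, which is the point the reply is making about syscalls and network round trips.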

Re: Trouble with using group commitlog_sync

2024-04-23 Thread Bowen Song via user
what it's worth, I do see 100% CPU utilization in every single one of these tests. On Tue, Apr 23, 2024 at 11:01 AM Bowen Song via user wrote: Have you checked the thread CPU utilisation of the client side? You likely will need more than one thread to do insertion in a loo

Re: Trouble with using group commitlog_sync

2024-04-23 Thread Bowen Song via user
23, 2024 at 10:45 AM Bowen Song via user wrote: I suspect you are abusing batch statements. Batch statements should only be used where atomicity or isolation is needed. Using batch statements won't make inserting multiple partitions faster. In fact, it often will make that s

Re: Trouble with using group commitlog_sync

2024-04-23 Thread Bowen Song via user
eases to 3200 / second. If I set commitlog_sync_group_window to 1ms, the throughput increases to 13k / second, which is slightly less than batch commit mode. Is group commit mode supposed to have better performance than batch mode? On Tue, Apr 23, 2024 at 8:46 AM Bowen Song

Re: Trouble with using group commitlog_sync

2024-04-23 Thread Bowen Song via user
hat set to the default 1000ms. On Tue, Apr 23, 2024 at 8:15 AM Bowen Song via user wrote: Why would you want to set commitlog_sync_batch_window to 1 second long when commitlog_sync is set to batch mode? The documentation <https://cassandra.apache.org/doc/stable/cassandra/archite

Re: Trouble with using group commitlog_sync

2024-04-23 Thread Bowen Song via user
Why would you want to set commitlog_sync_batch_window to 1 second long when commitlog_sync is set to batch mode? The documentation on this says: /This window should be kept short because the writer threads

Re: Alternate apt repo for Debian installation?

2024-03-20 Thread Bowen Song via user
You can try https://archive.apache.org/dist/cassandra/debian/ The deb files can be found here: https://archive.apache.org/dist/cassandra/debian/pool/main/c/cassandra/ On 20/03/2024 20:47, Grant Talarico wrote: Hi there. Hopefully this is the right place to ask this question. I'm trying to ins

Re: [EXTERNAL] Re: About Cassandra stable version having Java 17 support

2024-03-18 Thread Bowen Song via user
Latest release on 2023-12-05). Can you please let us know when the team is planning to GA Cassandra 5.0 version which has Java 17 support? Regards, Divyanshi ---- *From:* Bowen Song via user *Sent:* Monday, March 18, 2024 5:14

Re: About Cassandra stable version having Java 17 support

2024-03-18 Thread Bowen Song via user
Why Java 17? It makes no sense to choose an officially non-supported library version for a piece of software. That decision making process is the problem, not the software's library version compatibility. On 18/03/2024 09:44, Divyanshi Kaushik via user wrote: Hi All, As per my project requir

Re: Best Practices for Managing Concurrent Client Connections in Cassandra

2024-02-29 Thread Bowen Song via user
They are suitable for production use for protecting your Cassandra server, not the clients. The clients likely will experience an error when the limit is reached, and it needs to handle that error appropriately. What you really want to do probably are: 1. change the client's behaviour, limit t

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Bowen Song via user
s/linux/commit/f382fb0bcef4c37dc049e9f6963e3baf204d815c). Regards, Dmitry On Thu, 22 Feb 2024 at 15:30, Bowen Song via user wrote: Hi Pierre, Is there anything stopping you from using the compaction_throughput <https://github.com/apache/cassandra/blob/f9e033f519c14596da4dc9548757

Re: Cassandra 4.1 compaction thread no longer low priority (cpu nice)

2024-02-22 Thread Bowen Song via user
Hi Pierre, Is there anything stopping you from using the compaction_throughput option in the cassandra.yaml file to manage the performance impact of compaction operations? With thread

Re: Requesting Feedback for Cassandra as a backup solution.

2024-02-19 Thread Bowen Song via user
d produce those to the respective topic when Kafka is live. Thanks and regards, Gowtham S On Sat, 17 Feb 2024 at 18:10, Bowen Song via user wrote: Hi Gowtham, On the face of it, it sounds like you are planning to use Cassandra for a queue-like application, which is a well docu

Re: Requesting Feedback for Cassandra as a backup solution.

2024-02-17 Thread Bowen Song via user
Hi Gowtham, On the face of it, it sounds like you are planning to use Cassandra for a queue-like application, which is a well documented anti-pattern. If that's not the case, can you please show the table schema and some example queries? Cheers, Bowen On 17/02/2024 08:44, Gowtham S wrote:

Re: Switching to Incremental Repair

2024-02-15 Thread Bowen Song via user
stian, we have nodes where the disk usage is multiple TiBs so significant growth can be quite dangerous in our case. Would the only safe choice be to mark all SSTables as unrepaired before stopping regular incremental repair? Thanks, Kristijonas On Wed, Feb 7, 2024 at 4:33 PM Bowen Song via

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
20:22, Bowen Song via user wrote: Unfortunately repair doesn't compare each partition individually. Instead, it groups multiple partitions together and calculates a hash of them, stores the hash in a leaf of a merkle tree, and then compares the merkle trees between replicas during a repair s

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Unfortunately repair doesn't compare each partition individually. Instead, it groups multiple partitions together and calculates a hash of them, stores the hash in a leaf of a merkle tree, and then compares the merkle trees between replicas during a repair session. If any one of the partitions c
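
The grouping described above can be illustrated with a toy model. This is a simplified sketch, not Cassandra's implementation: it uses a flat list of leaf hashes instead of a real tree, and the token space, leaf count and partition contents are made-up assumptions.

```python
import hashlib


def leaf_hashes(partitions: dict, num_leaves: int, token_space: int) -> list:
    """Hash every partition that falls into a token range into one leaf digest."""
    width = token_space // num_leaves
    leaves = [hashlib.sha256() for _ in range(num_leaves)]
    for token in sorted(partitions):
        index = min(token // width, num_leaves - 1)
        leaves[index].update(partitions[token])
    return [leaf.hexdigest() for leaf in leaves]


def mismatched_leaves(a: list, b: list) -> list:
    """Indexes of the leaves whose hashes differ between two replicas."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two hypothetical replicas that differ in a single partition (token 10).
replica_a = {3: b"a", 10: b"b", 40: b"c", 90: b"d"}
replica_b = {3: b"a", 10: b"B", 40: b"c", 90: b"d"}

diff = mismatched_leaves(
    leaf_hashes(replica_a, num_leaves=4, token_space=100),
    leaf_hashes(replica_b, num_leaves=4, token_space=100),
)
print(diff)
```

In this model only leaf 0 mismatches, but that leaf covers both token 3 and token 10, so everything in that range would be treated as out of sync even though only one partition differs.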

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Caution, using the method you described, the amount of data streamed at the end with the full repair is not the amount of data written between stopping the first node and the last node, but depends on the table size, the number of partitions written, their distribution in the ring and the 'repa

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Just one more thing. Make sure you run 'nodetool repair -full' instead of just 'nodetool repair'. That's because the command's default was changed in Cassandra 2.x. The default was full repair before that change, but the new default now is incremental repair. O

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Not disabling auto-compaction may result in repaired SSTables getting compacted together with unrepaired SSTables before the repair state is set on them, which leads to mismatch in the repaired data between nodes, and potentially very expensive over-streaming in a future full repair. You should

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
Full repair running for an entire week sounds excessively long. Even if you've got 1 TB of data per node, 1 week means the repair speed is less than 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck of the full repair speed and work on that instead. On 03/02/2024 16
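
The "less than 2 MB/s" figure can be checked with a short calculation. This sketch assumes the rate is simply data size divided by elapsed time, and tries both the decimal (10^12 bytes) and binary (2^40 bytes) reading of "1 TB"; the names are illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def average_rate_mb_per_s(total_bytes: int, seconds: int) -> float:
    """Average throughput in MB/s (1 MB = 1,000,000 bytes)."""
    return total_bytes / seconds / 1_000_000


decimal_tb = average_rate_mb_per_s(10**12, SECONDS_PER_WEEK)
binary_tib = average_rate_mb_per_s(2**40, SECONDS_PER_WEEK)
print(f"{decimal_tb:.2f} MB/s (1 TB), {binary_tib:.2f} MB/s (1 TiB)")
```

Either reading comes out below 2 MB/s, which matches the estimate in the message.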

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
ate a race condition? Thanks, Kristijonas On Fri, Feb 2, 2024 at 3:36 PM Bowen Song via user wrote: Hi Kristijonas, To answer your questions: 1. It's still necessary to run full repair on a cluster on which incremental repair is run periodically. The frequency of full

Re: Switching to Incremental Repair

2024-02-02 Thread Bowen Song via user
Hi Kristijonas, To answer your questions: 1. It's still necessary to run full repair on a cluster on which incremental repair is run periodically. The frequency of full repair is more of an art than science. Generally speaking, the less reliable the storage media, the more frequently full rep

Re: Tests failing for ppc64le architecture.

2024-01-30 Thread Bowen Song via user
Hi Sunidhi, In case you haven't noticed, this is the Cassandra user mailing list, not the dev mailing list. Most people in this mailing list have never attempted to build Cassandra from the source code. IMHO you should try the Cassandra dev mailing list for this type of thing. Cheers, Bowen

Re: Over streaming in one node during repair.

2024-01-24 Thread Bowen Song via user
Some common causes of over-streaming: * "repair_session_space" is too small (either manually specified, or heap size is small and data on disk is large) * Manually deleting SSTable files * Unexpected foreign (e.g. from a backup) SSTable files * Marking SSTable as repaired or unrepaired inc

Re: COMMERCIAL:Re: COMMERCIAL:Re: COMMERCIAL:Re: system_schema.tables id and table uuid on disk mismatch

2024-01-18 Thread Bowen Song via user
rectory 3. removing the incorrect directory afterwards ---- *From:* Bowen Song via user *Sent:* Thursday, January 18, 2024 5:34:57 PM *To:* user@cassandra.apache.org *Cc:* Bowen Song *Subject:* COMMERCIAL:Re: COMMERCIAL:Re: COM

Re: COMMERCIAL:Re: COMMERCIAL:Re: system_schema.tables id and table uuid on disk mismatch

2024-01-18 Thread Bowen Song via user
wrote: It has same mismatch id in all nodes not just one node. *From:* Bowen Song via user *Sent:* Thursday, January 18, 2024 3:18:11 PM *To:* user@cassandra.apache.org *Cc:* Bowen Song *Subject:* COMMERCIAL:Re: COMMERCI

Re: COMMERCIAL:Re: system_schema.tables id and table uuid on disk mismatch

2024-01-18 Thread Bowen Song via user
e any data before nodetool import. Thanks again. *From:* Bowen Song via user *Sent:* Thursday, January 18, 2024 1:17:11 PM *To:* user@cassandra.apache.org *Cc:* Bowen Song *Subject:* COMMERCIAL:Re: system_schema.tables i

Re: system_schema.tables id and table uuid on disk mismatch

2024-01-18 Thread Bowen Song via user
It sounds like you have done some concurrent table creation/deletion in the past (e.g. CREATE TABLE IF NOT EXISTS from multiple clients), which resulted in this mismatch. After you restarted the node, Cassandra corrected it by discarding the old table ID and any data associated with it. This is

Re: About Map column

2023-12-18 Thread Bowen Song via user
Hi Sebastien, It's a bit more complicated than that. To begin with, the first-class citizen in Cassandra is partition, not row. All map fields in the same row are in the same partition, and all rows with the same partition key but different clustering keys are also in the same partition. Duri

Re: Schema inconsistency in mixed-version cluster

2023-12-12 Thread Bowen Song via user
I don't recognise those names: * channel_data_id * control_system_type * server_id * decimation_levels I assume these are column names of a non-system table. From the stack trace, this looks like an error from a node which was running 4.1.3, and this node was not the coordinator for this q

Re: Remove folders of deleted tables

2023-12-07 Thread Bowen Song via user
block like the 1st use case, and then perform a small number of queries to merge pre-results client-side) and in that case TTL+TWCS would probably apply, it remains the same question as above. Thanks for your time :) Sébastien. On Wed, 6 Dec 2023 at 15:46, Bowen Song via user wrote:

Re: Remove folders of deleted tables

2023-12-06 Thread Bowen Song via user
u confirm (or invalidate) that please? Sébastien. On Wed, 6 Dec 2023 at 03:00, Bowen Song via user wrote: The same table name with two different CF IDs is not just "temporary schema disagreements", it's much worse than that. This breaks the eventual consistency guara

Re: Remove folders of deleted tables

2023-12-05 Thread Bowen Song via user
ore the KS folder has 65K subfolders, so I would say I have time to think of redesigning the data model ^^ Nevertheless, does it sound too much in terms of tombstones in the system tables (with the default GC grace period of 10 days)? Sébastien. On Tue, 5 Dec 2023, 12:19, Bowen Song via user

Re: Remove folders of deleted tables

2023-12-05 Thread Bowen Song via user
Please rethink your use case. Creating and deleting tables concurrently often leads to schema disagreement. Even doing so on a single node sequentially will lead to a large number of tombstones in the system tables. On 04/12/2023 19:55, Sébastien Rebecchi wrote: Thank you Dipan. Do you know if the

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Bowen Song via user
Hi Jeff, Does subrange repair mark the SSTable as repaired? From my memory, it doesn't. Regards, Bowen On 27/11/2023 16:47, Jeff Jirsa wrote: I don’t work for datastax, thats not my blog, and I’m on a phone and potentially missing nuance, but I’d never try to convert a cluster to IR by d

Re: Memory and caches

2023-11-27 Thread Bowen Song via user
Hi Sebastien, What's your goal? Improving cache hit rate purely for the sake of having a higher hit rate is rarely a good goal, because higher cache hit rate doesn't always mean faster operations. Do you have specific issues with performance? If so, can you please tell us more about it? Thi

Re: Migrating to incremental repair in C* 4.x

2023-11-27 Thread Bowen Song via user
Hi Sebastian, It's better to walk down the path on which others have walked before you and had great success, than a path that nobody has ever walked. For the former, you know it's relatively safe and it works. The same can hardly be said for the latter. You said it takes a week to run the fu

Re: Cassandra stopped responding but still alive

2023-11-01 Thread Bowen Song via user
What do you mean by saying "Cassandra stopped responding ... to nodetool requests"? Is it a specific nodetool command (e.g. "nodetool status") or all nodetool commands? What's the issue? Was it an error message, such as connection refused? Or freezes/unresponsive? It's common to see Cassandra

Re: java driver with cassandra proxies (option: -Dcassandra.join_ring=false)

2023-10-12 Thread Bowen Song via user
I'm not 100% sure, but it's worth trying to disable the token metadata, because the driver needs to read the "system.peers_v2" table for populating the token metadata. On 11/10/2023 19:15, R

Re: [HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread Bowen Song via user
ed in 4.1.1. On Sep 11, 2023, at 2:09 PM, Bowen Song via user wrote: *Description* When adding a new node to an existing cluster, the new node bootstrapping fails with the "io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out" error

[HELP] Cassandra 4.1.1 Repeated Bootstrapping Failure

2023-09-11 Thread Bowen Song via user
*Description* When adding a new node to an existing cluster, the new node bootstrapping fails with the "io.netty.channel.unix.Errors$NativeIoException: writeAddress(..) failed: Connection timed out" error from the streaming source node. Resuming the bootstrap with "nodetool bootstrap re

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
ellent, but the repairs are painful.  I come from the Hadoop world where it was all about large servers with lots of disk. Relatively small number of tables, but some have a high number of rows, 10bil + - we use spark to run across all the data. -Joe On 8/17/2023 12:13 PM, Bowen Song via

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
s/ / / *Daemeon Reiydelle* *email: daeme...@gmail.com* *LI: https://www.linkedin.com/in/daemeonreiydelle/* *San Francisco 1.415.501.0198/Skype daemeon.c.m.reiydelle* On Thu, Aug 17, 2023 at 6:13 AM Bowen Song via user wrote: Just pointing out the obvious, for 1PB of data on nodes with 2

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
39 servers, but you'd have to run 40 instances of Cassandra on each server; maybe 24G of heap per instance, so a server with 1TByte of RAM would work. Is this what folks would do? -Joe On 8/17/2023 9:13 AM, Bowen Song via user wrote: Just pointing out the obvious, for 1PB of data on nodes with

Re: Big Data Question

2023-08-17 Thread Bowen Song via user
Just pointing out the obvious, for 1PB of data on nodes with 2TB disk each, you will need far more than 500 nodes. 1, it is unwise to run Cassandra with replication factor 1. It usually makes sense to use RF=3, so 1PB data will cost 3PB of storage space, a minimum of 1500 such nodes. 2, depend
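
The node count above follows from multiplying the data size by the replication factor and dividing by the per-node disk size. A minimal sketch, assuming decimal units (1 PB = 1000 TB) and completely full disks, which is a lower bound only; the function name is illustrative.

```python
def min_nodes(data_tb: int, replication_factor: int, disk_tb_per_node: int) -> int:
    """Lower bound on node count, assuming every disk is filled to 100%."""
    total_tb = data_tb * replication_factor
    # Integer ceiling division: a partially used node still counts as a node.
    return -(-total_tb // disk_tb_per_node)


print(min_nodes(1000, 1, 2))  # RF=1: the naive estimate
print(min_nodes(1000, 3, 2))  # RF=3: three copies of every byte
```

With RF=1 this gives the naive 500 nodes; with RF=3 it gives 1500, before allowing for any free space on disk.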

Re: 2 nodes marked as '?N' in 5 node cluster

2023-08-17 Thread Bowen Song via user
The first thing to look at is the logs, specifically, the /var/log/cassandra/system.log file on each node. 5 seconds time drift is enough to cause Cassandra to fail. You should ensure the time difference between Cassandra nodes is very low by ensuring time sync is working correctly, otherwise cross

Re: Survey about the parsing of the tooling's output

2023-07-10 Thread Bowen Song via user
We parse the output of the following nodetool sub-commands in our custom scripts: * status * netstats * tpstats * ring We don't mind the output format change between major releases as long as all the following are true: 1. major releases are not too frequent e.g. no more frequent than
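
As a rough illustration of the kind of script this message describes, here is a minimal sketch that extracts node state from `nodetool status` text. The sample output, addresses and host IDs are made up for the example, and the column layout is an assumption that may differ between releases, which is exactly the concern raised in this thread. The helper name `parse_status` is hypothetical.

```python
import re

# Hypothetical sample of `nodetool status` output; real output may differ by version.
SAMPLE_STATUS = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.1   256.3 KiB   16      66.7%             11111111-1111-1111-1111-111111111111  rack1
DN  10.0.0.2   301.1 KiB   16      66.7%             22222222-2222-2222-2222-222222222222  rack1
"""

# Node lines start with a status letter (U, D or ?) and a state letter (N, L, J or M).
NODE_LINE = re.compile(r"^([UD?])([NLJM])\s+(\S+)")


def parse_status(text: str) -> list[dict]:
    """Return one dict per node line, ignoring headers and separators."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            status, state, address = match.groups()
            nodes.append({"address": address, "up": status == "U", "state": state})
    return nodes


down_nodes = [n["address"] for n in parse_status(SAMPLE_STATUS) if not n["up"]]
print(down_nodes)
```

Matching only the leading two-letter code and the address keeps the script tolerant of changes to the later columns.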

Re: 4.0 upgrade

2023-07-09 Thread Bowen Song via user
You should not make DDL (e.g. TRUNCATE, ALTER TABLE) or DCL (e.g. GRANT, ALTER ROLE) operations or run repair on a mixed version cluster. Source: https://www.datastax.com/learn/whats-new-for-cassandra-4/migrating-cassandra-4x You should also ensure the gc_grace_seconds value is large enough to

Re: Upgrade from 3.11.5 to 4.1.x

2023-07-09 Thread Bowen Song via user
Assuming "do it in one go" means a rolling upgrade from 3.11.5 to 4.1.2 skipping all version numbers between these two, the answer is yes, you can "do it in one go". On 08/07/2023 01:14, Surbhi Gupta wrote: Hi, We have to upgrade from 3.11.5 to 4.1.x . Can we do it in one go ? Or do we have t

Re: Issue while node addition on cassandra 4.0.7

2023-06-29 Thread Bowen Song via user
27; in telnet. On 29/06/2023 12:42, Bowen Song wrote: Was anyone connecting to the servers' storage port via telnet, nc (netcat) or something similar? 218762506 is 0x0D0A0D0A, which is two newlines (CR LF CR LF). On 29/06/2023 11:49, MyWorld wrote: When we checked the source nodes, we got similar erro
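
The claim that 218762506 is two newlines can be verified with a short sketch using only the Python standard library. The variable names are chosen for illustration.

```python
import struct

# The value reported as "Read" in the InvalidLegacyProtocolMagic error.
value = 218762506

# Pack it as a 4-byte big-endian signed integer to see the bytes that were received.
raw = struct.pack(">i", value)

print(hex(value))
print(raw)
```

The four bytes are CR LF CR LF, which is what a telnet or netcat session sends when Enter is pressed twice.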

Re: Issue while node addition on cassandra 4.0.7

2023-06-29 Thread Bowen Song via user
net.Message$InvalidLegacyProtocolMagic: Read 218762506, Expected -900387334 On Thu, Jun 29, 2023 at 2:57 PM Bowen Song via user wrote: The expected value "-900387334" is the little endian decimal representation of the PROTOCOL_MAGIC value 0xCA552DFA defined in the net/Message.java

Re: Issue while node addition on cassandra 4.0.7

2023-06-29 Thread Bowen Song via user
The expected value "-900387334" is the little endian decimal representation of the PROTOCOL_MAGIC value 0xCA552DFA defined in the net/Message.java file. The
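
The relationship between the two numbers can be checked with a minimal sketch: -900387334 is what 0xCA552DFA becomes when the same four bytes are read as a signed 32-bit integer. The variable names are chosen for illustration.

```python
import struct

# PROTOCOL_MAGIC as quoted in the message above.
PROTOCOL_MAGIC = 0xCA552DFA

# Pack as unsigned 32-bit, unpack the same bytes as signed 32-bit.
# The byte order does not affect the result as long as both calls use the same one.
(signed_value,) = struct.unpack(">i", struct.pack(">I", PROTOCOL_MAGIC))

print(signed_value)
```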

Re: Impact of column names on storage

2023-06-12 Thread Bowen Song via user
Actually, I was wrong. The column names are not stored in the *-Data.db files, but stored in the *-Statistics.db files. Cassandra only stores one copy of the column names per SSTable data file, therefore the disk space usage is negligible. On 12/06/2023 14:31, Bowen Song wrote: The SSTable

Re: Impact of column names on storage

2023-06-12 Thread Bowen Song via user
The SSTable compression will take care of the storage space usage, which means users usually don't need to worry about the length of column names, unless they are ridiculously long and hard to compress, or if SSTable compression is turned off. On 12/06/2023 13:55, Dimpal Gurabani wrote: Hi a

Re: Is cleanup is required if cluster topology changes

2023-05-09 Thread Bowen Song via user
of the suggested methods in this thread, and see how it goes. We will keep you updated on our progress. Thanks a lot once again! Jaydeep On Fri, May 5, 2023 at 8:55 AM Bowen Song via user wrote:

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Bowen Song via user
is down during the node replacement period, we will get availability drop because most of our use case is local_quorum with replication factor 3. On Fri, May 5, 2023 at 5:59 AM Bowen Song via user wrote: Have you thought of using "-Dcassandra.replace_address_first_boot=..." (or

Re: Is cleanup is required if cluster topology changes

2023-05-05 Thread Bowen Song via user
Have you thought of using "-Dcassandra.replace_address_first_boot=..." (or "-Dcassandra.replace_address=..." if you are using an older version)? This will not result in a topology change, which means "nodetool cleanup" is not needed after the operation is completed. On 05/05/2023 05:24, Jaydee

Re: Optimization for partitions with high number of rows

2023-04-16 Thread Bowen Song via user
in this table is such that all data for a given row is written at the same time, so I know I can use frozen udt instead of this, making it faster, but I wonder if there is another way. On Tue, Apr 11, 2023 at 9:06 PM Bowen Song via user wrote: Reading 4MB from 70k rows and 13 colum
