RE: [External]Cassandra 5.0: Any Official Tests Supporting 'Free Performance Gains'

2025-03-19 Thread Jiri Steuer (EIT)
Hi FMH,

I haven't seen any official tests either; that is why I ran these tests myself 
with the official tools. Regards

   J. Steuer
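
P.S. For anyone who wants to reproduce this kind of comparison: cassandra-stress 
ships with Cassandra and is the usual official load tool. The commands below are 
only an illustrative sketch of such a write/read run (consistency level, thread 
count and node address are placeholders, not the exact profile from the article):

# populate, then read back, against a single test node
cassandra-stress write n=1000000 cl=ONE -rate threads=50 -node 127.0.0.1
cassandra-stress read n=1000000 cl=ONE -rate threads=50 -node 127.0.0.1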




From: FMH 
Sent: Wednesday, March 19, 2025 3:14 PM
To: Cassandra Support-user 
Subject: [External]Cassandra 5.0: Any Official Tests Supporting 'Free 
Performance Gains'



As I'm evaluating an upgrade to C* 4 or 5, one statement about the 5.0 release 
caught my attention 
(https://cassandra.apache.org/_/blog/Apache-Cassandra-5.0-Announcement.html):
"Trie Memtables and Trie SSTables: These low-level optimizations yield 
impressive gains in memory usage and storage efficiency, providing a 'free' 
performance ..."

I have found only a single document showcasing empirical evidence for such 
performance gains. According to that document, compared to version 4.1, C* 5 had:
- 38% better performance and 26% better response time for write operations
- 12% better performance and 9% better response time for read operations

I'm just wondering whether there are any official test results supporting the 
claim of 'free' performance.

I'm trying to corroborate the test results described above.

https://www.linkedin.com/pulse/performance-comparison-between-cassandra-version-41-5-jiri-steuer-pxbtf/


Thank you



Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

2025-03-19 Thread William Crowell via user
Bowen, Fabien, Stéphane, and Luciano,

A bit more information here...

We have not run incremental repairs, and we have not made any changes to the 
compression properties on the tables.

When we first started the database, the TTL on the records was set to 0, but 
now it is set to 10 days.
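
As I understand it, a TTL only applies to writes made after it is in effect, so 
rows written while the TTL was 0 will not expire on their own. If the 10 days 
were set as a table default rather than per write, the setting can be confirmed 
with something like this (table name taken from the listing below):

# table-level default TTL is in seconds; 10 days = 864000
cqlsh -e "DESCRIBE TABLE keyspace1.table1;" | grep default_time_to_live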

We do have one table in a keyspace that is occupying 84.1GB of disk space:

ls -l /var/lib/cassandra/data/keyspace1/table1
…
-rw-rw-r--. 1  x 84145170181 Mar 18 08:28 nb-163033-big-Data.db
…

Regards,

William Crowell

From: William Crowell via user 
Date: Friday, March 14, 2025 at 10:53 AM
To: user@cassandra.apache.org 
Cc: William Crowell , Bowen Song 
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3
Bowen,

This is just a single Cassandra node.  Unfortunately, I cannot get on the box 
at the moment, but the following configuration is in cassandra.yaml:

snapshot_before_compaction: false
auto_snapshot: true
incremental_backups: false

The only configuration parameter changed besides the keystore and truststore 
was num_tokens (default: 16):

num_tokens: 256

I also noticed the compression ratio on the largest table is not good:  
0.566085855123187
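
For reference, the SSTable compression ratio is compressed size divided by 
uncompressed size, so 0.566 means the data currently compresses to roughly 57% 
of its original size. A sketch of how to see the compression parameters in 
effect, using the table from the directory listing above:

cqlsh -e "DESCRIBE TABLE keyspace1.table1;" | grep compression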

Regards,

William Crowell

From: Bowen Song via user 
Date: Friday, March 14, 2025 at 10:13 AM
To: William Crowell via user 
Cc: Bowen Song 
Subject: Re: Increased Disk Usage After Upgrading From Cassandra 3.x.x to 4.1.3

A few suspects:

* snapshots, which could've been created automatically, such as by dropping or 
truncating tables when auto_snapshot is set to true, or by compaction when 
snapshot_before_compaction is set to true

* backups, which could've been created automatically, e.g. when 
incremental_backups is set to true

* mixing repaired and unrepaired sstables, which is usually caused by 
incremental repairs, even if they have only been run once

* partially upgraded cluster, e.g. mixed Cassandra version in the same cluster

* token ring change (e.g. adding or removing nodes) without "nodetool cleanup"

* actual increase in data size

* changes made to the tables' compression properties



To find the root cause, you will need to check the file/folder sizes to find 
out what is using the extra disk space, and may also need to review the 
cassandra.yaml file (or post it here with sensitive information removed) and 
any actions you've made to the cluster prior to the first appearance of the 
issue.
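
For example, something along these lines (a sketch assuming the default data 
directory; adjust paths to your installation):

# largest per-table data directories
du -sh /var/lib/cassandra/data/*/* | sort -h | tail -20
# space held by snapshots and incremental backups
nodetool listsnapshots
du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null | sort -h | tail
du -sh /var/lib/cassandra/data/*/*/backups 2>/dev/null | sort -h | tail
# repaired vs unrepaired bytes on a specific table
nodetool tablestats keyspace1.table1 | grep -i repair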



Also, manually running major compactions is not advised.
On 12/03/2025 20:26, William Crowell via user wrote:
Hi.  A few months ago, I upgraded a single-node Cassandra instance from version 
3 to 4.1.3.  This instance is not very large, with about 15 to 20 gigabytes of 
data on version 3, but after the upgrade it has grown substantially to over 
100 GB.  I do a compaction once a week and take a snapshot, but with the 
increase in data the compaction has become a much lengthier process.  I also 
ran sstableupgrade as part of the upgrade.  Is there any reason for the 
increased size of the database on the file system?
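
For reference, one quick check related to the upgrade itself is whether any 
3.x-format SSTables were left behind by sstableupgrade; this is a sketch 
assuming the default data directory, where current files use the nb- prefix and 
anything still carrying a 3.x m*- prefix was not rewritten:

find /var/lib/cassandra/data -name '*-Data.db' ! -name 'nb-*' ! -path '*/snapshots/*'
# snapshots hold hard links to old files and are excluded above;
# a specific snapshot can be removed with: nodetool clearsnapshot -t <tag>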

I am using the default STCS compaction strategy.  My “nodetool cfstats” on a 
heavily used table looks like this:

Keyspace : 
Read Count: 48089
Read Latency: 12.52872569610514 ms
Write Count: 1616682825
Write Latency: 0.0067135265490310386 ms
Pending Flushes: 0
Table: sometable
SSTable count: 13
Old SSTable count: 0
Space used (live): 104005524836
Space used (total): 104005524836
Space used by snapshots (total): 0
Off heap memory used (total): 116836824
SSTable Compression Ratio: 0.566085855123187
Number of partitions (estimate): 14277177
Memtable cell count: 81033
Memtable data size: 13899174
Memtable off heap memory used: 0
Memtable switch count: 13171
Local read count: 48089
Local read latency: NaN ms
Local write count: 1615681213
Local write latency: 0.005 ms
Pending flushes: 0
Percent repaired: 0.0
Bytes repaired: 0.000KiB
Bytes unrepaired: 170.426GiB
Bytes pending repair: 0.000KiB
Bloom filter false positives: 125
Bloom filter false ratio: 0.00494
Bloom filter space used: 24656936
Bloom filter off heap memory used: 24656832
Index summary off heap memory used: 2827608
Compression metadata off heap memory used: 89352384
Compacted partition minimum bytes: 73
Compacted partition maximum bytes: 61214
Compacted partition mean bytes: 11888
Average live cells per slice (last five minutes): NaN
Maximum live cells per slice (last five minutes

Cassandra 5.0: Any Official Tests Supporting 'Free Performance Gains'

2025-03-19 Thread FMH
As I'm evaluating an upgrade to C* 4 or 5, one statement about the 5.0 release
caught my attention (
https://cassandra.apache.org/_/blog/Apache-Cassandra-5.0-Announcement.html):
"Trie Memtables and Trie SSTables: These low-level optimizations yield
impressive gains in memory usage and storage efficiency, providing a 'free'
performance ..."

I have found only a single document showcasing empirical evidence for such
performance gains. According to that document, compared to version 4.1, C* 5
had:
- 38% better performance and 26% better response time for write operations
- 12% better performance and 9% better response time for read operations

I'm just wondering whether there are any official test results supporting
the claim of 'free' performance.

I'm trying to corroborate the test results described above.

https://www.linkedin.com/pulse/performance-comparison-between-cassandra-version-41-5-jiri-steuer-pxbtf/


Thank you


Cassandra 4 on RHEL 9 with fapolicyd

2025-03-19 Thread Surbhi Gupta
Hi,

Has anyone been able to make fapolicyd work with Cassandra 4 on RHEL 9?
We tried, but a fresh 3-node cluster initially took around 3 hours to spin
up: at first the schema pull does not happen, and eventually everything
starts connecting.
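
For reference, the kind of fapolicyd allow rule and trust update that a service
like Cassandra typically needs looks roughly like this; the rule file name and
install paths are illustrative assumptions, not what we actually deployed:

# /etc/fapolicyd/rules.d/80-cassandra.rules
#   allow perm=any uid=cassandra : dir=/var/lib/cassandra/
#   allow perm=any uid=cassandra : dir=/opt/cassandra/
fagenrules --load
systemctl restart fapolicyd
# alternatively, mark the Cassandra install as trusted:
fapolicyd-cli --file add /opt/cassandra/ --trust-file cassandra
fapolicyd-cli --update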

Has anyone set up fapolicyd?
Please advise.

Thanks
Surbhi