Thanks so much for the detailed explanation

From: Elliott Sims via user <user@cassandra.apache.org>
Sent: Thursday, July 24, 2025 7:34 AM
To: user@cassandra.apache.org
Cc: Elliott Sims <elli...@backblaze.com>
Subject: Re: num_token effects

You're correct about the DC migration being the safest/best option.
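For reference, a minimal sketch of that flow, assuming NetworkTopologyStrategy and client drivers pinned to the old DC during the move (keyspace and DC names below are placeholders):

    # 1. Bring up the new DC with num_tokens=16, then add it to the
    #    replication of each relevant keyspace (system_auth etc. included):
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};"

    # 2. On each node in the new DC, stream the existing data from the old DC:
    nodetool rebuild -- old_dc

    # 3. After clients are switched over, drop the old DC from replication
    #    and decommission its nodes one at a time:
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'new_dc': 3};"
    nodetool decommission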

Another option would be to change the token count in smaller increments so as not to
overload the old nodes, but this is an incredibly slow and painful process that
you almost certainly shouldn't attempt.

One of the biggest downsides to higher token counts is repair cost.  The 
subrange repairs managed by Reaper actually dampen the impact a lot here - 
pre-Reaper, especially with Cassandra <=3.0, repairs could lead to sstable 
explosion and GC load that could basically collapse the cluster.  With 
Cassandra 3.11+ and Reaper the impact is way less severe.  It'll just make 
repairs slower.  In my experience a few years ago, going from 256 to 16 cut 
repair times in half.
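To make "subrange" concrete: Reaper splits the ring into many small segments and
repairs them one at a time, which in nodetool terms looks roughly like the call
below (-st/-et are nodetool repair's real start/end-token flags; the token values
and keyspace name here are made up):

    # Repair just one small slice of the ring rather than every range the
    # node owns at once, bounding the SSTable/GC impact per operation:
    nodetool repair -st -9223372036854775808 -et -9200000000000000000 my_keyspace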

There are also some availability implications to higher num_tokens values, partly 
mitigated by using NetworkTopologyStrategy: 
https://github.com/jolynch/python_performance_toolkit/raw/master/notebooks/cassandra_availability/whitepaper/cassandra-availability-virtual.pdf
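(NetworkTopologyStrategy is selected per keyspace; as a quick illustration, with a
placeholder keyspace and DC name:)

    # Replicate 3 copies per datacenter, placed rack-aware within each DC:
    cqlsh -e "CREATE KEYSPACE my_keyspace WITH replication =
      {'class': 'NetworkTopologyStrategy', 'dc1': 3};"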

Node-joining speed is kind of a weird mix of benefit/deficit.  As far as I can 
tell, each token range is streamed in and handled by a thread, so if you have 32 
cores and at least 32 (probably 96, if RF=3) nodes already in the cluster, 
num_tokens=32 will be faster than num_tokens=16.  There are very diminishing 
returns much above that, though: past that point, a cluster large enough to 
benefit is also going to see more pain (node rejoin times, repair times, etc.) 
from the larger number of token ranges.
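If you do end up standing up nodes with a lower token count, the relevant
cassandra.yaml settings would look something like this (note that num_tokens
cannot be changed on an existing node without wiping and re-bootstrapping it;
the allocation option below exists in 4.0+):

    # cassandra.yaml on a *new* node:
    num_tokens: 16
    # Let the token allocator balance ranges for your replication factor
    # instead of picking random tokens:
    allocate_tokens_for_local_replication_factor: 3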


On Tue, Jul 22, 2025 at 6:39 PM Isaeed Mohanna <isa...@xsense.co> wrote:
Hi
In a 4-node cluster with replication factor 3, we have been using the default 
num_tokens=256 since Cassandra 3. We upgraded to Cassandra 4.1 last year and are 
planning to upgrade to 5 soon, and we see that the recommended value has changed 
to 16.
An attempt to do a rolling rebuild of the cluster with num_tokens=16 failed: once 
I take down one of the old nodes, most of its data streams to the other old 
nodes, which then become strained and at high risk of crashing.
Each of the old nodes stores ~700 GB of data.
To my understanding, the safest option is to set up a second DC, join it to the 
cluster, and then decommission the old one. But I would like to understand how 
bad it is to keep num_tokens at 256, and how it will affect day-to-day use in 
4.1 and Cassandra 5 in practice.
Thanks for any help,
Isaeed Mohanna


Best regards,
Isaeed Mohanna, Software Development Manager    
Intelligent Cold Chain Monitoring
E-Mail: isa...@xsense.co | Web: www.xsense.co

If you have further questions, please don’t hesitate to contact me.


This email, including its contents and any attachment(s), may contain 
confidential and/or proprietary information and is solely for the review and 
use of the intended recipient(s). If you have received this email in error, 
please notify the sender and permanently delete this email, its content, and 
any attachment(s). Any disclosure, copying, or taking of any action in reliance 
on an email received in error is strictly prohibited.
