[
https://issues.apache.org/jira/browse/CASSANDRA-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120762#comment-17120762
]
Erick Ramirez commented on CASSANDRA-15521:
-------------------------------------------
We should summarise what's been discussed [so far] on the dev ML in this
ticket. I think we've narrowed it down to {{num_tokens: 8}} and
{{num_tokens: 16}}, but my take from the ML thread is that we're leaning
towards 16. (I'm happy to be corrected.) :)
> Update default for num_tokens from 256 to something more reasonable
> -------------------------------------------------------------------
>
> Key: CASSANDRA-15521
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15521
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Virtual Nodes
> Reporter: Jeremy Hanna
> Assignee: Jeremy Hanna
> Priority: Normal
>
> The default for num_tokens, the number of token ranges assigned to a node
> using virtual nodes, is far too high. 256 token ranges make repair painful.
> Since it's a default, someone new to Cassandra won't know better and, if it
> is left unchanged, they will have to live with it or migrate to a new
> datacenter with a lower number.
> At the same time, going too low with the default allocation algorithm can
> create hotspots, where some nodes own a larger share of the token ring than
> others. A newer token allocation algorithm has been introduced, but it is
> not the default.
> The proposal of this ticket is to set the default to something more
> reasonable that aligns with best practices without requiring the new token
> allocation algorithm or manually assigned token values, as some operators
> use. 32 is a good compromise and is what the project uses in many of its
> tests.
> So generally it would be good to move to a saner value and to align with
> testing, so users can be more confident that the defaults have a lot of
> testing behind them.
> As discussed on the dev mailing list, we want to make sure this change to the
> default doesn't come as an unpleasant surprise to cluster operators. For
> num_tokens specifically, if an operator upgrades to a version with the new
> default and doesn't set it back to the existing value, the node will not
> start, reporting that num_tokens cannot be changed on an existing node. So
> we will want a release note telling operators to check the num_tokens
> change when reviewing the new configuration during an upgrade.
> Besides nodes failing to start, which is fail-fast, there is the matter of
> adding new nodes to the cluster. You can certainly add a new node to a
> cluster or datacenter with a different number of token ranges assigned, but
> that node will be responsible for a different amount of data. For
> example, if the nodes in a datacenter all have num_tokens=256 (current
> default) and you add a node to that datacenter with num_tokens=32 (new
> default), it will claim only 1/8th as many token ranges, and so roughly
> 1/8th as much data, as the other nodes in that datacenter. Fortunately,
> this is a property that is explicitly defined rather than implicit like
> some of the table settings. Also, most if not all operators will upgrade
> the existing nodes to the new version before trying to add a node with
> that version. So if num_tokens differs on the existing nodes, they'll be
> aware of it immediately.
> In any case, this is a long proposal for what will be a small change in the
> cassandra.yaml and something in the release notes, that is, changing the
> default num_tokens value from 256 to 32.
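
The 1/8th figure in the description can be checked with a small calculation. The sketch below is only an illustration, not Cassandra code: it assumes a node's expected share of the ring is proportional to its num_tokens (which holds on average for randomly placed tokens and ignores replication), and the node names and the helper expected_ownership are hypothetical.

```python
from fractions import Fraction


def expected_ownership(num_tokens_per_node):
    """Return each node's expected share of the ring.

    Assumption: ownership is proportional to the node's token count,
    which is only true on average for randomly allocated tokens.
    """
    total = sum(num_tokens_per_node.values())
    return {
        node: Fraction(tokens, total)
        for node, tokens in num_tokens_per_node.items()
    }


# Hypothetical datacenter: three existing nodes with the old default (256)
# and one newly added node with the proposed default (32).
cluster = {"node1": 256, "node2": 256, "node3": 256, "new_node": 32}
shares = expected_ownership(cluster)

# Compare the new node's expected share with an existing node's share.
ratio = shares["new_node"] / shares["node1"]
print(ratio)  # 1/8
```

Under these assumptions the ratio depends only on the two num_tokens values (32/256), not on how many existing nodes the datacenter has.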