+1. I've been making a case for this for some time now, and it was actually
a focus of my talk last week. I'd be very happy to get this into 4.0.

We've tested various num_tokens values with the algorithm on clusters of
various sizes, and we've found that 16 typically works best. With lower
numbers, balance is good initially, but problems appear as a cluster grows.
E.g. on a 60 node cluster with 8 tokens per node we saw a difference of 22%
in token ownership, but on a cluster of 12 nodes or fewer, a difference of
only 12%. 16 tokens, on the other hand, wasn't perfect but generally gave
better balance regardless of cluster size, at least up to 100 nodes. To be
honest, we should probably do some proper testing and record all the
results before we pick a default (I'm happy to do this - I think we can use
the original testing script for it).
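To make "balance" measurable in a hand-rolled test, here's a rough Python
sketch of the random-allocation case: assign num_tokens random tokens per
node and report the max/min ownership ratio. It models only random
placement (not the allocation algorithm we tested), so treat it as an
illustration of the imbalance problem rather than a reproduction of the
numbers above.

```python
import random

def ownership_spread(num_nodes, num_tokens, ring_size=2**64, seed=0):
    """Max/min ratio of token ownership on a ring with random tokens.

    Perfect balance -> 1.0; larger values mean worse imbalance.
    """
    rng = random.Random(seed)
    # (token, node) pairs sorted clockwise around the ring
    tokens = sorted((rng.randrange(ring_size), n)
                    for n in range(num_nodes)
                    for _ in range(num_tokens))
    owned = [0] * num_nodes
    for i, (tok, node) in enumerate(tokens):
        prev = tokens[i - 1][0]
        # each token owns the range from the previous token (wraps at i=0)
        owned[node] += (tok - prev) % ring_size
    shares = [o / ring_size for o in owned]
    return max(shares) / min(shares)

for t in (4, 8, 16, 256):
    print(t, round(ownership_spread(60, t), 2))
```

With random placement the spread shrinks as num_tokens grows, which is
exactly why a low token count needs the allocation algorithm rather than
random assignment.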

But anyway, I'd say Jon is on the right track. Personally, here's how I'd
like to see it done:

   1. Change allocate_tokens_for_keyspace to allocate_tokens_for_rf, in the
   same way that DSE does it: allow a user to specify an RF to allocate
   for, and allow multiple DCs.
   2. Add a new boolean property, random_token_allocation, defaulting to
   false.
   3. Make allocate_tokens_for_rf default to unset.*
   4. Make allocate_tokens_for_rf required** if num_tokens > 1 and
   random_token_allocation != true.
   5. Default num_tokens to 16 (or whatever we find appropriate).
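In cassandra.yaml terms, the proposal above might look something like this
(the property names are the proposed ones from points 1-5, not settings
that exist today):

```yaml
# Proposed defaults -- nothing here is final.
num_tokens: 16

# No default -- must be set explicitly when num_tokens > 1,
# unless random_token_allocation is enabled. E.g.:
# allocate_tokens_for_rf: 3

# Opt-in escape hatch for the old random allocation behaviour
# (backwards compatibility / testing only).
random_token_allocation: false
```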

* I think setting a default is asking for trouble. When people add new
DCs/nodes, we don't want to risk them adding a node with the wrong RF. I
think it's safe to say that a user should have to think about this before
they spin up their cluster.
** Following from the above, it should be required so that we don't have
people accidentally using random allocation. I think we should really be
aiming to get rid of random allocation completely, but provide a new
property to enable it for backwards compatibility (and for testing).
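The rule in points 3-4 (and footnote **) amounts to a simple startup
check. A sketch in Python, using the proposed (not existing) property
names:

```python
def validate_token_config(num_tokens, allocate_tokens_for_rf=None,
                          random_token_allocation=False):
    """Refuse to start with vnodes unless the operator chose an
    allocation strategy explicitly (proposed behaviour, points 3-4)."""
    if (num_tokens > 1
            and allocate_tokens_for_rf is None
            and not random_token_allocation):
        raise ValueError(
            "allocate_tokens_for_rf must be set when num_tokens > 1 "
            "(or explicitly enable random_token_allocation)")
    return True
```

The point is that the only way to get random allocation is to ask for it
by name.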

It's worth noting that a higher number of tokens *theoretically* decreases
the time for replacement/rebuild, so if we're considering QUORUM
availability with vnodes there's an argument against a very low
num_tokens. I think it's better to use NTS and racks to reduce the chance
of a QUORUM outage than to bank on a low token count: unless you go all
the way down to 1 token, you're just relying on luck that the replica sets
of 2 failed nodes don't overlap. I guess what I'm saying is that we should
choose a num_tokens that gives the best distribution for most cluster
sizes, rather than one that "decreases" the probability of an outage.
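To make the "relying on luck" point concrete, here's a rough Python sketch
that, for a random single-DC ring with RF=3, no racks, and simple
clockwise replica placement (all simplifying assumptions), counts the
fraction of 2-node failure combinations that would lose QUORUM for at
least one token range. More tokens per node means more distinct
neighbours, so the fraction climbs quickly.

```python
import itertools
import random

def quorum_risk(num_nodes, num_tokens, rf=3, seed=0):
    """Fraction of 2-node failure pairs that break QUORUM somewhere,
    on a random ring with clockwise replica placement."""
    rng = random.Random(seed)
    tokens = sorted((rng.random(), n)
                    for n in range(num_nodes)
                    for _ in range(num_tokens))
    owners = [n for _, n in tokens]
    risky = set()  # node pairs that share at least one replica set
    for i in range(len(owners)):
        # replica set: next rf distinct nodes clockwise from this token
        replicas, j = [], i
        while len(replicas) < rf and j < i + len(owners):
            n = owners[j % len(owners)]
            if n not in replicas:
                replicas.append(n)
            j += 1
        # with RF=3, QUORUM is lost if any 2 of these go down together
        risky.update(itertools.combinations(sorted(replicas), 2))
    total_pairs = num_nodes * (num_nodes - 1) // 2
    return len(risky) / total_pairs

for t in (1, 4, 16, 256):
    print(t, round(quorum_risk(60, t), 3))
```

With 256 tokens nearly every pair of nodes shares a replica set, so any 2
failures hit QUORUM somewhere; with very few tokens the risk is lower, but
unless num_tokens is 1 it still comes down to luck - which is the point
about preferring racks/NTS above.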

Also I think we should continue using CASSANDRA-13701 to track this. TBH I
think in general we should be a bit better at searching for and using
existing tickets...

On Sat, 22 Sep 2018 at 18:13, Stefan Podkowinski <s...@apache.org> wrote:

> There already have been some discussions on this here:
> https://issues.apache.org/jira/browse/CASSANDRA-13701
>
> The mentioned blocker there on the token allocation shouldn't exist
> anymore. Although it would be good to get more feedback on it, in case
> we want to enable it by default, along with new defaults for number of
> tokens.
>
>
> On 22.09.18 06:30, Dinesh Joshi wrote:
> > Jon, thanks for starting this thread!
> >
> > I have created CASSANDRA-14784 to track this.
> >
> > Dinesh
> >
> >> On Sep 21, 2018, at 9:18 PM, Sankalp Kohli <kohlisank...@gmail.com>
> wrote:
> >>
> >> Putting it on JIRA is to make sure someone is assigned to it and it is
> tracked. Changes should be discussed over ML like you are saying.
> >>
> >> On Sep 21, 2018, at 21:02, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >>
> >>>> We should create a JIRA to find what other defaults we need revisit.
> >>> Changing a default is a pretty big deal, I think we should discuss any
> >>> changes to defaults here on the ML before moving it into JIRA.  It's
> nice
> >>> to get a bit more discussion around the change than what happens in
> JIRA.
> >>>
> >>> We (TLP) did some testing on 4 tokens and found it to work surprisingly
> >>> well.   It wasn't particularly formal, but we verified the load stays
> >>> pretty even with only 4 tokens as we added nodes to the cluster.
> Higher
> >>> token count hurts availability by increasing the number of nodes any
> given
> >>> node is a neighbor with, meaning any 2 nodes that fail have an
> increased
> >>> chance of downtime when using QUORUM.  In addition, with the recent
> >>> streaming optimization it seems the token counts will give a greater
> chance
> >>> of a node streaming entire sstables (with LCS), meaning we'll do a
> better
> >>> job with node density out of the box.
> >>>
> >>> Next week I can try to put together something a little more convincing.
> >>> Weekend time.
> >>>
> >>> Jon
> >>>
> >>>
> >>> On Fri, Sep 21, 2018 at 8:45 PM sankalp kohli <kohlisank...@gmail.com>
> >>> wrote:
> >>>
> >>>> +1 to lowering it.
> >>>> Thanks Jon for starting this.We should create a JIRA to find what
> other
> >>>> defaults we need revisit. (Please keep this discussion for "default
> token"
> >>>> only.  )
> >>>>
> >>>>> On Fri, Sep 21, 2018 at 8:26 PM Jeff Jirsa <jji...@gmail.com> wrote:
> >>>>>
> >>>>> Also agree it should be lowered, but definitely not to 1, and
> probably
> >>>>> something closer to 32 than 4.
> >>>>>
> >>>>> --
> >>>>> Jeff Jirsa
> >>>>>
> >>>>>
> >>>>>> On Sep 21, 2018, at 8:24 PM, Jeremy Hanna <
> jeremy.hanna1...@gmail.com>
> >>>>> wrote:
> >>>>>> I agree that it should be lowered. What I’ve seen debated a bit in
> the
> >>>>> past is the number but I don’t think anyone thinks that it should
> remain
> >>>>> 256.
> >>>>>>> On Sep 21, 2018, at 7:05 PM, Jonathan Haddad <j...@jonhaddad.com>
> >>>> wrote:
> >>>>>>> One thing that's really, really bothered me for a while is how we
> >>>>> default
> >>>>>>> to 256 tokens still.  There's no experienced operator that leaves
> it
> >>>> as
> >>>>> is
> >>>>>>> at this point, meaning the only people using 256 are the poor folks
> >>>> that
> >>>>>>> just got started using C*.  I've worked with over a hundred
> clusters
> >>>> in
> >>>>> the
> >>>>>>> last couple years, and I think I only worked with one that had
> lowered
> >>>>> it
> >>>>>>> to something else.
> >>>>>>>
> >>>>>>> I think it's time we changed the default to 4 (or 8, up for
> debate).
> >>>>>>>
> >>>>>>> To improve the behavior, we need to change a couple other things.
> The
> >>>>>>> allocate_tokens_for_keyspace setting is... odd.  It requires you
> have
> >>>> a
> >>>>>>> keyspace already created, which doesn't help on new clusters.  What
> >>>> I'd
> >>>>>>> like to do is add a new setting, allocate_tokens_for_rf, and set
> it to
> >>>>> 3 by
> >>>>>>> default.
> >>>>>>>
> >>>>>>> To handle clusters that are already using 256 tokens, we could
> prevent
> >>>>> the
> >>>>>>> new node from joining unless a -D flag is set to explicitly allow
> >>>>>>> imbalanced tokens.
> >>>>>>>
> >>>>>>> We've agreed to a trunk freeze, but I feel like this is important
> >>>> enough
> >>>>>>> (and pretty trivial) to do now.  I'd also personally characterize
> this
> >>>>> as a
> >>>>>>> bug fix since 256 is horribly broken when the cluster gets to any
> >>>>>>> reasonable size, but maybe I'm alone there.
> >>>>>>>
> >>>>>>> I honestly can't think of a use case where random tokens is a good
> >>>>> choice
> >>>>>>> anymore, so I'd be fine / ecstatic with removing it completely and
> >>>>>>> requiring either allocate_tokens_for_keyspace (for existing
> clusters)
> >>>>>>> or allocate_tokens_for_rf
> >>>>>>> to be set.
> >>>>>>>
> >>>>>>> Thoughts?  Objections?
> >>>>>>> --
> >>>>>>> Jon Haddad
> >>>>>>> http://www.rustyrazorblade.com
> >>>>>>> twitter: rustyrazorblade
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> >>>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>
> >>> --
> >>> Jon Haddad
> >>> http://www.rustyrazorblade.com
> >>> twitter: rustyrazorblade
> >>
> >
> >
>
>
>
>
