Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Mick Semb Wever


> TLDR, based on availability concerns, skew concerns, operational 
> concerns, and based on the fact that the new allocation algorithm can 
> be configured fairly simply now, this is a proposal to go with 4 as the 
> new default and the allocate_tokens_for_local_replication_factor set to 
> 3.  


I'm uncomfortable going with the default of `num_tokens: 4`.
I would rather see a default of `num_tokens: 16` based on the following…

a) 4 num_tokens does not provide a good out-of-the-box experience.
b) 4 num_tokens doesn't provide any significant streaming benefits over 16.
c) edge-case availability doesn't trump (a) & (b).
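
For concreteness, the two competing defaults would look like this in cassandra.yaml (both settings exist in 4.0; the comments are mine):

```yaml
# Original proposal:
num_tokens: 4
allocate_tokens_for_local_replication_factor: 3

# Counter-proposal in this mail:
num_tokens: 16
allocate_tokens_for_local_replication_factor: 3
```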


For (a)…
 The first node in each rack, up to RF racks, in each datacenter can't use the
allocation strategy. With 4 num_tokens, 3 racks and RF=3, the first three nodes
will be poorly balanced. If three poorly balanced nodes in a cluster is a
problem (because the cluster is small enough), then 4 is the wrong default.
From our own experience, we have had to bootstrap these nodes multiple times
until they generated something acceptable. In practice 4 num_tokens (over 16)
has given clients more headache than gain.

Elaborating: 256 was originally chosen because the token randomness over that
many tokens always averaged out. With a default of
`allocate_tokens_for_local_replication_factor: 3` this issue is largely solved,
but you will still have those initial nodes with randomly generated tokens.
Ref:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/tokenallocator/ReplicationAwareTokenAllocator.java#L80
And to be precise: tokens are randomly generated until there is a node in each
rack, up to RF racks. So, if you have RF=3, in theory (or if you're a newbie)
you could boot 100 nodes in only the first two racks, and they would all have
random tokens regardless of the allocate_tokens_for_local_replication_factor
setting.

For example, using 4 num_tokens, 3 racks and RF=3…
 - in a 6 node cluster, there's a total of 24 tokens, half of which are random,
 - in a 9 node cluster, there's a total of 36 tokens, a third of which are
random,
 - etc

Following this logic, I would not be willing to apply 4 unless I knew there
would be more than 36 nodes in each data centre, i.e. less than ~8% of the
tokens randomly generated. Many clusters never reach that size, and imho that's
why 4 is a bad default.

A default of 16, by the same logic, needs only 9 nodes in each dc to average
out that randomness.
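
A quick sketch of that arithmetic (assuming, per the allocator behaviour above, that the first RF nodes in a dc get fully random tokens, and reading "the same logic" as reaching the same ~144-token total per dc):

```python
# Back-of-the-envelope sketch of the random-token arithmetic above.
# Assumption: the first RF nodes in a datacenter (one per rack) get
# fully random tokens; every node after that uses the allocator.

RF = 3

def random_token_stats(nodes, num_tokens, rf=RF):
    total = nodes * num_tokens
    random_tokens = min(nodes, rf) * num_tokens
    return total, random_tokens, random_tokens / total

for nodes in (6, 9, 36):
    total, rnd, frac = random_token_stats(nodes, num_tokens=4)
    print(f"num_tokens=4, {nodes:2d} nodes: {rnd}/{total} tokens random ({frac:.0%})")
# num_tokens=4,  6 nodes: 12/24 tokens random (50%)
# num_tokens=4,  9 nodes: 12/36 tokens random (33%)
# num_tokens=4, 36 nodes: 12/144 tokens random (8%)

# num_tokens=16 reaches the same 144-token total at 9 nodes per dc:
print(random_token_stats(9, 16))  # (144, 48, 0.333…)
```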

The workaround to all this is manually defining `initial_token: …` on those
initial nodes. I'm really not keen on imposing that upon new users.
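
For illustration, a minimal sketch of that workaround (my own, not project tooling): evenly spaced Murmur3 tokens, interleaved across the first nodes. Rack/dc offsets are left out for brevity.

```python
# Compute evenly spaced initial_token values, interleaved so each of
# the first nodes owns tokens spread around the whole ring.
# Murmur3Partitioner's token range is [-2**63, 2**63).

def initial_tokens(num_tokens, num_nodes):
    total = num_tokens * num_nodes
    return [
        [(2**64 * (i * num_nodes + n) // total) - 2**63
         for i in range(num_tokens)]
        for n in range(num_nodes)
    ]

for n, toks in enumerate(initial_tokens(num_tokens=4, num_nodes=3)):
    print(f"node {n}: initial_token: {','.join(map(str, toks))}")
```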

For (b)…
 there have already been a number of streaming improvements that remove much of
whatever difference exists between 4 and 16 num_tokens. And 4 num_tokens means
bigger token ranges, so it could well be disadvantageous due to over-streaming.

For (c)…
 we are trying to optimise availability in situations where we can never
guarantee availability. I understand it's a nice operational advantage to have
in a shit-show, but it's not a property you can design for and rely upon.
There's also the question of availability versus the size of the token-range
that becomes unavailable.



regards,
Mick





Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Alexander Dejanovski
While I (mostly) understand the maths behind using 4 vnodes as a default
(which really is a question of extreme availability), I don't think 4 vnodes
provide noticeable performance improvements over using 16, while 16 vnodes
will protect folks from imbalances. It is very hard to deal with unbalanced
clusters, and people often start dealing with it only once some nodes are
already close to full. Operationally, it's far from trivial.
We're going to run some experiments bootstrapping clusters with 4
tokens on the latest alpha to see how much balance we can expect, and how
removing one node could impact it.

If we're talking about repairs, using 4 vnodes will generate overstreaming,
which can create lots of serious performance issues. Even on clusters with
500GB of node density, we never use fewer than ~15 segments per node with
Reaper.
Not everyone uses Reaper, obviously, and there will be no protection
against overstreaming with such a low default for folks not using subrange
repairs.
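
Rough numbers behind that concern (my arithmetic, assuming data is spread evenly across a node's vnode ranges):

```python
density_gb = 500  # node density from the example above

# Without subrange repair, the smallest repairable unit is one vnode range.
for vnodes in (4, 16):
    print(f"{vnodes:2d} vnodes -> ~{density_gb / vnodes:.0f} GB per vnode range")
# 4 vnodes -> ~125 GB per range; 16 vnodes -> ~31 GB per range,
# vs. Reaper's ~15+ segments per node, i.e. ~33 GB per repair segment.
```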
On small clusters, even with 256 vnodes, using Cassandra 3.0/3.x and Reaper
already allows good repair performance, because token ranges sharing
the exact same replicas will be processed in a single repair session. On
large clusters, I reckon it's good to have far fewer vnodes to speed up
repairs.

Cassandra 4.0 is supposed to aim at providing a rock-stable release of
Cassandra, fixing past instabilities, and I think lowering the default to 4
tokens defeats that purpose.
16 tokens is a reasonable compromise for clusters of all sizes, without
being too aggressive. Those with enough C* experience can still lower that
number for their clusters.

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Joshua McKenzie
>
> We should be using the default value that benefits the most people, rather
> than an arbitrary compromise.

I'd caution we're talking about the default value *we believe* will benefit
the most people according to our respective understandings of C* usage.

> Most clusters don't shrink, they stay the same size or grow. I'd say 90%
> or more fall in this category.

While I agree with the "most don't shrink, they stay the same or grow"
claim intuitively, there's a distinct difference, between what ratio we think
stays the same size and what ratio we think grows, that informs the 4 vs. 16
debate.

There's a *lot* of Cassandra out in the world, and these changes are going
to impact all of it. I'm not advocating a certain position on 4 vs. 16, but
I do think we need to be very careful about how strongly we hold our
beliefs and present them as facts in discussions like this.

For my unsolicited .02, it sounds an awful lot like we're stuck between a
rock and a hard place in that there is no correct "one size fits all"
answer here (or, said another way: both 4 and 16 are correct, just for
different cases and we don't know / agree on which one we think is the
right one to target), so perhaps a discussion on a smart evolution of token
allocation counts based on quantized tiers of cluster size and dataset
growth (either automated or through operational best practices) could be
valuable along with this.


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Dimitar Dimitrov
Hey all,

At some point not too long ago I spent some time trying to
make the token allocation algorithm the default.

I didn't foresee it, although it might be obvious to many of
you, but one corollary of the way the algorithm works (or more
precisely might not work) with multiple seeds or simultaneous
multi-node bootstraps or decommissions, is that a lot of dtests
start failing due to deterministic token conflicts. I wasn't
able to fix that by changing solely ccm and the dtests, unless
careful, sequential node bootstrap was enforced. While users are
strongly advised to do exactly that in the real world, it would
have exploded dtest run times to unacceptable levels.

I have to clarify that what I'm working with is not exactly
C*, and my knowledge of the C* codebase is not as up to date as
I would want it to be, but I suspect that the above problem might very
well affect C* too, in which case changing the defaults might
be a less-than-trivial undertaking.

Regards,
Dimitar


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
So why even have virtual nodes at all? Why not work on improving single-token
approaches so that we can support cluster doubling, which IMO would
enable cassandra to scale more quickly for volatile loads?

It's my guess/understanding that vnodes eliminate the token rebalancing
that existed back in the days of single tokens. Did vnodes also help reduce
the amount of streamed data in rebalancing/expansion? Vnodes also help with
streaming from multiple sources during expansion, but if they limit us to
single-node expansion, that really limits flexibility on large-node-count
clusters.

Were there other advantages to VNodes that I missed?

IIRC, high vnode counts basically broke low-cardinality secondary
indexes, and num_tokens=4 might help that a lot.

But if 4 hasn't shown any balancing issues, I'm all for it.


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Michael Shuler

On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> one corollary of the way the algorithm works (or more
> precisely might not work) with multiple seeds or simultaneous
> multi-node bootstraps or decommissions, is that a lot of dtests
> start failing due to deterministic token conflicts. I wasn't
> able to fix that by changing solely ccm and the dtests

I appreciate all the detailed discussion. For a little historic context,
since I brought up this topic in the contributors zoom meeting, unstable
dtests were precisely the reason we moved the dtest configurations to
'num_tokens: 32'. That value has been used in CI dtests since something
like 2014, when we found that it helped stabilize a large segment of
flaky dtest failures. No real science there, other than "this hurts less."


I have no real opinion on the suggestions of 4 or 16, other than I
believe most "default config using" new users are starting with smaller
numbers of nodes. The small-but-growing users and veteran large-cluster
admins should be gaining more operational knowledge and be able to
adjust their own config choices according to their needs (helped by good
comment suggestions in the yaml). Whatever default value is chosen for
num_tokens, I think it should suit new users with smaller clusters.
Mick's suggestion that 16 is the better choice for small numbers of
nodes would seem to serve the users we are trying to help the most with
the default.


I fully agree that science, maths, and support/ops experience should 
guide the choice, but I don't believe that large/giant clusters and 
admins are the target audience for the value we select.


--
Kind regards,
Michael




Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
"large/giant clusters and admins are the target audience for the value we
select"

There are reasons aside from massive scale to pick cassandra, but the
primary reason cassandra is selected technically is to support horizontally
scaling to large clusters.

Why pick a value that you need to switch away from once you reach scale?
It's still a ticking time bomb, although 16 won't be what 256 is.

Hmmm. But 4 is bad and could scare off adoption.

Ultimately a well-written article on operations and how to transition from
16 --> 4 and at what point that is a good idea (aka not when your cluster
is too big) should be a critical part of this.



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Carl Mueller
edit: 4 is bad at small cluster sizes and could scare off adoption



Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Joseph Lynch
I think that we might be bikeshedding this number a bit because it is easy
to debate and there is not yet one right answer. I hope we recognize that
either choice (4 or 16) is fine, in that users can always override us, we
can always change our minds later, or, better yet, we can improve allocation
so users don't have to care. Either choice is an improvement on the status
quo. I only truly care that when we change this default we make sure that:
1. Users can still launch a cluster all at once. Last I checked, even with
allocate_for_rf you need to bootstrap one node at a time for even
allocation to work properly; please someone correct me if I'm wrong, and if
I'm not, let's get this fixed before the beta.
2. We get good documentation about this choice into our docs.
[documentation team and I are on it!]

I don't like phrasing this as a "small user" vs "large user" discussion.
Everybody using Cassandra wants it to be easy to operate with high
availability and consistent performance. Being able to oops a few nodes and
not have an outage is an important thing to optimize for regardless of
scale. It seems we have a lot of input on this thread that we're frequently
seeing users override this to 4 (apparently even with random allocation? I
am personally surprised by this, if true). Some people have indicated that
they like a higher number like 16 or 32. Some (most?) of our largest users
by footprint are still using 1.

The only significant advantage I'm aware of for 16 over 4 is that users can
scale up and down in increments of N/16 (12 node cluster -> 1) instead of
N/4 (12 node cluster -> 3) without further token allocation improvements in
Cassandra. Practically speaking, I think people are often spreading nodes
out over RF=3 "racks" (e.g. GCP, Azure, and AWS) so they'll want to scale
by increments of 3 anyway. I agree with Jon that optimizing for
scale-downs is odd; it's a pretty infrequent operation, and all the users I
know doing autoscaling are doing it vertically using network-attached
storage (~EBS). Let's also remember repairing clusters with 16 tokens per
node is slower (probably about 2-4x slower) than repairing clusters with 4
tokens.

With zero copy streaming there should be no benefit to more tokens for data
transfer; if there is, it is a bug in streaming performance and let's fix
it. Honestly, in my opinion, if we have balancing issues with a small number
of tokens, that is a bug and we should just fix it; token moves are safe,
and it is definitely possible for Cassandra to balance itself.

Let's not worry about scaring off users with this choice; choosing 4 will
not scare off users any more than 256 random tokens have scared off users
once they realized that they can't have any combination of two nodes down
in different racks.

-Joey


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Jeff Jirsa
On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch  wrote:

> I think that we might be bikeshedding this number a bit because it is easy
> to debate and there is not yet one right answer.
>


https://www.youtube.com/watch?v=v465T5u9UKo


Re: [Discuss] num_tokens default in Cassandra 4.0

2020-01-31 Thread Jeremy Hanna
I think Mick and Anthony make some valid operational and skew points about 4
num_tokens for smaller/starting clusters. There's an arbitrary line between
small and large clusters, but I think most would agree that most clusters are
on the small to medium side. (A small nuance is that, afaict, the probabilities
have to do with quorum on a full token range, i.e. with the size of a
datacenter, not the full cluster.)
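
To make that nuance concrete, here's a toy Monte Carlo sketch (my own; it assumes fully random token placement and ignores racks, so it approximates the worst case rather than what the new allocator produces) of how often losing two nodes in a datacenter breaks quorum somewhere on the ring:

```python
import random

def p_quorum_loss(nodes, num_tokens, rf=3, trials=1000):
    """Estimate P(some token range loses quorum) with 2 random nodes down.

    Toy model: random token placement, no racks, replicas chosen as the
    next rf distinct nodes clockwise on the ring (SimpleStrategy-style).
    """
    losses = 0
    for _ in range(trials):
        ring = sorted(
            (random.random(), node)
            for node in range(nodes)
            for _ in range(num_tokens)
        )
        owners = [node for _, node in ring]
        down = set(random.sample(range(nodes), 2))
        for i in range(len(owners)):
            replicas, j = set(), i
            while len(replicas) < rf:
                replicas.add(owners[j % len(owners)])
                j += 1
            if len(replicas & down) >= 2:  # 2 of 3 replicas down: no quorum
                losses += 1
                break
    return losses / trials

for vnodes in (1, 4, 16, 256):
    print(f"num_tokens={vnodes:3d}: ~{p_quorum_loss(12, vnodes):.0%}")
```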

As I read this discussion, I'm personally more inclined to go with 16 for now.
It's true that if we could fix the skew and topology gotchas for those starting
out, 4 would be ideal from an availability perspective. However, we're still in
the brainstorming stage for how to address those challenges. I think we should
create tickets for those issues and go with 16 for 4.0.

This is about the out-of-the-box experience. It balances availability,
operations (such as skew, general bootstrap friendliness, and
streaming/repair), and cluster sizing. Balancing all of those, I think for now
I'm more comfortable with 16 as the default, with docs on considerations and
tickets to unblock 4 as the default for all users.





Re: [VOTE] Release Apache Cassandra 4.0-alpha3

2020-01-31 Thread Yuji Ito
+1 (non-binding)

I've briefly tested the build with Jepsen.
https://github.com/scalar-labs/scalar-jepsen

On Fri, 31 Jan 2020 at 13:37, Anthony Grasso wrote:

> +1 (non-binding)
>
> On Fri, 31 Jan 2020 at 08:48, Joshua McKenzie wrote:
>
> > +1
> >
> > On Thu, Jan 30, 2020 at 4:31 PM Brandon Williams wrote:
> >
> > > +1
> > >
> > > On Thu, Jan 30, 2020 at 1:47 PM Mick Semb Wever wrote:
> > > >
> > > > Proposing the test build of Cassandra 4.0-alpha3 for release.
> > > >
> > > > sha1: 5f7c88601c65cdf14ee68387ed68203f2603fc29
> > > > Git: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-alpha3-tentative
> > > > Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1189/org/apache/cassandra/apache-cassandra/4.0-alpha3/
> > > >
> > > > The Source and Build Artifacts, and the Debian and RPM packages are available here:
> > > > https://dist.apache.org/repos/dist/dev/cassandra/4.0-alpha3/
> > > >
> > > > The vote will be open for 72 hours (longer if needed). Everyone who has tested the build is invited to vote. Votes by PMC members are considered binding. A vote passes if there are at least three binding +1s.
> > > >
> > > > ** Please note this is my first time as release manager, and the release process has been improved to deal with sha256|512 checksums as well as to use the ASF dev dist staging location. So please be extra critical. **
> > > >
> > > > [1]: CHANGES.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-alpha3-tentative
> > > > [2]: NEWS.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-alpha3-tentative