Re: [Discuss] num_tokens default in Cassandra 4.0
> TLDR, based on availability concerns, skew concerns, operational
> concerns, and based on the fact that the new allocation algorithm can
> be configured fairly simply now, this is a proposal to go with 4 as the
> new default and the allocate_tokens_for_local_replication_factor set to 3.

I'm uncomfortable going with the default of `num_tokens: 4`.
I would rather see a default of `num_tokens: 16`, based on the following…

a) 4 num_tokens does not provide a good out-of-the-box experience.
b) 4 num_tokens doesn't provide any significant streaming benefits over 16.
c) edge-case availability doesn't trump (a) & (b)

For (a)…
The first node in each rack, up to RF racks, in each datacenter can't use the allocation strategy. With 4 num_tokens, 3 racks and RF=3, the first three nodes will be poorly balanced. If three poorly balanced nodes in a cluster is a problem (because the cluster is small enough), then 4 is the wrong default. From our own experience, we have had to bootstrap these nodes multiple times until they generate something ok. In practice 4 num_tokens (over 16) has provided more headache with clients than gain.

Elaborating, 256 was originally chosen because the token randomness over that many always averaged out. With a default of `allocate_tokens_for_local_replication_factor: 3` this issue is largely solved, but you will still have those initial nodes with randomly generated tokens. Ref:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/dht/tokenallocator/ReplicationAwareTokenAllocator.java#L80

And to be precise: tokens are randomly generated until there is a node in each rack, up to RF racks. So, if you have RF=3, in theory (or as a newbie) you could boot 100 nodes in only the first two racks, and they will all have random tokens regardless of the allocate_tokens_for_local_replication_factor setting.

For example, using 4 num_tokens, 3 racks and RF=3…
 - in a 6 node cluster, there's a total of 24 tokens, half of which are random,
 - in a 9 node cluster, there's a total of 36 tokens, a third of which are random,
 - etc

Following this logic I would not be willing to apply 4 unless you know there will be more than 36 nodes in each data centre, i.e. less than ~8% of your tokens are randomly generated. Many clusters never reach that size, and imho that's why 4 is a bad default.

A default of 16, by the same logic, only needs 9 nodes in each dc to overcome that degree of randomness.

The workaround to all this is having to manually define `initial_token: …` on those initial nodes. I'm really not inspired to impose that upon new users.

For (b)…
there have already been a number of improvements around streaming that remove much of whatever difference there is between 4 and 16 num_tokens. And 4 num_tokens means bigger token ranges, so it could well be disadvantageous due to over-streaming.

For (c)…
we are trying to optimise availability in situations where we can never guarantee availability. I understand it's a nice operational advantage to have in a shit-show, but it's not a property you can design a system around and rely upon. There's also the question of availability vs the size of the token range that becomes unavailable.

regards,
Mick
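A minimal, illustrative sketch of the arithmetic in (a) above and of the `initial_token` workaround. The helper names and the even-spacing scheme are assumptions of this sketch, not Cassandra tooling; it assumes Murmur3Partitioner, whose tokens span -2**63 to 2**63 - 1.

# Illustrative only: reproduces the token counts above and shows one way
# to hand-pick evenly spaced initial_token values for the first nodes.
RING_MIN = -2**63
RING_SIZE = 2**64

def random_token_share(nodes_per_dc, num_tokens, rf=3):
    """Share of a datacenter's tokens that are randomly generated when the
    first `rf` nodes (one per rack) can't use the allocation strategy."""
    random_tokens = rf * num_tokens
    total_tokens = nodes_per_dc * num_tokens
    return total_tokens, random_tokens, random_tokens / total_tokens

# With num_tokens=4, 3 racks, RF=3:
#   6 nodes  ->  24 tokens, 12 random (1/2)
#   9 nodes  ->  36 tokens, 12 random (1/3)
#   36 nodes -> 144 tokens, 12 random (~8%)
for n in (6, 9, 36):
    print(n, random_token_share(n, num_tokens=4))

def spread_initial_tokens(node_count, num_tokens):
    """Evenly spaced tokens for the first `node_count` nodes, interleaved so
    each node's ranges are spread around the ring (hypothetical helper)."""
    total = node_count * num_tokens
    tokens = [RING_MIN + (i * RING_SIZE) // total for i in range(total)]
    return {n: tokens[n::node_count] for n in range(node_count)}

# e.g. initial_token values for the first 3 nodes when num_tokens=4:
for node, toks in spread_initial_tokens(3, 4).items():
    print("node%d  initial_token: %s" % (node, ",".join(map(str, toks))))

Tokens picked this way for the first RF nodes (one per rack) would be set via `initial_token` in cassandra.yaml; the remaining nodes can then rely on `allocate_tokens_for_local_replication_factor` as described above.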
Re: [Discuss] num_tokens default in Cassandra 4.0
While I (mostly) understand the maths behind using 4 vnodes as a default (which really is a question of extreme availability), I don't think they provide noticeable performance improvements over using 16, while 16 vnodes will protect folks from imbalances. It is very hard to deal with unbalanced clusters, and people only start to deal with it once some nodes are already close to being full. Operationally, it's far from trivial.
We're going to run some experiments bootstrapping clusters with 4 tokens on the latest alpha, to see how much balance we can expect and how removing one node could impact it.

If we're talking about repairs, using 4 vnodes will generate overstreaming, which can create lots of serious performance issues. Even on clusters with 500GB of node density, we never use less than ~15 segments per node with Reaper.
Not everyone uses Reaper, obviously, and there will be no protection against overstreaming with such a low default for folks not using subrange repairs.
On small clusters, even with 256 vnodes, Cassandra 3.0/3.x and Reaper already give good repair performance, because token ranges sharing the exact same replicas are processed in a single repair session. On large clusters, I reckon it's good to have way fewer vnodes to speed up repairs.

Cassandra 4.0 is supposed to aim at providing a rock-stable release of Cassandra, fixing past instabilities, and I think lowering to 4 tokens by default defeats that purpose.
16 tokens is a reasonable compromise for clusters of all sizes, without being too aggressive. Those with enough C* experience can still lower that number for their clusters.

Cheers,

-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


On Fri, Jan 31, 2020 at 1:41 PM Mick Semb Wever wrote:

> > TLDR, based on availability concerns, skew concerns, operational
> > concerns, and based on the fact that the new allocation algorithm can
> > be configured fairly simply now, this is a proposal to go with 4 as the
> > new default and the allocate_tokens_for_local_replication_factor set to 3.
>
> I'm uncomfortable going with the default of `num_tokens: 4`.
> I would rather see a default of `num_tokens: 16` based on the following…
> …
Re: [Discuss] num_tokens default in Cassandra 4.0
> We should be using the default value that benefits the most people, rather
> than an arbitrary compromise.

I'd caution that we're talking about the default value *we believe* will benefit the most people, according to our respective understandings of C* usage.

> Most clusters don't shrink, they stay the same size or grow. I'd say 90%
> or more fall in this category.

While I intuitively agree with the "most don't shrink, they stay the same or grow" claim, there's a distinct difference, between what ratio we think stays the same size and what ratio we think grows, that impacts the 4 vs. 16 debate and informs this discussion.

There's a *lot* of Cassandra out in the world, and these changes are going to impact all of it. I'm not advocating a certain position on 4 vs. 16, but I do think we need to be very careful about how strongly we hold our beliefs and present them as facts in discussions like this.

For my unsolicited .02, it sounds an awful lot like we're stuck between a rock and a hard place, in that there is no correct "one size fits all" answer here (or, said another way: both 4 and 16 are correct, just for different cases, and we don't know / agree on which one we think is the right one to target). So perhaps a discussion on a smart evolution of token allocation counts, based on quantized tiers of cluster size and dataset growth (either automated or through operational best practices), could be valuable alongside this.

On Fri, Jan 31, 2020 at 8:57 AM Alexander Dejanovski wrote:

> While I (mostly) understand the maths behind using 4 vnodes as a default
> (which really is a question of extreme availability), I don't think they
> provide noticeable performance improvements over using 16, while 16 vnodes
> will protect folks from imbalances.
> …
Re: [Discuss] num_tokens default in Cassandra 4.0
Hey all,

At some point not too long ago I spent some time trying to make the token allocation algorithm the default.

I didn't foresee it, although it might be obvious to many of you, but one corollary of the way the algorithm works (or, more precisely, might not work) with multiple seeds or with simultaneous multi-node bootstraps or decommissions, is that a lot of dtests start failing due to deterministic token conflicts. I wasn't able to fix that by changing solely ccm and the dtests, unless careful, sequential node bootstrap was enforced. While users are strongly advised to do exactly that in the real world, it would have exploded dtest run times to unacceptable levels.

I have to clarify that what I'm working with is not exactly C*, and my knowledge of the C* codebase is not as up to date as I would want it to be, but I suspect that the above problem might very well affect C* too, in which case changing the defaults might be a less-than-trivial undertaking.

Regards,
Dimitar

On Fri, 31.01.2020 at 17:20, Joshua McKenzie wrote:

> > We should be using the default value that benefits the most people, rather
> > than an arbitrary compromise.
>
> I'd caution we're talking about the default value *we believe* will benefit
> the most people according to our respective understandings of C* usage.
> …
Re: [Discuss] num_tokens default in Cassandra 4.0
So why even have virtual nodes at all? Why not work on improving single-token approaches so that we can support cluster doubling, which IMO would let Cassandra scale more quickly for volatile loads?

It's my guess/understanding that vnodes eliminate the token rebalancing that existed back in the days of single tokens. Did vnodes also help reduce the amount of streamed data during rebalancing/expansion?

Vnodes also help with streaming from multiple sources during expansion, but if that limits us to single-node expansion, it really limits flexibility on clusters with large node counts. Were there other advantages to vnodes that I missed?

IIRC a high vnode count basically broke low-cardinality secondary indexes, and vnodes=4 might help that a lot. But if 4 hasn't shown any balancing issues, I'm all for it.

On Fri, Jan 31, 2020 at 9:58 AM Dimitar Dimitrov wrote:

> Hey all,
>
> At some point not too long ago I spent some time trying to
> make the token allocation algorithm the default.
> …
Re: [Discuss] num_tokens default in Cassandra 4.0
On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> one corollary of the way the algorithm works (or more
> precisely might not work) with multiple seeds or simultaneous
> multi-node bootstraps or decommissions, is that a lot of dtests
> start failing due to deterministic token conflicts. I wasn't
> able to fix that by changing solely ccm and the dtests

I appreciate all the detailed discussion. For a little historic context, since I brought up this topic in the contributors zoom meeting: unstable dtests were precisely the reason we moved the dtest configurations to 'num_tokens: 32'. That value has been used in CI dtests since something like 2014, when we found that it helped stabilize a large segment of flaky dtest failures. No real science there, other than "this hurts less."

I have no real opinion on the suggestions of using 4 or 16, other than that I believe most "default config using" new users are starting with smaller numbers of nodes. The small-but-growing users and veteran large-cluster admins should be gaining more operational knowledge and be able to adjust their own config choices according to their needs (helped by good comment suggestions in the yaml). Whatever default config value is chosen for num_tokens, I think it should suit the new users with smaller clusters. Mick's suggestion that 16 is the better choice for small numbers of nodes would therefore seem to be the better choice for the users we are trying to help the most with the default.

I fully agree that science, maths, and support/ops experience should guide the choice, but I don't believe that large/giant clusters and their admins are the target audience for the value we select.

--
Kind regards,
Michael
Re: [Discuss] num_tokens default in Cassandra 4.0
"large/giant clusters and admins are the target audience for the value we select" There are reasons aside from massive scale to pick cassandra, but the primary reason cassandra is selected technically is to support vertically scaling to large clusters. Why pick a value that once you reach scale you need to switch token count? It's still a ticking time bomb, although 16 won't be what 256 is. H. But 4 is bad and could scare off adoption. Ultimately a well-written article on operations and how to transition from 16 --> 4 and at what point that is a good idea (aka not when your cluster is too big) should be a critical part of this. On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler wrote: > On 1/31/20 9:58 AM, Dimitar Dimitrov wrote: > > one corollary of the way the algorithm works (or more > > precisely might not work) with multiple seeds or simultaneous > > multi-node bootstraps or decommissions, is that a lot of dtests > > start failing due to deterministic token conflicts. I wasn't > > able to fix that by changing solely ccm and the dtests > I appreciate all the detailed discussion. For a little historic context, > since I brought up this topic in the contributors zoom meeting, unstable > dtests was precisely the reason we moved the dtest configurations to > 'num_tokens: 32'. That value has been used in CI dtest since something > like 2014, when we found that this helped stabilize a large segment of > flaky dtest failures. No real science there, other than "this hurts less." > > I have no real opinion on the suggestions of using 4 or 16, other than I > believe most "default config using" new users are starting with smaller > numbers of nodes. The small-but-growing users and veteran large cluster > admins should be gaining more operational knowledge and be able to > adjust their own config choices according to their needs (and good > comment suggestions in the yaml). Whatever default config value is > chosen for num_tokens, I think it should suit the new users with smaller > clusters. The suggestion Mick makes that 16 makes a better choice for > small numbers of nodes, well, that would seem to be the better choice > for those users we are trying to help the most with the default. > > I fully agree that science, maths, and support/ops experience should > guide the choice, but I don't believe that large/giant clusters and > admins are the target audience for the value we select. > > -- > Kind regards, > Michael > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
Re: [Discuss] num_tokens default in Cassandra 4.0
edit: 4 is bad at small cluster sizes and could scare off adoption

On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller wrote:

> "large/giant clusters and admins are the target audience for the value we
> select"
> …
Re: [Discuss] num_tokens default in Cassandra 4.0
I think that we might be bikeshedding this number a bit because it is easy to debate and there is not yet one right answer. I hope we recognize either choice (4 or 16) is fine, in that users can always override us, and we can always change our minds later or, better yet, improve allocation so users don't have to care. Either choice is an improvement on the status quo.

I only truly care that when we change this default we make sure that:
1. Users can still launch a cluster all at once. Last I checked, even with allocate_for_rf you need to bootstrap one node at a time for even allocation to work properly; please someone correct me if I'm wrong, and if I'm not, let's get this fixed before the beta.
2. We get good documentation about this choice into our docs. [documentation team and I are on it!]

I don't like phrasing this as a "small user" vs "large user" discussion. Everybody using Cassandra wants it to be easy to operate with high availability and consistent performance. Optimizing for "I can oops a few nodes and not have an outage" is an important thing to optimize for regardless of scale.

It seems we have a lot of input on this thread that we're frequently seeing users override this to 4 (apparently even with random allocation? I am personally surprised by this if true). Some people have indicated that they like a higher number like 16 or 32. Some (most?) of our largest users by footprint are still using 1.

The only significant advantage I'm aware of for 16 over 4 is that users can scale up and down in increments of N/16 (12 node cluster -> 1) instead of N/4 (12 node cluster -> 3) without further token allocation improvements in Cassandra. Practically speaking, I think people are often spreading nodes out over RF=3 "racks" (e.g. on GCP, Azure, and AWS), so they'll want to scale by increments of 3 anyway. I agree with Jon that optimizing for scale-downs is odd; it's a pretty infrequent operation, and all the users I know doing autoscaling are doing it vertically using network-attached storage (~EBS).

Let's also remember that repairing clusters with 16 tokens per node is slower (probably about 2-4x slower) than repairing clusters with 4 tokens. With zero copy streaming there should be no benefit to more tokens for data transfer; if there is, it is a bug in streaming performance and let's fix it.

Honestly, in my opinion, if we have balancing issues with a small number of tokens, that is a bug and we should just fix it; token moves are safe, and it is definitely possible for Cassandra to just balance itself.

Let's not worry about scaring off users with this choice: choosing 4 will not scare off users any more than 256 random tokens has scared off users when they realized they can't have any combination of two nodes down in different racks.

-Joey

On Fri, Jan 31, 2020 at 10:16 AM Carl Mueller wrote:

> edit: 4 is bad at small cluster sizes and could scare off adoption
> …
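A back-of-the-envelope sketch of the repair-speed point above, purely illustrative: it assumes each node replicates roughly num_tokens * RF distinct ranges and that a naive full repair runs one session per range, ignoring the merging of ranges with identical replicas that tools like Reaper can do.

# Rough illustration only, not a benchmark.
RF = 3

def replicated_ranges_per_node(num_tokens, rf=RF):
    # each node owns num_tokens primary ranges and, roughly,
    # replicates rf times that many in total
    return num_tokens * rf

for vnodes in (4, 16, 256):
    print("num_tokens=%3d -> ~%d ranges (naive repair sessions) per node"
          % (vnodes, replicated_ranges_per_node(vnodes)))

# num_tokens=  4 -> ~12 ranges (naive repair sessions) per node
# num_tokens= 16 -> ~48 ranges (naive repair sessions) per node
# num_tokens=256 -> ~768 ranges (naive repair sessions) per node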
Re: [Discuss] num_tokens default in Cassandra 4.0
On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch wrote:

> I think that we might be bikeshedding this number a bit because it is easy
> to debate and there is not yet one right answer.

https://www.youtube.com/watch?v=v465T5u9UKo
Re: [Discuss] num_tokens default in Cassandra 4.0
I think Mick and Anthony make some valid operational and skew points for smaller/starting clusters with 4 num_tokens. There's an arbitrary line between small and large clusters, but I think most would agree that most clusters are on the small to medium side. (A small nuance: afaict the probabilities have to do with quorum on a full token range, i.e. they depend on the size of a datacenter, not the full cluster.)

As I read this discussion I'm personally more inclined to go with 16 for now. It's true that if we could fix the skew and topology gotchas for those starting things up, 4 would be ideal from an availability perspective. However, we're still in the brainstorming stage for how to address those challenges. I think we should create tickets for those issues and go with 16 for 4.0.

This is about the out-of-the-box experience. It balances availability, operations (such as skew, general bootstrap friendliness, and streaming/repair), and cluster sizing. Balancing all of those, I think for now I'm more comfortable with 16 as the default, with docs on the considerations and tickets to unblock 4 as the default for all users.

> On Feb 1, 2020, at 6:30 AM, Jeff Jirsa wrote:
>
>> On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch wrote:
>> I think that we might be bikeshedding this number a bit because it is easy
>> to debate and there is not yet one right answer.
>
> https://www.youtube.com/watch?v=v465T5u9UKo
Re: [VOTE] Release Apache Cassandra 4.0-alpha3
+1 (non-binding)

I've briefly tested the build with Jepsen.
https://github.com/scalar-labs/scalar-jepsen

On Fri, 31 Jan 2020 at 13:37, Anthony Grasso wrote:

> +1 (non-binding)
>
> On Fri, 31 Jan 2020 at 08:48, Joshua McKenzie wrote:
>
> > +1
> >
> > On Thu, Jan 30, 2020 at 4:31 PM Brandon Williams wrote:
> >
> > > +1
> > >
> > > On Thu, Jan 30, 2020 at 1:47 PM Mick Semb Wever wrote:
> > > >
> > > > Proposing the test build of Cassandra 4.0-alpha3 for release.
> > > >
> > > > sha1: 5f7c88601c65cdf14ee68387ed68203f2603fc29
> > > > Git: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/4.0-alpha3-tentative
> > > > Maven Artifacts: https://repository.apache.org/content/repositories/orgapachecassandra-1189/org/apache/cassandra/apache-cassandra/4.0-alpha3/
> > > >
> > > > The Source and Build Artifacts, and the Debian and RPM packages, are available here:
> > > > https://dist.apache.org/repos/dist/dev/cassandra/4.0-alpha3/
> > > >
> > > > The vote will be open for 72 hours (longer if needed). Everyone who has
> > > > tested the build is invited to vote. Votes by PMC members are considered
> > > > binding. A vote passes if there are at least three binding +1s.
> > > >
> > > > ** Please note this is my first time as release manager, and the release
> > > > process has been improved to deal with sha256|512 checksums as well as to
> > > > use the ASF dev dist staging location. So please be extra critical. **
> > > >
> > > > [1]: CHANGES.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/4.0-alpha3-tentative
> > > > [2]: NEWS.txt: https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/4.0-alpha3-tentative