[DISCUSS] CASSANDRA-12126: LWTs correctness and performance
CASSANDRA-12126 addresses one correctness issue of Lightweight Transactions. Unfortunately, the current patch developed by Sylvain and Benedict requires an extra round trip between the coordinator and the replicas for SERIAL and LOCAL_SERIAL reads. After some experimentation, Benedict discovered that this extra round trip could lead to a significant increase in timeouts for read-heavy workloads.

Users for whom this behavior is a problem will be able to switch back to the old behavior using a system property, thereby choosing performance over correctness.

Separately, Benedict has worked on another approach that does not suffer from this performance problem and also addresses some LWT correctness issues that can occur when adding or removing nodes. He initially intended to deliver that improvement in 4.X but can try to incorporate it into 4.0.

Regarding CASSANDRA-12126 and 4.0, we are facing several options, and Benedict, Sylvain and I wanted to get the community's feedback on them.

We can:

1. Try to use Benedict's proposal for 4.0 if the community has the appetite for it. The main issue there is some potential extra delay for 4.0.
2. Do nothing for 4.0, meaning do not commit the current patch. We have lived with this issue for a long time and can probably wait a bit longer for a proper solution.
3. Commit the patch as is, fixing the correctness issue but potentially introducing a performance issue until we release a better solution.
4. Change the patch to default to the current behavior but allow people to enable the new one if correctness is a problem for them.

Thanks in advance for your feedback.
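To make the timeout concern concrete, the following toy model (a sketch, not Cassandra code) treats each Paxos phase as one coordinator-to-replica round trip that completes once a majority of replicas has replied, with phases running one after another. It assumes, purely for illustration, that a legacy SERIAL read is a prepare round followed by a quorum read, that the patch adds one more sequential round, and that the whole read is bounded by a single timeout. The class name, replica latencies and the 10 ms timeout are all made up.

```java
import java.util.Arrays;

/**
 * Toy latency model, not Cassandra code: every phase is one round trip that
 * completes when a majority quorum of replicas has replied, and phases run
 * sequentially. All latencies and the timeout below are hypothetical.
 */
public class SerialReadLatencySketch {

    /** One round trip finishes when the quorum-th fastest replica replies. */
    public static long quorumLatency(long[] replicaLatenciesMs) {
        long[] sorted = replicaLatenciesMs.clone(); // do not mutate the caller's array
        Arrays.sort(sorted);
        int quorum = sorted.length / 2 + 1; // e.g. 2 of 3 replicas
        return sorted[quorum - 1];
    }

    /** Sequential phases: the read pays one quorum wait per phase. */
    public static long readLatency(long[]... phases) {
        long total = 0;
        for (long[] phase : phases) {
            total += quorumLatency(phase);
        }
        return total;
    }

    public static void main(String[] args) {
        long timeoutMs = 10; // hypothetical end-to-end timeout

        // Hypothetical response times (ms) of three replicas for each phase.
        long[] prepare = {3, 4, 9};
        long[] quorumRead = {2, 5, 6};
        long[] extraRound = {4, 4, 12};

        long legacy = readLatency(prepare, quorumRead);              // 4 + 5 = 9 ms
        long patched = readLatency(prepare, extraRound, quorumRead); // 4 + 4 + 5 = 13 ms

        System.out.printf("legacy:  %d ms, timed out: %b%n", legacy, legacy > timeoutMs);
        System.out.printf("patched: %d ms, timed out: %b%n", patched, patched > timeoutMs);
    }
}
```

With these numbers the extra round only costs its quorum latency (4 ms; the slow 12 ms replica is outside the quorum), yet that is enough to push a 9 ms read past the 10 ms timeout. In a read-heavy workload every SERIAL read pays that additional wait.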
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
How old is the C-12126 surfaced defect? i.e. is this a thing we've had since initial introduction of paxos or is it a regression we introduced somewhere along the way?

On Wed, Nov 11, 2020 at 11:03 AM Benjamin Lerer wrote:
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
It's been there since the beginning.

If we were to consider the alternative proposal for 4.0, it would not need to block the release. I had planned to come forward after 4.0, primarily because I did not want to create further political complexities for the project at this time, but also because I do not presently have the time to produce all of the documentation we might like for such a proposal. However, the work is ready, has already been reviewed by multiple committers, has had more extensive testing than any feature I'm aware of to date, and could be made available for 4.0 in fairly short order. While the work itself is non-trivial, the work to integrate it is not complex. It would also be optional, and configurable at runtime.

The only likely blocker would be the process of review, and any other due diligence the project might want to undertake. I am absolutely not advocating for or against an accelerated timescale. I have no personal preference for the approach taken; I am just providing this for context.

On 11/11/2020, 16:18, "Joshua McKenzie" wrote:
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
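Both the system-property switch described in the original message (and option 4, which would keep the old behavior as the default) and Benedict's remark that his alternative would be optional and configurable at runtime come down to selecting a read mode with a conservative default. A minimal sketch, assuming a hypothetical class `SerialReadModeSketch` and a hypothetical property name `example.paxos.linearizable_serial_reads` (neither comes from the actual patch): the mode is read from a JVM system property at startup, defaults to the legacy behavior, and can be changed later at runtime.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of "option 4": the legacy SERIAL read path stays the
 * default, operators opt in via a system property at startup, and the choice
 * can be changed at runtime. Names are illustrative only and do not
 * correspond to real Cassandra properties or classes.
 */
public final class SerialReadModeSketch {

    /** Hypothetical property name; not the one used by the actual patch. */
    public static final String PROPERTY = "example.paxos.linearizable_serial_reads";

    public enum Mode { LEGACY, LINEARIZABLE }

    // AtomicReference so a runtime change is visible to all request threads without locking.
    private static final AtomicReference<Mode> MODE = new AtomicReference<>(initialMode());

    private SerialReadModeSketch() {}

    /** LEGACY unless the property is set to "true" (Boolean.getBoolean ignores case). */
    public static Mode initialMode() {
        return Boolean.getBoolean(PROPERTY) ? Mode.LINEARIZABLE : Mode.LEGACY;
    }

    public static Mode current() {
        return MODE.get();
    }

    /** Runtime switch, e.g. something an admin endpoint could call (hypothetical). */
    public static void set(Mode mode) {
        MODE.set(mode);
    }

    /** Whether a SERIAL read should pay the extra coordinator-replica round trip. */
    public static boolean paysExtraRoundTrip() {
        return current() == Mode.LINEARIZABLE;
    }
}
```

An operator would opt in by starting the JVM with `-Dexample.paxos.linearizable_serial_reads=true` (again, a hypothetical name). This sketch only flips a flag; it deliberately ignores the harder question Benedict raises later in the thread, that selecting between consensus implementations may in practice require live migration.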
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Got it. Thanks for the extra context. No real opinion here. :)

On Wed, Nov 11, 2020 at 11:29 AM Benedict Elliott Smith wrote:
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
If these options are for 4.0, is it then (4) that is getting applied to 3.0 and 3.11?

If that is the case, then I would vote for also applying (4) to 4.0, given we are now approaching beta4. Please let's not further delay 4.0.

Post 4.0, if (1) is as described, "a parallel implementation of the same underlying Paxos algorithm", can it also be pluggable (either opt-in or opt-out)? And would/could EPaxos become pluggable too in a similar manner (if it eventuates)? I'm in favour of providing more pluggable interfaces into C*, along with the code quality improvements that will have to accompany them.
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
In my opinion, a similar calculus should be applied to 3.0 and 3.11. This is an (arguably quite serious) bug, so whatever is not overly onerous to backport should be considered while they are supported. The work under discussion has two components: a replacement for the core consensus algorithm, and mechanisms to ensure safety across range movements. The latter might be more invasive for 3.x, but the former should be quite easy to backport and as such is probably well justified.

> can it also be pluggable (either opt-in or opt-out)?

I think pluggable means something different to opt-in/opt-out, at least to me. I'm all for more pluggability, and also for more optionality, but the decision is very sensitive to context. We need to be able to select between our options, which for consensus in practice means supporting live migration, which is exceptionally challenging in any general sense (and perhaps inherently non-pluggable).

As to future development for consensus, I personally hope the work we are discussing here will be a strong platform for it, but obviously that's for the community to decide later on. I think the work to take it forward to something EPaxos-like will not be that Herculean, with some incremental milestones en route. But that's a totally different discussion for the future, to be had via either a CEP or a small intercollegiate working group.

On 11/11/2020, 18:48, "Michael Semb Wever" wrote:
Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance
Knowing there is a correctness issue in LWT, and given that users use LWT primarily for correctness, my opinion is that we should commit the correctness patch (making it one of #1, #3 or #4). I agree we should not cause further delay to the 4.0 release (making it one of #3 or #4).

The downside of #3 is that applications may have to rework their (and their downstream systems') configurations to accommodate the performance regression, which is not ideal for the seamless 4.0 upgrade we expect users to experience.

Now, given this correctness issue has been present since the beginning, existing LWT users would likely notice no new difference with respect to correctness, since they may already have worked around this bug (if they noticed it). So +1 to option #4.

On Wed, Nov 11, 2020 at 1:49 PM Benedict Elliott Smith wrote: