Re: [DISCUSS] CASSANDRA-12126: LWTs correctness and performance

Benedict Elliott Smith Fri, 20 Nov 2020 10:04:24 -0800

Well, I expressed a preference for #3 over #4, particularly for the 3.x series.
However, at this point I think the lack of a clear project decision means we
can punt it back to you and Sylvain to make the final call.


On 20/11/2020, 16:23, "Benjamin Lerer" <benjamin.le...@datastax.com> wrote:

    I will try to summarize the discussion to clarify the outcome.

    Mick is in favor of #4.
    Sumanth is in favor of #4.
    Sylvain's answer was not clear to me. I understood it as "I prefer #3 to
    #4, and I am also fine with #1".
    Jeff is in favor of #3 but would understand #4.
    David is in favor of #3 (fix bug and add flag to roll back to old behavior)
    in 4.0 and #4 in 3.0 and 3.11.

    Do not hesitate to correct me if I misunderstood your answer.

    Based on these answers, it seems clear that most people prefer to go for #3
    or #4.

    The choice between #3 (fix correctness, opt-in to current behavior) and #4
    (current behavior, opt-in to correctness) is a bit less clear, especially if
    we consider the 3.x branches or 4.0.

    Does anybody have an idea on how to choose between those two options, or
    any extra opinions on #3 versus #4?






    On Wed, Nov 18, 2020 at 9:45 PM David Capwell <dcapw...@gmail.com> wrote:

    > I feel that #4 (fix bug and add flag to roll back to old behavior) is best.
    >
    > About the alternative implementation, I am fine adding it to 3.x and 4.0,
    > but we should treat it as a different path, disabled by default, that you
    > can opt into, with a plan to enable it by default "eventually".
    >
    > On Wed, Nov 18, 2020 at 11:10 AM Benedict Elliott Smith <bened...@apache.org>
    > wrote:
    >
    > > Perhaps there might be broader appetite to weigh in on which major
    > > releases we might target for work that fixes the correctness bug without
    > > a serious performance regression?
    > >
    > > i.e., if we were to fix the correctness bug now, introducing a serious
    > > performance regression (either opt-in or opt-out), but were to land work
    > > without this problem for 5.0, would there be appetite to backport this
    > > work to any of 4.0, 3.11 or 3.0?
    > >
    > >
    > > On 18/11/2020, 18:31, "Jeff Jirsa" <jji...@gmail.com> wrote:
    > >
    > >     This is complicated and relatively few people on earth understand it,
    > >     so having little feedback is mostly expected, unfortunately.
    > >
    > >     My normal emotional response is "correctness is required, opt-in to
    > >     performance improvements that sacrifice strict correctness", but I'm
    > >     also sure this is going to surprise people, and I would understand /
    > >     accept #4 (default to current, opt-in to correct).
    > >
    > >
    > >     On Wed, Nov 18, 2020 at 4:54 AM Benedict Elliott Smith <bened...@apache.org>
    > >     wrote:
    > >
    > >     > It doesn't seem like there's much enthusiasm for any of the options
    > >     > available here...
    > >     >
    > >     > On 12/11/2020, 14:37, "Benedict Elliott Smith" <bened...@apache.org>
    > >     > wrote:
    > >     >
    > >     >     > Is the new implementation a separate, distinctly modularized new
    > >     >     > body of work
    > >     >
    > >     >     It's primarily a distinct, modularised and new body of work;
    > >     >     however, some shared code has been modified: namely PaxosState,
    > >     >     in which legacy code is maintained but modified for compatibility,
    > >     >     and the system.paxos table (which receives a new column, and
    > >     >     slightly modified serialization code). It is conceptually an
    > >     >     optimised version of the existing algorithm.
    > >     >
    > >     >     If there's a chance of this being of value to 4.0, I can try to put
    > >     >     up a patch next week alongside a high-level description of the
    > >     >     changes.
    > >     >
    > >     >     > But a performance regression is a regression, I'm not shrugging
    > >     >     > it off.
    > >     >
    > >     >     I don't want to give the impression I'm shrugging off the
    > >     >     correctness issue either. It's a serious issue to fix, but since
    > >     >     all successful updates to the database are linearizable, I think
    > >     >     it's likely that many applications behave correctly with the
    > >     >     present semantics, or at least encounter only transient errors. No
    > >     >     doubt many also do not, but I have no idea of the ratio.
    > >     >
    > >     >     The regression isn't itself a simple issue either: depending on
    > >     >     the topology and message latencies, it is not difficult to produce
    > >     >     inescapable contention, i.e. guaranteed timeouts, that might
    > >     >     persist as long as clients continue to retry. It could be quite a
    > >     >     serious degradation of service to impose on our users.
    > >     >
    > >     >     I don't pretend to know the correct way to make a decision
    > >     >     balancing these considerations, but I am perhaps more concerned
    > >     >     about imposing service outages than about temporarily maintaining
    > >     >     semantics our users have apparently accepted for years, though I
    > >     >     absolutely share your embarrassment there.
    > >     >
    > >     >
    > >     >     On 12/11/2020, 12:41, "Joshua McKenzie" <jmcken...@apache.org>
    > >     >     wrote:
    > >     >
    > >     >         Is the new implementation a separate, distinctly modularized
    > >     >         new body of work, or does it make substantial changes to the
    > >     >         existing implementation and subsume it?
    > >     >
    > >     >         On Thu, Nov 12, 2020 at 3:56 AM Sylvain Lebresne <lebre...@gmail.com>
    > >     >         wrote:
    > >     >
    > >     >         > Regarding option #4, I'll remark that experience tends to
    > >     >         > suggest users don't consistently read the `NEWS.txt` file on
    > >     >         > upgrade, so option #4 will likely essentially mean "LWT has a
    > >     >         > correctness issue, but once it has broken your data enough
    > >     >         > that you notice, you'll be able to dig up the proper flag to
    > >     >         > fix it for next time". I guess it's better than nothing, of
    > >     >         > course, but I'll admit that defaulting to "opt-in
    > >     >         > correctness", especially for a feature (LWT) that exists
    > >     >         > uniquely to provide additional guarantees, is something I
    > >     >         > have a hard time rallying behind.
    > >     >         >
    > >     >         > But a performance regression is a regression, I'm not
    > >     >         > shrugging it off. Still, I feel we shouldn't leave LWT with
    > >     >         > a fairly serious known correctness bug, and I frankly feel
    > >     >         > bad for "the project" that this has been known for so long
    > >     >         > without action, so I'm a bit biased in wanting to get it
    > >     >         > fixed ASAP.
    > >     >         >
    > >     >         > But maybe I'm overstating the urgency here, and maybe
    > >     >         > option #1 is a better way forward.
    > >     >         >
    > >     >         > --
    > >     >         > Sylvain
    > >     >         >
    > >     >
    > >     >
    > >     >
    > >     >
    > >  ---------------------------------------------------------------------
    > >     >     To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
    > >     >     For additional commands, e-mail: dev-h...@cassandra.apache.org