Thanks Benjamin!

I propose we de-scope 15538 as the ticket does not currently have a clear 
definition of done. Unless others disagree, we can remove the fix version via 
lazy consensus in a couple days. That leaves us with a well-defined set of 
tickets that are making progress.

Re: the next question:
"Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there is 
no sudden burst in the number of issues."

This is a great question for all on the list. Please consider what follows as 
my interpretation of our current status relative to the project's Release 
Lifecycle doc (and all "we/you" pronouns collective): 
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle

We're currently meeting all criteria for the Beta phase except "No flaky tests" 
and a small number of known bugs (e.g., 16307, 16078). The good news is we have 
the tickets in both categories identified (discussed earlier in this thread), 
and they don't appear to be a large amount of work - potentially with the 
exception of CASSANDRA-16078: Performance regression for queries accessing 
multiple rows. The ticket reports a 39% perf regression for queries fetching 
multiple rows in a partition via IN clauses – a major regression that should 
block release until understood/fixed. Caleb's working on this now.

Once those issues and the validation epics that are now in review are wrapped 
(which look like a few weeks' work if contributors can jump on the flaky test 
tickets), we'll have met our criteria for graduating beta.

The definition of an RC release is that any SHA we cut an RC build from may 
legitimately be the SHA declared "Apache Cassandra 4.0.0." This is where it 
gets real. When the project declares a build "RC," we're staking our collective 
credibility on it and recommending that users upgrade to a build that receives 
this designation.

I feel very good about where 4.0 is at. We've all surfaced and resolved a large 
number of important issues. We've enhanced the project's testing infrastructure 
to broaden the surface covered, which reduces the probability of unknown 
unknowns. And we've collectively developed toolchains for large-scale 
verification, including of existing live clusters via diff.

After beta’s complete, the next chasm to cross seems like our own collective 
willingness to deploy and operate Cassandra 4.0 clusters in production. Once 
we're at RC, willing to do so, and willing to recommend that users do the same, 
I think we'll have hit our definition of done.

As we wrap up the remaining beta issues and flaky tests, now's a good time for 
that RC gut check. If there's a remaining issue that would prevent you from 
running trunk in a prod environment, please file it and raise attention - it'll 
help us finish polishing the release. And if there isn't - deploy it!

We still need to finish the remaining bugs in scope and get tests reliably 
green. But it feels good to be this close.

– Scott

________________________________________
From: Benjamin Lerer <benjamin.le...@datastax.com>
Sent: Tuesday, January 19, 2021 1:54 AM
To: dev@cassandra.apache.org
Subject: Re: [DISCUSS] Revisiting the quality testing epic scope

Thank you for your reply, Scott.

My understanding is that Alexander is moving forward on CASSANDRA-15580
(Repair) and that Andres is focussing with Caleb on the tickets of
CASSANDRA-15579 (Distributed Read/Write Path). The biggest unknown here
seems to be CASSANDRA-16262 as you mentioned.

Regarding CASSANDRA-15582 (Metrics), I shifted my focus toward helping with
reviews for the release candidate. As a consequence, outside of 2 patches
created by Sumanth during the holidays, the epic has not been moving
forward.

> the silver lining is that it shouldn’t be long before the others wrap up.

Do you have a timeframe in mind for releasing 4.0 GA? Assuming that there
is no sudden burst in the number of issues.

> We do have several flaky test tickets that could use attention, though

I believe that Adam, Berenguer and Brandon have started focusing on them.

On Sat, Jan 16, 2021 at 10:49 PM Scott Andreas <sc...@paradoxica.net> wrote:

> Thanks for raising the question, Benjamin! Notes on a few tickets inline
> below.
>
> Non-Blocking:
> – CASSANDRA-15537 Local Read/Write Path: Upgrade and Diff Test
> I think it’s reasonable to consider this ticket complete. Yifan and others
> have worked to execute several dozen diff tests and while I’m sure others
> will continue, it’s reasonable to say cassandra-diff has been used to
> compare 3.0 vs. 4.0 clusters with a wide variety of data models. I’ll check
> with Yifan on Tuesday re: updating the status of the ticket. It would be
> wonderful to hear of diff runs and experience from additional contributors
> if others can share.
>
> – CASSANDRA-15584 Tooling - External Ecosystem
> Great collaboration on this one (including issues filed arising from this
> coverage, such as a recent ticket related to Medusa).
>
> Blocking GA:
> – CASSANDRA-15579 Distributed Read/Write Path
> The coordination and replication subtasks (16180, 16181) are making good
> progress. I’ll check with Caleb and David on 16262 (the fuzz testing
> subtask) on Tuesday.
>
> – CASSANDRA-15581 Compaction
> Most of these are perf tests rather than development tasks, though the
> ones complete are listed as Patch Available. I’ll check with Yifan if it’d
> make sense to move those for which no planned work remains to Resolved. I
> don’t think there’s a lot left here.
>
> – CASSANDRA-15538 Local Read/Write Path - Other Areas
> Will see if anything specific is planned, as scope is relatively undefined.
>
> With the exception of 15538, most of these look to be moving along or
> nearly complete. I don’t think I’d shift others aside from it into the
> non-blocking category - but the silver lining is that it shouldn’t be long
> before the others wrap up.
>
> We do have several flaky test tickets that could use attention, though —
> these may be quick to push through if anyone is able to pick them up:
>
> – CASSANDRA-16236: Fix flaky testTrackMaxDeletionTime
> – CASSANDRA-16238: Fix flaky test
> test_insert_data_during_replace_same_address -
> replace_address_test.TestReplaceAddress
> – CASSANDRA-16239: Fix flaky test
> org.apache.cassandra.distributed.test.NetstatsRepairStreamingTest
> testWithCompressionDisabled
> – CASSANDRA-16317: Fix flaky test incompleteCommit -
> org.apache.cassandra.distributed.test.CASTest
> – CASSANDRA-16355: Fix flaky test incompletePropose -
> org.apache.cassandra.distributed.test.CASTest
> – CASSANDRA-16382: Fix flaky
> LongSharedExecutorPoolTest.testPromptnessOfExecution
> – CASSANDRA-16358: Minor Flakiness in
> ProxyHandlerConnectionsTest#testExpireSomeFromBatch
> – CASSANDRA-16229: Flaky jvm-dtest:
> org.apache.cassandra.distributed.test.ring.NodeNotInRingTest.nodeNotInRingTest
> – CASSANDRA-16061:
> transient_replication_ring_test.py::TestTransientReplicationRing::test_move_forwards_and_cleanup
>
> Cheers,
>
> – Scott
>
> > On Jan 14, 2021, at 9:05 AM, Benjamin Lerer <benjamin.le...@datastax.com> wrote:
> >
> > Hi everybody,
> >
> > As discussed before the holidays, it might make sense to revisit the scope
> > of the quality testing tickets for 4.0 GA to ensure that the 4.0 release is
> > not held for longer than necessary.
> >
> > The current status of the quality testing tasks is as follows:
> >
> > *DONE:*
> >
> > CASSANDRA-15583 <https://issues.apache.org/jira/browse/CASSANDRA-15583>
> > Tooling, Bundled and First Party
> > CASSANDRA-15586 <https://issues.apache.org/jira/browse/CASSANDRA-15586>
> > Cluster Setup and Maintenance
> > CASSANDRA-15587 <https://issues.apache.org/jira/browse/CASSANDRA-15587>
> > Platforms and Runtimes
> >
> >
> > *NON BLOCKING:*
> >
> > The goals of the following tickets have been reached. Once GA is closed they
> > will be marked as done.
> >
> > CASSANDRA-15537 <https://issues.apache.org/jira/browse/CASSANDRA-15537>
> > Local Read/Write Path: Upgrade and Diff Test
> > CASSANDRA-15584 <https://issues.apache.org/jira/browse/CASSANDRA-15584>
> > Tooling - External Ecosystem
> >
> > If I understood Jordan's comment on the following ticket correctly, it
> > should also not be a blocker for 4.0:
> > CASSANDRA-15585 <https://issues.apache.org/jira/browse/CASSANDRA-15585>
> > Test Frameworks, Tooling, Infra / Automation
> >
> > *BLOCKING GA:*
> >
> > CASSANDRA-15579 <https://issues.apache.org/jira/browse/CASSANDRA-15579>
> > Distributed Read/Write Path
> >    4 sub-tasks: 1 resolved, 2 in progress, 1 open
> >
> > CASSANDRA-15580 <https://issues.apache.org/jira/browse/CASSANDRA-15580>
> > Repair
> >    Test scenarios are ready, working on integrating them to circle-ci
> >
> > CASSANDRA-15581 <https://issues.apache.org/jira/browse/CASSANDRA-15581>
> > Compaction
> >    9 sub-tasks: 5 patch available, 1 review in progress, 3 triage needed
> >
> > CASSANDRA-15582 <https://issues.apache.org/jira/browse/CASSANDRA-15582>
> > Metrics
> >   16 sub-tasks: 9 resolved, 5 patch available, 5 open
> >
> > CASSANDRA-15588 <https://issues.apache.org/jira/browse/CASSANDRA-15588>
> > Cluster Upgrade
> >    6 sub-tasks: 4 resolved, 1 in progress, 1 open
> >
> > CASSANDRA-15538 <https://issues.apache.org/jira/browse/CASSANDRA-15538>
> > Local Read/Write Path
> >    No progress has been made on that ticket. The conclusion so far is that
> >    Harry is our best choice to uncover issues in that area but there is no
> >    clear plan on how to move forward.
> >
> > We have made some progress across the quality testing tickets. Nevertheless,
> > there is still a significant number of tickets to fix. As our time and
> > resources are limited, it might make sense to focus on what we believe are
> > the most critical for 4.0 and relax our constraints on others. For example,
> > it seems to me that the metrics tickets will mainly help to discover
> > non-critical old issues that are not blockers for 4.0. It is clear to me that
> > they should be fixed, but that could probably be done for the 4.0.x/4.1
> > release (I fully volunteer for that :-)). The same could be true for some
> > other areas of the code.
> >
> > In my opinion the important questions we would need to answer are:
> >
> >   1. Are there some tickets that we should make non-blocking for 4.0?
> >   2. What do we do about CASSANDRA-15538
> >   <https://issues.apache.org/jira/browse/CASSANDRA-15538> Local Read/Write
> >   Path?
> >
> > Thanks in advance for your feedback :-)
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
