Re: [DISCUSS] Considering when to push tickets out of 4.0

Sam Tunnicliffe Wed, 17 Jun 2020 03:28:17 -0700


> On 17 Jun 2020, at 09:36, Benedict Elliott Smith <[email protected]> wrote:
> 
> If these tickets are the only blockers I agree with Scott's assessment.  We 
> could even disable the v5 protocol if we're keen to get it out of the door 
> today, and only enable it once 15299 lands.  I don't personally think the 
> other two tickets would be impossible to land during a beta either, even if 
> they are API affecting - they should be backwards compatible after all.


I feel the same way, though rather than disabling v5 we could just decide that 
removing its beta status needn't be a requirement for 4.0. As has been 
mentioned in previous threads, we aren't planning to remove v4, or even v3, 
support in 4.0 so keeping v5 as a beta protocol version would allow time for 
the drivers to implement full support before promoting it to a fully supported 
version. Making such a change, which only modifies the status of the version, 
seems reasonable in a minor release provided that the beta version has been thoroughly 
validated.

As far as I'm aware, none of the Java, Python or gocql drivers currently 
support the existing checksumming feature from CASSANDRA-13304. So I'm 100% in 
agreement with Benedict that we should revert this before beta. The remaining 
decision is whether we feel it's appropriate and desirable to release v5 
without any additional mechanism for ensuring integrity. If so, then we could 
punt 15299 out of 4.0/v5 entirely; if not, then we either hold off cutting the 4.0 
beta until 15299 is available or we remove the expectation that v5 will come 
out of beta in 4.0.

Responding to Mick from earlier in the thread:

> I understand the importance of CASSANDRA-15299. But it hasn't had any
> comments in twelve days, and in this stage of the feature freeze, with
> so few alpha bugs remaining, that's a long time. Sam, can you speak to its
> ETA?

That is way too long without any visible progress and I apologise for the radio 
silence. I have a rather small amount of tidying up to do, but otherwise I 
think what I have is ready for review and the client facing aspects are stable. 
I'm still actively working on a test harness and have some bulk renaming to do, 
but I don't think that these should hold up the review too much. I'll aim 
to push an update to the JIRA tomorrow.


> 
>> [Josh] however historically on the project we've had a large number of 
>> defects surfaced by a diverse collection of users
>> [Scott] this was in part a case of a pressing need to investigate a 
>> potential 3.0 data resurrection issue drawing attention from 4.0
> 
> This is a really common theme with 4.0, whose timeline has been hit primarily 
> because of issues still circulating with the 3.0 line that were never 
> discovered by testing or user reports during beta, RC, or four years of 
> releases.  My personal view, informed by this, is that we _didn't find_ the 
> most serious bugs historically, even with user reports, and we need to be 
> honest with ourselves about this in order to plot a route forwards to high 
> quality releases.  We cannot _depend_ on community feedback for determining 
> release quality; we need a plan to consciously deliver it ourselves.
> 
> 
> On 17/06/2020, 05:12, "Scott Andreas" <[email protected]> wrote:
> 
>    I'll take responsibility for the delay in commenting on 15299; this was in part
>    a case of a pressing need to investigate a potential 3.0 data resurrection
>    issue drawing attention from 4.0.
> 
>    I agree with the statement that we shouldn't consider protocol V5 ready
>    for finalization in its current form. If we feel that this ticket alone is
>    what delays release of beta and are comfortable with a release note caveating
>    that one V5 ticket remains before the new protocol is finalized, that could
>    be a reasonable compromise.
> 
>    I don't have especially strong feelings re: 15146 and 14825 and think
>    these are reasonable candidates for deferral.
> 
>    ________________________________________
>    From: Joshua McKenzie <[email protected]>
>    Sent: Tuesday, June 16, 2020 4:08 PM
>    To: [email protected]
>    Subject: Re: [DISCUSS] Considering when to push tickets out of 4.0
> 
>    I completely respect and agree with the need for a drumbeat to change our
>    culture around testing and quality; I also agree we haven't done much to
>    materially change that uniquely to 4.0. The 40_quality_testing epic is our
>    first step in that direction though I have some personal concerns about
>    leaning on bespoke manual testing for quality since we humans are
>    infinitely fallible. :)
> 
>    What elicited that response from me is the claim that we haven't yet tested
>    the software, implicitly invalidating the time and energy the community has
>    put into that thus far. I wouldn't argue that we've adequately tested for a
>    GA release, certainly, but we're discussing beta in this thread. As a
>    project, the advice we have about the testing and usage of the beta is
>    something along the lines of "use this in test/QA and only in cases where
>    minutes of downtime is acceptable." Perhaps we should consider revising the
>    release lifecycle on the wiki if this is something we're not aligned on?
> 
>    To your point above, the problems found to date were largely with 3.0 and
>    found by user report and not by project developer testing. The sooner we
>    can get the 4.0 beta into the hands of the community, the sooner we can get
>    more of those reports while we also work to broaden and deepen our
>    programmatic testing frameworks and platforms. (To acknowledge: I presume
>    that a majority of the user testing that surfaced defects in 3.0 came from
>    one large user's investment of time and resources, however historically on
>    the project we've had a large number of defects surfaced by a diverse
>    collection of users and I'd like to see us move in that direction again for
>    the long-term health of the project. Hence my attempts to move us towards
>    beta and take on an awareness campaign and call to action for the community
>    to engage in testing.)
> 
> 
>    On Tue, Jun 16, 2020 at 6:37 PM Benedict Elliott Smith <[email protected]>
>    wrote:
> 
>>> Further, we have thousands of tests across all our suites
>> 
>> I think most here would agree that our testing remains inadequate, and
>> that this (modest, even in pure numerical terms for such a large project)
>> number of often poorly-written unit tests does not really change that fact.
>> 
>> Most of the problems found to date have been found with 3.0, not with 4.0,
>> and found by user report.  We agreed a long time ago that we would aim for
>> 4.0 to be a more stable release than any prior.  Today I think the only
>> reason that might be true is the amount of work invested in fixing problems
>> found in _earlier releases_, not due to verification of 4.0.
>> 
>> I say this not to influence the decision about when and what lands in
>> beta, only to ensure we stay honest with ourselves about our progress on
>> quality.  I hope the software itself is higher quality today, but I do not
>> believe it is honest to (yet) claim that our testing is significantly
>> higher quality than those releases we all agree were inadequate.  There
>> exists some wider external use case testing, but being mostly invisible to
>> the community it is unclear how much broader our coverage is with these
>> included.
>> 
>> On 16/06/2020, 23:08, "David Capwell" <[email protected]> wrote:
>> 
>>    Inline
>> 
>>> On Jun 16, 2020, at 2:17 PM, Joshua McKenzie <[email protected]> wrote:
>>> 
>>>> 
>>>> we still produce incorrect results as shown by CASSANDRA-15313; this is a
>>>> correctness issue, so must be a blocker for v5 protocol.
>>> 
>>> That makes complete sense; I'd somehow missed the incorrect results aspect
>>> in trying to get context on the work. I'd be eager to hear about progress
>>> on it as well.
>>> 
>>> Regarding the question of "why would users test if we haven't tested yet",
>>> I respectfully disagree both with the assertion that we haven't tested yet
>>> and with the distinction between an "us vs. them" in the community. We're all
>>> users and participants in the Cassandra community and ecosystem, so anyone
>>> downloading the DB to test it out is just as vital as one of us from the
>>> dev list, committer list, or PMC list testing out the DB.
>> 
>>    I apologize if I came off as discriminatory; I will try to absorb your
>>    words carefully. Thank you for correcting my behavior.
>> 
>>> While we can reasonably expect a dev paid full time to work on the project
>>> with a large amount of testing infrastructure to be crucial to getting a
>>> release out and doing certain kinds of testing, there are literally
>>> thousands of different companies out in the world basing their critical
>>> infrastructure on this project, and their testing of their use-cases and
>>> migrations is just as critical to this release being ready. It takes a village.
>> 
>>    I do agree that user validation is important for the release; I was
>>    mostly trying to question why we would start here before the testing work
>>    in JIRA is complete. Maybe I am in the wrong; I have been heads down working
>>    on data corruption issues in 3.x, and I have become more risk averse.
>> 
>>> 
>>> Further, we have thousands of tests across all our suites, hundreds of new
>>> use-case tests that have been run against 4.0 at this point, and 30+%
>>> more bugs fixed in this release than in 3.0; the blanket assertion that we
>>> haven't tested 4.0 yet doesn't resonate with me. While we haven't done the
>>> entirety of our final 4.0 beta phase testing yet, testing is constantly
>>> going on against this codebase by people both on and off the ML.
>>> 
>>> Now, if there are major known glaring issues where we have problems that
>>> would prevent users from actually testing out the beta and kicking the
>>> tires, that's a different story entirely and I'd argue those tickets should
>>> be reflected in the alpha phase (see: CASSANDRA-15299 apparently ;) )
>>> 
>>> Does that make sense?
>> 
>>    I have been meaning to ask about this (mostly I have been asking people in
>>    Slack), and it actually confuses me.
>> 
>>    I was working off the assumption that the fix version meant it was a
>>    blocker for that release, and that alpha was special-cased and would have
>>    releases even with blocking issues (which is documented in the Release
>>    Lifecycle). When I ask around I hear that this is not correct and that
>>    alpha means “blocks beta”, beta means “blocks RC”, etc. (Is any of this
>>    documented? I couldn’t find anything the last time I was talking to others
>>    about this.)
>> 
>>    Now, let's say we close alpha and cut a beta release; my understanding
>>    is that tickets which block the next beta release are alpha. So do we
>>    still mark them alpha (even though we won’t have an alpha release)?
>> 
>>    This has been confusing me since beta has a lot of work pending. Sorry
>>    for not bringing this up in a dedicated dev@ thread.
>> 
>> 
>>> 
>>> On Tue, Jun 16, 2020 at 4:58 PM Benedict Elliott Smith <[email protected]>
>>> wrote:
>>> 
>>>> So, if it helps matters: I am explicitly -1 the prior version of this work
>>>> due to the technical concerns expressed here and on the ticket. So we
>>>> either need to revert that patch or incorporate 15299.
>>>> 
>>>> On 16/06/2020, 21:48, "Mick Semb Wever" <[email protected]> wrote:
>>>> 
>>>>> 
>>>>> 2) Alternatively, it's been 3 years, 4 months, 13 days since the release of
>>>>> 3.10.0 (the last time we added new features to the DB)
>>>>> 
>>>> 
>>>> 
>>>>   We did tick-tock, pushing feature releases too quickly, and without
>>>>   supporting them for long enough to get stable. And then we've done "a la no
>>>>   feature releases" for over 3 years. It feels like the bar went from too low
>>>>   to too high.
>>>> 
>>>>   I understand the importance of CASSANDRA-15299. But it hasn't had any
>>>>   comments in twelve days, and in this stage of the feature freeze, with
>>>>   so few alpha bugs remaining, that's a long time. Sam, can you speak to its
>>>>   ETA?
>>>> 
>>>> 
>>>> 
>>>>> 4) If we plan on releasing 4.1 six months after the release of 4.0 (i.e.
>>>>> calendar scope vs. feature scope - not yet agreed upon but an option),
>>>> 
>>>> 
>>>> 
>>>>   I like this. I think it's worth appreciating the different perspectives of
>>>>   this community: those involved with private clusters that don't rely on
>>>>   official releases, versus those involved with the public and other people's
>>>>   clusters. The latter group needs those official releases much more, but
>>>>   this also ties into putting those users more in focus and figuring out
>>>>   where the bar best sits. This isn't meant to divide; we all care about and
>>>>   speak for the user, but it is meant to utilise the different strengths
>>>>   brought to the table.
>>>> 
>>>> 
>>>>> If we want 4.0.0 out faster, the biggest gains would be to get the test
>>>>> plans written up and get more people working on automated testing.
>>>> 
>>>> 
>>>>   Yes, 110%. Though, as long as this continues to improve, as it has, does
>>>>   it need to be a blocker on 4.0?
>>>> 
>>>> 
>>>> 
>>>> 
>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>> 
>>>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 



