Hi Jason,
I agree - we should release with the data loss bug fix. I went over the gist - 
apart from the Python errors and test teardown failures, there are a few 
failures that look legitimate. Any chance you can run the dtests on the 
previous release SHAs and compare the dtest failures? If they're the same / 
similar, we know we're at least at parity with the previous release :)
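
If it helps, here's a rough sketch (plain Python, nothing Cassandra-specific)
of the per-test comparison I have in mind. It assumes the failing dtest names
from each run have been dumped into text files, one test name per line; the
file names below are made up:

    # Hypothetical helper: diff two lists of failing dtest names, e.g. copied
    # out of the circleci summaries for the previous release SHA and for HEAD.
    def load_failures(path):
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    old = load_failures("failures-3.0.16.txt")    # previous release SHA
    new = load_failures("failures-3.0-head.txt")  # current branch HEAD

    print("shared failures (likely pre-existing / flaky):")
    for name in sorted(old & new):
        print("  " + name)

    print("failures only in HEAD (worth a closer look):")
    for name in sorted(new - old):
        print("  " + name)
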
Dinesh 

    On Tuesday, July 24, 2018, 8:18:50 AM PDT, Jason Brown 
<jasedbr...@gmail.com> wrote:  
 
 TL;DR We are in a better place than we were for the 3.0.16 and 3.11.2
releases. The current failures are not fatal, although they warrant
investigation. My opinion is that, due to the critical data loss bugs that are
fixed by CASSANDRA-14513 and CASSANDRA-14515, we should cut the builds now.

I've run the HEAD of the 3.0 and 3.11 branches vs the 3.0.16 and 3.11.2
release SHAs, and there are far fewer failing dtests now. In comparison:

- 3.11
-- HEAD - 5-6 failing tests
-- 3.11.2 - 18-20 failures

- 3.0
-- HEAD - 14-16 failures
-- 3.0.16 - 22-25 failures

The raw dump of my work can be found here:
https://gist.github.com/jasobrown/e7ecf6d0bf875d1f4a08ee06ac7eaba0. I've
applied no effort to clean it up, but it's available (it includes links to the
circleci runs). I haven't completed an exhaustive analysis of the failures
to see how far they go back, as things become tricky (or, at least, very
time-intensive to research) with the pytest/python-3 update in
CASSANDRA-14134. Thus some of the failures might be in the dtests
themselves (I suspect a couple of them are), but most are probably
legit failures.

As this thread is about cutting the releases, I'll save any significant
analysis for a follow-up thread. I will say that the current failures are a
subset of the previous release's failures, and those failures are not data
loss bugs.

Overall, I feel far more comfortable getting the data loss fixes out
without any further delay than waiting for a few minor fixes. I will triage
the dtest failures over the coming days. There are some open tickets, and
I'll try to corral those with any new ones.

Thanks,

-Jason


On Mon, Jul 23, 2018 at 10:26 AM, dinesh.jo...@yahoo.com.INVALID <
dinesh.jo...@yahoo.com.invalid> wrote:

> I can help out with the triage / rerunning dtests if needed.
> Dinesh
>
>    On Monday, July 23, 2018, 10:22:18 AM PDT, Jason Brown <
> jasedbr...@gmail.com> wrote:
>
>  I spoke with some people over here, and I'm going to spend a day doing a
> quick triage of the failing dtests. There are some fixes for data loss bugs
> that are critical to get out in these builds, so I'll ensure the current
> failures are within an acceptable level of flakiness in order to unblock
> those fixes.
>
> Will have an update shortly ...
>
> -Jason
>
> On Mon, Jul 23, 2018 at 9:18 AM, Jason Brown <jasedbr...@gmail.com> wrote:
>
> > Hi all,
> >
> > First, thanks Joey for running the tests. Your pass/fail counts are
> > basically in line with what I've seen for the last several months. (I
> > don't have an aggregated list anywhere, just observations from recent
> > runs).
> >
> > Second, it's beyond me why there's such inertia to actually cutting a
> > release. We're getting up to almost *six months* since the last release.
> > Are there any grand objections at this point?
> >
> > Thanks,
> >
> > -Jason
> >
> >
> > On Tue, Jul 17, 2018 at 4:01 PM, Joseph Lynch <joe.e.ly...@gmail.com>
> > wrote:
> >
> >> We ran the tests against 3.0, 2.2 and 3.11 using circleci and there are
> >> various failing dtests but all three have green unit tests.
> >>
> >> 3.11.3 tentative (31d5d87, test branch
> >> <https://circleci.com/gh/vinaykumarchella/cassandra/tree/cassandra_3.11_temp_testing>,
> >> unit tests <https://circleci.com/gh/vinaykumarchella/cassandra/258> pass, 5
> >> <https://circleci.com/gh/vinaykumarchella/cassandra/256> and 6
> >> <https://circleci.com/gh/vinaykumarchella/cassandra/256#tests/containers/8>
> >> dtest failures)
> >> 3.0.17 tentative (d52c7b8, test branch
> >> <https://circleci.com/gh/jolynch/workflows/cassandra/tree/3.0-testing>,
> >> unit tests <https://circleci.com/gh/jolynch/cassandra/110> pass, 14
> >> <https://circleci.com/gh/jolynch/cassandra/112> and 15
> >> <https://circleci.com/gh/jolynch/cassandra/111> dtest failures)
> >> 2.2.13 tentative (3482370, test branch
> >> <https://circleci.com/gh/sumanth-pasupuleti/workflows/cassandra/tree/2.2-testing>,
> >> unit tests <https://circleci.com/gh/sumanth-pasupuleti/cassandra/20> pass, 9
> >> <https://circleci.com/gh/sumanth-pasupuleti/cassandra/21> and 10
> >> <https://circleci.com/gh/sumanth-pasupuleti/cassandra/22#tests/containers/8>
> >> dtest failures)
> >>
> >> It looks like many (~6) of the failures in 3.0.x are related to
> >> snapshot_test.TestArchiveCommitlog. I'm not sure if this is abnormal.
> >>
> >> I don't see a good historical record to know if these are just flakes, but
> >> if we only want to go on green builds perhaps we can either disable the
> >> flaky tests or fix them up? If someone feels strongly that we should fix
> >> particular tests up, please link a jira and I can take a whack at some of
> >> them.
> >>
> >> -Joey
> >>
> >> On Tue, Jul 17, 2018 at 9:35 AM Michael Shuler <mich...@pbandjelly.org>
> >> wrote:
> >>
> >> > On 07/16/2018 11:27 PM, Jason Brown wrote:
> >> > > Hey all,
> >> > >
> >> > > The recent builds were -1'd, but it appears the issues have been
> >> resolved
> >> > > (2.2.13 with CASSANDRA-14423, and 3.0.17 / 3.11.3 reverting
> >> > > CASSANDRA-14252). Can we go ahead and reroll now?
> >> >
> >> > Could someone run through the tests on 2.2, 3.0, 3.11 branches and link
> >> > them? Thanks!
> >> >
> >> > Michael
> >> >
> >> >
> >> >
> >>
> >
> >
>
>
  
