> It'd be great to expand this, but it's been somewhat difficult to do, since last time a bootstrap test was attempted, it immediately uncovered enough issues to keep us busy fixing them for quite some time. Maybe it's about time to try that again.
I'm going to go with a "yes please". :)

On Wed, Nov 3, 2021 at 9:27 AM Oleksandr Petrov <oleksandr.pet...@gmail.com> wrote:

> I'll merge 16262 and the Harry blog post that accompanies it shortly. Having 16262 merged will significantly reduce the amount of resistance one has to overcome in order to write a fuzz test. But this, of course, only covers short/small/unit-test-like tests.
>
> For longer-running tests, I guess for now we will have to rely on folks (hopefully) running long fuzz tests and reporting issues. But eventually it'd be great to have enough automation around it that anyone could do that, and where test results are public.
>
> In regard to long-running tests, currently with Harry we can run three kinds of long-running tests:
> 1. Stress-like concurrent write workload, followed by periods of quiescence and then validation
> 2. Writes with injected faults, followed by repair and validation
> 3. Stress-like concurrent read/write workload with fault injection, without validation, for finding rare edge conditions / triggering possible exceptions
>
> This means that quorum read and write paths (for all kinds of schemas, including all possible kinds of read and write queries), compactions, repairs, read-repairs and hints are covered fairly well. However, things like bootstrap and other kinds of range movements aren't. It'd be great to expand this, but it's been somewhat difficult to do, since last time a bootstrap test was attempted, it immediately uncovered enough issues to keep us busy fixing them for quite some time. Maybe it's about time to try that again.
>
> For short tests, you can think of Harry as a tool to save you time and allow focusing on higher-level test meaning rather than creating a schema and coming up with specific values to insert/select.
>
> Thanks
> --Alex
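As a rough illustration of test kind 1 above (concurrent writes, quiescence, then validation), here is a toy, self-contained sketch of the pattern. It is not Harry's actual API: the deterministic valueFor generator and the ConcurrentHashMap standing in for the cluster are assumptions made purely for illustration; Harry's real model and generators are far richer.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class WriteQuiesceValidate
    {
        // Stand-in for the cluster under test (illustrative assumption).
        static final ConcurrentMap<Long, Long> cluster = new ConcurrentHashMap<>();
        static final int THREADS = 8;
        static final long OPS_PER_THREAD = 100_000;

        // Deterministic generator: the expected value of any key can be
        // recomputed later without recording each individual write.
        static long valueFor(long key)
        {
            return key * 31 + 7;
        }

        public static void main(String[] args) throws Exception
        {
            // Phase 1: stress-like concurrent write workload.
            ExecutorService writers = Executors.newFixedThreadPool(THREADS);
            for (int t = 0; t < THREADS; t++)
            {
                final long base = t * OPS_PER_THREAD; // each thread owns a key range
                writers.submit(() -> {
                    for (long i = 0; i < OPS_PER_THREAD; i++)
                        cluster.put(base + i, valueFor(base + i));
                });
            }
            writers.shutdown();
            writers.awaitTermination(1, TimeUnit.HOURS);

            // Phase 2: quiescence. In a real cluster this is where compaction,
            // hint delivery, repair and the like would be allowed to settle.

            // Phase 3: validation - recompute the expectation for every key.
            for (long key = 0; key < THREADS * OPS_PER_THREAD; key++)
                if (!Long.valueOf(valueFor(key)).equals(cluster.get(key)))
                    throw new AssertionError("mismatch at key " + key);
            System.out.println("validated " + cluster.size() + " keys");
        }
    }

The point of phase 3 is that validation needs no oracle beyond the generator itself, which is what makes this style of test cheap to extend to new schemas and query shapes.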
> On Tue, Nov 2, 2021 at 5:30 PM Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>
> > Did I hear my name?
> > Sorry Josh, you are wrong :-) 2 out of 30 in two months were real bugs discovered by flaky tests, and one of them was very hard to hit. So 6-7%. I think the report I sent back then didn't come through, so the topic was cleared up in a follow-up mail by Benjamin; with a lot of sweat, but we kept to the promised 4.0 standard.
> >
> > Now back to this topic:
> > - Green CI without enough test coverage is, unfortunately, nothing more than green CI to me. I know this is an elephant, but I won't sleep well tonight if I don't mention it.
> > - I believe the looping of tests mentioned by Berenguer can help to verify that no new weird flakiness is introduced by newly added tests. And of course it helps a lot while fixing flaky tests; I think that's clear.
> >
> > > I think that it would be great if each such test (or test group) was guaranteed to have a ticket and some preliminary analysis was done to confirm it is just a test problem before releasing the new version
> >
> > Probably not a bad idea - preliminary analysis. But we need to get into the cadence of regularly checking our CI; divide and conquer on a regular basis between all of us. Not to mention that it is much easier to follow up on recently introduced issues with the people who worked on them than to try to find out what happened a year ago in a rush before a release. I agree it is not about the number, but about what stands behind it.
> >
> > Requiring all tests to run before every merge - we can easily add this in Circle, but there are many people who don't have access to high resources, so again they won't be able to run absolutely everything. In the end, everything is up to the diligence of the reviewers/committers. Plus, the official CI is Jenkins, and we know there are different infra-related failures in the different CIs. Not an easy topic, indeed. I support running all tests, just keeping in mind all the related issues/complications.
> >
> > I would say that in my mind upgrade tests are particularly important to be green before a release, too.
> >
> > Seems to me we have the tools, but now it is time to organize the rhythm in an efficient manner.
> >
> > Best regards,
> > Ekaterina
> >
> > On Tue, 2 Nov 2021 at 11:06, Joshua McKenzie <jmcken...@apache.org> wrote:
> >
> > > To your point, Jacek, I believe in the run-up to 4.0 Ekaterina did some analysis, and something like 18% (correct me if I'm wrong here) of the test failures we were considering "flaky tests" were actual product defects in the database. With that in mind, we should be uncomfortable cutting a release if we have 6 test failures, since there's every likelihood one of them is a surfaced bug.
> > >
> > > > ensuring our best practices are followed for every merge
> > >
> > > I totally agree, but I also don't think we have this codified (unless I'm just completely missing something - very possible! ;)). It seems like we have different circle configs, different sets of jobs being run, Harry / Hunter (maybe?) / ?? run on some but not all commits and/or all branches, and manual performance testing on specific releases, but nothing surfaced formally to the project as a reproducible suite like we used to have years ago (primitive though it was at the time in what it covered).
> > >
> > > If we *don't* have this clarified right now, I think there's significant value in enumerating and at least documenting what our agreed-upon best practices are, so we can start holding ourselves and each other accountable to that bar. Given some of the incredible but sweeping work coming down the pike, this strikes me as a thing we need to be proactive and vigilant about so as not to regress.
> > >
> > > ~Josh
> > >
> > > On Tue, Nov 2, 2021 at 3:49 AM Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:
> > >
> > > > > we already have a way to confirm flakiness on circle by running the test repeatedly N times. Like 100 or 500. That has proven to work very well so far, at least for me. #collaborating #justfyi
> > > >
> > > > It does not prove that it is test flakiness. It can still be a bug in the code which occurs intermittently under some rare conditions.
> > > >
> > > > - - -- --- ----- -------- -------------
> > > > Jacek Lewandowski
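The repeat-N-times approach discussed in this exchange can also be reproduced locally with a plain JUnit 4 rule, roughly as below. RepeatRule is a name invented for this sketch, not an existing project utility, and - as Jacek notes - repetition only demonstrates intermittency; it cannot tell you whether the fault is in the test or in the production code.

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TestRule;
    import org.junit.runner.Description;
    import org.junit.runners.model.Statement;

    public class SuspectedFlakyTest
    {
        // Repeats each test body N times; the first failure stops the loop.
        static class RepeatRule implements TestRule
        {
            private final int times;

            RepeatRule(int times)
            {
                this.times = times;
            }

            @Override
            public Statement apply(Statement base, Description description)
            {
                return new Statement()
                {
                    @Override
                    public void evaluate() throws Throwable
                    {
                        for (int i = 0; i < times; i++)
                            base.evaluate(); // a flaky failure surfaces as a normal test failure
                    }
                };
            }
        }

        @Rule
        public final RepeatRule repeat = new RepeatRule(500); // "Like 100 or 500"

        @Test
        public void suspectedFlakyBehaviour()
        {
            // the assertion under suspicion goes here
        }
    }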
> > > > On Tue, Nov 2, 2021 at 7:46 AM Berenguer Blasi <berenguerbl...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > we already have a way to confirm flakiness on circle by running the test repeatedly N times. Like 100 or 500. That has proven to work very well so far, at least for me. #collaborating #justfyi
> > > > >
> > > > > On the 60+ failures it is not as bad as it looks. Let me explain. I have been tracking failures in 4.0 and trunk daily; it's grown into a habit of mine after the 4.0 push. And 4.0 and trunk were hovering solidly around <10 failures (you can check the Jenkins CI graphs). The random bisect or fix was needed, leaving behind 3 or 4 tests that have already defeated 2 or 3 committers - so, the really tough guys. I am reasonably convinced that once the fix for the 60+ failures merges, we'll be back to the <10 failures with relatively little effort.
> > > > >
> > > > > So we're just in the middle of a 'fix', but overall we shouldn't be as bad as it looks now, as we've been quite good at keeping CI green-ish, imo.
> > > > >
> > > > > Also +1 to releasable branches, which - whatever we settle on it meaning - is not a wall of failures, because of the reasons explained, like the hidden costs etc.
> > > > >
> > > > > My 2cts.
> > > > >
> > > > > On 2/11/21 6:07, Jacek Lewandowski wrote:
> > > > > > > I don't think this means guaranteeing there are no failing tests (though ideally this would also happen), but rather ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn't meet the standard for release.
> > > > > >
> > > > > > Tests are sometimes considered flaky because they fail intermittently, but that may not be down to an insufficiently consistent test implementation; it can reveal a real problem in the production code. I have seen that in various codebases, and I think it would be great if each such test (or test group) was guaranteed to have a ticket, and some preliminary analysis was done to confirm it is just a test problem, before releasing the new version.
> > > > > >
> > > > > > > Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> > > > > >
> > > > > > Are there any precise requirements for supported upgrade and downgrade paths?
> > > > > >
> > > > > > Thanks
> > > > > > - - -- --- ----- -------- -------------
> > > > > > Jacek Lewandowski
> > > > > >
> > > > > > On Sat, Oct 30, 2021 at 4:07 PM bened...@apache.org <bened...@apache.org> wrote:
> > > > > >
> > > > > > > > How do we define what "releasable trunk" means?
> > > > > > >
> > > > > > > For me, the major criterion is ensuring that work is not merged that is known to require follow-up work, or could reasonably have been known to require follow-up work if better QA practices had been followed.
> > > > > > >
> > > > > > > So, a big part of this is ensuring we continue to exceed our targets for improved QA.
> > > > > > > For me, this means trying to weave tools like Harry and the Simulator into our development workflow early on, but we'll see how well these tools gain broader adoption. This also means a focus, in general, on the possible negative effects of a change.
> > > > > > >
> > > > > > > I think we could do with producing guidance documentation for how to approach QA, where we can record our best practices and evolve them as we discover flaws or pitfalls, either for ergonomics or for bug discovery.
> > > > > > >
> > > > > > > > What are the benefits of having a releasable trunk as defined here?
> > > > > > >
> > > > > > > If we want to have any hope of meeting reasonable release cadences _and_ the high project quality we expect today, then I think a ~shippable trunk policy is an absolute necessity.
> > > > > > >
> > > > > > > I don't think this means guaranteeing there are no failing tests (though ideally this would also happen), but rather ensuring our best practices are followed for every merge. 4.0 took so long to release because of the amount of hidden work that was created by merging work that didn't meet the standard for release.
> > > > > > >
> > > > > > > Historically we have also had significant pressure to backport features to earlier versions due to the cost and risk of upgrading. If we maintain broader version compatibility for upgrade, and reduce the risk of adopting newer versions, then this pressure is also reduced significantly. Though perhaps we will stick to our guns here anyway, as there seems to be renewed pressure to limit work in GA releases to bug fixes exclusively. It remains to be seen if this holds.
> > > > > > >
> > > > > > > > What are the costs?
> > > > > > >
> > > > > > > I think the costs are quite low, perhaps even negative. Hidden work produced by merges that break things can be much more costly than getting the work right the first time, as attribution is much more challenging.
> > > > > > >
> > > > > > > One cost that is created, however, is for version compatibility, as we cannot say "well, this is a minor version bump so we don't need to support downgrade". But I think we should be investing in this anyway for operator simplicity and confidence, so I actually see this as a benefit as well.
> > > > > > >
> > > > > > > > Full disclosure: running face-first into 60+ failing tests on trunk
> > > > > > >
> > > > > > > I have to apologise here. CircleCI did not uncover these problems, apparently due to some way it resolves dependencies, and so I am responsible for a significant number of these and have been quite sick since.
> > > > > > >
> > > > > > > I think a push to eliminate flaky tests will probably help here in future, though, and perhaps the project needs to have some (low) threshold of flaky or failing tests at which point we block merges to force a correction.
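A merge-blocking threshold like the one suggested above could be enforced by a gate as small as the sketch below, which sums failure counts from JUnit XML reports and exits non-zero over budget. The report directory default, the zero-failure default, and the assumption that each report's root <testsuite> element carries "failures" and "errors" attributes (as Ant-style reports do) are all assumptions of this sketch, not the project's actual tooling.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Element;

    public class FailureGate
    {
        // Reads an integer attribute, treating a missing attribute as zero.
        static int count(Element suite, String attr)
        {
            String v = suite.getAttribute(attr);
            return v.isEmpty() ? 0 : Integer.parseInt(v);
        }

        public static void main(String[] args) throws Exception
        {
            Path reports = Paths.get(args.length > 0 ? args[0] : "build/test/output"); // assumed location
            int threshold = args.length > 1 ? Integer.parseInt(args[1]) : 0;

            int failing = 0;
            try (Stream<Path> files = Files.walk(reports))
            {
                for (Path p : (Iterable<Path>) files.filter(f -> f.toString().endsWith(".xml"))::iterator)
                {
                    // Ant-style JUnit reports summarise counts on the root <testsuite>.
                    Element suite = DocumentBuilderFactory.newInstance()
                                                          .newDocumentBuilder()
                                                          .parse(p.toFile())
                                                          .getDocumentElement();
                    failing += count(suite, "failures") + count(suite, "errors");
                }
            }
            System.out.println(failing + " failing tests, threshold " + threshold);
            if (failing > threshold)
                System.exit(1); // a non-zero exit is what would block the merge in CI
        }
    }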
> > > > > > >
> > > > > > > From: Joshua McKenzie <jmcken...@apache.org>
> > > > > > > Date: Saturday, 30 October 2021 at 14:00
> > > > > > > To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > > > > > Subject: [DISCUSS] Releasable trunk and quality
> > > > > > >
> > > > > > > We as a project have gone back and forth on the topic of quality and the notion of a releasable trunk for quite a few years. If people are interested, I'd like to rekindle this discussion a bit and see if we're happy with where we are as a project, or if we think there are steps we should take to change the quality bar going forward. The following questions have been rattling around for me for a while:
> > > > > > >
> > > > > > > 1. How do we define what "releasable trunk" means? All reviewed by M committers? Passing N% of tests? Passing all tests plus some other metrics (manual testing, raising the number of reviewers, test coverage, usage in dev or QA environments, etc.)? Something else entirely?
> > > > > > >
> > > > > > > 2. With a definition settled upon in #1, what steps, if any, do we need to take to get from where we are to having *and keeping* that releasable trunk? Anything to codify there?
> > > > > > >
> > > > > > > 3. What are the benefits of having a releasable trunk as defined here? What are the costs? Is it worth pursuing? What are the alternatives (for instance: a freeze before a release plus a stabilization focus by the community, i.e. the 4.0 push, or the tock in tick-tock)?
> > > > > > >
> > > > > > > Given the large volumes of work coming down the pike with CEPs, this seems like a good time to at least check in on this topic as a community.
> > > > > > >
> > > > > > > Full disclosure: running face-first into 60+ failing tests on trunk when going through the commit process for denylisting this week brought this topic back up for me (reminds me of when I went to merge CDC back in 3.6 and those test failures riled me up... I sense a pattern ;))
> > > > > > >
> > > > > > > Looking forward to hearing what people think.
> > > > > > >
> > > > > > > ~Josh
>
> --
> alex p