Hi, we already have a way to confirm flakiness on CircleCI: run the test repeatedly N times, say 100 or 500. That has worked very well so far, at least for me. #collaborating #justfyi
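As a rough guide to picking N, here is a minimal sketch of the arithmetic. It assumes every run fails independently with the same probability, which real flaky tests may not do. The function names and the 1% failure rate are hypothetical, chosen only for illustration.

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Chance that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs that exposes the flake with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


# Hypothetical test that fails 1% of the time (assumed value).
for n in (100, 500):
    print(n, round(detection_probability(0.01, n), 3))
# 100 0.634
# 500 0.993
```

Under these assumptions, 100 runs would miss a 1%-flaky test about a third of the time, while 500 runs would catch it almost always.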
On the 60+ failures: it is not as bad as it looks. Let me explain. I have been tracking failures in 4.0 and trunk daily; it has become a habit since the 4.0 push. Both 4.0 and trunk were hovering solidly below 10 failures (you can check the Jenkins CI graphs). The occasional bisect or fix was needed, leaving behind 3 or 4 tests that have already defeated 2 or 3 committers, so the really tough ones. I am reasonably convinced that once the fix for the 60+ failures merges we'll be back below 10 failures with relatively little effort. So we're just in the middle of a 'fix', and overall we are not in as bad a shape as it looks now; we've been quite good at keeping CI green-ish, imo.

Also +1 to releasable branches, which, whatever we settle on, means not having a wall of failures, because of the reasons already explained, like the hidden costs.

My 2cts.

On 2/11/21 6:07, Jacek Lewandowski wrote:
>> I don’t think this means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
> Tests are sometimes considered flaky because they fail intermittently but
> it may not be related to the insufficiently consistent test implementation
> and can reveal some real problem in the production code. I saw that in
> various codebases and I think that it would be great if each such test (or
> test group) was guaranteed to have a ticket and some preliminary analysis
> was done to confirm it is just a test problem before releasing the new
> version
>
>> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly.
>> Though perhaps we will stick to our guns here anyway, as there seems to be
>> renewed pressure to limit work in GA releases to bug fixes exclusively. It
>> remains to be seen if this holds.
>
> Are there any precise requirements for supported upgrade and downgrade
> paths?
>
> Thanks
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
>
> On Sat, Oct 30, 2021 at 4:07 PM bened...@apache.org <bened...@apache.org>
> wrote:
>
>>> How do we define what "releasable trunk" means?
>> For me, the major criteria is ensuring that work is not merged that is
>> known to require follow-up work, or could reasonably have been known to
>> require follow-up work if better QA practices had been followed.
>>
>> So, a big part of this is ensuring we continue to exceed our targets for
>> improved QA. For me this means trying to weave tools like Harry and the
>> Simulator into our development workflow early on, but we’ll see how well
>> these tools gain broader adoption. This also means focus in general on
>> possible negative effects of a change.
>>
>> I think we could do with producing guidance documentation for how to
>> approach QA, where we can record our best practices and evolve them as we
>> discover flaws or pitfalls, either for ergonomics or for bug discovery.
>>
>>> What are the benefits of having a releasable trunk as defined here?
>> If we want to have any hope of meeting reasonable release cadences _and_
>> the high project quality we expect today, then I think a ~shippable trunk
>> policy is an absolute necessity.
>>
>> I don’t think this means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
>> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly. Though
>> perhaps we will stick to our guns here anyway, as there seems to be renewed
>> pressure to limit work in GA releases to bug fixes exclusively. It remains
>> to be seen if this holds.
>>
>>> What are the costs?
>> I think the costs are quite low, perhaps even negative. Hidden work
>> produced by merges that break things can be much more costly than getting
>> the work right first time, as attribution is much more challenging.
>>
>> One cost that is created, however, is for version compatibility as we
>> cannot say “well, this is a minor version bump so we don’t need to support
>> downgrade”. But I think we should be investing in this anyway for operator
>> simplicity and confidence, so I actually see this as a benefit as well.
>>
>>> Full disclosure: running face-first into 60+ failing tests on trunk
>> I have to apologise here. CircleCI did not uncover these problems,
>> apparently due to some way it resolves dependencies, and so I am
>> responsible for a significant number of these and have been quite sick
>> since.
>>
>> I think a push to eliminate flaky tests will probably help here in future,
>> though, and perhaps the project needs to have some (low) threshold of flaky
>> or failing tests at which point we block merges to force a correction.
>>
>>
>> From: Joshua McKenzie <jmcken...@apache.org>
>> Date: Saturday, 30 October 2021 at 14:00
>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>> Subject: [DISCUSS] Releasable trunk and quality
>> We as a project have gone back and forth on the topic of quality and the
>> notion of a releasable trunk for quite a few years.
>> If people are interested, I'd like to rekindle this discussion a bit and
>> see if we're happy with where we are as a project or if we think there's
>> steps we should take to change the quality bar going forward. The following
>> questions have been rattling around for me for awhile:
>>
>> 1. How do we define what "releasable trunk" means? All reviewed by M
>> committers? Passing N% of tests? Passing all tests plus some other metrics
>> (manual testing, raising the number of reviewers, test coverage, usage in
>> dev or QA environments, etc)? Something else entirely?
>>
>> 2. With a definition settled upon in #1, what steps, if any, do we need to
>> take to get from where we are to having *and keeping* that releasable
>> trunk? Anything to codify there?
>>
>> 3. What are the benefits of having a releasable trunk as defined here? What
>> are the costs? Is it worth pursuing? What are the alternatives (for
>> instance: a freeze before a release + stabilization focus by the community
>> i.e. 4.0 push or the tock in tick-tock)?
>>
>> Given the large volumes of work coming down the pike with CEP's, this seems
>> like a good time to at least check in on this topic as a community.
>>
>> Full disclosure: running face-first into 60+ failing tests on trunk when
>> going through the commit process for denylisting this week brought this
>> topic back up for me (reminds me of when I went to merge CDC back in 3.6
>> and those test failures riled me up... I sense a pattern ;))
>>
>> Looking forward to hearing what people think.
>>
>> ~Josh