Re: [DISCUSS] Releasable trunk and quality

Berenguer Blasi Mon, 01 Nov 2021 23:46:21 -0700

Hi,

We already have a way to confirm flakiness on CircleCI: run the test
repeatedly N times, say 100 or 500. That has proven to work very well
so far, at least for me. #collaborating #justfyi
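
As a rough illustration of the repeat-N-times idea, here is a minimal
Python sketch. It is not the project's CircleCI configuration, and the
names count_failures and make_intermittent_test are hypothetical, made
up for this example. The stand-in test fails on every 10th call so the
result is deterministic; a real flaky test would fail at random.

```python
import itertools
from typing import Callable


def count_failures(test_fn: Callable[[], None], runs: int = 100) -> int:
    """Run test_fn `runs` times and return how many runs raised an exception."""
    failures = 0
    for _ in range(runs):
        try:
            test_fn()
        except Exception:
            # Any exception counts as a failed run; keep going so that
            # we see the failure rate across all runs.
            failures += 1
    return failures


def make_intermittent_test(fail_every: int) -> Callable[[], None]:
    """Hypothetical stand-in for a flaky test: fails on every Nth call."""
    calls = itertools.count(1)

    def test() -> None:
        if next(calls) % fail_every == 0:
            raise AssertionError("intermittent failure")

    return test


if __name__ == "__main__":
    runs = 100
    failures = count_failures(make_intermittent_test(fail_every=10), runs=runs)
    print(f"{failures} failures in {runs} runs")
```

A test that never fails across a few hundred runs is probably fine; one
that fails even once deserves a ticket and a closer look.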


On the 60+ failures: it is not as bad as it looks. Let me explain. I have
been tracking failures in 4.0 and trunk daily; it has grown into a habit
since the 4.0 push. Both 4.0 and trunk were hovering solidly below 10
failures (you can check the Jenkins CI graphs). The occasional bisect or
fix was needed, leaving behind 3 or 4 tests that have already defeated 2
or 3 committers, so the really tough ones. I am reasonably convinced that
once the fix for the 60+ failures merges we'll be back to <10 failures
with relatively little effort.

So we're just in the middle of a 'fix', but overall we shouldn't be as
bad as it looks now, as we've been quite good at keeping CI green-ish imo.

Also +1 to releasable branches which, whatever we settle on, means it is
not a wall of failures, because of the reasons already explained, like the
hidden costs etc.

My 2cts.

On 2/11/21 6:07, Jacek Lewandowski wrote:
>> I don’t think this means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
> Tests are sometimes considered flaky because they fail intermittently, but
> the failure may not come from an insufficiently robust test implementation;
> it can reveal a real problem in the production code. I have seen that in
> various codebases, and I think it would be great if each such test (or
> test group) were guaranteed to have a ticket, with some preliminary analysis
> done to confirm it is just a test problem before releasing the new
> version.
>
>> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly. Though
>> perhaps we will stick to our guns here anyway, as there seems to be renewed
>> pressure to limit work in GA releases to bug fixes exclusively. It remains
>> to be seen if this holds.
>
> Are there any precise requirements for supported upgrade and downgrade
> paths?
>
> Thanks
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
>
> On Sat, Oct 30, 2021 at 4:07 PM [email protected] <[email protected]>
> wrote:
>
>>> How do we define what "releasable trunk" means?
>> For me, the major criterion is ensuring that work is not merged that is
>> known to require follow-up work, or could reasonably have been known to
>> require follow-up work if better QA practices had been followed.
>>
>> So, a big part of this is ensuring we continue to exceed our targets for
>> improved QA. For me this means trying to weave tools like Harry and the
>> Simulator into our development workflow early on, but we’ll see how well
>> these tools gain broader adoption. This also means focus in general on
>> possible negative effects of a change.
>>
>> I think we could do with producing guidance documentation for how to
>> approach QA, where we can record our best practices and evolve them as we
>> discover flaws or pitfalls, either for ergonomics or for bug discovery.
>>
>>> What are the benefits of having a releasable trunk as defined here?
>> If we want to have any hope of meeting reasonable release cadences _and_
>> the high project quality we expect today, then I think a ~shippable trunk
>> policy is an absolute necessity.
>>
>> I don’t think this means guaranteeing there are no failing tests (though
>> ideally this would also happen), but about ensuring our best practices are
>> followed for every merge. 4.0 took so long to release because of the amount
>> of hidden work that was created by merging work that didn’t meet the
>> standard for release.
>>
>> Historically we have also had significant pressure to backport features to
>> earlier versions due to the cost and risk of upgrading. If we maintain
>> broader version compatibility for upgrade, and reduce the risk of adopting
>> newer versions, then this pressure is also reduced significantly. Though
>> perhaps we will stick to our guns here anyway, as there seems to be renewed
>> pressure to limit work in GA releases to bug fixes exclusively. It remains
>> to be seen if this holds.
>>
>>> What are the costs?
>> I think the costs are quite low, perhaps even negative. Hidden work
>> produced by merges that break things can be much more costly than getting
>> the work right first time, as attribution is much more challenging.
>>
>> One cost that is created, however, is for version compatibility as we
>> cannot say “well, this is a minor version bump so we don’t need to support
>> downgrade”. But I think we should be investing in this anyway for operator
>> simplicity and confidence, so I actually see this as a benefit as well.
>>
>>> Full disclosure: running face-first into 60+ failing tests on trunk
>> I have to apologise here. CircleCI did not uncover these problems,
>> apparently due to some way it resolves dependencies, and so I am
>> responsible for a significant number of these and have been quite sick
>> since.
>>
>> I think a push to eliminate flaky tests will probably help here in future,
>> though, and perhaps the project needs to have some (low) threshold of flaky
>> or failing tests at which point we block merges to force a correction.
>>
>>
>> From: Joshua McKenzie <[email protected]>
>> Date: Saturday, 30 October 2021 at 14:00
>> To: [email protected] <[email protected]>
>> Subject: [DISCUSS] Releasable trunk and quality
>> We as a project have gone back and forth on the topic of quality and the
>> notion of a releasable trunk for quite a few years. If people are
>> interested, I'd like to rekindle this discussion a bit and see if we're
>> happy with where we are as a project or if we think there's steps we should
>> take to change the quality bar going forward. The following questions have
>> been rattling around for me for awhile:
>>
>> 1. How do we define what "releasable trunk" means? All reviewed by M
>> committers? Passing N% of tests? Passing all tests plus some other metrics
>> (manual testing, raising the number of reviewers, test coverage, usage in
>> dev or QA environments, etc)? Something else entirely?
>>
>> 2. With a definition settled upon in #1, what steps, if any, do we need to
>> take to get from where we are to having *and keeping* that releasable
>> trunk? Anything to codify there?
>>
>> 3. What are the benefits of having a releasable trunk as defined here? What
>> are the costs? Is it worth pursuing? What are the alternatives (for
>> instance: a freeze before a release + stabilization focus by the community
>> i.e. 4.0 push or the tock in tick-tock)?
>>
>> Given the large volumes of work coming down the pike with CEPs, this seems
>> like a good time to at least check in on this topic as a community.
>>
>> Full disclosure: running face-first into 60+ failing tests on trunk when
>> going through the commit process for denylisting this week brought this
>> topic back up for me (reminds me of when I went to merge CDC back in 3.6
>> and those test failures riled me up... I sense a pattern ;))
>>
>> Looking forward to hearing what people think.
>>
>> ~Josh
>>

