Re: [DISCUSS] Periodic snapshot publishing with minor version bumps

2021-12-17 Thread Mick Semb Wever
> "During the lead up to 4.0.0 there was plenty of headache and fixes going
> in to deal with how we parse version numbers in different places and
> alpha|beta|rc etc. I would rather bump the versions during the dev cycle and
> work on fixing it, than have that headache again at release time. I also
> feel for third-parties that have to parse our own way of versioning."
> Thank you Mick for sharing again the release management point of view. It
> is always a challenge to find a release manager who will have the time to
> spend on those things and often those efforts are not even really visible
> so it is easy to underestimate them. (All the break&fix that goes with it)
>


Thanks for the summary run-through and support Ekaterina, much appreciated.


I would like to point out that the code and tests do not support "pre" as a
pre-release label.
4.1.0-pre1 would break the code.

Furthermore, the pre-release version is alphanumerically sorted, so "pre"
would land between the last beta and the first rc version. A proposal that
uses a pre-release version needs a label that sorts alphanumerically before
"alpha". And the code would need to be fixed to accept and sort the new
label. Maybe the drivers too, Jeremiah?
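
The ordering concern can be illustrated in a few lines of Python. This is only a sketch, assuming the labels are compared as plain case-sensitive strings; the comparison actually used by Cassandra and the drivers may differ, and the label names are just examples.

```python
# Pre-release labels compared as plain strings (an assumption, not the
# project's actual comparison code).
labels = ["rc1", "pre1", "beta4", "alpha1"]

ordered = sorted(labels)
print(ordered)

# "pre1" ends up after every beta and before every rc,
# not before "alpha1" as a pre-alpha snapshot label would need.
print(ordered.index("pre1") > ordered.index("beta4"))
print(ordered.index("pre1") < ordered.index("rc1"))
```

Under that assumption, a snapshot label would need to start with something that sorts before the letter "a" to come ahead of the alphas.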


> "For the release manager this is a simpler approach (not having to rollback
> version numbers and changelogs), and for those using development published
> artefacts (nightlies, staging, etc) (not having versions clobbered).
> Release manager practices aside, as a user I agree with Brandon, what
> matters is the version is greater and whether major/minor/patch numbers are
> greater."
> This is a very important point. Release management is time consuming enough
> and from what I've seen there are not many people who have that time to
> dedicate it. If there are suggestions for different ways to improve that
> experience, please, share them.



Such a change (replacing takeX with version increments when a vote fails)
wasn't part of my proposal here. It was only meant as an anecdote. It is
still useful to know that this situation can arise for the release manager,
e.g. if the artefacts were accidentally published.



> After carefully reading the thread, it seems to me we need to find the
> right balance between:
> 1) users' understanding about versions; also usability
> Please, people, share your experience and feedback, we want to hear it!
> 2) no breaking changes for the ecosystem (or at least as little as
> possible)
> 3) efficient release management (minimal maintenance).
>



We still have only one proposal on the table that works, as was first
raised in this thread.

The only valid objection raised so far is cosmetic, touching on (1). I want
to emphasise that being cosmetic doesn't make it trivial or something to be
ignored: the image of the project belongs to the community; it's an
acceptable objection. But I hope that objections can be followed up with
working proposals.

Reiterating, the cosmetic change would be that our next yearly release be
4.1 or 4.2 or 5.0 or 5.1 or 5.2 (as we would not be doing more than two
periodic snapshots before next May).

Another concern raised was that released artefacts can carry a quality
pre-release label (alpha|beta|rc) while the unreleased periodic artefacts
would carry no such label, suggesting that the latter have a stability the
former do not. This isn't true: these unreleased periodic artefacts are only
available via dev/snapshot channels. They would be the same as builds off
trunk are today, which currently is "4.1" without any such pre-release
label.

There's no rush on this thread, happy to let it continue through to
January. Thanks for speaking up Ekaterina, I hope others do too.


Re: [DISCUSS] Periodic snapshot publishing with minor version bumps

2021-12-17 Thread bened...@apache.org
> I would like to point out that the code and tests do not support "pre" as a
> pre-release label. 4.1.0-pre1 would break the code.

If true this can easily be fixed, but AFAICT CassandraVersion is happy to parse 
this just fine so I doubt there would be many breakages.

> using a pre-release version needs a label that is alphanumerically before
> "alpha"

4.0.0-PRE1

> not having to rollback version numbers and changelogs

What is unique about this situation versus alpha, beta and rc? Because these 
are again much more common, so whatever we do to handle these can surely be 
applied here? Why can’t we leave in 4.0.0-PRE1 changelog and release notes, if 
this is such a big deal? What’s so different about using 4.1.0 that permits 
avoiding extra work?

If this is truly impossible, why not use patch numbers rather than minors (with 
additional PRE1)? i.e. we could go 4.0.0-PRE1, 4.0.1-PRE2, 4.0.2-PRE3, 
4.0.4-alpha. I don’t like this, but I dislike it a lot less than using 
unqualified minors.
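
To make the ordering of such a sequence concrete, here is a minimal Python sketch. It is not CassandraVersion or any driver's parser: the regular expression, the `sort_key` helper and the rule that an unlabelled version sorts after a labelled one with the same numbers are all assumptions for illustration, and labels are compared as case-sensitive strings.

```python
import re

# Hypothetical pattern: major.minor.patch with an optional "-label" suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z]+))?$")


def sort_key(version):
    """Return a tuple that orders versions numerically, then by label."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    major, minor, patch, label = match.groups()
    # A labelled version (0) sorts before the unlabelled one (1)
    # when major.minor.patch are equal.
    return (int(major), int(minor), int(patch), 0 if label else 1, label or "")


# The sequence from the paragraph above, in the intended order.
proposed = ["4.0.0-PRE1", "4.0.1-PRE2", "4.0.2-PRE3", "4.0.4-alpha"]

# Sorting a shuffled copy recovers the intended order, because the
# patch number increases at every step.
print(sorted(reversed(proposed), key=sort_key) == proposed)

# With case-sensitive string comparison, uppercase "PRE1" sorts before
# lowercase "alpha1" even when the numeric parts are identical.
print(sort_key("4.0.0-PRE1") < sort_key("4.0.0-alpha1"))
```

Whether existing parsers accept an uppercase label, or compare labels case-sensitively, would still need to be checked against the real code.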

> We still have only one proposal on the table that works, as was first
> raised in this thread.

I’m afraid I’m still flummoxed by this. Could you enumerate precisely what 
makes this proposal not work, as I still don’t see it?



Re: Clarifying CI release criteria

2021-12-17 Thread Joshua McKenzie
What if we tried the following:

1. Canonical CI for a release is ci-cassandra. We can optionally, and in
practice will, run circle as well but don't codify blocking on that.
2. (NEW) We don't release unless we get a fully green run.
3. Before any merge, you need either a non-regressing (i.e. no new
failures) run of circleci or of ci-cassandra.
 3.a Non-regressing is defined here as "Doesn't introduce any new test
failures; any new failures in CI are clearly not attributable to this diff"
4. (NEW) The Build Lead role + Butler catches and documents new
intermittent failures; it's unspecified how we resource fixing those
collectively at this time
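
As a rough illustration of how items 2 and 3 differ, the two gates could be sketched as below. This is only a sketch: the function names and test names are hypothetical, test results are reduced to sets of failed test names, and the judgement of whether a new failure is "clearly not attributable to this diff" is left to a human.

```python
def can_release(failed_tests):
    """Item 2: release only on a fully green run, i.e. no failures at all."""
    return len(failed_tests) == 0


def new_failures(baseline_failed, candidate_failed):
    """Failures present in the candidate run but not in the baseline run."""
    return sorted(set(candidate_failed) - set(baseline_failed))


def can_merge(baseline_failed, candidate_failed):
    """Item 3: a non-regressing run introduces no new failures.

    Pre-existing failures do not block a merge here; any new ones would
    still need a human to decide whether the diff caused them (3.a).
    """
    return not new_failures(baseline_failed, candidate_failed)


# Hypothetical test names, for illustration only.
baseline = {"FlakyGossipTest"}
candidate = {"FlakyGossipTest", "NewReadRepairTest"}

print(can_merge(baseline, baseline))
print(new_failures(baseline, candidate))
print(can_release(baseline))
```

The merge gate tolerates known failures while the release gate does not, which is why item 2 is the stricter of the two.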

Item 2 raises the specter of flaky tests unique to Apache infra greatly delaying
releases. I can think of a few options to help keep us from regressing on
ci-cassandra (numbered to indicate where they fit in / replace the flow
above):

3: (NEW) Before merging tickets, block on a clean run of ci-cassandra (need
something like merge trains; could automate merging, hard / impossible
w/merge commits)
3: (NEW) Before merging tickets, run ci-cassandra and get an advisory
update on the related JIRA (extra ci runtime burden; long delays w/out CI
tests or infra optimization)
3.c: (NEW) After merging tickets, run ci-cassandra (already do this) and
get an advisory update on the related JIRA for any new errors on the run of
the SHA

I strongly prefer we amend our process with 3.c. I'm pretty sure we could
get granular enough to compute any new test failures and highlight them in
the JIRA ticket and link to the run + the previous run. I believe this
would greatly tighten the loop between a delta and a failure for a variety
of tests, and 4 above would provide the fail-safe for us to catch and
address flakes far earlier than the current model.
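
A minimal sketch of what computing the new failures and drafting the advisory text might look like follows. The shape of the run records, the `advisory_comment` helper, the URLs, the SHA and the test names are all made up for illustration; posting the text to JIRA is deliberately left out, since that depends on tooling not discussed here.

```python
def advisory_comment(sha, previous_run, current_run):
    """Build advisory text listing failures that are new in current_run.

    Each run is a dict with a 'url' string and a 'failed' collection of
    test names (a hypothetical shape). Returns None when nothing is new.
    """
    new = sorted(set(current_run["failed"]) - set(previous_run["failed"]))
    if not new:
        return None
    lines = [f"ci-cassandra reported {len(new)} new test failure(s) for {sha}:"]
    lines += [f"  - {name}" for name in new]
    lines.append(f"This run: {current_run['url']}")
    lines.append(f"Previous run: {previous_run['url']}")
    return "\n".join(lines)


# Placeholder data; none of these URLs or names refer to real runs.
previous = {"url": "https://ci.example.invalid/run/41", "failed": ["TestA"]}
current = {"url": "https://ci.example.invalid/run/42", "failed": ["TestA", "TestB"]}

comment = advisory_comment("abc1234", previous, current)
print(comment)
```

Linking both runs in the text keeps the delta and its baseline side by side for whoever triages the failure.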

~Josh

On Thu, Dec 16, 2021 at 1:20 PM Mick Semb Wever  wrote:

> >
> >
> > > ci-cassandra.a.o needs to be our canonical CI
> >
> > it's the only one fully usable by a volunteer based
> >
> >
> > only green in both counts as green
> >
> > I think today might just be my day to annoy you Mick. :D Sorry!
> >
>
>
> On the day I'm laid up in bed with a cold.
> Go for gold :-)
>
> I think this is contradictory. We can't require circle to be green for a
> > release if the free tier usage of it a) doesn't pass tests, and/or b)
> > requires a license incompatible w/some contributors. That effectively
> would
> > make circle + asf ci our canonical ci, right?
> >
>
>
> That's taking it out of (or twisting) my context a bit, let me explain…
>
> First, I did not mean the free tier. It is not usable AFAIK. It could be
> updated so it was constrained in what it could run and was stable, but then
> it's not complete so there's limited value here. IMHO plugging in GitHub
> Actions to do a very basic build+test would hit a larger newcomer audience.
>
> Second, I didn't mean one *had* to run both. Just like post-commit will
> catch things, just so long as that breakage comes around to you and you
> accept your involvement in it. We (the whole community) need to help out
> when the author cannot reproduce/debug the failure, and this isn't just
> limited to premium circleci.
>
>
> less flakies than the previous release
> >
> > This statement makes me wary. :) Why not "no test failures"?
> >
>
>
> More than happy to go for that. And I damn hope we are there for our next
> major release.
> This statement was more just a preference to lean on the more pragmatic
> side. We know our north star, keep moving towards it.
>


Re: Clarifying CI release criteria

2021-12-17 Thread Mick Semb Wever
>
>
> 3.c: (NEW) After merging tickets, run ci-cassandra (already do this) and
> get an advisory update on the related JIRA for any new errors on the run of
> the SHA
>
> I strongly prefer we amend our process with 3.c.



+1   Yup, this is the most important missing piece for me.

I also wouldn't mind if we worded it so that the author responsible for a
post-commit failure is expected to be involved in, or to lead, the fix. This
incentivises people to do 2+3 properly, and not push it onto the build role.


Re: Clarifying CI release criteria

2021-12-17 Thread Ekaterina Dimitrova
+1 (nb) on my end too, I second Mick
Thanks for putting this together Josh



Re: Clarifying CI release criteria

2021-12-17 Thread Joshua McKenzie
So to clarify it all in one place, the proposed new CI process we should
test for consensus is:

1. Canonical CI for a release is ci-cassandra. We can optionally, and in
practice will, run circle as well but don't codify blocking on that.
2. (NEW) We don't release unless we get a fully green run.
3. Before any merge, you need either a non-regressing (i.e. no new
failures) run of circleci with a (specific suite of tests TBD) or of
ci-cassandra.
 3.a Non-regressing is defined here as "Doesn't introduce any new test
failures; any new failures in CI are clearly not attributable to this diff"
 3.b: (NEW) After merging tickets, ci-cassandra runs against the SHA
and the author gets an advisory update on the related JIRA for any new
errors on CI. The author of the ticket will take point on triaging this new
failure and either fixing (if clearly reproducible or related to their
work) or opening a JIRA for the intermittent failure and linking it in
butler (https://butler.cassandra.apache.org/#/)
4. (NEW) The Build Lead role + Butler catches and documents all failures
and anything that slips through the procedural cracks in 3.b; resourcing
for fixing flakey tests TBD

Our two TBDs, which we can tackle separately from consensus on the above:
1. Suite of tests on circle required to be considered ready for merge
2. How we resource fixing flakey tests that are functionally impossible to
attribute without essentially fixing the flake



Re: Clarifying CI release criteria

2021-12-17 Thread Brandon Williams
Could we also add something about running new tests through the multiplexer?




Re: Clarifying CI release criteria

2021-12-17 Thread Joshua McKenzie
Good call; thanks for the reminder.

So maybe add a

3.a: Run all new or modified tests through either local or remote
multiplexer N (TBD - 50?) times (w/link to instructions, etc)
3.b Non-regressing is defined here...
3.c After merging tickets...
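
For anyone unfamiliar with the idea, repeating a test N times to surface intermittent failures can be sketched locally as below. This is not the project's multiplexer job: `multiplex` and `make_flaky_test` are hypothetical helpers, the test is a plain Python callable, and the default of 50 simply mirrors the placeholder N above.

```python
import random


def multiplex(test, runs=50):
    """Run `test` `runs` times and return the indices of the failed runs."""
    failed_runs = []
    for i in range(runs):
        try:
            test()
        except Exception:
            # Any exception counts as a failed run in this sketch.
            failed_runs.append(i)
    return failed_runs


def make_flaky_test(seed, failure_rate):
    """Build a test that fails with roughly the given probability."""
    rng = random.Random(seed)

    def test():
        assert rng.random() >= failure_rate, "simulated intermittent failure"

    return test


stable = make_flaky_test(seed=1, failure_rate=0.0)
flaky = make_flaky_test(seed=1, failure_rate=0.1)

print(len(multiplex(stable)))  # a test that never fails yields no failed runs
print(len(multiplex(flaky)))   # some of the 50 runs are likely to fail
```

A single passing run says little about a test that fails one time in ten; repeated runs are what make that failure rate visible before merge.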



Re: Clarifying CI release criteria

2021-12-17 Thread Ekaterina Dimitrova
It’s indeed a good call, but I thought this would be addressed in a separate
document where we discuss the test suites required to be run pre-commit. If
not, then I guess we should add those things here too?



Re: Clarifying CI release criteria

2021-12-17 Thread Joshua McKenzie
I'll get this into a draft article on the wiki so we can collaborate on those 3
outstanding TBDs without further cluttering up the dev list. :)
