Re: [DISCUSS] Releasable trunk and quality

2021-12-07 Thread Joshua McKenzie
>
> it would be far preferable for consistency of behaviour to rely on shared
> infrastructure if possible
>
For those of us using CircleCI, we can get a lot of the benefit by having a
script that rewrites and cleans up circle profiles based on use-case; it's
a shared / consistent environment and the scripting approach gives us
flexibility to support different workflows with minimal friction (build and
run every push vs. click to trigger for example).
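
As a rough illustration of the scripting approach described above, here is a minimal sketch of a profile rewriter. It is not the actual circleci-enable.py; the function name `apply_profile`, the gate job name `start_tests`, and the simplified dict model of a workflow (each job entry a single-key mapping) are all assumptions made for the example. The one CircleCI detail it leans on is that a workflow job with `type: approval` holds downstream jobs until someone clicks it.

```python
import copy

GATE = "start_tests"  # hypothetical name for the manual approval job


def apply_profile(workflow, mode):
    """Return a copy of `workflow` rewritten for the requested mode.

    "every_push":       root jobs start as soon as a push arrives.
    "click_to_trigger": root jobs wait behind a manual approval job.
    """
    if mode not in ("every_push", "click_to_trigger"):
        raise ValueError(f"unknown mode: {mode}")

    jobs = []
    for entry in copy.deepcopy(workflow["jobs"]):
        (name, settings), = entry.items()  # assumes single-key job entries
        if name == GATE:
            continue  # drop any existing gate; it is re-added below if needed
        requires = [r for r in settings.get("requires", []) if r != GATE]
        if mode == "click_to_trigger" and not requires:
            requires = [GATE]  # root jobs now wait for the click
        if requires:
            settings["requires"] = requires
        else:
            settings.pop("requires", None)
        jobs.append({name: settings})

    if mode == "click_to_trigger":
        jobs.insert(0, {GATE: {"type": "approval"}})
    return {**workflow, "jobs": jobs}
```

Because the function strips any existing gate before deciding whether to add one, running it twice with the same mode gives the same result, which is the property you would want from a script people re-run on their branches.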

Is there a reason we discounted modifying the merge strategy?

I took a stab at enumerating some of the current "best in class" I could
find here:
https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.9b52fp49pp3y.
My personal opinion is we'd be well served to do trunk-based development
with cherry-picks (and by that I mean basically re-applying) bugfixes back
to LTS release branches (or perhaps doing bugfix on oldest LTS and applying
up, tomato tomahto), doing away with merge commits, and using git revert
more liberally when a commit breaks CI or introduces instability into it.
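
To make the proposed flow concrete, here is a small sketch that only builds the git command lines involved; it does not run them. The helper names and the example branch names are hypothetical. The git options used are standard: `cherry-pick -x` appends a "(cherry picked from commit ...)" line to the new commit message, and `revert --no-edit` creates the revert commit without opening an editor.

```python
def backport_commands(sha, release_branches):
    """Commands to re-apply a trunk commit onto each release branch."""
    commands = []
    for branch in release_branches:
        commands.append(["git", "checkout", branch])
        # -x records the original commit id in the backported commit message
        commands.append(["git", "cherry-pick", "-x", sha])
    return commands


def revert_commands(sha, branch="trunk"):
    """Commands to back out a commit that broke CI on `branch`."""
    return [
        ["git", "checkout", branch],
        ["git", "revert", "--no-edit", sha],
    ]
```

Each inner list could be handed to `subprocess.run(cmd, check=True)`; conflicts during a cherry-pick would still need to be resolved by hand, which is the "basically re-applying" case mentioned above.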

All that said, that's somewhat orthogonal (or perhaps complementary) to the
primary thing this discussion surfaced for me, which is that we don't have
standardization or guidance across what tests, on what JDK's, with what
config, etc that we run before commits today. My thinking is to get some
clarity for everyone on that front, reduce friction to encourage that
behavior, and then visit the merge strategy discussion independently after
that.

~Josh



On Tue, Dec 7, 2021 at 1:08 AM Berenguer Blasi wrote:

> +1. I would add a 'post-commit' step: check the jenkins CI run for your
> merge and see if something broke regardless.
>
> On 6/12/21 23:51, Ekaterina Dimitrova wrote:
> > Hi Josh,
> > All good questions, thank you for raising this topic.
> > To the best of my knowledge, we don't have those documented but I will put
> > notes on what tribal knowledge I know about and I personally follow :-)
> >
> > Pre-commit test suites:
> >
> > * Which JDK's? - both are officially supported so both.
> >
> > * When to include all python tests or do JVM only (if ever)? - if I test
> > only a test fix probably
> >
> > * When to run upgrade tests? - I haven't heard any definitive guideline.
> > Preferably every time but if there is a tiny change I guess it can be
> > decided for them to be skipped. I would advocate to do more than less.
> >
> > * What to do if a test is also failing on the reference root (i.e. trunk,
> > cassandra-4.0, etc)? - check if a ticket exists already, if not - open one
> > at least, even if I don't plan to work on it at least to acknowledge
> > the issue and add any info I know about. If we know who broke it, ping the
> > author to check it.
> >
> > * What to do if a test fails intermittently? - Open a ticket. During
> > investigation - Use the CircleCI jobs for running tests in a loop to find
> > when it fails or to verify the test was fixed. (This is already in my draft
> > CircleCI document, not yet released as it was pending on the documents
> > migration.)
> >
> > Hope that helps.
> >
> > ~Ekaterina
> >
> > On Mon, 6 Dec 2021 at 17:20, Joshua McKenzie wrote:
> >
> >> As I work through the scripting on this, I don't know if we've documented
> >> or clarified the following (don't see it here:
> >> https://cassandra.apache.org/_/development/testing.html):
> >>
> >> Pre-commit test suites:
> >> * Which JDK's?
> >> * When to include all python tests or do JVM only (if ever)?
> >> * When to run upgrade tests?
> >> * What to do if a test is also failing on the reference root (i.e. trunk,
> >> cassandra-4.0, etc)?
> >> * What to do if a test fails intermittently?
> >>
> >> I'll also update the above linked documentation once we hammer this out and
> >> try and bake it into the scripting flow as much as possible as well. Goal
> >> is to make it easy to do the right thing and hard to do the wrong thing,
> >> and to have these things written down rather than have it be tribal
> >> knowledge that varies a lot across the project.
> >>
> >> ~Josh
> >>
> >> On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie wrote:
> >>
> >>> After some offline collab, here's where this thread has landed on a
> >>> proposal to change our processes to incrementally improve our processes and
> >>> hopefully stabilize the state of CI longer term:
> >>>
> >>> Link:
> >>> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
> >>> Hopefully the mail server doesn't butcher formatting; if it does, hit up
> >>> the gdoc and leave comments there as should be open to all.
> >>>
> >>> Phase 1:
> >>> Document merge criteria; update circle jobs to have a simple pre-merge job
> >>> (one for each JDK profile)
> >>>  * Donate, document, and formalize usage of circleci-enable.py in ASF
> >>> repo (need new commit scripts / dev tooling section?)
> >>> * rewrites circle 

Re: [DISCUSS] Releasable trunk and quality

2021-12-07 Thread bened...@apache.org
> My personal opinion is we'd be well served to do trunk-based development
> with cherry-picks … to LTS release branches

Agreed.

> that's somewhat orthogonal … to the primary thing this discussion surfaced 
> for me

The primary outcome of the discussion for me was the need for some external 
pressure to maintain build quality, and the best solution proposed (to my mind) 
was the use of GitHub actions to integrate with various CI services to refuse 
PRs that do not have a clean test run. This doesn’t fully resolve flakiness, 
but it does provide 95%+ of the necessary pressure to maintain test quality, 
and a consistent way of determining that.
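
The gating rule itself is simple to state in code. The following is only a sketch of the decision logic, not of any GitHub Actions or CI-service API; the function names and the check names in the usage example are hypothetical. A pull request is treated as mergeable only when every required check has reported success, so a check that is missing or still running blocks the merge just like a failed one.

```python
def blocking_checks(results, required):
    """Return the required checks that are not (yet) successful.

    `results` maps check name -> status string, e.g. "success" or "failure".
    A required check absent from `results` counts as blocking.
    """
    return [name for name in required if results.get(name) != "success"]


def can_merge(results, required):
    """True only when every required check reported success."""
    return not blocking_checks(results, required)
```

For example, with required checks `["unit_jdk8", "unit_jdk11"]` and results `{"unit_jdk8": "success"}`, `can_merge` returns `False` and `blocking_checks` names `unit_jdk11`. Note that this does nothing about flaky tests on its own: a flaky required check simply blocks the merge until it is rerun or fixed.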

This is how a lot of other projects maintain correctness, and I think how many 
forks of Cassandra are maintained outside of the project as well.


Re: [DISCUSS] Releasable trunk and quality

2021-12-07 Thread Joshua McKenzie
>
> the need for some external pressure to maintain build quality, and the
> best solution proposed (to my mind) was the use of GitHub actions to
> integrate with various CI services to refuse PRs that do not have a clean
> test run

Honestly, I agree 100% with this. I took the more conservative approach
(refine and standardize what we have + reduce friction) but I've long been
a believer in intentionally setting up incentives and disincentives to
shape behavior.

So let me pose the question here to the list: is there anyone who would
like to advocate for the current merge strategy (apply to oldest LTS, merge
up, often -s ours w/new patch applied + amend) instead of "apply to trunk
and cherry-pick back to LTS"? If we make this change we'll be able to
integrate w/github actions and block merge on green CI + integrate git
revert into our processes.


Re: [DISCUSS] Releasable trunk and quality

2021-12-07 Thread Brandon Williams
On Tue, Dec 7, 2021 at 8:18 AM Joshua McKenzie wrote:
> So let me pose the question here to the list: is there anyone who would
> like to advocate for the current merge strategy (apply to oldest LTS, merge
> up, often -s ours w/new patch applied + amend) instead of "apply to trunk
> and cherry-pick back to LTS"? If we make this change we'll be able to
> integrate w/github actions and block merge on green CI + integrate git
> revert into our processes.

Changing the merge strategy can have deep and possibly unforeseen
consequences. If the only reasoning is "because github needs it to do
X", then that reasoning doesn't seem sound enough to me.

-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Re: [DISCUSS] Releasable trunk and quality

2021-12-07 Thread Joshua McKenzie
I'd frame the reasoning differently: Our current merge strategy is
vestigial and we can't rely on it in many, if not most, cases. Patches
rarely merge cleanly across majors, requiring -s ours w/amend or other
changes per branch. This effectively clutters up our git history, hides
multi-branch changes behind merge commits, makes in-IDE annotations less
effective, and makes the barrier for reverting bad patches higher. It also
just so happens to make it effectively impossible to use github actions to
block merge on green CI.

On the positive side, it makes it much less likely we will forget to apply
a bugfix patch on all branches, and it's the Devil we Know and the entire
project understands and is relatively consistent with the current strategy.

What other positives are there to the current merge strategy that I may not
be thinking of?

~Josh

