Hey Josh,

Thank you for leading these discussions and organizing the wiki pages (also from the other mail).

I just wanted to mention, regarding point 4 of Pending work: I have a draft version for CircleCI usage, and Andres has updated the rst documents around running tests in a loop, etc. BUT those are pending on the merge of the ascii docs (to convert his work), and on adding the correct links and any other changes that have happened since I wrote it last year. Just saying for awareness, before anyone decides to do a write-up on that without knowing a draft already exists.

Thanks,
Ekaterina

On Tue, 4 Jan 2022 at 14:51, Joshua McKenzie <jmcken...@apache.org> wrote:

> Here's a link to a draft article for the confluence wiki covering what we
> discussed on this thread:
> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=199530280&draftShareId=7c72c252-918c-456b-9859-7d12e8fa9309&
>
> Assuming this article accurately captures what we discussed here as well
> as outstanding work, I'll get it published and integrated with the
> confluence wiki and get the remainder of the work into a JIRA epic to be
> tracked.
>
> On Fri, Dec 17, 2021 at 4:41 PM Joshua McKenzie <jmcken...@apache.org>
> wrote:
>
>> I'll get this into a draft article on the wiki so we can collab on those
>> 3 outstanding TBD's without further cluttering up the dev list. :)
>>
>> On Fri, Dec 17, 2021 at 11:38 AM Ekaterina Dimitrova <e.dimitr...@gmail.com>
>> wrote:
>>
>>> It’s indeed good call but I thought this will be addressed in a separate
>>> document where we discuss required test suites to be run pre-commit. If not
>>> - then I guess we should add those things here too?
>>>
>>> On Fri, 17 Dec 2021 at 11:36, Joshua McKenzie <jmcken...@apache.org>
>>> wrote:
>>>
>>> > Good call; thanks for the reminder.
>>> >
>>> > So maybe add a
>>> >
>>> > 3.a: Run all new or modified tests through either local or remote
>>> > multiplexer N (TBD - 50?) times (w/link to instructions, etc)
>>> > 3.b Non-regressing is defined here...
>>> > 3.c After merging tickets...
>>> >
>>> > On Fri, Dec 17, 2021 at 11:29 AM Brandon Williams <dri...@gmail.com>
>>> > wrote:
>>> >
>>> > > Could we also add something about running new tests through the
>>> > > multiplexer?
>>> > >
>>> > > On Fri, Dec 17, 2021 at 10:23 AM Joshua McKenzie <jmcken...@apache.org>
>>> > > wrote:
>>> > > >
>>> > > > So to clarify it all in one place, the proposed new CI process we should
>>> > > > test for consensus is:
>>> > > >
>>> > > > 1. Canonical CI for a release is ci-cassandra. We can optionally, and in
>>> > > > practice will, run circle as well but don't codify blocking on that.
>>> > > > 2. (NEW) We don't release unless we get a fully green run.
>>> > > > 3. Before any merge, you need either a non-regressing (i.e. no new
>>> > > > failures) run of circleci with a (specific suite of tests TBD) or of
>>> > > > ci-cassandra.
>>> > > >     3.a Non-regressing is defined here as "Doesn't introduce any new test
>>> > > > failures; any new failures in CI are clearly not attributable to this diff"
>>> > > >     3.b: (NEW) After merging tickets, ci-cassandra runs against the SHA
>>> > > > and the author gets an advisory update on the related JIRA for any new
>>> > > > errors on CI. The author of the ticket will take point on triaging this new
>>> > > > failure and either fixing (if clearly reproducible or related to their
>>> > > > work) or opening a JIRA for the intermittent failure and linking it in
>>> > > > butler (https://butler.cassandra.apache.org/#/)
>>> > > > 4. (NEW) The Build Lead role + Butler catches and documents all failures
>>> > > > and anything that slips through the procedural cracks in 3.b; resourcing
>>> > > > for fixing flakey tests TBD
>>> > > >
>>> > > > Our two TBD we can tackle separately from consensus on the above:
>>> > > > 1. Suite of tests on circle required to be considered ready for merge
>>> > > > 2. How we resource fixing flakey tests that are functionally impossible to
>>> > > > attribute without essentially fixing the flake
>>> > > >
>>> > > > On Fri, Dec 17, 2021 at 10:56 AM Ekaterina Dimitrova <e.dimitr...@gmail.com>
>>> > > > wrote:
>>> > > >
>>> > > > > +1 (nb) on my end too, I second Mick
>>> > > > > Thanks for putting this together Josh
>>> > > > >
>>> > > > > On Fri, 17 Dec 2021 at 10:48, Mick Semb Wever <m...@apache.org> wrote:
>>> > > > >
>>> > > > > > > 3.c: (NEW) After merging tickets, run ci-cassandra (already do this) and
>>> > > > > > > get an advisory update on the related JIRA for any new errors on the run of
>>> > > > > > > the SHA
>>> > > > > > >
>>> > > > > > > I strongly prefer we amend our process with 3.c.
>>> > > > > >
>>> > > > > > +1 Yup, this is the most important missing piece for me.
>>> > > > > >
>>> > > > > > I also wouldn't mind we word the responsibility of the author at
>>> > > > > > post-commit fault to be involved/leading in the fix. This incentivises
>>> > > > > > people to do 2+3 properly, and not push it onto the build role.
>>> > >
>>> > > ---------------------------------------------------------------------
>>> > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>>> > > For additional commands, e-mail: dev-h...@cassandra.apache.org