> Revert for only trunk patches right?
> I’d say we need to completely stabilize the environment, no noise before we
> go into that direction.
Hm. Is the concern multi-branch reverts w/merge commits being awful? Because I hear that. Starting trunk-only would be reasonable enough I think, especially since we'd be bugfix-only on other branches anyway and expect less test destabilization.

This is tickling my memory a bit; I think we talked about... doing something different with how we handle CI and vetting on trunk compared to other branches. I'll have to dig around later and see if I can surface that.
I think completely stabilizing the environment is going to be something of a chicken / egg problem. Until we move away from our heterogeneous execution environment w/constant degraded and failing agents and/or get more automated robustness (a re-run stage w/just the timed out tests, for example; a rough sketch of what that could look like is at the bottom of this mail), I don't think we'll be able to get to a completely stabilized environment. And IMO the "if you break it you buy it (revert)" approach would only help move us in that direction.

As I type this out, it strikes me that this feels similar to being on-call for the code you write. When there's real-world stakes / pain / discomfort that *will be applied* to you if you're not thorough in your consideration, you think about things differently and it improves the quality of your work as a result. I suspect the risk of having personal delivery timelines slip because your code introduced test failures would be a pretty strong incentive both to be more careful about the work you're doing and to chip in on the CI environment itself to prevent CI-stack specific errors in the future.

I think about this in terms of where the tax is being paid. If the pressure is applied to the person who contributed the code, they pay the tax. If we allow these kinds of failures to rest in the system, the entire rest of the dev community pays the tax. The former seems like less aggregate cost to us as a project than the latter to me?

On Wed, Jul 12, 2023, at 9:10 AM, Ekaterina Dimitrova wrote:
> Revert for only trunk patches right?
> I’d say we need to completely stabilize the environment, no noise before we
> go into that direction.
>
> On Wed, 12 Jul 2023 at 8:55, Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:
>> Would it be re-opening the ticket or creating a new ticket with "revert of fix"?
>>
>> On Wed, 12 Jul 2023 at 14:51, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:
>>> “jenkins_jira_integration
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>> script updating the JIRA ticket with test results if you cause a
>>> regression + us building a muscle around reverting your commit if it
>>> breaks tests.”
>>>
>>> I am not sure the problem of people finding the time to fix their breakages will be solved, but at least they will be pinged automatically. Hopefully many follow Jira updates.
>>>
>>> “I don't take the past as strongly indicative of the future here since we've been allowing circle to validate pre-commit and haven't been multiplexing.”
>>> I am interested to compare how many tickets for flaky tests we will have pre-5.0 now compared to pre-4.1.
>>>
>>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:
>>>> (This response ended up being a bit longer than intended; sorry about that)
>>>>
>>>>> What is more common though is packaging errors,
>>>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>>>> upgrade tests, being less responsive post-commit as you already
>>>>> moved on
>>>> Two that *should* be resolved in the new regime:
>>>> * Packaging errors should be caught pre-commit as we're making the artifact builds part of the pre-commit suite.
>>>> * I'm hoping to merge the commit log segment allocation so the CDC allocator is the only one for 5.0 (and it just bypasses the cdc-related work on allocation if it's disabled, thus not impacting perf); the existing targeted testing of cdc-specific functionality should be sufficient to confirm its correctness as it doesn't vary from the primary allocation path when it comes to mutation space in the buffer
>>>> * Upgrade tests are going to be part of the pre-commit suite
>>>>
>>>> Outstanding issues:
>>>> * Compression: if we just run with defaults we won't test all cases, so errors could pop up here
>>>> * system_ks_directory related things: is this still ongoing or did we have a transient burst of these types of issues? And would we expect these to vary based on different JDKs, non-default configurations, etc?
>>>> * Being less responsive post-commit: my only ideas here are a combination of the jenkins_jira_integration
>>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>>> script updating the JIRA ticket with test results if you cause a regression + us building a muscle around reverting your commit if it breaks tests.
>>>>
>>>> To quote Jacek:
>>>>> why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes,
>>>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>>>> I think this is a matter of cost vs result.
>>>>
>>>> I think we've organically made these decisions and tradeoffs in the past without being methodical about it. If we can:
>>>> 1. Multiplex changed or new tests
>>>> 2. Tighten the feedback loop of "tests were green, now they're *consistently* not, you're the only one who changed something", and
>>>> 3. Instill a culture of "if you can't fix it immediately, revert your commit"
>>>>
>>>> Then I think we'll only be vulnerable to flaky failures introduced across different non-default configurations as side effects in tests that aren't touched, which *intuitively* feels like a lot less than we're facing today. We could even get clever as a day-2 effort and define packages in the primary codebase where changes take place and multiplex (on a smaller scale) their respective packages of unit tests in the future if we see problems in this area.
>>>>
>>>> Flaky tests are a giant pain in the ass and a huge drain on productivity, don't get me wrong. *And* we have to balance how much cost we're paying before each commit with the benefit we expect to gain from that.
>>>>
>>>> Does the above make sense? Are there things you've seen in the trenches that challenge or invalidate any of those perspectives?
>>>>
>>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>>> Isn't novnodes a special case of vnodes with n=1?
>>>>>
>>>>> We should rather select a subset of tests for which it makes sense to run with different configurations.
>>>>>
>>>>> The set of configurations against which we run the tests currently is still only a subset of all possible cases.
>>>>> I could ask - why don't we run dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes,
>>>>> w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc.
>>>>> I think this is a matter of cost vs result.
>>>>> This equation contains the likelihood of failure in configuration X given there was no failure in the default configuration, the cost of running those tests, the time we delay merging, and the likelihood that we wait for the test results so long that our branch diverges and we have to rerun them or accept the fact that we merge code which was tested on an outdated base. Eventually, also the overall new-contributor experience - whether they want to participate in the future.
>>>>>
>>>>> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>>>>>> On our 4.0 release I remember a number of such failures, but not recently. What is more common though is packaging errors, cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade tests, being less responsive post-commit as you already moved on,... Either the smoke pre-commit has approval steps for everything or we should imo give the dev pre-commit a devBranch-like job. I find it terribly useful. My 2cts.
>>>>>>
>>>>>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>>>>>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: at reviewer's discretion
>>>>>>> In general, maybe offering a dev the option of choosing either "pre-commit smoke" or "post-commit full" at their discretion for any work would be the right play.
>>>>>>>
>>>>>>> A follow-on thought: even with something as significant as Accord, TCM, Trie data structures, etc., I'd be a bit surprised to see tests fail on JDK17 that didn't on 11, or with vs. without vnodes, in ways where it wasn't immediately clear the patch stumbled across something surprising and was immediately trivially attributable if not fixable. *In theory* the things we're talking about excluding from the pre-commit smoke test suite are all things that are supposed to be identical across environments and thus opaque / interchangeable by default (JDK version outside checking the build, which we will; vnodes vs. non; etc).
>>>>>>>
>>>>>>> Has that not proven to be the case in your experience?
>>>>>>>
>>>>>>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>>>>>>> A strong +1 to getting to a single CI system. CircleCI definitely has some niceties and I understand why it's currently used, but right now we get 2 CI systems for twice the price. +1 on the proposed subsets.
>>>>>>>>
>>>>>>>> Derek
>>>>>>>>
>>>>>>>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>> I'm personally not thinking about CircleCI at all; I'm envisioning a world where all of us have 1 CI *software* system (i.e. reproducible on any env) that we use for pre-commit validation, and then post-commit happens on reference ASF hardware.
>>>>>>>>>
>>>>>>>>> So:
>>>>>>>>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On green, merge.
>>>>>>>>> 2: Post-commit tests (all suites, matrices, env) run. If failure, link back to the JIRA where the commit took place.
>>>>>>>>>
>>>>>>>>> Circle would need to remain in lockstep with the requirements for point 1 here.
>>>>>>>>>
>>>>>>>>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>>>>>>>>> +1 to Josh, which is exactly my line of thought as well.
>>>>>>>>>> But that is only valid if we have a solid Jenkins that will eventually run all test configs. So I think I lost track a bit here. Are you proposing:
>>>>>>>>>>
>>>>>>>>>> 1- CircleCI: runs pre-commit a single (the most common/meaningful, TBD) config of tests
>>>>>>>>>> 2- Jenkins: runs post-commit _all_ test configs and emails/notifies you in case of problems?
>>>>>>>>>>
>>>>>>>>>> Or something different, like having 1 also in Jenkins?
>>>>>>>>>>
>>>>>>>>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>>>>>>>>> I think 500 runs combining all configs could be reasonable, since it's unlikely to have config-specific flaky tests. As in five configs with 100 repetitions each.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>>>>> Maybe. Kind of depends on how long we write our tests to run, doesn't it? :)
>>>>>>>>>>>>
>>>>>>>>>>>> But point taken. Any non-trivial test would start to be something of a beast under this approach.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>>>>>>>>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>>>>>>>>>> > 3. Multiplexed tests (changed, added) run against all JDKs and a broader range of configs (no-vnode, vnode default, compression, etc)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think this is going to be too heavy... we're taking 500 iterations and multiplying that by like 4 or 5?
>>>>>>>>
>>>>>>>> --
>>>>>>>> +---------------------------------------------------------------+
>>>>>>>> | Derek Chen-Becker |
>>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and |
>>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC |
>>>>>>>> +---------------------------------------------------------------+
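
P.S. On the "re-run stage w/just the timed out tests" idea I mentioned above - here's a very rough sketch of the kind of thing I have in mind, purely to make the conversation concrete. It assumes JUnit-style XML reports land in build/test/output/ and that a single test class can be re-run via `ant test -Dtest.name=<ClassName>`; treat the paths, the timeout heuristic, and the ant invocation as illustrative assumptions, not a worked implementation:

#!/usr/bin/env python3
"""Sketch: re-run only the test classes whose failures look like timeouts.

Assumptions (not verified against our actual CI layout):
  - JUnit-style XML reports live under build/test/output/
  - `ant test -Dtest.name=<ClassName>` re-runs a single test class
"""
import glob
import subprocess
import sys
import xml.etree.ElementTree as ET

REPORT_GLOB = "build/test/output/TEST-*.xml"  # assumed report location


def timed_out_classes(report_glob=REPORT_GLOB):
    """Collect test classes whose failures/errors look like timeouts."""
    classes = set()
    for path in glob.glob(report_glob):
        root = ET.parse(path).getroot()
        for case in root.iter("testcase"):
            for problem in case.findall("failure") + case.findall("error"):
                text = (problem.get("message") or "") + (problem.text or "")
                if "timed out" in text.lower() or "TimeoutException" in text:
                    classes.add(case.get("classname", "").split(".")[-1])
    classes.discard("")
    return classes


def rerun(classes):
    """Re-run only the suspect classes; return the ones that fail again."""
    still_failing = []
    for cls in sorted(classes):
        # Assumed ant property for targeting a single test class.
        result = subprocess.run(["ant", "test", f"-Dtest.name={cls}"])
        if result.returncode != 0:
            still_failing.append(cls)
    return still_failing


if __name__ == "__main__":
    suspects = timed_out_classes()
    print(f"re-running {len(suspects)} timed-out class(es): {sorted(suspects)}")
    sys.exit(1 if rerun(suspects) else 0)

Wired in as a conditional stage after the main test run, something along these lines would let us distinguish "agent was wedged / starved" from "test is actually broken" without paying for a full re-run of the suite.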