> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
I have a weak preference for re-opening the original ticket and tracking the
revert + fix there; it keeps the workflow in one place. The "downside" is having
multiple commits with "CASSANDRA-XXXXXX" in the message, but that might actually
be a nice thing when grepping through history to see what changes were made for
a specific effort.

> I am not sure this will solve people finding the time to fix their breakages, 
> but at least they will be pinged automatically.
That's where the "muscle around git revert" comes in. If we all agree to revert 
patches that break tests, fix them, and then re-merge them, I think that both 
keeps that work in the original mental bucket of "required to be done", and 
pressures all of us to take our pre-commit CI seriously and keep refining it 
until such breakages don't occur, or occur rarely enough to be acceptable.
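
To make the mechanical side concrete, here's a minimal sketch of what the 
revert step could look like (a hypothetical helper, not anything we have today; 
re-opening the JIRA would still happen by hand or via whatever tooling we land 
on):

    #!/usr/bin/env python3
    # Hypothetical sketch: revert every commit referencing a ticket, newest
    # first, so the work can be fixed and re-merged under the same ticket.
    import subprocess, sys

    def shas_for_ticket(ticket):
        out = subprocess.run(["git", "log", "--grep", ticket, "--format=%H"],
                             check=True, capture_output=True, text=True).stdout
        return out.split()  # git log already lists newest first

    if __name__ == "__main__":
        ticket = sys.argv[1]  # e.g. "CASSANDRA-12345" (illustrative only)
        for sha in shas_for_ticket(ticket):
            # --no-edit keeps the default "Revert ..." message, which still
            # contains the ticket id and so stays greppable later. Merge
            # commits would additionally need "-m 1".
            subprocess.run(["git", "revert", "--no-edit", sha], check=True)

The mechanics are cheap; the cultural agreement to actually pull that trigger 
is the part that takes practice.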

We will also offer the ability to run either the pre-commit suite or the full 
post-commit suite before merging, for folks who would prefer that trade-off of 
investment (machine time vs. risk of human time).

On Wed, Jul 12, 2023, at 8:52 AM, Jacek Lewandowski wrote:
> Would it be re-opening the ticket or creating a new ticket with "revert of 
> fix" ?
> 
> 
> 
>> On Wed, 12 Jul 2023 at 14:51, Ekaterina Dimitrova <e.dimitr...@gmail.com> 
>> wrote:
>> "jenkins_jira_integration 
>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>  script updating the JIRA ticket with test results if you cause a regression, 
>> plus us building a muscle around reverting commits that break tests."
>> 
>> I am not sure this will solve people finding the time to fix their breakages, 
>> but at least they will be pinged automatically. Hopefully many folks follow 
>> Jira updates.
>> 
>> "I don't take the past as strongly indicative of the future here since 
>> we've been allowing circle to validate pre-commit and haven't been 
>> multiplexing."
>> I am interested to compare how many flaky-test tickets we end up with 
>> pre-5.0 compared to how many we had pre-4.1.
>> 
>> 
>> On Wed, 12 Jul 2023 at 8:41, Josh McKenzie <jmcken...@apache.org> wrote:
>>>
>>> (This response ended up being a bit longer than intended; sorry about that)
>>> 
>>>> What is more common though is packaging errors,
>>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo
>>>> upgrade tests, being less responsive post-commit as you already
>>>> moved on
>>> Three that *should* be resolved in the new regime:
>>> * Packaging errors should be caught pre-commit, as we're making the artifact 
>>> builds part of the pre-commit suite.
>>> * I'm hoping to merge the commit log segment allocation work so the CDC 
>>> allocator is the only one for 5.0 (and it just bypasses the CDC-related work 
>>> on allocation if it's disabled, thus not impacting perf); the existing 
>>> targeted testing of CDC-specific functionality should be sufficient to 
>>> confirm its correctness, as it doesn't vary from the primary allocation path 
>>> when it comes to mutation space in the buffer.
>>> * Upgrade tests are going to be part of the pre-commit suite.
>>> 
>>> *Outstanding issues:*
>>> * Compression: if we just run with defaults we won't test all cases, so 
>>> errors could pop up here.
>>> * system_ks_directory-related things: is this still ongoing, or did we have 
>>> a transient burst of these types of issues? And would we expect these to 
>>> vary based on different JDKs, non-default configurations, etc.?
>>> * Being less responsive post-commit: my only ideas here are a combination 
>>> of the jenkins_jira_integration 
>>> <https://github.com/apache/cassandra-builds/blob/trunk/jenkins-jira-integration/jenkins_jira_integration.py>
>>>  script updating the JIRA ticket with test results if you cause a 
>>> regression, plus us building a muscle around reverting commits that break 
>>> tests.
>>> 
>>> To quote Jacek:
>>>> why don't we run dtests w/wo sstable compression x w/wo internode 
>>>> encryption x w/wo vnodes x w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x 
>>>> RedHat/Debian/SUSE, etc. I think this is a matter of cost vs result.
>>> 
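>>> (Just for a rough sense of scale - this is a back-of-the-envelope sketch, 
>>> counting the example axes above with two states assumed per w/wo toggle - 
>>> that matrix alone is a few hundred combinations per suite:)
>>> 
>>>     # counting the example matrix; axis sizes here are assumptions
>>>     from math import prod
>>>     axes = {"sstable compression": 2, "internode encryption": 2,
>>>             "vnodes": 2, "off-heap buffers": 2, "jdk 8/11/17": 3,
>>>             "cdc": 2, "distro (RedHat/Debian/SUSE)": 3}
>>>     print(prod(axes.values()))  # -> 288 configurations
>>> 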
>>> I think we've organically made these decisions and tradeoffs in the past 
>>> without being methodical about it. If we can:
>>> 1. Multiplex changed or new tests (see the sketch just below)
>>> 2. Tighten the feedback loop of "tests were green, now they're 
>>> *consistently* not, you're the only one who changed something", and
>>> 3. Instill a culture of "if you can't fix it immediately, revert your commit"
>>> 
>>> Then I think we'll only be vulnerable to flaky failures introduced in 
>>> non-default configurations as side effects in tests that weren't touched, 
>>> which *intuitively* feels like a lot less than what we're facing today. As 
>>> a day-2 effort we could even get clever and, for the packages in the 
>>> primary codebase where a change takes place, multiplex (on a smaller scale) 
>>> their respective packages of unit tests, if we see problems in this area.
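>>> 
>>> As a rough sketch of what (1) could look like - purely illustrative: the 
>>> repeat count is a placeholder and I'm assuming an "ant testsome"-style 
>>> target, so the exact invocation may differ:
>>> 
>>>     # Hypothetical multiplexer: find unit test classes touched by the patch
>>>     # and run each of them repeatedly to shake out flakiness before merge.
>>>     import subprocess
>>> 
>>>     REPEATS = 100  # placeholder; per config in practice
>>> 
>>>     def changed_test_classes(base="origin/trunk"):
>>>         diff = subprocess.run(
>>>             ["git", "diff", "--name-only", base, "--", "test/unit/"],
>>>             check=True, capture_output=True, text=True).stdout.splitlines()
>>>         return [p.removeprefix("test/unit/").removesuffix(".java").replace("/", ".")
>>>                 for p in diff if p.endswith(".java")]
>>> 
>>>     for cls in changed_test_classes():
>>>         for _ in range(REPEATS):
>>>             subprocess.run(["ant", "testsome", f"-Dtest.name={cls}"],
>>>                            check=True)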
>>> 
>>> Flaky tests are a giant pain in the ass and a huge drain on productivity, 
>>> don't get me wrong. *And* we have to balance how much cost we're paying 
>>> before each commit with the benefit we expect to gain from that.
>>> 
>>> Does the above make sense? Are there things you've seen in the trenches 
>>> that challenge or invalidate any of those perspectives?
>>> 
>>> On Wed, Jul 12, 2023, at 7:28 AM, Jacek Lewandowski wrote:
>>>> Isn't novnodes just a special case of vnodes with n=1?
>>>> 
>>>> We should rather select a subset of tests for which it makes sense to run 
>>>> with different configurations. 
>>>> 
>>>> The set of configurations against which we currently run the tests is 
>>>> still only a subset of all possible cases. I could ask - why don't we run 
>>>> dtests w/wo sstable compression x w/wo internode encryption x w/wo vnodes 
>>>> x w/wo off-heap buffers x j8/j11/j17 x w/wo CDC x RedHat/Debian/SUSE, etc. 
>>>> I think this is a matter of cost vs result. That equation contains the 
>>>> likelihood of failure in configuration X given there was no failure in the 
>>>> default configuration, the cost of running those tests, the time we delay 
>>>> merging, and the likelihood that we wait so long for the test results that 
>>>> our branch diverges and we either have to rerun them or accept that we are 
>>>> merging code which was tested on an outdated base. And eventually, the 
>>>> overall experience for new contributors - whether they want to participate 
>>>> in the future.
>>>> 
>>>> 
>>>> 
>>>> On Wed, 12 Jul 2023 at 07:24, Berenguer Blasi <berenguerbl...@gmail.com> 
>>>> wrote:
>>>>> On our 4.0 release I remember a number of such failures but not recently. 
>>>>> What is more common though is packaging errors, 
>>>>> cdc/compression/system_ks_directory targeted fixes, CI w/wo upgrade 
>>>>> tests, being less responsive post-commit as you already moved on,... 
>>>>> Either the smoke pre-commit has approval steps for everything, or imo we 
>>>>> should give the dev pre-commit a devBranch-like job. I find it terribly 
>>>>> useful. My 2cts.
>>>>> 
>>>>> On 11/7/23 18:26, Josh McKenzie wrote:
>>>>>>> 2: Pre-commit 'devBranch' full suite for high risk/disruptive merges: 
>>>>>>> at reviewer's discretion
>>>>>> In general, maybe offering a dev the option of choosing either 
>>>>>> "pre-commit smoke" or "post-commit full" at their discretion for any 
>>>>>> work would be the right play.
>>>>>> 
>>>>>> A follow-on thought: even with something as significant as Accord, TCM, 
>>>>>> Trie data structures, etc., I'd be a bit surprised to see tests fail on 
>>>>>> JDK17 that didn't fail on 11, or with vs. without vnodes, in ways where 
>>>>>> it wasn't immediately clear that the patch had stumbled across something 
>>>>>> surprising, and immediately trivially attributable if not fixable. *In 
>>>>>> theory* the things we're talking about excluding from the pre-commit 
>>>>>> smoke test suite are all things that are supposed to be identical across 
>>>>>> environments and thus opaque / interchangeable by default (JDK version, 
>>>>>> outside checking the build, which we will; vnodes vs. non, etc.).
>>>>>> 
>>>>>> Has that not proven to be the case in your experience?
>>>>>> 
>>>>>> On Tue, Jul 11, 2023, at 10:15 AM, Derek Chen-Becker wrote:
>>>>>>> A strong +1 to getting to a single CI system. CircleCI definitely has 
>>>>>>> some niceties and I understand why it's currently used, but right now 
>>>>>>> we get 2 CI systems for twice the price. +1 on the proposed subsets.
>>>>>>> 
>>>>>>> Derek
>>>>>>> 
>>>>>>> On Mon, Jul 10, 2023 at 9:37 AM Josh McKenzie <jmcken...@apache.org> 
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I'm personally not thinking about CircleCI at all; I'm envisioning a 
>>>>>>>> world where all of us have 1 CI *software* system (i.e. reproducible 
>>>>>>>> on any env) that we use for pre-commit validation, and then 
>>>>>>>> post-commit happens on reference ASF hardware.
>>>>>>>> 
>>>>>>>> So:
>>>>>>>> 1: Pre-commit subset of tests (suites + matrices + env) runs. On 
>>>>>>>> green, merge.
>>>>>>>> 2: Post-commit tests (all suites, matrices, env) run. On failure, 
>>>>>>>> link back to the JIRA ticket where the commit took place.
>>>>>>>> 
>>>>>>>> Circle would need to remain in lockstep with the requirements for 
>>>>>>>> point 1 here.
>>>>>>>> 
>>>>>>>> On Mon, Jul 10, 2023, at 1:04 AM, Berenguer Blasi wrote:
>>>>>>>>> +1 to Josh, which is exactly my line of thought as well. But that is 
>>>>>>>>> only valid if we have a solid Jenkins that will eventually run all 
>>>>>>>>> test configs. So I think I lost track a bit here. Are you proposing:
>>>>>>>>> 
>>>>>>>>> 1- CircleCI: Run pre-commit a single (the most common/meaningful, 
>>>>>>>>> TBD) config of tests
>>>>>>>>> 
>>>>>>>>> 2- Jenkins: Runs post-commit _all_ test configs and emails/notifies 
>>>>>>>>> you in case of problems?
>>>>>>>>> 
>>>>>>>>> Or something different, like having 1 also in Jenkins?
>>>>>>>>> 
>>>>>>>>> On 7/7/23 17:55, Andrés de la Peña wrote:
>>>>>>>>>> I think 500 runs combining all configs could be reasonable, since 
>>>>>>>>>> it's unlikely to have config-specific flaky tests. As in five 
>>>>>>>>>> configs with 100 repetitions each.
>>>>>>>>>> 
>>>>>>>>>> On Fri, 7 Jul 2023 at 16:14, Josh McKenzie <jmcken...@apache.org> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Maybe. Kind of depends on how long we write our tests to run, 
>>>>>>>>>>> doesn't it? :)
>>>>>>>>>>> 
>>>>>>>>>>> But point taken. Any non-trivial test would start to be something 
>>>>>>>>>>> of a beast under this approach.
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jul 7, 2023, at 11:12 AM, Brandon Williams wrote:
>>>>>>>>>>>> On Fri, Jul 7, 2023 at 10:09 AM Josh McKenzie 
>>>>>>>>>>>> <jmcken...@apache.org> wrote:
>>>>>>>>>>>> > 3. Multiplexed tests (changed, added) run against all JDK's and 
>>>>>>>>>>>> > a broader range of configs (no-vnode, vnode default, 
>>>>>>>>>>>> > compression, etc)
>>>>>>>>>>>> 
>>>>>>>>>>>> I think this is going to be too heavy...we're taking 500 iterations
>>>>>>>>>>>> and multiplying that by like 4 or 5?
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> +---------------------------------------------------------------+
>>>>>>> | Derek Chen-Becker                                             |
>>>>>>> | GPG Key available at https://keybase.io/dchenbecker and       |
>>>>>>> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
>>>>>>> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
>>>>>>> +---------------------------------------------------------------+
>>>>>>> 
>>>>>> 
>>> 
