“I'm curious what it triggers for you Brandon, Berenguer, Andres, Ekaterina, and Mick (when you're back from the mountains ;)).”

We already have a minimum set of tests that is mandatory pre-commit in CircleCI. People can manually trigger other tests if they feel they might have broken something. The only config-variation tests that are currently mandatory in the pre-commit CircleCI workflow are those that were never added to Jenkins (for example, the system keyspaces and oa unit tests). Probably the only combination that we might want to reconsider is with/without vnodes?
I also wouldn't advocate dropping the JDK17 tests until we enable all test suites post-commit. Reminder - the suites that still have failing tests we are actively working on are disabled in Jenkins to reduce the noise until we are ready to fully switch from 8+11 to 11+17. Probably also in the future, when we work to introduce new JDK versions, we will again want to run those tests and see whether we regress things for the people who are dealing with all the maintenance/problems in the background.

Another twist - I think Jenkins dev so far triggers all the tests that post-commit does, no? Probably that can change to mimic what we agreed on for CircleCI. I am sure the devil will again be in the details, but it is a thing to consider.

“If a failure makes it to post-commit, it's much more expensive to root cause and figure out with much higher costs to the community's collective productivity.”

Totally agree. And my hope is that the pre-commit practice of running tests in a loop should help us deal with that to some extent. It is always easier for the author to do a fix while their thoughts are still on the topic, and it reduces the time people spend on bisecting and doing archeology later.

On Derek's point about flakiness sometimes also being attributable to the tests themselves - when we get bitten a few times pre-commit and have to improve our tests to make them more deterministic, I believe we will learn a thing or two, and those types of things will happen less over time.

Best regards,
Ekaterina

---------- Forwarded message ---------
From: Josh McKenzie <jmcken...@apache.org>
Date: Wed, 5 Jul 2023 at 8:25
Subject: Re: [DISCUSS] Formalizing requirements for pre-commit patches on new CI
To: dev <dev@cassandra.apache.org>

> choose a consistent, representative subset of stable tests that we feel gives us a reasonable level of confidence in return for a reasonable amount of runtime

> Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and j11 w/wo vnodes. That is 6 times total. I wonder about that ROI.

> test with the default number of vnodes, test with the default compression settings, and test with the default heap/off-heap buffers.

If I take these at face value to be true (I happen to agree with them, so I'm going to do this :)), what falls out for me:

1. Pre-commit should be an intentional smoke-testing suite, much smaller relative to post-commit than it is today
2. We should aggressively cull all low-signal pre-commit tests, suites, and configurations that aren't needed to keep post-commit stable

High signal in pre-commit (indicative; non-exhaustive):
1. Only the most commonly used JDK (JDK11 atm?)
2. Config defaults (vnodes, compression, heap/off-heap buffers, memtable format, sstable format)
3. Most popular / general / run-of-the-mill linux distro (debian?)

Low signal in pre-commit (indicative; non-exhaustive):
1. No vnodes
2. JDK8; JDK17
3. Non-default settings (compression off, fully mmap, no mmap, trie memtables or sstables, cdc enabled)

So this shape of thinking - I'm curious what it triggers for you Brandon, Berenguer, Andres, Ekaterina, and Mick (when you're back from the mountains ;)). You guys paid a lot of the debt in the run-up to 4.1, so you have the most recent expertise and I trust your perspectives here.

If a failure makes it to post-commit, it's much more expensive to root cause and figure out, with much higher costs to the community's collective productivity. That said, I think we can make a lot of progress along this line of thinking.
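As a purely illustrative aside, the difference between those two tiers is combinatorial. A toy sketch, using only the axes named in this thread (none of this is project tooling, and the first value of each axis is treated as the default):

```python
# Toy illustration: pre-commit pins the high-signal defaults, post-commit
# keeps the full matrix. Axis values are the ones mentioned in the thread.
from itertools import product

POST_COMMIT = {
    "jdk": ["11", "8", "17"],
    "vnodes": ["on", "off"],
    "compression": ["default", "off"],
    "buffers": ["heap", "offheap"],
}
PRE_COMMIT = {axis: values[:1] for axis, values in POST_COMMIT.items()}

def combinations(axes):
    """Number of distinct configurations a suite would run under."""
    return len(list(product(*axes.values())))

print(f"post-commit configurations: {combinations(POST_COMMIT)}")  # 24
print(f"pre-commit configurations:  {combinations(PRE_COMMIT)}")   # 1
```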
On Wed, Jul 5, 2023, at 5:54 AM, Jacek Lewandowski wrote:

Perhaps pre-commit checks should include mostly the typical configuration of Cassandra rather than some subset of the possible combinations. Like it was said somewhere above - test with the default number of vnodes, test with the default compression settings, and test with the default heap/off-heap buffers.

A longer-term goal could be to isolate what depends on particular configuration options. Instead of blindly running everything with, say, vnodes enabled and disabled, isolate the tests that need to be run with those two configurations and run the rest with the default one.

> ... the rule of multiplexing new or changed tests might go a long way to mitigating that ...

I wonder if there is some commonality in the flaky tests reported so far, like the presence of certain statements? Also, there could be a tool that inspects coverage analysis reports and chooses the proper tests to run/multiplex, because, in the end, we want to verify the changed production code in addition to the modified test files.

thanks,
Jacek

On Wed, 5 Jul 2023 at 06:28, Berenguer Blasi <berenguerbl...@gmail.com> wrote:

Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and j11 w/wo vnodes. That is 6 times total. I wonder about that ROI.

On dtest cluster reuse: yes, I stopped that because at the time we had lots of CI changes, an upcoming release, and other priorities. But when the CI starts flexing its muscles that'd be easy to pick up again, as the dtest code shouldn't have changed much.

On 4/7/23 17:11, Derek Chen-Becker wrote:

Ultimately I think we have to invest in two directions: first, choose a consistent, representative subset of stable tests that we feel gives us a reasonable level of confidence in return for a reasonable amount of runtime. Second, we need to invest in figuring out why certain tests fail. I strongly dislike the term "flaky" because it suggests that it's some inconsequential issue causing problems. The truth is that a test that fails is either a bug in the service code or a bug in the test.

I've come to realize that the CI and build framework is way too complex for me to be able to help with much, but I would love to start chipping away at failing-test bugs. I'm getting settled into my new job and I should be able to commit some regular time each week to triaging and fixing starting in August, and if there are any other folks who are interested, let me know.

Cheers,
Derek

On Mon, Jul 3, 2023, 12:30 PM Josh McKenzie <jmcken...@apache.org> wrote:

> Instead of running all the tests through available CI agents every time we can have presets of tests:

Back when I joined the project in 2014, unit tests took ~5 minutes to run on a local machine. We had pre-commit and post-commit tests as a distinction as well, but also had flakes in both the pre and post batches. I'd love to see us get back to a unit test regime like that.

The challenge we've always had is flaky tests showing up in either the pre-commit or post-commit groups and the difficulty of attributing a flaky failure to where it was introduced (not to lay blame, but to educate, learn, and prevent recurrence). While historically further-reduced smoke-testing suites would just mean more flakes showing up downstream, the rule of multiplexing new or changed tests might go a long way to mitigating that.
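As a rough illustration (not project tooling), the "multiplex new or changed tests" rule could start as simple as this: find the test files a patch touches and rerun each one in a loop before commit. The pytest-style invocation, the trunk base branch, and the 100-iteration count are all assumptions here.

```python
# Minimal sketch of multiplexing new/changed tests: discover test files
# touched by the patch via git, then rerun each one repeatedly to flush out
# non-deterministic failures before commit.
import subprocess
import sys

REPEATS = 100  # iteration count is an assumption, not a project standard

def changed_test_files(base: str = "origin/trunk") -> list[str]:
    """Test files touched by the patch, relative to the target branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if "/test" in f or f.startswith("test")]

def multiplex(test_file: str, repeats: int = REPEATS) -> bool:
    """Run one test file repeatedly; True only if every iteration passes."""
    for i in range(repeats):
        result = subprocess.run(["pytest", "-x", test_file],
                                capture_output=True, text=True)
        if result.returncode != 0:
            print(f"{test_file}: failed on iteration {i + 1}/{repeats}")
            return False
    return True

if __name__ == "__main__":
    flaky = [t for t in changed_test_files() if not multiplex(t)]
    sys.exit(1 if flaky else 0)
```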
> Should we mention in this concept how we will build the sub-projects (e.g. Accord) alongside Cassandra?

I think it's an interesting question, but I also think there's no real dependency of process between primary mainline branches and feature branches. My intuition is that having the same bar (green CI, multiplex, don't introduce flakes, smart smoke-suite tiering) would be a good idea on feature branches so there's not a death march right before merge, squashing flakes when you have to multiplex hundreds of tests before merging to mainline (since presumably a feature branch would impact a lot of tests). Now that I write that all out it does sound Painful. =/

On Mon, Jul 3, 2023, at 10:38 AM, Maxim Muzafarov wrote:

For me, the biggest benefit of keeping the build scripts and CI configurations in the same project is that these files are versioned in the same way as the main sources. This ensures that we can build past releases without any annoying errors in the scripts, so I would say this is a pretty necessary change.

I'd also like to mention an approach that could work for projects with a huge number of tests. Instead of running all the tests through the available CI agents every time, we can have presets of tests:
- base tests (to make sure that your design basically works; the set will not run longer than 30 min);
- pre-commit tests (enough tests to make sure that we can safely commit new changes while fitting the run into a 1-2 hour build timeframe);
- nightly builds (a scheduled task to build everything we have once a day and notify the ML if that build fails).

My question here is: should we mention in this concept how we will build the sub-projects (e.g. Accord) alongside Cassandra?

On Fri, 30 Jun 2023 at 23:19, Josh McKenzie <jmcken...@apache.org> wrote:
>
> Not everyone will have access to such resources, if all you have is 1 such pod you'll be waiting a long time (in theory one month, and you actually need a few bigger pods for some of the more extensive tests, e.g. large upgrade tests)….
>
> One thing worth calling out: I believe we have a lot of low-hanging fruit in the domain of "find long running tests and speed them up". In early 2022 I was poking around at our unit tests on CASSANDRA-17371 and found that 2.62% of our tests made up 20.4% of our runtime (https://docs.google.com/spreadsheets/d/1-tkH-hWBlEVInzMjLmJz4wABV6_mGs-2-NNM2XoVTcA/edit#gid=1501761592). This kind of finding is pretty consistent; I remember Carl Yeksigian at NGCC back in like 2015 axing an hour-plus of aggregate runtime just by devoting an afternoon to looking at a few badly behaving tests.
>
> I'd like to see us move from "1 pod, 1 month" down to something a lot more manageable. :)
>
> Shout-out to Berenger's work on CASSANDRA-16951 for dtest cluster reuse (not yet merged), and I have CASSANDRA-15196 to remove the CDC vs. non-CDC segment allocator distinction and axe the test-cdc target entirely.
>
> Ok. Enough of that. Don't want to derail us, just wanted to call out that the state of things today isn't the way it has to be.
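The kind of analysis behind that 2.62%/20.4% number can be approximated locally from JUnit-style XML reports. A rough, stdlib-only sketch; the report glob and the 5% cut-off are assumptions, not how CASSANDRA-17371 was actually produced:

```python
# Rough sketch: rank tests by runtime from JUnit-style XML reports and show
# what share of total runtime the slowest few percent account for.
# The report location below is an assumption about where the build writes them.
import glob
import xml.etree.ElementTree as ET

def load_timings(pattern="build/test/output/TEST-*.xml"):
    """Return (test name, seconds) pairs parsed from JUnit XML reports."""
    timings = []
    for path in glob.glob(pattern):
        for case in ET.parse(path).getroot().iter("testcase"):
            name = f"{case.get('classname')}.{case.get('name')}"
            timings.append((name, float(case.get("time", 0.0))))
    return timings

def report(timings, top_fraction=0.05):
    timings.sort(key=lambda pair: pair[1], reverse=True)
    total = sum(seconds for _, seconds in timings) or 1.0
    top = timings[: max(1, int(len(timings) * top_fraction))]
    share = sum(seconds for _, seconds in top) / total
    print(f"slowest {len(top)} tests ({top_fraction:.0%} of the suite) "
          f"account for {share:.1%} of total runtime")
    for name, seconds in top[:20]:
        print(f"{seconds:8.1f}s  {name}")

if __name__ == "__main__":
    report(load_timings())
```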
> On Fri, Jun 30, 2023, at 4:41 PM, Mick Semb Wever wrote:
>
> - There are hw constraints, is there any approximation on how long it will take to run all tests? Or is there a stated goal that we will strive to reach as a project?
>
> Have to defer to Mick on this; I don't think the changes outlined here will materially change the runtime on our currently donated nodes in CI.
>
> A recent comparison between CircleCI and the jenkins code underneath ci-cassandra.a.o was done (not yet shared) to see whether a 'repeatable CI' can be both lower cost and offer the same turnaround time. The exercise uncovered that there's a lot of waste in our jenkins builds, and once the jenkinsfile becomes standalone it can stash and unstash the build results. From this, a conservative estimate was that even if we only brought the build time down to double that of CircleCI, it would still be significantly lower cost while still using on-demand ec2 instances. (The goal is to use spot instances.)
>
> The real problem here is that our CI pipeline uses ~1000 containers. ci-cassandra.a.o only has 100 executors (and a few of these at any time are often down for disk self-cleaning). The idea with 'repeatable CI', and to a broader extent Josh's opening email, is that no one will need to use ci-cassandra.a.o for pre-commit work anymore. For post-commit we don't care if it takes 7 hours (we care about stability of results, which 'repeatable CI' also helps us with).
>
> While pre-commit testing will be more accessible to everyone, it will still depend on the resources you have access to. For the fastest turnaround times you will need a k8s cluster that can spawn 1000 pods (4cpu, 8GB ram) which will each run for between 1 and 30 minutes, or the equivalent. Not everyone will have access to such resources; if all you have is 1 such pod you'll be waiting a long time (in theory one month, and you actually need a few bigger pods for some of the more extensive tests, e.g. large upgrade tests)….
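For what it's worth, the "one pod, roughly one month" figure holds up as a back-of-the-envelope calculation using only the numbers quoted above; the 30-minute upper bound per job is the assumption doing the work here:

```python
# Back-of-the-envelope check of the "one pod ~ one month" claim above,
# using the figures from the email; the 30-minute per-job bound is an assumption.
pods = 1000            # containers a full pipeline run fans out to
minutes_per_job = 30   # upper bound quoted for most jobs
serial_hours = pods * minutes_per_job / 60
print(f"~{serial_hours:.0f} hours, i.e. ~{serial_hours / 24:.0f} days, on a single pod")
# ~500 hours, roughly 21 days; the handful of bigger upgrade-test jobs push it toward a month.
```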