Thank you, Josh and Mick

Immediate questions on my mind:
- Currently we run at most two parallel CI runs in Jenkins-dev. I guess you
will try to lift that limitation?
- There are hardware constraints. Is there an estimate of how long it will
take to run all tests? Or is there a stated goal that we will strive to
reach as a project?
- Bringing the scripts in-tree will make it easier to add a multiplexer, which
we are missing at the moment; that’s great. (Running jobs in a loop helps a lot
with flaky tests.) It also makes it easier to add new test suites.
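
To illustrate the run-in-a-loop idea, here is a minimal Python sketch. It is
not part of the in-tree scripts: `run_repeatedly` and the placeholder command
are hypothetical names used only for illustration, and a real multiplexer
would presumably invoke the in-tree test scripts and keep their logs.

```python
import subprocess
import sys
from typing import List, Sequence


def run_repeatedly(cmd: Sequence[str], runs: int) -> List[int]:
    """Run `cmd` `runs` times; return the 1-based indices of the failed runs."""
    failed = []
    for i in range(1, runs + 1):
        # Output is discarded here to keep the sketch short; a real
        # multiplexer would want to keep per-run logs.
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode != 0:
            failed.append(i)
    return failed


if __name__ == "__main__":
    # Placeholder command (an assumption): swap in the real test invocation.
    placeholder_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
    runs = 3
    failures = run_repeatedly(placeholder_cmd, runs)
    print(f"{len(failures)} of {runs} runs failed: {failures}")
```

A test that fails in only some of the runs is a candidate flaky test; one
that fails in every run is more likely a real regression.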

On Fri, 30 Jun 2023 at 13:35, Derek Chen-Becker <de...@chen-becker.org>
wrote:

> Thanks Josh, this looks great! I think the constraints you've outlined are
> reasonable for an initial attempt. We can always evolve if we run into
> issues.
>
> Cheers,
>
> Derek
>
> On Fri, Jun 30, 2023 at 11:19 AM Josh McKenzie <jmcken...@apache.org>
> wrote:
>
>> Context: we're looking to get away from having split CircleCI and ASF CI as well
>> as getting ASF CI to a stable state. There's a variety of reasons why it's flaky
>> (orchestration, heterogeneous hardware, hardware failures, flaky tests,
>> non-deterministic runs, noisy neighbors, etc), many of which Mick has been
>> making great headway on starting to address.
>>
>> If you're curious see:
>> - Mick's 2023/01/09 email thread on CI:
>>     https://lists.apache.org/thread/fqdvqkjmz6w8c864vw98ymvb1995lcy4
>> - Mick's 2023/04/26 email thread on CI:
>>     https://lists.apache.org/thread/xb80v6r857dz5rlm5ckcn69xcl4shvbq
>> - CASSANDRA-18137: epic for "Repeatable ci-cassandra.a.o":
>>     https://issues.apache.org/jira/browse/CASSANDRA-18137
>> - CASSANDRA-18133: In-tree build scripts:
>>     https://issues.apache.org/jira/browse/CASSANDRA-18133
>>
>> What's fallen out from this: the new reference CI will have the following
>> logical layers:
>> 1. ant
>> 2. build/test scripts that set up the env. See run-tests.sh and
>>     run-python-dtests.sh here:
>>
>> https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build
>> 3. dockerized build/test scripts that have containerized the flow of 1 and 2. See:
>>
>> https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build/docker
>> 4. CI integrations. See generation of unified test report in build.xml:
>>
>> https://github.com/thelastpickle/cassandra/blame/mck/18133/trunk/build.xml#L1794-L1817
>> 5. Optional full CI lifecycle w/Jenkins running in a container (full stack
>>     setup, run, teardown, pending)
>>
>>
>> *I want to let everyone know the high level structure of how this is shaping
>> up, as this is a change that will directly impact the work of all of us on the
>> project.*
>>
>> In terms of our goals, the chief goals I'd like to call out in this context are:
>> * ASF CI needs to be and remain consistent
>> * contributors need a turnkey way to validate their work before merging that
>>     they can accelerate by throwing resources at it.
>>
>> We as a project need to determine what is *required* to run in a CI environment
>> to consider that run certified for merge. Where Mick and I landed through a lot
>> of back and forth is that the following would be required:
>> 1. used ant / pytest to build and run tests
>> 2. used the reference scripts being changed in CASSANDRA-18133 (in-tree .build/)
>>     to set up and execute your test environment
>> 3. constrained your runtime environment to the same hardware and time
>>     constraints we use in ASF CI, within reason (CPU count independent of speed,
>>     memory size and disk size independent of hardware specs, etc)
>> 4. reported test results in a unified fashion that has all the information we
>>     normally get from a test run
>> 5. (maybe) parallelized the tests across the same split lines as upstream ASF
>>     (i.e. no weird env specific neighbor / scheduling flakes)
>>
>> Last but not least is the "What do we do with CircleCI?" angle. The current
>> thought is we allow people to continue using it with the stated goal of
>> migrating the circle config over to using the unified build scripts as well and
>> getting it into compliance with the above requirements.
>>
>> For reference, here's a gdoc where we've hashed this out:
>>
>> https://docs.google.com/document/d/1TaYMvE5ryOYX03cxzY6XzuUS651fktVER02JHmZR5FU/edit?usp=sharing
>>
>> So my questions for the community here:
>> 1. What's missing from the above conceptualization of the problem?
>> 2. Are the constraints too strong? Too weak? Just right?
>>
>> Thanks everyone, and happy Friday. ;)
>>
>> ~Josh
>>
>
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
>
>
