Thank you, Josh and Mick.

Immediate questions on my mind:
- Currently we run at most two parallel CI runs in Jenkins-dev. I guess you will try to improve on that limitation?
- There are hw constraints. Is there any approximation of how long it will take to run all tests? Or is there a stated goal that we will strive to reach as a project?
- Bringing scripts in-tree will make it easier to add a multiplexer, which we miss at the moment; that's great. (Running jobs in a loop helps a lot with flaky tests.) It also makes it easier to add any new test suites.
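To make the "running jobs in a loop" point concrete, here is a minimal sketch of what I mean by repeating a test to surface flakiness. It is not the multiplexer itself and is not taken from the in-tree scripts; the function names (`run_once`, `repeat_test`, `summarize`) are hypothetical, and the command to repeat is whatever test invocation you already use.

```python
import subprocess
from typing import Callable, List, Sequence


def run_once(cmd: Sequence[str]) -> bool:
    """Run the command once; True means it exited with status 0."""
    return subprocess.run(cmd).returncode == 0


def repeat_test(
    cmd: Sequence[str],
    runs: int,
    runner: Callable[[Sequence[str]], bool] = run_once,
) -> List[int]:
    """Run cmd `runs` times and return the 1-based indices of failed runs.

    `runner` is injectable so the loop can be exercised without
    spawning real processes (hypothetical helper, illustration only).
    """
    failures = []
    for i in range(1, runs + 1):
        if not runner(cmd):
            failures.append(i)
    return failures


def summarize(failures: List[int], runs: int) -> str:
    """Format a one-line summary of how many repeated runs failed."""
    if not failures:
        return f"0/{runs} runs failed"
    listed = ", ".join(str(i) for i in failures)
    return f"{len(failures)}/{runs} runs failed (runs {listed})"
```

For example, `summarize(repeat_test(cmd, 100), 100)` with `cmd` set to a single test invocation gives a rough failure rate for that test; a test that fails only in some of the runs is a flaky-test candidate.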
On Fri, 30 Jun 2023 at 13:35, Derek Chen-Becker <de...@chen-becker.org> wrote:

> Thanks Josh, this looks great! I think the constraints you've outlined are reasonable for an initial attempt. We can always evolve if we run into issues.
>
> Cheers,
>
> Derek
>
> On Fri, Jun 30, 2023 at 11:19 AM Josh McKenzie <jmcken...@apache.org> wrote:
>
>> Context: we're looking to get away from having split CircleCI and ASF CI as well as getting ASF CI to a stable state. There's a variety of reasons why it's flaky (orchestration, heterogeneous hardware, hardware failures, flaky tests, non-deterministic runs, noisy neighbors, etc), many of which Mick has been making great headway on starting to address.
>>
>> If you're curious see:
>> - Mick's 2023/01/09 email thread on CI: https://lists.apache.org/thread/fqdvqkjmz6w8c864vw98ymvb1995lcy4
>> - Mick's 2023/04/26 email thread on CI: https://lists.apache.org/thread/xb80v6r857dz5rlm5ckcn69xcl4shvbq
>> - CASSANDRA-18137: epic for "Repeatable ci-cassandra.a.o": https://issues.apache.org/jira/browse/CASSANDRA-18137
>> - CASSANDRA-18133: In-tree build scripts: https://issues.apache.org/jira/browse/CASSANDRA-18133
>>
>> What's fallen out from this: the new reference CI will have the following logical layers:
>> 1. ant
>> 2. build/test scripts that set up the env. See run-tests.sh and run-python-dtests.sh here:
>>    https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build
>> 3. dockerized build/test scripts that have containerized the flow of 1 and 2. See:
>>    https://github.com/thelastpickle/cassandra/tree/0aecbd873ff4de5474fe15efac4cdde10b603c7b/.build/docker
>> 4. CI integrations. See generation of unified test report in build.xml:
>>    https://github.com/thelastpickle/cassandra/blame/mck/18133/trunk/build.xml#L1794-L1817
>> 5. Optional full CI lifecycle w/Jenkins running in a container (full stack setup, run, teardown, pending)
>>
>> *I want to let everyone know the high level structure of how this is shaping up, as this is a change that will directly impact the work of *all of us* on the project.*
>>
>> In terms of our goals, the chief goals I'd like to call out in this context are:
>> * ASF CI needs to be and remain consistent
>> * contributors need a turnkey way to validate their work before merging that they can accelerate by throwing resources at it.
>>
>> We as a project need to determine what is *required* to run in a CI environment to consider that run certified for merge. Where Mick and I landed through a lot of back and forth is that the following would be required:
>> 1. used ant / pytest to build and run tests
>> 2. used the reference scripts being changed in CASSANDRA-18133 (in-tree .build/) to set up and execute your test environment
>> 3. constrained your runtime environment to the same hardware and time constraints we use in ASF CI, within reason (CPU count independent of speed, memory size and disk size independent of hardware specs, etc)
>> 4. reported test results in a unified fashion that has all the information we normally get from a test run
>> 5. (maybe) Parallelized the tests across the same split lines as upstream ASF (i.e. no weird env specific neighbor / scheduling flakes)
>>
>> Last but not least is the "What do we do with CircleCI?" angle. The current thought is we allow people to continue using it with the stated goal of migrating the circle config over to using the unified build scripts as well and get it in compliance with the above requirements.
>>
>> For reference, here's a gdoc where we've hashed this out:
>> https://docs.google.com/document/d/1TaYMvE5ryOYX03cxzY6XzuUS651fktVER02JHmZR5FU/edit?usp=sharing
>>
>> So my questions for the community here:
>> 1. What's missing from the above conceptualization of the problem?
>> 2. Are the constraints too strong? Too weak? Just right?
>>
>> Thanks everyone, and happy Friday. ;)
>>
>> ~Josh
>
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7 7F42 AFC5 AFEE 96E4 6ACC   |
> +---------------------------------------------------------------+