> > - There are hw constraints, is there any approximation on how long it will
> > take to run all tests? Or is there a stated goal that we will strive to
> > reach as a project?
>
> Have to defer to Mick on this; I don't think the changes outlined here
> will materially change the runtime on our currently donated nodes in CI.
>
A recent comparison between CircleCI and the Jenkins code underneath ci-cassandra.a.o was done (not yet shared) to see whether a 'repeatable CI' can be both lower cost and have the same turnaround time. The exercise uncovered a lot of waste in our Jenkins builds; once the Jenkinsfile becomes standalone, it can stash and unstash the build results. From this, a conservative estimate was that even if we only brought the build time down to double that of CircleCI, it would still be significantly lower cost, even using on-demand EC2 instances. (The goal is to use spot instances.)

The real problem here is that our CI pipeline uses ~1000 containers, while ci-cassandra.a.o only has 100 executors (and at any time a few of these are often down for disk self-cleaning). The idea with 'repeatable CI', and more broadly Josh's opening email, is that no one will need to use ci-cassandra.a.o for pre-commit work anymore. For post-commit we don't care if it takes 7 hours (we care about stability of results, which 'repeatable CI' also helps us with).

While pre-commit testing will be more accessible to everyone, it will still depend on the resources you have access to. For the fastest turnaround times you will need a k8s cluster that can spawn 1000 pods (4 CPU, 8 GB RAM each), each running for 1-30 minutes, or the equivalent. Not everyone will have access to such resources: if all you have is one such pod, you'll be waiting a long time (in theory one month, and you actually need a few bigger pods for some of the more extensive tests, e.g. large upgrade tests)…
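To make the parallelism arithmetic a bit more concrete, here is a rough sketch (Python, purely illustrative) of how wall-clock time scales with the number of executors. It assumes every job is independent and can run on any executor, ignores the bigger pods some tests need, and uses hypothetical durations (the worst case where all ~1000 jobs take the full 30 minutes) rather than measured numbers from our CI. The `makespan_minutes` helper is made up for this example and is not part of any CI tooling.

```python
import heapq


def makespan_minutes(durations, executors):
    """Hypothetical helper: estimate total wall-clock minutes to run all jobs.

    Uses greedy longest-job-first list scheduling: each job, longest first,
    goes to whichever executor becomes free earliest. This is a heuristic
    for illustration, not how Jenkins or k8s actually schedule work.
    """
    if executors < 1:
        raise ValueError("need at least one executor")
    free_at = [0.0] * executors  # min-heap: time at which each executor is next free
    heapq.heapify(free_at)
    for d in sorted(durations, reverse=True):
        start = heapq.heappop(free_at)      # earliest-available executor
        heapq.heappush(free_at, start + d)  # it is busy until start + d
    return max(free_at)


# Assumed worst case: 1000 jobs, each hitting the 30-minute upper bound.
jobs = [30.0] * 1000
for n in (1000, 100, 1):
    print(n, "executors:", makespan_minutes(jobs, n) / 60, "hours")
```

Under those assumptions 1000 executors finish in 0.5 hours, 100 executors in 5 hours, and a single pod in 500 hours, i.e. on the order of weeks. Real job durations are spread across 1-30 minutes, so actual numbers would be lower; the point is only how strongly turnaround depends on available parallelism.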