Re: A proposal for refactoring the CircleCI config

David Capwell Wed, 02 Nov 2022 10:40:47 -0700

Thanks for starting this thread!

If I am reading correctly, there are 2 main improvements proposed: move 
parallel value to a param and have our “patches” update those and not the jobs 
themselves, look into / use matrix to define the set of jobs we need…


For the parallel param logic, sounds fine to me.  Not sure if this would also 
work for resource_type, but I still argue that xlarge isn’t needed in 90% of 
the cases its used… so fixing this may be better than param there…. So yes, I 
would be cool with this change if it basically removes the patching logic… I 
had another JIRA to have a python script rewrite the YAML, but this method may 
solve in a cleaner way.

About matrix jobs; I don’t know them in circle but have used in other places, 
this sounds good to me.  I would also enhance to argue that JVM is just 1 
config and we sadly have many more:

JVM: [8, 11, 17]
VNODE: [true, false]
CDC: [true, false]
COMPRESSION: [true, false]
MEMTABLE: [skiplist, shardedskiplist, trie]

So, if we can express that with matrix; sounds good to me.

One point about JVM support that is subtle and skipped by many: we currently 
also test compiling with the smallest version and running with the larger 
version… this could also be a matrix job, but would just be a matrix over 11/17 
with 8 removed from that list.

> On Oct 28, 2022, at 2:25 PM, Derek Chen-Becker <de...@chen-becker.org> wrote:
> 
> While I've been working on bringing CircleCI to parity with Jenkins,
> I've made some notes about ways that the whole config generation
> process could be improved. Here are my thoughts. I'm not sure if this
> is worthy of a CEP since it's infra and not a feature.
> 
> Cheers,
> 
> Derek
> 
> Problem Statement
> ═════════════════
> 
>  The CircleCI configuration subdivides various steps in the test plan
>  into jobs that can execute independently. This set of jobs are
>  intended to be run by developers under the free/OSS plan as well as
>  under paid CircleCI plans for developers or organizations that wish to
>  spend money to obtain faster test results by committing more resources
>  (LOWRES, MIDRES, HIGHRES).
> 
>  The allocation of resources is currently driven by a shell script that
>  performs textual modification (e.g. patch and sed) of files before
>  processing to effect changes in various configuration parameters.
>  While this approach works, it does not fully utilize the
>  parameterization features of CircleCI’s configuration processor to
>  reduce complexity when adding new tests or making changes to the
>  system, imposing additional burden on developers modifying the CI
>  configuration.
> 
>  This proposal details an initial goal for reducing CircleCI
>  configuration complexity, and provides a high level overview of
>  subsequent goals to be investigated.
> 
> 
> Goal 1: Eliminate patch files
> ═════════════════════════════
> 
>  Patch files are targeted as the first goal for this proposal because
>  there is a significant reduction in configuration complexity for a
>  relatively modest effort. Patch files themselves are brittle; the
>  patch tool can accommodate some changes between the original target
>  file and the current state, but cannot also unambiguously apply
>  changes. When the CircleCI configuration is changed, the patch files
>  also need to be changed or regenerated to match line numbers and any
>  new sections added. This is extra work that does not provide any
>  benefit.
> 
>  The patch files currently apply changes to three types of
>  configuration:
> 
>  • Heap size parameters (only for the HIGHRES config)
>  • Job resource class
>  • Executor parallelism
> 
>  CircleCI handles this use case via parameterization of the
>  configuration. Interestingly, our CircleCI configuration already takes
>  advantage of parameterization in the definition of the executor:
> 
>  ┌────
>  │ java8-executor:
>  │   parameters:
>  │     exec_resource_class:
>  │       type: string
>  └────
> 
>  CircleCI additionally allows for parameters to be defined at the top
>  level of the pipeline, which are then accessible anywhere in the
>  pipeline definition (e.g. steps, jobs, etc). These parameters can be
>  overridden by providing a yaml file to the `circleci config process'
>  command.
> 
>  As an example of what a change would entail, consider that the patch
>  files change the parallelism of all repated dtest executors uniformly.
>  We could introduce a single pipeline parameter for this value:
> 
>  ┌────
>  │ parameters:
>  │   repeated_dtest_parallelism:
>  │     type: integer
>  │     default: 4
>  └────
> 
>  And then update the configuration of the executors to use the
>  parameter:
> 
>  ┌────
>  │ j8_repeated_utest_executor: &j8_repeated_utest_executor
>  │   executor:
>  │     name: java8-executor
>  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
>  │
>  │ j8_repeated_dtest_executor: &j8_repeated_dtest_executor
>  │   executor:
>  │     name: java8-executor
>  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
>  │
>  │ j8_repeated_upgrade_dtest_executor: &j8_repeated_upgrade_dtest_executor
>  │   executor:
>  │     name: java8-executor
>  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
>  │
>  │ j8_repeated_jvm_upgrade_dtest_executor:
> &j8_repeated_jvm_upgrade_dtest_executor
>  │   executor:
>  │     name: java8-executor
>  │   parallelism: << pipeline.parameters.repeated_dtest_parallelism >>
>  └────
> 
>  Then we create a MIDRES-specific override file containing the new
>  value:
> 
>  ┌────
>  │ repeated_dtest_parallelism: 25
>  └────
> 
>  and a HIGHRES-specific override file:
> 
>  ┌────
>  │ repeated_dtest_parallelism: 100
>  └────
> 
>  And then execute the config processor with the proper override:
> 
>  ┌────
>  │ circleci config process --pipeline-parameters MIDRES-params.yml
> config-2_1.yml
>  │ # or
>  │ circleci config process --pipeline-parameters HIGHRES-params.yml
> config-2_1.yml
>  └────
> 
>  If a new job is added that does not use any parameters, only the
>  `config-2_1.yml' file is modified. If new parameters are introduced,
>  then the override files also need updated. One option (although not
>  recommended) to this approach to help catch missed values for MID and
>  HIGH profiles would be to preclude the use of default values and
>  provide a LOWRES-params.yml file with the default values.
> 
>  One caveat is that there was a bug in the CircleCI CLI prior to
>  version 0.1.22322 that resulted in the override file being ignored.
>  One possible workaround, if we do not want to require a minimum
>  circleci CLI version, would be to define the parameters in their own
>  file and concatenate the level-specific definition file with the
>  config-2_1.yml file prior to processing.
> 
>  Either approach (yaml file override or concatenation) provides benefit
>  since there would be no need to modify the level-specific file as the
>  main configuration is changed, unlike with patch files.
> 
>  The changes needed to implement this are relatively small, and are
>  easy to test, since a change to use pipeline parameters (along with
>  requisite changes to `generate.sh') should not result in any changes
>  to the config.yml files.
> 
> 
> Other benefits for future investigation
> ═══════════════════════════════════════
> 
>  Beyond simple replacement of the patch files, there are some
>  additional capabilities in CircleCI configuration that may benefit the
>  project long-term. One of the most interesting is the concept of
>  matrix jobs. Matrix jobs allow you to parameterize a job over multiple
>  values. In our case, this could potentially allow us to parameterize
>  jobs over different version of the JDK. This would reduce the overall
>  size of the configuration, because the current approach duplicates
>  jobs for each Java version. It would also make it simpler to add/test
>  new Java versions by making a relatively small number of changes to
>  the matrix instead of creating a whole set of version-specific jobs.
>  We would still need to create version-specific executors, but if we
>  move things like parallelism and resource class out of the executor
>  and into the job definition, we can parameterize in the workflow and
>  reduce the number of executors to one per Java version. For example:
> 
>  ┌────
>  │ jobs:
>  │   j8_unit_tests:
>  │     parameters:
>  │       executor:
>  │         type: executor
>  │     executor: << parameters.executor >>
>  │     parallelism: << pipeline.parameters.unit_test_parallelism >>
>  │     resource_class: << pipeline.parameters.unit_test_resclass >>
>  │     steps:
>  │       - attach_workspace:
>  │           at: /home/cassandra
>  │       - create_junit_containers
>  │       - log_environment
>  │       - run_parallel_junit_tests
>  │
>  │ workflows:
>  │   pre-commit_tests:
>  │     jobs:
>  │       - j8_unit_tests:
>  │           requires:
>  │             - build
>  │           matrix:
>  │             parameters:
>  │               executor: [java8-executor, java11-executor, java17-executor]
>  └────
> 
> -- 
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+

Re: A proposal for refactoring the CircleCI config

Reply via email to