Re: [DISCUSS] Potential circleci config and workflow changes

Berenguer Blasi Wed, 19 Oct 2022 22:03:38 -0700

Can python upgrade tests be run without -h? Last time I tried, iirc they fail on -m


On 20/10/22 4:11, Ekaterina Dimitrova wrote:

Thank you Josh. Glad to see that our CI is getting more attention. As no Cassandra feature will be there if we don't do proper testing, right? Important as all the suites and tools we have. With that being said I am glad to see Derek is volunteering to spend more time on this as I believe this is always the main issue - ideas and willingness for improvements are there but people are swamped with other things and we lack manpower for something so important.

1. Tune parallelism levels per job (David and Ekaterina have insight on this)
Question for David, do you tune only parallelism and use only xlarge? If yes, we need to talk :D
Reading what Stefan shared as experience/feedback, I think we can revise the current config and move to a more reasonable config that can work for most people but there will always be someone who needs something a bit different. With that said maybe we can add to our scripts/menu an option to change from command line through parameters parallelism and/or resources? For those who want further customization? I see this as a separate additional ticket probably. In that case we might probably skip the use of circleci config process for that part of the menu. (but not for adding new jobs and meaningful permanent updates)

2. Rename jobs on circle to be more indicative of their function
+0 I am probably super used to the current names but Derek brought it to my attention that there are names which are confusing for someone new to the cassandra world. With that said I would say we can do this in a separate ticket, mass update.

3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see: https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
I am against unifying per JDK workflows but I am all in for unifying the pre-commit/separate workflows and getting back to 2 workflows as suggested by Andres. If we think of how that will look in the UI I think it will be super hard to follow. (the case of having unified both jdks in one workflow)

4. Update documentation w/guidance on using circle, .circleci/generate.sh examples, etc
4a. How to commit: https://cassandra.apache.org/_/development/how_to_commit.html
4b. Testing: https://cassandra.apache.org/_/development/testing.html
I will open a ticket and post the guide I was working on. But it also doesn't make sense to fully update it now if we are going to significantly change the workflow soon. Until then I believe Andres has updated the circleci readme and provided good usage examples.
5. Flag on generate.sh to allow auto-run on push
Auto-run on push? Can you elaborate? Like to start your whole workflow directly without using the UI? There is an approval step in the config file, we can probably add some flags to change pre-commit workflows to start build without approval when we use those mentioned flags. But having by default everything to start on push is an overkill in my opinion. People will be forgetting it and pushing builds for nothing on WIP branches. Talking from experience :D

6. Clean up the -l, -m, -h flags (test and indicate -l feasibility for all suites, default to -m, deprecate -h?) <- may not be a code-change issue and instead be a documentation issue
If we agree that, besides the free tier config file, we want one more reasonable config which doesn't bump resources to the max without a need but provides balanced use of resources - absolutely. -h was kept as there was an understanding that there are people in the community actively using it.

7. Consider flag on generate.sh to run and commit with "[DO NOT MERGE] temporary circleci config" as the commit message
+0
I also wanted to address a few of the points David made.

"Ekaterina is probably dealing with her JDK17 work" - if you mean to ensure we have all jobs for all jdks properly, yes. That was my plan, until Derek was kind enough to suggest working on adding the missing jobs in CircleCI now, so my work on that will be a bit less for certain things. This is an effort related to the recent changes in our release document. Ticket CASSANDRA-17950 <https://issues.apache.org/jira/browse/CASSANDRA-17950> :-) I am helping with mentoring/reviews. Everyone is welcome to join the party.

"1) resource_class used is not because it's needed… in HIGHER file we default to xlarge but only python upgrade tests need that… reported in CASSANDRA-17600" - one of the reasons we had the MIDRES in the first place, as I mentioned in my other email the other day. [1]

"our current patching allows MID/HIGHER to drift as changes need new patches else patching may do the wrong thing… reported in CASSANDRA-17600" - I'd say the patching is annoying sometimes, indeed, but with or without the patching any changes to config mean we need to check it by reading through the diff and pushing a run to CI before commit. With that said I am all in for automation but this will not change the fact we need to push test runs and verify the changes did not hurt us in a way. Same as testing patches on all branches, running all needed tests and confirming no regressions. Nothing new or changing here IMHO.

"CI is a combinatorial problem, we need to run all jobs for all JDKs, vnode on/off, cdc on/off, compression on/off, etc…. But this is currently controlled and fleshed out by humans who want to add new jobs.. we should move away from maintaining .circleci/config-2_1.yml and instead auto-generate it. Simple example of this problem is jdk11 support… we run a subset of tests on jdk11 and say it's supported… will jdk17 have the same issue? Will it be even fewer tests? Why does the burden lie on everyone to “do the right thing” when all they want is a simple job?"
Controlled and fleshed out by humans it will always be, but I agree we need to automate the steps to make it easier for people to add most of the combinations and not to skip any because it is too much work. We will always need a human to decide which jdks, cdc, vnodes, etc. With that said I shared your ticket/patch with Derek as he had similar thoughts, we need to get back to that one at some point. (CASSANDRA-17600) Thanks for working on that!

"why do we require people to install “circleci” command to contribute? If you rename .circleci/config-2_1.yml to .circleci/config.yml then CI will work just fine… we don’t need to call “circleci config process” every time we touch circle config…. Also, seems that whenever someone new to circle config (but not cassandra) touches it they always mutate LOW/MID/HIGH and not .circleci/config-2_1.yml… so I keep going back to fix .circleci/config-2_1.yml…."
I'd say config-2_1.yml is mainly for those who will make permanent changes to config (like adding/removing jobs). config-2_1.yml is actually created as per the CircleCI automation rules. 1st, we add and reuse executors, parameters and commands, but I think we can probably reduce things further if we add even more parameters. I have to look more into the current file. I am sure there is room for further improvement. 2nd, the circleci cli tool can verify the config file for errors and helps with debugging before we push to CircleCI. There is circleci config validate. If we make changes manually we are on our own to verify the long yml and also deal with duplication in config.yml. My concern is that things that need to be almost identical might start to diverge more easily. Though I made my suggestion in point 1 for the cases where we can probably add menu options that potentially will not require using the circleci cli tool. There might be more cases though. Currently config-2_1.yml is 2256 lines while config.yml is 5793 lines. I'd say lots of duplication there.
[1] https://lists.apache.org/thread/htxoh60zt8zxc4vgxj9zh71trk0zxwhl

On Wed, 19 Oct 2022 at 17:20, David Capwell <[email protected]> wrote:
    1. Tune parallelism levels per job (David and Ekaterina have
    insight on this)
    +1 to this!  I drastically lower our parallelism as
    only python-dtest upgrade tests need many resources…

    What I do for JVM unit/jvm-dtest is the following

        import math
        import os

        def java_parallelism(src_dir, kind, num_file_in_worker,
                             include=lambda full, name: True):
            # Count the *Test.java files under <src_dir>/test/<kind> that
            # pass the include filter, then divide by the target number
            # of files per worker to get the worker count.
            d = os.path.join(src_dir, 'test', kind)
            num_files = 0
            for root, dirs, files in os.walk(d):
                for f in files:
                    if f.endswith('Test.java') and include(os.path.join(root, f), f):
                        num_files += 1
            return math.floor(num_files / num_file_in_worker)

        def fix_parallelism(args, contents):
            jobs = contents['jobs']

            # Files per worker: 20 for unit tests, 4 for jvm-dtests,
            # 2 for jvm-dtest upgrade tests.
            unit_parallelism = java_parallelism(args.src, 'unit', 20)
            jvm_dtest_parallelism = java_parallelism(
                args.src, 'distributed', 4,
                lambda full, name: 'upgrade' not in full)
            jvm_dtest_upgrade_parallelism = java_parallelism(
                args.src, 'distributed', 2,
                lambda full, name: 'upgrade' in full)

    TL;DR - I find all test files we are going to run, and based off a
    pre-defined variable that says the “ideal” number of files per
    worker, I then calculate how many workers we need.  So unit tests
    are num_files / 20 ~= 35 workers.  Can I be “smarter” by knowing
    which files have higher cost?  Sure… but the “perfect” and
    the “average” are so similar that it wasn’t worth it...
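For illustration, the calculation described above can be reduced to a self-contained sketch that takes a file count directly and writes the result into a parsed config. The helper names (`workers_for`, `apply_parallelism`), the job names, the file counts, and the clamp to a minimum of one worker are all assumptions made for this example; none of them come from the script quoted above.

```python
import math

def workers_for(num_files, files_per_worker):
    """Return how many workers are needed for num_files test files.

    Uses floor division like the quoted script, but clamps to at least
    one worker (an assumption) so a small suite never gets parallelism 0.
    """
    return max(1, math.floor(num_files / files_per_worker))

def apply_parallelism(contents, parallelism_by_job):
    """Set the `parallelism` key on each named job of a parsed config dict.

    Jobs that are not present in the config are skipped, not created.
    """
    jobs = contents.get('jobs', {})
    for name, workers in parallelism_by_job.items():
        if name in jobs:
            jobs[name]['parallelism'] = workers
    return contents

# Hypothetical parsed config with made-up job names and file counts.
config = {'jobs': {'j8_unit_tests': {'parallelism': 100},
                   'j8_jvm_dtests': {'parallelism': 100}}}
apply_parallelism(config, {
    'j8_unit_tests': workers_for(700, 20),  # 700 files, 20 per worker
    'j8_jvm_dtests': workers_for(3, 4),     # fewer files than one worker's share
})
print(config['jobs']['j8_unit_tests']['parallelism'])  # 35
print(config['jobs']['j8_jvm_dtests']['parallelism'])  # 1
```

Keeping the file counting separate from the config update makes it easy to swap in a different per-suite target without touching the code that edits the job definitions.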
    2. Rename jobs on circle to be more indicative of their function
    Have an example?  I am not against, I just don’t know the problem
    you are referring to.
    3. Unify j8 and j11 workflow pairs into single
    Fine by me, but we need to keep in mind j17 is coming.  Also, most
    developmental CI builds don’t really need to run across every JDK,
    so we need some way to disable different JDKs…

    When I am testing out a patch I tend to run the following (my
    script): “circleci-enable.py --no-jdk11”; this will remove the
    JDK11 builds.  I know I am going to run them pre-merge so I know
    it’s safe for me.
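The circleci-enable.py script mentioned above is a personal script and is not shown in this thread. As a rough sketch of what removing one JDK's builds from a parsed config could look like, the following assumes (hypothetically) that the jobs for a JDK share a name prefix such as `j11_`. The function name is made up, and the sketch does not fix up `requires` entries on surviving jobs that point at a removed job.

```python
def drop_jobs_with_prefix(contents, prefix):
    """Remove jobs whose name starts with prefix from a parsed config dict.

    Handles both the top-level `jobs` map and each workflow's job list,
    where an entry is either a bare name or a one-key {name: options} map.
    """
    def entry_name(entry):
        return entry if isinstance(entry, str) else next(iter(entry))

    contents['jobs'] = {name: job
                        for name, job in contents.get('jobs', {}).items()
                        if not name.startswith(prefix)}
    for workflow in contents.get('workflows', {}).values():
        # `workflows` may also hold non-workflow keys such as `version`.
        if isinstance(workflow, dict) and 'jobs' in workflow:
            workflow['jobs'] = [entry for entry in workflow['jobs']
                                if not entry_name(entry).startswith(prefix)]
    return contents

# Hypothetical config: the job and workflow names are illustrative only.
config = {
    'jobs': {'j8_build': {}, 'j11_build': {}},
    'workflows': {
        'version': 2,
        'pre_commit': {'jobs': ['j8_build',
                                {'j11_build': {'requires': ['j8_build']}}]},
    },
}
drop_jobs_with_prefix(config, 'j11_')
print(sorted(config['jobs']))
print(config['workflows']['pre_commit']['jobs'])
```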
    5. Flag on generate.sh to allow auto-run on push
    I really hate that we don’t do this by default… I still to this
    day strongly feel you should *opt-out* of CI rather than opt-in…
    I have seen several commits get merged because people didn’t see
    an error in circle… because circle didn’t do any work…. Yes, I am
    fully aware that I am beating a dead horse…

    TL;DR +1
    7. Consider flag on generate.sh to run and commit with "[DO NOT
    MERGE] temporary circleci config" as the commit message
    +0 from me… I have seen people not realize you have to commit
    after typing “higher” (wrapper around my personal
    circleci-enable.py script to apply my defaults to the build) but
    not an issue I have… so I don’t mind if people want the tool to
    integrate with git…


    With all that said, I do feel there is more, and something I feel
    Ekaterina is probably dealing with in her JDK17 work…

    1) resource_class used is not because it’s needed… in HIGHER file
    we default to xlarge but only python upgrade tests need that…
    reported in CASSANDRA-17600
    2) our current patching allows MID/HIGHER to drift as changes need
    new patches else patching may do the wrong thing… reported in
    CASSANDRA-17600
    3) CI is a combinatorial problem, we need to run all jobs for all
    JDKs, vnode on/off, cdc on/off, compression on/off, etc…. But this
    is currently controlled and fleshed out by humans who want to add
    new jobs..  we should move away from
    maintaining .circleci/config-2_1.yml and instead auto-generate
    it.  Simple example of this problem is jdk11 support… we run a
    subset of tests on jdk11 and say it’s supported… will jdk17 have
    the same issue? Will it be even fewer tests?  Why does the burden
    lie on everyone to “do the right thing” when all they want is a
    simple job?
    4) why do we require people to install “circleci” command to
    contribute? If you rename .circleci/config-2_1.yml
    to .circleci/config.yml then CI will work just fine… we don’t need
    to call “circleci config process” every time we touch circle
    config…. Also, seems that whenever someone new to circle config
    (but not cassandra) touches it they always mutate LOW/MID/HIGH and
    not .circleci/config-2_1.yml… so I keep going back to
    fix .circleci/config-2_1.yml….

    On Oct 19, 2022, at 1:32 PM, Miklosovic, Stefan
    <[email protected]> wrote:

    1) would be nice to have. The first thing I do is change the
    parallelism to 20. None of the committed config.yml files is
    appropriate for our company CircleCI so I have to tweak this
    manually. I think we cannot run more than 25/30 containers in
    parallel, something like that. HIGHRES has 100 and MIDRES has
    some jobs with parallelism equal to 50 or so, so that is not
    good either. I would be happy with a simple way to modify the
    default config.yml on parallelism. I use "sed" to change
    parallelism: 4 to parallelism: 20 and leave parallelism: 1 where
    it does not make sense to increase it. However I noticed that "4"
    is not set everywhere, some jobs have it set to "1" so I have to
    take extra care of these cases (I consider that to be a bug, I
    think there are two or three, I do not remember). Once set, I
    have that config in "git stash" so I just apply it every time I
    need it.

    5) would be nice too.
    7) is nice but not crucial, it takes no time to commit that.
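The "sed" substitution described in 1) above can be sketched in Python as follows. This is a minimal illustration, not the actual command used; the function name and the sample text are assumptions. It rewrites only lines whose value is exactly the old number, so `parallelism: 1` stays as it is.

```python
import re

def bump_parallelism(text, old=4, new=20):
    """Replace `parallelism: <old>` with `parallelism: <new>`, line by line.

    Only exact matches of the old value are changed, so `parallelism: 1`
    and values that merely start with the old digit (such as 40) are
    left untouched. Indentation is preserved.
    """
    pattern = re.compile(rf'^([ \t]*parallelism:[ \t]*){old}[ \t]*$',
                         re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + str(new), text)

# Hypothetical config fragment; the job names are illustrative only.
sample = (
    "job_a:\n"
    "  parallelism: 4\n"
    "job_b:\n"
    "  parallelism: 1\n"
    "job_c:\n"
    "  parallelism: 40\n"
)
result = bump_parallelism(sample)
print(result)
```

Jobs that are set to 1 but should scale still need to be handled by hand, which is the extra care mentioned in 1).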

    ________________________________________
    From: Josh McKenzie <[email protected]>
    Sent: Wednesday, October 19, 2022 21:50
    To: dev
    Subject: [DISCUSS] Potential circleci config and workflow changes

    While working w/Andres on CASSANDRA-17939 a variety of things
    came up regarding our circleci config and opportunities to
    improve it. Figured I'd hit the list up here to see what people's
    thoughts are since many of us intersect with these systems daily
    and having your workflow disrupted without having a chance to
    provide input is bad.

    The ideas:
    1. Tune parallelism levels per job (David and Ekaterina have
    insight on this)
    2. Rename jobs on circle to be more indicative of their function
    3. Unify j8 and j11 workflow pairs into single (for 2 and 3 see:
    https://issues.apache.org/jira/browse/CASSANDRA-17939?focusedCommentId=17616595&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17616595)
    4. Update documentation w/guidance on using circle,
    .circleci/generate.sh examples, etc
    4a. How to commit:
    https://cassandra.apache.org/_/development/how_to_commit.html
    4b. Testing: https://cassandra.apache.org/_/development/testing.html
    5. Flag on generate.sh to allow auto-run on push
    6. Clean up the -l, -m, -h flags (test and indicate -l
    feasibility for all suites, default to -m, deprecate -h?) <- may
    not be a code-change issue and instead be a documentation issue
    7. Consider flag on generate.sh to run and commit with "[DO NOT
    MERGE] temporary circleci config" as the commit message

    Curious to see what folks think.

    ~Josh
