Re: Testing out JIRA as replacement for cwiki tracking of 4.0 quality testing
> what we really need is some dedicated PM time going forward. Is that
> something you think you can help resource from your side?

Not a ton, but I think enough, yes.

> (Also, thanks for all the efforts exploring this either way!!)

Happy to help.

On Sun, Feb 2, 2020 at 2:46 PM Nate McCall wrote:
> > My .02: I think it'd improve our ability to collaborate and lower friction
> > to testing if we could do so on JIRA instead of the cwiki. *I suspect* the
> > edit access restrictions there, plus general UX friction (difficult to have
> > collaborative discussion, comment chains, links to things, etc.), make the
> > Confluence wiki a worse tool for this job than JIRA. Plus, if we do it in
> > JIRA we can track the outstanding scope in a single board, and it's far
> > easier to visualize everything in one place, so we can all know where
> > attention and resources need to be directed to best move the needle on
> > things.
> >
> > But that's just my opinion. What does everyone else think? Like the JIRA
> > route? Hate it? No opinion?
> >
> > If we do decide we want to go the epic / JIRA route, I'd be happy to
> > migrate the rest of the information in there for things that haven't been
> > completed yet on the wiki (ticket creation, assignee/reviewer chains, links
> > to the epic).
> >
> > So what does everyone think?
>
> I think this is a good idea. Having the resources available to keep the
> various bits twiddled correctly on existing and new issues has always been
> the hard part for us. So regardless of the path, what we really need is
> some dedicated PM time going forward. Is that something you think you can
> help resource from your side?
>
> (Also, thanks for all the efforts exploring this either way!!)
Re: Testing out JIRA as replacement for cwiki tracking of 4.0 quality testing
From the people that have modified this page in the past, what are your thoughts? Good for me to pull the rest into JIRA and redirect from the wiki?

+joey lynch +scott andreas +sumanth pasupuleti +marcus eriksson +romain hardouin

On Mon, Feb 3, 2020 at 8:57 AM Joshua McKenzie wrote:
Re: Fwd: [CI] What are the troubles projects face with CI and Infra
Nate, I leave it to you to forward what you choose to the board@'s thread.

> Are there still troubles and what are they?

TL;DR: the ASF could provide the Cassandra community with an isolated Jenkins installation, so that we can manage and control the Jenkins master, as well as ensure all donated hardware for Jenkins agents is dedicated and isolated to us.

The long writeup…

For Cassandra's use of ASF's Jenkins I see the following problems.

** Lack of trust (aka reliability)

The Jenkins agents re-use their workspaces, as opposed to using new containers per test run, leading to broken agents, disks, git clones, etc. One broken test run, or a broken agent, too easily affects subsequent test executions.

The complexity (and flakiness) around our tests is a real problem. CI on a project like Cassandra is a beast, and the community is very limited in what it can do; it really needs the help of larger companies. Effort is required in fixing the broken, the flaky, and the ignored tests. Parallelising the tests will help by better isolating failures, but tests (and their execution scripts) also need to be better at cleaning up after themselves, or a more containerised approach needs to be taken.

Another issue is that other projects sometimes use the agents, and Infra sometimes edits our build configurations (out of necessity).

** Lack of resources (throughput and response)

Having only 9 agents, none of which can run the large dtests, is a problem. All 9 are from Instaclustr, much kudos! Three companies have recently said they will donate resources; this is work in progress.

We have four release branches where we would like to provide per-commit post-commit testing. Each complete test execution currently takes 24hr+. Parallelising tests atm won't help much, as the agents are generally saturated (with the pipelines doing the top-level parallelisation). Once we get more hardware in place, for the sake of improving throughput, it will make sense to look into parallelising the tests more.

The throughput of tests will also improve with effort put into removing/rewriting long-running and inefficient tests. Also, and I think this is LHF, throughput could be improved by using (or taking inspiration from) Apache Yetus so as to only run tests on what is relevant in the patch/commit. Ref: http://yetus.apache.org/documentation/0.11.1/precommit-basic/

** Difficulty in use

Jenkins is clumsy to use compared to the CI systems we use more often today: Travis, CircleCI, GH Actions.

One of the complaints has been that only committers can kick off CI for patches (ie pre-commit CI runs). But I don't believe this to be a crucial issue, for a number of reasons.

1. Thorough CI testing of a patch only needs to happen during the review process, which a committer needs to be involved in anyway.
2. We don't have enough Jenkins agents to handle the amount of throughput that automated branch/patch/pull-request testing would require.
3. Our tests could allow unknown contributors to take ownership of the agent servers (eg via the execution of bash scripts).
4. We have CircleCI working, which provides basic testing for work-in-progress patches.

Focusing on post-commit CI and having canonical results for our release branches, I think it then boils down to the stability and throughput of tests, and the persistence and permanence of results.

The persistence and permanence of results is a bugbear for me. It has been partially addressed by posting the build results to the builds@ ML. But this only provides a (pretty raw) summary of the results. I'm keen to take the next step of posting CI results back to committed jira tickets (but am waiting on seeing Jenkins run stable for a while). If we had our own Jenkins master we could then look into retaining more/all build results. Being able to see the longer-term trends of test results, as well as execution times, would I hope add the incentive to get more folk involved.

Looping back to the ASF and what they could do: it would help us a lot in improving the stability and usability issues by providing us an isolated Jenkins. Having our own master would simplify the setup, use, and debugging of Jenkins. It would still require some sunk cost, but hopefully we'd end up with something better tailored to our needs. And isolated agents would help restore confidence.

regards,
Mick

PS I really want to hear from those that were involved in the past with cassci; your skills and experience on this topic surpass anything I've got.

On Sun, 2 Feb 2020, at 22:51, Nate McCall wrote:
> Hi folks,
> The board is looking for feedback on CI infrastructure. I'm happy to take
> some (constructive) comments back. (Shuler, Mick and David Capwell
> specifically as folks who've most recently wrestled with this a fair bit).
>
> Thanks,
> -Nate
>
> -- Forwarded message -
> From: Dave Fisher
> Date: Mon, Feb 3, 2020 at 8:58 AM
> Subject: [CI] W
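Mick's "LHF" point about only running the tests relevant to a patch can be illustrated with a rough sketch. This is not how Apache Yetus resolves its test plugins; the path-to-suite mapping, the suite names, and the `origin/trunk` base ref below are all illustrative assumptions.

```python
import subprocess

# Hypothetical mapping from source areas to test suites; the real project
# layout and suite names would need to be filled in.
PATH_TO_SUITES = {
    "src/java/org/apache/cassandra/db/": {"unit", "jvm-dtest"},
    "src/java/org/apache/cassandra/cql3/": {"unit", "cqlsh-tests"},
    "pylib/": {"cqlsh-tests"},
    "doc/": set(),  # documentation-only changes need no test run
}

def changed_files(base_ref="origin/trunk"):
    """List files touched between base_ref and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def suites_to_run(files):
    """Union of the suites mapped to each changed path; unknown paths
    fall back to running everything."""
    suites = set()
    for f in files:
        for prefix, mapped in PATH_TO_SUITES.items():
            if f.startswith(prefix):
                suites |= mapped
                break
        else:
            # Path not covered by the mapping: be safe and run everything.
            return {"unit", "jvm-dtest", "cqlsh-tests", "dtest"}
    return suites

if __name__ == "__main__":
    print(suites_to_run(changed_files()))
```

The real win Mick points at is skipping whole runs (for example documentation-only changes), which any mapping along these lines gets for free.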
Re: [Discuss] num_tokens default in Cassandra 4.0
I think it's a good idea to take a step back and get a high-level view of the problem we're trying to solve.

First, high token counts result in decreased availability, as each node has data overlap with more nodes in the cluster. Specifically, a node can share data with up to (RF-1) * 2 * num_tokens other nodes. So a 256-token cluster at RF=3 is going to almost always share data with every other node in the cluster that isn't in the same rack, unless you're doing something wild like using more than a thousand nodes in a cluster.

With 16 tokens, that is vastly improved, but you still have up to 64 nodes each node needs to query against, so you're again hitting every node unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I wouldn't use 16 here, and I doubt any of you would either.

I've advocated for 4 tokens because you'd have overlap with only 16 nodes, which works well for small clusters as well as large. Assuming I was creating a new cluster for myself (in a hypothetical brand new application I'm building), I would put this in production. I have worked with several teams where I helped them put 4-token clusters in prod and it has worked very well. We didn't see any wild imbalance issues.

As Mick's pointed out, our current method of using random token assignment for the default is problematic for 4 tokens. I fully agree with this, and I think if we were to try to use 4 tokens, we'd want to address this in tandem. We can discuss how to better allocate tokens by default (something more predictable than random), but I'd like to avoid the specifics of that for the sake of this email.

To Alex's point, repairs are problematic with lower token counts due to over-streaming. I think this is a pretty serious issue and we'd have to address it before going all the way down to 4. This, in my opinion, is a more complex problem to solve, and I think trying to fix it here could make shipping 4.0 take even longer, something none of us want.

For the sake of shipping 4.0 without adding extra overhead and time, I'm OK with moving to 16 tokens, and in the process adding extensive documentation outlining what we recommend for production use. I think we should also try to figure out something better than random as the default to fix the data imbalance issues. I've got a few ideas here I've been noodling on.

As long as folks are fine with potentially changing the default again in C* 5.0 (after another discussion / debate), 16 is enough of an improvement that I'm OK with the change, and willing to author the docs to help people set up their first cluster. For folks that go into production with the defaults, we're at least not setting them up for total failure once their clusters get large, like we are now.

In future versions, we'll probably want to address the issue of data imbalance by building something in that shifts individual tokens around. I don't think we should try to do this in 4.0 either.

Jon

On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna wrote:
> I think Mick and Anthony make some valid operational and skew points for
> smaller/starting clusters with 4 num_tokens. There’s an arbitrary line
> between small and large clusters, but I think most would agree that most
> clusters are on the small to medium side. (A small nuance is afaict the
> probabilities have to do with quorum on a full token range, ie it has to do
> with the size of a datacenter, not the full cluster.)
>
> As I read this discussion I’m personally more inclined to go with 16 for
> now. It’s true that if we could fix the skew and topology gotchas for those
> starting things up, 4 would be ideal from an availability perspective.
> However we’re still in the brainstorming stage for how to address those
> challenges. I think we should create tickets for those issues and go with
> 16 for 4.0.
>
> This is about an out-of-the-box experience. It balances availability,
> operations (such as skew and general bootstrap friendliness and
> streaming/repair), and cluster sizing. Balancing all of those, I think for
> now I’m more comfortable with 16 as the default, with docs on considerations
> and tickets to unblock 4 as the default for all users.
>
> > On Feb 1, 2020, at 6:30 AM, Jeff Jirsa wrote:
> > > On Fri, Jan 31, 2020 at 11:25 AM Joseph Lynch wrote:
> > > I think that we might be bikeshedding this number a bit because it is
> > > easy to debate and there is not yet one right answer.
> >
> > https://www.youtube.com/watch?v=v465T5u9UKo
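A minimal sketch of the replica-overlap bound Jon cites above, (RF-1) * 2 * num_tokens, just to make the 4 / 16 / 256 comparison at RF=3 concrete. The function name is mine; the rule of thumb is taken from the email, not from the Cassandra codebase.

```python
def max_overlap_nodes(num_tokens: int, rf: int) -> int:
    """Upper bound on how many other nodes a single node shares replicated
    data with, per the (RF - 1) * 2 * num_tokens rule of thumb above."""
    return (rf - 1) * 2 * num_tokens

for tokens in (4, 16, 256):
    print(f"num_tokens={tokens}: shares data with up to "
          f"{max_overlap_nodes(tokens, rf=3)} other nodes")

# num_tokens=4: shares data with up to 16 other nodes
# num_tokens=16: shares data with up to 64 other nodes
# num_tokens=256: shares data with up to 1024 other nodes
```

The 64-node overlap at 16 tokens is why, as Jon notes, a cluster still touches every node until it grows past roughly 96 nodes across 3 racks.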
Re: Fwd: [CI] What are the troubles projects face with CI and Infra
Mick, this is fantastic!

I'll wait another day to see if anyone else chimes in. (Would also love to hear from CassCI folks, or anyone else really who has wrestled with this, even for internal forks.)

On Tue, Feb 4, 2020 at 10:37 AM Mick Semb Wever wrote:
Re: [VOTE] Release Apache Cassandra 4.0-alpha3
> Summary of notes:
> - Artifact set checks out OK with regards to key sigs and checksums.
> - CASSANDRA-14962 is an issue when not using the current deb build
>   method (using the new docker method results in different source artifact
>   creation & use). The docker rpm build suffers the same source problem,
>   and the src.rpm is significantly larger, since I think it copies all the
>   downloaded maven artifacts in. It's fine for now, though :)
> - UNRELEASED deb build

Thanks for the thorough review Michael.

I did not know about CASSANDRA-14962, but it should be easy to fix now that the -src.tar.gz is in the dev dist location and easy to re-use. I'll see if I can create a patch for that (aiming to use it on alpha4).

And I was unaware of the UNRELEASED version issue. I can put a patch in for that too, going into the prepare_release.sh script.

> Next step would be to do each package-type install and startup functional
> testing, but I don't have that time right now :)

I'm going to presume others that have voted have done package-type installs and the basic testing, and move ahead. If I close the vote, I will need your help, Michael, with the final steps: running the patched finish_release.sh from the `mck/14970_sha512-checksums` branch, found in https://github.com/thelastpickle/cassandra-builds/blob/mck/14970_sha512-checksums/, because only the PMC can `svn move` the files into dist.apache.org/repos/dist/release/

And for the upload_bintray.sh script, how do I get credentials? An infra ticket, I presume? (ie to https://bintray.com/apache )
Re: Fwd: [CI] What are the troubles projects face with CI and Infra
Only have a moment to respond, but Mick hit the highlights with containerization and parallelization; these help solve cleanup, speed, and cascading failures. Dynamic disposable slaves would be icing on that cake, which may require a dedicated master.

One more note on jobs, or more correctly unnecessary jobs - pipelines have a `changeset` build condition we should tinker with. There is zero reason to run a job with no actual code diff. For instance, I committed to 2.1 this morning and merged `-s ours` nothing to the newer branches - there's really no reason to run and take up valuable resources with no actual diff changes.
https://jenkins.io/doc/book/pipeline/syntax/#built-in-conditions

Michael

On 2/3/20 3:45 PM, Nate McCall wrote:
Re: [VOTE] Release Apache Cassandra 4.0-alpha3
+1, this at least starts up on Windows ;)

Dinesh

> On Feb 3, 2020, at 3:21 PM, Mick Semb Wever wrote:
Feedback from the last Apache Cassandra Contributor Meeting
Hi everyone,

One action item I took from our first contributor meeting was to gather feedback for the next meetings. I've created a short survey if you would like to offer feedback. I'll let it run for the week and report back on the results.

https://www.surveymonkey.com/r/C95B7ZP

Thanks,
Patrick
Re: Fwd: [CI] What are the troubles projects face with CI and Infra
Following Mick's format =)

** Lack of trust (aka reliability)

Mick said it best, but I should also add that we have slow tests and tests which don't do anything. Effort is needed to improve our current tests and to make sure future tests are stable (cleanup works, isolation, etc.); this is not a negligible amount of work, nor work which can be done by a single person.

** Lack of resources (throughput and response)

Our slowest unit tests are around 2 minutes (materialized views), and our slowest dtests (not high-resource) are around 30 minutes; given enough resources we could run unit tests in < 10 minutes and dtests in 30-60 minutes.

There is also another thing to point out: testing is also a combinatorics problem; we support java 8/11 (more to come), vnode and no-vnode, security and no security, and the list goes on. Bugs are more likely to happen when two features interact, so it is important to test against many combinations. There is work going on in the community to add new kinds of tests (harry, diff, etc.); these tests require even more resources than normal tests.

** Difficulty in use

Many people rely on CircleCI as the core CI for the project, but this has a few issues called out in other forms: the low-resource version (free) is even more flaky than the high-resource one (paid), and people get locked out (I have lost access twice so far; others have said the same).

The thing which worries me the most is that new members to the project won't have the high-resource CircleCI plan, nor do they really have access to Jenkins. This puts a burden on new authors, where they wait 24+ hours to run the tests... or just don't run them.

** Lack of visibility into quality

This is two things for me: commit and pre-commit.

For commit, this is more what Mick was referring to as "post-commit CI". There are a few questions I would like to answer about our current tests (report the most flaky tests, which sections of code cause the most failures, etc.); these are hard to answer at the moment.

We don't have a good pre-commit story since it mostly relies on CircleCI. I find that some JIRAs link CircleCI and some don't. I find that if I follow the CircleCI link months later (to see if the build was stable pre-commit), Circle fails to show the workflow.

On Mon, Feb 3, 2020 at 3:42 PM Michael Shuler wrote:
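A quick way to see the combinatorics point David makes above is to enumerate just the axes named in his email. This is only an illustration: the axis names are mine, and the real test matrix has more dimensions.

```python
from itertools import product

# Axes taken from the discussion above; every additional axis multiplies
# the number of configurations a full CI run would need to cover.
axes = {
    "jdk": ["8", "11"],
    "tokens": ["vnode", "no-vnode"],
    "security": ["enabled", "disabled"],
}

# Cross product of all axis values, one dict per configuration.
configs = [dict(zip(axes, values)) for values in product(*axes.values())]

print(len(configs), "configurations")  # 8 configurations
for cfg in configs:
    print(cfg)
```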
Re: [VOTE] Release Apache Cassandra 4.0-alpha3
On 2/3/20 5:21 PM, Mick Semb Wever wrote:
> > Summary of notes:
> > - Artifact set checks out OK with regards to key sigs and checksums.
> > - CASSANDRA-14962 is an issue when not using the current deb build
> >   method (using the new docker method results in different source artifact
> >   creation & use). The docker rpm build suffers the same source problem,
> >   and the src.rpm is significantly larger, since I think it copies all the
> >   downloaded maven artifacts in. It's fine for now, though :)
> > - UNRELEASED deb build
>
> Thanks for the thorough review Michael.
>
> I did not know about CASSANDRA-14962, but it should be easy to fix now that
> the -src.tar.gz is in the dev dist location and easy to re-use. I'll see if
> I can create a patch for that (aiming to use it on alpha4).

Yep! Similarly, the rpm build has been wrong all along, but it's what we have. The -src.tar.gz should get copied to /$build/$path/SOURCE dir, I think it is(?). I think that might cure the larger .src.rpm.

> And I was unaware of the UNRELEASED version issue. I can put a patch in for
> that too, going into the prepare_release.sh script.

`dch -r` is usually a step I do before building, also checking that NEWS and CHANGES and build.xml versions all align. Then the correct commit gets -tentative tagged. Building `dch -r` in would be OK, if all the other ducks are in a row.

> > Next step would be to do each package-type install and startup functional
> > testing, but I don't have that time right now :)
>
> I'm going to presume others that have voted have done package-type installs
> and the basic testing, and move ahead. If I close the vote, I will need your
> help, Michael, with the final steps: running the patched finish_release.sh
> from the `mck/14970_sha512-checksums` branch, found in
> https://github.com/thelastpickle/cassandra-builds/blob/mck/14970_sha512-checksums/,
> because only the PMC can `svn move` the files into
> dist.apache.org/repos/dist/release/

I usually do this before the vote. I don't know how many other people, if any, test that all the packages can install and start.

> And for the upload_bintray.sh script, how do I get credentials? An infra
> ticket, I presume? (ie to https://bintray.com/apache )

If I recall, I did an infra ticket with my github user id - this is how I log in. Once logged into bintray, you can find a token down in the user profile somewhere, which is used in the script.

Thanks again for walking through these steps.

Michael