Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Regarding the proposed agenda of going through the unassigned issues to
> improve visibility on what needs to be done to ship 4.0 GA I think this is
> a great start but only covers part of the problem.
>
> I think we have 3 outstanding issues that are hampering visibility of 4.0
> progress:
> a) Quality testing issues with no shepherd;
> b) Quality testing issues with shepherd, but no recent activity (~2 months
> or less);
> c) Quality testing issues with no objective acceptance criteria/Definition
> of Done;

These Quality testing epics are a great focal point. How will we ensure
this QA persists, so it's not a manual checklist every release?

The following is what I can see outstanding for the 4.0 release that is
not, afaik, attached to these epic tickets…

** Those issues that slipped alpha…
*** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming
and compression in protocol v5-beta
*** CASSANDRA-15234 – Standardise config and JVM parameters
*** CASSANDRA-13701 – Lower default num_tokens (blocked by
'CASSANDRA-16079 Improve dtest runtime')
** 95 jira tickets in 4.0-beta and 4.0-rc
** 631 jira bug tickets with no assigned "fix version"
** Remaining flakey unit and dtests
** Hundreds of failing and flakey upgrade dtests
** Reports from driver tests, and other external test systems
** Reports and/or integration with Fallout and Harry

In a bit more detail…

*** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming and
compression in protocol v5-beta

This looks like it is in its final patch and review. Is that correct, Sam?

*** CASSANDRA-15234 – Standardise config and JVM parameters

It looks like we have dropped the ball on this.

*** CASSANDRA-13701 – Lower default num_tokens, and CASSANDRA-16079

Some effort is underway from Ekaterina, David, and myself. I've put
together a prototype for caching bootstrapped ccm clusters, but I'm not yet
sure I can get much savings over the current tests, and only a minimal
saving off the 13701 patch. Berenguer brought up that 40% of the dtests are
single-node, their performance unchanged by 13701, and probably better off
rewritten as in-jvm tests.

** 95 jira tickets in 4.0-beta and 4.0-rc
** Remaining flakey unit and dtests
** Hundreds of failing and flakey upgrade dtests

Do all remaining flakey and failing unit and dtests have jira tickets
entered for 4.0-beta? Has the same been done, at least with rough grouping,
for the upgrade tests? Are these tied to the testing epics in any way?

** 631 jira bug tickets with no assigned "fix version" (who knows how many
of these are applicable to 4.0?)

Have any triage efforts happened here? Do triaged bugs in this list get
moved to fix version "4.x"? Are we duplicating efforts in the testing
epics when others have already identified and reported the bugs but we
just haven't triaged them?
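[Editor's note: the ccm cluster-caching prototype mentioned above is not included in the thread. As a rough illustration of the idea only, a generic snapshot-and-restore helper might look like the following sketch; the function name and layout are hypothetical, not the actual prototype.]

```python
import shutil
from pathlib import Path

def cached_cluster_dir(cache_root: Path, key: str, bootstrap):
    """Return a fresh working copy of a bootstrapped cluster directory.

    `bootstrap` is called at most once per cache key to populate a
    directory (e.g. by creating and starting a ccm cluster); later calls
    skip the expensive bootstrap and just copy the cached snapshot.
    """
    snapshot = cache_root / "snapshots" / key
    if not snapshot.exists():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        staging = cache_root / "staging" / key
        staging.mkdir(parents=True, exist_ok=True)
        bootstrap(staging)        # expensive: the real bootstrap happens here
        staging.rename(snapshot)  # publish the finished snapshot
    workdir = cache_root / "work" / key
    if workdir.exists():
        shutil.rmtree(workdir)    # discard state from a previous test run
    shutil.copytree(snapshot, workdir)  # cheap restore for this run
    return workdir
```

Whether this wins anything depends on how large the bootstrapped state is relative to the copy cost, which matches Mick's caveat about minimal savings.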
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> On 29 Sep 2020, at 09:50, Mick Semb Wever wrote:
>
> *** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming and
> compression in protocol v5-beta
>
> This looks like it is in its final patch and review. Is that correct, Sam?

Yes it is. I hope to get review finished and post some further perf
numbers this week.
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
Thanks for bringing up these valuable points, Mick! In fact we have focused
on the quality epic so far, but there is a lot more left unaddressed. I've
commented on some of the points you brought up below:

> How will we ensure this QA persists, so it's not a manual checklist
> every release?

This is a great question, but I believe it warrants a separate discussion
as part of a larger conversation on improving our development/quality
process post-4.0.

> *** CASSANDRA-15234 – Standardise config and JVM parameters
> It looks like we have dropped the ball on this.

It's sad that we dropped the ball on this important change, but I now
think it's too late to make these changes, as they would add entropy while
we are stabilizing 4.0. In that sense I think we should postpone this to
the next major and prioritize it earlier in the next cycle.

> Do all remaining flakey and failing unit and dtests have jira tickets
> entered for 4.0-beta? Has the same been done, at least with rough
> grouping, for the upgrade tests? Are these tied to the testing epics in
> any way?

Not that I know of. Perhaps we should add a new ticket to the quality epic
to track flakey and failing tests? (@cc Josh/Jordan)

> Have any triage efforts happened here?

Not that I know of, but maybe Josh/Jordan/Jon (J^3) are planning on
looking at it. I can take a stab at triaging some of these tickets.

> Do triaged bugs in this list get moved to fix version "4.x"?

I think, in the spirit of expediting the 4.0 RC release, we should mark
bugs with low severity (i.e. those with a simple workaround) as 4.0.1. Any
bug with medium-high severity should be marked as 4.0-rc to favor
stability.

> Are we duplicating efforts in the testing epics when others have already
> identified and reported the bugs but we just haven't triaged them?

That's a good point. I think as part of the triaging effort we should link
the bugs to existing quality epics so we can keep track of them.
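[Editor's note: Paulo's proposed severity-to-fixVersion triage rule can be written down as a tiny function. This is purely illustrative; the severity labels are paraphrased from the email, not actual Jira field values.]

```python
def target_fix_version(severity: str) -> str:
    """Map a triaged 4.0 bug's severity to a proposed fixVersion.

    Low-severity bugs (a simple workaround exists) slip to 4.0.1;
    medium/high-severity bugs block the release candidate (4.0-rc).
    """
    s = severity.lower()
    if s == "low":
        return "4.0.1"
    if s in ("medium", "high"):
        return "4.0-rc"
    raise ValueError(f"unknown severity: {severity}")
```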
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Not that I know of. Perhaps we should add a new ticket to the quality
> epic to track flakey and failing tests? (@cc Josh/Jordan)

Either a separate epic or a ticket w/ sub-tasks would work well in terms
of organization. There's value in having one place to go to cleanly pull
that kind of work, so I have a slight bias towards an independent epic for
4.0 test *fixing*, instead of mixing the "new ways to test" with "cleaning
up the testing ways we know".

---
Josh McKenzie
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
I personally prefer to track failing/flaky tests as sub-issues of the 4.0
epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
place.

The way I see it is:
* CASSANDRA-15536 epic: tracks everything that needs to be done to wrap up
4.0, per macro component.
* Kanban board: a different view of CASSANDRA-15536, but all issues in the
Kanban should ultimately be tied to a sub-issue on CASSANDRA-15536.
* Component sub-issues: both "new ways to test" and "bugs related to the
component".
* Test failure sub-issue: groups test failures/flakies from any component.

What do you think?
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
Addendum: I understand the original way CASSANDRA-15536 was proposed is
not the way I'm describing, but it could easily be adapted to that, so we
can have a single place to track all tasks related to 4.0 quality and
address some of the visibility points raised by Mick.
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> I personally prefer to track fail/flaky tests as sub-issue of the 4.0
> epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
> place.
>
> The way I see it is:
> * CASSANDRA-15536 epic: track everything that needs to be done to
> wrap-up 4.0 per macro component.

Isn't this hijacking the meaning (and value) of the "4.0-beta" and
"4.0-rc" fixVersion placeholders?

My understanding is that nothing should be set to fixVersion "4.0-beta" if
it is not a blocker to 4.0-rc, and likewise nothing is set to "4.0-rc"
unless it is a blocker to 4.0 GA. That is, these were not wishlist
placeholders. Of course folk are still free to "scratch their itch" and
review and merge the non-blocker bugs from "4.x", so long as all release
lifecycle concerns are met.

I kinda agree with Josh here on what the epics should focus on,
personally, because that better isolates and highlights what's missing
from continuous and automated QA post-4.0, looping back to my first
question and concern.
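[Editor's note: the placeholder semantics Mick describes reduce to a small invariant, sketched here for illustration. The milestone names come from the thread; the function and its argument shape are hypothetical.]

```python
# Each placeholder fixVersion implies the ticket blocks the next milestone.
BLOCKS = {"4.0-beta": "4.0-rc", "4.0-rc": "4.0-GA"}

def placement_ok(fix_version: str, blocks_milestone) -> bool:
    """Check the placeholder rule: "4.0-beta" is only for blockers of
    4.0-rc, "4.0-rc" only for blockers of 4.0 GA. Other fixVersions
    (e.g. the "4.x" wishlist) carry no such constraint."""
    if fix_version not in BLOCKS:
        return True
    return blocks_milestone == BLOCKS[fix_version]
```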
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> I personally prefer to track fail/flaky tests as sub-issue of the 4.0
> epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
> place.

Strongly recommend against this approach. If we have hundreds of failing
upgrade tests (or even dozens), then we end up with a wild mix of scope in
one epic: some things are one-day tasks (fix a test), other things are
multi-week or multi-month efforts (scope and build tests for area X).

I thought you were suggesting we do test failures as sub-tasks on a ticket
in 15536, which could work. But having them as children of 15536 is just
going to make it noisy enough to not be useful.

I'd recommend we rely on JQL for the release scope, and kanban-board it
for visibility based on fixVersion, to have our "single pane of glass" for
4.0 progress.

--
Joshua McKenzie
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Isn't this hi-jacking the meaning (and value) of the "4.0-beta" and "4.0-rc" fixVersion placeholders?

Makes sense, I hadn't thought of this. I retract my suggestion.

> Kinda agree with Josh here on what the epics should focus on. Personally, because that better isolates and highlights what's missing from continuous and automated QA post-4.0, looping back to my first question and concern.

+1

> I thought you were suggesting we do test failures as sub-tasks on a ticket in 15536 which could work. But having them children of 15536 is just going to make that noisy enough as to be not useful.

I was actually advocating for the former, but I agree we should restrict the scope of the CASSANDRA-15536 epic to new improvements, as you suggested and Mick concurred.

> I'd recommend we rely on JQL for the release scope and kanban board it for visibility based on fixversion to have our "single pane of glass" for 4.0 progress.

+1. With that said, I think it could be beneficial for visibility to track flaky/test failures blocking 4.0 on a single epic with fixversion 4.0-rc.

On Tue, Sep 29, 2020 at 11:38 AM, Joshua McKenzie <joshua.mcken...@gmail.com> wrote:

> > I personally prefer to track fail/flaky tests as sub-issue of the 4.0 epic (CASSANDRA-15536) so we can track 4.0 completion status in a single place.
>
> Strongly recommend against this approach. If we have hundreds of failing upgrade tests (or even dozens) then we end up with a wild mix of scope in one epic. Some things are 1-day tasks (fix a test), other things multi-week or month-long efforts (scope and build tests for area X).
>
> I thought you were suggesting we do test failures as sub-tasks on a ticket in 15536, which could work. But having them children of 15536 is just going to make that noisy enough as to be not useful.
>
> I'd recommend we rely on JQL for the release scope and kanban board it for visibility based on fixversion to have our "single pane of glass" for 4.0 progress.
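The JQL-plus-kanban approach agreed above could look something like the following query (a sketch only; the exact fixVersion names and any additional filters would need to be checked against the project's actual Jira setup):

```
project = CASSANDRA
  AND fixVersion IN ("4.0-beta", "4.0-rc")
  AND resolution = Unresolved
ORDER BY priority DESC, updated ASC
```

A board filtered on a query like this would give the "single pane of glass" for 4.0 progress without forcing every test-failure ticket to be a child of one epic.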
>
> --
> Joshua McKenzie
>
> On Tue, Sep 29, 2020 at 10:07 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>
>> > I personally prefer to track fail/flaky tests as sub-issue of the 4.0 epic (CASSANDRA-15536) so we can track 4.0 completion status in a single place.
>>
>> The way I see it is:
>> * CASSANDRA-15536 epic: track everything that needs to be done to wrap up 4.0 per macro component.
>> * Kanban board: a different view of CASSANDRA-15536, but all issues in the Kanban should be ultimately tied to a sub-issue on CASSANDRA-15536.
>> * Component sub-issues: both "new ways to test" + "bugs related to the component"
>> * Test failure sub-issue: group test failures/flakies from any components.
>>
>> What do you think?
>>
>> On Tue, Sep 29, 2020 at 10:47 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>>> > Not that I know of. Perhaps we should add a new ticket to the quality epic to track flakey and failing tests? (@cc Josh/Jordan)
>>>
>>> Either a separate epic or a ticket w/ sub-tasks would work well in terms of organization. There's value in having one place to go to cleanly pull that kind of work so I have a slight bias towards an independent epic for 4.0 test *fixing* instead of mixing the "new ways to test" with "cleaning up the testing ways we know".
>>>
>>> ---
>>> Josh McKenzie
>>>
>>> On Tue, Sep 29, 2020 at 8:34 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>>
>>>> Thanks for bringing up these valuable points, Mick! In fact we focused on the quality epic so far but there is a lot more stuff unaddressed. I commented on some of the points you brought up below:
>>>>
>>>> > How will we ensure this QA persists, so it's not a manual checklist every release?
>>>>
>>>> This is a great question but I believe it warrants a separate discussion as part of a larger discussion on improving our development/quality process post-4.0.
>>>>
>>>> > *** CASSANDRA-15234 – Standardise config and JVM parameters - It looks like we have dropped the ball on this.
>>>>
>>>> It's sad that we dropped the ball on this important change but now I think it's too late to make these changes as it will bring entropy towards stabilizing 4.0. In that sense I think we should postpone this to the next major and prioritize it earlier in the next cycle.
>>>>
>>>> > Do all remaining flakey and failing units and dtests have jira tickets entered for 4.0-beta? Has the same been
2020-09-29 Contributor Meeting
Hi everyone, I have the meeting video and transcripts uploaded: https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-29+Apache+Cassandra+Contributor+Meeting+-+4.0+push+edition

Takeaways from today's meeting:
- Shepherds, shepherds, shepherds. Quite a few places no longer have a clear shepherd to help others and push for completion. Several people will be reviewing Jiras and pinging people. Expect pings.
- Identifying low-hanging fruit, especially in relation to the 4.0 backlog. Patrick and Melissa will be taking this as an action item to help promote what's out there and get more interest from the community.
- Identifying workloads and data models for better testing using Harry and NoSQL Bench. A separate [DISCUSS] thread is forthcoming.

Thanks for the great participation today!

Patrick
Re: [DISCUSS] Next steps for Kubernetes operator SIG
Hello Dev list,

I'm Chris Bradford, a Product Manager at DataStax working with the cass-operator team. For background, we started down the path of developing an operator internally to power our C*aaS platform, Astra. Care was taken from day 1 to keep anything specific to this product at a layer above cass-operator so it could focus solely on the task of operating Cassandra clusters. With that being said, every single cluster on Astra is provisioned and operated by cass-operator. The value of an advanced operator to Cassandra users is tremendous, so we decided to open source the project (and associated components) with the goal of building a community. It absolutely makes sense to offer this project and codebase up for donation as a standard / baseline for running C* on Kubernetes. Below you will find a collection of cass-operator features, differentiators, and roadmap / in-flight initiatives.

Table-stakes
Must-have functionality for a C* operator
- Datacenter provisioning
  - Schedule all pods
  - Bootstrap nodes in the appropriate order
    - Seeds
    - Across racks
    - etc.
  - Uniform configuration
- Scale-up
  - Add new nodes in a balanced manner across racks
- Scale-down
  - Remove nodes one at a time across racks
- Node recovery
  - Restart process
  - Reschedule instance (i.e. replace node)
  - Replace instance
  - Specific workflows for seed node replacements
- Multi-DC / Multi-Rack
- Multi-Region / Multi-K8s Cluster
  - Note this requires support at the networking layer for pod-to-pod IP connectivity. This may be accomplished within the cluster with CNIs like Cilium or externally via traditional networking tools.
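The table-stakes provisioning features above are driven from a single custom resource. As a rough sketch of what that looks like with cass-operator (field names and the apiVersion are recalled from the project's published examples and should be verified against the cass-operator repo; the cluster/rack names and sizes here are placeholders):

```yaml
# Hypothetical CassandraDatacenter manifest: the operator schedules one pod
# per node, balanced across the declared racks, and bootstraps them in order.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: example-cluster   # placeholder name
  serverType: cassandra
  serverVersion: "3.11.7"
  size: 3                        # total nodes, spread across racks below
  racks:
    - name: rack1
    - name: rack2
    - name: rack3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard # placeholder storage class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
```

Scale-up under this model is a one-line change (increase `size` and re-apply); the operator handles adding nodes rack by rack in a balanced manner.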
Differentiators
- OSS Ecosystem / Components
  - Cass Config Builder - OSS project extracted from DataStax OpsCenter Life Cycle Manager to provide automated configuration file rendering
  - Cass Config Definitions - definitions files for cass-config-builder; defines all configuration files, their parameters, and templates
  - Management API for Apache Cassandra (MAAC)
  - Metrics Collector for Apache Cassandra (MCAC)
  - Reference Prometheus Operator CRDs
    - ServiceMonitor
    - Instance
  - Reference Grafana Operator CRDs
    - Instance
    - Dashboards
    - Datasource
- PodTemplateSpec
  - Customization of existing pods, including support for adding containers, volumes, etc.
- Advanced Networking
  - Node Port
  - Host Network
- Simple security
  - Management API mTLS support
  - Automated generation of keystore and truststore for internode and client-to-node TLS
  - Automated superuser account configuration
    - The default superuser (cassandra/cassandra) is disabled and never available to clients
    - A cluster administration account may be automatically generated (or provided) with values stored in a k8s secret
- Automatic application of NetworkTopologyStrategy with appropriate RF for system keyspaces
- Validating webhook
  - Invalid changes are rejected with a helpful message
- Rolling cluster updates
  - Change in binary (C* upgrade)
  - Change in configuration
  - Canary deployments - single-rack application of changes for validation before broader deployment
  - Rolling restart
- Platform Integration / Testing / Certification
  - Red Hat OpenShift compatible and certified
    - Secure, Universal Base Image (UBI) foundation images with security scanning performed by Red Hat
      - cass-operator
      - cass-config-builder
      - apache-cassandra w/ MCAC and MAAC
    - Integration with Red Hat certification pipeline / marketplace
    - Presence in Red Hat Operator Hub built into the OpenShift interface
  - VMware Tanzu Kubernetes Grid Integrated Edition compatible and certified
    - Security scanning for images performed by VMware
  - Amazon EKS
  - Google GKE
  - Azure AKS
- Documentation / Reference Implementations
  - Cloud storage classes
  - Ingress solutions
  - Sample connection validation application with reference implementations of Java Driver client connection parameters
- Cluster-level Stop / Resume - stop all running instances while keeping persistent storage. Allows for scaling compute down to zero. Bringing the cluster ba