Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Regarding the proposed agenda of going through the unassigned issues to
> improve visibility on what needs to be done to ship 4.0 GA I think this is
> a great start but only covers part of the problem.
>
> I think we have 3 outstanding issues that are hampering visibility of 4.0
> progress:
> a) Quality testing issues with no shepherd;
> b) Quality testing issues with shepherd, but no recent activity (~2 months
> or less);
> c) Quality testing issues with no objective acceptance criteria/Definition
> of Done;

These Quality testing epics are a great focal point. How will we ensure
this QA persists, so it's not a manual checklist every release?

The following is what I can see outstanding for the 4.0 release that is
not, afaik, attached to these epic tickets…

** Those issues that slipped alpha…
*** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming
and compression in protocol v5-beta
*** CASSANDRA-15234 – Standardise config and JVM parameters
*** CASSANDRA-13701 – Lower default num_tokens (blocked by
'CASSANDRA-16079 Improve dtest runtime')
** 95 jira tickets in 4.0-beta and 4.0-rc
** 631 jira bug tickets with no assigned "fix version"
** Remaining flakey unit and dtests
** Hundreds of failing and flakey upgrade dtests
** Reports from driver tests, and other external test systems
** Reports and/or integration with Fallout and Harry

In a bit more detail…

*** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming and
compression in protocol v5-beta

This looks like it is in its final patch and review. Is that correct, Sam?

*** CASSANDRA-15234 – Standardise config and JVM parameters

It looks like we have dropped the ball on this.

*** CASSANDRA-13701 – Lower default num_tokens, and CASSANDRA-16079

Some effort is underway from Ekaterina, David, and myself. I've put
together a prototype for caching bootstrapped ccm clusters, but I'm not yet
sure I can get much savings over the current tests, and only a minimal
saving off the 13701 patch. Berenguer brought up that 40% of the dtests are
single-node, their performance unchanged by 13701, and probably better off
rewritten as in-jvm tests.

** 95 jira tickets in 4.0-beta and 4.0-rc
** Remaining flakey unit and dtests
** Hundreds of failing and flakey upgrade dtests

Do all remaining flakey and failing unit and dtests have jira tickets
entered for 4.0-beta? Has the same been done, at least with rough grouping,
for the upgrade tests? Are these tied to the testing epics in any way?

** 631 jira bug tickets with no assigned "fix version" (who knows how many
of these are applicable to 4.0?)

Have any triage efforts happened here? Do triaged bugs in this list get
moved to fix version "4.x"? Are we duplicating efforts in the testing
epics when others have already identified and reported the bugs but we
just haven't triaged them?
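[Editor's note: the ccm cluster-caching prototype mentioned above is not included in the thread. As a rough illustration of the idea only, a generic snapshot-and-restore helper might look like the following sketch; the function name and layout are hypothetical, not the actual prototype.]

```python
import shutil
from pathlib import Path

def cached_cluster_dir(cache_root: Path, key: str, bootstrap):
    """Return a fresh working copy of a bootstrapped cluster directory.

    `bootstrap` is called at most once per cache key to populate a
    directory (e.g. by creating and starting a ccm cluster); later calls
    skip the expensive bootstrap and just copy the cached snapshot.
    """
    snapshot = cache_root / "snapshots" / key
    if not snapshot.exists():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        staging = cache_root / "staging" / key
        staging.mkdir(parents=True, exist_ok=True)
        bootstrap(staging)        # expensive: the real bootstrap happens here
        staging.rename(snapshot)  # publish the finished snapshot
    workdir = cache_root / "work" / key
    if workdir.exists():
        shutil.rmtree(workdir)    # discard state from a previous test run
    shutil.copytree(snapshot, workdir)  # cheap restore for this run
    return workdir
```

Whether this wins anything depends on how large the bootstrapped state is relative to the copy cost, which matches Mick's caveat about minimal savings.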
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> On 29 Sep 2020, at 09:50, Mick Semb Wever wrote:
>
> *** CASSANDRA-15299 – CASSANDRA-13304 follow-up: improve checksumming and
> compression in protocol v5-beta
>
> This looks like it is in its final patch and review. Is that correct, Sam?

Yes it is. I hope to get review finished and post some further perf
numbers this week.
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
Thanks for bringing up these valuable points, Mick! In fact we have focused
on the quality epic so far, but there is a lot more left unaddressed. I've
commented on some of the points you brought up below:

> How will we ensure this QA persists, so it's not a manual checklist
> every release?

This is a great question, but I believe it warrants a separate discussion
as part of a larger conversation on improving our development/quality
process post-4.0.

> *** CASSANDRA-15234 – Standardise config and JVM parameters
> It looks like we have dropped the ball on this.

It's sad that we dropped the ball on this important change, but I now
think it's too late to make these changes, as they would add entropy while
we are stabilizing 4.0. In that sense I think we should postpone this to
the next major and prioritize it earlier in the next cycle.

> Do all remaining flakey and failing unit and dtests have jira tickets
> entered for 4.0-beta? Has the same been done, at least with rough
> grouping, for the upgrade tests? Are these tied to the testing epics in
> any way?

Not that I know of. Perhaps we should add a new ticket to the quality epic
to track flakey and failing tests? (@cc Josh/Jordan)

> Have any triage efforts happened here?

Not that I know of, but maybe Josh/Jordan/Jon (J^3) are planning on
looking at it. I can take a stab at triaging some of these tickets.

> Do triaged bugs in this list get moved to fix version "4.x"?

I think, in the spirit of expediting the 4.0 RC release, we should mark
bugs with low severity (i.e. those with a simple workaround) as 4.0.1. Any
bug with medium-high severity should be marked as 4.0-rc to favor
stability.

> Are we duplicating efforts in the testing epics when others have already
> identified and reported the bugs but we just haven't triaged them?

That's a good point. I think as part of the triaging effort we should link
the bugs to existing quality epics so we can keep track of them.
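[Editor's note: Paulo's proposed severity-to-fixVersion triage rule can be written down as a tiny function. This is purely illustrative; the severity labels are paraphrased from the email, not actual Jira field values.]

```python
def target_fix_version(severity: str) -> str:
    """Map a triaged 4.0 bug's severity to a proposed fixVersion.

    Low-severity bugs (a simple workaround exists) slip to 4.0.1;
    medium/high-severity bugs block the release candidate (4.0-rc).
    """
    s = severity.lower()
    if s == "low":
        return "4.0.1"
    if s in ("medium", "high"):
        return "4.0-rc"
    raise ValueError(f"unknown severity: {severity}")
```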
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Not that I know of. Perhaps we should add a new ticket to the quality
> epic to track flakey and failing tests? (@cc Josh/Jordan)

Either a separate epic or a ticket w/ sub-tasks would work well in terms
of organization. There's value in having one place to go to cleanly pull
that kind of work, so I have a slight bias towards an independent epic for
4.0 test *fixing*, instead of mixing the "new ways to test" with "cleaning
up the testing ways we know".

---
Josh McKenzie
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
I personally prefer to track failing/flaky tests as sub-issues of the 4.0
epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
place.

The way I see it is:
* CASSANDRA-15536 epic: tracks everything that needs to be done to wrap up
4.0, per macro component.
* Kanban board: a different view of CASSANDRA-15536, but all issues in the
Kanban should ultimately be tied to a sub-issue on CASSANDRA-15536.
* Component sub-issues: both "new ways to test" and "bugs related to the
component".
* Test failure sub-issue: groups test failures/flakies from any component.

What do you think?
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
Addendum: I understand the original way CASSANDRA-15536 was proposed is
not the way I'm describing, but it could easily be adapted to that, so we
can have a single place to track all tasks related to 4.0 quality and
address some of the visibility points raised by Mick.
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> I personally prefer to track fail/flaky tests as sub-issue of the 4.0
> epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
> place.
>
> The way I see it is:
> * CASSANDRA-15536 epic: track everything that needs to be done to
> wrap-up 4.0 per macro component.

Isn't this hijacking the meaning (and value) of the "4.0-beta" and
"4.0-rc" fixVersion placeholders?

My understanding is that nothing should be set to fixVersion "4.0-beta" if
it is not a blocker to 4.0-rc, and likewise nothing is set to "4.0-rc"
unless it is a blocker to 4.0 GA. That is, these were not wishlist
placeholders. Of course folk are still free to "scratch their itch" and
review and merge the non-blocker bugs from "4.x", so long as all release
lifecycle concerns are met.

I kinda agree with Josh here on what the epics should focus on,
personally, because that better isolates and highlights what's missing
from continuous and automated QA post-4.0, looping back to my first
question and concern.
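[Editor's note: the placeholder semantics Mick describes reduce to a small invariant, sketched here for illustration. The milestone names come from the thread; the function and its argument shape are hypothetical.]

```python
# Each placeholder fixVersion implies the ticket blocks the next milestone.
BLOCKS = {"4.0-beta": "4.0-rc", "4.0-rc": "4.0-GA"}

def placement_ok(fix_version: str, blocks_milestone) -> bool:
    """Check the placeholder rule: "4.0-beta" is only for blockers of
    4.0-rc, "4.0-rc" only for blockers of 4.0 GA. Other fixVersions
    (e.g. the "4.x" wishlist) carry no such constraint."""
    if fix_version not in BLOCKS:
        return True
    return blocks_milestone == BLOCKS[fix_version]
```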
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> I personally prefer to track fail/flaky tests as sub-issue of the 4.0
> epic (CASSANDRA-15536) so we can track 4.0 completion status in a single
> place.

Strongly recommend against this approach. If we have hundreds of failing
upgrade tests (or even dozens), then we end up with a wild mix of scope in
one epic: some things are one-day tasks (fix a test), other things are
multi-week or multi-month efforts (scope and build tests for area X).

I thought you were suggesting we do test failures as sub-tasks on a ticket
in 15536, which could work. But having them as children of 15536 is just
going to make it noisy enough to not be useful.

I'd recommend we rely on JQL for the release scope, and kanban-board it
for visibility based on fixVersion, to have our "single pane of glass" for
4.0 progress.

--
Joshua McKenzie
Re: Cassandra Contributor Meeting to focus on outstanding 4.0 issues
> Isn't this hi-jacking the meaning (and value) of the "4.0-beta" and "4.0-rc" fixVersion placeholders?

Makes sense, I hadn't thought of this. I retract my suggestion.

> Kinda agree with Josh here on what the epics should focus on. Personally, because that better isolates and highlights what's missing from continuous and automated QA post-4.0, looping back to my first question and concern.

+1

> I thought you were suggesting we do test failures as sub-tasks on a ticket in 15536 which could work. But having them children of 15536 is just going to make that noisy enough as to be not useful.

I was actually advocating for the former, but I agree we should restrict the scope of the CASSANDRA-15536 epic to new improvements, as you suggested and Mick concurred.

> I'd recommend we rely on JQL for the release scope and kanban board it for visibility based on fixversion to have our "single pane of glass" for 4.0 progress.

+1. With that said, I think it could be beneficial for visibility to track flaky/test failures blocking 4.0 on a single epic with fixversion 4.0-rc.

On Tue, Sep 29, 2020 at 11:38 AM, Joshua McKenzie <joshua.mcken...@gmail.com> wrote:

> > I personally prefer to track fail/flaky tests as sub-issue of the 4.0 epic (CASSANDRA-15536) so we can track 4.0 completion status in a single place.
>
> Strongly recommend against this approach. If we have hundreds of failing upgrade tests (or even dozens) then we end up with a wild mix of scope in one epic. Some things are 1-day tasks (fix a test), other things multi-week or month-long efforts (scope and build tests for area X).
>
> I thought you were suggesting we do test failures as sub-tasks on a ticket in 15536, which could work. But having them children of 15536 is just going to make that noisy enough as to be not useful.
>
> I'd recommend we rely on JQL for the release scope and kanban board it for visibility based on fixversion to have our "single pane of glass" for 4.0 progress.
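The JQL-plus-kanban approach agreed above could look something like the following query (a sketch only; the exact fixVersion names and any additional filters would need to be checked against the project's actual Jira setup):

```
project = CASSANDRA
  AND fixVersion IN ("4.0-beta", "4.0-rc")
  AND resolution = Unresolved
ORDER BY priority DESC, updated ASC
```

A board filtered on a query like this would give the "single pane of glass" for 4.0 progress without forcing every test-failure ticket to be a child of one epic.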
>
> --
> Joshua McKenzie
>
> On Tue, Sep 29, 2020 at 10:07 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>
>> > I personally prefer to track fail/flaky tests as sub-issue of the 4.0 epic (CASSANDRA-15536) so we can track 4.0 completion status in a single place.
>>
>> The way I see it is:
>> * CASSANDRA-15536 epic: track everything that needs to be done to wrap up 4.0 per macro component.
>> * Kanban board: a different view of CASSANDRA-15536, but all issues in the Kanban should be ultimately tied to a sub-issue on CASSANDRA-15536.
>> * Component sub-issues: both "new ways to test" + "bugs related to the component"
>> * Test failure sub-issue: group test failures/flakies from any components.
>>
>> What do you think?
>>
>> On Tue, Sep 29, 2020 at 10:47 AM, Josh McKenzie <jmcken...@apache.org> wrote:
>>
>>> > Not that I know of. Perhaps we should add a new ticket to the quality epic to track flakey and failing tests? (@cc Josh/Jordan)
>>>
>>> Either a separate epic or a ticket w/ sub-tasks would work well in terms of organization. There's value in having one place to go to cleanly pull that kind of work so I have a slight bias towards an independent epic for 4.0 test *fixing* instead of mixing the "new ways to test" with "cleaning up the testing ways we know".
>>>
>>> ---
>>> Josh McKenzie
>>>
>>> On Tue, Sep 29, 2020 at 8:34 AM, Paulo Motta <pauloricard...@gmail.com> wrote:
>>>
>>>> Thanks for bringing up these valuable points, Mick! In fact we focused on the quality epic so far but there is a lot more stuff unaddressed. I commented on some of the points you brought up below:
>>>>
>>>> > How will we ensure this QA persists, so it's not a manual checklist every release?
>>>>
>>>> This is a great question but I believe it warrants a separate discussion as part of a larger discussion on improving our development/quality process post-4.0.
>>>>
>>>> > *** CASSANDRA-15234 – Standardise config and JVM parameters - It looks like we have dropped the ball on this.
>>>>
>>>> It's sad that we dropped the ball on this important change but now I think it's too late to make these changes as it will bring entropy towards stabilizing 4.0. In that sense I think we should postpone this to the next major and prioritize it earlier in the next cycle.
>>>>
>>>> > Do all remaining flakey and failing units and dtests have jira tickets entered for 4.0-beta? Has the same been
2020-09-29 Contributor Meeting
Hi everyone, I have the meeting video and transcripts uploaded: https://cwiki.apache.org/confluence/display/CASSANDRA/2020-09-29+Apache+Cassandra+Contributor+Meeting+-+4.0+push+edition

Takeaways from today's meeting:
- Shepherds, shepherds, shepherds. Quite a few places no longer have a clear shepherd to help others and push for completion. Several people will be reviewing Jiras and pinging people. Expect pings.
- Identifying low-hanging fruit, especially in relation to the 4.0 backlog. Patrick and Melissa will be taking this as an action item to help promote what's out there and get more interest from the community.
- Identifying workloads and data models for better testing using Harry and NoSQL Bench. A separate [DISCUSS] thread is forthcoming.

Thanks for the great participation today!

Patrick
Re: [DISCUSS] Next steps for Kubernetes operator SIG
Hello Dev list,

I'm Chris Bradford, a Product Manager at DataStax working with the cass-operator team. For background, we started down the path of developing an operator internally to power our C*aaS platform, Astra. Care was taken from day 1 to keep anything specific to this product at a layer above cass-operator so it could focus solely on the task of operating Cassandra clusters. With that being said, every single cluster on Astra is provisioned and operated by cass-operator. The value of an advanced operator to Cassandra users is tremendous, so we decided to open source the project (and associated components) with the goal of building a community. It absolutely makes sense to offer this project and codebase up for donation as a standard / baseline for running C* on Kubernetes. Below you will find a collection of cass-operator features, differentiators, and roadmap / in-flight initiatives.

Table-stakes
Must-have functionality for a C* operator
- Datacenter provisioning
  - Schedule all pods
  - Bootstrap nodes in the appropriate order
    - Seeds
    - Across racks
    - etc.
  - Uniform configuration
- Scale-up
  - Add new nodes in a balanced manner across racks
- Scale-down
  - Remove nodes one at a time across racks
- Node recovery
  - Restart process
  - Reschedule instance (i.e. replace node)
  - Replace instance
  - Specific workflows for seed node replacements
- Multi-DC / Multi-Rack
- Multi-Region / Multi-K8s Cluster
  - Note this requires support at the networking layer for pod-to-pod IP connectivity. This may be accomplished within the cluster with CNIs like Cilium or externally via traditional networking tools.
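The table-stakes provisioning features above are driven from a single custom resource. As a rough sketch of what that looks like with cass-operator (field names and the apiVersion are recalled from the project's published examples and should be verified against the cass-operator repo; the cluster/rack names and sizes here are placeholders):

```yaml
# Hypothetical CassandraDatacenter manifest: the operator schedules one pod
# per node, balanced across the declared racks, and bootstraps them in order.
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: example-cluster   # placeholder name
  serverType: cassandra
  serverVersion: "3.11.7"
  size: 3                        # total nodes, spread across racks below
  racks:
    - name: rack1
    - name: rack2
    - name: rack3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard # placeholder storage class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
```

Scale-up under this model is a one-line change (increase `size` and re-apply); the operator handles adding nodes rack by rack in a balanced manner.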
Differentiators
- OSS Ecosystem / Components
  - Cass Config Builder - OSS project extracted from DataStax OpsCenter Life Cycle Manager to provide automated configuration file rendering
  - Cass Config Definitions - definitions files for cass-config-builder; defines all configuration files, their parameters, and templates
  - Management API for Apache Cassandra (MAAC)
  - Metrics Collector for Apache Cassandra (MCAC)
  - Reference Prometheus Operator CRDs
    - ServiceMonitor
    - Instance
  - Reference Grafana Operator CRDs
    - Instance
    - Dashboards
    - Datasource
- PodTemplateSpec
  - Customization of existing pods, including support for adding containers, volumes, etc.
- Advanced Networking
  - Node Port
  - Host Network
- Simple security
  - Management API mTLS support
  - Automated generation of keystore and truststore for internode and client-to-node TLS
  - Automated superuser account configuration
    - The default superuser (cassandra/cassandra) is disabled and never available to clients
    - A cluster administration account may be automatically generated (or provided) with values stored in a k8s secret
- Automatic application of NetworkTopologyStrategy with appropriate RF for system keyspaces
- Validating webhook
  - Invalid changes are rejected with a helpful message
- Rolling cluster updates
  - Change in binary (C* upgrade)
  - Change in configuration
  - Canary deployments - single-rack application of changes for validation before broader deployment
  - Rolling restart
- Platform Integration / Testing / Certification
  - Red Hat OpenShift compatible and certified
    - Secure, Universal Base Image (UBI) foundation images with security scanning performed by Red Hat
      - cass-operator
      - cass-config-builder
      - apache-cassandra w/ MCAC and MAAC
    - Integration with Red Hat certification pipeline / marketplace
    - Presence in Red Hat Operator Hub built into the OpenShift interface
  - VMware Tanzu Kubernetes Grid Integrated Edition compatible and certified
    - Security scanning for images performed by VMware
  - Amazon EKS
  - Google GKE
  - Azure AKS
- Documentation / Reference Implementations
  - Cloud storage classes
  - Ingress solutions
  - Sample connection validation application with reference implementations of Java Driver client connection parameters
- Cluster-level Stop / Resume - stop all running instances while keeping persistent storage. Allows for scaling compute down to zero. Bringing the cluster ba