Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Benedict Elliott Smith
I think our "pre-beta" criteria should also be our "not in a major" criteria. 

If work is prohibited because it invalidates our pre-release verification, then 
it should not land until we next perform pre-release verification, which only 
currently happens once per major.

This could mean either landing less in a major, or permitting more in beta etc.

On 26/05/2020, 19:24, "Joshua McKenzie"  wrote:

I think an interesting question that informs when to stop accepting
specific changes in a release is when we expect any extensive pre-release
testing to take place.

If we go by our release lifecycle, gutting deprecated code seems compatible
w/Alpha but I wouldn't endorse merging it into Beta:
https://cwiki.apache.org/confluence/display/CASSANDRA/Release+Lifecycle.
Since almost all of the 40_quality_testing epic stuff is also beta phase
and hasn't really taken off yet, it also seems like there will be extensive
testing after this phase transition.

All that being said, I'd advocate for marking FixVer 4.x to indicate
optionality and disallow merge of tickets like this after we're done
w/alpha phase in keeping w/our lifecycle doc in general.

Does that make sense? Should we consider revisiting and revising the
lifecycle doc re: larger deprecation / changes and cycle stages?



On Tue, May 26, 2020 at 12:53 PM Oleksandr Petrov <
oleksandr.pet...@gmail.com> wrote:

> > 1) Would you block the release over this ticket?
>
> I would definitely not block the release on this ticket.
>
> > 2) Would you prioritize this ticket over testing?
>
> Same here, I would prioritise testing.
>
> > 3) Does fixing this ticket make 4.0 a more stable release?
>
> I wanted to give some context: I wrote that in August 2018. While I still
> believe it is important to get rid of this code, I'm disinclined to merge
> it into 4.0.
>
> Given that the patch is rather big (421 additions and 1,480 deletions) and
> touches many important places, including the parser, I would be extremely
> cautious about merging it that late in the release cycle. It would be great to
> also hear arguments that would justify the risk.
>
> Thank you for starting this discussion,
> -- Alex
>
>
>
> On Tue, May 26, 2020 at 5:20 PM Ekaterina Dimitrova <
> ekaterina.dimitr...@datastax.com> wrote:
>
> > Dear all,
> >
> > Following the ticket review sent on 12th May, I wanted to bring up
> > https://issues.apache.org/jira/browse/CASSANDRA-13994: Remove COMPACT
> > STORAGE internals before 4.0 release.
> >
> > It is already under review by Dinesh Joshi and Alex Petrov. Not a
> > blocker but already under review.
> >
> > Below are my responses to the questions brought up.
> >
> >
> > 1) Would you block the release over this ticket? - probably not
> >
> > 2) Would you prioritize this ticket over testing? - already
> > implemented, but if big changes are needed after the review,
> > I doubt we will want to prioritize it over testing
> >
> > 3) Does fixing this ticket make 4.0 a more stable release? - I will
> > just cite Alex Petrov, who reported this Jira; I think the rest of us
> > would agree with him here.
> >
> > "I would say it's quite important to clean up compact storage
> > internals in 4.0 before the release. It should have no visible
> > side-effects, but it'd be very good to have as it simplifies multiple
> > code paths."
> >
> >
> > Ekaterina Dimitrova
> > e. ekaterina.dimitr...@datastax.com
> > w. www.datastax.com
> >
>
>
> --
> alex p
>



-
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org



Fwd: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Ekaterina Dimitrova
Thank you all for your input.
I think an important topic, again, is revising the lifecycle and making sure
we have a clear view of what is left until beta. I will start a separate
thread on the flaky tests situation soon.

For this particular ticket I see a couple of things:
- There are a lot of deletions of code that is already unused
- I implemented it while we were still in alpha, as per our agreement that
this would give us enough time for testing. Dinesh, as a reviewer, can
probably give some valuable feedback/opinion on the patch.
- It definitely touches important places, but I think the important thing is
to see exactly how it touches them
- Considering it for alpha, before the major testing in beta, sounds
reasonable to me, but I guess it also depends on people's availability to
review it in detail and on the exact test plans afterwards


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Joshua McKenzie
> because it invalidates our pre-release verification, then it should not
> land until we next perform pre-release verification

At least for me, there's a little softness around our collective alignment
on when pre-release verification takes place. If it's between alpha-1 and
ga, we don't want changes that would invalidate that verification to land
during that time frame. It's different if the window is beta-1 to ga. We also
risk invalidating testing if we do any of that testing before wherever that
cutoff is, and a lack of clarity on that cutoff further muddies those waters.

My very loosely held perspective is that beta-1 to ga is the window in
which we apply the "don't do things that will invalidate verification" rule, and
we plan to do that verification during the beta phase. I *think* this is
consistent w/the current framing of the lifecycle doc. That being said, I
don't have strong religion on this so if we collectively want to call it
"don't majorly disrupt from alpha-1 to ga", we can formalize that in the
docs and go ahead and triage current open scope for 4.0 and move things out.




Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Benedict Elliott Smith
I'm not sure I communicated my point very well. I mean to say that if the 
reason we are prohibiting a patch from landing post-beta is that it invalidates 
work we only perform pre-ga, then it probably should not be permitted to land 
post-ga either, since it must also invalidate the same work?

That is to say, if we're comfortable with work landing post-ga because we 
believe it to be safe to release without our pre-major-release verification, we 
should be comfortable with it landing at any time pre-ga too.  Anything else 
seems inconsistent to me, and we should examine what assumptions we're making 
that permit this inconsistency to arise.



Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Benedict Elliott Smith
I'm being told this still isn't clear, so let me try in a bullet-point timeline:

* 4.0 Beta
* 4.0 Verification Work
* [Merge Window]
* 4.0 GA
* 4.0 Minor Releases 
* ...
* 5.0 Dev
* ...
* 5.0 Verification Work 
* 5.0 GA

I think that anything that is prohibited from "[Merge Window]" because it 
invalidates "4.0 Verification Work" must also be prohibited until "5.0 Dev", 
because the next equivalent work that could validate it occurs only at "5.0 
Verification Work".


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Joshua McKenzie
In this hypothetical, certainly not post-ga, and I'd argue we shouldn't
allow it post-beta1 either. We need a clear demarcation of "this type of work
is ok to merge before X, it's not ok after X. Validation testing *will not
occur* before X, and will start after X".

It's a bit rigid, but it's the only way to have a clear inflection point
where you know subsequent work won't be invalidated. Otherwise we end up in
"I'm pretty sure the validation for disruptive thing X hasn't occurred so
I'm going to merge it now" hell.

(Does what I'm typing here make sense given the context of what you said?
I had a little trouble parsing it, I think because there's fuzziness around
"alpha 1 release" vs. "alpha phase" in how we label things in the project.
Maybe.)


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Jeremiah D Jordan
+1, strongly agree. If we aren't going to let something go into 4.0.0 because 
it would "invalidate testing", then we cannot let such a thing go into 4.0.1 
unless we plan to redo said testing for the patch release.


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Scott Andreas
That makes sense to me, yep.

My hope and expectation is that the time required for "verification work" will 
shrink dramatically in the not too distant future - ideally to a period of less 
than a month. In this world, the cost of missing one train is reduced to 
catching the next one.

One of the main goals in shifting focus from "testing" and "test plans" to 
"test engineering" is automating as many aspects of release qualification as 
possible, with an asymptotic ideal as a function of compute capacity and time. 
While such automation will never be complete (it's likely that development of 
new features will/must include qualification infra changes to exercise them), 
if we're able to apply the same rigor to major releases as we are to patchlevel 
builds with little incremental effort, I'd be thrilled.

This is mostly a way of saying:
– I like the cadence/sequencing Benedict proposes below.
– I think improvements in test engineering can reduce/eliminate invalidation 
and may increase the scope of what can be a candidate for merge on a given 
branch
– And if not, the cost of missing the train is lower because we'll be able to 
deliver major releases more often.

Scott


From: Jeremiah D Jordan 
Sent: Wednesday, May 27, 2020 11:54 AM
To: Cassandra DEV
Subject: Re: [DISCUSS] CASSANDRA-13994

+1 strongly agree.  If we aren’t going to let something go into 4.0.0 because 
it would “invalidate testing” then we cannot let such a thing go into 4.0.1 
unless we plan to re-do said testing for the patch release.

> On May 27, 2020, at 1:31 PM, Benedict Elliott Smith  
> wrote:
>
> I'm being told this still isn't clear, so let me try in a bullet-point 
> timeline:
>
> * 4.0 Beta
> * 4.0 Verification Work
> * [Merge Window]
> * 4.0 GA
> * 4.0 Minor Releases
> * ...
> * 5.0 Dev
> * ...
> * 5.0 Verification Work
> * GA 5.0
>
> I think that anything that is prohibited from "[Merge Window]" because it 
> invalidates "4.0 Verification Work" must also be prohibited until "5.0 Dev" 
> because the next equivalent work that can now validate it occurs only at "5.0 
> Verification Work"
>
> On 27/05/2020, 19:05, "Benedict Elliott Smith"  wrote:
>
>    I'm not sure if I communicated my point very well.  I mean to say that if
>    the reason we are prohibiting a patch to land post-beta is because it
>    invalidates work we only perform pre-ga, then it probably should not be
>    permitted to land post-ga either, since it must also invalidate the same
>    work?
>
>    That is to say, if we're comfortable with work landing post-ga because we
>    believe it to be safe to release without our pre-major-release
>    verification, we should be comfortable with it landing at any time pre-ga
>    too.  Anything else seems inconsistent to me, and we should examine what
>    assumptions we're making that permit this inconsistency to arise.
>
>
>    On 27/05/2020, 18:49, "Joshua McKenzie"  wrote:
>
>    > because it invalidates our pre-release verification, then it should
>    > not land until we next perform pre-release verification
>
>    At least for me there's a little softness around our collective alignment
>    on when pre-release verification takes place. If it's between alpha-1 and
>    ga we don't want changes that would invalidate those changes to land
>    during that time frame. Different for beta-1 to ga. We also risk
>    invalidating testing if we do any of that testing before wherever that
>    cutoff is, and a lack of clarity on that cutoff further muddies those
>    waters.
>
>    My very loosely held perspective is that beta-1 to ga is the window in
>    which we apply the "don't do things that will invalidate verification",
>    and we plan to do that verification during the beta phase. I *think* this
>    is consistent w/the current framing of the lifecycle doc. That being
>    said, I don't have strong religion on this so if we collectively want to
>    call it "don't majorly disrupt from alpha-1 to ga", we can formalize that
>    in the docs and go ahead and triage current open scope for 4.0 and move
>    things out.
>
>
>
>    On Wed, May 27, 2020 at 12:59 PM Ekaterina Dimitrova <
>    ekaterina.dimitr...@datastax.com> wrote:
>
>    > Thank you all for your input.
>    > I think an important topic is again to revise the lifecycle and ensure
>    > we really have the vision on what is left until beta. I will start a
>    > separate thread on the flaky tests situation soon.
>    >
>    > For this particular ticket I see a couple of things:
>    > - There are a lot of deletions of already not used code
>    > - I implemented it still in alpha as per our agreement that this will
>    > give us enough time for testing. Probably Dinesh as a reviewer can give
>    > some valuable feedback/opinion on the patch.
>    > - It definitely touches around important places but the important thing
>    > is to see how exactly it touches, I think
>    > - Considering it for

Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Joshua McKenzie
I think we're all on the same page here; I was focusing more on the release
lifecycles and sequencing than the entire version cycle. Good to broaden
scope I think.

One thing we're not considering is the separation of API changes from major
changes and how that intersects with release milestones.

Meaning:
1. alpha phase
2. Milestone: API freeze (all API changes pushed to next major)
3. beta phase
4. Verification phase (all major disruptive changes pushed to next major)

A clear point to cut RC's doesn't surface from the above for me. Releasing
an RC before broad verification seems wrong, and cutting an RC after the 4
points above may as well be GA because it's all known scope.

Thoughts?


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Jeremiah D Jordan
> A clear point to cut RC's doesn't surface from the above for me. Releasing
> an RC before broad verification seems wrong, and cutting an RC after the 4
> points above may as well be GA because it's all known scope.

Isn’t the whole point of an RC that it could be the GA?  It is a “release 
candidate”, meaning if no one finds any issues with it, it can then become 
the release?  So that seems like exactly the right time to make RC releases?


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Brandon Williams
Absolutely my understanding.

On Wed, May 27, 2020, 2:49 PM Jeremiah D Jordan 
wrote:

> > A clear point to cut RC's doesn't surface from the above for me. Releasing
> > an RC before broad verification seems wrong, and cutting an RC after the 4
> > points above may as well be GA because it's all known scope.
>
> Isn’t the whole point of an RC is that it could be the GA?  It is a
> “release candidate”, meaning if no one finds any issues with it, that can
> them become the release?  So that seems like exactly the right time to make
> RC releases?

Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Joshua McKenzie
Maybe. Do we just time-box it? Say we're going to cut an RC and give it 4
weeks; if nothing awful surfaces, we GA?

On Wed, May 27, 2020 at 4:12 PM Brandon Williams  wrote:

> Absolutely my understanding.

[DISCUSSION] Flaky tests

2020-05-27 Thread Ekaterina Dimitrova
Dear all,
I spent some time these days looking into the Release Lifecycle document.
As we keep saying that we are approaching Beta based on the Jira board, I was
curious what exactly the borderline for cutting it is.

Looking at all the latest reports (thanks to everyone who was working on
that; I think having an overview of what's going on is always a good
thing), I have the feeling that what primarily prevents us from cutting
beta at the moment is flaky tests. According to the lifecycle document:

   - No flaky tests - All tests (Unit Tests and DTests) should pass
   consistently. A failing test, upon analyzing the root cause of failure, may
   be “ignored in exceptional cases”, if appropriate, for the release, after
   discussion in the dev mailing list.

Now, the related questions that popped into my mind:
- "ignored in exceptional cases" - examples?
- No flaky tests according to Jenkins or CircleCI? Also, some people run
the free tier, others take advantage of premium CircleCI. What should be
the framework?
- Furthermore, flaky tests with what frequency? (This is a tricky question,
I know)

In different conversations with colleagues from the C* community, I got the
impression that the canonical suite (in this case Jenkins) might be the right
direction to follow.

To be clear, I always check any failures seen in any environment and I
truly believe that they are worth checking. I'm not advocating skipping
anything! But I also feel that, in many cases, CircleCI could provide input
worth tracking that is less likely to reflect product flakes. Am I right? In
addition, different people use different CircleCI configs and see different
output. Not to mention flaky tests on a Mac running with two cores... Yes,
this is sometimes the only way to reproduce some of the reported test
issues...

So my idea was to suggest to start tracking an exact Jenkins report maybe?
Anything reported outside of it would also be checked, but we could
potentially leave it for Beta if we don't feel it shows a product defect. One
more thing to consider is that the big Test epic is primarily happening in
beta.

Curious to hear what the community thinks about this topic. People probably
also have additional thoughts based on experience from previous releases.
How did those things work in the past? Any lessons learned? What is our
"plan Beta"?

Ekaterina Dimitrova
e. ekaterina.dimitr...@datastax.com
w. www.datastax.com


Re: [DISCUSS] CASSANDRA-13994

2020-05-27 Thread Jordan West
On Wed, May 27, 2020 at 1:23 PM Joshua McKenzie 
wrote:

> Maybe. Do we just time box, say we're going to cut an RC and give it 4
> weeks, if nothing awful surfaces we GA?
>

I've seen that work well in the past on other projects. I agree with the
notion that RCs are real candidates for release if no one finds issues with
them. Ideally we would have as few RCs as possible and more alphas/betas.

>
> On Wed, May 27, 2020 at 4:12 PM Brandon Williams  wrote:
>
> > Absolutely my understanding.
> >
> > On Wed, May 27, 2020, 2:49 PM Jeremiah D Jordan <
> jeremiah.jor...@gmail.com
> > >
> > wrote:
> >
> > > > A clear point to cut RC's doesn't surface from the above for me.
> > > Releasing
> > > > an RC before broad verification seems wrong, and cutting an RC after
> > the
> > > 4
> > > > points above may as well be GA because it's all known scope.
> > >
> > > Isn’t the whole point of an RC is that it could be the GA?  It is a
> > > “release candidate”, meaning if no one finds any issues with it, that
> can
> > > them become the release?  So that seems like exactly the right time to
> > make
> > > RC releases?
> > >
> > > > On May 27, 2020, at 2:45 PM, Joshua McKenzie 
> > > wrote:
> > > >
> > > > I think we're all on the same page here; I was focusing more on the
> > > release
> > > > lifecycles and sequencing than the entire version cycle. Good to
> > broaden
> > > > scope I think.
> > > >
> > > > One thing we're not considering is the separation of API changes from
> > > major
> > > > changes and how that intersects with release milestones.
> > > >
> > > > Meaning:
> > > > 1. alpha phase
> > > > 2. Milestone: API freeze (all API changes pushed to next major)
> > > > 3. beta phase
> > > > 4. Verification phase (all major disruptive pushed to next major)
> > > >
> > > > A clear point to cut RC's doesn't surface from the above for me.
> > > Releasing
> > > > an RC before broad verification seems wrong, and cutting an RC after
> > the
> > > 4
> > > > points above may as well be GA because it's all known scope.
> > > >
> > > > Thoughts?
> > > >
> > > > On Wed, May 27, 2020 at 3:28 PM Scott Andreas 
> > > wrote:
> > > >
> > > >> That makes sense to me, yep.
> > > >>
> > > >> My hope and expectation is that the time required for "verification
> > > work"
> > > >> will shrink dramatically in the not too distant future - ideally to
> a
> > > >> period of less than a month. In this world, the cost of missing one
> > > train
> > > >> is reduced to catching the next one.
> > > >>
> > > >> One of the main goals in shifting focus from "testing" and "test
> > plans"
> > > to
> > > >> "test engineering" is automating as many aspects of release
> > > qualification
> > > >> as possible, with an asymptotic ideal as a function of compute
> > capacity
> > > and
> > > >> time. While such automation will never be complete (it's likely that
> > > >> development of new features will/must include qualification infra
> > > changes
> > > >> to exercise them), if we're able to apply the same rigor to major
> > > releases
> > > >> as we are to patchlevel builds with little incremental effort, I'd
> be
> > > >> thrilled.
> > > >>
> > > >> This is mostly a way of saying:
> > > >> – I like the cadence/sequencing Benedict proposes below.
> > > >> – I think improvements in test engineering can reduce/eliminate
> > > >> invalidation and may increase the scope of what can be a candidate
> for
> > > >> merge on a given branch
> > > >> – And if not, the cost of missing the train is lower because we'll
> be
> > > able
> > > >> to deliver major releases more often.
> > > >>
> > > >> Scott
> > > >>
> > > >> 
> > > >> From: Jeremiah D Jordan 
> > > >> Sent: Wednesday, May 27, 2020 11:54 AM
> > > >> To: Cassandra DEV
> > > >> Subject: Re: [DISCUSS] CASSANDRA-13994
> > > >>
> > > >> +1 strongly agree.  If we aren’t going to let something go into
> 4.0.0
> > > >> because it would "invalidate testing” then we can not let such a
> thing
> > > go
> > > >> into 4.0.1 unless we plan to re-do said testing for the patch
> release.
> > > >>
> > > >>> On May 27, 2020, at 1:31 PM, Benedict Elliott Smith <
> > > bened...@apache.org>
> > > >> wrote:
> > > >>>
> > > >>> I'm being told this still isn't clear, so let me try in a
> > bullet-point
> > > >> timeline:
> > > >>>
> > > >>> * 4.0 Beta
> > > >>> * 4.0 Verification Work
> > > >>> * [Merge Window]
> > > >>> * 4.0 GA
> > > >>> * 4.0 Minor Releases
> > > >>> * ...
> > > >>> * 5.0 Dev
> > > >>> * ...
> > > >>> * 5.0 Verification Work
> > > >>> * GA 5.0
> > > >>>
> > > >>> I think that anything that is prohibited from "[Merge Window]"
> > > >>> because it invalidates "4.0 Verification Work" must also be
> > > >>> prohibited until "5.0 Dev", because the next equivalent work that can
> > > >>> validate it occurs only at "5.0 Verification Work".
> > > >>>
> > > >>> On 27/05/2020, 19:05, "Benedict Elliott Smith" <bened...@apache.org>
> > > >>> wrote:

Re: [DISCUSSION] Flaky tests

2020-05-27 Thread Joshua McKenzie
>
> So my idea was to suggest to start tracking an exact Jenkins report maybe?

Basing our point of view on the canonical test runs on Apache infra makes
sense to me, assuming that infra is behaving these days. :) Pretty sure
Mick got that in working order.

At least for me, what I learned in the past is that we'd drive to a green
test board, immediately treat that as a milestone and move on, and then
flaky tests would reappear like a disappointing game of whack-a-mole. They
seem frustratingly ever-present.

I'd personally advocate for us taking the following stance on flaky tests
from this point in the cycle forward:

   - Default posture: label the fix version as beta
   - *excepting* on a case-by-case basis: if a flake could imply a product
   defect that would greatly impair beta testing, we leave it as alpha
   - Move current flakes to fixver beta
   - Hard, no-compromise position on "we don't RC until all flakes are dead"
   - Use Jenkins as the canonical source of truth for the "is beta ready"
   cutoff

I'm personally balancing the risk of flaky tests confounding beta work
against my perceived value of being able to widely signal beta's
availability and encourage widespread user testing. I believe the value in
the latter justifies the risk of the former (I currently perceive that risk
as minimal; I could be wrong). I am also weighting the risk of "test
failures persist to or past RC" at 0. That's a hill I'll die on.


On Wed, May 27, 2020 at 5:13 PM Ekaterina Dimitrova <ekaterina.dimitr...@datastax.com>
wrote:

> Dear all,
> I spent some time these days looking into the Release Lifecycle document.
> As we keep saying that we are approaching Beta based on the Jira board, I
> was curious what exactly the cutoff is.
>
> Looking at all the latest reports (thanks to everyone who was working on
> that; I think having an overview of what's going on is always a good
> thing), I have the feeling that the main thing preventing us from cutting
> beta at the moment is flaky tests. According to the lifecycle document:
>
>    - No flaky tests - All tests (Unit Tests and DTests) should pass
>    consistently. A failing test, upon analyzing the root cause of failure,
>    may be “ignored in exceptional cases”, if appropriate, for the release,
>    after discussion in the dev mailing list.
>
> Now, the related questions that popped into my mind:
> - "ignored in exceptional cases" - examples?
> - No flaky tests according to Jenkins or CircleCI? Also, some people run
> the free tier while others use premium CircleCI. Which environment should
> be the reference?
> - Furthermore, flaky at what failure frequency? (This is a tricky
> question, I know.)
>
> In different conversations with colleagues from the C* community, I got
> the impression that the canonical suite (in this case Jenkins) might be the
> right direction to follow.
>
> To be clear, I always check any failures seen in any environment, and I
> truly believe they are worth checking. I'm not advocating skipping
> anything! But I also feel that, in many cases, CircleCI failures are worth
> tracking yet less likely to indicate product defects. Am I right? In
> addition, different people use different CircleCI configs and see
> different output. Not to mention flaky tests on a Mac running with two
> cores... Yes, this is sometimes the only way to reproduce some of the
> reported test issues...
>
> So my idea was to suggest to start tracking an exact Jenkins report maybe?
> Anything reported outside of it would also be checked, but could
> potentially be left for Beta if we don't feel it shows a product defect.
> One more thing to consider is that the big Test epic is primarily
> happening in beta.
>
> Curious to hear what the community thinks about this topic. People
> probably also have additional thoughts based on experience from previous
> releases. How did these things work in the past? Any lessons learned? What
> is our "plan Beta"?
>
> Ekaterina Dimitrova
> e. ekaterina.dimitr...@datastax.com
> w. www.datastax.com
>