date:20200507

2020-05-07 Cassandra Kubernetes Operator SIG reminder

2020-05-07 Thread Patrick McFadin

Hi everyone,

Cassandra Kubernetes Operator SIG today at 10AM PST. Just a reminder, I
switched the conference link to Jitsi from Zoom. Link in the wiki:
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG

Today we will be discussing CEP-2 so bring your opinions.
https://docs.google.com/document/d/18Ow4R3tB9GIvdcFO7WmUvjb0a-sT6h0zSCEnfHsPz58/edit#heading=h.haeraryxhhvn

Specifically nailing down Level 1, 2 and 3

See you then

Patrick

Re: List of serious issues fixed in 3.0.x

2020-05-07 Thread Joshua McKenzie

I did a little analysis on this data (any defect marked with fixversion 4.0
that rose to the level of critical in terms of availability, correctness,
or corruption/loss) and charted some things the rest of the project
community might find interesting:

1: Critical (availability, correctness, corruption/loss) defects fixed per
month since about 6 months before 3.11.0:
[image: monthly.png]

2: Components in which critical defects arose (note: bright red bar == sum
of 3 dark red):
[image: Total Defects by Component.png]

3: Type of defect found and fixed (bright red: cluster down or permaloss,
dark red: temp corrupt/loss, yellow: incorrect response):

[image: Total Defects by Type.png]

My personal takeaways from this: a ton of great defect fixing work has gone
into 4.0. I'd love it if we had both code coverage analysis for testing on
the codebase as well as data to surface where hotspots of defects are in
the code that might need further testing (caveat: many have voiced their
skepticism of the value of this type of data in the past in this project
community, so that's probably another conversation to have on another
thread)

Hope someone else finds the above interesting if not useful.

~Josh


On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi  wrote:

> Hi Sankalp,
>
> Thanks for bringing this up. At the very minimum, I hope we have
> regression tests for the specific issues we have fixed.
>
> I personally think, the project should focus on building a comprehensive
> test suite. However, some of these issues can only be detected at scale. We
> need users to test* C* in their environment for their use-cases. Ideally
> these folks stand up large clusters and tee their traffic to the new
> cluster and report issues.
>
> If we had an automated test suite that everyone can run at a large scale
> that would be even better.
>
> Thanks,
>
> Dinesh
>
>
> * test != starting C* in a few nodes and looking at logs.
>
> > On May 6, 2020, at 10:11 AM, sankalp kohli wrote:
> >
> > Hi,
> > I want to share some of the serious issues that were found and fixed in
> > 3.0.x. I have created this list from JIRA to help us identify areas for
> > validating 4.0. This will also give an insight to the dev community.
> >
> > Let us know if anyone has suggestions on how to better use this data in
> > validating 4.0. Also this list might be missing some issues identified
> > early on in 3.0.x or some latest ones.
> >
> > Link: https://tinyurl.com/30seriousissues
> >
> > Thanks,
> > Sankalp
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

Re: List of serious issues fixed in 3.0.x

2020-05-07 Thread Joshua McKenzie

Hearing the images got killed by the web server. Trying from gmail (sorry
for spam). Time to see if it's the apache smtp server or the list culling
images:

---
I did a little analysis on this data (any defect marked with fixversion 4.0
that rose to the level of critical in terms of availability, correctness,
or corruption/loss) and charted some things the rest of the project
community might find interesting:

1: Critical (availability, correctness, corruption/loss) defects fixed per
month since about 6 months before 3.11.0:
[image: monthly.png]

2: Components in which critical defects arose (note: bright red bar == sum
of 3 dark red):
[image: Total Defects by Component.png]

3: Type of defect found and fixed (bright red: cluster down or permaloss,
dark red: temp corrupt/loss, yellow: incorrect response):

[image: Total Defects by Type.png]

My personal takeaways from this: a ton of great defect fixing work has gone
into 4.0. I'd love it if we had both code coverage analysis for testing on
the codebase as well as data to surface where hotspots of defects are in
the code that might need further testing (caveat: many have voiced their
skepticism of the value of this type of data in the past in this project
community, so that's probably another conversation to have on another
thread)

Hope someone else finds the above interesting if not useful.

--
Joshua McKenzie

On Thu, May 7, 2020 at 12:24 PM Joshua McKenzie 
wrote:

> I did a little analysis on this data (any defect marked with fixversion
> 4.0 that rose to the level of critical in terms of availability,
> correctness, or corruption/loss) and charted some things the rest of the
> project community might find interesting:
>
> 1: Critical (availability, correctness, corruption/loss) defects fixed per
> month since about 6 months before 3.11.0:
> [image: monthly.png]
>
> 2: Components in which critical defects arose (note: bright red bar == sum
> of 3 dark red):
> [image: Total Defects by Component.png]
>
> 3: Type of defect found and fixed (bright red: cluster down or permaloss,
> dark red: temp corrupt/loss, yellow: incorrect response):
>
> [image: Total Defects by Type.png]
>
> My personal takeaways from this: a ton of great defect fixing work has
> gone into 4.0. I'd love it if we had both code coverage analysis for
> testing on the codebase as well as data to surface where hotspots of
> defects are in the code that might need further testing (caveat: many have
> voiced their skepticism of the value of this type of data in the past in
> this project community, so that's probably another conversation to have on
> another thread)
>
> Hope someone else finds the above interesting if not useful.
>
> ~Josh
>
>
> On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi  wrote:
>
>> Hi Sankalp,
>>
>> Thanks for bringing this up. At the very minimum, I hope we have
>> regression tests for the specific issues we have fixed.
>>
>> I personally think, the project should focus on building a comprehensive
>> test suite. However, some of these issues can only be detected at scale. We
>> need users to test* C* in their environment for their use-cases. Ideally
>> these folks stand up large clusters and tee their traffic to the new
>> cluster and report issues.
>>
>> If we had an automated test suite that everyone can run at a large scale
>> that would be even better.
>>
>> Thanks,
>>
>> Dinesh
>>
>>
>> * test != starting C* in a few nodes and looking at logs.
>>
>> > On May 6, 2020, at 10:11 AM, sankalp kohli wrote:
>> >
>> > Hi,
>> > I want to share some of the serious issues that were found and fixed in
>> > 3.0.x. I have created this list from JIRA to help us identify areas for
>> > validating 4.0. This will also give an insight to the dev community.
>> >
>> > Let us know if anyone has suggestions on how to better use this data in
>> > validating 4.0. Also this list might be missing some issues identified
>> > early on in 3.0.x or some latest ones.
>> >
>> > Link: https://tinyurl.com/30seriousissues
>> >
>> > Thanks,
>> > Sankalp

Re: List of serious issues fixed in 3.0.x

2020-05-07 Thread Scott Andreas

Sankalp, thanks for sending the spreadsheet and Josh for preparing this 
analysis (pending image issues; look forward to reading)!

I'd encourage everyone involved in the project to review the list of tickets 
captured here. These issues aren't theoretical and represent real scenarios 
that result in data loss, data corruption, incorrect responses to queries, and 
other violations of fundamental properties of the database.

As a community, we've made great progress over the past two years. The focus on 
quality has dramatically improved the safety of Cassandra as a database -- 
especially in the most recent patchlevel releases of the 3.0.x and 3.11.x 
series.

That said, we're also not out of the woods. The following three issues have 
been reported and confirmed genuine in the past week:

– CASSANDRA-15789: Rows can get duplicated in mixed major-version clusters and 
after full upgrade
– CASSANDRA-15778: CorruptSSTableException after a 2.1 SSTable is upgraded to 
3.0, failing reads
– CASSANDRA-15790: EmptyType doesn't override writeValue so could attempt to 
write bytes when expected not to

Regarding Dinesh's point on regression tests, we're beginning to go even 
further. In response to the issues in this spreadsheet, we're evolving new 
approaches toward *active assertion* of data integrity. C-15789 adds 
read/repair/compaction-path detection of primary key duplication, a great way 
to audit and remediate instances of corruption detected in a cluster. Repaired 
data tracking introduced in C-14145 and improvements to Preview Repair are also 
great examples, enabling Cassandra to assert the consistency of repaired data 
(something we'd taken for granted). Active assertion of data integrity 
invariants in Cassandra is an important frontier -- and one we need to explore 
further.
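
As a rough illustration of what detecting primary key duplication on a read
or compaction path can look like, the Python sketch below scans a stream of
rows that is already sorted by primary key and reports every key seen more
than once. It is not the C-15789 patch and not Cassandra code; the row shape
and the name find_duplicate_keys are assumptions made only for this example.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

# Hypothetical row shape for this sketch: (primary_key, columns).
PrimaryKey = Tuple[Any, ...]
Row = Tuple[PrimaryKey, Dict[str, Any]]

_NO_KEY = object()  # sentinel that never compares equal to a real key


def find_duplicate_keys(rows: Iterable[Row]) -> Iterator[PrimaryKey]:
    """Yield each primary key that occurs more than once.

    Assumes `rows` is sorted by primary key, so duplicates are adjacent.
    Each duplicated key is reported once, however many copies exist.
    """
    previous_key: Any = _NO_KEY
    last_reported: Any = _NO_KEY
    for key, _columns in rows:
        if key == previous_key and key != last_reported:
            last_reported = key
            yield key
        previous_key = key
```

Because the input is sorted, the check needs only the previous key, which is
why this kind of assertion can run inline on a merge-style path rather than
as a separate full scan.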

Previously-adopted methodologies like property-based testing, large-scale diff 
tests asserting identity of data between 2.1- and 3.0.x clusters post-upgrade 
via billions of randomized queries, fault injection, model-based tests, CI 
improvements, and flaky test reduction have helped us make huge progress toward 
quality and continue to pay dividends.
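
The diff-test idea mentioned above can be sketched in a few lines of Python:
issue the same randomized queries against two systems and record every query
whose results differ. This is a toy under stated assumptions, not the
project's actual tooling; the query shape, the Executor callables standing in
for the two clusters, and the name diff_test are all hypothetical.

```python
import random
from typing import Callable, Hashable, List, Tuple

# Hypothetical query shape for this sketch: (table name, partition key).
Query = Tuple[str, int]
# An executor stands in for "run this query against one cluster".
Executor = Callable[[Query], Hashable]
Mismatch = Tuple[Query, Hashable, Hashable]


def diff_test(
    old: Executor,
    new: Executor,
    tables: List[str],
    max_key: int,
    iterations: int,
    seed: int = 0,
) -> List[Mismatch]:
    """Run `iterations` random queries against both executors.

    Returns every (query, old_result, new_result) where the results differ.
    A fixed seed makes a failing run reproducible.
    """
    rng = random.Random(seed)
    mismatches: List[Mismatch] = []
    for _ in range(iterations):
        query = (rng.choice(tables), rng.randrange(max_key))
        old_result, new_result = old(query), new(query)
        if old_result != new_result:
            mismatches.append((query, old_result, new_result))
    return mismatches
```

An empty result only says that no difference was found for the queries that
were sampled, which is why such tests are run with very large iteration
counts.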

I want to thank everyone for their work on safety and stability. It's clear we 
have more ahead, but it's critical to Apache Cassandra's future and toward 
shipping a 4.0 release that users can trust and adopt quickly.

– Scott


From: Joshua McKenzie 
Sent: Thursday, May 7, 2020 9:31 AM
Cc: dev@cassandra.apache.org
Subject: Re: List of serious issues fixed in 3.0.x

Hearing the images got killed by the web server. Trying from gmail (sorry for 
spam). Time to see if it's the apache smtp server or the list culling images:

---
I did a little analysis on this data (any defect marked with fixversion 4.0 
that rose to the level of critical in terms of availability, correctness, or 
corruption/loss) and charted some things the rest of the project community 
might find interesting:

1: Critical (availability, correctness, corruption/loss) defects fixed per 
month since about 6 months before 3.11.0:
[monthly.png]

2: Components in which critical defects arose (note: bright red bar == sum of 3 
dark red):
[Total Defects by Component.png]

3: Type of defect found and fixed (bright red: cluster down or permaloss, dark 
red: temp corrupt/loss, yellow: incorrect response):

[Total Defects by Type.png]

My personal takeaways from this: a ton of great defect fixing work has gone 
into 4.0. I'd love it if we had both code coverage analysis for testing on the 
codebase as well as data to surface where hotspots of defects are in the code 
that might need further testing (caveat: many have voiced their skepticism of 
the value of this type of data in the past in this project community, so that's 
probably another conversation to have on another thread)

Hope someone else finds the above interesting if not useful.

--
Joshua McKenzie

On Thu, May 7, 2020 at 12:24 PM Joshua McKenzie wrote:
I did a little analysis on this data (any defect marked with fixversion 4.0 
that rose to the level of critical in terms of availability, correctness, or 
corruption/loss) and charted some things the rest of the project community 
might find interesting:

1: Critical (availability, correctness, corruption/loss) defects fixed per 
month since about 6 months before 3.11.0:
[monthly.png]

2: Components in which critical defects arose (note: bright red bar == sum of 3 
dark red):
[Total Defects by Component.png]

3: Type of defect found and fixed (bright red: cluster down or permaloss, dark 
red: temp corrupt/loss, yellow: incorrect response):

[Total Defects by Type.png]

My personal takeaways from this: a ton of great defect fixing work has gone 
into 4.0. I'd love it if we had both code coverage analysis for testing on the 
codebase as well as data to surface where hotspots of defects are in the code 
that m

Re: List of serious issues fixed in 3.0.x

2020-05-07 Thread Joshua McKenzie

"ML is plaintext bro" - thanks Mick. ಠ_ಠ

Since we're stuck in the late 90's, here's some links to a gsheet:

Defects by month:
https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1584867240
Defects by component:
https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1946109279
Defects by type:
https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=385136105

On Thu, May 7, 2020 at 12:31 PM Joshua McKenzie 
wrote:

> Hearing the images got killed by the web server. Trying from gmail (sorry
> for spam). Time to see if it's the apache smtp server or the list culling
> images:
>
> ---
> I did a little analysis on this data (any defect marked with fixversion
> 4.0 that rose to the level of critical in terms of availability,
> correctness, or corruption/loss) and charted some things the rest of the
> project community might find interesting:
>
> 1: Critical (availability, correctness, corruption/loss) defects fixed per
> month since about 6 months before 3.11.0:
> [image: monthly.png]
>
> 2: Components in which critical defects arose (note: bright red bar == sum
> of 3 dark red):
> [image: Total Defects by Component.png]
>
> 3: Type of defect found and fixed (bright red: cluster down or permaloss,
> dark red: temp corrupt/loss, yellow: incorrect response):
>
> [image: Total Defects by Type.png]
>
> My personal takeaways from this: a ton of great defect fixing work has
> gone into 4.0. I'd love it if we had both code coverage analysis for
> testing on the codebase as well as data to surface where hotspots of
> defects are in the code that might need further testing (caveat: many have
> voiced their skepticism of the value of this type of data in the past in
> this project community, so that's probably another conversation to have on
> another thread)
>
> Hope someone else finds the above interesting if not useful.
>
> --
> Joshua McKenzie
>
> On Thu, May 7, 2020 at 12:24 PM Joshua McKenzie 
> wrote:
>
>> I did a little analysis on this data (any defect marked with fixversion
>> 4.0 that rose to the level of critical in terms of availability,
>> correctness, or corruption/loss) and charted some things the rest of the
>> project community might find interesting:
>>
>> 1: Critical (availability, correctness, corruption/loss) defects fixed
>> per month since about 6 months before 3.11.0:
>> [image: monthly.png]
>>
>> 2: Components in which critical defects arose (note: bright red bar ==
>> sum of 3 dark red):
>> [image: Total Defects by Component.png]
>>
>> 3: Type of defect found and fixed (bright red: cluster down or permaloss,
>> dark red: temp corrupt/loss, yellow: incorrect response):
>>
>> [image: Total Defects by Type.png]
>>
>> My personal takeaways from this: a ton of great defect fixing work has
>> gone into 4.0. I'd love it if we had both code coverage analysis for
>> testing on the codebase as well as data to surface where hotspots of
>> defects are in the code that might need further testing (caveat: many have
>> voiced their skepticism of the value of this type of data in the past in
>> this project community, so that's probably another conversation to have on
>> another thread)
>>
>> Hope someone else finds the above interesting if not useful.
>>
>> ~Josh
>>
>>
>> On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi  wrote:
>>
>>> Hi Sankalp,
>>>
>>> Thanks for bringing this up. At the very minimum, I hope we have
>>> regression tests for the specific issues we have fixed.
>>>
>>> I personally think, the project should focus on building a comprehensive
>>> test suite. However, some of these issues can only be detected at scale. We
>>> need users to test* C* in their environment for their use-cases. Ideally
>>> these folks stand up large clusters and tee their traffic to the new
>>> cluster and report issues.
>>>
>>> If we had an automated test suite that everyone can run at a large scale
>>> that would be even better.
>>>
>>> Thanks,
>>>
>>> Dinesh
>>>
>>>
>>> * test != starting C* in a few nodes and looking at logs.
>>>
>>> > On May 6, 2020, at 10:11 AM, sankalp kohli wrote:
>>> >
>>> > Hi,
>>> > I want to share some of the serious issues that were found and fixed in
>>> > 3.0.x. I have created this list from JIRA to help us identify areas for
>>> > validating 4.0. This will also give an insight to the dev community.
>>> >
>>> > Let us know if anyone has suggestions on how to better use this data in
>>> > validating 4.0. Also this list might be missing some issues identified
>>> > early on in 3.0.x or some latest ones.
>>> >
>>> > Link: https://tinyurl.com/30seriousissues
>>> >
>>> > Thanks,
>>> > Sankalp

Calling for release managers (Committers and PMC)

2020-05-07 Thread Mick Semb Wever

The Cassandra release process has had some improvements to bring it
better in line with the ASF guidelines: sha256 & sha512 checksums, staged
artefacts in svnpubsub, deb and rpm repositories complete and signed
in staging, and separate scripts and manual steps merged together.
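
For anyone who wants to check the sha256 and sha512 values of a staged
artefact by hand, a minimal Python sketch follows. It uses only the standard
hashlib module and is not taken from the release scripts; the function name
artifact_digests and the chunk size are assumptions for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict


def artifact_digests(path: Path, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Return the hex sha256 and sha512 digests of the file at `path`.

    The file is read in chunks so large artefacts do not need to fit in memory.
    """
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    return {"sha256": sha256.hexdigest(), "sha512": sha512.hexdigest()}
```

The returned hex strings can be compared against the contents of the
published .sha256 and .sha512 files for the same artefact.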

The updated documentation for cutting, voting, and publishing a
release is found here:
https://cassandra.apache.org/doc/latest/development/release_process.html

I am hoping to get as many Committers* and PMC members interested as
possible for cutting a future release.

Who is interested? How many names can I get :-)

The more people who are interested, the easier it is to take turns and
be flexible depending on our own availability each time. I will help
everyone out on their first run. Indeed, most of my motivation in
getting involved with the release process was to make it all as simple
and as forgettable as possible, so the role of release manager can
change hands easily from release to release.

*When a Committer cuts a release, a PMC member has to perform the very
last post-vote publish step.

regards,
Mick


Re: Calling for release managers (Committers and PMC)

2020-05-07 Thread Jordan West

*raises hand*

- Jordan

On Thu, May 7, 2020 at 11:29 AM Mick Semb Wever  wrote:

> The Cassandra release process has had some improvements to bring it
> better in line with the ASF guidelines: sha256 & sha512 checksums, staged
> artefacts in svnpubsub, deb and rpm repositories complete and signed
> in staging, and separate scripts and manual steps merged together.
>
> The updated documentation for cutting, voting, and publishing a
> release is found here:
> https://cassandra.apache.org/doc/latest/development/release_process.html
>
> I am hoping to get as many Committers* and PMC members interested as
> possible for cutting a future release.
>
> Who is interested? How many names can I get :-)
>
> The more people who are interested, the easier it is to take turns and
> be flexible depending on our own availability each time. I will help
> everyone out on their first run. Indeed, most of my motivation in
> getting involved with the release process was to make it all as simple
> and as forgettable as possible, so the role of release manager can
> change hands easily from release to release.
>
> *When a Committer cuts a release, a PMC member has to perform the very
> last post-vote publish step.
>
> regards,
> Mick

Re: Calling for release managers (Committers and PMC)

2020-05-07 Thread Robert Stupp

I can help

--
Robert Stupp
@snazy

> Am 07.05.2020 um 20:29 schrieb Mick Semb Wever :
> 
> The Cassandra release process has had some improvements to bring it
> better in line with the ASF guidelines: sha256 & sha512 checksums, staged
> artefacts in svnpubsub, deb and rpm repositories complete and signed
> in staging, and separate scripts and manual steps merged together.
> 
> The updated documentation for cutting, voting, and publishing a
> release is found here:
> https://cassandra.apache.org/doc/latest/development/release_process.html
> 
> I am hoping to get as many Committers* and PMC members interested as
> possible for cutting a future release.
> 
> Who is interested? How many names can I get :-)
> 
> The more people who are interested, the easier it is to take turns and
> be flexible depending on our own availability each time. I will help
> everyone out on their first run. Indeed, most of my motivation in
> getting involved with the release process was to make it all as simple
> and as forgettable as possible, so the role of release manager can
> change hands easily from release to release.
> 
> *When a Committer cuts a release, a PMC member has to perform the very
> last post-vote publish step.
> 
> regards,
> Mick

Re: Calling for release managers (Committers and PMC)

2020-05-07 Thread Dinesh Joshi

I can help out.

Dinesh

> On May 7, 2020, at 11:29 AM, Mick Semb Wever  wrote:
> 
> The Cassandra release process has had some improvements to bring it
> better in line with the ASF guidelines: sha256 & sha512 checksums, staged
> artefacts in svnpubsub, deb and rpm repositories complete and signed
> in staging, and separate scripts and manual steps merged together.
> 
> The updated documentation for cutting, voting, and publishing a
> release is found here:
> https://cassandra.apache.org/doc/latest/development/release_process.html
> 
> I am hoping to get as many Committers* and PMC members interested as
> possible for cutting a future release.
> 
> Who is interested? How many names can I get :-)
> 
> The more people who are interested, the easier it is to take turns and
> be flexible depending on our own availability each time. I will help
> everyone out on their first run. Indeed, most of my motivation in
> getting involved with the release process was to make it all as simple
> and as forgettable as possible, so the role of release manager can
> change hands easily from release to release.
> 
> *When a Committer cuts a release, a PMC member has to perform the very
> last post-vote publish step.
> 
> regards,
> Mick

Re: Calling for release managers (Committers and PMC)

2020-05-07 Thread Jon Meredith

Sign me up.

On Thu, May 7, 2020 at 12:36 PM Robert Stupp  wrote:
>
> I can help
>
> --
> Robert Stupp
> @snazy
>
> > Am 07.05.2020 um 20:29 schrieb Mick Semb Wever :
> >
> > The Cassandra release process has had some improvements to bring it
> > better in line with the ASF guidelines: sha256 & sha512 checksums, staged
> > artefacts in svnpubsub, deb and rpm repositories complete and signed
> > in staging, and separate scripts and manual steps merged together.
> >
> > The updated documentation for cutting, voting, and publishing a
> > release is found here:
> > https://cassandra.apache.org/doc/latest/development/release_process.html
> >
> > I am hoping to get as many Committers* and PMC members interested as
> > possible for cutting a future release.
> >
> > Who is interested? How many names can I get :-)
> >
> > The more people who are interested, the easier it is to take turns and
> > be flexible depending on our own availability each time. I will help
> > everyone out on their first run. Indeed, most of my motivation in
> > getting involved with the release process was to make it all as simple
> > and as forgettable as possible, so the role of release manager can
> > change hands easily from release to release.
> >
> > *When a Committer cuts a release, a PMC member has to perform the very
> > last post-vote publish step.
> >
> > regards,
> > Mick
