2020-05-07 Cassandra Kubernetes Operator SIG reminder
Hi everyone, Cassandra Kubernetes Operator SIG today at 10AM PST. Just a reminder, I switched the conference link to Jitsi from Zoom. Link in the wiki: https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Kubernetes+Operator+SIG Today we will be discussing CEP-2, so bring your opinions: https://docs.google.com/document/d/18Ow4R3tB9GIvdcFO7WmUvjb0a-sT6h0zSCEnfHsPz58/edit#heading=h.haeraryxhhvn Specifically, we will be nailing down Levels 1, 2 and 3. See you then, Patrick
Re: List of serious issues fixed in 3.0.x
I did a little analysis on this data (any defect marked with fixversion 4.0 that rose to the level of critical in terms of availability, correctness, or corruption/loss) and charted some things the rest of the project community might find interesting: 1: Critical (availability, correctness, corruption/loss) defects fixed per month since about 6 months before 3.11.0: [image: monthly.png] 2: Components in which critical defects arose (note: bright red bar == sum of 3 dark red): [image: Total Defects by Component.png] 3: Type of defect found and fixed (bright red: cluster down or permaloss, dark red: temp corrupt/loss, yellow: incorrect response): [image: Total Defects by Type.png] My personal takeaways from this: a ton of great defect fixing work has gone into 4.0. I'd love it if we had both code coverage analysis for testing on the codebase as well as data to surface where hotspots of defects are in the code that might need further testing (caveat: many have voiced their skepticism of the value of this type of data in the past in this project community, so that's probably another conversation to have on another thread) Hope someone else finds the above interesting if not useful. ~Josh On Wed, May 6, 2020 at 3:38 PM Dinesh Joshi wrote: > Hi Sankalp, > > Thanks for bringing this up. At the very minimum, I hope we have > regression tests for the specific issues we have fixed. > > I personally think, the project should focus on building a comprehensive > test suite. However, some of these issues can only be detected at scale. We > need users to test* C* in their environment for their use-cases. Ideally > these folks stand up large clusters and tee their traffic to the new > cluster and report issues. > > If we had an automated test suite that everyone can run at a large scale > that would be even better. > > Thanks, > > Dinesh > > > * test != starting C* in a few nodes and looking at logs. 
> > > On May 6, 2020, at 10:11 AM, sankalp kohli > wrote: > > > > Hi, > >I want to share some of the serious issues that were found and fixed > in > > 3.0.x. I have created this list from JIRA to help us identify areas for > > validating 4.0. This will also give an insight to the dev community. > > > > Let us know if anyone has suggestions on how to better use this data in > > validating 4.0. Also this list might be missing some issues identified > > early on in 3.0.x or some latest ones. > > > > Link: https://tinyurl.com/30seriousissues > > > > Thanks, > > Sankalp > > > - > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > >
Re: List of serious issues fixed in 3.0.x
Hearing the images got killed by the web server. Trying from gmail (sorry for spam). Time to see if it's the apache smtp server or the list culling images: --- I did a little analysis on this data (any defect marked with fixversion 4.0 that rose to the level of critical in terms of availability, correctness, or corruption/loss) and charted some things the rest of the project community might find interesting: 1: Critical (availability, correctness, corruption/loss) defects fixed per month since about 6 months before 3.11.0: [image: monthly.png] 2: Components in which critical defects arose (note: bright red bar == sum of 3 dark red): [image: Total Defects by Component.png] 3: Type of defect found and fixed (bright red: cluster down or permaloss, dark red: temp corrupt/loss, yellow: incorrect response): [image: Total Defects by Type.png] My personal takeaways from this: a ton of great defect fixing work has gone into 4.0. I'd love it if we had both code coverage analysis for testing on the codebase as well as data to surface where hotspots of defects are in the code that might need further testing (caveat: many have voiced their skepticism of the value of this type of data in the past in this project community, so that's probably another conversation to have on another thread) Hope someone else finds the above interesting if not useful. 
-- Joshua McKenzie
Re: List of serious issues fixed in 3.0.x
Sankalp, thanks for sending the spreadsheet and Josh for preparing this analysis (pending image issues; look forward to reading)! I'd encourage everyone involved in the project to review the list of tickets captured here. These issues aren't theoretical and represent real scenarios that result in data loss, data corruption, incorrect responses to queries, and other violations of fundamental properties of the database. As a community, we've made great progress over the past two years. The focus on quality has dramatically improved the safety of Cassandra as a database -- especially in the most recent patchlevel releases of the 3.0.x and 3.11.x series. That said, we're also not out of the woods. The following three issues have been reported and confirmed genuine in the past week: – CASSANDRA-15789: Rows can get duplicated in mixed major-version clusters and after full upgrade – CASSANDRA-15778: CorruptSSTableException after a 2.1 SSTable is upgraded to 3.0, failing reads – CASSANDRA-15790: EmptyType doesn't override writeValue so could attempt to write bytes when expected not to Regarding Dinesh's point on regression tests, we're beginning to go even further. In response to the issues in this spreadsheet, we're evolving new approaches toward *active assertion* of data integrity. C-15789 adds read/repair/compaction-path detection of primary key duplication, a great way to audit and remediate instances of corruption detected in a cluster. Repaired data tracking introduced in C-14145 and improvements to Preview Repair are also great examples, enabling Cassandra to assert the consistency of repaired data (something we'd taken for granted). Active assertion of data integrity invariants in Cassandra is an important frontier -- and one we need to explore further. 
Previously-adopted methodologies like property-based testing, large-scale diff tests asserting identity of data between 2.1- and 3.0.x clusters post-upgrade via billions of randomized queries, fault injection, model-based tests, CI improvements, and flaky test reduction have helped us make huge progress toward quality and continue to pay dividends. I want to thank everyone for their work on safety and stability. It's clear we have more ahead, but it's critical to Apache Cassandra's future and toward shipping a 4.0 release that users can trust and adopt quickly. – Scott
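To make the idea of detecting primary key duplication on a read or compaction path a little more concrete, here is a minimal Python sketch. It is an illustration only and not how CASSANDRA-15789 implements the check: the `find_duplicate_keys` name, the `(primary_key, columns)` row shape, and the assumption that rows arrive sorted by primary key are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical row shape for illustration: (primary_key, columns).
Row = Tuple[tuple, dict]


def find_duplicate_keys(rows: Iterable[Row]) -> Iterator[tuple]:
    """Yield each primary key that occurs more than once.

    Assumes the stream is sorted by primary key, so duplicates are
    adjacent and can be found in one pass with constant memory.
    """
    previous = object()  # sentinel that never equals a real key
    reported = False
    for key, _columns in rows:
        if key == previous:
            if not reported:  # report each duplicated key only once
                yield key
                reported = True
        else:
            previous = key
            reported = False


if __name__ == "__main__":
    sample = [
        ((1, "a"), {"v": 10}),
        ((1, "a"), {"v": 11}),  # same primary key seen again
        ((2, "b"), {"v": 20}),
    ]
    for dup in find_duplicate_keys(sample):
        print("duplicate primary key:", dup)
```

A single streaming pass like this is cheap enough to run alongside normal iteration, which is the property that makes an always-on assertion plausible; what to do on detection (log, snapshot, fail the operation) is a separate policy decision.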
Re: List of serious issues fixed in 3.0.x
"ML is plaintext bro" - thanks Mick. ಠ_ಠ Since we're stuck in the late 90's, here's some links to a gsheet: Defects by month: https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1584867240 Defects by component: https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=1946109279 Defects by type: https://docs.google.com/spreadsheets/d/1Qt8lLIiqVvK7mlSML7zsmXdAc-LsvktFW5RXJDRtN8k/edit#gid=385136105
Calling for release managers (Committers and PMC)
The Cassandra release process has had some improvements to bring it better in line with the ASF guidelines: sha256 & sha512 checksums, staged artefacts in svnpubsub, deb and rpm repositories complete and signed in staging, and separate scripts and manual steps merged together. The updated documentation for cutting, voting, and publishing a release is found here: https://cassandra.apache.org/doc/latest/development/release_process.html I am hoping to get as many Committers* and PMC members interested as possible for cutting a future release. Who is interested? How many names can I get :-) The more people who are interested, the easier it is to take turns and be flexible depending on our own availability each time. I will help out everyone on their first run. Indeed most of my motivation in getting involved with the release process was to make it all as simple and as forgettable as possible, so the role of the release manager can change easily from release to release. *When a Committer cuts a release, a PMC member has to perform the very last post-vote publish step. regards, Mick
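For anyone unfamiliar with the sha256 & sha512 checksum step mentioned above, a rough Python sketch of what producing and verifying such checksum files can look like follows. This is an assumption-laden illustration, not the project's release scripts: the function names and the `<artifact>.sha256` / `<artifact>.sha512` file naming are hypothetical, and the linked release process documentation remains the authority.

```python
import hashlib
import tempfile
from pathlib import Path

ALGORITHMS = ("sha256", "sha512")


def file_digest(path: Path, algo: str) -> str:
    """Hex digest of a file, read in chunks so large artifacts fit in memory."""
    digest = hashlib.new(algo)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(artifact: Path) -> dict:
    """Write one checksum file per algorithm next to the artifact.

    The '<artifact>.<algo>' naming is an assumption for this sketch.
    """
    written = {}
    for algo in ALGORITHMS:
        out = artifact.with_name(artifact.name + "." + algo)
        out.write_text(file_digest(artifact, algo) + "\n")
        written[algo] = out
    return written


def verify_checksum(artifact: Path, algo: str) -> bool:
    """Compare the recorded checksum with a freshly computed one."""
    recorded = artifact.with_name(artifact.name + "." + algo).read_text().split()[0]
    return recorded == file_digest(artifact, algo)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "example-artifact.tar.gz"  # hypothetical file name
        artifact.write_bytes(b"not a real release artifact")
        write_checksums(artifact)
        for algo in ALGORITHMS:
            print(algo, "ok" if verify_checksum(artifact, algo) else "MISMATCH")
```

Checksums only show that a download matches what was staged; the signing mentioned in the same list is what ties an artifact to the release manager's key.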
Re: Calling for release managers (Committers and PMC)
*raises hand* - Jordan
Re: Calling for release managers (Committers and PMC)
I can help -- Robert Stupp @snazy
Re: Calling for release managers (Committers and PMC)
I can help out. Dinesh
Re: Calling for release managers (Committers and PMC)
Sign me up.