Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> is there a reason all guardrails and reliability (aka repair retries) configs are off by default? They are off by default in the normal config for backwards compatibility reasons, but if we are defining a config saying what we recommend, we should enable these things by default IMO.

This is one more question to be answered by this discussion. Are there other options that should be enabled by the "latest" configuration? To what values should they be set? Is there something that is currently enabled that should not be?

> Should we merge the configs breaking these tests? No…. When we have failing tests people do not spend the time to figure out if their logic caused a regression and merge, making things more unstable… so when we merge failing tests that leads to people merging even more failing tests...

In this case it also means that people will not see any of the failures that they introduce in the advanced features, because those features are not tested at all. Also, since CASSANDRA-19167 and CASSANDRA-19168 already have fixes, the non-latest test suite will remain clean after the merge.

Note that these two problems demonstrate that we have failures in the configuration we ship with, because we are not actually testing it at all. IMHO this is a problem that we should not delay fixing.

Regards,
Branimir

On Wed, Feb 14, 2024 at 1:07 AM David Capwell wrote:

> > so can cause repairs to deadlock forever
>
> Small correction: I finished fixing the tests in CASSANDRA-19042 and we don't deadlock, we time out and fail repair if any of those messages are dropped.
>
> On Feb 13, 2024, at 11:04 AM, David Capwell wrote:
>
> > and to point potential users that are evaluating the technology to an optimized set of defaults
>
> Left this comment in the GH… is there a reason all guardrails and reliability (aka repair retries) configs are off by default? They are off by default in the normal config for backwards compatibility reasons, but if we are defining a config saying what we recommend, we should enable these things by default IMO.
>
> > There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed?
>
> I was tagged on CASSANDRA-19042. The paxos repair message handling does not have the repair reliability improvements that 5.0 has, so it can cause repairs to deadlock forever (same as current 4.x repairs). Bringing these up to par with the rest of repair would be very much welcome (they are also lacking visibility, so you need to fall back to heap dumps to see what's going on; same as 4.0.x but not 4.1.x), but I doubt I have cycles to do that…. This refactor is not 100% trivial as it has fun subtle concurrency issues to address (message retries and deduplication), and making sure this logic works with the existing repair simulation tests does require refactoring how the paxos cleanup state is tracked, which could have subtle consequences.
>
> I do think this should be fixed, but should it block 5.0? Not sure… will leave to others….
>
> Should we merge the configs breaking these tests? No…. When we have failing tests people do not spend the time to figure out if their logic caused a regression and merge, making things more unstable… so when we merge failing tests that leads to people merging even more failing tests...
>
> On Feb 13, 2024, at 8:41 AM, Branimir Lambov wrote:
>
> Hi All,
>
> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>
> The implementation has already gone through review, but I'd like to get people's opinion on two things:
> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed? This should prevent the introduction of new failures and make sure we don't release before clearing the existing ones.
> - I'd like to get an opinion on what's suitable wording and documentation for the new defaults set. Currently, the patch proposes adding the following text to the yaml (see https://github.com/apache/cassandra/pull/2896/files):
> # NOTE:
> # This file is provided in two versions:
> # - cassandra.yaml: Contains configuration defaults for a "compatible"
> # configuration that operates using settings that are backwards-compatible
> #
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
Wording looks good to me. I would also put that into NEWS.txt, but I am not sure which section. None of New features, Upgrading or Deprecation seems to be a good category.

On Tue, Feb 13, 2024 at 5:42 PM Branimir Lambov wrote:

> Hi All,
>
> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>
> The implementation has already gone through review, but I'd like to get people's opinion on two things:
> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed? This should prevent the introduction of new failures and make sure we don't release before clearing the existing ones.
> - I'd like to get an opinion on what's suitable wording and documentation for the new defaults set. Currently, the patch proposes adding the following text to the yaml (see https://github.com/apache/cassandra/pull/2896/files):
> # NOTE:
> # This file is provided in two versions:
> # - cassandra.yaml: Contains configuration defaults for a "compatible"
> # configuration that operates using settings that are backwards-compatible
> # and interoperable with machines running older versions of Cassandra.
> # This version is provided to facilitate pain-free upgrades for existing
> # users of Cassandra running in production who want to gradually and
> # carefully introduce new features.
> # - cassandra_latest.yaml: Contains configuration defaults that enable
> # the latest features of Cassandra, including improved functionality as
> # well as higher performance. This version is provided for new users of
> # Cassandra who want to get the most out of their cluster, and for users
> # evaluating the technology.
> # To use this version, simply copy this file over cassandra.yaml, or specify
> # it using the -Dcassandra.config system property, e.g. by running
> # cassandra -Dcassandra.config=file:/$CASSANDRA_HOME/conf/cassandra_latest.yaml
> # /NOTE
> Does this sound sensible? Should we add a pointer to this defaults set elsewhere in the documentation?
>
> Regards,
> Branimir
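The second way of selecting the file mentioned in the NOTE (passing the -Dcassandra.config system property) can be sketched as below. This is only an illustration, not part of the patch: the helper name config_jvm_flag and the install location /opt/cassandra are assumptions, and the sketch writes the URI in the three-slash file:/// form rather than copying the exact string from the NOTE. Only the property name and the file name cassandra_latest.yaml come from the thread.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def config_jvm_flag(cassandra_home: str, config_name: str = "cassandra_latest.yaml") -> str:
    """Build a -Dcassandra.config flag for a config file under <home>/conf.

    Hypothetical helper: it only formats the flag, it does not start Cassandra.
    """
    config_path = PurePosixPath(cassandra_home) / "conf" / config_name
    if not config_path.is_absolute():
        raise ValueError("cassandra_home must be an absolute path")
    # Percent-encode unusual characters; "/" is left as-is by default.
    return "-Dcassandra.config=file://" + quote(str(config_path))


if __name__ == "__main__":
    # /opt/cassandra is an assumed install location, used for illustration only.
    print("cassandra", config_jvm_flag("/opt/cassandra"))
```

Copying cassandra_latest.yaml over cassandra.yaml, the first option in the NOTE, needs no flag at all.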
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
We should not block merging configuration changes as long as the configuration is valid, by which I mean that it is correct, passes all config validations, matches the documented rules, etc. I assume the provided latest config meets those requirements.

The failures should block the release; otherwise we should not advertise that we have those features at all, and the configuration should be named "experimental" rather than "latest".

The config changes are not responsible for broken features, and we should not bury our heads in the sand pretending that everything is OK.

Thanks,

On Wed, 14 Feb 2024, 10:47 Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> Wording looks good to me. I would also put that into NEWS.txt, but I am not sure which section. None of New features, Upgrading or Deprecation seems to be a good category.
>
> On Tue, Feb 13, 2024 at 5:42 PM Branimir Lambov wrote:
>
>> Hi All,
>>
>> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>>
>> The implementation has already gone through review, but I'd like to get people's opinion on two things:
>> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed? This should prevent the introduction of new failures and make sure we don't release before clearing the existing ones.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
+1 to not doing the ostrich, IMO, lol

On 14/2/24 10:58, Jacek Lewandowski wrote:

> We should not block merging configuration changes as long as the configuration is valid, by which I mean that it is correct, passes all config validations, matches the documented rules, etc. I assume the provided latest config meets those requirements.
>
> The failures should block the release; otherwise we should not advertise that we have those features at all, and the configuration should be named "experimental" rather than "latest".
>
> The config changes are not responsible for broken features, and we should not bury our heads in the sand pretending that everything is OK.
>
> Thanks,
>
> On Wed, 14 Feb 2024, 10:47 Štefan Miklošovič wrote:
>
>> Wording looks good to me. I would also put that into NEWS.txt, but I am not sure which section. None of New features, Upgrading or Deprecation seems to be a good category.
>>
>> On Tue, Feb 13, 2024 at 5:42 PM Branimir Lambov wrote:
>>
>>> Hi All,
>>>
>>> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>>>
>>> The implementation has already gone through review, but I'd like to get people's opinion on two things:
>>> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed? This should prevent the introduction of new failures and make sure we don't release before clearing the existing ones.
Re: [VOTE] Release Apache Cassandra 4.1.4
> The vote will be open for 72 hours (longer if needed). Everyone who has tested the build is invited to vote. Votes by PMC members are considered binding. A vote passes if there are at least three binding +1s and no -1's.

+1

Checked
- signing correct
- checksums are correct
- source artefact builds (JDK 8+11)
- binary artefact runs (JDK 8+11)
- debian package runs (JDK 8+11)
- debian repo runs (JDK 8+11)
- redhat* package runs (JDK 8+11)
- redhat* repo runs (JDK 8+11)
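The "checksums are correct" item in the checklist above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure the voter actually used: it assumes the checksum file holds the hex digest as its first whitespace-separated token, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact: Path, checksum_file: Path, algorithm: str = "sha256") -> bool:
    """Compare an artifact's digest with the first token of its checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by the file name.
    """
    expected = checksum_file.read_text().split()[0].lower()
    return file_digest(artifact, algorithm) == expected
```

Checking the signature (the "signing correct" item) is a separate step and is not covered by this sketch.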
Re: [Discuss] Introducing Flexible Authentication in Cassandra via Feature Flag
Hi,

I think what Gaurav means is what we know at DataStax as a transitional authenticator, which temporarily allows for partially enabled authentication: the system allows clients to authenticate but does not enforce it.

All in all, that should be included in CEP-31. CEP-31 also aims to let administrators enable, disable and reconfigure authentication without a restart, so we could discuss whether such a transitional mode would be needed at all in that case.

Thanks,
Jacek Lewandowski

On Tue, 13 Feb 2024 at 07:04, Jeff Jirsa wrote:

> Auth is one of those things that needs to be a bit more concrete
>
> In the scenario you describe, you already have the option to deploy auth piecemeal during the rollout (pause halfway through) and look for asymmetric connections, and the option to drop a new Authenticator jar into the class path that does the flexible auth you describe
>
> I fear that the extra flexibility this allows for 1% of operations exposes people to long term problems
>
> Have you considered just implementing the feature flag you describe using the existing plugin infrastructure?
>
> On Feb 12, 2024, at 9:47 PM, Gaurav Agarwal wrote:
>
> Dear Dinesh and Abe,
>
> Thank you for reviewing the document on enabling Cassandra authentication. I apologize that I didn't initially include the following failure scenarios where this feature could be particularly beneficial (I've included them now):
>
> *Below are the failure scenarios:*
>
> - Incorrect credentials: If a client accidentally uses the wrong username/password combination during the rollout, the server, once restarted with authentication enabled, will refuse its connections. This can temporarily interrupt the service until correct credentials are sent.
> - Missed service auth updates: In a large-scale system, a service "X" might miss the credential update during rollout. After some server nodes restart, service "X" might finally realize it needs correct credentials, but it's too late. Nodes are already expecting authorized requests, and this mismatch causes "X" to stop working on auth-enabled and restarted nodes.
> - Infrequent traffic: Suppose one of the services only interacts with the server once a week. Suppose it starts sending requests with incorrect credentials after authentication is enabled. Since the entire cluster is now running with authentication, the service's outdated credentials cause it to be denied access, resulting in a service-wide outage.
>
> The overall aim of the proposed feature flag is to allow clients to connect temporarily without authentication during the rollout, mitigating these risks and ensuring a smoother transition.
>
> Thanks in advance for your continued review of the proposal.
>
> On Mon, Feb 12, 2024 at 2:24 PM Abe Ratnofsky wrote:
>
>> Hey Gaurav,
>>
>> Thanks for your proposal.
>>
>> > disruptive, full-cluster restart, posing significant risks in live environments
>>
>> For configuration that isn't hot-reloadable, like providing a new IAuthenticator implementation, a rolling restart is required. But rolling restarts are zero-downtime and safe in production, as long as you pace them accordingly.
>>
>> In general, changing authenticators is a risky thing because it requires coordination with clients. To mitigate this risk and support clients while they transition between authenticators, I like the approach taken by MutualTlsWithPasswordFallbackAuthenticator:
>>
>> https://github.com/apache/cassandra/blob/bec6bfde1f3b6a782f123f9f9ff18072a97e379f/src/java/org/apache/cassandra/auth/MutualTlsWithPasswordFallbackAuthenticator.java#L34
>>
>> If client certificates are available, then use those, otherwise use the existing PasswordAuthenticator that clients are already using. The existing IAuthenticator interface supports this transitional behavior well.
>>
>> Your proposal to include a new configuration for auth_enforcement_flag doesn't clearly cover how to transition from one authenticator to another. It says:
>>
>> > Soft: Operates in a monitoring mode without enforcing authentication
>>
>> Most users use authentication today, so auth_enforcement_flag=Soft would allow unauthenticated clients to connect to the database.
>>
>> --
>> Abe
>>
>> On Feb 12, 2024, at 2:44 PM, Gaurav Agarwal wrote:
>>
>> Dear Cassandra Community,
>>
>> I'm excited to share a proposal for a new feature that I believe would significantly enhance the platform's security and operational flexibility: *a flexible authentication mechanism implemented through a feature flag*.
>>
>> Currently, enforcing authentication in Cassandra requires a disruptive, full-cluster restart, posing significant risks in live environments. My proposal, the *auth_enforcement_flag*, addresses this challenge
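The fallback behaviour Abe describes above (use the client certificate if one is presented, otherwise fall back to the password check clients already use) can be illustrated with a small, self-contained sketch. It does not use Cassandra's IAuthenticator interface or the code of MutualTlsWithPasswordFallbackAuthenticator; every name below (ClientHello, authenticate, the in-memory password table) is hypothetical, and the certificate identity is assumed to have been validated already by the TLS layer.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ClientHello:
    """Hypothetical view of what a client presents when connecting."""
    certificate_identity: Optional[str] = None  # assumed already verified by TLS
    username: Optional[str] = None
    password: Optional[str] = None


class AuthenticationError(Exception):
    """Raised when neither mechanism authenticates the client."""


def authenticate(hello: ClientHello, passwords: Dict[str, str]) -> str:
    """Return the authenticated role name, or raise AuthenticationError.

    Certificate first, password as the fallback. Unauthenticated clients are
    always rejected: unlike the proposed "Soft" mode, nothing is let through.
    """
    if hello.certificate_identity is not None:
        return hello.certificate_identity
    expected = passwords.get(hello.username) if hello.username is not None else None
    if expected is not None and hello.password is not None:
        # Constant-time comparison; plaintext storage is for illustration only.
        if hmac.compare_digest(expected.encode(), hello.password.encode()):
            return hello.username
    raise AuthenticationError("no valid certificate or password presented")
```

The point of the design is that clients can migrate one at a time while every connection stays authenticated by one mechanism or the other.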
[RESULT][VOTE] Release Apache Cassandra 4.1.4
The vote has passed with three binding +1s and no vetoes.
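For readers unfamiliar with the rule quoted in the vote thread ("at least three binding +1s and no -1's"), it can be written down as a short check. This is a sketch with hypothetical names; treating a non-binding -1 as blocking follows the wording of the quoted rule literally and is an assumption, not a statement of project policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1, 0 or -1
    binding: bool   # True for PMC members


def vote_passes(votes: List[Vote], min_binding_plus_ones: int = 3) -> bool:
    """Apply the quoted rule: at least three binding +1s and no -1s."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    any_minus_one = any(v.value == -1 for v in votes)
    return binding_plus_ones >= min_binding_plus_ones and not any_minus_one
```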
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> When we have failing tests people do not spend the time to figure out if their logic caused a regression and merge, making things more unstable… so when we merge failing tests that leads to people merging even more failing tests...

What's the counter position to this, Jacek / Berenguer?

Mick and Ekaterina (and everyone really) - any thoughts on what test coverage, if any, we should commit to for this new configuration? Acknowledging that we already have *a lot* of CI that we run.

On Wed, Feb 14, 2024, at 5:11 AM, Berenguer Blasi wrote:

> +1 to not doing the ostrich, IMO, lol
>
> On 14/2/24 10:58, Jacek Lewandowski wrote:
>
>> We should not block merging configuration changes as long as the configuration is valid, by which I mean that it is correct, passes all config validations, matches the documented rules, etc. I assume the provided latest config meets those requirements.
>>
>> The failures should block the release; otherwise we should not advertise that we have those features at all, and the configuration should be named "experimental" rather than "latest".
>>
>> The config changes are not responsible for broken features, and we should not bury our heads in the sand pretending that everything is OK.
>>
>> Thanks,
>>
>> On Wed, 14 Feb 2024, 10:47 Štefan Miklošovič wrote:
>>
>>> Wording looks good to me. I would also put that into NEWS.txt, but I am not sure which section. None of New features, Upgrading or Deprecation seems to be a good category.
>>>
>>> On Tue, Feb 13, 2024 at 5:42 PM Branimir Lambov wrote:
>>>
>>>> Hi All,
>>>>
>>>> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
Cool stuff! This will make it easier to advance configuration defaults without affecting the stable configuration. Wording looks good to me. +1 to including a NEWS.txt note.

I'm OK with breaking trunk CI temporarily as long as failures are tracked and triaged/addressed before the next release.

I haven't had the chance to look into CASSANDRA-18753 yet, so apologies if this was already discussed, but I have the following questions about handling two configuration files moving forward:

1) Will cassandra.yaml remain the default test config? Is the plan moving forward to require green CI for both configurations on pre-commit, or pre-release?

2) What will this mean for the release artifact? Is the idea to continue shipping with the current cassandra.yaml, or to eventually switch to the optimized configuration (i.e. in 6.X) while making the legacy default configuration available via an optional flag?

On Tue, Feb 13, 2024 at 11:42 AM Branimir Lambov wrote:

> Hi All,
>
> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>
> The implementation has already gone through review, but I'd like to get people's opinion on two things:
> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed?
[RELEASE] Apache Cassandra 4.1.4 released
The Cassandra team is pleased to announce the release of Apache Cassandra version 4.1.4.

Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance.

https://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download section:

https://cassandra.apache.org/download/

This version is a bug fix release[1] on the 4.1 series. As always, please pay attention to the release notes[2] and let us know[3] if you encounter any problems.

[WARNING] Debian and RedHat package repositories have moved! Debian /etc/apt/sources.list.d/cassandra.sources.list and RedHat /etc/yum.repos.d/cassandra.repo files must be updated to the new repository URLs. For Debian it is now https://debian.cassandra.apache.org . For RedHat it is now https://redhat.cassandra.apache.org/41x/ .

Enjoy!

[1]: CHANGES.txt https://github.com/apache/cassandra/blob/cassandra-4.1.4/CHANGES.txt
[2]: NEWS.txt https://github.com/apache/cassandra/blob/cassandra-4.1.4/NEWS.txt
[3]: https://issues.apache.org/jira/browse/CASSANDRA
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
1) If there’s an “old compatible default” and “latest recommended settings”, when does the value in “old compatible default” get updated? Never?

2) If there are test failures with the new values, it seems REALLY IMPORTANT to make sure those test failures are discovered + fixed IN THE FUTURE TOO. If pushing new yaml into a different file makes us less likely to catch the failures in the future, it seems like we’re hurting ourselves. Branimir mentions this, but how do we ensure that we don’t let this pattern disguise future bugs?

> On Feb 13, 2024, at 8:41 AM, Branimir Lambov wrote:
>
> Hi All,
>
> CASSANDRA-18753 introduces a second set of defaults (in a separate "cassandra_latest.yaml") that enable new features of Cassandra. The objective is two-fold: to be able to test the database in this configuration, and to point potential users that are evaluating the technology to an optimized set of defaults that give a clearer picture of the expected performance of the database for a new user. The objective is to get this configuration into 5.0 to have the extra bit of confidence that we are not releasing (and recommending) options that have not gone through thorough CI.
>
> The implementation has already gone through review, but I'd like to get people's opinion on two things:
> - There are currently a number of test failures when the new options are selected, some of which appear to be genuine problems. Is the community okay with committing the patch before all of these are addressed? This should prevent the introduction of new failures and make sure we don't release before clearing the existing ones.
> - I'd like to get an opinion on what's suitable wording and documentation for the new defaults set.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
On Wed, 14 Feb 2024 at 17:30, Josh McKenzie wrote:

> When we have failing tests people do not spend the time to figure out if
> their logic caused a regression and merge, making things more unstable… so
> when we merge failing tests that leads to people merging even more failing
> tests...
>
> What's the counter position to this Jacek / Berenguer?

For how long are we going to deceive ourselves? Are we shipping those features or not? Perhaps it is also a good opportunity to distinguish subsets of tests which make sense to run with a configuration matrix.

If we don't add those tests to the pre-commit pipeline, "people do not spend the time to figure out if their logic caused a regression and merge, making things more unstable…"

I think it is much more valuable to test those various configurations than to test against j11 and j17 separately. I see very little value in doing that.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
I agree with Jacek. I don't quite understand why we are running the pipeline for j17 and j11 every time. I think this should be opt-in. The majority of the time, we are just refactoring and coding stuff for Cassandra where testing on both JVMs is pointless, and we _know_ that it will be fine on 11 and 17 too because we do not do anything special. If we find some subsystems where testing on both JVMs is crucial, we might do that; I just do not remember the last time that testing on both j17 and j11 uncovered a bug. It seems more like a hassle.

We could then test the whole pipeline with a different config in basically the same time as we currently spend.

On Wed, Feb 14, 2024 at 9:32 PM Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:

> On Wed, 14 Feb 2024 at 17:30, Josh McKenzie wrote:
>
>> When we have failing tests people do not spend the time to figure out if
>> their logic caused a regression and merge, making things more unstable… so
>> when we merge failing tests that leads to people merging even more failing
>> tests...
>>
>> What's the counter position to this Jacek / Berenguer?
>
> For how long are we going to deceive ourselves? Are we shipping those
> features or not? Perhaps it is also a good opportunity to distinguish
> subsets of tests which make sense to run with a configuration matrix.
>
> If we don't add those tests to the pre-commit pipeline, "people do not
> spend the time to figure out if their logic caused a regression and merge,
> making things more unstable…"
> I think it is much more valuable to test those various configurations
> rather than test against j11 and j17 separately. I can see a really little
> value in doing that.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
Stefan, can you elaborate on what you are proposing? It's not clear (at least to me) what level of testing you're advocating for. Dropping testing both on dev branches, every commit, just on release?

In addition, can you elaborate on what is a hassle about it? It's been a long time since I committed anything but I don't remember 2 JVMs (8 & 11) being a problem.

Jon

On Wed, Feb 14, 2024 at 2:35 PM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:

> I agree with Jacek, I don't quite understand why we are running the
> pipeline for j17 and j11 every time. I think this should be opt-in.
> Majority of the time, we are just refactoring and coding stuff for
> Cassandra where testing it for both jvms is just pointless and we _know_
> that it will be fine in 11 and 17 too because we do not do anything
> special. If we find some subsystems where testing that on both jvms is
> crucial, we might do that, I just do not remember when it was last time
> that testing it in both j17 and j11 suddenly uncovered some bug. Seems more
> like a hassle.
>
> We might then test the whole pipeline with a different config basically
> for same time as we currently do.
>
> On Wed, Feb 14, 2024 at 9:32 PM Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:
>
>> On Wed, 14 Feb 2024 at 17:30, Josh McKenzie wrote:
>>
>>> When we have failing tests people do not spend the time to figure out if
>>> their logic caused a regression and merge, making things more unstable… so
>>> when we merge failing tests that leads to people merging even more failing
>>> tests...
>>>
>>> What's the counter position to this Jacek / Berenguer?
>>
>> For how long are we going to deceive ourselves? Are we shipping those
>> features or not? Perhaps it is also a good opportunity to distinguish
>> subsets of tests which make sense to run with a configuration matrix.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> I'm ok with breaking trunk CI temporarily as long as failures are tracked
> and triaged/addressed before the next release.

From the ticket, I understand it is meant for 5.0-rc. I share this sentiment for the release we decide to ship with:

> The failures should block release or we should not advertise we have those
> features at all, and the configuration should be named "experimental"
> rather than "latest".

> Is the community okay with committing the patch before all of these are
> addressed?

If we aim to fix everything before the next release, 5.0-rc, we can commit CASSANDRA-18753 after the fixes are applied. If we are not going to do all the fixes anytime soon, I prefer to commit and have the failures and the tickets open. Otherwise, I can guarantee that I, personally, will forget some of those failures and miss them in time... and I suspect I won’t be the only one :-)

> This version is provided for new users of Cassandra who want to get the
> most out of their cluster and for users evaluating the technology.

From reading this thread, we do not recommend putting it straight into production, but rather experimenting with it, gaining trust, and then using it in production. Did I get it correctly? We need to confirm what it is and be sure it is clearly stated in the docs.

Announcing this new yaml file under the NEWS.txt features sounds reasonable to me. Or can we add a new separate section on top of NEWS.txt 5.0, dedicated only to the announcement of this new configuration file?

> Mick and Ekaterina (and everyone really) - any thoughts on what test
> coverage we should commit to for this new configuration? Acknowledging that
> we already have *a lot* of CI that we run.

I do not have an immediate answer. I see there is some proposed CI configuration in the ticket. As far as I can tell from a quick look, the suggestion is to replace unit-trie with unit-latest (which also exercises tries), and the additional new jobs will be Python and Java DTests (no new upgrade tests).

Off the top of my head, we probably need a cost-benefit analysis, a risk analysis, and a discussion of the tradeoffs: burnt resources vs manpower, early detection vs late discovery or even prod issues, experimental vs production-ready, etc.

Now, this question can have different answers depending on whether this is an experimental config or we recommend it for production use.

I would expect new features to be enabled in this configuration and all tests to be run pre-commit with the default and the new YAML files. Is this a correct assumption? Probably done with a note on the ML.

The question is, do we have enough resources in Jenkins to facilitate all this testing post-commit?

> I think it is much more valuable to test those various configurations
> rather than test against j11 and j17 separately. I can see a really little
> value in doing that.

Excellent point. I have been saying for some time that IMHO we can reduce what we run in CI, at least pre-commit, to:
1) Build J11
2) Build J17
3) Run tests with build 11 + runtime 11
4) Run tests with build 11 and runtime 17

Technically, that is also what we ship in 5.0. (Except for 2), the JDK17 build, but we should not remove that from CI.)

Does it make sense to reduce to what I mentioned in 1, 2, 3, 4 and instead add the suggested jobs with the new configuration from CASSANDRA-18753 in pre-commit?

Please correct me if I am wrong, but I understand that running the JDK17 tests on the 17 build is experimental in CI, so we can gain confidence until the release when we will drop 11. No? If that is correct, I do not see why we run those tests on every pre-commit and not only what we ship.

Best regards,
Ekaterina

On Wed, 14 Feb 2024 at 17:35, Štefan Miklošovič wrote:

> I agree with Jacek, I don't quite understand why we are running the
> pipeline for j17 and j11 every time. I think this should be opt-in.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
Jon,

I was mostly referring to CircleCI, where we have two pre-commit workflows (just click on anything here: https://app.circleci.com/pipelines/github/instaclustr/cassandra).

java17_pre-commit_tests

This workflow compiles and tests everything with Java 17.

java11_pre-commit_tests

This workflow compiles with Java 11 and contains jobs which also run with Java 11, plus another set of jobs which run with Java 17.

The workflow I have so far is that when I want to merge something, I am required to formally provide builds for both workflows. Maybe I am doing more work than necessary here, but my understanding is that this has to be done.

I think Jacek was also talking about this, and about how questionable its value is.

On Thu, Feb 15, 2024 at 12:13 AM Jon Haddad wrote:

> Stefan, can you elaborate on what you are proposing? It's not clear (at
> least to me) what level of testing you're advocating for. Dropping testing
> both on dev branches, every commit, just on release? In addition, can you
> elaborate on what is a hassle about it? It's been a long time since I
> committed anything but I don't remember 2 JVMs (8 & 11) being a problem.
>
> Jon
>
> On Wed, Feb 14, 2024 at 2:35 PM Štefan Miklošovič <stefan.mikloso...@gmail.com> wrote:
>
>> I agree with Jacek, I don't quite understand why we are running the
>> pipeline for j17 and j11 every time. I think this should be opt-in.
>> Majority of the time, we are just refactoring and coding stuff for
>> Cassandra where testing it for both jvms is just pointless and we _know_
>> that it will be fine in 11 and 17 too because we do not do anything
>> special. If we find some subsystems where testing that on both jvms is
>> crucial, we might do that, I just do not remember when it was last time
>> that testing it in both j17 and j11 suddenly uncovered some bug. Seems more
>> like a hassle.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
I share Jacek’s and Stefan’s sentiment about the low value of requiring precommit j11+j17 tests for all changes. Perhaps this was needed during j17 stabilization but is no longer required? Please correct me if I’m missing some context.

To have a practical proposal to address this, how about:

1) Define “standard” java version for branch (11 or 17).
2) Define “standard” cassandra.yaml variant for branch (legacy cassandra.yaml or shiny cassandra_latest.yaml).
3) Require green CI on precommit on standard java version + standard cassandra.yaml variant.
4) Any known java-related changes require precommit j11 + j17.
5) Any known configuration changes require precommit tests on all cassandra.yaml variants.
6) All supported java versions + cassandra.yaml variants need to be checked before a release is proposed, to catch any issue missed during 4) or 5).

For example:
- If j17 is set as “default” java version of the branch cassandra-5.0, then j11 tests are no longer required for patches that don’t touch java-related stuff.
- If cassandra_latest.yaml becomes the new default configuration for 6.0, then precommit only needs to be run against that version.
- Prerelease needs to be run against all cassandra.yaml variants.

Wdyt?

On Wed, 14 Feb 2024 at 18:25 Štefan Miklošovič wrote:

> Jon,
>
> I was mostly referring to Circle CI where we have two pre-commit
> workflows. (just click on anything here
> https://app.circleci.com/pipelines/github/instaclustr/cassandra)
>
> java17_pre-commit_tests
>
> This workflow is compiling & testing everything with Java 17
>
> java11_pre-commit_tests
>
> This workflow is compiling with Java 11 and it contains jobs which are
> also run with Java 11 and another set of jobs which run with Java 17.
>
> The workflow I have so far is that when I want to merge something, it is
> required to formally provide builds for both workflows. Maybe I am doing
> more work than necessary here but my understanding is that this has to be
> done and it is required.
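The six rules in the proposal above can be read as a small decision function. The following is only a sketch under assumptions: the `BranchStandard` type, the `required_runs` function, and the `touches_java` / `touches_config` flags are hypothetical names invented for illustration, and combining rules 4 and 5 into a full cross product when a patch touches both is my reading, not something stated in the thread.

```python
from dataclasses import dataclass

# The two axes discussed in the thread.
JAVA_VERSIONS = ("11", "17")
YAML_VARIANTS = ("cassandra.yaml", "cassandra_latest.yaml")


@dataclass(frozen=True)
class BranchStandard:
    """Hypothetical per-branch "standard" choice (rules 1 and 2)."""
    java: str
    yaml: str


def required_runs(standard, touches_java=False, touches_config=False,
                  prerelease=False):
    """Return the (java, yaml) combinations that must be green.

    Rule 3: by default only the standard combination is required.
    Rule 4: java-related changes widen the java axis.
    Rule 5: configuration changes widen the yaml axis.
    Rule 6: before a release, everything is checked.
    """
    if prerelease:
        return {(j, y) for j in JAVA_VERSIONS for y in YAML_VARIANTS}
    javas = JAVA_VERSIONS if touches_java else (standard.java,)
    yamls = YAML_VARIANTS if touches_config else (standard.yaml,)
    return {(j, y) for j in javas for y in yamls}


if __name__ == "__main__":
    std = BranchStandard(java="17", yaml="cassandra.yaml")
    print(sorted(required_runs(std)))
    print(sorted(required_runs(std, touches_config=True)))
    print(sorted(required_runs(std, prerelease=True)))
```

The hard part, deciding whether a patch "touches" java or configuration, stays with the assignee and reviewer in this sketch; the function only records the consequence of that judgment.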
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> If there’s an “old compatible default” and “latest recommended settings”, when does the value in “old compatible default” get updated? Never?

How about replacing cassandra.yaml with cassandra_latest.yaml on trunk when cutting the cassandra-6.0 branch? Any new default changes on trunk go to cassandra_latest.yaml.

Basically, major branch creation syncs cassandra_latest.yaml with cassandra.yaml on trunk, and default changes on trunk are added to cassandra_latest.yaml, which will eventually be synced to cassandra.yaml when the next major is cut.

On Wed, 14 Feb 2024 at 13:42 Jeff Jirsa wrote:

> 1) If there’s an “old compatible default” and “latest recommended
> settings”, when does the value in “old compatible default” get updated?
> Never?
> 2) If there are test failures with the new values, it seems REALLY
> IMPORTANT to make sure those test failures are discovered + fixed IN THE
> FUTURE TOO. If pushing new yaml into a different file makes us less likely
> to catch the failures in the future, it seems like we’re hurting ourselves.
> Branimir mentions this, but how do we ensure that we don’t let this pattern
> disguise future bugs?
>
> On Feb 13, 2024, at 8:41 AM, Branimir Lambov wrote:
>
> Hi All,
>
> CASSANDRA-18753 introduces a second set of defaults (in a separate
> "cassandra_latest.yaml") that enable new features of Cassandra. The
> objective is two-fold: to be able to test the database in this
> configuration, and to point potential users that are evaluating the
> technology to an optimized set of defaults that give a clearer picture of
> the expected performance of the database for a new user. The objective is
> to get this configuration into 5.0 to have the extra bit of confidence that
> we are not releasing (and recommending) options that have not gone through
> thorough CI.
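To make the proposed lifecycle concrete, here is a toy model of it. Everything in it is an assumption made for illustration: the two files are represented as plain dictionaries, `option_a` and `option_b` are placeholder names rather than real Cassandra settings, and `change_default` / `cut_major_branch` are hypothetical helpers, not part of any existing tooling.

```python
import copy


def change_default(trunk_configs, option, value):
    """On trunk, a new default lands only in cassandra_latest.yaml."""
    updated = copy.deepcopy(trunk_configs)
    updated["cassandra_latest.yaml"][option] = value
    return updated


def cut_major_branch(trunk_configs):
    """When a major branch is cut, cassandra.yaml catches up with latest."""
    synced = copy.deepcopy(trunk_configs["cassandra_latest.yaml"])
    return {
        "cassandra.yaml": synced,
        "cassandra_latest.yaml": copy.deepcopy(synced),
    }


if __name__ == "__main__":
    trunk = {
        "cassandra.yaml": {"option_a": "old"},
        "cassandra_latest.yaml": {"option_a": "old"},
    }
    trunk = change_default(trunk, "option_a", "new")
    trunk = change_default(trunk, "option_b", True)
    print(trunk["cassandra.yaml"])   # still the compatible defaults
    trunk = cut_major_branch(trunk)
    print(trunk["cassandra.yaml"])   # now equal to cassandra_latest.yaml
```

Under this model the "old compatible default" is never more than one major release behind, which is one possible answer to the question quoted at the top of the message.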
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> Perhaps it is also a good opportunity to distinguish subsets of tests which make sense to run with a configuration matrix.

Agree. I think we should define a “standard/golden” configuration for each branch and minimally require precommit tests for that configuration. Assignees and reviewers can determine if additional test variants are required based on the patch scope.

Nightly and prerelease tests can be run to catch any issues outside the standard configuration based on the supported configuration matrix.

On Wed, 14 Feb 2024 at 15:32 Jacek Lewandowski wrote:

> On Wed, 14 Feb 2024 at 17:30, Josh McKenzie wrote:
>
>> When we have failing tests people do not spend the time to figure out if
>> their logic caused a regression and merge, making things more unstable… so
>> when we merge failing tests that leads to people merging even more failing
>> tests...
>>
>> What's the counter position to this Jacek / Berenguer?
>
> For how long are we going to deceive ourselves? Are we shipping those
> features or not? Perhaps it is also a good opportunity to distinguish
> subsets of tests which make sense to run with a configuration matrix.
>
> If we don't add those tests to the pre-commit pipeline, "people do not
> spend the time to figure out if their logic caused a regression and merge,
> making things more unstable…"
> I think it is much more valuable to test those various configurations
> rather than test against j11 and j17 separately. I can see a really little
> value in doing that.
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
Something along the lines of what Paulo is proposing makes sense to me. To sum it up, knowing what workflows we have now:

java17_pre-commit_tests
java11_pre-commit_tests
java17_separate_tests
java11_separate_tests

We would have a couple more; together they would look like:

java17_pre-commit_tests
java17_pre-commit_tests-latest-yaml
java11_pre-commit_tests
java11_pre-commit_tests-latest-yaml
java17_separate_tests
java17_separate_tests-default-yaml
java11_separate_tests
java11_separate_tests-latest-yaml

To go over Paulo's plan, his steps 1-3 for 5.0 would result in requiring just one workflow

java11_pre-commit_tests

when no configuration is touched and two workflows

java11_pre-commit_tests
java11_pre-commit_tests-latest-yaml

when there is some configuration change.

Now the term "some configuration change" is quite tricky, and it is not always easy to evaluate whether both the default and latest yaml workflows need to be executed. A change might be of such a nature that it does not change the configuration, yet it is still necessary to verify that it works in both scenarios. The -latest.yaml config might be such that a change makes sense in isolation for the default config but does not work with -latest.yaml. I don't know if this is just a theoretical problem or not, but my gut feeling is that we would be safer if we just required both default and latest yaml workflows together.

Even if we do, we basically replace "two jvms" builds with "two yamls" builds, but I consider "two yamls" builds to be more valuable in general than "two jvms" builds. It would take basically the same amount of time; we would just reorient our build matrix from different jvms to different yamls.

For releases we would for sure need to run it across jvms too.

On Thu, Feb 15, 2024 at 7:05 AM Paulo Motta wrote:

> > Perhaps it is also a good opportunity to distinguish subsets of tests
> > which make sense to run with a configuration matrix.
>
> Agree.
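The workflow list in the message above is a product of JVM, suite, and yaml variant. A minimal sketch of that matrix follows, assuming that every existing workflow simply gains a `-latest-yaml` twin (the message itself lists one of them with a `-default-yaml` suffix, so the uniform suffix is an assumption). The `workflow_names` and `required_for_precommit` helpers are hypothetical and are not part of the CircleCI configuration in the repository.

```python
from itertools import product

JVMS = ("java17", "java11")
SUITES = ("pre-commit_tests", "separate_tests")
# "" is the existing workflow (default cassandra.yaml); the suffix marks the
# variant that would run with cassandra_latest.yaml.
YAML_SUFFIXES = ("", "-latest-yaml")


def workflow_names():
    """All workflows once each existing one gains a latest-yaml variant."""
    return [
        f"{jvm}_{suite}{suffix}"
        for suite, jvm, suffix in product(SUITES, JVMS, YAML_SUFFIXES)
    ]


def required_for_precommit(jvm="java11"):
    """The "two yamls, one jvm" option: both yaml variants on a single JVM."""
    prefix = f"{jvm}_pre-commit_tests"
    return [name for name in workflow_names() if name.startswith(prefix)]


if __name__ == "__main__":
    for name in workflow_names():
        print(name)
    print(required_for_precommit())
```

Swapping the required axis from JVMs to yamls keeps the number of required pre-commit workflows at two, which matches the observation that it would take about the same amount of time.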
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
> Excellent point, I was saying for some time that IMHO we can reduce
> to running in CI at least pre-commit:
> 1) Build J11 2) build J17
> 3) run tests with build 11 + runtime 11
> 4) run tests with build 11 and runtime 17.

Ekaterina, I was thinking more about:
1) build J11
2) build J17
3) run tests with build J11 + runtime J11
4) run smoke tests with build J17 and runtime J17

Again, I don't see value in running the J11 build on a J17 runtime in addition to the J11 runtime - just pick one unless we change something specific to the JVM.

If we need to decide whether to test the latest or the default, I think we should pick the latest, because this is actually Cassandra 5.0, defined as a set of new features that will shine on the website.

Also, we have configurations which test some features, but they are more like dimensions:
- commit log compression
- sstable compression
- CDC
- Trie memtables
- Trie SSTable format
- Extended deletion time
...

Currently, what we call the default configuration is tested with:
- no compression, no CDC, no extended deletion time
- *commit log compression + sstable compression*, no cdc, no extended deletion time
- no compression, *CDC enabled*, no extended deletion time
- no compression, no CDC, *enabled extended deletion time*

This applies only to unit tests, of course.

Then, are we going to test all of those scenarios with the "latest" configuration? I'm asking because the latest configuration is mostly about tries and UCS and has nothing to do with compression or CDC. Then why should the default configuration be tested more thoroughly than the latest one, which enables essential Cassandra 5.0 features?

I propose to significantly reduce that stuff. Let's distinguish the packages of tests that need to be run with CDC enabled / disabled, with commitlog compression enabled / disabled, and tests that verify sstable formats (mostly io and index, I guess), and leave other parameters set as in the latest configuration - this is the easiest way, I think.

For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about other stuff. To me, running no-vnodes makes no sense because no-vnodes is just a special case of vnodes=1. On the other hand, offheap/onheap buffers could be tested in unit tests. In short, I'd run dtests only with the default and latest configuration.

Sorry for being too wordy,

On Thu, 15 Feb 2024 at 07:39, Štefan Miklošovič wrote:

> Something along what Paulo is proposing makes sense to me. To sum it up,
> knowing what workflows we have now:
>
> java17_pre-commit_tests
> java11_pre-commit_tests
> java17_separate_tests
> java11_separate_tests
>
> We would have couple more, together like:
>
> java17_pre-commit_tests
> java17_pre-commit_tests-latest-yaml
> java11_pre-commit_tests
> java11_pre-commit_tests-latest-yaml
> java17_separate_tests
> java17_separate_tests-default-yaml
> java11_separate_tests
> java11_separate_tests-latest-yaml
>
> To go over Paulo's plan, his steps 1-3 for 5.0 would result in requiring
> just one workflow
>
> java11_pre-commit_tests
>
> when no configuration is touched and two workflows
>
> java11_pre-commit_tests
> java11_pre-commit_tests-latest-yaml
>
> when there is some configuration change.
>
> Now the term "some configuration change" is quite tricky and it is not
> always easy to evaluate if both default and latest yaml workflows need to
> be executed. It might happen that a change is of such a nature that it does
> not change the configuration but it is necessary to verify that it still
> works with both scenarios. -latest.yaml config might be such that a change
> would make sense to do in isolation for default config only but it would
> not work with -latest.yaml too. I don't know if this is just a theoretical
> problem or not but my gut feeling is that we would be safer if we just
> required both default and latest yaml workflows together.
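The "dimensions" point above can be illustrated by comparing the full cross product of the feature toggles with the four unit-test variants listed in the message. This is a sketch under assumptions: commit log and sstable compression are folded into a single `compression` flag because the message toggles them together, the trie-related dimensions are left out, and `full_matrix` / `untested` are hypothetical helper names.

```python
from itertools import product

# Three of the dimensions named in the message, each modelled as on/off.
DIMENSIONS = {
    "compression": (False, True),
    "cdc": (False, True),
    "extended_deletion_time": (False, True),
}

# The four unit-test variants the message says run with the default config:
# a baseline plus one dimension switched on at a time.
TESTED_TODAY = [
    {"compression": False, "cdc": False, "extended_deletion_time": False},
    {"compression": True, "cdc": False, "extended_deletion_time": False},
    {"compression": False, "cdc": True, "extended_deletion_time": False},
    {"compression": False, "cdc": False, "extended_deletion_time": True},
]


def full_matrix(dimensions):
    """Every combination of the given dimensions."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def untested(dimensions, tested):
    """Combinations in the full matrix that no current variant covers."""
    return [combo for combo in full_matrix(dimensions) if combo not in tested]


if __name__ == "__main__":
    print(len(full_matrix(DIMENSIONS)), "combinations in total")
    print(len(untested(DIMENSIONS, TESTED_TODAY)), "not covered today")
```

Each extra on/off dimension doubles the full matrix, which is why repeating every variant for a second yaml file gets expensive quickly and why scoping each dimension to the test packages that depend on it is attractive.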
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
By the way, I am not sure if it is all completely transparent and understood by everybody, so let me guide you through a typical patch which is meant to be applied from 4.0 to trunk (4 branches) to see what it looks like.

I do not have the luxury of running CircleCI on 100 containers; I have just 25. So what takes around 2.5h for 100 containers takes around 6-7 for 25. That is a typical java11_pre-commit_tests for trunk. Then I have to provide builds for java17_pre-commit_tests too; that takes around 3-4 hours because it just tests less. Let's round it up to 10 hours for trunk.

Then I need to do this for 5.0 as well, basically doubling the time, because as I am writing this the difference between these two branches is not too big. So 20 hours.

Then I need to build 4.1 and 4.0 too. 4.0 is very similar to 4.1 when it comes to the number of tests; nevertheless, there are workflows for Java 8 and Java 11 for each, so let's say this takes 10 hours again. So together I'm at 35.

Scheduling all the builds, triggering them, monitoring their progress etc. is work in itself. I am scripting this like crazy so that I do not touch the UI in Circle at all, and I made custom scripts which call the Circle API and trigger the builds from the console to speed this up, because as soon as a developer has to click around all day and track the progress, it gets old pretty quickly.

Thank god this is just a patch from 4.0; when it comes to 3.0 and 3.11, just add more hours to that.

So all in all, a typical 4.0 - trunk patch is tested for two days at least, and that's when all is nice and I do not need to rework it and rerun it again ...

Does this all sound flexible and speedy enough for people?

If we dropped the formal necessity to build on various jvms, it would significantly speed up development.
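For readers who have not scripted this, the following sketches what triggering a pipeline from the console can look like. It is not the script described above, which is not shown in the thread. It assumes the CircleCI API v2 "trigger a new pipeline" endpoint and a personal API token; the project slug, branch name, and token in the example are placeholders, and `build_trigger_request` is a hypothetical helper. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

CIRCLE_API = "https://circleci.com/api/v2"


def build_trigger_request(project_slug, branch, token, parameters=None):
    """Build (but do not send) a request that triggers a pipeline.

    project_slug has the form "gh/<org>/<repo>". `parameters` is an optional
    dict of pipeline parameters; which ones exist depends on the project's
    CircleCI config, so none are assumed here.
    """
    body = {"branch": branch}
    if parameters:
        body["parameters"] = parameters
    return urllib.request.Request(
        url=f"{CIRCLE_API}/project/{project_slug}/pipeline",
        data=json.dumps(body).encode("utf-8"),
        headers={"Circle-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; one request per branch of a multi-branch patch.
    for branch in ("my-patch-4.0", "my-patch-4.1", "my-patch-5.0", "my-patch-trunk"):
        req = build_trigger_request("gh/example-org/cassandra", branch, "<token>")
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # urllib.request.urlopen(req) would actually send it.
```

Looping over the branches is the easy part; the hours quoted above come from the test runs themselves, which is why the number of required workflows per branch matters more than the triggering mechanics.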

On Thu, Feb 15, 2024 at 8:10 AM Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:

>> Excellent point, I was saying for some time that IMHO we can reduce
>> to running in CI at least pre-commit:
>> 1) Build J11 2) build J17
>> 3) run tests with build 11 + runtime 11
>> 4) run tests with build 11 and runtime 17.
>
>
> Ekaterina, I was thinking more about:
> 1) build J11
> 2) build J17
> 3) run tests with build J11 + runtime J11
> 4) run smoke tests with build J17 and runtime J17
>
> Again, I don't see value in running build J11 and J17 runtime
> additionally to J11 runtime - just pick one unless we change something
> specific to JVM
>
> If we need to decide whether to test the latest or default, I think we
> should pick the latest because this is actually Cassandra 5.0 defined as a
> set of new features that will shine on the website.
>
> Also - we have configurations which test some features but they are more
> like dimensions:
> - commit log compression
> - sstable compression
> - CDC
> - Trie memtables
> - Trie SSTable format
> - Extended deletion time
> ...
>
> Currently, what we call the default configuration is tested with:
> - no compression, no CDC, no extended deletion time
> - *commit log compression + sstable compression*, no CDC, no extended
> deletion time
> - no compression, *CDC enabled*, no extended deletion time
> - no compression, no CDC, *enabled extended deletion time*
>
> This applies only to unit tests of course
>
> Then, are we going to test all of those scenarios with the "latest"
> configuration? I'm asking because the latest configuration is mostly about
> tries and UCS and has nothing to do with compression or CDC. Then why
> should the default configuration be tested more thoroughly than latest,
> which enables essential Cassandra 5.0 features?
>
> I propose to significantly reduce that stuff. Let's distinguish the
> packages of tests that need to be run with CDC enabled / disabled, with
> commitlog compression enabled / disabled, tests that verify sstable formats
> (mostly io and index I guess), and leave other parameters set as with the
> latest configuration - this is the easiest way I think.
>
> For dtests we have vnodes/no-vnodes, offheap/onheap, and nothing about
> other stuff. To me running no-vnodes makes no sense because no-vnodes is
> just a special case of vnodes=1. On the other hand offheap/onheap buffers
> could be tested in unit tests. In short, I'd run dtests only with the
> default and latest configuration.
>
> Sorry for being too wordy,
>
>
> Thu, 15 Feb 2024 at 07:39 Štefan Miklošovič
> wrote:
>
>> Something along the lines of what Paulo is proposing makes sense to me.
>> To sum it up, knowing what workflows we have now:
>>
>> java17_pre-commit_tests
>> java11_pre-commit_tests
>> java17_separate_tests
>> java11_separate_tests
>>
>> We would have a couple more, together like:
>>
>> java17_pre-commit_tests
>> java17_pre-commit_tests-latest-yaml
>> java11_pre-commit_tests
>> java11_pre-commit_tests-latest-yaml
>> java17_separate_tests
>> java17_separate_tests-default-yaml
>> java11_separate_tests
>> java11_separate_tests-latest-yaml
>>
>> To go over Paulo's plan, his steps 1-3 for 5.0 would result
Re: [Discuss] "Latest" configuration for testing and evaluation (CASSANDRA-18753)
I fully understand you. Although I have that luxury to use more containers, I simply feel that rerunning the same code with different configurations which do not impact that code is just a waste of resources and money.

- - -- --- - -
Jacek Lewandowski