> so can cause repairs to deadlock forever

Small correction, I finished fixing the tests in CASSANDRA-19042 and we don’t deadlock; we time out and fail repair if any of those messages are dropped.

> On Feb 13, 2024, at 11:04 AM, David Capwell <dcapw...@apple.com> wrote:
>
>> and to point potential users that are evaluating the technology to an
>> optimized set of defaults
>
> Left this comment in the GH… is there a reason all guardrails and reliability
> (aka repair retries) configs are off by default? They are off by default in
> the normal config for backwards compatibility reasons, but if we are defining
> a config saying what we recommend, we should enable these things by default
> IMO.
>
>> There are currently a number of test failures when the new options are
>> selected, some of which appear to be genuine problems. Is the community okay
>> with committing the patch before all of these are addressed?
>
> I was tagged on CASSANDRA-19042, the paxos repair message handling does not
> have the repair reliability improvements that 5.0 has, so can cause repairs to
> deadlock forever (same as current 4.x repairs). Bringing these up to par
> with the rest of repair would be very much welcome (they are also lacking
> visibility, so need to fall back to heap dumps to see what’s going on; same as
> 4.0.x but not 4.1.x), but I doubt I have cycles to do that…. This refactor is
> not 100% trivial as it has fun subtle concurrency issues to address (message
> retries and deduping), and making sure this logic works with the existing
> repair simulation tests does require refactoring how the paxos cleanup state
> is tracked, which could have subtle consequences.
>
> I do think this should be fixed, but should it block 5.0? Not sure… will
> leave to others….
>
> Should we merge the configs breaking these tests? No…. When we have failing
> tests people do not spend the time to figure out if their logic caused a
> regression and merge, making things more unstable… so when we merge failing
> tests that leads to people merging even more failing tests...
>
>> On Feb 13, 2024, at 8:41 AM, Branimir Lambov <blam...@apache.org> wrote:
>>
>> Hi All,
>>
>> CASSANDRA-18753 introduces a second set of defaults (in a separate
>> "cassandra_latest.yaml") that enable new features of Cassandra. The
>> objective is two-fold: to be able to test the database in this
>> configuration, and to point potential users that are evaluating the
>> technology to an optimized set of defaults that give a clearer picture of
>> the expected performance of the database for a new user. The objective is to
>> get this configuration into 5.0 to have the extra bit of confidence that we
>> are not releasing (and recommending) options that have not gone through
>> thorough CI.
>>
>> The implementation has already gone through review, but I'd like to get
>> people's opinion on two things:
>> - There are currently a number of test failures when the new options are
>> selected, some of which appear to be genuine problems. Is the community okay
>> with committing the patch before all of these are addressed? This should
>> prevent the introduction of new failures and make sure we don't release
>> before clearing the existing ones.
>> - I'd like to get an opinion on what's suitable wording and documentation
>> for the new defaults set. Currently, the patch proposes adding the following
>> text to the yaml (see https://github.com/apache/cassandra/pull/2896/files):
>> # NOTE:
>> # This file is provided in two versions:
>> # - cassandra.yaml: Contains configuration defaults for a "compatible"
>> #   configuration that operates using settings that are backwards-compatible
>> #   and interoperable with machines running older versions of Cassandra.
>> #   This version is provided to facilitate pain-free upgrades for existing
>> #   users of Cassandra running in production who want to gradually and
>> #   carefully introduce new features.
>> # - cassandra_latest.yaml: Contains configuration defaults that enable
>> #   the latest features of Cassandra, including improved functionality as
>> #   well as higher performance. This version is provided for new users of
>> #   Cassandra who want to get the most out of their cluster, and for users
>> #   evaluating the technology.
>> # To use this version, simply copy this file over cassandra.yaml, or specify
>> # it using the -Dcassandra.config system property, e.g. by running
>> #   cassandra -Dcassandra.config=file:/$CASSANDRA_HOME/conf/cassandra_latest.yaml
>> # /NOTE
>> Does this sound sensible? Should we add a pointer to this defaults set
>> elsewhere in the documentation?
>>
>> Regards,
>> Branimir
>