> so can cause repairs to deadlock forever

Small correction: I finished fixing the tests in CASSANDRA-19042 and we don’t 
deadlock; we time out and fail the repair if any of those messages are dropped.

> On Feb 13, 2024, at 11:04 AM, David Capwell <dcapw...@apple.com> wrote:
> 
>> and to point potential users that are evaluating the technology to an 
>> optimized set of defaults
> 
> Left this comment in the GH… is there a reason all guardrails and reliability 
> (aka repair retries) configs are off by default?  They are off by default in 
> the normal config for backwards compatibility reasons, but if we are defining 
> a config saying what we recommend, we should enable these things by default 
> IMO.
> 
>> There are currently a number of test failures when the new options are 
>> selected, some of which appear to be genuine problems. Is the community okay 
>> with committing the patch before all of these are addressed?
> 
> I was tagged on CASSANDRA-19042. The paxos repair message handling does not 
> have the repair reliability improvements that 5.0 has, so can cause repairs to 
> deadlock forever (same as current 4.x repairs).  Bringing these up to par 
> with the rest of repair would be very much welcome (they are also lacking 
> visibility, so need to fall back to heap dumps to see what’s going on; same as 
> 4.0.x but not 4.1.x), but I doubt I have cycles to do that…. This refactor is 
> not 100% trivial as it has fun subtle concurrency issues to address (message 
> retries and deduping), and making sure this logic works with the existing 
> repair simulation tests does require refactoring how the paxos cleanup state 
> is tracked, which could have subtle consequences.
> 
> I do think this should be fixed, but should it block 5.0?  Not sure… will 
> leave to others….
> 
> Should we merge the configs breaking these tests?  No…. When we have failing 
> tests, people do not spend the time to figure out whether their logic caused a 
> regression and merge anyway, making things more unstable… so merging failing 
> tests leads to people merging even more failing tests...
> 
>> On Feb 13, 2024, at 8:41 AM, Branimir Lambov <blam...@apache.org> wrote:
>> 
>> Hi All,
>> 
>> CASSANDRA-18753 introduces a second set of defaults (in a separate 
>> "cassandra_latest.yaml") that enable new features of Cassandra. The 
>> objective is two-fold: to be able to test the database in this 
>> configuration, and to point potential users that are evaluating the 
>> technology to an optimized set of defaults that give a clearer picture of 
>> the expected performance of the database for a new user. The objective is to 
>> get this configuration into 5.0 to have the extra bit of confidence that we 
>> are not releasing (and recommending) options that have not gone through 
>> thorough CI.
>> 
>> The implementation has already gone through review, but I'd like to get 
>> people's opinion on two things:
>> - There are currently a number of test failures when the new options are 
>> selected, some of which appear to be genuine problems. Is the community okay 
>> with committing the patch before all of these are addressed? This should 
>> prevent the introduction of new failures and make sure we don't release 
>> before clearing the existing ones.
>> - I'd like to get an opinion on what's suitable wording and documentation 
>> for the new defaults set. Currently, the patch proposes adding the following 
>> text to the yaml (see https://github.com/apache/cassandra/pull/2896/files):
>> # NOTE:
>> #   This file is provided in two versions:
>> #     - cassandra.yaml: Contains configuration defaults for a "compatible"
>> #       configuration that operates using settings that are backwards-compatible
>> #       and interoperable with machines running older versions of Cassandra.
>> #       This version is provided to facilitate pain-free upgrades for existing
>> #       users of Cassandra running in production who want to gradually and
>> #       carefully introduce new features.
>> #     - cassandra_latest.yaml: Contains configuration defaults that enable
>> #       the latest features of Cassandra, including improved functionality as
>> #       well as higher performance. This version is provided for new users of
>> #       Cassandra who want to get the most out of their cluster, and for users
>> #       evaluating the technology.
>> #       To use this version, simply copy this file over cassandra.yaml, or specify
>> #       it using the -Dcassandra.config system property, e.g. by running
>> #         cassandra -Dcassandra.config=file:/$CASSANDRA_HOME/conf/cassandra_latest.yaml
>> # /NOTE
>> Does this sound sensible? Should we add a pointer to this defaults set 
>> elsewhere in the documentation?
>> 
>> Regards,
>> Branimir
> 
