Re: [DISCUSS] Enabling check_data_resurrection by default in cassandra_latest.yaml

Isaac Reath Tue, 21 Apr 2026 09:59:43 -0700

Thank you to everyone who contributed to CASSANDRA-21293 and
CASSANDRA-21290 to incorporate Chris's feedback and improve the reliability
of this feature!


Coming back to defaults, Chris mentions using a default minimum value of 3
hours as the amount of time a node is allowed to be down if any tables in
the cluster have a very small or 0 gc_grace_seconds. I think this makes
sense as a default value for the cassandra_latest.yaml implementation. Are
there any concerns with using 3 hours or suggestions for an
alternative value?
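
As a rough illustration of the proposal above, the sketch below shows how a
minimum threshold could interact with gc_grace_seconds. It is illustrative
Python, not Cassandra's implementation: the function names, the constant,
and the choice to compare against the smallest gc_grace_seconds across all
tables are assumptions made for the example.

```python
from datetime import timedelta

# Hypothetical floor on allowed downtime; 3 hours is the value proposed in this thread.
MIN_ALLOWED_DOWNTIME = timedelta(hours=3)


def allowed_downtime(gc_grace_seconds_per_table, minimum=MIN_ALLOWED_DOWNTIME):
    """Return the longest downtime tolerated before startup would be refused.

    Assumption: the check is driven by the smallest gc_grace_seconds among
    the tables considered, raised to at least `minimum`.
    """
    values = list(gc_grace_seconds_per_table)
    if not values:
        # No tables to check, so only the floor applies.
        return minimum
    smallest = timedelta(seconds=min(values))
    return max(smallest, minimum)


def should_block_startup(downtime, gc_grace_seconds_per_table,
                         minimum=MIN_ALLOWED_DOWNTIME):
    """True when the node was down longer than the tolerated window."""
    return downtime > allowed_downtime(gc_grace_seconds_per_table, minimum)
```

With a floor like this, a table with gc_grace_seconds of 0 would no longer
block every restart; only downtime longer than 3 hours would.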

On Wed, Mar 25, 2026 at 12:42 PM Isaac Reath <[email protected]> wrote:

> Happy to add in the docs into the PR for CASSANDRA-21247 if there's
> nothing already available.
>
> On Wed, Mar 25, 2026 at 12:36 PM Štefan Miklošovič <[email protected]>
> wrote:
>
>> Hi Chris,
>>
>> If you have some time to put a patch together with these improvements
>> that would be great. I can definitely review.
>>
>> Regards
>>
>> On Wed, Mar 25, 2026 at 5:24 PM Chris Lohfink <[email protected]>
>> wrote:
>> >
>> > We enabled this across our fleet. We did make a couple of small tweaks
>> > we might want to consider:
>> > 1. (important one) If the process shuts down mid-write, you can end up
>> > with a corrupt json hint file, and then the process refuses to start up.
>> > We added a fallback to the timestamp of the file and an atomic write.
>> > 2. We made it a minimum of 3 hours, because we do have a lot of things
>> > in the fleet that are set to 0 (or very short) gc_grace and that we
>> > don't care about. There should probably be a setting for a minimum
>> > threshold; otherwise operators can't really do anything other than
>> > delete the heartbeat after every restart.
>> > 3. Add some documentation on how to evaluate and delete the heartbeat if
>> > it's blocking startup.
>> >
>> > On Wed, Mar 25, 2026 at 10:17 AM Štefan Miklošovič <[email protected]> wrote:
>> >>
>> >> Hi Isaac,
>> >>
>> >> I am fine with having that property set to true in
>> >> cassandra_latest.yaml only.
>> >>
>> >> Regards
>> >>
>> >> On Tue, Mar 24, 2026 at 10:05 PM Isaac Reath <[email protected]> wrote:
>> >> >
>> >> > Hi all,
>> >> >
>> >> > There’s ongoing interest in preventing nodes from starting after
>> >> > being offline longer than gc_grace_seconds, to avoid data
>> >> > resurrection issues.
>> >> >
>> >> > This is already supported via `check_data_resurrection.enabled`
>> >> > (added in 4.1 via CASSANDRA-17180), but it remains disabled by
>> >> > default. Recent discussion in CASSANDRA-21221 suggests that operators
>> >> > may be unaware of this setting and end up reimplementing similar
>> >> > safeguards themselves.
>> >> >
>> >> > Given that this feature has now been available in 4.1 and 5.0, I'd
>> >> > like to propose enabling it by default in cassandra_latest.yaml for
>> >> > 6.0. Are there any concerns with making this change?
>> >> >
>> >> > Isaac
>>
>
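
Chris's first point in the quoted message (write atomically, and fall back
to the file's timestamp when the JSON is corrupt) can be sketched as
follows. This is a minimal illustration in Python, not the patch that went
into Cassandra: the function names and the `last_heartbeat_millis` field
are hypothetical, and it assumes the temp file and the target live on the
same filesystem.

```python
import json
import os
import tempfile
import time


def write_heartbeat(path, now_millis=None):
    """Write via a temp file plus rename so a crash never leaves a partial file."""
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".heartbeat-")
    try:
        with os.fdopen(fd, "w") as tmp:
            # Field name is hypothetical, chosen for this sketch only.
            json.dump({"last_heartbeat_millis": now_millis}, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_heartbeat_millis(path):
    """Return the recorded heartbeat, or the file's mtime if the content is unreadable."""
    try:
        with open(path) as f:
            return int(json.load(f)["last_heartbeat_millis"])
    except (ValueError, KeyError, TypeError):
        # Corrupt or unexpected JSON: fall back to the last modification time.
        return int(os.path.getmtime(path) * 1000)
```

The fallback keeps a truncated file from blocking startup outright, while
the modification time still gives a reasonable estimate of when the node
was last running.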
