The hypothetical concern described is potential data resurrection: would you still use resumable bootstrap if you knew that data deleted during those STW pauses was improperly resurrected?
On Wed, Aug 3, 2022 at 2:40 PM Bowen Song via dev <[email protected]> wrote:

> I have benefited from the resumable bootstrap before, and I'm in favour of
> keeping the feature around.
>
> I've had streaming failures due to long STW GC pauses on some
> bootstrapping nodes, and I had to resume the bootstrap once or twice in
> order to get these nodes to finish joining the cluster. They have not
> experienced any more long STW GC pauses since they joined the cluster. I
> imagine I would spend a lot of time tuning the GC parameters in order to get
> these nodes to join if the resumable bootstrapping feature were removed.
> Also, I'm not concerned about race conditions involving repairs, because
> we don't run repairs while we are adding new nodes (to minimize the
> additional load on the cluster).
>
>
> On 03/08/2022 19:46, Josh McKenzie wrote:
>
> Context: https://issues.apache.org/jira/browse/CASSANDRA-17679
>
> From the .yaml comment on the param I was working on adding:
>
> In certain environments, operators may want to disable resumable bootstrap in
> order to avoid potential correctness violations or data loss scenarios.
> Largely this centers around nodes going down during bootstrap, tombstones
> being written, and potential races with repair. By default we leave this on
> as it's been enabled for quite some time, however the option to disable it is
> more palatable now that we have zero copy streaming as that greatly
> accelerates
>
>
> Given zero copy streaming in the system and the general unexplored
> correctness concerns of
> https://issues.apache.org/jira/browse/CASSANDRA-8838, specifically
> pointed out by Jeff here:
> https://issues.apache.org/jira/browse/CASSANDRA-8838?focusedCommentId=16900234&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16900234,
> I've been chatting w/Paulo about this and we've both concluded that the
> functionality should be made configurable, default off (?), deprecated in
> 4.2 and then completely removed next.
>
> - First: anyone have any concerns with the general arc of "remove
>   resumable bootstrap and decommission"?
> - Second: Should we leave them enabled by default in 4.2 or disabled?
> - Third: Should we consider revisiting older branches with this
>   functionality and making it toggle-able?
>
> ~Josh
