Hi Andrés,

Non-blocking commentary on my part:

I see the terms "distributed restart" and "rolling restart" used both in
the CEP and in this thread. There are subtle differences between the two
that are worth exploring. IMO a *rolling* restart focuses on
availability, limiting any potential impact on downstream clients.
A *distributed* restart, by contrast, could be read as a coordinated
effort to restart a cluster with optional requirements around availability.
I have worked with some users for whom applying changes quickly is
more important than availability to clients. In some cases this is handled
by bringing down the cluster immediately, applying the change, and using a
fast-path start (seeds first, then all other nodes simultaneously).
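
To make the distinction concrete, here is a rough sketch of how the two
start orders might differ. The names (Node, rolling_plan, fast_path_plan)
are hypothetical and are not taken from the CEP or from Sidecar; each plan
is just a list of batches, where the nodes in a batch are started together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node descriptor, for illustration only.
    name: str
    is_seed: bool = False


def rolling_plan(nodes):
    """Rolling restart: one node per batch, so only one node is down at a time."""
    return [[n.name] for n in nodes]


def fast_path_plan(nodes):
    """Fast-path start after a full stop: seeds first, then all other nodes at once."""
    seeds = [n.name for n in nodes if n.is_seed]
    others = [n.name for n in nodes if not n.is_seed]
    # Skip a batch if it would be empty (e.g. a cluster listed without seeds).
    return [batch for batch in (seeds, others) if batch]


cluster = [Node("n1", True), Node("n2"), Node("n3", True), Node("n4")]
print(rolling_plan(cluster))    # four batches of one node each
print(fast_path_plan(cluster))  # two batches: seeds, then everything else
```

The rolling plan trades speed for availability, while the fast-path plan
assumes the whole cluster is already down and optimizes for time to recover.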

Given the CEP title and implementation, a focus on availability appears to
be the clear goal here. Is there a follow-up goal for a "fast-path" restart
that does not provide the availability guarantee? I may have missed whether
this is already possible in the proposed implementation.

~Chris

Christopher Bradford



On Mon, Sep 29, 2025 at 12:25 PM Andrés Beck-Ruiz <[email protected]>
wrote:

> Hello all, I will be bringing CEP-53 to a vote on Wednesday unless there
> are any more outstanding comments. Thanks!
>
>
> On Wed, Sep 24, 2025 at 5:49 PM Andrés Beck-Ruiz <[email protected]>
> wrote:
>
>> Hey everyone, thanks again for all of the feedback. I've updated CEP-53
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar>
>> to discuss how rolling restarts and future cluster-wide operations can be
>> integrated into the existing operational job framework. For the most part,
>> you can find these updates in the OperationalJob Framework
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework>,
>> API
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-API>,
>> and New or Changed Public Interfaces
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-NeworChangedPublicInterfaces>
>> sections. I also addressed any outstanding comments or questions that were
>> raised in this discussion thread.
>>
>> I'll give folks a few days to look over the CEP again in case there are
>> any other questions, and hope to start the voting process next week.
>>
>> On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <
>> [email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> We would like to propose CEP-53: Cassandra Rolling Restarts via Sidecar (
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar
>>> )
>>>
>>> This CEP builds on CEP-1
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>>> and proposes a design for safe, efficient, and operator-friendly rolling
>>> restarts on Cassandra clusters, as well as an extensible approach for
>>> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>>> leverage this infrastructure in the future to implement upgrade automation.
>>>
>>> We welcome all feedback and discussion. Thank you in advance for your
>>> time and consideration of this proposal!
>>>
>>> Best,
>>> Andrés and Paulo
>>>
>>
