Hi Chris, yes: the focus of the initial design is ensuring that Sidecar
can perform a restart while maintaining availability and without affecting
any downstream clients.

For the initial implementation, I think availability should be
prioritized. However, I do think it should be possible for an operator to
skip health checks and/or allow up to all Cassandra instances to be
restarted in parallel if users value restart speed over availability. This
could perhaps be implemented with an "unsafeRestart" flag.

I think this should be left out of scope for the initial implementation,
but it can be added in a future iteration.

On Mon, Sep 29, 2025 at 2:15 PM Christopher Bradford wrote:

> Hi Andrés,
>
> Non-blocking commentary on my part:
>
> I see the terms "distributed restart" and "rolling restart" used both in
> the CEP and in this thread. There are subtle differences between the two
> which should be explored. IMO a *rolling* restart focuses on
> availability, limiting any potential impact to downstream clients.
> Separately, a *distributed* restart could be perceived as a coordinated
> effort to restart a cluster with optional requirements around availability.
> I have worked with some users for whom applying changes quickly is more
> important than availability to clients. In some cases this is handled by
> bringing down the cluster immediately, applying the change, and using a
> fast-path start (seeds first, then all other nodes simultaneously).
>
> Given the CEP title and implementation, the focus on availability appears
> to be the clear goal here. Is there a follow-up goal for a "fast-path"
> restart that does not provide the availability guarantee? I may have
> missed whether this is possible today in the proposed implementation.
>
> ~Chris
>
> Christopher Bradford
>
> On Mon, Sep 29, 2025 at 12:25 PM Andrés Beck-Ruiz wrote:
>
>> Hello all, I will be bringing CEP-53 to a vote on Wednesday unless there
>> are any more outstanding comments. Thanks!
>>
>>
>> On Wed, Sep 24, 2025 at 5:49 PM Andrés Beck-Ruiz wrote:
>>
>>> Hey everyone, thanks again for all of the feedback. I've updated CEP-53
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar>
>>> to discuss how rolling restarts and future cluster-wide operations can be
>>> integrated into the existing operational job framework. For the most part,
>>> you can find these updates in the OperationalJob Framework
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework>,
>>> API
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-API>,
>>> and New or Changed Public Interfaces
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-NeworChangedPublicInterfaces>
>>> sections. I also addressed the outstanding comments and questions
>>> raised in this discussion thread.
>>>
>>> I'll give folks a few days to look over the CEP again in case there are
>>> any other questions, and hope to start the voting process next week.
>>>
>>> On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> We would like to propose CEP-53: Cassandra Rolling Restarts via Sidecar
>>>> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)
>>>>
>>>> This CEP builds on CEP-1
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>>>> and proposes a design for safe, efficient, and operator-friendly rolling
>>>> restarts on Cassandra clusters, as well as an extensible approach for
>>>> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>>>> leverage this infrastructure in the future to implement upgrade automation.
>>>>
>>>> We welcome all feedback and discussion. Thank you in advance for your
>>>> time and consideration of this proposal!
>>>>
>>>> Best,
>>>> Andrés and Paulo
>>>>
>>>

Reply via email to