Re: [DISCUSS] CEP 53: Cassandra Rolling Restarts via Sidecar

Andrés Beck-Ruiz Tue, 02 Sep 2025 11:58:21 -0700

Thanks everyone for the feedback. +1 to using the term 'cluster-wide
operations'.


> The only suggestion I have is to keep in mind the pluggability aspect of
> Sidecar. For example, for the Distributed Restart portion of the work, we
> should consider making interfaces that would allow us to potentially move
> the responsibility of keeping the state outside of Cassandra.

Are you referring to tracking the state of a restart job (and cluster-wide
operations in general) outside of sidecar_internal Cassandra tables?
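
To make sure we are talking about the same thing, here is a minimal sketch of what I understand a pluggable state interface to mean. All of the names (OperationStateStore, InMemoryStateStore, save, load) are hypothetical and are not part of the CEP or of Sidecar today; the in-memory class only stands in for any store that lives outside Cassandra.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class OperationStateStore(ABC):
    """Hypothetical storage interface for cluster-wide operation state."""

    @abstractmethod
    def save(self, operation_id: str, state: str) -> None:
        """Persist the latest state of an operation."""

    @abstractmethod
    def load(self, operation_id: str) -> Optional[str]:
        """Return the stored state, or None if the operation is unknown."""


class InMemoryStateStore(OperationStateStore):
    """Assumed stand-in for a store that is not backed by Cassandra tables."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}

    def save(self, operation_id: str, state: str) -> None:
        self._states[operation_id] = state

    def load(self, operation_id: str) -> Optional[str]:
        return self._states.get(operation_id)
```

Under this reading, an implementation backed by the sidecar_internal tables would be one implementation of the interface among several possible ones.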

> What do you think about broadening the scope of the CEP to propose a way
> (API) to perform bulk operations, and propose the current Rolling restarts
> as the first implementation for that bulk operations API? I’m proposing
> this as I see value to reuse this proposal for other bulk operations such
> as enabling CDC (it requires enabling cdc on cassandra.yml and some other
> operations) for better supporting CEP-44.

We propose a way to persist and monitor cluster-wide operations in the new
sidecar_internal system tables
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-CassandraSidecarSystemTables).
I think it would make sense to also generalize the API to apply to
cluster-wide operations. I'm curious about any feedback on whether this
should be a separate API from the current operational job framework and
live under the /cluster resource. The CEP discusses why we didn't propose
using the existing API and how the current framework would need to be
extended
(https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework).

> I’m not quite sold on using a PATCH to move from pending state to running
> state. Quick question, what is the goal of the pending state? I see a PATCH
> operation as modifying part of an object data. In this case, modifying the
> state looks like a change on the operation state, not on its metadata. I’d
> love to hear your thoughts on this one.

The "PENDING" state lets an operator double-check a submitted cluster-wide
operation, which could have unintended consequences, before starting it.
For example, a rolling restart in progress could block other operations on
the cluster that are scheduled or needed, such as replacing a Cassandra
instance. While an operator should be able to abort a restart job, I see
value in having this guard against operator error.

Given that we are applying a partial update to the resource, which in this
context would be the restart job, we chose PATCH for this API.
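
As a rough illustration of that reasoning, here is a minimal sketch of a PATCH-style partial update that moves a job out of "PENDING". It is not the Sidecar implementation: the names RestartJob, JobState, ALLOWED and apply_patch, the "ABORTED" state name, and the transition table are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    ABORTED = "ABORTED"  # assumed name for an aborted job


# Hypothetical transition table: states a PATCH may move a job into.
ALLOWED = {
    JobState.PENDING: {JobState.RUNNING, JobState.ABORTED},
    JobState.RUNNING: {JobState.ABORTED},
    JobState.ABORTED: set(),
}


@dataclass
class RestartJob:
    job_id: str
    state: JobState = JobState.PENDING  # submitted jobs wait for confirmation


def apply_patch(job: RestartJob, patch: dict) -> RestartJob:
    """Apply a partial update; only the 'state' field is patchable here."""
    unknown = set(patch) - {"state"}
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    if "state" in patch:
        target = JobState(patch["state"])  # raises ValueError on unknown state
        if target not in ALLOWED[job.state]:
            raise ValueError(f"cannot move {job.state.value} -> {target.value}")
        job.state = target
    return job
```

In this sketch the request body carries only the field being changed, for example {"state": "RUNNING"}, and the rest of the job resource is left untouched.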

Best,
Andrés


On Tue, Sep 2, 2025 at 12:33 PM Dinesh Joshi <[email protected]> wrote:

> I would like to chime in and say that we need to refine our vocabulary.
> The term 'bulk commands' was used originally in CEP-1. This is my fault
> totally as I originally wrote that down. But over time it has caused
> confusion. I believe 'cluster-wide operations' is a better term to describe
> those operations. We have also used 'Bulk' in the context of CEP-28 which
> means something rather different which leads to confusion. So I propose
> using the term 'cluster-wide operations' for operations that have to be run
> across all nodes in the cluster.
>
> Thanks,
>
> Dinesh
>
>
> On Tue, Sep 2, 2025 at 9:21 AM Bernardo Botella <[email protected]> wrote:
>
>> This is an incredible contribution. Thanks a lot!
>>
>> Now, let me throw some thoughts :-)
>>
>> Rolling restarts is a great example of a broader feature that could be
>> seen as bulk operations on a cluster via Sidecar.
>>
>> What do you think about broadening the scope of the CEP to propose a way
>> (API) to perform bulk operations, and propose the current Rolling restarts
>> as the first implementation for that bulk operations API? I’m proposing
>> this as I see value to reuse this proposal for other bulk operations such
>> as enabling CDC (it requires enabling cdc on cassandra.yml and some other
>> operations) for better supporting CEP-44.
>>
>> I’m not quite sold on using a PATCH to move from pending state to running
>> state. Quick question, what is the goal of the pending state? I see a PATCH
>> operation as modifying part of an object data. In this case, modifying the
>> state looks like a change on the operation state, not on its metadata. I’d
>> love to hear your thoughts on this one.
>>
>> Again, thanks a lot for the contribution!
>> Bernardo
>>
>>
>> > On Aug 30, 2025, at 7:02 AM, Francisco Guerrero <[email protected]> wrote:
>> >
>> > Thanks Andrés for the CEP. This is a great contribution to the project and
>> > aligns with the original intent of the Sidecar stated in CEP-1. I've gone
>> > over the CEP details and it is consistent with the internals of Sidecar.
>> >
>> > The only suggestion I have is to keep in mind the pluggability aspect of
>> > Sidecar. For example, for the Distributed Restart portion of the work, we
>> > should consider making interfaces that would allow us to potentially move
>> > the responsibility of keeping the state outside of Cassandra.
>> >
>> > Best,
>> > - Francisco
>> >
>> > On 2025/08/29 19:56:08 Andrés Beck-Ruiz wrote:
>> >> Hello everyone,
>> >>
>> >> We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar
>> >> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)
>> >>
>> >> This CEP builds off of CEP-1
>> >> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>> >> and proposes a design for safe, efficient, and operator friendly rolling
>> >> restarts on Cassandra clusters, as well as an extensible approach for
>> >> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>> >> leverage this infrastructure in the future to implement upgrade automation.
>> >>
>> >> We welcome all feedback and discussion. Thank you in advance for your time
>> >> and consideration of this proposal!
>> >>
>> >> Best,
>> >> Andrés and Paulo
>> >>
>>
>>
