Thanks everyone for the feedback. +1 to using the term 'cluster-wide operations'.
> The only suggestion I have is to keep in mind the pluggability aspect of
> Sidecar. For example, for the Distributed Restart portion of the work, we
> should consider making interfaces that would allow us to potentially move
> the responsibility of keeping the state outside of Cassandra.

Are you referring to tracking the state of a restart job (and cluster-wide operations in general) outside of the sidecar_internal Cassandra tables?

> What do you think about broadening the scope of the CEP to propose a way
> (API) to perform bulk operations, and propose the current Rolling restarts
> as the first implementation for that bulk operations API? I’m proposing
> this as I see value to reuse this proposal for other bulk operations such
> as enabling CDC (it requires enabling cdc on cassandra.yml and some other
> operations) for better supporting CEP-44.

The CEP already proposes a way to persist and monitor cluster-wide operations in the new sidecar_internal system tables (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-CassandraSidecarSystemTables). I think it would make sense to also generalize the API so that it applies to cluster-wide operations, not only restarts. I'm curious about any feedback on whether this should be a separate API from the current operational job framework and live under the /cluster resource. The CEP discusses why we didn't propose using the existing API and how the current framework would need to be extended (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework).

> I’m not quite sold on using a PATCH to move from pending state to running
> state. Quick question, what is the goal of the pending state? I see a PATCH
> operation as modifying part of an object data. In this case, modifying the
> state looks like a change on the operation state, not on its metadata. I’d
> love to hear your thoughts on this one.

The "PENDING" state lets an operator double-check a submitted cluster-wide operation, which could have unintended consequences, before starting it. For example, a rolling restart could block other operations on the cluster that are scheduled or needed, such as replacing a Cassandra instance. While an operator should be able to abort a restart job, I see value in having this guard against operator error. Moving the job from pending to running is a partial update to the resource, which in this context is the restart job, so we chose PATCH for this API.

Best,
Andrés

On Tue, Sep 2, 2025 at 12:33 PM Dinesh Joshi <[email protected]> wrote:

> I would like to chime in and say that we need to refine our vocabulary.
> The term 'bulk commands' was used originally in CEP-1. This is my fault
> totally as I originally wrote that down. But over time it has caused
> confusion. I believe 'cluster-wide operations' is a better term to describe
> those operations. We have also used 'Bulk' in the context of CEP-28 which
> means something rather different which leads to confusion. So I propose
> using the term 'cluster-wide operations' for operations that have to be run
> across all nodes in the cluster.
>
> Thanks,
>
> Dinesh
>
>
> On Tue, Sep 2, 2025 at 9:21 AM Bernardo Botella <
> [email protected]> wrote:
>
>> This is an incredible contribution. Thanks a lot!
>>
>> Now, let me throw some thoughts :-)
>>
>> Rolling restarts is a great example of a broader feature that could be
>> seen as bulk operations on a cluster via Sidecar.
>>
>> What do you think about broadening the scope of the CEP to propose a way
>> (API) to perform bulk operations, and propose the current Rolling restarts
>> as the first implementation for that bulk operations API?
>> I’m proposing this as I see value to reuse this proposal for other bulk
>> operations such as enabling CDC (it requires enabling cdc on
>> cassandra.yml and some other operations) for better supporting CEP-44.
>>
>> I’m not quite sold on using a PATCH to move from pending state to running
>> state. Quick question, what is the goal of the pending state? I see a PATCH
>> operation as modifying part of an object data. In this case, modifying the
>> state looks like a change on the operation state, not on its metadata. I’d
>> love to hear your thoughts on this one.
>>
>> Again, thanks a lot for the contribution!
>> Bernardo
>>
>>
>> > On Aug 30, 2025, at 7:02 AM, Francisco Guerrero <[email protected]> wrote:
>> >
>> > Thanks Andrés for the CEP. This is a great contribution to the project and
>> > aligns with the original intent of the Sidecar stated in CEP-1. I've gone
>> > over the CEP details and it is consistent with the internals of Sidecar.
>> >
>> > The only suggestion I have is to keep in mind the pluggability aspect of
>> > Sidecar. For example, for the Distributed Restart portion of the work, we
>> > should consider making interfaces that would allow us to potentially move
>> > the responsibility of keeping the state outside of Cassandra.
>> >
>> > Best,
>> > - Francisco
>> >
>> > On 2025/08/29 19:56:08 Andrés Beck-Ruiz wrote:
>> >> Hello everyone,
>> >>
>> >> We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (
>> >> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar
>> >> )
>> >>
>> >> This CEP builds off of CEP-1
>> >> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>> >> and proposes a design for safe, efficient, and operator friendly rolling
>> >> restarts on Cassandra clusters, as well as an extensible approach for
>> >> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>> >> leverage this infrastructure in the future to implement upgrade automation.
>> >>
>> >> We welcome all feedback and discussion. Thank you in advance for your time
>> >> and consideration of this proposal!
>> >>
>> >> Best,
>> >> Andrés and Paulo
>> >>
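
For readers following the PATCH discussion in this thread, here is a minimal sketch of how a pending-state guard could behave when a partial update changes only the job's state. It is an illustration, not the CEP's API: the names RestartJob, JobState, ALLOWED_TRANSITIONS and apply_patch are hypothetical, and the ABORTED state is assumed from the remark that an operator should be able to abort a restart job.

```python
from dataclasses import dataclass
from enum import Enum


class JobState(Enum):
    # PENDING and RUNNING come from the thread; ABORTED is an assumed name.
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    ABORTED = "ABORTED"


# Hypothetical transition table: a job must be explicitly moved out of
# PENDING before it runs, and it can be aborted from either live state.
ALLOWED_TRANSITIONS = {
    JobState.PENDING: {JobState.RUNNING, JobState.ABORTED},
    JobState.RUNNING: {JobState.ABORTED},
    JobState.ABORTED: set(),
}


@dataclass
class RestartJob:
    job_id: str
    state: JobState = JobState.PENDING  # every submitted job starts as PENDING


def apply_patch(job: RestartJob, patch: dict) -> RestartJob:
    """Apply a partial update; in this sketch only 'state' may be patched."""
    unknown = set(patch) - {"state"}
    if unknown:
        raise ValueError(f"unsupported fields in patch: {sorted(unknown)}")
    if "state" in patch:
        target = JobState(patch["state"])  # raises ValueError on unknown names
        if target not in ALLOWED_TRANSITIONS[job.state]:
            raise ValueError(
                f"cannot move job {job.job_id} from "
                f"{job.state.value} to {target.value}"
            )
        job.state = target
    return job
```

In this sketch, submitting a job leaves it in PENDING, and nothing runs until an operator sends a patch such as {"state": "RUNNING"}; a patch that tries to move a running job back to PENDING is rejected.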
