Hi Andrés,

Non-blocking commentary on my part:

I see the terms "distributed restart" and "rolling restart" used both in the CEP and in this thread. There are subtle differences between the two that should be explored. IMO a *rolling* restart focuses on availability, limiting any potential impact to downstream clients. A *distributed* restart, by contrast, could be perceived as a coordinated effort to restart a cluster with optional requirements around availability.

I have worked with some users for whom applying changes quickly is more important than availability to clients. In some cases this is handled by bringing down the cluster immediately, applying the change, and using a fast-path start (seeds first, then all other nodes simultaneously).

Given the CEP title and implementation, availability appears to be the clear goal here. Is there a follow-up goal for a "fast-path" restart that does not provide the availability guarantee? I may have missed whether this is already possible in the proposed implementation.

~Chris

Christopher Bradford

On Mon, Sep 29, 2025 at 12:25 PM Andrés Beck-Ruiz <[email protected]> wrote:

> Hello all, I will be bringing CEP-53 to a vote on Wednesday unless there
> are any more outstanding comments. Thanks!
>
> On Wed, Sep 24, 2025 at 5:49 PM Andrés Beck-Ruiz <[email protected]>
> wrote:
>
>> Hey everyone, thanks again for all of the feedback. I've updated CEP-53
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar>
>> to discuss how rolling restarts and future cluster-wide operations can be
>> integrated into the existing operational job framework. For the most part,
>> you can find these updates in the OperationalJob Framework
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework>,
>> API
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-API>,
>> and New or Changed Public Interfaces
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-NeworChangedPublicInterfaces>
>> sections. I also addressed any outstanding comments or questions that were
>> raised in this discussion thread.
>>
>> I'll give folks a few days to look over the CEP again in case there are
>> any other questions, and hope to start the voting process next week.
>>
>> On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <
>> [email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (
>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar
>>> )
>>>
>>> This CEP builds off of CEP-1
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>>> and proposes a design for safe, efficient, and operator friendly rolling
>>> restarts on Cassandra clusters, as well as an extensible approach for
>>> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>>> leverage this infrastructure in the future to implement upgrade automation.
>>>
>>> We welcome all feedback and discussion. Thank you in advance for your
>>> time and consideration of this proposal!
>>>
>>> Best,
>>> Andrés and Paulo
>>>
>>
