Hi Chris,

Yes, the focus of the initial design is ensuring that Sidecar can perform a rolling restart while maintaining availability and without affecting downstream clients.

For the initial implementation, I think availability should be prioritized. That said, I do think it should be possible for an operator to skip health checks and/or restart up to all Cassandra instances in parallel when restart speed matters more than availability. This could be implemented with something like an "unsafeRestart" flag. I think this should be left out of scope for the initial implementation, but it can be added in a future iteration.

On Mon, Sep 29, 2025 at 2:15 PM Christopher Bradford <[email protected]> wrote:

> Hi Andrés,
>
> Non-blocking commentary on my part:
>
> I see the terms "distributed restart" and "rolling restart" used both in
> the CEP and in this thread. There are subtle differences between the two
> which should be explored. IMO a "*rolling*" restart focuses on
> availability, limiting any potential impact to downstream clients.
> Separately, a *distributed* restart could be perceived as a coordinated
> effort to restart a cluster with optional requirements around availability.
> I have worked with some users where the application of changes quickly is
> more important than availability to clients. In some cases this is handled
> by bringing down the cluster immediately, applying the change, and using a
> fast-path start (seeds first, then all other nodes simultaneously).
>
> Given the CEP title and implementation, focus on availability appears to
> be the clear goal here. Is there a follow-up goal for a "fast-path" restart
> which does not provide the availability guarantee? Maybe I missed if this
> is possible today in the proposed implementation.
>
> ~Chris
>
> Christopher Bradford
>
> On Mon, Sep 29, 2025 at 12:25 PM Andrés Beck-Ruiz <[email protected]> wrote:
>
>> Hello all, I will be bringing CEP-53 to a vote on Wednesday unless there
>> are any more outstanding comments. Thanks!
>>
>> On Wed, Sep 24, 2025 at 5:49 PM Andrés Beck-Ruiz <[email protected]> wrote:
>>
>>> Hey everyone, thanks again for all of the feedback. I've updated CEP-53
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar>
>>> to discuss how rolling restarts and future cluster-wide operations can be
>>> integrated into the existing operational job framework. For the most part,
>>> you can find these updates in the OperationalJob Framework
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-OperationalJobFramework>,
>>> API
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-API>,
>>> and New or Changed Public Interfaces
>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-NeworChangedPublicInterfaces>
>>> sections. I also addressed any outstanding comments or questions that were
>>> raised in this discussion thread.
>>>
>>> I'll give folks a few days to look over the CEP again in case there are
>>> any other questions, and hope to start the voting process next week.
>>>
>>> On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar
>>>> (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)
>>>>
>>>> This CEP builds off of CEP-1
>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-1%3A+Apache+Cassandra+Management+Process%28es%29+-+Deprecated>
>>>> and proposes a design for safe, efficient, and operator friendly rolling
>>>> restarts on Cassandra clusters, as well as an extensible approach for
>>>> persisting future cluster-wide operations in Cassandra Sidecar. We hope to
>>>> leverage this infrastructure in the future to implement upgrade automation.
>>>>
>>>> We welcome all feedback and discussion. Thank you in advance for your
>>>> time and consideration of this proposal!
>>>>
>>>> Best,
>>>> Andrés and Paulo
