Hi Andrés, thanks for this comprehensive CEP. I have a query about how nodes are represented in the various tables and API responses.
In the sidecar_internal.cluster_ops_node_state table, "We store the node UUID instead of IP address to guarantee that the correct Cassandra nodes are restarted in case any node moves to another host." However, in the main sidecar_internal.cluster_ops table the nodes participating in the operation are represented as a list of IP addresses. Likewise, in the sample HTTP responses nodes always appear to be identified by their address, not ID.

It's true that operators are more accustomed to dealing with addresses than IDs, but it's equally the case that the address is not a persistent node identifier, as noted in this CEP. For that reason, in C* trunk the emphasis is shifting to lean more on node IDs, so I feel it would be a retrograde step to introduce new APIs which include only addresses. Could the schema and API responses in this CEP be unified in some way, either by using IDs exclusively or by extending the node representation to something that can incorporate both an ID and an address?

Among other things, Accord relies on persistent node IDs for correctness, and a unique and persistent identifier is now assigned to every node as a prerequisite for it joining the cluster. This ID is a simple integer and is encoded into the node's Host ID, which is the UUID available in various system tables, gossip state and nodetool commands. The initial thinking behind encoding it in the Host ID was to maintain compatibility with existing tooling, but at some point we will start to expose the ID directly in more places. Right now there is a vtable which shows the IDs directly, system_views.cluster_metadata_directory.
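For illustration only, here is a minimal sketch of the second option, a node representation that carries both identifiers. The type, field and method names below are hypothetical and not part of the CEP or the existing Sidecar code:

```java
import java.net.InetSocketAddress;
import java.util.UUID;

// Hypothetical sketch: a node reference carrying both the persistent Host ID
// (the UUID that encodes the integer node ID) and the node's current address.
// The cluster_ops table and the HTTP responses could share a shape like this
// instead of a bare IP string, keeping the address purely informational while
// the operation stays keyed on the persistent identifier.
public record NodeIdentity(UUID hostId, InetSocketAddress address) {

    // Illustrative JSON shape for API responses, e.g.
    // {"hostId": "<host-id-uuid>", "address": "10.0.0.1:7000"}
    public String toJson() {
        return String.format("{\"hostId\": \"%s\", \"address\": \"%s:%d\"}",
                hostId, address.getHostString(), address.getPort());
    }
}
```

Something along these lines would keep addresses visible to operators while leaving the ID as the identifier the framework relies on for correctness.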
Thanks,
Sam

> On 8 Sep 2025, at 02:36, Andrés Beck-Ruiz <[email protected]> wrote:
>
> Hello all,
>
> Thanks for the feedback. I agree with the suggestions that operation state storage should be pluggable, with an initial implementation leveraging Cassandra as we have proposed. I have made edits to the Distributed Restart Design section in the CEP to reflect this.
>
> > As for the API, I think the question that needs to be answered is if it is worthwhile to have a distinction between single-node operations and cluster-wide operations. For example, if I wanted to restart a single node using the API proposed in CEP-53, I could submit a restart job with a single node in the "nodes" list. This provides API simplicity at the cost of ergonomics. It also means that all inter-sidecar communication would go through the proposed cluster_ops_node_state table. Personally, I think these are acceptable tradeoffs to provide a unified API for operations that is simpler for a user or operator to use and learn.
>
> I agree that we should provide a unified API that does not distinguish between single-node and cluster-wide operations. I think the benefit of API simplicity from a development and client perspective outweighs the cost of ergonomics.
>
> > A small question from my side: I see that the underlying assumption is that Sidecar is able to query Cassandra instances before bouncing/recognizing the bounce. What if it could not communicate with the Cassandra instance (e.g., binary protocol disabled, C* process experiencing issues, or C* process starting as part of a new DC)?
>
> This would fall under scenario #2 in the Error Handling section of the CEP. If a Sidecar instance can't communicate with Cassandra, after a configurable timeout and number of retries, the Sidecar instance should mark the job as failed.
>
> > 1. Have we considered introducing the concept of a datacenter alongside cluster? I imagine there will be cases where a user wants to perform a rolling restart on a single datacenter rather than across the entire cluster.
>
> I think this could be added in the future, but for this initial implementation an operator would submit the nodes that are part of a datacenter to restart that datacenter. I prefer providing a unified API that can handle single-node and cluster- (or datacenter-) wide operations over separate APIs which might be easier to use in isolation but complicate development and discoverability.
>
> > 2. Do we see this framework extending to other cluster- or datacenter-wide operations, such as scale-up/scale-down operations, or backups/restores, or nodetool rebuilds run as part of adding a new datacenter?
>
> Yes, our goal with this design is that it is extensible for future operations, as well as currently supported operations (such as node decommissions) that already exist in Sidecar. In the initial Cassandra storage implementation, all inter-sidecar communication and operation tracking could occur in the proposed cluster_ops_node_state table.
>
> > The design seems focused on cluster/availability signals (ring stable, peers up), which is a great start, but doesn't mention pluggable workload signals like: 1) compaction load (nodetool compactionstats), 2) netstats activity (nodetool netstats), 3) hints backlog / streaming pending flushes or memtable pressure. Since restarting during heavy compaction/hints can add risk, are these kinds of workload-aware checks in scope for the MVP, or considered future work?
>
> I agree that the health check should be pluggable as well; this was also proposed in CEP-1. For the first iteration of rolling restarts, we are thinking of providing a health check implementation that checks for all other Cassandra peers being up, and future work can add more robust health checks.
>
> Best,
> Andrés
>
> On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
> Hello everyone,
>
> We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)
>
> This CEP builds off of CEP-1 and proposes a design for safe, efficient, and operator-friendly rolling restarts on Cassandra clusters, as well as an extensible approach for persisting future cluster-wide operations in Cassandra Sidecar. We hope to leverage this infrastructure in the future to implement upgrade automation.
>
> We welcome all feedback and discussion. Thank you in advance for your time and consideration of this proposal!
>
> Best,
> Andrés and Paulo
