Our original thinking was that we could store node UUIDs but return IP addresses so that operators can more easily identify Cassandra nodes, but I recognize that this could also cause confusion since the address is not a persistent node identifier. I agree that the benefit of unifying the API and schema by using node UUIDs exclusively outweighs the cost in API ergonomics. I can update the CEP to reflect this unless there are differing opinions.
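For illustration, a restart job response that identifies participants by host ID only might look roughly like the sketch below. The field names and values are placeholders I'm using for discussion, not the exact schema in the CEP:

    {
      "jobId": "9c0d7a3e-5b1f-4e2a-8d6c-1f2e3a4b5c6d",
      "operation": "restart",
      "status": "IN_PROGRESS",
      "nodes": [
        { "hostId": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "state": "RESTARTED" },
        { "hostId": "7b1e2d3c-4f5a-4b6c-8d9e-0a1b2c3d4e5f", "state": "PENDING" }
      ]
    }

Clients that need addresses for display could resolve host IDs against Cassandra's system tables rather than having the operations API carry both.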
Best,
Andrés

On Mon, Sep 8, 2025 at 5:03 AM Sam Tunnicliffe <[email protected]> wrote:

> Hi Andrés, thanks for this comprehensive CEP.

> I have a query about the representation of nodes in the various tables and responses.

> In the sidecar_internal.cluster_ops_node_state table, "We store the node UUID instead of IP address to guarantee that the correct Cassandra nodes are restarted in case any node moves to another host." However, in the main sidecar_internal.cluster_ops table the nodes participating in the operation are represented as a list of IP addresses. Likewise, in the sample HTTP responses nodes always appear to be identified by their address, not ID.

> It's true that operators are more accustomed to dealing with addresses than IDs but it's equally the case that the address is not a persistent node identifier, as noted in this CEP. For that reason, in C* trunk the emphasis is shifting to lean more on node IDs, so I feel it would be a retrograde step to introduce new APIs which include only addresses. Could the schema and API responses in this CEP be unified in some way, either by using IDs exclusively or by extending the node representation to something that can incorporate both an ID and address?

> Among other things, Accord relies on persistent node IDs for correctness and a unique and persistent identifier is now assigned to every node as a prerequisite for it joining the cluster. This ID is a simple integer and is encoded into the node's Host ID, which is the UUID available in various system tables, gossip state and nodetool commands. The initial thinking behind encoding in the Host ID was to maintain compatibility with existing tooling but at some point we will start to expose the ID directly in more places. Right now there is a vtable which shows the IDs directly, system_views.cluster_metadata_directory.

> Thanks,
> Sam

> On 8 Sep 2025, at 02:36, Andrés Beck-Ruiz <[email protected]> wrote:

> > Hello all,

> > Thanks for the feedback. I agree with the suggestions that operation state storage should be pluggable, with an initial implementation leveraging Cassandra as we have proposed. I have made edits to the Distributed Restart Design section in the CEP to reflect this.

> > > As for the API, I think the question that needs to be answered is if it is worthwhile to have a distinction between single-node operations and cluster-wide operations. For example, if I wanted to restart a single node using the API proposed in CEP-53, I could submit a restart job with a single node in the "nodes" list. This provides API simplicity at the cost of ergonomics. It also means that all inter-sidecar communication would go through the proposed cluster_ops_node_state table. Personally, I think these are acceptable tradeoffs to provide a unified API for operations that is simpler for a user or operator to use and learn.

> > I agree that we should provide a unified API that does not distinguish between single-node and cluster-wide operations. I think the benefit of API simplicity from a development and client perspective outweighs the cost of ergonomics.

> > > A small question from my side: I see that the underlying assumption is that Sidecar is able to query Cassandra instances before bouncing/recognizing the bounce. What if it could not communicate with the Cassandra instance (e.g., binary protocol disabled, C* process experiencing issues, or C* process starting as part of a new DC)?

> > This would fall under scenario #2 in the Error Handling section of the CEP. If a Sidecar instance can’t communicate with Cassandra, after a configurable timeout and number of retries, the Sidecar instance should mark the job as failed.

> > > 1. Have we considered introducing the concept of a datacenter alongside cluster? I imagine there will be cases where a user wants to perform a rolling restart on a single datacenter rather than across the entire cluster.

> > I think this could be added in the future, but for this initial implementation an operator would submit the nodes that are part of a datacenter to restart a datacenter. I prefer providing a unified API that can handle single node and cluster (or datacenter) wide operations over separate APIs which might be easier to use in isolation but complicate development and discoverability.

> > > 2. Do we see this framework extending to other cluster- or datacenter-wide operations, such as scale-up/scale-down operations, or backups/restores, or nodetool rebuilds run as part of adding a new datacenter?

> > Yes, our goal with this design is that it is extensible for future operations, as well as currently supported operations (such as node decommissions) that already exist in Sidecar. In the initial Cassandra storage implementation, all inter-sidecar communication and operation tracking could occur in the proposed cluster_ops_node_state table.

> > > The design seems focused on cluster/availability signals (ring stable, peers up), which is a great start, but doesn't mention pluggable workload signals like: 1) compaction load (nodetool compactionstats) 2) netstats activity (nodetool netstats) 3) hints backlog / streaming pending flushes or memtable pressure. Since restarting during heavy compaction/hints can add risk, are these kinds of workload-aware checks in scope for the MVP, or considered future work?

> > I agree that the health check should be pluggable as well; this was also proposed in CEP-1. For the first iteration of rolling restarts, we are thinking of providing a health check implementation that checks for all other Cassandra peers being up, and future work can add more robust health checks.

> > Best,
> > Andrés

> > On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
> > Hello everyone,

> > We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)

> > This CEP builds off of CEP-1 and proposes a design for safe, efficient, and operator friendly rolling restarts on Cassandra clusters, as well as an extensible approach for persisting future cluster-wide operations in Cassandra Sidecar. We hope to leverage this infrastructure in the future to implement upgrade automation.

> > We welcome all feedback and discussion. Thank you in advance for your time and consideration of this proposal!

> > Best,
> > Andrés and Paulo
