Our original thinking was that we could store node UUIDs but return IP addresses so that operators can more easily identify Cassandra nodes, but I recognize that this could also cause confusion since the address is not a persistent node identifier. I agree that the benefit of unifying the API and schema by using node UUIDs exclusively outweighs the cost in API ergonomics. I can update the CEP to reflect this unless there are differing opinions.
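For illustration, a restart job response that identifies participants by host ID only might look roughly like the sketch below. The field names and values are placeholders I'm using for discussion, not the exact schema in the CEP:

    {
      "jobId": "9c0d7a3e-5b1f-4e2a-8d6c-1f2e3a4b5c6d",
      "operation": "restart",
      "status": "IN_PROGRESS",
      "nodes": [
        { "hostId": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "state": "RESTARTED" },
        { "hostId": "7b1e2d3c-4f5a-4b6c-8d9e-0a1b2c3d4e5f", "state": "PENDING" }
      ]
    }

Clients that need addresses for display could resolve host IDs against Cassandra's system tables rather than having the operations API carry both.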
Best,
Andrés

On Mon, Sep 8, 2025 at 5:03 AM Sam Tunnicliffe <[email protected]> wrote:

> Hi Andrés, thanks for this comprehensive CEP.

> I have a query about the representation of nodes in the various tables and responses.

> In the sidecar_internal.cluster_ops_node_state table, "We store the node UUID instead of IP address to guarantee that the correct Cassandra nodes are restarted in case any node moves to another host." However, in the main sidecar_internal.cluster_ops table the nodes participating in the operation are represented as a list of IP addresses. Likewise, in the sample HTTP responses nodes always appear to be identified by their address, not ID.

> It's true that operators are more accustomed to dealing with addresses than IDs but it's equally the case that the address is not a persistent node identifier, as noted in this CEP. For that reason, in C* trunk the emphasis is shifting to lean more on node IDs, so I feel it would be a retrograde step to introduce new APIs which include only addresses. Could the schema and API responses in this CEP be unified in some way, either by using IDs exclusively or by extending the node representation to something that can incorporate both an ID and address?

> Among other things, Accord relies on persistent node IDs for correctness and a unique and persistent identifier is now assigned to every node as a prerequisite for it joining the cluster. This ID is a simple integer and is encoded into the node's Host ID, which is the UUID available in various system tables, gossip state and nodetool commands. The initial thinking behind encoding in the Host ID was to maintain compatibility with existing tooling but at some point we will start to expose the ID directly in more places. Right now there is a vtable which shows the IDs directly, system_views.cluster_metadata_directory.

> Thanks,
> Sam

> On 8 Sep 2025, at 02:36, Andrés Beck-Ruiz <[email protected]> wrote:

> > Hello all,

> > Thanks for the feedback. I agree with the suggestions that operation state storage should be pluggable, with an initial implementation leveraging Cassandra as we have proposed. I have made edits to the Distributed Restart Design section in the CEP to reflect this.

> > > As for the API, I think the question that needs to be answered is if it is worthwhile to have a distinction between single-node operations and cluster-wide operations. For example, if I wanted to restart a single node using the API proposed in CEP-53, I could submit a restart job with a single node in the "nodes" list. This provides API simplicity at the cost of ergonomics. It also means that all inter-sidecar communication would go through the proposed cluster_ops_node_state table. Personally, I think these are acceptable tradeoffs to provide a unified API for operations that is simpler for a user or operator to use and learn.

> > I agree that we should provide a unified API that does not distinguish between single-node and cluster-wide operations. I think the benefit of API simplicity from a development and client perspective outweighs the cost of ergonomics.

> > > A small question from my side: I see that the underlying assumption is that Sidecar is able to query Cassandra instances before bouncing/recognizing the bounce. What if it could not communicate with the Cassandra instance (e.g., binary protocol disabled, C* process experiencing issues, or C* process starting as part of a new DC)?

> > This would fall under scenario #2 in the Error Handling section of the CEP. If a Sidecar instance can’t communicate with Cassandra, after a configurable timeout and number of retries, the Sidecar instance should mark the job as failed.

> > > 1. Have we considered introducing the concept of a datacenter alongside cluster? I imagine there will be cases where a user wants to perform a rolling restart on a single datacenter rather than across the entire cluster.

> > I think this could be added in the future, but for this initial implementation an operator would submit the nodes that are part of a datacenter to restart a datacenter. I prefer providing a unified API that can handle single node and cluster (or datacenter) wide operations over separate APIs which might be easier to use in isolation but complicate development and discoverability.

> > > 2. Do we see this framework extending to other cluster- or datacenter-wide operations, such as scale-up/scale-down operations, or backups/restores, or nodetool rebuilds run as part of adding a new datacenter?

> > Yes, our goal with this design is that it is extensible for future operations, as well as currently supported operations (such as node decommissions) that already exist in Sidecar. In the initial Cassandra storage implementation, all inter-sidecar communication and operation tracking could occur in the proposed cluster_ops_node_state table.

> > > The design seems focused on cluster/availability signals (ring stable, peers up), which is a great start, but doesn't mention pluggable workload signals like: 1) compaction load (nodetool compactionstats) 2) netstats activity (nodetool netstats) 3) hints backlog / streaming pending flushes or memtable pressure. Since restarting during heavy compaction/hints can add risk, are these kinds of workload-aware checks in scope for the MVP, or considered future work?

> > I agree that the health check should be pluggable as well; this was also proposed in CEP-1. For the first iteration of rolling restarts, we are thinking of providing a health check implementation that checks for all other Cassandra peers being up, and future work can add more robust health checks.

> > Best,
> > Andrés

> > On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
> > Hello everyone,

> > We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)

> > This CEP builds off of CEP-1 and proposes a design for safe, efficient, and operator friendly rolling restarts on Cassandra clusters, as well as an extensible approach for persisting future cluster-wide operations in Cassandra Sidecar. We hope to leverage this infrastructure in the future to implement upgrade automation.

> > We welcome all feedback and discussion. Thank you in advance for your time and consideration of this proposal!

> > Best,
> > Andrés and Paulo
