> > A small question from my side: I see that the underlying assumption is that Sidecar is able to query Cassandra instances before bouncing/recognizing the bounce. What if it could not communicate with the Cassandra instance (e.g., binary protocol disabled, C* process experiencing issues, or C* process starting as part of a new DC)?
>
> This would fall under scenario #2 in the Error Handling <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-ErrorHandling> section of the CEP. If a Sidecar instance can’t communicate with Cassandra, after a configurable timeout and amount of retries, the Sidecar instance should mark the job as failed.
Returning an error is indeed appropriate for communication failures, but it means operators would need to find alternative ways to bounce nodes and could not rely on these APIs consistently. A hard dependency on Cassandra instances would also create challenges for startup scenarios where operators need to start C* instances for the first time, such as creating a brand new cluster, replacing an existing node, or starting nodes in a new datacenter.

If the responsibility for updating Cassandra tables were delegated to a plugin interface, with a default implementation that communicates bounces through those tables, others could implement their own solutions more freely (potentially bypassing C* table updates for bounce communication altogether). I understand that this CEP focuses on restart operations (stop + start), but I believe an interface and plugin pattern could address a broader range of operational challenges.
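To make that more concrete, here is a rough sketch of what such an extension point could look like. Every name below is hypothetical and for illustration only; it is not an existing Sidecar API, just the shape of the idea:

import java.util.Map;
import java.util.UUID;

// Hypothetical extension point for how a Sidecar instance records and observes
// bounce state. The default implementation could keep the behaviour proposed in
// CEP-53 (persisting state in the sidecar_internal tables), while alternative
// implementations could use an external store and avoid talking to the local
// Cassandra process at all, e.g. for first-time starts or new-DC bootstraps.
public interface BounceStateStore
{
    enum NodeOperationStatus { PENDING, BOUNCING, SUCCEEDED, FAILED }

    /** Record that the given node is about to be bounced as part of a job. */
    void markBouncing(UUID jobId, UUID nodeId);

    /** Record the outcome of the bounce for the given node. */
    void markComplete(UUID jobId, UUID nodeId, NodeOperationStatus status);

    /** Per-node state for a job, which peer Sidecars can consult before proceeding. */
    Map<UUID, NodeOperationStatus> nodeStates(UUID jobId);
}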
On Mon, Sep 8, 2025 at 10:22 PM Andrés Beck-Ruiz <[email protected]> wrote:

> Our original thinking was that we could store node UUIDs but return IP addresses so that operators can better identify Cassandra nodes, but I recognize that it could also cause confusion as the address is not a persistent node identifier. I agree that the benefit of unifying the API and schema by using node UUIDs exclusively outweighs the cost of API ergonomics. I can update the CEP to reflect this unless there are differing opinions.
>
> Best,
> Andrés
>
> On Mon, Sep 8, 2025 at 5:03 AM Sam Tunnicliffe <[email protected]> wrote:
>
>> Hi Andrés, thanks for this comprehensive CEP.
>>
>> I have a query about the representation of nodes in the various tables and responses.
>>
>> In the sidecar_internal.cluster_ops_node_state table, "We store the node UUID instead of IP address to guarantee that the correct Cassandra nodes are restarted in case any node moves to another host.". However, in the main sidecar_internal.cluster_ops table the nodes participating in the operation are represented as a list of IP addresses. Likewise, in the sample HTTP responses nodes always appear to be identified by their address, not ID.
>>
>> It's true that operators are more accustomed to dealing with addresses than IDs but it's equally the case that the address is not a persistent node identifier, as noted in this CEP. For that reason, in C* trunk the emphasis is shifting to lean more on node IDs, so I feel it would be a retrograde step to introduce new APIs which include only addresses. Could the schema and API responses in this CEP be unified in some way, either by using IDs exclusively or by extending the node representation to something that can incorporate both an ID and address?
>>
>> Among other things, Accord relies on persistent node IDs for correctness and a unique and persistent identifier is now assigned to every node as a prerequisite for it joining the cluster. This ID is a simple integer and is encoded into the node's Host ID, which is the UUID available in various system tables, gossip state and nodetool commands. The initial thinking behind encoding in the Host ID was to maintain compatibility with existing tooling but at some point we will start to expose the ID directly in more places. Right now there is a vtable which shows the IDs directly, system_views.cluster_metadata_directory.
>>
>> Thanks,
>> Sam
>>
>> > On 8 Sep 2025, at 02:36, Andrés Beck-Ruiz <[email protected]> wrote:
>> >
>> > Hello all,
>> >
>> > Thanks for the feedback. I agree with the suggestions that operation state storage should be pluggable, with an initial implementation leveraging Cassandra as we have proposed. I have made edits to the Distributed Restart Design section in the CEP to reflect this.
>> >
>> > > As for the API, I think the question that needs to be answered is if it is worthwhile to have a distinction between single-node operations and cluster-wide operations. For example, if I wanted to restart a single node using the API proposed in CEP-53, I could submit a restart job with a single node in the “nodes” list. This provides API simplicity at the cost of ergonomics. It also means that all inter-sidecar communication would go through the proposed cluster_ops_node_state table. Personally, I think these are acceptable tradeoffs to provide a unified API for operations that is simpler for a user or operator to use and learn.
>> >
>> > I agree that we should provide a unified API that does not distinguish between single-node and cluster-wide operations. I think the benefit of API simplicity from a development and client perspective outweighs the cost of ergonomics.
>> >
>> > > A small question from my side: I see that the underlying assumption is that Sidecar is able to query Cassandra instances before bouncing/recognizing the bounce. What if it could not communicate with the Cassandra instance (e.g., binary protocol disabled, C* process experiencing issues, or C* process starting as part of a new DC)?
>> >
>> > This would fall under scenario #2 in the Error Handling section of the CEP. If a Sidecar instance can’t communicate with Cassandra, after a configurable timeout and amount of retries, the Sidecar instance should mark the job as failed.
>> >
>> > > 1. Have we considered introducing the concept of a datacenter alongside cluster? I imagine there will be cases where a user wants to perform a rolling restart on a single datacenter rather than across the entire cluster.
>> >
>> > I think this could be added in the future, but for this initial implementation an operator would submit the nodes that are part of a datacenter to restart that datacenter. I prefer providing a unified API that can handle single-node and cluster (or datacenter) wide operations over separate APIs which might be easier to use in isolation but complicate development and discoverability.
>> >
>> > > 2. Do we see this framework extending to other cluster- or datacenter-wide operations, such as scale-up/scale-down operations, or backups/restores, or nodetool rebuilds run as part of adding a new datacenter?
>> >
>> > Yes, our goal with this design is that it is extensible for future operations, as well as currently supported operations (such as node decommissions) that already exist in Sidecar. In the initial Cassandra storage implementation, all inter-sidecar communication and operation tracking could occur in the proposed cluster_ops_node_state table.
>> >
>> > > The design seems focused on cluster/availability signals (ring stable, peers up), which is a great start, but doesn’t mention pluggable workload signals like: 1) compaction load (nodetool compactionstats) 2) netstats activity (nodetool netstats) 3) hints backlog / streaming pending flushes or memtable pressure.
>> > > Since restarting during heavy compaction/hints can add risk, are these kinds of workload-aware checks in scope for the MVP, or considered future work?
>> >
>> > I agree that the health check should be pluggable as well; this was also proposed in CEP-1. For the first iteration of rolling restarts, we are thinking of providing a health check implementation that checks for all other Cassandra peers being up, and future work can add more robust health checks.
>> >
>> > Best,
>> > Andrés
>> >
>> > On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
>> > Hello everyone,
>> >
>> > We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar)
>> >
>> > This CEP builds off of CEP-1 and proposes a design for safe, efficient, and operator friendly rolling restarts on Cassandra clusters, as well as an extensible approach for persisting future cluster-wide operations in Cassandra Sidecar. We hope to leverage this infrastructure in the future to implement upgrade automation.
>> >
>> > We welcome all feedback and discussion. Thank you in advance for your time and consideration of this proposal!
>> >
>> > Best,
>> > Andrés and Paulo
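One more illustrative sketch, this time for the pluggable health check mentioned in the quoted thread. Again, the names are hypothetical and this is only the shape of the idea, not a proposed API: the first implementation would be the "all other peers up" check, and workload-aware signals could be added as further implementations later.

import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a pluggable pre-bounce health check. The initial
// implementation could simply verify that all other Cassandra peers are up;
// later implementations could also consult workload signals such as
// compaction load, hints backlog or streaming activity before allowing the
// next node in the job to be bounced.
public interface RestartHealthCheck
{
    /** @return a future completing with true when it is considered safe to bounce the given node */
    CompletableFuture<Boolean> isSafeToProceed(UUID jobId, UUID nodeId);
}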
