> A hard dependency on Cassandra instances would also create challenges for
> startup scenarios where operators need to start C* instances for the first
> time—such as brand new cluster creation, replacing an existing node, or
> starting nodes in a new datacenter.

Are you referring to the dependency on Cassandra for storing and
communicating operation state? The goal of an operation state management
interface is that Cassandra wouldn't need to be a hard dependency, since we
can add implementations for other storage backends in the future. Any
operation storage implementation would also track operations beyond just
restarts.
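
To make that concrete, here is a rough sketch of what such an interface
might look like. Every name and signature below is an illustration for
discussion only, not something the CEP currently defines:

// Hypothetical sketch of a pluggable operation state storage interface.
// All names and signatures are illustrative; the CEP does not define them.
import java.util.List;
import java.util.Optional;
import java.util.UUID;

public interface OperationStateStorage
{
    /** Per-node progress states for an operation such as a rolling restart. */
    enum NodeState { PENDING, IN_PROGRESS, SUCCEEDED, FAILED }

    /** Minimal placeholder for a cluster-wide operation record. */
    record ClusterOperation(UUID operationId, String clusterName,
                            String operationType, List<UUID> nodeIds) {}

    /** Persist a newly submitted operation (a restart today, upgrades etc. later). */
    void createOperation(ClusterOperation operation);

    /** Record a per-node state transition, e.g. PENDING -> IN_PROGRESS -> SUCCEEDED. */
    void updateNodeState(UUID operationId, UUID nodeId, NodeState state);

    /** Fetch an operation so any Sidecar instance can resume or report on it. */
    Optional<ClusterOperation> getOperation(UUID operationId);

    /** List operations for a managed cluster, e.g. to serve the status APIs. */
    List<ClusterOperation> listOperations(String clusterName);
}

The Cassandra-backed implementation we propose would back these methods with
the proposed sidecar_internal tables, and other backends could be plugged in
later without changing the restart orchestration.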

Furthermore, we can add cluster_name to the partition keys of the tables in
the Cassandra storage implementation. This would allow a cluster separate
from the one being managed by the Cassandra Sidecar to be used for operation
state storage, so that if an operation fails we can still communicate with
the storage cluster, which isn't the one being operated on.
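
As a rough illustration of that idea (not the exact schema from the CEP),
the Cassandra-backed implementation could create its tables with
cluster_name leading the partition key; the column list below is a
simplification and partly my own invention:

// Sketch of how a Cassandra-backed implementation of the interface above
// could define its schema with cluster_name in the partition keys. Table
// names follow the CEP (cluster_ops, cluster_ops_node_state); the exact
// columns here are simplified and partly hypothetical.
public final class CassandraOperationStateSchema
{
    public static final String CREATE_CLUSTER_OPS = """
            CREATE TABLE IF NOT EXISTS sidecar_internal.cluster_ops (
                cluster_name text,
                operation_id timeuuid,
                operation_type text,
                status text,
                node_ids list<uuid>,
                PRIMARY KEY ((cluster_name), operation_id)
            ) WITH CLUSTERING ORDER BY (operation_id DESC)""";

    public static final String CREATE_CLUSTER_OPS_NODE_STATE = """
            CREATE TABLE IF NOT EXISTS sidecar_internal.cluster_ops_node_state (
                cluster_name text,
                operation_id timeuuid,
                node_id uuid,
                state text,
                updated_at timestamp,
                PRIMARY KEY ((cluster_name, operation_id), node_id)
            )""";

    private CassandraOperationStateSchema() {}
}

With cluster_name in the partition keys, the same storage cluster can hold
operation state for any number of managed clusters, including the case above
where the storage cluster is not the one being bounced.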

You can also bounce Cassandra nodes with the lifecycle APIs, which don't
depend on operation state storage or on communication with the Cassandra
instance being managed (
https://issues.apache.org/jira/projects/CASSSIDECAR/issues/CASSSIDECAR-266?filter=allopenissues
).
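
For illustration, an operator or external tool could drive such a bounce
with plain HTTP calls against Sidecar. The endpoint paths and port in the
sketch below are assumptions on my part, since the actual routes are still
being worked out under CASSSIDECAR-266:

// Hypothetical example of bouncing a single node through Sidecar lifecycle
// endpoints, with no operation state storage involved. The paths and port
// are assumptions; the real routes are being defined in CASSSIDECAR-266.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class LifecycleBounceExample
{
    public static void main(String[] args) throws Exception
    {
        HttpClient client = HttpClient.newHttpClient();
        String sidecar = "http://localhost:9043"; // assumed Sidecar address

        // Stop the managed Cassandra process on this node.
        HttpRequest stop = HttpRequest.newBuilder(URI.create(sidecar + "/api/v1/cassandra/stop"))
                                      .POST(HttpRequest.BodyPublishers.noBody())
                                      .build();
        System.out.println("stop: " + client.send(stop, HttpResponse.BodyHandlers.ofString()).statusCode());

        // ...wait for the process to exit, then start it again.
        HttpRequest start = HttpRequest.newBuilder(URI.create(sidecar + "/api/v1/cassandra/start"))
                                       .POST(HttpRequest.BodyPublishers.noBody())
                                       .build();
        System.out.println("start: " + client.send(start, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}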

On Wed, Sep 10, 2025 at 3:05 PM Venkata Harikrishna Nukala <
[email protected]> wrote:

>
>> > A small question from my side: I see that the underlying assumption is that
>> > Sidecar is able to query Cassandra instances before bouncing/recognizing
>> > the bounce. What if it could not communicate with the Cassandra instance
>> > (e.g., binary protocol disabled, C* process experiencing issues, or C*
>> > process starting as part of a new DC)?
>>
>> This would fall under scenario #2 in the Error Handling
>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar#CEP53:CassandraRollingRestartsviaSidecar-ErrorHandling>
>> section of the CEP. If a Sidecar instance can’t communicate with Cassandra,
>> after a configurable timeout and number of retries, the Sidecar instance
>> should mark the job as failed.
>>
>
> Returning an error is indeed appropriate for communication failures, but
> this means operators would need to find alternative ways to bounce nodes
> and couldn't rely on these APIs consistently. A hard dependency on
> Cassandra instances would also create challenges for startup scenarios
> where operators need to start C* instances for the first time—such as brand
> new cluster creation, replacing an existing node, or starting nodes in a
> new datacenter.
>
> If the responsibility of updating Cassandra tables is delegated to a
> plugin or default implementation for communicating bounces, it would allow
> others to implement their own solutions more freely (potentially bypassing
> C* table updates for bounce communication). I understand that this CEP
> focuses on restart operations (stop + start), but I believe an interface
> and plugin pattern could address a broader range of operational challenges.
>
> On Mon, Sep 8, 2025 at 10:22 PM Andrés Beck-Ruiz <[email protected]>
> wrote:
>
>> Our original thinking was that we could store node UUIDs but return IP
>> addresses so that operators can better identify Cassandra nodes, but I
>> recognize that it could also cause confusion as the address is not a
>> persistent node identifier. I agree that the benefit of unifying the API
>> and schema by using node UUIDs exclusively outweighs the cost of API
>> ergonomics. I can update the CEP to reflect this unless there are differing
>> opinions.
>>
>> Best,
>> Andrés
>>
>> On Mon, Sep 8, 2025 at 5:03 AM Sam Tunnicliffe <[email protected]> wrote:
>>
>>> Hi Andrés, thanks for this comprehensive CEP.
>>>
>>> I have a query about the representation of nodes in the various tables
>>> and responses.
>>>
>>> In the sidecar_internal.cluster_ops_node_state table, "We store the node
>>> UUID instead of IP address to guarantee that the correct Cassandra nodes
>>> are restarted in case any node moves to another host.". However, in the
>>> main sidecar_internal.cluster_ops table the nodes participating in the
>>> operation are represented as a list of IP addresses. Likewise, in the
>>> sample HTTP responses nodes always appear to be identified by their
>>> address, not ID.
>>>
>>> It's true that operators are more accustomed to dealing with addresses
>>> than IDs but it's equally the case that the address is not a persistent
>>> node identifier, as noted in this CEP. For that reason, in C* trunk the
>>> emphasis is shifting to lean more on node IDs, so I feel it would be a
>>> retrograde step to introduce new APIs which include only addresses. Could
>>> the schema and API responses in this CEP be unified in some way, either by
>>> using IDs exclusively or by extending the node representation to something
>>> that can incorporate both an ID and address?
>>>
>>> Among other things, Accord relies on persistent node IDs for correctness
>>> and a unique and persistent identifier is now assigned to every node as a
>>> prerequisite for it joining the cluster. This ID is a simple integer and is
>>> encoded into the node's Host ID which is the UUID available in various
>>> system tables, gossip state and nodetool commands. The initial thinking
>>> behind encoding in the Host ID was to maintain compatibility with existing
>>> tooling but at some point we will start to expose the ID directly in more
>>> places. Right now there is a vtable which shows the IDs directly,
>>> system_views.cluster_metadata_directory.
>>>
>>> Thanks,
>>> Sam
>>>
>>> > On 8 Sep 2025, at 02:36, Andrés Beck-Ruiz <[email protected]> wrote:
>>> >
>>> > Hello all,
>>> >
>>> > Thanks for the feedback. I agree with the suggestions that operation
>>> > state storage should be pluggable, with an initial implementation
>>> > leveraging Cassandra as we have proposed. I have made edits to the
>>> > Distributed Restart Design section in the CEP to reflect this.
>>> >
>>> > > As for the API, I think the question that needs to be answered is if
>>> > > it is worthwhile to have a distinction between single-node operations
>>> > > and cluster-wide operations. For example, if I wanted to restart a
>>> > > single node using the API proposed in CEP-53, I could submit a restart
>>> > > job with a single node in the “nodes” list. This provides API
>>> > > simplicity at the cost of ergonomics. It also means that all
>>> > > inter-sidecar communication would go through the proposed
>>> > > cluster_ops_node_state table. Personally, I think these are acceptable
>>> > > tradeoffs to provide a unified API for operations that is simpler for
>>> > > a user or operator to use and learn.
>>> >
>>> > I agree that we should provide a unified API that does not distinguish
>>> > between single-node and cluster-wide operations. I think the benefit of
>>> > API simplicity from a development and client perspective outweighs the
>>> > cost of ergonomics.
>>> >
>>> > > A small question from my side: I see that the underlying assumption
>>> > > is that Sidecar is able to query Cassandra instances before
>>> > > bouncing/recognizing the bounce. What if it could not communicate
>>> > > with the Cassandra instance (e.g., binary protocol disabled, C*
>>> > > process experiencing issues, or C* process starting as part of a new
>>> > > DC)?
>>> >
>>> > This would fall under scenario #2 in the Error Handling section of the
>>> > CEP. If a Sidecar instance can’t communicate with Cassandra, after a
>>> > configurable timeout and number of retries, the Sidecar instance should
>>> > mark the job as failed.
>>> >
>>> > > 1. Have we considered introducing the concept of a datacenter
>>> > > alongside cluster? I imagine there will be cases where a user wants
>>> > > to perform a rolling restart on a single datacenter rather than
>>> > > across the entire cluster.
>>> >
>>> > I think this could be added in the future, but for this initial
>>> > implementation an operator would submit the nodes that are part of a
>>> > datacenter to restart that datacenter. I prefer providing a unified API
>>> > that can handle single-node and cluster (or datacenter) wide operations
>>> > over separate APIs which might be easier to use in isolation but
>>> > complicate development and discoverability.
>>> >
>>> > > 2. Do we see this framework extending to other cluster- or
>>> > > datacenter-wide operations, such as scale-up/scale-down operations,
>>> > > or backups/restores, or nodetool rebuilds run as part of adding a new
>>> > > datacenter?
>>> >
>>> > Yes, our goal with this design is that it is extensible for future
>>> > operations, as well as currently supported operations (such as node
>>> > decommissions) that already exist in Sidecar. In the initial Cassandra
>>> > storage implementation, all inter-sidecar communication and operation
>>> > tracking could occur in the proposed cluster_ops_node_state table.
>>> >
>>> >
>>> > > The design seems focused on cluster/availability signals (ring
>>> > > stable, peers up), which is a great start, but doesn’t mention
>>> > > pluggable workload signals like: 1) compaction load (nodetool
>>> > > compactionstats) 2) netstats activity (nodetool netstats) 3) hints
>>> > > backlog / streaming pending flushes or memtable pressure.
>>> > > Since restarting during heavy compaction/hints can add risk, are
>>> > > these kinds of workload-aware checks in scope for the MVP, or
>>> > > considered future work?
>>> >
>>> > I agree that the health check should be pluggable as well; this was
>>> > also proposed in CEP-1. For the first iteration of rolling restarts, we
>>> > are thinking of providing a health check implementation that checks
>>> > that all other Cassandra peers are up, and future work can add more
>>> > robust health checks.
>>> >
>>> > Best,
>>> > Andrés
>>> >
>>> > On Fri, Aug 29, 2025 at 3:56 PM Andrés Beck-Ruiz <[email protected]> wrote:
>>> > Hello everyone,
>>> >
>>> > We would like to propose CEP 53: Cassandra Rolling Restarts via Sidecar (
>>> > https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-53%3A+Cassandra+Rolling+Restarts+via+Sidecar
>>> > )
>>> >
>>> > This CEP builds on CEP-1 and proposes a design for safe, efficient, and
>>> > operator-friendly rolling restarts on Cassandra clusters, as well as an
>>> > extensible approach for persisting future cluster-wide operations in
>>> > Cassandra Sidecar. We hope to leverage this infrastructure in the future
>>> > to implement upgrade automation.
>>> >
>>> > We welcome all feedback and discussion. Thank you in advance for your
>>> > time and consideration of this proposal!
>>> >
>>> > Best,
>>> > Andrés and Paulo
>>>
>>>
