Hi Patrick, I agree. I updated the CEP to specify:

> Connections that do *not* subscribe to GRACEFUL_DISCONNECT behave as they do today, with the same grace window.

That means that when a server with a mixed fleet of connections shuts down, both kinds of connections will continue processing requests until the gracefully-draining connections are closed or timed out.
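For implementers reading along, the mixed-fleet policy can be pictured with a small simulation. This is an illustrative model only, with invented names (`Conn`, `shutdown`), not Cassandra code: subscribed connections receive the event and drain, non-subscribed connections keep serving, and everything is closed once the draining connections finish or the grace window expires.

```python
# Illustrative simulation only: `Conn` and `shutdown` are invented names, not
# Cassandra code. It models the mixed-fleet draining policy described above.
from dataclasses import dataclass

@dataclass
class Conn:
    name: str
    subscribed: bool      # did this connection REGISTER for GRACEFUL_DISCONNECT?
    in_flight: int = 0    # outstanding requests on the connection
    notified: bool = False
    closed: bool = False

def shutdown(conns, grace_ticks):
    """One tick = one unit of the grace window."""
    for c in conns:
        if c.subscribed:
            c.notified = True          # emit GRACEFUL_DISCONNECT to subscribers
    for _ in range(grace_ticks):
        if not any(c.subscribed and not c.closed for c in conns):
            break                      # all opted-in connections drained early
        for c in conns:
            if not c.closed and c.in_flight > 0:
                c.in_flight -= 1       # both kinds keep finishing requests
            if c.subscribed and c.in_flight == 0:
                c.closed = True        # a drained connection closes
    for c in conns:
        c.closed = True                # grace window over: enforce closure

conns = [Conn("opted-in", True, in_flight=2), Conn("legacy", False, in_flight=5)]
shutdown(conns, grace_ticks=3)
```

Note the enforcement step at the end: once the gracefully-draining connections are done (or timed out), non-subscribed connections are closed too, with any remaining in-flight work cut off as happens today.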
Thank you! Sincerely, Jane On Wed, Feb 11, 2026 at 11:43 AM Patrick McFadin <[email protected]> wrote: > Thanks, Jane, you've clarified a lot. > > One area I’m still thinking about is how to handle non-opt-in connections > in mixed fleets. Even if they don’t receive GRACEFUL_DISCONNECT, the server > will still need a deterministic draining policy for them. It might help the > CEP to explicitly state how draining applies to connections that never > REGISTER the event (same grace window and eventual enforcement), so > operators understand the guarantees in mixed environments. > > Also, since this introduces a per-connection draining state, it may be > worth explicitly noting that drivers will need per-connection scheduling > awareness rather than treating host pools as homogeneous. That could help > driver developers interpret the intent consistently. > > Patrick > > On Tue, Feb 10, 2026 at 9:23 PM Jaydeep Chovatia < > [email protected]> wrote: > >> >The proposed solution is to add an in-band GRACEFUL_DISCONNECT event >> that both control and query connections can opt into via REGISTER. When >> a node is shutting down, it will emit the event to all subscribed >> connections. Drivers will stop sending new queries on that connection/host, >> allow in-flight requests to finish, then reconnect with exponential backoff. >> >> Overall, I like the proposal because it helps reduce p99 latency from the >> client perspective by avoiding retries when the Cassandra server is >> restarted for planned activities, which happen quite frequently. >> >> >> On Tue, Feb 10, 2026 at 4:08 PM Jane H <[email protected]> wrote: >> >>> Hi Runtian, >>> >>> Yes, GRACEFUL_DISCONNECT is a per-connection draining signal, meaning a >>> node tells the client that this connection is going away, rather than a >>> node telling the client about other nodes. 
>>> >>> There isn’t a reliable way for a driver to distinguish whether a node is >>> permanently going away or just restarting, and Graceful Disconnect is >>> intentionally scoped to connection draining, not lifecycle intent. >>> In practice, this isn’t a big issue. After receiving GRACEFUL_DISCONNECT, >>> the driver stops sending new queries and only retries reconnection with >>> backoff. If the node is later removed from cluster metadata, the driver >>> stops reconnecting. The overhead during that window is limited to a few >>> reconnection attempts. >>> Adding intent like “going away” vs. “restarting” would be hard to make >>> reliable, since the server often doesn’t know it's shutting down or coming >>> back up. >>> >>> Thank you for your questions! Hope this clarifies. >>> >>> Sincerely, >>> Jane >>> >>> On Wed, Feb 4, 2026 at 11:07 AM Runtian Liu <[email protected]> wrote: >>> >>>> Hi Jane, all, >>>> >>>> Thanks for the detailed discussion so far—this proposal resonates >>>> strongly with issues we see in production. >>>> >>>> I wanted to raise a related scenario around *gray failures* during >>>> shutdown. In some cases, an operator has clear intent to shut down a node >>>> (or the host is unhealthy), but the Cassandra process remains reachable for >>>> some time. During this window, service clients can still connect and >>>> continue sending queries, which often leads to timeouts and confusing >>>> behavior downstream. Gossip-based DOWN events or socket closes are not >>>> always timely or reliable enough to prevent this. >>>> >>>> A couple of questions on the intended semantics of GRACEFUL_DISCONNECT >>>> in this context: >>>> >>>> 1. >>>> >>>> *Can GRACEFUL_DISCONNECT be emitted explicitly by server-side >>>> tooling (or an operator-triggered path)* to signal intentional >>>> unavailability, even if the process is still alive? 
>>>> This would allow operators to proactively instruct clients to stop >>>> sending traffic to a node before or during shutdown-related gray >>>> failures. >>>> 2. >>>> >>>> After receiving GRACEFUL_DISCONNECT, *drivers may still attempt >>>> reconnection after backoff*. >>>> Is there a way for drivers to deterministically know that the >>>> server is intentionally being taken out of service (as opposed to >>>> transient >>>> unavailability), so that they avoid sending *any* new queries to >>>> that node until a restart or explicit “back in service” signal is >>>> observed? >>>> >>>> Put differently, I’m curious whether GRACEFUL_DISCONNECT is meant to be >>>> purely a per-connection draining signal, or whether it could also serve as >>>> a stronger expression of *operator intent* to remove a node from >>>> service—something that would help eliminate gray-failure traffic entirely. >>>> >>>> Thanks again for pushing this forward; it looks very promising. >>>> >>>> Best, >>>> Runtian >>>> >>>> On Thu, Jan 29, 2026 at 5:44 PM Jane H <[email protected]> wrote: >>>> >>>>> Hi Patrick, >>>>> >>>>> Thanks for reading the CEP and for the thoughtful questions! Replies >>>>> below. >>>>> >>>>> Driver backward compatibility / mixed rollouts >>>>> -------- >>>>> This is fully opt-in per connection. Older drivers won’t REGISTER for >>>>> GRACEFUL_DISCONNECT, so servers won’t send it to them, and those >>>>> connections behave exactly as they do today. 
>>>>>
>>>>> REGISTER vs STARTUP for opt-in
>>>>> -------
>>>>> There are two plausible ways for a driver to opt in to GRACEFUL_DISCONNECT:
>>>>>
>>>>> Option A: REGISTER (as proposed today)
>>>>> | Driver behavior | Server behavior |
>>>>> |---|---|
>>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>>> | Send `STARTUP` as normal | Optionally handle authentication as normal. Send `READY` as normal. |
>>>>> | Send `REGISTER` including event type `GRACEFUL_DISCONNECT` | Acknowledge normally (e.g., `READY`). |
>>>>>
>>>>> This is consistent with the protocol: REGISTER is the standard mechanism to subscribe to events.
>>>>> However, this does add an extra round trip per query connection that wants the event. Today most drivers only REGISTER on the control connection for cluster-wide events (STATUS_CHANGE / TOPOLOGY_CHANGE / SCHEMA_CHANGE), and query connections typically do not REGISTER anything. If we want every query connection to receive GRACEFUL_DISCONNECT (because the signal is connection-local), then every query connection would need to send REGISTER, which means one additional message exchange during connection establishment.
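To make the Option A sequence concrete, here is a minimal driver-side sketch of the handshake. It is an illustration of the table above, not real driver code: the transport is reduced to a list of frame names mirroring the native-protocol opcodes, and `handshake_option_a` is an invented helper.

```python
# Illustrative only: frame names mirror native-protocol opcodes; there is no
# real I/O here, just the message order from the Option A table above.

def handshake_option_a(supported):
    """Frames a driver sends on a query connection that wants the event.

    `supported` stands in for the body of the server's SUPPORTED response.
    """
    frames = ["OPTIONS"]       # server answers with SUPPORTED
    frames.append("STARTUP")   # server answers READY (after optional auth)
    # The extra round trip: REGISTER only if the server advertised support.
    if supported.get("GRACEFUL_DISCONNECT") == ["true"]:
        frames.append("REGISTER[GRACEFUL_DISCONNECT]")
    return frames
```

Checking SUPPORTED first keeps old drivers and old servers interoperable: against a server that does not advertise the capability, the connection setup is byte-for-byte what it is today.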
>>>>>
>>>>> Option B: STARTUP opt-in (alternative)
>>>>> | Driver behavior | Server behavior |
>>>>> |---|---|
>>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>>> | Send `STARTUP` with an additional entry in the options map, e.g. `{ "CQL_VERSION": "3.0.0", "GRACEFUL_DISCONNECT": "true", ... }` | Optionally handle authentication as normal. Send `READY` as normal. |
>>>>>
>>>>> This avoids the extra round trip, because the opt-in piggybacks on an existing step in the handshake. But it introduces new semantics: STARTUP options would be used to request an event-stream subscription, which is non-standard given that REGISTER already exists for that purpose.
>>>>>
>>>>> Given the above, we prefer REGISTER for protocol-semantics consistency, even though it costs one additional round trip on each query connection that opts in.
>>>>>
>>>>> Signal multiplication
>>>>> --------
>>>>> The protocol guidance about “don’t REGISTER on all connections” is primarily aimed at the existing out-of-band events (STATUS_CHANGE / TOPOLOGY_CHANGE / SCHEMA_CHANGE). Those events are gossip-driven and broadcast by multiple nodes, so registering on many connections can easily produce redundant notifications.
>>>>>
>>>>> Concrete example (duplication with STATUS_CHANGE):
>>>>> * In a 3-node cluster (node1, node2, node3), node1 is going down.
>>>>> * Node2 and node3 learn about node1’s state change via gossip.
>>>>> * Both node2 and node3 will send a STATUS_CHANGE event (“node1 is DOWN”) to every client connection that registered for STATUS_CHANGE.
>>>>> * If a driver registers for STATUS_CHANGE on connections to both node2 and node3, it will receive two notifications for the same cluster event. That’s the “signal multiplication” the spec warns about.
>>>>>
>>>>> But the protocol does not stop us from adding an in-band event like GRACEFUL_DISCONNECT. In the above example of node1 going down:
>>>>> * GRACEFUL_DISCONNECT is in-band and connection-local, not gossip/broadcast.
>>>>> * Only the node that is actually shutting down (node1) emits GRACEFUL_DISCONNECT, and it emits it only on its own native connections that opted in.
>>>>> * Node2 and node3 do not emit GRACEFUL_DISCONNECT for node1’s shutdown, because they are not the node being drained.
>>>>> So even if a driver has connections to node2 and node3 that are registered for other events, it will not receive any GRACEFUL_DISCONNECT from them for node1 going down.
>>>>>
>>>>> I understand such an in-band event is new. We can add a clarification to the protocol explaining that the recommendation of “don’t REGISTER on all connections” does not apply to in-band events like GRACEFUL_DISCONNECT.
>>>>>
>>>>> Event timing for operators
>>>>> ---------
>>>>> A server should emit GRACEFUL_DISCONNECT whenever it needs to close a connection gracefully, regardless of the trigger.
>>>>> I’ll update the CEP to clarify that GRACEFUL_DISCONNECT is emitted whenever the server intends to close a connection gracefully, including nodetool drain, nodetool disablebinary + shutdown, rolling restarts, or a controlled JVM shutdown hook path.
>>>>>
>>>>> Operator control + observability
>>>>> ---------
>>>>> +1. I agree to add the server-side configs:
>>>>> graceful_disconnect_enabled
>>>>> graceful_disconnect_grace_period_ms
>>>>> graceful_disconnect_max_drain_ms
>>>>> And metrics/counters such as: connections_draining, forced_disconnects
>>>>> I’ll update the CEP accordingly.
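For the CEP update, the agreed knobs might look like this in cassandra.yaml. The key names come from this thread; the values and comments are illustrative assumptions only, not proposed defaults:

```yaml
# Sketch only: names from the thread, values are assumptions.
graceful_disconnect_enabled: true
graceful_disconnect_grace_period_ms: 5000    # window for in-flight requests to finish
graceful_disconnect_max_drain_ms: 30000      # hard cap before forced close
# Suggested observability: connections_draining and forced_disconnects counters
```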
>>>>> >>>>> Thanks again—this feedback is super helpful for tightening the >>>>> proposal. >>>>> >>>>> Regards, >>>>> Jane >>>>> >>>>> On Wed, Jan 14, 2026 at 1:33 PM Patrick McFadin <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Jane, >>>>>> >>>>>> Thank you for the thought-out CEP. I certainly see the use of >>>>>> a feature like this to add resilience during cluster state changes. I >>>>>> have >>>>>> a few questions after reading the CEP. >>>>>> >>>>>> Driver compatibility: The way I read this, it's based on an ideal >>>>>> scenario where client and server are on the same version to support this >>>>>> feature. In my experience, client rollouts are never complete and often >>>>>> lag >>>>>> far behind the cluster upgrade. What happens when the driver completely >>>>>> ignores GRACEFUL_DISCONNECT? It might mean considering something on the >>>>>> server side. >>>>>> >>>>>> Discovery things: Speaking of the client, you want to use the >>>>>> SUPPORTED as listed in the v4 spec[1], but why not add this to STARTUP? >>>>>> You >>>>>> mention something in the "Rejected alternatives," but could you expand >>>>>> your >>>>>> thinking here? >>>>>> >>>>>> Signal multiplication: You have this in the CEP "Other protocols >>>>>> (HTTP/2, PostgreSQL, Redis Cluster) use connection-local in-band signals >>>>>> to >>>>>> enable safe draining." Our protocol guidance[1] explicitly notes that >>>>>> drivers often keep multiple connections and should not register for >>>>>> events >>>>>> on all of them, as this duplicates traffic. I don't know how you could >>>>>> ensure that every connection would be aware of a GRACEFUL_DISCONNECT >>>>>> without changing that aspect of the spec. >>>>>> >>>>>> >>>>>> Event timing for operators: It's not clear to me when >>>>>> the GRACEFUL_DISCONNECT is emitted when you do something like a drain, >>>>>> disablebinary or just a JVM shutdown hook. 
This is crucial for operators >>>>>> to >>>>>> understand how this could work and should be in the CEP spec for >>>>>> clarity. I >>>>>> think it will matter to a lot of people. >>>>>> >>>>>> Operator control: I've been on this push for a while and so I have to >>>>>> mention it. Opt-in vs default. We need more controls in the config YAML. >>>>>> graceful_disconnect_enabled >>>>>> >>>>>> If there is a server-side component: >>>>>> graceful_disconnect_grace_period_ms >>>>>> graceful_disconnect_max_drain_ms >>>>>> >>>>>> And finally, it needs more observability... >>>>>> logging/metrics counters: connections_draining, forced_disconnects >>>>>> >>>>>> >>>>>> Thanks for proposing this! >>>>>> >>>>>> Patrick >>>>>> >>>>>> 1 - >>>>>> https://cassandra.apache.org/doc/latest/cassandra/_attachments/native_protocol_v4.html >>>>>> >>>>>> On Tue, Jan 13, 2026 at 4:30 PM Jane H <[email protected]> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I’d like to start a discussion on a CEP proposal: *CEP-59: Graceful >>>>>>> Disconnect*, to make intentional node shutdown/drain less >>>>>>> disruptive for clients (link: >>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619103 >>>>>>> ). >>>>>>> >>>>>>> Today, intentional node shutdown (e.g., rolling restarts) can still >>>>>>> be disruptive from a client perspective. Drivers often ignore DOWN >>>>>>> events because they are not reliable, and outstanding requests can end >>>>>>> up >>>>>>> as client-facing TimeOut exceptions. >>>>>>> >>>>>>> The proposed solution is to add an in-band GRACEFUL_DISCONNECT >>>>>>> event that both control and query connections can opt into via >>>>>>> REGISTER. When a node is shutting down, it will emit the event to >>>>>>> all subscribed connections. Drivers will stop sending new queries on >>>>>>> that >>>>>>> connection/host, allow in-flight requests to finish, then reconnect with >>>>>>> exponential backoff. 
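The driver-side reaction described in the proposal (stop new queries, let in-flight requests finish, reconnect with exponential backoff) can be sketched as follows. `HostState` and `backoff_schedule` are invented names for illustration, not an actual driver API:

```python
# Illustrative sketch of the driver reaction to GRACEFUL_DISCONNECT described
# in the proposal; names are invented, not a real driver API.

def backoff_schedule(base_ms=100, cap_ms=8000, attempts=6):
    """Exponential reconnection delays, capped at cap_ms (values are examples)."""
    return [min(base_ms * 2 ** i, cap_ms) for i in range(attempts)]

class HostState:
    def __init__(self):
        self.accepting_new_queries = True
        self.reconnect_delays_ms = []

    def on_graceful_disconnect(self):
        # Quiesce: route no new queries to this host; in-flight requests
        # continue on the existing connection until they complete.
        self.accepting_new_queries = False
        # Then reconnect with exponential backoff rather than immediately.
        self.reconnect_delays_ms = backoff_schedule()
```

One design note implied by the thread: because the signal is connection-local, this state has to live per connection/host rather than being derived from cluster-wide events, which is the per-connection scheduling awareness Patrick mentions above.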
>>>>>>> >>>>>>> If you have thoughts on the proposed protocol, server shutdown >>>>>>> behavior, driver expectations, edge cases, or general feedback, I’d >>>>>>> really >>>>>>> appreciate it. >>>>>>> >>>>>>> Regards, >>>>>>> Jane >>>>>>> >>>>>>
