Hi Patrick, I agree. I updated the CEP to specify:

> Connections that do *not* subscribe to GRACEFUL_DISCONNECT behave as they do today, with the same grace window.

That means that when a server with a mixed fleet of connections shuts down, both kinds of connections will continue processing requests until the gracefully-draining connections are closed or timed out.
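For implementers reading along, the mixed-fleet policy can be pictured with a small simulation. This is an illustrative model only, with invented names (`Conn`, `shutdown`), not Cassandra code: subscribed connections receive the event and drain, non-subscribed connections keep serving, and everything is closed once the draining connections finish or the grace window expires.

```python
# Illustrative simulation only: `Conn` and `shutdown` are invented names, not
# Cassandra code. It models the mixed-fleet draining policy described above.
from dataclasses import dataclass

@dataclass
class Conn:
    name: str
    subscribed: bool      # did this connection REGISTER for GRACEFUL_DISCONNECT?
    in_flight: int = 0    # outstanding requests on the connection
    notified: bool = False
    closed: bool = False

def shutdown(conns, grace_ticks):
    """One tick = one unit of the grace window."""
    for c in conns:
        if c.subscribed:
            c.notified = True          # emit GRACEFUL_DISCONNECT to subscribers
    for _ in range(grace_ticks):
        if not any(c.subscribed and not c.closed for c in conns):
            break                      # all opted-in connections drained early
        for c in conns:
            if not c.closed and c.in_flight > 0:
                c.in_flight -= 1       # both kinds keep finishing requests
            if c.subscribed and c.in_flight == 0:
                c.closed = True        # a drained connection closes
    for c in conns:
        c.closed = True                # grace window over: enforce closure

conns = [Conn("opted-in", True, in_flight=2), Conn("legacy", False, in_flight=5)]
shutdown(conns, grace_ticks=3)
```

Note the enforcement step at the end: once the gracefully-draining connections are done (or timed out), non-subscribed connections are closed too, with any remaining in-flight work cut off as happens today.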
Thank you! Sincerely, Jane On Wed, Feb 11, 2026 at 11:43 AM Patrick McFadin <[email protected]> wrote: > Thanks, Jane, you've clarified a lot. > > One area I’m still thinking about is how to handle non-opt-in connections > in mixed fleets. Even if they don’t receive GRACEFUL_DISCONNECT, the server > will still need a deterministic draining policy for them. It might help the > CEP to explicitly state how draining applies to connections that never > REGISTER the event (same grace window and eventual enforcement), so > operators understand the guarantees in mixed environments. > > Also, since this introduces a per-connection draining state, it may be > worth explicitly noting that drivers will need per-connection scheduling > awareness rather than treating host pools as homogeneous. That could help > driver developers interpret the intent consistently. > > Patrick > > On Tue, Feb 10, 2026 at 9:23 PM Jaydeep Chovatia < > [email protected]> wrote: > >> >The proposed solution is to add an in-band GRACEFUL_DISCONNECT event >> that both control and query connections can opt into via REGISTER. When >> a node is shutting down, it will emit the event to all subscribed >> connections. Drivers will stop sending new queries on that connection/host, >> allow in-flight requests to finish, then reconnect with exponential backoff. >> >> Overall, I like the proposal because it helps reduce p99 latency from the >> client perspective by avoiding retries when the Cassandra server is >> restarted for planned activities, which happen quite frequently. >> >> >> On Tue, Feb 10, 2026 at 4:08 PM Jane H <[email protected]> wrote: >> >>> Hi Runtian, >>> >>> Yes, GRACEFUL_DISCONNECT is a per-connection draining signal, meaning a >>> node tells the client that this connection is going away, rather than a >>> node telling the client about other nodes. 
>>> >>> There isn’t a reliable way for a driver to distinguish whether a node is >>> permanently going away or just restarting, and Graceful Disconnect is >>> intentionally scoped to connection draining, not lifecycle intent. >>> In practice, this isn’t a big issue. After receiving GRACEFUL_DISCONNECT, >>> the driver stops sending new queries and only retries reconnection with >>> backoff. If the node is later removed from cluster metadata, the driver >>> stops reconnecting. The overhead during that window is limited to a few >>> reconnection attempts. >>> Adding intent like “going away” vs. “restarting” would be hard to make >>> reliable, since the server often doesn’t know it's shutting down or coming >>> back up. >>> >>> Thank you for your questions! Hope this clarifies. >>> >>> Sincerely, >>> Jane >>> >>> On Wed, Feb 4, 2026 at 11:07 AM Runtian Liu <[email protected]> wrote: >>> >>>> Hi Jane, all, >>>> >>>> Thanks for the detailed discussion so far—this proposal resonates >>>> strongly with issues we see in production. >>>> >>>> I wanted to raise a related scenario around *gray failures* during >>>> shutdown. In some cases, an operator has clear intent to shut down a node >>>> (or the host is unhealthy), but the Cassandra process remains reachable for >>>> some time. During this window, service clients can still connect and >>>> continue sending queries, which often leads to timeouts and confusing >>>> behavior downstream. Gossip-based DOWN events or socket closes are not >>>> always timely or reliable enough to prevent this. >>>> >>>> A couple of questions on the intended semantics of GRACEFUL_DISCONNECT >>>> in this context: >>>> >>>> 1. >>>> >>>> *Can GRACEFUL_DISCONNECT be emitted explicitly by server-side >>>> tooling (or an operator-triggered path)* to signal intentional >>>> unavailability, even if the process is still alive? 
>>>> This would allow operators to proactively instruct clients to stop >>>> sending traffic to a node before or during shutdown-related gray >>>> failures. >>>> 2. >>>> >>>> After receiving GRACEFUL_DISCONNECT, *drivers may still attempt >>>> reconnection after backoff*. >>>> Is there a way for drivers to deterministically know that the >>>> server is intentionally being taken out of service (as opposed to >>>> transient >>>> unavailability), so that they avoid sending *any* new queries to >>>> that node until a restart or explicit “back in service” signal is >>>> observed? >>>> >>>> Put differently, I’m curious whether GRACEFUL_DISCONNECT is meant to be >>>> purely a per-connection draining signal, or whether it could also serve as >>>> a stronger expression of *operator intent* to remove a node from >>>> service—something that would help eliminate gray-failure traffic entirely. >>>> >>>> Thanks again for pushing this forward; it looks very promising. >>>> >>>> Best, >>>> Runtian >>>> >>>> On Thu, Jan 29, 2026 at 5:44 PM Jane H <[email protected]> wrote: >>>> >>>>> Hi Patrick, >>>>> >>>>> Thanks for reading the CEP and for the thoughtful questions! Replies >>>>> below. >>>>> >>>>> Driver backward compatibility / mixed rollouts >>>>> -------- >>>>> This is fully opt-in per connection. Older drivers won’t REGISTER for >>>>> GRACEFUL_DISCONNECT, so servers won’t send it to them, and those >>>>> connections behave exactly as they do today. 
>>>>>
>>>>> REGISTER vs STARTUP for opt-in
>>>>> -------
>>>>> There are two plausible ways for a driver to opt in to GRACEFUL_DISCONNECT:
>>>>>
>>>>> Option A: REGISTER (as proposed today)
>>>>> | Driver behavior | Server behavior |
>>>>> |---|---|
>>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>>> | Send `STARTUP` as normal | Optionally handle authentication as normal. Send `READY` as normal. |
>>>>> | Send `REGISTER` including event type `GRACEFUL_DISCONNECT` | Acknowledge normally (e.g., `READY`). |
>>>>>
>>>>> This is consistent with the protocol: REGISTER is the standard mechanism to subscribe to events.
>>>>> However, this does add an extra round trip per query connection that wants the event. Today most drivers only REGISTER on the control connection for cluster-wide events (STATUS_CHANGE / TOPOLOGY_CHANGE / SCHEMA_CHANGE), and query connections typically do not REGISTER anything. If we want every query connection to receive GRACEFUL_DISCONNECT (because the signal is connection-local), then every query connection would need to send REGISTER, which means one additional message exchange during connection establishment.
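To make the Option A sequence concrete, here is a minimal driver-side sketch of the handshake. It is an illustration of the table above, not real driver code: the transport is reduced to a list of frame names mirroring the native-protocol opcodes, and `handshake_option_a` is an invented helper.

```python
# Illustrative only: frame names mirror native-protocol opcodes; there is no
# real I/O here, just the message order from the Option A table above.

def handshake_option_a(supported):
    """Frames a driver sends on a query connection that wants the event.

    `supported` stands in for the body of the server's SUPPORTED response.
    """
    frames = ["OPTIONS"]       # server answers with SUPPORTED
    frames.append("STARTUP")   # server answers READY (after optional auth)
    # The extra round trip: REGISTER only if the server advertised support.
    if supported.get("GRACEFUL_DISCONNECT") == ["true"]:
        frames.append("REGISTER[GRACEFUL_DISCONNECT]")
    return frames
```

Checking SUPPORTED first keeps old drivers and old servers interoperable: against a server that does not advertise the capability, the connection setup is byte-for-byte what it is today.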
>>>>>
>>>>> Option B: STARTUP opt-in (alternative)
>>>>> | Driver behavior | Server behavior |
>>>>> |---|---|
>>>>> | Send `OPTIONS` | Return `SUPPORTED` (`Map<String, List<String>>`) containing `"GRACEFUL_DISCONNECT": ["true"]`. |
>>>>> | Send `STARTUP` with an additional entry in the options map, e.g. `{ "CQL_VERSION": "3.0.0", "GRACEFUL_DISCONNECT": "true", ... }` | Optionally handle authentication as normal. Send `READY` as normal. |
>>>>>
>>>>> This avoids the extra round trip, because the opt-in piggybacks on an existing step in the handshake. But it introduces new semantics: STARTUP options would be used to request an event-stream subscription, which is non-standard given that REGISTER already exists for that purpose.
>>>>>
>>>>> Given the above, we prefer REGISTER for protocol-semantics consistency, even though it costs one additional round trip on each query connection that opts in.
>>>>>
>>>>> Signal multiplication
>>>>> --------
>>>>> The protocol guidance about “don’t REGISTER on all connections” is primarily aimed at the existing out-of-band events (STATUS_CHANGE / TOPOLOGY_CHANGE / SCHEMA_CHANGE). Those events are gossip-driven and broadcast by multiple nodes, so registering on many connections can easily produce redundant notifications.
>>>>>
>>>>> Concrete example (duplication with STATUS_CHANGE):
>>>>> * In a 3-node cluster (node1, node2, node3), node1 is going down.
>>>>> * Node2 and node3 learn about node1’s state change via gossip.
>>>>> * Both node2 and node3 will send a STATUS_CHANGE event (“node1 is DOWN”) to every client connection that registered for STATUS_CHANGE.
>>>>> * If a driver registers for STATUS_CHANGE on connections to both node2 and node3, it will receive two notifications for the same cluster event. That’s the “signal multiplication” the spec warns about.
>>>>>
>>>>> But the protocol does not stop us from adding an in-band event like GRACEFUL_DISCONNECT. In the above example of node1 going down:
>>>>> * GRACEFUL_DISCONNECT is in-band and connection-local, not gossip/broadcast.
>>>>> * Only the node that is actually shutting down (node1) emits GRACEFUL_DISCONNECT, and it emits it only on its own native connections that opted in.
>>>>> * Node2 and node3 do not emit GRACEFUL_DISCONNECT for node1’s shutdown, because they are not the node being drained.
>>>>> So even if a driver has connections to node2 and node3 that are registered for other events, it will not receive any GRACEFUL_DISCONNECT from them for node1 going down.
>>>>>
>>>>> I understand such an in-band event is new. We can add a clarification to the protocol explaining that the recommendation of “don’t REGISTER on all connections” does not apply to in-band events like GRACEFUL_DISCONNECT.
>>>>>
>>>>> Event timing for operators
>>>>> ---------
>>>>> A server should emit GRACEFUL_DISCONNECT whenever it needs to close a connection gracefully, regardless of the trigger.
>>>>> I’ll update the CEP to clarify that GRACEFUL_DISCONNECT is emitted whenever the server intends to close a connection gracefully, including nodetool drain, nodetool disablebinary + shutdown, rolling restarts, or a controlled JVM shutdown hook path.
>>>>>
>>>>> Operator control + observability
>>>>> ---------
>>>>> +1. I agree to add the server-side configs:
>>>>> graceful_disconnect_enabled
>>>>> graceful_disconnect_grace_period_ms
>>>>> graceful_disconnect_max_drain_ms
>>>>> And metrics/counters such as: connections_draining, forced_disconnects
>>>>> I’ll update the CEP accordingly.
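For the CEP update, the agreed knobs might look like this in cassandra.yaml. The key names come from this thread; the values and comments are illustrative assumptions only, not proposed defaults:

```yaml
# Sketch only: names from the thread, values are assumptions.
graceful_disconnect_enabled: true
graceful_disconnect_grace_period_ms: 5000    # window for in-flight requests to finish
graceful_disconnect_max_drain_ms: 30000      # hard cap before forced close
# Suggested observability: connections_draining and forced_disconnects counters
```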
>>>>> >>>>> Thanks again—this feedback is super helpful for tightening the >>>>> proposal. >>>>> >>>>> Regards, >>>>> Jane >>>>> >>>>> On Wed, Jan 14, 2026 at 1:33 PM Patrick McFadin <[email protected]> >>>>> wrote: >>>>> >>>>>> Hi Jane, >>>>>> >>>>>> Thank you for the thought-out CEP. I certainly see the use of >>>>>> a feature like this to add resilience during cluster state changes. I >>>>>> have >>>>>> a few questions after reading the CEP. >>>>>> >>>>>> Driver compatibility: The way I read this, it's based on an ideal >>>>>> scenario where client and server are on the same version to support this >>>>>> feature. In my experience, client rollouts are never complete and often >>>>>> lag >>>>>> far behind the cluster upgrade. What happens when the driver completely >>>>>> ignores GRACEFUL_DISCONNECT? It might mean considering something on the >>>>>> server side. >>>>>> >>>>>> Discovery things: Speaking of the client, you want to use the >>>>>> SUPPORTED as listed in the v4 spec[1], but why not add this to STARTUP? >>>>>> You >>>>>> mention something in the "Rejected alternatives," but could you expand >>>>>> your >>>>>> thinking here? >>>>>> >>>>>> Signal multiplication: You have this in the CEP "Other protocols >>>>>> (HTTP/2, PostgreSQL, Redis Cluster) use connection-local in-band signals >>>>>> to >>>>>> enable safe draining." Our protocol guidance[1] explicitly notes that >>>>>> drivers often keep multiple connections and should not register for >>>>>> events >>>>>> on all of them, as this duplicates traffic. I don't know how you could >>>>>> ensure that every connection would be aware of a GRACEFUL_DISCONNECT >>>>>> without changing that aspect of the spec. >>>>>> >>>>>> >>>>>> Event timing for operators: It's not clear to me when >>>>>> the GRACEFUL_DISCONNECT is emitted when you do something like a drain, >>>>>> disablebinary or just a JVM shutdown hook. 
This is crucial for operators >>>>>> to >>>>>> understand how this could work and should be in the CEP spec for >>>>>> clarity. I >>>>>> think it will matter to a lot of people. >>>>>> >>>>>> Operator control: I've been on this push for a while and so I have to >>>>>> mention it. Opt-in vs default. We need more controls in the config YAML. >>>>>> graceful_disconnect_enabled >>>>>> >>>>>> If there is a server-side component: >>>>>> graceful_disconnect_grace_period_ms >>>>>> graceful_disconnect_max_drain_ms >>>>>> >>>>>> And finally, it needs more observability... >>>>>> logging/metrics counters: connections_draining, forced_disconnects >>>>>> >>>>>> >>>>>> Thanks for proposing this! >>>>>> >>>>>> Patrick >>>>>> >>>>>> 1 - >>>>>> https://cassandra.apache.org/doc/latest/cassandra/_attachments/native_protocol_v4.html >>>>>> >>>>>> On Tue, Jan 13, 2026 at 4:30 PM Jane H <[email protected]> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I’d like to start a discussion on a CEP proposal: *CEP-59: Graceful >>>>>>> Disconnect*, to make intentional node shutdown/drain less >>>>>>> disruptive for clients (link: >>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406619103 >>>>>>> ). >>>>>>> >>>>>>> Today, intentional node shutdown (e.g., rolling restarts) can still >>>>>>> be disruptive from a client perspective. Drivers often ignore DOWN >>>>>>> events because they are not reliable, and outstanding requests can end >>>>>>> up >>>>>>> as client-facing TimeOut exceptions. >>>>>>> >>>>>>> The proposed solution is to add an in-band GRACEFUL_DISCONNECT >>>>>>> event that both control and query connections can opt into via >>>>>>> REGISTER. When a node is shutting down, it will emit the event to >>>>>>> all subscribed connections. Drivers will stop sending new queries on >>>>>>> that >>>>>>> connection/host, allow in-flight requests to finish, then reconnect with >>>>>>> exponential backoff. 
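The driver-side reaction described in the proposal (stop new queries, let in-flight requests finish, reconnect with exponential backoff) can be sketched as follows. `HostState` and `backoff_schedule` are invented names for illustration, not an actual driver API:

```python
# Illustrative sketch of the driver reaction to GRACEFUL_DISCONNECT described
# in the proposal; names are invented, not a real driver API.

def backoff_schedule(base_ms=100, cap_ms=8000, attempts=6):
    """Exponential reconnection delays, capped at cap_ms (values are examples)."""
    return [min(base_ms * 2 ** i, cap_ms) for i in range(attempts)]

class HostState:
    def __init__(self):
        self.accepting_new_queries = True
        self.reconnect_delays_ms = []

    def on_graceful_disconnect(self):
        # Quiesce: route no new queries to this host; in-flight requests
        # continue on the existing connection until they complete.
        self.accepting_new_queries = False
        # Then reconnect with exponential backoff rather than immediately.
        self.reconnect_delays_ms = backoff_schedule()
```

One design note implied by the thread: because the signal is connection-local, this state has to live per connection/host rather than being derived from cluster-wide events, which is the per-connection scheduling awareness Patrick mentions above.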
>>>>>>> >>>>>>> If you have thoughts on the proposed protocol, server shutdown >>>>>>> behavior, driver expectations, edge cases, or general feedback, I’d >>>>>>> really >>>>>>> appreciate it. >>>>>>> >>>>>>> Regards, >>>>>>> Jane >>>>>>> >>>>>>
