Hi Abe,
Thank you for your questions! Replies below.
Why OverloadedException is insufficient
--------
There are two different situations here, and they call for different client
behavior:
* Desired client behavior on a *real* OverloadedException:
Retry the request on the next node (optionally with some temporary
backoff for that node), because the overload may be transient and the node
is still healthy.
* Desired client behavior on a node that is intentionally shutting down:
Stop sending *new* requests on that connection, allow in-flight requests
to complete if possible, and begin reconnecting with backoff.
Today, OverloadedException doesn’t let drivers consistently implement the
second behavior. Drivers typically treat it as a per-request failure: they
retry that request on another node, but the pool can still send subsequent
requests to the same node/connection. If the node is actually in the
process of going away, those subsequent requests can still hit that
connection and end up as client-visible timeouts.
Therefore we need GRACEFUL_DISCONNECT as a deterministic, connection-local
drain signal: it lets the driver immediately transition that specific
connection into draining mode, avoiding failures/retries/timeouts in the
first place.
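To make that concrete, here is a minimal sketch of the per-connection state
a driver could keep once it receives GRACEFUL_DISCONNECT. It is plain Java
with made-up names (DrainableConnection is not a real driver class); the
point it illustrates is that the drain decision happens per connection,
before a request is ever written to the socket, rather than after a failure
comes back:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: hypothetical class, not part of the driver.
final class DrainableConnection {
  private final AtomicBoolean draining = new AtomicBoolean(false);
  private final AtomicInteger inFlight = new AtomicInteger(0);

  // Called by the pool before routing a request to this connection.
  boolean tryAcquire() {
    if (draining.get()) {
      return false; // the pool must pick another connection
    }
    inFlight.incrementAndGet();
    return true;
  }

  // Called when the response (or failure) for an in-flight request arrives.
  void release() {
    if (inFlight.decrementAndGet() == 0 && draining.get()) {
      closeAndScheduleReconnect();
    }
  }

  // Called when GRACEFUL_DISCONNECT arrives on this connection.
  void onGracefulDisconnect() {
    draining.set(true); // no new requests; in-flight ones may finish
    if (inFlight.get() == 0) {
      closeAndScheduleReconnect();
    }
  }

  private void closeAndScheduleReconnect() {
    // close the socket and hand the node back to the reconnection policy
    // (idempotent in a real implementation)
  }
}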
Appendix [1] includes a Simulacron example illustrating this: the server
first responds with OverloadedException and then goes away; the client's
subsequent requests sent on the same connection time out.
Server handling of requests after GRACEFUL_DISCONNECT
--------
I think the server must assume it can still receive some requests right
after emitting the event (e.g. requests already in flight on the wire / in
buffers, or the event being delayed). I don’t see a robust way for the
server to perfectly classify “sent before vs after the client observed the
event” at the protocol level.
The safest behavior I can think of is:
* stop accepting new connections immediately, preventing the race where a
newly established connection never receives GRACEFUL_DISCONNECT at all,
* emit GRACEFUL_DISCONNECT,
* keep processing requests that arrive on already-established subscribed
connections during a bounded grace/drain window,
* then force-close anything that hasn’t drained by the deadline.
That minimizes mid-flight interruption (and late retries deep into the
client latency budget), while still giving operators a hard upper bound on
how long shutdown can be deferred.
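As a rough sketch of that ordering (the Listener and ClientConnection
interfaces below are stand-ins for illustration, not the actual
native-transport classes):

import java.time.Duration;
import java.util.List;

// Sketch only: hypothetical types, not the real server implementation.
final class GracefulShutdown {

  interface Listener {
    void stopAccepting();
  }

  interface ClientConnection {
    boolean subscribedToEvents();
    void sendGracefulDisconnect();
    boolean hasInFlightRequests();
    void forceClose();
  }

  static void run(Listener listener, List<ClientConnection> connections, Duration drainWindow)
      throws InterruptedException {
    // 1. Stop accepting new connections first, so no connection can be
    //    created after the event is emitted and miss it entirely.
    listener.stopAccepting();

    // 2. Emit GRACEFUL_DISCONNECT to every connection that registered for
    //    protocol events.
    for (ClientConnection c : connections) {
      if (c.subscribedToEvents()) {
        c.sendGracefulDisconnect();
      }
    }

    // 3. Keep serving requests already on the wire or in buffers during a
    //    bounded drain window.
    long deadline = System.nanoTime() + drainWindow.toNanos();
    while (System.nanoTime() < deadline
        && connections.stream().anyMatch(ClientConnection::hasInFlightRequests)) {
      Thread.sleep(100);
    }

    // 4. Hard upper bound: force-close whatever has not drained in time.
    connections.forEach(ClientConnection::forceClose);
  }
}

The bounded window in step 3 is what gives operators the hard upper bound
mentioned above.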
Thanks again for the feedback!
Regards,
Jane
Appendix [1] Simulacron example
--------
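(Imports and the JUnit rule wiring are omitted for brevity; the snippet
assumes the driver's Simulacron test infrastructure plus AssertJ's
catchThrowable/assertThat helpers.)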
private static final SimulacronRule SIMULACRON_RULE =
    new SimulacronRule(ClusterSpec.builder().withNodes(1));

private static final SessionRule<CqlSession> SESSION_RULE =
    SessionRule.builder(SIMULACRON_RULE).build();

@Test
public void should_timeout_when_node_returns_overloaded_then_delays() {
  // This test reproduces the case where a server shutting down throws an
  // OverloadedException and the driver still sends subsequent requests to
  // that node, which end up timing out.
  String query = "SELECT * FROM test";

  // First request: the node returns OverloadedException (simulating a
  // server that is shutting down).
  SIMULACRON_RULE.cluster().prime(when(query).then(overloaded("Overloaded")));

  SimpleStatement st =
      SimpleStatement.builder(query)
          .setTimeout(Duration.ofSeconds(2))
          .setConsistencyLevel(DefaultConsistencyLevel.ONE)
          .build();

  // Execute the first request - should fail with OverloadedException.
  Throwable firstError = catchThrowable(() -> SESSION_RULE.session().execute(st));
  assertThat(firstError).isInstanceOf(OverloadedException.class);

  // Clear the first prime and prime no response (the node stays silent).
  SIMULACRON_RULE.cluster().clearPrimes(true);
  SIMULACRON_RULE.cluster().prime(when(query).then(noResult()));

  // Execute the second request - it is still sent to the same node and
  // times out.
  Throwable secondError = catchThrowable(() -> SESSION_RULE.session().execute(st));
  assertThat(secondError)
      .isInstanceOf(DriverTimeoutException.class)
      .hasMessageContaining("Query timed out after PT2S");
}
On Fri, Jan 16, 2026 at 12:28 PM Abe Ratnofsky <[email protected]> wrote:
> Thanks for the CEP Jane!
>
> Could you elaborate on why the current OverloadedException behavior is
> insufficient in 5.x (CASSANDRA-19534)? If a server instance can’t accept a
> new request, it signals that immediately to the client for retry against
> another host, then eventually the client should receive a NODE_DOWN event
> on the control connection. This does mean that clients need to have queries
> marked as idempotent in order for them to retry, and even non-idempotent
> queries are safe to retry when they fail in this way. In the past, we’ve
> discussed a new exception hierarchy that allows the server to indicate
> whether a non-idempotent query would be safe to retry.
>
> If we can address that drawback of OverloadedException for non-idempotent
> queries, are there any other drawbacks of the current approach?
>
> In your proposal, the server will still need to handle new requests issued
> from clients after GRACEFUL_DISCONNECT is sent, particularly if the EVENT
> is delayed or dropped. If those requests are going to get processed, we’ll
> either have to continue deferring shutdown, or interrupt them and trigger
> retries later into the client’s latency budget.