Hi,

We have observed that Tomcat doesn't gracefully close keep-alive connections during shutdown. Tomcat waits for requests that are already in progress to complete, but once those are done, it closes all connections immediately, irrespective of any configured keepAliveTimeout. This causes problems for some HTTP clients, especially in Kubernetes-like environments when scaling down pods. In that situation, shutdown can only be graceful if a client whose connection is unexpectedly closed retries the request on a fresh connection, and not all clients do this.
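For reference, the keepAliveTimeout mentioned above is the connector attribute that controls how long an idle keep-alive connection is kept open. A minimal embedded-Tomcat sketch that sets it (the port and the concrete values are just examples; the same attributes can also be set on the Connector element in server.xml):

import org.apache.catalina.LifecycleException;
import org.apache.catalina.connector.Connector;
import org.apache.catalina.startup.Tomcat;

public class KeepAliveTimeoutExample {
    public static void main(String[] args) throws LifecycleException {
        Tomcat tomcat = new Tomcat();
        tomcat.setPort(8080); // example port

        // keepAliveTimeout: how long (in ms) an idle keep-alive connection is
        // kept open before Tomcat closes it. maxKeepAliveRequests caps the
        // number of requests per connection. Values below are examples only.
        Connector connector = tomcat.getConnector();
        connector.setProperty("keepAliveTimeout", "60000");
        connector.setProperty("maxKeepAliveRequests", "100");

        // Web application / context setup omitted for brevity.
        tomcat.start();
        tomcat.getServer().await();
    }
}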
I would think that an entirely graceful shutdown sequence, in the presence of keep-alive connections, would work like the following:

1) Server receives the shutdown request.
2) Server immediately stops accepting new connections (already happens).
3) Server completes all requests already in progress (already happens).
4) New behavior: If new requests come in on already established keep-alive connections, those are processed, but "Connection: close" is returned so the client knows the connection can no longer be used. At most one more request is therefore processed on each existing connection (a rough sketch of the idea is appended at the end of this mail).
5) New behavior: When all keep-alive connections are gone, shutdown proceeds. If connections are still open once the keepAliveTimeout has passed, no requests can have been received on them during the shutdown period (otherwise they would have been closed in #4). And since Tomcat returned the keep-alive timeout value to the client when the connection was set up, the client should know that the connection is no longer usable. From this point on it is therefore safe for Tomcat to close the remaining connections.
6) The rest of the server shutdown continues.

Br,
M. Thiim

---

Background: The current behavior is problematic in e.g. a Kubernetes environment because there is no way to drain traffic when scaling down pods. While Kubernetes will immediately stop forwarding new connections to the terminating pod, it can't do anything about already established connections, including keep-alive connections. Those can be partially handled by defining a preStop delay, so that Kubernetes waits a certain amount of time between receiving the stop request (and stopping new connections) and actually shutting down the application. However, even this doesn't solve the problem: even if the preStop delay is configured to be longer than the keepAliveTimeout, there can still be open connections, because the keepAliveTimeout only expires if the connection is idle for the whole period. A client can therefore avoid the timeout by simply continuing to use the connection (as will happen in a system with constant traffic). So when the preStop delay ends, there can still be many open connections, and once Tomcat receives the shutdown signal it will currently just close them. This causes problems for many different HTTP client implementations. Some can be fixed by configuring those HTTP clients (e.g. the maximum lifetime of a keep-alive connection), but that requires fixing every HTTP client that might call the service (ingress proxies, pod-to-pod communication, etc.), and it seems the problem is better addressed through enforcement on the server.
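To make #4 a bit more concrete, here is a rough, application-level sketch of the intended behavior as a servlet filter. The class name, the startDraining() hook and the jakarta.servlet package (Tomcat 10+) are my own assumptions for illustration; a proper fix would of course live in Tomcat's connector/protocol handling, where the keep-alive decision is actually made.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletResponse;

// Sketch of proposed step 4 at the servlet-filter level: once draining has
// started, requests on established keep-alive connections are still served,
// but the response tells the client not to reuse the connection.
public class DrainingFilter implements Filter {

    // Flipped by whatever receives the shutdown / preStop signal (assumed hook).
    private static final AtomicBoolean DRAINING = new AtomicBoolean(false);

    public static void startDraining() {
        DRAINING.set(true);
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        if (DRAINING.get() && response instanceof HttpServletResponse) {
            // Signal that this connection should not carry further requests.
            ((HttpServletResponse) response).setHeader("Connection", "close");
        }
        chain.doFilter(request, response);
    }
}

In a Kubernetes setup, startDraining() could for example be triggered from the preStop hook. I haven't verified that every connector configuration honors an application-set "Connection: close" header, so this is only meant to illustrate the client-visible behavior proposed above, not to serve as a complete workaround.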