Hi Chris That is a good suggestion. But more importantly, maybe update the documentation of the bindOnInit parameter, since it's not very intuitive this has anything to do with graceful shutdown and I don't think any connection with keep-alive is mentioned. I'm also not sure why bindOnInit=false it's the default, since it is needed for graceful shutdowns. Finally, in which versions is this supported? The options are documented in Tomcat 9 but they don't lead to this good behavior with keep-alive connections (I've tried the latest 9.0.69) whereas it works fine in the 10-series (tried 10.1.0). This is a bit annoying for us because we use Spring Boot and the newest production version is 2.7.5 and this specifically doesn't work with Tomcat 10. Tomcat 10 requires Spring Boot 3.0 which is only in release-candidate status. So we are a bit torn on what to do and even thinking of implementing my custom fix for the 9-series.
Br, M. Thiim Den fre. 18. nov. 2022 kl. 20.03 skrev Christopher Schultz < ch...@christopherschultz.net>: > Mark, > > On 11/18/22 10:38, Mark Thomas wrote: > > I've just found the comment I missed the first time. > > > > This has already been implemented. > > > > You need to set the following: > > > > Connector > > bindOnInit="false" > > > > Service > > gracefulStopAwaitMillis="20000" > > > > This assumes you are using the default keep-alive timeout of 20s. If > > not, adjust gracefulStopAwaitMillis accordingly. It needs to be at least > > as long as the keep-alive timeout. > > Should we cross-check these two configuration items and issue WARNING at > startup if gracefulStopAwaitMillis < keepAliveTimeout? > > -chris > > > On 18/11/2022 14:59, Mark Thomas wrote: > >> PRs are always welcome. From the description provided, I think more > >> use of the existing graceful shutdown could be made. > >> > >> I am also looking at whether a slightly broader refactoring is called > >> for as there looks to be scope for better alignment between pause, > >> stop and graceful stop. > >> > >> Mark > >> > >> > >> On 18/11/2022 14:29, M. Thiim wrote: > >>> Hi Mark > >>> > >>> Great! I've actually made an experimental patch that fixes it, but it's > >>> perhaps not completely clean. To get the behavior the waiting needs to > >>> happen after the server has stopped accepting new connections, this > >>> seems > >>> to be the pause-phase. But the Http11Processoer rejects incoming > >>> requests > >>> during the pause phase (condition check in the while loop in the > >>> service() > >>> method and also further down in the method), so I had to change this > >>> behaviour as well (so it accepts requests but responds to them with > >>> Connection: Close). I don't know if this breaks some assumptions that > it > >>> now continues to process requests in the pause phase. I also had to > >>> add a > >>> .waitForConnectionDrain() to the Connector class and to the protocol > >>> handler. > >>> > >>> Also, in order to avoid waiting for the full duration of the > >>> keepAliveTimeout in the cases when there's either no kept alive > >>> connections > >>> to begin with or where the server is able to close them earlier (due > >>> to new > >>> requests coming in that can be responded with Connection: Close, as > >>> would > >>> typically be the case in a busy system) I had to have a count on > current > >>> number of connections. The getConnectionCount() in > AbstractEndpoint.java > >>> seemed useful but there's catch: Its counting is based on the value > >>> of the > >>> connectionLimitLatch and this is specifically nulled out during the > >>> pause > >>> phase (I suppose because it also has the function of limiting > >>> connections), > >>> so the getConnectionCount() method returns -1 even if there are open > >>> connections. So I added a separate AtomicInteger that tracks the > >>> number of > >>> connections independently of the state and then changed > >>> getConnectionCount() to use this in the situations where the > >>> connectionLimitLatch is null. > >>> > >>> I can clean up and share the patch, but let me know if you think it > >>> should > >>> be done differently than described above. :-) > >>> > >>> Br, M. Thiim > >>> > >>> Den fre. 18. nov. 2022 kl. 14.34 skrev Mark Thomas <ma...@apache.org>: > >>> > >>>> On 17/11/2022 19:39, M. Thiim wrote: > >>>>> Hi, > >>>>> > >>>>> We have observed that Tomcat doesn't gracefully close > >>>>> keep-alive connections. Tomcat waits for already started requests to > >>>>> complete, but once those are done, Tomcat will close all connections > >>>>> immediately, irrespective of any configured keepAliveTimeout. This > >>>>> causes > >>>>> problems for some HTTP clients, especially in Kubernetes-like > >>>> environments > >>>>> when scaling down pods. Here, it can only work gracefully if the HTTP > >>>>> client who falls victim to an unexpectedly closed connection > >>>>> retries on a > >>>>> fresh connection, and it is not all clients that do this. > >>>>> > >>>>> I would think that an entirely graceful shutdown sequence, in the > >>>> presence > >>>>> of keep-alive connections, would work like the following: > >>>>> > >>>>> 1) Server receives shutdown request > >>>>> 2) Server immediately stops accepting new connections (already > >>>>> happens) > >>>>> 3) Server completes all requests already in (already happens) > >>>>> 4) New behavior: If new requests come in on already established > >>>> keep-alive > >>>>> connections those are processed, but a "Connection: close" is > >>>>> returned so > >>>>> the client knows this connection can no longer be used. So at most > one > >>>> more > >>>>> request can be processed on each of those existing connections. > >>>>> 5) New behavior: When all keep-alive connections are gone, shutdown > >>>>> proceeds. If there are still connections left after the > >>>>> keepAliveTimeout > >>>>> has passed, this means no requests can have been received on those > >>>>> during > >>>>> the shutdown period (otherwise they would have been closed in #4). > And > >>>>> since Tomcat returned the keep-alive timeout value to the client > >>>>> when the > >>>>> connection was setup, the client should know that the connection is > no > >>>>> longer usable. Therefore it is from this point safe for Tomcat to > >>>>> close > >>>>> those remaining connections. > >>>>> 6) Rest of server shutdown continues > >>>> > >>>> Seems a reasonable addition. > >>>> > >>>> It looks like extending the behaviour when gracefulStopAwaitMillis is > >>>> set on the Service would work. > >>>> > >>>> gracefulStopAwaitMillis would need to be greater than or equal to the > >>>> keep-alive timeout but we can document that as part of the patch. > >>>> > >>>> Mark > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > >>>> For additional commands, e-mail: dev-h...@tomcat.apache.org > >>>> > >>>> > >>> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > >> For additional commands, e-mail: dev-h...@tomcat.apache.org > >> > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > > For additional commands, e-mail: dev-h...@tomcat.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org > For additional commands, e-mail: dev-h...@tomcat.apache.org > >