Re: A curious case of Tomcat 10.1.x NIO(1) acceptor not stopping clearly on some setups
> On Feb 23, 2024, at 10:18, Michał Szymborski > wrote: > > Hi, I've encountered an issue where the acceptor doesn't stop cleanly when > shutting Tomcat down on some machines. I'm using tomcat-embed-core 10.1.19, > I've tested it on Ubuntu 22.04 and MacOs Sonoma 14.3.1. > > Here is a minimum reproducible example (also available here > https://github.com/lared/tomcat-acceptor-not-stopping-cleanly): > > ``` > import org.apache.catalina.LifecycleException; > import org.apache.catalina.connector.Connector; > import org.apache.catalina.startup.Tomcat; > > class SlowStop { >public static void main(String[] args) throws LifecycleException { >var connector = new > Connector("org.apache.coyote.http11.Http11NioProtocol"); >connector.setPort(0); > >Tomcat tomcat = new Tomcat(); >tomcat.getService().addConnector(connector); > >tomcat.start(); >tomcat.stop(); >} > } > ``` > > > Whenever I run this code on setup as listed above, shutdown of the > ProtocolHandler times out when trying to stop the acceptor. Here is an > excerpt from logs: > > ``` > Feb 23, 2024 4:42:34 PM org.apache.coyote.AbstractProtocol stop > INFO: Stopping ProtocolHandler ["http-nio-8080"] > Feb 23, 2024 4:42:44 PM org.apache.tomcat.util.net.Acceptor stop > WARNING: The acceptor thread [http-nio-8080-Acceptor] did not stop cleanly > ``` > > This can be traced down to `NioEndpoint` stopping, particularly stopping the > acceptor: > > https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/NioEndpoint.java#L308 > > https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/Acceptor.java#L180 > > > This issue is not reproducible on every setup (see github actions run on the > linked repo), and it only affects NIO - NIO2 works as expected. > > I've encountered this issue when Spring Boot got upgraded to 3.2.x line > (switch from Tomcat 9 to 10.1.x). > > Could you please advise me where I can go from here? This is particularly > annoying in our testing setup, where timely teardown of the Tomcat is very > appreciated (even if disruptive). > > Unfortunately I'm a fair bit out of my depth here when it comes to > troubleshooting this further. It is rather surprising that even though people > were not able to reproduce it on some of their setups (like MacOs 13.6.4), > but I actually was able to reproduce it on two machines I own, my colleagues > at work have the same issue. I could not reproduce the error in several hundred executions of your test program on my M1 MacBook Pro (Sonoma 14.3.1, OpenJDK 21.0.1) using just command line execution: chuck@Chuck-MacBookPro JavaTest > java -version openjdk version "21.0.1" 2023-10-17 LTS OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS) OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode) chuck@Chuck-MacBookPro JavaTest > java -cp `find ~/Downloads/apache-tomcat-10.1.19-embed -name '*.jar' | tr '\n' ':' | sed 's/:$/:./'` SlowStop Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler ["http-nio-auto-1"] Feb 25, 2024 8:55:54 AM org.apache.catalina.core.StandardService startInternal INFO: Starting service [Tomcat] Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler ["http-nio-auto-1-56406"] Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["http-nio-auto-1-56406"] Feb 25, 2024 8:55:54 AM org.apache.catalina.core.StandardService stopInternal INFO: Stopping service [Tomcat] Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol stop INFO: Stopping ProtocolHandler ["http-nio-auto-1-56406"] Any chance the problem has something to do with your gradle environment? - Chuck
Re: A curious case of Tomcat 10.1.x NIO(1) acceptor not stopping clearly on some setups
Thank you for taking the time. Removing Gradle from equation did not make a difference. I did what I should have done in the beginning though and turned up the logging level. When an endpoint gets shut down, since there is no other way of waking the Acceptor up, a fake connection is made. This is timing out apparently: Feb 25, 2024 6:23:24 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept FINE: Caught exception trying to unlock accept on port [0] java.net.SocketTimeoutException: Connect timed out at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) at java.base/java.net.Socket.connect(Socket.java:633) at org.apache.tomcat.util.net.AbstractEndpoint.unlockAccept(AbstractEndpoint.java:1124) at org.apache.tomcat.util.net.NioEndpoint.unlockAccept(NioEndpoint.java:390) at org.apache.tomcat.util.net.AbstractEndpoint.pause(AbstractEndpoint.java:1394) at org.apache.coyote.AbstractProtocol.pause(AbstractProtocol.java:678) at org.apache.catalina.connector.Connector.pause(Connector.java:963) at org.apache.catalina.core.StandardService.stopInternal(StandardService.java:484) at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:242) at org.apache.catalina.core.StandardServer.stopInternal(StandardServer.java:974) at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:242) at org.apache.catalina.startup.Tomcat.stop(Tomcat.java:447) at SlowStop.main(SlowStop.java:14) You can see the procedure here: https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/AbstractEndpoint.java#L1086 What I noted: - the local address for shutdown is NOT loopback - the connection is timing out After I did some magic to increase the connection timeout from 2000ms to 60 000ms - I got "No route to host". So after looking up which IP the device belonged to - it wasn't the interface I was looking for. On quick inspection the acceptor thread (https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/Acceptor.java#L128) was listening on [/[0:0:0:0:0:0:0:0]:39033] , which was correctly picked up at first, but then this local address got transformed: https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/AbstractEndpoint.java#L1164 It started picking up interfaces to use, and it stopped at the first non-loopback non-link local address, which also happens to be some sort of a bridge network for one of my VMs. I don't quite know why there is no routing set up, but this interface should not have been picked in the first place. Here's a list of my interfaces: - lo (loopback) - wlp4s0 (wifi) - docker0 (docker networking stuff) - br-e26d1e697a66 (some bridge) - br-54aa505a4d52 (some bridge) The code picked up br-54aa505a4d52 first, it has those IP addresses: - /fe80:0:0:0:0:0:0:1%br-e26d1e697a66 - /fc00:f853:ccd:e793:0:0:0:1%br-e26d1e697a66 - /172.18.0.1 It stopped iterating once it got to the second one, tried using it and failed. It should not have tried to touch the bridge networks at all. I'll take a look at how it works on my work laptop with MacOs, but I'd wager a guess that some corporate VPNs have interfaces which have messed up routing as well. Can't tell if it's a bug or a feature, but it sure is unexpected. Making this timeout for acceptor shutdown configurable would be one sort of band-aid. Chuck - could you please share your thoughts on this? Sample logs, I updated the repo with this script: $ ./no_gradle.sh Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol init INFO: Initializing ProtocolHandler ["http-nio-auto-1"] Feb 25, 2024 7:13:08 PM org.apache.catalina.core.StandardService startInternal INFO: Starting service [Tomcat] Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol start INFO: Starting ProtocolHandler ["http-nio-auto-1-37591"] Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol pause INFO: Pausing ProtocolHandler ["http-nio-auto-1-37591"] Feb 25, 2024 7:13:08 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept FINER: About to unlock socket for:/[fc00:f853:ccd:e793:0:0:0:1%br-e26d1e697a66]:37591 Feb 25, 2024 7:13:09 PM org.apache.tomcat.util.net.NioEndpoint$Poller timeout FINER: timeout completed: keys processed=0; now=1708884789930; nextExpiration=0; keyCount=0; hasEvents=false; eval=false Feb 25, 2024 7:13:10 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept FINE: Caught exception trying to unlock accept on port [0] java.net.SocketTimeoutException: Connect timed out at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) at java.base/java.net
Need help for a problem on migrating from Tomcat-8 to Tomcat-9
Hi Experts, In our product, we are using Tomcat [OriginalFileName: prunsrv.exe] for creating a service[Say, Service-A]. It's a huge product running in market for last 20 years. We are in progress of moving from Tomcat-8 to tomcat-9. When we are creating the Service-A with Tomcat-8 [tomcat8.exe]. In "Services" desktop app, we can see the service is created with "Local System" in "Log On as". When we are creating the Service-A with Tomcat-9 [tomcat9.exe]. in "Services" desktop app, we can see the service is created with "Local service" in "Log On as". Looks like "Local service" has less power than "Local System". Due to it, Service-A created with Tomcat-9 failing for several operation inside product. Can somebody suggest, how we can create a service with tomcat-9, with the privilege of "Local System"? Please let me know, if any more details is required from my side. Regards Rajib