Re: A curious case of Tomcat 10.1.x NIO(1) acceptor not stopping clearly on some setups

2024-02-25 Thread Chuck Caldarale

> On Feb 23, 2024, at 10:18, Michał Szymborski  
> wrote:
> 
> Hi, I've encountered an issue where the acceptor doesn't stop cleanly when 
> shutting Tomcat down on some machines. I'm using tomcat-embed-core 10.1.19, 
> I've tested it on Ubuntu 22.04 and MacOs Sonoma 14.3.1.
> 
> Here is a minimum reproducible example (also available here 
> https://github.com/lared/tomcat-acceptor-not-stopping-cleanly):
> 
> ```
> import org.apache.catalina.LifecycleException;
> import org.apache.catalina.connector.Connector;
> import org.apache.catalina.startup.Tomcat;
> 
> class SlowStop {
>public static void main(String[] args) throws LifecycleException {
>var connector = new 
> Connector("org.apache.coyote.http11.Http11NioProtocol");
>connector.setPort(0);
> 
>Tomcat tomcat = new Tomcat();
>tomcat.getService().addConnector(connector);
> 
>tomcat.start();
>tomcat.stop();
>}
> }
> ```
> 
> 
> Whenever I run this code on setup as listed above, shutdown of the 
> ProtocolHandler times out when trying to stop the acceptor. Here is an 
> excerpt from logs:
> 
> ```
> Feb 23, 2024 4:42:34 PM org.apache.coyote.AbstractProtocol stop
> INFO: Stopping ProtocolHandler ["http-nio-8080"]
> Feb 23, 2024 4:42:44 PM org.apache.tomcat.util.net.Acceptor stop
> WARNING: The acceptor thread [http-nio-8080-Acceptor] did not stop cleanly
> ```
> 
> This can be traced down to `NioEndpoint` stopping, particularly stopping the 
> acceptor:
> 
> https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/NioEndpoint.java#L308
> 
> https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/Acceptor.java#L180
> 
> 
> This issue is not reproducible on every setup (see github actions run on the 
> linked repo), and it only affects NIO - NIO2 works as expected.
> 
> I've encountered this issue when Spring Boot got upgraded to 3.2.x line 
> (switch from Tomcat 9 to 10.1.x).
> 
> Could you please advise me where I can go from here? This is particularly 
> annoying in our testing setup, where timely teardown of the Tomcat is very 
> appreciated (even if disruptive).
> 
> Unfortunately I'm a fair bit out of my depth here when it comes to 
> troubleshooting this further. It is rather surprising that even though people 
> were not able to reproduce it on some of their setups (like MacOs 13.6.4), 
> but I actually was able to reproduce it on two machines I own, my colleagues 
> at work have the same issue.


I could not reproduce the error in several hundred executions of your test 
program on my M1 MacBook Pro (Sonoma 14.3.1, OpenJDK 21.0.1) using just command 
line execution:

chuck@Chuck-MacBookPro JavaTest > java -version
openjdk version "21.0.1" 2023-10-17 LTS
OpenJDK Runtime Environment Temurin-21.0.1+12 (build 21.0.1+12-LTS)
OpenJDK 64-Bit Server VM Temurin-21.0.1+12 (build 21.0.1+12-LTS, mixed mode)
chuck@Chuck-MacBookPro JavaTest > java -cp `find 
~/Downloads/apache-tomcat-10.1.19-embed -name '*.jar' | tr '\n' ':' | sed 
's/:$/:./'` SlowStop
Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-auto-1"]
Feb 25, 2024 8:55:54 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service [Tomcat]
Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-nio-auto-1-56406"]
Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-nio-auto-1-56406"]
Feb 25, 2024 8:55:54 AM org.apache.catalina.core.StandardService stopInternal
INFO: Stopping service [Tomcat]
Feb 25, 2024 8:55:54 AM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler ["http-nio-auto-1-56406"]

Any chance the problem has something to do with your gradle environment?

  - Chuck



Re: A curious case of Tomcat 10.1.x NIO(1) acceptor not stopping clearly on some setups

2024-02-25 Thread Michał Szymborski
Thank you for taking the time. Removing Gradle from equation did not 
make a difference.


I did what I should have done in the beginning though and turned up the 
logging level. When an endpoint gets shut down, since there is no other 
way of waking the Acceptor up, a fake connection is made. This is timing 
out apparently:



Feb 25, 2024 6:23:24 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept
FINE: Caught exception trying to unlock accept on port [0]
java.net.SocketTimeoutException: Connect timed out
at 
java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327)
at java.base/java.net.Socket.connect(Socket.java:633)
at 
org.apache.tomcat.util.net.AbstractEndpoint.unlockAccept(AbstractEndpoint.java:1124)
at 
org.apache.tomcat.util.net.NioEndpoint.unlockAccept(NioEndpoint.java:390)
at 
org.apache.tomcat.util.net.AbstractEndpoint.pause(AbstractEndpoint.java:1394)
at org.apache.coyote.AbstractProtocol.pause(AbstractProtocol.java:678)
at org.apache.catalina.connector.Connector.pause(Connector.java:963)
at 
org.apache.catalina.core.StandardService.stopInternal(StandardService.java:484)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:242)
at 
org.apache.catalina.core.StandardServer.stopInternal(StandardServer.java:974)
at org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:242)
at org.apache.catalina.startup.Tomcat.stop(Tomcat.java:447)
at SlowStop.main(SlowStop.java:14)


You can see the procedure here: 
https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/AbstractEndpoint.java#L1086


What I noted:
- the local address for shutdown is NOT loopback
- the connection is timing out

After I did some magic to increase the connection timeout from 2000ms to 
60 000ms - I got "No route to host".


So after looking up which IP the device belonged to - it wasn't the 
interface I was looking for.


On quick inspection the acceptor thread 
(https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/Acceptor.java#L128) 
was listening on [/[0:0:0:0:0:0:0:0]:39033]
, which was correctly picked up at first, but then this local address 
got transformed:


https://github.com/apache/tomcat/blob/10.1.x/java/org/apache/tomcat/util/net/AbstractEndpoint.java#L1164

It started picking up interfaces to use, and it stopped at the first 
non-loopback non-link local address, which also happens to be some sort 
of a bridge network for one of my VMs. I don't quite know why there is 
no routing set up, but this interface should not have been picked in the 
first place.


Here's a list of my interfaces:

- lo (loopback)
- wlp4s0 (wifi)
- docker0 (docker networking stuff)
- br-e26d1e697a66 (some bridge)
- br-54aa505a4d52 (some bridge)

The code picked up br-54aa505a4d52 first, it has those IP addresses:

- /fe80:0:0:0:0:0:0:1%br-e26d1e697a66
- /fc00:f853:ccd:e793:0:0:0:1%br-e26d1e697a66
- /172.18.0.1

It stopped iterating once it got to the second one, tried using it and 
failed. It should not have tried to touch the bridge networks at all.


I'll take a look at how it works on my work laptop with MacOs, but I'd 
wager a guess that some corporate VPNs have interfaces which have messed 
up routing as well. Can't tell if it's a bug or a feature, but it sure 
is unexpected. Making this timeout for acceptor shutdown configurable 
would be one sort of band-aid.


Chuck - could you please share your thoughts on this?

Sample logs, I updated the repo with this script:


$ ./no_gradle.sh
Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler ["http-nio-auto-1"]
Feb 25, 2024 7:13:08 PM org.apache.catalina.core.StandardService startInternal
INFO: Starting service [Tomcat]
Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler ["http-nio-auto-1-37591"]
Feb 25, 2024 7:13:08 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-nio-auto-1-37591"]
Feb 25, 2024 7:13:08 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept
FINER: About to unlock socket 
for:/[fc00:f853:ccd:e793:0:0:0:1%br-e26d1e697a66]:37591
Feb 25, 2024 7:13:09 PM org.apache.tomcat.util.net.NioEndpoint$Poller timeout
FINER: timeout completed: keys processed=0; now=1708884789930; 
nextExpiration=0; keyCount=0; hasEvents=false; eval=false
Feb 25, 2024 7:13:10 PM org.apache.tomcat.util.net.AbstractEndpoint unlockAccept
FINE: Caught exception trying to unlock accept on port [0]
java.net.SocketTimeoutException: Connect timed out
at 
java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
at java.base/java.net

Need help for a problem on migrating from Tomcat-8 to Tomcat-9

2024-02-25 Thread Saha, Rajib
Hi Experts,

In our product, we are using Tomcat [OriginalFileName:  prunsrv.exe] for 
creating a service[Say, Service-A]. It's a huge product running in market for 
last 20 years.
We are in progress of moving from Tomcat-8 to tomcat-9.

When we are creating the Service-A with Tomcat-8 [tomcat8.exe]. In "Services" 
desktop app, we can see the service is created with "Local System" in "Log On 
as".
When we are creating the Service-A with Tomcat-9 [tomcat9.exe]. in "Services" 
desktop app, we can see the service is created with "Local service" in "Log On 
as".

Looks like "Local service" has less power than "Local System".
Due to it, Service-A created with Tomcat-9 failing for several operation inside 
product.

Can somebody suggest, how we can create a service with tomcat-9, with the 
privilege of "Local System"?

Please let me know, if any more details is required from my side.

Regards
Rajib