Thanks Mark.

In the meantime, I can also say that it happens for TC 8.5 using JSSE or OpenSSL and NIO or NIO2. I did not try APR.
Unfortunately I can still reproduce it on TC 8.5 even without the sync patch you mentioned below. The code currently under test has:
diff --git a/java/org/apache/coyote/http2/Stream.java b/java/org/apache/coyote/http2/Stream.java
index fffd2403e8..196cd5b85c 100644
--- a/java/org/apache/coyote/http2/Stream.java
+++ b/java/org/apache/coyote/http2/Stream.java
@@ -614,14 +614,8 @@ class Stream extends AbstractNonZeroStream implements HeaderEmitter {
                     log.debug(sm.getString("stream.reset.send", getConnectionId(), getIdAsString(),
                             se.getError()));
                 }
-                // Sync ensures that if the call to sendReset() triggers resets
-                // in other threads, that the RST frame associated with this
-                // thread is sent before the RST frames associated with those
-                // threads.
-                synchronized (state) {
-                    state.sendReset();
-                    handler.sendStreamReset(se);
-                }
+                state.sendReset();
+                handler.sendStreamReset(se);
                 cancelAllocationRequests();
                 if (inputBuffer != null) {
                     inputBuffer.swallowUnread();

However, for TC 10.0 I could not reproduce it as easily, not even with the original sync code in place. I stopped after 20 attempts without a deadlock, but will try longer for TC 10 and also TC 9 later today.
Thanks and regards,

Rainer

On 03.01.2022 at 11:55, Mark Thomas wrote:
On 03/01/2022 09:45, Mark Thomas wrote:
Hi Rainer,

Thanks for finding this. It isn't something I have seen in my testing. I think this is something that needs to be fixed before the January set of releases.
From the stack trace, it looks like the root cause is locks being obtained in an inconsistent order - a classic deadlock.
I haven't looked at the code or the history yet so I am not sure if this is the direct result of a recent change or if another change has just made this easier to trigger. I plan to look at this today.
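
For illustration only, the general remedy for an inconsistent lock order can be sketched as follows. This is not Tomcat code and may not match the eventual fix: the class LockOrderSketch and the names socketLock, stateLock, sendReset and checkFrame are hypothetical stand-ins for the NioSocketWrapper and StreamStateMachine monitors. The sketch assumes both code paths can be made to take the two monitors in the same order, in which case no wait cycle can form.

```java
class LockOrderSketch {
    // Hypothetical stand-ins for the socket wrapper and stream state monitors.
    private final Object socketLock = new Object();
    private final Object stateLock = new Object();
    private int operations = 0;

    // Both paths take socketLock first, then stateLock. Because the order
    // is the same everywhere, neither thread can hold one lock while
    // waiting for a thread that holds the other.
    void sendReset() {
        synchronized (socketLock) {
            synchronized (stateLock) {
                operations++;
            }
        }
    }

    void checkFrame() {
        synchronized (socketLock) {
            synchronized (stateLock) {
                operations++;
            }
        }
    }

    // Runs both paths concurrently and reports whether both threads finished.
    static boolean runsToCompletion() throws InterruptedException {
        LockOrderSketch s = new LockOrderSketch();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                s.sendReset();
            }
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                s.checkFrame();
            }
        });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        a.join(10_000);
        b.join(10_000);
        return !a.isAlive() && !b.isAlive() && s.operations == 200_000;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed: " + runsToCompletion());
    }
}
```

Swapping the two synchronized blocks in only one of the methods would recreate the opposite-order pattern seen in the thread dump.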
This change and the back-ports appear to be the trigger for this deadlock:

https://github.com/apache/tomcat/commit/5782322afd31adf98b72288f99965c6811dcdcdd
Mark

Mark


On 01/01/2022 19:07, Rainer Jung wrote:
Hi hi,

I am running the unit tests for TC 8.5.73 plus a few post-release patches on Solaris 10 Sparc with various Java 8 JVMs. I noticed one deadlock when running on Zulu 8.58.0.13-CA-solaris (build 1.8.0_312-b07). Maybe it is a sporadic deadlock and could also happen on the 1.8.0 variations, but I could not yet check that. I did not notice such a deadlock on 5 Linux distributions on which I also ran all unit tests with a variety of JVMs, including the Zulu one.
According to the logs, the deadlock happens in org.apache.coyote.http2.TestCancelledUpload, but org.apache.coyote.http2.TestFlowControl runs concurrently at the same time (two test threads). The test methods are testCancelledRequest and testNotFound, respectively.
The stacks are:

Found one Java-level deadlock:
=============================
"http-nio-127.0.0.1-auto-1-exec-7":
   waiting to lock monitor 0x0000000100f99508 (object 0xffffffff41a99b40, a org.apache.coyote.http2.StreamStateMachine),
   which is held by "http-nio-127.0.0.1-auto-1-exec-5"
"http-nio-127.0.0.1-auto-1-exec-5":
   waiting to lock monitor 0x00000001002da838 (object 0xffffffff42015548, a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper),
   which is held by "http-nio-127.0.0.1-auto-1-exec-7"

Java stack information for the threads listed above:
===================================================
"http-nio-127.0.0.1-auto-1-exec-7":
         at org.apache.coyote.http2.StreamStateMachine.checkFrameType(StreamStateMachine.java:125)
         - waiting to lock <0xffffffff41a99b40> (a org.apache.coyote.http2.StreamStateMachine)
         at org.apache.coyote.http2.AbstractNonZeroStream.checkState(AbstractNonZeroStream.java:144)
         at org.apache.coyote.http2.Http2UpgradeHandler.startRequestBodyFrame(Http2UpgradeHandler.java:1641)
         at org.apache.coyote.http2.Http2Parser.readDataFrame(Http2Parser.java:168)
         at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:95)
         at org.apache.coyote.http2.Http2Parser.readFrame(Http2Parser.java:69)
         at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:340)
         at org.apache.coyote.http11.upgrade.UpgradeProcessorInternal.dispatch(UpgradeProcessorInternal.java:60)
         at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:59)
         at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:849)
         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1677)
         at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
         - locked <0xffffffff42015548> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
         at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
         at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
         at java.lang.Thread.run(Thread.java:748)


"http-nio-127.0.0.1-auto-1-exec-5":
         at org.apache.coyote.http2.Http2UpgradeHandler.sendStreamReset(Http2UpgradeHandler.java:558)
         - waiting to lock <0xffffffff42015548> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
         at org.apache.coyote.http2.Stream.close(Stream.java:623)
         - locked <0xffffffff41a99b40> (a org.apache.coyote.http2.StreamStateMachine)
         at org.apache.coyote.http2.StreamProcessor.process(StreamProcessor.java:85)
         - locked <0xffffffff41ac4888> (a org.apache.coyote.http2.StreamProcessor)
         at org.apache.coyote.http2.StreamRunnable.run(StreamRunnable.java:35)
         at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
         at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
         at java.lang.Thread.run(Thread.java:748)
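
For reference, the same kind of monitor cycle can be reproduced and detected from inside a JVM. The following is a minimal sketch, not part of the Tomcat test suite: DeadlockDetectSketch, the lock names and the thread names are hypothetical, and the two plain objects merely stand in for the StreamStateMachine and NioSocketWrapper monitors. It uses the standard java.lang.management.ThreadMXBean.findMonitorDeadlockedThreads() call, which returns null when no monitor deadlock exists.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class DeadlockDetectSketch {

    // Starts two daemon threads that take two monitors in opposite order,
    // then polls the JVM until it reports both threads as deadlocked.
    static boolean provokeAndDetect() throws InterruptedException {
        Object stateLock = new Object();   // stand-in for the stream state monitor
        Object socketLock = new Object();  // stand-in for the socket wrapper monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        Thread t1 = new Thread(
                () -> lockBoth(stateLock, socketLock, bothHoldFirst), "sketch-exec-5");
        Thread t2 = new Thread(
                () -> lockBoth(socketLock, stateLock, bothHoldFirst), "sketch-exec-7");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + 5_000_000_000L; // poll for up to 5 s
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, t1.getId()) && contains(ids, t2.getId())) {
                return true;
            }
            Thread.sleep(50);
        }
        return false;
    }

    private static void lockBoth(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                // Wait until the other thread also holds its first monitor.
                latch.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Not reached: the other thread holds this monitor and is
                // itself waiting for the one held here.
            }
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlock detected: " + provokeAndDetect());
    }
}
```

The latch only makes the interleaving deterministic for the sketch; in the reported case the same interleaving depends on timing, which would be consistent with the deadlock being sporadic.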

I am attaching the detailed log from the start of the test cases until the last line that was logged for either of the two deadlocked threads. Note that unit testing proceeds for test thread 1 until the remaining tests are done. Only testing on thread 2 stops due to the deadlock.
I will kill the process now and see whether it is reproducible.

The three added patches - I guess they are not responsible, but mentioning them for the sake of completeness - are:
- ThreadPoolExecutor_prestartAllCoreThreads-23c78507b5d3dc4c0bd36d263e4f99aa8221205c.patch
- revert_previous_fix-BZ65714-07747b8ca36ffd29350af24d1c9fd05a174ba25d.patch
- improved_fix-BZ65714-4795df9bf89f84decafa276805d0c265f93eb368.patch

Thanks and regards,

Rainer