Re: tomcat 10.1.33 random rare 500 response status for http2 upgrade with tls
On 13.02.25 at 09:40, Cenk Pekyaman wrote:
...
> We can reproduce the issue somewhat reliably with the reproducer setup we
> have in https://github.com/cpekyaman/java-server-http2-test.
> The odd part is we could only reproduce it with certain clients reliably,
> with go and curl (we include go client as the failing requests in our
> production systems come from a monitoring system written in go).

Could you give the curl version as well? I noticed a test failure for some other web server in the HTTP/2 area and learned it was due to a detail behavior change in curl. In that other situation the failure was a wrong expectation in the test class. But although you observe something here in real communication, it might still be useful to try reproducing with the same curl version that you use, in case an easy reproduction attempt does not work.

Thanks!

Rainer
tomcat 10.1.33 random rare 500 response status for http2 upgrade with tls
We run Tomcat on Java 17 with the embedded Tomcat setup. We have HTTP and HTTPS connectors, and we have the http2 upgradeProtocol for both. We recently upgraded from 9.0.88 to 10.1.24 to work on the javax to jakarta migration, and after a while upgraded to 10.1.33. After the upgrade, we started to see random and rare 500 errors for some of the HTTP/2 GET requests over HTTPS. When reproducing the error on our development machines, we could see the following trace on the server:
```
java.io.IOException: null
    at org.apache.coyote.http2.Stream$StandardStreamInputBuffer.receiveReset(Stream.java:1516) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Stream.receiveReset(Stream.java:224) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Http2UpgradeHandler.close(Http2UpgradeHandler.java:1305) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Http2UpgradeHandler.upgradeDispatch(Http2UpgradeHandler.java:437) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Http2AsyncUpgradeHandler.upgradeDispatch(Http2AsyncUpgradeHandler.java:43) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.failed(Http2AsyncParser.java:337) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.failed(Http2AsyncParser.java:167) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.tomcat.util.net.SocketWrapperBase$VectoredIOCompletionHandler.failed(SocketWrapperBase.java:1148) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1660) ~[tomcat-coyote-10.1.33.jar:10.1.33]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190) [tomcat-util-10.1.33.jar:10.1.33]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-util-10.1.33.jar:10.1.33]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63) [tomcat-util-10.1.33.jar:10.1.33]
    at java.lang.Thread.run(Thread.java:840) [?:?]
```

After further testing, we think the change https://github.com/apache/tomcat/commit/f902edf085c0c73139a66d1bfc4d5790a416b130 introduced in 10.1.29 is the reason we get the 500 status. But the change seems to be doing what is intended, so we tested with multiple Tomcat versions to see if there were already unexposed failures prior. In versions like 10.1.24 we see the error below, which does not result in a 500 status:
```
[org.apache.coyote.http2.Http2Parser] {https-jsse-nio-8443-exec-6} Connection [92], Stream [0], Frame type [null], Error
java.io.IOException: Unable to unwrap data, invalid status [CLOSED]
    at org.apache.tomcat.util.net.SecureNioChannel.read(SecureNioChannel.java:772) ~[tomcat-coyote-10.1.24.jar:10.1.24]
    at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper$NioOperationState.run(NioEndpoint.java:1609) ~[tomcat-coyote-10.1.24.jar:10.1.24]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190) [tomcat-util-10.1.24.jar:10.1.24]
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659) [tomcat-util-10.1.24.jar:10.1.24]
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63) [tomcat-util-10.1.24.jar:10.1.24]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
```
It is not the same error, but since it is happening around the same code context (NioOperationState.run), it gave us the impression that the traces might be related.

We have run our reproducer with multiple versions: 10.1.24, 10.1.28, 10.1.29, 10.1.33, 10.1.35, 11.0.2. Versions 10.1.29 and up show the rarer "receiveReset" issue (500 status); versions 10.1.28 and down show only the "Unable to unwrap data" issue (no 500).

We can reproduce the issue somewhat reliably with the reproducer setup we have in https://github.com/cpekyaman/java-server-http2-test. The odd part is we could only reproduce it reliably with certain clients, go and curl (we include the go client as the failing requests in our production systems come from a monitoring system written in go). Some others, like the JDK client itself, either can't trigger the issue or require much longer test time (we couldn't verify this). The linked reproducer repo contains information to try it if you need, and it also contains some trace-level logs from some of the runs we already did, in the trace_logs_history folder.

It seems something is off, but we are not sure whether Tomcat is doing something unexpected or not. Thanks.
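To give an idea of the setup, here is a minimal sketch of the connector wiring described above (embedded Tomcat, HTTP and HTTPS connectors, both with the HTTP/2 upgrade protocol). The ports, keystore file, and password are simplified placeholders, not the values from our reproducer:
```
import org.apache.catalina.connector.Connector;
import org.apache.catalina.startup.Tomcat;
import org.apache.coyote.http2.Http2Protocol;
import org.apache.tomcat.util.net.SSLHostConfig;
import org.apache.tomcat.util.net.SSLHostConfigCertificate;

public class EmbeddedHttp2Server {
    public static void main(String[] args) throws Exception {
        Tomcat tomcat = new Tomcat();

        // Plain HTTP connector with the HTTP/2 (h2c) upgrade protocol.
        Connector http = new Connector("org.apache.coyote.http11.Http11NioProtocol");
        http.setPort(8080);
        http.addUpgradeProtocol(new Http2Protocol());
        tomcat.getService().addConnector(http);

        // HTTPS connector with HTTP/2 over TLS (h2, negotiated via ALPN).
        Connector https = new Connector("org.apache.coyote.http11.Http11NioProtocol");
        https.setPort(8443);
        https.setSecure(true);
        https.setScheme("https");
        https.setProperty("SSLEnabled", "true");
        SSLHostConfig sslHostConfig = new SSLHostConfig();
        SSLHostConfigCertificate certificate =
                new SSLHostConfigCertificate(sslHostConfig, SSLHostConfigCertificate.Type.RSA);
        certificate.setCertificateKeystoreFile("conf/keystore.p12"); // placeholder
        certificate.setCertificateKeystorePassword("changeit");      // placeholder
        sslHostConfig.addCertificate(certificate);
        https.addSslHostConfig(sslHostConfig);
        https.addUpgradeProtocol(new Http2Protocol());
        tomcat.getService().addConnector(https);

        tomcat.start();
        tomcat.getServer().await();
    }
}
```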
Re: tomcat 10.1.33 random rare 500 response status for http2 upgrade with tls
On Thu, Feb 13, 2025 at 9:41 AM Cenk Pekyaman wrote:
>
> We run tomcat on java17 with the embedded tomcat setup.
> We have http and https connectors and we have http2 upgradeProtocol for
> both.
>
> We recently upgraded from 9.0.88 to 10.1.24 to work on javax to jakarta
> migration, and after a while, upgraded to 10.1.33.
> After the upgrade, we started to see random and rare 500 errors for some of
> the http2 GET requests over https.
> When reproducing the error on our development machines, we could see the
> following trace on the server:
> ```
> java.io.IOException: null

Tomcat now sets an IO exception to trigger ReadListener.onError with an appropriate error in that case. So things seem normal so far.

Now, the "null" is not normal; the corresponding string exists in the resource bundle:

stream.clientResetRequest=Client reset the stream before the request was fully read

Rémy

> [...]
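To make Rémy's point concrete: the exception is delivered to the application through the Servlet non-blocking read API. Below is a minimal, hypothetical sketch (not taken from the reproducer) of where it would surface, assuming a request body being read asynchronously:
```
import java.io.IOException;

import jakarta.servlet.AsyncContext;
import jakarta.servlet.ReadListener;
import jakarta.servlet.ServletInputStream;
import jakarta.servlet.annotation.WebServlet;
import jakarta.servlet.http.HttpServlet;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;

// Hypothetical servlet showing where a client stream reset surfaces
// when the request body is read with the non-blocking API.
@WebServlet(urlPatterns = "/upload", asyncSupported = true)
public class UploadServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        AsyncContext asyncContext = req.startAsync();
        ServletInputStream in = req.getInputStream();
        in.setReadListener(new ReadListener() {
            private final byte[] buffer = new byte[8192];

            @Override
            public void onDataAvailable() throws IOException {
                // Drain whatever body data is currently available.
                while (in.isReady() && in.read(buffer) != -1) {
                    // consume
                }
            }

            @Override
            public void onAllDataRead() throws IOException {
                resp.setStatus(HttpServletResponse.SC_OK);
                asyncContext.complete();
            }

            @Override
            public void onError(Throwable t) {
                // With the 10.1.29 change discussed in this thread, a client
                // RST_STREAM received while the body is still being read is
                // reported here as an IOException.
                asyncContext.complete();
            }
        });
    }
}
```
With blocking IO, the same reset would typically surface instead as an IOException thrown from the read call itself.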
Re: tomcat 10.1.33 random rare 500 response status for http2 upgrade with tls
On Thu, Feb 13, 2025 at 9:58 AM Rainer Jung wrote:
> On 13.02.25 at 09:40, Cenk Pekyaman wrote:
> ...
> > We can reproduce the issue somewhat reliably with the reproducer setup we
> > have in https://github.com/cpekyaman/java-server-http2-test.
> > The odd part is we could only reproduce it with certain clients reliably,
> > with go and curl (we include go client as the failing requests in our
> > production systems come from a monitoring system written in go).
>
> Could you give the curl version as well? I noticed a test failure for
> some other web server in the HTTP/2 area and learned it was due to a
> detail behavior change in curl. In that other situation the failure was
> a wrong expectation in the test class. But although you observe
> something here in real communication, it might still be useful to try
> reproducing with the same curl version that you use, in case an easy
> reproduction attempt does not work.
>
> Thanks!
>
> Rainer

curl versions:

- prod test system 1: 7.76.1 (x86_64-redhat-linux-gnu) libcurl/7.76.1 OpenSSL/3.0.7 zlib/1.2.11 brotli/1.0.9 libidn2/2.3.0 libpsl/0.21.1 (+libidn2/2.3.0) libssh/0.10.4/openssl/zlib nghttp2/1.43.0
- prod test system 2: 7.61.1 (x86_64-redhat-linux-gnu) libcurl/7.61.1 OpenSSL/1.1.1k zlib/1.2.11 brotli/1.0.6 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.2.0) libssh/0.9.6/openssl/zlib nghttp2/1.33.0
- reproducer (local development machine): 8.7.1 (x86_64-apple-darwin24.0) libcurl/8.7.1 (SecureTransport) LibreSSL/3.3.6 zlib/1.2.12 nghttp2/1.63.0

The first two versions are from the systems where our SREs performed the initial tests on actual environments, just after the issue was first observed (both were able to reproduce the issue). The last version is from our local development environments.
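For comparison with the curl and go clients, the kind of JDK-client loop mentioned earlier in the thread (HTTP/2 GETs over TLS, watching for 500s) would look roughly like the sketch below. The endpoint and iteration count are placeholders, and it assumes the server certificate is trusted by the JVM's default truststore:
```
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Http2GetLoop {
    public static void main(String[] args) throws Exception {
        // Prefer HTTP/2; the client falls back to HTTP/1.1 if ALPN fails.
        // For a self-signed test certificate a custom SSLContext is needed.
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)
                .build();

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://localhost:8443/ping")) // placeholder endpoint
                .GET()
                .build();

        for (int i = 0; i < 1000; i++) { // placeholder iteration count
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() == 500) {
                System.out.println("request " + i + " returned 500");
            }
        }
    }
}
```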