On Fri, May 10, 2019 at 11:14 AM Mark Thomas <ma...@apache.org> wrote:

> On 09/05/2019 21:53, Mark Thomas wrote:
> > On 05/05/2019 09:37, r...@apache.org wrote:
> >> This is an automated email from the ASF dual-hosted git repository.
> >>
> >> remm pushed a commit to branch master
> >> in repository https://gitbox.apache.org/repos/asf/tomcat.git
> >>
> >>
> >> The following commit(s) were added to refs/heads/master by this push:
> >>      new 6232d82  Make sure timeout elapses before calling a timeout
> >> 6232d82 is described below
> >>
> >> commit 6232d82cc5db73e5da5392d6ec6d9d01ce65c85e
> >> Author: remm <r...@apache.org>
> >> AuthorDate: Sun May 5 10:37:05 2019 +0200
> >>
> >>     Make sure timeout elapses before calling a timeout
> >>
> >>     It seems there are extra stream notify as although 0 bytes have been
> >>     allocated only a few ms have passed when there is a failure.
> >
> > I think something else is going on here. I'm still trying to figure it
> out.
>
> I can't reproduce the test failure - either with the code or some code
> written to explicitly test this scenario. However, I do believe I know
> what the problem is.
>
> backLogStreams is a concurrent Map of Stream to int[]. The int[] is a
> two element array where the first element is number of bytes the stream
> requires and the second is the number allocated.
>
> I believe the issue occurs as follows:
> - Thread A requests more bytes for Stream A than are available in the
>   connection window
> - Stream A is placed into the backlog
> - Thread A enters streamA.wait()
> - Thread B processes a Window update message
> - Thread B allocates some bytes to streamA
> - Thread B calls streamA.notify()
> - Thread A exits the wait but, because the elements of the int[] are not
>   volatile, it does not see the window update
> - Thread A treats existing the wait without an allocation as a timeout
>
> I've left out the obtaining and releasing of locks on streamA and the
> Http2UpgradeHandler for clarity but I'll note that some reads of the
> allocation occur outside of any lock
>
> The above description is consistent with a known failure trace:
>
> https://ci.apache.org/projects/tomcat/tomcat9/logs/4288/TEST-org.apache.coyote.http2.TestHttp2Section_5_3.NIO2.txt
>
> My plan is as follows:
> - revert the "check it has really timed out fix"
> - modify the access to the allocation so that updates to the allocation
>   are guaranteed to be visible after the wait() exits
>

I agree there's something wrong, but the current algorithm may not be
robust enough because anything notifying the stream object will break it.

>From the logs, the problem (looking only at test 5_3) is with stream 19.
When the test passes, the stream was never actually backloged. When it
fails, it was and the window update frame read does not cause the backlog
release (so when Stream.incrementWindowSize does notifyAll, it breaks).

The trace is:
    at
org.apache.coyote.http2.AbstractStream.incrementWindowSize(AbstractStream.java:121)
    at org.apache.coyote.http2.Stream.incrementWindowSize(Stream.java:251)
    at
org.apache.coyote.http2.Http2UpgradeHandler.incrementWindowSize(Http2UpgradeHandler.java:1533)
    at
org.apache.coyote.http2.Http2AsyncUpgradeHandler.incrementWindowSize(Http2AsyncUpgradeHandler.java:39)
    at
org.apache.coyote.http2.Http2Parser.readWindowUpdateFrame(Http2Parser.java:416)
    at
org.apache.coyote.http2.Http2AsyncParser$FrameCompletionHandler.completed(Http2AsyncParser.java:269)

Rémy

Reply via email to