On Fri, May 10, 2019 at 5:41 PM Mark Thomas <ma...@apache.org> wrote:
> On 10/05/2019 16:26, Rémy Maucherat wrote: > > On Fri, May 10, 2019 at 11:14 AM Mark Thomas <ma...@apache.org> wrote: > > > >> On 09/05/2019 21:53, Mark Thomas wrote: > >>> On 05/05/2019 09:37, r...@apache.org wrote: > >>>> This is an automated email from the ASF dual-hosted git repository. > >>>> > >>>> remm pushed a commit to branch master > >>>> in repository https://gitbox.apache.org/repos/asf/tomcat.git > >>>> > >>>> > >>>> The following commit(s) were added to refs/heads/master by this push: > >>>> new 6232d82 Make sure timeout elapses before calling a timeout > >>>> 6232d82 is described below > >>>> > >>>> commit 6232d82cc5db73e5da5392d6ec6d9d01ce65c85e > >>>> Author: remm <r...@apache.org> > >>>> AuthorDate: Sun May 5 10:37:05 2019 +0200 > >>>> > >>>> Make sure timeout elapses before calling a timeout > >>>> > >>>> It seems there are extra stream notify as although 0 bytes have > been > >>>> allocated only a few ms have passed when there is a failure. > >>> > >>> I think something else is going on here. I'm still trying to figure it > >> out. > >> > >> I can't reproduce the test failure - either with the code or some code > >> written to explicitly test this scenario. However, I do believe I know > >> what the problem is. > >> > >> backLogStreams is a concurrent Map of Stream to int[]. The int[] is a > >> two element array where the first element is number of bytes the stream > >> requires and the second is the number allocated. > >> > >> I believe the issue occurs as follows: > >> - Thread A requests more bytes for Stream A than are available in the > >> connection window > >> - Stream A is placed into the backlog > >> - Thread A enters streamA.wait() > >> - Thread B processes a Window update message > >> - Thread B allocates some bytes to streamA > >> - Thread B calls streamA.notify() > >> - Thread A exits the wait but, because the elements of the int[] are not > >> volatile, it does not see the window update > >> - Thread A treats existing the wait without an allocation as a timeout > >> > >> I've left out the obtaining and releasing of locks on streamA and the > >> Http2UpgradeHandler for clarity but I'll note that some reads of the > >> allocation occur outside of any lock > >> > >> The above description is consistent with a known failure trace: > >> > >> > https://ci.apache.org/projects/tomcat/tomcat9/logs/4288/TEST-org.apache.coyote.http2.TestHttp2Section_5_3.NIO2.txt > >> > >> My plan is as follows: > >> - revert the "check it has really timed out fix" > >> - modify the access to the allocation so that updates to the allocation > >> are guaranteed to be visible after the wait() exits > >> > > > > I agree there's something wrong, but the current algorithm may not be > > robust enough because anything notifying the stream object will break it. > > I think we have / had multiple bugs here. That hasn't been helping. > > > From the logs, the problem (looking only at test 5_3) is with stream 19. > > When the test passes, the stream was never actually backloged. When it > > fails, it was and the window update frame read does not cause the backlog > > release (so when Stream.incrementWindowSize does notifyAll, it breaks). > > I see what you mean. Waiting on the stream object for the stream window > update and the connection window update isn't going to work. > > If you can see a solution - do let me know. My instinct is that this is > going to be a tricky one to fix. > > Is this a regression that needs to be fixed? Or should I go ahead with > the releases. Personally, I'm leaning towards going ahead. > I'm not sure. I don't think it happens without async [for whatever timing reason]. Rémy