https://bz.apache.org/bugzilla/show_bug.cgi?id=66508

--- Comment #14 from Venkata Manda <mvssrmur...@gmail.com> ---
Hi Mark Thomas,

I am still able to reproduce the issue with the latest changes.
My understanding is the recent changes help only when the
writeCompletionHandler failed method in Nio2Endpoint fires the
SocketEvent.ERROR.

                    if (!endpoint.processSocket(Nio2SocketWrapper.this,
SocketEvent.ERROR, true))
                        close();


Firing this event is not happening as the thread that is executing to fire is
not entering the synchronized block on socketWrapper.
(https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/tomcat/util/net/SocketProcessorBase.java#L40)

    public final void run() {
        synchronized (socketWrapper) {


The socketWrapper is hold by the Event 1 thread (in below logs it is http-10).
Please find below the timings of each thread in play to give you a better
picture of the issue.



2023-03-16 21:25:19 (353) pool-12-thread-1 startMessage - Acquired
messagePartInProgress semaphore lock : Session ID : 0


2023-03-16 21:25:19 (360) http-10 sendMessageBlock before acquiring
messagePartInProgress tryAcquire - Session ID : 0


2023-03-16 21:25:19 (369) http-11 writeCompletionHandler.failed called :
java.io.IOException: Broken Pipe


2023-03-16 21:25:39 (361) http-10 sendMessageBlock could not acquire
messagePartInProgress.tryAcquire. Closing socket. - SessionID : 0

java.net.SocketTimeoutException: The current message was not fully sent within
the specified timeout


2023-03-16 21:25:39 (362) Inside the synchronized block of
SocketProcessorBase.run method, Thread Name: http-nio2-8082-exec-15

2023-03-16 21:25:39 (362) Inside the synchronized block of
SocketProcessorBase.run -- socketWrapper.isClosed: true. Not running doRun
method. Thread Name : http-nio2-8082-exec-15





If you notice there are three are four threads in play.
At the start, there are two threads.

pool-12-thread-1 (a thread on the server) trying to write a message on the
Session ID 0.

http-10 thread received a message from client with close code 1000 and a
reason.
Our client is a javascript and is using the below API to send a close message
with code 1000 and a reason.
webSocket.close(code, reason);

http-10 thread to process the received message should have acquired the lock on
socketWrapper. As this thread could not acquire the messagePartInProgress
semaphore as it is already acquired by pool-12-thread-1, it will be waiting for
20 seconds (configured time out) and then though a socket time out exception.
In above logs you can see a difference of 20 seconds from the time the log
printed before calling 
!messagePartInProgress.tryAcquire(timeout, TimeUnit.MILLISECONDS)

(https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/tomcat/websocket/WsRemoteEndpointImplBase.java#L292)


pool-12-thread-1 is failing to write and writeCompletionHandler.failed method
is called. In my test it is by thread http-11.

https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/tomcat/util/net/Nio2Endpoint.java#L707

http-11 thread is trying to fire SocketEvent.ERROR, but it is not getting
processed as http-nio2-8082-exec-15 is unable to enter synchronization block as
the socketWrapper is hold by http-10 thread. 


Let me know if you have any questions.

I am thinking adding below code before
https://github.com/apache/tomcat/blob/8.5.x/java/org/apache/tomcat/websocket/WsRemoteEndpointImplBase.java#L292
may address the issue. If the opcode is close, instead of doing a tryAcquire
with time out, simply doing a tryAcquire may be nonproblematic.

if (opCode == Constants.OPCODE_CLOSE)  {
                if (!messagePartInProgress.tryAcquire()) {
                    String msg =
sm.getString("wsRemoteEndpoint.acquireTimeout");
                    wsSession.doClose(new CloseReason(CloseCodes.GOING_AWAY,
msg),
                        new CloseReason(CloseCodes.CLOSED_ABNORMALLY, msg),
true);
                    throw new SocketTimeoutException(msg);
                }
            }


Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to