https://bz.apache.org/bugzilla/show_bug.cgi?id=70103
Bug ID: 70103
Summary: Deadlock in unlockAccept() when using Unix Domain
Socket (UDS) during endpoint pause/stop
Product: Tomcat 11
Version: unspecified
Hardware: All
OS: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Catalina
Assignee: [email protected]
Reporter: [email protected]
Target Milestone: -------
Description:
When Tomcat is configured to use Unix Domain Sockets (UDS), the pause() method
(invoked during the normal shutdown/stop sequence) can hang indefinitely. This
is caused by a race condition between the control thread executing pause() and
the Acceptor thread, leading to a deadlock due to the different blocking
semantics of UDS connect() compared to TCP connect().
Root Cause Analysis:
The unlockAccept() method is designed to unblock the Acceptor thread from
ServerSocket.accept() by initiating a dummy connection to the endpoint. This
works for TCP because the OS kernel completes the handshake on behalf of the
listening socket and queues the connection, so connect() normally returns
promptly whether or not accept() has been called yet.
However, Unix Domain Sockets behave differently. If the UDS accept queue
(backlog) is full, or if the OS requires the listening side to consume the
connection, the connect() call will block until accept() is executed on the
other end. Unless the UDS socket is explicitly set to non-blocking or given a
timeout, connect() blocks indefinitely.
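
For illustration, one way to bound the dummy connection is to put the client
channel into non-blocking mode and wait for completion with a selector timeout.
This is only a sketch under stated assumptions, not Tomcat's actual
unlockAccept(): the class BoundedUdsConnect and the method tryConnect() are
hypothetical names, and the code assumes Java 16 or later for
java.net.UnixDomainSocketAddress.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;

// Hypothetical helper, not part of Tomcat.
public class BoundedUdsConnect {

    /**
     * Attempts a throwaway connection to the UDS at socketPath and waits at
     * most timeoutMillis for it to complete. Returns true only if the
     * connection was established; any I/O error or timeout yields false.
     */
    static boolean tryConnect(Path socketPath, long timeoutMillis) {
        if (timeoutMillis <= 0) {
            // select(0) would wait without a limit, which is what we want to avoid.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
        try (SocketChannel channel = SocketChannel.open(StandardProtocolFamily.UNIX);
             Selector selector = Selector.open()) {
            // Non-blocking mode: connect() returns instead of waiting for accept().
            channel.configureBlocking(false);
            if (channel.connect(address)) {
                return true; // connected immediately
            }
            // Connection still pending: wait for it, but only up to the timeout.
            channel.register(selector, SelectionKey.OP_CONNECT);
            if (selector.select(timeoutMillis) == 0) {
                return false; // timed out; the caller can give up or retry
            }
            return channel.finishConnect();
        } catch (IOException e) {
            // No listener, full backlog reported as an error, and similar cases.
            return false;
        }
    }
}
```

With a helper of this shape the control thread can never wait longer than the
timeout it passes in, whatever the state of the accept queue.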
The Race Condition / Deadlock Scenario:
1. Control Thread: Calls pause(), which sets the paused flag to true.
2. Acceptor Thread: Just happens to finish processing its previous connection,
   loops back, evaluates the paused flag, and enters the Object.wait() state.
   Crucially, it is no longer blocking on accept(), meaning it will no longer
   consume connections from the OS accept queue.
3. Control Thread: Proceeds to call unlockAccept(), which attempts to establish
   a dummy connection to the UDS endpoint.
4. OS Kernel: The UDS connect() system call blocks because the Acceptor thread
   is in wait() and not calling accept() to consume the connection.
5. Deadlock: The Control thread is blocked waiting for the UDS connect() to
   succeed, while the Acceptor thread is blocked in wait() waiting for the
   Control thread to resume the endpoint. The shutdown process hangs
   indefinitely.
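
The interleaving above can be modelled without real sockets. The sketch below
is a simplified simulation, not Tomcat code: PauseRaceModel and its methods are
hypothetical, a SynchronousQueue stands in for a UDS listener whose connect()
completes only when the other side accepts, and Thread.sleep() stands in for
the Acceptor's paused wait. A latch forces the losing interleaving so that the
run is repeatable.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the race; it does not use any Tomcat classes.
public class PauseRaceModel {

    // A hand-off on this queue completes only if the acceptor takes it,
    // which mimics a connect() that needs accept() on the other end.
    private final SynchronousQueue<Object> listener = new SynchronousQueue<>();
    private final CountDownLatch acceptorParked = new CountDownLatch(1);
    private volatile boolean paused;

    /** Acceptor loop: once it sees the paused flag it stops accepting. */
    void acceptorLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            if (paused) {
                acceptorParked.countDown();
                Thread.sleep(50); // stand-in for the paused wait
                continue;
            }
            listener.poll(50, TimeUnit.MILLISECONDS); // stand-in for accept()
        }
    }

    /**
     * Control thread: sets the flag, waits until the acceptor has parked,
     * then tries the dummy "connection" with a bounded wait. Returns true
     * only if the acceptor consumed it.
     */
    boolean pause(long timeoutMillis) throws InterruptedException {
        paused = true;
        acceptorParked.await(); // force the interleaving from the report
        // An untimed listener.put(...) here would never return.
        return listener.offer(new Object(), timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Runs the scenario once and reports whether the dummy connection was accepted. */
    static boolean run() throws InterruptedException {
        PauseRaceModel model = new PauseRaceModel();
        Thread acceptor = new Thread(() -> {
            try {
                model.acceptorLoop();
            } catch (InterruptedException e) {
                // interrupted while parked or polling: exit the thread
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        boolean accepted = model.pause(200);
        acceptor.interrupt();
        acceptor.join();
        return accepted;
    }
}
```

In this model the parked acceptor never takes the hand-off, so the timed
offer() gives up and pause() returns, whereas an untimed hand-off would leave
both threads waiting on each other as in step 5.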