Applications use ECONNREFUSED as returned from write() in order to
determine that a socket should be closed. When using connected dgram
unix sockets in a poll/write loop, this relies on POLLOUT being
signaled when the remote end closes. However, due to a race POLLOUT
can be missed when the remote closes:
thread 1 (client) thread 2 (server)
connect() to server
write() returns -EAGAIN
unix_dgram_poll()
-> unix_recvq_full() is true
close()
->unix_release_sock()
->wake_up_interruptible_all()
unix_dgram_poll() (due to the
wake_up_interruptible_all)
-> unix_recvq_full() still is true
->free all skbs
Now thread 1 is stuck and will not receive anymore wakeups. In this
case, when thread 1 gets the -EAGAIN, it has not queued any skbs
otherwise the 'free all skbs' step would in fact cause a wakeup and
a POLLOUT return. So the race here is probably fairly rare because
it means there are no skbs that thread 1 queued and that thread 1
runs before the 'free all skbs' step. Nevertheless, this has been
observed when the syslog daemon closes /dev/log. Tested against
a reproducer that re-creates the syslog hang.
The proposed fix is to move the wake_up_interruptible_all() call
after the 'free all skbs' step.
Reported-by: Ian Lance Taylor <[email protected]>
Cc: Rainer Weikusat <[email protected]>
Signed-off-by: Jason Baron <[email protected]>
---
net/unix/af_unix.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index e5473c0..de242cf 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -529,8 +529,6 @@ static void unix_release_sock(struct sock *sk, int embrion)
sk->sk_state = TCP_CLOSE;
unix_state_unlock(sk);
- wake_up_interruptible_all(&u->peer_wait);
-
skpair = unix_peer(sk);
if (skpair != NULL) {
@@ -560,6 +558,9 @@ static void unix_release_sock(struct sock *sk, int embrion)
kfree_skb(skb);
}
+ /* after freeing skbs to make sure POLLOUT triggers */
+ wake_up_interruptible_all(&u->peer_wait);
+
if (path.dentry)
path_put(&path);
--
2.7.4