Thanks Eric, you're right. After inet_csk_reqsk_queue_add() succeeds, the ref acquired in reuseport_migrate_sock() is effectively transferred to nreq->rsk_listener. Another CPU can then dequeue nreq (via accept() or listener shutdown), hit reqsk_put(), and drop that listener ref.
Since listeners are SOCK_RCU_FREE, the post-queue_add() dereferences of nsk should be under rcu_read_lock()/ rcu_read_unlock(), which also covers the existing sock_net(nsk) access in that path. I also checked reqsk_timer_handler(): reqsk_queue_migrated() there is only accounting, and once nreq becomes visible via inet_ehash_insert(), the handler no longer appears to dereference nsk. I'll fold this into v2. Eric Dumazet <[email protected]> 于2026年4月18日周六 14:02写道: > > On Fri, Apr 17, 2026 at 9:17 PM Zhenzhong Wu <[email protected]> wrote: > > > > When inet_csk_listen_stop() migrates an established child socket from > > a closing listener to another socket in the same SO_REUSEPORT group, > > the target listener gets a new accept-queue entry via > > inet_csk_reqsk_queue_add(), but that path never notifies the target > > listener's waiters. > > > > As a result, a nonblocking accept() still succeeds because it checks > > the accept queue directly, but waiters that sleep for listener > > readiness can remain asleep until another connection generates a > > wakeup. This affects poll()/epoll_wait()-based waiters, and can also > > leave a blocking accept() asleep after migration even though the > > child is already in the target listener's accept queue. > > > > This was observed in a local test where listener A completed the > > handshake, queued the child, and was closed before userspace called > > accept(). The child was migrated to listener B, but listener B never > > received a wakeup for the migrated accept-queue entry. > > > > Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration > > in inet_csk_listen_stop(). > > > > The reqsk_timer_handler() path does not need the same change: > > half-open requests only become readable to userspace when the final > > ACK completes the handshake, and tcp_child_process() already wakes > > the listener in that case. > > > > Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in > > accept queues.") > > Cc: [email protected] > > Signed-off-by: Zhenzhong Wu <[email protected]> > > --- > > net/ipv4/inet_connection_sock.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/net/ipv4/inet_connection_sock.c > > b/net/ipv4/inet_connection_sock.c > > index 4ac3ae1bc..da1ce082f 100644 > > --- a/net/ipv4/inet_connection_sock.c > > +++ b/net/ipv4/inet_connection_sock.c > > @@ -1483,6 +1483,7 @@ void inet_csk_listen_stop(struct sock *sk) > > __NET_INC_STATS(sock_net(nsk), > > > > LINUX_MIB_TCPMIGRATEREQSUCCESS); > > reqsk_migrate_reset(req); > > + READ_ONCE(nsk->sk_data_ready)(nsk); > > I think this is adding a potential UAF (Use Afte Free). > @nsk might have been freed already by another thread/cpu. > Note the existing code already has similar issues. > > Untested patch: > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c > index > 4ac3ae1bc1afc3a39f2790e39b4dda877dc3272b..287b6e01c4f71bfec3dd2a708f316224d9eb4a64 > 100644 > --- a/net/ipv4/inet_connection_sock.c > +++ b/net/ipv4/inet_connection_sock.c > @@ -1479,6 +1479,7 @@ void inet_csk_listen_stop(struct sock *sk) > if (nreq) { > refcount_set(&nreq->rsk_refcnt, 1); > > + rcu_read_lock(); > if (inet_csk_reqsk_queue_add(nsk, > nreq, child)) { > __NET_INC_STATS(sock_net(nsk), > > LINUX_MIB_TCPMIGRATEREQSUCCESS); > @@ -1489,7 +1490,7 @@ void inet_csk_listen_stop(struct sock *sk) > reqsk_migrate_reset(nreq); > __reqsk_free(nreq); > } > - > + rcu_read_unlock(); > /* inet_csk_reqsk_queue_add() has already > * called inet_child_forget() on failure case. > */

