Public bug reported:

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Bionic)
     Importance: Undecided
         Status: New

** Affects: linux (Ubuntu Cosmic)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/1830813

Title:
  TCP : race condition on socket ownership in tcp_close()

Status in linux package in Ubuntu:
  New
Status in linux source package in Bionic:
  New
Status in linux source package in Cosmic:
  New

Bug description:

SRU Justification

Impact: WARN_ON messages are caused by a race condition between the close of a TCP socket and another process inspecting the same socket.

The code of interest, in tcp_close(), is:

    ...
    release_sock(sk);
    ...
    WARN_ON(sock_owned_by_user(sk));
    ...

release_sock(sk) calls sock_release_ownership(), which sets sk->sk_lock.owned = 0. When WARN_ON(sock_owned_by_user(sk)) runs afterwards, it expects the socket not to be owned by anyone.

According to upstream commit 8873c064d1de, while a socket is being closed other threads can still find it in an rtnetlink dump. tcp_get_info() acquires the socket lock (setting sk_lock.owned = 1) for a short time, but long enough to trigger this warning.
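The window described under Impact can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct model_sock and every model_* and close_* name are hypothetical stand-ins, the interleaving is forced deterministically in a single thread, and the inspector simply gives up where the real tcp_get_info() would wait for the lock. It also assumes that tcp_close() takes ownership of the socket on entry. The second function models the ordering described under Fix, where ownership is only released just before returning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one field of struct sock that matters here. */
struct model_sock {
    bool owned; /* models sk->sk_lock.owned */
};

/* Models taking ownership on entry to the close path (assumption). */
static void model_lock_sock(struct model_sock *sk)
{
    assert(!sk->owned);
    sk->owned = true;
}

/* Models release_sock(): ownership is dropped. */
static void model_release_sock(struct model_sock *sk)
{
    assert(sk->owned);
    sk->owned = false;
}

/* Models the inspector (tcp_get_info()) trying to take ownership.
 * Simplification: it fails instead of waiting when the socket is owned. */
static bool model_inspector_try_own(struct model_sock *sk)
{
    if (sk->owned)
        return false;
    sk->owned = true;
    return true;
}

/* Ordering before the fix. Returns true if the WARN_ON would fire. */
bool close_before_fix(bool inspector_races)
{
    struct model_sock sk = { .owned = false };

    model_lock_sock(&sk);
    model_release_sock(&sk);              /* early release_sock(sk) */
    if (inspector_races)
        model_inspector_try_own(&sk);     /* inspector slips into the window */
    return sk.owned;                      /* WARN_ON(sock_owned_by_user(sk)) */
}

/* Ordering with the release deferred to the end. Returns true if the
 * inspector managed to take ownership while the close was in progress. */
bool close_after_fix(bool inspector_races)
{
    struct model_sock sk = { .owned = false };
    bool inspector_got_in = false;

    model_lock_sock(&sk);
    if (inspector_races)
        inspector_got_in = model_inspector_try_own(&sk); /* still owned */
    model_release_sock(&sk);              /* deferred release, just before return */
    return inspector_got_in;
}
```

In this model, close_before_fix(true) reports that the warning would fire, while close_after_fix(true) reports that the inspector never obtains ownership during the close, since there is no longer a window between the release and a later ownership check.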
Fix: Fixed upstream in v4.20 by commit 8873c064d1de579ea23412a6d3eee972593f142b ("tcp: do not release socket ownership in tcp_close()"). The commit defers the release of ownership (the release_sock(sk) call) until just before tcp_close() returns.

Testcase: The reporter has tested and verified a 4.15 test kernel for Bionic. The bug is difficult to reproduce locally because the race cannot be triggered deterministically. Hitting it requires both of the following:

a) a process closing a socket, with execution between release_sock(sk) and WARN_ON(sock_owned_by_user(sk));
b) another process inspecting the same socket that enters tcp_get_info(), acquires ownership of the socket, and does not release it until the first process reaches WARN_ON(sock_owned_by_user(sk)).

This scenario is hard to arrange in a testing environment.

Regression Potential: For Bionic (4.15 kernel), the reporter has tested and verified a test kernel with the fix. For Cosmic (4.18 kernel), the fix has not been tested. However, given that

a) the fix essentially removes the WARN_ON(sock_owned_by_user(sk)) and defers the release of ownership to later in tcp_close(), and
b) the relevant code paths in 4.15 and 4.18 are largely the same,

the regression potential should be minimal.