From: Sean Tranchetti <stran...@codeaurora.org>
Date: Tue, 30 Jun 2020 11:50:17 -0600

> A potential deadlock can occur during registering or unregistering a
> new generic netlink family between the main nl_table_lock and the
> cb_lock where each thread wants the lock held by the other, as
> demonstrated below.
>
> 1) Thread 1 is performing a netlink_bind() operation on a socket. As part
>    of this call, it will call netlink_lock_table(), incrementing the
>    nl_table_users count to 1.
> 2) Thread 2 is registering (or unregistering) a genl_family via the
>    genl_(un)register_family() API. The cb_lock semaphore will be taken for
>    writing.
> 3) Thread 1 will call genl_bind() as part of the bind operation to handle
>    subscribing to GENL multicast groups at the request of the user. It will
>    attempt to take the cb_lock semaphore for reading, but it will fail and
>    be scheduled away, waiting for Thread 2 to finish the write.
> 4) Thread 2 will call netlink_table_grab() during the (un)registration
>    call. However, as Thread 1 has incremented nl_table_users, it will not
>    be able to proceed, and both threads will be stuck waiting for the
>    other.
>
> genl_bind() is a noop, unless a genl_family implements the mcast_bind()
> function to handle setting up family-specific multicast operations. Since
> no one in-tree uses this functionality as Cong pointed out, simply removing
> the genl_bind() function will remove the possibility for deadlock, as there
> is no attempt by Thread 1 above to take the cb_lock semaphore.
>
> Fixes: c380d9a7afff ("genetlink: pass multicast bind/unbind to families")
> Suggested-by: Cong Wang <xiyou.wangc...@gmail.com>
> Acked-by: Johannes Berg <johannes.b...@intel.com>
> Reported-by: kernel test robot <l...@intel.com>
> Signed-off-by: Sean Tranchetti <stran...@codeaurora.org>

Applied and queued up for -stable, thanks.
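
For readers following the four steps in the quoted commit message, here is a rough userspace model of the lock ordering, written in Python purely for illustration. It is not kernel code and does not come from the patch. The helper names `wait_for_edges` and `has_cycle`, the thread labels, and the assumption that Thread 1 drops its table reference once the bind completes without `genl_bind()` are all hypothetical simplifications. It replays the steps and checks whether the resulting wait-for relation contains a cycle.

```python
from typing import Dict, Optional


def wait_for_edges(with_genl_bind: bool) -> Dict[str, str]:
    """Replay steps 1-4 and return {waiting thread: thread it waits on}.

    Hypothetical model: only the two pieces of state named in the commit
    message are tracked (nl_table_users and the cb_lock writer).
    """
    nl_table_users = 0                     # bumped by netlink_lock_table()
    cb_lock_writer: Optional[str] = None   # holder of cb_lock for writing
    waits: Dict[str, str] = {}

    # Step 1: Thread 1 enters netlink_bind() -> netlink_lock_table().
    nl_table_users += 1

    # Step 2: Thread 2 enters genl_(un)register_family(), takes cb_lock
    # for writing.
    cb_lock_writer = "thread2"

    # Step 3: Thread 1 calls genl_bind(), which wants cb_lock for reading.
    if with_genl_bind and cb_lock_writer is not None:
        waits["thread1"] = cb_lock_writer
    else:
        # Assumption: without genl_bind(), the bind completes and the
        # table reference is dropped.
        nl_table_users -= 1

    # Step 4: Thread 2 calls netlink_table_grab(), which cannot proceed
    # while nl_table_users is non-zero.
    if nl_table_users > 0:
        waits["thread2"] = "thread1"

    return waits


def has_cycle(waits: Dict[str, str]) -> bool:
    """Return True if following wait-for edges ever revisits a thread."""
    for start in waits:
        seen = set()
        node = start
        while node in waits:
            if node in seen:
                return True
            seen.add(node)
            node = waits[node]
    return False


if __name__ == "__main__":
    print("with genl_bind():   ", has_cycle(wait_for_edges(True)))
    print("without genl_bind():", has_cycle(wait_for_edges(False)))
```

In this model the cycle exists only while Thread 1 still tries to take `cb_lock` during the bind, which matches the commit message's reasoning for removing `genl_bind()`.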