Hi, Paolo,

below are a couple of my thoughts on this.

On 06.06.2018 12:44, Paolo Abeni wrote:
> On Tue, 2018-06-05 at 18:06 +0200, Paolo Abeni wrote:
>> On Tue, 2018-06-05 at 08:35 -0700, Tom Herbert wrote:
>>> Paolo, thanks for looking into this! Can you try replacing
>>> __skb_dequeue in requeue_rx_msgs with skb_dequeue to see if that is
>>> the fix.
>>
>> Sure, I'll retrigger the test, and report the result here (or directly
>> a new patch, should the test be succesful)
>
> Contrary to my expectations, the suggested change does not fix the
> issue. I'm still investigating the overall locking schema.

It seems that

  kcm_rcv_strparser()->unreserve_rx_kcm()->requeue_rx_msgs()->__skb_dequeue()

needs to be synchronized with:

  kcm_recvmsg()->kcm_wait_data()

Otherwise, requeue_rx_msgs() can remove the skb that kcm_recvmsg() has
just peeked.

One solution would be to take lock_sock(&kcm->sk) in requeue_rx_msgs(),
but we can't do that, since another socket is already locked at that
point (and taking a second socket lock there could potentially lead to
a deadlock).

The approach you took in the initial patch looks good to me for solving
this problem. The only thing I'm not sure about is whether lock_sock()
is still needed in kcm_recvmsg() after this.

Thanks,
Kirill
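
P.S. To make the interleaving described above concrete, here is a rough
userspace sketch. It is not kernel code: struct buf, struct queue and
all helper names are made up for illustration and only stand in for the
skb queue operations. It replays, in a single thread, a reader peeking
a buffer and the requeue path then moving every buffer to another
queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; this is not the kernel sk_buff API. */
struct buf {
	int id;
	struct buf *next;
};

struct queue {
	struct buf *head;
};

/* Like a peek: return the first buffer but leave it on the queue. */
static struct buf *queue_peek(struct queue *q)
{
	return q->head;
}

/* Like an unlocked dequeue: detach and return the first buffer. */
static struct buf *queue_dequeue(struct queue *q)
{
	struct buf *b = q->head;

	if (b) {
		q->head = b->next;
		b->next = NULL;
	}
	return b;
}

/* Append a detached buffer to the tail of the queue. */
static void queue_enqueue(struct queue *q, struct buf *b)
{
	struct buf **p = &q->head;

	assert(b && !b->next);
	while (*p)
		p = &(*p)->next;
	*p = b;
}

/* Return 1 if the buffer is currently linked on the queue, else 0. */
static int queue_contains(const struct queue *q, const struct buf *b)
{
	const struct buf *it;

	for (it = q->head; it; it = it->next)
		if (it == b)
			return 1;
	return 0;
}

/*
 * Replays the problematic interleaving in one thread:
 *   1. the reader peeks a buffer on queue "a";
 *   2. the requeue path moves every buffer from "a" to "b";
 *   3. the reader comes back to the buffer it peeked.
 * Returns 1 if the peeked buffer is still on "a", 0 otherwise.
 */
static int peeked_buf_survives_requeue(void)
{
	struct buf b1 = { 1, NULL }, b2 = { 2, NULL };
	struct queue a = { NULL }, b = { NULL };
	struct buf *peeked, *moved;

	queue_enqueue(&a, &b1);
	queue_enqueue(&a, &b2);

	peeked = queue_peek(&a);		/* reader side */
	while ((moved = queue_dequeue(&a)))	/* requeue side */
		queue_enqueue(&b, moved);

	return queue_contains(&a, peeked);
}
```

Calling peeked_buf_survives_requeue() returns 0 here: the reader still
holds a pointer to a buffer that is no longer on the queue it peeked
from. Whatever the fix looks like, the peek and the later use of the
buffer have to be covered by the same lock that the requeue path
takes.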