Hi Jon, Jakub,

I tried it with your comment, but it looks like we get into circular locking, and a deadlock could happen like this:

       CPU0                    CPU1
       ----                    ----
  lock(&n->lock#2);
                               lock(&tn->nametbl_lock);
                               lock(&n->lock#2);
  lock(&tn->nametbl_lock);
 *** DEADLOCK ***

Regards,
Hoang

> -----Original Message-----
> From: Jon Maloy <jma...@redhat.com>
> Sent: Friday, October 9, 2020 1:01 AM
> To: Jakub Kicinski <k...@kernel.org>; Hoang Huu Le <hoang.h...@dektech.com.au>
> Cc: ma...@donjonn.com; ying....@windriver.com; tipc-discuss...@lists.sourceforge.net; netdev@vger.kernel.org
> Subject: Re: [net] tipc: fix NULL pointer dereference in tipc_named_rcv
>
>
> On 10/8/20 1:25 PM, Jakub Kicinski wrote:
> > On Thu, 8 Oct 2020 14:31:56 +0700 Hoang Huu Le wrote:
> >> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> >> index 2f9c148f17e2..fe4edce459ad 100644
> >> --- a/net/tipc/name_distr.c
> >> +++ b/net/tipc/name_distr.c
> >> @@ -327,8 +327,13 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>  	struct tipc_msg *hdr;
> >>  	u16 seqno;
> >>
> >> +	spin_lock_bh(&namedq->lock);
> >>  	skb_queue_walk_safe(namedq, skb, tmp) {
> >> -		skb_linearize(skb);
> >> +		if (unlikely(skb_linearize(skb))) {
> >> +			__skb_unlink(skb, namedq);
> >> +			kfree_skb(skb);
> >> +			continue;
> >> +		}
> >>  		hdr = buf_msg(skb);
> >>  		seqno = msg_named_seqno(hdr);
> >>  		if (msg_is_last_bulk(hdr)) {
> >> @@ -338,12 +343,14 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>
> >>  		if (msg_is_bulk(hdr) || msg_is_legacy(hdr)) {
> >>  			__skb_unlink(skb, namedq);
> >> +			spin_unlock_bh(&namedq->lock);
> >>  			return skb;
> >>  		}
> >>
> >>  		if (*open && (*rcv_nxt == seqno)) {
> >>  			(*rcv_nxt)++;
> >>  			__skb_unlink(skb, namedq);
> >> +			spin_unlock_bh(&namedq->lock);
> >>  			return skb;
> >>  		}
> >>
> >> @@ -353,6 +360,7 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>  			continue;
> >>  		}
> >>  	}
> >> +	spin_unlock_bh(&namedq->lock);
> >>  	return NULL;
> >>  }
> >>
> >> diff --git a/net/tipc/node.c b/net/tipc/node.c
> >> index cf4b239fc569..d269ebe382e1 100644
> >> --- a/net/tipc/node.c
> >> +++ b/net/tipc/node.c
> >> @@ -1496,7 +1496,7 @@ static void node_lost_contact(struct tipc_node *n,
> >>
> >>  	/* Clean up broadcast state */
> >>  	tipc_bcast_remove_peer(n->net, n->bc_entry.link);
> >> -	__skb_queue_purge(&n->bc_entry.namedq);
> >> +	skb_queue_purge(&n->bc_entry.namedq);
> > Patch looks fine, but I'm not sure why not hold
> > spin_lock_bh(&tn->nametbl_lock) here instead?
> >
> > Seems like node_lost_contact() should be relatively rare,
> > so adding another lock to tipc_named_dequeue() is not the
> > right trade off.
> Actually, I agree with the previous speaker here. We already have the
> nametbl_lock when tipc_named_dequeue() is called, and the same lock is
> accessible from node.c where node_lost_contact() is executed. The patch
> and the code become simpler.
> I suggest you post a v2 of this one.
>
> ///jon
>
> >>  	/* Abort any ongoing link failover */
> >>  	for (i = 0; i < MAX_BEARERS; i++) {