Hi Jon, Jakub,

I tried your suggestion, but it looks like we run into circular locking, and
a deadlock could happen like this:
        CPU0                    CPU1
        ----                    ----
   lock(&n->lock#2);
                                lock(&tn->nametbl_lock);
                                lock(&n->lock#2);
   lock(&tn->nametbl_lock);

  *** DEADLOCK ***
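
For what it's worth, here is a minimal userspace sketch of the same AB-BA
inversion, with two pthreads standing in for CPU0 and CPU1 and the mutex
names mirroring the lockdep report above (this is only an illustration,
not the actual TIPC call paths):

        /* abba.c - illustrative only; build with: gcc -pthread abba.c
         * The two threads take the same two locks in opposite order,
         * so a run can hang exactly as in the report above.
         */
        #include <pthread.h>
        #include <stdio.h>

        static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;    /* n->lock#2 */
        static pthread_mutex_t nametbl_lock = PTHREAD_MUTEX_INITIALIZER; /* tn->nametbl_lock */

        static void *cpu0(void *arg)
        {
                pthread_mutex_lock(&node_lock);      /* lock(&n->lock#2) */
                pthread_mutex_lock(&nametbl_lock);   /* waits if cpu1 holds it */
                pthread_mutex_unlock(&nametbl_lock);
                pthread_mutex_unlock(&node_lock);
                return NULL;
        }

        static void *cpu1(void *arg)
        {
                pthread_mutex_lock(&nametbl_lock);   /* lock(&tn->nametbl_lock) */
                pthread_mutex_lock(&node_lock);      /* waits if cpu0 holds it */
                pthread_mutex_unlock(&node_lock);
                pthread_mutex_unlock(&nametbl_lock);
                return NULL;
        }

        int main(void)
        {
                pthread_t t0, t1;

                pthread_create(&t0, NULL, cpu0, NULL);
                pthread_create(&t1, NULL, cpu1, NULL);
                pthread_join(t0, NULL);
                pthread_join(t1, NULL);
                puts("no deadlock on this run");
                return 0;
        }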

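That is why v1 keeps the extra lock in tipc_named_dequeue() and switches
node_lost_contact() from __skb_queue_purge() to skb_queue_purge(): the
latter takes the queue's own spinlock, which pairs with the
spin_lock_bh(&namedq->lock) added above. Roughly (simplified from the
kernel's skbuff code, not a verbatim copy):

        /* Locked variant: skb_dequeue() takes list->lock internally,
         * so concurrent users of the same queue are safe.
         */
        void skb_queue_purge(struct sk_buff_head *list)
        {
                struct sk_buff *skb;

                while ((skb = skb_dequeue(list)) != NULL)
                        kfree_skb(skb);
        }

        /* Lockless variant: the caller must guarantee exclusive
         * access to the queue.
         */
        void __skb_queue_purge(struct sk_buff_head *list)
        {
                struct sk_buff *skb;

                while ((skb = __skb_dequeue(list)) != NULL)
                        kfree_skb(skb);
        }
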
Regards,
Hoang
> -----Original Message-----
> From: Jon Maloy <jma...@redhat.com>
> Sent: Friday, October 9, 2020 1:01 AM
> To: Jakub Kicinski <k...@kernel.org>; Hoang Huu Le <hoang.h...@dektech.com.au>
> Cc: ma...@donjonn.com; ying....@windriver.com; 
> tipc-discuss...@lists.sourceforge.net; netdev@vger.kernel.org
> Subject: Re: [net] tipc: fix NULL pointer dereference in tipc_named_rcv
> 
> 
> 
> On 10/8/20 1:25 PM, Jakub Kicinski wrote:
> > On Thu,  8 Oct 2020 14:31:56 +0700 Hoang Huu Le wrote:
> >> diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c
> >> index 2f9c148f17e2..fe4edce459ad 100644
> >> --- a/net/tipc/name_distr.c
> >> +++ b/net/tipc/name_distr.c
> >> @@ -327,8 +327,13 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>    struct tipc_msg *hdr;
> >>    u16 seqno;
> >>
> >> +  spin_lock_bh(&namedq->lock);
> >>    skb_queue_walk_safe(namedq, skb, tmp) {
> >> -          skb_linearize(skb);
> >> +          if (unlikely(skb_linearize(skb))) {
> >> +                  __skb_unlink(skb, namedq);
> >> +                  kfree_skb(skb);
> >> +                  continue;
> >> +          }
> >>            hdr = buf_msg(skb);
> >>            seqno = msg_named_seqno(hdr);
> >>            if (msg_is_last_bulk(hdr)) {
> >> @@ -338,12 +343,14 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>
> >>            if (msg_is_bulk(hdr) || msg_is_legacy(hdr)) {
> >>                    __skb_unlink(skb, namedq);
> >> +                  spin_unlock_bh(&namedq->lock);
> >>                    return skb;
> >>            }
> >>
> >>            if (*open && (*rcv_nxt == seqno)) {
> >>                    (*rcv_nxt)++;
> >>                    __skb_unlink(skb, namedq);
> >> +                  spin_unlock_bh(&namedq->lock);
> >>                    return skb;
> >>            }
> >>
> >> @@ -353,6 +360,7 @@ static struct sk_buff *tipc_named_dequeue(struct sk_buff_head *namedq,
> >>                    continue;
> >>            }
> >>    }
> >> +  spin_unlock_bh(&namedq->lock);
> >>    return NULL;
> >>   }
> >>
> >> diff --git a/net/tipc/node.c b/net/tipc/node.c
> >> index cf4b239fc569..d269ebe382e1 100644
> >> --- a/net/tipc/node.c
> >> +++ b/net/tipc/node.c
> >> @@ -1496,7 +1496,7 @@ static void node_lost_contact(struct tipc_node *n,
> >>
> >>    /* Clean up broadcast state */
> >>    tipc_bcast_remove_peer(n->net, n->bc_entry.link);
> >> -  __skb_queue_purge(&n->bc_entry.namedq);
> >> +  skb_queue_purge(&n->bc_entry.namedq);
> > Patch looks fine, but I'm not sure why not take
> > spin_lock_bh(&tn->nametbl_lock) here instead?
> >
> > Seems like node_lost_contact() should be relatively rare,
> > so adding another lock to tipc_named_dequeue() is not the
> > right trade-off.
> Actually, I agree with the previous speaker here. We already hold the
> nametbl_lock when tipc_named_dequeue() is called, and the same lock is
> accessible from node.c, where node_lost_contact() is executed. The patch
> and the code become simpler.
> I suggest you post a v2 of this one.
> 
> ///jon
> 
> >>    /* Abort any ongoing link failover */
> >>    for (i = 0; i < MAX_BEARERS; i++) {
