On 12/27/2017 1:24 PM, John Fastabend wrote:
> On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> > When dev_requeue_skb() is called with a bulked skb list, only the
> > first skb of the list is requeued to the qdisc layer; the others
> > are leaked without being freed.
> >
> > TCP is broken by the skb leak: the leaked (never-freed) skbs are
> > still considered to be in the host queue and are never
> > retransmitted. This happens when dev_requeue_skb() is called from
> > qdisc_restart().
> >   qdisc_restart
> >   |-- dequeue_skb
> >   |-- sch_direct_xmit()
> >       |-- dev_requeue_skb() <-- skb may be bulked
> >
> > Fix dev_requeue_skb() to requeue the full bulked list.
> >
> > Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> > Signed-off-by: Wei Yongjun <weiyongj...@huawei.com>
> > ---
> > v1 -> v2: add net-next prefix
> > ---
> 
> First, thanks for tracking this down.
> 
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 981c08f..0df2dbf 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
> >
> >  static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> >  {
> > -   __skb_queue_head(&q->gso_skb, skb);
> > -   q->qstats.requeues++;
> > -   qdisc_qstats_backlog_inc(q, skb);
> > -   q->q.qlen++;    /* it's still part of the queue */
> > +   while (skb) {
> > +           struct sk_buff *next = skb->next;
> > +
> > +           __skb_queue_tail(&q->gso_skb, skb);
> 
> Was the change from __skb_queue_head to __skb_queue_tail here
> intentional? We should re-queue packets to the head of the list.

skbs are dequeued from gso_skb in order, so if the packets are re-queued
to the head of the list they will be sent in reverse order; I think
__skb_queue_tail is better here. With head insertion, tcpdump shows those
skbs as out-of-order packets.
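To illustrate the ordering point, here is a tiny userspace simulation (not kernel code: `struct skb`, `skb_head`, `queue_tail` and `requeue_batch_tail` are hypothetical stand-ins for sk_buff, the gso_skb queue, __skb_queue_tail and the v2 requeue loop). Requeuing each node of the batch to the tail keeps the original transmit order:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct sk_buff: only the ->next link matters here. */
struct skb {
    int id;
    struct skb *next;
};

/* Stand-in for the gso_skb queue head. */
struct skb_head {
    struct skb *first;
    struct skb *last;
};

/* Mimics __skb_queue_tail(): append one node at the queue tail. */
static void queue_tail(struct skb_head *q, struct skb *s)
{
    s->next = NULL;
    if (q->last)
        q->last->next = s;
    else
        q->first = s;
    q->last = s;
}

/* The v2 loop shape: requeue a whole batch, node by node, to the tail. */
static void requeue_batch_tail(struct skb_head *q, struct skb *skb)
{
    while (skb) {
        struct skb *next = skb->next;

        queue_tail(q, skb);
        skb = next;
    }
}

/* Return the id of the n-th queued node (0-based), or -1 if absent. */
static int nth_id(const struct skb_head *q, int n)
{
    const struct skb *s = q->first;

    while (s && n--)
        s = s->next;
    return s ? s->id : -1;
}
```

With a three-node batch 1 -> 2 -> 3, the queue ends up 1, 2, 3; head insertion of each node would instead yield 3, 2, 1, which is the reordering tcpdump would show.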

> 
> > +           q->qstats.requeues++;
> > +           qdisc_qstats_backlog_inc(q, skb);
> > +           q->q.qlen++;    /* it's still part of the queue */
> > +
> > +           skb = next;
> > +   }
> >     __netif_schedule(q);
> >
> >     return 0;
> > @@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
> >  {
> >     spinlock_t *lock = qdisc_lock(q);
> >
> > -   spin_lock(lock);
> > -   __skb_queue_tail(&q->gso_skb, skb);
> > -   spin_unlock(lock);
> > +   while (skb) {
> > +           struct sk_buff *next = skb->next;
> > +
> > +           spin_lock(lock);
> 
> In this case I suspect it's better to move the lock to be around the
> while loop rather than grab and drop it repeatedly. I don't have
> any data at this point, so I'm OK either way, assuming the other
> head/tail comment is addressed.

I will fix it in v3.
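For reference, a minimal userspace sketch of the lock-hoisted shape (take the lock once around the whole loop, as suggested). This is not the v3 patch itself: `qdisc_sim` and `requeue_locked` are hypothetical names, a pthread mutex stands in for qdisc_lock(), and plain counters stand in for the qstats helpers:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for struct sk_buff: only the ->next link matters here. */
struct skb {
    int id;
    struct skb *next;
};

/* Simplified stand-in for the qdisc: queue head plus a few counters. */
struct qdisc_sim {
    pthread_mutex_t lock;   /* stands in for qdisc_lock(q) */
    struct skb *first, *last;
    int requeues;           /* stands in for qstats requeues */
    int qlen;               /* stands in for q.qlen */
};

/* Lock once, splice the whole batch to the tail, unlock once. */
static void requeue_locked(struct qdisc_sim *q, struct skb *skb)
{
    pthread_mutex_lock(&q->lock);
    while (skb) {
        struct skb *next = skb->next;

        skb->next = NULL;
        if (q->last)
            q->last->next = skb;
        else
            q->first = skb;
        q->last = skb;

        q->requeues++;
        q->qlen++;
        skb = next;
    }
    pthread_mutex_unlock(&q->lock);
}

/* Return the id of the n-th queued node (0-based), or -1 if absent. */
static int sim_nth_id(const struct qdisc_sim *q, int n)
{
    const struct skb *s = q->first;

    while (s && n--)
        s = s->next;
    return s ? s->id : -1;
}
```

The single lock/unlock pair avoids the repeated acquire/release cost of the per-skb locking in v2 while still tail-inserting each node, so batch order is preserved.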

> 
> > +           __skb_queue_tail(&q->gso_skb, skb);
> 
> Same here *_tail should be *_head?
> 
> > +           spin_unlock(lock);
> > +
> > +           qdisc_qstats_cpu_requeues_inc(q);
> > +           qdisc_qstats_cpu_backlog_inc(q, skb);
> > +           qdisc_qstats_cpu_qlen_inc(q);
> > +
> > +           skb = next;
> > +   }
> >
> > -   qdisc_qstats_cpu_requeues_inc(q);
> > -   qdisc_qstats_cpu_backlog_inc(q, skb);
> > -   qdisc_qstats_cpu_qlen_inc(q);
> >     __netif_schedule(q);
> >
> >     return 0;
> >
