On 12/27/2017 1:24 PM, John Fastabend wrote:
> On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> > When dev_requeue_skb() is called with a bulked skb list, only the
> > first skb of the list will be requeued to the qdisc layer; the
> > others leak without being freed.
> >
> > TCP is broken due to the skb leak, since a leaked (never freed) skb
> > is considered still in the host queue and is never retransmitted.
> > This happens when dev_requeue_skb() is called from qdisc_restart():
> >
> > qdisc_restart
> >   |-- dequeue_skb
> >   |-- sch_direct_xmit()
> >         |-- dev_requeue_skb() <-- skb may be bulked
> >
> > Fix dev_requeue_skb() to requeue the full bulked list.
> >
> > Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> > Signed-off-by: Wei Yongjun <weiyongj...@huawei.com>
> > ---
> > v1 -> v2: add net-next prefix
> > ---
>
> First, thanks for tracking this down.
>
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 981c08f..0df2dbf 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
> >
> >  static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> >  {
> > -	__skb_queue_head(&q->gso_skb, skb);
> > -	q->qstats.requeues++;
> > -	qdisc_qstats_backlog_inc(q, skb);
> > -	q->q.qlen++; /* it's still part of the queue */
> > +	while (skb) {
> > +		struct sk_buff *next = skb->next;
> > +
> > +		__skb_queue_tail(&q->gso_skb, skb);
>
> Was the change from __skb_queue_head to __skb_queue_tail here
> intentional? We should re-queue packets to the head of the list.
skbs are dequeued from gso_skb in order, so if packets are re-queued to
the head of the list, a bulked list will be sent in reverse order (e.g.
skbs 1,2,3 inserted one by one at the head come back out as 3,2,1), so
I think __skb_queue_tail is better here. Tcpdump shows those skbs as
out-of-order packets.

> > +		q->qstats.requeues++;
> > +		qdisc_qstats_backlog_inc(q, skb);
> > +		q->q.qlen++; /* it's still part of the queue */
> > +
> > +		skb = next;
> > +	}
> >  	__netif_schedule(q);
> >
> >  	return 0;
> > @@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
> >  {
> >  	spinlock_t *lock = qdisc_lock(q);
> >
> > -	spin_lock(lock);
> > -	__skb_queue_tail(&q->gso_skb, skb);
> > -	spin_unlock(lock);
> > +	while (skb) {
> > +		struct sk_buff *next = skb->next;
> > +
> > +		spin_lock(lock);
>
> In this case I suspect it's better to move the lock to be around the
> while loop rather than grab and drop it repeatedly. I don't have any
> data at this point, so OK either way, assuming the other head/tail
> comment is addressed.

I will fix it in v3.

> > +		__skb_queue_tail(&q->gso_skb, skb);
>
> Same here: *_tail should be *_head?
>
> > +		spin_unlock(lock);
> > +
> > +		qdisc_qstats_cpu_requeues_inc(q);
> > +		qdisc_qstats_cpu_backlog_inc(q, skb);
> > +		qdisc_qstats_cpu_qlen_inc(q);
> > +
> > +		skb = next;
> > +	}
> >
> > -	qdisc_qstats_cpu_requeues_inc(q);
> > -	qdisc_qstats_cpu_backlog_inc(q, skb);
> > -	qdisc_qstats_cpu_qlen_inc(q);
> >  	__netif_schedule(q);
> >
> >  	return 0;
> >
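
For reference, a minimal (untested) sketch of what the v3 version of
dev_requeue_skb_locked() could look like, with the lock hoisted around
the loop as suggested, and __skb_queue_tail() kept per the ordering
argument above; this is not the actual v3 patch:

static inline int dev_requeue_skb_locked(struct sk_buff *skb,
					 struct Qdisc *q)
{
	spinlock_t *lock = qdisc_lock(q);

	spin_lock(lock);
	while (skb) {
		struct sk_buff *next = skb->next;

		/* tail insertion keeps the bulked list in send order */
		__skb_queue_tail(&q->gso_skb, skb);

		/* per-cpu stats, same accounting as the v2 loop */
		qdisc_qstats_cpu_requeues_inc(q);
		qdisc_qstats_cpu_backlog_inc(q, skb);
		qdisc_qstats_cpu_qlen_inc(q);

		skb = next;
	}
	spin_unlock(lock);

	__netif_schedule(q);

	return 0;
}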