On Mon, Jul 6, 2020 at 1:34 PM YU, Xiangning <xiangning...@alibaba-inc.com> wrote:
>
> On 7/6/20 1:10 PM, Cong Wang wrote:
> > On Mon, Jul 6, 2020 at 11:11 AM YU, Xiangning
> > <xiangning...@alibaba-inc.com> wrote:
> >> +static int ltb_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
> >> +		       struct sk_buff **to_free)
> >> +{
> >> +	struct ltb_sched *ltb = qdisc_priv(sch);
> >> +	struct ltb_pcpu_sched *pcpu_q;
> >> +	struct ltb_class *cl;
> >> +	struct ltb_pcpu_data *pcpu = this_cpu_ptr(ltb->pcpu_data);
> >> +	int cpu;
> >> +
> >> +	cpu = smp_processor_id();
> >> +	pcpu_q = qdisc_priv(pcpu->qdisc);
> >> +	ltb_skb_cb(skb)->cpu = cpu;
> >> +
> >> +	cl = ltb_classify(sch, ltb, skb);
> >> +	if (unlikely(!cl)) {
> >> +		kfree_skb(skb);
> >> +		return NET_XMIT_DROP;
> >> +	}
> >> +
> >> +	pcpu->active = true;
> >> +	if (unlikely(kfifo_put(&cl->aggr_queues[cpu], skb) == 0)) {
> >> +		kfree_skb(skb);
> >> +		atomic64_inc(&cl->stat_drops);
> >> +		return NET_XMIT_DROP;
> >> +	}
> >
> > How do you prevent out-of-order packets?
> >
>
> Hi Cong,
>
> That's a good question. In theory there will be out-of-order packets
> during aggregation. But keep in mind this is per-class aggregation, and
> it runs at a high frequency, so the chance of having leftover skbs from
> the same TCP flow on many CPUs is low.
>
> Also, based on real deployment experience, we haven't observed increased
> out-of-order events even under heavy workload.
>

Yeah, but unless you always classify packets into proper flows, there is
always a chance to generate out-of-order packets here, which means the
default configuration is flawed.
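To illustrate what I mean by proper flows: if the aggregation queue were
picked by the flow hash instead of by the current CPU, one flow could
never be split across queues. A rough, untested sketch (ltb_flow_queue()
is a made-up helper, and this would also require making the fifos safe
for multiple producers, since the current per-cpu kfifo usage is
single-producer by construction):

/* Hypothetical helper: derive the queue index from the flow hash so
 * that every packet of a given flow lands in the same fifo, no matter
 * which CPU runs the enqueue.
 */
static unsigned int ltb_flow_queue(struct sk_buff *skb)
{
	/* skb_get_hash() is stable for a flow, so this index is too. */
	return reciprocal_scale(skb_get_hash(skb), num_possible_cpus());
}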
> >
> >> +static int ltb_init(struct Qdisc *sch, struct nlattr *opt,
> > ...
> >> +	ltb->default_cls = ltb->shadow_cls; /* Default hasn't been created */
> >> +	tasklet_init(&ltb->fanout_tasklet, ltb_fanout_tasklet,
> >> +		     (unsigned long)ltb);
> >> +
> >> +	/* Bandwidth balancer, this logic can be implemented in user-land. */
> >> +	init_waitqueue_head(&ltb->bwbalancer_wq);
> >> +	ltb->bwbalancer_task =
> >> +		kthread_create(ltb_bw_balancer_kthread, ltb, "ltb-balancer");
> >> +	wake_up_process(ltb->bwbalancer_task);
> >
> > Creating a kthread for each qdisc doesn't look good. Why do you
> > need a per-qdisc kthread or even a kernel thread at all?
> >
>
> We moved the bandwidth sharing out of the critical data path, that's
> why we use a kernel thread to balance the current maximum bandwidth
> used by each class periodically.
>
> This part could be implemented as a timer. What's your suggestion?

I doubt you can use a timer, as you call rtnl_trylock() there. Why not
use a delayed work?
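Roughly like this (untested sketch; balancer_work would be a new field
in struct ltb_sched, ltb_bw_balance() stands in for the body of your
existing balancer loop, and LTB_BALANCE_INTERVAL for whatever period
you use):

static void ltb_balancer_work(struct work_struct *work)
{
	struct ltb_sched *ltb = container_of(to_delayed_work(work),
					     struct ltb_sched, balancer_work);

	/* Work items run in process context, so taking the RTNL here
	 * is fine, unlike in a timer callback, which runs in softirq
	 * context.
	 */
	if (rtnl_trylock()) {
		ltb_bw_balance(ltb);
		rtnl_unlock();
	}

	/* Re-arm for the next balancing round. */
	schedule_delayed_work(&ltb->balancer_work, LTB_BALANCE_INTERVAL);
}

Then ltb_init() would just do INIT_DELAYED_WORK() plus the first
schedule_delayed_work(), and ltb_destroy() a cancel_delayed_work_sync().
No kthread needed.

Thanks.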