Thu, Dec 21, 2017 at 12:34:05AM CET, john.fastab...@gmail.com wrote: >On 12/20/2017 03:23 PM, Cong Wang wrote: >> On Wed, Dec 20, 2017 at 3:05 PM, John Fastabend >> <john.fastab...@gmail.com> wrote: >>> On 12/20/2017 02:41 PM, Cong Wang wrote: >>>> On Wed, Dec 20, 2017 at 12:09 PM, John Fastabend >>>> <john.fastab...@gmail.com> wrote: >>>>> RCU grace period is needed for lockless qdiscs added in the commit >>>>> c5ad119fb6c09 ("net: sched: pfifo_fast use skb_array"). >>>>> >>>>> It is needed now that qdiscs may be lockless otherwise we risk >>>>> free'ing a qdisc that is still in use from datapath. Additionally, >>>>> push list cleanup into RCU callback. Otherwise we risk the datapath >>>>> adding skbs during removal. >>>> >>>> What about qdisc_graft() -> dev_deactivate() -> synchronize_net() ? >>>> It doesn't work with your "lockless" patches? >>>> >>> >>> Well this is only in the 'parent == NULL' case otherwise we call >>> cops->graft(). Most sch_* seem to use qdisc_replace and this uses >>> sch_tree_lock(). >>> >>> The only converted qdisc mq and mqprio at this point don't care >>> though and do their own dev_deactivate/activate. So its not fixing >>> anything in the above mentioned commit. >> >> Sure, removing a class does not impact the whole device, >> but removing the root qdisc does. >> >> After your "lockless", skb_array_consume_bh() is called in >> pfifo_fast_reset() and ptr_ring_cleanup() is called in >> pfifo_fast_destroy(), assuming skb_array is not buggy, what race >> do we have here with datapath? >> > >None at the moment. > >> >>> >>> I still think it will need to be done eventually. If it resolves >>> the miniq case it seems like a good idea. Although per Jakub's comment >>> perhaps I pulled too much into the RCU handler. >> >> The case Jakub reported is a RCU callback missing a rcu >> barrier. I don't understand why you keep believing it is RCU >> readers on datapath.> >> Not even to mention ingress is not affected by your "lockless" >> thing. >> > >I was thinking about the case where we want a lockless qdisc >with classes. Doing the qdisc destroy after a grace period would >solve this. Also we could start to cleanup a lot of the locking >and extra bits around 'running' qdisc and such by doing a clean >xchg on the qdisc layer. It seems that a dev_activate/deactivate >just to install a new qdisc is not needed. > >Anyways future work. However if it resolves the miniq issue, as >Jiri indicated, seems like a clean fix. Although Jakub's issue >with the patch would need to be addressed. Seems he gets a WARN_ON >if the offload is not disabled but the device is unitialized.
Why just moving qdisc_free to rcu is not enough? It would resolve this issue and also avoid using synchronize net. Something like: diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 83a3e47..487288e 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -100,6 +100,7 @@ struct Qdisc { refcount_t refcnt; spinlock_t busylock ____cacheline_aligned_in_smp; + struct rcu_head rcu; }; static inline void qdisc_refcount_inc(struct Qdisc *qdisc) diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index cd1b200..9beffd1 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -698,8 +698,10 @@ void qdisc_reset(struct Qdisc *qdisc) } EXPORT_SYMBOL(qdisc_reset); -static void qdisc_free(struct Qdisc *qdisc) +static void qdisc_free_rcu(struct rcu_head *rcu) { + struct Qdisc *qdisc = container_of(rcu, struct Qdisc, rcu); + if (qdisc_is_percpu_stats(qdisc)) { free_percpu(qdisc->cpu_bstats); free_percpu(qdisc->cpu_qstats); @@ -732,7 +734,7 @@ void qdisc_destroy(struct Qdisc *qdisc) kfree_skb_list(qdisc->gso_skb); kfree_skb(qdisc->skb_bad_txq); - qdisc_free(qdisc); + call_rcu(&qdisc->rcu, qdisc_free_rcu); } EXPORT_SYMBOL(qdisc_destroy); -- 2.9.5