On Wed, Dec 30, 2015 at 09:53:13AM -0800, John Fastabend wrote: > The qdisc_reset operation depends on the qdisc lock at the moment > to halt any additions to gso_skb and statistics while the list is > free'd and the stats zeroed. > > Without the qdisc lock we can not guarantee another cpu is not in > the process of adding a skb to one of the "cells". Here are the > two cases we have to handle. > > case 1: qdisc_graft operation. In this case a "new" qdisc is attached > and the 'qdisc_destroy' operation is called on the old qdisc. > The destroy operation will wait a rcu grace period and call > qdisc_rcu_free(). At which point gso_cpu_skb is free'd along > with all stats so no need to zero stats and gso_cpu_skb from > the reset operation itself. > > Because we can not continue to call qdisc_reset before waiting > an rcu grace period so that the qdisc is detached from all > cpus simply do not call qdisc_reset() at all and let the > qdisc_destroy operation clean up the qdisc. Note, a refcnt > greater than 1 would cause the destroy operation to be > aborted however if this ever happened the reference to the > qdisc would be lost and we would have a memory leak. > > case 2: dev_deactivate sequence. This can come from a user bringing > the interface down which causes the gso_skb list to be flushed > and the qlen zero'd. At the moment this is protected by the > qdisc lock so while we clear the qlen/gso_skb fields we are > guaranteed no new skbs are added. For the lockless case > though this is not true. To resolve this move the qdisc_reset > call after the new qdisc is assigned and a grace period is > exercised to ensure no new skbs can be enqueued. Further > the RTNL lock is held so we can not get another call to > activate the qdisc while the skb lists are being free'd. > > Finally, fix qdisc_reset to handle the per cpu stats and > skb lists. > > Signed-off-by: John Fastabend <john.r.fastab...@intel.com> ... > - /* Prune old scheduler */ > - if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1) > - qdisc_reset(oqdisc); > - ... > - sync_needed |= !dev->dismantle; > + sync_needed = true;
I think killing above <=1 check and forcing synchronize_net() will make qdisc destruction more reliable than it's right now. Your commit log sounds too pessimistic :) btw, sync_needed variable can be removed as well. All other patches look good. Great stuff overall! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html