On 08/01/2021 02:20, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.olt...@nxp.com>
>
> There is an effort to convert .ndo_get_stats64 to sleepable context, and
> for that to work, we need to prevent callers of dev_get_stats from using
> atomic locking.
>
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
>
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
> bond->lock itself"), the bonding driver uses the following protection
> for its array of slaves: RCU for readers and rtnl_mutex for updaters.
> This is not great because there is another movement [ somehow
> simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
> driver usage of rtnl_mutex. This makes sense, because the rtnl_mutex has
> become a very contended resource.
>
> The aforementioned commit removed an interesting comment:
>
> /* [...] we can't hold bond->lock [...] because we'll
>  * deadlock. The only solution is to rely on the fact
>  * that we're under rtnl_lock here, and the slaves
>  * list won't change. This doesn't solve the problem
>  * of setting the slave's MTU while it is
>  * transmitting, but the assumption is that the base
>  * driver can handle that.
>  *
>  * TODO: figure out a way to safely iterate the slaves
>  * list, but without holding a lock around the actual
>  * call to the base driver.
>  */
>
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...), which need to be
> addressed by a better locking scheme that also does not rely on the
> bloated rtnl_mutex.
>
> Instead of using something as broad as the rtnl_mutex to ensure
> serialization of updates to the slave array, we can reintroduce a
> private mutex in the bonding driver, called slaves_lock.
> This mutex circles the only updater, bond_update_slave_arr, and ensures
> that whichever other readers want to see a consistent slave array
> don't need to hold the rtnl_mutex for that.
>
> Now _of_course_ I did not convert the entire driver to use
> bond_for_each_slave protected by the bond->slaves_lock, and
> rtnl_dereference to bond_dereference. I just started that process by
> converting the one reader I needed: ndo_get_stats64. Not only is it nice
> to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
> forbidden to do so (since top-level callers may hold netif_lists_lock,
> which is a sub-lock of the rtnl_mutex, and therefore this would cause a
> lock inversion and a deadlock).
>
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation, which is exactly the
> approach that we take. We can "cheat" and use dev_hold to take a
> reference on the slave net_device, which is enough to ensure that
> netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
> However, the slave structure might no longer be valid, just its
> associated net_device. That isn't a biggie. We just need to do some more
> work to ensure that the slave exists after we took the statistics, and
> if it still does, reapply the logic from Andy's commit 5f0c5f73e5ef.
>
> Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>
> ---
> Changes in v4:
> Now there is code to propagate errors.
>
> Changes in v3:
> - Added kfree(dev_stats);
> - Removed the now useless stats_lock (bond->bond_stats and
>   slave->slave_stats are now protected by bond->slaves_lock)
>
> Changes in v2:
> Switched to the new scheme of holding just a refcnt to the slave
> interfaces while recursing with dev_get_stats.
>
>  drivers/net/bonding/bond_main.c | 113 ++++++++++++++------------------
>  include/net/bonding.h           |  49 +++++++++++++-
>  2 files changed, 99 insertions(+), 63 deletions(-)
>
[snip]
> +static inline int bond_get_slave_arr(struct bonding *bond,
> +				     struct net_device ***slaves,
> +				     int *num_slaves)
> +{
> +	struct net *net = dev_net(bond->dev);
> +	struct list_head *iter;
> +	struct slave *slave;
> +	int i = 0;
> +
> +	mutex_lock(&bond->slaves_lock);
> +
> +	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
> +	if (!(*slaves)) {
> +		netif_lists_unlock(net);
> +		return -ENOMEM;
> +	}
The error path looks wrong: you unlock netif_lists and return with
slaves_lock held. Something like the sketch after the quoted hunk
below is what I'd expect.

Cheers,
Nik

> +
> +	bond_for_each_slave(bond, slave, iter) {
> +		dev_hold(slave->dev);
> +		*slaves[i++] = slave->dev;
> +	}
> +
> +	*num_slaves = bond->slave_cnt;
> +
> +	mutex_unlock(&bond->slaves_lock);
> +
> +	return 0;
> +}
> +
> +static inline void bond_put_slave_arr(struct net_device **slaves,
> +				      int num_slaves)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_slaves; i++)
> +		dev_put(slaves[i]);
> +
> +	kfree(slaves);
> +}
> +
>  #define BOND_PRI_RESELECT_ALWAYS	0
>  #define BOND_PRI_RESELECT_BETTER	1
>  #define BOND_PRI_RESELECT_FAILURE	2
>
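To be explicit, here is a minimal sketch of the error path I have in mind
(untested); everything else is kept as in your patch, only the unlock call
is swapped for the mutex that this function actually holds at that point:

	*slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
	if (!(*slaves)) {
		/* release slaves_lock, taken a few lines above */
		mutex_unlock(&bond->slaves_lock);
		return -ENOMEM;
	}

With that change the local "net" variable looks unused in
bond_get_slave_arr, at least in the hunk quoted above.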