On 08/01/2021 02:20, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.olt...@nxp.com>
> 
> There is an effort to convert .ndo_get_stats64 to sleepable context, and
> for that to work, we need to prevent callers of dev_get_stats from using
> atomic locking.
> 
> The bonding driver retrieves its statistics recursively from its lower
> interfaces, with additional care to only count packets sent/received
> while those lowers were actually enslaved to the bond - see commit
> 5f0c5f73e5ef ("bonding: make global bonding stats more reliable").
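
For context, the accounting from that commit boils down to per-field
delta folding: on each read, compute the difference between a slave's
current counters and the snapshot taken at the previous read, and
accumulate only positive deltas. A simplified sketch of what lives in
bond_main.c today as bond_fold_stats (the real function also handles
32-bit counter rollover):

        static void bond_fold_stats(struct rtnl_link_stats64 *_res,
                                    const struct rtnl_link_stats64 *_new,
                                    const struct rtnl_link_stats64 *_old)
        {
                const u64 *new = (const u64 *)_new;
                const u64 *old = (const u64 *)_old;
                u64 *res = (u64 *)_res;
                int i;

                for (i = 0; i < sizeof(*_res) / sizeof(u64); i++) {
                        s64 delta = new[i] - old[i];

                        /* filter anomalies: some drivers reset their
                         * stats at down/up events
                         */
                        if (delta > 0)
                                res[i] += delta;
                }
        }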
> 
> Since commit 87163ef9cda7 ("bonding: remove last users of bond->lock and
> bond->lock itself"), the bonding driver uses the following protection
> for its array of slaves: RCU for readers and rtnl_mutex for updaters.
> This is not great because there is another movement [ roughly
> simultaneous with the one to make .ndo_get_stats64 sleepable ] to reduce
> driver usage of the rtnl_mutex. This makes sense, because the rtnl_mutex
> has become a very contended resource.
> 
> The aforementioned commit removed an interesting comment:
> 
>       /* [...] we can't hold bond->lock [...] because we'll
>        * deadlock. The only solution is to rely on the fact
>        * that we're under rtnl_lock here, and the slaves
>        * list won't change. This doesn't solve the problem
>        * of setting the slave's MTU while it is
>        * transmitting, but the assumption is that the base
>        * driver can handle that.
>        *
>        * TODO: figure out a way to safely iterate the slaves
>        * list, but without holding a lock around the actual
>        * call to the base driver.
>        */
> 
> The above summarizes pretty well the challenges we have with nested
> bonding interfaces (bond over bond over bond over...), which need to be
> addressed by a better locking scheme, one that also does not rely on the
> bloated rtnl_mutex.
> 
> Instead of using something as broad as the rtnl_mutex to ensure
> serialization of updates to the slave array, we can reintroduce a
> private mutex in the bonding driver, called slaves_lock.
> This mutex guards the only updater, bond_update_slave_arr, and ensures
> that readers which want a consistent view of the slave array no longer
> need to hold the rtnl_mutex for that.
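
In other words, the intended pattern is roughly this (hypothetical
sketch, not the exact patch code):

        /* sole updater of the slave array: serialize on the new
         * private mutex instead of asserting rtnl_lock
         */
        int bond_update_slave_arr(struct bonding *bond,
                                  struct slave *skipslave)
        {
                mutex_lock(&bond->slaves_lock);
                /* ... rebuild the usable-slave array ... */
                mutex_unlock(&bond->slaves_lock);
                return 0;
        }

and any reader that only needs a stable view of the slave list takes
bond->slaves_lock too, instead of rtnl_lock.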
> 
> Now _of_course_ I did not convert the entire driver to use
> bond_for_each_slave protected by bond->slaves_lock, or replace every
> rtnl_dereference with bond_dereference. I just started that process by
> converting the one reader I needed: ndo_get_stats64. Not only is it nice
> to not hold rtnl_mutex in .ndo_get_stats64, but it is also in fact
> forbidden to do so (since top-level callers may hold netif_lists_lock,
> which is a sub-lock of the rtnl_mutex, and therefore this would cause a
> lock inversion and a deadlock).
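
Spelled out, the established lock order is rtnl_mutex -> netif_lists_lock,
so taking rtnl_lock from within .ndo_get_stats64 while a caller already
holds netif_lists_lock would be the classic AB-BA inversion:

        CPU0                            CPU1
        ----                            ----
        rtnl_lock()                     netif_lists_lock()
        netif_lists_lock() [blocks]     rtnl_lock() [blocks] => deadlock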
> 
> To solve the nesting problem, the simple way is to not hold any locks
> when recursing into the slave netdev operation, which is exactly the
> approach that we take. We can "cheat" and use dev_hold to take a
> reference on the slave net_device, which is enough to ensure that
> netdev_wait_allrefs() waits until we finish, and the kernel won't fault.
> However, the slave structure itself might no longer be valid, only its
> associated net_device. That isn't a biggie. We just need to do some more
> work to check that the slave still exists after we have taken the
> statistics, and if it does, reapply the logic from Andy's commit
> 5f0c5f73e5ef.
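
Concretely, the flow in the bonding driver's bond_get_stats (its
.ndo_get_stats64) becomes something along these lines (hypothetical
outline of the approach, not the literal patch; only the two helpers
appear in the hunk quoted below):

        static void bond_get_stats(struct net_device *bond_dev,
                                   struct rtnl_link_stats64 *stats)
        {
                struct bonding *bond = netdev_priv(bond_dev);
                struct net_device **slaves;
                int i, num_slaves;

                if (bond_get_slave_arr(bond, &slaves, &num_slaves))
                        return;

                for (i = 0; i < num_slaves; i++) {
                        struct rtnl_link_stats64 temp;

                        /* may sleep and may recurse into a lower
                         * bond; we hold only a refcount on the
                         * net_device, no locks
                         */
                        dev_get_stats(slaves[i], &temp);

                        /* then, back under bond->slaves_lock: if
                         * slaves[i] is still enslaved, fold the
                         * delta into the bond's stats as per commit
                         * 5f0c5f73e5ef
                         */
                }

                bond_put_slave_arr(slaves, num_slaves);
        }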
> 
> Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>
> ---
> Changes in v4:
> Now there is code to propagate errors.
> 
> Changes in v3:
> - Added kfree(dev_stats);
> - Removed the now useless stats_lock (bond->bond_stats and
>   slave->slave_stats are now protected by bond->slaves_lock)
> 
> Changes in v2:
> Switched to the new scheme of holding just a refcnt to the slave
> interfaces while recursing with dev_get_stats.
> 
>  drivers/net/bonding/bond_main.c | 113 ++++++++++++++------------------
>  include/net/bonding.h           |  49 +++++++++++++-
>  2 files changed, 99 insertions(+), 63 deletions(-)
> 
[snip]
> +static inline int bond_get_slave_arr(struct bonding *bond,
> +                                  struct net_device ***slaves,
> +                                  int *num_slaves)
> +{
> +     struct net *net = dev_net(bond->dev);
> +     struct list_head *iter;
> +     struct slave *slave;
> +     int i = 0;
> +
> +     mutex_lock(&bond->slaves_lock);
> +
> +     *slaves = kcalloc(bond->slave_cnt, sizeof(*slaves), GFP_KERNEL);
> +     if (!(*slaves)) {
> +             netif_lists_unlock(net);
> +             return -ENOMEM;
> +     }

The error path looks wrong: you unlock netif_lists and return with
slaves_lock still held.

Cheers,
 Nik

> +
> +     bond_for_each_slave(bond, slave, iter) {
> +             dev_hold(slave->dev);
> +             (*slaves)[i++] = slave->dev;
> +     }
> +
> +     *num_slaves = bond->slave_cnt;
> +
> +     mutex_unlock(&bond->slaves_lock);
> +
> +     return 0;
> +}
> +
> +static inline void bond_put_slave_arr(struct net_device **slaves,
> +                                   int num_slaves)
> +{
> +     int i;
> +
> +     for (i = 0; i < num_slaves; i++)
> +             dev_put(slaves[i]);
> +
> +     kfree(slaves);
> +}
> +
>  #define BOND_PRI_RESELECT_ALWAYS     0
>  #define BOND_PRI_RESELECT_BETTER     1
>  #define BOND_PRI_RESELECT_FAILURE    2
> 
