On Thu, 11 Jun 2020 at 08:21, Cong Wang <[email protected]> wrote: >
Hi Cong :) > On Wed, Jun 10, 2020 at 7:48 AM Taehee Yoo <[email protected]> wrote: > > > > On Tue, 9 Jun 2020 at 06:53, Cong Wang <[email protected]> wrote: > > > > > > > Hi Cong, > > Thank you for this work! > > > > > The dynamic key update for addr_list_lock still causes troubles, > > > for example the following race condition still exists: > > > > > > CPU 0: CPU 1: > > > (RCU read lock) (RTNL lock) > > > dev_mc_seq_show() netdev_update_lockdep_key() > > > -> lockdep_unregister_key() > > > -> netif_addr_lock_bh() > > > > > > because lockdep doesn't provide an API to update it atomically. > > > Therefore, we have to move it back to static keys and use subclass > > > for nest locking like before. > > > > > > > I'm sorry for the late reply. > > I agree that using subclass mechanism to avoid too many lockdep keys. > > Avoiding too many lockdep keys is not the real goal of my patch, > its main purpose is to fix a race condition shown above. Just FYI. > Thank you for notifying me. > > > But the subclass mechanism is also not updated its subclass key > > automatically. So, if upper/lower relationship is changed, > > interface would have incorrect subclass key. > > It eventually results in lockdep warning. > > So dev->lower_level is not updated accordingly? I just blindly trust > dev->lower_level, as you use it in other places too. > > > And, I think this patch doesn't contain bonding and team module part. > > So, an additional patch is needed. > > Hmm, dev->lower_level is generic, so is addr_list_lock. > > Again, I just assume you already update dev->lower_level each time > the topology changes. I added some printk() to verify it too for my > simple bond over bond case. So, I can't immediately see what is > wrong with dev->lower_level here. Do you mind to be more specific? > Or I misunderstand your point? > > > > + lockdep_set_class_and_subclass(&dev->addr_list_lock, > > > + &vlan_netdev_addr_lock_key, > > > + subclass); In this patch, lockdep_set_class_and_subclass() is used. As far as I know, this function initializes lockdep key and subclass value with a given variable. A dev->lower_level variable is used as a subclass value in this patch. When dev->lower_level value is changed, the subclass value of this lockdep key is not changed automatically. If this value has to be changed, additional function is needed. >>> netif_addr_lock_bh(from); In this function, internally spin_lock_bh() is used and this function might use an 'initialized subclass value' not a current dev->lower_level. At this point, I think the lockdep splat might occur. +static inline void netif_addr_lock_nested(struct net_device *dev) +{ + spin_lock_nested(&dev->addr_list_lock, dev->lower_level); +} In this patch, you used netif_addr_lock_nested() too. These two subclass values could be different. But I'm not sure whether using spin_lock_nested with two different subclass values are the right way or not. If I misunderstood the lockdep and this logic, please let me know! Thanks :)
