On 1/11/21 3:09 PM, Vladimir Oltean wrote: > From: Vladimir Oltean <vladimir.olt...@nxp.com> > > Currently the following happens when a DSA master driver unbinds while > there are DSA switches attached to it: > > $ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507 > Call trace: > rollback_registered_many+0x5fc/0x688 > unregister_netdevice_queue+0x98/0x120 > dsa_slave_destroy+0x4c/0x88 > dsa_port_teardown.part.16+0x78/0xb0 > dsa_tree_teardown_switches+0x58/0xc0 > dsa_unregister_switch+0x104/0x1b8 > felix_pci_remove+0x24/0x48 > pci_device_remove+0x48/0xf0 > device_release_driver_internal+0x118/0x1e8 > device_driver_detach+0x28/0x38 > unbind_store+0xd0/0x100 > > Located at the above location is this WARN_ON: > > /* Notifier chain MUST detach us all upper devices. */ > WARN_ON(netdev_has_any_upper_dev(dev)); > > Other stacked interfaces, like VLAN, do indeed listen for > NETDEV_UNREGISTER on the real_dev and also unregister themselves at that > time, which is clearly the behavior that rollback_registered_many > expects. But DSA interfaces are not VLAN. They have backing hardware > (platform devices, PCI devices, MDIO, SPI etc) which have a life cycle > of their own and we can't just trigger an unregister from the DSA > framework when we receive a netdev notifier that the master unregisters. > > Luckily, there is something we can do, and that is to inform the driver > core that we have a runtime dependency to the DSA master interface's > device, and create a device link where that is the supplier and we are > the consumer. Having this device link will make the DSA switch unbind > before the DSA master unbinds, which is enough to avoid the WARN_ON from > rollback_registered_many. > > Note that even before the blamed commit, DSA did nothing intelligent > when the master interface got unregistered either. See the discussion > here: > https://lore.kernel.org/netdev/20200505210253.20311-1-f.faine...@gmail.com/ > But this time, at least the WARN_ON is loud enough that the > upper_dev_link commit can be blamed. > > The advantage with this approach vs dev_hold(master) in the attached > link is that the latter is not meant for long term reference counting. > With dev_hold, the only thing that will happen is that when the user > attempts an unbind of the DSA master, netdev_wait_allrefs will keep > waiting and waiting, due to DSA keeping the refcount forever. DSA would > not access freed memory corresponding to the master interface, but the > unbind would still result in a freeze. Whereas with device links, > graceful teardown is ensured. It even works with cascaded DSA trees. > > $ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind > [ 1818.797546] device swp0 left promiscuous mode > [ 1819.301112] sja1105 spi2.0: Link is Down > [ 1819.307981] DSA: tree 1 torn down > [ 1819.312408] device eno2 left promiscuous mode > [ 1819.656803] mscc_felix 0000:00:00.5: Link is Down > [ 1819.667194] DSA: tree 0 torn down > [ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down > > This approach allows us to keep the DSA framework absolutely unchanged, > and the driver core will just know to unbind us first when the master > goes away - as opposed to the large (and probably impossible) rework > required if attempting to listen for NETDEV_UNREGISTER. > > As per the documentation at Documentation/driver-api/device_link.rst, > specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link > to be automatically purged when the consumer fails to probe or later > unbinds. So we don't need to keep the consumer_link variable in struct > dsa_switch. > > Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get > rid of lockdep warnings") > Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>
Reviewed-by: Florian Fainelli <f.faine...@gmail.com> Tested-by: Florian Fainelli <f.faine...@gmail.com> Thanks! -- Florian