On 1/11/21 3:09 PM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.olt...@nxp.com>
> 
> Currently the following happens when a DSA master driver unbinds while
> there are DSA switches attached to it:
> 
> $ echo 0000:00:00.5 > /sys/bus/pci/drivers/mscc_felix/unbind
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 392 at net/core/dev.c:9507
> Call trace:
>  rollback_registered_many+0x5fc/0x688
>  unregister_netdevice_queue+0x98/0x120
>  dsa_slave_destroy+0x4c/0x88
>  dsa_port_teardown.part.16+0x78/0xb0
>  dsa_tree_teardown_switches+0x58/0xc0
>  dsa_unregister_switch+0x104/0x1b8
>  felix_pci_remove+0x24/0x48
>  pci_device_remove+0x48/0xf0
>  device_release_driver_internal+0x118/0x1e8
>  device_driver_detach+0x28/0x38
>  unbind_store+0xd0/0x100
> 
> Located at the above location is this WARN_ON:
> 
>       /* Notifier chain MUST detach us all upper devices. */
>       WARN_ON(netdev_has_any_upper_dev(dev));
> 
> Other stacked interfaces, like VLAN, do indeed listen for
> NETDEV_UNREGISTER on the real_dev and also unregister themselves at that
> time, which is clearly the behavior that rollback_registered_many
> expects. But DSA interfaces are not VLAN. They have backing hardware
> (platform devices, PCI devices, MDIO, SPI etc) which have a life cycle
> of their own and we can't just trigger an unregister from the DSA
> framework when we receive a netdev notifier that the master unregisters.
> 
> Luckily, there is something we can do, and that is to inform the driver
> core that we have a runtime dependency to the DSA master interface's
> device, and create a device link where that is the supplier and we are
> the consumer. Having this device link will make the DSA switch unbind
> before the DSA master unbinds, which is enough to avoid the WARN_ON from
> rollback_registered_many.
> 
> Note that even before the blamed commit, DSA did nothing intelligent
> when the master interface got unregistered either. See the discussion
> here:
> https://lore.kernel.org/netdev/20200505210253.20311-1-f.faine...@gmail.com/
> But this time, at least the WARN_ON is loud enough that the
> upper_dev_link commit can be blamed.
> 
> The advantage with this approach vs dev_hold(master) in the attached
> link is that the latter is not meant for long term reference counting.
> With dev_hold, the only thing that will happen is that when the user
> attempts an unbind of the DSA master, netdev_wait_allrefs will keep
> waiting and waiting, due to DSA keeping the refcount forever. DSA would
> not access freed memory corresponding to the master interface, but the
> unbind would still result in a freeze. Whereas with device links,
> graceful teardown is ensured. It even works with cascaded DSA trees.
> 
> $ echo 0000:00:00.2 > /sys/bus/pci/drivers/fsl_enetc/unbind
> [ 1818.797546] device swp0 left promiscuous mode
> [ 1819.301112] sja1105 spi2.0: Link is Down
> [ 1819.307981] DSA: tree 1 torn down
> [ 1819.312408] device eno2 left promiscuous mode
> [ 1819.656803] mscc_felix 0000:00:00.5: Link is Down
> [ 1819.667194] DSA: tree 0 torn down
> [ 1819.711557] fsl_enetc 0000:00:00.2 eno2: Link is Down
> 
> This approach allows us to keep the DSA framework absolutely unchanged,
> and the driver core will just know to unbind us first when the master
> goes away - as opposed to the large (and probably impossible) rework
> required if attempting to listen for NETDEV_UNREGISTER.
> 
> As per the documentation at Documentation/driver-api/device_link.rst,
> specifying the DL_FLAG_AUTOREMOVE_CONSUMER flag causes the device link
> to be automatically purged when the consumer fails to probe or later
> unbinds. So we don't need to keep the consumer_link variable in struct
> dsa_switch.
> 
> Fixes: 2f1e8ea726e9 ("net: dsa: link interfaces with the DSA master to get 
> rid of lockdep warnings")
> Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>

Reviewed-by: Florian Fainelli <f.faine...@gmail.com>
Tested-by: Florian Fainelli <f.faine...@gmail.com>

Thanks!
-- 
Florian

Reply via email to