On 10/15/2020 2:27 PM, Vladimir Oltean wrote:
Currently any DSA switch that implements the multicast ops (properly, that is) gets these errors after just sitting for a while, with at least 2 ports bridged: [ 286.013814] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object (id=3) The reason has to do with this piece of code: netdev_for_each_lower_dev(dev, lower_dev, iter) br_mdb_switchdev_host_port(dev, lower_dev, mp, type); called from: br_multicast_group_expired -> br_multicast_host_leave -> br_mdb_notify -> br_mdb_switchdev_host Basically, that code is correct. It tells each switchdev port that the host can leave that multicast group. But in the case of DSA, all user ports are connected to the host through the same pipe. So, because DSA translates a host MDB to a normal MDB on the CPU port, this means that when all user ports leave a multicast group, DSA tries to remove it N times from the CPU port. We should be reference-counting these addresses. Fixes: 5f4dbc50ce4d ("net: dsa: slave: Handle switchdev host mdb add/del") Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>
This looks good to me, just one possible problem below: [snip]
+ + a = kzalloc(sizeof(*a), GFP_KERNEL); + if (!a) + return -ENOMEM;
I believe this needs to be GFP_ATOMIC if we are to follow net/bridge/br_mdb.c::br_mdb_notify which does its netlink messages allocations using GFP_ATOMIC. On a side note, it woul dbe very helpful if we could annotate all bridge functions accordingly with their context (BH, process, etc.).
-- Florian