On 10/15/2020 2:27 PM, Vladimir Oltean wrote:
Currently any DSA switch that implements the multicast ops (properly,
that is) gets these errors after just sitting for a while, with at least
2 ports bridged:

[  286.013814] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object 
(id=3)

The reason has to do with this piece of code:

        netdev_for_each_lower_dev(dev, lower_dev, iter)
                br_mdb_switchdev_host_port(dev, lower_dev, mp, type);

called from:

br_multicast_group_expired
-> br_multicast_host_leave
    -> br_mdb_notify
       -> br_mdb_switchdev_host

Basically, that code is correct. It tells each switchdev port that the
host can leave that multicast group. But in the case of DSA, all user
ports are connected to the host through the same pipe. So, because DSA
translates a host MDB to a normal MDB on the CPU port, this means that
when all user ports leave a multicast group, DSA tries to remove it N
times from the CPU port.

We should be reference-counting these addresses.

Fixes: 5f4dbc50ce4d ("net: dsa: slave: Handle switchdev host mdb add/del")
Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com>

This looks good to me, just one possible problem below:

[snip]

+
+       a = kzalloc(sizeof(*a), GFP_KERNEL);
+       if (!a)
+               return -ENOMEM;

I believe this needs to be GFP_ATOMIC if we are to follow net/bridge/br_mdb.c::br_mdb_notify which does its netlink messages allocations using GFP_ATOMIC. On a side note, it woul dbe very helpful if we could annotate all bridge functions accordingly with their context (BH, process, etc.).
--
Florian

Reply via email to