Re: [PATCH net] net: dsa: reference count the host mdb addresses

Florian Fainelli Mon, 19 Oct 2020 19:12:06 -0700



On 10/15/2020 2:27 PM, Vladimir Oltean wrote:

Currently any DSA switch that implements the multicast ops (properly,
that is) gets these errors after just sitting for a while, with at least
2 ports bridged:

[  286.013814] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object 
(id=3)

The reason has to do with this piece of code:

        netdev_for_each_lower_dev(dev, lower_dev, iter)
                br_mdb_switchdev_host_port(dev, lower_dev, mp, type);

called from:

br_multicast_group_expired
-> br_multicast_host_leave
    -> br_mdb_notify
       -> br_mdb_switchdev_host

Basically, that code is correct. It tells each switchdev port that the
host can leave that multicast group. But in the case of DSA, all user
ports are connected to the host through the same pipe. So, because DSA
translates a host MDB to a normal MDB on the CPU port, this means that
when all user ports leave a multicast group, DSA tries to remove it N
times from the CPU port.

We should be reference-counting these addresses.

Fixes: 5f4dbc50ce4d ("net: dsa: slave: Handle switchdev host mdb add/del")
Signed-off-by: Vladimir Oltean <[email protected]>


This looks good to me, just one possible problem below:

[snip]

+
+       a = kzalloc(sizeof(*a), GFP_KERNEL);
+       if (!a)
+               return -ENOMEM;

I believe this needs to be GFP_ATOMIC if we are to follownet/bridge/br_mdb.c::br_mdb_notify which does its netlink messagesallocations using GFP_ATOMIC. On a side note, it woul dbe very helpfulif we could annotate all bridge functions accordingly with their context(BH, process, etc.).

--
Florian

Re: [PATCH net] net: dsa: reference count the host mdb addresses

Reply via email to