Hi Florian, On Mon, Oct 19, 2020 at 07:11:47PM -0700, Florian Fainelli wrote: > On 10/15/2020 2:27 PM, Vladimir Oltean wrote: > > Currently any DSA switch that implements the multicast ops (properly, > > that is) gets these errors after just sitting for a while, with at least > > 2 ports bridged: > > > > [ 286.013814] mscc_felix 0000:00:00.5 swp3: failed (err=-2) to del object > > (id=3) > > > > The reason has to do with this piece of code: > > > > netdev_for_each_lower_dev(dev, lower_dev, iter) > > br_mdb_switchdev_host_port(dev, lower_dev, mp, type); > > > > called from: > > > > br_multicast_group_expired > > -> br_multicast_host_leave > > -> br_mdb_notify > > -> br_mdb_switchdev_host > > > > Basically, that code is correct. It tells each switchdev port that the > > host can leave that multicast group. But in the case of DSA, all user > > ports are connected to the host through the same pipe. So, because DSA > > translates a host MDB to a normal MDB on the CPU port, this means that > > when all user ports leave a multicast group, DSA tries to remove it N > > times from the CPU port. > > > > We should be reference-counting these addresses. > > > > Fixes: 5f4dbc50ce4d ("net: dsa: slave: Handle switchdev host mdb add/del") > > Signed-off-by: Vladimir Oltean <vladimir.olt...@nxp.com> > > This looks good to me, just one possible problem below: > > [snip] > > > + > > + a = kzalloc(sizeof(*a), GFP_KERNEL); > > + if (!a) > > + return -ENOMEM; > > I believe this needs to be GFP_ATOMIC if we are to follow > net/bridge/br_mdb.c::br_mdb_notify which does its netlink messages > allocations using GFP_ATOMIC. On a side note, it woul dbe very helpful if we > could annotate all bridge functions accordingly with their context (BH, > process, etc.).
We are not in atomic context here, since Andrew took care to use SWITCHDEV_F_DEFER in br_mdb_switchdev_host_port(). Honestly, if he wouldn't have done that, we would have been pretty toast anyway, since we end up calling dsa_port_mdb_add(), which assumes it can sleep. And deferring in DSA is super complicated, due to the fact that SWITCHDEV_OBJ_ID_HOST_MDB is transactional. So it's ok to use GFP_KERNEL, I'll leave this as-is when I respin.