On 03/25, Stanislav Fomichev wrote:
> On 03/24, Jakub Kicinski wrote:
> > On Tue, 24 Mar 2026 15:49:27 -0700 Stanislav Fomichev wrote:
> > > > > Not sure why cancel+release, maybe you're thinking about the
> > > > > unregister path? This is rtnl_unlock -> netdev_run_todo ->
> > > > > __rtnl_unlock + some extras.
> > > > >
> > > > > And the flush is here to plumb the addresses to the real devices
> > > > > before we return to the callers. Mostly because of the following
> > > > > things we have in the tests:
> > > > >
> > > > > # TEST: team cleanup mode lacp                            [FAIL]
> > > > > # macvlan unicast address not found on a slave
> > > > >
> > > > > Can you explain a bit more on the suggestion?
> > > >
> > > > Oh, I thought it's here for unregister! Feels like it'd be cleaner to
> > > > add the flush in dev_*c_add() and friends? How hard would it be to
> > > > identify the callers in atomic context?
> > >
> > > Not sure we can do it in dev_xc_add because it runs under rtnl :-(
> > > I currently do the flush in netdev_run_todo because that's the place
> > > that doesn't hold rtnl. Otherwise the flush will get stuck because
> > > the work handler grabs it...
> >
> > I was thinking of something a'la linkwatch. We can "steal" / "flush"
> > the pending work inline. I guess linkwatch is a major source of races
> > over the years...
> >
> > Does the macvlan + team problem still happen with the current
> > implementation minus the flush? We are only flushing once, so only
> > pushing the addresses through one layer of async callbacks.
>
> Yes, it does happen consistently when I remove the flush. It also
> happens with my internal v4, so I need to look again at what's going on.
> Not sure whether it's my internal regression or I was just sloppy/lucky
> (since you're correct in pointing out that we flush only once).
Hmm, the test does 'team -d' in the background. That's why it works for
bonding, but not for teaming. I'll update the test to use a bunch of 'ip'
commands instead of starting a daemon.

> Before I went down the workqueue route, I had a simple
> net_todo_list-like approach: `list_add_tail` on enqueue and
> `while (!list_empty) run_work()` on rtnl_unlock. This had a nice
> property of tracking re-submissions (by checking whether the device's
> list_head is linked into the list or not) and it was relatively easy to
> do the recursive flush. Let me try to get back to this approach and see
> whether it solves the flush? Not sure what wq buys us at this point.

Will still look into that, maybe something similar to the linkwatch as
you mentioned.

