On Sun 17 Jan 2021 at 17:15, Jamal Hadi Salim <j...@mojatatu.com> wrote: > On 2021-01-16 7:56 p.m., Cong Wang wrote: >> From: Cong Wang <cong.w...@bytedance.com> >> tcf_action_init_1() loads tc action modules automatically with >> request_module() after parsing the tc action names, and it drops RTNL >> lock and re-holds it before and after request_module(). This causes a >> lot of troubles, as discovered by syzbot, because we can be in the >> middle of batch initializations when we create an array of tc actions. >> One of the problem is deadlock: >> CPU 0 CPU 1 >> rtnl_lock(); >> for (...) { >> tcf_action_init_1(); >> -> rtnl_unlock(); >> -> request_module(); >> rtnl_lock(); >> for (...) { >> tcf_action_init_1(); >> -> tcf_idr_check_alloc(); >> // Insert one action into idr, >> // but it is not committed until >> // tcf_idr_insert_many(), then drop >> // the RTNL lock in the _next_ >> // iteration >> -> rtnl_unlock(); >> -> rtnl_lock(); >> -> a_o->init(); >> -> tcf_idr_check_alloc(); >> // Now waiting for the same index >> // to be committed >> -> request_module(); >> -> rtnl_lock() >> // Now waiting for RTNL lock >> } >> rtnl_unlock(); >> } >> rtnl_unlock(); >> This is not easy to solve, we can move the request_module() before >> this loop and pre-load all the modules we need for this netlink >> message and then do the rest initializations. So the loop breaks down >> to two now: >> for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { >> struct tc_action_ops *a_o; >> a_o = tc_action_load_ops(name, tb[i]...); >> ops[i - 1] = a_o; >> } >> for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { >> act = tcf_action_init_1(ops[i - 1]...); >> } >> Although this looks serious, it only has been reported by syzbot, so it >> seems hard to trigger this by humans. And given the size of this patch, >> I'd suggest to make it to net-next and not to backport to stable. >> This patch has been tested by syzbot and tested with tdc.py by me. >> > > LGTM. > Initially i was worried about performance impact but i found nothing > observable. We need to add a tdc test for batch (I can share how i did > batch testing at next meet). > > Tested-by: Jamal Hadi Salim <j...@mojatatu.com> > Acked-by: Jamal Hadi Salim <j...@mojatatu.com> > > cheers, > jamal
Hi, Thanks for adding me to the thread! I ran our performance tests with the patch applied and didn't observe any regression. Regards, Vlad