On Tue, 2020-06-16 at 13:38 +0300, Vladimir Oltean wrote:
> Hi Davide,
>
> On Tue, 16 Jun 2020 at 13:13, Davide Caratti <dcara...@redhat.com> wrote:
> > hello Vladimir,
> >
> > thanks a lot for reviewing this.
> >
> > On Tue, 2020-06-16 at 00:55 +0300, Vladimir Oltean wrote:
[...]

> > > What if you split the "replace" functionality of gate_setup_timer into
> > > a separate gate_cancel_timer function, which you could call earlier
> > > (before taking the spin lock)?
> >
> > I think it would introduce the following 2 problems:
> >
> > problem #1) a race condition, see below:

[...]

> > > > @@ -433,6 +448,11 @@ static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > > >  	if (goto_ch)
> > > >  		tcf_chain_put_by_act(goto_ch);
> > > >  release_idr:
> > > > +	/* action is not in: hitimer can be inited without taking tcf_lock */
> > > > +	if (ret == ACT_P_CREATED)
> > > > +		gate_setup_timer(gact, gact->param.tcfg_basetime,
> > > > +				 gact->tk_offset, gact->param.tcfg_clockid,
> > > > +				 true);
> >
> > please note, here I felt the need to add a comment, because when ret ==
> > ACT_P_CREATED the action is not inserted in any list, so there is no
> > concurrent writer of gact-> members for that action.
> >
>
> Then please rephrase the comment. I had read it and it still wasn't
> clear at all for me what you were talking about.

something like:

/* action is not yet inserted in any list: it's safe to init hitimer
 * without taking tcf_lock.
 */

would be ok?

[...]

> I wonder, could you call tcf_gate_cleanup instead of just canceling the
> hrtimer?

not with the current tcf_gate_cleanup() [1] and parse_gate_list() [2],
because it would introduce another bug: 'p->entries' gets cleared on
action overwrite after being successfully created here:

	if (tb[TCA_GATE_ENTRY_LIST]) {
		err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
		if (err < 0)
			goto chain_put;
	}

as mentioned earlier, 'hitimer' cannot be canceled/re-initialized easily
when tcf_gate_init() still has a possible error path. And in my
understanding 'p->entries' must be consistent when the timer is
initialized.

IMO, the correct way to handle 'entries' is to:

- populate the list in a local variable, before taking the spinlock and
  allocating the IDR
- assign to p->entries after validation is successful (with the spinlock
  taken). Same as what was done with 'cycletime' in patch 1/2, but with
  the variable initialized (btw, thanks for catching this), and free the
  old list in case of action replace
- release the newly allocated list in the error path of tcf_gate_init()

(but again, this would be a fix for 'entries' - not for 'hitimer', so I
plan to work on it as a separate patch, that fits 'net-next' better than
'net').

-- 
davide

[1] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L450
[2] https://elixir.bootlin.com/linux/v5.8-rc1/source/net/sched/act_gate.c#L235
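
To make the three steps proposed above for 'entries' more concrete, here is
a rough userspace sketch. It is only an illustration under stated
assumptions: a pthread mutex stands in for tcf_lock, and every name in it
(struct entry, struct params, build_list(), update_entries(), free_list())
is hypothetical, not taken from act_gate.c.

```c
/* Userspace illustration only: none of these names exist in act_gate.c,
 * and a pthread mutex is assumed as a stand-in for the action's tcf_lock.
 */
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct entry {
	struct entry *next;
	unsigned int interval;
};

struct params {
	pthread_mutex_t lock;	/* stand-in for tcf_lock */
	struct entry *entries;	/* list currently in use */
};

static void free_list(struct entry *head)
{
	while (head) {
		struct entry *next = head->next;

		free(head);
		head = next;
	}
}

/* Step 1: build the new list on a local head, with no lock held.
 * A zero interval is treated as invalid input for this example.
 */
static int build_list(const unsigned int *intervals, size_t n,
		      struct entry **out)
{
	struct entry *head = NULL, **tail = &head;
	size_t i;

	for (i = 0; i < n; i++) {
		struct entry *e;

		if (!intervals[i]) {
			free_list(head);
			return -EINVAL;
		}
		e = calloc(1, sizeof(*e));
		if (!e) {
			free_list(head);
			return -ENOMEM;
		}
		e->interval = intervals[i];
		*tail = e;
		tail = &e->next;
	}
	*out = head;
	return 0;
}

static int update_entries(struct params *p, const unsigned int *intervals,
			  size_t n, int later_step_fails)
{
	struct entry *new_list = NULL, *old_list;
	int err;

	err = build_list(intervals, n, &new_list);
	if (err < 0)
		return err;		/* p->entries was never touched */

	/* Any other validation that can still fail belongs here. */
	if (later_step_fails) {
		free_list(new_list);	/* step 3: error path frees the new list */
		return -EINVAL;
	}

	/* Step 2: nothing can fail anymore, so swap under the lock. */
	pthread_mutex_lock(&p->lock);
	old_list = p->entries;
	p->entries = new_list;
	pthread_mutex_unlock(&p->lock);

	free_list(old_list);		/* old list is dropped on replace */
	return 0;
}
```

The point of the ordering is that p->entries is only written once no error
path remains, so a failed replace leaves the previous list (and anything
that depends on it being consistent) untouched.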