On 16-11-26 04:18 PM, Daniel Borkmann wrote: > Roi reported a crash in flower where tp->root was NULL in ->classify() > callbacks. Reason is that in ->destroy() tp->root is set to NULL via > RCU_INIT_POINTER(). It's problematic for some of the classifiers, because > this doesn't respect RCU grace period for them, and as a result, still > outstanding readers from tc_classify() will try to blindly dereference > a NULL tp->root. > > The tp->root object is strictly private to the classifier implementation > and holds internal data the core such as tc_ctl_tfilter() doesn't know > about. Within some classifiers, such as cls_bpf, cls_basic, etc, tp->root > is only checked for NULL in ->get() callback, but nowhere else. This is > misleading and seemed to be copied from old classifier code that was not > cleaned up properly. For example, d3fa76ee6b4a ("[NET_SCHED]: cls_basic: > fix NULL pointer dereference") moved tp->root initialization into ->init() > routine, where before it was part of ->change(), so ->get() had to deal > with tp->root being NULL back then, so that was indeed a valid case, after > d3fa76ee6b4a, not really anymore. We used to set tp->root to NULL long > ago in ->destroy(), see 47a1a1d4be29 ("pkt_sched: remove unnecessary xchg() > in packet classifiers"); but the NULLifying was reintroduced with the > RCUification, but it's not correct for every classifier implementation. > > In the cases that are fixed here with one exception of cls_cgroup, tp->root > object is allocated and initialized inside ->init() callback, which is always > performed at a point in time after we allocate a new tp, which means tp and > thus tp->root was not globally visible in the tp chain yet (see > tc_ctl_tfilter()). > Also, on destruction tp->root is strictly kfree_rcu()'ed in ->destroy() > handler, same for the tp which is kfree_rcu()'ed right when we return > from ->destroy() in tcf_destroy(). This means, the head object's lifetime > for such classifiers is always tied to the tp lifetime. The RCU callback > invocation for the two kfree_rcu() could be out of order, but that's fine > since both are independent. > > Dropping the RCU_INIT_POINTER(tp->root, NULL) for these classifiers here > means that 1) we don't need a useless NULL check in fast-path and, 2) that > outstanding readers of that tp in tc_classify() can still execute under > respect with RCU grace period as it is actually expected. > > Things that haven't been touched here: cls_fw and cls_route. They each > handle tp->root being NULL in ->classify() path for historic reasons, so > their ->destroy() implementation can stay as is. If someone actually > cares, they could get cleaned up at some point to avoid the test in fast > path. cls_u32 doesn't set tp->root to NULL. For cls_rsvp, I just added a > !head should anyone actually be using/testing it, so it at least aligns with > cls_fw and cls_route. For cls_flower we additionally need to defer rhashtable > destruction (to a sleepable context) after RCU grace period as concurrent > readers might still access it. (Note that in this case we need to hold module > reference to keep work callback address intact, since we only wait on module > unload for all call_rcu()s to finish.) > > This fixes one race to bring RCU grace period guarantees back. Next step > as worked on by Cong however is to fix 1e052be69d04 ("net_sched: destroy > proto tp when all filters are gone") to get the order of unlinking the tp > in tc_ctl_tfilter() for the RTM_DELTFILTER case right by moving > RCU_INIT_POINTER() before tcf_destroy() and let the notification for > removal be done through the prior ->delete() callback. Both are independant > issues. Once we have that right, we can then clean tp->root up for a number > of classifiers by not making them RCU pointers, which requires a new callback > (->uninit) that is triggered from tp's RCU callback, where we just kfree() > tp->root from there.
Thanks looks good to me and appreciate the detailed commit message. Acked-by: John Fastabend <john.r.fastab...@intel.com> > > Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") > Fixes: 9888faefe132 ("net: sched: cls_basic use RCU") > Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU") > Fixes: 77b9900ef53a ("tc: introduce Flower classifier") > Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier") > Fixes: 952313bd6258 ("net: sched: cls_cgroup use RCU") > Reported-by: Roi Dayan <r...@mellanox.com> > Signed-off-by: Daniel Borkmann <dan...@iogearbox.net> > Cc: Cong Wang <xiyou.wangc...@gmail.com> > Cc: John Fastabend <john.fastab...@gmail.com> > Cc: Roi Dayan <r...@mellanox.com> > Cc: Jiri Pirko <j...@mellanox.com> > ---