On Sat, Jun 20, 2020 at 8:58 AM Roman Gushchin <g...@fb.com> wrote: > > On Fri, Jun 19, 2020 at 08:00:41PM -0700, Cong Wang wrote: > > On Fri, Jun 19, 2020 at 6:14 PM Roman Gushchin <g...@fb.com> wrote: > > > > > > On Sat, Jun 20, 2020 at 09:00:40AM +0800, Zefan Li wrote: > > > > I think so, though I'm not familiar with the bfp cgroup code. > > > > > > > > > If so, we might wanna fix it in a different way, > > > > > just checking if (!(css->flags & CSS_NO_REF)) in cgroup_bpf_put() > > > > > like in cgroup_put(). It feels more reliable to me. > > > > > > > > > > > > > Yeah I also have this idea in my mind. > > > > > > I wonder if the following patch will fix the issue? > > > > Interesting, AFAIU, this refcnt is for bpf programs attached > > to the cgroup. By this suggestion, do you mean the root > > cgroup does not need to refcnt the bpf programs attached > > to it? This seems odd, as I don't see how root is different > > from others in terms of bpf programs which can be attached > > and detached in the same way. > > > > I certainly understand the root cgroup is never gone, but this > > does not mean the bpf programs attached to it too. > > > > What am I missing? > > It's different because the root cgroup can't be deleted. > > All this reference counting is required to automatically detach bpf programs > from a _deleted_ cgroup (look at cgroup_bpf_offline()). It's required > because a cgroup can be in dying state for a long time being pinned by a > pagecache page, for example. Only a user can detach a bpf program from > an existing cgroup.
Yeah, but users can still detach the bpf programs from root cgroup. IIUC, after detaching, the pointer in the bpf array will be empty_prog_array which is just an array of NULL. Then __cgroup_bpf_run_filter_skb() will deref it without checking NULL (as check_non_null == false). This matches the 0000000000000010 pointer seen in the bug reports, the 0x10, that is 16, is the offset of items[] in struct bpf_prog_array. So looks like we have to add a NULL check there regardless of refcnt. Also, I am not sure whether your suggested patch makes a difference for percpu refcnt, as percpu_ref_put() will never call ->release() until percpu_ref_kill(), which is never called on root cgroup? Thanks.