On Tue, Jun 23, 2020 at 1:45 AM Zhang,Qiang <qiang.zh...@windriver.com> wrote:
>
> There are some message in kernelv5.4, I don't know if it will help.
>
> demsg:
>
> cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or
> net_cls activation
> ...
> -----------[ cut here ]-----------
> percpu ref (cgroup_bpf_release_fn) <= 0 (-12) after switching to atomic
> WARNING: CPU: 1 PID: 0 at lib/percpu-refcount.c:161
> percpu_ref_switch_to_atomic_rcu+0x12a/0x140
Yes, this proves we have the refcnt bug that my patch tries to fix. The
negative refcnt is a direct consequence of that bug: without my patch,
we drop a reference without having taken one first.

If you can reproduce it, please test my patch:
https://patchwork.ozlabs.org/project/netdev/patch/20200616180352.18602-1-xiyou.wangc...@gmail.com/

However, I still don't have a good explanation for the NULL pointer
dereference. I think that one is an older bug, and we need to check for
NULL even after the refcnt bug is fixed, but I don't know why it was
only exposed recently by Zefan's patch. I am still trying to find an
explanation.

Thanks!