Hello,

On Sun, Jun 14, 2020 at 5:39 AM Daniël Sonck <dsonc...@gmail.com> wrote:
>
> Hello,
>
> I found in the archive that the bug I encountered has also happened
> to others, and I have a very similar stack trace. The issue I'm
> experiencing is:
>
> Whenever I fully boot my cluster, the host crashes after some time
> with the __cgroup_bpf_run_filter_skb NULL pointer dereference. Until
> now this was sporadic enough not to cause real issues, but lately the
> bug is triggered much more frequently. I've changed my server
> hardware so I could capture serial output and get the trace, which
> looks very similar to the one reported by Lu Fengqi. As it currently
> stands, I cannot run the cluster because it crashes the host almost
> instantly.

This has been reported multiple times. Are you able to test the
attached patch and let me know whether everything works with it
applied?

I suspect we may still leak some cgroup refcounts even with the patch,
but the bug might be much harder to trigger with it applied.
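
A rough user-space model of what the reordering is meant to change,
assuming the clone path takes a reference on the inherited cgroup and
the free path always drops one. This is only a sketch: struct
model_cgroup, struct model_skcd and all helper names are made up for
illustration, and a single counter stands in for the real cgroup and
cgroup_bpf references.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
struct model_cgroup { int refcnt; };
struct model_skcd { struct model_cgroup *cgrp; };

static bool alloc_disabled;

/* Check order before the patch: the disabled test comes first. */
static void sk_alloc_old(struct model_skcd *skcd)
{
	if (alloc_disabled)
		return;

	/* Socket clone path: pointer was copied from the parent. */
	if (skcd->cgrp) {
		skcd->cgrp->refcnt++;
		return;
	}
}

/* Check order after the patch: the clone path runs unconditionally. */
static void sk_alloc_new(struct model_skcd *skcd)
{
	if (skcd->cgrp) {
		skcd->cgrp->refcnt++;
		return;
	}

	if (alloc_disabled)
		return;
}

/* Assumed free path: always drops one reference on the cgroup. */
static void sk_free(struct model_skcd *skcd)
{
	if (skcd->cgrp)
		skcd->cgrp->refcnt--;
}

/*
 * Clone a socket while allocation is disabled, free the clone, and
 * return the count left on the cgroup. The parent socket still holds
 * its own reference, so anything below 1 is an underflow.
 */
static int clone_then_free(void (*alloc)(struct model_skcd *))
{
	struct model_cgroup cg = { .refcnt = 1 };
	struct model_skcd parent = { .cgrp = &cg };
	struct model_skcd child = parent;	/* clone copies the pointer */

	alloc_disabled = true;
	alloc(&child);
	sk_free(&child);
	return cg.refcnt;
}
```

In this model clone_then_free(sk_alloc_old) returns 0 even though the
parent socket is still alive, while clone_then_free(sk_alloc_new)
returns 1.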

Thanks.
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 6c9c6ac83936..c01245a19ea2 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6438,9 +6438,6 @@ void cgroup_sk_alloc_disable(void)
 
 void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 {
-	if (cgroup_sk_alloc_disabled)
-		return;
-
 	/* Socket clone path */
 	if (skcd->val) {
 		/*
@@ -6453,6 +6450,9 @@ void cgroup_sk_alloc(struct sock_cgroup_data *skcd)
 		return;
 	}
 
+	if (cgroup_sk_alloc_disabled)
+		return;
+
 	/* Don't associate the sock with unrelated interrupted task's cgroup. */
 	if (in_interrupt())
 		return;
