On 03/16/2016 04:22 PM, Peter Zijlstra wrote:
> Subject: sched: Fix/cleanup cgroup teardown/init
>
> The cpu controller hasn't kept up with the various changes in the whole
> cgroup initialization / destruction sequence, and commit 2e91fa7f6d45
> ("cgroup: keep zombies associated with their original cgroups") caused
> it to explode.
>
> The reason for this is that zombies do not inhibit css_offline() from
> being called, but do stall css_released(). Now we tear down the cfs_rq
> structures on css_offline() but zombies can run after that, leading to
> use-after-free issues.
>
> The solution is to move the tear-down to css_released(), which
> guarantees nobody (including no zombies) is still using our cgroup.
>
> Furthermore, a few simple cleanups are possible too. There doesn't
> appear to be any point to us using css_online() (anymore?) so fold that
> in css_alloc().
>
> And since cgroup code guarantees an RCU grace period between
> css_released() and css_free() we can forgo using call_rcu() and free the
> stuff immediately.
>
> Cc: [email protected]
> Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original
> cgroups")
> Suggested-by: Tejun Heo <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Survived 500 reboots. Without the patch, I've never gone past 84 reboots.
Tested-by: Niklas Cassel <[email protected]>