tg_set_cpu_limit() calls __tg_set_cfs_bandwidth(), which iterates with
for_each_online_cpu(i) and takes the per-CPU rq locks. However,
tg_set_cpu_limit() does not hold cpus_read_lock() around that call.

The requirement to hold cpus_read_lock() was introduced by upstream
commit 0e59bdaea75f ("sched/fair: Disable runtime_enabled on dying rq"),
which changed the iteration in __tg_set_cfs_bandwidth() from
for_each_possible_cpu to for_each_online_cpu and added
get_online_cpus()/put_online_cpus() (now cpus_read_lock()/
cpus_read_unlock()) around the call. This was done to prevent a race
between setting cfs_rq->runtime_enabled and unthrottle_offline_cfs_rqs().

If a CPU goes offline while __tg_set_cfs_bandwidth() is executing inside
tg_set_cpu_limit(), the function may set runtime_enabled again on the
dying CPU's cfs_rq after unthrottle_offline_cfs_rqs() has already
cleared it, leaving tasks stranded on a dead CPU with no way to
migrate.
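
The bug was inherited from the original commit
4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature"),
where tg_set_cpu_limit() was ported from vz7 (kernel 3.10) without
accounting for the changed locking requirements. In the vz7 kernel,
__tg_set_cfs_bandwidth() used for_each_possible_cpu, so cpus_read_lock()
was not needed.

Take cpus_read_lock() in tg_set_cpu_limit() before acquiring
cfs_constraints_mutex and release it after the mutex is dropped, so that
CPU hotplug cannot run while __tg_set_cfs_bandwidth() is iterating.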

Fixes: 4514c5835d32f ("sched: Port CONFIG_CFS_CPULIMIT feature")
https://virtuozzo.atlassian.net/browse/VSTOR-127251
Signed-off-by: Dmitry Sepp <[email protected]>
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0423c1b323ca..36cef7e6bfeb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10030,6 +10030,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
 		quota = max(quota, min_cfs_quota_period);
 	}
+	cpus_read_lock();
 	mutex_lock(&cfs_constraints_mutex);
 	ret = __tg_set_cfs_bandwidth(tg, period, quota, burst);
 	if (!ret) {
@@ -10037,6 +10038,7 @@ static int tg_set_cpu_limit(struct task_group *tg, unsigned int nr_cpus)
 		tg->nr_cpus = nr_cpus;
 	}
 	mutex_unlock(&cfs_constraints_mutex);
+	cpus_read_unlock();
 	return ret;
 }
--
2.47.1