As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
unnecessary task iteration and updating of tasks' CPU and node masks
when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
due to the fact that the temporary new_cpus and new_mems masks do not
inherit parent's effective_cpus/mems when they are empty which is the
expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
Enable cpuset controller in default hierarchy").
Fix that and avoid unnecessay work by enhancing
compute_effective_cpumask() to add the empty cpumask check
and inheriting the parent's versions if empty when in v2. A new
compute_effective_nodemask() helper is also added to perform similar
function for new effective_mems.
Add new test_cpuset_prs.sh test cases to confirm that effective_cpus
will inherit the parent's version if cpuset.cpus is empty.
[1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
Suggested-by: Ridong Chen <[email protected]>
Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cpuset.c | 45 +++++++++++--------
.../selftests/cgroup/test_cpuset_prs.sh | 11 ++++-
2 files changed, 35 insertions(+), 21 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index aff86acea701..044ddbf66f8e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1094,12 +1094,35 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs,
struct cpumask *new_cpus)
* @cs: the cpuset the need to recompute the new effective_cpus mask
* @parent: the parent cpuset
*
+ * For v2, the parent's effective_cpus is inherited if cpumask empty.
* The result is valid only if the given cpuset isn't a partition root.
*/
static void compute_effective_cpumask(struct cpumask *new_cpus,
struct cpuset *cs, struct cpuset *parent)
{
- cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
+ bool has_cpus;
+
+ has_cpus = cpumask_and(new_cpus, cs->cpus_allowed,
parent->effective_cpus);
+ if (!has_cpus && is_in_v2_mode())
+ cpumask_copy(new_cpus, parent->effective_cpus);
+}
+
+/**
+ * compute_effective_nodemask - Compute the effective nodemask of the cpuset
+ * @new_cpus: the temp variable for the new effective_mems mask
+ * @cs: the cpuset the need to recompute the new effective_mems mask
+ * @parent: the parent cpuset
+ *
+ * For v2, the parent's effective_mems is inherited if nodemask empty.
+ */
+static void compute_effective_nodemask(nodemask_t *new_mems,
+ struct cpuset *cs, struct cpuset *parent)
+{
+ bool has_mems;
+
+ has_mems = nodes_and(*new_mems, cs->mems_allowed,
parent->effective_mems);
+ if (!has_mems && is_in_v2_mode())
+ nodes_copy(*new_mems, parent->effective_mems);
}
/*
@@ -2148,15 +2171,6 @@ static void update_cpumasks_hier(struct cpuset *cs,
struct tmpmasks *tmp,
goto update_parent_effective;
}
- /*
- * If it becomes empty, inherit the effective mask of the
- * parent, which is guaranteed to have some CPUs unless
- * it is a partition root that has explicitly distributed
- * out all its CPUs.
- */
- if (is_in_v2_mode() && !remote && cpumask_empty(tmp->new_cpus))
- cpumask_copy(tmp->new_cpus, parent->effective_cpus);
-
/*
* Skip the whole subtree if
* 1) the cpumask remains the same,
@@ -2704,14 +2718,7 @@ static void update_nodemasks_hier(struct cpuset *cs,
nodemask_t *new_mems)
cpuset_for_each_descendant_pre(cp, pos_css, cs) {
struct cpuset *parent = parent_cs(cp);
- bool has_mems = nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems);
-
- /*
- * If it becomes empty, inherit the effective mask of the
- * parent, which is guaranteed to have some MEMs.
- */
- if (is_in_v2_mode() && !has_mems)
- *new_mems = parent->effective_mems;
+ compute_effective_nodemask(new_mems, cp, parent);
/* Skip the whole subtree if the nodemask remains the same. */
if (nodes_equal(*new_mems, cp->effective_mems)) {
@@ -3923,7 +3930,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset
*cs, struct tmpmasks *tmp)
parent = parent_cs(cs);
compute_effective_cpumask(&new_cpus, cs, parent);
- nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ compute_effective_nodemask(&new_mems, cs, parent);
if (!tmp || !cs->partition_root_state)
goto update_tasks;
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 0d41aa0d343d..ca9bc38fdb95 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -495,13 +495,20 @@ REMOTE_TEST_MATRIX=(
# Narrowing cpuset.cpus to previously sibling-excluded CPUs should
# not return CPUs that were never actually owned.
" C1-4:P1 . C1-2:P1 C1-3:P2 . . \
- . . . C3 . . p1:4|c11:1-2|c12:3 \
+ . . . C3 . . p1:4|c11:1-2|c12:3 \
p1:P1|c11:P1|c12:P2 3"
# Expanding cpuset.cpus to include a previously sibling-excluded CPU
# after the sibling has become a member should correctly request it.
" C1-4:P1 . C1-2:P1 C1-3:P2 . . \
- . . P0 C2-3 . . p1:1,4|c11:1|c12:2-3 \
+ . . P0 C2-3 . . p1:1,4|c11:1|c12:2-3 \
p1:P1|c11:P0|c12:P2
2-3"
+ # Cpusets with empty cpuset.cpus should inherit parent's effective_cpus
+ " C1-4:P1 C5-6 C1-2 . C5 . \
+ . P1 P1 . . .
p1:3-4|p2:5-6|c11:1-2|c12:3-4|c21:5|c22:5-6 \
+ p1:P1|p2:P1|c11:P1"
+ " C1-4:P1 C5-6 C1-2 . C5 . \
+ . P1 P1 . O5=0 .
p1:3-4|p2:6|c11:1-2|c12:3-4|c21:6|c22:6 \
+ p1:P1|p2:P1|c11:P1"
)
#