On 11/15/25 1:02 AM, Sun Shaojie wrote:
On 2015/11/15 08:58, Chen Ridong wrote:
On 2025/11/15 0:14, Michal Koutný wrote:
On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong 
<[email protected]> wrote:
After further consideration, I still suggest retaining this rule.
Apologies, I'm slightly lost which rule. I hope the new iteration from
Shaojie with both before/after tables will explain it.

The rule has changed in this patch from "If either cpuset is exclusive, check 
if they are mutually
exclusive" to
"If both cpusets are exclusive, check if they are mutually exclusive"

  -    /* If either cpuset is exclusive, check if they are mutually exclusive */
  -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
  +    /* If both cpusets are exclusive, check if they are mutually exclusive */
  +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
  +        return !cpusets_are_exclusive(cs1, cs2);

I suggest not modifying this rule and keeping the original logic intact:

For am example:
   Step                                       | A1's prstate | B1's prstate |
   #1> mkdir -p A1                            | member       |              |
   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
   #4> mkdir -p B1                            | root         | member       |
   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |

Currently, we mark A1 as invalid. But similar to the logic in this patch, why 
must A1 be
invalidated?
A1 is invalidated becase it doesn't have exclusive ownership of CPU 0
anymore.

B1 could also use the parent's effective CPUs, right?
Here you assume some ordering between siblings treating A1 more
important than B1. But it's symmetrical in principle, no?

I’m using an example to illustrate that if Shaojie’s patch is accepted, other 
rules could be relaxed
following the same logic—but I’m not in favor of doing so.
Hi, Ridong,

Thank you for pointing out the issue with the current patch; this is indeed
not what our product intends. I must admit that I haven't thoroughly tested
on such recent kernel versions.

Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
"other rules" you mentioned, we do not intend to relax them. On the
contrary, we aim to maintain them firmly.

Our product need ensure the following behavior: in cgroup-v2, user
modifications to one cpuset should not affect the partition state of its
sibling cpusets. This is justified and meaningful, as it aligns with the
isolation characteristics of cgroups.

This can be divided into two scenarios:
Scenario 1: Only one of A1 and B1 is "root".
Scenario 2: Both A1 and B1 are "root".

We plan to implement Scenario 1 first. This is the goal of patch v2.
However, patch v2 is flawed because it does not strictly adhere to the
following existing rule.

However, it is worth noting that the current cgroup v2 implementation does
not strictly adhere to the following rule either (which is also an
objective for patch v3 to address).

Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".

Inside the cpuset code, the rule should be "cpuset.cpus should not be a subset of sibling's cpuset.cpus.exclusive".

Note that one rule that should always be followed in v2 is that writing any valid cpumask into cpuset.cpus is allowed, but writing to cpuset.cpus.exclusive may fail if some rules are violated. If the new cpuset.cpus violate the rules for a sibling partition root, the current code will invalidate the sibling partition. I am not against changing the cpuset.cpus.effective to a suitable value to avoid the conflict instead invalidating a sibling partition. We do need to make sure that the new behavior is consistent under different circumstances.


Using your example to illustrate.
  Step (refer to the steps in the table below)
  #1> mkdir -p A1
  #2> echo "0-1" > A1/cpuset.cpus.exclusive
  #3> echo "root" > A1/cpuset.cpus.partition
  #4> mkdir -p B1
  #5> echo "0" > B1/cpuset.cpus

Table 1: Current result
  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
  #1   | 0      |                |           | member       |              |
  #2   | 0      | 0-1            |           | member       |              |
  #3   | 0      | 0-1            |           | root         |              |
  #4   | 0      | 0-1            |           | root         | member       |
  #5   | 0      | 0-1            | 0         | root invalid | member       |

Table 2: Expected result
  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
  #1   | 0      |                |           | member       |              |
  #2   | 0      | 0-1            |           | member       |              |
  #3   | 0      | 0-1            |           | root         |              |
  #4   | 0      | 0-1            |           | root         | member       |
  #5   | error  | 0-1            |           | root         | member       |

Currently, after step #5, the operation returns success, which clearly
violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's
"cpuset.cpus.exclusive".

Therefore, after step #5, the operation should return error, with A1
remaining as "root". This better complies with the Rule 1.

------
The following content is provided for reference, and we hope it may be
adopted in the future.
!!These are not part of what patch v3 will implement.

As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current
cgroup v2 behavior. This patch series does not modify it, but we hope to
draw the maintainers' attention, as we indeed have plans for future
modifications. Our intent can be seen from the following examples.

For example:
  Step (refer to the steps in the table below)
  #1> mkdir -p A1
  #2> echo "0-1"  > A1/cpuset.cpus
  #3> echo "root" > A1/cpuset.cpus.partition
  #4> mkdir -p B1
  #5> echo "2-3"  > B1/cpuset.cpus
  #6> echo "root" > B1/cpuset.cpus.partition
  #7> echo "1-2"  > B1/cpuset.cpus

Table 1: Current result
  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
  #1   | from parent   |               | member       |              |
  #2   | 0-1           |               | member       |              |
  #3   | 0-1           |               | root         |              |
  #4   | 0-1           | from parent   | root         | member       |
  #5   | 0-1           | 2-3           | root         | member       |
  #6   | 0-1           | 2-3           | root         | root         |
  #7   | 0-1           | 1-2           | root invalid | root invalid |

Table 2: Expected result
  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
  #1   | from parent   |               | member       |              |
  #2   | 0-1           |               | member       |              |
  #3   | 0-1           |               | root         |              |
  #4   | 0-1           | from parent   | root         | member       |
  #5   | 0-1           | 2-3           | root         | member       |
  #6   | 0-1           | 2-3           | root         | root         |
  #7   | 0-1           | 2             | root         | root invalid |

After step #7, we expect A1 to remain "root" (unaffected), while only B1
becomes "root invalid".

The following Rule 2 and Rule 3 are alsomplemented and adhered to by our
product. The current cgroup v2 implementation does not enforce them.
Likewise, we hope this will draw the maintainers' attention. Maybe, they can
be applied in the future.

Rule 2: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.effective"
         must either be a subset of it, or "cpuset.cpus.effective" is null.

cpuset.cpus.effective will never be null in v2 with the exception of a partition root distributing out all its CPUs to child sub-partitions.



Rule 3: In one cpuset, when "cpuset.cpus" is not null, "cpuset.cpus.exclusive"
         must either be a subset of it, or "cpuset.cpus.exclusive" is null.

We currently don't have this rule in the cpuset code as cpuset.cpus.exclusive can be independent of "cpuset.cpus".  We could implement this rule by failing the cpuset.cpus.exclusive write in this case, but we can't fail the write to "cpuset.cpus" if the rule is violated.

Cheers,
Longman


Reply via email to