Thanks again for all the feedback, Matthew. I've incorporated the requested suggestions and will send out a v2 shortly.
** Summary changed:

- Re-enable memcg v1 on Noble (6.14)
+ enable MEMCG_V1 and CPUSETS_V1 on Noble HWE

** Description changed:

  [Impact]

- Although v1 cgroups are deprecated in Noble, it was still possible for
+ Although v1 cgroups are deprecated in Noble, it was still possible for
  users on 6.8 kernels to utilize them. This was especially helpful in
- helping migrating users to Noble and then separately upgrading their
- remaining v1 cgroups applications. Instead of requiring all users to
- upgrade and fix their v2 support, v1 support could be provisionally
- enabled until the necessary support was available in the applications
- that still lack v2 support.
+ the Noble migration process. It allowed users to pick up the new OS and
+ then separately upgrade their remaining v1 cgroups applications. This
+ unblocked the migration path for v1 cgroups users, because v1 support
+ could be provisionally enabled until the necessary support was available
+ in the applications that still lack v2 support.

- Starting in 6.12, CONFIG_MEMCG_V1 was added and defaulted to false.
- Noble 6.8 users that were unlucky enough to still need V1 cgroups found
- that they could no longer use memcgs in the 6.14 kernel.
+ Starting in 6.12, CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 were added and
+ defaulted to false. Noble 6.8 users that were unlucky enough to still
+ need these V1 cgroups found that they could no longer use them in the
+ 6.14 kernel.

- Specific use cases include older JVMs that fail to correctly handle
- missing controllers from /proc/cgroups. In that case, the container
- limit detection is turned off and the JVM uses the host's limits.
+ Some of the specific failures that were encountered include older JVMs
+ that fail to correctly handle missing controllers from /proc/cgroups.
+ If memory or cpuset are absent, the container limit detection is turned
+ off and the JVM uses the host's limits. JVMs configured in containers
+ with specific memory usage percentages then end up consuming too much
+ memory and often crash.

- Further, Apache Yarn is still completing their v1 -> v2 migration, which
- leaves some Hadoop use cases without proper support.
+ Apache Yarn is still completing their v1 -> v2 migration, which leaves
+ some Hadoop use cases without proper support.

- The request here is to enable MEMCG_V1 on Noble, but not newer releases,
- for as long as the Noble HWE kernel train still has kernels with cgroup
- v1 support. This gives users a little bit longer to complete their
- migration while still using newer hardware, but with the understanding
- that this really is the end of the line for v1 cgroups.
+ The request here is to enable these V1 controllers on Noble, but not
+ newer releases, for as long as the Noble HWE kernel train still has
+ kernels with upstream cgroup v1 support. This gives users a little bit
+ longer to complete their migration while still using newer hardware, but
+ with the understanding that this really is the end of the line for v1
+ cgroups.

  [Fix]

- Re-enable CONFIG_MEMCG_V1 in the 6.14 Noble config.
+ Re-enable the missing v1 controllers in the 6.14 Noble config.
+
+ In 6.8 there were 14 controllers. In the current 6.14 config there are
+ also 14 controllers. However, the difference is that in the current
+ 6.14 build the dmem controller was added, and the cpuset and memory
+ controllers were removed.
+
+ Diffing both the /proc/cgroups and configs between the 6.14 and 6.8
+ releases gives:
+
+ -CPUSETS_V1 n
+ -MEMCG_V1 n
+
+ These differences were also corroborated via source inspection. Changes
+ in 6.12 moved these controllers to be guarded by ifdefs that default to
+ being disabled via make olddefconfig.
+
+ In order to ensure that 6.14 has the same v1 cgroup controllers enabled
+ as 6.8, enable both CONFIG_CPUSETS_V1 and CONFIG_MEMCG_V1 for Noble.

  [Test]

- Booted a kernel with this change and validated that v1 memcgs were
- present again.
+ Booted a kernel with this change and validated that the missing v1
+ memcgs were present again.

- [Potential Regression]
+ Before:

- The regression potential here should be low since this merely restores
- and existing feature that most users were not using but that a few still
- depended upon.
+ $ grep memory /proc/cgroups
+ $ grep cpuset /proc/cgroups
+
+ with v1 cgroups enabled:
+
+ $ mount | grep cgroup | grep memory
+ $ mount | grep cgroup | grep cpuset
+
+ $ ls /sys/fs/cgroup | grep memory
+ $ ls /sys/fs/cgroup | grep cpuset
+
+ After:
+
+ $ grep memory /proc/cgroups
+ memory 0 88 1
+ $ grep cpuset /proc/cgroups
+ cpuset 0 88 1
+
+ with v1 cgroups enabled:
+
+ $ mount | grep cgroup | grep memory
+ cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
+ $ mount | grep cgroup | grep cpuset
+ cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
+
+ $ ls /sys/fs/cgroup | grep memory
+ memory
+ $ ls /sys/fs/cgroup | grep cpuset
+ cpuset
+
+ A config diff of the previous build versus a build cranked from these
+ patches:
+
+ CPUSETS_V1 n -> y
+ MEMCG_V1 n -> y
+
+ [Where problems can occur]
+
+ Since these changes re-introduce code that was disabled via ifdef,
+ there's a possible increase in the binary size. After comparing the
+ results from an identical build with these config flags disabled, the
+ difference in compressed artifact size for an x86 vmlinuz is an increase
+ of 16k.
+
+ The difference in uncompressed memory usage after boot is an increase of
+ 40k, broken down as 21k code, 19k rwdata, 12k rodata, 8k init, -28k
+ bss, and 8k reserved.
+
+ The primary remaining risk is around future breakage of these interfaces
+ since they are no longer part of the default configuration. If this is
+ not part of upstream's test matrix, then there is additional potential
+ breakage possible. However, the author has no knowledge of actual v1
+ cgroups breakage at the time this patch is being submitted.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-6.14 in Ubuntu.
https://bugs.launchpad.net/bugs/2122368

Title:
  enable MEMCG_V1 and CPUSETS_V1 on Noble HWE

Status in linux-hwe-6.14 package in Ubuntu:
  New
Status in linux-hwe-6.14 source package in Noble:
  In Progress

Bug description:

[Impact]

Although v1 cgroups are deprecated in Noble, it was still possible for
users on 6.8 kernels to utilize them. This was especially helpful in
the Noble migration process. It allowed users to pick up the new OS and
then separately upgrade their remaining v1 cgroups applications. This
unblocked the migration path for v1 cgroups users, because v1 support
could be provisionally enabled until the necessary support was available
in the applications that still lack v2 support.

Starting in 6.12, CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1 were added and
defaulted to false.
Noble 6.8 users that were unlucky enough to still need these V1 cgroups
found that they could no longer use them in the 6.14 kernel.

Some of the specific failures that were encountered include older JVMs
that fail to correctly handle missing controllers from /proc/cgroups.
If memory or cpuset are absent, the container limit detection is turned
off and the JVM uses the host's limits. JVMs configured in containers
with specific memory usage percentages then end up consuming too much
memory and often crash.

Apache Yarn is still completing their v1 -> v2 migration, which leaves
some Hadoop use cases without proper support.

The request here is to enable these V1 controllers on Noble, but not
newer releases, for as long as the Noble HWE kernel train still has
kernels with upstream cgroup v1 support. This gives users a little bit
longer to complete their migration while still using newer hardware, but
with the understanding that this really is the end of the line for v1
cgroups.

[Fix]

Re-enable the missing v1 controllers in the 6.14 Noble config.

In 6.8 there were 14 controllers. In the current 6.14 config there are
also 14 controllers. However, the difference is that in the current
6.14 build the dmem controller was added, and the cpuset and memory
controllers were removed.

Diffing both the /proc/cgroups and configs between the 6.14 and 6.8
releases gives:

-CPUSETS_V1 n
-MEMCG_V1 n

These differences were also corroborated via source inspection. Changes
in 6.12 moved these controllers to be guarded by ifdefs that default to
being disabled via make olddefconfig.

In order to ensure that 6.14 has the same v1 cgroup controllers enabled
as 6.8, enable both CONFIG_CPUSETS_V1 and CONFIG_MEMCG_V1 for Noble.

[Test]

Booted a kernel with this change and validated that the missing v1
memcgs were present again.

Before:

$ grep memory /proc/cgroups
$ grep cpuset /proc/cgroups

with v1 cgroups enabled:

$ mount | grep cgroup | grep memory
$ mount | grep cgroup | grep cpuset

$ ls /sys/fs/cgroup | grep memory
$ ls /sys/fs/cgroup | grep cpuset

After:

$ grep memory /proc/cgroups
memory 0 88 1
$ grep cpuset /proc/cgroups
cpuset 0 88 1

with v1 cgroups enabled:

$ mount | grep cgroup | grep memory
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
$ mount | grep cgroup | grep cpuset
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)

$ ls /sys/fs/cgroup | grep memory
memory
$ ls /sys/fs/cgroup | grep cpuset
cpuset

A config diff of the previous build versus a build cranked from these
patches:

CPUSETS_V1 n -> y
MEMCG_V1 n -> y

[Where problems can occur]

Since these changes re-introduce code that was disabled via ifdef,
there's a possible increase in the binary size. After comparing the
results from an identical build with these config flags disabled, the
difference in compressed artifact size for an x86 vmlinuz is an increase
of 16k.

The difference in uncompressed memory usage after boot is an increase of
40k, broken down as 21k code, 19k rwdata, 12k rodata, 8k init, -28k
bss, and 8k reserved.

The primary remaining risk is around future breakage of these interfaces
since they are no longer part of the default configuration. If this is
not part of upstream's test matrix, then there is additional potential
breakage possible. However, the author has no knowledge of actual v1
cgroups breakage at the time this patch is being submitted.
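The JVM failure described under [Impact] can be observed directly by
checking what heap a JVM actually picks inside a memory-limited
container. This is only an illustrative sketch, not the reproduction
behind this report; the container runtime, image tag, and flag values
below are example choices:

# cap the container at 512m and ask the JVM for 50% of the RAM it detects
$ docker run --rm -m 512m eclipse-temurin:8-jre \
      java -XX:MaxRAMPercentage=50 -XX:+PrintFlagsFinal -version | grep MaxHeapSize

When container limit detection is working, MaxHeapSize lands at roughly
half of the 512m limit. When detection has been turned off because a
controller is missing from /proc/cgroups, the heap is sized from the
host's memory instead, which is the overcommit-and-crash pattern
described above.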
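One quick way to confirm the [Fix] on an installed kernel is to inspect
the config shipped in /boot. The transcripts below are illustrative
rather than output captured for this bug, but this is the expected
shape. On an affected 6.14 build, both options exist but are disabled:

$ grep -E 'CONFIG_(MEMCG|CPUSETS)_V1' /boot/config-$(uname -r)
# CONFIG_MEMCG_V1 is not set
# CONFIG_CPUSETS_V1 is not set

On a build with this change applied, the same check should show both
set to y:

$ grep -E 'CONFIG_(MEMCG|CPUSETS)_V1' /boot/config-$(uname -r)
CONFIG_MEMCG_V1=y
CONFIG_CPUSETS_V1=y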
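The [Test] section does not spell out how "with v1 cgroups enabled" was
arranged. On a systemd-based Noble system, one plausible way to get the
legacy controller mounts shown above is to disable the unified cgroup
hierarchy on the kernel command line; the drop-in file name here is an
example and may differ from the setup actually used for the test:

# request the legacy/hybrid cgroup layout on the next boot (example file name)
$ echo 'GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT systemd.unified_cgroup_hierarchy=0"' \
      | sudo tee /etc/default/grub.d/cgroup-v1.cfg
$ sudo update-grub && sudo reboot

On a kernel built without CONFIG_MEMCG_V1 and CONFIG_CPUSETS_V1, booting
this way still leaves the memory and cpuset mounts absent, which matches
the empty "Before" output above.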
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-hwe-6.14/+bug/2122368/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

