After further investigation, I found that the observed behavior is expected and consistent with how cgroups work in conjunction with the Linux memory management subsystem.
The leak-memory tool used to reproduce the issue reports the virtual memory (total-vm) reserved by the process. This memory is obtained via malloc(), which reserves virtual address space but does not immediately result in physical memory allocation. Physical memory (i.e., pages backed by RAM) is only mapped and charged when the process actually accesses the memory, typically by writing to it.

The memory cgroup limit (memory.max) constrains physical memory usage, not virtual address space. In particular, it limits the amount of memory that can be resident in RAM and charged to the cgroup. This corresponds to the anon-rss value, which only increases when pages are physically committed. A process can therefore reserve large amounts of virtual memory (high total-vm) without violating the memory cgroup limit, as long as it does not access enough of it to exceed the anon-rss budget.

When running the tool, total-vm and anon-rss initially increase together, as pages are allocated and accessed. However, once anon-rss approaches the configured memory.max threshold, the kernel begins to reclaim physical memory in order to stay within the cgroup limit. This may involve reclaiming clean pages or unmapping anonymous pages that have not been used recently; ultimately, if the memory pressure cannot be relieved, the kernel invokes the OOM killer on the offending process.
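The distinction between reserved (virtual) and committed (resident) memory can also be reproduced with a few lines of C, independent of leak-memory. The following is only an illustrative sketch, not the attached leak-memory.diff; the chunk size and labels are arbitrary. It reserves a buffer with malloc(), prints the VmSize and RssAnon lines from /proc/self/status, then writes to every page and prints them again:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE (64UL * 1024 * 1024)   /* 64 MiB, arbitrary */

/* Print the VmSize (virtual) and RssAnon (resident anonymous) lines
 * from /proc/self/status. */
static void print_mem_stats(const char *label)
{
	FILE *f = fopen("/proc/self/status", "r");
	char line[256];

	if (!f)
		return;
	printf("== %s ==\n", label);
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "VmSize:", 7) ||
		    !strncmp(line, "RssAnon:", 8))
			fputs(line, stdout);
	}
	fclose(f);
}

int main(void)
{
	char *buf;

	print_mem_stats("start");

	/* Reserve virtual address space only: VmSize grows,
	 * RssAnon stays roughly the same. */
	buf = malloc(CHUNK_SIZE);
	if (!buf)
		return 1;
	print_mem_stats("after malloc (reserved, not touched)");

	/* Write to every page: the pages are now committed and charged
	 * to the memory cgroup, so RssAnon grows as well. */
	memset(buf, 0x42, CHUNK_SIZE);
	print_mem_stats("after memset (pages touched)");

	free(buf);
	return 0;
}
```

Run under a scope with a low MemoryMax (e.g. systemd-run --scope -p MemoryMax=50M --user ./a.out), VmSize should jump by the full chunk size right after malloc(), while RssAnon should grow only once memset() touches the pages; RssAnon is what the cgroup limit actually constrains, and the process may be OOM-killed at that point if the limit is lower than the chunk size.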
To show this on the reproducer itself, I modified the leak-memory tool to also print additional statistics about the memory it uses. The diff containing my changes to leak-memory is attached to this comment (leak-memory.diff). Running the modified version with

  $ systemd-run --scope -p MemoryMax=50M --user ./leak-memory

it is possible to see the behavior described above:

  [*] ==Allocation==
  VmSize: 3780 kB
  RssAnon: 1024 kB
  RssFile: 1468 kB
  RssShmem: 0 kB
  VmPTE: 48 kB
  ...
  [*] ==Allocation==
  VmSize: 35648 kB
  RssAnon: 32896 kB
  RssFile: 1596 kB
  RssShmem: 0 kB
  VmPTE: 112 kB
  ...
  [*] ==Allocation==
  VmSize: 53124 kB
  RssAnon: 50432 kB
  RssFile: 1596 kB
  RssShmem: 0 kB
  VmPTE: 148 kB
  ...
  [*] ==Allocation==
  VmSize: 103496 kB
  RssAnon: 50304 kB
  RssFile: 1596 kB
  RssShmem: 0 kB
  VmPTE: 244 kB
  ...

Changing the memory.max limit also changes the final value of RssAnon. Given this, I will mark the bug as "Invalid".

** Patch added: "leak-memory.diff"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2105866/+attachment/5869057/+files/leak-memory.diff

** Changed in: linux (Ubuntu)
       Status: Confirmed => Invalid

Title:
  cgroup: weird calculation for memory limit

Status in linux package in Ubuntu:
  Invalid

Bug description:
  Starting with Plucky (~~verified not to reproduce on Oracular or Noble~~ that verification was done in a LXD VM, where even Plucky does not reproduce the issue, so the status for Noble/Oracular is actually unknown), there is a weirdness in how cgroup's memory.max constraint is taken into account.

  Here is a test case without systemd in the equation (thanks enr0n for coming up with that):

  ```
  ubuntu@ubuntu:~$ ps -o cgroup $$
  CGROUP
  0::/user.slice/user-1000.slice/session-4.scope
  ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
  max
  ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.current
  3538944
  ubuntu@ubuntu:~$ echo 5000000 | sudo tee /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
  5000000
  ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
  4997120
  ubuntu@ubuntu:~$ leak-memory
  Starting memory consumption in 1.00 MiB steps to maximum 42.0 TiB.
  Allocated 10.0 MiB.
  Allocated 20.0 MiB.
  Allocated 30.0 MiB.
  Allocated 40.0 MiB.
  Allocated 50.0 MiB.
  Allocated 60.0 MiB.
  Allocated 70.0 MiB.
  ...
  Allocated 1.88 GiB.
  Allocated 1.89 GiB.
  Allocated 1.90 GiB.
  Allocated 1.91 GiB.
  Allocated 1.92 GiB.
  Killed
  ```

  Here is my original test case, which makes reproducing the issue a one-liner:

  ```
  ❯ systemd-run --scope -p MemoryMax=1M --user leak-memory
  Running as unit: run-p222392-i222692.scope; invocation ID: 942b1e8b1e374e82abeff046a62dcbf7
  Starting memory consumption in 1.00 MiB steps to maximum 42.0 TiB.
  Allocated 10.0 MiB.
  Allocated 20.0 MiB.
  ....
  Allocated 470.0 MiB.
  zsh: killed     systemd-run --scope -p MemoryMax=1M --user leak-memory
  ```

  The issue obviously scales with the value set in MemoryMax. The only lead I've got for now is that the value I set in MemoryMax weirdly correlates with what oom-kill reports as the pgtables value.

  With 1M:

  ```
  [62584.409068] Memory cgroup out of memory: Killed process 237541 (leak-memory) total-vm:490024kB, anon-rss:0kB, file-rss:1448kB, shmem-rss:0kB, UID:1000 pgtables:1004kB oom_score_adj:0
  ```

  With 4M:

  ```
  [62693.780200] Memory cgroup out of memory: Killed process 237732 (leak-memory) total-vm:2058752kB, anon-rss:128kB, file-rss:1512kB, shmem-rss:0kB, UID:1000 pgtables:4072kB oom_score_adj:0
  ```

  In both cases, the reported `total-vm` value is way above the expected limit.

  My current system:

  ❯ uname -rv
  6.12.0-12-generic #12-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 22 16:36:37 UTC 2025

  ---
  ProblemType: Bug
  ApportVersion: 2.32.0-0ubuntu3
  Architecture: amd64
  CRDA: N/A
  CasperMD5CheckResult: unknown
  CurrentDesktop: sway
  DistroRelease: Ubuntu 25.04
  InstallationDate: Installed on 2023-08-02 (609 days ago)
  InstallationMedia: Ubuntu 23.04 "Lunar Lobster" - Release amd64 (20230418)
  MachineType: LENOVO 20UES1K600
  NonfreeKernelModules: zfs
  Package: linux (not installed)
  ProcEnviron:
   LANG=fr_FR.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/zsh
   TERM=alacritty
   XDG_RUNTIME_DIR=<set>
  ProcFB: 0 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.14.0-13-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash vt.handoff=7
  ProcVersionSignature: Ubuntu 6.14.0-13.13-generic 6.14.0
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions:
   linux-restricted-modules-6.14.0-13-generic N/A
   linux-backports-modules-6.14.0-13-generic  N/A
   linux-firmware                             20250317.git1d4c88ee-0ubuntu1
  Tags: plucky wayland-session
  Uname: Linux 6.14.0-13-generic x86_64
  UpgradeStatus: Upgraded to plucky on 2024-06-12 (294 days ago)
  UserGroups: adm cdrom dip input lpadmin lxd plugdev sbuild sudo users wireshark
  _MarkForUpload: True
  dmi.bios.date: 11/05/2024
  dmi.bios.release: 1.51
  dmi.bios.vendor: LENOVO
  dmi.bios.version: R1BET82W(1.51 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 20UES1K600
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0J40697 WIN
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.ec.firmware.release: 1.51
  dmi.modalias: dmi:bvnLENOVO:bvrR1BET82W(1.51):bd11/05/2024:br1.51:efr1.51:svnLENOVO:pn20UES1K600:pvrThinkPadT14Gen1:rvnLENOVO:rn20UES1K600:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20UE_BU_Think_FM_ThinkPadT14Gen1:
  dmi.product.family: ThinkPad T14 Gen 1
  dmi.product.name: 20UES1K600
  dmi.product.sku: LENOVO_MT_20UE_BU_Think_FM_ThinkPad T14 Gen 1
  dmi.product.version: ThinkPad T14 Gen 1
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2105866/+subscriptions