The previous tests were executed in an LXD VM, but using the ubuntu-25.04-beta-live-server-amd64.iso as the starting point instead of the LXD-provided image.
The main difference between the two images is swap. The server image gets swap by default, which allows the kernel to swap out unused memory pages. The LXD image has no swap, so the kernel cannot swap out pages when the requested virtual memory exceeds memory.max. This difference can be seen in the kernel's message buffer when the OOM killer kills the process:

# Server iso
[ 397.347689] leak-memory invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 397.347708] CPU: 1 UID: 1000 PID: 1470 Comm: leak-memory Kdump: loaded Not tainted 6.14.0-13-generic #13-Ubuntu
[ 397.347717] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
[ 397.347727] Call Trace:
[ 397.347736]  <TASK>
[ 397.347742]  show_stack+0x49/0x60
[ 397.347770]  dump_stack_lvl+0x5f/0x90
[ 397.347784]  dump_stack+0x10/0x18
[ 397.347788]  dump_header+0x48/0x1be
[ 397.347803]  oom_kill_process.cold+0xb/0xaf
[ 397.347809]  out_of_memory+0x100/0x2b0
[ 397.347820]  mem_cgroup_out_of_memory+0x13b/0x170
[ 397.347834]  try_charge_memcg+0x40f/0x5c0
[ 397.347846]  charge_memcg+0x34/0x80
[ 397.347851]  __mem_cgroup_charge+0x2d/0xa0
[ 397.347860]  alloc_anon_folio+0x219/0x480
[ 397.347875]  do_anonymous_page+0x148/0x480
[ 397.347882]  handle_pte_fault+0x1db/0x200
[ 397.347889]  __handle_mm_fault+0x3d2/0x7a0
[ 397.347902]  handle_mm_fault+0xef/0x2d0
[ 397.347909]  do_user_addr_fault+0x2ff/0x7e0
[ 397.347926]  exc_page_fault+0x85/0x1e0
[ 397.347952]  asm_exc_page_fault+0x27/0x30
[ 397.347960] RIP: 0033:0x5eb47b03eaf4
[ 397.347965] Code: e8 71 f7 ff ff 48 89 45 d0 48 8b 55 d8 48 8b 45 d0 48 89 10 48 8b 45 d0 48 83 c0 08 48 89 45 e0 eb 10 48 8b 45 e0 48 8b 55 c0 <48> 89 10 48 83 45 e0 08 48 8b 45 b8 48 83 e0 f8 48 89 c2 48 8b 45
[ 397.347969] RSP: 002b:00007ffc8ea0d730 EFLAGS: 00010287
[ 397.347980] RAX: 0000722fb0c0b000 RBX: 00007ffc8ea0d8f8 RCX: 0000722fb0be6010
[ 397.347986] RDX: 0000000101a00000 RSI: 0000000000100000 RDI: 0000722fb0be6010
[ 397.347988] RBP: 00007ffc8ea0d790 R08: 00000000ffffffff R09: 0000000000000000
[ 397.347990] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 397.347993] R13: 0000000000000000 R14: 00005eb47b040d30 R15: 00007230b3875000
[ 397.348003]  </TASK>
[ 397.348008] memory: usage 51200kB, limit 51200kB, failcnt 904335
[ 397.348029] swap: usage 4194052kB, limit 9007199254740988kB, failcnt 0
[ 397.348033] Memory cgroup stats for /user.slice/user-1000.slice/user@1000.service/app.slice/run-p1470-i1471.scope:
...

# LXD image
[ 434.148492] leak-memory invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[ 434.148509] CPU: 1 UID: 1001 PID: 1240 Comm: leak-memory Not tainted 6.14.0-13-generic #13-Ubuntu
[ 434.148515] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
[ 434.148519] Call Trace:
[ 434.148522]  <TASK>
[ 434.148527]  show_stack+0x49/0x60
[ 434.148537]  dump_stack_lvl+0x5f/0x90
[ 434.148544]  dump_stack+0x10/0x18
[ 434.148564]  dump_header+0x48/0x1be
[ 434.148569]  oom_kill_process.cold+0xb/0xaf
[ 434.148576]  out_of_memory+0x100/0x2b0
[ 434.148582]  mem_cgroup_out_of_memory+0x13b/0x170
[ 434.148592]  try_charge_memcg+0x40f/0x5c0
[ 434.148601]  charge_memcg+0x34/0x80
[ 434.148605]  __mem_cgroup_charge+0x2d/0xa0
[ 434.148611]  alloc_anon_folio+0x219/0x480
[ 434.148619]  do_anonymous_page+0x148/0x480
[ 434.148626]  handle_pte_fault+0x1db/0x200
[ 434.148633]  __handle_mm_fault+0x3d2/0x7a0
[ 434.148646]  handle_mm_fault+0xef/0x2d0
[ 434.148653]  do_user_addr_fault+0x2ff/0x7e0
[ 434.148663]  exc_page_fault+0x85/0x1e0
[ 434.148670]  asm_exc_page_fault+0x27/0x30
[ 434.148675] RIP: 0033:0x5f93dfd6daf4
[ 434.148680] Code: e8 71 f7 ff ff 48 89 45 d0 48 8b 55 d8 48 8b 45 d0 48 89 10 48 8b 45 d0 48 83 c0 08 48 89 45 e0 eb 10 48 8b 45 e0 48 8b 55 c0 <48> 89 10 48 83 45 e0 08 48 8b 45 b8 48 83 e0 f8 48 89 c2 48 8b 45
[ 434.148684] RSP: 002b:00007ffd89fb0f70 EFLAGS: 00010287
[ 434.148688] RAX: 00007ed2f6d5b000 RBX: 00007ffd89fb1138 RCX: 00007ed2f6ccf010
[ 434.148691] RDX: 0000000003200000 RSI: 0000000000100000 RDI: 00007ed2f6ccf010
[ 434.148693] RBP: 00007ffd89fb0fd0 R08: 00000000ffffffff R09: 0000000000000000
[ 434.148695] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 434.148698] R13: 0000000000000000 R14: 00005f93dfd6fd30 R15: 00007ed2fa1b2000
[ 434.148708]  </TASK>
[ 434.148710] memory: usage 51200kB, limit 51200kB, failcnt 40
[ 434.148736] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[ 434.148741] Memory cgroup stats for /user.slice/user-1001.slice/user@1001.service/app.slice/run-p1240-i1241.scope:
...

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2105866

Title:
  cgroup: weird calculation for memory limit

Status in linux package in Ubuntu:
  Invalid

Bug description:
Starting with Plucky (~~verified not to reproduce on Oracular or Noble~~ the verification was done in an LXD VM, and even with Plucky this doesn't reproduce there; the status for Noble/Oracular is actually unknown), there is a weirdness in how cgroup's memory.max constraint is taken into account.

Here is a test case without systemd in the equation (thanks enr0n for coming up with that):

```
ubuntu@ubuntu:~$ ps -o cgroup $$
CGROUP
0::/user.slice/user-1000.slice/session-4.scope
ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
max
ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.current
3538944
ubuntu@ubuntu:~$ echo 5000000 | sudo tee /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
5000000
ubuntu@ubuntu:~$ cat /sys/fs/cgroup/user.slice/user-1000.slice/session-4.scope/memory.max
4997120
ubuntu@ubuntu:~$ leak-memory
Starting memory consumption in 1.00 MiB steps to maximum 42.0 TiB.
Allocated 10.0 MiB.
Allocated 20.0 MiB.
Allocated 30.0 MiB.
Allocated 40.0 MiB.
Allocated 50.0 MiB.
Allocated 60.0 MiB.
Allocated 70.0 MiB.
...
Allocated 1.88 GiB.
Allocated 1.89 GiB.
Allocated 1.90 GiB.
Allocated 1.91 GiB.
Allocated 1.92 GiB.
Killed
```

Here is my original test case, which makes reproducing the issue a one-liner:

```
❯ systemd-run --scope -p MemoryMax=1M --user leak-memory
Running as unit: run-p222392-i222692.scope; invocation ID: 942b1e8b1e374e82abeff046a62dcbf7
Starting memory consumption in 1.00 MiB steps to maximum 42.0 TiB.
Allocated 10.0 MiB.
Allocated 20.0 MiB.
....
Allocated 470.0 MiB.
zsh: killed systemd-run --scope -p MemoryMax=1M --user leak-memory
```

The issue obviously scales with the value set in MemoryMax. The only lead I have for now is that the value set in MemoryMax correlates oddly with the pgtables value reported by the OOM killer.

With 1M:

```
[62584.409068] Memory cgroup out of memory: Killed process 237541 (leak-memory) total-vm:490024kB, anon-rss:0kB, file-rss:1448kB, shmem-rss:0kB, UID:1000 pgtables:1004kB oom_score_adj:0
```

With 4M:

```
[62693.780200] Memory cgroup out of memory: Killed process 237732 (leak-memory) total-vm:2058752kB, anon-rss:128kB, file-rss:1512kB, shmem-rss:0kB, UID:1000 pgtables:4072kB oom_score_adj:0
```

In both cases, the reported `total-vm` value is way above the expected limit.
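As an aside, the memory.max readback in the first test case (writing 5000000 and reading back 4997120) is not part of the bug: the kernel stores the limit as a whole number of pages, rounding the written value down to a multiple of the 4096-byte page size. A quick sanity check of that arithmetic:

```shell
# memory.max is kept in whole 4 KiB pages, so writing 5000000 stores
# 1220 pages = 4997120 bytes, matching the readback above:
echo $((5000000 / 4096 * 4096))
# prints 4997120
```

Separately, since the analysis above points at swap, re-running the reproducer with `-p MemorySwapMax=0` (systemd's knob for `memory.swap.max`) should make a swap-enabled system behave like the swapless LXD image. Note that `leak-memory` is the reporter's test program, not a standard tool.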
My current system:

```
❯ uname -rv
6.12.0-12-generic #12-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 22 16:36:37 UTC 2025
```

---
ProblemType: Bug
ApportVersion: 2.32.0-0ubuntu3
Architecture: amd64
CRDA: N/A
CasperMD5CheckResult: unknown
CurrentDesktop: sway
DistroRelease: Ubuntu 25.04
InstallationDate: Installed on 2023-08-02 (609 days ago)
InstallationMedia: Ubuntu 23.04 "Lunar Lobster" - Release amd64 (20230418)
MachineType: LENOVO 20UES1K600
NonfreeKernelModules: zfs
Package: linux (not installed)
ProcEnviron:
 LANG=fr_FR.UTF-8
 PATH=(custom, no user)
 SHELL=/bin/zsh
 TERM=alacritty
 XDG_RUNTIME_DIR=<set>
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.14.0-13-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 6.14.0-13.13-generic 6.14.0
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-6.14.0-13-generic N/A
 linux-backports-modules-6.14.0-13-generic N/A
 linux-firmware 20250317.git1d4c88ee-0ubuntu1
Tags: plucky wayland-session
Uname: Linux 6.14.0-13-generic x86_64
UpgradeStatus: Upgraded to plucky on 2024-06-12 (294 days ago)
UserGroups: adm cdrom dip input lpadmin lxd plugdev sbuild sudo users wireshark
_MarkForUpload: True
dmi.bios.date: 11/05/2024
dmi.bios.release: 1.51
dmi.bios.vendor: LENOVO
dmi.bios.version: R1BET82W(1.51 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20UES1K600
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.ec.firmware.release: 1.51
dmi.modalias: dmi:bvnLENOVO:bvrR1BET82W(1.51):bd11/05/2024:br1.51:efr1.51:svnLENOVO:pn20UES1K600:pvrThinkPadT14Gen1:rvnLENOVO:rn20UES1K600:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:skuLENOVO_MT_20UE_BU_Think_FM_ThinkPadT14Gen1:
dmi.product.family: ThinkPad T14 Gen 1
dmi.product.name: 20UES1K600
dmi.product.sku: LENOVO_MT_20UE_BU_Think_FM_ThinkPad T14 Gen 1
dmi.product.version: ThinkPad T14 Gen 1
dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2105866/+subscriptions