Public bug reported:

The oom-killer goes into a loop when a task in a memory cgroup with
oom_score_adj set to -1000 exceeds the cgroup's memory limit.
The task (python3 in this case) is not killed, and the CPU usage of the machine
goes high, although the machine remains responsive.


Setup details
-------------

root@vm1:~# cat /etc/os-release 
PRETTY_NAME="Ubuntu 24.10"
NAME="Ubuntu"
VERSION_ID="24.10"
VERSION="24.10 (Oracular Oriole)"
VERSION_CODENAME=oracular
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=oracular
LOGO=ubuntu-logo


root@vm1:~# uname -a
Linux vm1 6.11.0-8-generic #8-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 16 13:41:20 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
root@vm1:~# 


How to reproduce
----------------

This was done on a VM with 1 GB RAM, 2 CPU cores, and no swap:
root@vm1:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           960Mi       287Mi       203Mi       1.2Mi       608Mi       673Mi
Swap:             0B          0B          0B
root@vm1:~# nproc
2
root@vm1:~# 


On terminal 1:

root@vm1:~# python3
Python 3.12.7 (main, Oct  3 2024, 15:15:22) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 


On terminal 2:

root@vm1:~# mkdir /sys/fs/cgroup/testcg
root@vm1:~# ps aux | grep python    # get the PID of python3 (2536 in this case)
root@vm1:~# echo -1000 > /proc/2536/oom_score_adj
root@vm1:~# echo 2536 > /sys/fs/cgroup/testcg/cgroup.procs

On terminal 1:
>>> c2 = {i: i**4 for i in range(6000100)}
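The terminal 2 steps can be collected into one helper. The following is only a
sketch: the function name `confine_task` and its parameters are hypothetical,
and the write to `memory.max` is an assumption. The transcript above does not
show how the limit was set, but the log below reports a limit of 10240kB, so
some such step must have taken place. The paths are parameters so that the
helper can be tried against a scratch directory before touching /proc and
/sys/fs/cgroup.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000  # value used in the report to exempt the task


def confine_task(pid, cgroup_dir, limit_bytes, proc_root="/proc"):
    """Exempt `pid` from the OOM killer, then move it into a limited cgroup."""
    cgroup = Path(cgroup_dir)
    # Equivalent of: mkdir /sys/fs/cgroup/testcg
    cgroup.mkdir(parents=True, exist_ok=True)
    # ASSUMPTION: the limit is set through the cgroup v2 memory.max file.
    (cgroup / "memory.max").write_text(f"{limit_bytes}\n")
    # Equivalent of: echo -1000 > /proc/<pid>/oom_score_adj
    adj_file = Path(proc_root) / str(pid) / "oom_score_adj"
    adj_file.write_text(f"{OOM_SCORE_ADJ_MIN}\n")
    # Equivalent of: echo <pid> > /sys/fs/cgroup/testcg/cgroup.procs
    (cgroup / "cgroup.procs").write_text(f"{pid}\n")
```

On a real system this would presumably be called as root, for example
`confine_task(2536, "/sys/fs/cgroup/testcg", 10 * 1024 * 1024)`, with the
memory controller enabled for the parent cgroup.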


Logs
----

dmesg continuously repeats the following message, along with other traces.
The collected dmesg output is attached.
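Besides dmesg, the cgroup v2 `memory.events` file offers another way to
observe the loop: its `oom` counter should keep rising while `oom_kill` stays
unchanged. The sketch below parses two samples of that file and compares them.
The function names and the sample values are hypothetical and were not
collected on the affected machine.

```python
def parse_memory_events(text):
    """Parse the flat 'key value' lines of a cgroup v2 memory.events file."""
    events = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            events[key] = int(value)
    return events


def looping_without_kill(before, after):
    """True if OOM events were raised between two samples but nothing was killed."""
    return after["oom"] > before["oom"] and after["oom_kill"] == before["oom_kill"]


# Hypothetical samples for illustration only, not taken from the report.
sample_1 = "low 0\nhigh 0\nmax 100\noom 10\noom_kill 0\noom_group_kill 0\n"
sample_2 = "low 0\nhigh 0\nmax 250\noom 25\noom_kill 0\noom_group_kill 0\n"

print(looping_without_kill(parse_memory_events(sample_1),
                           parse_memory_events(sample_2)))
```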


[Thu Oct 31 13:11:56 2024] python3 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-1000
[Thu Oct 31 13:11:56 2024] CPU: 1 UID: 0 PID: 2653 Comm: python3 Not tainted 6.11.0-8-generic #8-Ubuntu
[Thu Oct 31 13:11:56 2024] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[Thu Oct 31 13:11:56 2024] Call Trace:
[Thu Oct 31 13:11:56 2024]  <TASK>
[Thu Oct 31 13:11:56 2024]  show_stack+0x49/0x60
[Thu Oct 31 13:11:56 2024]  dump_stack_lvl+0x5f/0x90
[Thu Oct 31 13:11:56 2024]  dump_stack+0x10/0x18
[Thu Oct 31 13:11:56 2024]  dump_header+0x46/0x1a6
[Thu Oct 31 13:11:56 2024]  out_of_memory.cold+0x1d/0x8d
[Thu Oct 31 13:11:56 2024]  mem_cgroup_out_of_memory+0x13b/0x170
[Thu Oct 31 13:11:56 2024]  try_charge_memcg+0x40f/0x5c0
[Thu Oct 31 13:11:56 2024]  __mem_cgroup_charge+0x45/0xd0
[Thu Oct 31 13:11:56 2024]  alloc_anon_folio+0x21b/0x450
[Thu Oct 31 13:11:56 2024]  do_anonymous_page+0x13b/0x400
[Thu Oct 31 13:11:56 2024]  handle_pte_fault+0x1ad/0x1c0
[Thu Oct 31 13:11:56 2024]  __handle_mm_fault+0x3d5/0x7a0
[Thu Oct 31 13:11:56 2024]  handle_mm_fault+0xef/0x2d0
[Thu Oct 31 13:11:56 2024]  do_user_addr_fault+0x2ff/0x7e0
[Thu Oct 31 13:11:56 2024]  exc_page_fault+0x85/0x1c0
[Thu Oct 31 13:11:56 2024]  asm_exc_page_fault+0x27/0x30
[Thu Oct 31 13:11:56 2024] RIP: 0033:0x72a4d5d961d3
[Thu Oct 31 13:11:56 2024] Code: c5 fe 7f 47 40 c5 fe 7f 47 60 c5 f8 77 c3 66 0f 1f 84 00 00 00 00 00 48 3b 15 69 20 08 00 73 77 40 0f b6 c6 48 89 d1 48 89 fa <f3> aa 48 89 d0 c5 f8 77 c3 0f 1f 40 00 c4 e2 79 78 c0 83 fa 10 7d
[Thu Oct 31 13:11:56 2024] RSP: 002b:00007ffd70a12718 EFLAGS: 00010287
[Thu Oct 31 13:11:56 2024] RAX: 0000000000000000 RBX: 0000000000b509f0 RCX: 00000000003d2020
[Thu Oct 31 13:11:56 2024] RDX: 000072a4d4900030 RSI: 0000000000000000 RDI: 000072a4d492e000
[Thu Oct 31 13:11:56 2024] RBP: 00007ffd70a12780 R08: 00000000ffffffff R09: 0000000000000000
[Thu Oct 31 13:11:56 2024] R10: 0000000000000022 R11: 000072a4d4800030 R12: 00000000003ffff0
[Thu Oct 31 13:11:56 2024] R13: 000072a4d5000010 R14: 000072a4d5bf59c0 R15: 000072a4d4800010
[Thu Oct 31 13:11:56 2024]  </TASK>
[Thu Oct 31 13:11:56 2024] memory: usage 10240kB, limit 10240kB, failcnt 18041088
[Thu Oct 31 13:11:56 2024] swap: usage 0kB, limit 9007199254740988kB, failcnt 0
[Thu Oct 31 13:11:56 2024] Memory cgroup stats for /testcg:
[Thu Oct 31 13:11:56 2024] anon 10457088
[Thu Oct 31 13:11:56 2024] file 0
[Thu Oct 31 13:11:56 2024] kernel 28672
[Thu Oct 31 13:11:56 2024] kernel_stack 0
[Thu Oct 31 13:11:56 2024] pagetables 24576
[Thu Oct 31 13:11:56 2024] sec_pagetables 0
[Thu Oct 31 13:11:56 2024] percpu 0
[Thu Oct 31 13:11:56 2024] sock 0
[Thu Oct 31 13:11:56 2024] vmalloc 0
[Thu Oct 31 13:11:56 2024] shmem 0
[Thu Oct 31 13:11:56 2024] zswap 0
[Thu Oct 31 13:11:56 2024] zswapped 0
[Thu Oct 31 13:11:56 2024] file_mapped 0
[Thu Oct 31 13:11:56 2024] file_dirty 0
[Thu Oct 31 13:11:56 2024] file_writeback 0
[Thu Oct 31 13:11:56 2024] swapcached 0
[Thu Oct 31 13:11:56 2024] anon_thp 0
[Thu Oct 31 13:11:56 2024] file_thp 0
[Thu Oct 31 13:11:56 2024] shmem_thp 0
[Thu Oct 31 13:11:56 2024] inactive_anon 0
[Thu Oct 31 13:11:56 2024] active_anon 10457088
[Thu Oct 31 13:11:56 2024] inactive_file 0
[Thu Oct 31 13:11:56 2024] active_file 0
[Thu Oct 31 13:11:56 2024] unevictable 0
[Thu Oct 31 13:11:56 2024] slab_reclaimable 0
[Thu Oct 31 13:11:56 2024] slab_unreclaimable 2096
[Thu Oct 31 13:11:56 2024] slab 2096
[Thu Oct 31 13:11:56 2024] workingset_refault_anon 0
[Thu Oct 31 13:11:56 2024] workingset_refault_file 0
[Thu Oct 31 13:11:56 2024] workingset_activate_anon 0
[Thu Oct 31 13:11:56 2024] workingset_activate_file 0
[Thu Oct 31 13:11:56 2024] workingset_restore_anon 0
[Thu Oct 31 13:11:56 2024] workingset_restore_file 0
[Thu Oct 31 13:11:56 2024] workingset_nodereclaim 0
[Thu Oct 31 13:11:56 2024] pgscan 0
[Thu Oct 31 13:11:56 2024] pgsteal 0
[Thu Oct 31 13:11:56 2024] pgscan_kswapd 0
[Thu Oct 31 13:11:56 2024] pgscan_direct 0
[Thu Oct 31 13:11:56 2024] pgscan_khugepaged 0
[Thu Oct 31 13:11:56 2024] pgsteal_kswapd 0
[Thu Oct 31 13:11:56 2024] pgsteal_direct 0
[Thu Oct 31 13:11:56 2024] pgsteal_khugepaged 0
[Thu Oct 31 13:11:56 2024] pgfault 961958
[Thu Oct 31 13:11:56 2024] pgmajfault 36510
[Thu Oct 31 13:11:56 2024] pgrefill 0
[Thu Oct 31 13:11:56 2024] pgactivate 0
[Thu Oct 31 13:11:56 2024] pgdeactivate 0
[Thu Oct 31 13:11:56 2024] pglazyfree 0
[Thu Oct 31 13:11:56 2024] pglazyfreed 0
[Thu Oct 31 13:11:56 2024] zswpin 0
[Thu Oct 31 13:11:56 2024] zswpout 0
[Thu Oct 31 13:11:56 2024] zswpwb 0
[Thu Oct 31 13:11:56 2024] thp_fault_alloc 0
[Thu Oct 31 13:11:56 2024] thp_collapse_alloc 0
[Thu Oct 31 13:11:56 2024] thp_swpout 0
[Thu Oct 31 13:11:56 2024] thp_swpout_fallback 0
[Thu Oct 31 13:11:56 2024] Tasks state (memory values in pages):
[Thu Oct 31 13:11:56 2024] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[Thu Oct 31 13:11:56 2024] [   2653]     0  2653     7961     4564     3984      580         0   106496        0         -1000 python3
[Thu Oct 31 13:11:56 2024] Out of memory and no killable processes...
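
Two facts in the dump above can be checked mechanically. First, the only task
listed for the cgroup has oom_score_adj -1000, which matches the "no killable
processes" message. Second, in this particular dump the anon, file, and kernel
counters add up to exactly the 10240kB limit. The following is a minimal
sketch with hypothetical helper names; it assumes that log lines have been
unwrapped and that task names contain no spaces.

```python
import re

OOM_SCORE_ADJ_MIN = -1000  # the score adjustment set on the task in the report


def parse_tasks(log_lines):
    """Return (pid, oom_score_adj, name) for each unwrapped 'Tasks state' row."""
    tasks = []
    for line in log_lines:
        # A data row carries a bracketed numeric PID such as "[   2653]".
        match = re.search(r"\[\s*(\d+)\]\s+(.*\S)", line)
        if match:
            fields = match.group(2).split()
            # The last two columns are oom_score_adj and name.
            tasks.append((int(match.group(1)), int(fields[-2]), fields[-1]))
    return tasks


def killable(tasks):
    """Tasks whose score adjustment is above the minimum remain candidates."""
    return [task for task in tasks if task[1] > OOM_SCORE_ADJ_MIN]


# Values copied from the dump above.
header = ("[Thu Oct 31 13:11:56 2024] [  pid  ]   uid  tgid total_vm      rss "
          "rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name")
row = ("[Thu Oct 31 13:11:56 2024] [   2653]     0  2653     7961     4564     "
       "3984      580         0   106496        0         -1000 python3")

tasks = parse_tasks([header, row])
charged_bytes = 10457088 + 0 + 28672  # anon + file + kernel
limit_bytes = 10240 * 1024            # "limit 10240kB"

print(tasks)
print(killable(tasks))
print(charged_bytes == limit_bytes)
```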

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "dmesg.log"
   https://bugs.launchpad.net/bugs/2086198/+attachment/5833418/+files/dmesg.log

https://bugs.launchpad.net/bugs/2086198

Title:
  If a task in a CGroup with oom_score_adj set to -1000 goes over the
  memory limit, it is not killed but CPU usage of the machine goes high.

Status in linux package in Ubuntu:
  New
