On Thu, Mar 12, 2026 at 10:09:10AM -0700, Nhat Pham wrote:
> On Wed, Mar 11, 2026 at 9:01 PM Li Wang <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <[email protected]> wrote:
> > > >
> > > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > > the workload process.
> > > >
> > > > Raise memory.max to 24M so the workload can make forward progress, and
> > > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > > across environments.
> > > >
> > > > The test intent is unchanged: verify that swapping happens while zswap
> > > > remains unused when memory.zswap.max=0.
> > > >
> > > > === Error Logs ===
> > > >
> > > > # ./test_zswap
> > > > TAP version 13
> > > > 1..7
> > > > ok 1 test_zswap_usage
> > > > not ok 2 test_swapin_nozswap
> > > > ...
> > > >
> > > > # dmesg
> > > > [271641.879153] test_zswap invoked oom-killer:
> > > > gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > > > [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump:
> > > > loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > > > [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected)
> > > > 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > > > [271641.879173] Call Trace:
> > > > [271641.879174] [c00000037540f730] [c00000000127ec44]
> > > > dump_stack_lvl+0x88/0xc4 (unreliable)
> > > > [271641.879184] [c00000037540f760] [c0000000005cc594]
> > > > dump_header+0x5c/0x1e4
> > > > [271641.879188] [c00000037540f7e0] [c0000000005cb464]
> > > > oom_kill_process+0x324/0x3b0
> > > > [271641.879192] [c00000037540f860] [c0000000005cbe48]
> > > > out_of_memory+0x118/0x420
> > > > [271641.879196] [c00000037540f8f0] [c00000000070d8ec]
> > > > mem_cgroup_out_of_memory+0x18c/0x1b0
> > > > [271641.879200] [c00000037540f990] [c000000000713888]
> > > > try_charge_memcg+0x598/0x890
> > > > [271641.879204] [c00000037540fa70] [c000000000713dbc]
> > > > charge_memcg+0x5c/0x110
> > > > [271641.879207] [c00000037540faa0] [c0000000007159f8]
> > > > __mem_cgroup_charge+0x48/0x120
> > > > [271641.879211] [c00000037540fae0] [c000000000641914]
> > > > alloc_anon_folio+0x2b4/0x5a0
> > > > [271641.879215] [c00000037540fb60] [c000000000641d58]
> > > > do_anonymous_page+0x158/0x6b0
> > > > [271641.879218] [c00000037540fbd0] [c000000000642f8c]
> > > > __handle_mm_fault+0x4bc/0x910
> > > > [271641.879221] [c00000037540fcf0] [c000000000643500]
> > > > handle_mm_fault+0x120/0x3c0
> > > > [271641.879224] [c00000037540fd40] [c00000000014bba0]
> > > > ___do_page_fault+0x1c0/0x980
> > > > [271641.879228] [c00000037540fdf0] [c00000000014c44c]
> > > > hash__do_page_fault+0x2c/0xc0
> > > > [271641.879232] [c00000037540fe20] [c0000000001565d8]
> > > > do_hash_fault+0x128/0x1d0
> > > > [271641.879236] [c00000037540fe50] [c000000000008be0]
> > > > data_access_common_virt+0x210/0x220
> > > > [271641.879548] Tasks state (memory values in pages):
> > > > ...
> > > > [271641.879550] [ pid ] uid tgid total_vm rss rss_anon
> > > > rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > > > [271641.879555] [ 177372] 0 177372 571 0 0
> > > > 0 0 51200 96 0 test_zswap
> > > > [271641.879562]
> > > > oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > > > [271641.879578] Memory cgroup out of memory: Killed process 177372
> > > > (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB,
> > > > shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> > >
> > > Why are we getting an OOM kill when there's a swap device? Is the
> > > device slow / not keeping up with reclaim pace?
> >
> > This is a good question. The OOM is very likely triggered because memcg
> > reclaim can't make forward progress fast enough within the retry budget
> > of try_charge_memcg.
> >
> > Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
> > only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> > itself isn't full; the charge path simply gave up trying to reclaim.
> >
> > The core issue, I guess, is that with memory.zswap.max=0, every page
> > reclaimed must go through the real block device. The charge path works
> > like this: a page fault fires, charge_memcg tries to charge 64K to the
> > cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> > reclaim to free space. If the swap device can't drain pages fast enough,
> > the reclaim attempts within the retry loop fail to bring usage below
> > memory.max, and the kernel invokes OOM, even though swap space is
> > technically available.
> >
> > Raising memory.max to 24M gives reclaim a much larger pool to work with,
> > so it can absorb I/O latency without exhausting its retry budget.
>
> Hmmm, perhaps we should change all these constants to multiples of
> the base page size of the system?
Yeah, this may be better, let me try it in the next version.

--
Regards,
Li Wang

