[Kernel-packages] [Bug 2103680] Re: System hangs when running the memory stress test

AceLan Kao Tue, 23 Sep 2025 19:55:33 -0700

This commit in linux-next looks promising:

commit 4a077b6dd4a4872075d45065e1bfb2b6d79f9ea7
Author: Johannes Weiner <[email protected]>
Date:   Fri Sep 19 12:21:34 2025 -0400


    mm: page_alloc: avoid kswapd thrashing due to NUMA restrictions
    
    On NUMA systems without bindings, allocations check all nodes for free
    space, then wake up the kswapds on all nodes and retry. This ensures
    all available space is evenly used before reclaim begins. However,
    when one process or certain allocations have node restrictions, they
    can cause kswapds on only a subset of nodes to be woken up.
    
    Since kswapd hysteresis targets watermarks that are *higher* than
    needed for allocation, even *unrestricted* allocations can now get
    suckered onto such nodes that are already pressured. This ends up
    concentrating all allocations on them, even when there are idle nodes
    available for the unrestricted requests.
    
    This was observed with two numa nodes, where node0 is normal and node1
    is ZONE_MOVABLE to facilitate hotplugging: a kernel allocation wakes
    kswapd on node0 only (since node1 is not eligible); once kswapd0 is
    active, the watermarks hover between low and high, and then even the
    movable allocations end up on node0, only to be kicked out again;
    meanwhile node1 is empty and idle.
    
    Similar behavior is possible when a process with NUMA bindings is
    causing selective kswapd wakeups.
    
    To fix this, on NUMA systems augment the (misleading) watermark test
    with a check for whether kswapd is already active during the first
    iteration through the zonelist. If this fails to place the request,
    kswapd must be running everywhere already, and the watermark test is
    good enough to decide placement.
    
    With this patch, unrestricted requests successfully make use of node1,
    even while kswapd is reclaiming node0 for restricted allocations.
    
    [[email protected]: don't retry if no kswapds were active]
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Gregory Price <[email protected]>
    Tested-by: Joshua Hahn <[email protected]>
    Signed-off-by: Johannes Weiner <[email protected]>
    Acked-by: Zi Yan <[email protected]>
    Cc: Brendan Jackman <[email protected]>
    Cc: Joshua Hahn <[email protected]>
    Cc: Michal Hocko <[email protected]>
    Cc: Suren Baghdasaryan <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
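
To make the placement logic described in the commit message easier to follow, here is a toy model in Python. It is only a sketch of the idea, not the kernel code: the Node class, its fields, and the function names pick_node and place_allocation are all hypothetical, and the real allocator works on zones and several watermark levels rather than a single threshold per node.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical stand-in for a NUMA node (not a kernel structure)."""
    name: str
    free_pages: int
    low_watermark: int
    kswapd_active: bool = False


def pick_node(zonelist: Sequence[Node], skip_kswapd_nodes: bool) -> Optional[Node]:
    """Walk the zonelist in order and return the first acceptable node."""
    for node in zonelist:
        # Extra check from the commit message: on the first pass, avoid
        # nodes whose kswapd is already reclaiming.
        if skip_kswapd_nodes and node.kswapd_active:
            continue
        # Plain watermark test: enough free pages for the allocation.
        if node.free_pages > node.low_watermark:
            return node
    return None


def place_allocation(zonelist: Sequence[Node]) -> Optional[Node]:
    """Two-pass placement: prefer idle nodes, then fall back to watermarks."""
    node = pick_node(zonelist, skip_kswapd_nodes=True)
    if node is not None:
        return node
    # If no kswapd was active, the first pass was already a plain
    # watermark test, so a retry could not change the outcome.
    if any(n.kswapd_active for n in zonelist):
        return pick_node(zonelist, skip_kswapd_nodes=False)
    return None


# Scenario from the commit message: node0 hovers just above its low
# watermark while kswapd0 reclaims; node1 is empty and idle.
node0 = Node("node0", free_pages=120, low_watermark=100, kswapd_active=True)
node1 = Node("node1", free_pages=1000, low_watermark=100)
zonelist = [node0, node1]

old_choice = pick_node(zonelist, skip_kswapd_nodes=False)  # watermark only
new_choice = place_allocation(zonelist)                    # with kswapd check
print(old_choice.name, new_choice.name)
```

In this model the watermark-only walk keeps landing on node0, while the two-pass walk places the unrestricted request on node1, which matches the behavior the commit message reports.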

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-oem-6.11 in Ubuntu.
https://bugs.launchpad.net/bugs/2103680

Title:
  System hangs when running the memory stress test

Status in HWE Next:
  Opinion
Status in linux package in Ubuntu:
  New
Status in linux-oem-6.11 package in Ubuntu:
  Invalid
Status in linux source package in Noble:
  New
Status in linux-oem-6.11 source package in Noble:
  Fix Released

Bug description:
  [Impact]
  While running the memory stress test, the system becomes unresponsive.

  [Fix]
  The following commit, first included in v6.11-rc1, introduced the issue:
  4e63aeb5d010 blk-wbt: don't throttle swap writes in direct reclaim

  We are seeking help from the patch owner and other developers on the
  mailing list:
  https://lkml.org/lkml/2025/3/20/90

  Currently, we have to revert this commit, because this issue happens
  on many platforms.

  [Test]
  Run the following command on a machine with a kernel version greater
  than or equal to v6.11:
     sudo stress-ng --aggressive --verify --timeout 300 --mmapmany 0
  The test should finish within 5 minutes.
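
  For unattended runs, the check above could be wrapped in a small script. The sketch below is an assumption, not part of the original test plan: the function name run_memory_stress and the 600-second deadline are made up for illustration, and only the stress-ng command line is taken from this report. It can only detect a stress-ng process that fails to exit; if the whole machine hangs, the script hangs with it.

```python
import subprocess

# Command line taken verbatim from the test plan in this report.
STRESS_CMD = ["sudo", "stress-ng", "--aggressive", "--verify",
              "--timeout", "300", "--mmapmany", "0"]

# Assumed safety margin: stress-ng is asked to stop after 300 s, so a
# run that is still going after 600 s is treated as stuck.
DEADLINE_S = 600


def run_memory_stress(runner=subprocess.run) -> bool:
    """Return True if stress-ng exits cleanly before the deadline.

    The runner argument exists only so the function can be exercised
    without actually starting stress-ng.
    """
    try:
        completed = runner(STRESS_CMD, timeout=DEADLINE_S)
    except subprocess.TimeoutExpired:
        # stress-ng did not exit in time, which suggests the hang.
        return False
    return completed.returncode == 0
```

  Calling run_memory_stress() with no arguments runs the real command; a False result means the run either failed verification or did not finish before the assumed deadline.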

  [Where problems could occur]
  According to the commit message, reverting this commit may trigger a hang:
  "When a process holds a lock to allocate a free page, and enters direct
  reclaim because there is no free memory, then it might trigger a hung
  due to the wbt throttling that causes other processes to fail to get
  the lock."

To manage notifications about this bug go to:
https://bugs.launchpad.net/hwe-next/+bug/2103680/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
