Re: [PATCH v2 1/1] mm: numa_memblks: Identify the accurate NUMA ID of CFMW

Andrew Morton Fri, 06 Feb 2026 07:58:19 -0800

On Fri, 6 Feb 2026 15:09:41 +0000 Jonathan Cameron 
<[email protected]> wrote:


> > Andrew if Jonathan is good with it then with changelog updates this can
> > go in, otherwise I don't think this warrants a backport or anything.
> 
> Wait and see if anyone hits it on a real machine (or even non creative QEMU
> setup!)  So for now no need to backport.

Thanks, all.

Below is the current state of this patch.  Is the changelog suitable?


From: Cui Chao <[email protected]>
Subject: mm: numa_memblks: identify the accurate NUMA ID of CFMW
Date: Tue, 6 Jan 2026 11:10:42 +0800

In some physical memory layout designs, the address space of CFMW (CXL
Fixed Memory Window) resides between multiple segments of system memory
belonging to the same NUMA node.  In numa_cleanup_meminfo, these multiple
segments of system memory are merged into a larger numa_memblk.  When
identifying which NUMA node the CFMW belongs to, it may be incorrectly
assigned to the NUMA node of the merged system memory.

As a result, when a CXL RAM region is created from userspace, the
memory capacity of the new region is not added to the CFMW-dedicated
NUMA node.  Instead, it is accumulated into an existing NUMA node (e.g.,
NUMA0 containing RAM).  This makes it impossible to clearly distinguish
between the two types of memory, which may affect memory-tiering
applications.

Example memory layout:

Physical address space:
    0x00000000 - 0x1FFFFFFF  System RAM (node0)
    0x20000000 - 0x2FFFFFFF  CXL CFMW (node2)
    0x40000000 - 0x5FFFFFFF  System RAM (node0)
    0x60000000 - 0x7FFFFFFF  System RAM (node1)

After numa_cleanup_meminfo, the two node0 segments are merged into one:
    0x00000000 - 0x5FFFFFFF  System RAM (node0) // CFMW is inside the range
    0x60000000 - 0x7FFFFFFF  System RAM (node1)

So the CFMW (0x20000000-0x2FFFFFFF) will be incorrectly assigned to node0.
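
The effect of the merge can be reproduced with a small userspace model.
This is only a sketch under stated assumptions, not the kernel's
numa_cleanup_meminfo(): struct blk, merge_same_node() and blk_lookup()
are hypothetical names, ranges are half-open, and the CFMW is assumed to
live in a separate reserved table, so it does not block the merge.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/* Hypothetical model of one memory block: [start, end) on node nid. */
struct blk {
	uint64_t start, end;
	int nid;
};

/* System RAM from the example layout; the CFMW is not in this table. */
static struct blk mem[] = {
	{ 0x00000000, 0x20000000, 0 },
	{ 0x40000000, 0x60000000, 0 },
	{ 0x60000000, 0x80000000, 1 },
};
static size_t mem_cnt = 3;

/* Return the node of the block containing addr, or NO_NODE. */
static int blk_lookup(const struct blk *b, size_t cnt, uint64_t addr)
{
	for (size_t i = 0; i < cnt; i++)
		if (b[i].start <= addr && addr < b[i].end)
			return b[i].nid;
	return NO_NODE;
}

/*
 * Merge blocks of the same node unless the merged span would
 * intersect a block that belongs to a different node.
 */
static void merge_same_node(struct blk *b, size_t *cnt)
{
	for (size_t i = 0; i < *cnt; i++) {
		for (size_t j = i + 1; j < *cnt; ) {
			uint64_t s, e;
			int clash = 0;

			if (b[i].nid != b[j].nid) {
				j++;
				continue;
			}
			s = b[i].start < b[j].start ? b[i].start : b[j].start;
			e = b[i].end > b[j].end ? b[i].end : b[j].end;
			for (size_t k = 0; k < *cnt; k++) {
				if (b[k].nid == b[i].nid)
					continue;
				if (s < b[k].end && e > b[k].start) {
					clash = 1;
					break;
				}
			}
			if (clash) {
				j++;
				continue;
			}
			b[i].start = s;
			b[i].end = e;
			/* Drop block j by moving the last block into its slot. */
			b[j] = b[*cnt - 1];
			(*cnt)--;
		}
	}
}
```

In this model, a lookup of 0x20000000 finds no block before the merge.
After merge_same_node(mem, &mem_cnt) two blocks remain and the same
lookup returns node 0, although no System RAM lives at that address.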

To address this scenario, phys_to_target_node() now looks the address up
in both numa_meminfo and numa_reserved_meminfo.  When the address is
covered by both, the node ID from numa_reserved_meminfo is returned.
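
The difference between the old and new lookup order can be sketched in
userspace as follows.  The tables hold the example layout after the
merge; struct range, range_to_nid(), target_node_old() and
target_node_new() are hypothetical names used for illustration only,
and the two target_node_*() helpers merely mirror the before/after
logic of phys_to_target_node() in the diff.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/* Hypothetical model of one range: [start, end) on node nid. */
struct range {
	uint64_t start, end;
	int nid;
};

/* Models numa_meminfo after the two node0 segments were merged. */
static const struct range online[] = {
	{ 0x00000000, 0x60000000, 0 },
	{ 0x60000000, 0x80000000, 1 },
};

/* Models numa_reserved_meminfo, which holds the CFMW. */
static const struct range reserved[] = {
	{ 0x20000000, 0x30000000, 2 },
};

/* Return the node of the range containing addr, or NO_NODE. */
static int range_to_nid(const struct range *r, size_t cnt, uint64_t addr)
{
	for (size_t i = 0; i < cnt; i++)
		if (r[i].start <= addr && addr < r[i].end)
			return r[i].nid;
	return NO_NODE;
}

/* Old order: any hit in the online table wins. */
static int target_node_old(uint64_t addr)
{
	int nid = range_to_nid(online, 2, addr);

	if (nid != NO_NODE)
		return nid;
	return range_to_nid(reserved, 1, addr);
}

/* New order: an online hit is used only if no reserved range matches. */
static int target_node_new(uint64_t addr)
{
	int nid = range_to_nid(online, 2, addr);
	int reserved_nid = range_to_nid(reserved, 1, addr);

	if (nid != NO_NODE && reserved_nid == NO_NODE)
		return nid;
	return reserved_nid;
}
```

For the CFMW base address 0x20000000, target_node_old() yields node 0
and target_node_new() yields node 2.  Addresses covered only by System
RAM resolve to the same node with either function.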


1. Issue Impact and Backport Recommendation:

This patch fixes an issue on hardware platforms (not QEMU emulation)
where, during the dynamic creation of a CXL RAM region, the memory
capacity is not assigned to the correct CFMW-dedicated NUMA node.  This
issue leads to:

    Failure of the memory tiering mechanism: The system is designed to
    treat System RAM as fast memory and CXL memory as slow memory. For
    performance optimization, hot pages may be migrated to fast memory
    while cold pages are migrated to slow memory. The system uses NUMA
    IDs as an index to identify different tiers of memory. If the NUMA
    ID for CXL memory is calculated incorrectly and its capacity is
    aggregated into the NUMA node containing System RAM (i.e., the node
    for fast memory), the CXL memory cannot be correctly identified. It
    may be misjudged as fast memory, thereby affecting performance
    optimization strategies.

    Inability to distinguish between System RAM and CXL memory even for
    simple manual binding: Tools like numactl and other NUMA policy
    utilities cannot differentiate between System RAM and CXL memory,
    making it impossible to perform reasonable memory binding.

    Inaccurate system reporting: Tools like numactl -H would display
    memory capacities that do not match the actual physical hardware
    layout, impacting operations and monitoring.

This issue affects all users utilizing the CXL RAM functionality who
rely on memory tiering or NUMA-aware scheduling.  Such configurations
are becoming increasingly common in data centers, cloud computing, and
high-performance computing scenarios.

Therefore, I recommend backporting this patch to all stable kernel 
series that support dynamic CXL region creation.

2. Why a Kernel Update is Recommended Over a Firmware Update:

In the scenario of dynamic CXL region creation, the association between
the memory's HPA range and its corresponding NUMA node is established
when the kernel driver performs the commit operation.  This is a
runtime, OS-managed operation where the platform firmware cannot
intervene to provide a fix.

Depending on the hardware platform architecture and the available
memory resources, such a physical address layout can indeed occur.
This patch does not introduce risk; it only corrects the NUMA node
assignment for CXL RAM regions within such a physical address layout.

Thus, I believe a kernel fix is necessary.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Signed-off-by: Cui Chao <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Wang Yinfeng <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Gregory Price <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Wang Yinfeng <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

 mm/numa_memblks.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/numa_memblks.c~mm-numa_memblks-identify-the-accurate-numa-id-of-cfmw
+++ a/mm/numa_memblks.c
@@ -570,15 +570,16 @@ static int meminfo_to_nid(struct numa_me
 int phys_to_target_node(u64 start)
 {
        int nid = meminfo_to_nid(&numa_meminfo, start);
+       int reserved_nid = meminfo_to_nid(&numa_reserved_meminfo, start);
 
        /*
         * Prefer online nodes, but if reserved memory might be
         * hot-added continue the search with reserved ranges.
         */
-       if (nid != NUMA_NO_NODE)
+       if (nid != NUMA_NO_NODE && reserved_nid == NUMA_NO_NODE)
                return nid;
 
-       return meminfo_to_nid(&numa_reserved_meminfo, start);
+       return reserved_nid;
 }
 EXPORT_SYMBOL_GPL(phys_to_target_node);
 
_
