On Thu, 5 Feb 2026 18:10:55 -0500 Gregory Price <[email protected]> wrote:
> On Thu, Feb 05, 2026 at 02:58:42PM -0800, Andrew Morton wrote:
> > On Mon, 26 Jan 2026 17:06:52 +0800 Cui Chao <[email protected]>
> > wrote:
> >
> > > > All that said, this does look harmless, and seems reasonable - but the
> > > > changelog should reflect what the hardware is doing above.
> > > This issue was discovered on the QEMU platform. I need to apologize for
> > > my earlier imprecise statement (claiming it was hardware instead of
> > > QEMU). My core point at the time was to emphasize that this is a problem
> > > in the general code path when facing this scenario, not a QEMU-specific
> > > emulation issue, and therefore it could theoretically affect real
> > > hardware as well. I apologize for any confusion this may have caused.
> >
> > This patch doesn't sound very urgent. Perhaps we should do a v3 with
> > updated changelog and handle that in the next -rc cycle?
>
> Mostly QEMU just needs to add SRAT entries associated with the
> CEDT/CFMWS it adds.

Hi Gregory,

I got a bit carried away - but the following basically says:
No, QEMU should not add SRAT Memory Affinity Structures.

As to Andrew's question: I'm fine with this fix taking a little longer.

I disagree. There is nothing in the specification to say it should do
that and we have very intentionally not done so in QEMU - this is far
from the first time this has come up! We won't be doing so any time
soon unless someone convinces me with clear spec references and tight
reasoning for why it is the right thing to do.

The only time providing SRAT Memory Affinity Structures for CEDT CXL
Fixed Memory Window Structures (CFMWSs) is definitely the right thing to
do is if the BIOS has also programmed the full set of decoders etc.
That is something we could do in QEMU as an option. Only if we do that
would it be valid to provide SRAT Memory Affinity Structures for the CXL
memory.
I'd suggest that's probably a job for EDK2 rather than QEMU but that's
an implementation detail and there is a dance between EDK2 and QEMU for
creating some of the tables anyway. This configuration reflects the
pre-hotplug / early CXL deployment situation. Now we have proper support
in Linux we have moved beyond that. We do need to solve the dynamic NUMA
node cases though and I'm hoping your current work will make that a bit
easier.

Note that I give the same advice to the firmware folk I talk to. This
stuff is policy - it belongs in OS control, not in a bunch of config
menus in the BIOS or the output of some unknown heuristic. BIOS authors
are not clairvoyant. They have no way to know (in a non-trivial
topology) what makes sense for a given use case or what devices are
going to be hotplugged later. I'd increasingly expect shipping BIOSes to
have a "hands off" option in which they make no attempt to guess about
what is beyond the host bridges.

One argument I have heard for why a BIOS could know an appropriate CFMWS
to SRAT memory structure mapping is the CFMWS / QTG (QoS Throttling
Group) mapping implying a consistency of performance expectations in a
given CFMWS. However that's very specific to particular designs. For
others PA space is expensive, thus they use one large CFMWS for
everything and QoS handling in the uarch relies on information derived
from the host bridge decoders. Often no one cares about cross host
bridge interleave, for the same reason full system DRAM interleave is a
niche thing. PA space is too expensive to provide the extra CFMWS to
support it.

If we are looking at forward-looking systems that are built to work
with the full gamut of CXL, then all that should be in SRAT for the CXL
topology is the Generic Port Structures (these provide a handle for perf
data in HMAT to the edge of the 'known world' - the host bridge / root
ports). Nothing else.

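To make the case under discussion concrete, here is a rough sketch of
the bookkeeping involved when a CFMWS window is not (or only partly)
described by SRAT Memory Affinity Structures. It is a toy model and not
the kernel's code: the Range type, the function names and the "one new
node per window" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Half-open host physical address range [start, end)."""
    start: int
    end: int


def uncovered(window: Range, covered: list[Range]) -> list[Range]:
    """Return the parts of `window` that no range in `covered` describes."""
    gaps = []
    cursor = window.start
    for r in sorted(covered, key=lambda r: r.start):
        if r.end <= cursor:
            continue              # entirely below the part still unexplained
        if r.start >= window.end:
            break                 # entirely above the window
        if r.start > cursor:
            gaps.append(Range(cursor, r.start))
        cursor = max(cursor, r.end)
    if cursor < window.end:
        gaps.append(Range(cursor, window.end))
    return gaps


def assign_fallback_nodes(cfmws: list[Range], srat: list[Range],
                          first_free_node: int) -> dict[Range, int]:
    """Give the SRAT-less part of each CFMWS window a node id of its own.

    Hypothetical policy: one new node per window that has any gap.
    """
    assignments = {}
    node = first_free_node
    for window in cfmws:
        gaps = uncovered(window, srat)
        if gaps:
            for gap in gaps:
                assignments[gap] = node
            node += 1
    return assignments


# Assumed layout: SRAT describes 0-2 GiB of DRAM as node 0. One CFMWS sits
# fully outside SRAT, another overlaps the top half of the DRAM range.
srat = [Range(0x0, 0x8000_0000)]
cfmws = [Range(0x1_0000_0000, 0x2_0000_0000),
         Range(0x4000_0000, 0xC000_0000)]
result = assign_fallback_nodes(cfmws, srat, first_free_node=1)
```

The partially overlapping window is the awkward one: only the part SRAT
does not describe should end up on the new node, which is why the sketch
works on gaps rather than on whole windows.
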
If we do have BIOSes that are guessing what to put in SRAT and
associated HMAT etc. then there is a fairly strong argument that a good
OS implementation should at most take such structures as a hint, not a
rule (though obviously we don't do that today).

>
> A system providing a CEDT/CFMWS entry without an SRAT entry is arguably
> bad BIOS.

I'd argue that if you aren't programming the decoder topology (and
probably locking everything down) and are providing SRAT then you are
providing a guess at best and that's a bad BIOS - not the other way
around.

Note we've supported this from the first in Linux so it's not like there
is anything missing, just a corner case to tidy up.

>
> But yeah, this is not urgent.

I'd like to see it fixed, but given we don't know of a system where this
applies today it doesn't need to be super rushed!

Jonathan

>
> ~Gregory
