> > Based on the suggestions here, can we consider something like the
> > following?
> >
> > 1. Introduce a new -numa subparam 'devnode', which tells Qemu to mark
> > the node with MEM_AFFINITY_HOTPLUGGABLE in the SRAT's memory affinity
> > structure to make it hotpluggable.
>
> Is that "devnode=on" parameter required? Can't we simply expose any node
> that does *not* have any boot memory assigned as MEM_AFFINITY_HOTPLUGGABLE?
>
> Right now, with "ordinary", fixed-location memory devices
> (DIMM/NVDIMM/virtio-mem/virtio-pmem), we create an srat entry that
> covers the device memory region for these devices with
> MEM_AFFINITY_HOTPLUGGABLE. We use the highest NUMA node in the machine,
> which does not quite work IIRC. All applicable nodes that don't have
> boot memory would need MEM_AFFINITY_HOTPLUGGABLE for Linux to create them.

Yeah, you're right that it isn't required. Exposing any node without boot
memory as MEM_AFFINITY_HOTPLUGGABLE seems like a better approach than
using "devnode=on".
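If we go that route, something along the following lines in the SRAT build
path should be all that is needed. This is only a rough, untested sketch;
the helper name and the call site (e.g. from build_srat() in
hw/i386/acpi-build.c) are placeholders:

/*
 * Rough sketch, untested: give every NUMA node without boot memory a
 * zero base/size SRAT entry flagged hotpluggable, mirroring what the
 * bare-metal firmware exposes.  Helper name and call site are made up.
 */
#include "qemu/osdep.h"
#include "hw/boards.h"
#include "hw/acpi/aml-build.h"
#include "sysemu/numa.h"

static void build_srat_memoryless_nodes(GArray *table_data, MachineState *ms)
{
    int i;

    for (i = 0; i < ms->numa_state->num_nodes; i++) {
        if (ms->numa_state->nodes[i].node_mem) {
            continue;   /* node has boot memory; covered by existing entries */
        }
        build_srat_memory(table_data, 0, 0, i,
                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
    }
}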
> In your example, which memory ranges would we use for these nodes in SRAT?

We are setting the Base Address and the Size to 0 in the SRAT memory
affinity structures. This is done through the following:

    build_srat_memory(table_data, 0, 0, i,
                      MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);

This results in the following logs in the VM from the Linux ACPI SRAT
parsing code:

[    0.000000] ACPI: SRAT: Node 2 PXM 2 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 3 PXM 3 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 4 PXM 4 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 5 PXM 5 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 6 PXM 6 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 7 PXM 7 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 8 PXM 8 [mem 0x00000000-0xffffffffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 9 PXM 9 [mem 0x00000000-0xffffffffffffffff] hotplug

I would reiterate that we are just emulating the bare-metal behavior here.

> I don't see how these numa-node args on a vfio-pci device have any
> general utility. They're only used to create a firmware table, so why
> don't we be explicit about it and define the firmware table as an
> object? For example:
>
>     -numa node,nodeid=2 \
>     -numa node,nodeid=3 \
>     -numa node,nodeid=4 \
>     -numa node,nodeid=5 \
>     -numa node,nodeid=6 \
>     -numa node,nodeid=7 \
>     -numa node,nodeid=8 \
>     -numa node,nodeid=9 \
>     -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=nvgrace0 \
>     -object nvidia-gpu-mem-acpi,devid=nvgrace0,nodeset=2-9 \

Yeah, that is fine with me. If we agree on this approach, I can go
implement it.

> There are some suggestions in this thread that CXL could have similar
> requirements, but I haven't found any evidence that these
> dev-mem-pxm-{start,count} attributes in the _DSD are standardized in
> any way. If they are, maybe this would be a dev-mem-pxm-acpi object
> rather than an NVIDIA specific one.

Maybe Jason or Jonathan can chime in on this?

> It seems like we could almost meet the requirement for this table via
> -acpitable, but I think we'd like to avoid the VM orchestration tool
> from creating, compiling, and passing ACPI data blobs into the VM.
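Coming back to the nvidia-gpu-mem-acpi object above: the _DSD it would
have to generate should be small. A rough, untested sketch using QEMU's
AML builder, assuming the dev-mem-pxm-{start,count} property names from
the bare-metal firmware (the function name and the wiring into the
proposed object are placeholders):

/*
 * Rough sketch, untested: emit a _DSD carrying the PXM range for the
 * device's coherent memory.  How this hooks into the proposed
 * nvidia-gpu-mem-acpi object is left open.
 */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"

static void build_dev_mem_pxm_dsd(Aml *dev, uint32_t pxm_start,
                                  uint32_t pxm_count)
{
    Aml *dsd = aml_package(2);
    Aml *props = aml_package(2);
    Aml *prop;

    /* Device Properties UUID from the _DSD spec */
    aml_append(dsd, aml_touuid("DAFFD814-6EBA-4D8C-8A91-BC9BBF4AA301"));

    prop = aml_package(2);
    aml_append(prop, aml_string("dev-mem-pxm-start"));
    aml_append(prop, aml_int(pxm_start));
    aml_append(props, prop);

    prop = aml_package(2);
    aml_append(prop, aml_string("dev-mem-pxm-count"));
    aml_append(prop, aml_int(pxm_count));
    aml_append(props, prop);

    aml_append(dsd, props);
    aml_append(dev, aml_name_decl("_DSD", dsd));
}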