Public bug reported:

This upstream (v6.16) fix resolves an issue that occurs when pinning a
memfd folio before it has been faulted in, which leads to a crash when
CONFIG_DEBUG_VM is enabled, or to a resv_huge_pages accounting error when
that kconfig is not present. Contiguous memory is required for the vCMDQ
feature on Grace, and one way of achieving that is to back the VM memory
with huge pages.
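
A quick way to tell which of the two failure modes a given host will hit is
to check the kernel config for CONFIG_DEBUG_VM (the path below assumes
Ubuntu's usual /boot/config-$(uname -r) layout):
# grep CONFIG_DEBUG_VM /boot/config-$(uname -r)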

While testing PR 179 with the 4k host kernel and a QEMU branch with the
pluggable SMMUv3 interface, I found that the VM would exhibit symptoms of the
vCMDQ not being backed by contiguous memory:
[    0.377799] acpi NVDA200C:00: tegra241_cmdqv: unexpected error reported. vintf_map: 0000000000000001, vcmdq_map 00000000:00000000:00000000:00000002
[    0.379174] arm-smmu-v3 arm-smmu-v3.0.auto: CMDQ error (cons 0x04000000): Unknown
[    0.379954] arm-smmu-v3 arm-smmu-v3.0.auto: skipping command in error state:
[    0.380632] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0001000000000011
[    0.381147] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000

When this occurred, I noticed that the huge page metadata did not match 
expectations. Notably, it showed that an extra 16G of hugepages was being used 
and also reflected a negative “in reserve” count, indicating an underflow 
condition.
# grep -i hugep /proc/meminfo 
AnonHugePages:     69632 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:      64
HugePages_Free:       32
HugePages_Rsvd:    18446744073709551600
HugePages_Surp:        0
Hugepagesize:    1048576 kB
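
For reference, the HugePages_Rsvd value above is the reservation counter
wrapped around zero: interpreted as an unsigned 64-bit number it corresponds
to an underflow of 16, consistent with the 16 1G pages backing the 16G VM.
Arbitrary-precision arithmetic (bc here, just to illustrate the wrap) shows
the magnitude:
# echo '2^64 - 18446744073709551600' | bc
16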

After instrumenting the kernel, I was able to confirm the underflow and
then found this upstream fix. The instrumentation also showed that the newer
QEMU branch makes more calls to memfd_pin_folios() during GPU VFIO setup,
which is what triggers the bug in the kernel; I never saw this bug with the
older QEMU branch we have been using for quite some time for Grace
virtualization. After applying the fix, I no longer see the bad huge
page metadata, and the vCMDQ feature works properly with the 4k host
kernel.
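
For anyone trying to reproduce this, a rough way to count how often QEMU ends
up in memfd_pin_folios() during guest setup (assuming bpftrace is installed
and the symbol can be kprobed on the running kernel; this is only a sketch,
not the instrumentation used above) is:
# bpftrace -e 'kprobe:memfd_pin_folios { @calls[comm] = count(); }'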

Lore discussion:
https://lkml.kernel.org/r/[email protected]
Upstream SHA: eb920662230f ("mm/hugetlb: don't crash when allocating a folio
if there are no resv")

This commit cherry-picked cleanly to 24.04_linux-nvidia-6.14-next.

Testing:
GPU passthrough (PT) on a 4k host with more huge pages than the VM requires
(e.g. 32 1G huge pages for a 16G VM; see the host setup sketch below)
QEMU: https://github.com/nvmochs/QEMU/tree/smmuv3-accel-07212025_egm
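
The command below expects 1G huge pages reserved on the host and hugetlbfs
mounted at /hugepages (the mem-path used by the memory backends). A minimal
setup sketch, assuming the standard sysfs hugepage interface:
# echo 32 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
# mkdir -p /hugepages
# mount -t hugetlbfs -o pagesize=1G none /hugepages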

qemu-system-aarch64 \
        -object iommufd,id=iommufd0 \
        -machine hmat=on -machine virt,accel=kvm,gic-version=3,ras=on,highmem-mmio-size=512G \
        -cpu host -smp cpus=4 -m size=16G,slots=2,maxmem=66G -nographic \
        -object memory-backend-file,size=8G,id=m0,mem-path=/hugepages/,prealloc=on,share=off \
        -object memory-backend-file,size=8G,id=m1,mem-path=/hugepages/,prealloc=on,share=off \
        -numa node,memdev=m0,cpus=0-3,nodeid=0 -numa node,memdev=m1,nodeid=1 \
        -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 -numa node,nodeid=5 \
        -numa node,nodeid=6 -numa node,nodeid=7 -numa node,nodeid=8 -numa node,nodeid=9 \
        -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1,accel=on,cmdqv=on \
        -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,io-reserve=0 \
        -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.port1,rombar=0,id=dev0,iommufd=iommufd0 \
        -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
        -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
        -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
        -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
        -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
        -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
        -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
        -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
        -bios /usr/share/AAVMF/AAVMF_CODE.fd \
        -device nvme,drive=nvme0,serial=deadbeaf1,bus=pcie.0 \
        -drive file=guest.qcow2,index=0,media=disk,format=qcow2,if=none,id=nvme0 \
        -device e1000,romfile=/usr/local/share/qemu/efi-e1000.rom,netdev=net0,bus=pcie.0 \
        -netdev user,id=net0,hostfwd=tcp::5558-:22,hostfwd=tcp::5586-:5586
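
After the guest boots with the fix applied, the host-side counters can be
re-checked the same way to confirm HugePages_Rsvd no longer shows a wrapped
(negative) value:
# grep -i hugep /proc/meminfo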

** Affects: linux-nvidia-6.14 (Ubuntu)
     Importance: Undecided
         Status: New
