Hi Sairaj,

On 1/7/26 1:09 AM, Sairaj Kodilkar wrote:
> Hello all,
>
> Gentle ping,
>
I mentioned privately that I am investigating the main
limitations/alternatives that are listed below, but I don't yet have any
suggestions as to the best way to move forward. So while I am still
hoping for feedback from others on the list, I'll reply to this series
as if it will be merged using the current approach.

Thank you,
Alejandro

> On 11/18/2025 3:45 PM, Sairaj Kodilkar wrote:
>> Resending this series with KVM and IOMMU maintainers in CC.
>>
>> AMD IOMMU can route up to 2048 MSI vectors through a single
>> Interrupt Remapping Table (IRT) entry. This series brings the same
>> capability to the emulated AMD IOMMU in QEMU.
>>
>> Highlights
>> ----------
>> * Sets bits [9:8] in Extended-Feature-Register-2 to advertise 2K MSI
>>   support to the guest.
>> * Uses bits [10:0] of the MSI data to select the IRTE when the guest
>>   programs MSIs in logical-destination mode.
>> * Introduces a new IOMMU device property:
>>     -device amd-iommu,...,numint2k=on
>>
>> The feature is **opt-in**; guests keep the 512-MSI behaviour unless
>> `numint2k=on` is supplied.
>>
>> Passthrough devices
>> -------------------
>> When a PCI function is passed through via iommufd the code checks the
>> host’s vendor capabilities. If the host IOMMU has not enabled
>> 2K-MSI support (bits [44:43] set in the control register) the guest
>> feature is disabled even if `numint2k=on` was requested.
>>
>> The detection logic relies on the iommufd interface; with the legacy
>> VFIO container the guest always falls back to 512 MSIs.
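To make sure I am reading the two sections above correctly, here is a
minimal sketch of the behaviour as I understand it from the cover letter.
It is not taken from the patches. All names are hypothetical, the value
01b for "2K enabled" in control register bits [44:43] is an assumption,
and the 512-vector mask of bits [8:0] is inferred from 512 = 2^9. The
sketch only covers the passthrough case and ignores the
logical-destination mode condition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions are the ones quoted in the cover letter. */
#define CTRL_NUMINT_SHIFT   43        /* control register bits [44:43] */
#define CTRL_NUMINT_MASK    0x3ULL
#define NUMINT_MODE_2K      0x1ULL    /* assumed encoding for "2K enabled" */

#define IRTE_INDEX_MASK_512 0x1ffu    /* MSI data bits [8:0], inferred */
#define IRTE_INDEX_MASK_2K  0x7ffu    /* MSI data bits [10:0] */

/* True if the host control register reports 2K interrupt remapping. */
static bool host_has_2k(uint64_t host_ctrl)
{
    return ((host_ctrl >> CTRL_NUMINT_SHIFT) & CTRL_NUMINT_MASK)
           == NUMINT_MODE_2K;
}

/*
 * Effective guest setting for a passthrough device:
 *  - requested:    numint2k=on was given on the command line
 *  - have_hw_info: host info was obtained through iommufd
 *                  (false models the legacy VFIO container)
 */
static bool guest_numint2k(bool requested, bool have_hw_info,
                           uint64_t host_ctrl)
{
    if (!requested || !have_hw_info) {
        return false;                 /* fall back to 512 MSIs */
    }
    return host_has_2k(host_ctrl);
}

/* Select the IRTE index from the MSI data for the active mode. */
static uint32_t irte_index(uint32_t msi_data, bool numint2k)
{
    return msi_data & (numint2k ? IRTE_INDEX_MASK_2K : IRTE_INDEX_MASK_512);
}
```

If this matches the intent, the only input that can turn the feature on
for a passthrough device is the host control register value, which is
what the second limitation further down is about.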
>>
>> Example
>> -------
>> qemu-system-x86_64 \
>>  -enable-kvm -m 10G -smp cpus=8 \
>>  -kernel /boot/vmlinuz \
>>  -initrd /boot/initrd.img \
>>  -append "console=ttyS0 earlyprintk=serial root=<DEVICE>" \
>>  -device amd-iommu,dma-remap=on,numint2k=on \
>>  -object iommufd,id=iommufd0 \
>>  -device vfio-pci,host=<DEVID>,iommufd=iommufd0 \
>>  -global kvm-pit.lost_tick_policy=discard \
>>  -cpu host \
>>  -machine q35,kernel_irqchip=split \
>>  -nographic \
>>  -smbios type=0,version=2.8 \
>>  -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=<IMAGE> \
>>  -device virtio-blk-pci,drive=drive0
>>
>> Limitations
>> -----------
>> This approach works well for features queried after IOMMUFD
>> initialization but cannot handle features needed during early QEMU
>> setup, before IOMMUFD is available.
>>
>> A key example is EFR2[HTRangeIgnore]. When this bit is set, the physical
>> IOMMU treats HyperTransport (HT) address ranges as regular memory
>> accesses rather than reserved regions. This has important implications
>> for memory layout:
>>
>> * Without HTRangeIgnore: QEMU must relocate RAM above 4G to above 1T on
>>   AMD platforms to avoid HT conflicts
>> * With HTRangeIgnore: QEMU can safely place RAM immediately above 4G,
>>   improving memory utilization
>>
>> Since RAM layout must be determined before IOMMUFD initialization, QEMU
>> cannot use hwinfo to query the EFR2[HTRangeIgnore] feature bit.
>>
>> Another limitation of using the control register is that, if the BIOS
>> enables a particular feature (e.g. ControlRegister[GCR3TRPMode]) without
>> kernel support, QEMU incorrectly assumes that the host kernel supports
>> that feature, potentially causing guest failure.
>>
>> Alternative considered
>> ----------------------
>> We also explored an alternate approach which uses a KVM capability,
>> "KVM_CAP_AMD_NUM_INT_2K_SUP", which the user can query to know if the
>> host kernel supports 2K MSIs. Similarly, this enables QEMU to detect the
>> presence of EFR2[HTRangeIgnore] during RAM initialization.
>>
>> Although the current implementation allows 2K MSI support only with
>> iommufd, it keeps the logic inside vfio/iommufd and avoids
>> modifying the KVM ABI. I am happy to discuss advantages and drawbacks of
>> both approaches.
>>
>> ------------------------------------------------------------------------
>>
>> The patches are based on top of bc831f37398b (qemu master). Additionally,
>> it requires a Linux kernel with patches[1] which expose the control
>> register via the IOMMU_GET_HW_INFO ioctl.
>>
>> [1] https://lore.kernel.org/linux-iommu/20251029095846.4486-1-[email protected]/
>>
>> ------------------------------------------------------------------------
>>
>> Sairaj Kodilkar (3):
>>   vfio/iommufd: Add amd specific hardware info struct to vendor
>>     capability
>>   amd_iommu: Add support for extended feature register 2
>>   amd_iommu: Add support for upto 2048 interrupts per IRT
>>
>> Suravee Suthikulpanit (2):
>>   [DO NOT MERGE] linux-headers: Introduce struct iommu_hw_info_amd
>>   amd-iommu: Add support for set/unset IOMMU for VFIO PCI devices
>>
>>  hw/i386/acpi-build.c               |   4 +-
>>  hw/i386/amd_iommu-stub.c           |   5 +
>>  hw/i386/amd_iommu.c                | 163 +++++++++++++++++++++++++++--
>>  hw/i386/amd_iommu.h                |  24 +++++
>>  include/system/host_iommu_device.h |   1 +
>>  linux-headers/linux/iommufd.h      |  20 ++++
>>  6 files changed, 207 insertions(+), 10 deletions(-)
>>
>
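On the HTRangeIgnore limitation quoted above, the ordering problem is
easier to see with the layout decision written out. This is a rough
sketch, not QEMU code: the function name is hypothetical, the HT hole is
assumed to be the 12 GiB just below 1 TiB, and relocation is assumed to
be needed only when above-4G RAM would otherwise reach into that hole.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB            (1ULL << 30)
#define ABOVE_4G_START (4 * GiB)
#define HT_HOLE_START  (1012 * GiB)   /* assumed start of the reserved HT range */
#define ABOVE_1T_START (1024 * GiB)

/*
 * Pick the guest-physical start of above-4G RAM.
 * ht_range_ignore stands for the host EFR2[HTRangeIgnore] bit, which
 * must already be known when this runs, i.e. before iommufd is set up.
 */
static uint64_t above_4g_ram_start(bool ht_range_ignore,
                                   uint64_t above_4g_size)
{
    if (ht_range_ignore) {
        return ABOVE_4G_START;        /* HT range is treated as plain memory */
    }
    if (ABOVE_4G_START + above_4g_size > HT_HOLE_START) {
        return ABOVE_1T_START;        /* would overlap the hole: relocate */
    }
    return ABOVE_4G_START;
}
```

The ht_range_ignore argument is the piece that cannot come from hwinfo
with the current approach, since this decision is made during RAM
initialization.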
