Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus. However, when retrieving IOMMU ops for a device using pci_device_get_iommu_bus_devfn(), the function checks the parent_dev and fetches IOMMU ops from the parent device, even if the current bus does not have any associated IOMMU ops.
This behavior works for now because QEMU's IOMMU implementations are globally scoped, and host bridges rely on the bypass_iommu property to skip IOMMU translation when needed. However, this model will break with the soon to be introduced arm-smmuv3 device, which allows users to associate the IOMMU with a specific PCIe root complex (e.g., the default pcie.0 or a pxb-pcie root complex). For example, consider the following setup with multiple root complexes: -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \ ... -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.port1,bus=pcie.1 \ -device virtio-net-pci,bus=pcie.port1 In Qemu, pxb-pcie acts as a special root complex whose parent is effectively the default root complex(pcie.0). Hence, though pcie.1 has no associated SMMUv3 as per above, pci_device_get_iommu_bus_devfn() will incorrectly return the IOMMU ops from pcie.0 due to the fallback via parent_dev. To fix this, introduce a new helper pci_setup_iommu_per_bus() that explicitly sets the new iommu_per_bus field in the PCIBus structure. Update pci_device_get_iommu_bus_devfn() to use this when determining the correct IOMMU ops, ensuring accurate behavior for per-bus IOMMUs. Signed-off-by: Shameer Kolothum <shameerali.kolothum.th...@huawei.com> --- Please refer cover letter for more details on the issue that this is trying to fix. --- hw/pci/pci.c | 25 +++++++++++++++++++++++++ include/hw/pci/pci.h | 2 ++ include/hw/pci/pci_bus.h | 1 + 3 files changed, 28 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index c70b5ceeba..e1940c05d9 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2909,6 +2909,19 @@ static void pci_device_get_iommu_bus_devfn(PCIDevice *dev, } } + /* + * When multiple PCI Express Root Buses are defined using pxb-pcie, + * the IOMMU configuration may be specific to each root bus. However, + * pxb-pcie acts as a special root complex whose parent is effectively + * the default root complex(pcie.0). Ensure that we retrieve the + * correct IOMMU ops(if any) in such cases. + */ + if (pci_bus_is_express(iommu_bus) && pci_bus_is_root(iommu_bus)) { + if (!iommu_bus->iommu_per_bus && parent_bus->iommu_per_bus) { + break; + } + } + iommu_bus = parent_bus; } @@ -3169,6 +3182,18 @@ void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque) bus->iommu_opaque = opaque; } +/* + * This is same as pci_setup_iommu() except it sets the iommu_per_bus + * to true indicating the iommu is specific to this bus and + * not applicable to any parent or child. + */ +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, + void *opaque) +{ + pci_setup_iommu(bus, ops, opaque); + bus->iommu_per_bus = true; +} + static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque) { Range *range = opaque; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index df3cc7b875..a3e0870a15 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -764,6 +764,8 @@ int pci_iommu_unregister_iotlb_notifier(PCIDevice *dev, uint32_t pasid, */ void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque); +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque); + pcibus_t pci_bar_address(PCIDevice *d, int reg, uint8_t type, pcibus_t size); diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h index 2261312546..c738446788 100644 --- a/include/hw/pci/pci_bus.h +++ b/include/hw/pci/pci_bus.h @@ -35,6 +35,7 @@ struct PCIBus { enum PCIBusFlags flags; const PCIIOMMUOps *iommu_ops; void *iommu_opaque; + bool iommu_per_bus; uint8_t devfn_min; uint32_t slot_reserved_mask; pci_set_irq_fn set_irq; -- 2.34.1