Hi Shameer, On 6/23/25 11:42 AM, Shameer Kolothum wrote: > Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus. > However, when retrieving IOMMU ops for a device using > pci_device_get_iommu_bus_devfn(), the function checks the parent_dev > and fetches IOMMU ops from the parent device, even if the current > bus does not have any associated IOMMU ops. > > This behavior works for now because QEMU's IOMMU implementations are > globally scoped, and host bridges rely on the bypass_iommu property > to skip IOMMU translation when needed. > > However, this model will break with the soon to be introduced > arm-smmuv3 device, which allows users to associate the IOMMU > with a specific PCIe root complex (e.g., the default pcie.0 > or a pxb-pcie root complex). > > For example, consider the following setup with multiple root > complexes: > > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \ > ... > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > -device pcie-root-port,id=pcie.port1,bus=pcie.1 \ > -device virtio-net-pci,bus=pcie.port1 > > In Qemu, pxb-pcie acts as a special root complex whose parent is > effectively the default root complex(pcie.0). Hence, though pcie.1 > has no associated SMMUv3 as per above, pci_device_get_iommu_bus_devfn() > will incorrectly return the IOMMU ops from pcie.0 due to the fallback > via parent_dev. > > To fix this, introduce a new helper pci_setup_iommu_per_bus() that > explicitly sets the new iommu_per_bus field in the PCIBus structure. > Update pci_device_get_iommu_bus_devfn() to use this when determining > the correct IOMMU ops, ensuring accurate behavior for per-bus IOMMUs. > > Signed-off-by: Shameer Kolothum <shameerali.kolothum.th...@huawei.com> > --- > Please refer cover letter for more details on the issue that > this is trying to fix. > --- > hw/pci/pci.c | 25 +++++++++++++++++++++++++ > include/hw/pci/pci.h | 2 ++ > include/hw/pci/pci_bus.h | 1 + > 3 files changed, 28 insertions(+) > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > index c70b5ceeba..e1940c05d9 100644 > --- a/hw/pci/pci.c > +++ b/hw/pci/pci.c > @@ -2909,6 +2909,19 @@ static void pci_device_get_iommu_bus_devfn(PCIDevice > *dev, > } > } > > + /* > + * When multiple PCI Express Root Buses are defined using pxb-pcie, > + * the IOMMU configuration may be specific to each root bus. However, > + * pxb-pcie acts as a special root complex whose parent is > effectively > + * the default root complex(pcie.0). Ensure that we retrieve the I generally leave a space before the opening bracket. Here and elsewhere > + * correct IOMMU ops(if any) in such cases. > + */ > + if (pci_bus_is_express(iommu_bus) && pci_bus_is_root(iommu_bus)) { > + if (!iommu_bus->iommu_per_bus && parent_bus->iommu_per_bus) { > + break; > + } > + } > + > iommu_bus = parent_bus; > } > > @@ -3169,6 +3182,18 @@ void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps > *ops, void *opaque) > bus->iommu_opaque = opaque; > } > > +/* > + * This is same as pci_setup_iommu() except it sets the iommu_per_bus > + * to true indicating the iommu is specific to this bus and > + * not applicable to any parent or child. or child? if there are root ports below, doesn't it protect them as well?
Besides looks OK to me. However I would encourage you to add pci maintainers (MST) in to. besides, Reviewed-by: Eric Auger <eric.au...@redhat.com> Eric > + */ > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, > + void *opaque) > +{ > + pci_setup_iommu(bus, ops, opaque); > + bus->iommu_per_bus = true; > +} > + > static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque) > { > Range *range = opaque; > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h > index df3cc7b875..a3e0870a15 100644 > --- a/include/hw/pci/pci.h > +++ b/include/hw/pci/pci.h > @@ -764,6 +764,8 @@ int pci_iommu_unregister_iotlb_notifier(PCIDevice *dev, > uint32_t pasid, > */ > void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque); > > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, void > *opaque); > + > pcibus_t pci_bar_address(PCIDevice *d, > int reg, uint8_t type, pcibus_t size); > > diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h > index 2261312546..c738446788 100644 > --- a/include/hw/pci/pci_bus.h > +++ b/include/hw/pci/pci_bus.h > @@ -35,6 +35,7 @@ struct PCIBus { > enum PCIBusFlags flags; > const PCIIOMMUOps *iommu_ops; > void *iommu_opaque; > + bool iommu_per_bus; > uint8_t devfn_min; > uint32_t slot_reserved_mask; > pci_set_irq_fn set_irq;