> -----Original Message----- > From: Eric Auger <eric.au...@redhat.com> > Sent: Friday, June 27, 2025 1:04 PM > To: Shameerali Kolothum Thodi > <shameerali.kolothum.th...@huawei.com>; qemu-...@nongnu.org; > qemu-devel@nongnu.org > Cc: peter.mayd...@linaro.org; j...@nvidia.com; nicol...@nvidia.com; > ddut...@redhat.com; berra...@redhat.com; imamm...@redhat.com; > nath...@nvidia.com; mo...@nvidia.com; smost...@google.com; > gustavo.rom...@linaro.org; Linuxarm <linux...@huawei.com>; Wangzhou > (B) <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>; > Jonathan Cameron <jonathan.came...@huawei.com>; > zhangfei....@linaro.org > Subject: Re: [PATCH v5 06/11] hw/pci: Introduce > pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval > > Hi Shameer, > > On 6/23/25 11:42 AM, Shameer Kolothum wrote: > > Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus. > > However, when retrieving IOMMU ops for a device using > > pci_device_get_iommu_bus_devfn(), the function checks the parent_dev > > and fetches IOMMU ops from the parent device, even if the current > > bus does not have any associated IOMMU ops. > > > > This behavior works for now because QEMU's IOMMU implementations > are > > globally scoped, and host bridges rely on the bypass_iommu property > > to skip IOMMU translation when needed. > > > > However, this model will break with the soon to be introduced > > arm-smmuv3 device, which allows users to associate the IOMMU > > with a specific PCIe root complex (e.g., the default pcie.0 > > or a pxb-pcie root complex). > > > > For example, consider the following setup with multiple root > > complexes: > > > > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \ > > ... > > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ > > -device pcie-root-port,id=pcie.port1,bus=pcie.1 \ > > -device virtio-net-pci,bus=pcie.port1 > > > > In Qemu, pxb-pcie acts as a special root complex whose parent is > > effectively the default root complex(pcie.0). Hence, though pcie.1 > > has no associated SMMUv3 as per above, > pci_device_get_iommu_bus_devfn() > > will incorrectly return the IOMMU ops from pcie.0 due to the fallback > > via parent_dev. > > > > To fix this, introduce a new helper pci_setup_iommu_per_bus() that > > explicitly sets the new iommu_per_bus field in the PCIBus structure. > > Update pci_device_get_iommu_bus_devfn() to use this when determining > > the correct IOMMU ops, ensuring accurate behavior for per-bus IOMMUs. > > > > Signed-off-by: Shameer Kolothum > <shameerali.kolothum.th...@huawei.com> > > --- > > Please refer cover letter for more details on the issue that > > this is trying to fix. > > --- > > hw/pci/pci.c | 25 +++++++++++++++++++++++++ > > include/hw/pci/pci.h | 2 ++ > > include/hw/pci/pci_bus.h | 1 + > > 3 files changed, 28 insertions(+) > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > index c70b5ceeba..e1940c05d9 100644 > > --- a/hw/pci/pci.c > > +++ b/hw/pci/pci.c > > @@ -2909,6 +2909,19 @@ static void > pci_device_get_iommu_bus_devfn(PCIDevice *dev, > > } > > } > > > > + /* > > + * When multiple PCI Express Root Buses are defined using pxb- > pcie, > > + * the IOMMU configuration may be specific to each root bus. > However, > > + * pxb-pcie acts as a special root complex whose parent is > effectively > > + * the default root complex(pcie.0). Ensure that we retrieve the > I generally leave a space before the opening bracket. Here and elsewhere
Ok. > > + * correct IOMMU ops(if any) in such cases. > > + */ > > + if (pci_bus_is_express(iommu_bus) && > pci_bus_is_root(iommu_bus)) { > > + if (!iommu_bus->iommu_per_bus && parent_bus- > >iommu_per_bus) { > > + break; > > + } > > + } > > + > > iommu_bus = parent_bus; > > } > > > > @@ -3169,6 +3182,18 @@ void pci_setup_iommu(PCIBus *bus, const > PCIIOMMUOps *ops, void *opaque) > > bus->iommu_opaque = opaque; > > } > > > > +/* > > + * This is same as pci_setup_iommu() except it sets the iommu_per_bus > > + * to true indicating the iommu is specific to this bus and > > + * not applicable to any parent or child. > or child? if there are root ports below, doesn't it protect them as well? Yeah. It will cover root ports. Actually I meant the root buses and specifically the pci.0 and pxb-pcie special relationship here. I will update and make it clear. > > Besides looks OK to me. However I would encourage you to add pci > maintainers (MST) in to. Sure. > besides, > Reviewed-by: Eric Auger <eric.au...@redhat.com> Thanks, Shameer > > Eric > > + */ > > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, > > + void *opaque) > > +{ > > + pci_setup_iommu(bus, ops, opaque); > > + bus->iommu_per_bus = true; > > +} > > + > > static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque) > > { > > Range *range = opaque; > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h > > index df3cc7b875..a3e0870a15 100644 > > --- a/include/hw/pci/pci.h > > +++ b/include/hw/pci/pci.h > > @@ -764,6 +764,8 @@ int pci_iommu_unregister_iotlb_notifier(PCIDevice > *dev, uint32_t pasid, > > */ > > void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void > *opaque); > > > > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops, > void *opaque); > > + > > pcibus_t pci_bar_address(PCIDevice *d, > > int reg, uint8_t type, pcibus_t size); > > > > diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h > > index 2261312546..c738446788 100644 > > --- a/include/hw/pci/pci_bus.h > > +++ b/include/hw/pci/pci_bus.h > > @@ -35,6 +35,7 @@ struct PCIBus { > > enum PCIBusFlags flags; > > const PCIIOMMUOps *iommu_ops; > > void *iommu_opaque; > > + bool iommu_per_bus; > > uint8_t devfn_min; > > uint32_t slot_reserved_mask; > > pci_set_irq_fn set_irq; >