> -----Original Message-----
> From: Eric Auger <eric.au...@redhat.com>
> Sent: Friday, June 27, 2025 1:04 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.th...@huawei.com>; qemu-...@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.mayd...@linaro.org; j...@nvidia.com; nicol...@nvidia.com;
> ddut...@redhat.com; berra...@redhat.com; imamm...@redhat.com;
> nath...@nvidia.com; mo...@nvidia.com; smost...@google.com;
> gustavo.rom...@linaro.org; Linuxarm <linux...@huawei.com>; Wangzhou
> (B) <wangzh...@hisilicon.com>; jiangkunkun <jiangkun...@huawei.com>;
> Jonathan Cameron <jonathan.came...@huawei.com>;
> zhangfei....@linaro.org
> Subject: Re: [PATCH v5 06/11] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> Hi Shameer,
> 
> On 6/23/25 11:42 AM, Shameer Kolothum wrote:
> > Currently, pci_setup_iommu() registers IOMMU ops for a given PCIBus.
> > However, when retrieving IOMMU ops for a device using
> > pci_device_get_iommu_bus_devfn(), the function checks the parent_dev
> > and fetches IOMMU ops from the parent device, even if the current
> > bus does not have any associated IOMMU ops.
> >
> > This behavior works for now because QEMU's IOMMU implementations
> are
> > globally scoped, and host bridges rely on the bypass_iommu property
> > to skip IOMMU translation when needed.
> >
> > However, this model will break with the soon to be introduced
> > arm-smmuv3 device, which allows users to associate the IOMMU
> > with a specific PCIe root complex (e.g., the default pcie.0
> > or a pxb-pcie root complex).
> >
> > For example, consider the following setup with multiple root
> > complexes:
> >
> > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 \
> > ...
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,bus=pcie.1 \
> > -device virtio-net-pci,bus=pcie.port1
> >
> > In Qemu, pxb-pcie acts as a special root complex whose parent is
> > effectively the default root complex(pcie.0). Hence, though pcie.1
> > has no associated SMMUv3 as per above,
> pci_device_get_iommu_bus_devfn()
> > will incorrectly return the IOMMU ops from pcie.0 due to the fallback
> > via parent_dev.
> >
> > To fix this, introduce a new helper pci_setup_iommu_per_bus() that
> > explicitly sets the new iommu_per_bus field in the PCIBus structure.
> > Update pci_device_get_iommu_bus_devfn() to use this when determining
> > the correct IOMMU ops, ensuring accurate behavior for per-bus IOMMUs.
> >
> > Signed-off-by: Shameer Kolothum
> <shameerali.kolothum.th...@huawei.com>
> > ---
> > Please refer cover letter for more details on the issue that
> > this is trying to fix.
> > ---
> >  hw/pci/pci.c             | 25 +++++++++++++++++++++++++
> >  include/hw/pci/pci.h     |  2 ++
> >  include/hw/pci/pci_bus.h |  1 +
> >  3 files changed, 28 insertions(+)
> >
> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > index c70b5ceeba..e1940c05d9 100644
> > --- a/hw/pci/pci.c
> > +++ b/hw/pci/pci.c
> > @@ -2909,6 +2909,19 @@ static void
> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> >              }
> >          }
> >
> > +        /*
> > +         * When multiple PCI Express Root Buses are defined using pxb-
> pcie,
> > +         * the IOMMU configuration may be specific to each root bus.
> However,
> > +         * pxb-pcie acts as a special root complex whose parent is
> effectively
> > +         * the default root complex(pcie.0). Ensure that we retrieve the
> I generally leave a space before the opening bracket. Here and elsewhere

Ok.

> > +         * correct IOMMU ops(if any) in such cases.
> > +         */
> > +        if (pci_bus_is_express(iommu_bus) &&
> pci_bus_is_root(iommu_bus)) {
> > +            if (!iommu_bus->iommu_per_bus && parent_bus-
> >iommu_per_bus) {
> > +                break;
> > +            }
> > +        }
> > +
> >          iommu_bus = parent_bus;
> >      }
> >
> > @@ -3169,6 +3182,18 @@ void pci_setup_iommu(PCIBus *bus, const
> PCIIOMMUOps *ops, void *opaque)
> >      bus->iommu_opaque = opaque;
> >  }
> >
> > +/*
> > + * This is same as pci_setup_iommu() except it sets the iommu_per_bus
> > + * to true indicating the iommu is specific to this bus and
> > + * not applicable to any parent or child.
> or child? if there are root ports below, doesn't it protect them as well?

Yeah. It will cover root ports. Actually I meant the root buses and specifically
the pci.0 and pxb-pcie special relationship here. I will update and make it 
clear.

> 
> Besides looks OK to me. However I would encourage you to add pci
> maintainers (MST) in to.

Sure.

> besides,
> Reviewed-by: Eric Auger <eric.au...@redhat.com>

Thanks,
Shameer

 
> 
> Eric
> > + */
> > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops,
> > +                             void *opaque)
> > +{
> > +    pci_setup_iommu(bus, ops, opaque);
> > +    bus->iommu_per_bus = true;
> > +}
> > +
> >  static void pci_dev_get_w64(PCIBus *b, PCIDevice *dev, void *opaque)
> >  {
> >      Range *range = opaque;
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index df3cc7b875..a3e0870a15 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -764,6 +764,8 @@ int pci_iommu_unregister_iotlb_notifier(PCIDevice
> *dev, uint32_t pasid,
> >   */
> >  void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void
> *opaque);
> >
> > +void pci_setup_iommu_per_bus(PCIBus *bus, const PCIIOMMUOps *ops,
> void *opaque);
> > +
> >  pcibus_t pci_bar_address(PCIDevice *d,
> >                           int reg, uint8_t type, pcibus_t size);
> >
> > diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
> > index 2261312546..c738446788 100644
> > --- a/include/hw/pci/pci_bus.h
> > +++ b/include/hw/pci/pci_bus.h
> > @@ -35,6 +35,7 @@ struct PCIBus {
> >      enum PCIBusFlags flags;
> >      const PCIIOMMUOps *iommu_ops;
> >      void *iommu_opaque;
> > +    bool iommu_per_bus;
> >      uint8_t devfn_min;
> >      uint32_t slot_reserved_mask;
> >      pci_set_irq_fn set_irq;
> 

Reply via email to