> -----Original Message-----
> From: Nicolin Chen <nicol...@nvidia.com>
> Sent: Thursday, July 10, 2025 5:56 PM
> To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
> Cc: Donald Dutile <ddut...@redhat.com>; qemu-...@nongnu.org; qemu-
> de...@nongnu.org; eric.au...@redhat.com; peter.mayd...@linaro.org;
> j...@nvidia.com; berra...@redhat.com; imamm...@redhat.com;
> nath...@nvidia.com; mo...@nvidia.com; smost...@google.com;
> gustavo.rom...@linaro.org; m...@redhat.com;
> marcel.apfelb...@gmail.com; Linuxarm <linux...@huawei.com>;
> Wangzhou (B) <wangzh...@hisilicon.com>; jiangkunkun
> <jiangkun...@huawei.com>; Jonathan Cameron
> <jonathan.came...@huawei.com>; zhangfei....@linaro.org
> Subject: Re: [PATCH v7 07/12] hw/pci: Introduce
> pci_setup_iommu_per_bus() for per-bus IOMMU ops retrieval
> 
> On Thu, Jul 10, 2025 at 04:21:41PM +0000, Shameerali Kolothum Thodi
> wrote:
> > > >> On Wed, Jul 09, 2025 at 08:20:35AM +0000, Shameerali Kolothum
> Thodi
> > > >> wrote:
> > > >>>> On Tue, Jul 08, 2025 at 04:40:50PM +0100, Shameer Kolothum
> wrote:
> > > >>>>> @@ -2909,6 +2909,19 @@ static void
> > > >>>> pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> > > >>>>>               }
> > > >>>>>           }
> > > >>>>>
> > > >>>>> +        /*
> > > >>>>> +         * When multiple PCI Express Root Buses are defined using
> > > >>>>> + pxb-
> > > >>>> pcie,
> > > >>>>> +         * the IOMMU configuration may be specific to each root
> bus.
> > > >>>> However,
> > > >>>>> +         * pxb-pcie acts as a special root complex whose parent
> > > >>>>> + is
> > > >>>> effectively
> > > >>>>> +         * the default root complex(pcie.0). Ensure that we 
> > > >>>>> retrieve
> the
> > > >>>>> +         * correct IOMMU ops(if any) in such cases.
> > > >>>>> +         */
> > > >>>>> +        if (pci_bus_is_express(iommu_bus) &&
> > > >>>> pci_bus_is_root(iommu_bus)) {
> > > >>>>> +            if (!iommu_bus->iommu_per_bus && parent_bus-
> > > >>>>> iommu_per_bus) {
> > > >>>>> +                break;
> > > >>>>
> > > >>>> Mind elaborating why the current bus must unset iommu_per_bus
> > > >> while
> > > >>>> its parent sets iommu_per_bus?
> > > >>>>
> > > >>>> My understanding is that for a pxb-pcie we should set
> > > iommu_per_bus
> > > >>>> but for its parent (the default root complex) we should unset its
> > > >>>> iommu_per_bus?
> > > >>>
> > > >>> Well, for new arm-smmuv3 dev you need an associated pcie root
> > > >>> complex. Either the default pcie.0 or a pxb-pcie one. And as I
> > > >>> mentioned in my reply to the other thread(patch #2) and commit
> log
> > > >> here,
> > > >>> the pxb-pcie is special extra root complex in Qemu which has pcie.0
> > > >>> has parent bus.
> > > >>>
> > > >>> The above pci_device_get_iommu_bus_devfn() at present, iterate
> over
> > > >> the
> > > >>> parent_dev if it is set and returns the parent_bus IOMMU ops even
> if
> > > >>> the associated pxb-pcie bus doesn't have any IOMMU. This creates
> > > >>> problem for a case that is described here in the cover letter here,
> > > >>> https://lore.kernel.org/qemu-devel/20250708154055.101012-1-
> > > >> shameerali.kolothum.th...@huawei.com/
> > > >>>
> > > >>> (Please see "Major changes from v4:" section)
> > > >>>
> > > >>> To address that issue, this patch introduces an new helper function
> > > >>> to
> > > >> specify that
> > > >>> the IOMMU ops are specific to the associated root
> > > >> complex(iommu_per_bus) and
> > > >>> use that to return the correct IOMMU ops.
> > > >>>
> > > >>> Hope with that context it is clear now.
> > > >>
> > > >> Hmm, I was not questioning the context, I get what the patch is
> > > >> supposed to do.
> > > >>
> > > >> I was asking the logic that is unclear to me why it breaks when:
> > > >>      !pxb-pcie->iommu_per_bus && pcie.0->iommu_per_bus
> > > >>
> > > >> Or in which case pcie.0 would be set to iommu_per_bus=true?
> > > >
> > > > Yes. Consider the example I gave in cover  letter,
> > > >
> > > > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ -device
> > > > virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \ -device
> > > > pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \ -device
> > > > arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \ -device
> > > > pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \ -device
> > > > virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1
> > > >
> > > > pcie.0 is behind new SMMUv3 dev(smmuv3.1) and has
> iommu_per_bus
> > > set.
> > > > pcie.1 has no SMMv3U and iommu_per_bus is not set for it.
> > > pcie.1 doesn't?   then what is this line saying/meaning?:
> > >   -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.2 \
> > >
> > > I read that as an smmuv3 attached to pcie.1, with an id of smmuv3.2;
> just
> > > as I read this config:
> > >   -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \ as an smmuv3
> > > attached to pcie.0 iwth id smmuv3.1
> >
> > Oops..I forgot to delete that from the config:
> > This is what I meant,
> >
> > -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.1 \
> > -device virtio-net-pci,bus=pcie.0,netdev=net0,id=virtionet.0 \
> > -device pxb-pcie,id=pcie.1,bus_nr=8,bus=pcie.0 \
> > -device pcie-root-port,id=pcie.port1,chassis=2,bus=pcie.1 \
> > -device virtio-net-pci,bus=pcie.port1,netdev=net1,id=virtionet.1 \
> 
> So, the logic is trying to avoid:
>         "iommu_bus = parent_bus;"
> for a case that parent_bus (pcie.0) having its own IOMMU.
> 
> But shouldn't it be just "if (parent_bus->iommu_per_bus)"?
> 
> Why does the current iommu_bus->iommu_per_bus has to be unset?

I think that !iommu_bus->iommu_per_bus check will be always true as 
it enters the while loop only if !iommu_bus->iommu_ops case,

while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {

}

So yes, I think that can be removed. I will double check though as I cant
recollect why I added that now.

> 
> I think "iommu_bus = parent_bus" should be avoided too even if
> the current iommu_bus has its own IOMMU, i.e. iommu_per_bus is
> set?

Why? Not clear to me. It only enters the loop if the current iommu_bus
doesn't have Iommu_ops set which in turn means iommu_per_bus is not set. 
Isn't it? . Do you have a particular configuration in mind where it will fail
otherwise?

Thanks,
Shameer

Reply via email to