On Fri, Apr 26, 2024 at 03:10:14AM +0000, Duan, Zhenzhong wrote: > > > >-----Original Message----- > >From: Cédric Le Goater <[email protected]> > >Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do > >compatibility check with host IOMMU cap/ecap > > > >On 4/25/24 10:46, Duan, Zhenzhong wrote: > >> Hi Cédric, > >> > >>> -----Original Message----- > >>> From: Cédric Le Goater <[email protected]> > >>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do > >>> compatibility check with host IOMMU cap/ecap > >>> > >>> Hello Zhenzhong, > >>> > >>> On 4/18/24 10:42, Duan, Zhenzhong wrote: > >>>> Hi Cédric, > >>>> > >>>>> -----Original Message----- > >>>>> From: Cédric Le Goater <[email protected]> > >>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do > >>>>> compatibility check with host IOMMU cap/ecap > >>>>> > >>>>> Hello Zhenzhong > >>>>> > >>>>> On 4/17/24 11:24, Duan, Zhenzhong wrote: > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Cédric Le Goater <[email protected]> > >>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do > >>>>>>> compatibility check with host IOMMU cap/ecap > >>>>>>> > >>>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>>> -----Original Message----- > >>>>>>>>> From: Cédric Le Goater <[email protected]> > >>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do > >>>>>>>>> compatibility check with host IOMMU cap/ecap > >>>>>>>>> > >>>>>>>>> Hello, > >>>>>>>>> > >>>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote: > >>>>>>>>>> Hi Cédric, > >>>>>>>>>> > >>>>>>>>>>> -----Original Message----- > >>>>>>>>>>> From: Cédric Le Goater <[email protected]> > >>>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to > >do > >>>>>>>>>>> compatibility check with host IOMMU cap/ecap > >>>>>>>>>>> > >>>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote: > >>>>>>>>>>>> From: Yi Liu <[email protected]> > >>>>>>>>>>>> > >>>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device) > >>> should > >>>>>>> not > >>>>>>>>>>>> be passed to guest. > >>>>>>>>>>>> > >>>>>>>>>>>> Implementation details for different backends will be in > >>> following > >>>>>>>>> patches. > >>>>>>>>>>>> > >>>>>>>>>>>> Signed-off-by: Yi Liu <[email protected]> > >>>>>>>>>>>> Signed-off-by: Yi Sun <[email protected]> > >>>>>>>>>>>> Signed-off-by: Zhenzhong Duan <[email protected]> > >>>>>>>>>>>> --- > >>>>>>>>>>>> hw/i386/intel_iommu.c | 35 > >>>>>>>>>>> +++++++++++++++++++++++++++++++++++ > >>>>>>>>>>>> 1 file changed, 35 insertions(+) > >>>>>>>>>>>> > >>>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > >>>>>>>>>>>> index 4f84e2e801..a49b587c73 100644 > >>>>>>>>>>>> --- a/hw/i386/intel_iommu.c > >>>>>>>>>>>> +++ b/hw/i386/intel_iommu.c > >>>>>>>>>>>> @@ -35,6 +35,7 @@ > >>>>>>>>>>>> #include "sysemu/kvm.h" > >>>>>>>>>>>> #include "sysemu/dma.h" > >>>>>>>>>>>> #include "sysemu/sysemu.h" > >>>>>>>>>>>> +#include "sysemu/iommufd.h" > >>>>>>>>>>>> #include "hw/i386/apic_internal.h" > >>>>>>>>>>>> #include "kvm/kvm_i386.h" > >>>>>>>>>>>> #include "migration/vmstate.h" > >>>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace > >>>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, > >>>>>>>>>>>> return vtd_dev_as; > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s, > >>>>>>>>>>>> + HostIOMMUDevice *hiod, > >>>>>>>>>>>> + Error **errp) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + return 0; > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s, > >>>>>>>>>>>> + HostIOMMUDevice *hiod, > >>>>>>>>>>>> + Error **errp) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + return 0; > >>>>>>>>>>>> +} > >>>>>>>>>>>> + > >>>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s, > >>>>>>>>> VTDHostIOMMUDevice > >>>>>>>>>>> *vtd_hdev, > >>>>>>>>>>>> + Error **errp) > >>>>>>>>>>>> +{ > >>>>>>>>>>>> + HostIOMMUDevice *hiod = vtd_hdev->dev; > >>>>>>>>>>>> + > >>>>>>>>>>>> + if (object_dynamic_cast(OBJECT(hiod), > >>> TYPE_HIOD_IOMMUFD)) > >>>>> { > >>>>>>>>>>>> + return vtd_check_iommufd_hdev(s, hiod, errp); > >>>>>>>>>>>> + } > >>>>>>>>>>>> + > >>>>>>>>>>>> + return vtd_check_legacy_hdev(s, hiod, errp); > >>>>>>>>>>>> +} > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> I think we should be using the .get_host_iommu_info() class > >>> handler > >>>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on > >>>>>>>>>>> the type ? > >>>>>>>>>> > >>>>>>>>>> There is some difficulty ini avoiding this check, the behavior of > >>>>>>>>> vtd_check_legacy_hdev > >>>>>>>>>> and vtd_check_iommufd_hdev are different especially after > >>> nesting > >>>>>>>>> support introduced. > >>>>>>>>>> vtd_check_iommufd_hdev() has much wider check over > >cap/ecap > >>> bits > >>>>>>>>> besides aw_bits. > >>>>>>>>> > >>>>>>>>> I think it is important to fully separate the vIOMMU model from > >the > >>>>>>>>> host IOMMU backing device. > >>>>> > >>>>> This comment is true for the structures also. > >>>>> > >>>>>>>>> Could we introduce a new HostIOMMUDeviceClass > >>>>>>>>> handler .check_hdev() handler, which would > >>>>> call .get_host_iommu_info() ? > >>>>> > >>>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO > >should > >>> be > >>>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes > >>>>> for the different backends. Each .get_host_iommu_info() > >implementation > >>>>> would translate the specific host iommu device data presentation > >>>>> into the common 'HostIOMMUDeviceInfo', this is true for > >host_aw_bits. > >>>> > >>>> I see, it's just not easy to define the unified elements in > >>> HostIOMMUDeviceInfo > >>>> so that they maps to bits or fields in host return IOMMU info. > >>> > >>> The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a > >>> new > >>> API needs to be completely defined for it. The IOMMU backend > >>> implementation > >>> could be anything, legacy, iommufd, iommufd v2, some other framework > >>> and > >>> the vIOMMU shouldn't be aware of its implementation. > >>> > >>> Exposing the kernel structures as done below should be avoided because > >>> they are part of the QEMU <-> kernel IOMMUFD interface. > >>> > >>> > >>>> Different platform returned host IOMMU info is platform specific. > >>>> For vtd and siommu: > >>>> > >>>> struct iommu_hw_info_vtd { > >>>> __u32 flags; > >>>> __u32 __reserved; > >>>> __aligned_u64 cap_reg; > >>>> __aligned_u64 ecap_reg; > >>>> }; > >>>> > >>>> struct iommu_hw_info_arm_smmuv3 { > >>>> __u32 flags; > >>>> __u32 __reserved; > >>>> __u32 idr[6]; > >>>> __u32 iidr; > >>>> __u32 aidr; > >>>> }; > >>>> > >>>> I can think of two kinds of declaration of HostIOMMUDeviceInfo: > >>>> > >>>> struct HostIOMMUDeviceInfo { > >>>> uint8_t aw_bits; > >>>> enum iommu_hw_info_type type; > >>>> union { > >>>> struct iommu_hw_info_vtd vtd; > >>>> struct iommu_hw_info_arm_smmuv3; > >>>> ...... > >>>> } data; > >>>> } > >>>> > >>>> or > >>>> > >>>> struct HostIOMMUDeviceInfo { > >>>> uint8_t aw_bits; > >>>> enum iommu_hw_info_type type; > >>>> __u32 flags; > >>>> __aligned_u64 cap_reg; > >>>> __aligned_u64 ecap_reg; > >>>> __u32 idr[6]; > >>>> __u32 iidr; > >>>> __u32 aidr; > >>>> ...... > >>>> } > >>>> > >>>> Not clear if any is your expected format. > >>>> > >>>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo' > >>>>> type attribute and host iommu device type definitions, or as you > >>>>> suggested with a QOM interface. This is more complex however. In > >>>>> this case, I would suggest to implement a .compatible() handler to > >>>>> compare the host iommu device type with the vIOMMU type. > >>>>> > >>>>> The resulting check_hdev routine would look something like : > >>>>> > >>>>> static int vtd_check_hdev(IntelIOMMUState *s, > >VTDHostIOMMUDevice > >>>>> *vtd_hdev, > >>>>> Error **errp) > >>>>> { > >>>>> HostIOMMUDevice *hiod = vtd_hdev->dev; > >>>>> HostIOMMUDeviceClass *hiodc = > >>>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod); > >>>>> HostIOMMUDevice info; > >>>>> int host_aw_bits, ret; > >>>>> > >>>>> ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp); > >>>>> if (ret) { > >>>>> return ret; > >>>>> } > >>>>> > >>>>> ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s)); > >>>>> if (ret) { > >>>>> return ret; > >>>>> } > >>>>> > >>>>> if (s->aw_bits > info.aw_bits) { > >>>>> error_setg(errp, "aw-bits %d > host aw-bits %d", > >>>>> s->aw_bits, info.aw_bits); > >>>>> return -EINVAL; > >>>>> } > >>>>> } > >>>>> > >>>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a > >>>>> vIOMMUInterface::compatible() handler simply returning > >>>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ? > >>>> > >>>> Not quite get what HostIOMMUDeviceClass::is_compatible() does. > >>> > >>> HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU > >backend > >>> to determine which IOMMU types are exposed by the host, then calls the > >>> vIOMMUInterface::compatible() handler to do the compare. API is to be > >>> defined. > >>> > >>> As a refinement, we could introduce in the vIOMMU <-> > >HostIOMMUDevice > >>> interface capabilities, or features, to check more precisely the level > >>> of compatibility between the vIOMMU and the host IOMMU device. This > >is > >>> similar to what is done between QEMU and KVM. > >>> > >>> If you think this is too complex, include type in HostIOMMUDeviceInfo. > >>> > >>>> Currently legacy and IOMMUFD host device has different check logic, > >how > >>> it can help > >>>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() > >into > >>> a single vtd_check_hdev()? > >>> > >>> IMHO, IOMMU shouldn't be aware of the IOMMU backend > >implementation, > >>> but > >>> if you think the Intel vIOMMU should access directly the iommufd > >backend > >>> when available, then we should drop this proposal and revisit the design > >>> to take a different approach. > >> > >> I implemented a draft following your suggestions so we could explore > >further. > >> See > >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_pre > >q_v3_tmp > >> > >> In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx > >> just like KVM CAPs. > >> A common HostIOMMUDeviceCaps structure is introduced to be used by > >> both legacy and iommufd backend. > >> > >> It indeed is cleaner. Only problem is I failed to implement .compatible() > >> as all the check could go ahead by just calling check_cap(). > >> Could you help a quick check to see if I misunderstood any of your > >suggestion? > > > >Thanks for the changes. It looks cleaner and simpler ! Some comments, > > > >* HostIOMMUDeviceIOMMUFDClass seems useless as it is empty. I don't > > remember if you told me already you had plans for future changes. > > Sorry about that if this is the case. I forgot. > > Never mind😊, reason is: > > In nesting series > https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_nesting_rfcv2/ > This commit > https://github.com/yiliu1765/qemu/commit/581fc900aa296988eaa48abee6d68d3670faf8c9 > implement [at|de]tach_hwpt handlers. > > So I add an extra layer of abstract HostIOMMUDeviceIOMMUFDClass to define > [at|de]tach_hwpt handlers. > > > > >* I would use the 'host_iommu_device_' prefix for external routines > > which are part of the HostIOMMUDevice API and use 'hiod_' for > > internal routines where it makes sense, to limit the name length for > > instance. > > Good idea, will do. > > > > >* I would rename HOST_IOMMU_DEVICE_CAP_IOMMUFD_V1 to > > HOST_IOMMU_DEVICE_CAP_IOMMUFD. I mentioned IOMMUFD v2 as a > > theoretical example of a different IOMMU interface. I don't think we > > need to anticipate yet :) > > Will do. > > > > >* HostIOMMUDeviceCaps is using 'enum iommu_hw_info_type' from > > 'linux/iommufd.h', that's not my preferred choice but I won't > > object. The result looks good. > > Ok, will keep it for now. We can change when you want in future. > > > > >* HostIOMMUDevice now has a realize() routine to query the host IOMMU > > capability for later usage. This is a good idea. However, you could > > change the return value to bool and avoid the ERRP_GUARD() prologue. > > Will do. > > > > >* Beware of : > > > > struct Range { > > /* > > * Do not access members directly, use the functions! > > * A non-empty range has @lob <= @upb. > > * An empty range has @lob == @upb + 1. > > */ > > uint64_t lob; /* inclusive lower bound */ > > uint64_t upb; /* inclusive upper bound */ > > }; > > I remember😊, will add the change in formal version. > > > > > > >* I think you could introduce a new VFIOIOMMUClass attribute. Let's > > call it VFIOIOMMUClass::hiod_typename. The creation of > >HostIOMMUDevice > > would become generic and you could move : > > > > hiod= > >HOST_IOMMU_DEVICE(object_new(TYPE_HOST_IOMMU_DEVICE_LEGACY_V > >FIO)); > > HOST_IOMMU_DEVICE_GET_CLASS(hiod)->realize(hiod, vbasedev, errp); > > if (*errp) { > > object_unref(hiod); > > return -EINVAL; > > } > > vbasedev->hiod = hiod; > > > > at the end of vfio_attach_device(). > > Good suggestion! Will do. > > Thanks > Zhenzhong
So I'm expecting v3 of this.
