>-----Original Message-----
>From: Steven Sistare <[email protected]>
>Subject: Re: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>
>On 5/16/2025 6:22 AM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steve Sistare <[email protected]>
>>> Subject: [PATCH V3 37/42] vfio/iommufd: reconstruct device
>>>
>>> Reconstruct userland device state after CPR. During vfio_realize, skip
>>> all ioctls that configure the device, as it was already configured in old
>>> QEMU.
>>>
>>> Save the ioas_id in vmstate, and skip its allocation in vfio_realize.
>>> Because we skip ioctl's, it is not needed at realize time. However, we do
>>> need the range info, so defer the call to iommufd_cdev_get_info_iova_range
>>> to a post_load handler, at which time the ioas_id is known.
>>>
>>> This reconstruction is not complete. hwpt_id and devid need special
>>> treatment, handled in subsequent patches.
>>>
>>> Signed-off-by: Steve Sistare <[email protected]>
>>> ---
>>> hw/vfio/cpr-iommufd.c | 8 ++++++++
>>> hw/vfio/iommufd.c | 17 +++++++++++++++++
>>> 2 files changed, 25 insertions(+)
>>>
>>> diff --git a/hw/vfio/cpr-iommufd.c b/hw/vfio/cpr-iommufd.c
>>> index b760bd3..3d430f0 100644
>>> --- a/hw/vfio/cpr-iommufd.c
>>> +++ b/hw/vfio/cpr-iommufd.c
>>> @@ -31,6 +31,13 @@ static int vfio_container_post_load(void *opaque, int
>>> version_id)
>>> VFIOIOMMUFDContainer *container = opaque;
>>> VFIOContainerBase *bcontainer = &container->bcontainer;
>>> VFIODevice *vbasedev;
>>> + Error *err = NULL;
>>> + uint32_t ioas_id = container->ioas_id;
>>> +
>>> + if (!iommufd_cdev_get_info_iova_range(container, ioas_id, &err)) {
>>> + error_report_err(err);
>>> + return -1;
>>> + }
>>>
>>> QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
>>> vbasedev->cpr.reused = false;
>>> @@ -47,6 +54,7 @@ static const VMStateDescription vfio_container_vmstate
>= {
>>> .post_load = vfio_container_post_load,
>>> .needed = cpr_needed_for_reuse,
>>> .fields = (VMStateField[]) {
>>> + VMSTATE_UINT32(ioas_id, VFIOIOMMUFDContainer),
>>> VMSTATE_END_OF_LIST()
>>> }
>>> };
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 046f601..c49a7e7 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -122,6 +122,10 @@ static bool
>>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>> goto err_kvm_device_add;
>>> }
>>>
>>> + if (vbasedev->cpr.reused) {
>>> + goto skip_bind;
>>> + }
>>> +
>>> /* Bind device to iommufd */
>>> bind.iommufd = iommufd->fd;
>>> if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) {
>>> @@ -133,6 +137,8 @@ static bool
>>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>>> vbasedev->devid = bind.out_devid;
>>> trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
>>> vbasedev->fd, vbasedev->devid);
>>> +
>>> +skip_bind:
>>> return true;
>>> err_bind:
>>> iommufd_cdev_kvm_device_del(vbasedev);
>>> @@ -580,6 +586,11 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>> }
>>> }
>>>
>>> + if (vbasedev->cpr.reused) {
>>> + ioas_id = -1; /* ioas_id will be received from vmstate */
>>> + goto skip_ioas_alloc;
>>> + }
>>> +
>>> /* Need to allocate a new dedicated container */
>>> if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp)) {
>>> goto err_alloc_ioas;
>>> @@ -587,6 +598,7 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>>
>>> trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
>>>
>>> +skip_ioas_alloc:
>>> container =
>>> VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
>>> container->be = vbasedev->iommufd;
>>> container->ioas_id = ioas_id;
>>> @@ -605,6 +617,10 @@ static bool iommufd_cdev_attach(const char *name,
>>> VFIODevice *vbasedev,
>>> goto err_discard_disable;
>>> }
>>>
>>> + if (vbasedev->cpr.reused) {
>>> + goto skip_info;
>>
>> I suspect this will break virtio-iommu, see virtio_iommu_set_iommu_device().
>> When virtio-iommu try to get host_iova_ranges, it's not ready until post
>> load.
>
>Thanks, I'll look into it.
>Can you give me a clue or a pointer on command line options to set this up?
-device virtio-iommu-pci \
-device vfio-pci,host=0000:01:00.0 \
-trace virtio_iommu_host_resv_regions
The vfio device needs to have reserved region, then diff the trace between old
and new qemu can show us if reserved region is lost in new qemu.