>-----Original Message----- >From: Steven Sistare <steven.sist...@oracle.com> >Subject: Re: [PATCH V5 33/38] vfio/iommufd: reconstruct device > >On 6/25/2025 7:40 AM, Duan, Zhenzhong wrote: >>> -----Original Message----- >>> From: Steve Sistare <steven.sist...@oracle.com> >>> Subject: [PATCH V5 33/38] vfio/iommufd: reconstruct device >>> >>> Reconstruct userland device state after CPR. During vfio_realize, skip all >>> ioctls that configure the device, as it was already configured in old QEMU. >>> >>> Skip bind, and use the devid from CPR state. >>> >>> Skip allocation of, and attachment to, ioas_id. Recover ioas_id from CPR >>> state, and use it to find a matching container, if any, before creating a >>> new one. >>> >>> This reconstruction is not complete. hwpt_id is handled in a subsequent >>> patch. >>> >>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com> >>> --- >>> hw/vfio/iommufd.c | 30 ++++++++++++++++++++++++++++-- >>> 1 file changed, 28 insertions(+), 2 deletions(-) >>> >>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c >>> index f0d57ea..a650517 100644 >>> --- a/hw/vfio/iommufd.c >>> +++ b/hw/vfio/iommufd.c >>> @@ -25,6 +25,7 @@ >>> #include "system/reset.h" >>> #include "qemu/cutils.h" >>> #include "qemu/chardev_open.h" >>> +#include "migration/cpr.h" >>> #include "pci.h" >>> #include "vfio-iommufd.h" >>> #include "vfio-helpers.h" >>> @@ -121,6 +122,10 @@ static bool >>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp) >>> goto err_kvm_device_add; >>> } >>> >>> + if (cpr_is_incoming()) { >>> + goto skip_bind; >>> + } >>> + >>> /* Bind device to iommufd */ >>> bind.iommufd = iommufd->fd; >>> if (ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind)) { >>> @@ -132,6 +137,8 @@ static bool >>> iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp) >>> vbasedev->devid = bind.out_devid; >>> trace_iommufd_cdev_connect_and_bind(bind.iommufd, >vbasedev->name, >>> vbasedev->fd, >vbasedev->devid); >>> + >>> +skip_bind: >> >> I'm not sure if we should take above trace for CPR.. > >My thinking is: on cpr, we do not connect or bind, so we should not log it. >iommufd_backend_connect() is called but it just reuses a cpr fd, and we >can observe the latter with cpr traces.
OK. > >>> return true; >>> err_bind: >>> iommufd_cdev_kvm_device_del(vbasedev); >>> @@ -421,7 +428,9 @@ static bool >iommufd_cdev_attach_container(VFIODevice >>> *vbasedev, >>> return iommufd_cdev_autodomains_get(vbasedev, container, >errp); >>> } >>> >>> - return !iommufd_cdev_attach_ioas_hwpt(vbasedev, >container->ioas_id, errp); >>> + /* If CPR, we are already attached to ioas_id. */ >>> + return cpr_is_incoming() || >>> + !iommufd_cdev_attach_ioas_hwpt(vbasedev, >container->ioas_id, errp); >>> } >>> >>> static void iommufd_cdev_detach_container(VFIODevice *vbasedev, >>> @@ -510,6 +519,7 @@ static bool iommufd_cdev_attach(const char >*name, >>> VFIODevice *vbasedev, >>> VFIOAddressSpace *space; >>> struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) }; >>> int ret, devfd; >>> + bool res; >>> uint32_t ioas_id; >>> Error *err = NULL; >>> const VFIOIOMMUClass *iommufd_vioc = >>> @@ -540,7 +550,16 @@ static bool iommufd_cdev_attach(const char >*name, >>> VFIODevice *vbasedev, >>> vbasedev->iommufd != container->be) { >>> continue; >>> } >>> - if (!iommufd_cdev_attach_container(vbasedev, container, &err)) >{ >>> + >>> + if (!cpr_is_incoming()) { >>> + res = iommufd_cdev_attach_container(vbasedev, >container, &err); >>> + } else if (vbasedev->cpr.ioas_id == container->ioas_id) { >>> + res = true; >>> + } else { >>> + continue; >>> + } >>> + >>> + if (!res) { >>> const char *msg = error_get_pretty(err); >>> >>> >trace_iommufd_cdev_fail_attach_existing_container(msg); >>> @@ -557,6 +576,11 @@ static bool iommufd_cdev_attach(const char >*name, >>> VFIODevice *vbasedev, >>> } >>> } >>> >>> + if (cpr_is_incoming()) { >>> + ioas_id = vbasedev->cpr.ioas_id; >>> + goto skip_ioas_alloc; >>> + } >>> + >>> /* Need to allocate a new dedicated container */ >>> if (!iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, >errp)) { >>> goto err_alloc_ioas; >>> @@ -564,10 +588,12 @@ static bool iommufd_cdev_attach(const char >*name, >>> VFIODevice *vbasedev, >>> >>> trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id); >>> >>> +skip_ioas_alloc: >> >> Same here, others look good. > >During cpr, we do not allocate a new ioas, we use the one from cpr state. >I think it would be confusing to print a trace that suggests we allocated >a new ioas. > >Perhaps I should add a trace in vfio_cpr_find_device: > > trace_vfio_cpr_find_device(elem->ioas_id, elem->dev_id, >elem->hwpt_id) Yes, that will be better. Reviewed-by: Zhenzhong Duan <zhenzhong.d...@intel.com> Thanks Zhenzhong