On Mon, 26 Jun 2023 17:26:42 +0200 Cédric Le Goater <[email protected]> wrote:
> On 6/26/23 15:40, Joao Martins wrote: > > On 26/06/2023 14:20, Cédric Le Goater wrote: > >> Hello Avihai, > >> > >> On 6/26/23 10:23, Avihai Horon wrote: > >>> The major parts of VFIO migration are supported today in QEMU. This > >>> includes basic VFIO migration, device dirty page tracking and precopy > >>> support. > >>> > >>> Thus, at this point in time, it seems appropriate to make VFIO migration > >>> non-experimental: remove the x prefix from enable_migration property, > >>> change it to ON_OFF_AUTO and let the default value be AUTO. > >>> > >>> In addition, make the following adjustments: > >>> 1. Require device dirty tracking support when enable_migration is AUTO > >>> (i.e., not explicitly enabled). This is because device dirty tracking > >>> is currently the only method to do dirty page tracking, which is > >>> essential for migrating in a reasonable downtime. > >> > >> hmm, I don't think QEMU should decide to disable a feature for all > >> devices supposedly because it could be slow for some. That's too > >> restrictive. What about devices with have small states ? for which > >> the downtime would be reasonable even without device dirty tracking > >> support. > >> > > > > device dirty tracking refers to the ability to tracking dirty IOVA used by > > the > > device which will DMA into RAM. It is required because the > > consequence/alternative is to transfer all RAM in stop copy phase. Device > > state > > size at that point is the least of our problems downtime wise. > > Arg. thanks for reminding me. I tend to take this for granted ... My initial reaction was the same, for instance we could have a non-DMA device (ex. PCI serial card) that supports migration w/o dirty tracking, but QEMU cannot assume the device doesn't do DMA, nor is it worth our time to monitor whether bus-master is ever enabled for the device, so QEMU needs to assume all memory that's DMA accessible for the device is perpetually dirty. Also, if such a corner case existed for a non-DMA migratable device, the easier path is simply to require dirty tracking stubs in the variant driver to report no memory dirtied. > > I can imagine that allowing without dirty tracking is useful for developer > > testing of the suspend/device-state flows, but as real default (auto) is > > very > > questionable to let it go through without dirty tracking. When we have > > IOMMUFD > > dirty tracking that's when we can relieve this restriction as a default. > > > > But then note that (...) > > > >> > >>> Setting > >>> enable_migration to ON will not require device dirty tracking. > > > > (...) this lets it ignore dirty tracking as you would like. > > > > > >>> 2. Make migration blocker messages more elaborate. > >>> 3. Remove error prints in vfio_migration_query_flags(). > >>> 4. Remove a redundant assignment in vfio_migration_realize(). > >>> > >>> Signed-off-by: Avihai Horon <[email protected]> > >>> --- > >>> include/hw/vfio/vfio-common.h | 2 +- > >>> hw/vfio/migration.c | 29 ++++++++++++++++------------- > >>> hw/vfio/pci.c | 4 ++-- > >>> 3 files changed, 19 insertions(+), 16 deletions(-) > >>> > >>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h > >>> index b4c28f318f..387eabde60 100644 > >>> --- a/include/hw/vfio/vfio-common.h > >>> +++ b/include/hw/vfio/vfio-common.h > >>> @@ -139,7 +139,7 @@ typedef struct VFIODevice { > >>> bool needs_reset; > >>> bool no_mmap; > >>> bool ram_block_discard_allowed; > >>> - bool enable_migration; > >>> + OnOffAuto enable_migration; > >>> VFIODeviceOps *ops; > >>> unsigned int num_irqs; > >>> unsigned int num_regions; > >>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c > >>> index 79eb81dfd7..d8e0848635 100644 > >>> --- a/hw/vfio/migration.c > >>> +++ b/hw/vfio/migration.c > >>> @@ -731,14 +731,6 @@ static int vfio_migration_query_flags(VFIODevice > >>> *vbasedev, uint64_t *mig_flags) > >>> feature->argsz = sizeof(buf); > >>> feature->flags = VFIO_DEVICE_FEATURE_GET | > >>> VFIO_DEVICE_FEATURE_MIGRATION; > >>> if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) { > >>> - if (errno == ENOTTY) { > >>> - error_report("%s: VFIO migration is not supported in kernel", > >>> - vbasedev->name); > >>> - } else { > >>> - error_report("%s: Failed to query VFIO migration support, > >>> err: %s", > >>> - vbasedev->name, strerror(errno)); > >>> - } > >>> - > >>> return -errno; > >>> } > >>> @@ -831,14 +823,28 @@ void vfio_reset_bytes_transferred(void) > >>> int vfio_migration_realize(VFIODevice *vbasedev, Error **errp) > >>> { > >>> - int ret = -ENOTSUP; > >>> + int ret; > >>> - if (!vbasedev->enable_migration) { > >>> + if (vbasedev->enable_migration == ON_OFF_AUTO_OFF) { > >>> + error_setg(&vbasedev->migration_blocker, > >>> + "%s: Migration is disabled for VFIO device", > >>> vbasedev->name); > >>> goto add_blocker; > >>> } > >>> ret = vfio_migration_init(vbasedev); > >>> if (ret) { > >> > >> It would be good to keep the message for 'errno == ENOTTY' as it was in > >> vfio_migration_query_flags(). When migration fails, it is an important > >> information to know that it is because the VFIO PCI host device driver > >> doesn't support the feature. The root cause could be deep below in FW or > >> how the VF was set up. > >> > > +1 As I have been in this rabbit hole > > > >>> + error_setg(&vbasedev->migration_blocker, > >>> + "%s: Migration couldn't be initialized for VFIO > >>> device, " > >>> + "err: %d (%s)", > >>> + vbasedev->name, ret, strerror(-ret)); > >>> + goto add_blocker; > >>> + } > >>> + > >>> + if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO && > >>> + !vbasedev->dirty_pages_supported) { > >> > >> I don't agree with this test. > >> > > > > The alternative right now is perceptual dirty tracking. How is that OK as a > > default? It's not like we have even an option :( > > > > Maybe perhaps you refer to avoid strongly enforcing *always* it to allow > > testing > > of the non dirty tracking parts? Maybe when you 'force' enabling with > > enable-migration=on is when you ignore the dirty tracking which is what > > this is > > doing. > > > I see ON_OFF_AUTO_ON as a way to abort the machine startup while > ON_OFF_AUTO_AUTO would let it run but block migration. Agreed. There's a little bit of redundancy between the device-level "enable-migration=on" option and the global "-only-migratable" option relative to preventing machine startup, but it also doesn't make sense to me if the device-level option let realize complete successfully if the device doesn't support or fails migration setup. So I think we'd generally rely on using the -only-migratable option with the default ON_OFF_AUTO_AUTO value, allow the ON_OFF_AUTO_ON value to enable dis-recommended support, and live with the redundancy that it should also cause the device realize to fail if migration is not supported. Thanks, Alex > We would > need an extra property to relax the checks, else we are hijacking > some pre-existing option to fit our need. > > Since dirty tracking is a must-have to implement migration support > for any existing and future VFIO PCI variant driver, anything else > would be experimental code and we are trying to remove the flag ! > Please correct me if I am wrong. > > So, the case !vbasedev->dirty_pages_supported is just an extra > information to report for why migration is not supported. Does > that sound reasonable ? > > Thanks, > > C. > > > > > > >>> + error_setg(&vbasedev->migration_blocker, > >>> + "%s: VFIO device doesn't support device dirty > >>> tracking", > >>> + vbasedev->name); > >>> goto add_blocker; > >>> } > >> I agree that with ON_OFF_AUTO_AUTO, errors at realize time should be > >> recorded > >> in a migration blocker. What about the ON_OFF_AUTO_ON case ? If migration > >> was > >> explicitly requested for the device and the conditions on the host are not > >> met, > >> I think realize should fail and the machine abort. > >> > > +1 Good point > > > >> Thanks, > >> > >> C. > >> > >> > >> > >>> @@ -856,9 +862,6 @@ int vfio_migration_realize(VFIODevice *vbasedev, Error > >>> **errp) > >>> return 0; > >>> add_blocker: > >>> - error_setg(&vbasedev->migration_blocker, > >>> - "VFIO device doesn't support migration"); > >>> - > >>> ret = migrate_add_blocker(vbasedev->migration_blocker, errp); > >>> if (ret < 0) { > >>> error_free(vbasedev->migration_blocker); > >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > >>> index 73874a94de..48584e3b01 100644 > >>> --- a/hw/vfio/pci.c > >>> +++ b/hw/vfio/pci.c > >>> @@ -3347,8 +3347,8 @@ static Property vfio_pci_dev_properties[] = { > >>> VFIO_FEATURE_ENABLE_REQ_BIT, true), > >>> DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features, > >>> VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), > >>> - DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice, > >>> - vbasedev.enable_migration, false), > >>> + DEFINE_PROP_ON_OFF_AUTO("enable-migration", VFIOPCIDevice, > >>> + vbasedev.enable_migration, ON_OFF_AUTO_AUTO), > >>> DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, > >>> false), > >>> DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice, > >>> vbasedev.ram_block_discard_allowed, false), > >> >
