On 04/09/2024 16:00, Peter Xu wrote:

Hello, Avihai,

Reviving this thread just to discuss one issue below.

On Thu, Feb 16, 2023 at 04:36:27PM +0200, Avihai Horon wrote:
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+
+/*
+ * Only exact function is implemented and not estimate function. The reason is
+ * that during pre-copy phase of migration the estimate function is called
+ * repeatedly while pending RAM size is over the threshold, thus migration
+ * can't converge and querying the VFIO device pending data size is useless.
+ */
+static void vfio_state_pending_exact(void *opaque, uint64_t *must_precopy,
+                                     uint64_t *can_postcopy)
+{
+    VFIODevice *vbasedev = opaque;
+    uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+
+    /*
+     * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
+     * reported so downtime limit won't be violated.
+     */
+    vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
+    *must_precopy += stop_copy_size;
Is this the chunk of data that can only be copied while the VM is stopped?  If so, I
wonder why it's reported as "must precopy" if we know precopy will never
move it.

A VFIO device that doesn't support precopy will send this data only when the VM is stopped. A VFIO device that supports precopy may or may not send this data (or part of it) during precopy, depending on the specific VFIO device.

According to the state_pending_{estimate,exact} documentation, must_precopy is the amount of data that must be migrated before the target starts, and this VFIO data indeed must be migrated before the target starts.
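A minimal C sketch of that accounting contract, as I understand it: every handler adds (never assigns) its share to must_precopy or can_postcopy, and the caller sums the results. pending_fn, postcopy_dev_pending(), vfio_like_pending() and query_pending() are hypothetical names, not QEMU's API; in QEMU the real callbacks live in SaveVMHandlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the state_pending_{estimate,exact} contract (not
 * QEMU code): each handler ADDS its contribution to the accumulators and
 * never overwrites what other devices already reported.
 */
typedef void (*pending_fn)(void *opaque, uint64_t *must_precopy,
                           uint64_t *can_postcopy);

/* Postcopy-capable state, e.g. RAM when postcopy-ram is enabled. */
static void postcopy_dev_pending(void *opaque, uint64_t *must_precopy,
                                 uint64_t *can_postcopy)
{
    (void)must_precopy;
    *can_postcopy += *(const uint64_t *)opaque;
}

/*
 * VFIO-like device without precopy support: its data is sent only while
 * the VM is stopped, but it must reach the destination before the target
 * starts, so it is accounted as must_precopy.
 */
static void vfio_like_pending(void *opaque, uint64_t *must_precopy,
                              uint64_t *can_postcopy)
{
    (void)can_postcopy;
    *must_precopy += *(const uint64_t *)opaque;
}

/* Reset the accumulators, query every handler, return the total. */
static uint64_t query_pending(const pending_fn *fns, void *const *opaques,
                              size_t n, uint64_t *must_precopy,
                              uint64_t *can_postcopy)
{
    assert(fns != NULL && opaques != NULL);
    *must_precopy = 0;
    *can_postcopy = 0;
    for (size_t i = 0; i < n; i++) {
        fns[i](opaques[i], must_precopy, can_postcopy);
    }
    return *must_precopy + *can_postcopy;
}
```

For example, with 4096 bytes of postcopy-capable data and 1048576 bytes of VFIO stop-copy data, query_pending() returns 1052672, of which 1048576 is must_precopy.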


The issue is that with such reporting (and the latest master branch now
also reports the precopy size, in both exact() and estimate()), we can
observe weird reports like this:

23411@1725380798968696657 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799050766000 migrate_pending_exact exact pending size 21038628864 (pre = 21038628864 post=0)
23411@1725380799050896975 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799138657103 migrate_pending_exact exact pending size 21040144384 (pre = 21040144384 post=0)
23411@1725380799140166709 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799217246861 migrate_pending_exact exact pending size 21038628864 (pre = 21038628864 post=0)
23411@1725380799217384969 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799305147722 migrate_pending_exact exact pending size 21039976448 (pre = 21039976448 post=0)
23411@1725380799306639956 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)
23411@1725380799385118245 migrate_pending_exact exact pending size 21038796800 (pre = 21038796800 post=0)
23411@1725380799385709382 migrate_pending_estimate estimate pending size 0 (pre = 0 post=0)

So estimate() keeps reporting zero while exact() reports a much larger
value, and it keeps spinning like this.  I don't think that's how it was
designed to be used.

So it keeps spinning and migration doesn't converge?
If so, configuring a higher downtime limit or setting the avail-switchover-bandwidth parameter may solve it.
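As a rough approximation, the switchover threshold is the number of bytes that can be sent within the allowed downtime, i.e. bandwidth times downtime limit, which is why raising either one helps. The C sketch below assumes bandwidth in bytes per second and the downtime limit in milliseconds; switchover_threshold() and min_downtime_ms() are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Approximate switchover threshold:
 *   threshold_bytes = bw_bytes_per_sec * downtime_limit_ms / 1000
 */
static uint64_t switchover_threshold(uint64_t bw_bytes_per_sec,
                                     uint64_t downtime_limit_ms)
{
    return bw_bytes_per_sec * downtime_limit_ms / 1000;
}

/* Smallest downtime limit (ms, rounded up) whose threshold covers
 * pending_bytes at the given bandwidth. */
static uint64_t min_downtime_ms(uint64_t pending_bytes,
                                uint64_t bw_bytes_per_sec)
{
    assert(bw_bytes_per_sec > 0);
    return (pending_bytes * 1000 + bw_bytes_per_sec - 1) / bw_bytes_per_sec;
}
```

For example, at 1.25 GB/s (10 Gbit/s) a 300 ms limit gives a threshold of 375000000 bytes, while covering the ~21 GB exact pending size from the trace would need a downtime limit of about 16831 ms.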


Does this stop-copy size change for a VFIO device or not?

It depends on the specific VFIO device.
If the device supports precopy and all (or part) of its data is precopy-able, then the stop-copy size will change. Besides that, the amount of resources currently used by the VFIO device can also affect the stop-copy size, which may increase or decrease as resources are created or destroyed.

IIUC, we may want some other mechanism to report the stop-copy size for a
device, rather than reporting it through the current exact()/estimate() API.
Per my understanding, that API is only meant for iterable data, while the
stop-copy size may not fall into that category.

The above situation is caused by the fact that VFIO data may not be fully precopy-able (as opposed to RAM data). I don't think reporting the stop-copy size through a different API will help here; we would still have to take it into account before converging, so as not to violate the downtime limit.

Thanks.


+
+    trace_vfio_state_pending_exact(vbasedev->name, *must_precopy, *can_postcopy,
+                                   stop_copy_size);
+}
--
Peter Xu