On 7/25/25 5:31 AM, Michael S. Tsirkin wrote:
On Thu, Jul 24, 2025 at 10:45:34AM -0400, Jonah Palmer wrote:
On 7/23/25 2:51 AM, Michael S. Tsirkin wrote:
On Tue, Jul 22, 2025 at 12:41:25PM +0000, Jonah Palmer wrote:
Lays out the initial groundwork for iteratively migrating the state of a
virtio-net device, starting with its vmstate (via vmstate_save_state &
vmstate_load_state).
The original non-iterative vmstate framework still runs during the
stop-and-copy phase when the guest is paused, which is still necessary
to migrate the final state of the virtqueues once the source has
been paused.
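As a rough illustration (not part of this patch), the new handlers are
expected to be wired into the migration core along these lines; the idstr,
instance_id, and version below are placeholders, and the exact SaveVMHandlers
fields depend on the QEMU version:

    static SaveVMHandlers savevm_virtio_net_handlers = {
        .is_active                  = virtio_net_is_active,
        .save_setup                 = virtio_net_save_setup,
        .save_live_iterate          = virtio_net_save_live_iterate,
        .save_live_complete_precopy = virtio_net_save_live_complete_precopy,
        .load_setup                 = virtio_net_load_setup,
        .load_state                 = virtio_net_load_state,
    };

    /* e.g. from the device's realize path; idstr/instance_id/version
     * are illustrative only */
    register_savevm_live("virtio-net-iterative", 0, 1,
                         &savevm_virtio_net_handlers, n);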
Although the vmstate framework is used twice (once during the iterative
portion and once during the stop-and-copy phase), it appears that
there's some modest improvement in guest-visible downtime when using a
virtio-net device.
When tracing the vmstate_downtime_save and vmstate_downtime_load
tracepoints, for a virtio-net device using iterative live migration, the
non-iterative downtime portion improved modestly, going from ~3.2ms to
~1.4ms:
Before:
-------
vmstate_downtime_load type=non-iterable idstr=0000:00:03.0/virtio-net instance_id=0 downtime=3594
After:
------
vmstate_downtime_load type=non-iterable idstr=0000:00:03.0/virtio-net instance_id=0 downtime=1607
This improvement is likely due to the initial vmstate_load_state call
(while the guest is still running) "warming up" all related pages and
structures on the destination. In other words, by the time the final
stop-and-copy phase starts, the heavy allocations and page-fault
latencies are reduced, making the device reload slightly faster and
the guest-visible downtime window slightly smaller.
Did I get it right that it's just the vmstate load for this single device?
If the theory is right, is it not possible that while the
tracepoints are now closer together, you have pushed something
else out of the cache, making the effect on guest-visible downtime
unpredictable? How about the total vmstate load time?
Correct, the data above is just from the virtio-net device's downtime
contribution (specifically during the stop-and-copy phase).
Theoretically, yes, I believe so. To try and get a feel for this, I ran some
slightly heavier testing for the virtio-net device: vhost-net + 4 queue
pairs (the setup above was just a virtio-net device with 1 queue pair).
I traced the reported downtimes of the devices that come right before and
after virtio-net's vmstate_load_state call with and without iterative
migration on the virtio-net device.
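(These were collected from the same vmstate_downtime_load tracepoint as
above; assuming a QEMU build with the "log" trace backend, it can be enabled
with e.g. -trace "vmstate_downtime_*" on the destination's command line.)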
The downtimes below are all from the vmstate_load_state calls that happen
after the source has been stopped:
With iterative migration for virtio-net:
----------------------------------------
vga: 1.50ms | 1.39ms | 1.37ms | 1.50ms | 1.63ms |
virtio-console: 13.78ms | 14.24ms | 13.74ms | 13.89ms | 13.60ms |
virtio-net: 13.91ms | 13.52ms | 13.09ms | 13.59ms | 13.37ms |
virtio-scsi: 18.71ms | 13.96ms | 14.05ms | 16.55ms | 14.30ms |
vga: Avg. 1.47ms | Var: 0.0109ms² | Std. Dev (σ): 0.104ms
virtio-console: Avg. 13.85ms | Var: 0.0583ms² | Std. Dev (σ): 0.241ms
virtio-net: Avg. 13.49ms | Var: 0.0904ms² | Std. Dev (σ): 0.301ms
virtio-scsi: Avg. 15.51ms | Var: 4.3299ms² | Std. Dev (σ): 2.081ms
Without iterative migration for virtio-net:
-------------------------------------------
vga: 1.47ms | 1.28ms | 1.55ms | 1.36ms | 1.22ms |
virtio-console: 13.39ms | 13.40ms | 14.37ms | 13.93ms | 13.36ms |
virtio-net: 18.52ms | 17.77ms | 17.52ms | 15.52ms | 17.32ms |
virtio-scsi: 13.35ms | 13.94ms | 15.17ms | 16.01ms | 14.08ms |
vga: Avg. 1.37ms | Var: 0.0182ms² | Std. Dev (σ): 0.135ms
virtio-console: Avg. 13.69ms | Var: 0.2007ms² | Std. Dev (σ): 0.448ms
virtio-net: Avg. 17.33ms | Var: 1.2305ms² | Std. Dev (σ): 1.109ms
virtio-scsi: Avg. 14.51ms | Var: 1.1352ms² | Std. Dev (σ): 1.065ms
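(For reference, the Avg/Var/σ figures are sample statistics, i.e. variance
with the n - 1 denominator. For example, for virtio-net with iterative
migration: mean = (13.91 + 13.52 + 13.09 + 13.59 + 13.37) / 5 ≈ 13.49ms,
Var = Σ(x - mean)² / (5 - 1) ≈ 0.0904ms², σ = √Var ≈ 0.301ms.)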
The most notable difference here is the standard deviation of virtio-scsi's
migration downtime, which comes after virtio-net's migration: virtio-scsi's
σ rises from ~1.07ms to ~2.08ms when virtio-net is iteratively migrated.
However, since I only got 5 samples per device, the trend is indicative but
not definitive.
Total vmstate load time per device ≈ downtimes reported above, unless you're
referring to overall downtime across all devices?
Indeed.
I also wonder, if preheating cache is a big gain, why don't we just
do it for all devices? There is nothing special about virtio: just
call save for all devices, send the state, call load on the destination,
then call reset to discard the state.
So with a relatively simple guest with vhost-net (4x queue pairs),
virtio-scsi, and virtio-serial (virtio-console), total downtime across
all devices came out to ~66.29ms. This was with iterative live migration
for the virtio-net device.
The 5 largest contributors to downtime were virtio-scsi, virtio-serial,
virtio-net, RAM, and CPU:
virtio-scsi: 13.994ms
virtio-console: 13.796ms
virtio-net: 13.495ms
RAM: 9.994ms
CPU: 4.125ms
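(Together, these five account for ~55.4ms of the ~66.29ms total.)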
...
-----------
Perhaps we could do it for all devices, but it would probably be much
more efficient to not discard the state after the iterative portion and
just send the deltas at the end. This will probably be the next goal for
this series.
----------
Having said all this, this RFC is just an initial first step toward iterative
migration of a virtio-net device. This second vmstate_load_state call during
the stop-and-copy phase isn't optimal. A future version of this series could
do away with this second call and only send the deltas instead of the entire
state again.
I see how this could be a win, in theory, if the state is big.
Future patches could improve upon this by skipping the second
vmstate_save/load_state calls (during the stop-and-copy phase) and
instead only sending deltas right before/after the source is stopped.
Signed-off-by: Jonah Palmer <jonah.pal...@oracle.com>
---
hw/net/virtio-net.c | 37 ++++++++++++++++++++++++++++++++++
include/hw/virtio/virtio-net.h | 8 ++++++++
2 files changed, 45 insertions(+)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 19aa5b5936..86a6fe5b91 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3808,16 +3808,31 @@ static bool virtio_net_is_active(void *opaque)
static int virtio_net_save_setup(QEMUFile *f, void *opaque, Error **errp)
{
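+    /*
+     * Send a full snapshot of the device state while the guest still runs;
+     * stop-and-copy later resends the final state non-iteratively.
+     */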
+    VirtIONet *n = opaque;
+
+    qemu_put_be64(f, VNET_MIG_F_INIT_STATE);
+    vmstate_save_state(f, &vmstate_virtio_net, n, NULL);
+    qemu_put_be64(f, VNET_MIG_F_END_DATA);
+
return 0;
}
static int virtio_net_save_live_iterate(QEMUFile *f, void *opaque)
{
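+    /*
+     * Placeholder for this RFC: no per-iteration data is produced yet, so
+     * every iteration just emits VNET_MIG_F_NO_DATA and reports completion.
+     */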
+    bool new_data = false;
+
+    if (!new_data) {
+        qemu_put_be64(f, VNET_MIG_F_NO_DATA);
+        return 1;
+    }
+
+    qemu_put_be64(f, VNET_MIG_F_END_DATA);
return 1;
}
static int virtio_net_save_live_complete_precopy(QEMUFile *f, void *opaque)
{
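+    /* Nothing left to send beyond the setup snapshot. */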
+    qemu_put_be64(f, VNET_MIG_F_NO_DATA);
return 0;
}
@@ -3833,6 +3848,28 @@ static int virtio_net_load_setup(QEMUFile *f, void *opaque, Error **errp)
static int virtio_net_load_state(QEMUFile *f, void *opaque, int version_id)
{
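+    /* Consume the flag-delimited stream written by the save side. */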
+    VirtIONet *n = opaque;
+    uint64_t flag;
+
+    flag = qemu_get_be64(f);
+    if (flag == VNET_MIG_F_NO_DATA) {
+        return 0;
+    }
+
+    while (flag != VNET_MIG_F_END_DATA) {
+        switch (flag) {
+        case VNET_MIG_F_INIT_STATE:
+        {
+            vmstate_load_state(f, &vmstate_virtio_net, n, VIRTIO_NET_VM_VERSION);
+            break;
+        }
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: Unknown flag 0x%"PRIx64"\n", __func__, flag);
+            return -EINVAL;
+        }
+
+        flag = qemu_get_be64(f);
+    }
return 0;
}
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b9ea9e824e..d6c7619053 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -163,6 +163,14 @@ typedef struct VirtIONetQueue {
struct VirtIONet *n;
} VirtIONetQueue;
+/*
+ * Flags to be used as unique delimiters for virtio-net devices in the
+ * migration stream.
+ */
+#define VNET_MIG_F_INIT_STATE (0xffffffffef200000ULL)
+#define VNET_MIG_F_END_DATA (0xffffffffef200001ULL)
+#define VNET_MIG_F_NO_DATA (0xffffffffef200002ULL)
+
struct VirtIONet {
VirtIODevice parent_obj;
uint8_t mac[ETH_ALEN];
--
2.47.1