On Mon, Jul 28, 2025 at 10:51 PM Eugenio Perez Martin
<epere...@redhat.com> wrote:
>
> On Mon, Jul 28, 2025 at 9:36 AM Jason Wang <jasow...@redhat.com> wrote:
> >
> > On Mon, Jul 28, 2025 at 3:09 PM Jason Wang <jasow...@redhat.com> wrote:
> > >
> > > On Fri, Jul 25, 2025 at 5:33 PM Michael S. Tsirkin <m...@redhat.com> wrote:
> > > >
> > > > On Thu, Jul 24, 2025 at 05:59:20PM -0400, Jonah Palmer wrote:
> > > > >
> > > > >
> > > > > On 7/23/25 1:51 AM, Jason Wang wrote:
> > > > > > On Tue, Jul 22, 2025 at 8:41 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> > > > > > >
> > > > > > > This series is an initial RFC implementation of iterative live
> > > > > > > migration for virtio-net devices.
> > > > > > >
> > > > > > > The main motivation behind implementing iterative migration for
> > > > > > > virtio-net devices is to start on heavy, time-consuming operations
> > > > > > > for the destination while the source is still active (i.e. before
> > > > > > > the stop-and-copy phase).
> > > > > >
> > > > > > It would be better to explain which kinds of operations are heavy
> > > > > > and time-consuming and how iterative migration helps.
> > > > > >
> > > > >
> > > > > You're right. Apologies for being vague here.
> > > > >
> > > > > I did some profiling of the virtio_load call for virtio-net to try
> > > > > to narrow down exactly where most of the downtime comes from during
> > > > > the stop-and-copy phase.
> > > > >
> > > > > Pretty much all of the downtime comes from the vmstate_load_state
> > > > > call for vmstate_virtio's subsections:
> > > > >
> > > > > /* Subsections */
> > > > > ret = vmstate_load_state(f, &vmstate_virtio, vdev, 1);
> > > > > if (ret) {
> > > > >     return ret;
> > > > > }
> > > > >
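> > > > > (For reference, this kind of number can be obtained by bracketing
> > > > > the call, e.g. with GLib's g_get_monotonic_time(); a simplified,
> > > > > purely illustrative sketch:)
> > > > >
> > > > >     gint64 t0 = g_get_monotonic_time(); /* microseconds */
> > > > >     ret = vmstate_load_state(f, &vmstate_virtio, vdev, 1);
> > > > >     fprintf(stderr, "vmstate_virtio load: %" PRId64 " us\n",
> > > > >             g_get_monotonic_time() - t0);
> > > > >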
> > > > > More specifically, the vmstate_virtio_virtqueues and
> > > > > vmstate_virtio_extra_state subsections.
> > > > >
> > > > > For example, currently (with no iterative migration), the
> > > > > virtio_load call for a virtio-net device took 13.29ms to finish, and
> > > > > 13.20ms of that time was spent in vmstate_load_state(f,
> > > > > &vmstate_virtio, vdev, 1).
> > > > >
> > > > > Of that 13.20ms, ~6.83ms was spent migrating the
> > > > > vmstate_virtio_virtqueues subsection and ~6.33ms was spent migrating
> > > > > the vmstate_virtio_extra_state subsection. I believe this comes from
> > > > > walking all VIRTIO_QUEUE_MAX virtqueues, twice.
> > > >
> > > > Can we optimize it simply by sending a bitmap of used vqs?
> > >
> > > +1.
> > >
> > > For example, devices like virtio-net know exactly how many virtqueues
> > > will be used.
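> > >
> > > A rough, untested sketch of the bitmap idea on the save side
> > > (set_bit()/BITS_TO_LONGS() are from qemu/bitops.h; the wire format
> > > here is purely illustrative, real code would need a stable,
> > > endian-safe layout):
> > >
> > >     unsigned long used[BITS_TO_LONGS(VIRTIO_QUEUE_MAX)] = { 0 };
> > >     int i;
> > >
> > >     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> > >         if (vdev->vq[i].vring.num) {
> > >             set_bit(i, used);
> > >         }
> > >     }
> > >
> > >     /* send the bitmap, then state only for the queues marked in it */
> > >     qemu_put_buffer(f, (uint8_t *)used, sizeof(used));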
> >
> > Ok, I think it comes from the following subsections:
> >
> > static const VMStateDescription vmstate_virtio_virtqueues = {
> >     .name = "virtio/virtqueues",
> >     .version_id = 1,
> >     .minimum_version_id = 1,
> >     .needed = &virtio_virtqueue_needed,
> >     .fields = (const VMStateField[]) {
> >         VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> >                       VIRTIO_QUEUE_MAX, 0, vmstate_virtqueue, VirtQueue),
> >         VMSTATE_END_OF_LIST()
> >     }
> > };
> >
> > static const VMStateDescription vmstate_virtio_packed_virtqueues = {
> >     .name = "virtio/packed_virtqueues",
> >     .version_id = 1,
> >     .minimum_version_id = 1,
> >     .needed = &virtio_packed_virtqueue_needed,
> >     .fields = (const VMStateField[]) {
> >         VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> >                       VIRTIO_QUEUE_MAX, 0, vmstate_packed_virtqueue,
> >                       VirtQueue),
> >         VMSTATE_END_OF_LIST()
> >     }
> > };
> >
> > A rough idea is to disable those subsections and use new subsections
> > instead (and do the compatibility work), similar to what virtio_save()
> > does:
> >
> >     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >         if (vdev->vq[i].vring.num == 0) {
> >             break;
> >         }
> >     }
> >
> >     qemu_put_be32(f, i);
> >     ....
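> >
> > Completely untested, but the replacement subsection could look
> > something like the following (num_used_vqs is a hypothetical new
> > uint32_t field in VirtIODevice, filled in by a pre_save hook, and
> > virtio_used_virtqueue_needed is a hypothetical .needed callback):
> >
> > static const VMStateDescription vmstate_virtio_used_virtqueues = {
> >     .name = "virtio/used_virtqueues",
> >     .version_id = 1,
> >     .minimum_version_id = 1,
> >     .needed = &virtio_used_virtqueue_needed,
> >     .fields = (const VMStateField[]) {
> >         /* loaded first, so the varray below knows its length */
> >         VMSTATE_UINT32(num_used_vqs, struct VirtIODevice),
> >         VMSTATE_STRUCT_VARRAY_POINTER_UINT32(vq, struct VirtIODevice,
> >                       num_used_vqs, 0, vmstate_virtqueue, VirtQueue),
> >         VMSTATE_END_OF_LIST()
> >     }
> > };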
> >
>
> While I think this is a very good area to explore, I think we will get
> more benefit from pre-warming vhost-vdpa devices, as they take one to
> two orders of magnitude longer than sending and processing the
> virtio-net state (1~10s vs 10~100ms).

Yes, but note that Jonah did the testing on a software virtio device.

Thanks

>

