On Mon, Jul 28, 2025 at 10:51 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
>
> On Mon, Jul 28, 2025 at 9:36 AM Jason Wang <jasow...@redhat.com> wrote:
> >
> > On Mon, Jul 28, 2025 at 3:09 PM Jason Wang <jasow...@redhat.com> wrote:
> > >
> > > On Fri, Jul 25, 2025 at 5:33 PM Michael S. Tsirkin <m...@redhat.com> wrote:
> > > >
> > > > On Thu, Jul 24, 2025 at 05:59:20PM -0400, Jonah Palmer wrote:
> > > > >
> > > > >
> > > > > On 7/23/25 1:51 AM, Jason Wang wrote:
> > > > > > On Tue, Jul 22, 2025 at 8:41 PM Jonah Palmer <jonah.pal...@oracle.com> wrote:
> > > > > > >
> > > > > > > This series is an RFC initial implementation of iterative live
> > > > > > > migration for virtio-net devices.
> > > > > > >
> > > > > > > The main motivation behind implementing iterative migration for
> > > > > > > virtio-net devices is to start on heavy, time-consuming operations
> > > > > > > for the destination while the source is still active (i.e. before
> > > > > > > the stop-and-copy phase).
> > > > > >
> > > > > > It would be better to explain which kind of operations were heavy and
> > > > > > time-consuming and how iterative migration help.
> > > > > >
> > > > >
> > > > > You're right. Apologies for being vague here.
> > > > >
> > > > > I did do some profiling of the virtio_load call for virtio-net to try and
> > > > > narrow down where exactly most of the downtime is coming from during the
> > > > > stop-and-copy phase.
> > > > >
> > > > > Pretty much the entirety of the downtime comes from the vmstate_load_state
> > > > > call for the vmstate_virtio's subsections:
> > > > >
> > > > >     /* Subsections */
> > > > >     ret = vmstate_load_state(f, &vmstate_virtio, vdev, 1);
> > > > >     if (ret) {
> > > > >         return ret;
> > > > >     }
> > > > >
> > > > > More specifically, the vmstate_virtio_virtqueues and
> > > > > vmstate_virtio_extra_state subsections.
> > > > >
> > > > > For example, currently (with no iterative migration), for a virtio-net
> > > > > device, the virtio_load call took 13.29ms to finish. 13.20ms of that time
> > > > > was spent in vmstate_load_state(f, &vmstate_virtio, vdev, 1).
> > > > >
> > > > > Of that 13.21ms, ~6.83ms was spent migrating vmstate_virtio_virtqueues and
> > > > > ~6.33ms was spent migrating the vmstate_virtio_extra_state subsections. And
> > > > > I believe this is from walking VIRTIO_QUEUE_MAX virtqueues, twice.
> > > >
> > > > Can we optimize it simply by sending a bitmap of used vqs?
> > >
> > > +1.
> > >
> > > For example devices like virtio-net may know exactly the number of
> > > virtqueues that will be used.
> >
> > Ok, I think it comes from the following subsections:
> >
> > static const VMStateDescription vmstate_virtio_virtqueues = {
> >     .name = "virtio/virtqueues",
> >     .version_id = 1,
> >     .minimum_version_id = 1,
> >     .needed = &virtio_virtqueue_needed,
> >     .fields = (const VMStateField[]) {
> >         VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> >                       VIRTIO_QUEUE_MAX, 0, vmstate_virtqueue, VirtQueue),
> >         VMSTATE_END_OF_LIST()
> >     }
> > };
> >
> > static const VMStateDescription vmstate_virtio_packed_virtqueues = {
> >     .name = "virtio/packed_virtqueues",
> >     .version_id = 1,
> >     .minimum_version_id = 1,
> >     .needed = &virtio_packed_virtqueue_needed,
> >     .fields = (const VMStateField[]) {
> >         VMSTATE_STRUCT_VARRAY_POINTER_KNOWN(vq, struct VirtIODevice,
> >                       VIRTIO_QUEUE_MAX, 0, vmstate_packed_virtqueue,
> >                       VirtQueue),
> >         VMSTATE_END_OF_LIST()
> >     }
> > };
> >
> > A rough idea is to disable those subsections and use new subsections
> > instead (and do the compatibility work) like virtio_save():
> >
> >     for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
> >         if (vdev->vq[i].vring.num == 0)
> >             break;
> >     }
> >
> >     qemu_put_be32(f, i);
> >     ....
> >
>
> While I think this is a very good area to explore, I think we will get
> more benefits by pre-warming vhost-vdpa devices, as they take one or
> two orders of magnitude more than sending and processing the
> virtio-net state (1s~10s vs 10~100ms).

Yes, but note that Jonah does the testing on a software virtio device.

Thanks
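
For illustration only, the "bitmap of used vqs" idea raised earlier in the thread could look roughly like the standalone sketch below. It is not QEMU code: DemoQueue, demo_save(), demo_load(), QUEUE_MAX and the byte layout are all hypothetical stand-ins, and the real change would have to go through the VMState machinery and the compatibility work mentioned above. The sketch only shows the shape of the stream: a fixed-size bitmap, followed by state for the queues whose ring size is non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_MAX    1024              /* stand-in for VIRTIO_QUEUE_MAX */
#define BITMAP_BYTES (QUEUE_MAX / 8)   /* one bit per possible queue */

/* Hypothetical, simplified per-queue state; not QEMU's VirtQueue. */
typedef struct {
    uint16_t num;             /* ring size; 0 means the queue is unused */
    uint16_t last_avail_idx;
} DemoQueue;

/*
 * Write a bitmap of used queues, then big-endian state for used queues only.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
size_t demo_save(const DemoQueue *vq, uint8_t *buf, size_t cap)
{
    size_t off = BITMAP_BYTES;

    assert(vq && buf);
    if (cap < off) {
        return 0;
    }
    memset(buf, 0, BITMAP_BYTES);
    for (int i = 0; i < QUEUE_MAX; i++) {
        if (vq[i].num == 0) {
            continue;                   /* unused: costs one zero bit */
        }
        if (cap - off < 4) {
            return 0;
        }
        buf[i / 8] |= (uint8_t)(1u << (i % 8));
        buf[off++] = (uint8_t)(vq[i].num >> 8);
        buf[off++] = (uint8_t)(vq[i].num & 0xff);
        buf[off++] = (uint8_t)(vq[i].last_avail_idx >> 8);
        buf[off++] = (uint8_t)(vq[i].last_avail_idx & 0xff);
    }
    return off;
}

/*
 * Read the bitmap and restore only the queues it marks as used.
 * Returns the number of bytes consumed, or 0 if the input is truncated.
 */
size_t demo_load(DemoQueue *vq, const uint8_t *buf, size_t len)
{
    size_t off = BITMAP_BYTES;

    assert(vq && buf);
    if (len < off) {
        return 0;
    }
    memset(vq, 0, sizeof(DemoQueue) * QUEUE_MAX);
    for (int i = 0; i < QUEUE_MAX; i++) {
        if (!(buf[i / 8] & (1u << (i % 8)))) {
            continue;                   /* bit clear: nothing was sent */
        }
        if (len - off < 4) {
            return 0;
        }
        vq[i].num = (uint16_t)((buf[off] << 8) | buf[off + 1]);
        vq[i].last_avail_idx = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
        off += 4;
    }
    return off;
}
```

The loader still tests QUEUE_MAX bits, but the per-queue work is done only for queues that were actually sent. Whether that removes the cost Jonah measured is an assumption that would need profiling. Unlike the count-prefix loop quoted above, a bitmap also covers the case where the used queues are not contiguous.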