> On Jan 2, 2020, at 3:07 PM, Stefan Hajnoczi <[email protected]> wrote:
>
> On Thu, Dec 26, 2019 at 05:40:22PM +0800, 张海斌 wrote:
>> Stefan Hajnoczi <[email protected]> wrote on Fri, Mar 29, 2019 at 1:08 AM:
>>>
>>> On Thu, Mar 28, 2019 at 05:53:34PM +0800, 张海斌 wrote:
>>>> hi, stefan
>>>>
>>>> I have faced the same problem you wrote about in
>>>> https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg04025.html
>>>>
>>>> Reproduce as follows:
>>>> 1. Clone the qemu code from https://git.qemu.org/git/qemu.git, add some
>>>> debug information and compile
>>>> 2. Start a new VM
>>>> 3. In the VM, use fio randwrite to put pressure on the disk
>>>> 4. Live migrate
>>>>
>>>> The log shows:
>>>> [2019-03-28 15:10:40.206] /data/qemu/cpus.c:1086: enter do_vm_stop
>>>> [2019-03-28 15:10:40.212] /data/qemu/cpus.c:1097: call bdrv_drain_all
>>>> [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1099: call replay_disable_events
>>>> [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1101: call bdrv_flush_all
>>>> [2019-03-28 15:10:41.004] /data/qemu/cpus.c:1104: done do_vm_stop
>>>>
>>>> Calling bdrv_drain_all() costs 792 milliseconds.
>>>> I added a bdrv_drain_all() at the start of do_vm_stop() before
>>>> pause_all_vcpus(), but it doesn't help.
>>>> Is there any way to improve the live-migration downtime caused by
>>>> bdrv_drain_all()?
>
> I believe there were ideas about throttling storage controller devices
> during the later phases of live migration to reduce the number of
> pending I/Os.
>
> In other words, if QEMU's virtio-blk/scsi emulation code reduces the
> queue depth as live migration nears the handover point, bdrv_drain_all()
> should become cheaper because fewer I/O requests will be in-flight.
>
> A simple solution would reduce the queue depth during live migration
> (e.g. queue depth 1). A smart solution would look at I/O request
> latency to decide what queue depth is acceptable. For example, if
> requests are taking 4 ms to complete then we might allow 2 or 3 requests
> to achieve a ~10 ms bdrv_drain_all() downtime target.
>
> As far as I know this has not been implemented.
>
> Do you want to try implementing this?
>
> Stefan
It is a really hard problem to solve. Ultimately, if guarantees are needed about the blackout period, I don't see any viable solution other than aborting all pending storage commands.

Starting with a "go to QD=1 mode" approach is probably sensible. Vhost-based backends could even do that off the "you need to log" message, given that these are only used during migration.

Having a "you are taking too long, abort everything" command might be something worth looking into, especially if we can *safely* replay the aborted commands on the other side. (That may be backend-dependent.)

F.
