* Wangxin (Alexander) ([email protected]) wrote:
> Hi all,

(copying in Michael for vhost-user maintainer).

> We found that the downtime of migration can reach a few seconds when live
> migrating a huge VM with 224 vCPUs / 180 GiB of RAM / 16 vhost-user NICs
> (x32 queues) / 24 vhost-user-blk disks (x4 queues); most of the time is
> spent stopping the devices on the source and starting them on the
> destination.

I suspect that's more vhost-user devices than anyone else has run on a
single VM!

> Our idea is to stop the devices through multiple threads at the end of
> migration. To be more specific, we create a thread pool at the beginning
> of live migration; when the migration thread calls the
> virtio_vmstate_change callback to stop or start a device in
> vm_state_notify, it submits a request to the thread pool so that the
> callbacks are handled concurrently.
>
> We live migrated the VM and measured the cost of the different stages of
> stopping/starting the devices:
>
>   Cost:                                Original   With state change
>                                                   concurrently
>   Src  disk  get vring base               36ms        18ms
>              disable guest notify         48ms        32ms
>              disable host notify         300ms       120ms
>        net   get vring base             1376ms       294ms
>              disable host notify        1011ms       116ms
>              disable guest notify         59ms        40ms
>   Dst  net   enable guest notify         310ms        97ms
>              set memtable                 48ms        20ms
>              enable host notify         2022ms       114ms
>        disk  enable host notify          312ms        78ms
>              enable guest notify          32ms        23ms
>              set memtable                 16ms        10ms
>
>   Total downtime                        5600ms       962ms
>
> However, there are some side effects:
>
> 1. When host or guest notifiers are disabled concurrently, the VM crashes
>    because the same notifier can be disabled from different threads. We
>    currently add two separate locks to avoid this, but that is a hack and
>    may cause other problems.
>
> 2. Because the migration thread holds the QEMU BQL before stopping the
>    devices in migration_completion, there is a deadlock in the following
>    scenario:
>
>    migration_thread [thread 1]
>      set_up_multithread
>      ...
>      migration_completion()                # takes the QEMU BQL
>        qemu_mutex_lock_iothread()
>        vm_stop_force_state()
>          ...
>          submit stopping-device request to the thread pool
>
>    worker thread
>      virtio_vmstate_change
>        virtio_set_status
>          ...
>          memory_region_transaction_begin
>            ...
>            prepare_mmio_access
>              qemu_mutex_iothread_locked()  # returns false
>              qemu_mutex_lock_iothread()    # deadlock
>
> For now we add another lock to replace the BQL in this scenario, but we
> think this is not reliable enough, and there is a risk that other code
> paths will also take the QEMU BQL while a device is being stopped. My
> question is: how do we deal with the conflict with the QEMU BQL properly?
>
> Any advice would be appreciated, thanks.
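To make the shape of that concrete, here is a minimal sketch of the
fan-out/join idea, using plain pthreads rather than QEMU's actual
thread-pool or vhost code; DeviceJob and stop_one_device are hypothetical
stand-ins for the per-device stop path:

  /* Minimal sketch of the concurrent-stop idea, using plain pthreads
   * rather than QEMU's real thread pool or VhostOps; all names here
   * (DeviceJob, stop_one_device, ...) are hypothetical. */
  #include <pthread.h>
  #include <stdio.h>

  #define NUM_DEVICES 8

  typedef struct DeviceJob {
      int dev_id;              /* stand-in for a vhost-user device */
  } DeviceJob;

  /* Stand-in for the per-device stop path (get vring base, disable
   * host/guest notifiers, ...), which mostly blocks waiting for the
   * vhost-user backend to reply. */
  static void *stop_one_device(void *opaque)
  {
      DeviceJob *job = opaque;
      printf("stopping device %d\n", job->dev_id);
      /* ... send requests to the backend and wait for replies ... */
      return NULL;
  }

  int main(void)
  {
      pthread_t threads[NUM_DEVICES];
      DeviceJob jobs[NUM_DEVICES];

      /* Fan out: one stop request per device, issued concurrently. */
      for (int i = 0; i < NUM_DEVICES; i++) {
          jobs[i].dev_id = i;
          pthread_create(&threads[i], NULL, stop_one_device, &jobs[i]);
      }
      /* Join: migration can only complete once every device has stopped. */
      for (int i = 0; i < NUM_DEVICES; i++) {
          pthread_join(threads[i], NULL);
      }
      printf("all devices stopped\n");
      return 0;
  }

Both of the problems described above fall out of this shape: the workers
can race with each other on shared notifiers, and any worker that needs the
BQL blocks against the migration thread that already holds it.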
To me it feels like the other way here would be to explicitly split each of
these stages into two: one part that sends the request to the vhost device,
and one part that waits for the response from the vhost-user device (i.e.
in the vhost-user case, after the vhost_user_write but before the
vhost_user_read). So instead of parallelising everything in threads, you'd
parallelise all of the corresponding operations, so that all of the
get_vring_base's happen at the same time.

Michael: Would it make sense to change VhostOps' get_vring_base, and many
of the others, into two-part operations like that? (Or maybe coroutines
with a yield in???)
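As a rough sketch of what that two-part split could look like (TwoPartOps
and the _begin/_end names are made up for illustration, not the current
VhostOps interface; in the vhost-user case _begin would end after the
vhost_user_write and _end would do the vhost_user_read):

  /* Sketch of the suggested two-part split: issue the request for every
   * device first, then collect all the replies.  The _begin/_end names
   * and the TwoPartOps struct are hypothetical. */
  #include <stdio.h>

  #define NUM_DEVICES 8

  typedef struct TwoPartOps {
      /* send the GET_VRING_BASE request, do not wait for the reply */
      int (*get_vring_base_begin)(int dev_id);
      /* wait for and parse the reply to the request sent by _begin */
      int (*get_vring_base_end)(int dev_id, unsigned *base);
  } TwoPartOps;

  /* Stub backend, standing in for a vhost-user device. */
  static int stub_begin(int dev_id)
  {
      printf("dev %d: request sent\n", dev_id);
      return 0;
  }

  static int stub_end(int dev_id, unsigned *base)
  {
      *base = 0;               /* pretend the backend replied with 0 */
      printf("dev %d: reply received\n", dev_id);
      return 0;
  }

  int main(void)
  {
      TwoPartOps ops = { stub_begin, stub_end };
      unsigned base[NUM_DEVICES];

      /* Phase 1: all requests go out back to back, so every backend can
       * start draining its rings at the same time. */
      for (int i = 0; i < NUM_DEVICES; i++) {
          ops.get_vring_base_begin(i);
      }
      /* Phase 2: collect the replies; the waiting overlaps across devices
       * instead of being serialised one device at a time. */
      for (int i = 0; i < NUM_DEVICES; i++) {
          ops.get_vring_base_end(i, &base[i]);
      }
      return 0;
  }

The expensive part - waiting for each backend to reply - then overlaps
across devices, but everything still runs in the migration thread, so the
BQL conflict and the notifier races described above don't come into it.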
Dave
-- 
Dr. David Alan Gilbert / [email protected] / Manchester, UK