On Wed, 15 Jan 2020 19:26:18 +0100 Laurent Vivier <[email protected]> wrote:
> On 15/01/2020 19:10, Laurent Vivier wrote: > > Hi, > > > > On 15/01/2020 18:48, Greg Kurz wrote: > >> Migration can potentially race with CAS reboot. If the migration thread > >> completes migration after CAS has set spapr->cas_reboot but before the > >> mainloop could pick up the reset request and reset the machine, the > >> guest is migrated unrebooted and the destination doesn't reboot it > >> either because it isn't aware a CAS reboot was needed (eg, because a > >> device was added before CAS). This likely result in a broken or hung > >> guest. > >> > >> Even if it is small, the window between CAS and CAS reboot is enough to > >> re-qualify spapr->cas_reboot as state that we should migrate. Add a new > >> subsection for that and always send it when a CAS reboot is pending. > >> This may cause migration to older QEMUs to fail but it is still better > >> than end up with a broken guest. > >> > >> The destination cannot honour the CAS reboot request from a post load > >> handler because this must be done after the guest is fully restored. > >> It is thus done from a VM change state handler. > >> > >> Reported-by: Lukáš Doktor <[email protected]> > >> Signed-off-by: Greg Kurz <[email protected]> > >> --- > >> > > > > I'm wondering if the problem can be related with the fact that > > main_loop_should_exit() could release qemu_global_mutex in > > pause_all_vcpus() in the reset case? > > > > 1602 static bool main_loop_should_exit(void) > > 1603 { > > ... > > 1633 request = qemu_reset_requested(); > > 1634 if (request) { > > 1635 pause_all_vcpus(); > > 1636 qemu_system_reset(request); > > 1637 resume_all_vcpus(); > > 1638 if (!runstate_check(RUN_STATE_RUNNING) && > > 1639 !runstate_check(RUN_STATE_INMIGRATE)) { > > 1640 runstate_set(RUN_STATE_PRELAUNCH); > > 1641 } > > 1642 } > > ... > > > > I already sent a patch for this kind of problem (in current Juan pull > > request): > > > > "runstate: ignore finishmigrate -> prelaunch transition" > > > > but I don't know if it could fix this one. > > I think it should be interesting to have the state transition on source > and destination when the problem occurs (with something like "-trace > runstate_set"). > With "-serial mon:stdio -trace runstate_set -trace -trace guest_cpu_reset" : OF stdout device is: /vdevice/vty@71000000 Preparing to boot Linux version 4.18.0-80.el8.ppc64le ([email protected]) (gcc version 8.2.1 20180905 (Red Hat 8.2.1-3) (GCC)) #1 SMP Wed Mar 13 11:26:21 UTC 2019 Detected machine type: 0000000000000101 command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-80.el8.ppc64le root=UUID=012b83a5-2594-48ac-b936-12fec7cdbb9a ro console=ttyS0 console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) Calling ibm,client-architecture-support. Migration starts here. ..qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM Falling back to kernel-irqchip=off This ^^ indicates that CAS was called and switched to XIVE, for which we lack proper KVM support on GA boston machines. [email protected]:runstate_set current_run_state 9 (running) new_state 7 (finish-migrate) [email protected]:runstate_set current_run_state 7 (finish-migrate) new_state 5 (postmigrate) The migration thread is holding the global QEMU mutex at this point. It has stopped all CPUs. It now streams the full state to the destination before releasing the mutex. [email protected]:guest_cpu_reset cpu=0xf9dbb48a5e0 [email protected]:guest_cpu_reset cpu=0xf9dbb4d56a0 The main loop regained control and could process the CAS reboot request but it is too late... [email protected]:runstate_set current_run_state 5 (postmigrate) new_state 6 (prelaunch) > Thanks, > Laurent >
