On Wed, 15 Jan 2020 19:26:18 +0100
Laurent Vivier <[email protected]> wrote:

> On 15/01/2020 19:10, Laurent Vivier wrote:
> > Hi,
> > 
> > On 15/01/2020 18:48, Greg Kurz wrote:
> >> Migration can potentially race with CAS reboot. If the migration thread
> >> completes migration after CAS has set spapr->cas_reboot but before the
> >> mainloop could pick up the reset request and reset the machine, the
> >> guest is migrated unrebooted and the destination doesn't reboot it
> >> either because it isn't aware a CAS reboot was needed (eg, because a
> >> device was added before CAS). This likely results in a broken or hung
> >> guest.
> >>
> >> Even if it is small, the window between CAS and CAS reboot is enough to
> >> re-qualify spapr->cas_reboot as state that we should migrate. Add a new
> >> subsection for that and always send it when a CAS reboot is pending.
> >> This may cause migration to older QEMUs to fail but it is still better
> >> than ending up with a broken guest.
> >>
> >> The destination cannot honour the CAS reboot request from a post load
> >> handler because this must be done after the guest is fully restored.
> >> It is thus done from a VM change state handler.
> >>
> >> Reported-by: Lukáš Doktor <[email protected]>
> >> Signed-off-by: Greg Kurz <[email protected]>
> >> ---
> >>
> > 
> > I'm wondering if the problem could be related to the fact that
> > main_loop_should_exit() could release qemu_global_mutex in
> > pause_all_vcpus() in the reset case?
> > 
> > 1602 static bool main_loop_should_exit(void)
> > 1603 {
> > ...
> > 1633     request = qemu_reset_requested();
> > 1634     if (request) {
> > 1635         pause_all_vcpus();
> > 1636         qemu_system_reset(request);
> > 1637         resume_all_vcpus();
> > 1638         if (!runstate_check(RUN_STATE_RUNNING) &&
> > 1639                 !runstate_check(RUN_STATE_INMIGRATE)) {
> > 1640             runstate_set(RUN_STATE_PRELAUNCH);
> > 1641         }
> > 1642     }
> > ...
> > 
> > I already sent a patch for this kind of problem (in current Juan pull
> > request):
> > 
> > "runstate: ignore finishmigrate -> prelaunch transition"
> > 
> > but I don't know if it could fix this one.
> 
> I think it would be interesting to have the state transitions on source
> and destination when the problem occurs (with something like "-trace
> runstate_set").
> 

With "-serial mon:stdio -trace runstate_set -trace guest_cpu_reset":

OF stdout device is: /vdevice/vty@71000000
Preparing to boot Linux version 4.18.0-80.el8.ppc64le 
([email protected]) (gcc version 8.2.1 20180905 (Red 
Hat 8.2.1-3) (GCC)) #1 SMP Wed Mar 13 11:26:21 UTC 2019
Detected machine type: 0000000000000101
command line: BOOT_IMAGE=/boot/vmlinuz-4.18.0-80.el8.ppc64le 
root=UUID=012b83a5-2594-48ac-b936-12fec7cdbb9a ro console=ttyS0 
console=ttyS0,115200n8 no_timer_check net.ifnames=0 crashkernel=auto
Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
Calling ibm,client-architecture-support.

 Migration starts here.

..qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE 
capability must be present for KVM
Falling back to kernel-irqchip=off

 This ^^ indicates that CAS was called and switched to XIVE, for which
 we lack proper KVM support on GA boston machines.

[email protected]:runstate_set current_run_state 9 (running) new_state 7 
(finish-migrate)
[email protected]:runstate_set current_run_state 7 (finish-migrate) 
new_state 5 (postmigrate)

 The migration thread is holding the global QEMU mutex at this point. It
 has stopped all CPUs. It now streams the full state to the destination
 before releasing the mutex.

[email protected]:guest_cpu_reset cpu=0xf9dbb48a5e0 
[email protected]:guest_cpu_reset cpu=0xf9dbb4d56a0 

 The main loop regained control and could process the CAS reboot request
 but it is too late...

[email protected]:runstate_set current_run_state 5 (postmigrate) 
new_state 6 (prelaunch)

> Thanks,
> Laurent
> 

