On 8/14/2023 3:37 PM, Peter Xu wrote:
> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>> Can we just call vm_state_notify() earlier?
>>
>> We cannot. The guest is not running yet, and will not be until later.
>> We cannot call notifiers that perform actions that complete, or react to,
>> the guest entering a running state.
>
> I tried to look at a few examples of the notifees and most of them I read
> do not react to "vcpu running" but "vm running" (in which case I think
> "suspended" mode falls into "vm running" case); most of them won't care on
> the RunState parameter passed in, but only the bool "running".
>
> In reality, when running=true, it must be RUNNING so far.
>
> In that case does it mean we should notify right after the switchover,
> since after migration the vm is indeed running only if the vcpus are not
> during suspend?
I cannot parse your question, but maybe this answers it.
If the outgoing VM is running and not suspended, then the incoming side
tests for autostart==true and calls vm_start, which calls the notifiers,
right after the switchover.
> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
> device; this kind of prove to me that SUSPEND is actually one of
> running=true states.
>
> If we postpone all notifiers here even after we switched over to dest qemu
> to the next upcoming suspend wakeup, I think it means these devices will
> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
> VFIO_DEVICE_STATE_STOP.
or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
AFAIK it is OK to remain in that state until wakeup is called later.
> Ideally I think we should here call vm_state_notify() with running=true and
> state=SUSPEND, but since I do see some hooks are not well prepared for
> SUSPEND over running=true, I'd think we should on the safe side call
> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
> over phase. With that IIUC it'll naturally work (e.g. when wakeup again
> later we just need to call no notifiers).
Notifiers are just one piece, all the code in vm_prepare_start must be called.
Is it correct to call all of that long before we actually resume the CPUs in
wakeup? I don't know, but what is the point? The wakeup code still needs
modification to conditionally resume the vcpus. The scheme would be roughly:
loadvm_postcopy_handle_run_bh()
runstat = global_state_get_runstate();
if (runstate == RUN_STATE_RUNNING) {
vm_start()
} else if (runstate == RUN_STATE_SUSPENDED)
vm_prepare_start(); // the start of vm_start()
}
qemu_system_wakeup_request()
if (some condition)
resume_all_vcpus(); // the remainder of vm_start()
else
runstate_set(RUN_STATE_RUNNING)
How is that better than my patches
[PATCH V3 01/10] vl: start on wakeup request
[PATCH V3 02/10] migration: preserve suspended runstate
loadvm_postcopy_handle_run_bh()
runstate = global_state_get_runstate();
if (runstate == RUN_STATE_RUNNING)
vm_start()
else
runstate_set(runstate); // eg RUN_STATE_SUSPENDED
qemu_system_wakeup_request()
if (!vm_started)
vm_start();
else
runstate_set(RUN_STATE_RUNNING);
Recall this thread started with your comment "It then can avoid touching the
system wakeup code which seems cleaner". We still need to touch the wakeup
code.
- Steve