On Fri, Dec 01, 2023 at 11:23:33AM -0500, Steven Sistare wrote:
> >> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int
> >> version_id)
> >> return -EINVAL;
> >> }
> >> s->state = r;
> >> + vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
> >
> > IIUC current vm_was_suspended (based on my read of your patch) was not the
> > same as a boolean representing "whether VM is suspended", but only a
> > temporary field to remember that for a VM stop request. To be explicit, I
> > didn't see this flag set in qemu_system_suspend() in your previous patch.
> >
> > If so, we can already do:
> >
> > vm_set_suspended(s->vm_was_suspended);
> >
> > Irrelevant of RUN_STATE_SUSPENDED?
>
> We need both terms of the expression.
>
> If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false.
> We call global_state_store prior to vm_stop_force_state, so the incoming
> side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.

Right.

> However, the runstate is RUN_STATE_INMIGRATE. When incoming finishes by
> calling vm_start, we need to restore the suspended state. Thus in
> global_state_post_load, we must set vm_was_suspended = true.

With the above, shouldn't global_state_get_runstate() (on dest) already
return SUSPENDED?  Then I think it should call vm_start(SUSPENDED) if it
starts the VM at all.

Maybe you're talking about the special case where autostart==false?  We
used to have this (existing process_incoming_migration_bh()):

    if (!global_state_received() ||
        global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    }

If so, maybe I get you, because in the "else" path we do seem to lose the
SUSPENDED state again.  But in that case IMHO we should logically set
vm_was_suspended only when we "lose" it: we didn't lose it during
migration, but only once we decided to switch to PAUSED (due to
autostart==false).  IOW, change the above to something like:

    state = global_state_get_runstate();
    if (!global_state_received() || runstate_is_alive(state)) {
        if (autostart) {
            vm_start(state);
        } else {
            if (runstate_is_suspended(state)) {
                /* Remember suspended state before setting system to stopped */
                vm_was_suspended = true;
            }
            runstate_set(RUN_STATE_PAUSED);
        }
    }

It may or may not make a functional difference compared with the current
patch, though.  However, it may be clearer to follow vm_was_suspended's
strict definition.

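To make the intent concrete, here is a minimal standalone sketch of that
decision as a pure function.  It is not QEMU code: the sketch_* names and the
three-value enum are hypothetical stand-ins, and it assumes that a missing
global state is treated as RUNNING.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the few RunState values that matter here. */
typedef enum {
    SKETCH_RUNNING,
    SKETCH_SUSPENDED,
    SKETCH_PAUSED,
} SketchRunState;

typedef struct {
    SketchRunState final_state;  /* runstate once incoming migration is done */
    bool vm_was_suspended;       /* set only when SUSPENDED is replaced by PAUSED */
} SketchResult;

/* Assumption: "alive" means RUNNING or SUSPENDED, as in the proposal above. */
static bool sketch_is_alive(SketchRunState s)
{
    return s == SKETCH_RUNNING || s == SKETCH_SUSPENDED;
}

static SketchResult sketch_incoming_done(bool state_received,
                                         SketchRunState migrated,
                                         bool autostart)
{
    /* Assumption: without a received global state, behave as if RUNNING. */
    SketchRunState state = state_received ? migrated : SKETCH_RUNNING;
    SketchResult r = { .final_state = state, .vm_was_suspended = false };

    if (!state_received || sketch_is_alive(state)) {
        if (!autostart) {
            /* The suspended state is "lost" only here, so record it here. */
            r.vm_was_suspended = (state == SKETCH_SUSPENDED);
            r.final_state = SKETCH_PAUSED;
        }
        /* With autostart, the VM resumes in the migrated state unchanged. */
    }
    return r;
}
```

With this shape, the flag is true only for the (SUSPENDED, autostart==false)
combination, which matches the strict definition discussed above.
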
>
> If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
> then vm_was_suspended = true. Migration from that state sets
> vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and
> ends with runstate_set(RUN_STATE_PAUSED).
>
> I will add a comment here in the code.
>
> >> return 0;
> >> }
> >> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
> >> .fields = (VMStateField[]) {
> >> VMSTATE_UINT32(size, GlobalState),
> >> VMSTATE_BUFFER(runstate, GlobalState),
> >> + VMSTATE_BOOL(vm_was_suspended, GlobalState),
> >> VMSTATE_END_OF_LIST()
> >> },
> >> };
> >
> > I think this will break migration between old/new, unfortunately. And
> > since the global state exist mostly for every VM, all VM setup should be
> > affected, and over all archs.
>
> Thanks, I keep forgetting that my binary tricks are no good here. However,
> I have one other trick up my sleeve, which is to store vm_was_running in
> global_state.runstate[strlen(runstate) + 2]. It is forwards and backwards
> compatible, since that byte is always 0 in older qemu. It can be implemented
> with a few lines of code change confined to global_state.c, versus many lines
> spread across files to do it the conventional way using a compat property and
> a subsection. Sound OK?

Tricky!  But it sounds okay to me.  I think you're inventing your own way
of being compatible, with the benefit of not relying on the machine type.
If you go this route, please clearly document the layout and also what it
looked like in old binaries.

I think it may be good to keep using strings, so that new binaries allow
more than one string, and we then define those strings properly (index 0:
runstate, existed since start; index 2: suspended, perhaps using "1"/"0" to
express it, while 0x00 means old binary, etc.).

I hope this trick will need less code than the subsection solution;
otherwise I'd still consider going with that, which is the "common
solution".

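As a rough illustration of the multi-string idea, the sketch below packs a
runstate string and a "1"/"0" flag string into one zero-filled buffer and
reads the flag back.  Everything here is an assumption for illustration: the
sketch_* names, the 100-byte size, and in particular the offset of the second
string (placed directly after the first terminator), which this thread has
not settled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BUF_SIZE 100  /* hypothetical buffer size */

/*
 * Lay out the buffer as: runstate string, NUL, flag string ("1" or "0"),
 * NUL, then zero padding.  Returns false if the strings do not fit.
 */
static bool sketch_pack(char buf[SKETCH_BUF_SIZE], const char *runstate,
                        bool was_suspended)
{
    size_t len = strlen(runstate);

    /* Need len bytes, a NUL, one flag character, and a final NUL. */
    if (len + 3 > SKETCH_BUF_SIZE) {
        return false;
    }
    memset(buf, 0, SKETCH_BUF_SIZE);
    memcpy(buf, runstate, len);
    buf[len + 1] = was_suspended ? '1' : '0';
    return true;
}

/*
 * Returns 1 or 0 for the flag, or -1 when the second string is empty
 * (0x00, as a sender without the flag would leave it) or unrecognized.
 */
static int sketch_unpack_flag(const char buf[SKETCH_BUF_SIZE])
{
    const char *end = memchr(buf, '\0', SKETCH_BUF_SIZE);
    size_t len;

    if (!end) {
        return -1;  /* first string is not terminated */
    }
    len = (size_t)(end - buf);
    if (len + 1 >= SKETCH_BUF_SIZE) {
        return -1;  /* no room for a second string */
    }
    switch (buf[len + 1]) {
    case '1':
        return 1;
    case '0':
        return 0;
    default:
        return -1;
    }
}
```

A reader that gets -1 would fall back to the old behavior, which is what
keeps a stream from an older binary loadable in this scheme.
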
Let's also see whether Juan/Fabiano/others have any opinions.

--
Peter Xu