On 11/30/2023 6:06 PM, Peter Xu wrote:
> On Thu, Nov 30, 2023 at 01:37:18PM -0800, Steve Sistare wrote:
>> If the outgoing machine was previously suspended, propagate that to the
>> incoming side via global_state, so a subsequent vm_start restores the
>> suspended state. To maintain backward and forward compatibility, define
>> the new field in a zero'd hole in the GlobalState struct.
>>
>> Signed-off-by: Steve Sistare <[email protected]>
>> ---
>> migration/global_state.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/migration/global_state.c b/migration/global_state.c
>> index 4e2a9d8..de2532c 100644
>> --- a/migration/global_state.c
>> +++ b/migration/global_state.c
>> @@ -25,6 +25,7 @@ typedef struct {
>> uint8_t runstate[100];
>> RunState state;
>> bool received;
>> + bool vm_was_suspended;
>> } GlobalState;
>>
>> static GlobalState global_state;
>> @@ -35,6 +36,7 @@ static void global_state_do_store(RunState state)
>> assert(strlen(state_str) < sizeof(global_state.runstate));
>> strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>> state_str, '\0');
>> + global_state.vm_was_suspended = vm_get_suspended();
>> }
>>
>> void global_state_store(void)
>> @@ -68,6 +70,12 @@ static bool global_state_needed(void *opaque)
>> return true;
>> }
>>
>> + /* If the suspended state must be remembered, it is needed */
>> +
>> + if (vm_get_suspended()) {
>> + return true;
>> + }
>> +
>> /* If state is running or paused, it is not needed */
>>
>> if (strcmp(runstate, "running") == 0 ||
>> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int
>> version_id)
>> return -EINVAL;
>> }
>> s->state = r;
>> + vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
>
> IIUC current vm_was_suspended (based on my read of your patch) was not the
> same as a boolean representing "whether VM is suspended", but only a
> temporary field to remember that for a VM stop request. To be explicit, I
> didn't see this flag set in qemu_system_suspend() in your previous patch.
>
> If so, we can already do:
>
> vm_set_suspended(s->vm_was_suspended);
>
> Irrelevant of RUN_STATE_SUSPENDED?
We need both terms of the expression.
If the vm *is* suspended (RUN_STATE_SUSPENDED), then vm_was_suspended = false.
We call global_state_store prior to vm_stop_force_state, so the incoming
side sees s->state = RUN_STATE_SUSPENDED and s->vm_was_suspended = false.
However, the runstate is RUN_STATE_INMIGRATE. When incoming finishes by
calling vm_start, we need to restore the suspended state. Thus in
global_state_post_load, we must set vm_was_suspended = true.
If the vm *was* suspended, but is currently stopped (eg RUN_STATE_PAUSED),
then vm_was_suspended = true. Migration from that state sets
vm_was_suspended = s->vm_was_suspended = true in global_state_post_load and
ends with runstate_set(RUN_STATE_PAUSED).
I will add a comment here in the code.
>> return 0;
>> }
>> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>> .fields = (VMStateField[]) {
>> VMSTATE_UINT32(size, GlobalState),
>> VMSTATE_BUFFER(runstate, GlobalState),
>> + VMSTATE_BOOL(vm_was_suspended, GlobalState),
>> VMSTATE_END_OF_LIST()
>> },
>> };
>
> I think this will break migration between old/new, unfortunately. And
> since the global state exist mostly for every VM, all VM setup should be
> affected, and over all archs.
Thanks, I keep forgetting that my binary tricks are no good here. However,
I have one other trick up my sleeve, which is to store vm_was_running in
global_state.runstate[strlen(runstate) + 2]. It is forwards and backwards
compatible, since that byte is always 0 in older qemu. It can be implemented
with a few lines of code change confined to global_state.c, versus many lines
spread across files to do it the conventional way using a compat property and
a subsection. Sound OK?
> We used to have the version_id field right above for adding fields, but I
> _think_ that will still break backward migration fron new->old binary, so
> not wanted. Juan can keep me honest.
>
> The best thing is still machine compat properties, afaict, to fix. It's
> slightly involved, but let me attach a sample diff for you (at the end,
> possibly working with your current patch kind-of squashed, but not ever
> tested), hopefully make it slightly easier.
>
> I'm wondering how bad it is to just ignore it, it's not as bad as if we
> don't fix stop-during-suspend, in this case the worst case of forgetting
> this field over migration is: if VM stopped (and used to be suspended) then
> after migration it'll keep being stopped, however after "cont" it'll forget
> the suspended state.
Cont changes state to running, but the VM is broken because wake was never
called.
> Not that bad! IIUC SPR should always migrate with
> suspended (rather than any fully stopped state), right? Then shouldn't be
> affected. If risk is low, maybe we can leave this one for later?
>
> Thanks,
Thanks for the patch. I am going to save this as a template for adding
compatible
subsections in future work.
- Steve
> ===8<===
>
> diff --git a/migration/migration.h b/migration/migration.h
> index cf2c9c88e0..c3fd1f8347 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -470,6 +470,8 @@ struct MigrationState {
> bool switchover_acked;
> /* Is this a rdma migration */
> bool rdma_migration;
> + /* Whether remember global vm_was_suspended field? */
> + bool store_vm_was_suspended;
> };
>
> void migrate_set_state(int *state, int old_state, int new_state);
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 0c17398141..365e01c1c9 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -37,6 +37,7 @@ GlobalProperty hw_compat_8_1[] = {
> { "ramfb", "x-migrate", "off" },
> { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" },
> { "igb", "x-pcie-flr-init", "off" },
> + { "migration", "store-vm-was-suspended", false },
> };
> const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
>
> diff --git a/migration/global_state.c b/migration/global_state.c
> index 4e2a9d8ec0..ffa7bf82ca 100644
> --- a/migration/global_state.c
> +++ b/migration/global_state.c
> @@ -25,6 +25,7 @@ typedef struct {
> uint8_t runstate[100];
> RunState state;
> bool received;
> + bool vm_was_suspended;
> } GlobalState;
>
> static GlobalState global_state;
> @@ -124,6 +125,25 @@ static int global_state_pre_save(void *opaque)
> return 0;
> }
>
> +static bool global_state_has_vm_was_suspended(void *opaque)
> +{
> + return migrate_get_current()->store_vm_was_suspended;
> +}
> +
> +static const VMStateDescription vmstate_vm_was_suspended = {
> + .name = "globalstate/vm_was_suspended",
> + .version_id = 1,
> + .minimum_version_id = 1,
> + /* TODO: Fill these in */
> + .pre_save = NULL,
> + .post_load = NULL,
> + .needed = global_state_has_vm_was_suspended,
> + .fields = (VMStateField[]) {
> + VMSTATE_BOOL(vm_was_suspended, GlobalState),
> + VMSTATE_END_OF_LIST()
> + },
> +};
> +
> static const VMStateDescription vmstate_globalstate = {
> .name = "globalstate",
> .version_id = 1,
> @@ -136,6 +156,10 @@ static const VMStateDescription vmstate_globalstate = {
> VMSTATE_BUFFER(runstate, GlobalState),
> VMSTATE_END_OF_LIST()
> },
> + .subsections = (const VMStateDescription*[]) {
> + &vmstate_vm_was_suspended,
> + NULL
> + },
> };
>
> void register_global_state(void)
> diff --git a/migration/options.c b/migration/options.c
> index 8d8ec73ad9..5f9998d3e8 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -88,6 +88,8 @@
> Property migration_properties[] = {
> DEFINE_PROP_BOOL("store-global-state", MigrationState,
> store_global_state, true),
> + DEFINE_PROP_BOOL("store-vm-was-suspended", MigrationState,
> + store_vm_was_suspended, true),
> DEFINE_PROP_BOOL("send-configuration", MigrationState,
> send_configuration, true),
> DEFINE_PROP_BOOL("send-section-footer", MigrationState,
>