On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
> [add migration maintainers]
>
> On 24.09.24 15:56, Andrey Drobyshev wrote:
>> [...]
>
> I doubt that this a correct way to go.
>
> As far as I understand, "inactive" actually means that "storage is not
> belong to qemu, but to someone else (another qemu process for example),
> and may be changed transparently". In turn this means that Qemu should
> do nothing with inactive disks. So the problem is that nobody called
> bdrv_activate_all on target, and we shouldn't ignore that.
>
> Hmm, I see in process_incoming_migration_bh() we do call
> bdrv_activate_all(), but only in some scenarios. May be, the condition
> should be less strict here.
>
> Why we need any condition here at all? Don't we want to activate
> block-layer on target after migration anyway?
>
Hmm I'm not sure about the unconditional activation, since we at least
have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it
in such a case). In current libvirt upstream I see such code:
> /* Migration capabilities which should always be enabled as long as they
>
> * are supported by QEMU. If the capability is supposed to be enabled on both
>
> * sides of migration, it won't be enabled unless both sides support it.
>
> */
>
> static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] =
> {
>
> {QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,
>
> QEMU_MIGRATION_SOURCE},
>
>
>
> {QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,
>
> QEMU_MIGRATION_DESTINATION},
>
> };
which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.
The code from process_incoming_migration_bh() you're referring to:
> /* If capability late_block_activate is set:
>
> * Only fire up the block code now if we're going to restart the
>
> * VM, else 'cont' will do it.
>
> * This causes file locking to happen; so we don't want it to happen
>
> * unless we really are starting the VM.
>
> */
>
> if (!migrate_late_block_activate() ||
>
> (autostart && (!global_state_received() ||
>
> runstate_is_live(global_state_get_runstate())))) {
>
> /* Make sure all file formats throw away their mutable metadata.
>
>
> * If we get an error here, just don't restart the VM yet. */
>
> bdrv_activate_all(&local_err);
>
> if (local_err) {
>
> error_report_err(local_err);
>
> local_err = NULL;
>
> autostart = false;
>
> }
>
> }
It states explicitly that we're either going to start VM right at this
point if (autostart == true), or we wait till "cont" command happens.
None of this is going to happen if we start another migration while
still being in PAUSED state. So I think it seems reasonable to take
such case into account. For instance, this patch does prevent the crash:
> diff --git a/migration/migration.c b/migration/migration.c
> index ae2be31557..3222f6745b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
> */
> if (!migrate_late_block_activate() ||
> (autostart && (!global_state_received() ||
> - runstate_is_live(global_state_get_runstate())))) {
> + runstate_is_live(global_state_get_runstate()))) ||
> + (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
> /* Make sure all file formats throw away their mutable metadata.
> * If we get an error here, just don't restart the VM yet. */
> bdrv_activate_all(&local_err);
What are your thoughts on it?
Andrey