Hi Peter,
On 2025-08-07 16:54, Peter Xu wrote:
> On Thu, Aug 07, 2025 at 01:49:10PM +0200, Juraj Marcin wrote:
> > From: Juraj Marcin <[email protected]>
> >
> > Depending on where an error during postcopy_start() happens, the state
> > can be either "active", "device" or "cancelling", but never
> > "postcopy-active". Migration state is transitioned to "postcopy-active"
> > only just before a successful return from the function.
> >
> > Accept any state except "cancelling" when transitioning to "failed"
> > state.
> >
> > Signed-off-by: Juraj Marcin <[email protected]>
> > ---
> > migration/migration.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 10c216d25d..e5ce2940d5 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error
> > **errp)
> > fail_closefb:
> > qemu_fclose(fb);
> > fail:
> > - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> > - MIGRATION_STATUS_FAILED);
> > + if ( ms->state != MIGRATION_STATUS_CANCELLING) {
> > + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> > + }
>
> Hmm, this might have been overlooked from my commit 48814111366b. Maybe
> worth a Fixes and copy stable?
yeah, it looks like it. POSTCOPY_ACTIVE state used to be set way sooner
before. I'll add Fixes tag to the patch.
>
> For example, I would expect the old code (prior of 48814111366b) still be
> able to fail postcopy and resume src QEMU if qemu_savevm_send_packaged()
> failed. Now, looks like it'll be stuck at "device" state..
>
> The other thing is it also looks like a common pattern to set FAILED
> meanwhile not messing with a CANCELLING stage. It's not easy to always
> remember this, so maybe we should consider having a helper function?
>
> migrate_set_failure(MigrationState *, Error *err);
>
> Which could set err with migrate_set_error() (likely we could also
> error_report() the error), and update FAILED iff it's not CANCELLING.
>
> I saw three of such occurances that such helper may apply, but worth double
> check:
>
> postcopy_start[2725] if (ms->state != MIGRATION_STATUS_CANCELLING) {
> migration_completion[3069] if (s->state != MIGRATION_STATUS_CANCELLING) {
> igration_connect[4064] if (s->state != MIGRATION_STATUS_CANCELLING) {
>
> If the cleanup looks worthwhile, and if the Fixes apply, we could have the
> cleanup patch on top of the fixes patch so patch 1 is easier to backport.
Such function could be useful. I could also send it with the above fix
together as a separate patchset, and send it also to stable.
Best regards
Juraj Marcin
>
> Thanks,
>
> > migration_block_activate(NULL);
> > migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
> > bql_unlock();
> > --
> > 2.50.1
> >
>
> --
> Peter Xu
>