Hi Fabiano
On 2025-08-08 16:08, Fabiano Rosas wrote:
> Peter Xu <[email protected]> writes:
>
> > On Thu, Aug 07, 2025 at 01:49:10PM +0200, Juraj Marcin wrote:
> >> From: Juraj Marcin <[email protected]>
> >>
> >> Depending on where an error during postcopy_start() happens, the state
> >> can be either "active", "device" or "cancelling", but never
> >> "postcopy-active". Migration state is transitioned to "postcopy-active"
> >> only just before a successful return from the function.
> >>
> >> Accept any state except "cancelling" when transitioning to "failed"
> >> state.
> >>
> >> Signed-off-by: Juraj Marcin <[email protected]>
> >> ---
> >> migration/migration.c | 5 +++--
> >> 1 file changed, 3 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 10c216d25d..e5ce2940d5 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -2872,8 +2872,9 @@ static int postcopy_start(MigrationState *ms, Error
> >> **errp)
> >> fail_closefb:
> >> qemu_fclose(fb);
> >> fail:
> >> - migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
> >> - MIGRATION_STATUS_FAILED);
> >> + if ( ms->state != MIGRATION_STATUS_CANCELLING) {
> >> + migrate_set_state(&ms->state, ms->state, MIGRATION_STATUS_FAILED);
> >> + }
> >
> > Hmm, this might have been overlooked from my commit 48814111366b. Maybe
> > worth a Fixes and copy stable?
> >
> > For example, I would expect the old code (prior of 48814111366b) still be
> > able to fail postcopy and resume src QEMU if qemu_savevm_send_packaged()
> > failed. Now, looks like it'll be stuck at "device" state..
> >
> > The other thing is it also looks like a common pattern to set FAILED
> > meanwhile not messing with a CANCELLING stage. It's not easy to always
> > remember this, so maybe we should consider having a helper function?
> >
> > migrate_set_failure(MigrationState *, Error *err);
> >
>
> We didn't do it back then because at there would be some logical
> conflict with this series:
>
> https://lore.kernel.org/r/[email protected]
>
> But I don't remember the details. If it works this time I'm all for it.
Thanks! I will look into that.
Best regards,
Juraj Marcin
>
> > Which could set err with migrate_set_error() (likely we could also
> > error_report() the error), and update FAILED iff it's not CANCELLING.
> >
> > I saw three of such occurances that such helper may apply, but worth double
> > check:
> >
> > postcopy_start[2725] if (ms->state !=
> > MIGRATION_STATUS_CANCELLING) {
> > migration_completion[3069] if (s->state != MIGRATION_STATUS_CANCELLING)
> > {
> > igration_connect[4064] if (s->state != MIGRATION_STATUS_CANCELLING) {
> >
> > If the cleanup looks worthwhile, and if the Fixes apply, we could have the
> > cleanup patch on top of the fixes patch so patch 1 is easier to backport.
> >
> > Thanks,
> >
> >> migration_block_activate(NULL);
> >> migration_call_notifiers(ms, MIG_EVENT_PRECOPY_FAILED, NULL);
> >> bql_unlock();
> >> --
> >> 2.50.1
> >>
>