Peter Xu <[email protected]> writes:

> We set CANCELLED very late, it means migration_has_failed() may not work
> correctly if it's invoked before updating CANCELLING to CANCELLED.
>

The prophecy is fulfilled.

https://wiki.qemu.org/ToDo/LiveMigration#Migration_cancel_concurrency

I'm not sure I'm convinced. For instance, CANCELLING is part of
migration_is_running(), while FAILED is not. This doesn't seem right.

Another point is that CANCELLING is not a final state, so we're prone to
later need a migration_has_finished_failing_now() helper. =)

My mental model is that CANCELLING is a transitional, ongoing state where
we shouldn't really be making assumptions. Once FAILED is reached, then
we're sure in which general state everything is.

How did you catch this? It was one of the cancel tests that failed?

I just noticed that multifd_send_shutdown() is called from
migration_cleanup() before it changes the state to CANCELLED. So current
code also has whatever issue you detected here.

> Allow that state will make migration_has_failed() working as expected even
> if it's invoked slightly earlier.
>
> One current user is the multifd code for the TLS graceful termination,
> where it's before updating to CANCELLED.
>
> Signed-off-by: Peter Xu <[email protected]>
> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 7015c2b5e0..397917b1b3 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1723,7 +1723,8 @@ int migration_call_notifiers(MigrationState *s,
> MigrationEventType type,
>
>  bool migration_has_failed(MigrationState *s)
>  {
> -    return (s->state == MIGRATION_STATUS_CANCELLED ||
> +    return (s->state == MIGRATION_STATUS_CANCELLING ||
> +            s->state == MIGRATION_STATUS_CANCELLED ||
>              s->state == MIGRATION_STATUS_FAILED);
>  }
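
To make the migration_is_running() concern concrete, here is a rough standalone sketch. It is not the QEMU code: the ModelStatus enum and the model_*() helpers are hypothetical names, the state list is deliberately cut down, and the only facts it encodes are the ones mentioned in this thread (CANCELLING counts as running, FAILED does not, and the patch adds CANCELLING to the has_failed check).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the migration status values; not QEMU's enum. */
typedef enum {
    MODEL_ACTIVE,
    MODEL_CANCELLING,
    MODEL_CANCELLED,
    MODEL_FAILED,
    MODEL_COMPLETED,
} ModelStatus;

/* As described in this thread: CANCELLING is "running", FAILED is not. */
static bool model_is_running(ModelStatus s)
{
    return s == MODEL_ACTIVE || s == MODEL_CANCELLING;
}

/* The has_failed check as it would read with the patch applied. */
static bool model_has_failed(ModelStatus s)
{
    return s == MODEL_CANCELLING ||
           s == MODEL_CANCELLED ||
           s == MODEL_FAILED;
}

/* True when a state is reported as both still running and already failed. */
static bool model_is_ambiguous(ModelStatus s)
{
    return model_is_running(s) && model_has_failed(s);
}

/* Only the transitional state ends up in both groups. */
static void model_self_check(void)
{
    assert(model_is_ambiguous(MODEL_CANCELLING));
    assert(!model_is_ambiguous(MODEL_ACTIVE));
    assert(!model_is_ambiguous(MODEL_CANCELLED));
    assert(!model_is_ambiguous(MODEL_FAILED));
    assert(!model_is_ambiguous(MODEL_COMPLETED));
}
```

In this model, CANCELLING is the only state for which both predicates return true, which is the overlap I'm uneasy about. Whether that overlap matters in practice depends on the callers, which the sketch does not try to model.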
