On Fri, Oct 25, 2024 at 02:50:36PM +0200, Cédric Le Goater wrote:
> On 10/24/24 23:30, Peter Xu wrote:
> > Introduce migration_mutex, protecting concurrent updates to
> > current_migration.
> >
> > In reality, most of the exported migration functions are safe to access
> > migration objects on capabilities, etc., e.g. many of the code is invoked
> > within migration thread via different hooks (e.g., precopy notifier,
> > vmstate handler hooks, etc.).
> >
> > So literally the mutex so far only makes sure below two APIs that are prone
> > to accessing freed current_migration:
> >
> >         migration_is_running()
> >         migration_file_set_error()
> >
> > Then we'll need to take the mutex too when init/free the migration object.
>
> we should also drop :
>
>   static void vfio_set_migration_error(int ret)
>   {
>       if (migration_is_running()) {
>           migration_file_set_error(ret, NULL);
>       }
>   }
>
> and use directly migration_file_set_error().

We'll need to export migration_is_running() anyway, though. So maybe we can
do that as a VFIO follow-up?

>
> >
> > Signed-off-by: Peter Xu <pet...@redhat.com>
> > ---
> >   migration/migration.h |  3 +++
> >   migration/migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 46 insertions(+)
> >
> > diff --git a/migration/migration.h b/migration/migration.h
> > index 9fa26ab06a..05edcf0c49 100644
> > --- a/migration/migration.h
> > +++ b/migration/migration.h
> > @@ -473,6 +473,9 @@ struct MigrationState {
> >       bool rdma_migration;
> >   };
> > +extern QemuMutex migration_mutex;
> > +#define QEMU_MIGRATION_LOCK_GUARD() QEMU_LOCK_GUARD(&migration_mutex)
> > +
>
> Why are these definitions exported ?

This is still only used in migration/, so it's not exported to the rest of
QEMU. I was planning for this to be available anywhere in migration/ when a
function needs to be exported.

However, I think you're right: so far this is only used in migration.c, so
we don't need to export it, at least until some other migration/*.c file
exports anything. Will remove it when I repost.

Thanks,

>
> Thanks,
>
> C.
>
> >
> >   void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
> >                          MigrationStatus new_state);
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 127b01734d..ef044968df 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -97,6 +97,14 @@ enum mig_rp_message_type {
> >      migrations at once.  For now we don't need to add
> >      dynamic creation of migration */
> > +/*
> > + * Protects current_migration below. Must be hold when using migration
> > + * exported functions unless the caller knows it won't get freed. For
> > + * example, when in the context of migration_thread() it's safe to access
> > + * current_migration without the mutex, because the thread holds one extra
> > + * refcount of the object, so it literally pins the object in-memory.
> > + */
> > +QemuMutex migration_mutex;
> >   static MigrationState *current_migration;
> >   static MigrationIncomingState *current_incoming;
> > @@ -110,6 +118,17 @@ static void migrate_fd_cancel(MigrationState *s);
> >   static bool close_return_path_on_source(MigrationState *s);
> >   static void migration_completion_end(MigrationState *s);
> > +/*
> > + * This is explicitly done without migration_object_init(), because it may
> > + * start to use this lock already when instance_init() of the object. The
> > + * mutex is alive for the whole lifecycle of QEMU, so it's always usable
> > + * and never destroyed.
> > + */
> > +static void __attribute__((constructor)) migration_mutex_init(void)
> > +{
> > +    qemu_mutex_init(&migration_mutex);
> > +}
> > +
> >   static void migration_downtime_start(MigrationState *s)
> >   {
> >       trace_vmstate_downtime_checkpoint("src-downtime-start");
> > @@ -336,6 +355,14 @@ void migration_shutdown(void)
> >        * stop the migration using this structure
> >        */
> >       migration_cancel(NULL);
> > +    /*
> > +     * Release the refcount from the main thread. It means it can be freed
> > +     * here if migration thread is not running.
> > +     *
> > +     * NOTE: we don't need QEMU_MIGRATION_LOCK_GUARD() on this access
> > +     * because we're sure there's one refcount. The lock will be taken in
> > +     * finalize() if it triggers, so we can't take it here anyway.
> > +     */
> >       object_unref(OBJECT(current_migration));
> >       /*
> > @@ -1118,8 +1145,14 @@ void migrate_send_rp_resume_ack(MigrationIncomingState *mis, uint32_t value)
> >   bool migration_is_running(void)
> >   {
> > +    QEMU_MIGRATION_LOCK_GUARD();
> > +
> >       MigrationState *s = current_migration;
> > +    if (!s) {
> > +        return false;
> > +    }
> > +
> >       switch (s->state) {
> >       case MIGRATION_STATUS_ACTIVE:
> >       case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> > @@ -3029,8 +3062,14 @@ static MigThrError postcopy_pause(MigrationState *s)
> >   void migration_file_set_error(int ret, Error *err)
> >   {
> > +    QEMU_MIGRATION_LOCK_GUARD();
> > +
> >       MigrationState *s = current_migration;
> > +    if (!s) {
> > +        return;
> > +    }
> > +
> >       WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
> >           if (s->to_dst_file) {
> >               qemu_file_set_error_obj(s->to_dst_file, ret, err);
> > @@ -3835,6 +3874,8 @@ static void migration_instance_finalize(Object *obj)
> >   {
> >       MigrationState *ms = MIGRATION_OBJ(obj);
> > +    QEMU_MIGRATION_LOCK_GUARD();
> > +
> >       qemu_mutex_destroy(&ms->error_mutex);
> >       qemu_mutex_destroy(&ms->qemu_file_lock);
> >       qemu_sem_destroy(&ms->wait_unplug_sem);
> > @@ -3858,6 +3899,8 @@ static void migration_instance_init(Object *obj)
> >   {
> >       MigrationState *ms = MIGRATION_OBJ(obj);
> > +    QEMU_MIGRATION_LOCK_GUARD();
> > +
> >       /*
> >        * There can only be one migration object globally. Keep a record of
> >        * the pointer in current_migration, which will be reset after the
>

--
Peter Xu
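
For anyone following the thread without the tree at hand, the pattern the patch applies to migration_is_running() and migration_file_set_error() can be sketched outside QEMU roughly as below. This is only an illustration under assumptions: it uses a plain pthread mutex rather than QemuMutex and QEMU_LOCK_GUARD(), and DemoState, demo_publish(), demo_retire(), demo_is_running() and demo_set_error() are hypothetical names that do not exist in QEMU.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MigrationState; only what the sketch needs. */
typedef struct {
    bool active;
    int last_error;
} DemoState;

/* Statically initialized, so it is usable before any object exists. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static DemoState *demo_current;   /* protected by demo_mutex */

/* Analogous to instance_init(): publish the pointer under the lock. */
static void demo_publish(DemoState *s)
{
    pthread_mutex_lock(&demo_mutex);
    demo_current = s;
    pthread_mutex_unlock(&demo_mutex);
}

/* Analogous to instance_finalize(): clear the pointer under the lock. */
static void demo_retire(void)
{
    pthread_mutex_lock(&demo_mutex);
    demo_current = NULL;
    pthread_mutex_unlock(&demo_mutex);
}

/* Exported query: never dereferences a pointer that was already retired. */
static bool demo_is_running(void)
{
    bool running = false;

    pthread_mutex_lock(&demo_mutex);
    if (demo_current) {
        running = demo_current->active;
    }
    pthread_mutex_unlock(&demo_mutex);
    return running;
}

/* Exported setter: silently does nothing when no object is published. */
static bool demo_set_error(int ret)
{
    bool recorded = false;

    pthread_mutex_lock(&demo_mutex);
    if (demo_current) {
        demo_current->last_error = ret;
        recorded = true;
    }
    pthread_mutex_unlock(&demo_mutex);
    return recorded;
}
```

With the NULL check living inside the setter itself, a caller-side wrapper that first asks "is it running?" and then sets the error no longer adds protection against a freed object, which is presumably why dropping vfio_set_migration_error() was suggested. Whether VFIO still needs the "running" check for other reasons is a separate question that this sketch does not address.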