* Peter Xu ([email protected]) wrote: > On Wed, Aug 27, 2025 at 11:51:06PM +0000, Dr. David Alan Gilbert wrote: > > * Peter Xu ([email protected]) wrote: > > > Migration module was there for 10+ years. Initially, it was in most cases > > > based on coroutines. As more features were added into the framework, like > > > postcopy, multifd, etc.. it became a mixture of threads and coroutines. > > > > > > I'm guessing coroutines just can't fix all issues that migration want to > > > resolve. > > > > Yeh migration can happily eat a whole core. > > > > > After all these years, migration is now heavily based on a threaded model. > > > > > > Now there's still a major part of migration framework that is still not > > > thread-based, which is precopy load. We do load in a separate thread in > > > postcopy since the 1st day postcopy was introduced, however that requires > > > a > > > separate state transition from precopy loading all devices first, which > > > still happens in the main thread of a coroutine. > > > > ... > > > > > COLO > > > ==== > > > > If you can I suggest splitting the COLO stuff out as a separate thread, > > not many people understand it. > > I can try this one, but then it'll be a bunch of "if (qemu_in_coroutine())" > checks all over the places. > > For emample, this change of this patch: > > - assert(bql_locked()); > assert(migration_incoming_colo_enabled()); > > qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO, > colo_process_incoming_thread, > mis, QEMU_THREAD_JOINABLE); > > - mis->colo_incoming_co = qemu_coroutine_self(); > - qemu_coroutine_yield(); > - mis->colo_incoming_co = NULL; > - > - bql_unlock(); > /* Wait checkpoint incoming thread exit before free resource */ > qemu_thread_join(&th); > - bql_lock(); > > Will become: > > - assert(bql_locked()); > assert(migration_incoming_colo_enabled()); > > qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO, > colo_process_incoming_thread, > mis, QEMU_THREAD_JOINABLE); > > - mis->colo_incoming_co = qemu_coroutine_self(); > - qemu_coroutine_yield(); > - mis->colo_incoming_co = NULL; > + if (qemu_in_coroutine()) { > + assert(bql_locked()); > + mis->colo_incoming_co = qemu_coroutine_self(); > + qemu_coroutine_yield(); > + mis->colo_incoming_co = NULL; > + bql_unlock(); > + } > > - bql_unlock(); > /* Wait checkpoint incoming thread exit before free resource */ > qemu_thread_join(&th); > - bql_lock(); > + > + if (qemu_in_coroutine()) { > + bql_lock(); > + } > > Then I'll add one more patch at last to remove all these "if" blocks. > > Which one is better?
Not much difference is there. > > For the rest, I can still try to move things; migration_channel_read_peek() > change be a separate patch after this one, but that's pretty small.. not > so much like that, normally we'll still need such "if"s to be added prior > this patch, apply this patch, then removed those "if"s in another later patch. > > > > > > TODO > > > ==== > > > > > > Currently the BQL is taken during loading of a START|FULL section. When > > > the IO hangs (e.g. network issue) during this process, it could > > > potentially > > > block others like the monitor servers. One solution is breaking BQL to > > > smaller granule and leave IOs to be always BQL-free. That'll need more > > > justifications. > > > > > > For example, there are at least four things that need some closer > > > attention: > > > > > > - SaveVMHandlers's load_state(): this likely DO NOT need BQL, but we > > > need > > > to justify all of them (not to mention, some of them look like prone to > > > be rewritten as VMSDs..) > > > > > > - VMSD's pre_load(): in most cases, this DO NOT really need BQL, but > > > sometimes maybe it will! Double checking on this will be needed. > > > > > > - VMSD's post_load(): in many cases, this DO need BQL, for example on > > > address space operations. Likely we should just take it for any > > > post_load(). > > > > > > - VMSD field's get(): this is tricky! It could internally be anything > > > even if it was only a field. E.g. there can be users to use a SINGLE > > > field to load a whole VMSD, which can further introduce more > > > possibilities. > > > > Long long ago, I did convert some get's to structure; I got stuck on some > > though - some have pretty crazy hand built lists and things. > > Yeah, I can feel it even though I didn't look into each of them yet. :) > > Looks like they're all explicit VMS_SINGLE users; we have 22 instances. > Unfortunately, I still see new ones being added, latest one in > 5d56bff11e3d. I wonder whether pre_save() + post_load() would have worked > there.. I seem to remember the virtio stuff is particularly complicated, but remember other lists as well. > > > > > In general, QEMUFile IOs should not need BQL, that is when receiving the > > > VMSD data and waiting for e.g. the socket buffer to get refilled. But > > > that's the easy part. > > > > It's probably generally a good thing to get rid of the BQL there, but I bet > > it's going to throw some surprises; maybe something like devices doing > > stuff before the migration has fully arrived > > Is that pre_load() or.. maybe something else? > > I should still look into each of them, but only if we want to further push > the bql to be at post_load() level. I am not sure if some pre_load() would > assume BQL won't be released until post_load(), if so that'll be an issue, > and that will need some closer code observation... Well maybe pre_load; but anything that might start happening once the state has been loaded that shouldn't start happening until migration ends; I think there are some devices that do it properly and wait for end of migration. > > or incoming socket connections to non-migration stuff perhaps. > > Any example for this one? I was just thinking aloud; but was thinking of NIC activity or maybe UI stuff? But just guesses. Dave > Thanks! > > > > > Dave > > > > > Signed-off-by: Peter Xu <[email protected]> > > > --- > > > include/migration/colo.h | 6 ++-- > > > migration/migration.h | 52 ++++++++++++++++++++++++++------ > > > migration/savevm.h | 5 ++-- > > > migration/channel.c | 7 ++--- > > > migration/colo-stubs.c | 2 +- > > > migration/colo.c | 23 ++++----------- > > > migration/migration.c | 62 ++++++++++++++++++++++++++++---------- > > > migration/rdma.c | 5 ---- > > > migration/savevm.c | 64 ++++++++++++++++++++++++---------------- > > > migration/trace-events | 4 +-- > > > 10 files changed, 142 insertions(+), 88 deletions(-) > > > > > > diff --git a/include/migration/colo.h b/include/migration/colo.h > > > index 43222ef5ae..bfb30eccf0 100644 > > > --- a/include/migration/colo.h > > > +++ b/include/migration/colo.h > > > @@ -44,12 +44,10 @@ void colo_do_failover(void); > > > void colo_checkpoint_delay_set(void); > > > > > > /* > > > - * Starts COLO incoming process. Called from > > > process_incoming_migration_co() > > > + * Starts COLO incoming process. Called from migration_incoming_thread() > > > * after loading the state. > > > - * > > > - * Called with BQL locked, may temporary release BQL. > > > */ > > > -void coroutine_fn colo_incoming_co(void); > > > +void colo_incoming_wait(void); > > > > > > void colo_shutdown(void); > > > #endif > > > diff --git a/migration/migration.h b/migration/migration.h > > > index 01329bf824..c4a626eed4 100644 > > > --- a/migration/migration.h > > > +++ b/migration/migration.h > > > @@ -42,6 +42,44 @@ > > > #define MIGRATION_THREAD_DST_LISTEN "mig/dst/listen" > > > #define MIGRATION_THREAD_DST_PREEMPT "mig/dst/preempt" > > > > > > +/** > > > + * WITH_BQL_HELD(): Run a task, making sure BQL is held > > > + * > > > + * @bql_held: Whether BQL is already held > > > + * @task: The task to run within BQL held > > > + */ > > > +#define WITH_BQL_HELD(bql_held, task) \ > > > + do { \ > > > + if (!bql_held) { \ > > > + bql_lock(); \ > > > + } else { \ > > > + assert(bql_locked()); \ > > > + } \ > > > + task; \ > > > + if (!bql_held) { \ > > > + bql_unlock(); \ > > > + } \ > > > + } while (0) > > > + > > > +/** > > > + * WITHOUT_BQL_HELD(): Run a task, making sure BQL is released > > > + * > > > + * @bql_held: Whether BQL is already held > > > + * @task: The task to run making sure BQL released > > > + */ > > > +#define WITHOUT_BQL_HELD(bql_held, task) \ > > > + do { \ > > > + if (bql_held) { \ > > > + bql_unlock(); \ > > > + } else { \ > > > + assert(!bql_locked()); \ > > > + } \ > > > + task; \ > > > + if (bql_held) { \ > > > + bql_lock(); \ > > > + } \ > > > + } while (0) > > > + > > > struct PostcopyBlocktimeContext; > > > typedef struct ThreadPool ThreadPool; > > > > > > @@ -119,6 +157,10 @@ struct MigrationIncomingState { > > > bool have_listen_thread; > > > QemuThread listen_thread; > > > > > > + /* Migration main recv thread */ > > > + bool have_recv_thread; > > > + QemuThread recv_thread; > > > + > > > /* For the kernel to send us notifications */ > > > int userfault_fd; > > > /* To notify the fault_thread to wake, e.g., when need to quit */ > > > @@ -177,15 +219,7 @@ struct MigrationIncomingState { > > > > > > MigrationStatus state; > > > > > > - /* > > > - * The incoming migration coroutine, non-NULL during > > > qemu_loadvm_state(). > > > - * Used to wake the migration incoming coroutine from rdma code. How > > > much is > > > - * it safe - it's a question. > > > - */ > > > - Coroutine *loadvm_co; > > > - > > > - /* The coroutine we should enter (back) after failover */ > > > - Coroutine *colo_incoming_co; > > > + /* Notify secondary VM to move on */ > > > QemuEvent colo_incoming_event; > > > > > > /* Optional load threads pool and its thread exit request flag */ > > > diff --git a/migration/savevm.h b/migration/savevm.h > > > index 2d5e9c7166..c07e14f61a 100644 > > > --- a/migration/savevm.h > > > +++ b/migration/savevm.h > > > @@ -64,9 +64,10 @@ void qemu_savevm_send_colo_enable(QEMUFile *f); > > > void qemu_savevm_live_state(QEMUFile *f); > > > int qemu_save_device_state(QEMUFile *f); > > > > > > -int qemu_loadvm_state(QEMUFile *f); > > > +int qemu_loadvm_state(QEMUFile *f, bool bql_held); > > > void qemu_loadvm_state_cleanup(MigrationIncomingState *mis); > > > -int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis); > > > +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis, > > > + bool bql_held); > > > int qemu_load_device_state(QEMUFile *f); > > > int qemu_loadvm_approve_switchover(void); > > > int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f, > > > diff --git a/migration/channel.c b/migration/channel.c > > > index a547b1fbfe..621f8a4a2a 100644 > > > --- a/migration/channel.c > > > +++ b/migration/channel.c > > > @@ -136,11 +136,8 @@ int migration_channel_read_peek(QIOChannel *ioc, > > > } > > > > > > /* 1ms sleep. */ > > > - if (qemu_in_coroutine()) { > > > - qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, 1000000); > > > - } else { > > > - g_usleep(1000); > > > - } > > > + assert(!qemu_in_coroutine()); > > > + g_usleep(1000); > > > } > > > > > > return 0; > > > diff --git a/migration/colo-stubs.c b/migration/colo-stubs.c > > > index e22ce65234..ef77d1ab4b 100644 > > > --- a/migration/colo-stubs.c > > > +++ b/migration/colo-stubs.c > > > @@ -9,7 +9,7 @@ void colo_shutdown(void) > > > { > > > } > > > > > > -void coroutine_fn colo_incoming_co(void) > > > +void colo_incoming_wait(void) > > > { > > > } > > > > > > diff --git a/migration/colo.c b/migration/colo.c > > > index e0f713c837..f5722d9d9d 100644 > > > --- a/migration/colo.c > > > +++ b/migration/colo.c > > > @@ -147,11 +147,6 @@ static void secondary_vm_do_failover(void) > > > } > > > /* Notify COLO incoming thread that failover work is finished */ > > > qemu_event_set(&mis->colo_incoming_event); > > > - > > > - /* For Secondary VM, jump to incoming co */ > > > - if (mis->colo_incoming_co) { > > > - qemu_coroutine_enter(mis->colo_incoming_co); > > > - } > > > } > > > > > > static void primary_vm_do_failover(void) > > > @@ -686,7 +681,7 @@ static void > > > colo_incoming_process_checkpoint(MigrationIncomingState *mis, > > > > > > bql_lock(); > > > cpu_synchronize_all_states(); > > > - ret = qemu_loadvm_state_main(mis->from_src_file, mis); > > > + ret = qemu_loadvm_state_main(mis->from_src_file, mis, true); > > > bql_unlock(); > > > > > > if (ret < 0) { > > > @@ -854,10 +849,8 @@ static void *colo_process_incoming_thread(void > > > *opaque) > > > goto out; > > > } > > > /* > > > - * Note: the communication between Primary side and Secondary side > > > - * should be sequential, we set the fd to unblocked in migration > > > incoming > > > - * coroutine, and here we are in the COLO incoming thread, so it is > > > ok to > > > - * set the fd back to blocked. > > > + * Here we are in the COLO incoming thread, so it is ok to set the fd > > > + * to blocked. > > > */ > > > qemu_file_set_blocking(mis->from_src_file, true); > > > > > > @@ -930,26 +923,20 @@ out: > > > return NULL; > > > } > > > > > > -void coroutine_fn colo_incoming_co(void) > > > +/* Wait for failover */ > > > +void colo_incoming_wait(void) > > > { > > > MigrationIncomingState *mis = migration_incoming_get_current(); > > > QemuThread th; > > > > > > - assert(bql_locked()); > > > assert(migration_incoming_colo_enabled()); > > > > > > qemu_thread_create(&th, MIGRATION_THREAD_DST_COLO, > > > colo_process_incoming_thread, > > > mis, QEMU_THREAD_JOINABLE); > > > > > > - mis->colo_incoming_co = qemu_coroutine_self(); > > > - qemu_coroutine_yield(); > > > - mis->colo_incoming_co = NULL; > > > - > > > - bql_unlock(); > > > /* Wait checkpoint incoming thread exit before free resource */ > > > qemu_thread_join(&th); > > > - bql_lock(); > > > > > > /* We hold the global BQL, so it is safe here */ > > > colo_release_ram_cache(); > > > diff --git a/migration/migration.c b/migration/migration.c > > > index 10c216d25d..7e4d25b15c 100644 > > > --- a/migration/migration.c > > > +++ b/migration/migration.c > > > @@ -494,6 +494,11 @@ void migration_incoming_state_destroy(void) > > > mis->postcopy_qemufile_dst = NULL; > > > } > > > > > > + if (mis->have_recv_thread) { > > > + qemu_thread_join(&mis->recv_thread); > > > + mis->have_recv_thread = false; > > > + } > > > + > > > cpr_set_incoming_mode(MIG_MODE_NONE); > > > yank_unregister_instance(MIGRATION_YANK_INSTANCE); > > > } > > > @@ -864,30 +869,46 @@ static void process_incoming_migration_bh(void > > > *opaque) > > > migration_incoming_state_destroy(); > > > } > > > > > > -static void coroutine_fn > > > -process_incoming_migration_co(void *opaque) > > > +static void migration_incoming_state_destroy_bh(void *opaque) > > > +{ > > > + struct MigrationIncomingState *mis = opaque; > > > + > > > + if (mis->exit_on_error) { > > > + /* > > > + * NOTE: this exit() should better happen in the main thread, as > > > + * the exit notifier may require BQL which can deadlock. See > > > + * commit e7bc0204e57836 for example. > > > + */ > > > + exit(EXIT_FAILURE); > > > + } > > > + > > > + migration_incoming_state_destroy(); > > > +} > > > + > > > +static void *migration_incoming_thread(void *opaque) > > > { > > > MigrationState *s = migrate_get_current(); > > > - MigrationIncomingState *mis = migration_incoming_get_current(); > > > + MigrationIncomingState *mis = opaque; > > > PostcopyState ps; > > > int ret; > > > Error *local_err = NULL; > > > > > > + rcu_register_thread(); > > > + > > > assert(mis->from_src_file); > > > + assert(!bql_locked()); > > > > > > mis->largest_page_size = qemu_ram_pagesize_largest(); > > > postcopy_state_set(POSTCOPY_INCOMING_NONE); > > > migrate_set_state(&mis->state, MIGRATION_STATUS_SETUP, > > > MIGRATION_STATUS_ACTIVE); > > > > > > - mis->loadvm_co = qemu_coroutine_self(); > > > - ret = qemu_loadvm_state(mis->from_src_file); > > > - mis->loadvm_co = NULL; > > > + ret = qemu_loadvm_state(mis->from_src_file, false); > > > > > > trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed"); > > > > > > ps = postcopy_state_get(); > > > - trace_process_incoming_migration_co_end(ret, ps); > > > + trace_process_incoming_migration_end(ret, ps); > > > if (ps != POSTCOPY_INCOMING_NONE) { > > > if (ps == POSTCOPY_INCOMING_ADVISE) { > > > /* > > > @@ -901,7 +922,7 @@ process_incoming_migration_co(void *opaque) > > > * Postcopy was started, cleanup should happen at the end of > > > the > > > * postcopy thread. > > > */ > > > - trace_process_incoming_migration_co_postcopy_end_main(); > > > + trace_process_incoming_migration_postcopy_end_main(); > > > goto out; > > > } > > > /* Else if something went wrong then just fall out of the normal > > > exit */ > > > @@ -913,8 +934,8 @@ process_incoming_migration_co(void *opaque) > > > } > > > > > > if (migration_incoming_colo_enabled()) { > > > - /* yield until COLO exit */ > > > - colo_incoming_co(); > > > + /* wait until COLO exits */ > > > + colo_incoming_wait(); > > > } > > > > > > migration_bh_schedule(process_incoming_migration_bh, mis); > > > @@ -926,19 +947,24 @@ fail: > > > migrate_set_error(s, local_err); > > > error_free(local_err); > > > > > > - migration_incoming_state_destroy(); > > > - > > > if (mis->exit_on_error) { > > > WITH_QEMU_LOCK_GUARD(&s->error_mutex) { > > > error_report_err(s->error); > > > s->error = NULL; > > > } > > > - > > > - exit(EXIT_FAILURE); > > > } > > > + > > > + /* > > > + * There's some step of the destroy process that will need to happen > > > in > > > + * the main thread (e.g. joining this thread itself). Leave to a BH. > > > + */ > > > + migration_bh_schedule(migration_incoming_state_destroy_bh, (void > > > *)mis); > > > + > > > out: > > > /* Pairs with the refcount taken in qmp_migrate_incoming() */ > > > migrate_incoming_unref_outgoing_state(); > > > + rcu_unregister_thread(); > > > + return NULL; > > > } > > > > > > /** > > > @@ -956,8 +982,12 @@ static void migration_incoming_setup(QEMUFile *f) > > > > > > void migration_incoming_process(void) > > > { > > > - Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, > > > NULL); > > > - qemu_coroutine_enter(co); > > > + MigrationIncomingState *mis = migration_incoming_get_current(); > > > + > > > + mis->have_recv_thread = true; > > > + qemu_thread_create(&mis->recv_thread, "mig/dst/main", > > > + migration_incoming_thread, mis, > > > + QEMU_THREAD_JOINABLE); > > > } > > > > > > /* Returns true if recovered from a paused migration, otherwise false */ > > > diff --git a/migration/rdma.c b/migration/rdma.c > > > index bcd7aae2f2..2b995513aa 100644 > > > --- a/migration/rdma.c > > > +++ b/migration/rdma.c > > > @@ -3068,7 +3068,6 @@ static void rdma_cm_poll_handler(void *opaque) > > > { > > > RDMAContext *rdma = opaque; > > > struct rdma_cm_event *cm_event; > > > - MigrationIncomingState *mis = migration_incoming_get_current(); > > > > > > if (rdma_get_cm_event(rdma->channel, &cm_event) < 0) { > > > error_report("get_cm_event failed %d", errno); > > > @@ -3087,10 +3086,6 @@ static void rdma_cm_poll_handler(void *opaque) > > > } > > > } > > > rdma_ack_cm_event(cm_event); > > > - if (mis->loadvm_co) { > > > - qemu_coroutine_enter(mis->loadvm_co); > > > - } > > > - return; > > > } > > > rdma_ack_cm_event(cm_event); > > > } > > > diff --git a/migration/savevm.c b/migration/savevm.c > > > index fabbeb296a..ad606c5425 100644 > > > --- a/migration/savevm.c > > > +++ b/migration/savevm.c > > > @@ -154,11 +154,10 @@ static void > > > qemu_loadvm_thread_pool_destroy(MigrationIncomingState *mis) > > > } > > > > > > static bool qemu_loadvm_thread_pool_wait(MigrationState *s, > > > - MigrationIncomingState *mis) > > > + MigrationIncomingState *mis, > > > + bool bql_held) > > > { > > > - bql_unlock(); /* Let load threads do work requiring BQL */ > > > - thread_pool_wait(mis->load_threads); > > > - bql_lock(); > > > + WITHOUT_BQL_HELD(bql_held, thread_pool_wait(mis->load_threads)); > > > > > > return !migrate_has_error(s); > > > } > > > @@ -2091,14 +2090,11 @@ static void *postcopy_ram_listen_thread(void > > > *opaque) > > > trace_postcopy_ram_listen_thread_start(); > > > > > > rcu_register_thread(); > > > - /* > > > - * Because we're a thread and not a coroutine we can't yield > > > - * in qemu_file, and thus we must be blocking now. > > > - */ > > > + /* Because we're a thread, making sure to use blocking mode */ > > > qemu_file_set_blocking(f, true); > > > > > > /* TODO: sanity check that only postcopiable data will be loaded > > > here */ > > > - load_res = qemu_loadvm_state_main(f, mis); > > > + load_res = qemu_loadvm_state_main(f, mis, false); > > > > > > /* > > > * This is tricky, but, mis->from_src_file can change after it > > > @@ -2392,13 +2388,14 @@ static int > > > loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > > * Immediately following this command is a blob of data containing an > > > embedded > > > * chunk of migration stream; read it and load it. > > > * > > > - * @mis: Incoming state > > > - * @length: Length of packaged data to read > > > + * @mis: Incoming state > > > + * @bql_held: Whether BQL is held already > > > * > > > * Returns: Negative values on error > > > * > > > */ > > > -static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis) > > > +static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis, > > > + bool bql_held) > > > { > > > int ret; > > > size_t length; > > > @@ -2449,7 +2446,7 @@ static int > > > loadvm_handle_cmd_packaged(MigrationIncomingState *mis) > > > qemu_coroutine_yield(); > > > } while (1); > > > > > > - ret = qemu_loadvm_state_main(packf, mis); > > > + ret = qemu_loadvm_state_main(packf, mis, bql_held); > > > trace_loadvm_handle_cmd_packaged_main(ret); > > > qemu_fclose(packf); > > > object_unref(OBJECT(bioc)); > > > @@ -2539,7 +2536,7 @@ static int > > > loadvm_postcopy_handle_switchover_start(void) > > > * LOADVM_QUIT All good, but exit the loop > > > * <0 Error > > > */ > > > -static int loadvm_process_command(QEMUFile *f) > > > +static int loadvm_process_command(QEMUFile *f, bool bql_held) > > > { > > > MigrationIncomingState *mis = migration_incoming_get_current(); > > > uint16_t cmd; > > > @@ -2609,7 +2606,7 @@ static int loadvm_process_command(QEMUFile *f) > > > break; > > > > > > case MIG_CMD_PACKAGED: > > > - return loadvm_handle_cmd_packaged(mis); > > > + return loadvm_handle_cmd_packaged(mis, bql_held); > > > > > > case MIG_CMD_POSTCOPY_ADVISE: > > > return loadvm_postcopy_handle_advise(mis, len); > > > @@ -3028,7 +3025,8 @@ static bool > > > postcopy_pause_incoming(MigrationIncomingState *mis) > > > return true; > > > } > > > > > > -int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis) > > > +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis, > > > + bool bql_held) > > > { > > > uint8_t section_type; > > > int ret = 0; > > > @@ -3046,7 +3044,15 @@ retry: > > > switch (section_type) { > > > case QEMU_VM_SECTION_START: > > > case QEMU_VM_SECTION_FULL: > > > - ret = qemu_loadvm_section_start_full(f, section_type); > > > + /* > > > + * FULL should normally require BQL, e.g. during post_load() > > > + * there can be memory region updates. START may or may not > > > + * require it, but just to keep it simple to always hold BQL > > > + * for now. > > > + */ > > > + WITH_BQL_HELD( > > > + bql_held, > > > + ret = qemu_loadvm_section_start_full(f, section_type)); > > > if (ret < 0) { > > > goto out; > > > } > > > @@ -3059,7 +3065,11 @@ retry: > > > } > > > break; > > > case QEMU_VM_COMMAND: > > > - ret = loadvm_process_command(f); > > > + /* > > > + * Be careful; QEMU_VM_COMMAND can embed FULL sections, so it > > > + * may internally need BQL. > > > + */ > > > + ret = loadvm_process_command(f, bql_held); > > > trace_qemu_loadvm_state_section_command(ret); > > > if ((ret < 0) || (ret == LOADVM_QUIT)) { > > > goto out; > > > @@ -3103,7 +3113,7 @@ out: > > > return ret; > > > } > > > > > > -int qemu_loadvm_state(QEMUFile *f) > > > +int qemu_loadvm_state(QEMUFile *f, bool bql_held) > > > { > > > MigrationState *s = migrate_get_current(); > > > MigrationIncomingState *mis = migration_incoming_get_current(); > > > @@ -3131,9 +3141,10 @@ int qemu_loadvm_state(QEMUFile *f) > > > qemu_loadvm_state_switchover_ack_needed(mis); > > > } > > > > > > - cpu_synchronize_all_pre_loadvm(); > > > + /* run_on_cpu() requires BQL */ > > > + WITH_BQL_HELD(bql_held, cpu_synchronize_all_pre_loadvm()); > > > > > > - ret = qemu_loadvm_state_main(f, mis); > > > + ret = qemu_loadvm_state_main(f, mis, bql_held); > > > qemu_event_set(&mis->main_thread_load_event); > > > > > > trace_qemu_loadvm_state_post_main(ret); > > > @@ -3149,7 +3160,7 @@ int qemu_loadvm_state(QEMUFile *f) > > > /* When reaching here, it must be precopy */ > > > if (ret == 0) { > > > if (migrate_has_error(migrate_get_current()) || > > > - !qemu_loadvm_thread_pool_wait(s, mis)) { > > > + !qemu_loadvm_thread_pool_wait(s, mis, bql_held)) { > > > ret = -EINVAL; > > > } else { > > > ret = qemu_file_get_error(f); > > > @@ -3196,7 +3207,8 @@ int qemu_loadvm_state(QEMUFile *f) > > > } > > > } > > > > > > - cpu_synchronize_all_post_init(); > > > + /* run_on_cpu() requires BQL */ > > > + WITH_BQL_HELD(bql_held, cpu_synchronize_all_post_init()); > > > > > > return ret; > > > } > > > @@ -3207,7 +3219,7 @@ int qemu_load_device_state(QEMUFile *f) > > > int ret; > > > > > > /* Load QEMU_VM_SECTION_FULL section */ > > > - ret = qemu_loadvm_state_main(f, mis); > > > + ret = qemu_loadvm_state_main(f, mis, true); > > > if (ret < 0) { > > > error_report("Failed to load device state: %d", ret); > > > return ret; > > > @@ -3438,7 +3450,7 @@ void qmp_xen_load_devices_state(const char > > > *filename, Error **errp) > > > f = qemu_file_new_input(QIO_CHANNEL(ioc)); > > > object_unref(OBJECT(ioc)); > > > > > > - ret = qemu_loadvm_state(f); > > > + ret = qemu_loadvm_state(f, true); > > > qemu_fclose(f); > > > if (ret < 0) { > > > error_setg(errp, "loading Xen device state failed"); > > > @@ -3512,7 +3524,7 @@ bool load_snapshot(const char *name, const char > > > *vmstate, > > > ret = -EINVAL; > > > goto err_drain; > > > } > > > - ret = qemu_loadvm_state(f); > > > + ret = qemu_loadvm_state(f, true); > > > migration_incoming_state_destroy(); > > > > > > bdrv_drain_all_end(); > > > diff --git a/migration/trace-events b/migration/trace-events > > > index 706db97def..eeb41e03f1 100644 > > > --- a/migration/trace-events > > > +++ b/migration/trace-events > > > @@ -193,8 +193,8 @@ source_return_path_thread_resume_ack(uint32_t v) > > > "%"PRIu32 > > > source_return_path_thread_switchover_acked(void) "" > > > migration_thread_low_pending(uint64_t pending) "%" PRIu64 > > > migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t > > > bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " > > > time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " > > > max_size %" PRId64 > > > -process_incoming_migration_co_end(int ret, int ps) "ret=%d > > > postcopy-state=%d" > > > -process_incoming_migration_co_postcopy_end_main(void) "" > > > +process_incoming_migration_end(int ret, int ps) "ret=%d > > > postcopy-state=%d" > > > +process_incoming_migration_postcopy_end_main(void) "" > > > postcopy_preempt_enabled(bool value) "%d" > > > migration_precopy_complete(void) "" > > > > > > -- > > > 2.50.1 > > > > > -- > > -----Open up your eyes, open up your mind, open up your code ------- > > / Dr. David Alan Gilbert | Running GNU/Linux | Happy \ > > \ dave @ treblig.org | | In Hex / > > \ _________________________|_____ http://www.treblig.org |_______/ > > > > -- > Peter Xu > -- -----Open up your eyes, open up your mind, open up your code ------- / Dr. David Alan Gilbert | Running GNU/Linux | Happy \ \ dave @ treblig.org | | In Hex / \ _________________________|_____ http://www.treblig.org |_______/
