* Denis V. Lunev ([email protected]) wrote: > There is a possibility to hit an assert in qcow2_get_specific_info that > s->qcow_version is undefined. This happens when VM in starting from > suspended state, i.e. it processes incoming migration, and in the same > time 'info block' is called. > > The problem is that qcow2_invalidate_cache() closes the image and > memset()s BDRVQcowState in the middle. > > The patch moves processing of bdrv_invalidate_cache_all out of > coroutine context for postcopy migration to avoid that. This function > is called with the following stack: > process_incoming_migration_co > qemu_loadvm_state > qemu_loadvm_state_main > loadvm_process_command > loadvm_postcopy_handle_run > > Signed-off-by: Denis V. Lunev <[email protected]> > CC: Paolo Bonzini <[email protected]> > CC: Juan Quintela <[email protected]> > CC: Amit Shah <[email protected]>
I've tested my normal postcopy smoke test and it seems to work, so Tested-by: Dr. David Alan Gilbert <[email protected]> however, can you add a comment before loadvm_postcopy_handle_run_bh to explain why it's a bh, otherwise at some point someone might put it back. I'll admit to not really understanding what the difference is between bh and coroutine context; I'd thought if it was all in the main thread stuff was safe. Dave > --- > migration/savevm.c | 27 +++++++++++++++++---------- > 1 file changed, 17 insertions(+), 10 deletions(-) > > diff --git a/migration/savevm.c b/migration/savevm.c > index 94f2894..8415fd9 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1496,18 +1496,10 @@ static int > loadvm_postcopy_handle_listen(MigrationIncomingState *mis) > return 0; > } > > -/* After all discards we can start running and asking for pages */ > -static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > +static void loadvm_postcopy_handle_run_bh(void *opaque) > { > - PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING); > Error *local_err = NULL; > > - trace_loadvm_postcopy_handle_run(); > - if (ps != POSTCOPY_INCOMING_LISTENING) { > - error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps); > - return -1; > - } > - > /* TODO we should move all of this lot into postcopy_ram.c or a shared > code > * in migration.c > */ > @@ -1519,7 +1511,6 @@ static int > loadvm_postcopy_handle_run(MigrationIncomingState *mis) > bdrv_invalidate_cache_all(&local_err); > if (local_err) { > error_report_err(local_err); > - return -1; > } > > trace_loadvm_postcopy_handle_run_cpu_sync(); > @@ -1534,6 +1525,22 @@ static int > loadvm_postcopy_handle_run(MigrationIncomingState *mis) > /* leave it paused and let management decide when to start the CPU */ > runstate_set(RUN_STATE_PAUSED); > } > +} > + > +/* After all discards we can start running and asking for pages */ > +static int loadvm_postcopy_handle_run(MigrationIncomingState *mis) > +{ > + PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING); > + QEMUBH *bh; > + > + trace_loadvm_postcopy_handle_run(); > + if (ps != POSTCOPY_INCOMING_LISTENING) { > + error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps); > + return -1; > + } > + > + bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL); > + qemu_bh_schedule(bh); > > /* We need to finish reading the stream from the package > * and also stop reading anything more from the stream that loaded the > -- > 2.1.4 > > -- Dr. David Alan Gilbert / [email protected] / Manchester, UK
