Cédric Le Goater <[email protected]> writes:
> In case of error, close_return_path_on_source() can perform a shutdown
> to exit the return-path thread. However, in migrate_fd_cleanup(),
> 'to_dst_file' is closed before calling close_return_path_on_source()
> and the shutdown fails, leaving the source and destination waiting for
> an event to occur.
Hi, Cédric
Are you sure this is not caused by patch 13? That 'if (ms->to_dst_file'
was there to avoid this sort of thing happening.
Is there some reordering possibility that I'm not spotting in the code
below? I think the data dependency on to_dst_file shouldn't allow it.
migrate_fd_cleanup:
qemu_mutex_lock(&s->qemu_file_lock);
tmp = s->to_dst_file;
s->to_dst_file = NULL;
qemu_mutex_unlock(&s->qemu_file_lock);
...
qemu_fclose(tmp);
close_return_path_on_source:
WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
if (ms->to_dst_file && ms->rp_state.from_dst_file &&
qemu_file_get_error(ms->to_dst_file)) {
qemu_file_shutdown(ms->rp_state.from_dst_file);
}
}
I'm thinking maybe the culprit is the close_return_path_on_source() at
migration_completion(). It might be possible for it to race with the
migrate_fd_cleanup_bh from migration_iteration_finish().
If that's the case, then I think that one possible fix would be to hold
the BQL at migration_completion() so the BH doesn't get dispatched until
we properly close the return path.
>
> Close the file after calling close_return_path_on_source() so that the
> shutdown succeeds and the return-path thread exits.
>
> Signed-off-by: Cédric Le Goater <[email protected]>
> ---
>
> This is an RFC because the correct fix implies reworking the QEMUFile
> construct, built on top of the QEMU I/O channel.
>
> migration/migration.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index
> 5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
> 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int
> new_state)
>
> static void migrate_fd_cleanup(MigrationState *s)
> {
> + QEMUFile *tmp = NULL;
> +
> g_free(s->hostname);
> s->hostname = NULL;
> json_writer_free(s->vmdesc);
> @@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
> qemu_savevm_state_cleanup();
>
> if (s->to_dst_file) {
> - QEMUFile *tmp;
> -
> trace_migrate_fd_cleanup();
> bql_unlock();
> if (s->migration_thread_running) {
> @@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
> * critical section won't block for long.
> */
> migration_ioc_unregister_yank_from_file(tmp);
> - qemu_fclose(tmp);
> }
>
> - /*
> - * We already cleaned up to_dst_file, so errors from the return
> - * path might be due to that, ignore them.
> - */
> close_return_path_on_source(s);
>
> + if (tmp) {
> + qemu_fclose(tmp);
> + }
> +
> assert(!migration_is_active(s));
>
> if (s->state == MIGRATION_STATUS_CANCELLING) {