On Fri, Sep 15, 2017 at 04:20:26PM +0800, Fam Zheng wrote:
[...]
> > > void qmp_migrate_set_cache_size(int64_t value, Error **errp)
> > > {
> > > MigrationState *s = migrate_get_current();
> > > diff --git a/vl.c b/vl.c
> > > index fb1f05b937..abbe61f40b 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -87,6 +87,7 @@ int main(int argc, char **argv)
> > > #include "sysemu/blockdev.h"
> > > #include "hw/block/block.h"
> > > #include "migration/misc.h"
> > > +#include "migration/savevm.h"
> > > #include "migration/snapshot.h"
> > > #include "migration/global_state.h"
> > > #include "sysemu/tpm.h"
> > > @@ -4799,6 +4800,8 @@ int main(int argc, char **argv, char **envp)
> > > iothread_stop_all();
> > >
> > > pause_all_vcpus();
> > > + migrate_cancel();
> >
> > IIUC this is an async cancel, so when reach here the migration thread
> > can still be alive. Then...
> >
> > > + qemu_savevm_state_cleanup();
> >
> > ... Here calling qemu_savevm_state_cleanup() may be problematic if
> > migration thread has not yet quitted.
> >
> > I'm thinking whether we should make migrate_fd_cancel() wait until the
> > migration thread finishes (state change to CANCELLED). Then the
> > migration thread will do the cleanup, and here we can avoid calling
> > qemu_savevm_state_cleanup() as well.
>
> But if the migration thread is stuck and CANCELLED is never reached, we'll
> hang
> here?
Maybe we can add timeout. Anyway, if it gets stuck, I do see it a
migration bug, and in that case I'm not sure force return after
timeout would be anything wiser.
Dave/Juan may have better idea though.
--
Peter Xu