Re: [PATCH 1/2] multifd: use qemu_sem_timedwait in multifd_recv_thread to avoid waiting forever

Daniel P . Berrangé Mon, 29 Nov 2021 07:01:35 -0800

On Mon, Nov 29, 2021 at 11:20:08AM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé ([email protected]) wrote:
> > On Fri, Nov 26, 2021 at 04:31:53PM +0100, Li Zhang wrote:
> > > When doing live migration with multifd channels 8, 16 or larger number,
> > > the guest hangs in the presence of the network errors such as missing
> > > TCP ACKs.
> > > 
> > > At sender's side:
> > > The main thread is blocked on qemu_thread_join, migration_fd_cleanup
> > > is called because one thread fails on qio_channel_write_all when
> > > the network problem happens and other send threads are blocked on sendmsg.
> > > They could not be terminated. So the main thread is blocked on
> > > qemu_thread_join to wait for the threads terminated.
> > 
> > Isn't the right answer here to ensure we've called 'shutdown' on
> > all the FDs, so that the threads get kicked out of sendmsg, before
> > trying to join the thread ?
> 
> I agree a timeout is wrong here; there is no way to get a good timeout
> value.
> However, I'm a bit confused - we should be able to try a shutdown on the
> receive side using the 'yank' command. - that's what it's there for; Li
> does this solve your problem?


Why do we even need to use 'yank' on the receive side ? Until migration
has switched over from src to dst, the receive side is discardable and
the whole process can just be terminated with kill(SIGTERM/SIGKILL).

On the source side 'yank' is needed, because the QEMU process is still
running the live workload and thus is precious and mustn't be killed.
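
To illustrate the "shutdown before join" ordering suggested earlier in
the thread, here is a minimal standalone sketch. It is not QEMU code:
the names blocked_worker() and shutdown_then_join_demo() are made up,
a local socketpair stands in for a multifd channel, and the worker
blocks in recv() rather than sendmsg() because that is easier to
reproduce without filling a socket buffer. The idea is the same in
both cases: shutdown() on the fd wakes the blocked thread, so the
subsequent join can complete.

```c
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical worker: blocks in recv(), standing in for a channel
 * thread that is stuck in a socket call because the peer is silent. */
static void *blocked_worker(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[64];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);

    /* Hand the recv() result back to the joiner. */
    return (void *)(intptr_t)n;
}

/* Returns 0 if the worker was released by shutdown() and joined. */
int shutdown_then_join_demo(void)
{
    int sv[2];
    pthread_t tid;
    void *ret = NULL;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    if (pthread_create(&tid, NULL, blocked_worker, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Demo only: give the worker ~100ms to enter recv(). The outcome
     * is the same if shutdown() happens to run first. */
    poll(NULL, 0, 100);

    /* Kick the worker out of the blocking call *before* joining.
     * Nothing was ever written to sv[1], so without this the join
     * below would wait forever. */
    shutdown(sv[0], SHUT_RDWR);

    pthread_join(tid, &ret);
    close(sv[0]);
    close(sv[1]);

    /* After shutdown, recv() reports end-of-stream (0 bytes). */
    return ((intptr_t)ret == 0) ? 0 : -1;
}
```

Calling shutdown_then_join_demo() from a main() and checking for a 0
return is enough to try it out. Note that the fd is only close()d
after the join; closing it while another thread is still blocked on
it would not reliably wake that thread.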

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
