Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

Daniel P. Berrange Fri, 19 May 2017 03:05:04 -0700

On Fri, May 19, 2017 at 05:51:43PM +0800, Peter Xu wrote:
> On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > > We don't really have a return path for the other types yet. Let's check
> > > this when .get_return_path() is called.
> > > 
> > > For this, we introduce a new feature bit, and set it up only for socket
> > > typed IO channels.
> > > 
> > > This will help detect earlier failure for postcopy, e.g., logically
> > > speaking postcopy cannot work with "exec:". Before this patch, when we
> > > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > > With this patch, we'll get:
> > > 
> > > (qemu) migrate -d exec:cat>out
> > > Unable to open return-path for postcopy
> > 
> > This is wrong - post-copy migration *can* work with exec: - it just entirely
> > depends on what command you are running. Your example ran a command which is
> > unidirectional, but if you ran 'exec:socat ...' you would have a fully
> > bidirectional channel. Actually the channel is always bi-directional, but
> > 'cat' simply won't ever send data back to QEMU.
> 
> Indeed. I should not block postcopy if the user used a TCP tunnel
> between the source and destination in some way, using this exec: way.
> Thanks for pointing that out.
> 
> However I still think the idea is needed here. Say, we'd better know
> whether the transport would be able to respond (though current
> approach of "assuming sockets are the only ones that can reply" is not
> a good solution...). Please see below.
> 
> > 
> > If QEMU hangs when the other end doesn't send data back, that actually seems
> > like a potentially serious bug in migration code. Even if using the normal
> > 'tcp' migration protocol, if the target QEMU server hangs and fails to
> > send data to QEMU on the return path, the source QEMU must never hang.
> 
> Firstly I should not say it's a hang - it's actually by-design here
> imho - migration thread is in the last phase now, waiting for a SHUT
> message from destination (which I think is wise). But from the
> behavior, indeed src VM is not usable during the time, just like what
> happened for most postcopy cases on the source side. So, we can see
> that postcopy "assumes" that destination side can reply now.
> 
> Meanwhile, I see it reasonable for postcopy to have such an
> assumption. After all, postcopy means "start VM on destination before
> pages are moved over completely", then there must be someone to reply
> to source, no matter whether it'll be via some kind of io channel.
> 
> That's why I think we still need the general idea here, that we need
> to know whether destination end is able to reply.
> 
> But, I still have no good idea (after knowing this patch won't work)
> on how we can do this... Any further suggestions would be greatly
> welcomed.


IMHO this is nothing more than a documentation issue for the 'exec'
protocol. ie, document that you should provide a bi-directional
transport for live migration.

A uni-directional transport is arguably only valid if you're using
migrate to save/restore the VM state to a file.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed

Reply via email to