Re: [Qemu-devel] When does live migration give up?

Alex Bligh Wed, 04 Sep 2013 15:38:49 -0700

Paolo,
> 
> Do you mean something like this?
> 
>   destination
>      socket()
>      bind() to { sin_port = 0, sin_addr.s_addr = INADDR_ANY }
>      listen()
>      getsockname()
>      send address to source
>      accept()
>      start QEMU with file descriptor returned by accept
> 
>   source
>      read address
>      socket()
>      connect()
>      pass socket file descriptor to QEMU and migrate to it
> 
> Anything that doesn't use sin_port = 0 and getsockname() is prone to
> race conditions.
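
The destination-side steps above, up to the point where the port is known, might look like this in C. This is a sketch assuming Linux/POSIX sockets. listen_on_ephemeral_port is a hypothetical helper name. Sending the port to the source, accept(), and starting QEMU with the accepted descriptor are left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a listening TCP socket on a port chosen by
 * the kernel (sin_port = 0). On success, return the listening fd and store
 * the assigned port (host byte order) in *port_out. Return -1 on error. */
int listen_on_ephemeral_port(unsigned short *port_out)
{
    assert(port_out != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);          /* 0: let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    /* Ask the kernel which port it actually assigned. */
    socklen_t len = sizeof(addr);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```

The caller would then send the port to the source and call accept(fd, NULL, NULL). The connected descriptor that accept() returns is what gets handed to QEMU.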


From memory, we bind() to a specific, randomly chosen port and, if
that fails, retry until bind() succeeds. We do this because we
want the port to be within a given range. I believe that is
race-free, as only one bind() to a given port can succeed.
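
A minimal sketch of the bind-within-a-range approach just described, assuming Linux/POSIX sockets. bind_random_port_in_range and its max_tries cap are hypothetical, and rand() stands in for whatever random source is actually used. The kernel arbitrates bind(): if two processes pick the same port, one of them gets EADDRINUSE and retries.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to a random port in [lo, hi],
 * retrying on EADDRINUSE up to max_tries times. On success, return the
 * bound (not yet listening) fd and store the port in *port_out.
 * Return -1 on any other error or if every attempt failed. */
int bind_random_port_in_range(unsigned short lo, unsigned short hi,
                              int max_tries, unsigned short *port_out)
{
    assert(lo <= hi && port_out != NULL);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    for (int i = 0; i < max_tries; i++) {
        unsigned short port = (unsigned short)(lo + rand() % (hi - lo + 1));

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        /* A failed bind() leaves the socket unbound, so it can be retried. */
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            *port_out = port;
            return fd;     /* keep fd open: closing it would free the port */
        }
        if (errno != EADDRINUSE)
            break;         /* e.g. EACCES for a privileged port: give up */
    }
    close(fd);
    return -1;
}
```

The descriptor is returned still bound. If it were closed and only the port number were passed on for QEMU to bind later, another process could take the port in between. That would reintroduce the kind of race described above.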

>> Approx 10% of migrations die after many minutes on the
>> customer's platform. This does not appear to happen if migrations are
>> not carried out 50 at a time.
> 
> Dying after many minutes usually means that the destination is not set
> up the same as the source, as you said below.

Hmm, OK. I thought that produced an immediate error. Is there any way
of logging what is going wrong to stderr or similar?

Alex


> 
> Paolo
> 
>> We appear to be getting something other than 'ms' returned through the
>> monitoring system. Unhelpfully, what that is is not logged.
>> 
>> Is there anything (apart from the socket closing prematurely) which can
>> cause a failed migration after many minutes? We've seen problems where
>> the destination is not set up the same as the source (e.g. different
>> numbers of NICs) but IIRC that fails much earlier.
>> 
>> To make things easier (cough), this is qemu 1.0 (as shipped with Ubuntu
>> Precise).
>> 

-- 
Alex Bligh
