Hi Dave,

On 2025-09-01 17:57, Dr. David Alan Gilbert wrote:
> * Peter Xu ([email protected]) wrote:
> > On Thu, Aug 14, 2025 at 05:42:23PM +0200, Juraj Marcin wrote:
> > > Fair point, I'll then continue with the PING/PONG solution, the first
> > > implementation I have seems to be working to resolve Issue 1.
> > > 
> > > For rarer split brain, we'll rely on block device locks/mgmt to resolve
> > > and change the failure handling, so it registers errors from disk
> > > activation.
> > > 
> > > As tested, there should be no problems with the destination
> > > transitioning to POSTCOPY_PAUSED, since the VM was not started yet.
> > > 
> > > However, to prevent the source side from transitioning to
> > > POSTCOPY_PAUSED, I think adding a new state is still the best option.
> > > 
> > > I tried keeping the migration states as they are now and just rely on an
> > > attribute of MigrationState if 3rd PONG was received, however, this
> > > collides with (at least) migrate_pause tests, that are waiting for
> > > POSTCOPY_ACTIVE, and then pause the migration triggering the source to
> > > resume. We could maybe work around it by waiting for the 3rd pong
> > > instead, but I am not sure if it is possible from tests, or by not
> > > resuming if migrate_pause command is executed?
> > > 
> > > I also tried extending the span of the DEVICE state, but some functions
> > > behave differently depending on if they are in postcopy or not, using
> > > the migration_in_postcopy() function, but adding the DEVICE there isn't
> > > working either. And treating the DEVICE state sometimes as postcopy and
> > > sometimes as not seems just too messy, if it would even be possible.
> > 
> > Yeah, it might indeed be a bit messy.
> > 
> > Is it possible to find a middle ground?  E.g. add postcopy-setup status,
> > but without any new knob to enable it?  Just to describe the period of time
> > where dest QEMU haven't started running but started loading device states.
> > 
> > The hope is libvirt (which, AFAIU, always enables the "events" capability)
> > can ignore the new postcopy-setup status transition, then maybe we can also
> > introduce the postcopy-setup and make it always appear.
> 
> When the destination is started with '-S' (autostart=false), which is what
> I think libvirt does, doesn't management only start the destination
> after a certain useful event?
> In other words, is there an event we already emit to say that the destination
> has finished loading the postcopy devices, or could we just add that
> event, so that management could just wait for that before issuing
> the continue?

I am not aware of any such event on the destination side. When postcopy
(and its switchover) starts, the destination transitions from ACTIVE
directly to POSTCOPY_ACTIVE in the listen thread, while devices are
loaded concurrently by the main thread.

There is a DEVICE state, but it is used only on the source side, while
device state is being collected. By the time device state is being
loaded on the destination, the source side is already in the
POSTCOPY_ACTIVE state.
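
To illustrate the gap, here is a simplified Python model of the two
timelines described above. It is only a sketch, not QEMU code: the phase
labels and helper names are made up for illustration, and only the
status strings ("active", "device", "postcopy-active") correspond to
existing migration status values.

```python
# Simplified model of the timelines described above. Not QEMU code:
# the phase labels and helper names are hypothetical.

# (phase label, migration status reported during that phase)
SOURCE_TIMELINE = [
    ("precopy", "active"),
    ("collecting device state", "device"),
    ("destination loading device state", "postcopy-active"),
    ("destination running", "postcopy-active"),
]

DEST_TIMELINE = [
    ("precopy", "active"),
    ("loading device state", "postcopy-active"),
    ("running", "postcopy-active"),
]


def status_changes(timeline):
    """Return the statuses in order, dropping consecutive repeats.

    Each entry corresponds to one observable status transition.
    """
    changes = []
    previous = None
    for _phase, status in timeline:
        if status != previous:
            changes.append(status)
            previous = status
    return changes


def distinguishable(timeline, phase_a, phase_b):
    """True if the two phases report different statuses."""
    statuses = dict(timeline)
    return statuses[phase_a] != statuses[phase_b]


if __name__ == "__main__":
    print("source:", status_changes(SOURCE_TIMELINE))
    print("dest:  ", status_changes(DEST_TIMELINE))
    print("dest can tell loading from running:",
          distinguishable(DEST_TIMELINE, "loading device state", "running"))
```

In this model the destination reports the same status while loading
device state and while running, so a status change alone cannot tell
management that device loading has finished.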

Best regards,

Juraj Marcin

> 
> Dave
> 
> > Thanks,
> > 
> > -- 
> > Peter Xu
> > 
> > 
> -- 
>  -----Open up your eyes, open up your mind, open up your code -------   
> / Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
> \        dave @ treblig.org |                               | In Hex /
>  \ _________________________|_____ http://www.treblig.org   |_______/
> 

