* Peter Xu ([email protected]) wrote:
> On Thu, Aug 14, 2025 at 05:42:23PM +0200, Juraj Marcin wrote:
> > Fair point, I'll then continue with the PING/PONG solution, the first
> > implementation I have seems to be working to resolve Issue 1.
> > 
> > For rarer split brain, we'll rely on block device locks/mgmt to resolve
> > and change the failure handling, so it registers errors from disk
> > activation.
> > 
> > As tested, there should be no problems with the destination
> > transitioning to POSTCOPY_PAUSED, since the VM was not started yet.
> > 
> > However, to prevent the source side from transitioning to
> > POSTCOPY_PAUSED, I think adding a new state is still the best option.
> > 
> > I tried keeping the migration states as they are now and just rely on an
> > attribute of MigrationState if 3rd PONG was received, however, this
> > collides with (at least) migrate_pause tests, that are waiting for
> > POSTCOPY_ACTIVE, and then pause the migration triggering the source to
> > resume. We could maybe work around it by waiting for the 3rd pong
> > instead, but I am not sure if it is possible from tests, or by not
> > resuming if migrate_pause command is executed?
> > 
> > I also tried extending the span of the DEVICE state, but some functions
> > behave differently depending on if they are in postcopy or not, using
> > the migration_in_postcopy() function, but adding the DEVICE there isn't
> > working either. And treating the DEVICE state sometimes as postcopy and
> > sometimes as not seems just too messy, if it would even be possible.
> 
> Yeah, it might indeed be a bit messy.
> 
> Is it possible to find a middle ground?  E.g. add postcopy-setup status,
> but without any new knob to enable it?  Just to describe the period of time
> where dest QEMU haven't started running but started loading device states.
> 
> The hope is libvirt (which, AFAIU, always enables the "events" capability)
> can ignore the new postcopy-setup status transition, then maybe we can also
> introduce the postcopy-setup and make it always appear.

When the destination is started with '-S' (autostart=false), which is what
I think libvirt does, doesn't management only start the destination
after a certain useful event?
In other words, is there an event we already emit to say that the destination
has finished loading the postcopy devices, or could we just add that
event, so that management could just wait for that before issuing
the continue?

Dave

> Thanks,
> 
> -- 
> Peter Xu
> 
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

Reply via email to