On Mon, Sep 22, 2025 at 03:34:19PM +0200, Juraj Marcin wrote:
> Hi Peter,
>
> On 2025-09-19 13:50, Peter Xu wrote:
> > On Fri, Sep 19, 2025 at 12:58:08PM -0400, Peter Xu wrote:
> > > > @@ -2564,6 +2569,11 @@ static void *source_return_path_thread(void
> > > > *opaque)
> > > > tmp32 = ldl_be_p(buf);
> > > > trace_source_return_path_thread_pong(tmp32);
> > > > qemu_sem_post(&ms->rp_state.rp_pong_acks);
> > > > + if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
> > > > + trace_source_return_path_thread_dst_started();
> > > > + migrate_set_state(&ms->state,
> > > > MIGRATION_STATUS_POSTCOPY_DEVICE,
> > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > >
> > > Could this race with the migration thread modifying the state
> > > concurrently?
> > >
> > > To avoid it, we could have a bool, set it here once, and in the iterations
> > > do something like:
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 10c216d25d..55230e10ee 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -3449,6 +3449,16 @@ static MigIterateState
> > > migration_iteration_run(MigrationState *s)
> > > trace_migrate_pending_estimate(pending_size, must_precopy,
> > > can_postcopy);
> > >
> > > if (in_postcopy) {
> > > + if (s->postcopy_package_loaded) {
> > > + assert(s->state == MIGRATION_STATUS_POSTCOPY_DEVICE);
> > > + migrate_set_state(s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
> > > + MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > + /*
> > > + * Since postcopy cannot be re-initiated, this flag will only
> > > + * be set at most once for QEMU's whole lifecyce.
> > > + */
> > > + s->postcopy_package_loaded = false;
> > > + }
> > > /*
> > > * Iterate in postcopy until all pending data flushed. Note that
> > > * postcopy completion doesn't rely on can_switchover, because
> > > when
>
> It was there in the RFC version, when there was mutual handshake before
> dst starting, but I thought cmp&exchange would be enough. I can add it
> again, however, there is no need to set it to false afterwards, we can
> simply check if this condition is true:
> s->postcopy_package_loaded && s->state == MIGRATION_STATUS_POSTCOPY_DEVICE.
Setting it to false was a safety measure. Indeed not needed, but when so
we need to be extremely careful to always check with above two conditions
to avoid it frequently triggers. So I thought clearing it would be much
easier to read. I can wait and read the new version whatever you prefer.
>
> >
> > [...]
> >
> > > > @@ -2871,7 +2882,7 @@ static int postcopy_start(MigrationState *ms,
> > > > Error **errp)
> > > >
> > > > /* Now, switchover looks all fine, switching to postcopy-active */
> > > > migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
> > > > - MIGRATION_STATUS_POSTCOPY_ACTIVE);
> > > > + MIGRATION_STATUS_POSTCOPY_DEVICE);
> > > >
> > > > bql_unlock();
> > > >
> > > > @@ -3035,7 +3046,8 @@ static void migration_completion(MigrationState
> > > > *s)
> > > >
> > > > if (s->state == MIGRATION_STATUS_ACTIVE) {
> > > > ret = migration_completion_precopy(s);
> > > > - } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > > > + } else if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
> > > > + s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
> > >
> > > Exactly. We need to be prepared that src sending too fast so when device
> > > loading on dest we finished.
> >
> > One thing more to mention here.. which may void some previous comments I
> > left. Let's discuss.
> >
> > I think it may also make sense to only allow a COMPLETE after
> > POSTCOPY_ACTIVE.
> >
> > That is, if src sends too fast to have finished sending everything,
> > reaching COMPLETE during POSTCOPY_DEVICE, that is, while before it receives
> > the new PONG you defined, then.. I _think_ it is better to wait for that.
> >
> > If it finally arrives, then it's perfect, we switch to POSTCOPY_ACTIVE,
> > then continue the completion.
> >
> > If the channel is broken before its arrival, logically we should handle
> > this case as a FAILURE and restart the VM on src.
> >
> > It's only relevant in a corner case, but does that sound better?
>
> Yes, it does make sense to wait for POSTCOPY_ACTIVE as src could finish
> before dst finished package load and could still fail. We could use a
> qemu_event_wait() to wait and set the event in src return path thread
> when the PONG is received.
Right.
Though since we want to move to POSTCOPY_ACTIVE asap when receiving the
corresponding PONG, we may want to have something like qemu_event_read()
just return "ev->value == EV_SET", as we can't event_wait() in the
migration thread while pushing RAMs.
Or, to avoid touching library code, we can also introduce yet another bool,
then when receiving the PONG we set both (1) the bool, and (2) the event.
We can check the bool in the iterations (when set, wait() to consume the
event; the event should have happened or just to happen), and wait() on the
event when completing (when return, bool must have been set, hence can
reset bool).
--
Peter Xu