On Fri, Feb 06, 2026 at 11:56:27AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 05.02.26 19:25, Peter Xu wrote:
> > On Thu, Feb 05, 2026 at 11:06:03AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 05.02.26 10:07, Markus Armbruster wrote:
> > > > Peter Xu <[email protected]> writes:
> > > > 
> > > > > On Sun, Feb 01, 2026 at 07:19:55PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > >    # @migrate-set-parameters:
> > > > > > @@ -1004,6 +1005,13 @@
> > > > > >    #     is @cpr-exec.  The first list element is the program's filename,
> > > > > >    #     the remainder its arguments.  (Since 10.2)
> > > > > >    #
> > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > +#     supports it.  In general that means that backend state and its
> > > > > > +#     file descriptors are passed to the destination in the migraton
> > > > > > +#     channel (which must be a UNIX socket).  Individual devices
> > > > > > +#     declare the support for backend-transfer by per-device
> > > > > > +#     backend-transfer option.  (Since 11.0)
> > > > > 
> > > > > I still think it'll be nice to either have "local" in the name of parameter
> > > > > or at least document it with crystal clear terms.
> > > > > 
> > > > > I used to suggest fd-passing, but maybe you wanted to emphasize there's
> > > > > more than fds to be migrated at least for tap?
> > > 
> > > For vhost-user-blk it's the same: not only FDs.
> > > 
> > > > > Then it can still be
> > > > > "local-backend-transfer", because nobody stops a device to transfer backend
> > > > > states in a remote migration either.. so "backend-transfer" seems to also
> > > > > work for remote migrations, but it is not.
> > > 
> > > Hmm. I imagine a mechanism, where OS supports passing FDs to another host.
> > > This needs support for actually migrating the corresponding kernel object
> > > > by OS automatically. But theoretically I think it can be done transparently
> > > for userspace QEMU process, which will simply pass FDs to the some special
> > > socket, similar to UNIX domain socket.
> > > 
> > > So, the key aspect is that we should be able to pass FDs to the migration
> > > channel, which currently meant that it must be UNIX domain socket, and it
> > > must be local migration. But in future it may change.
> > 
> > That's a nice vision, but IMHO we shouldn't take it into account when
> > defining any QEMU interface, when it's only about pure imaginations..
> > unless there is solid work in progress, or ideas proposed / known feasible
> > at least.
> > 
> > > 
> > > And yes, "backend-transfer" work for remote migration of backend.
> > > If we ever implement remote backend migration, why not to
> > > reuse "backend-transfer" for it? Even if there will not be transparent
> > > support from OS, and we'll implement another mechanics, we may add
> > > new parameter
> > > 
> > > 
> > >     backend-transfer-mechanism = "scm-rights" | "something-other"
> > 
> > Yes, this will look much better.  We likely shouldn't make it "scm-rights",
> > it should be generic terms that applies to all platforms like "local", even
> > if the implication / implementation might be different on various
> > platforms.
> > 
> > That's also the major confusion I got when I was reading the other
> > vhost-user-blk series, thought it was a local migration but not.
> > 
> > I feel like the interface is simply wrong to make it one covering both, or
> > at least it shouldn't be a boolean as you said because it represents more
> > than one use case.
> > 
> > If it's a boolean, it also shouldn't rely on UNIX sockets if it was trying
> > to describe a remote migration, right?  The vhost-user-blk way of
> > backend-migration doesn't require UNIX socket, or does it?
> 
> It does require UNIX socket too.

I'm lost once more.. :( Could you share what requires the UNIX socket for
the other work here?

https://lore.kernel.org/all/[email protected]/#r

There's indeed the inflight->fd, but it's not migrated but allocated before
taking the inflight buffer.  I don't see how it requires UNIX socket.

> 
> > 
> > Especially, if we still want to have your new proposal try to work for CPR
> > too or even replace it some day (or a continuous set of proposals in the
> > future, from different developers based on this feature), we need to have a
> > solid and clear way represents what CPR does, which is to do local fd
> > sharing.  "backend-transfer: local" or something similar can be that.
> > 
> > > 
> > > (or we can put this into "backend-transfer", supporting passing string to
> > > it and deprecating boolean)
> > 
> > It can be a enum, something like NONE, LOCAL, REMOTE.  But before that..
> > 
> > > 
> > > More over, this future "remote-backend-transfer" could be used for local
> > > migration, so again, it should be called simply "backend-transfer"..
> > 
> > Yes, REMOTE might be slightly misleading.  And considering you seem to want
> > to allow any of below to work:
> > 
> >    (1) enable fd migrations only,
> >    (2) enable remote migrations on backends only,
> >    (3) enable both of (1)+(2)
> > 
> > Maybe we should have two different feature bits?  The per-device one can be
> > kept as backend-transfer, however we need to change the global migration
> > knob to something describing a local migration.
> > 
> > In summary, still 1 new parameter for migration, 1 new parameter for
> > device, but adjust to:
> > 
> >    - Migration parameter: "local", boolean, when set, the migration must be
> >      a local migration within host (which requires UNIX sockets on Linux)
> > 
> >    - Per-device parameter "backend-transfer", boolean, when set, device will
> >      migrate backends when migration happens.  Otherwise, backends are not
> >      migrated; dest QEMU needs to re-initialize it.  The backends may or may
> >      not contain FDs.
> > 
> >      When the backend device states contain FDs and FD migrations are
> >      required, it requires "local" set first above, or it should fail the
> >      migration when user requested backend-transfer=on.
> > 
> >      When it doesn't contain FD at all (or FD migration is not a must?), it
> >      should either migrate the backend or not depending on the user's
> >      selection.
> > 
> > For tap (your series here), you need to set both ON and required.
> > 
> > For vhost-user-blk, that only needs to set per-device knob to ON, the other
> > one shouldn't matter.
> > 
> > Then when we want to replace cpr, we request people switch (cpr-transfer
> > only, keeping cpr-exec / cpr-reboot aside for now) from setting
> > mode=cpr-transfer to local=on, which hopefully will start work as before.
> > The per-device parameter doesn't matter in this case.
> > 
> > Would this be more reasonable?
> > 
> 
> Hmm. So, with backend-transfer=on on device and local mig parameter set to
> false, it fails?
> 
> But this way we'll have to set backend-transfer to on/off before any
> migration (local or remote) on all devices with help of set-qom. That's not
> comfortable.

Personally, as long as we can properly separate the two use cases with the
two knobs, it will look good to me.  It doesn't strictly need to be a
failure on such conflicts indeed.

E.g. we can also define this case (local=off, backend-transfer=on) the
other way round if failing is not wanted; that is, allow migration to
happen but skip the part of backend transfer that requires the locality.

Fundamentally, we should accept two kinds of backend-transfer impl:

  - When it is supported regardless of local=on/off.  I believe that's
    vhost-user-blk's case (but I'll now need to double check with you again
    above on UNIX dependency).  Then this only relies on the per-dev knob.

  - When it is supported only if local=on (this series).  This part is
    where we can define the behavior of whether we fail the migration on
    local=off, or we skip the feature instead.

So I think we can choose to skip it for the latter.  It should almost be
the same logic as what you have done in this patchset, afaict, besides the
rename and re-definition of the migration knob.
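
To make the above concrete, here is a rough sketch of the decision I have in
mind.  It's plain Python rather than QEMU code, and every name in it (the
"local" parameter, needs_locality, the function itself) is made up for
illustration.  It also assumes we pick "skip" rather than "fail" for the
local=off case:

```python
from enum import Enum


class Outcome(Enum):
    TRANSFER = "transfer backend state"
    SKIP = "skip backend transfer; destination re-initializes the backend"


def backend_transfer_outcome(local: bool,
                             dev_backend_transfer: bool,
                             needs_locality: bool) -> Outcome:
    """Hypothetical decision rule for one device.

    local                -- assumed migration parameter ("local" knob)
    dev_backend_transfer -- per-device backend-transfer option
    needs_locality       -- assumed device property: True if the backend
                            state can only be moved within one host
                            (e.g. it carries FDs), False otherwise
    """
    # The per-device knob always gates the feature.
    if not dev_backend_transfer:
        return Outcome.SKIP
    # Locality-dependent backends are skipped (not failed) when the
    # migration is not declared local.
    if needs_locality and not local:
        return Outcome.SKIP
    return Outcome.TRANSFER
```

With this rule a device like tap (needs_locality=True) only transfers its
backend when local=on, while a backend that doesn't depend on locality
follows the per-device knob alone.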

> 
> The original idea was that backend-transfer is done for the device when
> both migration parameter and device option are set to true. This way before
> the migration (local or remote) we only have to set appropriate migration
> parameters. And backend-transfer per-device options can be setup once (and
> the same way) when starting the QEMU, or they may be inherited from Machine
> Type. And with such logic, it's good to have similar names for migration
> parameter and device option.

I hope the above will solve this problem.  IIUC what you described should work
if we tweak the new proposal on the local=off & backend-transfer=on case.

> 
> Considering all this, could we keep the logic as is (in this patch), but
> rename backend-transfer parameter to local-backend-transfer, as you
> proposed before? Or turn it into "backend-transfer" = "local" | "off" (but
> IMHO it's too optimistic: who knows, will we really add something into this
> enum in the future? I don't have such plans)

IMHO "local" would be nicer because it's very simple, generic and clear on
its own.  It almost says "requires UNIX sockets" on Linux, and it also opens
the door for this parameter to be reused without a backend: for example,
when some frontend or any-not-trivially-a-backend also wants to migrate an
FD in the future.  I wouldn't be surprised to see that coming.

But let's finish the above discussion and see if we can get on the same page.

Thanks,

-- 
Peter Xu

