On Thu, Feb 05, 2026 at 11:06:03AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 05.02.26 10:07, Markus Armbruster wrote:
> > Peter Xu <[email protected]> writes:
> >
> > > On Sun, Feb 01, 2026 at 07:19:55PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > # @migrate-set-parameters:
> > > > @@ -1004,6 +1005,13 @@
> > > > # is @cpr-exec. The first list element is the program's filename,
> > > > # the remainder its arguments. (Since 10.2)
> > > > #
> > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > +# support it. In general that means that backend state and its
> > > > +# file descriptors are passed to the destination in the migration
> > > > +# channel (which must be a UNIX socket). Individual devices
> > > > +# declare the support for backend-transfer by per-device
> > > > +# backend-transfer option. (Since 11.0)
> > >
> > > I still think it'll be nice to either have "local" in the name of
> > > parameter or at least document it with crystal clear terms.
> > >
> > > I used to suggest fd-passing, but maybe you wanted to emphasize there's
> > > more than fds to be migrated at least for tap?
>
> For vhost-user-blk it's the same: not only FDs.
>
> > > Then it can still be
> > > "local-backend-transfer", because nobody stops a device to transfer
> > > backend states in a remote migration either.. so "backend-transfer"
> > > seems to also work for remote migrations, but it is not.
>
> Hmm. I imagine a mechanism, where OS supports passing FDs to another host.
> This needs support for actually migrating the corresponding kernel object
> by OS automatically. But theoretically I think it can be done transparently
> for userspace QEMU process, which will simply pass FDs to some special
> socket, similar to UNIX domain socket.
>
> So, the key aspect is that we should be able to pass FDs to the migration
> channel, which currently meant that it must be UNIX domain socket, and it
> must be local migration. But in future it may change.

That's a nice vision, but IMHO we shouldn't take it into account when
defining any QEMU interface while it's purely hypothetical, unless there
is solid work in progress, or at least a proposal that is known to be
feasible.

>
> And yes, "backend-transfer" work for remote migration of backend.
> If we ever implement remote backend migration, why not to
> reuse "backend-transfer" for it? Even if there will not be transparent
> support from OS, and we'll implement another mechanics, we may add
> new parameter
>
>
> backend-transfer-mechanism = "scm-rights" | "something-other"

Yes, this will look much better. We likely shouldn't make it "scm-rights",
though; it should be a generic term that applies to all platforms, like
"local", even if the implications / implementation differ across
platforms.

That's also the major confusion I had when reading the other
vhost-user-blk series: I thought it was about local migration, but it is
not.

I feel it's simply wrong to have one interface cover both, or at least it
shouldn't be a boolean, as you said, because it represents more than one
use case.

If it's a boolean, it also shouldn't rely on UNIX sockets if it was trying
to describe a remote migration, right? The vhost-user-blk way of
backend migration doesn't require a UNIX socket, or does it?

Especially if we still want your new proposal to work for CPR too, or even
replace it some day (or a continuing set of proposals in the future, from
different developers, based on this feature), we need a solid and clear
way to represent what CPR does, which is local fd sharing.
"backend-transfer: local" or something similar can be that.

>
> (or we can put this into "backend-transfer", supporting passing string to
> it and deprecating boolean)

It can be an enum, something like NONE, LOCAL, REMOTE. But before that..

>
> More over, this future "remote-backend-transfer" could be used for local
> migration, so again, it should be called simply "backend-transfer"..

Yes, REMOTE might be slightly misleading. And considering you seem to want
to allow any of the below to work:

  (1) enable fd migrations only,
  (2) enable remote migrations on backends only,
  (3) enable both of (1)+(2)

Maybe we should have two different feature bits? The per-device one can be
kept as backend-transfer, but we need to change the global migration knob
to something describing a local migration.

In summary, still 1 new parameter for migration and 1 new parameter for
device, but adjusted to:

  - Migration parameter "local", boolean: when set, the migration must be
    a local migration within the host (which requires UNIX sockets on
    Linux).

  - Per-device parameter "backend-transfer", boolean: when set, the device
    will migrate its backends when migration happens. Otherwise, backends
    are not migrated; dest QEMU needs to re-initialize them. The backends
    may or may not contain FDs.

    When the backend device states contain FDs and FD migration is
    required, "local" above must be set first; otherwise the migration
    should fail when the user requested backend-transfer=on.

    When they don't contain FDs at all (or FD migration is not a must?),
    the backend should be migrated or not depending on the user's
    selection.

For tap (your series here), both need to be set to ON; both are required.

For vhost-user-blk, only the per-device knob needs to be ON; the other
one shouldn't matter.

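To make the rule concrete, here is a rough sketch of the check I have in
mind. It's in Python only for brevity and is not QEMU code: the Device
class, the needs_fd_passing flag and backends_to_migrate() are all names
made up for illustration, and whether a given backend really needs FD
passing is an assumption taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """Hypothetical model of a device, for illustration only."""
    name: str
    backend_transfer: bool   # proposed per-device "backend-transfer" knob
    needs_fd_passing: bool   # assumption: backend state contains FDs that must be migrated


def backends_to_migrate(local: bool, devices: List[Device]) -> List[str]:
    """Return the names of devices whose backends get migrated.

    'local' models the proposed global migration parameter. Raises
    ValueError when a device asks for backend-transfer and needs FD
    passing, but the migration was not declared local.
    """
    migrated = []
    for dev in devices:
        if not dev.backend_transfer:
            # Backend is not migrated; dest QEMU re-initializes it.
            continue
        if dev.needs_fd_passing and not local:
            raise ValueError(
                f"{dev.name}: backend-transfer=on needs FD passing, "
                "which requires local=on"
            )
        migrated.append(dev.name)
    return migrated
```

With this, a tap-like device (backend_transfer and needs_fd_passing both
set) only passes the check when local is set, while a device that does not
need FD passing is accepted regardless of local.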
Then when we want to replace cpr, we ask people to switch (cpr-transfer
only, keeping cpr-exec / cpr-reboot aside for now) from setting
mode=cpr-transfer to local=on, which hopefully will then work as before.
The per-device parameter doesn't matter in this case.

Would this be more reasonable?

Thanks,

--
Peter Xu