Re: [PATCH v10 3/8] qapi: add backend-transfer migration parameter

Vladimir Sementsov-Ogievskiy Fri, 06 Feb 2026 12:39:01 -0800

On 06.02.26 19:08, Peter Xu wrote:

On Fri, Feb 06, 2026 at 11:56:27AM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 05.02.26 19:25, Peter Xu wrote:

On Thu, Feb 05, 2026 at 11:06:03AM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 05.02.26 10:07, Markus Armbruster wrote:

Peter Xu <[email protected]> writes:

On Sun, Feb 01, 2026 at 07:19:55PM +0300, Vladimir Sementsov-Ogievskiy wrote:

    # @migrate-set-parameters:
@@ -1004,6 +1005,13 @@
    #     is @cpr-exec.  The first list element is the program's filename,
    #     the remainder its arguments.  (Since 10.2)
    #
+# @backend-transfer: Enable backend-transfer feature for devices that
+#     supports it.  In general that means that backend state and its
+#     file descriptors are passed to the destination in the migraton
+#     channel (which must be a UNIX socket).  Individual devices
+#     declare the support for backend-transfer by per-device
+#     backend-transfer option.  (Since 11.0)


I still think it'll be nice to either have "local" in the name of parameter
or at least document it with crystal clear terms.

I used to suggest fd-passing, but maybe you wanted to emphasize there's
more than fds to be migrated at least for tap?


For vhost-user-blk it's the same: not only FDs.

Then it can still be
"local-backend-transfer", because nobody stops a device from transferring
backend state in a remote migration either.  So "backend-transfer" sounds as
if it also works for remote migrations, but it does not.


Hmm. I imagine a mechanism, where OS supports passing FDs to another host.
This needs support for actually migrating the corresponding kernel object
by OS automatically. But theoretically I think it can be done transparently
for the userspace QEMU process, which will simply pass FDs to some special
socket, similar to a UNIX domain socket.

So, the key aspect is that we should be able to pass FDs to the migration
channel, which currently means that it must be a UNIX domain socket, and it
must be a local migration. But in the future it may change.
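
For illustration only, here is a minimal sketch of why FD passing ties the
channel to a UNIX domain socket: the descriptor travels as SCM_RIGHTS
ancillary data, which only AF_UNIX sockets carry. It uses Python's standard
socket.send_fds / socket.recv_fds (Python 3.9+, Unix only); the helper name
pass_fd_locally is hypothetical and this is not QEMU code.

```python
import os
import socket


def pass_fd_locally(fd: int) -> int:
    """Send `fd` across a UNIX socket pair and return the received duplicate."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # The descriptor rides along as SCM_RIGHTS ancillary data; at least
        # one byte of ordinary payload is sent together with it.
        socket.send_fds(sender, [b"x"], [fd])
        _data, fds, _flags, _addr = socket.recv_fds(receiver, 1, 1)
        return fds[0]
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    received = pass_fd_locally(read_end)
    os.write(write_end, b"hi")
    # The received descriptor refers to the same pipe as read_end.
    print(os.read(received, 2))
```

The receiving side gets a new descriptor number that refers to the same
kernel object, which is what makes the mechanism local to one host.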


That's a nice vision, but IMHO we shouldn't take it into account when
defining any QEMU interface while it is purely speculative, unless there is
solid work in progress, or at least ideas proposed / known to be feasible.


And yes, "backend-transfer" works for remote migration of the backend.
If we ever implement remote backend migration, why not
reuse "backend-transfer" for it? Even if there is no transparent
support from the OS and we implement a different mechanism, we may add
a new parameter:


     backend-transfer-mechanism = "scm-rights" | "something-other"


Yes, this will look much better.  We likely shouldn't make it "scm-rights";
it should be a generic term that applies to all platforms, like "local", even
if the implication / implementation might be different on various
platforms.

That's also the major confusion I had when I was reading the other
vhost-user-blk series: I thought it was a local migration, but it is not.

I feel like the interface is simply wrong to make it one covering both, or
at least it shouldn't be a boolean as you said because it represents more
than one use case.

If it's a boolean, it also shouldn't rely on UNIX sockets if it was trying
to describe a remote migration, right?  The vhost-user-blk way of
backend-migration doesn't require UNIX socket, or does it?


It does require UNIX socket too.


I'm lost once more.. :( Could you share what requires the UNIX socket for
the other work here?

https://lore.kernel.org/all/[email protected]/#r


Ah sorry, I thought we were talking about my series
"[PATCH v2 00/25] vhost-user-blk: live-backend local migration"

Of course, Alexander's series doesn't need UNIX socket.


There's indeed the inflight->fd, but it's not migrated but allocated before
taking the inflight buffer.  I don't see how it requires UNIX socket.


It doesn't. But it doesn't transfer "the whole backend", only the inflight
region.


Especially, if we still want your new proposal to work for CPR too, or even
replace it some day (or a continuous set of proposals in the future, from
different developers based on this feature), we need a solid and clear way
to represent what CPR does, which is local fd sharing.
"backend-transfer: local" or something similar can be that.


(or we can put this into "backend-transfer", supporting passing string to
it and deprecating boolean)


It can be an enum, something like NONE, LOCAL, REMOTE.  But before that..


Moreover, this future "remote-backend-transfer" could be used for local
migration, so again, it should be called simply "backend-transfer"..


Yes, REMOTE might be slightly misleading.  And considering you seem to want
to allow any of below to work:

    (1) enable fd migrations only,
    (2) enable remote migrations on backends only,
    (3) enable both of (1)+(2)

Maybe we should have two different feature bits?  The per-device one can be
kept as backend-transfer, however we need to change the global migration
knob to something describing a local migration.

In summary, still 1 new parameter for migration, 1 new parameter for
device, but adjust to:

    - Migration parameter: "local", boolean, when set, the migration must be
      a local migration within host (which requires UNIX sockets on Linux)

    - Per-device parameter "backend-transfer", boolean, when set, device will
      migrate backends when migration happens.  Otherwise, backends are not
      migrated; dest QEMU needs to re-initialize it.  The backends may or may
      not contain FDs.

      When the backend device states contain FDs and FD migrations are
      required, it requires "local" set first above, or it should fail the
      migration when user requested backend-transfer=on.

      When it doesn't contain FD at all (or FD migration is not a must?), it
      should either migrate the backend or not depending on the user's
      selection.

For tap (your series here), both knobs need to be set to ON.

For vhost-user-blk, only the per-device knob needs to be set to ON; the other
one shouldn't matter.
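
As a rough sketch of the two-knob semantics proposed above (the strict
variant, where a conflict fails the migration), the per-device decision
could be modelled as below. The names Outcome, decide_strict and needs_fd
are illustrative assumptions, not QEMU identifiers.

```python
from enum import Enum


class Outcome(Enum):
    TRANSFER = "transfer"  # backend state is migrated
    REINIT = "reinit"      # destination QEMU re-initializes the backend
    FAIL = "fail"          # migration is rejected


def decide_strict(local: bool, backend_transfer: bool, needs_fd: bool) -> Outcome:
    """Model of the proposal: `local` is the migration parameter,
    `backend_transfer` the per-device option, and `needs_fd` says whether
    the device's backend state requires FD migration (an assumed property)."""
    if not backend_transfer:
        return Outcome.REINIT
    if needs_fd and not local:
        # FD migration was requested without the local migration knob.
        return Outcome.FAIL
    return Outcome.TRANSFER


if __name__ == "__main__":
    # tap-like device: needs both knobs.
    print(decide_strict(local=True, backend_transfer=True, needs_fd=True))
    # Device whose backend state carries no FDs: per-device knob is enough.
    print(decide_strict(local=False, backend_transfer=True, needs_fd=False))
```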

Then when we want to replace cpr, we request people switch (cpr-transfer
only, keeping cpr-exec / cpr-reboot aside for now) from setting
mode=cpr-transfer to local=on, which hopefully will work as before.
The per-device parameter doesn't matter in this case.

Would this be more reasonable?


Hmm. So, with backend-transfer=on on the device and the local migration
parameter set to false, it fails?

But this way we'll have to set backend-transfer to on/off before any
migration (local or remote) on all devices with the help of qom-set.
That's not comfortable.


Personally, as long as we can separate the two use cases with the two knobs
properly, it will look good to me.  It doesn't need to be strictly a
failure on such conflicts indeed.

E.g. we can also define this case (local=off, backend-transfer=on) the
other way round if failing is not wanted; that is, allow migration to
happen but skip the part of backend transfer that requires the locality.

Fundamentally, we should accept two kinds of backend-transfer impl:

   - When it is supported regardless of local=on/off.  I believe that's
     vhost-user-blk's case (but I'll now need to double check with you again
     above on UNIX dependency).  Then this only relies on the per-dev knob.

   - When it is supported only if local=on (this series).  This part is
     where we can define the behavior of whether we fail the migration on
     local=off, or we skip the feature instead.

So I think we can choose to skip it for the latter.  It should almost be
the same logic as what you have done in this patchset, afaict, besides the
rename and re-definition of the migration knob.
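
A minimal sketch of this "skip instead of fail" variant, under the
assumption that each device knows whether its backend transfer requires
locality. The function name decide_lenient and the needs_local flag are
hypothetical.

```python
def decide_lenient(local: bool, backend_transfer: bool, needs_local: bool) -> str:
    """Return what happens to one device's backend during migration.

    "transfer": backend state is migrated.
    "skip":     migration proceeds, but the backend transfer is skipped
                because it would need a local migration.
    "reinit":   the per-device option is off, so the destination
                re-initializes the backend as usual.
    """
    if not backend_transfer:
        return "reinit"
    if needs_local and not local:
        return "skip"
    return "transfer"


if __name__ == "__main__":
    for local in (False, True):
        for needs_local in (False, True):
            print(local, needs_local, decide_lenient(local, True, needs_local))
```

The only difference from a strict definition is the local=off,
backend-transfer=on case for a locality-dependent device, which no longer
rejects the migration.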


The original idea was that backend-transfer is done for the device when both
the migration parameter and the device option are set to true. This way, before
the migration (local or remote) we only have to set appropriate migration
parameters. And backend-transfer per-device options can be set up once (and the
same way) when starting QEMU, or they may be inherited from the Machine
Type. And with such logic, it's good to have similar names for the migration
parameter and the device option.
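
To make the original logic concrete, here is a small sketch: the per-device
options are fixed at startup, and only the migration parameter changes per
migration. The device names and the helper plan_transfer are made up for
illustration.

```python
from typing import Dict


def plan_transfer(migration_param: bool, device_options: Dict[str, bool]) -> Dict[str, bool]:
    """Backend transfer happens for a device only when both the migration
    parameter and that device's own option are true."""
    return {name: migration_param and opt for name, opt in device_options.items()}


if __name__ == "__main__":
    # Set once at QEMU startup (or inherited from the machine type).
    devices = {"net0": True, "net1": False}
    # Local migration: flip only the migration parameter.
    print(plan_transfer(True, devices))
    # Remote migration: leave the migration parameter off.
    print(plan_transfer(False, devices))
```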


I hope the above will solve this problem.  IIUC what you described should work
if we tweak the new proposal on the local=off & backend-transfer=on case.


Considering all this, could we keep the logic as is (in this patch), but rename
backend-transfer parameter to local-backend-transfer, as you proposed before?
Or turn it into "backend-transfer" = "local" | "off" (but IMHO it's too
optimistic: who knows, will we really add something into this enum in the
future? I don't have such plans)


IMHO "local" would be nicer because it's very simple, generic and clear on
its own.  It almost says "requires UNIX sockets" on Linux, and it also opens
the door for this parameter to be reused even without a backend: for
example, when some frontend or any-not-trivially-a-backend also wants to
migrate an FD in the future.  I would not be surprised to see that coming.

But let's finish the above discussion and see if we can get on the same page.


So finally, rename to "local", and keep the logic as is, right? OK for me,
will do.

--
Best regards,
Vladimir
