Re: [PATCH v3 4/9] qapi: add interface for local TAP migration

Vladimir Sementsov-Ogievskiy Tue, 09 Sep 2025 00:40:58 -0700

On 08.09.25 21:47, Peter Xu wrote:

On Mon, Sep 08, 2025 at 07:38:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:

On 08.09.25 18:35, Peter Xu wrote:

On Fri, Sep 05, 2025 at 04:50:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:

diff --git a/qapi/migration.json b/qapi/migration.json
index 2387c21e9c..992a5b1e2b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -517,6 +517,12 @@
   #     each RAM page.  Requires a migration URI that supports seeking,
   #     such as a file.  (since 9.0)
   #
+# @local-tap: Migrate TAPs locally, keeping the backend alive. Open file
+#     descriptors and TAP-related state are migrated. May be used only
+#     when the migration channel is a unix socket. For the target device,
+#     the @local-incoming option must also be specified.
+#     (since 10.2)


IMHO we should move this into a per-device property; at least we need one
there to still control the device behavior. We had a similar discussion
recently on iterable virtio-net.

But maybe this one is slightly special?  Maybe the tap device at least
needs to know whether, in this specific migration, we want to pass the FD
over or not (e.g. local upgrade vs. remote _real_ migration)?
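For context on what "passing the FD over" means mechanically: on a unix socket, an open descriptor can be sent as SCM_RIGHTS ancillary data, and the receiver gets a new descriptor number that refers to the same open file (for a TAP, the same kernel device). The sketch below is plain POSIX, not QEMU's migration code; the helper names send_fd() and recv_fd() are illustrative assumptions.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* at least one byte of regular data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces correct alignment of buf */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor in this process, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC)) {
        return -1;  /* no data, or the control data was truncated */
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;  /* no fd was attached */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver owns the fd it gets back; the sender may close its own copy once sendmsg() succeeds, because the open file stays alive as long as any descriptor refers to it.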

If that's the case, we may consider providing a generic migration
capability, like cap-fd-passing.  Nowadays, since Fabiano is moving all
migration capabilities over to migration parameters, this one can start as
a parameter instead of a capability.  The problem with migration
capabilities is (at least) that they can't default to ON on any machine
type.. meanwhile they look identical to parameters, except that they are
always bool.

The high-level rationale is that we should never add a per-device cap flag
into the migration framework.


Hmm.

1. Yes, we need to distinguish whether this is a _real_ migration or a local
one. And setting a special property on each device (that supports
fd-migration) to turn on passing FDs to the channel seems inconvenient and
error-prone.

2. Initially, I chose separate "local-tap" and "local-vhost-user-blk"
capabilities just to simplify further testing/debugging in a real
environment: being able to enable only "half of the magic" helps.

So, granularity makes sense, but having a local-XXX capability for each
device class looks bad.

Maybe a generic cap-fd-passing, together with the possibility to disable it
on a per-device basis (like migrate-fd=false), is a good compromise.


Another question is whether we need the "local-incoming" option for the
target device.

Initially I added this because I thought: oh, I need to decide at
initialization time whether to call open() or to wait for an incoming fd.

Now I see that I can just postpone this decision until the "start" point,
where

- either I already have an fd from the incoming migration,
- or I have nothing, in which case let's call open().
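A minimal sketch of that deferred, start-time decision, assuming a hypothetical DemoTap struct (not QEMU's TAPState) whose fd stays -1 until either the incoming migration delivers one or the device opens the backend itself. The path is a stand-in; a real tap would open /dev/net/tun and issue TUNSETIFF, which is omitted here.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical device state for illustration; not QEMU's TAPState. */
typedef struct {
    int fd;            /* -1 until an fd is received or opened */
    const char *path;  /* stand-in for the backend to open() */
} DemoTap;

static void demo_tap_init(DemoTap *t, const char *path)
{
    t->fd = -1;        /* no open() at init time: the decision is deferred */
    t->path = path;
}

/* Hypothetical hook: called if the incoming stream carries an fd. */
static void demo_tap_set_incoming_fd(DemoTap *t, int fd)
{
    t->fd = fd;
}

/* At "start": reuse the migrated fd if present, else open() ourselves. */
static int demo_tap_start(DemoTap *t)
{
    if (t->fd >= 0) {
        return 0;                   /* backend was kept alive by the source */
    }
    t->fd = open(t->path, O_RDWR);  /* nothing arrived: open a fresh backend */
    return t->fd >= 0 ? 0 : -1;
}

static void demo_tap_stop(DemoTap *t)
{
    if (t->fd >= 0) {
        close(t->fd);
        t->fd = -1;
    }
}
```

In this shape the target needs no separate flag at init time: whether the fd came from the source is only checked when the device starts.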


I'll try to go with a single "fd-passing" capability, as any kind of
granularity may be added later on demand.


Hmm2. Probably we could even avoid adding such a capability and just check
whether the migration channel supports fd passing or not? That seems too
implicit to me.
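The implicit check could look roughly like this: a hypothetical helper (not an existing QEMU function) that asks the kernel for the channel socket's address family via getsockname(). It only shows that fd passing is possible on the transport; it says nothing about whether the peer will accept SCM_RIGHTS, and it cannot express a user's choice to opt out.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: true if sockfd is an AF_UNIX socket, i.e. a
 * transport on which SCM_RIGHTS fd passing is at least possible. */
static bool channel_supports_fd_passing(int sockfd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    if (getsockname(sockfd, (struct sockaddr *)&ss, &len) < 0) {
        return false;  /* invalid fd or not a socket */
    }
    return ss.ss_family == AF_UNIX;
}
```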


If we want to expose a feature internally, IIUC we can use QAPI "features"
like this:

https://lore.kernel.org/all/[email protected]/

But I'm not yet sure whether it's useful..

In this case the "capability" itself should almost always be present when
using unix sockets..  The problem is, IIUC we're not trying to describe a
capability, but a choice the user made.

For example, when a unix socket is the transport, we may still decide not
to use fd passing, even if it's fully supported in the current QEMU binary
for all involved devices, for any of these reasons: (1) it could be a unix
socket to a proxy daemon (of a container?) that doesn't support fd
passing, or (2) as you mentioned above, for debugging purposes, when we
want to triage whether a bug is related to fd-passing.  Maybe more.

The per-device granularity you mentioned also makes sense to me.

A use case: imagine we have a QEMU that (1) supports tap local migration,
but (2) doesn't yet support virtio-blk local migration.  Then we want to be
able to enable fd-passing for tap/virtio-net, but not for virtio-blk (even
if the src QEMU in this context might support both)?

IOW, it makes sense to me to have two layers of controls here:

   (a) Migration new parameter, "migrate-fds" (or any better name..).

       When set, it enables all devices that support fd-passing to migrate
       the fds directly.  OTOH, when not set, even if all devices have
       fd-passing enabled, it should still do a full migration.  This one
       is the user knob saying "I want to migrate with fds migrated".

       This should certainly imply a unix socket as the transport, and
       should fail upfront if the transport is not a unix socket.

       We should also auto-select this with cpr migrations..  then it
       covers any code path (whenever such a path exists?) where the fds
       can be migrated via either the cpr channel or the main channel.

   (b) Device new parameter, "migrate-fds" (or any better name..).

       When set, the device declares support for migrating fds "whenever
       the migration applies", aka, when the above (a) is selected first.

       Taking the tap device as an example, setting it ON means "please
       enable fd-passing whenever the user enables this migration option".
       So the tap code should migrate the fd only if both (a) and (b) are
       ON.  When migrating to e.g. old QEMUs, (b) should be OFF even if (a)
       is ON.
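The gating rule in (a) and (b) can be sketched as two small predicates. All names here (DemoMigParams, DemoDevProps, the demo_* functions) are hypothetical and only mirror the proposal; the upfront check assumes the transport is given as a URI with a "unix:" prefix, which simplifies how QEMU actually describes migration channels.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical knobs mirroring the proposal; names are illustrative. */
typedef struct { bool migrate_fds; } DemoMigParams;  /* layer (a): user */
typedef struct { bool migrate_fds; } DemoDevProps;   /* layer (b): device */

/* (a) is rejected upfront unless the transport is a unix socket. */
static bool demo_migrate_fds_check(const DemoMigParams *p, const char *uri)
{
    return !p->migrate_fds || strncmp(uri, "unix:", 5) == 0;
}

/* A device migrates its fd only if both (a) and (b) are ON. */
static bool demo_dev_should_migrate_fd(const DemoMigParams *p,
                                       const DemoDevProps *d)
{
    return p->migrate_fds && d->migrate_fds;
}
```

With the parameter ON, a tap device with its property ON would migrate its fd, while a virtio-blk device with the property OFF would still do a full migration, as in the use case above.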

Would the above make sense?


Yes, I meant something like this, sounds good.

--
Best regards,
Vladimir
