On 08.09.25 21:47, Peter Xu wrote:
On Mon, Sep 08, 2025 at 07:38:45PM +0300, Vladimir Sementsov-Ogievskiy wrote:
On 08.09.25 18:35, Peter Xu wrote:
On Fri, Sep 05, 2025 at 04:50:34PM +0300, Vladimir Sementsov-Ogievskiy wrote:
diff --git a/qapi/migration.json b/qapi/migration.json
index 2387c21e9c..992a5b1e2b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -517,6 +517,12 @@
# each RAM page. Requires a migration URI that supports seeking,
# such as a file. (since 9.0)
#
+# @local-tap: Migrate TAPs locally, keeping the backend alive. Open file
+# descriptors and TAP-related state are migrated. May only be
+# used when the migration channel is a unix socket. For the target
+# device, the @local-incoming option must also be specified.
+# (since 10.2)
IMHO we should move this into a per-device property; at the very least we need
one there to still control the device behavior. We had a similar discussion
recently on iterable virtio-net.
But maybe this one is slightly special? Maybe the tap device at least needs to
know whether, in this specific migration, we want to pass the FD over or not
(e.g. a local upgrade versus a remote _real_ migration)?
If that's the case, we may consider providing a generic migration
capability, like cap-fd-passing. Nowadays, since Fabiano is moving all
migration capabilities over to migration parameters, this one can start as a
parameter instead of a capability. The problem with a migration capability
is (at least) that it can't default to ON for any machine type.. otherwise
it looks identical to a parameter, except that it's always a bool.
The high-level rationale is that we should never add a per-device capability
flag to the migration framework.
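For reference, "fd passing" over a unix socket relies on the SCM_RIGHTS mechanism described in unix(7): the descriptor travels as ancillary data alongside at least one byte of ordinary data, and the receiver gets a new descriptor referring to the same open file. The sketch below is plain POSIX/Linux C, not QEMU's migration channel code; send_fd() and recv_fd() are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one fd over a connected AF_UNIX socket. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F';  /* at least one byte of real data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* guarantees proper alignment of the control buffer */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);  /* cannot fail: the control buffer is large enough */
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns a new descriptor in this process, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    /* Walk the ancillary data looking for the SCM_RIGHTS payload. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        }
    }
    return fd;
}
```

Because the receiver ends up with a descriptor for the same open file description, the sender can close its copy afterwards while the kernel-side object (e.g. a tap queue) stays alive, which is what makes the local-upgrade case possible.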
Hmm.
1. Yes, we need to distinguish whether it is a _real_ migration or a local one.
And setting a special property on each device (that supports fd migration) to
turn on passing the FD to the channel seems uncomfortable and error-prone.
2. Initially, I decided on separate "local-tap" and "local-vhost-user-blk"
capabilities just to simplify further testing/debugging in a real environment:
being able to enable only "half of the magic" helps.
So, granularity makes sense, but having a local-XXX capability for each device
class looks bad.
Maybe having a generic cap-fd-passing, together with the possibility to disable
it on a per-device basis (like migrate-fd=false), is a good compromise.
Another question is whether we need the "local-incoming" option for the target
device. Initially I added it because I thought: oh, I need to decide at
initialization time whether to call open() or to wait for an incoming fd.
Now I see that I can just postpone this decision until the "start" point, where I
- either already have the fd from the incoming migration,
- or have nothing, in which case I call open().
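A minimal sketch of that deferred decision, assuming a hypothetical BackendState whose fd stays -1 until either the incoming migration delivers one or the start point is reached. The do_open callback is a stand-in for whatever the backend really does to obtain a fresh fd (for tap, opening /dev/net/tun and configuring the interface); none of these names are QEMU APIs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical backend state (not QEMU's TAPState). */
typedef struct {
    int fd;            /* -1 until migrated in or opened locally */
    bool fd_migrated;  /* true if the fd arrived via the migration stream */
} BackendState;

typedef int (*OpenFn)(void *opaque);

void backend_init(BackendState *s)
{
    s->fd = -1;
    s->fd_migrated = false;
}

/* Called from the incoming-migration path when the source passed an fd. */
void backend_set_incoming_fd(BackendState *s, int fd)
{
    assert(fd >= 0);
    s->fd = fd;
    s->fd_migrated = true;
}

/* Deferred decision at "start": reuse a migrated fd, otherwise open a fresh one. */
int backend_start(BackendState *s, OpenFn do_open, void *opaque)
{
    if (s->fd >= 0) {
        return 0;  /* fd came from the source QEMU: do not call open() */
    }
    s->fd = do_open(opaque);  /* no incoming fd: cold boot or full migration */
    return s->fd >= 0 ? 0 : -1;
}

/* Example opener: opaque is a path string. */
int open_path(void *opaque)
{
    return open((const char *)opaque, O_RDWR);
}
```

With this shape the target needs no "local-incoming" flag at init time: whether a migrated fd is present at start is itself the signal.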
I'll try to go with one "fd-passing" capability, as any kind of granularity may
be added later on demand.
Hmm2. Probably we could avoid adding such a capability at all, and just check
whether the migration channel supports fd passing or not? That seems too
implicit to me.
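For comparison, the implicit check would look roughly like this: only AF_UNIX sockets can carry SCM_RIGHTS, so the channel's address family can be inspected with getsockname(). channel_can_pass_fds() is a hypothetical name. Note that this only tells what the transport is able to do, not whether the user wants fds to be passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/socket.h>

/* True if fd is a socket in the AF_UNIX family, the only family that can carry SCM_RIGHTS. */
bool channel_can_pass_fds(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    assert(len >= sizeof(sa_family_t));
    if (getsockname(fd, (struct sockaddr *)&ss, &len) < 0) {
        return false;  /* not a socket at all (e.g. a file, a pipe, or a bad fd) */
    }
    return ss.ss_family == AF_UNIX;
}
```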
If we want to expose a feature internally, IIUC we can use QAPI "features"
like this:
https://lore.kernel.org/all/[email protected]/
But I'm not yet sure whether it's useful..
In this case the "capability" itself should almost always be present when
using unix sockets.. The problem is, IIUC we're not trying to describe a
capability, but a choice the user made.
For example, when a unix socket is the transport, we may still decide not to
use fd passing even if it's fully supported in the current QEMU binary for
all devices involved, for any of these reasons: (1) it could be a unix
socket to a proxy daemon (of a container?) that doesn't support fd passing,
or (2) as you mentioned above, for debugging purposes, when we want to
triage whether a bug is related to fd passing. Maybe more.
The per-device granularity you mentioned also makes sense to me.
A use case: imagine we have a QEMU that (1) supports tap local migration, but
(2) doesn't yet support virtio-blk local migration. Then we want to be able to
enable fd passing for tap/virtio-net, but not for virtio-blk (even if the src
QEMU in this context might support both)?
IOW, it makes sense to me to have two layers of controls here:
(a) A new migration parameter, "migrate-fds" (or any better name..).
When set, it enables all devices that support fd passing to migrate
their fds directly. OTOH, when not set, even if all devices have fd
passing enabled, it should still do a full migration. This is the
user knob saying "I want to migrate with the fds migrated".
This implies a unix socket as the transport, and the migration should
fail upfront if the transport is not a unix socket.
We should also auto-select this for cpr migrations.. Then it would
cover any code path (whenever such a path exists?) where the fds can be
migrated via either the cpr channel or the main channel.
(b) A new device property, "migrate-fds" (or any better name..).
When set, the device declares that it supports migrating fds "whenever
the migration applies", i.e. when (a) above is also selected.
Taking the tap device as an example, setting it ON means "please
enable fd passing whenever the user enables this migration option".
So the tap code should migrate the fd only if both (a) and (b) are ON.
When migrating to, e.g., older QEMUs, (b) should be OFF even if (a)
is ON.
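A minimal sketch of the two-layer gating proposed above, assuming hypothetical MigrationConfig/DeviceConfig structs: "migrate-fds" is only the placeholder name from this thread, and channel_is_unix stands for whatever transport check the migration code performs. It shows (a) failing upfront on a non-unix transport, and a device passing its fds only when both (a) and (b) are ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* (a): the user's choice for this particular migration (hypothetical). */
typedef struct {
    bool migrate_fds;      /* proposed migration parameter */
    bool channel_is_unix;  /* result of inspecting the transport */
} MigrationConfig;

/* (b): per-device opt-in (hypothetical). */
typedef struct {
    const char *name;
    bool migrate_fds;      /* proposed device property */
} DeviceConfig;

/* Upfront validation: (a) requires a unix socket. Returns 0 if OK, -1 to fail early. */
int migration_validate(const MigrationConfig *mc)
{
    assert(mc != NULL);
    if (mc->migrate_fds && !mc->channel_is_unix) {
        return -1;
    }
    return 0;
}

/* Pass fds only if both layers agree; otherwise the device does a full migration. */
bool device_should_pass_fds(const MigrationConfig *mc, const DeviceConfig *dev)
{
    assert(mc != NULL && dev != NULL);
    return mc->migrate_fds && dev->migrate_fds;
}
```

Keeping (b) as a device property also fits QEMU's existing mechanism of adjusting device property defaults per machine type via compat properties, which is how (b) could end up OFF when the destination is an older QEMU.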
Would the above make sense?
Yes, I meant something like this, sounds good.
--
Best regards,
Vladimir