On 10.09.25 09:28, Markus Armbruster wrote:
Vladimir Sementsov-Ogievskiy <[email protected]> writes:

To migrate TAP device (including open fds) locally, user should:

1. enable local-tap migration capability both on source and target
2. use additional local-incoming=true option for tap on target

Why capability is not enough? We need an option to modify early
initialization of TAP, to avoid opening new fds. The capability
may not be set at the moment of netdev initialization.

Bummer.  No way around that?

Thanks, you made me think about it once again)

Let me first describe the problem:

At initialization time, we want to know whether we are going to get the
live backend from the migration stream or should initialize it
by hand (e.g. by calling open() and other syscalls, and making other
preparations).

If we don't know, we have to postpone the initialization to some
later point, when we have that information.

The simplest thing is to postpone it to the start of the VM.

But that works badly: at this point we can't cleanly roll back the migration.

So we have to postpone to post-load in the case of incoming migration, and
to VM start for a normal (non-incoming) start of QEMU.

Still, there is a significant disadvantage:

In the case of non-fds migration, we move initialization of the backend into
downtime, so downtime becomes longer.

What to do?

We need a point in time when downtime has not started yet, but we can already
check the global fds-passing capability.

And this point in time is migration_incoming_state_setup(), actually.



Peter, could we add an .incoming_setup() handler to VMStateDescription?
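
To make the question concrete, here is a minimal sketch of what such a hook
could look like. It is not QEMU code: SketchVMSD is a simplified stand-in for
VMStateDescription, and every sketch_* name is hypothetical. The only idea
taken from the text above is an optional per-device callback that the target
runs before downtime starts.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for QEMU's VMStateDescription. */
typedef struct SketchVMSD {
    const char *name;
    /*
     * Proposed hook (assumption): called on the target while the source
     * is still running.  Returns 0 on success, negative on failure.
     * NULL means the device has nothing to do at this point.
     */
    int (*incoming_setup)(void *opaque);
    void *opaque;
} SketchVMSD;

/*
 * Call the hook of every descriptor that provides one.
 * Stops at the first failure and returns its error code.
 */
int sketch_run_incoming_setup(SketchVMSD *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].incoming_setup) {
            int ret = list[i].incoming_setup(list[i].opaque);
            if (ret < 0) {
                return ret;
            }
        }
    }
    return 0;
}

/* Example hook: counts how many times it was called via its opaque. */
int sketch_mark_ready(void *opaque)
{
    *(int *)opaque += 1;
    return 0;
}
```

In this sketch the loop would be driven from something like
migration_incoming_state_setup(); where exactly it belongs is the open
question.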



So, the final interface could look like:

1. global fds-passing migration capability, to enable/disable the whole feature

2. per-device fds-passing option, on by default for all supporting devices, to
be able to disable backend migration for some devices. (We discussed it here:
https://lore.kernel.org/all/[email protected]/ .)
Normally these options stay on.
Moreover, I can postpone their implementation to a separate series, to
narrow the discussion and to check that everything can work without
additional user input.


And here is how it works (using TAP as the example):

1. Normal start of vm

- On device initialization we don't open /dev/net/tun and don't initialize it
- On vm start (in the device's set_status), we see that we still don't have an
open fd, so we open /dev/net/tun and do all the other actions around it

2. Usual migration without fds

on target:
- On device initialization we don't open /dev/net/tun and don't initialize it
- In .incoming_setup(), we see that we still don't have an open fd and that
the fds capability is disabled, so we open /dev/net/tun and do all the other
actions around it. The source is still running!
- Next, incoming migration starts, downtime starts.
- On vm start, we see that TAP is already initialized, nothing to do

3. Local migration with fds

- Enable the fds capability on both source and target
- In .incoming_setup(), we see that the fds capability is enabled, so there is
nothing to do (we can check that the fd is not set)
- During load of migration, we get the fds
- In .post_load(), we do any remaining initialization around the incoming
fd (not much, as the backend device is already initialized)
- On vm start, we see that TAP is already initialized, nothing to do
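
The three scenarios above can be sketched as a small state machine. This is
only an illustration under stated assumptions: SketchTap and all sketch_*
functions are hypothetical, the real open("/dev/net/tun") and ioctl setup is
replaced by a fake that stores a placeholder fd, and the capability is passed
in as a plain bool instead of being read from the migration state.

```c
#include <assert.h>
#include <stdbool.h>

enum { SKETCH_FAKE_FD = 100 };  /* placeholder for a freshly opened fd */

typedef struct SketchTap {
    int fd;            /* -1 while no backend fd is attached */
    bool initialized;  /* true once the backend is fully set up */
} SketchTap;

/* Stand-in for opening /dev/net/tun and configuring it (no real syscalls). */
int sketch_tap_open_backend(SketchTap *t)
{
    t->fd = SKETCH_FAKE_FD;
    t->initialized = true;
    return 0;
}

/* Device initialization: deliberately does not open anything. */
void sketch_tap_realize(SketchTap *t)
{
    t->fd = -1;
    t->initialized = false;
}

/* Before downtime on the target: open now unless fds will be passed. */
int sketch_tap_incoming_setup(SketchTap *t, bool fds_capability)
{
    assert(t->fd < 0);  /* nothing may have attached an fd earlier */
    if (fds_capability) {
        return 0;       /* the fd will arrive in the migration stream */
    }
    return sketch_tap_open_backend(t);
}

/* During load: the fd received from the source is attached. */
void sketch_tap_load_fd(SketchTap *t, int fd)
{
    t->fd = fd;
}

/* After load: finish setup around an incoming fd, if that path was taken. */
int sketch_tap_post_load(SketchTap *t)
{
    if (t->initialized) {
        return 0;       /* already opened in incoming_setup */
    }
    if (t->fd < 0) {
        return -1;      /* fds were expected but none arrived */
    }
    t->initialized = true;
    return 0;
}

/* VM start: open only if no earlier step has initialized the backend. */
int sketch_tap_vm_start(SketchTap *t)
{
    return t->initialized ? 0 : sketch_tap_open_backend(t);
}
```

Normal start calls only realize and vm_start. Migration without fds calls
realize, incoming_setup(false) and vm_start. Local migration with fds calls
realize, incoming_setup(true), load_fd, post_load and vm_start.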


Signed-off-by: Vladimir Sementsov-Ogievskiy <[email protected]>

[...]

diff --git a/qapi/migration.json b/qapi/migration.json
index 2387c21e9c..992a5b1e2b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -517,6 +517,12 @@
  #     each RAM page.  Requires a migration URI that supports seeking,
  #     such as a file.  (since 9.0)
  #
+# @local-tap: Migrate TAPs locally, keeping backend alive. Open file
+#     descriptors and TAP-related state are migrated. Only may be
+#     used when migration channel is unix socket. For target device
+#     also @local-incoming option must be specified (since 10.2)
+#     (since 10.2)
+#
  # Features:
  #
  # @unstable: Members @x-colo and @x-ignore-shared are experimental.

Missing here: local-tap is also experimental.

@@ -536,7 +542,8 @@
             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
             'validate-uuid', 'background-snapshot',
             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
-           'dirty-limit', 'mapped-ram'] }
+           'dirty-limit', 'mapped-ram',
+           { 'name': 'local-tap', 'features': [ 'unstable' ] } ] }
##
  # @MigrationCapabilityStatus:
diff --git a/qapi/net.json b/qapi/net.json
index 78bcc9871e..8f53549d58 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -353,6 +353,15 @@
  # @poll-us: maximum number of microseconds that could be spent on busy
  #     polling for tap (since 2.7)
  #
+# @local-incoming: Do load open file descriptor for that TAP
+#     on incoming migration. May be used only if QEMU is started
+#     for incoming migration. Will work only together with local-tap
+#     migration capability enabled (default: false) (Since: 10.2)

Scratch "Do".

Re "May be used only": what happens when you use it without incoming
migration or when local-tap is off?

Does "local-incoming": false count as invalid use then?

Two spaces between sentences for consistency, please.

+#
+# Features:
+#
+# @unstable: Member @local-incoming is experimental

Period at the end for consistency, please.

+#
  # Since: 1.2
  ##
  { 'struct': 'NetdevTapOptions',
@@ -371,7 +380,8 @@
      '*vhostfds':   'str',
      '*vhostforce': 'bool',
      '*queues':     'uint32',
-    '*poll-us':    'uint32'} }
+    '*poll-us':    'uint32',
+    '*local-incoming': { 'type': 'bool', 'features': [ 'unstable' ] } } }
##
  # @NetdevSocketOptions:



--
Best regards,
Vladimir
