On 10.09.25 19:58, Peter Xu wrote:
On Wed, Sep 10, 2025 at 12:35:10AM +0300, Vladimir Sementsov-Ogievskiy wrote:
I wish devices could opt in to provide their own model, so that a device
is prepared to boot QEMU without the FDs being there and to pause itself
at that stage if a load is going to happen.

So, you suggest postponing the initialization up to "start" even for a
"normal start" of QEMU, to avoid these endless "if (we are in our special
local-incoming/CPR mode)".

Actually, that's how normal migratable devices live: we don't have
"if (incoming)" for every step of initialization/start currently.

I'll see whether I can apply the concept to the TAP local migration series.


Hmm, not so simple.

OK, my current series behaves like this:

init:  if tap.local_incoming then do nothing else open(/dev/net/tun)

incoming migration: get fd, and continue initialization
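
As a rough illustration of this flow, here is a minimal C sketch. All names
in it (TapSketch, sketch_open_tun(), sketch_init(), sketch_incoming()) are
hypothetical and not taken from the series, and the real open of
/dev/net/tun is replaced by a stub that returns a fake descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TAP backend state; not QEMU's real struct. */
typedef struct TapSketch {
    bool local_incoming;   /* models the per-device local-incoming option */
    int fd;                /* -1 until a descriptor is available */
    bool initialized;      /* set once the rest of the setup has run */
} TapSketch;

/* Stub standing in for open("/dev/net/tun"); returns a fake descriptor. */
static int sketch_open_tun(void)
{
    return 42;
}

/* Everything that needs a valid fd before the backend is usable. */
static void sketch_finish_init(TapSketch *s)
{
    assert(s != NULL && s->fd >= 0);
    s->initialized = true;
}

/* init: do nothing if local_incoming is set, else open right away. */
static void sketch_init(TapSketch *s)
{
    assert(s != NULL);
    s->fd = -1;
    s->initialized = false;
    if (s->local_incoming) {
        return;
    }
    s->fd = sketch_open_tun();
    sketch_finish_init(s);
}

/* incoming migration: adopt the passed fd and continue initialization. */
static void sketch_incoming(TapSketch *s, int passed_fd)
{
    assert(s != NULL && s->local_incoming);
    s->fd = passed_fd;
    sketch_finish_init(s);
}
```

The single branch on local_incoming in sketch_init() is the "if" that the
rest of this mail tries to get rid of or move elsewhere.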


Assume we want to avoid extra "if"s, and just postpone the initialization
to the VM start point, like

init: do nothing. set fd=-1

incoming migration: get fd (if cap-fd-passing enabled)

start: open() if fd==-1, continue initialization


But that means that we postpone possible errors up to start as well, when
we can no longer roll back the migration.
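
A minimal C sketch of why this is a problem, under the assumption that the
only failure point is the open itself. The names (LazyTap, lazy_open_tun(),
lazy_start()) are hypothetical, and the open_fails parameter exists only to
simulate an open() failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state for the "postpone everything to start" variant. */
typedef struct LazyTap {
    int fd;            /* -1 means "no descriptor yet" */
    bool initialized;
} LazyTap;

/* Stub opener; `fail` simulates open() failing. */
static int lazy_open_tun(bool fail)
{
    return fail ? -1 : 42;
}

/* init: do nothing, just mark the fd as missing. */
static void lazy_init(LazyTap *s)
{
    assert(s != NULL);
    s->fd = -1;
    s->initialized = false;
}

/* incoming migration: store the passed fd, nothing else. */
static void lazy_incoming(LazyTap *s, int passed_fd)
{
    assert(s != NULL);
    s->fd = passed_fd;
}

/*
 * start: open only if no fd was passed, then continue initialization.
 * Returns 0 on success and -1 on failure. A failure is first seen here,
 * after the point where the migration could still be rolled back.
 */
static int lazy_start(LazyTap *s, bool open_fails)
{
    assert(s != NULL);
    if (s->fd == -1) {
        s->fd = lazy_open_tun(open_fails);
        if (s->fd < 0) {
            return -1;
        }
    }
    s->initialized = true;
    return 0;
}
```

There is no branch on the migration mode any more, but the only place that
can report an error is lazy_start().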

Yep, doesn't sound like a good idea.  We also don't want to slow down VM
starts.



Alternatively, we can postpone open() to post-load. But what about a normal
start of the VM?

init: if INMIGRATE then do nothing, else open()

incoming: get fd (if cap-fd-passing)

post-load: open() if fd==-1, continue initialization

start: if fd is still -1, open(), continue initialization

That avoids the extra tap.local_incoming option, but:

- seems even more complicated
- open() and some initialization are done during downtime when we don't
  enable cap-fd-passing
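
The four phases of this variant can be sketched in C as follows. The names
(PlTap, OpenPhase, pl_init() and friends) are hypothetical, the inmigrate
parameter stands in for the INMIGRATE run state, and the open is again a
stub. The opened_in field is only there to show in which phase the backend
was completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PHASE_INIT, PHASE_POST_LOAD, PHASE_START } OpenPhase;

/* Hypothetical device state for the "postpone to post-load" variant. */
typedef struct PlTap {
    int fd;                /* -1 means "no descriptor yet" */
    bool initialized;
    OpenPhase opened_in;   /* phase in which initialization completed */
} PlTap;

/* Stub standing in for open("/dev/net/tun"). */
static int pl_open_tun(void)
{
    return 42;
}

static void pl_finish(PlTap *s, OpenPhase phase)
{
    assert(s != NULL && s->fd >= 0);
    s->initialized = true;
    s->opened_in = phase;
}

/* init: if INMIGRATE then do nothing, else open(). */
static void pl_init(PlTap *s, bool inmigrate)
{
    assert(s != NULL);
    s->fd = -1;
    s->initialized = false;
    s->opened_in = PHASE_INIT;
    if (!inmigrate) {
        s->fd = pl_open_tun();
        pl_finish(s, PHASE_INIT);
    }
}

/* incoming: store the passed fd (only with fd passing enabled). */
static void pl_incoming(PlTap *s, int passed_fd)
{
    assert(s != NULL);
    s->fd = passed_fd;
}

/* post-load: runs during downtime; opens only if no fd was passed. */
static void pl_post_load(PlTap *s)
{
    assert(s != NULL);
    if (s->fd == -1) {
        s->fd = pl_open_tun();
    }
    pl_finish(s, PHASE_POST_LOAD);
}

/* start: last chance to open if nothing has done so yet. */
static void pl_start(PlTap *s)
{
    assert(s != NULL);
    if (s->initialized) {
        return;
    }
    if (s->fd == -1) {
        s->fd = pl_open_tun();
    }
    pl_finish(s, PHASE_START);
}
```

Without fd passing, the open in pl_post_load() lands in the downtime window,
which is the second drawback listed above.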


So now I think that my current approach, with an additional per-device
"local-incoming" option, is better.

What do you think?


Probably I'm trying to optimize the wrong "if", as "if local-incoming .." in
the generic layer is a lot more expensive than checking the options in
device code.

But the idea is generic: for non-fd migration, we do as much initialization
at start as possible,

AFAIU, non-fd migration works simply because the portion that the VMSD
loads will always be overwritable.  When it's not, a pre_load() or
post_load() would make it work.

to get early errors and to decrease further downtime. For fd migration, we
postpone fd-initialization up to the post-load stage. So we have "if"s in
device code to handle it, and we have "if"s in generic code to support a
device which doesn't yet have a fully initialized backend (no fds during
init).

What I meant is, IMHO we should try not to use things like
cpr_is_incoming() too deep in the device stack, and we should use it as
infrequently as possible.

In many cases, IIUC it's because the current device emulation code is not
yet separating the FD installation (and also whatever that can be relevant
to the FD) from the realize() process.  Hence a quick way to make it work
is to add cpr_is_incoming() or similar helpers either to skip some process,
or do something different with an existing FD.

If device emulation could be prepared for this, then in an ideal world
(and just to show what I am thinking) it could be:

   - realize()
     - realize_frontend()
     - if migration is incoming, and backend should be postponed (for fd
       loading, or maybe something else)?
       - ... realize_backend() postponed until post_load()...
     - else
       - realize_backend()
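
The outline above could look roughly like this in C. This is only a sketch
under stated assumptions: SplitDev, split_realize() and split_post_load()
are hypothetical names, the two boolean parameters stand in for "migration
is incoming" and "this backend should be postponed", and the frontend and
backend steps are reduced to setting flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with separately realized frontend and backend. */
typedef struct SplitDev {
    bool frontend_ready;
    bool backend_ready;
    bool backend_postponed;   /* backend realize deferred to post_load */
} SplitDev;

/* Frontend setup never depends on FDs, so it can always run at realize. */
static void realize_frontend(SplitDev *d)
{
    d->frontend_ready = true;
}

/* Backend setup is the part that needs the FD (or similar resources). */
static void realize_backend(SplitDev *d)
{
    d->backend_ready = true;
    d->backend_postponed = false;
}

/*
 * realize(): always realize the frontend; realize the backend now unless
 * migration is incoming and the backend asked to be postponed.
 */
static void split_realize(SplitDev *d, bool incoming, bool postpone_backend)
{
    assert(d != NULL);
    realize_frontend(d);
    if (incoming && postpone_backend) {
        d->backend_postponed = true;
    } else {
        realize_backend(d);
    }
}

/* post_load(): finish the backend if realize() deferred it. */
static void split_post_load(SplitDev *d)
{
    assert(d != NULL);
    if (d->backend_postponed) {
        realize_backend(d);
    }
}
```

The decision is taken once, at the top of split_realize(), and nothing below
it needs to ask whether a migration is incoming.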

If all of the devices supported such a split of the realize() process
vs. FDs / backends, _maybe_ we could remove all the cpr_is_incoming() calls
and move the check higher and higher, up to the qdev code, like:

device_set_realized():
         if (migration_incoming_XXX() && dc->realize_prepare) {
             /*
              * This is only part of realize(), rest done in a separate VMSD
              * post_load().
              */
             dc->realize_prepare(dev, &local_err);
             if (local_err != NULL) {
                 goto fail;
             }
         } else if (dc->realize) {
             dc->realize(dev, &local_err);
             if (local_err != NULL) {
                 goto fail;
             }
         }

In general, that "whether it is an incoming fd migration" concept will be
passed down from higher in the stack, rather than randomly checked very
deep in the stack.  That should IMHO make the code more maintainable.

But that's only my two cents.. so please take that with a grain of salt.  I
don't really know device code well enough to say.



Thanks for the explanation, I see the idea now. I'll see how much of it I
can apply to the TAP series. I believe TAP is a good chance to get the
design right, as it's a lot simpler than vhost-user-blk or vfio.


--
Best regards,
Vladimir
