* Peter Xu ([email protected]) wrote:
> Add Vladimir and Dan.
> 
> On Thu, Aug 14, 2025 at 10:17:14AM -0700, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.  
> > 
> > The new user-visible interfaces are:
> >   * cpr-exec (MigMode migration parameter)
> >   * cpr-exec-command (migration parameter)
> > 
> > cpr-exec mode is similar in most respects to cpr-transfer mode, with the 
> > primary difference being that old QEMU directly exec's new QEMU.  The user
> > specifies the command to exec new QEMU in the migration parameter
> > cpr-exec-command.
> > 
> > Why?
> > 
> > In a containerized QEMU environment, cpr-exec reuses an existing QEMU
> > container and its assigned resources.  By contrast, cpr-transfer mode
> > requires a new container to be created on the same host as the target of
> > the CPR operation.  Resources must be reserved for the new container, while
> > the old container still reserves resources until the operation completes.
> > Avoiding over commitment requires extra work in the management layer.
> 
> Can we spell out what these resources are?
> 
> CPR definitely relies on completely shared memory.  That's already not a
> concern.
> 
> CPR resolves resources that are bound to devices like VFIO by passing over
> FDs; these are not over-committed either.
> 
> Is it accounting QEMU/KVM process overhead?  That would really be trivial,
> IMHO, but maybe something else?
> 
> > This is one reason why a cloud provider may prefer cpr-exec.  A second
> > reason is that the container may include agents with their own
> > connections to the outside world, and such connections remain intact if
> > the container is reused.
> 
> We discussed this one.  Personally I still cannot understand why this
> is a concern if the agents can be trivially started as a new instance.  But
> I admit I may not know the whole picture.  To me, the above point is more
> persuasive, but I'll need to understand which part is over-committed and
> why that can be a problem.
> 
> After all, cloud hosts should reserve some extra memory anyway to make
> sure dynamic resource allocations can succeed at any time (e.g., when live
> migration starts, KVM page tables can grow drastically if huge pages are
> enabled, for PAGE_SIZE tracking).  I assumed the over-committed portion
> should be less than that, and when it's also temporary (src QEMU will
> release all resources after the live upgrade) it looks manageable.

k8s used to find it very hard to change the amount of memory allocated to a
container after launch (although I heard that's getting fixed); so you'd
need more excess at the start even if your peak during handover is only
very short.

Dave
> 
> > 
> > How?
> > 
> > cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
> > and by sending the unique name and value of each descriptor to new QEMU
> > via CPR state.
> > 
> > CPR state cannot be sent over the normal migration channel, because devices
> > and backends are created prior to reading the channel, so this mode sends
> > CPR state over a second migration channel that is not visible to the user.
> > New QEMU reads the second channel prior to creating devices or backends.
> > 
> > The exec itself is trivial.  After writing to the migration channels, the
> > migration code calls a new main-loop hook to perform the exec.
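
As an illustration of the descriptor-preservation idea described above (not
QEMU's actual code), here is a minimal sketch in Python.  It assumes a POSIX
host.  The helper names clear_cloexec, read_in_new_program and demo are
hypothetical, and the descriptor number is passed on the command line
rather than through CPR state.  It also uses fork+exec via subprocess
instead of a direct exec, purely so the parent can observe the result.

```python
import fcntl
import os
import subprocess
import sys


def clear_cloexec(fd: int) -> None:
    """Clear FD_CLOEXEC so that fd stays open across exec."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def read_in_new_program(fd: int) -> bytes:
    """Exec a fresh interpreter that is told the fd number and reads from it."""
    child_code = (
        "import os, sys; "
        "sys.stdout.buffer.write(os.read(int(sys.argv[1]), 64))"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(fd)],
        close_fds=False,  # keep descriptors that are not marked close-on-exec
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def demo() -> bytes:
    """Show that a pipe's read end survives exec once CLOEXEC is cleared."""
    read_end, write_end = os.pipe()  # Python creates these with CLOEXEC set
    os.write(write_end, b"preserved")
    os.close(write_end)
    clear_cloexec(read_end)
    try:
        return read_in_new_program(read_end)
    finally:
        os.close(read_end)
```

The new program only needs the descriptor's number to keep using it, which
is why the cover letter pairs clearing CLOEXEC with sending each
descriptor's name and value to new QEMU.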
> > 
> > Example:
> > 
> > In this example, we simply restart the same version of QEMU, but in
> > a real scenario one would use a new QEMU binary path in cpr-exec-command.
> > 
> >   # qemu-kvm -monitor stdio
> >   -object memory-backend-memfd,id=ram0,size=1G
> >   -machine memory-backend=ram0 -machine aux-ram-share=on ...
> > 
> >   QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> >   (qemu) migrate_set_parameter mode cpr-exec
> >   (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming file:vm.state
> >   (qemu) migrate -d file:vm.state
> >   (qemu) QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> > 
> > Steve Sistare (9):
> >   migration: multi-mode notifier
> >   migration: add cpr_walk_fd
> >   oslib: qemu_clear_cloexec
> >   vl: helper to request exec
> >   migration: cpr-exec-command parameter
> >   migration: cpr-exec save and load
> >   migration: cpr-exec mode
> >   migration: cpr-exec docs
> >   vfio: cpr-exec mode
> 
> The other thing is, as Vladimir is working on (what looks like) a cleaner
> way of passing FDs that relies fully on unix sockets, I want to better
> understand the relationship between his work and the exec model.
> 
> I still personally think we should always stick with unix sockets, but I'm
> open to being convinced about the above limitations.  If exec is better
> than cpr-transfer in any way, the hope is more people can and should adopt
> it.
> 
> We also have no answer yet on how cpr-exec can work in a container world
> where seccomp forbids exec.  I guess that's a no-go.  It's definitely a
> downside.  Better to mention that in the cover letter.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
