* Peter Xu ([email protected]) wrote:
> Add Vladimir and Dan.
> 
> On Thu, Aug 14, 2025 at 10:17:14AM -0700, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.
> > 
> > The new user-visible interfaces are:
> >   * cpr-exec (MigMode migration parameter)
> >   * cpr-exec-command (migration parameter)
> > 
> > cpr-exec mode is similar in most respects to cpr-transfer mode, with the
> > primary difference being that old QEMU directly exec's new QEMU. The user
> > specifies the command to exec new QEMU in the migration parameter
> > cpr-exec-command.
> > 
> > Why?
> > 
> > In a containerized QEMU environment, cpr-exec reuses an existing QEMU
> > container and its assigned resources. By contrast, cpr-transfer mode
> > requires a new container to be created on the same host as the target of
> > the CPR operation. Resources must be reserved for the new container, while
> > the old container still reserves resources until the operation completes.
> > Avoiding over commitment requires extra work in the management layer.
> 
> Can we spell out what are these resources?
> 
> CPR definitely relies on completely shared memory. That's already not a
> concern.
> 
> CPR resolves resources that are bound to devices like VFIO by passing over
> FDs, these are not over commited either.
> 
> Is it accounting QEMU/KVM process overhead? That would really be trivial,
> IMHO, but maybe something else?
> 
> > This is one reason why a cloud provider may prefer cpr-exec. A second
> > reason is that the container may include agents with their own
> > connections to the outside world, and such connections remain intact if
> > the container is reused.
> 
> We discussed about this one. Personally I still cannot understand why this
> is a concern if the agents can be trivially started as a new instance. But
> I admit I may not know the whole picture. To me, the above point is more
> persuasive, but I'll need to understand which part that is over-commited
> that can be a problem.
> After all, cloud hosts should preserve some extra memory anyway to make
> sure dynamic resources allocations all the time (e.g., when live migration
> starts, KVM pgtables can drastically increase if huge pages are enabled,
> for PAGE_SIZE trackings), I assumed the over-commit portion should be less
> that those.. and when it's also temporary (src QEMU will release all
> resources after live upgrade) then it looks manageable.

k8s used to find it very hard to change the amount of memory allocated to a
container after launch (although I heard that's getting fixed); so you'd
need more excess at the start even if your peak during handover is only
very short.

Dave

> 
> > 
> > How?
> > 
> > cpr-exec preserves descriptors across exec by clearing the CLOEXEC flag,
> > and by sending the unique name and value of each descriptor to new QEMU
> > via CPR state.
> > 
> > CPR state cannot be sent over the normal migration channel, because devices
> > and backends are created prior to reading the channel, so this mode sends
> > CPR state over a second migration channel that is not visible to the user.
> > New QEMU reads the second channel prior to creating devices or backends.
> > 
> > The exec itself is trivial. After writing to the migration channels, the
> > migration code calls a new main-loop hook to perform the exec.
> > 
> > Example:
> > 
> > In this example, we simply restart the same version of QEMU, but in
> > a real scenario one would use a new QEMU binary path in cpr-exec-command.
> > 
> >   # qemu-kvm -monitor stdio
> >   -object memory-backend-memfd,id=ram0,size=1G
> >   -machine memory-backend=ram0 -machine aux-ram-share=on ...
> > 
> >   QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> >   (qemu) migrate_set_parameter mode cpr-exec
> >   (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ...
> >       -incoming file:vm.state
> >   (qemu) migrate -d file:vm.state
> >   (qemu) QEMU 10.1.50 monitor - type 'help' for more information
> >   (qemu) info status
> >   VM status: running
> > 
> > Steve Sistare (9):
> >   migration: multi-mode notifier
> >   migration: add cpr_walk_fd
> >   oslib: qemu_clear_cloexec
> >   vl: helper to request exec
> >   migration: cpr-exec-command parameter
> >   migration: cpr-exec save and load
> >   migration: cpr-exec mode
> >   migration: cpr-exec docs
> >   vfio: cpr-exec mode
> 
> The other thing is, as Vladimir is working on (looks like) a cleaner way of
> passing FDs fully relying on unix sockets, I want to understand better on
> the relationships of his work and the exec model.
> 
> I still personally think we should always stick with unix sockets, but I'm
> open to be convinced on above limitations. If exec is better than
> cpr-transfer in any way, the hope is more people can and should adopt it.
> 
> We also have no answer yet on how cpr-exec can resolve container world with
> seccomp forbidding exec. I guess that's a no-go. It's definitely a
> downside instead. Better mention that in the cover letter.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
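
The quoted "How?" section says cpr-exec keeps descriptors open across exec by clearing the CLOEXEC flag and by passing each descriptor's name and value to new QEMU. As a rough illustration of that general POSIX technique only, here is a minimal Python sketch. It is not QEMU code and makes several assumptions: `clear_cloexec`, `is_cloexec` and `preserve_fds` are hypothetical helper names, a JSON command-line argument stands in for CPR state, and the successor is started with `subprocess` rather than by exec'ing over the current process.

```python
import fcntl
import json
import os
import subprocess
import sys


def clear_cloexec(fd: int) -> None:
    """Clear FD_CLOEXEC so that fd stays open across exec (hypothetical helper)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


def is_cloexec(fd: int) -> bool:
    """Return True if fd would be closed automatically on exec."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def preserve_fds(named_fds: dict) -> dict:
    """Clear CLOEXEC on every descriptor and return a name -> fd-number table.

    The returned table is an illustrative stand-in for the state that the
    old process hands to its successor; it is not QEMU's CPR state format.
    """
    for fd in named_fds.values():
        clear_cloexec(fd)
    return dict(named_fds)


if __name__ == "__main__":
    # Python creates pipe descriptors with close-on-exec already set.
    r, w = os.pipe()
    os.write(w, b"hello")
    os.close(w)

    state = preserve_fds({"demo-pipe": r})

    # The child looks the descriptor up by name and reads from the inherited fd.
    child_code = (
        "import json, os, sys; "
        "state = json.loads(sys.argv[1]); "
        "print(os.read(state['demo-pipe'], 5).decode())"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, json.dumps(state)],
        close_fds=False,  # let descriptors follow their inheritable flag
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout.strip())
    os.close(r)
```

The name-to-number table matters because the successor has no other way to know which inherited descriptor number belongs to which device or backend.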
