On 24/02/2022 18:58, Fabiano Rosas wrote:

> This series implements the migration for a TCG pseries guest running a
> nested KVM guest. This is just like migrating a pseries TCG guest, but
> with some extra state to allow a nested guest to continue to run on
> the destination.
>
> Unfortunately the regular TCG migration scenario (not nested) is not
> fully working so I cannot be entirely sure the nested migration is
> correct. I have included a couple of patches for the general migration
> case that (I think?) improve the situation a bit, but I'm still seeing
> hard lockups and other issues with more than 1 vcpu.
>
> This is more of an early RFC to see if anyone spots something right
> away. I haven't made much progress in debugging the general TCG
> migration case so if anyone has any input there as well I'd appreciate
> it.
>
> Thanks
>
> Fabiano Rosas (4):
>   target/ppc: TCG: Migrate tb_offset and decr
>   spapr: TCG: Migrate spapr_cpu->prod
>   hw/ppc: Take nested guest into account when saving timebase
>   spapr: Add KVM-on-TCG migration support
>
>  hw/ppc/ppc.c                    | 17 +++++++-
>  hw/ppc/spapr.c                  | 19 ++++++++
>  hw/ppc/spapr_cpu_core.c         | 77 +++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_cpu_core.h |  2 +-
>  target/ppc/machine.c            | 61 ++++++++++++++++++++++++++
>  5 files changed, 174 insertions(+), 2 deletions(-)

FWIW, a while ago I noticed some issues with migrating the decrementer on Mac machines
which caused a hang on the destination with TCG (for MacOS on an x86 host in my case).
Have a look at the following threads for reference:
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg00546.html
https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04622.html
IIRC there is code that assumes any PPC migration is being done live, and so adjusts
the timebase on the destination to reflect wall clock time by recalculating tb_offset.
I haven't looked at the code for a while, but I think the conclusion was that
migration needs two phases: the first migrates the timebase as-is, which covers guests
that are paused during migration, whilst the second notifies hypervisor-aware guest
OSs such as Linux so that they can make the timebase adjustment if the guest is
running and an adjustment is required.
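
To illustrate the difference between restoring the timebase as-is and advancing it by
the wall clock time spent in migration, here is a minimal sketch in C. It is not
QEMU's code: the TbSnapshot struct, the ns_to_ticks() and restore_tb_offset() helpers
and the adjust_for_downtime flag are all hypothetical names chosen for this example,
and it assumes tb_offset is simply the guest timebase minus the host timebase.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the timebase state carried in the migration stream. */
typedef struct {
    uint64_t guest_tb;     /* guest timebase at save time, in ticks */
    int64_t  host_time_ns; /* source wall clock at save time, in ns */
} TbSnapshot;

/* Convert nanoseconds to timebase ticks; splitting off whole seconds
 * keeps the intermediate product within 64 bits. */
static uint64_t ns_to_ticks(int64_t ns, uint32_t tb_freq_hz)
{
    uint64_t sec = (uint64_t)ns / 1000000000ULL;
    uint64_t rem = (uint64_t)ns % 1000000000ULL;

    return sec * tb_freq_hz + rem * tb_freq_hz / 1000000000ULL;
}

/* Recompute tb_offset on the destination (assumed: guest_tb - host_tb).
 * With adjust_for_downtime == false the guest timebase is restored as-is;
 * with true it is advanced by the elapsed wall clock time. */
static int64_t restore_tb_offset(const TbSnapshot *s, uint64_t dest_host_tb,
                                 int64_t dest_time_ns, uint32_t tb_freq_hz,
                                 bool adjust_for_downtime)
{
    uint64_t guest_tb = s->guest_tb;

    if (adjust_for_downtime && dest_time_ns > s->host_time_ns) {
        guest_tb += ns_to_ticks(dest_time_ns - s->host_time_ns, tb_freq_hz);
    }
    return (int64_t)(guest_tb - dest_host_tb);
}
```

Assuming a 512 MHz timebase and two seconds between save and load, the adjusted path
advances the guest timebase by 1024000000 ticks, while the as-is path leaves it
unchanged, which is what a guest that was paused across the migration would expect.
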
ATB,
Mark.