Hi Peter, Richard,
On 1/26/26 5:52 PM, Eric Auger wrote:
> When migrating ARM guests across identical machines running different host
> kernels, we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact that KVM exposes a different number of registers
> to qemu on the source and destination. When trying to migrate a bigger
> register set into a smaller one, the destination qemu cannot load the CPU state.
>
> For example, we recently faced such situations with:
> - unconditional exposure of the KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
> register from v6.16 onwards, which causes backward migration failures.
> - removal of the unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
> from v6.13 onwards, which causes forward migration failures.
>
> This situation is really problematic for distributions which want to
> guarantee forward and backward migration of a given machine type
> between different releases.
>
> While the series mainly targets KVM acceleration, this problem
> also exists with TCG. For instance, some registers may be exposed
> when they shouldn't be, and it is then tricky to fix that situation
> without breaking forward migration. An example was provided by
> Peter: 4f2b82f60 ("target/arm: Reinstate bogus AArch32 DBGDTRTX
> register for migration compat").
>
> This series introduces 2 CPU array properties that list:
> - the CPU registers to hide from the exposed sysregs (aims
> at removing registers from the destination)
> - the CPU registers that may not exist but which can be found
> in the incoming migration stream (aims at ignoring extra
> registers in the incoming state)
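The two properties boil down to two list operations on the destination. Below is a minimal sketch, not the series' code: it assumes both register index lists are sorted in ascending order, and the helpers in_list(), filter_hidden() and load_regs() are hypothetical names. With an empty safe-missing list, load_regs() reproduces the "bigger set into smaller set" failure described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: linear membership test in a small index list. */
static bool in_list(uint64_t idx, const uint64_t *list, size_t n)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i] == idx) {
            return true;
        }
    }
    return false;
}

/* Hidden regs: drop them, in place, from the exposed list. Returns new length. */
static size_t filter_hidden(uint64_t *regs, size_t n,
                            const uint64_t *hidden, size_t nhidden)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (!in_list(regs[i], hidden, nhidden)) {
            regs[out++] = regs[i];
        }
    }
    return out;
}

/*
 * Safe-missing regs: copy incoming values into the local register file.
 * An incoming index absent locally fails the load unless it is safe-missing.
 */
static bool load_regs(const uint64_t *local, uint64_t *local_vals, size_t nlocal,
                      const uint64_t *in, const uint64_t *in_vals, size_t nin,
                      const uint64_t *safe, size_t nsafe)
{
    size_t i = 0, v = 0;
    while (v < nin) {
        if (i < nlocal && local[i] < in[v]) {
            i++;                             /* local only: keep current value */
        } else if (i < nlocal && local[i] == in[v]) {
            local_vals[i++] = in_vals[v++];  /* match: take incoming value */
        } else if (in_list(in[v], safe, nsafe)) {
            v++;                             /* incoming only, but safe to ignore */
        } else {
            return false;                    /* incoming only: fail migration */
        }
    }
    return true;
}
```

Registers that exist only locally keep their current value, which mirrors the asymmetry of the problem: extra incoming registers are fatal, missing ones are not.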
>
> An example illustrates how those props could be used to apply
> compats to machine types that are supposed to "see" the same
> register set across various host kernels.
>
> The DBGDTRTX issue would be mitigated by setting
> x-mig-safe-missing-regs=0x40200000200e0298, which matches the
> AArch32 DBGDTRTX register index.
>
> The first patch improves the tracing so that we can quickly detect
> which registers do not match between the incoming stream and the
> exposed sysregs.
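Detecting the mismatching registers amounts to a merge walk over two sorted index lists. A minimal sketch, assuming both lists are sorted in ascending order; report_mismatches() is a hypothetical name, and it prints with fprintf() rather than through qemu trace events.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Walk two ascending index lists and report every index present on only
 * one side. Returns the number of mismatches found.
 */
static size_t report_mismatches(const uint64_t *in, size_t nin,
                                const uint64_t *local, size_t nlocal,
                                FILE *out)
{
    size_t i = 0, v = 0, mismatches = 0;

    assert(out != NULL);
    while (v < nin || i < nlocal) {
        if (i == nlocal || (v < nin && in[v] < local[i])) {
            fprintf(out, "incoming only: 0x%016" PRIx64 "\n", in[v++]);
            mismatches++;
        } else if (v == nin || local[i] < in[v]) {
            fprintf(out, "local only:    0x%016" PRIx64 "\n", local[i++]);
            mismatches++;
        } else {
            i++;  /* same index on both sides */
            v++;
        }
    }
    return mismatches;
}
```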
Most patches in the series have collected R-bs. Do you have any
concerns with the approach?
This aims at solving real-life distro issues with cross-kernel migration
failures, and we would appreciate getting a generic solution in the 11.0
timeframe.
Note that [PATCH v4 0/2] arm: add kvm-psci-version vcpu property
(https://lore.kernel.org/all/[email protected]/)
is also part of this initiative and has collected R-bs/T-bs as well.
Looking forward to your feedback.
Eric
>
> ---
>
> Available at:
> https://github.com/eauger/qemu/tree/mitig-v6
>
> ---
>
> Tests:
> - migration with the 10.2 machine type between an old qemu featuring
> DBGDTRTX and this one, where it is removed. Forward migration works.
> Backward migration doesn't, because the register is not present in the
> incoming migration stream and write_list_to_cpustate() fails when
> calling write_raw_cp_reg() and reading the value back. write_raw_cp_reg()
> seems to write an uninitialized value taken from cpu->cpreg_values[i].
> The write has no effect since the type is ARM_CP_CONST, but read_raw_cp_reg()
> returns ri->resetvalue, which differs from the uninitialized value.
> I would have expected the initial cpu->cpreg_values[i] to match the
> reset value, which is obviously not the case. Also, the comment hints
> that it should. So maybe this is another issue? Nevertheless, I am
> not totally sure supporting backward migration for TCG is a must.
> This may be fixed separately if it is confirmed to be a bug.
>
> - migration with accel=kvm back and forth between an old host/qemu,
> where the host features neither the TCR2_EL1, PIRE0_EL1, PIR_EL1 fixes
> nor the recent KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW register, and a more
> recent kernel/this qemu that feature them. Migration works forward
> and backward with the 10.1 machine type.
>
> History:
>
> v5 -> v6:
> - Moved GString init and collected Sebastian's R-b
>
> v4 -> v5:
> - Fixed issue reported by Sebastian about aggregated array
> props. This led to the introduction of
> hw/arm/virt: Introduce framework to aggregate hidden-regs
> and safe-missing-regs
> - Collected additional acks from Connie
>
> v3 -> v4:
> - Collected Connie's & Sebastian's R-bs
> - Squashed patches 3 and 5
> - various typos and rewording
>
> v2 -> v3:
> - revert target/arm: Reinstate bogus AArch32 DBGDTRTX register for migration
> compat
> - fix some typos and rework target/arm/cpu.h hidden_regs comment (Connie)
> - Even for TCG we use KVM index
>
> v1 -> v2:
> - fixed typos (Connie)
> - Make it less KVM specific (tentative hiding of TCG regs, not
> tested)
> - Tested DBGDTRTX TCG case reported by Peter
> - No change to the property format yet. Ran out of ideas. However,
> I changed the name of the property to use the x-mig prefix
> - Changed the terminology: kept hiding but removed fake, which was
> confusing
> - Simplified the logic for regs missing in the incoming stream and
> no longer check that they are exposed on dest
>
>
> Eric Auger (11):
> hw/arm/virt: Rename arm_virt_compat into arm_virt_compat_defaults
> target/arm/machine: Improve traces on register mismatch during
> migration
> target/arm/cpu: Allow registers to be hidden
> target/arm/machine: Allow extra regs in the incoming stream
> kvm-all: Enforce hidden regs are never accessed
> target/arm/cpu: Implement hide_reg callback()
> target/arm/cpu: Expose x-mig-hidden-regs and x-mig-safe-missing-regs
> properties
> hw/arm/virt: Declare AArch32 DBGDTRTX as safe to ignore in incoming
> stream
> Revert "target/arm: Reinstate bogus AArch32 DBGDTRTX register for
> migration compat"
> hw/arm/virt: Introduce framework to aggregate hidden-regs and
> safe-missing-regs
> hw/arm/virt: [DO NOT UPSTREAM] Enforce compatibility with older
> kernels
>
> include/hw/arm/virt.h | 23 ++++++++++
> include/hw/core/cpu.h | 2 +
> target/arm/cpu.h | 48 +++++++++++++++++++++
> accel/kvm/kvm-all.c | 12 ++++++
> hw/arm/virt.c | 89 ++++++++++++++++++++++++++++++++++++---
> target/arm/cpu.c | 11 +++++
> target/arm/debug_helper.c | 29 -------------
> target/arm/helper.c | 12 +++++-
> target/arm/kvm.c | 35 ++++++++++++++-
> target/arm/machine.c | 70 +++++++++++++++++++++++++++---
> target/arm/trace-events | 10 +++++
> 11 files changed, 298 insertions(+), 43 deletions(-)
>