On Mon, 26 Jan 2026 at 16:54, Eric Auger <[email protected]> wrote:
>
> When migrating ARM guests accross same machines with different host
> kernels we are likely to encounter failures such as:
>
> "failed to load cpu:cpreg_vmstate_array_len"
>
> This is due to the fact KVM exposes a different number of registers
> to qemu on source and destination. When trying to migrate a bigger
> register set to a smaller one, qemu cannot save the CPU state.
>
> For example, recently we faced such kind of situations with:
> - unconditionnal exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>   register from v6.16 onwards. Causes backward migration failure.
> - removal of unconditionnal exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>   from v6.13 onwards. Causes forward migration failure.

Hi; sorry I haven't given this series any attention before.

(1) Yes, this is definitely a problem we need to solve.

(2) What are the requirements we have for this?

This series sets up CPU properties controlling this, and then
sets them in the virt machine model based on the machine
type, but this seems awkward for two reasons:

 * using properties confines us to using a "text string"
   way of describing the behaviour; if we could implement
   the handling in code and C data structures in target/arm
   we could potentially do it in a more flexible and
   readable way (e.g. being able to specify the register
   via something other than a raw hex value)
 * the host kernel version isn't really related to
   the QEMU version, so tying this to a versioned machine
   type doesn't seem to fit

Q: Do we need the user to be able to control this (e.g. adding
extra registers to be ignored) on their command line, or
can we say "you need a newer QEMU that understands how to
deal with this register if you want to do migrations involving
this newer kernel version" ?

Q: This series adds a "hide this register" option which
stops the register appearing in the outbound migration data.
Do we need that, or would it be enough to have "ignore this
register in the inbound migration data" ? Assuming we're
not trying to migrate backwards to an older QEMU version
that's unaware of the new register, that seems to me like
it should be equivalent.

(3) Categories of sysreg that are causing problems:

a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
controls what the kernel is exposing to the guest, and so we need
to be able to have the user tell QEMU to use a specific version
that's not the host kernel default if the default isn't one
that's valid for all older kernels. Sometimes the new kernel
default is the same as the old kernel's behaviour and in those
cases we also want handling of "if you see the control reg in
the incoming data and its value is the default then it's OK to
ignore it".

b: "things exposed that should not have been" -- where the old kernel
exposed a register but the new one does not because exposing the
register was wrong (i.e. a bug). The handling here can be
"ignore this in migration input if present". Examples are the
TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
corresponding feature was disabled for the guest.

c: "things not exposed that should have been" -- where a new kernel
exposes a new register that the old one does not, and so migration
from a host with the new kernel to the old one fails. In most cases
it should be possible to handle this with "ignore in migration input
if present", or "fail migration if incoming value is not some safe
default, but if it is that default value then ignore".

Have I missed anything ?

(4) Mechanisms for handling them:

This series provides two mechanisms:

"safe missing reg" -- these registers are ignored if they appear
in the incoming migration data.

"hidden" -- the behaviour here is that we effectively entirely
ignore the register, so we do not read it from the kernel or write
it back, do not send it in outbound migration data, and do
not expect to see it in incoming migration data.

The "arm: add kvm-psci-version vcpu property" series handles one
specific "control" register, with a specific user-facing cpu property.
If new "control" type registers are rare, this seems like a good
way to go, because it means we can give the user an interface that
is reasonably clear about what it does, and we can provide better
errors on the migration-destination side (e.g. pointing the user
at the need to specify the property on the source side to get a
VM they can migrate to this destination).

The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
However, I'm not sure this is the right way to handle this register.
Judging from the documentation, this seems to be a "control" register:
it would let QEMU enable certain things to be visible to the guest.
It also is odd to treat this differently from the existing
KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
semantics.

I think that the right way to treat this register would be
"if this is present in the incoming migration stream and the
host kernel doesn't know about it, a value of zero is OK, but
any other value should fail migration".

In general I'm not convinced that "hidden" is a useful thing
to provide -- it should always be fine for QEMU to read and
write back to the same host kernel some sysreg it doesn't
know about, so what "hidden" is mostly doing is "don't put
this into outgoing migration data". Do we need to be able
to do that, or can we instead always use an "ignore in
incoming migration data" strategy?
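
As a rough illustration of the inbound-only strategy, here is a
standalone sketch (not QEMU's actual code; merge_incoming_regs,
IgnoreFn and ignore_if_zero are made-up names). It assumes both
register lists are sorted by index, and it only fails when the
stream carries a register the destination lacks and the predicate
refuses to drop it:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate: may this incoming-only register be dropped? */
typedef bool (*IgnoreFn)(uint64_t idx, uint64_t value);

/* Example predicate (an assumption): a register is droppable only if zero. */
static bool ignore_if_zero(uint64_t idx, uint64_t value)
{
    (void)idx;
    return value == 0;
}

/*
 * Copy incoming register values into the destination's register list.
 * Both index arrays must be sorted ascending with no duplicates.
 * Returns false if the stream contains a register that the destination
 * does not have and that can_ignore() refuses to drop.
 */
static bool merge_incoming_regs(const uint64_t *in_idx, const uint64_t *in_val,
                                size_t in_n,
                                const uint64_t *dst_idx, uint64_t *dst_val,
                                size_t dst_n,
                                IgnoreFn can_ignore)
{
    size_t i = 0, d = 0;

    while (i < in_n) {
        if (d < dst_n && dst_idx[d] < in_idx[i]) {
            /* Destination-only register: keep its current value. */
            d++;
            continue;
        }
        if (d < dst_n && dst_idx[d] == in_idx[i]) {
            /* Register known to both sides: take the incoming value. */
            dst_val[d++] = in_val[i++];
            continue;
        }
        /* Incoming-only register: drop it or fail the migration. */
        if (!can_ignore(in_idx[i], in_val[i])) {
            return false;
        }
        i++;
    }
    return true;
}
```

With this shape the source never needs to hide anything: the
destination decides per register whether an unknown entry is
harmless.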

(5) My preferences

Assuming it meets the requirements, I would prefer a mechanism
where some kind of C data structure / code in
target/arm/machine.c represents "this register needs some
special handling", where the special handling might be:
 - ignore if present in input
 - if present in input, value must be X, otherwise fail
   migration
 - maybe some other things if we need them

and this is not tied to specific QEMU machine versions and
isn't something we expose via QOM properties.
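
One possible shape for such a table, purely as a sketch: every
type and function name below is hypothetical, and the register
indices are placeholders rather than real KVM register encodings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What to do with a register that is in the incoming stream but is
 * not exposed by the destination kernel. Hypothetical names. */
typedef enum {
    MIG_REG_IGNORE,         /* drop it whatever its value */
    MIG_REG_REQUIRE_VALUE,  /* drop it only if value == expected */
} MigRegAction;

typedef struct {
    uint64_t regidx;        /* placeholder index, not a real encoding */
    const char *name;       /* readable name for error messages */
    MigRegAction action;
    uint64_t expected;      /* only used by MIG_REG_REQUIRE_VALUE */
} MigRegPolicy;

/* Illustrative entries only. */
static const MigRegPolicy mig_reg_policies[] = {
    { 0x1001, "EXAMPLE_STALE_REG",  MIG_REG_IGNORE,        0 },
    { 0x1002, "EXAMPLE_BITMAP_REG", MIG_REG_REQUIRE_VALUE, 0 },
};

static const MigRegPolicy *mig_reg_policy_lookup(uint64_t regidx)
{
    size_t n = sizeof(mig_reg_policies) / sizeof(mig_reg_policies[0]);

    for (size_t i = 0; i < n; i++) {
        if (mig_reg_policies[i].regidx == regidx) {
            return &mig_reg_policies[i];
        }
    }
    return NULL;
}

/* Returns true if migration may proceed despite the destination not
 * knowing about this incoming register. */
static bool mig_incoming_unknown_reg_ok(uint64_t regidx, uint64_t value)
{
    const MigRegPolicy *p = mig_reg_policy_lookup(regidx);

    if (!p) {
        return false;       /* no policy: fail as we do today */
    }
    switch (p->action) {
    case MIG_REG_IGNORE:
        return true;
    case MIG_REG_REQUIRE_VALUE:
        return value == p->expected;
    }
    return false;
}
```

A new kind of special handling would then be a new enum value
plus a case in the switch, with no new user-visible property.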

I'd rather avoid the "hidden" register idea unless we
definitely need it in addition to "ignore in incoming data".

thanks
-- PMM
