Hi Alex,

On 2/9/26 3:59 PM, Alex Bennée wrote:
> Peter Maydell <[email protected]> writes:
>
>> On Mon, 26 Jan 2026 at 16:54, Eric Auger <[email protected]> wrote:
>>> When migrating ARM guests across same machines with different host
>>> kernels we are likely to encounter failures such as:
>>>
>>> "failed to load cpu:cpreg_vmstate_array_len"
>>>
>>> This is due to the fact KVM exposes a different number of registers
>>> to qemu on source and destination. When trying to migrate a bigger
>>> register set to a smaller one, qemu cannot save the CPU state.
>>>
>>> For example, recently we faced such kind of situations with:
>>> - unconditional exposure of KVM_REG_ARM_VENDOR_HYP_BMAP_2 FW pseudo
>>>   register from v6.16 onwards. Causes backward migration failure.
>>> - removal of unconditional exposure of TCR2_EL1, PIRE0_EL1, PIR_EL1
>>>   from v6.13 onwards. Causes forward migration failure.
>> Hi; sorry I haven't given this series any attention before.
>>
>> (1) Yes, this is definitely a problem we need to solve.
>>
>> (2) What are the requirements we have for this?
>>
>> This series sets up CPU properties controlling this, and then
>> sets them in the virt machine model based on the machine
>> type, but this seems awkward for two reasons:
>>
>> * using properties confines us to using a "text string"
>>   way of describing the behaviour; if we could implement
>>   the handling in code and C data structures in target/arm
>>   we could potentially do it in a more flexible and
>>   readable way (e.g. being able to specify the register
>>   via something other than a raw hex value)
>> * different host kernel versions isn't really related to
>>   the QEMU version, so tying it to a versioned machine
>>   type doesn't seem to fit
>>
>> Q: Do we need the user to be able to control this (e.g. adding
>> extra registers to be ignored) on their command line, or
>> can we say "you need a newer QEMU that understands how to
>> deal with this register if you want to do migrations involving
>> this newer kernel version" ?
>>
>> Q: This series adds a "hide this register" option which
>> stops the register appearing in the outbound migration data.
>> Do we need that, or would it be enough to have "ignore this
>> register in the inbound migration data" ? Assuming we're
>> not trying to migrate backwards to an older QEMU version
>> that's unaware of the new register, that seems to me like
>> it should be equivalent.

To me, we may try to migrate backwards to an older QEMU version
that's unaware of this new reg.

> As I understand it these signal to the guest what services the
> hypervisor supplies. I assume the guest kernel only reads these once at
> boot up rather than before invoking any particular service?
>
> If this is the case then things would break if a new host couldn't
> support the guest's request of the hypervisor service.

Effectively, we need to make sure the reg value does not differ from the
reset value (meaning QEMU has not enabled any feature on the source that
the destination is not able to run).

Eric
>
>> (3) Categories of sysreg that are causing problems:
>>
>> a: "controls" -- like the PSCI_VERSION pseudoreg. Here the setting
>> controls what the kernel is exposing to the guest, and so we need
>> to be able to have the user tell QEMU to use a specific version
>> that's not the host kernel default if the default isn't one
>> that's valid for all older kernels. Sometimes the new kernel
>> default is the same as the old kernel's behaviour and in those
>> cases we also want handling of "if you see the control reg in
>> the incoming data and its value is the default then it's OK to
>> ignore it".
>>
>> b: "things exposed that should not have been" -- where the old kernel
>> exposed a register but the new one does not because exposing the
>> register was wrong (i.e. a bug). The handling here can be
>> "ignore this in migration input if present". Examples are the
>> TCR2_EL1, PIRE0_EL1, PIR_EL1 regs that shouldn't exist if the
>> corresponding feature was disabled for the guest.
>>
>> c: "things not exposed that should have been" -- where a new kernel
>> exposes a new register that the old one does not, and so migration
>> from a host with the new kernel to the old one fails. In most cases
>> it should be possible to handle this with "ignore in migration input
>> if present", or "fail migration if incoming value is not some safe
>> default, but if it is that default value then ignore".
> Shame we don't know if the guest ever read the register. If the old host
> provides features the new host doesn't but it never probed anyway then
> neither the guest nor the new host needs to care about the register.
>
>> Have I missed anything ?
>>
>> (4) Mechanisms for handling them:
>>
>> This series provides two mechanisms:
>>
>> "safe missing reg" -- these registers are ignored if they appear
>> in the incoming migration data.
>>
>> "hidden" -- the behaviour here is that we effectively entirely
>> ignore the register, so we do not read it from the kernel or write
>> it back, do not send it in outbound migration data, and do
>> not expect to see it in incoming migration data.
>>
>> The "arm: add kvm-psci-version vcpu property" series handles one
>> specific "control" register, with a specific user-facing cpu property.
>> If new "control" type registers are rare, this seems like a good
>> way to go, because it means we can give the user an interface that
>> is reasonably clear about what it does, and we can provide better
>> errors on the migration-destination side (e.g. pointing the user
>> at the need to specify the property on the source side to get a
>> VM they can migrate to this destination).
>>
>> The only use of "hidden" so far is for KVM_REG_ARM_VENDOR_HYP_BMAP_2.
>> However, I'm not sure this is the right way to handle this register.
>> Judging from the documentation, this seems to be a "control" register:
>> it would let QEMU enable certain things to be visible to the guest.
>> It also is odd to treat this differently from the existing
>> KVM_REG_ARM_VENDOR_HYP_BMAP register, which has exactly the same
>> semantics.
>>
>> I think that the right way to treat this register would be
>> "if this is present in the incoming migration system and the
>> host kernel doesn't know about it, a value of zero is OK, but
>> any other value should fail migration".
>>
>> In general I'm not convinced that "hidden" is a useful thing
>> to provide -- it should always be fine for QEMU to read and
>> write back to the same host kernel some sysreg it doesn't
>> know about, so what "hidden" is mostly doing is "don't put
>> this into outgoing migration data". Do we need to be able
>> to do that, or can we instead always use a "ignore in
>> incoming migration data" strategy?
>>
>> (5) My preferences
>>
>> I think that assuming that it meets the requirements, I would
>> prefer something like a mechanism where we use some kind of
>> C data structure / code in target/arm/machine.c to represent
>> "this register needs some special handling", where the special
>> handling might be:
>> - ignore if present in input
>> - if present in input, value must be X, otherwise fail
>>   migration
>> - maybe some other things if we need them
>>
>> and this is not tied to specific QEMU machine versions and
>> isn't something we expose via QOM properties.
>>
>> I'd rather avoid the "hidden" register idea unless we
>> definitely need it in addition to "ignore in incoming data".
>>
>> thanks
>> -- PMM
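
For illustration only, the table-driven handling described in (5) might
look roughly like the standalone C below. This is a sketch under stated
assumptions, not existing QEMU code: every name in it (MigRegRule,
MigRegAction, mig_reg_check, the EXAMPLE_REG_* indexes) is made up for
this mail, and the register indexes are placeholders rather than real
KVM register encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indexes: placeholders, NOT real KVM encodings. */
#define EXAMPLE_REG_BUGGY_EXPOSURE 0x1001ULL
#define EXAMPLE_REG_CONTROL_BITMAP 0x1002ULL

/* Special handling for a register the destination does not know about. */
typedef enum {
    MIG_RULE_IGNORE_IF_PRESENT, /* drop the incoming value unconditionally */
    MIG_RULE_MUST_EQUAL,        /* drop it only if it equals 'expected' */
} MigRegAction;

typedef struct {
    uint64_t regidx;
    MigRegAction action;
    uint64_t expected;          /* only read for MIG_RULE_MUST_EQUAL */
} MigRegRule;

typedef enum {
    MIG_REG_LOAD,               /* destination has the register: load it */
    MIG_REG_SKIP,               /* safe to ignore the incoming value */
    MIG_REG_FAIL,               /* migration must fail */
} MigRegVerdict;

/* Example table: one "exposed by mistake" reg, one "control" reg. */
static const MigRegRule example_rules[] = {
    { EXAMPLE_REG_BUGGY_EXPOSURE, MIG_RULE_IGNORE_IF_PRESENT, 0 },
    { EXAMPLE_REG_CONTROL_BITMAP, MIG_RULE_MUST_EQUAL, 0 },
};

/*
 * Decide what to do with one incoming (regidx, value) pair.
 * dest_has_reg says whether the destination kernel exposes the register.
 */
static MigRegVerdict mig_reg_check(const MigRegRule *rules, size_t n_rules,
                                   uint64_t regidx, uint64_t value,
                                   bool dest_has_reg)
{
    assert(rules != NULL || n_rules == 0);

    if (dest_has_reg) {
        return MIG_REG_LOAD;
    }
    for (size_t i = 0; i < n_rules; i++) {
        if (rules[i].regidx != regidx) {
            continue;
        }
        switch (rules[i].action) {
        case MIG_RULE_IGNORE_IF_PRESENT:
            return MIG_REG_SKIP;
        case MIG_RULE_MUST_EQUAL:
            return value == rules[i].expected ? MIG_REG_SKIP : MIG_REG_FAIL;
        }
    }
    /* Unknown to the destination and no rule: keep failing as today. */
    return MIG_REG_FAIL;
}
```

With a table like this, a BMAP_2-style register that arrives with value
zero would be skipped on a destination that lacks it, while any non-zero
value would fail the migration, and a register with no rule keeps the
current "fail" behaviour.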
