On Mon, 26 Jan 2026 at 16:55, Eric Auger <[email protected]> wrote:
>
> Currently when the number of KVM registers exposed by the source is
> larger than the one exposed on the destination, the migration fails
> with: "failed to load cpu:cpreg_vmstate_array_len"
>
> This gives no information about which registers are causing the trouble.
>
> This patch reworks the target/arm/machine code so that it becomes
> able to handle an input stream with a larger set of registers than
> the destination and print useful information about which registers
> are causing the trouble. The migration outcome is unchanged:
> - unexpected registers still will fail the migration
> - missing ones are printed but will not fail the migration, as done today.

Improving the diagnostics here is a great idea.

> The input stream can contain MAX_CPREG_VMSTATE_ANOMALIES(10) extra
> registers compared to what exists on the target.
>
> If there are more registers we will still hit the previous
> "load cpu:cpreg_vmstate_array_len" error.
>
> At most, MAX_CPREG_VMSTATE_ANOMALIES missing registers
> and MAX_CPREG_VMSTATE_ANOMALIES unexpected registers are printed.
>
> Example:
>
> qemu-system-aarch64: kvm_arm_cpu_post_load Missing register in input stream:
> 0 0x6030000000160003 fw feat reg 3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input
> stream: 0 0x603000000013c103 op0:3 op1:0 crn:2 crm:0 op2:3
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input
> stream: 1 0x603000000013c512 op0:3 op1:0 crn:10 crm:2 op2:2
> qemu-system-aarch64: kvm_arm_cpu_post_load Unexpected register in input
> stream: 2 0x603000000013c513 op0:3 op1:0 crn:10 crm:2 op2:3
> qemu-system-aarch64: error while loading state for instance 0x0 of device
> 'cpu'
> qemu-system-aarch64: load of migration failed: Operation not permitted
>
> With TCG there is no user friendly formatting of the faulting
> register indexes as with KVM. However the 2 added trace points
> help to identify the culprit indexes.

Could we move kvm_print_register_name() out of kvm.c and into
somewhere that the TCG code can use it? (When I was reviewing
the patch that added it, I did think that we might eventually
want it for TCG too.)

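To illustrate what a shared helper would have to do for the system
register case, here is a minimal standalone sketch. It is not the
existing kvm_print_register_name() code: the function and macro names
and the output format are hypothetical. The field layout is the
op0/op1/crn/crm/op2 packing that KVM uses in the low 16 bits of an
AArch64 system register ID, which agrees with the example output
quoted above.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical macro names; layout follows the KVM AArch64 sysreg ID packing. */
#define SYSREG_OP0(id) ((unsigned)(((id) >> 14) & 0x3))
#define SYSREG_OP1(id) ((unsigned)(((id) >> 11) & 0x7))
#define SYSREG_CRN(id) ((unsigned)(((id) >> 7) & 0xf))
#define SYSREG_CRM(id) ((unsigned)(((id) >> 3) & 0xf))
#define SYSREG_OP2(id) ((unsigned)((id) & 0x7))

/* Bits 16..27 hold the register class; 0x0013 means AArch64 system register. */
#define REG_CLASS_MASK   0x000000000FFF0000ULL
#define REG_CLASS_SYSREG 0x0000000000130000ULL

/*
 * Format a register index into buf and return the snprintf() result.
 * Only the system register class is decoded here; any other class
 * (for example firmware feature registers) falls back to raw hex.
 */
static int format_cpreg_index(uint64_t id, char *buf, size_t len)
{
    if ((id & REG_CLASS_MASK) != REG_CLASS_SYSREG) {
        return snprintf(buf, len, "0x%016" PRIx64, id);
    }
    return snprintf(buf, len,
                    "0x%016" PRIx64 " op0:%u op1:%u crn:%u crm:%u op2:%u",
                    id, SYSREG_OP0(id), SYSREG_OP1(id), SYSREG_CRN(id),
                    SYSREG_CRM(id), SYSREG_OP2(id));
}
```

Nothing in this decoding depends on KVM being in use, which is the
argument for putting it somewhere both accelerators can reach.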
> @@ -990,7 +991,13 @@ static int cpu_pre_load(void *opaque)
> {
> ARMCPU *cpu = opaque;
> CPUARMState *env = &cpu->env;
> + int arraylen = cpu->cpreg_vmstate_array_len +
> MAX_CPREG_VMSTATE_ANOMALIES;
>
> + cpu->cpreg_vmstate_indexes = g_renew(uint64_t,
> cpu->cpreg_vmstate_indexes,
> + arraylen);
> + cpu->cpreg_vmstate_values = g_renew(uint64_t, cpu->cpreg_vmstate_values,
> + arraylen);
> + cpu->cpreg_vmstate_array_len = arraylen;

It seems a bit odd to extend these in cpu_pre_load, especially
since it means we'll do so on every cpu_pre_load call, which I
think can happen if you try an inbound migration, it fails, and
then you retry it.

I think it ought to be possible to avoid both this reallocation
and the problem noted in the commit message, where more than 10
extra registers results in an unhelpful message, if we can
convert the vmstate fields from VMSTATE_VARRAY_INT32 to
VMSTATE_VARRAY_INT32_ALLOC. (The latter doesn't exist yet, but
would be the INT32 equivalent of VMSTATE_VARRAY_UINT32_ALLOC.)
If I have read the code correctly, this should work by
having the inbound migration code allocate the buffer for the
array data instead of expecting it to be pre-allocated -- that
means our post_load function can look at all the data it got
without imposing a length limitation.

I think (but we should check :-)) that the data in the migration
stream is the same in both cases, so this will not be a compat break.
(Some existing code will need adjustment to avoid a memory leak,
e.g. g_free any existing array in pre_load.)

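With the whole incoming array available, post_load could report
every anomaly by walking the two index lists in step. A minimal
standalone sketch of that walk follows. It assumes both lists are
sorted in ascending order (I believe the cpreg index lists are, but
that should be checked), and the struct and function names are
hypothetical, not existing QEMU API.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type: how many anomalies of each kind were seen. */
typedef struct {
    size_t missing;     /* known locally, absent from the incoming stream */
    size_t unexpected;  /* present in the incoming stream, unknown locally */
} CpregDiff;

/*
 * Walk two ascending-sorted index arrays in step and report every
 * difference. No fixed cap on the number of anomalies is needed,
 * because the caller passes the full incoming array.
 */
static CpregDiff diff_cpreg_indexes(const uint64_t *local, size_t nlocal,
                                    const uint64_t *incoming, size_t nincoming)
{
    CpregDiff d = { 0, 0 };
    size_t i = 0, v = 0;

    while (i < nlocal || v < nincoming) {
        if (v == nincoming || (i < nlocal && local[i] < incoming[v])) {
            /* Local register with no counterpart in the stream. */
            fprintf(stderr, "Missing register in input stream: 0x%016"
                    PRIx64 "\n", local[i]);
            d.missing++;
            i++;
        } else if (i == nlocal || incoming[v] < local[i]) {
            /* Stream register that the destination does not know. */
            fprintf(stderr, "Unexpected register in input stream: 0x%016"
                    PRIx64 "\n", incoming[v]);
            d.unexpected++;
            v++;
        } else {
            /* Same index on both sides: nothing to report. */
            i++;
            v++;
        }
    }
    return d;
}
```

A caller would fail the load only when the unexpected count is
nonzero, which keeps the migration outcome described in the commit
message: unexpected registers fail, missing ones are just reported.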
thanks
-- PMM