Peter Maydell <peter.mayd...@linaro.org> writes: > On Wed, 23 Jul 2025 at 23:20, Fabiano Rosas <faro...@suse.de> wrote: >> >> Peter Maydell <peter.mayd...@linaro.org> writes: >> >> > We don't implement the Debug Communications Channel (DCC), but >> > we do attempt to provide dummy versions of its system registers >> > so that software that tries to access them doesn't fall over. >> > >> > However, we got the tx/rx register definitions wrong. These >> > should be: >> > >> > AArch32: >> > DBGDTRTX p14 0 c0 c5 0 (on writes) >> > DBGDTRRX p14 0 c0 c5 0 (on reads) >> > >> > AArch64: >> > DBGDTRTX_EL0 2 3 0 5 0 (on writes) >> > DBGDTRRX_EL0 2 3 0 5 0 (on reads) >> > DBGDTR_EL0 2 3 0 4 0 (reads and writes) >> > >> > where DBGDTRTX and DBGDTRRX are effectively different names for the >> > same 32-bit register, which has tx behaviour on writes and rx >> > behaviour on reads. The AArch64-only DBGDTR_EL0 is a 64-bit wide >> > register whose top and bottom halves map to the DBGDTRRX and DBGDTRTX >> > registers. >> > >> > Currently we have just one cpreg struct, which: >> > * calls itself DBGDTR_EL0 >> > * uses the DBGDTRTX_EL0/DBGDTRRX_EL0 encoding >> > * is marked as ARM_CP_STATE_BOTH but has the wrong opc1 >> > value for AArch32 >> > * is implemented as RAZ/WI >> > >> > Correct the encoding so: >> > * we name the DBGDTRTX/DBGDTRRX register correctly >> > * we split it into AA64 and AA32 versions so we can get the >> > AA32 encoding right >> > * we implement DBGDTR_EL0 at its correct encoding >> > >> > Cc: qemu-sta...@nongnu.org >> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2986 >> > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org> >> > Reviewed-by: Richard Henderson <richard.hender...@linaro.org> >> > Message-id: 20250708141049.778361-1-peter.mayd...@linaro.org >> > --- >> > target/arm/debug_helper.c | 13 +++++++++++-- >> > 1 file changed, 11 insertions(+), 2 deletions(-) >> > >> > diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c >> > index 69fb1d0d9ff..aee06d4d426 100644 >> > --- a/target/arm/debug_helper.c >> > +++ b/target/arm/debug_helper.c >> > @@ -988,11 +988,20 @@ static const ARMCPRegInfo debug_cp_reginfo[] = { >> > .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2, >> > .access = PL1_RW, .accessfn = access_tdcc, >> > .type = ARM_CP_CONST, .resetvalue = 0 }, >> > - /* DBGDTRTX_EL0/DBGDTRRX_EL0 depend on direction */ >> > - { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_BOTH, .cp = 14, >> > + /* Architecturally DBGDTRTX is named DBGDTRRX when used for reads */ >> > + { .name = "DBGDTRTX_EL0", .state = ARM_CP_STATE_AA64, >> > .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0, >> > .access = PL0_RW, .accessfn = access_tdcc, >> > .type = ARM_CP_CONST, .resetvalue = 0 }, >> > + { .name = "DBGDTRTX", .state = ARM_CP_STATE_AA32, .cp = 14, >> > + .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 0, >> > + .access = PL0_RW, .accessfn = access_tdcc, >> > + .type = ARM_CP_CONST, .resetvalue = 0 }, >> > + /* This is AArch64-only and is a combination of DBGDTRTX and DBGDTRRX >> > */ >> > + { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_AA64, >> > + .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 4, .opc2 = 0, >> > + .access = PL0_RW, .accessfn = access_tdcc, >> > + .type = ARM_CP_CONST, .resetvalue = 0 }, >> > /* >> > * OSECCR_EL1 provides a mechanism for an operating system >> > * to access the contents of EDECCR. EDECCR is not implemented though, >> >> Hi, this patch breaks migration. I'm leaving for the day and will take a >> closer look in the morning. But since we have timezones, here it is: > > Thanks for the report; I can repro this. It happens because > the loop in cpu_post_load hits the "register in their list but > not ours" check, because the source VM has the AArch32 > {.cp = 14, .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0} > register which should never have existed. >
My debugging (in dst) shows: (gdb) x/16x 0x555557ad5d20 //cpu->cpreg_vmstate_indexes[0x18 to 0x1c] 0x555557ad5d20: 0x200e0205 0x40200000 0x200e0284< 0x40200000 0x555557ad5d30: 0x200e0285 0x40200000 0x200e0298 0x40200000 0x555557ad5d40: 0x200e0302 0x40200000 0x200e0380 0x40200000 0x555557ad5d50: 0x200e0800 0x40200000 0x200e0838 0x40200000 (gdb) x/16x 0x555557ad4ac0 //cpu->cpreg_indexes[0x18 to 0x1c] p0x555557ad4ac0: 0x200e0205 0x40200000 0x200e0280< 0x40200000 0x555557ad4ad0: 0x200e0284 0x40200000 0x200e0285 0x40200000 0x555557ad4ae0: 0x200e0302 0x40200000 0x200e0380 0x40200000 0x555557ad4af0: 0x200e0800 0x40200000 0x200e0838 0x40200000 > I'm not sure how to handle this, as we have no mechanism for > "ignore this incoming register value, it is bogus". I'm surprised > we've never run into this before... > I was thinking the same. I actually don't understand what the encoding in cpu->cpreg_indexes is supposed to represent. How does comparing the indexes implies "in our list"/"in their list"? Is there some sort of ISA level assumption? if (cpu->cpreg_vmstate_indexes[v] > cpu->cpreg_indexes[i]) { /* register in our list but not incoming : skip it */ continue; } if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) { /* register in their list but not ours: fail migration */ return -1; } > I won't be able to look at this further til the tail end of > next week. > > As an aside, it is a shame that post_load hooks do not > take an Error** -- if they did we would be able to report > this more usefully to the user by saying why the migration > failed instead of just returning -1. Perhaps it would be > worth adding _err versions of the hook fields in > VMStateDescription so that devices can optionally > implement them instead if they have interesting or > complicated errors to report ? > Definitely, there's work happening at moment in the upper layer that will allow errors to be propagated from post_load. https://lore.kernel.org/r/20250725-propagate_tpm_error-v7-0-d52704443...@redhat.com > thanks > -- PMM