Peter Maydell <peter.mayd...@linaro.org> writes:

> On Wed, 23 Jul 2025 at 23:20, Fabiano Rosas <faro...@suse.de> wrote:
>>
>> Peter Maydell <peter.mayd...@linaro.org> writes:
>>
>> > We don't implement the Debug Communications Channel (DCC), but
>> > we do attempt to provide dummy versions of its system registers
>> > so that software that tries to access them doesn't fall over.
>> >
>> > However, we got the tx/rx register definitions wrong. These
>> > should be:
>> >
>> > AArch32:
>> >   DBGDTRTX   p14 0 c0 c5 0  (on writes)
>> >   DBGDTRRX   p14 0 c0 c5 0  (on reads)
>> >
>> > AArch64:
>> >   DBGDTRTX_EL0  2 3 0 5 0 (on writes)
>> >   DBGDTRRX_EL0  2 3 0 5 0 (on reads)
>> >   DBGDTR_EL0    2 3 0 4 0 (reads and writes)
>> >
>> > where DBGDTRTX and DBGDTRRX are effectively different names for the
>> > same 32-bit register, which has tx behaviour on writes and rx
>> > behaviour on reads.  The AArch64-only DBGDTR_EL0 is a 64-bit wide
>> > register whose top and bottom halves map to the DBGDTRRX and DBGDTRTX
>> > registers.
>> >
>> > Currently we have just one cpreg struct, which:
>> >  * calls itself DBGDTR_EL0
>> >  * uses the DBGDTRTX_EL0/DBGDTRRX_EL0 encoding
>> >  * is marked as ARM_CP_STATE_BOTH but has the wrong opc1
>> >    value for AArch32
>> >  * is implemented as RAZ/WI
>> >
>> > Correct the encoding so:
>> >  * we name the DBGDTRTX/DBGDTRRX register correctly
>> >  * we split it into AA64 and AA32 versions so we can get the
>> >    AA32 encoding right
>> >  * we implement DBGDTR_EL0 at its correct encoding
>> >
>> > Cc: qemu-sta...@nongnu.org
>> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2986
>> > Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
>> > Reviewed-by: Richard Henderson <richard.hender...@linaro.org>
>> > Message-id: 20250708141049.778361-1-peter.mayd...@linaro.org
>> > ---
>> >  target/arm/debug_helper.c | 13 +++++++++++--
>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/target/arm/debug_helper.c b/target/arm/debug_helper.c
>> > index 69fb1d0d9ff..aee06d4d426 100644
>> > --- a/target/arm/debug_helper.c
>> > +++ b/target/arm/debug_helper.c
>> > @@ -988,11 +988,20 @@ static const ARMCPRegInfo debug_cp_reginfo[] = {
>> >        .opc0 = 2, .opc1 = 0, .crn = 0, .crm = 3, .opc2 = 2,
>> >        .access = PL1_RW, .accessfn = access_tdcc,
>> >        .type = ARM_CP_CONST, .resetvalue = 0 },
>> > -    /* DBGDTRTX_EL0/DBGDTRRX_EL0 depend on direction */
>> > -    { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_BOTH, .cp = 14,
>> > +    /* Architecturally DBGDTRTX is named DBGDTRRX when used for reads */
>> > +    { .name = "DBGDTRTX_EL0", .state = ARM_CP_STATE_AA64,
>> >        .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0,
>> >        .access = PL0_RW, .accessfn = access_tdcc,
>> >        .type = ARM_CP_CONST, .resetvalue = 0 },
>> > +    { .name = "DBGDTRTX", .state = ARM_CP_STATE_AA32, .cp = 14,
>> > +      .opc1 = 0, .crn = 0, .crm = 5, .opc2 = 0,
>> > +      .access = PL0_RW, .accessfn = access_tdcc,
>> > +      .type = ARM_CP_CONST, .resetvalue = 0 },
>> > +    /* This is AArch64-only and is a combination of DBGDTRTX and DBGDTRRX 
>> > */
>> > +    { .name = "DBGDTR_EL0", .state = ARM_CP_STATE_AA64,
>> > +      .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 4, .opc2 = 0,
>> > +      .access = PL0_RW, .accessfn = access_tdcc,
>> > +      .type = ARM_CP_CONST, .resetvalue = 0 },
>> >      /*
>> >       * OSECCR_EL1 provides a mechanism for an operating system
>> >       * to access the contents of EDECCR. EDECCR is not implemented though,
>>
>> Hi, this patch breaks migration. I'm leaving for the day and will take a
>> closer look in the morning. But since we have timezones, here it is:
>
> Thanks for the report; I can repro this. It happens because
> the loop in cpu_post_load hits the "register in their list but
> not ours" check, because the source VM has the AArch32
> {.cp = 14, .opc0 = 2, .opc1 = 3, .crn = 0, .crm = 5, .opc2 = 0}
> register which should never have existed.
>

My debugging (in dst) shows:

(gdb) x/16x 0x555557ad5d20 //cpu->cpreg_vmstate_indexes[0x18 to 0x1c]
0x555557ad5d20: 0x200e0205      0x40200000      0x200e0284<     0x40200000
0x555557ad5d30: 0x200e0285      0x40200000      0x200e0298      0x40200000
0x555557ad5d40: 0x200e0302      0x40200000      0x200e0380      0x40200000
0x555557ad5d50: 0x200e0800      0x40200000      0x200e0838      0x40200000

(gdb) x/16x 0x555557ad4ac0 //cpu->cpreg_indexes[0x18 to 0x1c]
p0x555557ad4ac0: 0x200e0205      0x40200000      0x200e0280<     0x40200000
0x555557ad4ad0: 0x200e0284      0x40200000      0x200e0285      0x40200000
0x555557ad4ae0: 0x200e0302      0x40200000      0x200e0380      0x40200000
0x555557ad4af0: 0x200e0800      0x40200000      0x200e0838      0x40200000


> I'm not sure how to handle this, as we have no mechanism for
> "ignore this incoming register value, it is bogus". I'm surprised
> we've never run into this before...
>

I was thinking the same.

I actually don't understand what the encoding in cpu->cpreg_indexes is
supposed to represent. How does comparing the indexes implies "in our
list"/"in their list"? Is there some sort of ISA level assumption?

       if (cpu->cpreg_vmstate_indexes[v] > cpu->cpreg_indexes[i]) {
            /* register in our list but not incoming : skip it */
            continue;
        }
        if (cpu->cpreg_vmstate_indexes[v] < cpu->cpreg_indexes[i]) {
            /* register in their list but not ours: fail migration */
            return -1;
        }
 
> I won't be able to look at this further til the tail end of
> next week.
>
> As an aside, it is a shame that post_load hooks do not
> take an Error** -- if they did we would be able to report
> this more usefully to the user by saying why the migration
> failed instead of just returning -1. Perhaps it would be
> worth adding _err versions of the hook fields in
> VMStateDescription so that devices can optionally
> implement them instead if they have interesting or
> complicated errors to report ?
>

Definitely, there's work happening at moment in the upper layer that
will allow errors to be propagated from post_load.

https://lore.kernel.org/r/20250725-propagate_tpm_error-v7-0-d52704443...@redhat.com

> thanks
> -- PMM

Reply via email to