Hello Cedric,

thanks for having a look at this.

On 06.02.24 16:37, Cedric Berger wrote:
Hello,

I've started to investigate bug #4923.

A first problem can be found after applying the following patch to the paranoia.exe test:

https://devel.rtems.org/attachment/ticket/4923/bug-1-make-paranoia-test-fail.patch

The test fails because of the added sleep(1) at the beginning of Init() and the CONFIGURE_STACK_CHECKER_ENABLED option which poisons the stack.

What I have been able to observe is something like this:

1) Because of the sleep(1) call, RTEMS goes through a INIT->IDLE->INIT pair of context switches.

2) The first context switch goes through _Thread_Dispatch_direct, which does not use an ARMv7 exception frame save/restore.

3) In _CPU_Context_switch(), the FPSCR register is not saved/restored.

This is related to:

http://devel.rtems.org/ticket/4027


4) After the 1s sleep ends, the second context switch originate with _ARMV7M_Pendable_service_call()

5) In _ARMV7M_Pendable_service_call(), a new stack frame is generaterd (--ef;) but only partially initalized with PC and XPSR. Everything else on that exception frame comes from the poisonned stack.

The _ARMV7M_Pendable_service_call() and _ARMV7M_Supervisor_call() work as a team. The --ef and ++ef is there to preserve the original exception frame across the jump to and from _ARMV7M_Thread_dispatch().

void _ARMV7M_Pendable_service_call( void )
{
  Per_CPU_Control *cpu_self = _Per_CPU_Get();

  /*
   * We must check here if a thread dispatch is allowed.  Right after a
* "msr basepri_max, %[basepri]" instruction an interrupt service may still
   * take place.  However, pendable service calls that are activated during
* this interrupt service may be delayed until interrupts are enable again.
   */
  if (
( cpu_self->isr_nest_level | cpu_self->thread_dispatch_disable_level ) == 0
  ) {
    ARMV7M_Exception_frame *ef;

    cpu_self->isr_nest_level = 1;

    _ARMV7M_SCB->icsr = ARMV7M_SCB_ICSR_PENDSVCLR;
    _ARMV7M_Trigger_lazy_floating_point_context_save();

At this point, the floating point context should be saved on the exception frame. The FPCCR.LSPACT bit should be 0, to indicate that lazy state preservation is no longer active.

    ef = (ARMV7M_Exception_frame *) _ARMV7M_Get_PSP();
    --ef;
    _ARMV7M_Set_PSP( (uint32_t) ef );

This new exception frame is just there to jump to _ARMV7M_Thread_dispatch(). Here could be the problem, maybe we have to set FPCCR.LSPACT to 1, so that we do not restore a garbage FP state, see also:

https://developer.arm.com/documentation/ddi0403/d/System-Level-Architecture/System-Level-Programmers--Model/ARMv7-M-exception-model/Exception-return-behavior?lang=en

    /*
     * According to "ARMv7-M Architecture Reference Manual" section B1.5.6
     * "Exception entry behavior" the return address is half-word aligned.
     */
    ef->register_pc = (void *)
      ((uintptr_t) _ARMV7M_Thread_dispatch & ~((uintptr_t) 1));

    ef->register_xpsr = 0x01000000U;
  }
}

void _ARMV7M_Supervisor_call( void )
{
  Per_CPU_Control *cpu_self = _Per_CPU_Get();
  ARMV7M_Exception_frame *ef;

  _ARMV7M_Trigger_lazy_floating_point_context_save();

  ef = (ARMV7M_Exception_frame *) _ARMV7M_Get_PSP();
  ++ef;
  _ARMV7M_Set_PSP( (uint32_t) ef );

  cpu_self->isr_nest_level = 0;

  if ( cpu_self->dispatch_necessary ) {
    _ARMV7M_Pendable_service_call();
  }
}



6) Because of that, the FPU registers S0-S15 and FPSCR is loaded with garbage.

7) This garbage is transfered to the Init task after _CPU_Context_switch(). It does not matter for S0-S15 because they don't need to be preserved across function calls, but for FPSCR it matters.

8) With the bogus FPSCR, pow(2.0, 1.0) returns 1.9999999999, which breaks the paranoia FPU tests and our javascript engine :)

I might have missed something, but with the two following fixes, the problem goes away:

1) https://devel.rtems.org/attachment/ticket/4923/fix-1-save-and-restore-fpscr.patch

This patch explicitely save/restore the FPSCR in _CPU_Context_switch()

2) https://devel.rtems.org/attachment/ticket/4923/fix-2-clean-initial-thread-frame.patch

This patch prevents the uninitialized or poisoned stack data to be loaded into the VFP registers.

There is still a lot that I need to test and what I don't understand, but what do you guys think about these 2 patches?

Saving and restoring the FPSCR could make sense independent of the current issue. However, I am not sure if we really need it.

Could you please test also if this patch alone fixes the problem?

diff --git a/cpukit/score/cpu/arm/armv7m-isr-dispatch.c b/cpukit/score/cpu/arm/armv7m-isr-dispatch.c
index 5ab9d1fae2..b30197c45b 100644
--- a/cpukit/score/cpu/arm/armv7m-isr-dispatch.c
+++ b/cpukit/score/cpu/arm/armv7m-isr-dispatch.c
@@ -78,6 +78,9 @@ void _ARMV7M_Pendable_service_call( void )

     _ARMV7M_SCB->icsr = ARMV7M_SCB_ICSR_PENDSVCLR;
     _ARMV7M_Trigger_lazy_floating_point_context_save();
+#ifdef ARM_MULTILIB_VFP
+    _ARMV7M_SCB->fpccr |= 0x1;
+#endif

     ef = (ARMV7M_Exception_frame *) _ARMV7M_Get_PSP();
     --ef;

--
embedded brains GmbH & Co. KG
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/
_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Reply via email to