Re: Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Isaac Gutekunst Wed, 21 Oct 2015 05:13:36 -0700

Thanks for the reply.

On 10/21/2015 01:50 AM, Sebastian Huber wrote:



On 20/10/15 16:02, Isaac Gutekunst wrote:

Hi Devel,

I'm pretty sure this is a devel question, not users.


I'm working with a colleague at Vecna to port lwIP to the STM32F7 BSP we've 
developed.

We have a basic HTTP server that prints out the current list of tasks. We 
refresh the page at
a very high rate, and after about 1-30 minutes, get a crash.

Every time, the exception is thrown after
_CORE_mutex_Check_dispatch_for_seize( wait ) on line 254 of coremuteximpl.h.
Every time, this is inside a pthread_mutex_lock() call.


Here is the full backtrace:

stm32fxxxx_fatal_error_handler() at hal-fatal-error-handler.c:126 0x800af92
_User_extensions_Fatal_visitor() at userextiterate.c:123 0x803212c
_User_extensions_Iterate() at userextiterate.c:166 0x80321c0
_User_extensions_Fatal() at userextimpl.h:254 0x802a85e
_Terminate() at interr.c:44 0x802a888
_CORE_mutex_Seize_body() at coremuteximpl.h:255 0x8068df0
_POSIX_Mutex_Lock_support() at mutexlocksupp.c:57 0x806907e
pthread_mutex_lock() at mutexlock.c:40 0x8068bee
sys_arch_sem_wait() at sys_arch.c:485 0x808da8a
sys_arch_mbox_fetch() at sys_arch.c:357 0x808d804
sys_timeouts_mbox_fetch() at timers.c:532 0x80883ce
tcpip_thread() at tcpip.c:95 0x808c170
_Thread_Handler() at threadhandler.c:102 0x806bbe8
_User_extensions_Thread_exitted() at userextimpl.h:244 0x806bb60
bsp_section_work_begin() at 0xc016a12c


The lwIP code that calls pthread_mutex_lock() varies from crash to crash, but
the caller is always somewhere inside lwIP.


Does this ring any bells?


Normally you get this if you obtain a locked mutex in interrupt context, but 
your stack trace
says you are not.


That was my first suspicion as well.


As far as I can tell this would only occur if the caller of pthread_mutex_lock
was in a "bad" state. I don't believe it is in an interrupt context, and don't
know what other bad states could exist.


We have

#define _CORE_mutex_Check_dispatch_for_seize(_wait) \
   (!_Thread_Dispatch_is_enabled() \
     && (_wait) \
     && (_System_state_Get() >= SYSTEM_STATE_UP))

What is the thread dispatch disable level and the system state at this point?

In case the thread dispatch disable level is not zero, then something is 
probably broken in the
operating system code which is difficult to find. Could be a general memory 
corruption problem
too. Which RTEMS version do you use?
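
To make the condition above concrete, here is a minimal standalone model of the
macro. It is a sketch, not RTEMS code: the type and function names are
hypothetical, and it assumes that "dispatch enabled" simply means a disable
level of zero, as on a uniprocessor configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel state the macro inspects. */
typedef enum {
  MODEL_STATE_BEFORE_MULTITASKING,
  MODEL_STATE_UP
} model_system_state;

typedef struct {
  uint32_t dispatch_disable_level;   /* assumed: 0 means dispatching is enabled */
  model_system_state system_state;
} model_kernel_state;

/*
 * Mirrors the quoted macro: the seize is treated as fatal when dispatching is
 * disabled, the caller is willing to block, and the system is fully up.
 */
static bool model_check_dispatch_for_seize(const model_kernel_state *k, bool wait)
{
  bool dispatch_enabled = (k->dispatch_disable_level == 0u);

  return !dispatch_enabled && wait && k->system_state >= MODEL_STATE_UP;
}
```

Under this model, any nonzero disable level trips the check for a blocking
lock once the system is up, which is why the disable level and system state at
the time of the crash are the first things to inspect.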


The thread dispatch disable level is usually -1 or -2.
(0xFFFFFFFE or 0xFFFFFFD).

We first suspected that _Thread_Dispatch_decrement_disable_level (in
threaddispatch.h) was being called too many times (somehow). However, it always
crashes without our check firing.


For the record, I inserted this snippet of code:

    if (disable_level < 0) {
        _Terminate(
            INTERNAL_ERROR_CORE,
            true,
            INTERNAL_ERROR_MUTEX_OBTAIN_FROM_BAD_STATE
        );
        // In case the _Terminate call doesn't work
        __asm__ volatile ("BKPT #01");
    }

This pointed us towards a general memory corruption issue, so we are a bit
stuck. Another avenue we are exploring is sprinkling tests for a negative
disable_level all over the code, hoping to get closer to the corruption.
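
One possible reason a "disable_level < 0" test never fires, offered as an
assumption rather than something confirmed in this thread: if the level is held
in an unsigned 32-bit variable, the comparison is always false, and a value the
debugger shows as -2 is really 0xFFFFFFFE. A minimal sketch of guards that do
detect the wrap follows; all function names are hypothetical, not RTEMS APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the disable level is stored as an unsigned 32-bit counter. */

/* Decrementing 0 wraps around to 0xFFFFFFFF instead of producing -1. */
static uint32_t decrement_level(uint32_t level)
{
  return level - 1u;
}

/* Guard to run before the decrement: a level of 0 cannot be decremented safely. */
static bool decrement_would_underflow(uint32_t level)
{
  return level == 0u;
}

/*
 * Guard to run after the decrement: values above INT32_MAX are the ones a
 * debugger displays as negative when the variable is viewed as signed.
 */
static bool level_has_wrapped(uint32_t level)
{
  return level > (uint32_t) INT32_MAX;
}
```

If the variable does turn out to be signed in the tree in use, the original
test is fine as written and these guards add nothing.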

We are running a fork based on 314ff3c43ff1c00232e201df68e39cc0e5600d95. Our
changes since then include the addition of our STM32F BSPs, but no changes to
the kernel except a new CAN driver.

Real trace functionality would be really nice, but we lack the hardware (both
trace probes and exposed trace lines).

This is probably a stretch, but does anyone have experience getting the ETM or
ITM to send data to the ETB and reading that data out over JTAG (with RTEMS
and GCC)?


Isaac
