On 9/17/2014 12:44 PM, Hesham Moustafa wrote:
>
> On Tue, Sep 16, 2014 at 11:08 PM, Joel Sherrill
> <joel.sherr...@oarcorp.com> wrote:
>
> Gedare.. cc'ed you for help in spotting an empty rbtree
> in gdb. See below.
> On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
> > Breakpoint 2, 0x00000600 in _unalign ()
> > (gdb) bt
> > #0  0x00000600 in _unalign ()
> > #1  0x0002ec4c in _RBTree_Next (node=0x40890, dir=RBT_RIGHT)
> >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
> > #2  0x0002e2f4 in _RBTree_Successor (node=0x40890)
> >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
> > #3  0x0002e8c0 in _RBTree_Extract (the_rbtree=0x4198c, the_node=0x40890)
> >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
> > #4  0x00021524 in _RBTree_Get (the_rbtree=0x4198c, dir=RBT_LEFT)
> >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
> > #5  0x000215c8 in _Thread_queue_Dequeue (the_thread_queue=0x4198c)
> >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
> > #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
> >     id=436273153, api_semaphore_mp_support=0x0)
> >     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
> > #7  0x00014868 in rtems_semaphore_release (id=436273153)
> >     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
> > #8  0x00026cfc in rtems_libio_unlock ()
> >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
> > #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
> >     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
> > #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
> >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
> > #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
> >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
> > #12 0x00029734 in rtems_libio_free (iop=0x49c50)
> >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
> > #13 0x0002912c in close (fd=0)
> >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
> > #14 0x000064b0 in rtems_libio_exit ()
> >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
> > #15 0x0003b058 in _exit (status=0)
> >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
> > #16 0x00034798 in exit (code=0)
> >     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> > #17 0x00002e3c in Test_task (unused=1)
> >     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
> > #18 0x000340f0 in _Thread_Handler ()
> >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> > #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
> >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> > Backtrace stopped: frame did not save the PC
> > (gdb)
> >
> > It breaks in _RBTree_Next, specifically at the following line:
> > while ( ( current = current->child[ opp_dir ] ) != NULL )
> >
> > (gdb) p current->child[ opp_dir ]
> > Cannot access memory at address 0xa010006
> > (gdb) p current
> > $1 = (RBTree_Node *) 0xa010002
> These look like object ids.
> > This address is invalid; the memory length should be only 32
> > MB (0x2000000)
> >
> > http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
> >
> > So I guess current->child is overwritten somehow?
> Yep. Two approaches.
>
> + Set a watchpoint in gdb if it is supported.
> But even if supported,
> it will likely slow the run tremendously.
>
> There is no HW watchpoint supported.
>
> + Break selectively and more or less binary search for where it is
> overwritten. I would break at the first call to _ISR_Dispatch
> (or whatever you called it) and see if it gets clobbered.
>
> That could be clobbered VERY early in the program. It could be
> a blown stack. But it could just be a stray write. Check the value
> of that semaphore's rbtree when you get to Init and just
> break periodically and see where it is corrupt.
>
> That's what I did. As you assumed, it's clobbered very early.
>
> Breakpoint 1, _Objects_Extend_information (
>     information=0x3e26c <_RTEMS_tasks_Information>)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> 67 do_extend = true;
> (gdb) bt
> #0  _Objects_Extend_information (
>     information=0x3e26c <_RTEMS_tasks_Information>)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> #1  0x0001b554 in _Objects_Initialize_information (
>     information=0x3e26c <_RTEMS_tasks_Information>,
>     the_api=OBJECTS_CLASSIC_API, the_class=1, maximum=4,
>     size=1424, is_string=false, maximum_name_length=4)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectinitializeinformation.c:126
> #2  0x0002c688 in _RTEMS_tasks_Manager_initialization ()
>     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/tasks.c:197
> #3  0x00015bd4 in _RTEMS_API_Initialize ()
>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/rtemsapi.c:59
> #4  0x0001590c in rtems_initialize_data_structures ()
>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exinit.c:140
> #5  0x0000333c in boot_card (cmdline=0x0)
>     at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/bootcard.c:92
> #6  0x00000000 in ?? ()
> (gdb)
>
> Specifically, here
> http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261

I think this is the first time it is initialized. What's the next time it is modified?
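
To make the "these look like object ids" remark concrete, here is a minimal sketch that splits a 32-bit value into id fields. The bit positions (class in bits 31..27, API in 26..24, node in 23..16, index in 15..0) are my assumption about the 32-bit Objects_Id layout and should be checked against rtems/score/object.h. The names id_fields and decode_object_id are hypothetical helpers, not RTEMS functions.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of RTEMS. */
typedef struct {
  uint32_t the_class; /* assumed bits 31..27 */
  uint32_t the_api;   /* assumed bits 26..24 */
  uint32_t node;      /* assumed bits 23..16 */
  uint32_t index;     /* assumed bits 15..0  */
} id_fields;

/*
 * Split a 32-bit value into the four id fields, using the assumed
 * layout above. This only does the bit arithmetic; it does not check
 * that the value is a valid id for any configured object class.
 */
id_fields decode_object_id( uint32_t id )
{
  id_fields f;

  f.the_class = ( id >> 27 ) & 0x1fu;
  f.the_api   = ( id >> 24 ) & 0x07u;
  f.node      = ( id >> 16 ) & 0xffu;
  f.index     = id & 0xffffu;

  return f;
}
```

Under that assumed layout, the bogus pointer 0xa010002 from the gdb session decodes to class 1, API 2, node 1, index 2, and the semaphore id 436273153 (0x1a010001) decodes to class 3, API 2, node 1, index 1. If API 2 is the Classic API, class 1 would match the the_class=1 argument shown for _RTEMS_tasks_Information in the backtrace, so the value in current reads like a task id rather than a node pointer.
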
But this looks like task manager class information and not a semaphore like the crash, so this is odd. :(
>
> I cc'ed Gedare because I don't know how to spot that the rbtree
> is empty in gdb.
>
> You need to see where that memory is overwritten.
>
> Again running all tests with the simulator clock tick could
> eliminate the ISR code as the culprit. :)
>
> On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
> > <joel.sherr...@oarcorp.com> wrote:
> >> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
> >>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
> >>> <joel.sherr...@oarcorp.com> wrote:
> >>>>
> >>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
> >>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
> >>>>> <joel.sherr...@oarcorp.com> wrote:
> >>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
> >>>>>>> Hi
> >>>>>>>
> >>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
> >>>>>>> <joel.sherr...@oarcorp.com> wrote:
> >>>>>>>> I don't understand this but I got it applied.
> >>>>>>>>
> >>>>>>>> I manually edited the saved email to delete the preinstall.am
> >>>>>>>> changes. I committed the rest. Then I ran bootstrap -p myself
> >>>>>>>> and folded that into the rest of your patch.
> >>>>>>>>
> >>>>>>>> It should all be committed now.
> >>>>>>>>
> >>>>>>> Thanks for doing this; I do not know what's wrong either. BTW, commits
> >>>>>>> have not been mirrored on github for the last 4 days.
> >>>>>>>
> >>>>>>>> How about some new test results. :)
> >>>>>>>>
> >>>>>>> I did run one last night, no big progress since previous results :( Is
> >>>>>>> there any tool, script, utility program or whatever that I can use to
> >>>>>>> detect wrong memory access (e.g., stack overwrite, heap corruption,
> >>>>>>> access to another task context)? I tried to add -fstack-protector-all
> >>>>>>> to gcc, but QEMU did not report anything or core-dump; ticker just hangs.
> >>>>>> I haven't checked into how gcc does its stack overwrite protection.
> >>>>>>
> >>>>>> The tests by themselves don't have these problems. The first
> >>>>>> possible source is incorrect layout of sections to memory by
> >>>>>> the linker script. There is some debug code in boot
> >>>>>>
> >>>>>> There used to be debug printk's in bspgetworkarea.c so you
> >>>>>> could check if areas overlapped. That usually causes bad issues
> >>>>>> though. Let's go through some basics:
> >>>>>>
> >>>>>> + Does hello world run and exit cleanly?
> >>>>>>
> >>>>> The output of Hello World is:
> >>>>>
> >>>>> *** BEGIN OF TEST HELLO WORLD ***
> >>>>> Hello World
> >>>>> *** END OF TEST HELLO WORLD ***
> >>>>> Fatal Error 5.0 Halted
> >>>>>
> >>>>> From GDB:
> >>>>>
> >>>>> Breakpoint 1, _Terminate (
> >>>>>     the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> >>>>>     the_error=0)
> >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> >>>>> 39 _ISR_Disable_without_giant( level );
> >>>>> (gdb) bt
> >>>>> #0  _Terminate (
> >>>>>     the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> >>>>>     the_error=0)
> >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> >>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
> >>>>>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
> >>>>> #2  0x0003b350 in _exit (status=0)
> >>>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
> >>>>> #3  0x0002cc30 in exit (code=0)
> >>>>>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> >>>>> #4  0x00002184 in Init (ignored=253816)
> >>>>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
> >>>>> #5  0x0002c5b8 in _Thread_Handler ()
> >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> >>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
> >>>>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> >>>> This is normal and OK. Look at the arguments to _Terminate.
> >>>>>> + How far does ticker get?
> >>>>>>
> >>>>> Ticker goes to the end:
> >>>>>
> >>>>> *** BEGIN OF TEST CLOCK TICK ***
> >>>>> TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988
> >>>>> TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988
> >>>>> TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:05 12/31/1988
> >>>>> TA2 - rtems_clock_get_tod - 09:00:10 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:10 12/31/1988
> >>>>> TA3 - rtems_clock_get_tod - 09:00:15 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:15 12/31/1988
> >>>>> TA2 - rtems_clock_get_tod - 09:00:20 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:20 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:25 12/31/1988
> >>>>> TA3 - rtems_clock_get_tod - 09:00:30 12/31/1988
> >>>>> TA2 - rtems_clock_get_tod - 09:00:30 12/31/1988
> >>>>> TA1 - rtems_clock_get_tod - 09:00:30 12/31/1988
> >>>>> *** END OF TEST CLOCK TICK ***
> >>>>> Fatal Error 9.276564 Halted
> >>>>>
> >>>>> From GDB:
> >>>>>
> >>>>> (gdb) break _Terminate
> >>>>> Breakpoint 1 at 0x19a68: file
> >>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
> >>>>> (gdb) break _OR1K_Exception_default
> >>>>> Breakpoint 2 at 0x2686c: file
> >>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
> >>>>> (gdb) c
> >>>>> The program is not being run.
> >>>>> (gdb) target remote :50001
> >>>>> Remote debugging using :50001
> >>>>> 0x00000100 in _reset ()
> >>>>> (gdb) c
> >>>>> Continuing.
> >>>>>
> >>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854)
> >>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> >>>>> 22 rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
> >>>>> (gdb) bt
> >>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854)
> >>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> >>>>> #1  0x00026980 in jump_to_c_handler ()
> >>>>> Backtrace stopped: frame did not save the PC
> >>>>>
> >>>>> vector 6 is the _unalign exception.
> >>>> Set a break point at exit() (I think) and rtems_shutdown_executive(). You
> >>>> could start in the task which calls whatever kicks off the shutdown
> >>>> sequence.
> >>>> It looks like something in the shutdown procedure trips over something.
> >>>> This might be easy to debug.
> >>>>
> >>> I just added a function call to _CPU_Exception_frame_print(frame);
> >>> from _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame *frame)
> >>> And ticker exits normally without even entering
> >>> _OR1K_Exception_default as it did before. This is very weird. Does this
> >>> mean that some areas of the code are overlapped from the linker
> >>> script?
> >> I doubt it. I suspect something uninitialized or not aligned properly.
> >>
> >> Set a breakpoint at
> >> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
> >> next over the print and then step through rtems_test_exit() and see
> >> where it faults.
> >>>> If the fault address is in the exception data, you can map that back to
> >>>> the nm file and see what file that was in, then that might help.
> >>>>>> + Have you tried the trick I suggested earlier to disable the
> >>>>>> real clock tick driver, use the simulator idle tick code, and
> >>>>>> disable all the tests that are known to fail that way. This
> >>>>>> will eliminate the ISR code as an issue because you won't
> >>>>>> have any (if console output is polled). See h8sim for
> >>>>>> an example. Should be a Makefile.am change, adding
> >>>>>> an include to the testsuite configuration file, building
> >>>>>> and running.
> >>>>>>

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985

_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel
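
Since no hardware watchpoint is available, the "break periodically and see where it is corrupt" approach from the thread needs some test for what counts as corrupt. The sketch below is one possible check, not anything from RTEMS. It assumes RAM starts at address 0 and is 32 MB long (the 0x2000000 figure quoted from linkcmds) and that rbtree node pointers are 4-byte aligned on this 32-bit target. RAM_SIZE and node_pointer_plausible are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: RAM occupies [0, 0x2000000), i.e. 32 MB starting at 0. */
#define RAM_SIZE 0x02000000u

/*
 * Return true if a 32-bit value could be an rbtree node pointer:
 * either NULL (no child), or a 4-byte aligned address inside RAM.
 * Assumption: node pointers are word aligned on this target.
 */
bool node_pointer_plausible( uint32_t addr )
{
  if ( addr == 0 ) {
    return true;            /* NULL child link */
  }

  if ( ( addr & 0x3u ) != 0 ) {
    return false;           /* not word aligned */
  }

  return addr < RAM_SIZE;   /* must lie inside the assumed RAM range */
}
```

With these assumptions, 0x40890 (the node argument in the backtrace) passes, while 0xa010002 fails on both counts: it lies above 0x2000000 and is not a multiple of 4, which would be consistent with the unaligned-access exception (vector 6) reported above. Applying the same test to the node's child pointers at each stop of the binary search would show which interval contains the stray write.
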