On Tue, Sep 16, 2014 at 5:08 PM, Joel Sherrill
<joel.sherr...@oarcorp.com> wrote:
> Gedare.. cc'ed you for help in spotting an empty rbtree
> in gdb. See below.
> On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
>> Breakpoint 2, 0x00000600 in _unalign ()
>> (gdb) bt
>> #0  0x00000600 in _unalign ()
>> #1  0x0002ec4c in _RBTree_Next (node=0x40890, dir=RBT_RIGHT)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
>> #2  0x0002e2f4 in _RBTree_Successor (node=0x40890)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
>> #3  0x0002e8c0 in _RBTree_Extract (the_rbtree=0x4198c, the_node=0x40890)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
>> #4  0x00021524 in _RBTree_Get (the_rbtree=0x4198c, dir=RBT_LEFT)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
>> #5  0x000215c8 in _Thread_queue_Dequeue (the_thread_queue=0x4198c)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
>> #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
>>     id=436273153, api_semaphore_mp_support=0x0)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
>> #7  0x00014868 in rtems_semaphore_release (id=436273153)
>>     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
>> #8  0x00026cfc in rtems_libio_unlock ()
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
>> #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
>> #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
>> #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
>> #12 0x00029734 in rtems_libio_free (iop=0x49c50)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
>> #13 0x0002912c in close (fd=0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
>> #14 0x000064b0 in rtems_libio_exit ()
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
>> #15 0x0003b058 in _exit (status=0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
>> #16 0x00034798 in exit (code=0)
>>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>> #17 0x00002e3c in Test_task (unused=1)
>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
>> #18 0x000340f0 in _Thread_Handler ()
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>> #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>> Backtrace stopped: frame did not save the PC
>> (gdb)
>>
>>
>> It breaks at _RBTree_Next specifically at the following line:
>> while ( ( current = current->child[ opp_dir ] ) != NULL )
>>
>> (gdb) p current->child[ opp_dir ]
>> Cannot access memory at address 0xa010006
>> (gdb) p current
>> $1 = (RBTree_Node *) 0xa010002
> These look like object ids.
>> This address is invalid, the current memory length should be only 32
>> MB (0x2000000)
>>
>> http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
>>
>> So I guess current->child is overwritten somehow?
> Yep. Two approaches.
>
> + Set a watchpoint in gdb if it is supported. But even if supported,
> it will likely slow the run tremendously.
> + Break selectively and more or less binary search for where it is
> overwritten.
> I would break at the first call to _ISR_Dispatch
> (or whatever you called it) and see if it gets clobbered.
>
> That could be clobbered VERY early in the program. It could be
> a blown stack. But it could just be a stray write. Check the value
> of that semaphore's rbtree when you get to Init and just
> break periodically and see where it is corrupt.
>
> I cc'ed Gedare because I don't know how to spot that the rbtree
> is empty in gdb.
>
You should be able to watch one of the pointers from the
rbtree_control. I think there is a check for rbtree_is_empty that
would also tell you what to do. I don't have the code in front of me
to check.

-Gedare
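
For illustration only, the two checks discussed above (is the tree
empty, and does a node pointer look sane) can be sketched in C as
below. This is not the RTEMS code. The Sketch_* types and sketch_*
functions are hypothetical, the "root is NULL when empty" rule is an
assumption about how such a control structure is usually laid out, and
the RAM window assumes memory starts at address 0 with the 32 MB
(0x2000000) length quoted from the linker script. Addresses are passed
as plain 32-bit integers so the sketch also builds on a host machine.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real RTEMS structures may differ. */
typedef struct Sketch_Node {
  struct Sketch_Node *parent;
  struct Sketch_Node *child[2];
} Sketch_Node;

typedef struct {
  Sketch_Node *root;     /* assumed to be NULL when the tree is empty */
  Sketch_Node *first[2]; /* assumed shortcuts to the min and max nodes */
} Sketch_Control;

/* Assumed RAM window: 32 MB starting at address 0 (length from the thread). */
#define SKETCH_RAM_SIZE 0x02000000u

/* An empty tree has no root node (assumption, see above). */
static bool sketch_tree_is_empty(const Sketch_Control *tree)
{
  return tree->root == NULL;
}

/* True if addr lies inside RAM and is aligned for a 32-bit load. */
static bool sketch_address_is_plausible(uint32_t addr)
{
  return addr < SKETCH_RAM_SIZE && (addr % 4u) == 0u;
}
```

With the values from the gdb session, 0xa010002 fails both tests: it
is above 0x2000000 and it is not a multiple of 4. That is consistent
with the alignment exception raised when current->child[ opp_dir ] is
read at 0xa010006. The node address 0x40890 from the backtrace passes.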
> You need to see where that memory is overwritten.
>
> Again, running all tests with the simulator clock tick could
> eliminate the ISR code as the culprit. :)
>> On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
>> <joel.sherr...@oarcorp.com> wrote:
>>> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
>>>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
>>>> <joel.sherr...@oarcorp.com> wrote:
>>>>>
>>>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
>>>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
>>>>>> <joel.sherr...@oarcorp.com> wrote:
>>>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
>>>>>>>> <joel.sherr...@oarcorp.com> wrote:
>>>>>>>>> I don't understand this but I got it applied.
>>>>>>>>>
>>>>>>>>> I manually edited the saved email to delete the preinstall.am
>>>>>>>>> changes. I committed the rest. Then I ran bootstrap -p myself
>>>>>>>>> and folded that into the rest of your patch.
>>>>>>>>>
>>>>>>>>> It should all be committed now.
>>>>>>>>>
>>>>>>>> Thanks for doing this; I do not know what's wrong either. BTW, commits
>>>>>>>> have not been mirrored on github for the past 4 days.
>>>>>>>>
>>>>>>>>> How about some new test results. :)
>>>>>>>>>
>>>>>>>> I did run one last night, no big progress since previous results :( Is
>>>>>>>> there any tool, script, utility program or whatever that I can use to
>>>>>>>> detect wrong memory access (e.g., stack overwrite, heap corruption,
>>>>>>>> access to another task context)? I tried to add -fstack-protector-all
>>>>>>>> to gcc, but QEMU did not report anything or core-dump; ticker just hangs.
>>>>>>> I haven't checked into how gcc does its stack overwrite protection.
>>>>>>>
>>>>>>> The tests by themselves don't have these problems. The first
>>>>>>> possible source is incorrect layout of sections to memory by
>>>>>>> the linker script.
>>>>>>> There is some debug code in boot
>>>>>>>
>>>>>>> There used to be debug printk's in bspgetworkarea.c so you
>>>>>>> could check if areas overlapped. That usually causes bad issues
>>>>>>> though. Let's go through some basics:
>>>>>>>
>>>>>>> + Does hello world run and exit cleanly?
>>>>>>>
>>>>>> The output of Hello World is:
>>>>>>
>>>>>> *** BEGIN OF TEST HELLO WORLD ***
>>>>>> Hello World
>>>>>> *** END OF TEST HELLO WORLD ***
>>>>>> Fatal Error 5.0 Halted
>>>>>>
>>>>>> From GDB:
>>>>>>
>>>>>> Breakpoint 1, _Terminate (the_source=RTEMS_FATAL_SOURCE_EXIT,
>>>>>>     is_internal=false, the_error=0)
>>>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>>> 39 _ISR_Disable_without_giant( level );
>>>>>> (gdb) bt
>>>>>> #0  _Terminate (the_source=RTEMS_FATAL_SOURCE_EXIT,
>>>>>>     is_internal=false, the_error=0)
>>>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
>>>>>>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
>>>>>> #2  0x0003b350 in _exit (status=0)
>>>>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
>>>>>> #3  0x0002cc30 in exit (code=0)
>>>>>>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>>>>>> #4  0x00002184 in Init (ignored=253816)
>>>>>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
>>>>>> #5  0x0002c5b8 in _Thread_Handler ()
>>>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>>>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
>>>>>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>>>>> This is normal and OK. Look at the arguments to _Terminate.
>>>>>>> + How far does ticker get?
>>>>>>>
>>>>>> Ticker goes to the end:
>>>>>>
>>>>>> *** BEGIN OF TEST CLOCK TICK ***
>>>>>> TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>>>> TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>>>> TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:05 12/31/1988
>>>>>> TA2 - rtems_clock_get_tod - 09:00:10 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:10 12/31/1988
>>>>>> TA3 - rtems_clock_get_tod - 09:00:15 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:15 12/31/1988
>>>>>> TA2 - rtems_clock_get_tod - 09:00:20 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:20 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:25 12/31/1988
>>>>>> TA3 - rtems_clock_get_tod - 09:00:30 12/31/1988
>>>>>> TA2 - rtems_clock_get_tod - 09:00:30 12/31/1988
>>>>>> TA1 - rtems_clock_get_tod - 09:00:30 12/31/1988
>>>>>> *** END OF TEST CLOCK TICK ***
>>>>>> Fatal Error 9.276564 Halted
>>>>>>
>>>>>> From GDB:
>>>>>>
>>>>>> (gdb) break _Terminate
>>>>>> Breakpoint 1 at 0x19a68: file ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
>>>>>> (gdb) break _OR1K_Exception_default
>>>>>> Breakpoint 2 at 0x2686c: file ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
>>>>>> (gdb) c
>>>>>> The program is not being run.
>>>>>> (gdb) target remote :50001
>>>>>> Remote debugging using :50001
>>>>>> 0x00000100 in _reset ()
>>>>>> (gdb) c
>>>>>> Continuing.
>>>>>>
>>>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854)
>>>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>>> 22 rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
>>>>>> (gdb) bt
>>>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854)
>>>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>>> #1  0x00026980 in jump_to_c_handler ()
>>>>>> Backtrace stopped: frame did not save the PC
>>>>>>
>>>>>> vector 6 is the _unalign exception.
>>>>> Set a break point at exit() (I think) and rtems_shutdown_executive(). You
>>>>> could start in the task which calls whatever kicks off the shutdown
>>>>> sequence.
>>>>> It looks like something in the shutdown procedure trips over something.
>>>>> This might be easy to debug.
>>>>>
>>>> I just added a call to _CPU_Exception_frame_print(frame); in
>>>> _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame *frame),
>>>> and ticker now exits normally without even entering
>>>> _OR1K_Exception_default as it did before. This is very weird. Does this
>>>> mean that some areas of the code overlap because of the linker
>>>> script?
>>> I doubt it. I suspect something uninitialized or not aligned properly.
>>>
>>> Set a breakpoint at
>>> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
>>> next over the print and then step through rtems_test_exit() and see
>>> where it faults.
>>>>> If the fault address is in the exception data, you can map that back to
>>>>> the nm file and see what file that was in, then that might help.
>>>>>>> + Have you tried the trick I suggested earlier to disable the
>>>>>>> real clock tick driver, use the simulator idle tick code, and
>>>>>>> disable all the tests that are known to fail that way?
>>>>>>> This will eliminate the ISR code as an issue because you won't
>>>>>>> have any (if console output is polled). See h8sim for
>>>>>>> an example. Should be a Makefile.am change, adding
>>>>>>> an include to the testsuite configuration file, building
>>>>>>> and running.
>>>>>>>
>>> --
>>> Joel Sherrill, Ph.D.             Director of Research & Development
>>> joel.sherr...@oarcorp.com        On-Line Applications Research
>>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>>> Support Available                (256) 722-9985
>>>