On 15/9/20 8:58 pm, small...@aliyun.com wrote:
> I am developing applications in rtems 5.1. As we know, my application and
> rtems kernel are both in the same address space.
> So if my application access an invalid address or encounter other fatal
> errors, I want the kernel not just being hunging, but create a core dump file.
> This file contains the whole contents of memory and I could use a debuger to
> analyse the file to handle the bug.
> The question arise because I do not want always debug rtems in the bsp.

This is an interesting question. For production units I think capturing and reporting an error is important, but a full core is not worth the effort.

Core images can be saved with a single address space OS. I remember Cisco's single address space OS for routers from 20 years ago could capture a complete core that could be loaded by gdb. Those devices had a CompactFlash card installed to capture the core, and I suppose their users did not mind the wait while the core was saved.

As others have explained, capturing the full address space and saving it in a form gdb could be taught to load is difficult. You need to put aside some memory to construct the core image as you save it, and you need small standalone drivers and whatever else it takes to get the image off the target and saved. RTEMS itself cannot be used for this because it is the system that has failed. Where this approach gets hard is when you start to consider hardware failure type issues.

My preferred solution is to add a small storage area called the Run Time Error (RTE) store. This is a piece of RAM that survives a reset or reboot and is not part of the RTEMS memory map. Internal SoC memory can often be enough. The memory must not be cleared or corrupted during reset. The struct is something like:

  typedef struct
  {
    uint32_t type;    /* The type of error in this trace buffer. */
    uint32_t count;   /* The number of times we have had an error. */
    uint64_t uptime;  /* The period of time we have been up. */
    union
    {
      error_trace_fatal  fatal;   /* A fatal error. */
      error_trace_assert assert;  /* An assert error. */
      error_trace_error  error;   /* An error code. */
    } error;
    uint32_t crc;     /* Checksum */
  } error_trace;

You provide a struct for a fatal error, an assert or an error. It is then a matter of hooking the error handlers and saving the data.

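A minimal sketch of how a record like this might be sealed with its checksum and validated again, under some assumptions. The struct above only names the field crc, so the choice of CRC-32 computed over every byte before that field is an assumption. The names rte_record, rte_crc32, rte_seal, rte_is_valid, rte_has_error and rte_clear are made up for illustration, and the payload array is a simplified stand-in for the union:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record: payload stands in for the error union. */
typedef struct
{
  uint32_t type;       /* The type of error in this record. */
  uint32_t count;      /* The number of errors captured. */
  uint64_t uptime;     /* Uptime when the error was captured. */
  uint32_t payload[5]; /* Stand-in for the fatal/assert/error data. */
  uint32_t crc;        /* CRC-32 of every byte before this field. */
} rte_record;

/* The fields are sized so there is no padding before crc. */
static_assert(offsetof(rte_record, crc) == 36, "unexpected padding");

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table. */
static uint32_t rte_crc32(const void *data, size_t len)
{
  const uint8_t *p = data;
  uint32_t crc = 0xFFFFFFFFu;
  while (len-- > 0) {
    crc ^= *p++;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}

/* Store the checksum once every other field has been filled in. */
static void rte_seal(rte_record *rec)
{
  rec->crc = rte_crc32(rec, offsetof(rte_record, crc));
}

/* Non-zero if the stored checksum matches the record contents. */
static int rte_is_valid(const rte_record *rec)
{
  return rec->crc == rte_crc32(rec, offsetof(rte_record, crc));
}

/* Non-zero if the record is intact and holds at least one error. */
static int rte_has_error(const rte_record *rec)
{
  return rte_is_valid(rec) && rec->count != 0;
}

/* Reset the record to a valid, empty state once it has been reported. */
static void rte_clear(rte_record *rec)
{
  memset(rec, 0, sizeof(*rec));
  rte_seal(rec);
}
```

In a real system the rte_record pointer would refer to the reserved RAM rather than an ordinary variable. The bitwise CRC is slow but needs no table and no library support, which suits code that runs while the system is failing.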
The fatal error is something like:

  typedef struct
  {
    rtems_fatal_source  source;
    uint32_t            internal;
    rtems_fatal_code    code;
    CPU_Exception_frame frame;
    uint32_t            stack[ET_STACK_SIZE];
  } error_trace_fatal;

Hook the fatal error handler, fill in the fields including the crc, then reset the board. Limit the code you call before the reset.

When RTEMS starts, get your application to check the RTE store. If the checksum is valid, check the error count. If the error count is not zero you have captured an error. You now have a working RTEMS that can be used to process and save the error.

I have production systems that save errors to a JFFS2 disk, and a web interface can be used to download them. I also have systems that send the data to a syslog type server when the devices are networked. In those systems it is really important to capture _every_ reset.

Finding the error in the code is a matter of getting the PC address and the ELF executable image with the DWARF debug information and using `objdump -d --source`. Disassemble the executable with source and search for the PC address. The report points to the location, and the dumped registers will help you see what the issue is.

Most of the time the issue can be found, or investigated further and resolved, but sometimes it cannot be found directly. In those cases you need to stress the system in a lab to expose the crash and then investigate. The key information is that the crash happened and where.

Chris