https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172
Bug ID: 92172
Summary: ARM Thumb2 frame pointers inconsistent with clang and ARM-THUMB Procedure Call Standard
Product: gcc
Version: 8.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: sethml at ofb dot net
Target Milestone: ---

This is a bit of a feature request, which has been rejected before, but I think there are compelling reasons to reconsider. The issue is described pretty well in this gcc-patches thread:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg195725.html
And in this clang bug:
https://bugs.llvm.org/show_bug.cgi?id=18505

The request is to provide an option to make gcc's frame pointer behavior consistent with clang, either with a special flag, or by default.

The behavior of frame pointers on ARM is a mess, with AAPCS not defining it, the obsolete ARM-Thumb Procedure Call Standard (ATPCS) recommending a frame layout different from the one GCC and clang use, and ARM's obsolete armcc compiler implementing different semantics. However, as of 2014, ARM's standard toolchain is "ARM Compiler 6", which packages clang:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.swdev.comp6/index.html
The Keil embedded toolchain, which is pretty industry-standard for ARM embedded development, uses armclang:
http://www.keil.com/support/man/docs/armclang_ref/armclang_ref_vvi1466179578564.htm

Addressing some of the objections to modifying the frame layout from the gcc-patches thread:

Wilco Dijkstra <https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg195782.html>:
> However changing the frame pointer like in the proposed patch
> will have a much larger cost - both in performance and codesize. You'd be
> lucky if it is less than 10%. This is due to placing the frame pointer at the
> top rather than the bottom of the frame, and that is very inefficient in
> Thumb-2.

I don't understand this objection.
For a simple function the additional overhead is literally nothing - for example <https://godbolt.org/z/BhvM2t>, GCC generates:

    push {r3, r4, r7, lr}
    add r7, sp, #0

while clang adds a small constant to make r7 point to the previous r7 on the stack, with lr immediately above - zero overhead:

    push {r4, r6, r7, lr}
    add r7, sp, #8

For a more complex function where the compiler has to spill r8-r11, one extra instruction is required to generate the right frame layout - gcc generates:

    push {r3, r4, r5, r6, r7, r8, r9, lr}
    add r7, sp, #0

While clang generates:

    push {r4, r5, r6, r7, lr}
    add r7, sp, #12
    push.w {r8, r9, r11}

Push (stmdb) instructions take, at least on Cortex-M3, 1+N cycles, where N is the number of registers saved. So clang's frame pointer approach takes one extra cycle and 4 extra bytes.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337e/BABBCJII.html

> Doing real unwinding is also far more accurate than frame pointer based
> unwinding (the latter doesn't handle leaf functions correctly, entry/exit in
> non-leaf functions and shrinkwrapped functions - and this breaks callgraph
> profiling).

This is true, but doing real unwinding is prohibitively expensive in an embedded systems context, in which one has only hundreds of KiB of code storage and RAM.

Richard Earnshaw <https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg196444.html>:
> I object to another hack going in for another ill-specified frame
> pointer variant until such time as the ABI is updated to sort this out
> properly.
>
> So until the ABI sanctions a proper inter-function frame chain record,
> GCC will only support local use of the frame pointer and no chaining.

Since this is not defined by the ABI, the ABI is unlikely to specify it any time soon. However, ARM seems to have blessed clang as the official ARM compiler, so it's a de facto standard at this point.

Richard Earnshaw <https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg196488.html>:
> On entry to a function the code has to save the existing frame register.
> It doesn't know (can't trivially know) whether the caller is code
> compiled in Arm state or Thumb state. So how can it save the caller's
> frame register if they are not the same?
>
> Furthermore, the 'other' frame register (ie r7 in Arm state, r11 in
> Thumb) is available as a call-saved register, so can contain any random
> value. If you try to use that random value during a frame chain walk
> your program will most like take an access violation. It will certainly
> give you a garbage frame chain.

This is true - you cannot safely walk the stack frames if Thumb and ARM functions are intermixed. However, for the situations in which this feature is most useful, this is not a problem. For deeply embedded codebases, the entire codebase is compiled with a single compiler and instruction set. Most microcontrollers use a Cortex-M instruction set, which doesn't even implement ARM instructions, so by definition they will not be present!

Someone wrote something like:
> The extra overhead of frame pointers will remove the benefit of thumb
> instructions -
> why not just use ARM instructions?

As noted above, there exist many MCUs for which ARM mode is not implemented.

I have two applications motivating me to want this fixed. I'm working on safety-critical firmware running on small microcontrollers.

1) In case of a crash, it would be extremely helpful to be able to have the embedded firmware relay back a simple stack trace. Integrating libunwind and including the unwind tables in our firmware is too heavyweight. We know the boundaries of the stack, so it's easy to validate addresses when traversing frames. If the stack trace sometimes ends early due to issues such as ARM/Thumb interworking, we don't mind - it's much better than no trace at all.
2) It would be really helpful to have random sampling profiling, by capturing stack traces from a randomly triggered timer interrupt handler. Full profiling would add excessive overhead.

I'm totally willing to take a slight performance hit to get the two features above. Judging from stackoverflow questions and such, there are others who would like predictable frame pointers:
https://stackoverflow.com/questions/19643047/arm-call-stack-generation-with-no-frame-pointer
http://cplusadd.blogspot.com/2008/11/frame-pointers-and-function-call.html
https://gcc-help.gcc.gnu.narkive.com/D8BDrQzp/stack-backtrace-for-arm-thumb
https://github.com/google/sanitizers/issues/640
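
To make application 1) concrete, here is a minimal sketch of the kind of bounds-checked frame walk I have in mind. It assumes the clang-style layout discussed earlier (the frame pointer points at the saved caller frame pointer, with the saved lr in the next word). The name walk_frames and its parameters are hypothetical and not part of any toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a chain of frame records and collect return addresses.
 *
 * ASSUMPTION: each frame pointer points at two consecutive words:
 *   word 0 = caller's frame pointer, word 1 = saved return address (lr).
 * This matches the clang Thumb layout described in this report; it is
 * not what GCC currently emits for Thumb.
 *
 * Returns the number of return addresses written to out[].
 */
size_t walk_frames(uintptr_t fp, uintptr_t stack_lo, uintptr_t stack_hi,
                   uintptr_t *out, size_t max_depth)
{
    size_t n = 0;

    if (out == NULL)
        return 0;

    while (n < max_depth) {
        /* Reject misaligned frame pointers. */
        if (fp % sizeof(uintptr_t) != 0)
            break;
        /* The whole two-word record must lie inside [stack_lo, stack_hi). */
        if (fp < stack_lo || fp > stack_hi ||
            stack_hi - fp < 2 * sizeof(uintptr_t))
            break;

        const uintptr_t *rec = (const uintptr_t *)fp;
        out[n++] = rec[1];          /* saved lr */

        /* The stack grows down, so a caller's record must sit at a
         * strictly higher address. This also guarantees termination
         * if the chain is corrupted into a loop. */
        uintptr_t next = rec[0];
        if (next <= fp)
            break;
        fp = next;
    }
    return n;
}
```

On a target, the starting value could come from __builtin_frame_address(0) (available in both GCC and clang), but only if the compiler actually emits the frame record layout assumed above. A trace that stops early on a bad pointer is the intended behavior: a short trace is acceptable, a fault inside the crash handler is not.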