https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92172
--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Seth LaForge from comment #2)
> Good point on frame pointers vs a frame chain for unwinding. I'm looking for
> the unwindable frame chain.
>
> Wilco:
> > Why does this matter? Well as your examples show, if you want to emit a
> > frame chain using standard push/pop, it typically ends up pointing to the
> > top of the frame. That is the worst possible position for a frame pointer
> > on Thumb - while Arm supports negative immediate offsets up to 4KB, Thumb-1
> > doesn't support negative offsets at all, and Thumb-2 supports offsets up to
> > -255 but only with 32-bit instructions. So the result of conflating the
> > frame chain and frame pointer implies a terrible codesize hit for Thumb.
>
> Well, there's really no need for a frame pointer for efficiency, since the
> stack frame can be efficiently accessed with positive immediate accesses
> relative to the stack pointer. There are even special encodings for Thumb-2
> 16-bit LDR/STR which allow an immediate offset of 0 to 1020 when relative to
> SP - much larger than other registers. You're saying using a frame pointer
> implies a terrible codesize hit for Thumb, but I don't see how that can be -
> stack access will continue to go through SP, and the only code size hit
> should be pushing/popping R7 (~2 cycles), computing R7 as a frame pointer
> (~1 cycle), and potential register spills due to one less register
> available. That's a pretty small amount of overhead for a non-leaf function.

On GCC10 the codesize overhead of -fno-omit-frame-pointer is 4.1% for Arm and
4.8% for Thumb-2 (measured on SPEC2006). That's already a large overhead,
especially since this feature doesn't do anything useful besides adding
overhead...

The key is that GCC uses the frame pointer for every stack access, and thus the
placement of the frame pointer within a frame matters.
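
To make the encoding argument concrete, here is a rough sketch in C of how many
bytes a single word LDR/STR needs for a given base register and offset. The
function name access_size and the enum are hypothetical, invented for this
illustration; the ranges are the ones quoted above plus my understanding of the
16-bit Thumb forms (0 to 124 for a low base register, 0 to 1020 for SP). It
ignores byte, halfword and doubleword accesses, high registers and writeback
forms, so treat it as a model of the argument, not of what GCC actually emits.

```c
#include <assert.h>
#include <stdbool.h>

enum isa { ISA_ARM, ISA_THUMB1, ISA_THUMB2 };

/* Bytes needed for one word LDR/STR [base, #offset]; 0 means the offset
 * does not fit a single load/store and needs extra address arithmetic. */
int access_size(enum isa isa, bool base_is_sp, int offset)
{
    assert(offset % 4 == 0);   /* sketch covers aligned word slots only */

    /* 16-bit Thumb forms: imm8*4 relative to SP, imm5*4 otherwise. */
    bool fits16 = offset >= 0 && offset <= (base_is_sp ? 1020 : 124);

    switch (isa) {
    case ISA_ARM:    /* 12-bit immediate, added or subtracted */
        return (offset >= -4095 && offset <= 4095) ? 4 : 0;
    case ISA_THUMB1: /* only the 16-bit forms exist */
        return fits16 ? 2 : 0;
    case ISA_THUMB2: /* 16-bit if possible, else 32-bit: +4095 or -255 */
        if (fits16)
            return 2;
        return (offset >= -255 && offset <= 4095) ? 4 : 0;
    }
    return 0;
}
```

In this model access_size(ISA_THUMB2, false, 8) is 2 bytes while
access_size(ISA_THUMB2, false, -8) is 4, and the Thumb-1 case with a negative
offset has no single-instruction encoding at all.
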
Thumb compilers place the frame pointer at the bottom of the frame so they can
efficiently access locals using positive offsets. Despite that, the overhead is
already significant. If GCC emitted a frame chain like the LLVM sequence, it
would have to place the frame pointer at the top of the frame. That forces
negative frame offsets for all frame accesses. Getting a 10% overhead would be
lucky, I've seen worse...

So this is something that needs to be properly designed and carefully
implemented.

> Baseline: With gcc 4.7, -fomit-frame-pointer, -mthumb: 384016 bytes, 110.943
> s.

Thanks for posting actual numbers, but GCC 4.7?!? It might be time to try
GCC9...