https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116521
Bug ID: 116521
Summary: missing optimization: xtensa tail-call
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rsaxvc at gmail dot com
Target Milestone: ---

On GCC 12.2.0 with -O2 -Wall -Wextra, the following code:

    #include <stdint.h>
    __attribute__ ((noinline)) uint32_t callee(uint32_t x, uint16_t y){ return x + y; }
    __attribute__ ((noinline)) uint32_t caller(uint32_t x, uint32_t y){ return callee(x, y); }

compiles to these xtensa instructions:

    callee:
        entry   sp, 32
        extui   a3, a3, 0, 16
        add.n   a2, a3, a2
        retw.n
    caller:
        entry   sp, 32
        extui   a11, a3, 0, 16
        mov.n   a10, a2
        call8   callee
        mov.n   a2, a10
        retw.n

If caller() tail-called callee(), the code could be much closer to the following ARM output (essentially, caller() would not need to manipulate the register windows):

    callee:
        add     r0, r0, r1
        bx      lr
    caller:
        uxth    r1, r1      // similar to extui .., .., 0, 16
        b       callee

On xtensa, this might mean that the arguments are in different registers in caller(); I'm not sure whether the caller or the callee is responsible for rotating the window. This may only apply when both functions take the same number of arguments. It's also possible I'm misunderstanding the mechanism.
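On the question of who rotates the window: in the Xtensa windowed register option, CALL8 only records a window increment of 2 (8 registers), in PS.CALLINC and in the top two bits of the return address it writes to a8. The callee's ENTRY instruction performs the actual rotation, so the caller's a8/a10/a11 become the callee's a0/a2/a3, and RETW reads the increment back from a0 to rotate back. The C sketch below is a toy model of that mechanism, replaying the two listings above. All names (xtensa_model, ar, op_call8, op_entry, op_retw, model_caller, model_callee, run_caller) are hypothetical, 64 physical address registers is an assumed configuration, and the model deliberately omits window overflow/underflow exceptions, the stack-frame allocation done by ENTRY, and the actual return PC.

```c
#include <stdint.h>

/* Toy model of the Xtensa windowed register file (hypothetical names).
 * Assumes 64 physical AR registers; window_base counts in units of 4.
 * Omits overflow/underflow exceptions, stack allocation and the return PC. */
#define NAREG 64

typedef struct {
    uint32_t phys[NAREG];   /* physical AR registers */
    unsigned window_base;   /* current window, in units of 4 registers */
    unsigned callinc;       /* stands in for PS.CALLINC: set by CALLn, used by ENTRY */
} xtensa_model;

/* Logical register a<n> as seen by the currently executing function. */
static uint32_t *ar(xtensa_model *m, unsigned n) {
    return &m->phys[(m->window_base * 4 + n) % NAREG];
}

/* CALL8: records increment 2 and puts it in the top bits of a8 (the
 * callee's future a0). No rotation happens here. */
static void op_call8(xtensa_model *m) {
    m->callinc = 2;
    *ar(m, 8) = 2u << 30;   /* return PC omitted; only the increment bits */
}

/* ENTRY (executed by the callee): rotates the window by CALLINC. */
static void op_entry(xtensa_model *m) {
    m->window_base = (m->window_base + m->callinc) % (NAREG / 4);
}

/* RETW: decodes the increment from a0 and rotates back to the caller. */
static void op_retw(xtensa_model *m) {
    unsigned inc = *ar(m, 0) >> 30;
    m->window_base = (m->window_base + NAREG / 4 - inc) % (NAREG / 4);
}

/* callee: entry; extui a3, a3, 0, 16; add.n a2, a3, a2; retw.n */
static void model_callee(xtensa_model *m) {
    op_entry(m);
    *ar(m, 3) &= 0xFFFFu;
    *ar(m, 2) = *ar(m, 3) + *ar(m, 2);
    op_retw(m);
}

/* caller: entry; extui a11, a3, 0, 16; mov.n a10, a2; call8 callee;
 * mov.n a2, a10; retw.n */
static void model_caller(xtensa_model *m) {
    op_entry(m);
    *ar(m, 11) = *ar(m, 3) & 0xFFFFu;   /* outgoing args go in a10/a11 */
    *ar(m, 10) = *ar(m, 2);
    op_call8(m);
    model_callee(m);                   /* callee sees them as a2/a3 */
    *ar(m, 2) = *ar(m, 10);            /* callee's a2 result is our a10 */
    op_retw(m);
}

/* Runs caller(x, y) as if reached by CALL8 from window 0. */
static uint32_t run_caller(uint32_t x, uint32_t y) {
    xtensa_model m = {0};
    *ar(&m, 10) = x;
    *ar(&m, 11) = y;
    op_call8(&m);
    model_caller(&m);
    return *ar(&m, 10);
}
```

For example, run_caller(1, 0x12345) returns 9030 (1 + 0x2345), because only the low 16 bits of y survive the extui. Since the rotation is done by the callee's ENTRY and undone by its RETW using its own a0, the model suggests why an ARM-style `b callee` from inside caller() has no direct counterpart in the windowed ABI. The sketch does not attempt to model a tail call.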