https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116521

            Bug ID: 116521
           Summary: missing optimization: xtensa tail-call
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsaxvc at gmail dot com
  Target Milestone: ---

With GCC 12.2.0 and -O2 -Wall -Wextra, the following code:

    #include <stdint.h>

    __attribute__ ((noinline)) uint32_t callee(uint32_t x, uint16_t y){
        return x + y;
    }

    __attribute__ ((noinline)) uint32_t caller(uint32_t x, uint32_t y){
        return callee(x, y);
    }

compiles to these xtensa instructions:

    callee:
            entry   sp, 32
            extui   a3, a3, 0, 16
            add.n   a2, a3, a2
            retw.n
    caller:
            entry   sp, 32
            extui   a11, a3, 0, 16
            mov.n   a10, a2
            call8   callee
            mov.n   a2, a10
            retw.n

If caller() were to tail-call callee(), the Xtensa code could be much closer to
the following ARM code (essentially, caller would not need to manipulate the
register windows at all):

    callee:
            add     r0, r0, r1
            bx      lr
    caller:
            uxth    r1, r1 //similar to extui, .., .., 0, 16
            b       callee
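The comment on `uxth` above says it is similar to `extui ..., ..., 0, 16`. A minimal C sketch makes that concrete: both instructions keep the low 16 bits of a register and zero the rest, which is exactly the conversion C applies when caller() passes its `uint32_t y` to callee()'s `uint16_t` parameter. The helper names `extui`, `uxth`, `to_u16_param` and `check_zero_extension` are hypothetical; the instruction semantics are simplified (EXTUI's field width is limited to 1..16 bits, and UXTH is modelled without its optional rotation).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of Xtensa EXTUI: extui ar, as, shift, bits
   ar = (as >> shift) & ((1 << bits) - 1), with 1 <= bits <= 16. */
uint32_t extui(uint32_t as, unsigned shift, unsigned bits)
{
    return (as >> shift) & ((UINT32_C(1) << bits) - 1u);
}

/* Simplified model of ARM UXTH (no rotation): zero-extend the low halfword. */
uint32_t uxth(uint32_t rm)
{
    return rm & 0xFFFFu;
}

/* The conversion C performs when a uint32_t argument is passed
   to a uint16_t parameter, as in caller() -> callee(). */
uint32_t to_u16_param(uint32_t y)
{
    return (uint16_t)y;
}

/* Check that the three agree on a few sample values. */
void check_zero_extension(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0xFFFFu, 0x10000u, 0x12345678u, 0xFFFFFFFFu
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(extui(samples[i], 0, 16) == to_u16_param(samples[i]));
        assert(uxth(samples[i]) == to_u16_param(samples[i]));
    }
}
```

For example, `extui(0x12345678u, 0, 16)` and `uxth(0x12345678u)` both yield `0x5678`.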

On Xtensa, this might mean that the arguments would have to be in different
registers in caller(); I'm not sure whether the caller or the callee is
responsible for rotating the window. This may only apply when both functions
take the same number of arguments. It's also possible I'm misunderstanding the
mechanism.
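The register moves in the Xtensa listing (`mov.n a10, a2` before `call8`, `mov.n a2, a10` after it) can be illustrated with a simplified C model of the windowed register file. It follows the Xtensa ISA description of the windowed option: `call8` records a window increment of 2 (8 registers) in PS.CALLINC, the callee's `entry` performs the rotation, and `retw` rotates back by the increment stored in the top two bits of a0. Under a call8 rotation, the caller's a10/a11 are the callee's a2/a3. This is a sketch under stated assumptions: 64 physical AR registers (some cores have 32), and no stack-pointer update, return-address encoding, or window overflow/underflow handling. All names (`struct xt_window`, `areg`, `model_call8`, `model_entry`, `model_retw`, `simulate_call8`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NAREG 64u /* assumption: 64 physical AR registers */

struct xt_window {
    uint32_t ar[NAREG];   /* physical register file */
    unsigned window_base; /* WindowBase, in units of 4 registers */
    unsigned callinc;     /* PS.CALLINC, set by CALLn */
};

/* Logical register aN in the current window. */
static uint32_t *areg(struct xt_window *w, unsigned n)
{
    return &w->ar[(w->window_base * 4u + n) % NAREG];
}

/* CALL8: record an increment of 2 (8 registers); return address omitted. */
static void model_call8(struct xt_window *w) { w->callinc = 2u; }

/* ENTRY: the callee rotates the window forward by CALLINC. */
static void model_entry(struct xt_window *w)
{
    w->window_base = (w->window_base + w->callinc) % (NAREG / 4u);
}

/* RETW: rotate back; real hardware reads inc from a0[31:30]. */
static void model_retw(struct xt_window *w, unsigned inc)
{
    w->window_base = (w->window_base + NAREG / 4u - inc) % (NAREG / 4u);
}

/* Replays the caller/callee listings from this report in the model. */
uint32_t simulate_call8(uint32_t x, uint32_t y)
{
    struct xt_window w = {0};
    *areg(&w, 2) = x;                         /* caller's incoming args */
    *areg(&w, 3) = y;
    *areg(&w, 11) = *areg(&w, 3) & 0xFFFFu;   /* extui a11, a3, 0, 16 */
    *areg(&w, 10) = *areg(&w, 2);             /* mov.n a10, a2 */
    model_call8(&w);                          /* call8 callee */
    model_entry(&w);                          /* entry sp, 32 */
    assert(*areg(&w, 2) == x);                /* caller's a10 is callee's a2 */
    *areg(&w, 3) = *areg(&w, 3) & 0xFFFFu;    /* extui a3, a3, 0, 16 */
    *areg(&w, 2) = *areg(&w, 3) + *areg(&w, 2); /* add.n a2, a3, a2 */
    model_retw(&w, 2u);                       /* retw.n */
    *areg(&w, 2) = *areg(&w, 10);             /* mov.n a2, a10 */
    return *areg(&w, 2);
}
```

In this model, which registers the callee sees as its arguments depends on the CALLn increment, which is why caller() copies x into a10 before `call8` and copies the result back from a10 afterwards.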
