[Bug target/105733] riscv: Poor codegen for large stack frames

wilson at gcc dot gnu.org via Gcc-bugs Mon, 06 Jun 2022 19:41:09 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733


Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson <wilson at gcc dot gnu.org> ---
This is a known problem that I have commented on before.  For instance, here:
    https://github.com/riscv-collab/riscv-gcc/issues/193

Copying my reply to here since this is a better place for the bug report:

This is a known problem. GCC first emits code using a frame pointer, and then
tries to eliminate the frame pointer during register allocation. When the frame
size is larger than the immediate field of an add, we need temporaries to
compute frame offsets. Compiler optimizations like common subexpression
elimination (cse) try to optimize the calculation of the temporaries. Then when
we try to eliminate the frame pointer, we run into trouble because the cse opt
changes interfere with the frame pointer elimination, and we end up with an
ugly inefficient mess.

This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and
most everyone else has 16-bit immediates, we hit the problem sooner, which
makes it much more visible for RISC-V. The example above is putting 12KB on the
stack, which is larger than 12 bits of range (4KB), but well within 16 bits of
range (64KB).
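
As a rough sanity check on those numbers, here is a small Python sketch. It is
not part of the original report. It assumes the immediates are signed
two's-complement fields, so the reach in each direction is half the total
range; the helper names simm_range and fits are made up for illustration.

```python
def simm_range(bits):
    """Return (lowest, highest) value of a signed two's-complement field."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(offset, bits):
    """True if offset can be encoded directly in a signed field of that width."""
    lowest, highest = simm_range(bits)
    return lowest <= offset <= highest


FRAME = 12 * 1024  # the 12KB frame discussed above

print(simm_range(12))   # (-2048, 2047)
print(simm_range(16))   # (-32768, 32767)
print(fits(FRAME, 12))  # False: needs a temporary register
print(fits(FRAME, 16))  # True: a 16-bit immediate would reach it
```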

I have a prototype of a patch to fix it by allowing >12-bit offsets for frame
pointer references, and then fixing this when eliminating the frame pointer,
but this can cause other problems, and needs more work to be usable. I have no
idea when someone might try to finish the patch.

End of inclusion from github.

To improve my earlier answer, we have signed 12-bit immediates, so the trouble
actually starts at 2048 bytes, and Jessica's example is larger than that. 
Reduce BUFSIZ to 2044 and you get a reasonable result.

foo:
        li      a5,4096
        addi    a5,a5,-2048
        addi    sp,sp,-2048
        add     a5,a5,a0
        add     a0,a5,sp
        li      t0,4096
        sb      zero,-2048(a0)
        addi    t0,t0,-2048
        add     sp,sp,t0
        jr      ra

There is still a bit of oddness here.  We are adding 2048 to a0 and then using
an address -2048(a0).  I think that is more cse trouble.  2048 requires two
instructions to load into a register, which is likely confusing something
somewhere.
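
The li/addi pairs in the listing above follow from how a constant is split into
an upper part and a signed 12-bit lower part. The Python sketch below
illustrates that arithmetic only; it is not GCC's code, and the names
fits_simm12 and split_hi_lo are hypothetical. It shows why -2048 fits in a
single addi while +2048 does not.

```python
def fits_simm12(value):
    """True if value fits a signed 12-bit immediate, as used by addi."""
    return -2048 <= value <= 2047


def split_hi_lo(value):
    """Split value into (hi, lo) with hi + lo == value.

    lo is a signed 12-bit value and hi is a multiple of 4096, which is
    the shape of a "load upper part, then addi" sequence.
    """
    lo = ((value + 2048) & 0xFFF) - 2048
    hi = value - lo
    return hi, lo


print(fits_simm12(-2048))  # True:  addi sp,sp,-2048 is one instruction
print(fits_simm12(2048))   # False: +2048 needs a temporary register
print(split_hi_lo(2048))   # (4096, -2048), i.e. li 4096 then addi -2048
print(split_hi_lo(2044))   # (0, 2044), a single addi is enough
```

This matches the asymmetry in the listing: the prologue adjusts sp by -2048
directly, while the epilogue builds +2048 in t0 first.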
