https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105733
Jim Wilson <wilson at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilson at gcc dot gnu.org

--- Comment #2 from Jim Wilson <wilson at gcc dot gnu.org> ---
This is a known problem that I have commented on before. For instance here
https://github.com/riscv-collab/riscv-gcc/issues/193

Copying my reply to here since this is a better place for the bug report:

This is a known problem. GCC first emits code using a frame pointer, and then tries to eliminate the frame pointer during register allocation. When the frame size is larger than the immediate field of an add, we need temporaries to compute frame offsets. Compiler optimizations like common subexpression elimination (cse) try to optimize the calculation of the temporaries. Then when we try to eliminate the frame pointer, we run into trouble because the cse opt changes interfere with the frame pointer elimination, and we end up with an ugly inefficient mess.

This problem isn't RISC-V specific, but since RISC-V has 12-bit immediates and most everyone else has 16-bit immediates, we hit the problem sooner, which makes it much more visible for RISC-V. The example above is putting 12KB on the stack, which is larger than 12-bits of range (4KB), but well within 16-bits of range (64KB).

I have a prototype of a patch to fix it by allowing >12-bit offsets for frame pointer references, and then fixing this when eliminating the frame pointer, but this can cause other problems, and needs more work to be usable. I have no idea when someone might try to finish the patch.

End of inclusion from github.

To improve my earlier answer, we have signed 12-bit immediates, so the trouble actually starts at 2048 bytes, and Jessica's example is larger than that. Reduce BUFSIZ to 2044 and you get a reasonable result.
foo:
        li      a5,4096
        addi    a5,a5,-2048
        addi    sp,sp,-2048
        add     a5,a5,a0
        add     a0,a5,sp
        li      t0,4096
        sb      zero,-2048(a0)
        addi    t0,t0,-2048
        add     sp,sp,t0
        jr      ra

There is still a bit of oddness here. We are adding 2048 to a0 and then using an address -2048(a0). I think that is more cse trouble. 2048 requires two instructions to load into a register, which is likely confusing something somewhere.
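
For readers who want to check the arithmetic, here is a minimal sketch in C of why 2048 is the first value that needs two instructions. It is not GCC code. The names fits_simm12 and split_hi_lo are hypothetical helpers made up for this illustration. The only facts assumed are the ones above: the immediates are signed 12-bit, so the range is -2048 to 2047, and a larger constant is built from a multiple of 4096 plus a signed 12-bit adjustment, as in the "li a5,4096" / "addi a5,a5,-2048" pair in the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A signed 12-bit immediate covers -2048 .. 2047 inclusive. */
bool fits_simm12(int32_t value)
{
    return value >= -2048 && value <= 2047;
}

/* Hypothetical helper, for illustration only: split a non-negative
   constant into hi * 4096 + lo, where lo fits a signed 12-bit immediate.
   hi is the multiplier for the "li reg,4096*hi" step and lo is the
   operand of the following addi. */
void split_hi_lo(int32_t value, int32_t *hi, int32_t *lo)
{
    /* Keep the sketch simple: non-negative input, no overflow below. */
    assert(value >= 0 && value <= INT32_MAX - 2048);

    /* Round to the nearest multiple of 4096, so the remainder is signed. */
    *hi = (value + 2048) / 4096;
    *lo = value - *hi * 4096;

    assert(fits_simm12(*lo));
}
```

Under these assumptions, 2044 and 2047 come back with hi equal to 0, so a single addi is enough. 2048 comes back as hi = 1 and lo = -2048, which matches the two-instruction sequence in the listing, and the 12KB case (12288) comes back as hi = 3 and lo = 0.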