https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118086
--- Comment #1 from Jonathan Gruber <jonathan.gruber.jg at gmail dot com> ---
Created attachment 59892
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59892&action=edit
Minimal test case, medium-sized struct

I also tested this with a medium-sized struct. I have attached the new test case, but have reproduced it below for your convenience:

struct Medium {
    void *x, *y;
};

extern int extern_func_medium(struct Medium m);

int tail_call_medium(struct Medium m) {
    return extern_func_medium(m);
}

int non_tail_call_medium(struct Medium m) {
    return ~extern_func_medium(m);
}

I tested with -O3. The generated assembly for x86_64 and aarch64 was identical to that for small structs. riscv64, however, manipulated the stack pointer sp in an odd way:

tail_call_medium:
        .cfi_startproc
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        tail    extern_func_medium@plt
        .cfi_endproc

non_tail_call_medium:
        .cfi_startproc
        addi    sp,sp,-32
        .cfi_def_cfa_offset 32
        sd      ra,24(sp)
        .cfi_offset 1, -8
        call    extern_func_medium@plt
        ld      ra,24(sp)
        .cfi_restore 1
        not     a0,a0
        sext.w  a0,a0
        addi    sp,sp,32
        .cfi_def_cfa_offset 0
        jr      ra
        .cfi_endproc

For the tail call, riscv64 decreased sp by 16 and then immediately added 16 back (essentially a slightly slower no-op) before unconditionally branching to extern_func_medium.

For the non-tail call, riscv64 decreased sp by 32, even though it only stores the return address ra on the stack. I have only a passing familiarity with riscv64's calling convention, so I don't know whether the stack pointer has a particular alignment requirement, but I assume that 32 bytes is nonetheless excessive stack space to reserve for a single 8-byte register value.
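A minimal sketch of a harness around the test case, under stated assumptions. extern_func_medium is given a hypothetical definition here only so the file links and can be checked; to reproduce the codegen above it has to stay in a separate translation unit, otherwise GCC may inline it. min_frame_size is a hypothetical helper encoding the RISC-V psABI rule that sp is kept 16-byte aligned. Under that rule, a function that saves only ra needs a 16-byte frame, which is why the 32 bytes reserved in non_tail_call_medium look larger than necessary. The psABI also passes an aggregate of at most 2×XLEN bits (here two pointers, 16 bytes) in a pair of integer registers, so the struct argument itself does not require stack space.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the test case: two pointers, i.e. 16 bytes on riscv64,
   which the RISC-V LP64 calling convention passes in a register pair (a0, a1). */
struct Medium {
    void *x, *y;
};

_Static_assert(sizeof(struct Medium) == 2 * sizeof(void *),
               "struct Medium is expected to have no padding");

extern int extern_func_medium(struct Medium m);

int tail_call_medium(struct Medium m) {
    return extern_func_medium(m);
}

int non_tail_call_medium(struct Medium m) {
    return ~extern_func_medium(m);
}

/* Hypothetical stand-in for the external callee, defined only so this file
   links; it counts the non-null pointers. For the codegen experiment it must
   live in a separate translation unit so GCC cannot inline it. */
int extern_func_medium(struct Medium m) {
    return (m.x != NULL) + (m.y != NULL);
}

/* Hypothetical helper: smallest frame that holds `saved` bytes of saved
   registers while keeping sp 16-byte aligned (RISC-V psABI requirement). */
size_t min_frame_size(size_t saved) {
    const size_t align = 16;
    return (saved + align - 1) / align * align;
}
```

For example, min_frame_size(8), the space for ra alone, evaluates to 16, half of the 32 bytes that non_tail_call_medium reserves.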