https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118076
--- Comment #5 from Jonathan Gruber <jonathan.gruber.jg at gmail dot com> ---

For completeness, I also checked the case where the call to the struct-accepting function is not a tail call. This is the same test case as before, except that the return value of extern_func is bitwise negated, which makes the call a non-tail call:

    struct S { void *x, *y, *z, *w; };

    extern int extern_func(struct S s);

    int non_tail_fwd(void *x, void *y, void *z, void *w) {
        struct S s = { x, y, z, w };
        return ~extern_func(s);
    }

Unsurprisingly, we see the same bug. Below is the generated assembly for each target architecture I tested, all at -O3 (the same bug also appears at other optimization levels).

gcc (on x86_64):

    .cfi_startproc
    subq    $72, %rsp
    .cfi_def_cfa_offset 80
    movq    %rdi, 32(%rsp)
    movq    %rsi, 40(%rsp)
    movdqa  32(%rsp), %xmm0
    movq    %rdx, 48(%rsp)
    movq    %rcx, 56(%rsp)
    movups  %xmm0, (%rsp)
    movdqa  48(%rsp), %xmm0
    movups  %xmm0, 16(%rsp)
    call    extern_func@PLT
    addq    $72, %rsp
    .cfi_def_cfa_offset 8
    notl    %eax
    ret
    .cfi_endproc

aarch64-linux-gnu-gcc (on x86_64):

    .cfi_startproc
    stp     x29, x30, [sp, -80]!
    .cfi_def_cfa_offset 80
    .cfi_offset 29, -80
    .cfi_offset 30, -72
    mov     x29, sp
    stp     x0, x1, [sp, 48]
    add     x0, sp, 16
    stp     x2, x3, [sp, 64]
    ldp     q30, q31, [sp, 48]
    str     q30, [sp, 16]
    str     q31, [x0, 16]
    bl      extern_func
    mvn     w0, w0
    ldp     x29, x30, [sp], 80
    .cfi_restore 30
    .cfi_restore 29
    .cfi_def_cfa_offset 0
    ret
    .cfi_endproc

riscv64-linux-gnu-gcc (on x86_64):

    .cfi_startproc
    addi    sp,sp,-80
    .cfi_def_cfa_offset 80
    mv      a5,a0
    mv      a0,sp
    sd      ra,72(sp)
    .cfi_offset 1, -8
    sd      a5,0(sp)
    sd      a1,8(sp)
    sd      a2,16(sp)
    sd      a3,24(sp)
    call    extern_func@plt
    ld      ra,72(sp)
    .cfi_restore 1
    not     a0,a0
    sext.w  a0,a0
    addi    sp,sp,80
    .cfi_def_cfa_offset 0
    jr      ra
    .cfi_endproc
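To run the non-tail-call test case and confirm its semantics, and not only inspect its assembly, here is a minimal self-contained sketch. It assumes a hypothetical body for extern_func, which the report only declares. That body is marked with GCC's noipa attribute so that the call in non_tail_fwd stays an out-of-line, by-value call. The generated code may still differ somewhat from the listings above, which were produced with extern_func truly external.

```c
#include <assert.h> /* used by the sanity checks in the usage note */
#include <stdint.h>

struct S { void *x, *y, *z, *w; };

/* Hypothetical definition of extern_func, supplied only so the file links
 * and runs. In the report it is an extern declaration whose body the
 * compiler cannot see. noipa keeps GCC from inlining, cloning, or otherwise
 * changing how this function is called. */
__attribute__((noipa))
int extern_func(struct S s)
{
    /* Combine all four members so the result depends on every field of the
     * by-value struct argument. */
    return (int)((uintptr_t)s.x + 2 * (uintptr_t)s.y
                 + 3 * (uintptr_t)s.z + 4 * (uintptr_t)s.w);
}

/* The reporter's test case: negating the result means the call to
 * extern_func is not in tail position. */
int non_tail_fwd(void *x, void *y, void *z, void *w)
{
    struct S s = { x, y, z, w };
    return ~extern_func(s);
}
```

With this sketch, a caller such as assert(non_tail_fwd((void *)1, (void *)2, (void *)3, (void *)4) == ~30) should hold, because the hypothetical extern_func returns 1 + 4 + 9 + 16 = 30. The -O3 code for non_tail_fwd itself can be inspected with gcc -O3 -S.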