https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96539
Bug ID: 96539
Summary: Unnecessary no-op copy with Os and tail call with struct argument
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---

Test C code:

```
struct A {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    void *p1;
    void *p2;
    void *p3;
    void *p4;
    void *p5;
    void *p6;
    void *p7;
};

int k(int a);
int f(int a, int b, int c, void *p, struct A s);

int g(int a, int b, int c, void *p, struct A s)
{
    k(a);
    return f(a, b, c, p, s);
}
```

At `-O2`, the code produced is:

```
g:
        pushq   %r14
        movq    %rcx, %r14
        pushq   %r13
        movl    %edx, %r13d
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movl    %edi, %ebp
        subq    $8, %rsp
        call    k@PLT
        addq    $8, %rsp
        movq    %r14, %rcx
        movl    %r13d, %edx
        movl    %r12d, %esi
        movl    %ebp, %edi
        popq    %rbp
        popq    %r12
        popq    %r13
        popq    %r14
        jmp     f@PLT
```

I'm not sure why it spills registers and saves the arguments in them (maybe for latency of the final call?), but both clang and gcc do that, so I assume it's good for performance.

However, when I tried `-Os`, the code produced is:

```
g:
        pushq   %r14
        movq    %rcx, %r14
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movl    %edi, %ebp
        subq    $16, %rsp
        movl    %edx, 12(%rsp)
        call    k@PLT
        leaq    48(%rsp), %rdi
        movl    $20, %ecx
        movq    %rdi, %rsi
        rep movsl
        movq    %r14, %rcx
        movl    %r12d, %esi
        movl    %ebp, %edi
        movl    12(%rsp), %edx
        addq    $16, %rsp
        popq    %rbp
        popq    %r12
        popq    %r14
        jmp     f@PLT
```

AFAICT, the

```
        movq    %rdi, %rsi
        rep movsl
```

is basically always a no-op (it copies memory from and to the same location), other than potentially triggering a memory fault. The memory being copied in place is the area where the struct argument is stored (80 bytes, i.e. `%ecx` = 20 four-byte elements, starting at `rsp + 48`), so maybe this is the copy of the argument that failed to be removed once it became a no-op for the tail call?
At `-O1`, the code produced is:

```
g:
        pushq   %r13
        pushq   %r12
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movl    %edi, %ebx
        movl    %esi, %ebp
        movl    %edx, %r12d
        movq    %rcx, %r13
        call    k@PLT
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        movq    %r13, %rcx
        movl    %r12d, %edx
        movl    %ebp, %esi
        movl    %ebx, %edi
        call    f@PLT
        addq    $88, %rsp
        popq    %rbx
        popq    %rbp
        popq    %r12
        popq    %r13
        ret
```

which shows the copy of the argument (ten 8-byte pushes, 80 bytes) that is not a no-op when there is no tail call.