https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96539
Bug ID: 96539
Summary: Unnecessary no-op copy with Os and tail call with struct argument
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---
Test C code,
```
struct A {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    void *p1;
    void *p2;
    void *p3;
    void *p4;
    void *p5;
    void *p6;
    void *p7;
};
int k(int a);
int f(int a, int b, int c, void *p, struct A s);
int g(int a, int b, int c, void *p, struct A s)
{
    k(a);
    /* Tail call: `s` is already in the incoming stack argument area. */
    return f(a, b, c, p, s);
}
```
At `-O2`, the code produced is
```
g:
pushq %r14
movq %rcx, %r14
pushq %r13
movl %edx, %r13d
pushq %r12
movl %esi, %r12d
pushq %rbp
movl %edi, %ebp
subq $8, %rsp
call k@PLT
addq $8, %rsp
movq %r14, %rcx
movl %r13d, %edx
movl %r12d, %esi
movl %ebp, %edi
popq %rbp
popq %r12
popq %r13
popq %r14
jmp f@PLT
```
I'm not sure why the arguments are saved in callee-saved registers that then
have to be spilled (maybe for latency of the final call?), but both clang and
gcc do that, so I assume it's good for performance. However, when I tried
`-Os`, the code produced is,
```
g:
pushq %r14
movq %rcx, %r14
pushq %r12
movl %esi, %r12d
pushq %rbp
movl %edi, %ebp
subq $16, %rsp
movl %edx, 12(%rsp)
call k@PLT
leaq 48(%rsp), %rdi
movl $20, %ecx
movq %rdi, %rsi
rep movsl
movq %r14, %rcx
movl %r12d, %esi
movl %ebp, %edi
movl 12(%rsp), %edx
addq $16, %rsp
popq %rbp
popq %r12
popq %r14
jmp f@PLT
```
AFAICT, the
```
movq %rdi, %rsi
rep movsl
```
is basically always a no-op (copying to and from the same memory location),
other than potentially triggering a memory fault.
The memory being copied in place here is the area where the struct argument is
stored (80 bytes starting at `rsp + 48`), so maybe it's the copy of the
argument that failed to be removed when it became a no-op for the tail call?
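To make the size arithmetic and the no-op claim concrete, here is a minimal sketch in plain C. It assumes an LP64 target such as x86-64 Linux, where `sizeof(struct A)` is 80, and it models `rep movsl` as a forward word-by-word copy (direction flag clear). The helper names `movsl_count`, `emulate_rep_movsl` and `self_copy_is_noop` are made up for illustration and are not part of the test case or of gcc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct A {
    int a, b, c, d, e, f;                   /* 6 * 4 = 24 bytes */
    void *p1, *p2, *p3, *p4, *p5, *p6, *p7; /* 7 * 8 = 56 bytes on LP64 */
};

enum { WORDS = 20 }; /* the value loaded into %ecx in the -Os output */

/* Number of 32-bit words needed to cover one struct A. */
static size_t movsl_count(void)
{
    return sizeof(struct A) / sizeof(uint32_t);
}

/* Model of `rep movsl` with DF = 0: copy `count` 32-bit words forward. */
static void emulate_rep_movsl(uint32_t *dst, const uint32_t *src, size_t count)
{
    while (count--)
        *dst++ = *src++;
}

/* Returns 1 if copying a buffer onto itself leaves it unchanged. */
static int self_copy_is_noop(void)
{
    uint32_t buf[WORDS], before[WORDS];
    for (size_t i = 0; i < WORDS; i++)
        buf[i] = (uint32_t)i + 1u;
    memcpy(before, buf, sizeof buf);
    emulate_rep_movsl(buf, buf, WORDS); /* same source and destination, like %rsi == %rdi */
    return memcmp(before, buf, sizeof buf) == 0;
}
```

On an LP64 target `movsl_count()` gives 20, matching `movl $20, %ecx`, and `self_copy_is_noop()` reports that the contents are unchanged. The only observable effects of the real instruction sequence are the memory accesses themselves.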
At `-O1`, the code produced is
```
g:
pushq %r13
pushq %r12
pushq %rbp
pushq %rbx
subq $8, %rsp
movl %edi, %ebx
movl %esi, %ebp
movl %edx, %r12d
movq %rcx, %r13
call k@PLT
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
pushq 120(%rsp)
movq %r13, %rcx
movl %r12d, %edx
movl %ebp, %esi
movl %ebx, %edi
call f@PLT
addq $88, %rsp
popq %rbx
popq %rbp
popq %r12
popq %r13
ret
```
which shows that copying the 10 pointer-sized words (80 bytes) of the argument
is not a no-op when there is no tail call.
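For comparison, the ten `pushq 120(%rsp)` instructions can be modelled on a toy stack. This is only a sketch: the stack is an array of qwords, `rsp` is a qword index, and the names `push_struct_copy` and `pushed_copy_matches` are hypothetical. It relies on the documented x86 behaviour that `push` computes an `rsp`-based memory operand address before decrementing `rsp`, and on the struct sitting at byte offsets 48..127 from `rsp` after the prologue (4 pushes, `subq $8`, and the return address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_QWORDS = 64, STRUCT_QWORDS = 10 };

/* Simulate ten `pushq 120(%rsp)` instructions.
 * The stack grows toward lower indices; returns the new rsp index. */
static size_t push_struct_copy(uint64_t *stack, size_t rsp)
{
    for (size_t i = 0; i < STRUCT_QWORDS; i++) {
        uint64_t value = stack[rsp + 120 / 8]; /* operand read uses rsp before the push */
        rsp -= 1;                              /* pushq moves rsp down by 8 bytes */
        stack[rsp] = value;
    }
    return rsp;
}

/* Returns 1 if the pushed copy equals the incoming struct, in order. */
static int pushed_copy_matches(void)
{
    uint64_t stack[STACK_QWORDS] = {0};
    size_t rsp = 32; /* arbitrary starting point with room below it */

    /* Incoming struct at byte offset 48 from rsp, i.e. qwords 6..15. */
    for (size_t i = 0; i < STRUCT_QWORDS; i++)
        stack[rsp + 6 + i] = 0x1000u + (uint64_t)i;

    size_t new_rsp = push_struct_copy(stack, rsp);
    if (new_rsp != rsp - STRUCT_QWORDS)
        return 0;
    for (size_t i = 0; i < STRUCT_QWORDS; i++)
        if (stack[new_rsp + i] != 0x1000u + (uint64_t)i)
            return 0;
    return 1;
}
```

Each push reads the highest not-yet-copied qword, so the struct ends up duplicated 80 bytes lower in a fresh outgoing argument area. In the tail-call case the outgoing area is the incoming one, which is why the same copy degenerates into the self-copy seen at `-Os`.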