https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96539

            Bug ID: 96539
           Summary: Unnecessary no-op copy with Os and tail call with
                    struct argument
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
  Target Milestone: ---

Test C code:

```
struct A {
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    void *p1;
    void *p2;
    void *p3;
    void *p4;
    void *p5;
    void *p6;
    void *p7;
};

int k(int a);
int f(int a, int b, int c, void *p, struct A s);

int g(int a, int b, int c, void *p, struct A s)
{
    k(a);
    return f(a, b, c, p, s);
}
```

At `-O2`, the code produced is

```
g:
        pushq   %r14
        movq    %rcx, %r14
        pushq   %r13
        movl    %edx, %r13d
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movl    %edi, %ebp
        subq    $8, %rsp
        call    k@PLT
        addq    $8, %rsp
        movq    %r14, %rcx
        movl    %r13d, %edx
        movl    %r12d, %esi
        movl    %ebp, %edi
        popq    %rbp
        popq    %r12
        popq    %r13
        popq    %r14
        jmp     f@PLT
```

I'm not sure why the arguments are saved in callee-saved registers that in turn
have to be spilled (maybe for latency of the final call?), but both clang and
gcc do that, so I assume it's good for performance. However, when I tried
`-Os`, the code produced is

```
g:
        pushq   %r14
        movq    %rcx, %r14
        pushq   %r12
        movl    %esi, %r12d
        pushq   %rbp
        movl    %edi, %ebp
        subq    $16, %rsp
        movl    %edx, 12(%rsp)
        call    k@PLT
        leaq    48(%rsp), %rdi
        movl    $20, %ecx
        movq    %rdi, %rsi
        rep movsl
        movq    %r14, %rcx
        movl    %r12d, %esi
        movl    %ebp, %edi
        movl    12(%rsp), %edx
        addq    $16, %rsp
        popq    %rbp
        popq    %r12
        popq    %r14
        jmp     f@PLT
```

AFAICT, the

```
        movq    %rdi, %rsi
        rep movsl
```

is basically always a no-op (it copies memory onto the same location), other
than potentially triggering a memory fault.
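
To illustrate the point, here is a rough C model of what `rep movsl` does when the direction flag is clear and `%rdi == %rsi`. The helper names (`copy_dwords`, `self_copy_is_noop`) are made up for this sketch and are not anything GCC emits; it only shows that a forward dword copy from a buffer onto itself leaves the contents unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of `rep movsl` (direction flag clear):
 * copy `count` 32-bit words from src to dst, lowest address first. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Returns 1 if copying a 20-dword buffer onto itself changed nothing. */
int self_copy_is_noop(void)
{
    uint32_t buf[20], before[20];

    /* Fill both buffers with the same arbitrary pattern. */
    for (size_t i = 0; i < 20; i++)
        buf[i] = before[i] = (uint32_t)i * 2654435761u;

    /* Same source and destination, as with `movq %rdi, %rsi`. */
    copy_dwords(buf, buf, 20);

    for (size_t i = 0; i < 20; i++)
        if (buf[i] != before[i])
            return 0;
    return 1;
}
```

The only observable effects left are the memory reads and writes themselves, which is why the instruction pair looks removable.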

The memory being copied in place here is the area where the struct argument is
stored (80 bytes starting at `rsp + 48`), so maybe it's the copy of the argument
that failed to be removed when it became a no-op for the tail call?
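
The numbers above can be double-checked with a small sketch. It assumes the x86-64 System V (LP64) layout, i.e. 4-byte `int` and 8-byte pointers; the helper names `struct_a_dwords` and `stack_arg_offset` are hypothetical and only restate the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    int a, b, c, d, e, f;                      /* 6 * 4 = 24 bytes (LP64) */
    void *p1, *p2, *p3, *p4, *p5, *p6, *p7;    /* 7 * 8 = 56 bytes (LP64) */
};

/* Number of 32-bit words needed to cover struct A,
 * i.e. the count loaded into %ecx before `rep movsl`. */
unsigned struct_a_dwords(void)
{
    return (unsigned)(sizeof(struct A) / 4);
}

/* Offset from %rsp to the first stack-passed argument of g:
 * saved registers, local stack adjustment, then the return address. */
unsigned stack_arg_offset(unsigned pushes, unsigned local_bytes)
{
    return pushes * 8 + local_bytes + 8;
}

/* Checks that only hold under the LP64 assumption stated above. */
void check_layout(void)
{
    assert(sizeof(struct A) == 80);
    assert(struct_a_dwords() == 20);        /* matches `movl $20, %ecx` */
    assert(stack_arg_offset(3, 16) == 48);  /* -Os: 3 pushes, subq $16 */
}
```

Under the same assumptions, the `-O1` prologue (4 pushes, `subq $8`) also puts the struct at `rsp + 48`, so its last 8-byte word sits at `rsp + 120`, which is the operand of the first `pushq` there.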

At `-O1`, the code produced is

```
g:
        pushq   %r13
        pushq   %r12
        pushq   %rbp
        pushq   %rbx
        subq    $8, %rsp
        movl    %edi, %ebx
        movl    %esi, %ebp
        movl    %edx, %r12d
        movq    %rcx, %r13
        call    k@PLT
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        pushq   120(%rsp)
        movq    %r13, %rcx
        movl    %r12d, %edx
        movl    %ebp, %esi
        movl    %ebx, %edi
        call    f@PLT
        addq    $88, %rsp
        popq    %rbx
        popq    %rbp
        popq    %r12
        popq    %r13
        ret
```
which shows the copy of 10 pointer-sized words (the same 80 bytes), which is
not a no-op when there is no tail call.
