https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99078

            Bug ID: 99078
           Summary: Optimizer moves struct initialization into loop
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: magiblot at hotmail dot com
  Target Milestone: ---

Consider the following piece of code (https://godbolt.org/z/WhTcbd):

> struct S
> {
>     char c[24];
> };
> 
> void copy(S *dest, unsigned count)
> {
>     S s {};
>     for (int i = 0; i < 7; ++i)
>         s.c[i] = i;
>     for (int i = 8; i < 15; ++i)
>         s.c[i] = i;
>     for (int i = 16; i < 23; ++i)
>         s.c[i] = i;
>     while (count--)
>         *dest++ = s;
> }
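To make the expected contents concrete, here is a minimal self-check sketch around the reproducer. The helper check_copies is hypothetical and not part of the report. The loops stop at indices 6, 14 and 22, so c[7], c[15] and c[23] are never assigned and keep the zero from S s {}.

```cpp
#include <cassert>

struct S
{
    char c[24];
};

// Reproducer from the report, unchanged.
void copy(S *dest, unsigned count)
{
    S s {};
    for (int i = 0; i < 7; ++i)
        s.c[i] = i;
    for (int i = 8; i < 15; ++i)
        s.c[i] = i;
    for (int i = 16; i < 23; ++i)
        s.c[i] = i;
    while (count--)
        *dest++ = s;
}

// Hypothetical helper: asserts that every element written by copy() has
// c[i] == i, except c[7], c[15] and c[23], which no loop assigns and which
// therefore keep the zero from `S s {}`.
void check_copies(const S *dest, unsigned count)
{
    for (unsigned n = 0; n < count; ++n)
        for (int i = 0; i < 24; ++i)
            assert(dest[n].c[i] == (i % 8 == 7 ? 0 : i));
}
```

Any correct code for copy(), including the expected result further down, must produce exactly this pattern. For instance, the immediate 67305985 (0x04030201) in the -O2 assembly is bytes 1 to 4 of s.c, stored in little-endian order.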

The generated assembly with -O2 looks like this:

> copy(S*, unsigned int):
>         mov     QWORD PTR [rsp-24], 0
>         pxor    xmm0, xmm0
>         movups  XMMWORD PTR [rsp-40], xmm0
>         test    esi, esi
>         je      .L1
>         mov     esi, esi
>         lea     rax, [rsi+rsi*2]
>         lea     rdx, [rdi+rax*8]
> .L3:
>         mov     eax, 1541
>         mov     ecx, 3340
>         mov     esi, 5396
>         mov     DWORD PTR [rsp-39], 67305985
>         mov     WORD PTR [rsp-35], ax
>         add     rdi, 24
>         mov     DWORD PTR [rsp-32], 185207048
>         mov     WORD PTR [rsp-28], cx
>         mov     BYTE PTR [rsp-26], 14
>         movdqu  xmm1, XMMWORD PTR [rsp-40]
>         mov     DWORD PTR [rsp-24], 319951120
>         mov     WORD PTR [rsp-20], si
>         mov     BYTE PTR [rsp-18], 22
>         mov     rax, QWORD PTR [rsp-24]
>         movups  XMMWORD PTR [rdi-24], xmm1
>         mov     QWORD PTR [rdi-8], rax
>         cmp     rdi, rdx
>         jne     .L3
> .L1:
>         ret

It can be seen that the zeroing of s stays outside the loop, but the stores
that fill in s.c have been moved into the loop body (.L3) and are re-executed
on every iteration, which is a severe pessimization.
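Translated back into C++, the -O2 code above behaves roughly like the following sketch. This is an illustrative reconstruction, not compiler output, and copy_as_compiled and check_same are hypothetical names. The zeroing of s happens once, but the constant stores are repeated before every copy. Both functions write identical bytes, so this is a missed optimization rather than wrong code.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

struct S
{
    char c[24];
};

// Reproducer from the report.
void copy(S *dest, unsigned count)
{
    S s {};
    for (int i = 0; i < 7; ++i)
        s.c[i] = i;
    for (int i = 8; i < 15; ++i)
        s.c[i] = i;
    for (int i = 16; i < 23; ++i)
        s.c[i] = i;
    while (count--)
        *dest++ = s;
}

// Hypothetical C++ reading of the -O2 output: the zeroing stays in front
// of the loop, but the constant stores are redone before every copy.
void copy_as_compiled(S *dest, unsigned count)
{
    S s {};
    while (count--) {
        for (int i = 0; i < 7; ++i)
            s.c[i] = i;
        for (int i = 8; i < 15; ++i)
            s.c[i] = i;
        for (int i = 16; i < 23; ++i)
            s.c[i] = i;
        *dest++ = s;
    }
}

// Asserts that both versions write byte-for-byte identical output.
void check_same(unsigned count)
{
    if (count == 0)
        return;  // nothing written; avoids memcmp on empty buffers
    std::vector<S> a(count), b(count);
    copy(a.data(), count);
    copy_as_compiled(b.data(), count);
    assert(std::memcmp(a.data(), b.data(), count * sizeof(S)) == 0);
}
```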

The issue cannot be reproduced if the struct is initialized this way:

> S s;
> memset(&s, 0, sizeof(s));
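In context, the non-reproducing variant changes only the declaration of s and keeps the rest of copy() as in the report. This is a sketch under that assumption. The names copy_memset and check_copies are hypothetical, and memset requires <cstring>.

```cpp
#include <cassert>
#include <cstring>

struct S
{
    char c[24];
};

// Variant that, per the report, does not show the problem: s is declared
// without an initializer and zeroed with memset instead of `S s {}`.
void copy_memset(S *dest, unsigned count)
{
    S s;
    std::memset(&s, 0, sizeof(s));
    for (int i = 0; i < 7; ++i)
        s.c[i] = i;
    for (int i = 8; i < 15; ++i)
        s.c[i] = i;
    for (int i = 16; i < 23; ++i)
        s.c[i] = i;
    while (count--)
        *dest++ = s;
}

// Hypothetical helper: c[i] == i in every copy, except the never-assigned
// c[7], c[15] and c[23], which stay zero.
void check_copies(const S *dest, unsigned count)
{
    for (unsigned n = 0; n < count; ++n)
        for (int i = 0; i < 24; ++i)
            assert(dest[n].c[i] == (i % 8 == 7 ? 0 : i));
}
```

The bytes written are the same as with S s {}. Only the generated code differs.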

But the following still reproduces the issue:

> S s {};
> memset(&s, 0, sizeof(s));

Replacing the assignment inside the loop with memcpy does not help; the issue
still reproduces.
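For reference, the memcpy form of the copy loop could look like this sketch. The names copy_memcpy and check_copies are hypothetical. It writes the same bytes as the original, and according to the report it shows the same problem.

```cpp
#include <cassert>
#include <cstring>

struct S
{
    char c[24];
};

// Reproducer with `*dest++ = s` replaced by an explicit memcpy.
void copy_memcpy(S *dest, unsigned count)
{
    S s {};
    for (int i = 0; i < 7; ++i)
        s.c[i] = i;
    for (int i = 8; i < 15; ++i)
        s.c[i] = i;
    for (int i = 16; i < 23; ++i)
        s.c[i] = i;
    while (count--)
        std::memcpy(dest++, &s, sizeof(s));
}

// Hypothetical helper: c[i] == i in every copy, except the never-assigned
// c[7], c[15] and c[23], which stay zero.
void check_copies(const S *dest, unsigned count)
{
    for (unsigned n = 0; n < count; ++n)
        for (int i = 0; i < 24; ++i)
            assert(dest[n].c[i] == (i % 8 == 7 ? 0 : i));
}
```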

According to Godbolt, the generated assembly has not changed since GCC 7.2. GCC
7.1 does not use vector registers but still initializes the struct inside the
loop. GCC 6.4 and earlier do not use vector registers either but do initialize
the struct outside the loop, as expected.

EXPECTED RESULT

Ideally, s would be fully initialized before the loop and loaded into
registers once, leaving code like this:

>         movdqu  xmm1, XMMWORD PTR [rsp-40]
>         mov     rax, QWORD PTR [rsp-24]
> .L3:
>         add     rdi, 24
>         movups  XMMWORD PTR [rdi-24], xmm1
>         mov     QWORD PTR [rdi-8], rax
>         cmp     rdi, rdx
>         jne     .L3
> .L1:
>         ret

Thank you.
