https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107916

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced even further; just compiling with `-O2 -mvsx` is enough to show the
issue:
```
typedef unsigned u32x8 __attribute__ ((vector_size (32)));

void f(int n, u32x8 *a, u32x8 *b)
{
  u32x8 c = {0};
  for(int i = 0; i < n; i++)
     c+=*a;
  *b += c;
}
```

With the above you can also see the issue on x86_64 with just `-O2` (without
turning on AVX-512 or anything):
.L3:
        movdqa  xmm4, XMMWORD PTR [rsp-32]
        movdqa  xmm5, XMMWORD PTR [rsp-16]
        add     eax, 1
        paddd   xmm4, xmm2
        paddd   xmm5, xmm3
        movaps  XMMWORD PTR [rsp-32], xmm4
        movaps  XMMWORD PTR [rsp-16], xmm5
        cmp     edi, eax
        jne     .L3

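See the extra loads/stores: the accumulator `c` is reloaded from the stack and
stored back to it on every iteration instead of staying in registers.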
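
For comparing code generation, here is a minimal sketch of a hand-split variant that keeps the accumulator in two 16-byte halves, which is roughly what the x86_64 assembly above does with xmm4/xmm5. The names `u32x4`, `f_split` and `f_matches_split` are hypothetical and not part of the bug report; the sketch only checks that both versions compute the same result and makes no claim about what any particular GCC version emits for it.

```
#include <string.h>

typedef unsigned u32x8 __attribute__ ((vector_size (32)));
typedef unsigned u32x4 __attribute__ ((vector_size (16))); /* hypothetical helper type */

/* The reduced testcase, unchanged. */
void f(int n, u32x8 *a, u32x8 *b)
{
  u32x8 c = {0};
  for(int i = 0; i < n; i++)
     c+=*a;
  *b += c;
}

/* Hypothetical hand-split variant: two 16-byte accumulators instead of
   one 32-byte accumulator.  memcpy is used to read and write the halves
   without aliasing problems.  */
void f_split(int n, const u32x8 *a, u32x8 *b)
{
  u32x4 c_lo = {0}, c_hi = {0};
  for (int i = 0; i < n; i++)
    {
      u32x4 a_lo, a_hi;
      memcpy(&a_lo, (const char *) a, sizeof a_lo);
      memcpy(&a_hi, (const char *) a + sizeof a_lo, sizeof a_hi);
      c_lo += a_lo;
      c_hi += a_hi;
    }

  u32x4 b_lo, b_hi;
  memcpy(&b_lo, (const char *) b, sizeof b_lo);
  memcpy(&b_hi, (const char *) b + sizeof b_lo, sizeof b_hi);
  b_lo += c_lo;
  b_hi += c_hi;
  memcpy((char *) b, &b_lo, sizeof b_lo);
  memcpy((char *) b + sizeof b_lo, &b_hi, sizeof b_hi);
}

/* Returns 1 if f and f_split produce identical bytes in *b for this n.  */
int f_matches_split(int n)
{
  u32x8 a = {1, 2, 3, 4, 5, 6, 7, 0xffffffffu};
  u32x8 b1 = {10, 20, 30, 40, 50, 60, 70, 80};
  u32x8 b2 = b1;
  f(n, &a, &b1);
  f_split(n, &a, &b2);
  return memcmp(&b1, &b2, sizeof b1) == 0;
}
```

Compiling both functions at the same optimization level and diffing the loop bodies should show whether the per-iteration stack traffic is specific to the 32-byte accumulator.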
