https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107916
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Reduced even further; just compiling with `-O2 -mvsx` is enough to show the issue:
```
typedef unsigned u32x8 __attribute__ ((vector_size (32)));

void f(int n, u32x8 *a, u32x8 *b)
{
  u32x8 c = {0};
  for(int i = 0; i < n; i++)
    c+=*a;
  *b += c;
}
```

With the above you can see the issue on x86_64 with just -O2 (not turning on AVX-512 or anything):
.L3:
        movdqa  xmm4, XMMWORD PTR [rsp-32]
        movdqa  xmm5, XMMWORD PTR [rsp-16]
        add     eax, 1
        paddd   xmm4, xmm2
        paddd   xmm5, xmm3
        movaps  XMMWORD PTR [rsp-32], xmm4
        movaps  XMMWORD PTR [rsp-16], xmm5
        cmp     edi, eax
        jne     .L3

See the extra loads/stores: both halves of the accumulator are reloaded from the stack and stored back on every iteration instead of staying in registers.
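
For anyone reproducing this locally, here is a minimal sketch that wraps the reduced test case in a correctness check. The report appears to be about the redundant loads/stores rather than wrong results, so the check is only a sanity baseline while inspecting the assembly. The helper `check_f` and its input values are hypothetical and not part of the report; the code assumes GCC's vector extensions (`vector_size` and lane subscripting).

```c
#include <assert.h>

typedef unsigned u32x8 __attribute__ ((vector_size (32)));

/* Reduced test case from the comment above. */
void f(int n, u32x8 *a, u32x8 *b)
{
  u32x8 c = {0};
  for (int i = 0; i < n; i++)
    c += *a;
  *b += c;
}

/* Hypothetical helper (not from the report): returns 1 if every lane of b
   ends up as its initial value plus n times the matching lane of a,
   using unsigned wraparound arithmetic; returns 0 otherwise. */
int check_f(int n)
{
  assert(n >= 0);

  /* Arbitrary illustrative inputs. */
  u32x8 a = {1, 2, 3, 4, 5, 6, 7, 8};
  u32x8 b = {10, 20, 30, 40, 50, 60, 70, 80};
  u32x8 before = b;

  f(n, &a, &b);

  for (int lane = 0; lane < 8; lane++)
    if (b[lane] != before[lane] + (unsigned) n * a[lane])
      return 0;
  return 1;
}
```

Calling `check_f` from a small `main` and compiling the file with `gcc -O2 -S -masm=intel` should make it possible to compare the loop body against the listing above. The exact output will depend on the GCC version and target.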