https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51499
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-08-07
     Ever confirmed|0                           |1
            Summary|vectorizer missing simple   |-Ofast does not vectorize
                   |case                        |while -O3 does.
             Status|UNCONFIRMED                 |NEW
--- Comment #15 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So here is the interesting part for the trunk:
With -O3 we vectorize the loop because the SLP vectorizer handles it, but with
-Ofast we don't, as we consider the vectorization too costly.
The innermost loop for -O3:
.L3:
addq $1, %rax
addpd %xmm1, %xmm2
addpd %xmm1, %xmm3
addpd %xmm1, %xmm4
cmpq %rax, %rdi
jne .L3
The SLP vectorizer has handled this since GCC 11.
Here is the inner loop for -Ofast:
.L3:
addq $1, %rax
addsd %xmm0, %xmm3
addsd %xmm0, %xmm6
addsd %xmm0, %xmm1
addsd %xmm0, %xmm5
addsd %xmm0, %xmm2
addsd %xmm0, %xmm4
cmpq %rax, %rdi
jne .L3
As you can see, we don't vectorize it.
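
For readers who want to experiment, here is a minimal sketch of the kind of loop that produces this shape of code: six independent double accumulators, each bumped by the same loop-invariant value. This is NOT the test case attached to this bug; the function name accumulate and its parameters are hypothetical and chosen only to mirror the six addsd instructions above. Whether it reproduces the -O3 versus -Ofast difference depends on the GCC version and target, which I have not verified.

```c
/* Hypothetical sketch, not the test case from this bug.
   Six independent accumulators are each incremented by the same
   loop-invariant value x, so adjacent pairs could be packed into
   one addpd by the SLP vectorizer. */
void accumulate(double *acc, double x, long n)
{
    double a0 = acc[0], a1 = acc[1], a2 = acc[2];
    double a3 = acc[3], a4 = acc[4], a5 = acc[5];

    for (long i = 0; i < n; i++) {
        a0 += x;
        a1 += x;
        a2 += x;
        a3 += x;
        a4 += x;
        a5 += x;
    }

    /* Write the results back so the loop is not optimized away. */
    acc[0] = a0; acc[1] = a1; acc[2] = a2;
    acc[3] = a3; acc[4] = a4; acc[5] = a5;
}
```

Compiling the same file once with gcc -O3 -S and once with gcc -Ofast -S and comparing the inner loops (addpd versus addsd) is one way to check which form a given compiler emits.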