https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125174
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'll note that SIMD clone vector calls are not costed at all ...
-shell2.fppized.f90:971:24: optimized: loop vectorized using 32 byte vectors and unroll factor 4
+shell2.fppized.f90:971:24: optimized: loop vectorized using 16 byte vectors and unroll factor 2
 shell2.fppized.f90:971:24: optimized: loop versioned for vectorization because of possible aliasing
-shell2.fppized.f90:971:24: optimized: epilogue loop vectorized using 16 byte vectors and unroll factor 2
is a difference in the vectorizer's output, shown for this loop:
  do k = 1, k_max
    k1 = k_x(k); k2 = k_y(k); k3 = k_z(k)
    dot1 = k1*P1+k2*P2+k3*P3
    dot2 = g4 * (k1*k1+k2*k2+k3*k3)
    res_ij(k) = res_ij(k) + therm(k) * (fac1 * exp(cmplx(dot2,dot1,kind=kind((1.0d0,1.0d0)))))
  end do
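
For anyone wanting to try this outside the benchmark, a minimal standalone sketch of the loop might look like the following. The declarations, array size, and initial values are all assumptions of mine, since the snippet above does not show them; only the loop body is taken from the report. The program name loop_sketch is likewise made up.

```fortran
program loop_sketch
  implicit none
  ! Assumed: double precision reals and double complex accumulators.
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: k_max = 1000          ! hypothetical trip count
  real(dp) :: k_x(k_max), k_y(k_max), k_z(k_max), therm(k_max)
  complex(dp) :: res_ij(k_max), fac1
  real(dp) :: P1, P2, P3, g4, k1, k2, k3, dot1, dot2
  integer :: k

  ! Arbitrary test data; the real benchmark inputs are not shown.
  do k = 1, k_max
    k_x(k) = 0.01_dp * k
    k_y(k) = 0.02_dp * k
    k_z(k) = 0.03_dp * k
    therm(k) = 1.0_dp / k
  end do
  res_ij = (0.0_dp, 0.0_dp)
  fac1 = (1.0_dp, 0.5_dp)
  P1 = 0.1_dp; P2 = 0.2_dp; P3 = 0.3_dp
  g4 = -0.5_dp                                ! negative so exp() stays bounded

  ! Loop body as quoted in the report.
  do k = 1, k_max
    k1 = k_x(k); k2 = k_y(k); k3 = k_z(k)
    dot1 = k1*P1+k2*P2+k3*P3
    dot2 = g4 * (k1*k1+k2*k2+k3*k3)
    res_ij(k) = res_ij(k) + therm(k) * (fac1 * exp(cmplx(dot2,dot1,kind=kind((1.0d0,1.0d0)))))
  end do

  ! Use the result so the loop is not optimized away.
  print *, sum(res_ij)
end program loop_sketch
```

Compiling this with -fopt-info-vec should print the vectorizer's decisions for the loop. Whether it reproduces the exact messages above depends on the target and optimization flags, which are not stated in this comment.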