As observed in PR 17619: Take this code:
--------------------------------
struct X { float array[4]; };

X a,b;

float foobar () {
  float s = 0;
  for (unsigned int d=0; d<4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}
--------------------------

It compiles to
--------------------------
        flds    b+12
        fmuls   a+12
        movss   b, %xmm1
        mulss   a, %xmm1
        addss   .LC0, %xmm1
        movss   b+4, %xmm0
        mulss   a+4, %xmm0
        addss   %xmm0, %xmm1
        movss   b+8, %xmm0
        mulss   a+8, %xmm0
        addss   %xmm0, %xmm1
        movss   %xmm1, -4(%ebp)
        flds    -4(%ebp)
        faddp   %st, %st(1)
--------------------------

However, what should really happen is that the compiler should vectorize
the loop. As Uros points out in PR 17619, this isn't happening, although
this modified function here
--------------------------
struct X { float array[4]; };

float foobar() {
  X a, b, c;
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}
--------------------------

generates the optimal code:
--------------------------
        movaps  32(%esp), %xmm0
        mulps   16(%esp), %xmm0
        movaps  %xmm0, (%esp)
        flds    4(%esp)
        fadds   (%esp)
        fadds   8(%esp)
        fadds   12(%esp)
--------------------------

The compiler should be able to make this transformation by itself.

Thanks
  W.
--------------------------
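
For reference, here is a sketch of the same hand-split form applied to the
global arrays from the first test case, so that both variants read
well-defined data (the modified function above uses uninitialized locals).
The names foobar_fused and foobar_split and the initial values of a and b
are made up for illustration and are not part of the PR:

```cpp
struct X { float array[4]; };

// Globals as in the first test case. The initial values are an
// assumption for this sketch; the report leaves them zero-initialized.
X a = {{1.0f, 2.0f, 3.0f, 4.0f}};
X b = {{5.0f, 6.0f, 7.0f, 8.0f}};

// Original form: multiply and accumulate in a single loop.
float foobar_fused() {
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Hand-split form: element-wise products go into a temporary first
// (the part that maps onto mulps), then a separate loop sums them.
float foobar_split() {
  X c;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}
```

The split keeps the additions in the same order as the single loop, so
with the values above both functions return 70. Only the multiplications
are candidates for packed instructions here; the summation stays scalar
in both forms.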
-- 
           Summary: No vectorization for simple loop
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: target
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bangerth at dealii dot org
                CC: gcc-bugs at gcc dot gnu dot org,uros at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767