As observed in PR 17619, take this code:
--------------------------
struct X { float array[4]; };
X a,b;
float foobar () {
  float s = 0;
  for (unsigned int d=0; d<4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}
--------------------------
It compiles to
--------------------------
flds b+12
fmuls a+12
movss b, %xmm1
mulss a, %xmm1
addss .LC0, %xmm1
movss b+4, %xmm0
mulss a+4, %xmm0
addss %xmm0, %xmm1
movss b+8, %xmm0
mulss a+8, %xmm0
addss %xmm0, %xmm1
movss %xmm1, -4(%ebp)
flds -4(%ebp)
faddp %st, %st(1)
--------------------------
However, the compiler should really vectorize the loop. As Uros points
out in PR 17619, this isn't happening, although this modified function
--------------------------
struct X
{
  float array[4];
};
float foobar()
{
  X a, b, c;
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}
--------------------------
generates the optimal code:
--------------------------
movaps 32(%esp), %xmm0
mulps 16(%esp), %xmm0
movaps %xmm0, (%esp)
flds 4(%esp)
fadds (%esp)
fadds 8(%esp)
fadds 12(%esp)
--------------------------
The compiler should be able to make this transformation by itself.
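To illustrate that the two forms are interchangeable, here is a minimal sketch that puts them side by side. It is not part of the original testcase: the function names dot_fused, dot_split and check_equivalence are made up for illustration, and the operands are passed as parameters rather than being globals or locals as in the snippets above. The inputs are small integers so that every product and sum is exactly representable in float, and the comparison does not depend on rounding.

```cpp
#include <cassert>

struct X { float array[4]; };

// Form from the first testcase: multiply and accumulate in a single loop.
float dot_fused(const X &a, const X &b) {
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    s += a.array[d] * b.array[d];
  return s;
}

// Form from the modified testcase: store the products in a temporary
// first, then sum the temporary in a second loop.
float dot_split(const X &a, const X &b) {
  X c;
  float s = 0;
  for (unsigned int d = 0; d < 4; ++d)
    c.array[d] = a.array[d] * b.array[d];
  for (unsigned int d = 0; d < 4; ++d)
    s += c.array[d];
  return s;
}

// Hypothetical check (an assumption, not from the report): both forms
// add the same products in the same order, so they must agree.
// 1*5 + 2*6 + 3*7 + 4*8 = 70.
void check_equivalence() {
  const X a = {{1, 2, 3, 4}};
  const X b = {{5, 6, 7, 8}};
  assert(dot_fused(a, b) == 70.0f);
  assert(dot_split(a, b) == 70.0f);
}
```

Since both functions add the products in the same order, rewriting one into the other should not need any reassociation of the floating-point sum.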
Thanks
W.
--
Summary: No vectorization for simple loop
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: bangerth at dealii dot org
CC: gcc-bugs at gcc dot gnu dot org,uros at gcc dot gnu dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18767