https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88459
Bug ID: 88459
Summary: vectorization failure for a simple sum reduction loop
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---
gcc -O3 fails to vectorize the simple sum reduction loop below.
unsigned int tmp[1024];
unsigned int test_vec(int n)
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
    {
        sum += tmp[i];
    }
    return sum;
}
The generated kernel loop is scalar, loading and adding one 32-bit element per iteration:
.L2:
        ldr     w2, [x1], 4
        add     w0, w0, w2
        cmp     x3, x1
        bne     .L2
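One reading of the failing case, offered as a sketch and not as a diagnosis of the vectorizer: under the usual arithmetic conversions, "sum += tmp[i]" with an "int" sum converts sum to "unsigned int", adds in unsigned arithmetic, and converts the result back to "int" on every iteration. The function below only spells those conversions out. The names test_vec_explicit and check_explicit are hypothetical, the unused parameter n is dropped, and the conversion back to "int" is assumed to wrap modulo 2^N as GCC documents.

```c
#include <assert.h>

unsigned int tmp[1024];

/* Hypothetical rewrite of the int-accumulator loop with the implicit
   conversions written explicitly. */
unsigned int test_vec_explicit(void)
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
    {
        /* sum is converted to unsigned, so the addition cannot overflow
           in the signed sense; it wraps modulo 2^N instead. */
        unsigned int wide = (unsigned int)sum + tmp[i];
        /* Converting back to int is implementation-defined when the value
           is out of range; this sketch assumes GCC's modulo behaviour. */
        sum = (int)wide;
    }
    return (unsigned int)sum;
}

/* Hypothetical self-check: 1024 elements of value 1 sum to 1024. */
void check_explicit(void)
{
    for (int i = 0; i < 1024; i++)
        tmp[i] = 1u;
    assert(test_vec_explicit() == 1024u);
}
```

Seen this way, the reduction is an unsigned addition wrapped in a pair of conversions, with no signed overflow involved.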
However, if the type of sum is changed from "int" to "unsigned int", as below,
unsigned int tmp[1024];
unsigned int test_vec(int n)
{
    unsigned int sum = 0;
    for (int i = 0; i < 1024; i++)
    {
        sum += tmp[i];
    }
    return sum;
}
gcc vectorizes the loop, and the kernel processes four 32-bit elements per iteration:
.L2:
        ldr     q1, [x0], 16
        add     v0.4s, v0.4s, v1.4s
        cmp     x1, x0
        bne     .L2
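The two variants are expected to return the same value for any contents of tmp, which is what would make vectorizing the "int" version safe. The following is a minimal check of that expectation, assuming GCC's documented modulo behaviour when an out-of-range unsigned value is converted to "int". The names test_vec_int, test_vec_unsigned, fill_tmp and check_variants_agree are hypothetical, and the unused parameter n is dropped.

```c
#include <assert.h>
#include <limits.h>

unsigned int tmp[1024];

/* Variant from the report with an int accumulator. */
unsigned int test_vec_int(void)
{
    int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += tmp[i];   /* unsigned add, result converted back to int */
    return sum;
}

/* Variant from the report with an unsigned int accumulator. */
unsigned int test_vec_unsigned(void)
{
    unsigned int sum = 0;
    for (int i = 0; i < 1024; i++)
        sum += tmp[i];
    return sum;
}

/* Fill tmp with large values so the running sum wraps many times. */
void fill_tmp(void)
{
    for (int i = 0; i < 1024; i++)
        tmp[i] = UINT_MAX - (unsigned int)i;
}

/* Both accumulator types should produce the same final result. */
void check_variants_agree(void)
{
    fill_tmp();
    assert(test_vec_int() == test_vec_unsigned());
}
```

Calling check_variants_agree() from a small driver built at both -O0 and -O3 is one way to compare the scalar and vectorized code paths on the same input.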