https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
Bug ID: 119181
Summary: Missed vectorization due to imperfect SLP discovery for strided & interleaved load.
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
void
foo (double* a, double* __restrict b)
{
  b[0] = a[0] * a[256];
  b[1] = a[257] * a[1];
  b[2] = a[2] * a[258];
  b[3] = a[259] * a[3];
  b[4] = a[260] * a[4];
  b[5] = a[261] * a[5];
  b[6] = a[6] * a[262];
  b[7] = a[7] * a[263];
}
void
foo1 (double* a, double* __restrict b, double* c)
{
  b[0] = a[0] * a[256];
  b[1] = a[1] * a[257];
  b[2] = a[2] * a[258];
  b[3] = a[3] * a[259];
  b[4] = a[4] * a[260];
  b[5] = a[5] * a[261];
  b[6] = a[6] * a[262];
  b[7] = a[7] * a[263];
}
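foo1 can be vectorized but foo cannot. This is because SLP discovery does not
build a[0], a[1], a[2], ..., a[7] into the same node. Instead it builds a node
from a[0], a[257], a[2], a[259], a[260], a[261], a[6], a[7], i.e. the first
operand of each multiplication as written in the source.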
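To make the operand layout concrete, here is a minimal sketch in C that models each multiplication in foo as a lane holding two element indices. It only illustrates why the lanes as written do not form contiguous groups, and that swapping the operands of the commutative multiply per lane would. It is not how GCC's SLP discovery is implemented; `struct lane`, `canonicalize_lanes` and `operand_contiguous_p` are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* One lane of the multiply: element indices of its two operand loads.
   Hypothetical model, not a GCC data structure.  */
struct lane { int op0; int op1; };

/* Swap the operands of a lane (valid because multiplication is
   commutative) so that the lower index always comes first.  */
static void
canonicalize_lanes (struct lane *lanes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (lanes[i].op0 > lanes[i].op1)
      {
        int tmp = lanes[i].op0;
        lanes[i].op0 = lanes[i].op1;
        lanes[i].op1 = tmp;
      }
}

/* Return true if operand OPNO (0 or 1) of consecutive lanes accesses
   consecutive elements, i.e. could be loaded as one contiguous group.  */
static bool
operand_contiguous_p (const struct lane *lanes, size_t n, int opno)
{
  for (size_t i = 1; i < n; i++)
    {
      int prev = opno ? lanes[i - 1].op1 : lanes[i - 1].op0;
      int cur = opno ? lanes[i].op1 : lanes[i].op0;
      if (cur != prev + 1)
        return false;
    }
  return true;
}

/* Operand order exactly as written in foo.  */
static struct lane foo_lanes[8] = {
  {0, 256}, {257, 1}, {2, 258}, {259, 3},
  {260, 4}, {261, 5}, {6, 262}, {7, 263}
};
```

In this model neither operand of foo_lanes is contiguous as written. After canonicalize_lanes (foo_lanes, 8), operand 0 covers a[0]..a[7] and operand 1 covers a[256]..a[263], which is the layout foo1 already has.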
/app/example.cpp:4:10: note: starting SLP discovery for node 0xb1488e0
/app/example.cpp:4:10: note: get vectype for scalar type (group size 8): double
/app/example.cpp:4:10: note: vectype: vector(4) double
/app/example.cpp:4:10: note: nunits = 4
/app/example.cpp:4:10: note: Build SLP for _1 = *a_26(D);
/app/example.cpp:4:10: note: Build SLP for _4 = MEM[(double *)a_26(D) + 2056B];
/app/example.cpp:4:10: note: Build SLP for _7 = MEM[(double *)a_26(D) + 16B];
/app/example.cpp:4:10: note: Build SLP for _10 = MEM[(double *)a_26(D) + 2072B];
/app/example.cpp:4:10: note: Build SLP for _13 = MEM[(double *)a_26(D) + 2080B];
/app/example.cpp:4:10: note: Build SLP for _16 = MEM[(double *)a_26(D) + 2088B];
/app/example.cpp:4:10: note: Build SLP for _19 = MEM[(double *)a_26(D) + 48B];
/app/example.cpp:4:10: note: Build SLP for _22 = MEM[(double *)a_26(D) + 56B];
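The byte offsets in the dump map back to the array elements in foo by dividing by the element size. A small sketch, assuming sizeof (double) == 8 as on x86-64; `dump_offsets` and `offset_to_index` are names made up for this example:

```c
#include <stddef.h>

/* Byte offsets of the loads, in the order the dump lists them
   (the first entry, *a_26(D), is offset 0).  */
static const long dump_offsets[8] = { 0, 2056, 16, 2072, 2080, 2088, 48, 56 };

/* Convert a byte offset from the dump into an index into a[].
   Assumes sizeof (double) == 8.  */
static long
offset_to_index (long byte_offset)
{
  return byte_offset / (long) sizeof (double);
}
```

Applied to dump_offsets this gives the indices 0, 257, 2, 259, 260, 261, 6, 7, matching the first operand of each statement in foo.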
NOTE: this is different from PR114375, since there is no permuted mask load here.