https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181
Bug ID: 119181
Summary: Missed vectorization due to imperfect SLP discovery for strided & interleaved load.
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---

void foo (double* a, double* __restrict b)
{
    b[0] = a[0] * a[256];
    b[1] = a[257] * a[1];
    b[2] = a[2] * a[258];
    b[3] = a[259] * a[3];
    b[4] = a[260] * a[4];
    b[5] = a[261] * a[5];
    b[6] = a[6] * a[262];
    b[7] = a[7] * a[263];
}

void foo1 (double* a, double* __restrict b, double* c)
{
    b[0] = a[0] * a[256];
    b[1] = a[1] * a[257];
    b[2] = a[2] * a[258];
    b[3] = a[3] * a[259];
    b[4] = a[4] * a[260];
    b[5] = a[5] * a[261];
    b[6] = a[6] * a[262];
    b[7] = a[7] * a[263];
}

foo1 can be vectorized but foo cannot. The reason is that SLP discovery does
not build a[0], a[1], a[2], ..., a[7] into the same node. Instead it builds
a[0], a[257], a[2], a[259], a[260], a[261], a[6], a[7], i.e. the first operand
of each multiplication as written in the source. In the dump below the offsets
are in bytes, so with 8-byte doubles 2056B corresponds to a[257].

/app/example.cpp:4:10: note: starting SLP discovery for node 0xb1488e0
/app/example.cpp:4:10: note: get vectype for scalar type (group size 8): double
/app/example.cpp:4:10: note: vectype: vector(4) double
/app/example.cpp:4:10: note: nunits = 4
/app/example.cpp:4:10: note: Build SLP for _1 = *a_26(D);
/app/example.cpp:4:10: note: Build SLP for _4 = MEM[(double *)a_26(D) + 2056B];
/app/example.cpp:4:10: note: Build SLP for _7 = MEM[(double *)a_26(D) + 16B];
/app/example.cpp:4:10: note: Build SLP for _10 = MEM[(double *)a_26(D) + 2072B];
/app/example.cpp:4:10: note: Build SLP for _13 = MEM[(double *)a_26(D) + 2080B];
/app/example.cpp:4:10: note: Build SLP for _16 = MEM[(double *)a_26(D) + 2088B];
/app/example.cpp:4:10: note: Build SLP for _19 = MEM[(double *)a_26(D) + 48B];
/app/example.cpp:4:10: note: Build SLP for _22 = MEM[(double *)a_26(D) + 56B];

NOTE: this is different from PR114375, since there is no permuted mask load.
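
As a possible source-level workaround until SLP discovery handles this, the operands of foo can be reordered by hand so that the low-offset load always comes first, which gives exactly the body of foo1. The sketch below is illustrative only: foo_canon and matching_lanes are hypothetical names that are not part of the report, and the input values are arbitrary. It checks that the reordering does not change the results, which is expected because the multiplication is commutative.

```c
/* foo as given in the report: the operand order varies from lane to lane. */
void foo (double *a, double *__restrict b)
{
  b[0] = a[0] * a[256];
  b[1] = a[257] * a[1];
  b[2] = a[2] * a[258];
  b[3] = a[259] * a[3];
  b[4] = a[260] * a[4];
  b[5] = a[261] * a[5];
  b[6] = a[6] * a[262];
  b[7] = a[7] * a[263];
}

/* Hypothetical hand-canonicalized variant: the low-offset load is always the
   first operand, which is the same body as foo1 in the report.  */
void foo_canon (double *a, double *__restrict b)
{
  b[0] = a[0] * a[256];
  b[1] = a[1] * a[257];
  b[2] = a[2] * a[258];
  b[3] = a[3] * a[259];
  b[4] = a[4] * a[260];
  b[5] = a[5] * a[261];
  b[6] = a[6] * a[262];
  b[7] = a[7] * a[263];
}

/* Hypothetical helper: runs both variants on the same arbitrary input and
   returns how many of the 8 output lanes agree.  */
int matching_lanes (void)
{
  double a[264], b0[8], b1[8];
  int i, same = 0;

  for (i = 0; i < 264; i++)
    a[i] = 0.5 * i + 1.0;   /* arbitrary finite test values */

  foo (a, b0);
  foo_canon (a, b1);

  for (i = 0; i < 8; i++)
    if (b0[i] == b1[i])
      same++;
  return same;
}
```

Calling matching_lanes () from a test driver should return 8. Whether foo_canon is actually vectorized still depends on the GCC version and flags, so the vectorizer dump should be checked rather than assumed.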