https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119181

            Bug ID: 119181
           Summary: Missed vectorization due to imperfect SLP discovery
                    for strided & interleaved load.
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

void
foo (double* a, double* __restrict b)
{
    b[0] = a[0] * a[256];
    b[1] = a[257] * a[1];
    b[2] = a[2] * a[258];
    b[3] = a[259] * a[3];
    b[4] = a[260] * a[4];
    b[5] = a[261] * a[5];
    b[6] = a[6] * a[262];
    b[7] = a[7] * a[263];
}

void
foo1 (double* a, double* __restrict b, double* c)
{
    b[0] = a[0] * a[256];
    b[1] = a[1] * a[257];
    b[2] = a[2] * a[258];
    b[3] = a[3] * a[259];
    b[4] = a[4] * a[260];
    b[5] = a[5] * a[261];
    b[6] = a[6] * a[262];
    b[7] = a[7] * a[263];
}

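foo1 can be vectorized but foo can't. This is because SLP discovery does not
build a[0], a[1], a[2], ..., a[7] into the same node; instead it builds a[0],
a[257], a[2], a[259], a[260], a[261], a[6], a[7], i.e. the first multiplication
operand of each statement exactly as written in the source.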
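
To make the operand grouping concrete, here is a minimal sketch that models
only the index bookkeeping, not GCC's actual SLP code. The names struct lane,
foo_lanes, first_operands_contiguous and swap_to_ascending are hypothetical and
exist only for this illustration. It assumes that swapping operands is
acceptable because the multiplication is commutative.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 8 };

/* Element indices into a[] of the two multiplication operands of one
   statement (hypothetical type, for illustration only).  */
struct lane { int op0; int op1; };

/* The eight statements of foo, operands in source order.  */
static const struct lane foo_lanes[LANES] = {
  {0, 256}, {257, 1}, {2, 258}, {259, 3},
  {260, 4}, {261, 5}, {6, 262}, {7, 263},
};

/* Return 1 if the first operands of all lanes are consecutive ascending
   indices, i.e. they could come from one contiguous load.  */
static int
first_operands_contiguous (const struct lane *l, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (l[i].op0 != l[i - 1].op0 + 1)
      return 0;
  return 1;
}

/* Put the smaller index first in every lane.  Only valid for a
   commutative operation such as the multiplication used here.  */
static void
swap_to_ascending (const struct lane *in, struct lane *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      int swap = in[i].op0 > in[i].op1;
      out[i].op0 = swap ? in[i].op1 : in[i].op0;
      out[i].op1 = swap ? in[i].op0 : in[i].op1;
    }
}

/* Self-check: foo's first operands are not contiguous as written, but
   they are after swapping, which matches the operand order of foo1.  */
static void
check_foo_lanes (void)
{
  struct lane fixed[LANES];
  assert (!first_operands_contiguous (foo_lanes, LANES));
  swap_to_ascending (foo_lanes, fixed, LANES);
  assert (first_operands_contiguous (fixed, LANES));
}
```

After the swap the two operand groups are a[0..7] and a[256..263], which is the
shape foo1 already has in the source.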


/app/example.cpp:4:10: note:   starting SLP discovery for node 0xb1488e0
/app/example.cpp:4:10: note:   get vectype for scalar type (group size 8): double
/app/example.cpp:4:10: note:   vectype: vector(4) double
/app/example.cpp:4:10: note:   nunits = 4
/app/example.cpp:4:10: note:   Build SLP for _1 = *a_26(D);
/app/example.cpp:4:10: note:   Build SLP for _4 = MEM[(double *)a_26(D) + 2056B];
/app/example.cpp:4:10: note:   Build SLP for _7 = MEM[(double *)a_26(D) + 16B];
/app/example.cpp:4:10: note:   Build SLP for _10 = MEM[(double *)a_26(D) + 2072B];
/app/example.cpp:4:10: note:   Build SLP for _13 = MEM[(double *)a_26(D) + 2080B];
/app/example.cpp:4:10: note:   Build SLP for _16 = MEM[(double *)a_26(D) + 2088B];
/app/example.cpp:4:10: note:   Build SLP for _19 = MEM[(double *)a_26(D) + 48B];
/app/example.cpp:4:10: note:   Build SLP for _22 = MEM[(double *)a_26(D) + 56B];
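
The byte offsets in the dump can be mapped back to indices into a[] with a
small helper. This is only a sketch: dump_offsets and offset_to_index are
hypothetical names, and it assumes sizeof (double) == 8, which is what the
offsets in the dump imply.

```c
#include <stddef.h>

/* Byte offsets of the eight loads in the dump, in dump order.
   The first load, *a_26(D), has offset 0.  */
static const size_t dump_offsets[8] = {
  0, 2056, 16, 2072, 2080, 2088, 48, 56
};

/* Convert a byte offset from the dump into an index into a[].
   Assumes 8-byte doubles.  */
static size_t
offset_to_index (size_t byte_offset)
{
  return byte_offset / sizeof (double);
}
```

Applied to dump_offsets, this gives the indices 0, 257, 2, 259, 260, 261, 6, 7,
i.e. the first operand of each statement of foo in source order.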

Note that this is different from PR114375, since there is no permuted mask
load here.
