https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101668

            Bug ID: 101668
           Summary: vectorizer doesn't categorize vector construct cost
                    right.
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

cat test.c

typedef int v16si __attribute__((vector_size (64)));
typedef long long v8di __attribute__((vector_size (64)));

void
bar_s32_s64 (v8di * dst, v16si src)
{
  long long tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v8di *) tem;
}

gcc -O3 -march=skylake-avx512 will fail to vectorize the case after my r12-2549
because i've increased vec_construct cost for SKX/CLX. Here's dump for slp2

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <src_18(D), 32, 0>;
  _2 = (long long int) _1;
  _3 = BIT_FIELD_REF <src_18(D), 32, 32>;
  _4 = (long long int) _3;
  _5 = BIT_FIELD_REF <src_18(D), 32, 64>;
  _6 = (long long int) _5;
  _7 = BIT_FIELD_REF <src_18(D), 32, 96>;
  _8 = (long long int) _7;
  _9 = BIT_FIELD_REF <src_18(D), 32, 128>;
  _10 = (long long int) _9;
  _11 = BIT_FIELD_REF <src_18(D), 32, 160>;
  _12 = (long long int) _11;
  _13 = BIT_FIELD_REF <src_18(D), 32, 192>;
  _14 = (long long int) _13;
  _15 = BIT_FIELD_REF <src_18(D), 32, 224>;
  _31 = {_1, _3, _5, _7, _9, _11, _13, _15};
  vect__2.4_32 = (vector(8) long long int) _31;
  _16 = (long long int) _15;
  MEM <vector(8) long long int> [(long long int *)&tem] = vect__2.4_32;
  _17 = MEM[(v8di *)&tem];
  *dst_28(D) = _17;
  tem ={v} {CLOBBER};
  return;

But actually, there's no need for vec_contruct from each element, it will be
optimized to

   <bb 2> [local count: 1073741824]:
  _2 = BIT_FIELD_REF <src_18(D), 256, 0>;
  vect__2.4_32 = (vector(8) long long int) _2;
  *dst_28(D) = vect__2.4_32;
  return;

So at the time slp2 can realize the optimization and categorize vec_contruct
cost more accurately, we can avoid this regression.

Reply via email to