Hi,

I am just looking at SLP costing for this program:

#include <stdint.h>

typedef int16_t __attribute__((vector_size(16))) int16x8_t;

int16x8_t __attribute__((noinline)) vec_padd(int16x8_t a, int16x8_t b) {
  const int16x8_t result = {
      a[0] + a[1],
      a[2] + a[3],
      a[4] + a[5],
      a[6] + a[7],
      b[0] + b[1],
      b[2] + b[3],
      b[4] + b[5],
      b[6] + b[7],
  };

  return result;
}

SLP costing adds two external vec_construct nodes:

node 0x5210210 1 times vec_construct costs 7 in prologue
node 0x52103c0 1 times vec_construct costs 7 in prologue

which lead to a very high total cost.  I also see these two vector
constructors in the SLP pass dump:

  _53 = {_1, _7, _13, _19, _25, _31, _37, _43};
  vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_53);
  _55 = {_3, _9, _15, _21, _27, _33, _39, _45};
  vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_55);

The SLP log also contains messages like:

missed:   Build SLP failed: different BIT_FIELD_REF arguments in _25 = BIT_FIELD_REF <b_50(D), 16, 0>;

However, if I disable the cost model (-fvect-cost-model=unlimited),
forwprop later transforms these two vector constructors into
two vector permute operations:

  _2 = VEC_PERM_EXPR <a_49(D), b_50(D), { 0, 2, 4, 6, 8, 10, 12, 14 }>;
  vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_2);
  _4 = VEC_PERM_EXPR <a_49(D), b_50(D), { 1, 3, 5, 7, 9, 11, 13, 15 }>;
  vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_4);

These would have lower costs and make vectorization profitable.
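
For reference, here is a minimal C sketch of what the permute form
computes.  The helper names perm2 and vec_padd_perm are made up for
illustration, and the scalar loop only models the selection semantics of
a two-input permute (indices 0..7 pick from a, 8..15 from b); it is not
meant to reflect what GCC actually emits.  The addition is done on the
unsigned view, mirroring the VIEW_CONVERT_EXPR in the dump.

```c
#include <stdint.h>

typedef int16_t __attribute__((vector_size(16))) int16x8_t;
typedef uint16_t __attribute__((vector_size(16))) uint16x8_t;

/* Hypothetical helper: scalar model of a two-input permute.
   sel[i] indexes the concatenation {a, b}.  */
static int16x8_t perm2(int16x8_t a, int16x8_t b, const int sel[8]) {
  int16x8_t r = {0};
  for (int i = 0; i < 8; i++)
    r[i] = sel[i] < 8 ? a[sel[i]] : b[sel[i] - 8];
  return r;
}

/* Hypothetical rewrite of vec_padd: two permutes plus one vector add.  */
int16x8_t vec_padd_perm(int16x8_t a, int16x8_t b) {
  static const int even[8] = {0, 2, 4, 6, 8, 10, 12, 14};
  static const int odd[8]  = {1, 3, 5, 7, 9, 11, 13, 15};
  /* Reinterpret as unsigned short lanes before adding.  */
  uint16x8_t lo = (uint16x8_t)perm2(a, b, even);
  uint16x8_t hi = (uint16x8_t)perm2(a, b, odd);
  return (int16x8_t)(lo + hi);
}
```

For inputs without overflow this should give the same lanes as vec_padd
above: the even permute gathers a[0], a[2], ..., b[6], the odd permute
gathers a[1], a[3], ..., b[7], and the lane-wise sum is the pairwise add.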

My question now is: should I mimic the behaviour of forwprop in the
cost model, or is this something that will eventually be ported to the
SLP vectorizer, so that the cost model sees two vec_perm operations
instead of two vec_constructs?  If so, are there other cases where a
later pass makes essential changes to the constructs passed to the cost
model?

Thanks,

Juergen
