Hi, I am just looking at SLP costing for this program:
#include <stdint.h> typedef int16_t __attribute__((vector_size(16))) int16x8_t; int16x8_t __attribute__((noinline)) vec_padd(int16x8_t a, int16x8_t b) { const int16x8_t result = { a[0] + a[1], a[2] + a[3], a[4] + a[5], a[6] + a[7], b[0] + b[1], b[2] + b[3], b[4] + b[5], b[6] + b[7], }; return result; } SLP costing adds two external vec_construct nodes: node 0x5210210 1 times vec_construct costs 7 in prologue node 0x52103c0 1 times vec_construct costs 7 in prologue which lead to very high costs. I also see these two array constructs in the SLP pass dump: _53 = {_1, _7, _13, _19, _25, _31, _37, _43}; vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_53); _55 = {_3, _9, _15, _21, _27, _33, _39, _45}; vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_55); I also see in the SLP log messages like: missed: Build SLP failed: different BIT_FIELD_REF arguments in _25 = BIT_FIELD_REF <b_50(D), 16, 0>; However, if I disable the cost mode (-fvect-cost-model=unlimited), forwardprop later on transforms these two vector constructions into two vector permute operations: _2 = VEC_PERM_EXPR <a_49(D), b_50(D), { 0, 2, 4, 6, 8, 10, 12, 14 }>; vect__2.3_54 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_2); _4 = VEC_PERM_EXPR <a_49(D), b_50(D), { 1, 3, 5, 7, 9, 11, 13, 15 }>; vect__4.4_56 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(_4); These would have lower costs and make vectorization profitable. My question now is: Should I mimic the behaviour of forwardprop in the cost model, or is this something that eventually will be ported to SLP vectorizer such that the cost model gets two vec_permute instead of two vec_constructs? And if yes, are there other cases where a later pass performs essential changes to the constructs passed to the cost model? Thanks, Juergen