https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
--- Comment #19 from rguenther at suse dot de <rguenther at suse dot de> --- On Sun, 24 Dec 2017, sergey.shalnov at intel dot com wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008 > > --- Comment #18 from sergey.shalnov at intel dot com --- > Yes, I agree that vector_store stage has it’s own vectorization cost. > And each vector_store has vector_construction stage. These stages are > different > in gcc slp (as you know). > To better illustrate my point of view I would like to propose a patch. > I didn’t submit the patch to community yet because I think it is better to > discuss it before that. > > index 0ca42b4..72e3123 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -1825,7 +1825,7 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree > node, > record_stmt_cost (prologue_cost_vec, 1, vector_load, > stmt_info, 0, vect_prologue); > else if (dt == vect_external_def) > - record_stmt_cost (prologue_cost_vec, 1, vec_construct, > + record_stmt_cost (prologue_cost_vec, ncopies_for_cost, > vec_construct, > stmt_info, 0, vect_prologue); > } > } I'm not sure it affects the testcase in question but yes, there's imprecision in this in that it assumes too much CSE happening. Using ncopies_for_cost will over-estimate the cost though, it will match the code we generate though (CSE will cleanup). As the comment says this code should be re-engineered ;) (it also misses to make a splat cheap). Sth like the following might work for most cases: Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c (revision 256070) +++ gcc/tree-vect-slp.c (working copy) @@ -1815,18 +1815,41 @@ vect_analyze_slp_cost_1 (slp_instance in enum vect_def_type dt; if (!op || op == lhs) continue; - if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt)) + if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt) + && (dt == vect_constant_def || dt == vect_external_def)) { /* Without looking at the actual initializer a vector of constants can be implemented as load from the constant pool. - ??? We need to pass down stmt_info for a vector type - even if it points to the wrong stmt. */ - if (dt == vect_constant_def) - record_stmt_cost (prologue_cost_vec, 1, vector_load, - stmt_info, 0, vect_prologue); - else if (dt == vect_external_def) - record_stmt_cost (prologue_cost_vec, 1, vec_construct, - stmt_info, 0, vect_prologue); + When all elements are the same we can use a splat. */ + tree vectype = get_vectype_for_scalar_type (TREE_TYPE (op)); + unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype); + unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length (); + tree elt = NULL_TREE; + unsigned nelt = 0; + for (unsigned j = 0; j < ncopies_for_cost * nunits / group_size; ++j) + for (unsigned k = 0; k < group_size; ++k) + { + if (nelt == 0) + elt = gimple_op (SLP_TREE_SCALAR_STMTS (node)[nelt], i); + /* ??? We're just tracking whether all operands of a single + vector initializer are the same, ideally we'd check if + we emitted the same one already. */ + else if (elt != gimple_op (SLP_TREE_SCALAR_STMTS (node)[nelt], + i)) + elt = NULL_TREE; + nelt++; + if (nelt == group_size) + { + /* ??? We need to pass down stmt_info for a vector type + even if it points to the wrong stmt. */ + record_stmt_cost (prologue_cost_vec, 1, + dt == vect_external_def + ? (elt ? scalar_to_vec : vec_construct) + : vector_load, + stmt_info, 0, vect_prologue); + nelt = 0; + } + } } }