On Mon, Aug 16, 2021 at 8:03 AM Kewen.Lin <li...@linux.ibm.com> wrote: > > Hi, > > IIUC, the function vectorizable_bb_reduc_epilogue missed to > consider the cost to extract the final value from the vector > for reduc operations. This patch is to add one time of > vec_to_scalar cost for extracting. > > Bootstrapped & regtested on powerpc64le-linux-gnu P9. > The testing on x86_64 and aarch64 is ongoing. > > Is it ok for trunk?
There's no such instruction necessary, the way the costing works the result is in lane zero already. Note the optabs are defined to reduce to a scalar already. So if your arch implements those and requires such move then the backend costing needs to handle that. That said, ideally we'd simply cost the IFN_REDUC_* in the backend but for BB reductions we don't actually build a SLP node with such representative stmt to pass down (yet). I guess you're running into a integer reduction where there's a vector -> gpr move missing in costing? I suppose costing vec_to_scalar works for that but in the end we should maybe find a way to cost the IFN_REDUC_* ... Richard. > BR, > Kewen > ----- > gcc/ChangeLog: > > * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for > value extraction. > > diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c > index b9d88c2d943..841a0872afa 100644 > --- a/gcc/tree-vect-slp.c > +++ b/gcc/tree-vect-slp.c > @@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance, > return false; > > /* There's no way to cost a horizontal vector reduction via REDUC_FN so > - cost log2 vector operations plus shuffles. */ > + cost log2 vector operations plus shuffles and one extraction. */ > unsigned steps = floor_log2 (vect_nunits_for_cost (vectype)); > record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0], > vectype, 0, vect_body); > record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0], > vectype, 0, vect_body); > + record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0], > + vectype, 0, vect_body); > return true; > }