Hi,

IIUC, the function vectorizable_bb_reduc_epilogue missed to
consider the cost to extract the final value from the vector
for reduc operations.  This patch is to add one time of
vec_to_scalar cost for extracting.

Bootstrapped & regtested on powerpc64le-linux-gnu P9.
The testing on x86_64 and aarch64 is ongoing.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

        * tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
        value extraction.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9d88c2d943..841a0872afa 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
     return false;

   /* There's no way to cost a horizontal vector reduction via REDUC_FN so
-     cost log2 vector operations plus shuffles.  */
+     cost log2 vector operations plus shuffles and one extraction.  */
   unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
   record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
                    vectype, 0, vect_body);
   record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
                    vectype, 0, vect_body);
+  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
+                   vectype, 0, vect_body);
   return true;
 }

Reply via email to