Hi,

IIUC, the function vectorizable_bb_reduc_epilogue does not account for the cost of extracting the final scalar value from the vector for reduction operations. This patch adds one vec_to_scalar cost to cover that extraction.
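
To illustrate what is being costed, here is a minimal standalone sketch (not GCC code) of a log2-step horizontal add reduction that tallies the operations involved. The names reduc_cost and reduce_add are hypothetical and exist only for this illustration. The array stands in for a vector register, and the sketch assumes the lane count is a power of two.

```c
#include <assert.h>

/* Hypothetical tally of the operations a log2-step horizontal
   reduction performs; this is not a GCC data structure.  */
struct reduc_cost
{
  unsigned vector_ops; /* element-wise vector operations */
  unsigned permutes;   /* shuffles moving the upper half down */
  unsigned extracts;   /* vector-to-scalar extractions */
};

/* Sum the NUNITS lanes of V in place and record the operation counts
   in COST.  NUNITS is assumed to be a power of two.  */
static int
reduce_add (int *v, unsigned nunits, struct reduc_cost *cost)
{
  assert (nunits != 0 && (nunits & (nunits - 1)) == 0);

  for (unsigned half = nunits / 2; half >= 1; half /= 2)
    {
      /* One shuffle brings lanes [half, 2*half) down to [0, half).  */
      cost->permutes++;

      /* One vector add combines them with the lower lanes.  */
      for (unsigned i = 0; i < half; i++)
        v[i] += v[i + half];
      cost->vector_ops++;
    }

  /* The result now sits in lane 0 and still has to be moved into a
     scalar; this is the step the patch adds a cost for.  */
  cost->extracts++;
  return v[0];
}
```

For a four-lane vector this counts two vector operations, two permutes and one extraction, which matches the steps value of floor_log2 (4) = 2 plus the single vec_to_scalar.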

Bootstrapped & regtested on powerpc64le-linux-gnu P9.
The testing on x86_64 and aarch64 is ongoing.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	* tree-vect-slp.c (vectorizable_bb_reduc_epilogue): Add the cost for
	value extraction.

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index b9d88c2d943..841a0872afa 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -4845,12 +4845,14 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
     return false;
 
   /* There's no way to cost a horizontal vector reduction via REDUC_FN so
-     cost log2 vector operations plus shuffles.  */
+     cost log2 vector operations plus shuffles and one extraction.  */
   unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
   record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
 		    vectype, 0, vect_body);
   record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
 		    vectype, 0, vect_body);
+  record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
+		    vectype, 0, vect_body);
   return true;
 }