> -----Original Message-----
> From: Richard Biener <[email protected]>
> Sent: Monday, May 12, 2025 1:46 PM
> To: [email protected]
> Cc: Tamar Christina <[email protected]>; RISC-V CI <[email protected]>
> Subject: [PATCH] Cleanup internal vectorizer costing API
>
> This tries to clean up the API available to vectorizable_* for
> recording stmt costs.  There are several overloads of record_stmt_cost
> for this; the patch adds one that specifies only the SLP node and
> restricts the one taking just a stmt_vec_info to scalar stmt
> processing.
>
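> To make the result concrete, the main record_stmt_cost variants after
> this change look roughly as follows (a sketch distilled from the
> tree-vectorizer.h hunk at the end of the patch, not the verbatim
> declarations):
>
>   /* Scalar stmt costing only: asserts a scalar cost kind and passes
>      NULL_TREE as the vectype.  */
>   unsigned record_stmt_cost (stmt_vector_for_cost *, int count,
>                              enum vect_cost_for_stmt kind,
>                              stmt_vec_info stmt_info, int misalign,
>                              enum vect_cost_model_location where);
>
>   /* SLP node only (inline): the vectype and a representative
>      stmt_vec_info are derived from the node.  */
>   unsigned record_stmt_cost (stmt_vector_for_cost *, int count,
>                              enum vect_cost_for_stmt kind,
>                              slp_tree node, int misalign,
>                              enum vect_cost_model_location where);
>
>   /* Full-blown variant for the awkward spots described below.  */
>   unsigned record_stmt_cost (stmt_vector_for_cost *, int count,
>                              enum vect_cost_for_stmt kind,
>                              stmt_vec_info stmt_info, slp_tree node,
>                              tree vectype, int misalign,
>                              enum vect_cost_model_location where);
>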
> There are awkward spots left which can use the overload with the
> full set of parameters: SLP node, stmt_vec_info and vectype.  One
> issue is that BB vectorization SLP instances have root statements
> that are not represented by an SLP node.  The other big offender
> is dataref alignment peeling analysis, which I plan to move away
> from the add_stmt API, back to the target-hook based costing
> (just to be out of the way, not necessarily as a final solution).
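>
> (Concretely, the BB vectorization case ends up using the full
> overload, as in the vectorizable_bb_reduc_epilogue hunk below, with
> the root stmt_vec_info, a NULL node and an explicit vectype:
>
>   record_stmt_cost (cost_vec, steps, vector_stmt,
>                     instance->root_stmts[0], NULL, vectype,
>                     0, vect_body);
>
> so no SLP node needs to be fabricated for root statements.)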
>
> For backends the main visible change is that most calls to
> add_stmt_cost will now be passed an SLP node.  I still pass
> a stmt_vec_info in addition to the SLP node to cause less
> disruption there.
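>
> For illustration, a backend override of vector_costs::add_stmt_cost
> can derive the vectype it needs defensively; this helper is a
> hypothetical sketch, not part of the patch:
>
>   /* Prefer an explicit VECTYPE, then the SLP node's, then the
>      stmt_vec_info's; STMT_INFO may now be NULL in principle, so
>      avoid dereferencing it unconditionally.  */
>   static tree
>   costing_vectype (stmt_vec_info stmt_info, slp_tree node, tree vectype)
>   {
>     if (vectype)
>       return vectype;
>     if (node)
>       return SLP_TREE_VECTYPE (node);
>     return stmt_info ? STMT_VINFO_VECTYPE (stmt_info) : NULL_TREE;
>   }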
>
> This is not the big vectorizer costing overhaul.
>
> Bootstrapped on x86_64-unknown-linux-gnu; testing revealed some
> cost-related fallout.  I'll eventually try to split this up.
> For now I want to see whether any of the asserts trip on
> aarch64/riscv.
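>
> (The asserts in question include the one the patch adds to the
> scalar-only overload in tree-vect-stmts.cc:
>
>   gcc_assert (kind == scalar_stmt
>               || kind == scalar_load
>               || kind == scalar_store);
>
> which should fire if a vector cost kind still reaches that overload.)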
>
FWIW, I get a bootstrap failure:
/opt/buildAgent/work/505bfdd4dad8af3d/libgcc/libgcc2.c:1468:1: internal compiler error: Segmentation fault
1468 | __mulbitint3 (UBILtype *ret, SItype retprec,
| ^~~~~~~~~~~~
0x3f039fb internal_error(char const*, ...)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/diagnostic-global-context.cc:517
0x1bab727 crash_signal
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/toplev.cc:321
0xf385a0 tree_class_check(tree_node*, tree_code_class, char const*, int, char const*)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree.h:3846
0x21780b3 aarch64_vector_costs::add_stmt_cost(int, vect_cost_for_stmt, _stmt_vec_info*, _slp_tree*, tree_node*, int, vect_cost_model_location)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/config/aarch64/aarch64.cc:17883
0x1fa61e7 add_stmt_cost(vector_costs*, int, vect_cost_for_stmt, _stmt_vec_info*, _slp_tree*, tree_node*, int, vect_cost_model_location)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.h:1972
0x1fa6383 add_stmt_costs(vector_costs*, vec<stmt_info_for_cost, va_heap, vl_ptr>*)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.h:2002
0x1f83b07 vect_compute_single_scalar_iteration_cost
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vect-loop.cc:1735
0x1f876a7 vect_analyze_loop_2
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vect-loop.cc:2849
0x1f89223 vect_analyze_loop_1
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vect-loop.cc:3424
0x1f89d67 vect_analyze_loop(loop*, gimple*, vec_info_shared*)
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vect-loop.cc:3584
0x200932f try_vectorize_loop_1
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1096
0x200986b try_vectorize_loop
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1214
0x2009af7 execute
/opt/buildAgent/work/505bfdd4dad8af3d/gcc/tree-vectorizer.cc:1330
I think something is expecting a stmt_vec_info but is getting NULL.
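
The tree_class_check frame is consistent with a tree accessor being
handed a null operand; illustratively (a hypothetical reconstruction,
not the actual statement at aarch64.cc:17883):

  /* With stmt_info recorded as NULL by one of the costing paths,
     anything of this shape in the backend hook segfaults or trips
     the tree checker.  */
  machine_mode mode = TYPE_MODE (STMT_VINFO_VECTYPE (stmt_info));
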
Cheers,
Tamar
> Richard.
>
> * tree-vectorizer.h (record_stmt_cost): Remove inline
> overload with stmt_vec_info argument, make out-of-line
> version of this no longer take a vectype - it is only
> for scalar stmt costs.
> (record_stmt_cost): Remove stmt_vec_info argument from
> inline overload with SLP node specified.
> * tree-vect-loop.cc (vect_model_reduction_cost): Take
> SLP node as argument and adjust.
> (vectorizable_lane_reducing): Use SLP node overload
> for record_stmt_cost.
> (vectorizable_reduction): Likewise.
> (vectorizable_phi): Likewise.
> (vectorizable_recurr): Likewise.
> (vectorizable_nonlinear_induction): Likewise.
> (vectorizable_induction): Likewise.
> (vectorizable_live_operation): Likewise.
> * tree-vect-slp.cc (vect_prologue_cost_for_slp): Use
> full-blown record_stmt_cost.
> (vectorizable_bb_reduc_epilogue): Likewise.
> (vect_bb_slp_scalar_cost): Adjust.
> * tree-vect-stmts.cc (record_stmt_cost): For stmt_vec_info
> overload assert we are only using it for scalar stmt costing.
> (record_stmt_cost): For SLP node overload record
> SLP_TREE_REPRESENTATIVE as stmt_vec_info.
> (vect_model_simple_cost): Do not get stmt_vec_info argument
> and adjust.
> (vect_model_promotion_demotion_cost): Get SLP node instead
> of stmt_vec_info argument and adjust.
> (vect_get_store_cost): Compute vectype based on whether
> we got SLP node or stmt_vec_info and use the full record_stmt_cost
> API.
> (vect_get_load_cost): Likewise.
> (vectorizable_bswap): Adjust.
> (vectorizable_call): Likewise.
> (vectorizable_simd_clone_call): Likewise.
> (vectorizable_conversion): Likewise.
> (vectorizable_assignment): Likewise.
> (vectorizable_shift): Likewise.
> (vectorizable_operation): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_load): Likewise.
> (vectorizable_condition): Likewise.
> (vectorizable_comparison_1): Likewise.
> ---
> gcc/tree-vect-loop.cc | 58 +++++++--------
> gcc/tree-vect-slp.cc | 11 ++-
> gcc/tree-vect-stmts.cc | 160 ++++++++++++++++++++---------------------
> gcc/tree-vectorizer.h | 23 ++----
> 4 files changed, 118 insertions(+), 134 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fe6f3cf188e..5abac27ec62 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -5283,7 +5283,7 @@ vect_is_emulated_mixed_dot_prod (stmt_vec_info stmt_info)
>
> static void
> vect_model_reduction_cost (loop_vec_info loop_vinfo,
> - stmt_vec_info stmt_info, internal_fn reduc_fn,
> + slp_tree slp_node, internal_fn reduc_fn,
> vect_reduction_type reduction_type,
> int ncopies, stmt_vector_for_cost *cost_vec)
> {
> @@ -5299,9 +5299,10 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> if (reduction_type == COND_REDUCTION)
> ncopies *= 2;
>
> - vectype = STMT_VINFO_VECTYPE (stmt_info);
> + vectype = SLP_TREE_VECTYPE (slp_node);
> mode = TYPE_MODE (vectype);
> - stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
> + stmt_vec_info orig_stmt_info
> + = vect_orig_stmt (SLP_TREE_REPRESENTATIVE (slp_node));
>
> gimple_match_op op;
> if (!gimple_extract_op (orig_stmt_info->stmt, &op))
> @@ -5319,16 +5320,16 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> if (reduc_fn != IFN_LAST)
> /* Count one reduction-like operation per vector. */
> inside_cost = record_stmt_cost (cost_vec, ncopies, vec_to_scalar,
> - stmt_info, 0, vect_body);
> + slp_node, 0, vect_body);
> else
> {
> /* Use NELEMENTS extracts and NELEMENTS scalar ops. */
> unsigned int nelements = ncopies * vect_nunits_for_cost (vectype);
> inside_cost = record_stmt_cost (cost_vec, nelements,
> - vec_to_scalar, stmt_info, 0,
> + vec_to_scalar, slp_node, 0,
> vect_body);
> inside_cost += record_stmt_cost (cost_vec, nelements,
> - scalar_stmt, stmt_info, 0,
> + scalar_stmt, orig_stmt_info, 0,
> vect_body);
> }
> }
> @@ -5345,7 +5346,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> /* We need the initial reduction value. */
> prologue_stmts = 1;
> prologue_cost += record_stmt_cost (cost_vec, prologue_stmts,
> - scalar_to_vec, stmt_info, 0,
> + scalar_to_vec, slp_node, 0,
> vect_prologue);
> }
>
> @@ -5362,24 +5363,24 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> {
> /* An EQ stmt and an COND_EXPR stmt. */
> epilogue_cost += record_stmt_cost (cost_vec, 2,
> - vector_stmt, stmt_info, 0,
> + vector_stmt, slp_node, 0,
> vect_epilogue);
> /* Reduction of the max index and a reduction of the found
> values. */
> epilogue_cost += record_stmt_cost (cost_vec, 2,
> - vec_to_scalar, stmt_info, 0,
> + vec_to_scalar, slp_node, 0,
> vect_epilogue);
> /* A broadcast of the max value. */
> epilogue_cost += record_stmt_cost (cost_vec, 1,
> - scalar_to_vec, stmt_info, 0,
> + scalar_to_vec, slp_node, 0,
> vect_epilogue);
> }
> else
> {
> epilogue_cost += record_stmt_cost (cost_vec, 1, vector_stmt,
> - stmt_info, 0, vect_epilogue);
> + slp_node, 0, vect_epilogue);
> epilogue_cost += record_stmt_cost (cost_vec, 1,
> - vec_to_scalar, stmt_info, 0,
> + vec_to_scalar, slp_node, 0,
> vect_epilogue);
> }
> }
> @@ -5389,12 +5390,12 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> /* Extraction of scalar elements. */
> epilogue_cost += record_stmt_cost (cost_vec,
> 2 * estimated_nunits,
> - vec_to_scalar, stmt_info, 0,
> + vec_to_scalar, slp_node, 0,
> vect_epilogue);
> /* Scalar max reductions via COND_EXPR / MAX_EXPR. */
> epilogue_cost += record_stmt_cost (cost_vec,
> 2 * estimated_nunits - 3,
> - scalar_stmt, stmt_info, 0,
> + scalar_stmt, orig_stmt_info, 0,
> vect_epilogue);
> }
> else if (reduction_type == EXTRACT_LAST_REDUCTION
> @@ -5420,10 +5421,10 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> Also requires scalar extract. */
> epilogue_cost += record_stmt_cost (cost_vec,
> exact_log2 (nelements) * 2,
> - vector_stmt, stmt_info, 0,
> + vector_stmt, slp_node, 0,
> vect_epilogue);
> epilogue_cost += record_stmt_cost (cost_vec, 1,
> - vec_to_scalar, stmt_info, 0,
> + vec_to_scalar, slp_node, 0,
> vect_epilogue);
> }
> else
> @@ -5431,7 +5432,7 @@ vect_model_reduction_cost (loop_vec_info loop_vinfo,
> elements, we have N extracts and N-1 reduction ops. */
> epilogue_cost += record_stmt_cost (cost_vec,
> nelements + nelements - 1,
> - vector_stmt, stmt_info, 0,
> + vector_stmt, slp_node, 0,
> vect_epilogue);
> }
> }
> @@ -7421,7 +7422,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
> value and one that contains half of its negative. */
> int prologue_stmts = 2;
> unsigned cost = record_stmt_cost (cost_vec, prologue_stmts,
> - scalar_to_vec, stmt_info, 0,
> + scalar_to_vec, slp_node, 0,
> vect_prologue);
> if (dump_enabled_p ())
> dump_printf (MSG_NOTE, "vectorizable_lane_reducing: "
> @@ -7431,7 +7432,7 @@ vectorizable_lane_reducing (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
> ncopies_for_cost *= 4;
> }
>
> - record_stmt_cost (cost_vec, (int) ncopies_for_cost, vector_stmt, stmt_info,
> + record_stmt_cost (cost_vec, (int) ncopies_for_cost, vector_stmt, slp_node,
> 0, vect_body);
>
> if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
> @@ -8345,13 +8346,14 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> return false;
> }
>
> - vect_model_reduction_cost (loop_vinfo, stmt_info, reduc_fn,
> + vect_model_reduction_cost (loop_vinfo, slp_for_stmt_info, reduc_fn,
> reduction_type, ncopies, cost_vec);
> /* Cost the reduction op inside the loop if transformed via
> vect_transform_reduction for non-lane-reducing operation. Otherwise
> this is costed by the separate vectorizable_* routines. */
> if (single_defuse_cycle)
> - record_stmt_cost (cost_vec, ncopies, vector_stmt, stmt_info, 0, vect_body);
> + record_stmt_cost (cost_vec, ncopies, vector_stmt,
> + slp_for_stmt_info, 0, vect_body);
>
> if (dump_enabled_p ()
> && reduction_type == FOLD_LEFT_REDUCTION)
> @@ -9125,7 +9127,7 @@ vectorizable_phi (vec_info *,
> favoring the vector path (but may pessimize it in some cases). */
> if (gimple_phi_num_args (as_a <gphi *> (stmt_info->stmt)) > 1)
> record_stmt_cost (cost_vec, SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
> - vector_stmt, stmt_info, vectype, 0, vect_body);
> + vector_stmt, slp_node, 0, vect_body);
> STMT_VINFO_TYPE (stmt_info) = phi_info_type;
> return true;
> }
> @@ -9306,7 +9308,7 @@ vectorizable_recurr (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
> prologue_cost = record_stmt_cost (cost_vec, 1, scalar_to_vec,
> stmt_info, 0, vect_prologue);
> unsigned inside_cost = record_stmt_cost (cost_vec, ncopies,
> vector_stmt,
> - stmt_info, 0, vect_body);
> + slp_node, 0, vect_body);
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "vectorizable_recurr: inside_cost = %d, "
> @@ -9837,7 +9839,7 @@ vectorizable_nonlinear_induction (loop_vec_info loop_vinfo,
> /* loop cost for vec_loop. Neg induction doesn't have any
> inside_cost. */
> inside_cost = record_stmt_cost (cost_vec, ncopies, vector_stmt,
> - stmt_info, 0, vect_body);
> + slp_node, 0, vect_body);
>
> /* loop cost for vec_loop. Neg induction doesn't have any
> inside_cost. */
> @@ -9846,7 +9848,7 @@ vectorizable_nonlinear_induction (loop_vec_info loop_vinfo,
>
> /* prologue cost for vec_init and vec_step. */
> prologue_cost = record_stmt_cost (cost_vec, 2, scalar_to_vec,
> - stmt_info, 0, vect_prologue);
> + slp_node, 0, vect_prologue);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -10172,11 +10174,11 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> inside_cost
> = record_stmt_cost (cost_vec,
> SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node),
> - vector_stmt, stmt_info, 0, vect_body);
> + vector_stmt, slp_node, 0, vect_body);
> /* prologue cost for vec_init (if not nested) and step. */
> prologue_cost = record_stmt_cost (cost_vec, 1 + !nested_in_vect_loop,
> scalar_to_vec,
> - stmt_info, 0, vect_prologue);
> + slp_node, 0, vect_prologue);
> }
> else /* if (!slp_node) */
> {
> @@ -11157,7 +11159,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> }
> /* ??? Enable for loop costing as well. */
> if (!loop_vinfo)
> - record_stmt_cost (cost_vec, 1, vec_to_scalar, stmt_info, NULL_TREE,
> + record_stmt_cost (cost_vec, 1, vec_to_scalar, slp_node, NULL_TREE,
> 0, vect_epilogue);
> return true;
> }
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index f7c51b6cf68..8d0a612577b 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -8038,7 +8038,7 @@ vect_prologue_cost_for_slp (slp_tree node,
> we are costing so avoid passing it down more than once. Pass
> it to the first vec_construct or scalar_to_vec part since for those
> the x86 backend tries to account for GPR to XMM register moves. */
> - record_stmt_cost (cost_vec, 1, kind,
> + record_stmt_cost (cost_vec, 1, kind, nullptr,
> (kind != vector_load && !passed) ? node : nullptr,
> vectype, 0, vect_prologue);
> if (kind != vector_load)
> @@ -8463,11 +8463,11 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
> cost log2 vector operations plus shuffles and one extraction. */
> unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
> record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
> - vectype, 0, vect_body);
> + NULL, vectype, 0, vect_body);
> record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
> - vectype, 0, vect_body);
> + NULL, vectype, 0, vect_body);
> record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
> - vectype, 0, vect_body);
> + NULL, vectype, 0, vect_body);
>
> /* Since we replace all stmts of a possibly longer scalar reduction
> chain account for the extra scalar stmts for that. */
> @@ -8890,8 +8890,7 @@ next_lane:
> continue;
> else
> kind = scalar_stmt;
> - record_stmt_cost (cost_vec, 1, kind, orig_stmt_info,
> - SLP_TREE_VECTYPE (node), 0, vect_body);
> + record_stmt_cost (cost_vec, 1, kind, orig_stmt_info, 0, vect_body);
> }
>
> auto_vec<bool, 20> subtree_life;
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index ab60f0eb657..8f38d8bcb7c 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -117,11 +117,13 @@ record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> unsigned
> record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> - tree vectype, int misalign,
> - enum vect_cost_model_location where)
> + int misalign, enum vect_cost_model_location where)
> {
> + gcc_assert (kind == scalar_stmt
> + || kind == scalar_load
> + || kind == scalar_store);
> return record_stmt_cost (body_cost_vec, count, kind, stmt_info, NULL,
> - vectype, misalign, where);
> + NULL_TREE, misalign, where);
> }
>
> unsigned
> @@ -130,7 +132,8 @@ record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> tree vectype, int misalign,
> enum vect_cost_model_location where)
> {
> - return record_stmt_cost (body_cost_vec, count, kind, NULL, node,
> + return record_stmt_cost (body_cost_vec, count, kind,
> + SLP_TREE_REPRESENTATIVE (node), node,
> vectype, misalign, where);
> }
>
> @@ -905,11 +908,8 @@ vect_mark_stmts_to_be_vectorized (loop_vec_info loop_vinfo, bool *fatal)
> be generated for the single vector op. We will handle that shortly. */
>
> static void
> -vect_model_simple_cost (vec_info *,
> - stmt_vec_info stmt_info, int ncopies,
> - enum vect_def_type *dt,
> - int ndts,
> - slp_tree node,
> +vect_model_simple_cost (vec_info *, int ncopies, enum vect_def_type *dt,
> + int ndts, slp_tree node,
> stmt_vector_for_cost *cost_vec,
> vect_cost_for_stmt kind = vector_stmt)
> {
> @@ -928,11 +928,11 @@ vect_model_simple_cost (vec_info *,
> for (int i = 0; i < ndts; i++)
> if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
> - stmt_info, 0, vect_prologue);
> + node, 0, vect_prologue);
>
> /* Pass the inside-of-loop statements to the target-specific cost model.  */
> inside_cost += record_stmt_cost (cost_vec, ncopies, kind,
> - stmt_info, 0, vect_body);
> + node, 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -950,7 +950,7 @@ vect_model_simple_cost (vec_info *,
> is true the stmt is doing widening arithmetic. */
>
> static void
> -vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
> +vect_model_promotion_demotion_cost (slp_tree node,
> enum vect_def_type *dt,
> unsigned int ncopies, int pwr,
> stmt_vector_for_cost *cost_vec,
> @@ -964,7 +964,7 @@ vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
> inside_cost += record_stmt_cost (cost_vec, ncopies,
> widen_arith
> ? vector_stmt : vec_promote_demote,
> - stmt_info, 0, vect_body);
> + node, 0, vect_body);
> ncopies *= 2;
> }
>
> @@ -972,7 +972,7 @@ vect_model_promotion_demotion_cost (stmt_vec_info stmt_info,
> for (i = 0; i < 2; i++)
> if (dt[i] == vect_constant_def || dt[i] == vect_external_def)
> prologue_cost += record_stmt_cost (cost_vec, 1, vector_stmt,
> - stmt_info, 0, vect_prologue);
> + node, 0, vect_prologue);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1019,13 +1019,15 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> unsigned int *inside_cost,
> stmt_vector_for_cost *body_cost_vec)
> {
> + tree vectype
> + = slp_node ? SLP_TREE_VECTYPE (slp_node) : STMT_VINFO_VECTYPE (stmt_info);
> switch (alignment_support_scheme)
> {
> case dr_aligned:
> {
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> - vector_store, stmt_info, slp_node, 0,
> - vect_body);
> + vector_store, stmt_info, slp_node,
> + vectype, 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1038,7 +1040,7 @@ vect_get_store_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> /* Here, we assign an additional cost for the unaligned store. */
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> unaligned_store, stmt_info, slp_node,
> - misalignment, vect_body);
> + vectype, misalignment, vect_body);
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "vect_model_store_cost: unaligned supported by "
> @@ -1072,12 +1074,16 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> stmt_vector_for_cost *body_cost_vec,
> bool record_prologue_costs)
> {
> + tree vectype
> + = slp_node ? SLP_TREE_VECTYPE (slp_node) : STMT_VINFO_VECTYPE (stmt_info);
> +
> switch (alignment_support_scheme)
> {
> case dr_aligned:
> {
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, slp_node, vectype,
> + 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1090,7 +1096,7 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> /* Here, we assign an additional cost for the unaligned load. */
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> unaligned_load, stmt_info, slp_node,
> - misalignment, vect_body);
> + vectype, misalignment, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1102,18 +1108,19 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> case dr_explicit_realign:
> {
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies * 2,
> - vector_load, stmt_info, slp_node, 0,
> - vect_body);
> + vector_load, stmt_info, slp_node,
> + vectype, 0, vect_body);
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies,
> - vec_perm, stmt_info, slp_node, 0,
> - vect_body);
> + vec_perm, stmt_info, slp_node,
> + vectype, 0, vect_body);
>
> /* FIXME: If the misalignment remains fixed across the iterations of
> the containing loop, the following cost should be added to the
> prologue costs. */
> if (targetm.vectorize.builtin_mask_for_load)
> *inside_cost += record_stmt_cost (body_cost_vec, 1, vector_stmt,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, slp_node, vectype,
> + 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -1139,17 +1146,21 @@ vect_get_load_cost (vec_info *, stmt_vec_info stmt_info, slp_tree slp_node,
> {
> *prologue_cost += record_stmt_cost (prologue_cost_vec, 2,
> vector_stmt, stmt_info,
> - slp_node, 0, vect_prologue);
> + slp_node, vectype,
> + 0, vect_prologue);
> if (targetm.vectorize.builtin_mask_for_load)
> *prologue_cost += record_stmt_cost (prologue_cost_vec, 1,
> vector_stmt, stmt_info,
> - slp_node, 0, vect_prologue);
> + slp_node, vectype,
> + 0, vect_prologue);
> }
>
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies, vector_load,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, slp_node, vectype,
> + 0, vect_body);
> *inside_cost += record_stmt_cost (body_cost_vec, ncopies, vec_perm,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, slp_node, vectype,
> + 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -3406,11 +3417,11 @@ vectorizable_bswap (vec_info *vinfo,
> STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_bswap");
> record_stmt_cost (cost_vec,
> - 1, vector_stmt, stmt_info, 0, vect_prologue);
> + 1, vector_stmt, slp_node, 0, vect_prologue);
> record_stmt_cost (cost_vec,
> slp_node
> ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies,
> - vec_perm, stmt_info, 0, vect_body);
> + vec_perm, slp_node, 0, vect_body);
> return true;
> }
>
> @@ -3756,11 +3767,10 @@ vectorizable_call (vec_info *vinfo,
> }
> STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_call");
> - vect_model_simple_cost (vinfo, stmt_info,
> - ncopies, dt, ndts, slp_node, cost_vec);
> + vect_model_simple_cost (vinfo, ncopies, dt, ndts, slp_node, cost_vec);
> if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
> record_stmt_cost (cost_vec, ncopies / 2,
> - vec_promote_demote, stmt_info, 0, vect_body);
> + vec_promote_demote, slp_node, 0, vect_body);
>
> if (loop_vinfo
> && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> @@ -4724,8 +4734,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
>
> STMT_VINFO_TYPE (stmt_info) = call_simd_clone_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_simd_clone_call");
> -/* vect_model_simple_cost (vinfo, stmt_info, ncopies,
> - dt, slp_node, cost_vec); */
> +/* vect_model_simple_cost (vinfo, ncopies, dt, slp_node, cost_vec); */
> return true;
> }
>
> @@ -5922,7 +5931,7 @@ vectorizable_conversion (vec_info *vinfo,
> if (modifier == NONE)
> {
> STMT_VINFO_TYPE (stmt_info) = type_conversion_vec_info_type;
> - vect_model_simple_cost (vinfo, stmt_info, (1 + multi_step_cvt),
> + vect_model_simple_cost (vinfo, (1 + multi_step_cvt),
> dt, ndts, slp_node, cost_vec);
> }
> else if (modifier == NARROW_SRC || modifier == NARROW_DST)
> @@ -5930,7 +5939,7 @@ vectorizable_conversion (vec_info *vinfo,
> STMT_VINFO_TYPE (stmt_info) = type_demotion_vec_info_type;
> /* The final packing step produces one vector result per copy. */
> unsigned int nvectors = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> - vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
> + vect_model_promotion_demotion_cost (slp_node, dt, nvectors,
> multi_step_cvt, cost_vec,
> widen_arith);
> }
> @@ -5942,7 +5951,7 @@ vectorizable_conversion (vec_info *vinfo,
> so >> MULTI_STEP_CVT divides by 2^(number of steps - 1). */
> unsigned int nvectors
> = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) >> multi_step_cvt;
> - vect_model_promotion_demotion_cost (stmt_info, dt, nvectors,
> + vect_model_promotion_demotion_cost (slp_node, dt, nvectors,
> multi_step_cvt, cost_vec,
> widen_arith);
> }
> @@ -6291,8 +6300,7 @@ vectorizable_assignment (vec_info *vinfo,
> STMT_VINFO_TYPE (stmt_info) = assignment_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_assignment");
> if (!vect_nop_conversion_p (stmt_info))
> - vect_model_simple_cost (vinfo, stmt_info, ncopies, dt, ndts, slp_node,
> - cost_vec);
> + vect_model_simple_cost (vinfo, ncopies, dt, ndts, slp_node, cost_vec);
> return true;
> }
>
> @@ -6662,7 +6670,7 @@ vectorizable_shift (vec_info *vinfo,
> }
> STMT_VINFO_TYPE (stmt_info) = shift_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_shift");
> - vect_model_simple_cost (vinfo, stmt_info, ncopies, dt,
> + vect_model_simple_cost (vinfo, ncopies, dt,
> scalar_shift_arg ? 1 : ndts, slp_node, cost_vec);
> return true;
> }
> @@ -7099,8 +7107,7 @@ vectorizable_operation (vec_info *vinfo,
>
> STMT_VINFO_TYPE (stmt_info) = op_vec_info_type;
> DUMP_VECT_SCOPE ("vectorizable_operation");
> - vect_model_simple_cost (vinfo, stmt_info,
> - 1, dt, ndts, slp_node, cost_vec);
> + vect_model_simple_cost (vinfo, 1, dt, ndts, slp_node, cost_vec);
> if (using_emulated_vectors_p)
> {
> /* The above vect_model_simple_cost call handles constants
> @@ -8658,7 +8665,7 @@ vectorizable_store (vec_info *vinfo,
> unsigned int inside_cost = 0, prologue_cost = 0;
> if (vls_type == VLS_STORE_INVARIANT)
> prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
> - stmt_info, 0, vect_prologue);
> + slp_node, 0, vect_prologue);
> vect_get_store_cost (vinfo, stmt_info, slp_node, ncopies,
> alignment_support_scheme, misalignment,
> &inside_cost, cost_vec);
> @@ -8730,7 +8737,7 @@ vectorizable_store (vec_info *vinfo,
> }
> else if (vls_type != VLS_STORE_INVARIANT)
> return;
> - *prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info,
> + *prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
> slp_node, 0, vect_prologue);
> };
>
> @@ -9039,7 +9046,7 @@ vectorizable_store (vec_info *vinfo,
> if (nstores > 1)
> inside_cost
> += record_stmt_cost (cost_vec, n_adjacent_stores,
> - vec_to_scalar, stmt_info, slp_node,
> + vec_to_scalar, slp_node,
> 0, vect_body);
> }
> if (dump_enabled_p ())
> @@ -9377,7 +9384,7 @@ vectorizable_store (vec_info *vinfo,
> {
> if (costing_p && vls_type == VLS_STORE_INVARIANT)
> prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
> - stmt_info, slp_node, 0,
> + slp_node, 0,
> vect_prologue);
> else if (!costing_p)
> {
> @@ -9452,8 +9459,7 @@ vectorizable_store (vec_info *vinfo,
> unsigned int cnunits = vect_nunits_for_cost (vectype);
> inside_cost
> += record_stmt_cost (cost_vec, cnunits, scalar_store,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> continue;
> }
>
> @@ -9521,7 +9527,7 @@ vectorizable_store (vec_info *vinfo,
> unsigned int cnunits = vect_nunits_for_cost (vectype);
> inside_cost
> += record_stmt_cost (cost_vec, cnunits, scalar_store,
> - stmt_info, slp_node, 0, vect_body);
> + slp_node, 0, vect_body);
> continue;
> }
>
> @@ -9629,14 +9635,14 @@ vectorizable_store (vec_info *vinfo,
> consumed by the load). */
> inside_cost
> += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
> - stmt_info, slp_node, 0, vect_body);
> + slp_node, 0, vect_body);
> /* N scalar stores plus extracting the elements. */
> inside_cost
> += record_stmt_cost (cost_vec, cnunits, vec_to_scalar,
> - stmt_info, slp_node, 0, vect_body);
> + slp_node, 0, vect_body);
> inside_cost
> += record_stmt_cost (cost_vec, cnunits, scalar_store,
> - stmt_info, slp_node, 0, vect_body);
> + slp_node, 0, vect_body);
> continue;
> }
>
> @@ -9830,8 +9836,7 @@ vectorizable_store (vec_info *vinfo,
> int group_size = DR_GROUP_SIZE (first_stmt_info);
> int nstmts = ceil_log2 (group_size) * group_size;
> inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> "vect_model_store_cost: "
> @@ -9860,8 +9865,7 @@ vectorizable_store (vec_info *vinfo,
> {
> if (costing_p)
> inside_cost += record_stmt_cost (cost_vec, 1, vec_perm,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> else
> {
> tree perm_mask = perm_mask_for_reverse (vectype);
> @@ -10080,11 +10084,11 @@ vectorizable_store (vec_info *vinfo,
> /* Spill. */
> prologue_cost
> += record_stmt_cost (cost_vec, ncopies, vector_store,
> - stmt_info, slp_node, 0, vect_epilogue);
> + slp_node, 0, vect_epilogue);
> /* Loads. */
> prologue_cost
> += record_stmt_cost (cost_vec, ncopies * nregs, scalar_load,
> - stmt_info, slp_node, 0, vect_epilogue);
> + slp_node, 0, vect_epilogue);
> }
> }
> }
> @@ -10657,9 +10661,8 @@ vectorizable_load (vec_info *vinfo,
> enum vect_cost_model_location cost_loc
> = hoist_p ? vect_prologue : vect_body;
> unsigned int cost = record_stmt_cost (cost_vec, 1, scalar_load,
> - stmt_info, slp_node, 0,
> - cost_loc);
> - cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info,
> + stmt_info, 0, cost_loc);
> + cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
> slp_node, 0, cost_loc);
> unsigned int prologue_cost = hoist_p ? cost : 0;
> unsigned int inside_cost = hoist_p ? 0 : cost;
> @@ -10925,8 +10928,7 @@ vectorizable_load (vec_info *vinfo,
> n_adjacent_loads++;
> else
> inside_cost += record_stmt_cost (cost_vec, 1, scalar_load,
> - stmt_info, slp_node, 0,
> - vect_body);
> + stmt_info, 0, vect_body);
> continue;
> }
> tree this_off = build_int_cst (TREE_TYPE (alias_off),
> @@ -10964,8 +10966,7 @@ vectorizable_load (vec_info *vinfo,
> {
> if (costing_p)
> inside_cost += record_stmt_cost (cost_vec, 1, vec_construct,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> else
> {
> tree vec_inv = build_constructor (lvectype, v);
> @@ -11020,8 +11021,7 @@ vectorizable_load (vec_info *vinfo,
> vect_transform_slp_perm_load (vinfo, slp_node, vNULL, NULL, vf,
> true, &n_perms, &n_loads);
> inside_cost += record_stmt_cost (cost_vec, n_perms, vec_perm,
> - first_stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> }
> else
> vect_transform_slp_perm_load (vinfo, slp_node, dr_chain, gsi, vf,
> @@ -11591,7 +11591,7 @@ vectorizable_load (vec_info *vinfo,
> unsigned int cnunits = vect_nunits_for_cost (vectype);
> inside_cost
> = record_stmt_cost (cost_vec, cnunits, scalar_load,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, 0, vect_body);
> continue;
> }
> if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
> @@ -11667,7 +11667,7 @@ vectorizable_load (vec_info *vinfo,
> unsigned int cnunits = vect_nunits_for_cost (vectype);
> inside_cost
> = record_stmt_cost (cost_vec, cnunits, scalar_load,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, 0, vect_body);
> continue;
> }
> poly_uint64 offset_nunits
> @@ -11796,16 +11796,16 @@ vectorizable_load (vec_info *vinfo,
> /* For emulated gathers N offset vector element
> offset add is consumed by the load). */
> inside_cost = record_stmt_cost (cost_vec, const_nunits,
> - vec_to_scalar, stmt_info,
> + vec_to_scalar,
> slp_node, 0, vect_body);
> /* N scalar loads plus gathering them into a
> vector. */
> inside_cost
> = record_stmt_cost (cost_vec, const_nunits, scalar_load,
> - stmt_info, slp_node, 0, vect_body);
> + stmt_info, 0, vect_body);
> inside_cost
> = record_stmt_cost (cost_vec, 1, vec_construct,
> - stmt_info, slp_node, 0, vect_body);
> + slp_node, 0, vect_body);
> continue;
> }
> unsigned HOST_WIDE_INT const_offset_nunits
> @@ -12466,8 +12466,7 @@ vectorizable_load (vec_info *vinfo,
> {
> if (costing_p)
> inside_cost = record_stmt_cost (cost_vec, 1, vec_perm,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> else
> {
> tree perm_mask = perm_mask_for_reverse (vectype);
> @@ -12536,8 +12535,7 @@ vectorizable_load (vec_info *vinfo,
> vect_transform_slp_perm_load (vinfo, slp_node, vNULL, nullptr, vf,
> true, &n_perms, nullptr);
> inside_cost = record_stmt_cost (cost_vec, n_perms, vec_perm,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
> }
> else
> {
> @@ -12564,8 +12562,7 @@ vectorizable_load (vec_info *vinfo,
> int group_size = DR_GROUP_SIZE (first_stmt_info);
> int nstmts = ceil_log2 (group_size) * group_size;
> inside_cost += record_stmt_cost (cost_vec, nstmts, vec_perm,
> - stmt_info, slp_node, 0,
> - vect_body);
> + slp_node, 0, vect_body);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> @@ -12985,7 +12982,7 @@ vectorizable_condition (vec_info *vinfo,
> }
>
> STMT_VINFO_TYPE (stmt_info) = condition_vec_info_type;
> - vect_model_simple_cost (vinfo, stmt_info, ncopies, dts, ndts, slp_node,
> + vect_model_simple_cost (vinfo, ncopies, dts, ndts, slp_node,
> cost_vec, kind);
> return true;
> }
> @@ -13417,8 +13414,7 @@ vectorizable_comparison_1 (vec_info *vinfo, tree vectype,
> return false;
> }
>
> - vect_model_simple_cost (vinfo, stmt_info,
> - ncopies * (1 + (bitop2 != NOP_EXPR)),
> + vect_model_simple_cost (vinfo, ncopies * (1 + (bitop2 != NOP_EXPR)),
> dts, ndts, slp_node, cost_vec);
> return true;
> }
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index a2f33a5ecd6..990072fca95 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2418,7 +2418,7 @@ extern int compare_step_with_zero (vec_info *, stmt_vec_info);
>
> extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
> enum vect_cost_for_stmt, stmt_vec_info,
> - tree, int, enum vect_cost_model_location);
> + int, enum vect_cost_model_location);
> extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
> enum vect_cost_for_stmt, slp_tree,
> tree, int, enum vect_cost_model_location);
> @@ -2430,28 +2430,15 @@ extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
> slp_tree, tree, int,
> enum vect_cost_model_location);
>
> -/* Overload of record_stmt_cost with VECTYPE derived from STMT_INFO. */
> -
> -inline unsigned
> -record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> - enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> - int misalign, enum vect_cost_model_location where)
> -{
> - return record_stmt_cost (body_cost_vec, count, kind, stmt_info,
> - STMT_VINFO_VECTYPE (stmt_info), misalign, where);
> -}
> -
> -/* Overload of record_stmt_cost with VECTYPE derived from STMT_INFO and
> - SLP node specified. */
> +/* Overload of record_stmt_cost with VECTYPE derived from SLP node. */
>
> inline unsigned
> record_stmt_cost (stmt_vector_for_cost *body_cost_vec, int count,
> - enum vect_cost_for_stmt kind, stmt_vec_info stmt_info,
> - slp_tree node,
> + enum vect_cost_for_stmt kind, slp_tree node,
> int misalign, enum vect_cost_model_location where)
> {
> - return record_stmt_cost (body_cost_vec, count, kind, stmt_info, node,
> - STMT_VINFO_VECTYPE (stmt_info), misalign, where);
> + return record_stmt_cost (body_cost_vec, count, kind, node,
> + SLP_TREE_VECTYPE (node), misalign, where);
> }
>
> extern void vect_finish_replace_stmt (vec_info *, stmt_vec_info, gimple *);
> --
> 2.43.0