Hi Richi,
on 2021/8/23 10:33 PM, Richard Biener via Gcc-patches wrote:
> This removes --param vect-inner-loop-cost-factor in favor of looking
> at the estimated number of iterations of the inner loop
> when available and otherwise just assumes a single inner
> iteration which is conservative on the side of not vectorizing.
>
I may be missing something, but the factor seems to act as an amplifier.
Assuming a single inner iteration is only conservative on the side of not
vectorizing when vector_cost < scalar_cost; if scalar_cost < vector_cost,
the direction is flipped. ({vector,scalar}_cost here covers the inner loop
part only.) Since we don't compute and compare the inner loop's cost
independently and return early when scalar_cost < vector_cost for it, I
guess the "scalar_cost < vector_cost" case is theoretically possible,
especially when a target costs something extra on the vector side.
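
To illustrate the amplifier concern, here is a minimal toy model, not GCC's
actual costing code. It assumes the total cost of a nest is the outer-body
cost plus the inner-body cost scaled by the factor; nest_cost, total_cost,
vectorize_p and all numbers are made up for illustration.

```cpp
// Hypothetical toy model, NOT the vectorizer's real cost computation.
struct nest_cost
{
  unsigned outer;  // cost of statements that are only in the outer loop
  unsigned inner;  // cost of the inner-loop statements, per inner iteration
};

// Total cost when the inner part is scaled by FACTOR.
constexpr unsigned
total_cost (nest_cost c, unsigned factor)
{
  return c.outer + factor * c.inner;
}

// True when the toy model considers vectorizing profitable.
constexpr bool
vectorize_p (nest_cost scalar, nest_cost vec, unsigned factor)
{
  return total_cost (vec, factor) < total_cost (scalar, factor);
}

// Made-up costs: vector wins on the outer part, loses on the inner part.
constexpr nest_cost scalar_c = { 40, 4 };
constexpr nest_cost vector_c = { 10, 5 };

// Factor 1: 15 < 44, so the model vectorizes.
static_assert (vectorize_p (scalar_c, vector_c, 1), "factor 1 vectorizes");
// Factor 50: 260 vs. 240, so the model does not vectorize.
static_assert (!vectorize_p (scalar_c, vector_c, 50), "factor 50 does not");
```

With these numbers a factor of 1 is the less conservative choice, which is
the flipped direction I mean above.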
> The alternative is to retain the --param for exactly that case,
> not sure if the result is better or not. The --param is new on
> head, it was static '50' before.
>
I think the intention of the --param is to offer ports a way to tweak
it (no port does so for now, though :)). I'm not sure how sensitive target
costing is to this factor, but I'd also prefer to keep its default
value at 50, as Honza suggested, to avoid further tweaking.
If targets want more, maybe we can extend it to:

  default_hook:
    return estimated or likely_max if either is valid;
    return default value;

  target hook:
    val = default_hook; // or from scratch
    tweak the val as it wishes;

I guess there is no need for this for now.
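
A rough sketch of that hook split, purely hypothetical: no such hook exists
today, and the names, the -1 "unknown" convention (borrowed from the _int
iteration helpers) and the cap of 100 are assumptions for illustration only.

```cpp
#include <algorithm>

// Hypothetical constants; -1 mirrors what the _int iteration helpers
// return when no estimate is available.
constexpr long unknown_niter = -1;
constexpr long default_factor = 50;

// Hypothetical default hook: prefer the estimate, then likely_max,
// and fall back to the static default otherwise.
static long
default_inner_loop_cost_factor (long estimated, long likely_max)
{
  if (estimated != unknown_niter)
    return estimated;
  if (likely_max != unknown_niter)
    return likely_max;
  return default_factor;
}

// Hypothetical target hook: start from the default and tweak it.
// The cap of 100 is a made-up policy, not something any port does.
static long
example_target_inner_loop_cost_factor (long estimated, long likely_max)
{
  long val = default_inner_loop_cost_factor (estimated, likely_max);
  return std::min (val, 100L);
}
```

A target that wants different behavior could also compute the value from
scratch instead of calling the default first.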
> Any strong opinions?
>
> Richard.
>
> 2021-08-23 Richard Biener <[email protected]>
>
> * doc/invoke.texi (vect-inner-loop-cost-factor): Remove
> documentation.
> * params.opt (--param vect-inner-loop-cost-factor): Remove.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info):
> Initialize inner_loop_cost_factor to 1.
> (vect_analyze_loop_form): Initialize inner_loop_cost_factor
> from the estimated number of iterations of the inner loop.
> ---
> gcc/doc/invoke.texi | 5 -----
> gcc/params.opt | 4 ----
> gcc/tree-vect-loop.c | 12 +++++++++++-
> 3 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c057cc1e4ae..054950132f6 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and
> stores in all loops.
> The parameter only has an effect on targets that support partial
> vector loads and stores.
>
> -@item vect-inner-loop-cost-factor
> -The factor which the loop vectorizer applies to the cost of statements
> -in an inner loop relative to the loop being vectorized. The default
> -value is 50.
> -
> @item avoid-fma-max-bits
> Maximum number of bits for which we avoid creating FMAs.
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index f9264887b40..f7b19fa430d 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the
> vectorizer's loop versioning f
> Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2)
> IntegerRange(0, 2) Param Optimization
> Controls how loop vectorizer uses partial vectors. 0 means never, 1 means
> only for loops whose need to iterate can be removed, 2 means for all loops.
> The default value is 2.
>
> --param=vect-inner-loop-cost-factor=
> -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50)
> IntegerRange(1, 999999) Param Optimization
> -The factor which the loop vectorizer applies to the cost of statements in an
> inner loop relative to the loop being vectorized.
> -
> ; This comment is to ensure we retain the blank line above.
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c521b43a47c..cb48717f20e 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in,
> vec_info_shared *shared)
> single_scalar_iteration_cost (0),
> vec_outside_cost (0),
> vec_inside_cost (0),
> - inner_loop_cost_factor (param_vect_inner_loop_cost_factor),
> + inner_loop_cost_factor (1),
> vectorizable (false),
> can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
> using_partial_vectors_p (false),
> @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop,
> vec_info_shared *shared)
> stmt_vec_info inner_loop_cond_info
> = loop_vinfo->lookup_stmt (inner_loop_cond);
> STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type;
> + /* If we have an estimate on the number of iterations of the inner
> + loop use that as the scale for costing, otherwise conservatively
> + assume a single inner iteration. */
> + widest_int nit;
> + if (get_estimated_loop_iterations (loop->inner, &nit))
> + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo)
> + /* Since costing is done on unsigned int cap the scale on
> + some large number consistent with what we'd see in
> + CFG counts. */
> + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi ();
I noticed loop-doloop.c uses the _int version and falls back to likely_max;
maybe you want that here?

  est_niter = get_estimated_loop_iterations_int (loop);
  if (est_niter == -1)
    est_niter = get_likely_max_loop_iterations_int (loop);

BR,
Kewen