Hi Richi, on 2021/8/23 下午10:33, Richard Biener via Gcc-patches wrote: > This removes --param vect-inner-loop-cost-factor in favor of looking > at the estimated number of iterations of the inner loop > when available and otherwise just assumes a single inner > iteration which is conservative on the side of not vectorizing. >
I may miss something, the factor seems to be an amplifier, a single inner iteration on the side of not vectorizing only relies on that vector_cost < scalar_cost, if scalar_cost < vector_cost, the direction will be flipped? ({vector,scalar}_cost is only for inner loop part). Since we don't calculate/compare costing for inner loop independently and early return if scalar_cost < vector_cost for inner loop, I guess it's possible to have "scalar_cost < vector_cost" case theoretically, especially when targets can cost something more on vector side. > The alternative is to retain the --param for exactly that case, > not sure if the result is better or not. The --param is new on > head, it was static '50' before. > I think the intention of --param is to offer ports a way to tweak it (no ports do it for now though :)). Not sure how target costing is sensitive to this factor, but I also prefer to make its default value as 50 as Honza suggested to avoid more possible tweakings. If targets want more, maybe we can extend it to: default_hook: return estimated or likely_max if either is valid; return default value; target hook: val = default_hook; // or from scratch tweak the val as it wishes; I guess there is no this need for now. > Any strong opinions? > > Richard. > > 2021-08-23 Richard Biener <rguent...@suse.de> > > * doc/invoke.texi (vect-inner-loop-cost-factor): Remove > documentation. > * params.opt (--param vect-inner-loop-cost-factor): Remove. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): > Initialize inner_loop_cost_factor to 1. > (vect_analyze_loop_form): Initialize inner_loop_cost_factor > from the estimated number of iterations of the inner loop. > --- > gcc/doc/invoke.texi | 5 ----- > gcc/params.opt | 4 ---- > gcc/tree-vect-loop.c | 12 +++++++++++- > 3 files changed, 11 insertions(+), 10 deletions(-) > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index c057cc1e4ae..054950132f6 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -14385,11 +14385,6 @@ code to iterate. 2 allows partial vector loads and > stores in all loops. > The parameter only has an effect on targets that support partial > vector loads and stores. > > -@item vect-inner-loop-cost-factor > -The factor which the loop vectorizer applies to the cost of statements > -in an inner loop relative to the loop being vectorized. The default > -value is 50. > - > @item avoid-fma-max-bits > Maximum number of bits for which we avoid creating FMAs. > > diff --git a/gcc/params.opt b/gcc/params.opt > index f9264887b40..f7b19fa430d 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -1113,8 +1113,4 @@ Bound on number of runtime checks inserted by the > vectorizer's loop versioning f > Common Joined UInteger Var(param_vect_partial_vector_usage) Init(2) > IntegerRange(0, 2) Param Optimization > Controls how loop vectorizer uses partial vectors. 0 means never, 1 means > only for loops whose need to iterate can be removed, 2 means for all loops. > The default value is 2. > > --param=vect-inner-loop-cost-factor= > -Common Joined UInteger Var(param_vect_inner_loop_cost_factor) Init(50) > IntegerRange(1, 999999) Param Optimization > -The factor which the loop vectorizer applies to the cost of statements in an > inner loop relative to the loop being vectorized. > - > ; This comment is to ensure we retain the blank line above. > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index c521b43a47c..cb48717f20e 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -841,7 +841,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, > vec_info_shared *shared) > single_scalar_iteration_cost (0), > vec_outside_cost (0), > vec_inside_cost (0), > - inner_loop_cost_factor (param_vect_inner_loop_cost_factor), > + inner_loop_cost_factor (1), > vectorizable (false), > can_use_partial_vectors_p (param_vect_partial_vector_usage != 0), > using_partial_vectors_p (false), > @@ -1519,6 +1519,16 @@ vect_analyze_loop_form (class loop *loop, > vec_info_shared *shared) > stmt_vec_info inner_loop_cond_info > = loop_vinfo->lookup_stmt (inner_loop_cond); > STMT_VINFO_TYPE (inner_loop_cond_info) = loop_exit_ctrl_vec_info_type; > + /* If we have an estimate on the number of iterations of the inner > + loop use that as the scale for costing, otherwise conservatively > + assume a single inner iteration. */ > + widest_int nit; > + if (get_estimated_loop_iterations (loop->inner, &nit)) > + LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo) > + /* Since costing is done on unsigned int cap the scale on > + some large number consistent with what we'd see in > + CFG counts. */ > + = wi::smax (nit, REG_BR_PROB_BASE).to_uhwi (); I noticed loop-doloop.c use _int version and likely_max, maybe you want that here? est_niter = get_estimated_loop_iterations_int (loop); if (est_niter == -1) est_niter = get_likely_max_loop_iterations_int (loop) BR, Kewen