On Mon, 19 May 2025, Tamar Christina wrote:
> > > +-param=vect-scalar-cost-multiplier=
> > > +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(1) IntegerRange(0, 100000) Param Optimization
> > > +The scaling multiplier to add to all scalar loop costing when performing vectorization profitability analysis. The default value is 1.
> > > +
> >
> > Note this only allows whole-number scaling. May I suggest using a
> > percentage as the unit instead, so that the effective multiplier is
> > param_vect_scalar_cost_multiplier / 100?
> >
>
> Bootstrapped and regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu
> (-m32 and -m64) with no issues.
>
> Ok for master?
OK.
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * params.opt (vect-scalar-cost-multiplier): New.
> * tree-vect-loop.cc (vect_estimate_min_profitable_iters): Use it.
> * doc/invoke.texi (vect-scalar-cost-multiplier): Document it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/cost_model_16.c: New test.
>
> -- inline copy of patch --
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 699ee1cc0b7580d4729bbefff8f897eed1c3e49b..95a25c0f63b77f26db05a7b48bfad8f9c58bcc5f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -17273,6 +17273,10 @@ this parameter. The default value of this parameter is 50.
> @item vect-induction-float
> Enable loop vectorization of floating point inductions.
>
> +@item vect-scalar-cost-multiplier
> +Apply the given multiplier % to scalar loop costing during vectorization.
> +Increasing the cost multiplier will make vector loops more profitable.
> +
> @item vrp-block-limit
> Maximum number of basic blocks before VRP switches to a lower memory
> algorithm.
>
> diff --git a/gcc/params.opt b/gcc/params.opt
> index 1f0abeccc4b9b439ad4a4add6257b4e50962863d..a67f900a63f7187b1daa593fe17cd88f2fc32367 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -1253,6 +1253,10 @@ The maximum factor which the loop vectorizer applies to the cost of statements i
> Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 1) Param Optimization
> Enable loop vectorization of floating point inductions.
>
> +-param=vect-scalar-cost-multiplier=
> +Common Joined UInteger Var(param_vect_scalar_cost_multiplier) Init(100) IntegerRange(0, 10000) Param Optimization
> +The scaling multiplier as a percentage to apply to all scalar loop costing when performing vectorization profitability analysis. The default value is 100.
> +
> -param=vrp-block-limit=
> Common Joined UInteger Var(param_vrp_block_limit) Init(150000) Optimization Param
> Maximum number of basic blocks before VRP switches to a fast model with less memory requirements.
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization -fdump-tree-vect-details" } */
> +
> +void
> +foo (char *restrict a, int *restrict b, int *restrict c,
> + int *restrict d, int stride)
> +{
> + if (stride <= 1)
> + return;
> +
> + for (int i = 0; i < 3; i++)
> + {
> + int res = c[i];
> + int t = b[i * stride];
> + if (a[i] != 0)
> + res = t * d[i];
> + c[i] = res;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index fe6f3cf188e40396b299ff9e814cc402bc2d4e2d..c18e75794046f506c473b36639e6ae6658a5516b 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -4646,7 +4646,8 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
> TODO: Consider assigning different costs to different scalar
> statements. */
>
> - scalar_single_iter_cost = loop_vinfo->scalar_costs->total_cost ();
> + scalar_single_iter_cost = (loop_vinfo->scalar_costs->total_cost ()
> + * param_vect_scalar_cost_multiplier) / 100;
>
> /* Add additional cost for the peeled instructions in prologue and epilogue
> loop. (For fully-masked loops there will be no peeling.)
>
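
For readers following the arithmetic in the tree-vect-loop.cc hunk, here is
a minimal standalone sketch of the percentage scaling. It is not code from
the patch: the helper name scale_scalar_cost and the use of unsigned int are
assumptions made for illustration. Only the multiply-then-divide-by-100
order and the 0..10000 range are taken from the patch.

```c
#include <assert.h>

/* Hypothetical helper (not in the patch): scale a scalar loop cost by a
   percentage, multiplying first and dividing by 100 afterwards, in the
   same order as the patched expression.  The unsigned int types are an
   assumption for this sketch.  */
static unsigned int
scale_scalar_cost (unsigned int total_cost, unsigned int multiplier_pct)
{
  /* Mirrors IntegerRange(0, 10000) from the params.opt entry.  */
  assert (multiplier_pct <= 10000);

  /* Integer division truncates, so fractional results round down.  */
  return (total_cost * multiplier_pct) / 100;
}
```

With a cost of 7, a multiplier of 100 leaves the cost at 7, 150 gives 10
(1050 / 100, truncated), 50 gives 3, and 0 gives 0. Multiplying before
dividing keeps the fractional part of the percentage until the final
truncation, which is what makes non-whole-number scaling possible here.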
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)