Tamar Christina <tamar.christ...@arm.com> writes:
> Hi All,
>
> With the middle-end providing a way to make vectorization more profitable by
> scaling vect-scalar-cost-multiplier this makes a more user friendly option
> to make it easier to use.
>
> I propose making it an actual -m option that we document and retain vs using
> the parameter name.  In the future I would like to extend this option to 
> modify
> additional costing in the AArch64 backend itself.
>
> This can be used together with --param aarch64-autovec-preference to get the
> vectorizer to say, always vectorize with SVE.  I did consider making this an
> additional enum to --param aarch64-autovec-preference but I also think this is
> a useful thing to be able to set with pragmas and attributes, but am open to
> suggestions.
>
> Note that as a follow up I plan on extending -fdump-tree-vect to support 
> -stats
> which is then intended to be usable with this flag.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>       * config/aarch64/aarch64.opt (max-vectorization): New.
>       * config/aarch64/aarch64.cc (aarch64_override_options_internal): Save
>       and restore option.
>       Implement it through vect-scalar-cost-multiplier.
>       (aarch64_attributes): Default to off.
>       * common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
>       Initialize option.
>       * doc/extend.texi (max-vectorization): Document attribute.
>       * doc/invoke.texi (max-vectorization): Document flag.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/aarch64/sve/cost_model_17.c: New test.
>       * gcc.target/aarch64/sve/cost_model_18.c: New test.

Sorry for the slow reply.  This was obviously paired with the original
version of patch 1, but I was waiting to see how that developed before
reviewing this.  I've taken the updated version of patch 1 into account
below.

> ---
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc 
> b/gcc/common/config/aarch64/aarch64-common.cc
> index 
> b9ed83642ade4462f1b030d68cf9744d31d70c23..1488697c6ce43108ae2938e5b8a00ac7ac262da6
>  100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -142,6 +142,10 @@ aarch64_handle_option (struct gcc_options *opts,
>        opts->x_aarch64_flag_outline_atomics = val;
>        return true;
>  
> +    case OPT_mmax_vectorization:
> +      opts->x_flag_aarch64_max_vectorization = val;
> +      return true;
> +
>      default:
>        return true;
>      }
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..46204264fea5af781be15374edc89587429518cb
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18973,6 +18973,12 @@ aarch64_override_options_internal (struct 
> gcc_options *opts)
>    if (TARGET_SME && !TARGET_SVE2)
>      sorry ("no support for %qs without %qs", "sme", "sve2");
>  
> +  /* Set scalar costing to a high value such that we always pick
> +     vectorization.  */
> +  if (opts->x_flag_aarch64_max_vectorization)
> +    SET_OPTION_IF_UNSET (opts, &global_options_set,
> +                      param_vect_scalar_cost_multiplier, 0xffff);

The new maximum in patch 1 is 10000, which is less than this.
I suppose we should use 10000 here too.

> +
>    aarch64_override_options_after_change_1 (opts);
>  }
>  
> @@ -19723,6 +19729,8 @@ static const struct aarch64_attribute_info 
> aarch64_attributes[] =
>       OPT_msign_return_address_ },
>    { "outline-atomics", aarch64_attr_bool, true, NULL,
>       OPT_moutline_atomics},
> +  { "max-vectorization", aarch64_attr_bool, false, NULL,
> +     OPT_mmax_vectorization},
>    { NULL, aarch64_attr_custom, false, NULL, OPT____ }
>  };
>  
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 
> f32d56d4ffaef7862c1c45a11753be5d480220d0..2725c50da64a2c05489ea6202bdd5eedf1ba7e27
>  100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -290,6 +290,10 @@ msve-vector-bits=
>  Target RejectNegative Joined Enum(sve_vector_bits) 
> Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
>  -msve-vector-bits=<number>   Set the number of bits in an SVE vector 
> register.
>  
> +mmax-vectorization
> +Target Undocumented Var(flag_aarch64_max_vectorization) Save

It is (rightly) not undocumented :)

> +Override the scalar cost model such that vectorization is always profitable.
> +
>  mverbose-cost-dump
>  Target Undocumented Var(flag_aarch64_verbose_cost)
>  Enables verbose cost model dumping in the debug dump files.
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 
> 40ccf22b29f4316928f905ec2c978fdaf30a55ec..759a04bc7c4c66155154d55045bb75d695b2d6c2
>  100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3882,6 +3882,13 @@ Enable or disable calls to out-of-line helpers to 
> implement atomic operations.
>  This corresponds to the behavior of the command-line options
>  @option{-moutline-atomics} and @option{-mno-outline-atomics}.
>  
> +@cindex @code{max-vectorization} function attribute, AArch64
> +@item max-vectorization
> +Enable or disable loop costing overrides inside the current function to apply

I suppose this applies to outline-atomics too, but: I think we should list:

@itemx no-max-vectorization

otherwise it doesn't seem clear how the choice between enabling and
disabling is made.

> +a penalty to scalar loops such that vector costing is always profitable.

How about something like:

@code{max-vectorization} tells GCC's vectorizer to treat all vector
loops as being more profitable than the original scalar loops when
optimizing the current function.  @code{no-max-vectorization} disables
this behavior.

I'm just not sure how well understood "vector costing" would be.

> +This corresponds to the behavior of the command-line options
> +@option{-mmax-vectorization} and @option{-mno-max-vectorization}.
> +
>  @cindex @code{indirect_return} function attribute, AArch64
>  @item indirect_return
>  The @code{indirect_return} attribute can be applied to a function type
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 75aecf0a317799b2618fa7b2a641412e9e29bfa4..9f65f154aa27c008213c327440d04a1a056fe68d
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21984,6 +21984,12 @@ used directly.  The same applies when using 
> @option{-mcpu=} when the
>  selected cpu supports the @samp{lse} feature.
>  This option is on by default.
>  
> +@item -mmax-vectorization
> +@itemx -mno-max-vectorization
> +Enable or disable override to vectorizer cost model making vectorization 
> always
> +profitable.  This option can be combined with other options allowing you to 
> pick
> +which ISA is used during vectorization.

I think we should say how, which suggests that we should add an -m
option for aarch64-autovec-preference too.  Also, we should explicitly
say that, unlike -fvect-cost-model=unlimited, this does not affect the
comparison between different vector implementations.

Thanks,
Richard

> +
>  @opindex march
>  @item -march=@var{name}
>  Specify the name of the target architecture and, optionally, one or
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization 
> -fdump-tree-vect-details" } */
> +
> +void
> +foo (char *restrict a, int *restrict b, int *restrict c,
> +     int *restrict d, int stride)
> +{
> +    if (stride <= 1)
> +        return;
> +
> +    for (int i = 0; i < 3; i++)
> +        {
> +            int res = c[i];
> +            int t = b[i * stride];
> +            if (a[i] != 0)
> +                res = t * d[i];
> +            c[i] = res;
> +        }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c
> new file mode 100644
> index 
> 0000000000000000000000000000000000000000..8e91f9e9c29971a4bb7033be6be4d2fc4c71d05a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -fdump-tree-vect-details" } */
> +
> +void __attribute__ (( target ("max-vectorization")))
> +foo (char *restrict a, int *restrict b, int *restrict c,
> +     int *restrict d, int stride)
> +{
> +    if (stride <= 1)
> +        return;
> +
> +    for (int i = 0; i < 3; i++)
> +        {
> +            int res = c[i];
> +            int t = b[i * stride];
> +            if (a[i] != 0)
> +                res = t * d[i];
> +            c[i] = res;
> +        }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */

Reply via email to