Tamar Christina <[email protected]> writes:
> Hi All,
>
> With the middle-end providing a way to make vectorization more profitable by
> scaling vect-scalar-cost-multiplier this makes a more user friendly option
> to make it easier to use.
>
> I propose making it an actual -m option that we document and retain vs using
> the parameter name. In the future I would like to extend this option to
> modify
> additional costing in the AArch64 backend itself.
>
> This can be used together with --param aarch64-autovec-preference to get the
> vectorizer to say, always vectorize with SVE. I did consider making this an
> additional enum to --param aarch64-autovec-preference but I also think this is
> a useful thing to be able to set with pragmas and attributes, but am open to
> suggestions.
>
> Note that as a follow up I plan on extending -fdump-tree-vect to support
> -stats
> which is then intended to be usable with this flag.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.opt (max-vectorization): New.
> * config/aarch64/aarch64.cc (aarch64_override_options_internal): Save
> and restore option.
> Implement it through vect-scalar-cost-multiplier.
> (aarch64_attributes): Default to off.
> * common/config/aarch64/aarch64-common.cc (aarch64_handle_option):
> Initialize option.
> * doc/extend.texi (max-vectorization): Document attribute.
> * doc/invoke.texi (max-vectorization): Document flag.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/cost_model_17.c: New test.
> * gcc.target/aarch64/sve/cost_model_18.c: New test.
Sorry for the slow reply. This was obviously paired with the original
version of patch 1, but I was waiting to see how that developed before
reviewing this. I've taken the updated version of patch 1 into account
below.
> ---
>
> diff --git a/gcc/common/config/aarch64/aarch64-common.cc
> b/gcc/common/config/aarch64/aarch64-common.cc
> index
> b9ed83642ade4462f1b030d68cf9744d31d70c23..1488697c6ce43108ae2938e5b8a00ac7ac262da6
> 100644
> --- a/gcc/common/config/aarch64/aarch64-common.cc
> +++ b/gcc/common/config/aarch64/aarch64-common.cc
> @@ -142,6 +142,10 @@ aarch64_handle_option (struct gcc_options *opts,
> opts->x_aarch64_flag_outline_atomics = val;
> return true;
>
> + case OPT_mmax_vectorization:
> + opts->x_flag_aarch64_max_vectorization = val;
> + return true;
> +
> default:
> return true;
> }
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index
> 9e3f2885bccb62550c5fcfdf93d72fbc2e63233e..46204264fea5af781be15374edc89587429518cb
> 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -18973,6 +18973,12 @@ aarch64_override_options_internal (struct
> gcc_options *opts)
> if (TARGET_SME && !TARGET_SVE2)
> sorry ("no support for %qs without %qs", "sme", "sve2");
>
> + /* Set scalar costing to a high value such that we always pick
> + vectorization. */
> + if (opts->x_flag_aarch64_max_vectorization)
> + SET_OPTION_IF_UNSET (opts, &global_options_set,
> + param_vect_scalar_cost_multiplier, 0xffff);
The new maximum in patch 1 is 10000, which is less than this.
I suppose we should use 10000 here too.
> +
> aarch64_override_options_after_change_1 (opts);
> }
>
> @@ -19723,6 +19729,8 @@ static const struct aarch64_attribute_info
> aarch64_attributes[] =
> OPT_msign_return_address_ },
> { "outline-atomics", aarch64_attr_bool, true, NULL,
> OPT_moutline_atomics},
> + { "max-vectorization", aarch64_attr_bool, false, NULL,
> + OPT_mmax_vectorization},
> { NULL, aarch64_attr_custom, false, NULL, OPT____ }
> };
>
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index
> f32d56d4ffaef7862c1c45a11753be5d480220d0..2725c50da64a2c05489ea6202bdd5eedf1ba7e27
> 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -290,6 +290,10 @@ msve-vector-bits=
> Target RejectNegative Joined Enum(sve_vector_bits)
> Var(aarch64_sve_vector_bits) Init(SVE_SCALABLE)
> -msve-vector-bits=<number> Set the number of bits in an SVE vector
> register.
>
> +mmax-vectorization
> +Target Undocumented Var(flag_aarch64_max_vectorization) Save
It is (rightly) not undocumented :)
> +Override the scalar cost model such that vectorization is always profitable.
> +
> mverbose-cost-dump
> Target Undocumented Var(flag_aarch64_verbose_cost)
> Enables verbose cost model dumping in the debug dump files.
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index
> 40ccf22b29f4316928f905ec2c978fdaf30a55ec..759a04bc7c4c66155154d55045bb75d695b2d6c2
> 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3882,6 +3882,13 @@ Enable or disable calls to out-of-line helpers to
> implement atomic operations.
> This corresponds to the behavior of the command-line options
> @option{-moutline-atomics} and @option{-mno-outline-atomics}.
>
> +@cindex @code{max-vectorization} function attribute, AArch64
> +@item max-vectorization
> +Enable or disable loop costing overrides inside the current function to apply
I suppose this applies to outline-atomics too, but: I think we should list:
@itemx no-max-vectorization
otherwise it doesn't seem clear how the choice between enabling and
disabling is made.
> +a penalty to scalar loops such that vector costing is always profitable.
How about something like:
@code{max-vectorization} tells GCC's vectorizer to treat all vector
loops as being more profitable than the original scalar loops when
optimizing the current function. @code{no-max-vectorization} disables
this behavior.
I'm just not sure how well understood "vector costing" would be.
> +This corresponds to the behavior of the command-line options
> +@option{-mmax-vectorization} and @option{-mno-max-vectorization}.
> +
> @cindex @code{indirect_return} function attribute, AArch64
> @item indirect_return
> The @code{indirect_return} attribute can be applied to a function type
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index
> 75aecf0a317799b2618fa7b2a641412e9e29bfa4..9f65f154aa27c008213c327440d04a1a056fe68d
> 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21984,6 +21984,12 @@ used directly. The same applies when using
> @option{-mcpu=} when the
> selected cpu supports the @samp{lse} feature.
> This option is on by default.
>
> +@item -mmax-vectorization
> +@itemx -mno-max-vectorization
> +Enable or disable override to vectorizer cost model making vectorization
> always
> +profitable. This option can be combined with other options allowing you to
> pick
> +which ISA is used during vectorization.
I think we should say how, which suggests that we should add an -m
option for aarch64-autovec-preference too. Also, we should explicitly
say that, unlike -fvect-cost-model=unlimited, this does not affect the
comparison between different vector implementations.
Thanks,
Richard
> +
> @opindex march
> @item -march=@var{name}
> Specify the name of the target architecture and, optionally, one or
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..c405591a101d50b4734bc6d65a6d6c01888bea48
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_17.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -mmax-vectorization
> -fdump-tree-vect-details" } */
> +
> +void
> +foo (char *restrict a, int *restrict b, int *restrict c,
> + int *restrict d, int stride)
> +{
> + if (stride <= 1)
> + return;
> +
> + for (int i = 0; i < 3; i++)
> + {
> + int res = c[i];
> + int t = b[i * stride];
> + if (a[i] != 0)
> + res = t * d[i];
> + c[i] = res;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c
> b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c
> new file mode 100644
> index
> 0000000000000000000000000000000000000000..8e91f9e9c29971a4bb7033be6be4d2fc4c71d05a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cost_model_18.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Ofast -march=armv8-a+sve -fdump-tree-vect-details" } */
> +
> +void __attribute__ (( target ("max-vectorization")))
> +foo (char *restrict a, int *restrict b, int *restrict c,
> + int *restrict d, int stride)
> +{
> + if (stride <= 1)
> + return;
> +
> + for (int i = 0; i < 3; i++)
> + {
> + int res = c[i];
> + int t = b[i * stride];
> + if (a[i] != 0)
> + res = t * d[i];
> + c[i] = res;
> + }
> +}
> +
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */