Victor Do Nascimento <[email protected]> writes:
> Given recent changes to the dot_prod standard pattern name, this patch
> fixes the aarch64 back-end by implementing the following changes:
>
> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
> 2. Rewrite initialization and function expansion mechanism for simd
> builtins.
> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
> builtins.
>
> Finally, given that it is now possible for the compiler to
> differentiate between the two- and four-way dot product, we add a test
> to ensure that autovectorization picks up on dot-product patterns
> where the result is twice the width of the operands.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md
> (<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
> (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
> (usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
> (usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
> (<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
> (popcount<mode2>): fix use of `udot_prod_optab'.
> * gcc/config/aarch64/aarch64-sve.md
> (<sur>dot_prod<vsi2qi>): Renamed to...
> (<sur>dot_prod<mode><vsi2qi>): ...this.
> (@<sur>dot_prod<vsi2qi>): Renamed to...
> (@<sur>dot_prod<mode><vsi2qi>): ...this.
> (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
> * gcc/config/aarch64/aarch64-sve2.md
> (@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
> (<sur>dot_prodvnx4sivnx8hi): ...this.
> * config/aarch64/aarch64-simd-builtins.def: Modify macro
> expansion-based initialization and expansion
> of (u|s|us)dot_prod builtins.
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svdot_impl::expand): s/direct/convert/ in
> `convert_optab_handler_for_sign' function call.
> (svusdot_impl::expand): add second mode argument in call to
> `code_for_dot_prod'.
> * config/aarch64/aarch64-sve-builtins.cc
> (function_expander::convert_optab_handler_for_sign): New class
> method.
> * config/aarch64/aarch64-sve-builtins.h
> (class function_expander): Add prototype for new
> `convert_optab_handler_for_sign' method.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
Could you run the patch through contrib/check_GNU_style.py to catch
the long lines?
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 7 ++++++
> gcc/config/aarch64/aarch64-simd-builtins.def | 6 ++---
> gcc/config/aarch64/aarch64-simd.md | 9 ++++---
> .../aarch64/aarch64-sve-builtins-base.cc | 13 +++++-----
> gcc/config/aarch64/aarch64-sve-builtins.cc | 17 +++++++++++++
> gcc/config/aarch64/aarch64-sve-builtins.h | 3 +++
> gcc/config/aarch64/aarch64-sve.md | 6 ++---
> gcc/config/aarch64/aarch64-sve2.md | 2 +-
> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++++++++++++++++++
> 9 files changed, 71 insertions(+), 17 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc
> b/gcc/config/aarch64/aarch64-sve-builtins.cc
> index 0a560eaedca..975eca0bbd6 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
> @@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign
> (optab signed_op,
> return ::direct_optab_handler (op, mode);
> }
>
> +/* Choose between signed and unsigned convert optabs SIGNED_OP and
> + UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
> + pick the appropriate optab handler for the mode. Use MODE as the
> + mode if given, otherwise use the mode of type suffix SUFFIX_I. */
The last sentence needs to be adapted for this function. Also, because
there is no longer a single mode, I don't think it makes sense to allow
a default. So how about:
/* Choose between signed and unsigned convert optabs SIGNED_OP and
UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
pick the appropriate optab handler for "converting" from FROM_MODE
to TO_MODE. */
> +insn_code
> +function_expander::convert_optab_handler_for_sign (optab signed_op,
> + optab unsigned_op,
> + unsigned int suffix_i,
> + machine_mode to_mode,
> + machine_mode from_mode)
> +{
> + if (from_mode == VOIDmode)
> + from_mode = vector_mode (suffix_i);
This code would then be removed.
> + optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
> + return ::convert_optab_handler (op, to_mode, from_mode);
> +}
> +
> /* Return true if X overlaps any input. */
> bool
> function_expander::overlaps_input_p (rtx x)
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h
> b/gcc/config/aarch64/aarch64-sve-builtins.h
> index 9ab6f202c30..7534a58c3d7 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins.h
> +++ b/gcc/config/aarch64/aarch64-sve-builtins.h
> @@ -659,6 +659,9 @@ public:
> insn_code direct_optab_handler (optab, unsigned int = 0);
> insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
> machine_mode = E_VOIDmode);
> + insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
> + machine_mode = E_VOIDmode,
> + machine_mode = E_VOIDmode);
and the "= E_VOIDmode"s here too.
> machine_mode result_mode () const;
>
> [...]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> new file mode 100644
> index 00000000000..453f3a75e6f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
> @@ -0,0 +1,25 @@
> +/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */
Could you remove the -march option in favour of:
#pragma GCC target "+sme2"
? That way, we honour the user's test flags if they already include sme2.
LGTM otherwise, thanks.
Richard
> +
> +#include <stdint.h>
> +
> +uint32_t udot2(int n, uint16_t* data) __arm_streaming
> +{
> + uint32_t sum = 0;
> + for (int i=0; i<n; i+=1) {
> + sum += data[i] * data[i];
> + }
> + return sum;
> +}
> +
> +int32_t sdot2(int n, int16_t* data) __arm_streaming
> +{
> + int32_t sum = 0;
> + for (int i=0; i<n; i+=1) {
> + sum += data[i] * data[i];
> + }
> + return sum;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h,
> z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h,
> z[0-9]+\.h\n} 5 } } */
> +/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */