Victor Do Nascimento <victor.donascime...@arm.com> writes:
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:
1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.
Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
(popcount<mode2>): fix use of `udot_prod_optab'.
* gcc/config/aarch64/aarch64-sve.md
(<sur>dot_prod<vsi2qi>): Renamed to...
(<sur>dot_prod<mode><vsi2qi>): ...this.
(@<sur>dot_prod<vsi2qi>): Renamed to...
(@<sur>dot_prod<mode><vsi2qi>): ...this.
(<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
* gcc/config/aarch64/aarch64-sve2.md
(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
(<sur>dot_prodvnx4sivnx8hi): ...this.
* config/aarch64/aarch64-simd-builtins.def: Modify macro
expansion-based initialization and expansion
of (u|s|us)dot_prod builtins.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdot_impl::expand): s/direct/convert/ in
`convert_optab_handler_for_sign' function call.
(svusdot_impl::expand): add second mode argument in call to
`code_for_dot_prod'.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::convert_optab_handler_for_sign): New class
method.
* config/aarch64/aarch64-sve-builtins.h
(class function_expander): Add prototype for new
`convert_optab_handler_for_sign' method.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
Could you run the patch through contrib/check_GNU_style.py to catch
the long lines?
---
gcc/config/aarch64/aarch64-builtins.cc | 7 ++++++
gcc/config/aarch64/aarch64-simd-builtins.def | 6 ++---
gcc/config/aarch64/aarch64-simd.md | 9 ++++---
.../aarch64/aarch64-sve-builtins-base.cc | 13 +++++-----
gcc/config/aarch64/aarch64-sve-builtins.cc | 17 +++++++++++++
gcc/config/aarch64/aarch64-sve-builtins.h | 3 +++
gcc/config/aarch64/aarch64-sve.md | 6 ++---
gcc/config/aarch64/aarch64-sve2.md | 2 +-
.../aarch64/sme/vect-dotprod-twoway.c | 25 +++++++++++++++++++
9 files changed, 71 insertions(+), 17 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
[...]
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 0a560eaedca..975eca0bbd6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign (optab
signed_op,
return ::direct_optab_handler (op, mode);
}
+/* Choose between signed and unsigned convert optabs SIGNED_OP and
+ UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
+ pick the appropriate optab handler for the mode. Use MODE as the
+ mode if given, otherwise use the mode of type suffix SUFFIX_I. */
The last sentence needs to be adapted for this function. Also, because
there is no longer a single mode, I don't think it makes sense to allow
a default. So how about:
/* Choose between signed and unsigned convert optabs SIGNED_OP and
UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
pick the appropriate optab handler for "converting" from FROM_MODE
to TO_MODE. */
+insn_code
+function_expander::convert_optab_handler_for_sign (optab signed_op,
+ optab unsigned_op,
+ unsigned int suffix_i,
+ machine_mode to_mode,
+ machine_mode from_mode)
+{
+ if (from_mode == VOIDmode)
+ from_mode = vector_mode (suffix_i);
This code would then be removed.
+ optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
+ return ::convert_optab_handler (op, to_mode, from_mode);
+}
+
/* Return true if X overlaps any input. */
bool
function_expander::overlaps_input_p (rtx x)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9ab6f202c30..7534a58c3d7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -659,6 +659,9 @@ public:
insn_code direct_optab_handler (optab, unsigned int = 0);
insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
machine_mode = E_VOIDmode);
+ insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
+ machine_mode = E_VOIDmode,
+ machine_mode = E_VOIDmode);
and the "= E_VOIDmode"s here too.
machine_mode result_mode () const;
[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..453f3a75e6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */
Could you remove the -march option in favour of:
#pragma GCC target "+sme2"
? That way, we honour the user's test flags if they already include sme2.
LGTM otherwise, thanks.
Richard