On 8/15/24 09:26, Richard Sandiford wrote:
Victor Do Nascimento <victor.donascime...@arm.com> writes:
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:

1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.

Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.

gcc/ChangeLog:

        * config/aarch64/aarch64-simd.md
        (<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
        (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
        (usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
        (usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
        (<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
        (popcount<mode2>): fix use of `udot_prod_optab'.
        * gcc/config/aarch64/aarch64-sve.md
        (<sur>dot_prod<vsi2qi>): Renamed to...
        (<sur>dot_prod<mode><vsi2qi>): ...this.
        (@<sur>dot_prod<vsi2qi>): Renamed to...
        (@<sur>dot_prod<mode><vsi2qi>): ...this.
        (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
        * gcc/config/aarch64/aarch64-sve2.md
        (@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
        (<sur>dot_prodvnx4sivnx8hi): ...this.
        * config/aarch64/aarch64-simd-builtins.def: Modify macro
        expansion-based initialization and expansion
        of (u|s|us)dot_prod builtins.
        * config/aarch64/aarch64-sve-builtins-base.cc
        (svdot_impl::expand): s/direct/convert/ in
        `convert_optab_handler_for_sign' function call.
        (svusdot_impl::expand): add second mode argument in call to
        `code_for_dot_prod'.
        * config/aarch64/aarch64-sve-builtins.cc
        (function_expander::convert_optab_handler_for_sign): New class
        method.
        * config/aarch64/aarch64-sve-builtins.h
        (class function_expander): Add prototype for new
        `convert_optab_handler_for_sign' method.

gcc/testsuite/ChangeLog:
        * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.

Could you run the patch through contrib/check_GNU_style.py to catch
the long lines?

---
  gcc/config/aarch64/aarch64-builtins.cc        |  7 ++++++
  gcc/config/aarch64/aarch64-simd-builtins.def  |  6 ++---
  gcc/config/aarch64/aarch64-simd.md            |  9 ++++---
  .../aarch64/aarch64-sve-builtins-base.cc      | 13 +++++-----
  gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++++++++++
  gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +++
  gcc/config/aarch64/aarch64-sve.md             |  6 ++---
  gcc/config/aarch64/aarch64-sve2.md            |  2 +-
  .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++++++++++++++
  9 files changed, 71 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
[...]
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc 
b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 0a560eaedca..975eca0bbd6 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3745,6 +3745,23 @@ function_expander::direct_optab_handler_for_sign (optab 
signed_op,
    return ::direct_optab_handler (op, mode);
  }
+/* Choose between signed and unsigned convert optabs SIGNED_OP and
+   UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
+   pick the appropriate optab handler for the mode.  Use MODE as the
+   mode if given, otherwise use the mode of type suffix SUFFIX_I.  */

The last sentence needs to be adapted for this function.  Also, because
there is no longer a single mode, I don't think it makes sense to allow
a default.  So how about:

/* Choose between signed and unsigned convert optabs SIGNED_OP and
    UNSIGNED_OP based on the signedness of type suffix SUFFIX_I, then
    pick the appropriate optab handler for "converting" from FROM_MODE
    to TO_MODE.  */

+insn_code
+function_expander::convert_optab_handler_for_sign (optab signed_op,
+                                                  optab unsigned_op,
+                                                  unsigned int suffix_i,
+                                                  machine_mode to_mode,
+                                                  machine_mode from_mode)
+{
+  if (from_mode == VOIDmode)
+    from_mode = vector_mode (suffix_i);

This code would then be removed.

+  optab op = type_suffix (suffix_i).unsigned_p ? unsigned_op : signed_op;
+  return ::convert_optab_handler (op, to_mode, from_mode);
+}
+
  /* Return true if X overlaps any input.  */
  bool
  function_expander::overlaps_input_p (rtx x)
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h 
b/gcc/config/aarch64/aarch64-sve-builtins.h
index 9ab6f202c30..7534a58c3d7 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins.h
@@ -659,6 +659,9 @@ public:
    insn_code direct_optab_handler (optab, unsigned int = 0);
    insn_code direct_optab_handler_for_sign (optab, optab, unsigned int = 0,
                                           machine_mode = E_VOIDmode);
+  insn_code convert_optab_handler_for_sign (optab, optab, unsigned int = 0,
+                                           machine_mode = E_VOIDmode,
+                                           machine_mode = E_VOIDmode);

and the "= E_VOIDmode"s here too.

    machine_mode result_mode () const;
[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c 
b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
new file mode 100644
index 00000000000..453f3a75e6f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-march=armv9.2-a+sme2 -O2 -ftree-vectorize" } */

Could you remove the -march option in favour of:

   #pragma GCC target "+sme2"

?  That way, we honour the user's test flags if they already include sme2.

LGTM otherwise, thanks.

Richard

Thanks Richard, sorry the requested changes didn't make it into the posted V3 patch series.

Your proposed improvements make good sense and have been implemented in accordance with the provided feedback.

Cheers,
Victor

+
+#include <stdint.h>
+
+uint32_t udot2(int n, uint16_t* data) __arm_streaming
+{
+  uint32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+int32_t sdot2(int n, int16_t* data) __arm_streaming
+{
+  int32_t sum = 0;
+  for (int i=0; i<n; i+=1) {
+    sum += data[i] * data[i];
+  }
+  return sum;
+}
+
+/* { dg-final { scan-assembler-times {\tudot\tz[0-9]+\.s, z[0-9]+\.h, 
z[0-9]+\.h\n} 5 } } */
+/* { dg-final { scan-assembler-times {\tsdot\tz[0-9]+\.s, z[0-9]+\.h, 
z[0-9]+\.h\n} 5 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\t} 4 } } */

Reply via email to