[PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-18 Thread Akram Ahmad
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32-bit and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
which generates an additional (and inefficient) FMOV when the original
operands are in GP registers. This patch therefore also restores the
original behaviour of using the adds/subs instructions in this
circumstance.
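As a sketch of the scalar behaviour being restored (function name is illustrative, not from the patch; the expectation is that this compiles to adds/csinv on GP registers rather than round-tripping operands through SIMD registers via FMOV):

```c
#include <stdint.h>

/* Branch-free unsigned saturating add on a 32-bit scalar.
   __builtin_add_overflow returns nonzero on unsigned wrap-around,
   in which case we clamp to the type maximum.  */
static uint32_t usadd32 (uint32_t x, uint32_t y)
{
  uint32_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT32_MAX;
  return sum;
}
```

The same shape applies to the 64-bit case with uint64_t.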

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names.
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
---
 gcc/config/aarch64/aarch64-builtins.cc| 13 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  8 +-
 gcc/config/aarch64/aarch64-simd.md| 93 +-
 gcc/config/aarch64/arm_neon.h | 96 +--
 gcc/config/aarch64/iterators.md   |  4 +
 .../saturating_arithmetic_autovect.inc| 58 +++
 .../saturating_arithmetic_autovect_1.c| 79 +++
 .../saturating_arithmetic_autovect_2.c| 79 +++
 .../saturating_arithmetic_autovect_3.c| 75 +++
 .../saturating_arithmetic_autovect_4.c| 77 +++
 .../aarch64/saturating_arithmetic.inc | 39 
 .../aarch64/saturating_arithmetic_1.c | 41 
 .../aarch64/saturating_arithmetic_2.c | 41 
 .../aarch64/saturating_arithmetic_3.c | 30 ++
 .../aarch64/saturating_arithmetic_4.c | 30 ++
 15 files changed, 707 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 7d737877e0b..f2a1b6ddbf6 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -3849,6 +3849,19 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
  new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
  LSHIFT_EXPR, args[0], args[1]);
break;
+
+  /* lower saturating add/sub neon builtins to gimple.  */
+  BUILTIN_VSDQ_I (BINOP, ssadd, 3, NONE)
+  BUILTIN_VSDQ_I (BINOPU, usadd, 3, NONE)
+   new_stmt = gimple_build_call_internal (IFN_SAT_ADD, 2, args[0], args[1])

[PATCH 2/2] aarch64: Use standard names for SVE saturating arithmetic

2024-10-18 Thread Akram Ahmad
Rename the existing SVE unpredicated saturating arithmetic instructions
to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md: Rename insns.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/saturating_arithmetic.inc:
Template file for auto-vectorizer tests.
* gcc.target/aarch64/sve/saturating_arithmetic_1.c:
Instantiate 8-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_2.c:
Instantiate 16-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_3.c:
Instantiate 32-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_4.c:
Instantiate 64-bit vector tests.
---
 gcc/config/aarch64/aarch64-sve.md |  4 +-
 .../aarch64/sve/saturating_arithmetic.inc | 68 +++
 .../aarch64/sve/saturating_arithmetic_1.c | 60 
 .../aarch64/sve/saturating_arithmetic_2.c | 60 
 .../aarch64/sve/saturating_arithmetic_3.c | 62 +
 .../aarch64/sve/saturating_arithmetic_4.c | 62 +
 6 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 06bd3e4bb2c..b987b292b20 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4379,7 +4379,7 @@
 ;; -
 
 ;; Unpredicated saturating signed addition and subtraction.
-(define_insn "@aarch64_sve_"
+(define_insn "s3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
(SBINQOPS:SVE_FULL_I
  (match_operand:SVE_FULL_I 1 "register_operand")
@@ -4395,7 +4395,7 @@
 )
 
 ;; Unpredicated saturating unsigned addition and subtraction.
-(define_insn "@aarch64_sve_"
+(define_insn "s3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
(UBINQOPS:SVE_FULL_I
  (match_operand:SVE_FULL_I 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
new file mode 100644
index 000..0b3ebbcb0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
@@ -0,0 +1,68 @@
+/* Template file for vector saturating arithmetic validation.
+
+   This file defines saturating addition and subtraction functions for a given
+   scalar type, testing the auto-vectorization of these two operators. This
+   type, along with the corresponding minimum and maximum values for that type,
+   must be defined by any test file which includes this template file.  */
+
+#ifndef SAT_ARIT_AUTOVEC_INC
+#define SAT_ARIT_AUTOVEC_INC
+
+#include <stdint.h>
+#include <limits.h>
+
+#ifndef UT
+#define UT uint32_t
+#define UMAX UINT_MAX
+#define UMIN 0
+#endif
+
+void uaddq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  UT sum = a[i] + b[i];
+  out[i] = sum < a[i] ? UMAX : sum;
+}
+}
+
+void uaddq2 (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  UT sum;
+  if (!__builtin_add_overflow(a[i], b[i], &sum))
+   out[i] = sum;
+  else
+   out[i] = UMAX;
+}
+}
+
+void uaddq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  UT sum = a[i] + 50;
+  out[i] = sum < a[i] ? UMAX : sum;
+}
+}
+
+void usubq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  UT sum = a[i] - b[i];
+  out[i] = sum > a[i] ? UMIN : sum;
+}
+}
+
+void usubq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+{
+  UT sum = a[i] - 50;
+  out[i] = sum > a[i] ? UMIN : sum;
+}
+}
+
+#endif
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
new file mode 100644
index 000..6936e9a2704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** uaddq:
+** ...
+** ld1b\tz([0-9]+)\.b, .*
+** ld1b\tz([0-9]+)\.b, .*
+** uqadd\tz\2.b, z\1\.b, z\2\.b
+** ...
+** ldr\tb([0-9]+), .*
+** ldr\tb([0-9]+), .*
+** uqadd\tb\4, b\3, b\4
+** ...
+*/
+/*
+** ua

[PATCH 0/2] aarch64: Use standard names for saturating arithmetic

2024-10-18 Thread Akram Ahmad
Hi all,

This patch series introduces standard names for scalar, Adv. SIMD, and
SVE saturating arithmetic instructions in the aarch64 backend.

Additional tests are added for unsigned saturating arithmetic, as well
as to test that the auto-vectorizer correctly inserts NEON instructions
or scalar instructions where necessary, such as in 32 and 64-bit scalar
unsigned arithmetic. There are also tests for the auto-vectorized SVE
code.

An important discussion point: this patch causes scalar 32 and 64-bit
unsigned saturating arithmetic to now use adds, csinv / subs, csel as
is expected elsewhere in the backend. This affects the NEON intrinsics
for these two modes as well. This is the cause of a few test failures,
otherwise there are no regressions on aarch64-none-linux-gnu.
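For reference, a minimal C sketch of the scalar unsigned subtraction pattern in question (the function name is illustrative; the expectation described above is that this lowers to subs + csel rather than a SIMD uqsub plus FMOVs):

```c
#include <stdint.h>

/* Unsigned saturating subtract on a 32-bit scalar: when y > x the
   true result would wrap, so clamp the result to 0 instead.  */
static uint32_t ussub32 (uint32_t x, uint32_t y)
{
  return x > y ? x - y : 0;
}
```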

SVE currently uses the unpredicated version of the instruction in the
backend.

Many thanks,

Akram

---

Akram Ahmad (2):
  aarch64: Use standard names for saturating arithmetic
  aarch64: Use standard names for SVE saturating arithmetic

 gcc/config/aarch64/aarch64-builtins.cc| 13 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  8 +-
 gcc/config/aarch64/aarch64-simd.md| 93 +-
 gcc/config/aarch64/aarch64-sve.md |  4 +-
 gcc/config/aarch64/arm_neon.h | 96 +--
 gcc/config/aarch64/iterators.md   |  4 +
 .../saturating_arithmetic_autovect.inc| 58 +++
 .../saturating_arithmetic_autovect_1.c| 79 +++
 .../saturating_arithmetic_autovect_2.c| 79 +++
 .../saturating_arithmetic_autovect_3.c| 75 +++
 .../saturating_arithmetic_autovect_4.c| 77 +++
 .../aarch64/saturating_arithmetic.inc | 39 
 .../aarch64/saturating_arithmetic_1.c | 41 
 .../aarch64/saturating_arithmetic_2.c | 41 
 .../aarch64/saturating_arithmetic_3.c | 30 ++
 .../aarch64/saturating_arithmetic_4.c | 30 ++
 .../aarch64/sve/saturating_arithmetic.inc | 68 +
 .../aarch64/sve/saturating_arithmetic_1.c | 60 
 .../aarch64/sve/saturating_arithmetic_2.c | 60 
 .../aarch64/sve/saturating_arithmetic_3.c | 62 
 .../aarch64/sve/saturating_arithmetic_4.c | 62 
 21 files changed, 1021 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

-- 
2.34.1



[PATCH 2/2] Match: make SAT_ADD case 7 commutative

2024-10-21 Thread Akram Ahmad
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.

Before:
 
 _1 = BIT_FIELD_REF ;
 sum_5 = _1 + a_4(D);
 if (a_4(D) <= sum_5)
   goto ; [INV]
 else
   goto ; [INV]

  :

  :
 _2 = PHI <255(3), sum_5(2)>
 return _2;

After:
   [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = .SAT_ADD (_1, a_4(D)); [tail call]
  return _2;
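At the source level, the newly matched commutative form looks like this (hypothetical function name; on targets without standard-name support for IFN_SAT_ADD the match still fires but no .SAT_ADD call is emitted):

```c
#include <stdint.h>

/* Same saturating add as case 7, but with Y rather than X in the
   comparison; with usadd_left_part_1 marked commutative this form
   is now recognised as .SAT_ADD too.  */
static uint32_t sat_add_swapped (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return y <= sum ? sum : UINT32_MAX;
}
```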

This passes the aarch64-none-linux-gnu regression tests with no new
failures.

gcc/ChangeLog:

* match.pd: Modify existing case for SAT_ADD.
---
 gcc/match.pd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..a77fca92181 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation add, case 7 (branch with le):
SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
 (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:c@2 @0 @1)) @2 integer_minus_onep))
 
 /* Unsigned saturation add, case 8 (branch with gt):
SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
-- 
2.34.1



[PATCH 0/2] Match: support additional cases of unsigned scalar arithmetic

2024-10-21 Thread Akram Ahmad
Hi all,

This patch series adds support for 2 new cases of unsigned scalar
saturating arithmetic (one addition, one subtraction). As a result,
more valid patterns are recognised and rewritten as calls to .SAT_ADD
or .SAT_SUB where relevant.

Regression tests for aarch64-none-linux-gnu all pass with no failures.

Many thanks,

Akram

---

Akram Ahmad (2):
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.34.1



[PATCH 1/2] Match: support new case of unsigned scalar SAT_SUB

2024-10-21 Thread Akram Ahmad
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB.
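Concretely, the newly recognised shape looks like this in C (illustrative function name):

```c
#include <stdint.h>

/* Unsigned saturating subtract via a ge comparison: if the
   subtraction wrapped then d > x, so the test fails and the
   result clamps to 0.  */
static uint32_t sat_sub_ge (uint32_t x, uint32_t y)
{
  uint32_t d = x - y;
  return x >= d ? d : 0;
}
```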

This passes the aarch64-none-linux-gnu regression tests with no failures.

gcc/ChangeLog:

* match.pd: Add new match for SAT_SUB.
---
 gcc/match.pd | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* Unsigned saturation sub, case 11 (branch with ge):
+  SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
-- 
2.34.1



Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-11-04 Thread Akram Ahmad

On 31/10/2024 08:00, Richard Biener wrote:

On Wed, Oct 30, 2024 at 4:46 PM Akram Ahmad  wrote:

On 29/10/2024 12:48, Richard Biener wrote:

The testcases will FAIL unless the target has support for .SAT_ADD - you want to
add proper effective target tests here.

The match.pd part looks OK to me.

Richard.

Hi Richard,

I assume this also applies to the tests written for the SAT_SUB pattern
too in that case?

Yes, of course.


I've taken a look at the effective target definitions in
target-supports.exp, but I can't find anything relating to saturating
arithmetic. I'm not sure if it's only aarch64 which doesn't support
this yet either, otherwise I would try and add a definition myself.
Am I missing any existing definitions that I can use for the
dg-effective-target keyword?


Many thanks,

Akram



Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-10-30 Thread Akram Ahmad

On 29/10/2024 12:48, Richard Biener wrote:

On Mon, Oct 28, 2024 at 4:45 PM Akram Ahmad  wrote:

Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.

Before:
  
  _1 = BIT_FIELD_REF ;
  sum_5 = _1 + a_4(D);
  if (a_4(D) <= sum_5)
goto ; [INV]
  else
goto ; [INV]

   :

   :
  _2 = PHI <255(3), sum_5(2)>
  return _2;

After:
[local count: 1073741824]:
   _1 = BIT_FIELD_REF ;
   _2 = .SAT_ADD (_1, a_4(D)); [tail call]
   return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures. The tests written in this patch will fail on targets which
do not implement the standard names for IFN SAT_ADD.

gcc/ChangeLog:

 * match.pd: Modify existing case for SAT_ADD.

gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
 * gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
---
  gcc/match.pd  |  4 ++--
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 21 +++
  5 files changed, 86 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..98c50ab097f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
 SAT_ADD = (X + Y) | -((X + Y) < X)  */
  (match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
+ (plus @0 @1)
   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
&& types_match (type, @0, @1

@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  /* Unsigned saturation add, case 7 (branch with le):
 SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
  (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))

  /* Unsigned saturation add, case 8 (branch with gt):
 SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
new file mode 100644
index 000..0202c70cc83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */

The testcases will FAIL unless the target has support for .SAT_ADD - you want to
add proper effective target tests here.

The match.pd part looks OK to me.

Richard.


Hi Richard,

I assume this also applies to the tests written for the SAT_SUB pattern 
too in that case?


Many thanks,

Akram




\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
new file mode 100644
index 000..34c80ba3854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
new file mode 100644
index 000..0718cb566d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
@@ -0,0 +1,21 @@
+/* { dg-do

Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic

2024-10-23 Thread Akram Ahmad

On 23/10/2024 12:20, Richard Sandiford wrote:

Thanks for doing this.  The approach looks good.  My main question is:
are we sure that we want to use the Advanced SIMD instructions for
signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
we only saturate at the negative limit if both operands are negative,
and only saturate at the positive limit if both operands are positive.
So for 32-bit values we can use:

asr tmp, x or y, #31
eor tmp, tmp, #0x8000

to calculate the saturation value and:

adds	res, x, y
csel	res, tmp, res, vs

to calculate the full result.  That's the same number of instructions
as two fmovs for the inputs, the sqadd, and the fmov for the result,
but it should be more efficient.

The reason for asking now, rather than treating it as a potential
future improvement, is that it would also avoid splitting the patterns
for signed and unsigned ops.  (The length of the split alternative can be
conservatively set to 16 even for the unsigned version, since nothing
should care in practice.  The split will have happened before
shorten_branches.)
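A C-level sketch of the observation above (function name illustrative): signed overflow on addition only occurs when both operands share a sign, so the sign of either operand alone selects the saturation limit.

```c
#include <stdint.h>

/* Signed saturating add: on overflow, x and y necessarily have the
   same sign, so x's sign picks INT32_MIN vs INT32_MAX.  */
static int32_t ssadd32 (int32_t x, int32_t y)
{
  int32_t res;
  if (__builtin_add_overflow (x, y, &res))
    res = x < 0 ? INT32_MIN : INT32_MAX;
  return res;
}
```

This is the semantic content of the asr/eor sequence that computes the saturation value from one operand's sign bit.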


Hi Richard, thanks for looking over this.

I might be misunderstanding your suggestion, but is there a way to 
efficiently
check the signedness of the second operand (let's say 'y') if it is 
stored in
a register? This is a problem we considered and couldn't solve 
post-reload, as
we only have three registers (including two operands) to work with. (I 
might be
wrong in terms of how many registers we have available). AFAIK that's 
why we only

use adds, csinv / subs, csel in the unsigned case.

To illustrate the point better: consider signed X + Y where both
operands are in GPRs. Without knowing the signedness of Y, for
branchless code, we would need to saturate at both the positive and
negative limit and then perform a comparison on Y to check the sign,
selecting either saturating limit accordingly. This of course doesn't
apply if signed saturating 'addition' with a negative op2 is only
required to saturate to the positive limit, nor does it apply if Y or
op2 is an immediate.

Otherwise, I agree that this should be fixed now rather than as a
future improvement.




gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
new file mode 100644
index 000..63eb21e438b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+/*
+** uadd_lane: { xfail *-*-* }

Just curious: why does this fail?  Is it a vector costing issue?

This is due to a missing pattern from match.pd; I've sent another
patch upstream to rectify this. In essence, this function exposes a
commutative form of an existing addition pattern, but that form isn't
currently commutative when it should be. It's a similar reason for why
the uqsubs are also marked as xfail, so that same patch series
contains a fix for the uqsub case too.

Since the operands are commutative, and since there's no restriction
on the choice of destination register, it's probably safer to use:


+** uqadd\tv[0-9].16b, (?:v\1.16b, v\2.16b|v\2.16b

[PATCH v2 1/2] Match: support new case of unsigned scalar SAT_SUB

2024-10-28 Thread Akram Ahmad
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB. New tests
are added to verify this behaviour on targets which use the standard
names for IFN SAT_SUB.

This passes the aarch64 regression tests with no additional failures.

gcc/ChangeLog:

* match.pd: Add new match for SAT_SUB.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c: New test.
---
 gcc/match.pd   |  8 
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c| 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c| 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c| 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c | 14 ++
 5 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* Unsigned saturation sub, case 11 (branch with ge):
+  SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
new file mode 100644
index 000..164719980c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
new file mode 100644
index 000..40a28c6092b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
new file mode 100644
index 000..5649858ef2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
new file mode 100644
index 000..785e48b92ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint8_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
-- 
2.34.1



[PATCH v2 0/2] Match: support additional cases of unsigned scalar arithmetic

2024-10-28 Thread Akram Ahmad
Hi all,

This patch series adds support for 2 new cases of unsigned scalar
saturating arithmetic (one addition, one subtraction). As a result,
more valid patterns are recognised and rewritten as calls to .SAT_ADD
or .SAT_SUB where relevant.

Regression tests for aarch64-none-linux-gnu all pass with no failures.

v2 changes:
- add new tests for both patterns (these will fail on targets which
  don't implement the standard insn names for IFN_SAT_ADD and
  IFN_SAT_SUB; another patch series adds support for this in aarch64).
- minor adjustment to the constraints on the match statement for
  usadd_left_part_1.

Many thanks,

Akram

---

Akram Ahmad (2):
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd  | 12 +--
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 21 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 14 +
 9 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

-- 
2.34.1



[PATCH v2 2/2] Match: make SAT_ADD case 7 commutative

2024-10-28 Thread Akram Ahmad
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.

Before:
 
 _1 = BIT_FIELD_REF ;
 sum_5 = _1 + a_4(D);
 if (a_4(D) <= sum_5)
   goto <bb 4>; [INV]
 else
   goto <bb 3>; [INV]

  <bb 3> :

  <bb 4> :
 _2 = PHI <255(3), sum_5(2)>
 return _2;

After:
  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = .SAT_ADD (_1, a_4(D)); [tail call]
  return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures. The tests written in this patch will fail on targets which
do not implement the standard names for IFN SAT_ADD.

gcc/ChangeLog:

* match.pd: Modify existing case for SAT_ADD.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
---
 gcc/match.pd  |  4 ++--
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 21 +++
 5 files changed, 86 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..98c50ab097f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
SAT_ADD = (X + Y) | -((X + Y) < X)  */
 (match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
+ (plus @0 @1)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1))))
 
@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation add, case 7 (branch with le):
SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
 (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))
 
 /* Unsigned saturation add, case 8 (branch with gt):
SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
new file mode 100644
index 000..0202c70cc83
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
new file mode 100644
index 000..34c80ba3854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
new file mode 100644
index 000..0718cb566d3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
new file mode 100644
index 00

Re: [PATCH 2/2] Match: make SAT_ADD case 7 commutative

2024-10-28 Thread Akram Ahmad

On 24/10/2024 16:06, Richard Biener wrote:

Can you check whether removing the :c from the (plus in
usadd_left_part_1 keeps things
working?


Hi Richard,

Thanks for the feedback. I've written some tests and can confirm that they
pass as expected with these two changes being made (removal of :c in
usadd_left_part_1, change :c to :C in form 7).

I've noticed a duplicate pattern warning for case 1 and 2 of saturating
subtraction, but I don't think that's related to my patch series, so I'll
send V2 to the mailing list imminently.

Many thanks once again,

Akram


Ping [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic

2024-11-28 Thread Akram Ahmad

Just pinging v2 of this patch series

On 14/11/2024 15:53, Akram Ahmad wrote:

Hi all,

This patch series introduces standard names for scalar, Adv. SIMD, and
SVE saturating arithmetic instructions in the aarch64 backend.

Additional tests are added for scalar saturating arithmetic, as well
as to test that the auto-vectorizer correctly inserts NEON instructions
or scalar instructions where necessary, such as in 32 and 64-bit scalar
unsigned arithmetic. There are also tests for the auto-vectorized SVE
code.

The biggest change from v1 to v2 of this series is the optimisation for
signed scalar arithmetic (32 and 64-bit) to avoid the use of FMOV in
the case of a constant and non-constant operand (immediate or GP reg
values respectively). This is only exhibited if early-ra is disabled
due to an early-ra bug which is assigning FP registers for operands
even if this would unnecessarily result in FMOV being used. This new
optimisation is tested by means of check-function-bodies as well as
an execution test.

As with v1 of this patch, the only new regression failures on aarch64
are to do with unsigned scalar intrinsics (32 and 64-bit) not using
the NEON instructions any more. Otherwise, there are no regressions.

SVE currently uses the unpredicated version of the instruction in the
backend.

v1 -> v2:
- Add new split for signed saturating arithmetic
- New test for signed saturating arithmetic
- Make addition tests accept commutative operands, other test fixes

Only the first patch in this series is updated in v2. The other
patch is already approved. If this is ok, could this be committed
for me please? I do not have commit rights.

Many thanks,

Akram

---

Akram Ahmad (2):
   aarch64: Use standard names for saturating arithmetic
   aarch64: Use standard names for SVE saturating arithmetic

  gcc/config/aarch64/aarch64-builtins.cc|  13 +
  gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
  gcc/config/aarch64/aarch64-simd.md| 209 ++-
  gcc/config/aarch64/aarch64-sve.md |   4 +-
  gcc/config/aarch64/arm_neon.h |  96 +++
  gcc/config/aarch64/iterators.md   |   4 +
  .../saturating_arithmetic_autovect.inc|  58 +
  .../saturating_arithmetic_autovect_1.c|  79 ++
  .../saturating_arithmetic_autovect_2.c|  79 ++
  .../saturating_arithmetic_autovect_3.c|  75 ++
  .../saturating_arithmetic_autovect_4.c|  77 ++
  .../aarch64/saturating-arithmetic-signed.c| 244 ++
  .../aarch64/saturating_arithmetic.inc |  39 +++
  .../aarch64/saturating_arithmetic_1.c |  36 +++
  .../aarch64/saturating_arithmetic_2.c |  36 +++
  .../aarch64/saturating_arithmetic_3.c |  30 +++
  .../aarch64/saturating_arithmetic_4.c |  30 +++
  .../aarch64/sve/saturating_arithmetic.inc |  68 +
  .../aarch64/sve/saturating_arithmetic_1.c |  60 +
  .../aarch64/sve/saturating_arithmetic_2.c |  60 +
  .../aarch64/sve/saturating_arithmetic_3.c |  62 +
  .../aarch64/sve/saturating_arithmetic_4.c |  62 +
  22 files changed, 1371 insertions(+), 58 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c



[PATCH v3 3/3] Match: make SAT_ADD case 7 commutative

2024-11-27 Thread Akram Ahmad
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1
being commutative.

The pattern for case 7 currently does not accept the alternative
where Y is used in the condition. Therefore, this commit adds the
commutative property to this case which causes more valid cases of
unsigned saturating arithmetic to be recognised.

Before:
 
 _1 = BIT_FIELD_REF ;
 sum_5 = _1 + a_4(D);
 if (a_4(D) <= sum_5)
   goto <bb 4>; [INV]
 else
   goto <bb 3>; [INV]

  <bb 3> :

  <bb 4> :
 _2 = PHI <255(3), sum_5(2)>
 return _2;

After:
  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = .SAT_ADD (_1, a_4(D)); [tail call]
  return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures. The tests will be skipped on targets which do not support
IFN_SAT_ADD for each of these modes via dg-require-effective-target.

gcc/ChangeLog:

* match.pd: Modify existing case for SAT_ADD.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
---
 gcc/match.pd  |  4 ++--
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 +++
 5 files changed, 90 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..98c50ab097f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
SAT_ADD = (X + Y) | -((X + Y) < X)  */
 (match (usadd_left_part_1 @0 @1)
- (plus:c @0 @1)
+ (plus @0 @1)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
  && types_match (type, @0, @1))))
 
@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation add, case 7 (branch with le):
SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
 (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))
 
 /* Unsigned saturation add, case 8 (branch with gt):
SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
new file mode 100644
index 000..866ce6cdbc1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_himode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
new file mode 100644
index 000..8f841c32852
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_simode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
new file mode 100644
index 000..39548d63384
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target usadd_dimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+#define UMAX (T) -1
+
+T sat_u_add_1 (T a, T b)
+{
+  T sum = a + b;
+  return sum < a ? UMAX : sum;
+}
+
+T sat_u_add_2 (T a, T b)
+{
+  T sum = a + b;
+  return sum < b ? UMAX : sum;
+}
+
+/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
\ No newline at end of file

[PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic

2024-11-27 Thread Akram Ahmad
Hi all,

This patch series adds support for 2 new cases of unsigned scalar saturating
arithmetic (one addition, one subtraction). More valid patterns are therefore
recognised, resulting in a call to .SAT_ADD or .SAT_SUB where relevant.

v3 of this series now introduces support for dg-require-effective-target for 
both usadd
and ussub optabs as well as individual modes that these optabs may be 
implemented for.
aarch64 support for these optabs is in review, so there are currently no 
targets listed
in these effective-target options.

Regression tests for aarch64 all pass with no failures.

v3 changes:
- add support for new effective-target keywords.
- tests for the two new patterns now use dg-require-effective-target so that
  they are skipped on targets which do not support the relevant optabs.

v2 changes:
- add new tests for both patterns (these will fail on targets which don't 
implement
  the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another patch series 
adds
  support for this in aarch64).
- minor adjustment to the constraints on the match statement for 
usadd_left_part_1.

If this is OK for master, please commit these on my behalf, as I do not have 
the ability
to do so.

Many thanks,

Akram

---

Akram Ahmad (3):
  testsuite: Support dg-require-effective-target for us{add, sub}
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd  | 12 +++-
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +
 gcc/testsuite/lib/target-supports.exp | 56 +++
 10 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

-- 
2.34.1



[PATCH v3 1/3] testsuite: Support dg-require-effective-target for us{add, sub}

2024-11-27 Thread Akram Ahmad
Support for middle-end representation of saturating arithmetic (via
IFN_SAT_ADD or IFN_SAT_SUB) cannot be determined externally, making it
currently impossible to selectively skip relevant tests on targets which
do not support this.

This patch adds new dg-require-effective-target keywords for each of the
unsigned saturating arithmetic optabs, for scalar QImode, HImode,
SImode, and DImode. These can then be used in future tests which focus
on these internal functions.

Currently passes aarch64 regression tests with no additional failures.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add new effective-target keywords
---
 gcc/testsuite/lib/target-supports.exp | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d113a08dff7..ec1d73970a1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4471,6 +4471,62 @@ proc check_effective_target_vect_complex_add_double { } {
}}]
 }
 
+# Return 1 if the target supports middle-end representation of saturating
+# addition for QImode, 0 otherwise.
+
+proc check_effective_target_usadd_qimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for HImode, 0 otherwise.
+
+proc check_effective_target_usadd_himode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for SImode, 0 otherwise.
+
+proc check_effective_target_usadd_simode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# addition for DImode, 0 otherwise.
+
+proc check_effective_target_usadd_dimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for QImode, 0 otherwise.
+
+proc check_effective_target_ussub_qimode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for HImode, 0 otherwise.
+
+proc check_effective_target_ussub_himode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for SImode, 0 otherwise.
+
+proc check_effective_target_ussub_simode { } {
+return 0
+}
+
+# Return 1 if the target supports middle-end representation of saturating
+# subtraction for DImode, 0 otherwise.
+
+proc check_effective_target_ussub_dimode { } {
+return 0
+}
+
 # Return 1 if the target supports signed int->float conversion
 #
 
-- 
2.34.1



[PATCH v3 2/3] Match: support new case of unsigned scalar SAT_SUB

2024-11-27 Thread Akram Ahmad
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB. New tests
are added to verify this behaviour on targets which use the standard
names for IFN SAT_SUB, and the tests are skipped if the current target
does not support IFN_SAT_SUB for each of these modes (via
dg-require-effective-target).

This passes the aarch64 regression tests with no additional failures.

gcc/ChangeLog:

* match.pd: Add new match for SAT_SUB.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c: New test.
* gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c: New test.
---
 gcc/match.pd  |  8 
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +++
 5 files changed, 68 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (wi::eq_p (sum, wi::uhwi (0, precision)))
 
+/* Unsigned saturation sub, case 11 (branch with ge):
+  SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1))))
+
 /* Signed saturation sub, case 1:
T minus = (T)((UT)X - (UT)Y);
SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
new file mode 100644
index 000..641fac50858
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_himode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
new file mode 100644
index 000..27f3bae7d52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_simode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
new file mode 100644
index 000..92883ce60c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_dimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
new file mode 100644
index 000..06ff91dbed0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ussub_qimode } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint8_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
-- 
2.34.1



[PATCH 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-02 Thread Akram Ahmad
GIMPLE code which performs a narrowing truncation on the result of a
vector concatenation currently results in an unnecessary XTN being
emitted following a UZP1 to concatenate the operands. In cases such as
this, UZP1 should instead use a smaller arrangement specifier, making
the XTN instruction redundant. This can be seen in GIMPLE such as the
following example:

int32x2_t foo (svint64_t a, svint64_t b)
{
  vector(2) int vect__2.8;
  long int _1;
  long int _3;
  vector(2) long int _12;

  <bb 2> [local count: 1073741824]:
  _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D));
  _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D));
  _12 = {_1, _3};
  vect__2.8_13 = (vector(2) int) _12;
  return vect__2.8_13;

}

Original assembly generated:

bar:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1v0.2d, v0.2d, v1.2d
xtn v0.2s, v0.2d
ret

This patch therefore defines the *aarch64_trunc_concat insn which
truncates the concatenation result, rather than concatenating the
truncated operands (such as in *aarch64_narrow_trunc), resulting
in the following optimised assembly being emitted:

bar:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1v0.2s, v0.2s, v1.2s
ret

This patch passes all regression tests on aarch64 with no new failures.
A supporting test for this optimisation is also written and passes.

OK for master? I do not have commit rights so I cannot push the patch
myself.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_trunc_concat<mode>): New
  insn definition.
* config/aarch64/iterators.md (VDQHSD_F): New mode iterator.
  (VTRUNCD): New mode attribute for truncated modes.
  (Vtruncd): New mode attribute for arrangement specifier.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/truncated_concatenation_1.c: New test for
  the above example and the int64x2 version of the above.
---
 gcc/config/aarch64/aarch64-simd.md| 16 ++
 gcc/config/aarch64/iterators.md   | 12 ++
 .../aarch64/sve/truncated_concatenation_1.c   | 22 +++
 3 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index cfe95bd4c31..de3dd444ecd 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1872,6 +1872,22 @@
   [(set_attr "type" "neon_permute")]
 )
 
+(define_insn "*aarch64_trunc_concat<mode>"
+  [(set (match_operand:<VTRUNCD> 0 "register_operand" "=w")
+	(truncate:<VTRUNCD>
+	  (vec_concat:VDQHSD_F
+	    (match_operand:<VHALF> 1 "register_operand" "w")
+	    (match_operand:<VHALF> 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+{
+  if (!BYTES_BIG_ENDIAN)
+    return "uzp1\\t%0.<Vtruncd>, %1.<Vtruncd>, %2.<Vtruncd>";
+  else
+    return "uzp1\\t%0.<Vtruncd>, %2.<Vtruncd>, %1.<Vtruncd>";
+}
+  [(set_attr "type" "neon_permute")]
+)
+
 ;; Packing doubles.
 
(define_expand "vec_pack_trunc_<mode>"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d7cb27e1885..3b28b2fae0c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -290,6 +290,10 @@
 ;; Advanced SIMD modes for H, S and D types.
 (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI])
 
+;; Advanced SIMD modes that can be truncated whilst preserving
+;; the number of vector elements.
+(define_mode_iterator VDQHSD_F [V8HI V4SI V2DI V2SF V4SF V2DF])
+
 (define_mode_iterator VDQHSD_V1DI [VDQHSD V1DI])
 
 ;; Advanced SIMD and scalar integer modes for H and S.
@@ -1722,6 +1726,14 @@
 (define_mode_attr Vnarrowq2 [(V8HI "v16qi") (V4SI "v8hi")
 (V2DI "v4si")])
 
+;; Truncated Advanced SIMD modes which preserve the number of lanes.
+(define_mode_attr VTRUNCD [(V8HI "V8QI") (V4SI "V4HI")
+  (V2SF "V2HF") (V4SF "V4HF")
+  (V2DI "V2SI") (V2DF "V2SF")])
+(define_mode_attr Vtruncd [(V8HI "8b") (V4SI "4h")
+  (V2SF "2h") (V4SF "4h")
+  (V2DI "2s") (V2DF "2s")])
+
 ;; Narrowed modes of vector modes.
 (define_mode_attr VNARROW [(VNx8HI "VNx16QI")
   (VNx4SI "VNx8HI") (VNx4SF "VNx8HF")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c
new file mode 100644
index 000..e0ad4209206
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+int32x2_t foo (svint64_t a, svint64_t 

[PATCH 0/1] aarch64: remove extra XTN in vector concatenation

2024-12-02 Thread Akram Ahmad
Hi all,

This patch adds a new insn which optimises vector concatenations on SIMD/FP
registers when a narrowing truncation is performed on the resulting vector.
This usually results in codegen such as...

uzp1v0.2d, v0.2d, v1.2d
xtn v0.2s, v0.2d
ret

... whereas the following would have sufficed without the need for XTN:

uzp1v0.2s, v0.2s, v1.2s
ret

A more rigorous example is provided in the commit message. This is a
fairly straightforward patch, although I would appreciate some feedback
as to whether the scope of the modes covered by the insn is appropriate.
Similarly, I would also appreciate any suggestions for other test cases
that should be covered for this optimisation.

Many thanks,

Akram

---

Akram Ahmad (1):
  aarch64: remove extra XTN in vector concatenation

 gcc/config/aarch64/aarch64-simd.md| 16 ++
 gcc/config/aarch64/iterators.md   | 12 ++
 .../aarch64/sve/truncated_concatenation_1.c   | 22 +++
 3 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

-- 
2.34.1



[PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic

2024-11-14 Thread Akram Ahmad
Hi all,

This patch series introduces standard names for scalar, Adv. SIMD, and
SVE saturating arithmetic instructions in the aarch64 backend.

Additional tests are added for scalar saturating arithmetic, as well
as to test that the auto-vectorizer correctly inserts NEON instructions
or scalar instructions where necessary, such as in 32 and 64-bit scalar
unsigned arithmetic. There are also tests for the auto-vectorized SVE
code.

The biggest change from V1-V2 of this series is the optimisation for
signed scalar arithmetic (32 and 64-bit) to avoid the use of FMOV in
the case of a constant and non-constant operand (immediate or GP reg
values respectively). This is only exhibited if early-ra is disabled
due to an early-ra bug which is assigning FP registers for operands
even if this would unnecessarily result in FMOV being used. This new
optimisation is tested by means of check-function-bodies as well as
an execution test.

As with v1 of this patch, the only new regression failures on aarch64
are to do with unsigned scalar intrinsics (32 and 64-bit) not using
the NEON instructions any more. Otherwise, there are no regressions.

SVE currently uses the unpredicated version of the instruction in the
backend.
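Concretely, the unsigned scalar idioms that the midend recognises as
IFN_SAT_ADD and IFN_SAT_SUB have the following shape (mirroring the
uaddq/usubq functions in the new tests; the 32-bit type and function
names here are illustrative only):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating addition: an unsigned add wraps on overflow, so
   the result is smaller than either operand exactly when it overflowed,
   and we clamp to the type's maximum.  */
uint32_t usat_add (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}

/* Unsigned saturating subtraction: the difference wraps (becomes larger
   than the minuend) exactly on underflow, and we clamp to zero.  */
uint32_t usat_sub (uint32_t a, uint32_t b)
{
  uint32_t diff = a - b;
  return diff > a ? 0 : diff;
}
```

With the standard pattern names in place, these compile to uqadd/uqsub
(or adds/subs plus a conditional select for the 32- and 64-bit scalar
cases) rather than a branchy compare-and-select sequence.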

v1 -> v2:
- Add new split for signed saturating arithmetic
- New test for signed saturating arithmetic
- Make addition tests accept commutative operands, other test fixes

Only the first patch in this series is updated in v2. The other
patch is already approved. If this is ok, could this be committed
for me please? I do not have commit rights.

Many thanks,

Akram

---

Akram Ahmad (2):
  aarch64: Use standard names for saturating arithmetic
  aarch64: Use standard names for SVE saturating arithmetic

 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 209 ++-
 gcc/config/aarch64/aarch64-sve.md |   4 +-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 +
 .../saturating_arithmetic_autovect_1.c|  79 ++
 .../saturating_arithmetic_autovect_2.c|  79 ++
 .../saturating_arithmetic_autovect_3.c|  75 ++
 .../saturating_arithmetic_autovect_4.c|  77 ++
 .../aarch64/saturating-arithmetic-signed.c| 244 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 +++
 .../aarch64/saturating_arithmetic_4.c |  30 +++
 .../aarch64/sve/saturating_arithmetic.inc |  68 +
 .../aarch64/sve/saturating_arithmetic_1.c |  60 +
 .../aarch64/sve/saturating_arithmetic_2.c |  60 +
 .../aarch64/sve/saturating_arithmetic_3.c |  62 +
 .../aarch64/sve/saturating_arithmetic_4.c |  62 +
 22 files changed, 1371 insertions(+), 58 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

-- 
2.34.1



[PATCH v2 2/2] aarch64: Use standard names for SVE saturating arithmetic

2024-11-14 Thread Akram Ahmad
Rename the existing SVE unpredicated saturating arithmetic instructions
to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:

* config/aarch64/aarch64-sve.md: Rename insns to standard pattern names.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/saturating_arithmetic.inc:
Template file for auto-vectorizer tests.
* gcc.target/aarch64/sve/saturating_arithmetic_1.c:
Instantiate 8-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_2.c:
Instantiate 16-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_3.c:
Instantiate 32-bit vector tests.
* gcc.target/aarch64/sve/saturating_arithmetic_4.c:
Instantiate 64-bit vector tests.
---
 gcc/config/aarch64/aarch64-sve.md |  4 +-
 .../aarch64/sve/saturating_arithmetic.inc | 68 +++
 .../aarch64/sve/saturating_arithmetic_1.c | 60 
 .../aarch64/sve/saturating_arithmetic_2.c | 60 
 .../aarch64/sve/saturating_arithmetic_3.c | 62 +
 .../aarch64/sve/saturating_arithmetic_4.c | 62 +
 6 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 06bd3e4bb2c..b987b292b20 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4379,7 +4379,7 @@
;; -------------------------------------------------------------------------
 
 ;; Unpredicated saturating signed addition and subtraction.
-(define_insn "@aarch64_sve_<optab><mode>"
+(define_insn "<su_optab>s<addsub><mode>3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
(SBINQOPS:SVE_FULL_I
  (match_operand:SVE_FULL_I 1 "register_operand")
@@ -4395,7 +4395,7 @@
 )
 
 ;; Unpredicated saturating unsigned addition and subtraction.
-(define_insn "@aarch64_sve_<optab><mode>"
+(define_insn "<su_optab>s<addsub><mode>3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
(UBINQOPS:SVE_FULL_I
  (match_operand:SVE_FULL_I 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc 
b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
new file mode 100644
index 000..0b3ebbcb0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
@@ -0,0 +1,68 @@
+/* Template file for vector saturating arithmetic validation.
+
+   This file defines saturating addition and subtraction functions for a given
+   scalar type, testing the auto-vectorization of these two operators. This
+   type, along with the corresponding minimum and maximum values for that type,
+   must be defined by any test file which includes this template file.  */
+
+#ifndef SAT_ARIT_AUTOVEC_INC
+#define SAT_ARIT_AUTOVEC_INC
+
+#include <stdint.h>
+#include <limits.h>
+
+#ifndef UT
+#define UT uint32_t
+#define UMAX UINT_MAX
+#define UMIN 0
+#endif
+
+void uaddq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] + b[i];
+      out[i] = sum < a[i] ? UMAX : sum;
+    }
+}
+
+void uaddq2 (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum;
+      if (!__builtin_add_overflow(a[i], b[i], &sum))
+	out[i] = sum;
+      else
+	out[i] = UMAX;
+    }
+}
+
+void uaddq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] + 50;
+      out[i] = sum < a[i] ? UMAX : sum;
+    }
+}
+
+void usubq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] - b[i];
+      out[i] = sum > a[i] ? UMIN : sum;
+    }
+}
+
+void usubq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] - 50;
+      out[i] = sum > a[i] ? UMIN : sum;
+    }
+}
+
+#endif
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
new file mode 100644
index 000..6936e9a2704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** uaddq:
+** ...
+** ld1b\tz([0-9]+)\.b, .*
+** ld1b\tz([0-9]+)\.b, .*
** uqadd\tz\2\.b, z\1\.b, z\2\.b
+** ...
+** ldr\tb([0-9]+), .*
+** ldr\tb([0-9]+), .*
+** uqadd\tb\4, b\3, b\4
+** ...
+*/
+/*
+** uaddq

[PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-11-14 Thread Akram Ahmad
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:
fmov	d0, x0
fmov	d1, x1
sqadd	d0, d0, d1
fmov	x0, d0

Addition, after:
asr	x2, x1, 63
adds	x0, x0, x1
eor	x2, x2, 0x8000000000000000
csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with
an execution test to check the results.
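For clarity, the branchless asr/adds/eor/csinv sequence above computes the
same result as the following C sketch of signed 64-bit saturating addition
(the function name is illustrative and not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Signed 64-bit saturating addition, mirroring the asr/adds/eor/csinv
   sequence: on overflow the result is INT64_MAX for non-negative
   operands and INT64_MIN for negative ones.  */
static int64_t sat_add_s64 (int64_t a, int64_t b)
{
  uint64_t ua = (uint64_t) a, ub = (uint64_t) b;
  uint64_t sum = ua + ub;			/* adds x0, x0, x1 */
  /* Signed overflow iff both operands have the same sign and the
     result's sign differs (the V flag tested by csinv's "vc").  */
  uint64_t overflow = (~(ua ^ ub) & (ua ^ sum)) >> 63;
  /* Saturation value: INT64_MAX when b >= 0, INT64_MIN when b < 0
     (asr x2, x1, 63 then eor x2, x2, 0x8000000000000000, with csinv
     inverting x2 on overflow).  */
  uint64_t sat = (ub >> 63) + (uint64_t) INT64_MAX;
  return (int64_t) (overflow ? sat : sum);
}
```

Subtraction follows the same scheme with subs in place of adds and csel in
place of csinv, as described above.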

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names.
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 209 ++-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 +
 .../saturating_arithmetic_autovect_1.c|  79 ++
 .../saturating_arithmetic_autovect_2.c|  79 ++
 .../saturating_arithmetic_autovect_3.c|  75 ++
 .../saturating_arithmetic_autovect_4.c|  77 ++
 .../aarch64/saturating-arithmetic-signed.c| 244 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 +++
 .../aarch64/saturating_arithmetic_4.c |  30 +++
 16 files changed, 1057 insertions(+), 56 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_auto

Re: [PATCH 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-03 Thread Akram Ahmad

Hi Kyrill, thanks for the very quick response!

On 02/12/2024 15:09, Kyrylo Tkachov wrote:

Thanks for the patch. As this is sent after the end of stage1 and is not 
finishing support for an architecture feature perhaps we should stage this for 
GCC 16.
But if it fixes a performance problem in a real app or, better yet, fixes a 
performance regression then we should consider it for this cycle.
Sorry, I should have specified in the cover letter that this was
originally intended for GCC 16... although it would improve performance
in some video codecs, as this is where the issue was first raised. I'll
try and find out a bit more about this if needed.

… The UZP1 instruction doesn’t accept .2h operands so I don’t think this 
pattern is valid for the V2SF value of VDQHSD_F
We should have tests for the various sizes that the new pattern covers.


Okay, I'll correct the modes and then write tests for the ones that remain.

Many thanks,
Akram


Ping [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic

2024-12-05 Thread Akram Ahmad

Ping


[PATCH v2 1/1] aarch64: remove extra XTN in vector concatenation

2024-12-05 Thread Akram Ahmad
GIMPLE code which performs a narrowing truncation on the result of a
vector concatenation currently results in an unnecessary XTN being
emitted following a UZP1 to concatenate the operands. In cases such as this,
UZP1 should instead use a smaller arrangement specifier to replace the
XTN instruction. This is seen in cases such as in this GIMPLE example:

int32x2_t foo (svint64_t a, svint64_t b)
{
  vector(2) int vect__2.8;
  long int _1;
  long int _3;
  vector(2) long int _12;

  <bb 2> [local count: 1073741824]:
  _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D));
  _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D));
  _12 = {_1, _3};
  vect__2.8_13 = (vector(2) int) _12;
  return vect__2.8_13;

}

Original assembly generated:

foo:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1	v0.2d, v0.2d, v1.2d
xtn	v0.2s, v0.2d
ret

This patch therefore defines the *aarch64_trunc_concat<mode> insn which
truncates the concatenation result, rather than concatenating the
truncated operands (such as in *aarch64_narrow_trunc<mode>), resulting
in the following optimised assembly being emitted:

foo:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1	v0.2s, v0.2s, v1.2s
ret

This patch passes all regression tests on aarch64 with no new failures.
A supporting test for this optimisation is also written and passes.

OK for master? I do not have commit rights so I cannot push the patch
myself.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_trunc_concat<mode>)
  (*aarch64_float_trunc_concat<mode>): New insn definitions.
* config/aarch64/iterators.md (VQ_SDF): New mode iterator.
  (VTRUNCD): New mode attribute for truncated modes.
  (Vtruncd): New mode attribute for arrangement specifier.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/truncated_concatenation_1.c: New test
  for the above example and other modes covered by insn
  definitions.
---
 gcc/config/aarch64/aarch64-simd.md| 32 +
 gcc/config/aarch64/iterators.md   | 11 +
 .../aarch64/sve/truncated_concatenation_1.c   | 46 +++
 3 files changed, 89 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index cfe95bd4c31..90730960451 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1872,6 +1872,38 @@
   [(set_attr "type" "neon_permute")]
 )
 
+(define_insn "*aarch64_trunc_concat<mode>"
+  [(set (match_operand:<VTRUNCD> 0 "register_operand" "=w")
+	(truncate:<VTRUNCD>
+	  (vec_concat:VQN
+	    (match_operand:<VHALF> 1 "register_operand" "w")
+	    (match_operand:<VHALF> 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+{
+  if (!BYTES_BIG_ENDIAN)
+    return "uzp1\\t%0.<Vtruncd>, %1.<Vtruncd>, %2.<Vtruncd>";
+  else
+    return "uzp1\\t%0.<Vtruncd>, %2.<Vtruncd>, %1.<Vtruncd>";
+}
+  [(set_attr "type" "neon_permute")]
+)
+
+(define_insn "*aarch64_float_trunc_concat<mode>"
+  [(set (match_operand:<VTRUNCD> 0 "register_operand" "=w")
+	(float_truncate:<VTRUNCD>
+	  (vec_concat:VQ_SDF
+	    (match_operand:<VHALF> 1 "register_operand" "w")
+	    (match_operand:<VHALF> 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+{
+  if (!BYTES_BIG_ENDIAN)
+    return "uzp1\\t%0.<Vtruncd>, %1.<Vtruncd>, %2.<Vtruncd>";
+  else
+    return "uzp1\\t%0.<Vtruncd>, %2.<Vtruncd>, %1.<Vtruncd>";
+}
+  [(set_attr "type" "neon_permute")]
+)
+
 ;; Packing doubles.
 
 (define_expand "vec_pack_trunc_"
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index d7cb27e1885..008629ecf63 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -181,6 +181,9 @@
 ;; Advanced SIMD single Float modes.
 (define_mode_iterator VDQSF [V2SF V4SF])
 
+;; Quad vector Float modes with single and double elements.
+(define_mode_iterator VQ_SDF [V4SF V2DF])
+
 ;; Quad vector Float modes with half/single elements.
 (define_mode_iterator VQ_HSF [V8HF V4SF])
 
@@ -1722,6 +1725,14 @@
 (define_mode_attr Vnarrowq2 [(V8HI "v16qi") (V4SI "v8hi")
 (V2DI "v4si")])
 
+;; Truncated Advanced SIMD modes which preserve the number of lanes.
+(define_mode_attr VTRUNCD [(V8HI "V8QI") (V4SI "V4HI")
+			   (V4SF "V4HF") (V2DI "V2SI")
+			   (V2DF "V2SF")])
+(define_mode_attr Vtruncd [(V8HI "8b") (V4SI "4h")
+			   (V4SF "4h") (V2DI "2s")
+			   (V2DF "2s")])
+
 ;; Narrowed modes of vector modes.
 (define_mode_attr VNARROW [(VNx8HI "VNx16QI")
   (VNx4SI "VNx8HI") (VNx4SF "VNx8HF")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c 
b/gcc/testsuite/g

[PATCH v2 0/1] aarch64: remove extra XTN in vector concatenation

2024-12-05 Thread Akram Ahmad
Hi all,

This is V2 of a patch which adds new insns which optimise vector concatenations
when a narrowing truncation is performed on the resulting vector. This is for
integer as well as floating-point vectors.

The aforementioned operation usually results in codegen such as...

uzp1	v0.2d, v0.2d, v1.2d
xtn	v0.2s, v0.2d
ret

... whereas the following would have sufficed without the need for XTN:

uzp1	v0.2s, v0.2s, v1.2s
ret

A more rigorous example is provided in the commit message. The main changes from
V1 -> V2 are the removal of incorrect modes for UZP1, and adding a test for each
mode affected by the new insns. Furthermore, support for floating-point is
added, having accidentally been omitted from V1.

Best wishes,

Akram

---

Akram Ahmad (1):
  aarch64: remove extra XTN in vector concatenation

 gcc/config/aarch64/aarch64-simd.md| 32 +
 gcc/config/aarch64/iterators.md   | 11 +
 .../aarch64/sve/truncated_concatenation_1.c   | 46 +++
 3 files changed, 89 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

-- 
2.34.1



Ping [PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic

2024-12-06 Thread Akram Ahmad

Ping

On 27/11/2024 20:27, Akram Ahmad wrote:

Hi all,

This patch series adds support for two new cases of unsigned scalar saturating
arithmetic (one addition, one subtraction). As a result, more valid patterns
are recognised, leading to calls to .SAT_ADD or .SAT_SUB where relevant.

v3 of this series now introduces support for dg-require-effective-target for
both usadd and ussub optabs, as well as individual modes that these optabs may
be implemented for. aarch64 support for these optabs is in review, so there are
currently no targets listed in these effective-target options.

Regression tests for aarch64 all pass with no failures.

v3 changes:
- add support for new effective-target keywords.
- tests for the two new patterns now use dg-require-effective-target so that
  they are skipped on relevant targets.

v2 changes:
- add new tests for both patterns (these will fail on targets which don't
  implement the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another
  patch series adds support for this in aarch64).
- minor adjustment to the constraints on the match statement for
  usadd_left_part_1.

If this is OK for master, please commit these on my behalf, as I do not have
the ability to do so.

Many thanks,

Akram

---

Akram Ahmad (3):
   testsuite: Support dg-require-effective-target for us{add, sub}
   Match: support new case of unsigned scalar SAT_SUB
   Match: make SAT_ADD case 7 commutative

  gcc/match.pd  | 12 +++-
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 22 
  .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 15 +
  .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +
  gcc/testsuite/lib/target-supports.exp | 56 +++
  10 files changed, 214 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c



[PATCH v3] aarch64: remove extra XTN in vector concatenation

2025-01-06 Thread Akram Ahmad
Hi Richard,

Thanks for the feedback. I've copied in the resulting patch here- if
this is okay, please could it be committed on my behalf? The patch
continues below.

Many thanks,

Akram

---

GIMPLE code which performs a narrowing truncation on the result of a
vector concatenation currently results in an unnecessary XTN being
emitted following a UZP1 to concatenate the operands. In cases such as this,
UZP1 should instead use a smaller arrangement specifier to replace the
XTN instruction. This is seen in cases such as in this GIMPLE example:

int32x2_t foo (svint64_t a, svint64_t b)
{
  vector(2) int vect__2.8;
  long int _1;
  long int _3;
  vector(2) long int _12;

  <bb 2> [local count: 1073741824]:
  _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D));
  _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D));
  _12 = {_1, _3};
  vect__2.8_13 = (vector(2) int) _12;
  return vect__2.8_13;

}

Original assembly generated:

foo:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1	v0.2d, v0.2d, v1.2d
xtn	v0.2s, v0.2d
ret

This patch therefore defines the *aarch64_trunc_concat<mode> insn which
truncates the concatenation result, rather than concatenating the
truncated operands (such as in *aarch64_narrow_trunc<mode>), resulting
in the following optimised assembly being emitted:

foo:
ptrue   p3.b, all
uaddv   d0, p3, z0.d
uaddv   d1, p3, z1.d
uzp1	v0.2s, v0.2s, v1.2s
ret

This patch passes all regression tests on aarch64 with no new failures.
A supporting test for this optimisation is also written and passes.

OK for master? I do not have commit rights so I cannot push the patch
myself.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (*aarch64_trunc_concat<mode>):
  New insn definition.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/truncated_concatenation_1.c: New test
  for the above example and other modes covered by insn
  definitions.
---
 gcc/config/aarch64/aarch64-simd.md| 16 ++
 .../aarch64/sve/truncated_concatenation_1.c   | 32 +++
 2 files changed, 48 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index cfe95bd4c31..6c129d6c4a8 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1872,6 +1872,22 @@
   [(set_attr "type" "neon_permute")]
 )
 
+(define_insn "*aarch64_trunc_concat<mode>"
+  [(set (match_operand:<VNARROWQ> 0 "register_operand" "=w")
+	(truncate:<VNARROWQ>
+	  (vec_concat:VQN
+	    (match_operand:<VHALF> 1 "register_operand" "w")
+	    (match_operand:<VHALF> 2 "register_operand" "w"))))]
+  "TARGET_SIMD"
+{
+  if (!BYTES_BIG_ENDIAN)
+    return "uzp1\\t%0.<Vntype>, %1.<Vntype>, %2.<Vntype>";
+  else
+    return "uzp1\\t%0.<Vntype>, %2.<Vntype>, %1.<Vntype>";
+}
+  [(set_attr "type" "neon_permute")]
+)
+
 ;; Packing doubles.
 
 (define_expand "vec_pack_trunc_"
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c 
b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c
new file mode 100644
index 000..95577a1a9ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+int8x8_t f1 (int16x4_t a, int16x4_t b) {
+int8x8_t ab = vdup_n_s8 (0);
+int16x8_t ab_concat = vcombine_s16 (a, b);
+ab = vmovn_s16 (ab_concat);
+return ab;
+}
+
+int16x4_t f2 (int32x2_t a, int32x2_t b) {
+int16x4_t ab = vdup_n_s16 (0);
+int32x4_t ab_concat = vcombine_s32 (a, b);
+ab = vmovn_s32 (ab_concat);
+return ab;
+}
+
+int32x2_t f3 (svint64_t a, svint64_t b) {
+int32x2_t ab = vdup_n_s32 (0);
+ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), a), ab, 0);
+ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), b), ab, 1);
+return ab;
+}
+
+/* { dg-final { scan-assembler-not {\txtn\t} } }*/
+/* { dg-final { scan-assembler-not {\tfcvtn\t} } }*/
+/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.8b, v[0-9]+\.8b, v[0-9]+\.8b} 1 } }*/
+/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.4h, v[0-9]+\.4h, v[0-9]+\.4h} 1 } }*/
+/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.2s, v[0-9]+\.2s, v[0-9]+\.2s} 1 } }*/
\ No newline at end of file
-- 
2.34.1



Ping [PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic

2024-12-17 Thread Akram Ahmad

Pinging




Re: [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic

2024-12-17 Thread Akram Ahmad

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668794.html

  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
  create mode 100644 
gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c



Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic

2024-12-18 Thread Akram Ahmad

Hi Kyrill,

On 17/12/2024 15:15, Kyrylo Tkachov wrote:

We avoid using the __builtin_aarch64_* builtins in test cases as they are 
undocumented and we don’t make any guarantees about their stability to users.
I’d prefer if the saturating operation was open-coded in C. I expect the midend 
machinery is smart enough to recognize the saturating logic for scalars by now?


Thanks for the detailed feedback. It's been really helpful, and I've
gone ahead and implemented almost all of it. I'm struggling to find a
pattern that's recognised for signed arithmetic though; the following
emits branching code:


int64_t  __attribute__((noipa))
sadd64 (int64_t __a, int64_t __b)
{
  if (__a > 0) {
    if (__b > INT64_MAX - __a)
  return INT64_MAX;
  } else if (__b < INT64_MIN - __a) {
    return INT64_MIN;
  }
  return __a + __b;
}

Resulting assembly:

sadd64:
.LFB6:
	.cfi_startproc
	mov	x3, x0
	cmp	x0, 0
	ble	.L9
	mov	x2, 9223372036854775807
	sub	x4, x2, x0
	mov	x0, x2
	cmp	x4, x1
	blt	.L8
.L11:
	add	x0, x3, x1
.L8:
	ret
	.p2align 2,,3
.L9:
	mov	x2, -9223372036854775808
	sub	x0, x2, x0
	cmp	x0, x1
	ble	.L11
	mov	x0, x2
	ret

Is there a way to force this not to use branches by any chance? I'll keep
looking and see if there are some patterns recently added to match that
will work here. If I don't find something, would it be sufficient to use
the scalar NEON intrinsics for this? And if so, would that mean the test
should move to the Adv. SIMD directory?

Many thanks once again,

Akram
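
For reference, one open-coded form that exposes the overflow check to the
compiler directly is GCC's __builtin_add_overflow. This is only a sketch of
an alternative formulation (the function name is mine, and whether it maps
onto a branchless sequence depends on the target's patterns):

```c
#include <stdint.h>

/* Sketch: signed saturating 64-bit add via the GCC/Clang overflow
   builtin.  On overflow, the sign of either operand selects which
   saturation limit applies.  */
int64_t
sadd64_ovf (int64_t a, int64_t b)
{
  int64_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    return a < 0 ? INT64_MIN : INT64_MAX;
  return sum;
}
```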


[PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-08 Thread Akram Ahmad
Hi Kyrill,

Thanks for the feedback on V2. I found a pattern which works for
the open-coded signed arithmetic, and I've implemented the other
feedback you provided as well.

I've send the modified patch in this thread as the SVE patch [2/2]
hasn't been changed, but I'm happy to send the entire V3 patch
series as a new thread if that's easier. Patch continues below.

If this is OK, please could you commit on my behalf?

Many thanks,

Akram

---

This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.
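
For illustration, the unsigned scalar case that this patch keeps on adds/subs
corresponds to the usual open-coded idiom below (a sketch; the function name
is mine, and the instruction mapping in the comments is the expected one for
GP-register operands, not a guarantee):

```c
#include <stdint.h>

/* Sketch of 64-bit unsigned saturating addition.  With operands in
   GP registers this form maps naturally onto adds + csinv, avoiding
   the FMOVs that a NEON uqadd would require.  */
uint64_t
usadd64 (uint64_t a, uint64_t b)
{
  uint64_t sum = a + b;               /* adds: sets the carry flag  */
  return sum < a ? UINT64_MAX : sum;  /* csinv-style select on carry */
}
```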

Furthermore, this patch introduces a new optimisation for signed 32
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:
	fmov	d0, x0
	fmov	d1, x1
	sqadd	d0, d0, d1
	fmov	x0, d0

Addition, after:
	asr	x2, x1, 63
	adds	x0, x0, x1
	eor	x2, x2, 0x8000000000000000
	csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.
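
The asr/adds/eor/csinv trick described above can be modelled in C as follows.
This is a sketch that mirrors the instruction sequence (the function name is
mine; it assumes arithmetic right shift on signed types, which GCC provides
on AArch64):

```c
#include <stdint.h>

/* C model of the asr/adds/eor/csinv sequence for 64-bit signed
   saturating addition.  */
int64_t
sadd64_model (int64_t a, int64_t b)
{
  uint64_t sum = (uint64_t) a + (uint64_t) b;  /* adds x0, x0, x1   */
  int64_t lim = (b >> 63) ^ INT64_MIN;         /* asr; eor          */
  /* Signed overflow occurred iff a and b have the same sign but the
     sum's sign differs -- the V flag after adds.  */
  int vflag = ((a ^ b) >= 0) && ((a ^ (int64_t) sum) < 0);
  return vflag ? ~lim : (int64_t) sum;         /* csinv ..., vc     */
}
```

Note how csinv inverts the temporary: lim holds INT64_MIN when b is
non-negative and INT64_MAX when b is negative, so ~lim is the correct
saturation limit for the direction in which the addition overflowed.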

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with
an execution test to check the results.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names.
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 218 +-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 
 .../saturating_arithmetic_autovect_1.c|  79 +
 .../saturating_arithmetic_autovect_2.c|  79 +
 .../saturating_arithmetic_autovect_3.c|  75 +
 .../saturating_arithmetic_autovect_4.c|  77 +
 .../aarch64/saturating-arithmetic-signed.c| 270 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 ++
 .../aarch64/saturating_arithmetic_4.c |  30 ++
 16 files changed, 1092 insertions(+), 56 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithm

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad

Ah whoops- I didn't see this before sending off V4 just now, my apologies.
I'll try my best to get this implemented before the end of the day so that
it doesn't miss the deadline.

On 09/01/2025 23:04, Richard Sandiford wrote:

Akram Ahmad  writes:

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.

This can be fixed by changing:

case CT_REGISTER:
  if (REG_P (op) || SUBREG_P (op))
return true;
  break;

to:

case CT_REGISTER:
  if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
return true;
  break;

But I can test & post that as a follow-up if you prefer.

Yes please, if that's not too much trouble. Would that have to go into
another patch?

+
  ;; Double vector modes.
  (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
  
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c

new file mode 100644
index 000..2b72be7b0d7
--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
+
+/*
+** uadd_lane: { xfail *-*-* }
+** dup\tv([0-9]+).8b, w0
+** uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
+** umov\tw0, v\2.b\[0\]
+** ret
+*/

Whats the reason behind the xfail?  Is it the early-ra thing, or
something else?  (You might already have covered this, sorry.)

xfailing is fine if it needs further optimisation, was just curious :)
This is because of a missing pattern in match.pd (I've sent another patch
upstream to add the missing pattern, although it may have gotten lost).
Once that pattern is added, this should be recognised as .SAT_SUB, and
the new instructions will appear.

[...]
diff --git a/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c 
b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
new file mode 100644
index 000..0fc6804683a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
@@ -0,0 +1,270 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -mearly-ra=none" } */

It'd be worth adding -fno-schedule-insns2 here.  Same for
saturating_arithmetic_1.c and saturating_arithmetic_2.c.  The reason
is that:


+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include 
+#include 
+#include 
+
+/*
+** sadd32:
+** asr w([0-9]+), w1, 31
+** adds	w([0-9]+), (?:w0, w1|w1, w0)
+** eor w\1, w\1, -2147483648
+** csinv   w0, w\2, w\1, vc
+** ret
+*/

...the first two instructions can be in either order, and similarly
for the second and third.

Really nice tests though :)


Thanks! That also makes a lot of sense. I was cautious of assuming the
instructions would always be in that exact order, so it's good to know
I can try and specify that.




[PATCH v4 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad
Hi Kyrill,

Thanks for the very quick response! V4 of the patch can be found
below the line.

Best wishes,

Akram

---

This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:
	fmov	d0, x0
	fmov	d1, x1
	sqadd	d0, d0, d1
	fmov	x0, d0

Addition, after:
	asr	x2, x1, 63
	adds	x0, x0, x1
	eor	x2, x2, 0x8000000000000000
	csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with
an execution test to check the results.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names.
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
Template file for unsigned vector saturating arithmetic tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
8-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
16-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
32-bit vector type tests.
* 
gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
64-bit vector type tests.
* gcc.target/aarch64/saturating_arithmetic.inc: Template file
for scalar saturating arithmetic tests.
* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc|  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md| 207 +-
 gcc/config/aarch64/arm_neon.h |  96 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../saturating_arithmetic_autovect.inc|  58 
 .../saturating_arithmetic_autovect_1.c|  79 +
 .../saturating_arithmetic_autovect_2.c|  79 +
 .../saturating_arithmetic_autovect_3.c|  75 +
 .../saturating_arithmetic_autovect_4.c|  77 +
 .../aarch64/saturating-arithmetic-signed.c| 270 ++
 .../aarch64/saturating_arithmetic.inc |  39 +++
 .../aarch64/saturating_arithmetic_1.c |  36 +++
 .../aarch64/saturating_arithmetic_2.c |  36 +++
 .../aarch64/saturating_arithmetic_3.c |  30 ++
 .../aarch64/saturating_arithmetic_4.c |  30 ++
 16 files changed, 1081 insertions(+), 56 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_

Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic

2025-01-10 Thread Akram Ahmad

On 09/01/2025 23:04, Richard Sandiford wrote:

+   gcc_assert (imm != 0);

The constraints do allow 0, so I'm not sure this assert is safe.
Certainly we shouldn't usually get unfolded instructions, but strange
things can happen with fuzzed options.

Does the code mishandle that case?  It looked like it should be ok.
I accidentally deleted my response when trimming down the quote text.
I haven't tested this, but it came about from an offline discussion
about the patch with a teammate. It should be fine without the assert,
but I'll test it to make sure.