[PATCH 1/2] aarch64: Use standard names for saturating arithmetic
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32- and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV being generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions in
this circumstance.

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere.

gcc/ChangeLog:

    * config/aarch64/aarch64-builtins.cc: Expand iterators.
    * config/aarch64/aarch64-simd-builtins.def: Use standard names.
    * config/aarch64/aarch64-simd.md: Use standard names, split insn
    definitions on signedness of operator and type of operands.
    * config/aarch64/arm_neon.h: Use standard builtin names.
    * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
    simplify splitting of insn for unsigned scalar arithmetic.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
    Template file for unsigned vector saturating arithmetic tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
    8-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
    16-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
    32-bit vector type tests.
    * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
    64-bit vector type tests.
    * gcc.target/aarch64/saturating_arithmetic.inc: Template file for
    scalar saturating arithmetic tests.
    * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
    * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
---
 gcc/config/aarch64/aarch64-builtins.cc        | 13 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  8 +-
 gcc/config/aarch64/aarch64-simd.md            | 93 +-
 gcc/config/aarch64/arm_neon.h                 | 96 +-
 gcc/config/aarch64/iterators.md               |  4 +
 .../saturating_arithmetic_autovect.inc        | 58 +++
 .../saturating_arithmetic_autovect_1.c        | 79 +++
 .../saturating_arithmetic_autovect_2.c        | 79 +++
 .../saturating_arithmetic_autovect_3.c        | 75 +++
 .../saturating_arithmetic_autovect_4.c        | 77 +++
 .../aarch64/saturating_arithmetic.inc         | 39
 .../aarch64/saturating_arithmetic_1.c         | 41
 .../aarch64/saturating_arithmetic_2.c         | 41
 .../aarch64/saturating_arithmetic_3.c         | 30 ++
 .../aarch64/saturating_arithmetic_4.c         | 30 ++
 15 files changed, 707 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 7d737877e0b..f2a1b6ddbf6 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -3849,6 +3849,19 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt,
       new_stmt = gimple_build_assign (gimple_call_lhs (stmt),
                                       LSHIFT_EXPR, args[0], args[1]);
       break;
+
+      /* lower saturating add/sub neon builtins to gimple.  */
+      BUILTIN_VSDQ_I (BINOP, ssadd, 3, NONE)
+      BUILTIN_VSDQ_I (BINOPU, usadd, 3, NONE)
+      new_stmt = gimple_build_call_internal (IFN_SAT_ADD, 2, args[0], args[1])
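The gimple lowering quoted above replaces the NEON saturating-add builtins with calls to IFN_SAT_ADD. As a point of reference, the unsigned saturating-add semantics being mapped can be sketched in plain C; the function name below is illustrative and not part of the patch:

```c
#include <stdint.h>

/* Branchless unsigned saturating add: on wraparound, (sum < x) is 1,
   so -(uint32_t)(sum < x) is all-ones and the OR clamps the result
   to UINT32_MAX.  */
uint32_t sat_u_add_32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;
  return sum | -(uint32_t)(sum < x);
}
```

This is the idiom that the matchers in match.pd recognise and that the renamed us{add,sub}3 patterns ultimately implement.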
[PATCH 2/2] aarch64: Use standard names for SVE saturating arithmetic
Rename the existing SVE unpredicated saturating arithmetic instructions
to use standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve.md: Rename insns.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/saturating_arithmetic.inc: Template file
    for auto-vectorizer tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_1.c: Instantiate
    8-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_2.c: Instantiate
    16-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_3.c: Instantiate
    32-bit vector tests.
    * gcc.target/aarch64/sve/saturating_arithmetic_4.c: Instantiate
    64-bit vector tests.
---
 gcc/config/aarch64/aarch64-sve.md             |  4 +-
 .../aarch64/sve/saturating_arithmetic.inc     | 68 +++
 .../aarch64/sve/saturating_arithmetic_1.c     | 60
 .../aarch64/sve/saturating_arithmetic_2.c     | 60
 .../aarch64/sve/saturating_arithmetic_3.c     | 62 +
 .../aarch64/sve/saturating_arithmetic_4.c     | 62 +
 6 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 06bd3e4bb2c..b987b292b20 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4379,7 +4379,7 @@
 ;; -
 ;; Unpredicated saturating signed addition and subtraction.
-(define_insn "@aarch64_sve_"
+(define_insn "s3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
         (SBINQOPS:SVE_FULL_I
           (match_operand:SVE_FULL_I 1 "register_operand")
@@ -4395,7 +4395,7 @@
 )

 ;; Unpredicated saturating unsigned addition and subtraction.
-(define_insn "@aarch64_sve_"
+(define_insn "s3"
   [(set (match_operand:SVE_FULL_I 0 "register_operand")
         (UBINQOPS:SVE_FULL_I
           (match_operand:SVE_FULL_I 1 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
new file mode 100644
index 000..0b3ebbcb0d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
@@ -0,0 +1,68 @@
+/* Template file for vector saturating arithmetic validation.
+
+   This file defines saturating addition and subtraction functions for a given
+   scalar type, testing the auto-vectorization of these two operators.  This
+   type, along with the corresponding minimum and maximum values for that type,
+   must be defined by any test file which includes this template file.  */
+
+#ifndef SAT_ARIT_AUTOVEC_INC
+#define SAT_ARIT_AUTOVEC_INC
+
+#include <stdint.h>
+#include <limits.h>
+
+#ifndef UT
+#define UT uint32_t
+#define UMAX UINT_MAX
+#define UMIN 0
+#endif
+
+void uaddq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] + b[i];
+      out[i] = sum < a[i] ? UMAX : sum;
+    }
+}
+
+void uaddq2 (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum;
+      if (!__builtin_add_overflow (a[i], b[i], &sum))
+        out[i] = sum;
+      else
+        out[i] = UMAX;
+    }
+}
+
+void uaddq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] + 50;
+      out[i] = sum < a[i] ? UMAX : sum;
+    }
+}
+
+void usubq (UT *out, UT *a, UT *b, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] - b[i];
+      out[i] = sum > a[i] ? UMIN : sum;
+    }
+}
+
+void usubq_imm (UT *out, UT *a, int n)
+{
+  for (int i = 0; i < n; i++)
+    {
+      UT sum = a[i] - 50;
+      out[i] = sum > a[i] ? UMIN : sum;
+    }
+}
+
+#endif
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
new file mode 100644
index 000..6936e9a2704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** uaddq:
+**	...
+**	ld1b\tz([0-9]+)\.b, .*
+**	ld1b\tz([0-9]+)\.b, .*
+**	uqadd\tz\2.b, z\1\.b, z\2\.b
+**	...
+**	ldr\tb([0-9]+), .*
+**	ldr\tb([0-9]+), .*
+**	uqadd\tb\4, b\3, b\4
+**	...
+*/
+/*
+** ua
[PATCH 0/2] aarch64: Use standard names for saturating arithmetic
Hi all,

This patch series introduces standard names for scalar, Adv. SIMD, and
SVE saturating arithmetic instructions in the aarch64 backend.
Additional tests are added for unsigned saturating arithmetic, as well
as to test that the auto-vectorizer correctly inserts NEON instructions
or scalar instructions where necessary, such as in 32- and 64-bit scalar
unsigned arithmetic. There are also tests for the auto-vectorized SVE
code.

An important discussion point: this patch causes scalar 32- and 64-bit
unsigned saturating arithmetic to now use adds, csinv / subs, csel, as
is expected elsewhere in the backend. This affects the NEON intrinsics
for these two modes as well. This is the cause of a few test failures;
otherwise there are no regressions on aarch64-none-linux-gnu.

SVE currently uses the unpredicated version of the instruction in the
backend.

Many thanks,

Akram

---

Akram Ahmad (2):
  aarch64: Use standard names for saturating arithmetic
  aarch64: Use standard names for SVE saturating arithmetic

 gcc/config/aarch64/aarch64-builtins.cc        | 13 +++
 gcc/config/aarch64/aarch64-simd-builtins.def  |  8 +-
 gcc/config/aarch64/aarch64-simd.md            | 93 +-
 gcc/config/aarch64/aarch64-sve.md             |  4 +-
 gcc/config/aarch64/arm_neon.h                 | 96 +-
 gcc/config/aarch64/iterators.md               |  4 +
 .../saturating_arithmetic_autovect.inc        | 58 +++
 .../saturating_arithmetic_autovect_1.c        | 79 +++
 .../saturating_arithmetic_autovect_2.c        | 79 +++
 .../saturating_arithmetic_autovect_3.c        | 75 +++
 .../saturating_arithmetic_autovect_4.c        | 77 +++
 .../aarch64/saturating_arithmetic.inc         | 39
 .../aarch64/saturating_arithmetic_1.c         | 41
 .../aarch64/saturating_arithmetic_2.c         | 41
 .../aarch64/saturating_arithmetic_3.c         | 30 ++
 .../aarch64/saturating_arithmetic_4.c         | 30 ++
 .../aarch64/sve/saturating_arithmetic.inc     | 68 +
 .../aarch64/sve/saturating_arithmetic_1.c     | 60
 .../aarch64/sve/saturating_arithmetic_2.c     | 60
 .../aarch64/sve/saturating_arithmetic_3.c     | 62
 .../aarch64/sve/saturating_arithmetic_4.c     | 62
 21 files changed, 1021 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

--
2.34.1
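The adds, csinv / subs, csel sequences mentioned in the cover letter correspond to the scalar unsigned saturating forms below; a minimal C sketch, with illustrative function names (GCC recognises these shapes as IFN_SAT_ADD / IFN_SAT_SUB):

```c
#include <stdint.h>

/* Saturating 64-bit unsigned add: the overflow branch becomes a flag
   check, which on aarch64 maps onto the adds + csinv shape.  */
uint64_t usadd64 (uint64_t x, uint64_t y)
{
  uint64_t sum;
  if (__builtin_add_overflow (x, y, &sum))
    return UINT64_MAX;
  return sum;
}

/* Saturating 64-bit unsigned subtract, clamping to zero; the subs +
   csel shape on aarch64.  */
uint64_t ussub64 (uint64_t x, uint64_t y)
{
  uint64_t diff;
  if (__builtin_sub_overflow (x, y, &diff))
    return 0;
  return diff;
}
```

Keeping these on GP registers (rather than round-tripping through NEON via FMOV) is exactly the behaviour patch 1/2 restores for the 32- and 64-bit scalar cases.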
[PATCH 2/2] Match: make SAT_ADD case 7 commutative
Case 7 of unsigned scalar saturating addition defines
SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1 being
commutative.

The pattern for case 7 currently does not accept the alternative where
Y is used in the condition. Therefore, this commit adds the commutative
property to this case which causes more valid cases of unsigned
saturating arithmetic to be recognised.

Before:
  _1 = BIT_FIELD_REF ;
  sum_5 = _1 + a_4(D);
  if (a_4(D) <= sum_5)
    goto ; [INV]
  else
    goto ; [INV]
  :
  :
  _2 = PHI <255(3), sum_5(2)>
  return _2;

After:
  [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  _2 = .SAT_ADD (_1, a_4(D)); [tail call]
  return _2;

This passes the aarch64-none-linux-gnu regression tests with no new
failures.

gcc/ChangeLog:

    * match.pd: Modify existing case for SAT_ADD.
---
 gcc/match.pd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 4fc5efa6247..a77fca92181 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Unsigned saturation add, case 7 (branch with le):
    SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
 (match (unsigned_integer_sat_add @0 @1)
- (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
+ (cond^ (le @0 (usadd_left_part_1:c@2 @0 @1)) @2 integer_minus_onep))

 /* Unsigned saturation add, case 8 (branch with gt):
    SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
--
2.34.1
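The two source-level shapes that case 7 covers after this change can be sketched in C; function names are illustrative, and both now reduce to the same .SAT_ADD call:

```c
#include <stdint.h>

/* Case 7 as originally matched: the condition compares against the
   first addend.  Unsigned add wraps, so x <= sum iff no overflow.  */
uint8_t sat_add_cond_x (uint8_t x, uint8_t y)
{
  uint8_t sum = x + y;
  return x <= sum ? sum : 0xff;
}

/* The commutative variant this patch also matches: the condition
   compares against the second addend instead.  */
uint8_t sat_add_cond_y (uint8_t x, uint8_t y)
{
  uint8_t sum = x + y;
  return y <= sum ? sum : 0xff;
}
```

Both functions saturate identically because 8-bit wraparound makes the sum smaller than either addend exactly when the addition overflows.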
[PATCH 0/2] Match: support additional cases of unsigned scalar arithmetic
Hi all,

This patch series adds support for 2 new cases of unsigned scalar
saturating arithmetic (one addition, one subtraction). This results in
more valid patterns being recognised, which results in a call to
.SAT_ADD or .SAT_SUB where relevant.

Regression tests for aarch64-none-linux-gnu all pass with no failures.

Many thanks,

Akram

---

Akram Ahmad (2):
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

--
2.34.1
[PATCH 1/2] Match: support new case of unsigned scalar SAT_SUB
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

  X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB.

This passes the aarch64-none-linux-gnu regression tests with no
failures.

gcc/ChangeLog:

    * match.pd: Add new match for SAT_SUB.
---
 gcc/match.pd | 8
 1 file changed, 8 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     }
     (if (wi::eq_p (sum, wi::uhwi (0, precision)))

+/* Unsigned saturation sub, case 11 (branch with ge):
+   SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+
 /* Signed saturation sub, case 1:
    T minus = (T)((UT)X - (UT)Y);
    SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
--
2.34.1
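The case 11 shape being matched can be sketched directly in C (illustrative function name); if Y exceeds X the unsigned subtraction wraps to a value larger than X, the >= test fails, and the result clamps to zero:

```c
#include <stdint.h>

/* SAT_U_SUB case 11: X >= (X - Y) ? (X - Y) : 0.  */
uint32_t sat_sub_case11 (uint32_t x, uint32_t y)
{
  uint32_t diff = x - y;        /* wraps when y > x */
  return x >= diff ? diff : 0;  /* >= condition, hence "case 11" */
}
```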
Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative
On 31/10/2024 08:00, Richard Biener wrote:
> On Wed, Oct 30, 2024 at 4:46 PM Akram Ahmad wrote:
>> On 29/10/2024 12:48, Richard Biener wrote:
>>> The testcases will FAIL unless the target has support for .SAT_ADD -
>>> you want to add proper effective target tests here.
>>>
>>> The match.pd part looks OK to me.
>>>
>>> Richard.
>>
>> Hi Richard, I assume this also applies to the tests written for the
>> SAT_SUB pattern too in that case?
>
> Yes, of course.

I've taken a look at the effective target definitions in
target-supports.exp, but I can't find anything relating to saturating
arithmetic. I'm not sure if it's only aarch64 which doesn't support
this yet either, otherwise I would try and add a definition myself.
Am I missing any existing definitions that I can use for the
dg-effective-target keyword?

Many thanks,

Akram
Re: [PATCH v2 2/2] Match: make SAT_ADD case 7 commutative
On 29/10/2024 12:48, Richard Biener wrote:
> On Mon, Oct 28, 2024 at 4:45 PM Akram Ahmad wrote:
>> Case 7 of unsigned scalar saturating addition defines
>> SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as
>> SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1 being
>> commutative.
>>
>> The pattern for case 7 currently does not accept the alternative
>> where Y is used in the condition. Therefore, this commit adds the
>> commutative property to this case which causes more valid cases of
>> unsigned saturating arithmetic to be recognised.
>>
>> Before:
>>   _1 = BIT_FIELD_REF ;
>>   sum_5 = _1 + a_4(D);
>>   if (a_4(D) <= sum_5)
>>     goto ; [INV]
>>   else
>>     goto ; [INV]
>>   :
>>   :
>>   _2 = PHI <255(3), sum_5(2)>
>>   return _2;
>>
>> After:
>>   [local count: 1073741824]:
>>   _1 = BIT_FIELD_REF ;
>>   _2 = .SAT_ADD (_1, a_4(D)); [tail call]
>>   return _2;
>>
>> This passes the aarch64-none-linux-gnu regression tests with no new
>> failures. The tests written in this patch will fail on targets which
>> do not implement the standard names for IFN SAT_ADD.
>>
>> gcc/ChangeLog:
>>
>>     * match.pd: Modify existing case for SAT_ADD.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test.
>>     * gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test.
>>     * gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test.
>>     * gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test.
>> ---
>>  gcc/match.pd                                  |  4 ++--
>>  .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
>>  .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
>>  .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
>>  .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c    | 21 +++
>>  5 files changed, 86 insertions(+), 2 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 4fc5efa6247..98c50ab097f 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka:
>>     SAT_ADD = (X + Y) | -((X + Y) < X) */
>>  (match (usadd_left_part_1 @0 @1)
>> - (plus:c @0 @1)
>> + (plus @0 @1)
>>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
>>       && types_match (type, @0, @1
>> @@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>  /* Unsigned saturation add, case 7 (branch with le):
>>     SAT_ADD = x <= (X + Y) ? (X + Y) : -1.  */
>>  (match (unsigned_integer_sat_add @0 @1)
>> - (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep))
>> + (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep))
>>
>>  /* Unsigned saturation add, case 8 (branch with gt):
>>     SAT_ADD = x > (X + Y) ? -1 : (X + Y).  */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
>> new file mode 100644
>> index 000..0202c70cc83
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +#include <stdint.h>
>> +
>> +#define T uint16_t
>> +#define UMAX (T) -1
>> +
>> +T sat_u_add_1 (T a, T b)
>> +{
>> +  T sum = a + b;
>> +  return sum < a ? UMAX : sum;
>> +}
>> +
>> +T sat_u_add_2 (T a, T b)
>> +{
>> +  T sum = a + b;
>> +  return sum < b ? UMAX : sum;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
>
> The testcases will FAIL unless the target has support for .SAT_ADD -
> you want to add proper effective target tests here.
>
> The match.pd part looks OK to me.
>
> Richard.

Hi Richard, I assume this also applies to the tests written for the
SAT_SUB pattern too in that case?

Many thanks,

Akram

>> \ No newline at end of file
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
>> new file mode 100644
>> index 000..34c80ba3854
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +#include <stdint.h>
>> +
>> +#define T uint32_t
>> +#define UMAX (T) -1
>> +
>> +T sat_u_add_1 (T a, T b)
>> +{
>> +  T sum = a + b;
>> +  return sum < a ? UMAX : sum;
>> +}
>> +
>> +T sat_u_add_2 (T a, T b)
>> +{
>> +  T sum = a + b;
>> +  return sum < b ? UMAX : sum;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */
>> \ No newline at end of file
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
>> new file mode 100644
>> index 000..0718cb566d3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do
Re: [PATCH 1/2] aarch64: Use standard names for saturating arithmetic
On 23/10/2024 12:20, Richard Sandiford wrote:
> Thanks for doing this.  The approach looks good.  My main question is:
> are we sure that we want to use the Advanced SIMD instructions for
> signed saturating SI and DI arithmetic on GPRs?  E.g. for addition,
> we only saturate at the negative limit if both operands are negative,
> and only saturate at the positive limit if both operands are positive.
> So for 32-bit values we can use:
>
>	asr	tmp, x or y, #31
>	eor	tmp, tmp, #0x8000
>
> to calculate the saturation value and:
>
>	adds	res, x, y
>	csel	res, tmp, res, vs
>
> to calculate the full result.  That's the same number of instructions
> as two fmovs for the inputs, the sqadd, and the fmov for the result,
> but it should be more efficient.
>
> The reason for asking now, rather than treating it as a potential
> future improvement, is that it would also avoid splitting the patterns
> for signed and unsigned ops.  (The length of the split alternative can
> be conservatively set to 16 even for the unsigned version, since
> nothing should care in practice.  The split will have happened before
> shorten_branches.)

Hi Richard, thanks for looking over this. I might be misunderstanding
your suggestion, but is there a way to efficiently check the signedness
of the second operand (let's say 'y') if it is stored in a register?
This is a problem we considered and couldn't solve post-reload, as we
only have three registers (including two operands) to work with. (I
might be wrong in terms of how many registers we have available.)
AFAIK that's why we only use adds, csinv / subs, csel in the unsigned
case.

To illustrate the point better: consider signed X + Y where both
operands are in GPRs. Without knowing the signedness of Y, for
branchless code, we would need to saturate at both the positive and
negative limit and then perform a comparison on Y to check the sign,
selecting either saturating limit accordingly. This of course doesn't
apply if signed saturating 'addition' with a negative op2 is only
required to saturate to the positive limit, nor does it apply if Y or
op2 is an immediate.

Otherwise, I agree that this should be fixed now rather than as a
future improvement.

>> gcc/ChangeLog:
>>
>>     * config/aarch64/aarch64-builtins.cc: Expand iterators.
>>     * config/aarch64/aarch64-simd-builtins.def: Use standard names.
>>     * config/aarch64/aarch64-simd.md: Use standard names, split insn
>>     definitions on signedness of operator and type of operands.
>>     * config/aarch64/arm_neon.h: Use standard builtin names.
>>     * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
>>     simplify splitting of insn for unsigned scalar arithmetic.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
>>     Template file for unsigned vector saturating arithmetic tests.
>>     * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
>>     8-bit vector type tests.
>>     * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
>>     16-bit vector type tests.
>>     * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
>>     32-bit vector type tests.
>>     * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
>>     64-bit vector type tests.
>>     * gcc.target/aarch64/saturating_arithmetic.inc: Template file for
>>     scalar saturating arithmetic tests.
>>     * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
>>     * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
>>     * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
>>     * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
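The branchless signed scheme discussed above hinges on one observation: signed addition can only saturate towards the sign of the first operand, so the saturation value can be derived from that sign alone. A C sketch of the idea, using GCC's overflow builtin in place of the flags check (the function name is illustrative, and it assumes arithmetic right shift of negative values, as on aarch64):

```c
#include <stdint.h>

/* Signed saturating 32-bit add.  sat is INT32_MAX when x >= 0 and
   INT32_MIN when x < 0 (x >> 31 is 0 or all-ones, xor flips to the
   matching limit); it is selected only when the add overflows.  */
int32_t sat_s_add32 (int32_t x, int32_t y)
{
  int32_t sat = (int32_t) ((uint32_t) (x >> 31) ^ 0x7fffffffu);
  int32_t sum;
  if (__builtin_add_overflow (x, y, &sum))  /* adds + csel on the V flag */
    return sat;
  return sum;
}
```

Overflow requires x and y to have the same sign, which is why the sign of x alone determines whether the clamp is INT32_MAX or INT32_MIN.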
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> new file mode 100644
>> index 000..63eb21e438b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> @@ -0,0 +1,79 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +/*
>> +** uadd_lane: { xfail *-*-* }
>
> Just curious: why does this fail?  Is it a vector costing issue?

This is due to a missing pattern from match.pd - I've sent another
patch upstream to rectify this. In essence, this function exposes a
commutative form of an existing addition pattern, but that form isn't
currently commutative when it should be. It's a similar reason for why
the uqsubs are also marked as xfail, so that same patch series contains
a fix for the uqsub case too.

> Since the operands are commutative, and since there's no restriction
> on the choice of destination register, it's probably safer to use:
>
> +**	uqadd\tv[0-9].16b, (?:v\1.16b, v\2.16b|v\2.16b
[PATCH v2 1/2] Match: support new case of unsigned scalar SAT_SUB
This patch adds a new case for unsigned scalar saturating subtraction
using a branch with a greater-than-or-equal condition. For example,

  X >= (X - Y) ? (X - Y) : 0

is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars,
which therefore correctly matches more cases of IFN SAT_SUB. New tests
are added to verify this behaviour on targets which use the standard
names for IFN SAT_SUB.

This passes the aarch64 regression tests with no additional failures.

gcc/ChangeLog:

    * match.pd: Add new match for SAT_SUB.

gcc/testsuite/ChangeLog:

    * gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c: New test.
    * gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c: New test.
    * gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c: New test.
    * gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c: New test.
---
 gcc/match.pd                                  |  8
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 14 ++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c    | 14 ++
 5 files changed, 64 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ee53c25cef9..4fc5efa6247 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     }
     (if (wi::eq_p (sum, wi::uhwi (0, precision)))

+/* Unsigned saturation sub, case 11 (branch with ge):
+   SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0.  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (cond^ (ge @0 (minus @0 @1))
+  (convert? (minus (convert1? @0) (convert1? @1))) integer_zerop)
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+      && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1
+
 /* Signed saturation sub, case 1:
    T minus = (T)((UT)X - (UT)Y);
    SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
new file mode 100644
index 000..164719980c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint16_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
new file mode 100644
index 000..40a28c6092b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint32_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
new file mode 100644
index 000..5649858ef2a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint64_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
new file mode 100644
index 000..785e48b92ee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include <stdint.h>
+
+#define T uint8_t
+
+T sat_u_sub_1 (T a, T b)
+{
+  T sum = a - b;
+  return sum > a ? 0 : sum;
+}
+
+/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */
\ No newline at end of file
--
2.34.1
[PATCH v2 0/2] Match: support additional cases of unsigned scalar arithmetic
Hi all,

This patch series adds support for 2 new cases of unsigned scalar
saturating arithmetic (one addition, one subtraction). This results in
more valid patterns being recognised, which results in a call to
.SAT_ADD or .SAT_SUB where relevant.

Regression tests for aarch64-none-linux-gnu all pass with no failures.

v2 changes:

- add new tests for both patterns (these will fail on targets which
  don't implement the standard insn names for IFN_SAT_ADD and
  IFN_SAT_SUB; another patch series adds support for this in aarch64).

- minor adjustment to the constraints on the match statement for
  usadd_left_part_1.

Many thanks,

Akram

---

Akram Ahmad (2):
  Match: support new case of unsigned scalar SAT_SUB
  Match: make SAT_ADD case 7 commutative

 gcc/match.pd                                  | 12 +--
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c   | 21 +++
 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c    | 21 +++
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c   | 14 +
 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c    | 14 +
 9 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c

--
2.34.1
[PATCH v2 2/2] Match: make SAT_ADD case 7 commutative
Case 7 of unsigned scalar saturating addition defines SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1 being commutative. The pattern for case 7 currently does not accept the alternative where Y is used in the condition. Therefore, this commit adds the commutative property to this case which causes more valid cases of unsigned saturating arithmetic to be recognised. Before: _1 = BIT_FIELD_REF ; sum_5 = _1 + a_4(D); if (a_4(D) <= sum_5) goto ; [INV] else goto ; [INV] : : _2 = PHI <255(3), sum_5(2)> return _2; After: [local count: 1073741824]: _1 = BIT_FIELD_REF ; _2 = .SAT_ADD (_1, a_4(D)); [tail call] return _2; This passes the aarch64-none-linux-gnu regression tests with no new failures. The tests written in this patch will fail on targets which do not implement the standard names for IFN SAT_ADD. gcc/ChangeLog: * match.pd: Modify existing case for SAT_ADD. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test. 
--- gcc/match.pd | 4 ++-- .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c | 21 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c | 21 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c | 21 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 21 +++ 5 files changed, 86 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c diff --git a/gcc/match.pd b/gcc/match.pd index 4fc5efa6247..98c50ab097f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka: SAT_ADD = (X + Y) | -((X + Y) < X) */ (match (usadd_left_part_1 @0 @1) - (plus:c @0 @1) + (plus @0 @1) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0, @1 @@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Unsigned saturation add, case 7 (branch with le): SAT_ADD = x <= (X + Y) ? (X + Y) : -1. */ (match (unsigned_integer_sat_add @0 @1) - (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)) + (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep)) /* Unsigned saturation add, case 8 (branch with gt): SAT_ADD = x > (X + Y) ? -1 : (X + Y). */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c new file mode 100644 index 000..0202c70cc83 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint16_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? 
UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c new file mode 100644 index 000..34c80ba3854 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint32_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c new file mode 100644 index 000..0718cb566d3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint64_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c new file mode 100644 index 00
Re: [PATCH 2/2] Match: make SAT_ADD case 7 commutative
On 24/10/2024 16:06, Richard Biener wrote: Can you check whether removing the :c from the (plus in usadd_left_part_1 keeps things working? Hi Richard, Thanks for the feedback. I've written some tests and can confirm that they pass as expected with these two changes being made (removal of :c in usadd_left_part_1, change :c to :C in form 7). I've noticed a duplicate pattern warning for case 1 and 2 of saturating subtraction, but I don't think that's related to my patch series, so I'll send V2 to the mailing list imminently. Many thanks once again, Akram
Ping [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic
Just pinging v2 of this patch series. On 14/11/2024 15:53, Akram Ahmad wrote: Hi all, This patch series introduces standard names for scalar, Adv. SIMD, and SVE saturating arithmetic instructions in the aarch64 backend. Additional tests are added for scalar saturating arithmetic, as well as to test that the auto-vectorizer correctly inserts NEON instructions or scalar instructions where necessary, such as in 32 and 64-bit scalar unsigned arithmetic. There are also tests for the auto-vectorized SVE code. The biggest change from v1 to v2 of this series is the optimisation for signed scalar arithmetic (32 and 64-bit) to avoid the use of FMOV in the case of a constant and non-constant operand (immediate or GP reg values respectively). This is only exhibited if early-ra is disabled, owing to an early-ra bug which assigns FP registers to operands even when this would unnecessarily result in FMOV being used. This new optimisation is tested by means of check-function-bodies as well as an execution test. As with v1 of this patch, the only new regression failures on aarch64 are to do with unsigned scalar intrinsics (32 and 64-bit) not using the NEON instructions any more. Otherwise, there are no regressions. SVE currently uses the unpredicated version of the instruction in the backend. v1 -> v2: - Add new split for signed saturating arithmetic - New test for signed saturating arithmetic - Make addition tests accept commutative operands, other test fixes Only the first patch in this series is updated in v2. The other patch is already approved. If this is OK, could this be committed for me please? I do not have commit rights.
Many thanks, Akram --- Akram Ahmad (2): aarch64: Use standard names for saturating arithmetic aarch64: Use standard names for SVE saturating arithmetic gcc/config/aarch64/aarch64-builtins.cc| 13 + gcc/config/aarch64/aarch64-simd-builtins.def | 8 +- gcc/config/aarch64/aarch64-simd.md| 209 ++- gcc/config/aarch64/aarch64-sve.md | 4 +- gcc/config/aarch64/arm_neon.h | 96 +++ gcc/config/aarch64/iterators.md | 4 + .../saturating_arithmetic_autovect.inc| 58 + .../saturating_arithmetic_autovect_1.c| 79 ++ .../saturating_arithmetic_autovect_2.c| 79 ++ .../saturating_arithmetic_autovect_3.c| 75 ++ .../saturating_arithmetic_autovect_4.c| 77 ++ .../aarch64/saturating-arithmetic-signed.c| 244 ++ .../aarch64/saturating_arithmetic.inc | 39 +++ .../aarch64/saturating_arithmetic_1.c | 36 +++ .../aarch64/saturating_arithmetic_2.c | 36 +++ .../aarch64/saturating_arithmetic_3.c | 30 +++ .../aarch64/saturating_arithmetic_4.c | 30 +++ .../aarch64/sve/saturating_arithmetic.inc | 68 + .../aarch64/sve/saturating_arithmetic_1.c | 60 + .../aarch64/sve/saturating_arithmetic_2.c | 60 + .../aarch64/sve/saturating_arithmetic_3.c | 62 + .../aarch64/sve/saturating_arithmetic_4.c | 62 + 22 files changed, 1371 insertions(+), 58 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c 
create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c
[PATCH v3 3/3] Match: make SAT_ADD case 7 commutative
Case 7 of unsigned scalar saturating addition defines SAT_ADD = X <= (X + Y) ? (X + Y) : -1. This is the same as SAT_ADD = Y <= (X + Y) ? (X + Y) : -1 due to usadd_left_part_1 being commutative. The pattern for case 7 currently does not accept the alternative where Y is used in the condition. Therefore, this commit adds the commutative property to this case which causes more valid cases of unsigned saturating arithmetic to be recognised. Before: _1 = BIT_FIELD_REF ; sum_5 = _1 + a_4(D); if (a_4(D) <= sum_5) goto ; [INV] else goto ; [INV] : : _2 = PHI <255(3), sum_5(2)> return _2; After: [local count: 1073741824]: _1 = BIT_FIELD_REF ; _2 = .SAT_ADD (_1, a_4(D)); [tail call] return _2; This passes the aarch64-none-linux-gnu regression tests with no new failures. The tests will be skipped on targets which do not support IFN_SAT_ADD for each of these modes via dg-require-effective-target. gcc/ChangeLog: * match.pd: Modify existing case for SAT_ADD. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/sat-u-add-match-1-u16.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u32.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u64.c: New test. * gcc.dg/tree-ssa/sat-u-add-match-1-u8.c: New test. 
--- gcc/match.pd | 4 ++-- .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c | 22 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c | 22 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c | 22 +++ .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 +++ 5 files changed, 90 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c diff --git a/gcc/match.pd b/gcc/match.pd index 4fc5efa6247..98c50ab097f 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3085,7 +3085,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* SAT_ADD = usadd_left_part_1 | usadd_right_part_1, aka: SAT_ADD = (X + Y) | -((X + Y) < X) */ (match (usadd_left_part_1 @0 @1) - (plus:c @0 @1) + (plus @0 @1) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && types_match (type, @0, @1 @@ -3166,7 +3166,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) /* Unsigned saturation add, case 7 (branch with le): SAT_ADD = x <= (X + Y) ? (X + Y) : -1. */ (match (unsigned_integer_sat_add @0 @1) - (cond^ (le @0 (usadd_left_part_1@2 @0 @1)) @2 integer_minus_onep)) + (cond^ (le @0 (usadd_left_part_1:C@2 @0 @1)) @2 integer_minus_onep)) /* Unsigned saturation add, case 8 (branch with gt): SAT_ADD = x > (X + Y) ? -1 : (X + Y). */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c new file mode 100644 index 000..866ce6cdbc1 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target usadd_himode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint16_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? 
UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c new file mode 100644 index 000..8f841c32852 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target usadd_simode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint32_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c new file mode 100644 index 000..39548d63384 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target usadd_dimode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint64_t +#define UMAX (T) -1 + +T sat_u_add_1 (T a, T b) +{ + T sum = a + b; + return sum < a ? UMAX : sum; +} + +T sat_u_add_2 (T a, T b) +{ + T sum = a + b; + return sum < b ? UMAX : sum; +} + +/* { dg-final { scan-tree-dump-times " .SAT_ADD " 2 "optimized" } } */ \ No newline at
[PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic
Hi all, This patch series adds support for two new cases of unsigned scalar saturating arithmetic (one addition, one subtraction). This allows more valid patterns to be recognised, resulting in a call to .SAT_ADD or .SAT_SUB where relevant. v3 of this series now introduces support for dg-require-effective-target for both the usadd and ussub optabs, as well as for the individual modes that these optabs may be implemented for. aarch64 support for these optabs is in review, so there are currently no targets listed in these effective-target options. Regression tests for aarch64 all pass with no failures. v3 changes: - add support for new effective-target keywords. - tests for the two new patterns now use dg-require-effective-target so that they are skipped on targets which lack the relevant support. v2 changes: - add new tests for both patterns (these will fail on targets which don't implement the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another patch series adds support for this in aarch64). - minor adjustment to the constraints on the match statement for usadd_left_part_1. If this is OK for master, please commit these on my behalf, as I do not have the ability to do so.
Many thanks, Akram --- Akram Ahmad (3): testsuite: Support dg-require-effective-target for us{add, sub} Match: support new case of unsigned scalar SAT_SUB Match: make SAT_ADD case 7 commutative gcc/match.pd | 12 +++- .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 + gcc/testsuite/lib/target-supports.exp | 56 +++ 10 files changed, 214 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c -- 2.34.1
[PATCH v3 1/3] testsuite: Support dg-require-effective-target for us{add, sub}
Support for middle-end representation of saturating arithmetic (via IFN_SAT_ADD or IFN_SAT_SUB) cannot be determined externally, making it currently impossible to selectively skip relevant tests on targets which do not support this. This patch adds new dg-require-effective-target keywords for each of the unsigned saturating arithmetic optabs, for scalar QImode, HImode, SImode, and DImode. These can then be used in future tests which focus on these internal functions. Currently passes aarch64 regression tests with no additional failures. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add new effective-target keywords --- gcc/testsuite/lib/target-supports.exp | 56 +++ 1 file changed, 56 insertions(+) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index d113a08dff7..ec1d73970a1 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4471,6 +4471,62 @@ proc check_effective_target_vect_complex_add_double { } { }}] } +# Return 1 if the target supports middle-end representation of saturating +# addition for QImode, 0 otherwise. + +proc check_effective_target_usadd_qimode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# addition for HImode, 0 otherwise. + +proc check_effective_target_usadd_himode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# addition for SImode, 0 otherwise. + +proc check_effective_target_usadd_simode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# addition for DImode, 0 otherwise. + +proc check_effective_target_usadd_dimode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# subtraction for QImode, 0 otherwise. 
+ +proc check_effective_target_ussub_qimode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# subtraction for HImode, 0 otherwise. + +proc check_effective_target_ussub_himode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# subtraction for SImode, 0 otherwise. + +proc check_effective_target_ussub_simode { } { +return 0 +} + +# Return 1 if the target supports middle-end representation of saturating +# subtraction for DImode, 0 otherwise. + +proc check_effective_target_ussub_dimode { } { +return 0 +} + # Return 1 if the target supports signed int->float conversion # -- 2.34.1
[PATCH v3 2/3] Match: support new case of unsigned scalar SAT_SUB
This patch adds a new case for unsigned scalar saturating subtraction using a branch with a greater-than-or-equal condition. For example, X >= (X - Y) ? (X - Y) : 0 is transformed into SAT_SUB (X, Y) when X and Y are unsigned scalars, which therefore correctly matches more cases of IFN SAT_SUB. New tests are added to verify this behaviour on targets which use the standard names for IFN SAT_SUB, and the tests are skipped if the current target does not support IFN_SAT_SUB for each of these modes (via dg-require-effective-target). This passes the aarch64 regression tests with no additional failures. gcc/ChangeLog: * match.pd: Add new match for SAT_SUB. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c: New test. * gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c: New test. * gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c: New test. * gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c: New test. --- gcc/match.pd | 8 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c | 15 +++ .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c | 15 +++ .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c | 15 +++ .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 +++ 5 files changed, 68 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c diff --git a/gcc/match.pd b/gcc/match.pd index ee53c25cef9..4fc5efa6247 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -3360,6 +3360,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) } (if (wi::eq_p (sum, wi::uhwi (0, precision))) +/* Unsigned saturation sub, case 11 (branch with ge): + SAT_U_SUB = X >= (X - Y) ? (X - Y) : 0. */ +(match (unsigned_integer_sat_sub @0 @1) + (cond^ (ge @0 (minus @0 @1)) + (convert? (minus (convert1? @0) (convert1? 
@1))) integer_zerop) + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) && types_match (@0, @1 + /* Signed saturation sub, case 1: T minus = (T)((UT)X - (UT)Y); SAT_S_SUB = (X ^ Y) & (X ^ minus) < 0 ? (-(T)(X < 0) ^ MAX) : minus; diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c new file mode 100644 index 000..641fac50858 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ussub_himode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint16_t + +T sat_u_sub_1 (T a, T b) +{ + T sum = a - b; + return sum > a ? 0 : sum; +} + +/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c new file mode 100644 index 000..27f3bae7d52 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ussub_simode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint32_t + +T sat_u_sub_1 (T a, T b) +{ + T sum = a - b; + return sum > a ? 0 : sum; +} + +/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c new file mode 100644 index 000..92883ce60c7 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ussub_dimode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include + +#define T uint64_t + +T sat_u_sub_1 (T a, T b) +{ + T sum = a - b; + return sum > a ? 
0 : sum; +} + +/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */ \ No newline at end of file diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c new file mode 100644 index 000..06ff91dbed0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ussub_qimode } */ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +#include <stdint.h> + +#define T uint8_t + +T sat_u_sub_1 (T a, T b) +{ + T sum = a - b; + return sum > a ? 0 : sum; +} + +/* { dg-final { scan-tree-dump " .SAT_SUB " "optimized" } } */ \ No newline at end of file -- 2.34.1
[PATCH 1/1] aarch64: remove extra XTN in vector concatenation
GIMPLE code which performs a narrowing truncation on the result of a vector concatenation currently results in an unnecessary XTN being emitted following a UZP1 to concatenate the operands. In cases such as this, UZP1 should instead use a smaller arrangement specifier to replace the XTN instruction. This can be seen in GIMPLE examples such as the following: int32x2_t foo (svint64_t a, svint64_t b) { vector(2) int vect__2.8; long int _1; long int _3; vector(2) long int _12; [local count: 1073741824]: _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D)); _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D)); _12 = {_1, _3}; vect__2.8_13 = (vector(2) int) _12; return vect__2.8_13; } Original assembly generated: bar: ptrue p3.b, all uaddv d0, p3, z0.d uaddv d1, p3, z1.d uzp1 v0.2d, v0.2d, v1.2d xtn v0.2s, v0.2d ret This patch therefore defines the *aarch64_trunc_concat insn which truncates the concatenation result, rather than concatenating the truncated operands (such as in *aarch64_narrow_trunc), resulting in the following optimised assembly being emitted: bar: ptrue p3.b, all uaddv d0, p3, z0.d uaddv d1, p3, z1.d uzp1 v0.2s, v0.2s, v1.2s ret This patch passes all regression tests on aarch64 with no new failures. A supporting test for this optimisation is also written and passes. OK for master? I do not have commit rights so I cannot push the patch myself. gcc/ChangeLog: * config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat) New insn definition. * config/aarch64/iterators.md: (VDQHSD_F): New mode iterator. (VTRUNCD): New mode attribute for truncated modes. (Vtruncd): New mode attribute for arrangement specifier. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/truncated_concatenation_1.c: New test for the above example and the int64x2 version of the above.
--- gcc/config/aarch64/aarch64-simd.md| 16 ++ gcc/config/aarch64/iterators.md | 12 ++ .../aarch64/sve/truncated_concatenation_1.c | 22 +++ 3 files changed, 50 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cfe95bd4c31..de3dd444ecd 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1872,6 +1872,22 @@ [(set_attr "type" "neon_permute")] ) +(define_insn "*aarch64_trunc_concat" + [(set (match_operand: 0 "register_operand" "=w") + (truncate: + (vec_concat:VDQHSD_F +(match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w"] + "TARGET_SIMD" +{ + if (!BYTES_BIG_ENDIAN) +return "uzp1\\t%0., %1., %2."; + else +return "uzp1\\t%0., %2., %1."; +} + [(set_attr "type" "neon_permute")] +) + ;; Packing doubles. (define_expand "vec_pack_trunc_" diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index d7cb27e1885..3b28b2fae0c 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -290,6 +290,10 @@ ;; Advanced SIMD modes for H, S and D types. (define_mode_iterator VDQHSD [V4HI V8HI V2SI V4SI V2DI]) +;; Advanced SIMD modes that can be truncated whilst preserving +;; the number of vector elements. +(define_mode_iterator VDQHSD_F [V8HI V4SI V2DI V2SF V4SF V2DF]) + (define_mode_iterator VDQHSD_V1DI [VDQHSD V1DI]) ;; Advanced SIMD and scalar integer modes for H and S. @@ -1722,6 +1726,14 @@ (define_mode_attr Vnarrowq2 [(V8HI "v16qi") (V4SI "v8hi") (V2DI "v4si")]) +;; Truncated Advanced SIMD modes which preserve the number of lanes. +(define_mode_attr VTRUNCD [(V8HI "V8QI") (V4SI "V4HI") + (V2SF "V2HF") (V4SF "V4HF") + (V2DI "V2SI") (V2DF "V2SF")]) +(define_mode_attr Vtruncd [(V8HI "8b") (V4SI "4h") + (V2SF "2h") (V4SF "4h") + (V2DI "2s") (V2DF "2s")]) + ;; Narrowed modes of vector modes. 
(define_mode_attr VNARROW [(VNx8HI "VNx16QI") (VNx4SI "VNx8HI") (VNx4SF "VNx8HF") diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c new file mode 100644 index 000..e0ad4209206 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */ + +#include <arm_neon.h> +#include <arm_sve.h> + +int32x2_t foo (svint64_t a, svint64_t
[PATCH 0/1] aarch64: remove extra XTN in vector concatenation
Hi all, This patch adds a new insn which optimises vector concatenations on SIMD/FP registers when a narrowing truncation is performed on the resulting vector. This usually results in codegen such as... uzp1 v0.2d, v0.2d, v1.2d xtn v0.2s, v0.2d ret ... whereas the following would have sufficed without the need for XTN: uzp1 v0.2s, v0.2s, v1.2s ret A more rigorous example is provided in the commit message. This is a fairly straightforward patch, although I would appreciate some feedback as to whether the scope of the modes covered by the insn is appropriate. Similarly, I would also appreciate any suggestions for other test cases that should be covered for this optimisation. Many thanks, Akram --- Akram Ahmad (1): aarch64: remove extra XTN in vector concatenation gcc/config/aarch64/aarch64-simd.md| 16 ++ gcc/config/aarch64/iterators.md | 12 ++ .../aarch64/sve/truncated_concatenation_1.c | 22 +++ 3 files changed, 50 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c -- 2.34.1
[PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic
Hi all, This patch series introduces standard names for scalar, Adv. SIMD, and SVE saturating arithmetic instructions in the aarch64 backend. Additional tests are added for scalar saturating arithmetic, as well as to test that the auto-vectorizer correctly inserts NEON instructions or scalar instructions where necessary, such as in 32 and 64-bit scalar unsigned arithmetic. There are also tests for the auto-vectorized SVE code. The biggest change from v1 to v2 of this series is the optimisation for signed scalar arithmetic (32 and 64-bit) to avoid the use of FMOV in the case of a constant and non-constant operand (immediate or GP reg values respectively). This is only exhibited if early-ra is disabled, owing to an early-ra bug which assigns FP registers to operands even when this would unnecessarily result in FMOV being used. This new optimisation is tested by means of check-function-bodies as well as an execution test. As with v1 of this patch, the only new regression failures on aarch64 are to do with unsigned scalar intrinsics (32 and 64-bit) not using the NEON instructions any more. Otherwise, there are no regressions. SVE currently uses the unpredicated version of the instruction in the backend. v1 -> v2: - Add new split for signed saturating arithmetic - New test for signed saturating arithmetic - Make addition tests accept commutative operands, other test fixes Only the first patch in this series is updated in v2. The other patch is already approved. If this is OK, could this be committed for me please? I do not have commit rights.
Many thanks, Akram --- Akram Ahmad (2): aarch64: Use standard names for saturating arithmetic aarch64: Use standard names for SVE saturating arithmetic gcc/config/aarch64/aarch64-builtins.cc| 13 + gcc/config/aarch64/aarch64-simd-builtins.def | 8 +- gcc/config/aarch64/aarch64-simd.md| 209 ++- gcc/config/aarch64/aarch64-sve.md | 4 +- gcc/config/aarch64/arm_neon.h | 96 +++ gcc/config/aarch64/iterators.md | 4 + .../saturating_arithmetic_autovect.inc| 58 + .../saturating_arithmetic_autovect_1.c| 79 ++ .../saturating_arithmetic_autovect_2.c| 79 ++ .../saturating_arithmetic_autovect_3.c| 75 ++ .../saturating_arithmetic_autovect_4.c| 77 ++ .../aarch64/saturating-arithmetic-signed.c| 244 ++ .../aarch64/saturating_arithmetic.inc | 39 +++ .../aarch64/saturating_arithmetic_1.c | 36 +++ .../aarch64/saturating_arithmetic_2.c | 36 +++ .../aarch64/saturating_arithmetic_3.c | 30 +++ .../aarch64/saturating_arithmetic_4.c | 30 +++ .../aarch64/sve/saturating_arithmetic.inc | 68 + .../aarch64/sve/saturating_arithmetic_1.c | 60 + .../aarch64/sve/saturating_arithmetic_2.c | 60 + .../aarch64/sve/saturating_arithmetic_3.c | 62 + .../aarch64/sve/saturating_arithmetic_4.c | 62 + 22 files changed, 1371 insertions(+), 58 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic.inc create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_1.c 
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/saturating_arithmetic_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

-- 
2.34.1
[PATCH v2 2/2] aarch64: Use standard names for SVE saturating arithmetic
Rename the existing SVE unpredicated saturating arithmetic instructions to use the standard names which are used by IFN_SAT_ADD and IFN_SAT_SUB.

gcc/ChangeLog:

	* config/aarch64/aarch64-sve.md: Rename insns.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/saturating_arithmetic.inc: Template file
	for auto-vectorizer tests.
	* gcc.target/aarch64/sve/saturating_arithmetic_1.c: Instantiate
	8-bit vector tests.
	* gcc.target/aarch64/sve/saturating_arithmetic_2.c: Instantiate
	16-bit vector tests.
	* gcc.target/aarch64/sve/saturating_arithmetic_3.c: Instantiate
	32-bit vector tests.
	* gcc.target/aarch64/sve/saturating_arithmetic_4.c: Instantiate
	64-bit vector tests.

---
 gcc/config/aarch64/aarch64-sve.md             |  4 +-
 .../aarch64/sve/saturating_arithmetic.inc     | 68 +++
 .../aarch64/sve/saturating_arithmetic_1.c     | 60 
 .../aarch64/sve/saturating_arithmetic_2.c     | 60 
 .../aarch64/sve/saturating_arithmetic_3.c     | 62 +
 .../aarch64/sve/saturating_arithmetic_4.c     | 62 +
 6 files changed, 314 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_4.c

diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 06bd3e4bb2c..b987b292b20 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -4379,7 +4379,7 @@
 ;; -
 ;; Unpredicated saturating signed addition and subtraction.
-(define_insn "@aarch64_sve_" +(define_insn "s3" [(set (match_operand:SVE_FULL_I 0 "register_operand") (SBINQOPS:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") @@ -4395,7 +4395,7 @@ ) ;; Unpredicated saturating unsigned addition and subtraction. -(define_insn "@aarch64_sve_" +(define_insn "s3" [(set (match_operand:SVE_FULL_I 0 "register_operand") (UBINQOPS:SVE_FULL_I (match_operand:SVE_FULL_I 1 "register_operand") diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc new file mode 100644 index 000..0b3ebbcb0d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic.inc @@ -0,0 +1,68 @@ +/* Template file for vector saturating arithmetic validation. + + This file defines saturating addition and subtraction functions for a given + scalar type, testing the auto-vectorization of these two operators. This + type, along with the corresponding minimum and maximum values for that type, + must be defined by any test file which includes this template file. */ + +#ifndef SAT_ARIT_AUTOVEC_INC +#define SAT_ARIT_AUTOVEC_INC + +#include +#include + +#ifndef UT +#define UT uint32_t +#define UMAX UINT_MAX +#define UMIN 0 +#endif + +void uaddq (UT *out, UT *a, UT *b, int n) +{ + for (int i = 0; i < n; i++) +{ + UT sum = a[i] + b[i]; + out[i] = sum < a[i] ? UMAX : sum; +} +} + +void uaddq2 (UT *out, UT *a, UT *b, int n) +{ + for (int i = 0; i < n; i++) +{ + UT sum; + if (!__builtin_add_overflow(a[i], b[i], &sum)) + out[i] = sum; + else + out[i] = UMAX; +} +} + +void uaddq_imm (UT *out, UT *a, int n) +{ + for (int i = 0; i < n; i++) +{ + UT sum = a[i] + 50; + out[i] = sum < a[i] ? UMAX : sum; +} +} + +void usubq (UT *out, UT *a, UT *b, int n) +{ + for (int i = 0; i < n; i++) +{ + UT sum = a[i] - b[i]; + out[i] = sum > a[i] ? UMIN : sum; +} +} + +void usubq_imm (UT *out, UT *a, int n) +{ + for (int i = 0; i < n; i++) +{ + UT sum = a[i] - 50; + out[i] = sum > a[i] ? 
UMIN : sum; +} +} + +#endif \ No newline at end of file diff --git a/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c new file mode 100644 index 000..6936e9a2704 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/saturating_arithmetic_1.c @@ -0,0 +1,60 @@ +/* { dg-do compile { target { aarch64*-*-* } } } */ +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** uaddq: +** ... +** ld1b\tz([0-9]+)\.b, .* +** ld1b\tz([0-9]+)\.b, .* +** uqadd\tz\2.b, z\1\.b, z\2\.b +** ... +** ldr\tb([0-9]+), .* +** ldr\tb([0-9]+), .* +** uqadd\tb\4, b\3, b\4 +** ... +*/ +/* +** uaddq
[PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic
This renames the existing {s,u}q{add,sub} instructions to use the standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and IFN_SAT_SUB. The NEON intrinsics for saturating arithmetic and their corresponding builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit unsigned scalar saturating arithmetic to use the NEON instructions, resulting in an additional (and inefficient) FMOV being generated when the original operands are in GP registers. This patch therefore also restores the original behaviour of using the adds/subs instructions in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32 and 64-bit scalar saturating arithmetic which uses adds/subs in place of the NEON instruction.

Addition, before:

	fmov	d0, x0
	fmov	d1, x1
	sqadd	d0, d0, d1
	fmov	x0, d0

Addition, after:

	asr	x2, x1, 63
	adds	x0, x0, x1
	eor	x2, x2, 0x8000
	csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the csinv with csel. The 32-bit case follows the same approach. Arithmetic with a constant operand is simplified further by directly storing the saturating limit in the temporary register, resulting in only three instructions being used. It is important to note that this only works when early-ra is disabled, due to an early-ra bug which erroneously assigns FP registers to the operands; if early-ra is enabled, then the original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv. SIMD cases to ensure that the correct instructions are used. The NEON intrinsics are already tested elsewhere. The signed scalar case is also tested with an execution test to check the results.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc: Expand iterators.
	* config/aarch64/aarch64-simd-builtins.def: Use standard names.
	* config/aarch64/aarch64-simd.md: Use standard names, split insn
	definitions on signedness of operator and type of operands.
	* config/aarch64/arm_neon.h: Use standard builtin names.
	* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
	simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
	Template file for unsigned vector saturating arithmetic tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
	8-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
	16-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
	32-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
	64-bit vector type tests.
	* gcc.target/aarch64/saturating_arithmetic.inc: Template file for
	scalar saturating arithmetic tests.
	* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc        |  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md            | 209 ++-
 gcc/config/aarch64/arm_neon.h                 |  96 +++
 gcc/config/aarch64/iterators.md               |   4 +
 .../saturating_arithmetic_autovect.inc        |  58 +
 .../saturating_arithmetic_autovect_1.c        |  79 ++
 .../saturating_arithmetic_autovect_2.c        |  79 ++
 .../saturating_arithmetic_autovect_3.c        |  75 ++
 .../saturating_arithmetic_autovect_4.c        |  77 ++
 .../aarch64/saturating-arithmetic-signed.c    | 244 ++
 .../aarch64/saturating_arithmetic.inc         |  39 +++
 .../aarch64/saturating_arithmetic_1.c         |  36 +++
 .../aarch64/saturating_arithmetic_2.c         |  36 +++
 .../aarch64/saturating_arithmetic_3.c         |  30 +++
 .../aarch64/saturating_arithmetic_4.c         |  30 +++
 16 files changed, 1057 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_auto
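To make the intended semantics of the new adds/csinv (and subs/csel) sequences easy to check, here is a scalar C model of signed saturating addition and subtraction. This is only a sketch: the function names are illustrative, and the model describes the behaviour the instructions should have (saturating towards the sign of the second operand on overflow, which is what the asr/eor pair computes), not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of sqadd semantics: on signed overflow, saturate to INT64_MAX
   for a positive overflow and INT64_MIN for a negative one.  The
   overflow direction follows the sign of b.  */
static int64_t
sqadd64_model (int64_t a, int64_t b)
{
  int64_t sum;
  if (!__builtin_add_overflow (a, b, &sum))
    return sum;
  return b < 0 ? INT64_MIN : INT64_MAX;
}

/* Model of sqsub semantics; the saturation direction is opposite to
   the sign of b.  */
static int64_t
sqsub64_model (int64_t a, int64_t b)
{
  int64_t diff;
  if (!__builtin_sub_overflow (a, b, &diff))
    return diff;
  return b < 0 ? INT64_MAX : INT64_MIN;
}
```

The execution test in saturating-arithmetic-signed.c checks results of this kind against the generated code.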
Re: [PATCH 1/1] aarch64: remove extra XTN in vector concatenation
Hi Kyrill, thanks for the very quick response!

On 02/12/2024 15:09, Kyrylo Tkachov wrote:
> Thanks for the patch. As this is sent after the end of stage1 and is not
> finishing support for an architecture feature perhaps we should stage this
> for GCC 16. But if it fixes a performance problem in a real app or, better
> yet, fixes a performance regression then we should consider it for this
> cycle.

Sorry, I should have specified in the cover letter that this was originally intended for GCC 16... although it would improve performance in some video codecs, as this is where the issue was first raised. I'll try and find out a bit more about this if needed.

> …
> The UZP1 instruction doesn’t accept .2h operands so I don’t think this
> pattern is valid for the V2SF value of VDQHSD_F
> We should have tests for the various sizes that the new pattern covers.

Okay, I'll correct the modes and then write tests for the ones that remain.

Many thanks,
Akram
Ping [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic
Ping
[PATCH v2 1/1] aarch64: remove extra XTN in vector concatenation
GIMPLE code which performs a narrowing truncation on the result of a vector concatenation currently results in an unnecessary XTN being emitted following a UZP1 to concatenate the operands. In cases such as this, UZP1 should instead use a smaller arrangement specifier to replace the XTN instruction. This is seen in cases such as in this GIMPLE example:

int32x2_t foo (svint64_t a, svint64_t b)
{
  vector(2) int vect__2.8;
  long int _1;
  long int _3;
  vector(2) long int _12;

  [local count: 1073741824]:
  _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D));
  _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D));
  _12 = {_1, _3};
  vect__2.8_13 = (vector(2) int) _12;
  return vect__2.8_13;
}

Original assembly generated:

bar:
	ptrue	p3.b, all
	uaddv	d0, p3, z0.d
	uaddv	d1, p3, z1.d
	uzp1	v0.2d, v0.2d, v1.2d
	xtn	v0.2s, v0.2d
	ret

This patch therefore defines the *aarch64_trunc_concat insn which truncates the concatenation result, rather than concatenating the truncated operands (such as in *aarch64_narrow_trunc), resulting in the following optimised assembly being emitted:

bar:
	ptrue	p3.b, all
	uaddv	d0, p3, z0.d
	uaddv	d1, p3, z1.d
	uzp1	v0.2s, v0.2s, v1.2s
	ret

This patch passes all regression tests on aarch64 with no new failures. A supporting test for this optimisation is also written and passes. OK for master? I do not have commit rights so I cannot push the patch myself.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat)
	(*aarch64_float_trunc_concat): New insn definitions.
	* config/aarch64/iterators.md: (VQ_SDF): New mode iterator.
	(VTRUNCD): New mode attribute for truncated modes.
	(Vtruncd): New mode attribute for arrangement specifier.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/truncated_concatenation_1.c: New test for
	the above example and other modes covered by insn definitions.
--- gcc/config/aarch64/aarch64-simd.md| 32 + gcc/config/aarch64/iterators.md | 11 + .../aarch64/sve/truncated_concatenation_1.c | 46 +++ 3 files changed, 89 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cfe95bd4c31..90730960451 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1872,6 +1872,38 @@ [(set_attr "type" "neon_permute")] ) +(define_insn "*aarch64_trunc_concat" + [(set (match_operand: 0 "register_operand" "=w") + (truncate: + (vec_concat:VQN + (match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w"] + "TARGET_SIMD" +{ + if (!BYTES_BIG_ENDIAN) +return "uzp1\\t%0., %1., %2."; + else +return "uzp1\\t%0., %2., %1."; +} + [(set_attr "type" "neon_permute")] +) + +(define_insn "*aarch64_float_trunc_concat" + [(set (match_operand: 0 "register_operand" "=w") + (float_truncate: + (vec_concat:VQ_SDF + (match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w"] + "TARGET_SIMD" +{ + if (!BYTES_BIG_ENDIAN) +return "uzp1\\t%0., %1., %2."; + else +return "uzp1\\t%0., %2., %1."; +} + [(set_attr "type" "neon_permute")] +) + ;; Packing doubles. (define_expand "vec_pack_trunc_" diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index d7cb27e1885..008629ecf63 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -181,6 +181,9 @@ ;; Advanced SIMD single Float modes. (define_mode_iterator VDQSF [V2SF V4SF]) +;; Quad vector Float modes with single and double elements. +(define_mode_iterator VQ_SDF [V4SF V2DF]) + ;; Quad vector Float modes with half/single elements. (define_mode_iterator VQ_HSF [V8HF V4SF]) @@ -1722,6 +1725,14 @@ (define_mode_attr Vnarrowq2 [(V8HI "v16qi") (V4SI "v8hi") (V2DI "v4si")]) +;; Truncated Advanced SIMD modes which preserve the number of lanes. 
+(define_mode_attr VTRUNCD [(V8HI "V8QI") (V4SI "V4HI") + (V4SF "V4HF") (V2DI "V2SI") + (V2DF "V2SF")]) +(define_mode_attr Vtruncd [(V8HI "8b") (V4SI "4h") + (V4SF "4h") (V2DI "2s") + (V2DF "2s")]) + ;; Narrowed modes of vector modes. (define_mode_attr VNARROW [(VNx8HI "VNx16QI") (VNx4SI "VNx8HI") (VNx4SF "VNx8HF") diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c b/gcc/testsuite/g
[PATCH v2 0/1] aarch64: remove extra XTN in vector concatenation
Hi all,

This is v2 of a patch which adds new insns which optimise vector concatenations when a narrowing truncation is performed on the resulting vector. This is for integer as well as floating-point vectors. The aforementioned operation usually results in codegen such as...

	uzp1	v0.2d, v0.2d, v1.2d
	xtn	v0.2s, v0.2d
	ret

... whereas the following would have sufficed without the need for XTN:

	uzp1	v0.2s, v0.2s, v1.2s
	ret

A more rigorous example is provided in the commit message. The main changes from v1 -> v2 are the removal of incorrect modes for UZP1, and the addition of a test for each mode affected by the new insns. Furthermore, support for floating-point is added, having accidentally been omitted from v1.

Best wishes,
Akram

---

Akram Ahmad (1):
  aarch64: remove extra XTN in vector concatenation

 gcc/config/aarch64/aarch64-simd.md            | 32 +
 gcc/config/aarch64/iterators.md               | 11 +
 .../aarch64/sve/truncated_concatenation_1.c   | 46 +++
 3 files changed, 89 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c

-- 
2.34.1
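As a scalar sanity check of what the UZP1-only sequence computes on little-endian, the truncating concatenation can be modelled in portable C. The helper below is illustrative only (not part of the patch): each 64-bit lane is reduced to its low 32 bits and the results are packed side by side, which is what `uzp1 v0.2s` does for a {a, b} concatenation.

```c
#include <assert.h>
#include <stdint.h>

/* Model of truncating a 2x64-bit concatenation to 2x32-bit lanes
   (little-endian lane order): keep the low 32 bits of each input.  */
static void
trunc_concat_2d_to_2s (int64_t a, int64_t b, uint32_t out[2])
{
  out[0] = (uint32_t) (uint64_t) a;	/* low half of lane 0 */
  out[1] = (uint32_t) (uint64_t) b;	/* low half of lane 1 */
}
```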
Ping [PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic
Ping On 27/11/2024 20:27, Akram Ahmad wrote: Hi all, This patch series adds support for 2 new cases of unsigned scalar saturating arithmetic (one addition, one subtraction). This results in more valid patterns being recognised, which results in a call to .SAT_ADD or .SAT_SUB where relevant. v3 of this series now introduces support for dg-require-effective-target for both usadd and ussub optabs as well as individual modes that these optabs may be implemented for. aarch64 support for these optabs is in review, so there are currently no targets listed in these effective-target options. Regression tests for aarch64 all pass with no failures. v3 changes: - add support for new effective-target keywords. - tests for the two new patterns now use the dg-require-effective-target so that they are skipped on relevant targets. v2 changes: - add new tests for both patterns (these will fail on targets which don't implement the standard insn names for IFN_SAT_ADD and IFN_SAT_SUB; another patch series adds support for this in aarch64). - minor adjustment to the constraints on the match statement for usadd_left_part_1. If this is OK for master, please commit these on my behalf, as I do not have the ability to do so. 
Many thanks, Akram --- Akram Ahmad (3): testsuite: Support dg-require-effective-target for us{add, sub} Match: support new case of unsigned scalar SAT_SUB Match: make SAT_ADD case 7 commutative gcc/match.pd | 12 +++- .../gcc.dg/tree-ssa/sat-u-add-match-1-u16.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u32.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u64.c | 22 .../gcc.dg/tree-ssa/sat-u-add-match-1-u8.c| 22 .../gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c | 15 + .../gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c| 15 + gcc/testsuite/lib/target-supports.exp | 56 +++ 10 files changed, 214 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-add-match-1-u8.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u16.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u32.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u64.c create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sat-u-sub-match-1-u8.c
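For reference, the unsigned saturating-subtraction shape that the new SAT_SUB case in match.pd targets can be open-coded like this. This is a sketch of the source-level pattern only; whether any particular spelling is recognised as .SAT_SUB depends on the match.pd rules:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating subtraction: if the wrapped difference exceeds
   the minuend, the subtraction underflowed, so clamp to zero.  */
static uint32_t
ussub32 (uint32_t a, uint32_t b)
{
  uint32_t diff = a - b;	/* wraps modulo 2^32 on underflow */
  return diff > a ? 0 : diff;
}
```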
[PATCH v3] aarch64: remove extra XTN in vector concatenation
Hi Richard,

Thanks for the feedback. I've copied in the resulting patch here; if this is okay, please could it be committed on my behalf? The patch continues below.

Many thanks,
Akram

---

GIMPLE code which performs a narrowing truncation on the result of a vector concatenation currently results in an unnecessary XTN being emitted following a UZP1 to concatenate the operands. In cases such as this, UZP1 should instead use a smaller arrangement specifier to replace the XTN instruction. This is seen in cases such as in this GIMPLE example:

int32x2_t foo (svint64_t a, svint64_t b)
{
  vector(2) int vect__2.8;
  long int _1;
  long int _3;
  vector(2) long int _12;

  [local count: 1073741824]:
  _1 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, a_6(D));
  _3 = svaddv_s64 ({ -1, 0, 0, 0, 0, 0, 0, 0, ... }, b_7(D));
  _12 = {_1, _3};
  vect__2.8_13 = (vector(2) int) _12;
  return vect__2.8_13;
}

Original assembly generated:

bar:
	ptrue	p3.b, all
	uaddv	d0, p3, z0.d
	uaddv	d1, p3, z1.d
	uzp1	v0.2d, v0.2d, v1.2d
	xtn	v0.2s, v0.2d
	ret

This patch therefore defines the *aarch64_trunc_concat insn which truncates the concatenation result, rather than concatenating the truncated operands (such as in *aarch64_narrow_trunc), resulting in the following optimised assembly being emitted:

bar:
	ptrue	p3.b, all
	uaddv	d0, p3, z0.d
	uaddv	d1, p3, z1.d
	uzp1	v0.2s, v0.2s, v1.2s
	ret

This patch passes all regression tests on aarch64 with no new failures. A supporting test for this optimisation is also written and passes. OK for master? I do not have commit rights so I cannot push the patch myself.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md: (*aarch64_trunc_concat): New
	insn definition.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/truncated_concatenation_1.c: New test for
	the above example and other modes covered by insn definitions.
--- gcc/config/aarch64/aarch64-simd.md| 16 ++ .../aarch64/sve/truncated_concatenation_1.c | 32 +++ 2 files changed, 48 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index cfe95bd4c31..6c129d6c4a8 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1872,6 +1872,22 @@ [(set_attr "type" "neon_permute")] ) +(define_insn "*aarch64_trunc_concat" + [(set (match_operand: 0 "register_operand" "=w") + (truncate: + (vec_concat:VQN + (match_operand: 1 "register_operand" "w") + (match_operand: 2 "register_operand" "w"] + "TARGET_SIMD" +{ + if (!BYTES_BIG_ENDIAN) +return "uzp1\\t%0., %1., %2."; + else +return "uzp1\\t%0., %2., %1."; +} + [(set_attr "type" "neon_permute")] +) + ;; Packing doubles. (define_expand "vec_pack_trunc_" diff --git a/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c new file mode 100644 index 000..95577a1a9ef --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/truncated_concatenation_1.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -Wall -march=armv8.2-a+sve" } */ + +#include +#include + +int8x8_t f1 (int16x4_t a, int16x4_t b) { +int8x8_t ab = vdup_n_s8 (0); +int16x8_t ab_concat = vcombine_s16 (a, b); +ab = vmovn_s16 (ab_concat); +return ab; +} + +int16x4_t f2 (int32x2_t a, int32x2_t b) { +int16x4_t ab = vdup_n_s16 (0); +int32x4_t ab_concat = vcombine_s32 (a, b); +ab = vmovn_s32 (ab_concat); +return ab; +} + +int32x2_t f3 (svint64_t a, svint64_t b) { +int32x2_t ab = vdup_n_s32 (0); +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), a), ab, 0); +ab = vset_lane_s32 ((int)svaddv_s64 (svptrue_b64 (), b), ab, 1); +return ab; +} + +/* { dg-final { scan-assembler-not {\txtn\t} } }*/ +/* { dg-final { scan-assembler-not {\tfcvtn\t} } }*/ +/* { dg-final { scan-assembler-times 
{\tuzp1\tv[0-9]+\.8b, v[0-9]+\.8b, v[0-9]+\.8b} 1 } }*/ +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.4h, v[0-9]+\.4h, v[0-9]+\.4h} 1 } }*/ +/* { dg-final { scan-assembler-times {\tuzp1\tv[0-9]+\.2s, v[0-9]+\.2s, v[0-9]+\.2s} 1 } }*/ \ No newline at end of file -- 2.34.1
Ping [PATCH v3 0/3] Match: support additional cases of unsigned scalar arithmetic
Pinging

On 27/11/2024 20:27, Akram Ahmad wrote:
Re: [PATCH v2 0/2] aarch64: Use standard names for saturating arithmetic
Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668794.html

On 14/11/2024 15:53, Akram Ahmad wrote:
Re: [PATCH v2 1/2] aarch64: Use standard names for saturating arithmetic
Hi Kyrill,

On 17/12/2024 15:15, Kyrylo Tkachov wrote:
> We avoid using the __builtin_aarch64_* builtins in test cases as they are
> undocumented and we don’t make any guarantees about their stability to
> users. I’d prefer if the saturating operation was open-coded in C. I expect
> the midend machinery is smart enough to recognize the saturating logic for
> scalars by now?

Thanks for the detailed feedback. It's been really helpful, and I've gone ahead and implemented almost all of it. I'm struggling to find a pattern that's recognised for signed arithmetic though; the following emits branching code:

int64_t __attribute__((noipa)) sadd64 (int64_t __a, int64_t __b)
{
  if (__a > 0)
    {
      if (__b > INT64_MAX - __a)
	return INT64_MAX;
    }
  else if (__b < INT64_MIN - __a)
    {
      return INT64_MIN;
    }
  return __a + __b;
}

Resulting assembly:

sadd64:
.LFB6:
	.cfi_startproc
	mov	x3, x0
	cmp	x0, 0
	ble	.L9
	mov	x2, 9223372036854775807
	sub	x4, x2, x0
	mov	x0, x2
	cmp	x4, x1
	blt	.L8
.L11:
	add	x0, x3, x1
.L8:
	ret
	.p2align 2,,3
.L9:
	mov	x2, -9223372036854775808
	sub	x0, x2, x0
	cmp	x0, x1
	ble	.L11
	mov	x0, x2
	ret

Is there a way to force this not to use branches by any chance? I'll keep looking and see if there are some patterns recently added to match that will work here. If I don't find something, would it be sufficient to use the scalar NEON intrinsics for this? And if so, would that mean the test should move to the Adv. SIMD directory?

Many thanks once again,
Akram
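One branch-free way to express the same saturation, shown here only as a sketch (I cannot guarantee the midend matches this exact shape as .SAT_ADD), derives the limit from the sign bit of one operand, mirroring the asr/eor idiom from the patch:

```c
#include <assert.h>
#include <stdint.h>

static int64_t
sadd64_branchless (int64_t a, int64_t b)
{
  int64_t sum;
  int overflow = __builtin_add_overflow (a, b, &sum);
  /* mask is all-ones when b is negative, zero otherwise (GCC performs
     an arithmetic shift when right-shifting signed types).  */
  uint64_t mask = (uint64_t) (b >> 63);
  /* limit is INT64_MIN when b < 0, INT64_MAX otherwise; the conversion
     back to int64_t relies on GCC's modulo-2^64 implementation-defined
     behaviour.  */
  int64_t limit = (int64_t) (mask ^ (uint64_t) INT64_MAX);
  return overflow ? limit : sum;
}
```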
[PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic
Hi Kyrill,

Thanks for the feedback on v2. I found a pattern which works for the open-coded signed arithmetic, and I've implemented the other feedback you provided as well. I've sent the modified patch in this thread as the SVE patch [2/2] hasn't been changed, but I'm happy to send the entire v3 patch series as a new thread if that's easier. Patch continues below. If this is OK, please could you commit it on my behalf?

Many thanks,
Akram

---

This renames the existing {s,u}q{add,sub} instructions to use the standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and IFN_SAT_SUB. The NEON intrinsics for saturating arithmetic and their corresponding builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32 and 64-bit unsigned scalar saturating arithmetic to use the NEON instructions, resulting in an additional (and inefficient) FMOV being generated when the original operands are in GP registers. This patch therefore also restores the original behaviour of using the adds/subs instructions in this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32 and 64-bit scalar saturating arithmetic which uses adds/subs in place of the NEON instruction.

Addition, before:

	fmov	d0, x0
	fmov	d1, x1
	sqadd	d0, d0, d1
	fmov	x0, d0

Addition, after:

	asr	x2, x1, 63
	adds	x0, x0, x1
	eor	x2, x2, 0x8000
	csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the csinv with csel. The 32-bit case follows the same approach. Arithmetic with a constant operand is simplified further by directly storing the saturating limit in the temporary register, resulting in only three instructions being used. It is important to note that this only works when early-ra is disabled, due to an early-ra bug which erroneously assigns FP registers to the operands; if early-ra is enabled, then the original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv.
SIMD cases to ensure that the correct instructions are used. The NEON intrinsics are already tested elsewhere. The signed scalar case is also tested with an execution test to check the results. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc: Expand iterators. * config/aarch64/aarch64-simd-builtins.def: Use standard names * config/aarch64/aarch64-simd.md: Use standard names, split insn definitions on signedness of operator and type of operands. * config/aarch64/arm_neon.h: Use standard builtin names. * config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to simplify splitting of insn for scalar arithmetic. gcc/testsuite/ChangeLog: * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc: Template file for unsigned vector saturating arithmetic tests. * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c: 8-bit vector type tests. * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c: 16-bit vector type tests. * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c: 32-bit vector type tests. * gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c: 64-bit vector type tests. * gcc.target/aarch64/saturating_arithmetic.inc: Template file for scalar saturating arithmetic tests. * gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests. * gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests. * gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests. * gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests. * gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests. 
---
 gcc/config/aarch64/aarch64-builtins.cc        |  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md            | 218 +-
 gcc/config/aarch64/arm_neon.h                 |  96 +++
 gcc/config/aarch64/iterators.md               |   4 +
 .../saturating_arithmetic_autovect.inc        |  58
 .../saturating_arithmetic_autovect_1.c        |  79 +
 .../saturating_arithmetic_autovect_2.c        |  79 +
 .../saturating_arithmetic_autovect_3.c        |  75 +
 .../saturating_arithmetic_autovect_4.c        |  77 +
 .../aarch64/saturating-arithmetic-signed.c    | 270 ++
 .../aarch64/saturating_arithmetic.inc         |  39 +++
 .../aarch64/saturating_arithmetic_1.c         |  36 +++
 .../aarch64/saturating_arithmetic_2.c         |  36 +++
 .../aarch64/saturating_arithmetic_3.c         |  30 ++
 .../aarch64/saturating_arithmetic_4.c         |  30 ++
 16 files changed, 1092 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithm
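[Editorial note: the asr/adds/eor/csinv sequence in the cover letter above computes a branchless signed saturating add. The following portable C sketch mirrors that logic; it is an illustration, not code from the patch, and the helper name `sat_add_s64` is hypothetical.]

```c
#include <stdint.h>
#include <limits.h>

/* Hypothetical helper mirroring the adds/csinv sequence above: on
   signed overflow (the V flag in the asm), return the saturation
   value; otherwise return the plain sum.  */
int64_t sat_add_s64 (int64_t x, int64_t y)
{
  /* asr x2, x1, 63 ; eor x2, x2, 0x8000000000000000 leaves x2 such
     that ~x2 (applied by csinv) is the saturation value.  In C we
     compute that value directly: INT64_MAX when y >= 0, since
     overflow with a non-negative addend is always upward, and
     INT64_MIN (INT64_MAX + 1 in unsigned arithmetic) when y < 0.  */
  uint64_t sat = ((uint64_t) y >> 63) + (uint64_t) INT64_MAX;

  int64_t res;
  /* adds x0, x0, x1 sets V on signed overflow; csinv then selects
     either the sum or the (inverted) saturation limit.  */
  if (__builtin_add_overflow (x, y, &res))
    return (int64_t) sat;
  return res;
}
```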
Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic
Ah whoops- I didn't see this before sending off V4 just now, my
apologies. I'll try my best to get this implemented before the end of
the day so that it doesn't miss the deadline.

On 09/01/2025 23:04, Richard Sandiford wrote:
> Akram Ahmad writes:
>> In the above example, subtraction replaces the adds with subs and the
>> csinv with csel. The 32-bit case follows the same approach. Arithmetic
>> with a constant operand is simplified further by directly storing the
>> saturating limit in the temporary register, resulting in only three
>> instructions being used. It is important to note that this only works
>> when early-ra is disabled due to an early-ra bug which erroneously
>> assigns FP registers to the operands; if early-ra is enabled, then the
>> original behaviour (NEON instruction) occurs.
>
> This can be fixed by changing:
>
>     case CT_REGISTER:
>       if (REG_P (op) || SUBREG_P (op))
>         return true;
>       break;
>
> to:
>
>     case CT_REGISTER:
>       if (REG_P (op) || SUBREG_P (op) || GET_CODE (op) == SCRATCH)
>         return true;
>       break;
>
> But I can test & post that as a follow-up if you prefer.

Yes please, if that's not too much trouble- would that have to go into
another patch?

>> + ;; Double vector modes.
>> (define_mode_iterator VD [V8QI V4HI V4HF V2SI V2SF V4BF])
>>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> new file mode 100644
>> index 000..2b72be7b0d7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
>> @@ -0,0 +1,79 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */
>> +/* { dg-final { check-function-bodies "**" "" "-DCHECK_ASM" } } */
>> +
>> +/*
>> +** uadd_lane: { xfail *-*-* }
>> +**	dup\tv([0-9]+).8b, w0
>> +**	uqadd\tb([0-9]+), (?:b\1, b0|b0, b\1)
>> +**	umov\tw0, v\2.b\[0\]
>> +**	ret
>> +*/
>
> What's the reason behind the xfail?  Is it the early-ra thing, or
> something else?  (You might already have covered this, sorry.)
> xfailing is fine if it needs further optimisation, was just curious :)

This is because of a missing pattern in match.pd (I've sent another
patch upstream to add the missing pattern, although it may have gotten
lost). Once that pattern is added though, this should be recognised as
.SAT_SUB, and the new instructions will appear.

[...]

>> diff --git a/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
>> new file mode 100644
>> index 000..0fc6804683a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/saturating-arithmetic-signed.c
>> @@ -0,0 +1,270 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 --save-temps -mearly-ra=none" } */
>
> It'd be worth adding -fno-schedule-insns2 here.  Same for
> saturating_arithmetic_1.c and saturating_arithmetic_2.c.
> The reason is that:
>
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +/*
>> +** sadd32:
>> +**	asr	w([0-9]+), w1, 31
>> +**	adds	w([0-9]+), (?:w0, w1|w1, w0)
>> +**	eor	w\1, w\1, -2147483648
>> +**	csinv	w0, w\2, w\1, vc
>> +**	ret
>> +*/
>
> ...the first two instructions can be in either order, and similarly
> for the second and third.
>
> Really nice tests though :)

Thanks! That also makes a lot of sense, I was cautious of assuming the
instructions would always be in that exact order, so it's good to know
I can try and specify that.
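[Editorial note: for reference, the open-coded unsigned idioms that IFN_SAT_ADD and IFN_SAT_SUB match (and which the autovectorizer can then turn into uqadd/uqsub) look like the sketch below. This is an illustration; the function names are hypothetical, not from the patch.]

```c
#include <stdint.h>

/* Unsigned saturating add: on wraparound the truncated sum is smaller
   than either operand, so clamp to the type maximum.  This is the
   idiom IFN_SAT_ADD recognises.  */
uint8_t usadd_u8 (uint8_t a, uint8_t b)
{
  uint8_t sum = (uint8_t) (a + b);
  return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp to zero on underflow.  This is
   the form that should be recognised as .SAT_SUB once the missing
   match.pd pattern mentioned above is in place.  */
uint8_t ussub_u8 (uint8_t a, uint8_t b)
{
  return a > b ? (uint8_t) (a - b) : 0;
}
```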
[PATCH v4 1/2] aarch64: Use standard names for saturating arithmetic
Hi Kyrill,

Thanks for the very quick response! V4 of the patch can be found below
the line.

Best wishes,

Akram

---

This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.

The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.

Using the standard names for the instructions causes 32- and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV being generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions in
this circumstance.

Furthermore, this patch introduces a new optimisation for signed 32-
and 64-bit scalar saturating arithmetic which uses adds/subs in place
of the NEON instruction.

Addition, before:

	fmov	d0, x0
	fmov	d1, x1
	sqadd	d0, d0, d1
	fmov	x0, d0

Addition, after:

	asr	x2, x1, 63
	adds	x0, x0, x1
	eor	x2, x2, 0x8000000000000000
	csinv	x0, x0, x2, vc

In the above example, subtraction replaces the adds with subs and the
csinv with csel. The 32-bit case follows the same approach. Arithmetic
with a constant operand is simplified further by directly storing the
saturating limit in the temporary register, resulting in only three
instructions being used. It is important to note that this only works
when early-ra is disabled due to an early-ra bug which erroneously
assigns FP registers to the operands; if early-ra is enabled, then the
original behaviour (NEON instruction) occurs.

Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere. The signed scalar case is also tested with an
execution test to check the results.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc: Expand iterators.
	* config/aarch64/aarch64-simd-builtins.def: Use standard names.
	* config/aarch64/aarch64-simd.md: Use standard names, split insn
	definitions on signedness of operator and type of operands.
	* config/aarch64/arm_neon.h: Use standard builtin names.
	* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
	simplify splitting of insn for scalar arithmetic.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc:
	Template file for unsigned vector saturating arithmetic tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c:
	8-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c:
	16-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_3.c:
	32-bit vector type tests.
	* gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_4.c:
	64-bit vector type tests.
	* gcc.target/aarch64/saturating_arithmetic.inc: Template file for
	scalar saturating arithmetic tests.
	* gcc.target/aarch64/saturating_arithmetic_1.c: 8-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_2.c: 16-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_3.c: 32-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_4.c: 64-bit tests.
	* gcc.target/aarch64/saturating_arithmetic_signed.c: Signed tests.
---
 gcc/config/aarch64/aarch64-builtins.cc        |  13 +
 gcc/config/aarch64/aarch64-simd-builtins.def  |   8 +-
 gcc/config/aarch64/aarch64-simd.md            | 207 +-
 gcc/config/aarch64/arm_neon.h                 |  96 +++
 gcc/config/aarch64/iterators.md               |   4 +
 .../saturating_arithmetic_autovect.inc        |  58
 .../saturating_arithmetic_autovect_1.c        |  79 +
 .../saturating_arithmetic_autovect_2.c        |  79 +
 .../saturating_arithmetic_autovect_3.c        |  75 +
 .../saturating_arithmetic_autovect_4.c        |  77 +
 .../aarch64/saturating-arithmetic-signed.c    | 270 ++
 .../aarch64/saturating_arithmetic.inc         |  39 +++
 .../aarch64/saturating_arithmetic_1.c         |  36 +++
 .../aarch64/saturating_arithmetic_2.c         |  36 +++
 .../aarch64/saturating_arithmetic_3.c         |  30 ++
 .../aarch64/saturating_arithmetic_4.c         |  30 ++
 16 files changed, 1081 insertions(+), 56 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect.inc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_autovect_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/saturating_arithmetic_
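[Editorial note: the constant-operand simplification mentioned in the cover letter can be sketched in C. When one addend is a compile-time constant, its sign fixes the direction of any overflow, so the saturation limit is itself a constant the compiler can materialise directly rather than derive with asr/eor. Illustration only; the function name is hypothetical.]

```c
#include <stdint.h>
#include <limits.h>

/* With a known-positive constant addend, signed overflow can only be
   upward, so the saturation value is simply INT32_MAX.  This lets the
   compiler load the limit with a single mov, giving the three-instruction
   sequence the cover letter describes for the constant-operand case.  */
int32_t sat_add_const4 (int32_t x)
{
  int32_t res;
  if (__builtin_add_overflow (x, 4, &res))
    return INT32_MAX;  /* constant 4 > 0, so saturate high only */
  return res;
}
```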
Re: [PATCH v3 1/2] aarch64: Use standard names for saturating arithmetic
On 09/01/2025 23:04, Richard Sandiford wrote:
>> +  gcc_assert (imm != 0);
>
> The constraints do allow 0, so I'm not sure this assert is safe.
> Certainly we shouldn't usually get unfolded instructions, but strange
> things can happen with fuzzed options.  Does the code mishandle that
> case?  It looked like it should be ok.

I accidentally deleted my response when trimming down the quote text- I
haven't tested this, but it came about from an offline discussion about
the patch with a teammate. It should be fine without the assert, but
I'll test it to make sure.