date:20240904

Re: [PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-04 Thread Torbjorn SVENSSON





On 2024-09-03 20:23, Richard Biener wrote:




On 03.09.2024 at 19:00, Tamar Christina wrote:

Hi All,

The meaning of the testcase was changed by passing it -fwrapv.  The test
failed on some platforms because it was exercising implementation-defined
behavior wrt INT_MIN in generic code.

Instead of using -fwrapv this just removes the border case from the test so
all the values now have a defined semantic.  It still relies on the handling of
shifting a negative value right, but that wasn't changed with -fwrapv anyway.

The -fwrapv case is being handled already by other testcases.
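
To make the border case concrete, here is a minimal sketch. It assumes a kernel that negates a value and then shifts it right by the type width minus one; `negate_then_shift` and `is_safely_negatable` are hypothetical names chosen for illustration and are not taken from signbit-5.c. Negating INT_MIN overflows, which ISO C leaves undefined unless -fwrapv is in effect, whereas INT_MIN+1 negates to INT_MAX and stays defined.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when -x is representable in int.  */
bool
is_safely_negatable (int x)
{
  return x != INT_MIN;
}

/* Hypothetical kernel.  -x is undefined for x == INT_MIN without -fwrapv.
   Right-shifting a negative value is implementation-defined in ISO C;
   GCC documents it as an arithmetic shift.  */
int
negate_then_shift (int x)
{
  return (-x) >> (int) (sizeof (int) * CHAR_BIT - 1);
}
```

With INT_MIN+1 as the smallest input, every negation is in range, so only the right shift of a negative value still depends on the implementation.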

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?


Ok


As my patch (r14-10592) that was adding -fwrapv also got backported to 
releases/gcc-14, I assume that this patch should also be backported.


Do you want me to do the backport or will you manage it?

Kind regards,
Torbjörn




Thanks,
Tamar

gcc/testsuite/ChangeLog:

* gcc.dg/signbit-5.c: Remove -fwrapv and change INT_MIN to INT_MIN+1.

---
diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c
index 2bca640f930b7d1799e995e86152a6d8d05ec2a0..e778f91ca33010029419b035cbb31eb742345c84 100644
--- a/gcc/testsuite/gcc.dg/signbit-5.c
+++ b/gcc/testsuite/gcc.dg/signbit-5.c
@@ -1,5 +1,5 @@
/* { dg-do run } */
-/* { dg-options "-O3 -fwrapv" } */
+/* { dg-options "-O3" } */

/* This test does not work when the truth type does not match vector type.  */
/* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
@@ -44,8 +44,8 @@ int main ()
   TYPE a[N];
   TYPE b[N];

-  a[0] = INT_MIN;
-  b[0] = INT_MIN;
+  a[0] = INT_MIN+1;
+  b[0] = INT_MIN+1;

   for (int i = 1; i < N; ++i)
 {




--

RE: [PATCH] Match: Fix ordered and nonequal

2024-09-04 Thread Hu, Lin1

I typed Hongtao's e-mail address wrong.

> -----Original Message-----
> From: Hu, Lin1 
> Sent: Wednesday, September 4, 2024 1:44 PM
> To: gcc-patches@gcc.gnu.org
> Cc: hontao@intel.com; ubiz...@gmail.com; rguent...@suse.de;
> ja...@redhat.com; pins...@gmail.com
> Subject: [PATCH] Match: Fix ordered and nonequal
> 
> Hi, all
> 
> This patch is a fix.
> 
> bit_and needs :c because it is commutative.  Also, (ltgt @0 @1) is simpler
> than (bit_not (uneq @0 @1)).
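
The equivalence behind this simplification can be checked at the C level. The sketch below is only an illustration with hypothetical function names; it uses the standard `<math.h>` macros `isunordered` and `islessgreater` as stand-ins for the `ordered`, `uneq` and `ltgt` tree codes, which is an assumption about how they correspond.

```c
#include <math.h>
#include <stdbool.h>

/* The pattern being matched: ordered (a, b) & (a != b).  */
bool
ordered_and_nonequal (float a, float b)
{
  return !isunordered (a, b) && a != b;
}

/* The old replacement: ~uneq (a, b), i.e. not (unordered or equal).  */
bool
not_unordered_or_equal (float a, float b)
{
  return !(isunordered (a, b) || a == b);
}

/* The new replacement: ltgt (a, b), i.e. a < b or a > b.  */
bool
less_or_greater (float a, float b)
{
  return islessgreater (a, b);
}
```

All three functions agree for ordered, equal and NaN operands, which is why the single `ltgt` form can replace the negated `uneq`.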
> 
> Bootstrapped/regtested on x86-64-pc-linux-gnu, OK for trunk?
> 
> BRs,
> Lin
> 
> gcc/ChangeLog:
> 
>   * match.pd: Fix match for (bit_and (ordered @0 @1) (ne @0 @1)).
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/opt-ordered-and-nonequal-1.c: New test.
>   * gcc.target/i386/optimize_one.c: Change name to opt-comi-1.c.
>   * gcc.target/i386/opt-comi-1.c: New test.
> ---
>  gcc/match.pd  |  4 +-
>  .../gcc.dg/opt-ordered-and-nonequal-1.c   | 49 +++
>  gcc/testsuite/gcc.target/i386/opt-comi-1.c| 49 +++
>  gcc/testsuite/gcc.target/i386/optimize_one.c  |  9 
>  4 files changed, 100 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/opt-comi-1.c
>  delete mode 100644 gcc/testsuite/gcc.target/i386/optimize_one.c
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4298e89dad6..621306213e4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6652,8 +6652,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (!flag_trapping_math || !tree_expr_maybe_nan_p (@0))
>    { constant_boolean_node (false, type); }))
>  (simplify
> - (bit_and (ordered @0 @1) (ne @0 @1))
> - (bit_not (uneq @0 @1)))
> + (bit_and:c (ordered @0 @1) (ne @0 @1))
> + (ltgt @0 @1))
> 
>  /* x == ~x -> false */
>  /* x != ~x -> true */
> diff --git a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
> new file mode 100644
> index 000..6d102c2bd0c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> +
> +int is_ordered_and_nonequal_sh_1 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && (a != b); }
> +
> +int is_ordered_and_nonequal_sh_2 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && (b != a); }
> +
> +int is_ordered_and_nonequal_sh_3 (float a, float b) {
> +  return (b != a) && !__builtin_isunordered (a, b); }
> +
> +int is_ordered_and_nonequal_sh_4 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && !(a == b); }
> +
> +int is_ordered_and_nonequal_sh_5 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && !(b == a); }
> +
> +int is_ordered_and_nonequal_sh_6 (float a, float b) {
> +  return !(b == a) && !__builtin_isunordered (a, b); }
> +
> +int is_ordered_or_nonequal_sh_7 (float a, float b) {
> +  return !(__builtin_isunordered (a, b) || (a == b)); }
> +
> +int is_ordered_or_nonequal_sh_8 (float a, float b) {
> +  return !(__builtin_isunordered (a, b) || (b == a)); }
> +
> +int is_ordered_or_nonequal_sh_9 (float a, float b) {
> +  return !((a == b) || __builtin_isunordered (b, a)); }
> +
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\n\r]*<>" 9 "forwprop1" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/opt-comi-1.c b/gcc/testsuite/gcc.target/i386/opt-comi-1.c
> new file mode 100644
> index 000..fc7b8632004
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/opt-comi-1.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mfpmath=sse -msse2" } */
> +/* { dg-final { scan-assembler-times "comiss" 9 } } */
> +/* { dg-final { scan-assembler-times "set" 9 } } */
> +
> +int is_ordered_and_nonequal_sh_1 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && (a != b); }
> +
> +int is_ordered_and_nonequal_sh_2 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && (b != a); }
> +
> +int is_ordered_and_nonequal_sh_3 (float a, float b) {
> +  return (b != a) && !__builtin_isunordered (a, b); }
> +
> +int is_ordered_and_nonequal_sh_4 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && !(a == b); }
> +
> +int is_ordered_and_nonequal_sh_5 (float a, float b) {
> +  return !__builtin_isunordered (a, b) && !(b == a); }
> +
> +int is_ordered_and_nonequal_sh_6 (float a, float b) {
> +  return !(b == a) && !__builtin_isunordered (a, b); }
> +
> +int is_ordered_or_nonequal_sh_7 (float a, float b) {
> +  return !(__builtin_isunordered (a, b) || (a == b)); }
> +
> +int is_ordered_or_nonequal_sh_8 (float a, float b) {
> +  return !(__builtin_isunordered (a, b) || (b == a)); }
> +
> +int is_ordered_or_nonequal_sh_9 (float a, float b) {
> +  return !((a == b) || __builtin_isunordered (b, a)); }
> diff --git a/gcc/testsuite/gcc.target/i386/optimize_one.c
> b/gcc/tests

[PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread pan2 . li

From: Pan Li 

Since day 1, gen_phi_on_cond has supported only the following control flow
for cond:

+------+
| def  |
| ...  |   +-----+
| cond |-->| def |
+------+   | ... |
   |       +-----+
   |          |
   v          |
+-----+       |
| PHI |<------+
+-----+

Unfortunately, a PHI can also result from other control flows.
For example:

#define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
T __attribute__((noinline))                            \
sat_s_add_##T##_fmt_3 (T x, T y)                       \
{                                                      \
  T sum;                                               \
  bool overflow = __builtin_add_overflow (x, y, &sum); \
  return overflow ? x < 0 ? MIN : MAX : sum;           \
}

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The resulting GIMPLE looks like this:

__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  signed char _1;
  signed char _2;
  int8_t _3;
  __complex__ signed char _6;
  _Bool _8;
  signed char _9;
  signed char _10;
  signed char _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = REALPART_EXPR <_6>;
  goto <bb 5>; [100.00%]
;;    succ:       5

;;   basic block 4, loop depth 0
;;    pred:       2
  _8 = x_4(D) < 0;
  _9 = (signed char) _8;
  _10 = -_9;
  _11 = _10 ^ 127;
;;    succ:       5

;;   basic block 5, loop depth 0
;;    pred:       3
;;                4
  # _3 = PHI <_1(3), _11(4)>
  return _3;
;;    succ:       EXIT

}

The above code has the control flow below, which gen_phi_on_cond does not
support.

+------+
| def  |
| ...  |   +-----+
| cond |-->| def |
+------+   | ... |
   |       +-----+
   |          |
   v          |
+-----+       |
| def |       |
| ... |       |
+-----+       |
   |          |
   |          |
   v          |
+-----+       |
| PHI |<------+
+-----+

This patch adds support for the above control flow to
gen_phi_on_cond.
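
The two shapes can be told apart with a small predecessor check. The sketch below is a simplified model, not the code genmatch emits: `struct block`, `phi_case_1` and `phi_case_2` are hypothetical names, and the second check deliberately requires both PHI predecessors to hang off the same conditional block, which is stricter than a generated matcher needs to be.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block: up to two predecessors, a
   successor count, and a flag set when the block ends in a conditional
   jump.  */
struct block
{
  const struct block *preds[2];
  int n_preds;
  int n_succs;
  bool ends_in_cond;
};

/* First shape: one PHI predecessor ends in the condition, the other is a
   single block reached only from it.  */
bool
phi_case_1 (const struct block *phi_bb)
{
  if (phi_bb->n_preds != 2)
    return false;
  const struct block *p0 = phi_bb->preds[0];
  const struct block *p1 = phi_bb->preds[1];
  const struct block *cond_bb = p0->ends_in_cond ? p0 : p1;
  const struct block *other = p0->ends_in_cond ? p1 : p0;
  return cond_bb->ends_in_cond
	 && other->n_preds == 1 && other->n_succs == 1
	 && other->preds[0] == cond_bb;
}

/* Second shape: both PHI predecessors are single-entry, single-exit blocks
   reached from the same block, and that block ends in the condition.  */
bool
phi_case_2 (const struct block *phi_bb)
{
  if (phi_bb->n_preds != 2)
    return false;
  const struct block *p0 = phi_bb->preds[0];
  const struct block *p1 = phi_bb->preds[1];
  if (p0->n_preds != 1 || p0->n_succs != 1
      || p1->n_preds != 1 || p1->n_succs != 1)
    return false;
  return p0->preds[0] == p1->preds[0] && p0->preds[0]->ends_in_cond;
}
```

For the dump above, basic block 5 is the PHI block, blocks 3 and 4 are its predecessors, and block 2 ends in the condition, so only the second check succeeds.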

This patch passes the following test suites:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* genmatch.cc (dt_operand::gen_phi_on_cond): Add support for
a new control flow when gen phi on condition.

Signed-off-by: Pan Li 
---
 gcc/genmatch.cc | 85 +++--
 1 file changed, 76 insertions(+), 9 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a56bd90cb2c..f538df1be62 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3529,28 +3529,95 @@ dt_operand::gen_phi_on_cond (FILE *f, int indent, int depth)
 "basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
   fprintf_indent (f, indent,
 "basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
+
   fprintf_indent (f, indent,
-"basic_block _db_%d = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_%d)) ? "
-"_pb_0_%d : _pb_1_%d;\n", depth, depth, depth, depth);
+"gcond *_ct_0_%d = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_%d));\n",
+depth, depth);
   fprintf_indent (f, indent,
-"basic_block _other_db_%d = safe_dyn_cast <gcond *> "
-"(*gsi_last_bb (_pb_0_%d)) ? _pb_1_%d : _pb_0_%d;\n",
+"gcond *_ct_1_%d = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_1_%d));\n",
+depth, depth);
+  fprintf_indent (f, indent,
+"gcond *_ct_a_%d = _ct_0_%d ? _ct_0_%d : _ct_1_%d;\n",
+depth, depth, depth, depth);
+  fprintf_indent (f, indent,
+"basic_block _db_%d = _ct_0_%d ? _pb_0_%d : _pb_1_%d;\n",
+depth, depth, depth, depth);
+  fprintf_indent (f, indent,
+"basic_block _other_db_%d = _ct_0_%d ? _pb_1_%d : _pb_0_%d;\n",
 depth, depth, depth, depth);
 
   fprintf_indent (f, indent,
-"gcond *_ct_%d = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_%d));\n",
-depth, depth);
-  fprintf_indent (f, indent, "if (_ct_%d"
+"edge _e_00_%d = _pb_0_%d->preds ? EDGE_PRED (_pb_0_%d, 0) : NULL;\n",
+depth, depth, depth);
+  fprintf_indent (f, indent,
+"basic_block _pb_00_%d = _e_00_%d ? _e_00_%d->src : NULL;\n",
+depth, depth, depth);
+  fprintf_indent (f, indent,
+"gcond *_ct_b_%d = _pb_00_%d ? "
+"safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_00_%d)) : NULL;\n",
+depth, depth, depth);
+
+  /* Case 1 flow for PHI.
+   * +------+
+   * | def  |
+   * | ...  |   +-----+
+   * | cond |-->| def |
+   * +------+   | ... |
+   *

[PATCH v1 2/2] Match: Support form 3 for scalar signed integer .SAT_ADD

2024-09-04 Thread pan2 . li

From: Pan Li 

This patch adds support for form 3 of the scalar signed integer
.SAT_ADD, as in the example below:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)           \
  T __attribute__((noinline))                            \
  sat_s_add_##T##_fmt_3 (T x, T y)                       \
  {                                                      \
    T sum;                                               \
    bool overflow = __builtin_add_overflow (x, y, &sum); \
    return overflow ? x < 0 ? MIN : MAX : sum;           \
  }

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
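
As a sanity check of what form 3 computes, the sketch below expands the macro by hand for int8_t and compares it against a widening reference. The function names are hypothetical and chosen for illustration; `sat_limit_int8` mirrors the branch-free `-(T)(X < 0) ^ MAX` selection that appears in the GIMPLE, and `__builtin_add_overflow` is the GCC/Clang builtin already used by the macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 3 expanded by hand for int8_t.  */
int8_t
sat_s_add_int8_form3 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : sum;
}

/* Branch-free limit selection: INT8_MIN when x < 0, INT8_MAX otherwise.  */
int8_t
sat_limit_int8 (int8_t x)
{
  return (int8_t) (-(int8_t) (x < 0) ^ INT8_MAX);
}

/* Reference: add in a wider type, then clamp.  */
int8_t
sat_s_add_int8_reference (int8_t x, int8_t y)
{
  int wide = (int) x + (int) y;
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return (int8_t) wide;
}

/* Count inputs where form 3 and the reference disagree.  */
int
count_form3_mismatches (void)
{
  int mismatches = 0;
  for (int x = INT8_MIN; x <= INT8_MAX; ++x)
    for (int y = INT8_MIN; y <= INT8_MAX; ++y)
      if (sat_s_add_int8_form3 ((int8_t) x, (int8_t) y)
	  != sat_s_add_int8_reference ((int8_t) x, (int8_t) y))
	++mismatches;
  return mismatches;
}
```

When the addition overflows, both operands have the same sign, so testing the sign of x alone is enough to pick the saturation limit.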

The difference before and after this patch is visible if the backend
implements the ssadd3 pattern, as shown below.

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  signed char _1;
  signed char _2;
  int8_t _3;
  __complex__ signed char _6;
  _Bool _8;
  signed char _9;
  signed char _10;
  signed char _11;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto <bb 4>; [50.00%]
  else
    goto <bb 3>; [50.00%]
;;    succ:       4
;;                3

;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = REALPART_EXPR <_6>;
  goto <bb 5>; [100.00%]
;;    succ:       5

;;   basic block 4, loop depth 0
;;    pred:       2
  _8 = x_4(D) < 0;
  _9 = (signed char) _8;
  _10 = -_9;
  _11 = _10 ^ 127;
;;    succ:       5

;;   basic block 5, loop depth 0
;;    pred:       3
;;                4
  # _3 = PHI <_1(3), _11(4)>
  return _3;
;;    succ:       EXIT

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  int8_t _3;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _3 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _3;
;;    succ:       EXIT

}

This patch passes the following test suites:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add the form 3 of signed .SAT_ADD matching.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 1372f2ba377..1218abcd01a 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3222,6 +3222,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
   && types_match (type, @0, @1))))
 
+/* Signed saturation add, case 3:
+   Z = .ADD_OVERFLOW (X, Y)
+   SAT_S_ADD = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : sum;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1))))
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
-- 
2.43.0

Re: [PATCH 1/4]middle-end: have vect_recog_cond_store_pattern use pattern statement for cond if available

2024-09-04 Thread Richard Biener

On Tue, 3 Sep 2024, Tamar Christina wrote:

> Hi All,
> 
> When vectorizing a conditional operation we rely on the bool_recog pattern to
> hit and convert the bool of the operand to a valid mask.
> 
> However we are currently not using the converted operand as this is in a
> pattern statement.  This change updates it to look at the actual statement to
> be vectorized so we pick up the pattern.
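
For readers unfamiliar with the pattern, the sketch below shows the kind of scalar loop a conditional-store pattern is concerned with. It is a hypothetical illustration, not one of the vect-conditional_store tests: `cond_store` stores only where the guard holds, and `cond_store_select` is a model of the result a masked store has to produce, written as a select for comparison.

```c
#include <stddef.h>

/* Conditional store: dst[i] is written only when the guard holds.  */
void
cond_store (int *dst, const int *src, const int *guard, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (guard[i] > 0)
      dst[i] = src[i];
}

/* Model of the masked result: same final values, but every element is
   written, which a real masked store avoids.  */
void
cond_store_select (int *dst, const int *src, const int *guard, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = guard[i] > 0 ? src[i] : dst[i];
}
```

The comparison `guard[i] > 0` is the boolean that has to become a valid vector mask before the store can be vectorized.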
> 
> Note that there are no tests here since vectorization will fail until we
> correctly lower all boolean conditionals early.
> 
> Tests for these are in the next patch, namely vect-conditional_store_5.c and
> vect-conditional_store_6.c.  And the existing vect-conditional_store_[1-4].c
> checks that the other cases are still handled correctly.
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf,
> x86_64-pc-linux-gnu -m32, -m64 with no issues.
> 
> Ok for master?

OK.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * tree-vect-patterns.cc (vect_recog_cond_store_pattern): Use pattern
>   statement.
> 
> ---
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index f52de2b6972dc0b8f63f812b64c60e9414962743..4b112910df357e9f2783f7173b71812085126389 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6601,7 +6601,15 @@ vect_recog_cond_store_pattern (vec_info *vinfo,
>    if (TREE_CODE (st_rhs) != SSA_NAME)
>      return NULL;
>  
> -  gassign *cond_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (st_rhs));
> +  auto cond_vinfo = vinfo->lookup_def (st_rhs);
> +
> +  /* If the condition isn't part of the loop then bool recog wouldn't have seen
> +     it and so this transformation may not be valid.  */
> +  if (!cond_vinfo)
> +    return NULL;
> +
> +  cond_vinfo = vect_stmt_to_vectorize (cond_vinfo);
> +  gassign *cond_stmt = dyn_cast <gassign *> (STMT_VINFO_STMT (cond_vinfo));
>    if (!cond_stmt || gimple_assign_rhs_code (cond_stmt) != COND_EXPR)
>      return NULL;
>  
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Re: [PATCH] coros: mark .CO_YIELD as LEAF [PR106973]

2024-09-04 Thread Richard Biener

On Tue, Sep 3, 2024 at 8:11 PM Arsen Arsenović  wrote:
>
> Tested on x86_64-pc-linux-gnu.  OK for trunk?

OK

> -- >8 --
> We rely on .CO_YIELD calls being followed by an assignment (optionally)
> and then a switch/if in the same basic block.  This implies that a
> .CO_YIELD can never end a block.  However, since a call to .CO_YIELD is
> still a call, if the function containing it calls setjmp, GCC thinks
> that the .CO_YIELD can introduce abnormal control flow, and generates an
> edge for the call.
>
> We know this is not the case; .CO_YIELD calls get removed quite early on
> and have no effect, and result in no other calls, so .CO_YIELD can be
> considered a leaf function, preventing generating an edge when calling
> it.
>
> PR c++/106973 - coroutine generator and setjmp
>
> PR c++/106973
>
> gcc/ChangeLog:
>
> * internal-fn.def (CO_YIELD): Mark as ECF_LEAF.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/coroutines/pr106973.C: New test.
> ---
>  gcc/internal-fn.def|  2 +-
>  gcc/testsuite/g++.dg/coroutines/pr106973.C | 22 ++
>  2 files changed, 23 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr106973.C
>
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 75b527b1ab0b..23b4ab02b300 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -569,7 +569,7 @@ DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
>
>  /* For coroutines.  */
>  DEF_INTERNAL_FN (CO_ACTOR, ECF_NOTHROW | ECF_LEAF, NULL)
> -DEF_INTERNAL_FN (CO_YIELD, ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (CO_YIELD, ECF_NOTHROW | ECF_LEAF, NULL)
>  DEF_INTERNAL_FN (CO_SUSPN, ECF_NOTHROW, NULL)
>  DEF_INTERNAL_FN (CO_FRAME, ECF_PURE | ECF_NOTHROW | ECF_LEAF, NULL)
>
> diff --git a/gcc/testsuite/g++.dg/coroutines/pr106973.C b/gcc/testsuite/g++.dg/coroutines/pr106973.C
> new file mode 100644
> index ..6db6cbc7711a
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/pr106973.C
> @@ -0,0 +1,22 @@
> +// https://gcc.gnu.org/PR106973
> +// { dg-require-effective-target indirect_jumps }
> +#include <coroutine>
> +#include <setjmp.h>
> +
> +struct generator;
> +struct generator_promise {
> +  generator get_return_object();
> +  std::suspend_always initial_suspend();
> +  std::suspend_always final_suspend() noexcept;
> +  std::suspend_always yield_value(int);
> +  void unhandled_exception();
> +};
> +
> +struct generator {
> +  using promise_type = generator_promise;
> +};
> +jmp_buf foo_env;
> +generator foo() {
> +  setjmp(foo_env);
> +  co_yield 1;
> +}
> --
> 2.46.0
>

Re: [PING] [PATCH] rust: avoid clobbering LIBS

2024-09-04 Thread Richard Biener

On Tue, Sep 3, 2024 at 8:42 PM Marc  wrote:
>
> Richard Biener  writes:
>
> > On Wed, Aug 28, 2024 at 11:10 AM Marc  wrote:
> >>
> >> Hello,
> >>
> >> Gentle reminder for this simple autoconf patch :)
> >
> > OK.
> >
> > Note that completely wiping LIBS might remove requirements detected earlier,
> > like some systems require explicit -lc for example.  I would instead not
> > clear LIBS here and instead allow the possible duplicates through CRAB_LIBS.
> > YMMV of course.
>
> Oh, that's a good remark. I've simply followed this suggestion that was
> given on #gcc and also took inspiration from gcc/configure.ac that has
> many instances of clearing LIBS like that. I think I'll merge it like
> that, unless you see any reason this pattern would cause issue here (top
> level) and not in gcc/configure.

If it's done like this elsewhere then it's good to follow existing
practice indeed.

Richard.

> Thank you,
> Marc
>
>

Re: [PATCH] object-size: Use simple_dce_from_worklist in object-size pass

2024-09-04 Thread Richard Biener

On Wed, Sep 4, 2024 at 1:58 AM Andrew Pinski  wrote:
>
> While trying to see if there was a way to improve object-size pass
> to use the ranger (for pointer plus), I noticed that it leaves around
> the statement containing __builtin_object_size if it was reduced to a
> constant.  This fixes that by using simple_dce_from_worklist.
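
As an illustration of a call that can be reduced to a constant, consider the sketch below. `local_buffer_size` is a hypothetical function; for a local array of known size GCC can typically fold the builtin to that size, although the exact result depends on the compiler and optimization level, so no specific value is assumed here beyond what the builtin documents (the size, or `(size_t) -1` when it is unknown).

```c
#include <stddef.h>

/* Hypothetical example: the object is a local array of 16 bytes, so the
   builtin can be folded to a constant.  */
size_t
local_buffer_size (void)
{
  char buf[16];
  buf[0] = 0;
  return __builtin_object_size (buf, 0);
}
```

Once all uses of the folded result have been replaced by the constant, the statement that computed it no longer has any users and can be removed.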
>
> Bootstrapped and tested on x86_64-linux-gnu.

OK.

> gcc/ChangeLog:
>
> * tree-object-size.cc (object_sizes_execute): Mark lhs for maybe dceing
> if doing a propagate. Call simple_dce_from_worklist.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/tree-object-size.cc | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
> index 4c1fa9b555f..6544730e153 100644
> --- a/gcc/tree-object-size.cc
> +++ b/gcc/tree-object-size.cc
> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "builtins.h"
>  #include "gimplify-me.h"
>  #include "gimplify.h"
> +#include "tree-ssa-dce.h"
>
>  struct object_size_info
>  {
> @@ -2187,6 +2188,7 @@ static unsigned int
>  object_sizes_execute (function *fun, bool early)
>  {
>todo = 0;
> +  auto_bitmap sdce_worklist;
>
>basic_block bb;
>FOR_EACH_BB_FN (bb, fun)
> @@ -2277,13 +2279,18 @@ object_sizes_execute (function *fun, bool early)
>
>   /* Propagate into all uses and fold those stmts.  */
>   if (!SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
> -   replace_uses_by (lhs, result);
> +   {
> + replace_uses_by (lhs, result);
> + /* Mark lhs as possibly being DCEd. */
> + bitmap_set_bit (sdce_worklist, SSA_NAME_VERSION (lhs));
> +   }
>   else
> replace_call_with_value (&i, result);
> }
>  }
>
>fini_object_sizes ();
> +  simple_dce_from_worklist (sdce_worklist);
>return todo;
>  }
>
> --
> 2.43.0
>

Re: [PATCH] Match: Fix ordered and nonequal

2024-09-04 Thread Richard Biener

On Wed, Sep 4, 2024 at 9:15 AM Hu, Lin1  wrote:
>
> I typed Hongtao's e-mail address wrong.
>
> > -----Original Message-----
> > From: Hu, Lin1 
> > Sent: Wednesday, September 4, 2024 1:44 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: hontao@intel.com; ubiz...@gmail.com; rguent...@suse.de;
> > ja...@redhat.com; pins...@gmail.com
> > Subject: [PATCH] Match: Fix ordered and nonequal
> >
> > Hi, all
> >
> > This patch is a fix.
> >
> > bit_and needs :c because it is commutative.  Also, (ltgt @0 @1) is simpler
> > than (bit_not (uneq @0 @1)).
> >
> > Bootstrapped/regtested on x86-64-pc-linux-gnu, OK for trunk?

OK


Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread Richard Biener

On Wed, Sep 4, 2024 at 9:25 AM  wrote:

I'm lazy - can you please quote genmatch generated code for the condition for
one case?

Thanks,
Richard.


RE: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread Li, Pan2

> I'm lazy - can you please quote genmatch generated code for the condition for
> one case?

Sure thing. The before and after below cover all the changes to the generated
code.

Before this patch:
  basic_block _b1 = gimple_bb (_a1);
  if (gimple_phi_num_args (_a1) == 2)
    {
      basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
      basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
      basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
      basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
      gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
      if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
          && EDGE_COUNT (_other_db_1->succs) == 1
          && EDGE_PRED (_other_db_1, 0)->src == _db_1)
        {
          tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
          tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
          tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
          bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
          tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
          tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);
          switch (TREE_CODE (_p0))
            {

After this patch:
      basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
      basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
      gcond *_ct_0_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1));
      gcond *_ct_1_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_1_1));
      gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1;
      basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1;
      basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1;
      edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : NULL;
      basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL;
      gcond *_ct_b_1 = _pb_00_1 ? safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_00_1)) : NULL;
      if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1
           && EDGE_COUNT (_other_db_1->succs) == 1
           && EDGE_PRED (_other_db_1, 0)->src == _db_1)
          ||
          (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1
           && EDGE_COUNT (_pb_0_1->preds) == 1
           && EDGE_COUNT (_other_db_1->preds) == 1
           && EDGE_COUNT (_other_db_1->succs) == 1
           && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1))
        {
          gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1;
          tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
          tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
          tree _p0 = build2 (gimple_cond_code (_ct_1), boolean_type_node, _cond_lhs_1, _cond_rhs_1);
          bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 0)->flags & EDGE_TRUE_VALUE;
          tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 0 : 1);
          tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 1 : 0);

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, September 4, 2024 3:42 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

On Wed, Sep 4, 2024 at 9:25 AM  wrote:
>
> From: Pan Li 
>
> The gen_phi_on_cond can only support below control flow for cond
> from day 1.  Aka:
>
> +--+
> | def  |
> | ...  |   +-+
> | cond |-->| def |
> +--+   | ... |
>|   +-+
>|  |
>v  |
> +-+   |
> | PHI |<--+
> +-+
>
> Unfortunately, there will be more scenarios of control flow on PHI.
> For example as below:
>
> T __attribute__((noinline))\
> sat_s_add_##T##_fmt_3 (T x, T y)   \
> {  \
>   T sum;   \
>   bool overflow = __builtin_add_overflow (x, y, &sum); \
>   return overflow ? x < 0 ? MIN : MAX : sum;   \
> }
>
> DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)
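
For readers following along, the instantiation quoted above can be written out by hand roughly as follows. This is only a sketch of what the macro expands to for int8_t; it relies on the GCC/Clang builtin `__builtin_add_overflow` used in the quoted code, and the parentheses around the inner conditional are added here for readability.

```cpp
#include <cstdint>

// Hand expansion of DEF_SAT_S_ADD_FMT_3 (int8_t, uint8_t, INT8_MIN, INT8_MAX).
__attribute__((noinline)) int8_t
sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  int8_t sum;
  // True when x + y does not fit in int8_t; 'sum' then holds the wrapped value.
  bool overflow = __builtin_add_overflow (x, y, &sum);
  // Signed addition can only overflow when both operands have the same sign,
  // so the sign of x alone selects the saturation bound.
  return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : sum;
}
```

The nested conditional is what produces a PHI whose condition sits one block further away than in the single-condition shape drawn above.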
>
> With expanded RTL like below.
>3   │
>4   │ __attribute__((noinline))
>5   │ int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
>6   │ {
>7   │   signed char _1;
>8   │   signed char _2;
>9   │   int8_t _3;
>   10   │   __complex__ signed char _6;
>   11   │   _Bool _8;
>   12   │   sign

RE: [r15-3359 Regression] FAIL: gcc.target/i386/avx10_2-bf-vector-cmpp-1.c (test for excess errors) on Linux/x86_64

2024-09-04 Thread Jiang, Haochen

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, September 3, 2024 2:40 PM
> 
> On Tue, Sep 3, 2024 at 7:36 AM Jiang, Haochen 
> wrote:
> >
> >
> >
> > > From: Hongtao Liu 
> > > Sent: Tuesday, September 3, 2024 1:47 PM
> > >
> > > On Tue, Sep 3, 2024 at 9:45 AM Jiang, Haochen via Gcc-regression
> > >  wrote:
> > > >
> > > > As each AVX10.2 testcases previously, this is caused by option
> combination
> > > warning,
> > > > which is expected.
> > > >
> > > Can we put the warning for mix usage of mavx10 and -mavx512f under -
> > > Wpsabi
> > > And add -Wno-psabi in addition to -march=cascadelake to avoid the
> > > false positive?
> >
> > We could do that if nobody has objection to that.
> 
> But mixing both doesn't do anything to the ABI so -Wpsabi sounds like the
> wrong bucket to me.  Instead we have to solve the issue at hand - I would
> expect users to run into this warning as well if we do within our testsuite?

If we can bear that "false positive", I suppose it is ok.

I will change -march=cascadelake to a future CPU that contains AVX10.2,
once that is doable, to eliminate them.

Thx,
Haochen

> 
> Richard.
> 
> > Thx,
> > Haochen
> >
> > >
> > > --
> > > BR,
> > > Hongtao

[PATCH] SVE intrinsics: Fold svdiv with all-zero operands to zero vector

2024-09-04 Thread Jennifer Schmitz

This patch folds svdiv where one of the operands is all-zeros to a zero
vector, if the predicate is ptrue or the predication is _x or _z.
This case was not covered by the recent patch that implemented constant
folding, because that covered only cases where both operands are
constant vectors. Here, the operation is folded as soon as one of the operands
is a constant zero vector.
Folding of division by 0 to return 0 is in accordance with
the semantics of sdiv and udiv.
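
As an illustration of why the fold preserves values, here is a scalar model of a single active lane. It is a sketch only: `sdiv_lane` and `folds_to_zero` are hypothetical helpers written for this note, not part of the patch or of the SVE intrinsics, and the model assumes the AArch64 SDIV behaviour of returning 0 for a zero divisor.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar model of one active lane of a signed divide.
// Assumption: mirrors AArch64 SDIV, where x / 0 == 0 and INT32_MIN / -1
// wraps to INT32_MIN instead of trapping.
int32_t
sdiv_lane (int32_t a, int32_t b)
{
  if (b == 0)
    return 0;
  if (a == std::numeric_limits<int32_t>::min () && b == -1)
    return a;
  return a / b;
}

// The fold described above, per lane: if either operand is known to be
// zero, the result is zero regardless of the other operand.
bool
folds_to_zero (int32_t a, int32_t b)
{
  return a == 0 || b == 0;
}
```

The model covers active lanes only; what inactive lanes hold depends on the predication mode, which is presumably why the fold is restricted as described above.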

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Add folding of all-zero operands to zero vector.

gcc/testsuite/
* gcc.target/aarch64/sve/fold_div_zero.c: New test.
* gcc.target/aarch64/sve/const_fold_div_1.c: Adjust expected
outcome.


0001-SVE-intrinsics-Fold-svdiv-with-all-zero-operands-to-.patch
Description: Binary data



Re: Zen5 tuning part 2: disable gather and scatter

2024-09-04 Thread Toon Moene


On 9/3/24 15:07, Jan Hubicka wrote:


Hi,
We disable gathers for zen4.  It seems that gather has improved a bit compared
to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when
the indices are known ahead of time. Vector loads followed by shuffles result
in a higher load bandwidth." however the situation seems to be more
complicated.


A small bit of "real world" experience (but for zen3):

Recently I switched to gfortran 14.2 for my weather forecasting.
A year ago I had changed "-march=native -mtune=native" (on my zen3 
system) to "-march=native -mtune=znver2" while using gfortran 13 - it 
had only a small effect (but positive).


Last Monday I switched back to "-march=native -mtune=native", but that 
consistently made a 12 hour computation around 6 minutes slower (i.e., 
about 1/120th, or 0.8 %). The most computationally intensive part of the 
code needs gather (either instructions or inline expansions of them).


Hope this helps,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands

Re: [nvptx] Fix code-gen for alias attribute

2024-09-04 Thread Thomas Schwinge

Hi!

Honza (or others, of course), there's a question about
'ultimate_alias_target'.

On 2024-08-26T10:50:36+, Prathamesh Kulkarni  wrote:
> For the following test (adapted from pr96390.c):
>
> __attribute__((noipa)) int foo () { return 42; }
> int bar () __attribute__((alias ("foo")));
> int baz () __attribute__((alias ("bar")));

> Compiling [for nvptx] results in:
>
> ptxas fatal   : Internal error: alias to unknown symbol
> nvptx-as: ptxas returned 255 exit status

Prathamesh: thanks for looking into this, and ACK: one of the many
limitations of PTX '.alias'.  :-|

> This happens because ptx code-gen shows:
>
> // BEGIN GLOBAL FUNCTION DEF: foo
> .visible .func (.param.u32 %value_out) foo
> {
>   [...]
> }
> .visible .func (.param.u32 %value_out) bar;
> .alias bar,foo;
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,bar;

> .alias baz, bar is invalid since PTX requires aliasee to be a defined 
> function:
> https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-execution/latest-internal/#kernel-and-function-directives-alias

(Us ordinary mortals need to look at
;
please update the Git commit log.)

> The patch uses cgraph_node::get(name)->ultimate_alias_target () instead of 
> the provided value in nvptx_asm_output_def_from_decls.

I confirm that resolving to 'ultimate_alias_target' does work for this
case:

> For the above case, it now generates the following ptx:
>
> .alias baz,foo; 
> instead of:
> .alias baz,bar;
>
> which fixes the issue.

..., but I'm not sure if that's conceptually correct; I'm not familiar
with 'ultimate_alias_target' semantics.  (Honza?)
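
To make the proposed resolution concrete, here is a toy model of following an alias chain to its final definition. It is a sketch under stated assumptions: the `alias_map` table and both helper functions are made up for this note and are not GCC's symbol table or the `cgraph_node` API; only the emitted '.alias' strings are taken from the quoted PTX.

```cpp
#include <map>
#include <string>

// Toy alias table: alias name -> the name it was declared an alias of.
// Models only the chain baz -> bar -> foo from the test case above.
using alias_map = std::map<std::string, std::string>;

// Follow the chain until reaching a name that is not itself an alias.
// Assumes the table contains no cycles.
std::string
ultimate_target (const alias_map &aliases, std::string name)
{
  for (auto it = aliases.find (name); it != aliases.end ();
       it = aliases.find (name))
    name = it->second;
  return name;
}

// Render the PTX directive for one alias, pointing it at the resolved
// definition instead of at the directly named aliasee.
std::string
emit_alias (const alias_map &aliases, const std::string &alias)
{
  return ".alias " + alias + "," + ultimate_target (aliases, alias) + ";";
}
```

With the table {bar -> foo, baz -> bar}, `emit_alias` for 'baz' yields '.alias baz,foo;', matching the output quoted above. The toy model deliberately ignores weak aliases.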

Also, I wonder whether 'gcc/varasm.cc:do_assemble_alias' is prepared for
'ASM_OUTPUT_DEF_FROM_DECLS' to disregard the specified 'target'/'value'
and instead do its own thing (here, the proposed resolving to
'ultimate_alias_target')?  (No other GCC back end appears to be doing
such a thing; from a quick look, all appear to faithfully use the
specified 'target'/'value'.)

Now, consider the case that the source code is changed as follows:

 __attribute__((noipa)) int foo () { return 42; }
-int bar () __attribute__((alias ("foo")));
+int bar () __attribute__((weak, alias ("foo")));
 int baz () __attribute__((alias ("bar")));

With 'ultimate_alias_target', I've checked, you'd then still emit
'.alias baz,foo;', losing the ability to override the weak alias with a
strong 'bar' definition in another compilation unit?

Now, that said: GCC/nvptx for such code currently diagnoses
"error: weak alias definitions not supported [...]" ;-| -- so we may be
safe, after all?  ..., or is there any other way that the resolving to
'ultimate_alias_target' might cause issues?  If not, then at least your
proposed patch shouldn't be causing any harm (doesn't affect
'--target=nvptx-none' test results at all...), and does address one
user-visible issue ('libgomp.c-c++-common/pr96390.c'), and thus makes
sense to install.

> [nvptx] Fix code-gen for alias attribute.

I'd rather suggest something like:
"[nvptx] (Some) support for aliases to aliases" (or similar).

Also, please add "PR target/104957" to the Git commit log, as your change
directly alters this one aspect of PR104957
"[nvptx] Use .alias directive (available starting ptx isa version 6.3)"'s
commit r12-7766-gf8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3":

| Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
| This is currently not prohibited by the compiler, but with the driver link we
| run into:  "Internal error: alias to unknown symbol" .

... which we then have (some) support for with the proposed code changes:

> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -7583,7 +7583,8 @@ nvptx_mem_local_p (rtx mem)
>while (0)
>  
>  void
> -nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
> +nvptx_asm_output_def_from_decls (FILE *stream, tree name,
> +  tree value ATTRIBUTE_UNUSED)
>  {
>if (nvptx_alias == 0 || !TARGET_PTX_6_3)
>  {
> @@ -7618,7 +7619,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
> name, tree value)
>return;
>  }
>  
> -  if (!cgraph_node::get (name)->referred_to_p ())
> +  cgraph_node *cnode = cgraph_node::get (name);
> +  if (!cnode->referred_to_p ())
>  /* Prevent "Internal error: reference to deleted section".  */
>  return;
>  
> @@ -7627,8 +7629,10 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree 
> name, tree value)
>fputs (s.str ().c_str (), stream);
>  
>tree id = DECL_ASSEMBLER_NAME (name);
> +  symtab_node *alias_target_node = cnode->ultimate_alias_target ();
> +  tree alias_target_id = DECL_ASSEMBLER_NAME (alias_target_node->decl);
>NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
> - IDENTIFIER_POINTER (value));
> +

Add 'gcc.target/nvptx/alias-weak-1.c' (was: [nvptx] Fix code-gen for alias attribute)

2024-09-04 Thread Thomas Schwinge

Hi!

On 2024-09-04T11:45:20+0200, I wrote:
> +int bar () __attribute__((weak, alias ("foo")));

> Now, that said: GCC/nvptx for such code currently diagnoses
> "error: weak alias definitions not supported [...]" ;-|

Pushed to trunk branch commit 2267d254eb6ad782cef7b462f2bb2128bc8ace30
"Add 'gcc.target/nvptx/alias-weak-1.c'", see attached.


Grüße
 Thomas


>From 2267d254eb6ad782cef7b462f2bb2128bc8ace30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 4 Sep 2024 09:58:32 +0200
Subject: [PATCH] Add 'gcc.target/nvptx/alias-weak-1.c'

... testing for the GCC/nvptx "weak alias definitions not supported" error
diagnostic (limitation of PTX).

	gcc/testsuite/
	* gcc.target/nvptx/alias-weak-1.c: New.
---
 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c b/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c
new file mode 100644
index 000..37d9543fc7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-add-options ptx_alias } */
+
+void __f ()
+{
+}
+
+void f () __attribute__ ((weak, alias ("__f")));
+/* { dg-error {weak alias definitions not supported} {} { target *-*-* } .-1 }
+   (limitation of PTX).  */
-- 
2.34.1

Add 'gcc.target/nvptx/alias-to-alias-1.c' (was: [nvptx] Fix code-gen for alias attribute)

2024-09-04 Thread Thomas Schwinge

Hi!

On 2024-09-04T11:45:20+0200, I wrote:
> On 2024-08-26T10:50:36+, Prathamesh Kulkarni  
> wrote:
>> For the following test (adapted from pr96390.c):
>>
>> __attribute__((noipa)) int foo () { return 42; }
>> int bar () __attribute__((alias ("foo")));
>> int baz () __attribute__((alias ("bar")));
>
>> Compiling [for nvptx] results in: [...]

> proposed patch [...] (doesn't affect
> '--target=nvptx-none' test results at all...)

Pushed to trunk branch commit a89321c890b96c583671b73fc802e87545e4a2b1
"Add 'gcc.target/nvptx/alias-to-alias-1.c'", see attached, which as part
of your proposed patch you'll then need to update as follows (or
similar):

--- gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
+++ gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
@@ -1,6 +1,8 @@
 /* Alias to alias; 'libgomp.c-c++-common/pr96390.c'.  */
 
-/* { dg-do compile } */
+/* { dg-do link } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options -save-temps } */
 /* { dg-add-options ptx_alias } */
 
 int v;
@@ -23,5 +25,6 @@ main (void)
 /* { dg-final { scan-assembler-times "\\.visible \\.func foo;" 1 } } */
 /* { dg-final { scan-assembler-times "\\.visible \\.func bar;" 1 } } */
 
-/* { dg-final { scan-assembler-times "\\.alias baz,bar;" 1 } } */
+/* Via 'ultimate_alias_target':
+   { dg-final { scan-assembler-times "\\.alias baz,foo;" 1 } } */
 /* { dg-final { scan-assembler-times "\\.visible \\.func baz;" 1 } } */


Grüße
 Thomas


>From a89321c890b96c583671b73fc802e87545e4a2b1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 4 Sep 2024 09:44:33 +0200
Subject: [PATCH] Add 'gcc.target/nvptx/alias-to-alias-1.c'

... similar to alias to alias usage in 'libgomp.c-c++-common/pr96390.c'.

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-to-alias-1.c: New.
---
 .../gcc.target/nvptx/alias-to-alias-1.c   | 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
new file mode 100644
index 000..3db79d1fc0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
@@ -0,0 +1,27 @@
+/* Alias to alias; 'libgomp.c-c++-common/pr96390.c'.  */
+
+/* { dg-do compile } */
+/* { dg-add-options ptx_alias } */
+
+int v;
+
+void foo () { v = 42; }
+void bar () __attribute__((alias ("foo")));
+void baz () __attribute__((alias ("bar")));
+
+int
+main (void)
+{
+  baz ();
+  if (v != 42)
+__builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "\\.alias bar,foo;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func foo;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func bar;" 1 } } */
+
+/* { dg-final { scan-assembler-times "\\.alias baz,bar;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func baz;" 1 } } */
-- 
2.34.1

[PING] Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN' (was: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2024-09-04 Thread Thomas Schwinge

Hi!

Ping.

On 2024-06-28T15:06:21+0200, I wrote:
> As part of this:
>
> On 2013-07-26T11:04:33-0400, David Malcolm  wrote:
>> This patch is the hand-written part of the conversion of passes from
>> C structs to C++ classes.
>
>> --- a/gcc/passes.c
>> +++ b/gcc/passes.c
>
> ..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to always refer
> to the first instance of 'PASS':
>
>>  #define PUSH_INSERT_PASSES_WITHIN(PASS) \
>>{ \
>> -struct opt_pass **p = &(PASS).pass.sub;
>> +struct opt_pass **p = &(PASS ## _1)->sub;
>
> ..., however we did change 'NEXT_PASS(PASS, NUM)' to actually use 'NUM':
>
>> -#define NEXT_PASS(PASS, NUM)  (p = next_pass_1 (p, &((PASS).pass)))
>> +#define NEXT_PASS(PASS, NUM) \
>> +  do { \
>> +gcc_assert (NULL == PASS ## _ ## NUM); \
>> +if ((NUM) == 1)  \
>> +  PASS ## _1 = make_##PASS (ctxt_);  \
>> +else \
>> +  {  \
>> +gcc_assert (PASS ## _1); \
>> +PASS ## _ ## NUM = PASS ## _1->clone (); \
>> +  }  \
>> +p = next_pass_1 (p, PASS ## _ ## NUM);  \
>> +  } while (0)
>
> This was never re-synchronized later on, and is problematic if you try to
> do something like this; change:
>
> [...]
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_postreload_cse);
> [...]
> NEXT_PASS (pass_cprop_hardreg);
> NEXT_PASS (pass_fast_rtl_dce);
> NEXT_PASS (pass_reorder_blocks);
> [...]
> POP_INSERT_PASSES ()
> [...]
>
> ... into:
>
> [...]
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_postreload_cse);
> [...]
> NEXT_PASS (pass_cprop_hardreg);
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_fast_rtl_dce);
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_reorder_blocks);
> [...]
> POP_INSERT_PASSES ()
> [...]
>
> That is, interrupt the pass pipeline within 'pass_postreload', in order
> to unconditionally run 'pass_fast_rtl_dce' even if not running
> 'pass_postreload'.  What happens is that the second
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' overwrites the first
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' instead of applying to the
> second (preceding) 'NEXT_PASS (pass_postreload);'.
>
> (I ran into this in context of what I tried in
> 
> "nvptx vs. [PATCH] Add a late-combine pass [PR106594]"; discuss that
> specific use case over there, not here.)
>
> OK to address this with the attached
> "Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'"?
>
> This depends on
> 
> "Rewrite usage comment at the top of 'gcc/passes.def'" to avoid running
> into the 'ERROR: Can't locate [...]' that I'm adding, while processing
> the 'PUSH_INSERT_PASSES_WITHIN (PASS)' in the usage comment at the top of
> 'gcc/passes.def', where 'NEXT_PASS (PASS)' only appears later.  ;-)

(Already pushed.)

> I've verified this does the expected thing for the main 'gcc/passes.def',
> and that 'PUSH_INSERT_PASSES_WITHIN' is not used/not applicable for
> 'PASSES_EXTRA' ('gcc/config/*/*-passes.def').
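
The intended numbering can be sketched outside of Awk as follows. This is a simplified model, not the actual 'gen-pass-instances.awk' logic: the `directive` type and `number_instances` are hypothetical, and the output format with an explicit instance number on 'PUSH_INSERT_PASSES_WITHIN' is an assumption about what the generated file looks like.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One directive from a simplified pass list: first is "NEXT_PASS" or
// "PUSH_INSERT_PASSES_WITHIN", second is the pass name.
using directive = std::pair<std::string, std::string>;

// Attach an instance number to every directive.  NEXT_PASS starts a new
// instance of its pass; PUSH_INSERT_PASSES_WITHIN reuses the number of the
// most recent NEXT_PASS for the same pass (0 would mean there was none).
std::vector<std::string>
number_instances (const std::vector<directive> &input)
{
  std::map<std::string, int> count;
  std::vector<std::string> out;
  for (const directive &d : input)
    {
      if (d.first == "NEXT_PASS")
        ++count[d.second];
      out.push_back (d.first + " (" + d.second + ", "
                     + std::to_string (count[d.second]) + ")");
    }
  return out;
}
```

Fed the two 'pass_postreload' pairs from the example above, the second 'PUSH_INSERT_PASSES_WITHIN' comes out tagged with instance 2 instead of always referring to instance 1.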


Grüße
 Thomas


>From e368ccba93f5bbaee882076c80849adb55a68fa0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 12:10:12 +0200
Subject: [PATCH] Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'

..., such that also for repeated 'NEXT_PASS', 'PUSH_INSERT_PASSES_WITHIN' for a
given 'PASS', the 'PUSH_INSERT_PASSES_WITHIN' applies to the preceding
'NEXT_PASS', and not unconditionally applies to the first 'NEXT_PASS'.

	gcc/
	* gen-pass-instances.awk: Handle 'PUSH_INSERT_PASSES_WITHIN'.
	* pass_manager.h (PUSH_INSERT_PASSES_WITHIN): Adjust.
	* passes.cc (PUSH_INSERT_PASSES_WITHIN): Likewise.
---
 gcc/gen-pass-instances.awk | 28 +---
 gcc/pass_manager.h |  2 +-
 gcc/passes.cc  |  6 +++---
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 449889663f7..871ac0cdb52 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -16,7 +16,7 @@
 
 # This Awk script takes passes.def and writes pass-instances.def,
 # counting the instances of each kind of pass, adding an instance number
-# to everywhere that NEXT_PASS is used.
+# to everywhere that NEXT_PASS or PUSH_INSERT_PASSES_WITHIN are used.
 # Also handle INSERT_PASS_AFTER, INSERT_PASS_BEFORE and REPLACE_PASS
 # directives.
 #
@@ -222,9 +222,31 @@ END {
 	  if (with_arg)
 	printf ",%s", with_arg;
 	  printf ")%s\n", postfix;
+
+	  continue;
 	}
-  else
-	print lines[i];
+
+  ret = parse_line(lines[i],

Fix gimple_debug_cfg declaration (was: [PATCH v2 2/N] Introduce dump_flags_t type and use it instead of int, type)

2024-09-04 Thread Thomas Schwinge

Hi!

On 2017-05-17T11:02:09+0200, Martin Liška  wrote:
> On 05/17/2017 09:44 AM, Richard Biener wrote:
>> On Tue, May 16, 2017 at 4:55 PM, Martin Liška  wrote:
>>> On 05/16/2017 03:48 PM, Richard Biener wrote:
>>>> On Fri, May 12, 2017 at 3:00 PM, Martin Liška  wrote:
>>>>> Second part changes 'int flags' to a new typedef.
>>>>> All corresponding interfaces have been changed.

> installed as r248140.

Long ago, Frederik found that while the 'gimple_debug_cfg' definition had
been adjusted:

| --- a/gcc/tree-cfg.c
| +++ b/gcc/tree-cfg.c
| @@ -2372,7 +2372,7 @@ gimple_debug_bb_n (int n)
| (see TDF_* in dumpfile.h).  */
| 
|  void
| -gimple_debug_cfg (int flags)
| +gimple_debug_cfg (dump_flags_t flags)
|  {
|gimple_dump_cfg (stderr, flags);
|  }

..., but its prototype had not been:

| --- a/gcc/tree-cfg.h
| +++ b/gcc/tree-cfg.h

|  extern void gimple_debug_cfg (int);

..., and (unfortunately only) fixed that on a development branch.
As obvious, I've now pushed to trunk branch
commit 347a953d855c6b246b1604bdf4728f615cb471b6
"Fix gimple_debug_cfg declaration", see attached.


Grüße
 Thomas


>From 347a953d855c6b246b1604bdf4728f615cb471b6 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 16 Nov 2021 16:08:40 +0100
Subject: [PATCH] Fix gimple_debug_cfg declaration

Silence a warning. The argument type did not match the definition.

gcc/ChangeLog:

	* tree-cfg.h (gimple_debug_cfg): Change argument type from int
	to dump_flags_t.
---
 gcc/tree-cfg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
index 0564b79b4ab..e55991740e8 100644
--- a/gcc/tree-cfg.h
+++ b/gcc/tree-cfg.h
@@ -45,7 +45,7 @@ extern void clear_special_calls (void);
 extern edge find_taken_edge (basic_block, tree);
 extern void gimple_debug_bb (basic_block);
 extern basic_block gimple_debug_bb_n (int);
-extern void gimple_debug_cfg (int);
+extern void gimple_debug_cfg (dump_flags_t);
 extern void gimple_dump_cfg (FILE *, dump_flags_t);
 extern void dump_cfg_stats (FILE *);
 extern void debug_cfg_stats (void);
-- 
2.34.1

Fix branch prediction dump message (was: Predict loops containing recursive call with fewer iterations)

2024-09-04 Thread Thomas Schwinge

Hi!

On 2016-06-26T21:36:56+0200, Jan Hubicka  wrote:
> this patch [...]

> --- predict.c (revision 237789)
> +++ predict.c (working copy)

> @@ -3367,6 +3446,15 @@ pass_profile::execute (function *fun)
>  gimple_dump_cfg (dump_file, dump_flags);
>   if (profile_status_for_fn (fun) == PROFILE_ABSENT)
>  profile_status_for_fn (fun) = PROFILE_GUESSED;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> + struct loop *loop;
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +   if (loop->header->frequency)
> + fprintf (dump_file, "Loop got predicted %d to iterate %i times.\n",
> +loop->num,
> +(int)expected_loop_iterations_unbounded (loop));
> +   }
>return 0;
>  }

... has some in a strange order terms.  ;-) Long ago, Frederik has fixed
this (unfortunately only) on a development branch.  As obvious, I've now
pushed to trunk branch commit 35e4414bac06927387fb7a6fe10b373e766da1c1
"Fix branch prediction dump message", see attached.


Grüße
 Thomas


>From 35e4414bac06927387fb7a6fe10b373e766da1c1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 16 Nov 2021 16:13:51 +0100
Subject: [PATCH] Fix branch prediction dump message

Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".

gcc/ChangeLog:

	* predict.cc (pass_profile::execute): Fix dump message.

Co-authored-by: Thomas Schwinge 
---
 gcc/predict.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 43e3694cb42..f611161f4aa 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4210,7 +4210,7 @@ pass_profile::execute (function *fun)
  sreal iterations;
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
if (expected_loop_iterations_by_profile (loop, &iterations))
-	 fprintf (dump_file, "Loop got predicted %d to iterate %f times.\n",
+	 fprintf (dump_file, "Loop %d got predicted to iterate %f times.\n",
 	   loop->num, iterations.to_double ());
}
   return 0;
-- 
2.34.1
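
For reference, the effect of the one-line change can be reproduced in isolation. This is a minimal sketch, assuming a stand-alone helper (`loop_prediction_message` is hypothetical) in place of the 'fprintf' to 'dump_file'; only the format string is taken from the patch.

```cpp
#include <cstdio>
#include <string>

// Format the dump line with the loop number placed directly after "Loop",
// as in the fixed format string.
std::string
loop_prediction_message (int loop_num, double iterations)
{
  char buf[96];
  std::snprintf (buf, sizeof buf,
                 "Loop %d got predicted to iterate %f times.\n",
                 loop_num, iterations);
  return buf;
}
```

The arguments stay in the same order as before; only the position of '%d' within the string changes.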

Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition

2024-09-04 Thread Richard Biener

On Wed, Sep 4, 2024 at 9:48 AM Li, Pan2  wrote:
>
> > I'm lazy - can you please quote genmatch generated code for the condition 
> > for
> > one case?
>
> Sure thing; the before and after listings below cover all the changes to the
> generated code.
>
> Before this patch:
>   basic_block _b1 = gimple_bb (_a1);
>   if (gimple_phi_num_args (_a1) == 2)
> {
>   basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>   basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>   basic_block _db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_0_1 : _pb_1_1;
>   basic_block _other_db_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1)) ? _pb_1_1 : _pb_0_1;
>   gcond *_ct_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_db_1));
>   if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
> && EDGE_COUNT (_other_db_1->succs) == 1
> && EDGE_PRED (_other_db_1, 0)->src == _db_1)
> {
>   tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>   tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>   tree _p0 = build2 (gimple_cond_code (_ct_1), 
> boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>   bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
> 0)->flags & EDGE_TRUE_VALUE;
>   tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 0 : 1);
>   tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 1 : 0);
>   switch (TREE_CODE (_p0))
> {
>
> After this patch:
>   basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
>   basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
>   gcond *_ct_0_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_0_1));
>   gcond *_ct_1_1 = safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_1_1));
>   gcond *_ct_a_1 = _ct_0_1 ? _ct_0_1 : _ct_1_1;
>   basic_block _db_1 = _ct_0_1 ? _pb_0_1 : _pb_1_1;
>   basic_block _other_db_1 = _ct_0_1 ? _pb_1_1 : _pb_0_1;
>   edge _e_00_1 = _pb_0_1->preds ? EDGE_PRED (_pb_0_1, 0) : 
> NULL;
>   basic_block _pb_00_1 = _e_00_1 ? _e_00_1->src : NULL;
>   gcond *_ct_b_1 = _pb_00_1 ? safe_dyn_cast <gcond *> (*gsi_last_bb (_pb_00_1)) : NULL;
>   if ((_ct_a_1 && EDGE_COUNT (_other_db_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->succs) == 1
>    && EDGE_PRED (_other_db_1, 0)->src == _db_1)
>   ||
>   (_ct_b_1 && _pb_00_1 && EDGE_COUNT (_pb_0_1->succs) == 1
>    && EDGE_COUNT (_pb_0_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->preds) == 1
>    && EDGE_COUNT (_other_db_1->succs) == 1
>    && EDGE_PRED (_other_db_1, 0)->src == _pb_00_1))
> {
>   gcond *_ct_1 = _ct_a_1 ? _ct_a_1 : _ct_b_1;
>   tree _cond_lhs_1 = gimple_cond_lhs (_ct_1);
>   tree _cond_rhs_1 = gimple_cond_rhs (_ct_1);
>   tree _p0 = build2 (gimple_cond_code (_ct_1), 
> boolean_type_node, _cond_lhs_1, _cond_rhs_1);
>   bool _arg_0_is_true_1 = gimple_phi_arg_edge (_a1, 
> 0)->flags & EDGE_TRUE_VALUE;
>   tree _p1 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 0 : 1);
>   tree _p2 = gimple_phi_arg_def (_a1, _arg_0_is_true_1 ? 
> 1 : 0);

I think it might be better to refactor this to detect the three CFGs like

 if (EDGE_COUNT (_pb_0_1->preds) == 1
 && EDGE_PRED (_pb_0_1, 0)->src == _pb_1_1)
   {
.. check rest of constraints ..
   }
else if (... same for _pb_1_1 being the forwarder ...)
 ...
else if (EDGE_COUNT (_pb_0_1->preds) == 1
   && EDGE_COUNT (_pb_1_1->preds) == 1
   && EDGE_PRED (_pb_0_1, 0)->src == EDGE_PRED (_pb_1_1, 0)->src)
...

I also think we may want to split out this CFG matching code out into
a helper function
in gimple-match-head.cc instead of repeating it fully for each pattern?
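
A rough stand-alone model of such a helper, to show the three shapes side by side. This is a sketch only: `block`, `phi_cfg`, `single_pred_is` and `classify_phi_cfg` are made-up names, the block type models nothing but predecessor links, and the real helper would work on basic_block and edge and also check the remaining constraints.

```cpp
#include <vector>

// Minimal stand-in for a basic block: only predecessor links are modelled.
struct block
{
  std::vector<const block *> preds;
};

enum class phi_cfg { none, pred0_forwards, pred1_forwards, diamond };

// True if 'b' has exactly one predecessor and that predecessor is 'src'.
static bool
single_pred_is (const block *b, const block *src)
{
  return b->preds.size () == 1 && b->preds[0] == src;
}

// Classify the control flow feeding a two-argument PHI whose incoming edges
// come from 'pb0' and 'pb1'.  Only the predecessor shape is checked here;
// successor counts and the gcond ending the condition block are left out.
phi_cfg
classify_phi_cfg (const block *pb0, const block *pb1)
{
  if (single_pred_is (pb0, pb1))
    return phi_cfg::pred0_forwards;   // pb1 holds the condition
  if (single_pred_is (pb1, pb0))
    return phi_cfg::pred1_forwards;   // pb0 holds the condition
  if (pb0->preds.size () == 1 && pb1->preds.size () == 1
      && pb0->preds[0] == pb1->preds[0])
    return phi_cfg::diamond;          // both arms hang off one condition block
  return phi_cfg::none;
}
```

Returning the matched shape (or the condition block) from one place would let each generated pattern call the helper instead of repeating the edge checks.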

Thanks,
Richard.

> Pan
>
> -Original Message-
> From: Richard Biener 
> Sent: Wednesday, September 4, 2024 3:42 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai; 
> kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com
> Subject: Re: [PATCH v1 1/2] Genmatch: Support new flow for phi on condition
>
> On Wed, Sep 4, 2024 at 9:25 AM  wrote:
> >
> > From: Pan Li 
> >
> > The gen_phi_on_cond can only support below control flow for cond
> > from day 1.  Aka:
> >
> > +--+
> > | def  |
> > | ...  |   +-+
> > | cond |-->| def |
> > +--+   | ... |
> >|   +-+
> >|  |
> >v  |
> > +--

Re: Zen5 tuning part 2: disable gather and scatter

2024-09-04 Thread Jan Hubicka

> On 9/3/24 15:07, Jan Hubicka wrote:
> 
> > Hi,
> > We disable gathers for zen4.  It seems that gather has improved a bit 
> > compared
> > to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions 
> > when
> > the indices are known ahead of time. Vector loads followed by shuffles 
> > result
> > in a higher load bandwidth." however the situation seems to be more
> > complicated.
> 
> A small bit of "real world" experience (but for zen3):
> 
> Recently I switched to gfortran 14.2 for my weather forecasting.
> A year ago I had changed "-march=native -mtune=native" (on my zen3 system)
> to "-march=native -mtune=znver2" while using gfortran 13 - it had only a
> small effect (but positive).
> 
> Last Monday I switched back to "-march=native -mtune=native", but that
> consistently made a 12 hour computation around 6 minutes slower (i.e., about
> 1/120th, or 0.8 %). The most computationally intensive part of the code needs
> gather (either instructions or inline expansions of them).

It would be nice to know what is causing this. Gathers can be enabled
using -mtune-ctrl=use_gather and I would be happy to know about real
world situations where they help.

I am still looking into this.  IMO disabling gather like on other zens
makes sense especially for backporting. For trunk
it probably makes sense to look for heuristics carefully enabling
gathers.  It is not clear to me how to benchmark them or how to set up
heuristics.  Spec2017 has very small coverage for loops requiring
gathers and so does tsvc. I did some micro-benchmarks but their
behaviour is, well, puzzling. Having additional data would be great.

As Richard mentioned, it probably makes sense to enable masked gathers,
since the open coded version needs conditionals and we would not
vectorize at all.  I am not sure if we can do that with current APIs.
I will cook up some micro-benchmarks for that.
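
To spell out what "open coded" means for the masked case, here is a scalar sketch (the function name and the use of std::vector are illustrative assumptions, not code from GCC or from any benchmark):

```cpp
#include <cstddef>
#include <vector>

// Scalar (open-coded) form of a masked gather: lane i is loaded from
// base[idx[i]] only when mask[i] is set; otherwise dst[i] is left untouched
// and idx[i] is never dereferenced.  The per-lane branch guards the load.
void
masked_gather_open_coded (std::vector<double> &dst,
                          const std::vector<double> &base,
                          const std::vector<std::size_t> &idx,
                          const std::vector<bool> &mask)
{
  for (std::size_t i = 0; i < dst.size (); ++i)
    if (mask[i])
      dst[i] = base[idx[i]];
}
```

Because a masked-off lane may carry an index that is not valid to load from, the branch cannot simply be dropped, which is the conditional referred to above.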

Concerning code size, I am not sure how much that applies in practice
since gathers are used relatively sporadically and vectorizer blows up
the code a lot anyways, but certainly one can construct example with
very many loops needing gather...

My guess is that array prefetching data is annotated to the instruction
cache and since gather produces a lot of loads, the data probably simply does
not fit. Open-coding the gather makes extra space for this info...

Honza

[PATCH] Handle unused-only-live stmts in SLP discovery

2024-09-04 Thread Richard Biener

The following adds SLP discovery for roots that are only live but
otherwise unused.  These are usually inductions.  This allows a
few more testcases to be handled fully with SLP, for example
gcc.dg/vect/no-scevccp-pr86725-1.c

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-slp.cc (vect_analyze_slp): Analyze SLP for live
but otherwise unused defs.
---
 gcc/tree-vect-slp.cc | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 41bc92b138a..91d6927016d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4704,6 +4704,36 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
  saved_stmts.release ();
}
}
+
+  /* Make sure to vectorize only-live stmts, usually inductions.  */
+  for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+   for (auto gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi);
+gsi_next (&gsi))
+ {
+   gphi *lc_phi = *gsi;
+   tree def = gimple_phi_arg_def_from_edge (lc_phi, e);
+   stmt_vec_info stmt_info;
+   if (TREE_CODE (def) == SSA_NAME
+   && !virtual_operand_p (def)
+   && (stmt_info = loop_vinfo->lookup_def (def))
+   && STMT_VINFO_RELEVANT (stmt_info) == vect_used_only_live
+   && STMT_VINFO_LIVE_P (stmt_info)
+   && (STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def
+   || (STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
+   && STMT_VINFO_REDUC_IDX (stmt_info) == -1)))
+ {
+   vec<stmt_vec_info> stmts;
+   vec<stmt_vec_info> roots = vNULL;
+   vec<tree> remain = vNULL;
+   stmts.create (1);
+   stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
+   vect_build_slp_instance (vinfo,
+slp_inst_kind_reduc_group,
+stmts, roots, remain,
+max_tree_size, &limit,
+bst_map, NULL);
+ }
+ }
 }
 
   hash_set<slp_tree> visited_patterns;
-- 
2.43.0

Re: Zen5 tuning part 2: disable gather and scatter

2024-09-04 Thread Richard Biener

On Wed, Sep 4, 2024 at 12:56 PM Jan Hubicka  wrote:
>
> > On 9/3/24 15:07, Jan Hubicka wrote:
> >
> > > Hi,
> > > We disable gathers for zen4.  It seems that gather has improved a bit
> > > compared to zen4 and Zen5 optimization manual suggests "Avoid GATHER
> > > instructions when the indices are known ahead of time. Vector loads
> > > followed by shuffles result in a higher load bandwidth." however the
> > > situation seems to be more complicated.
> >
> > A small bit of "real world" experience (but for zen3):
> >
> > Recently I switched to gfortran 14.2 for my weather forecasting.
> > A year ago I had changed "-march=native -mtune=native" (on my zen3 system)
> > to "-march=native -mtune=znver2" while using gfortran 13 - it had only a
> > small effect (but positive).
> >
> > Last Monday I switched back to "-march=native -mtune=native", but that
> > consistently made a 12 hour computation around 6 minutes slower (i.e., about
> > 1/120th, or 0.8 %). The most computational intensive part of the code needs
> > gather (either instructions or inline expansions of them).
>
> It would be nice to know what is causing this. Gathers can be enabled
> using -mtune-ctrl=use_gather and I would be happy to know about real
> world situations where they help.
>
> I am still looking into this.  IMO disabling gather like on other zens
> makes sense especially for backporting. For trunk
> it probably makes sense to look for heuristics carefully enabling
> gathers.  It is not clear to me how to benchmark them or how to set up
> heuristics.  Spec2017 has very small coverage for loops requiring
> gathers and so does tsvc. I did some micro-benchmarks but their
> behaviour is, well, puzzling. Having additional data would be great.
>
> As Richard mentioned, it probably makes sense to enable masked gathers,
> since the open coded version needs conditionals and we would not
> vectorize at all.  I am not sure if we can do that with current APIs.
> I will cook up a micro-benchmark for that.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85919, the
targetm.vectorize.builtin_gather/targetm.vectorize.builtin_scatter interface
is legacy and does not support masking at all.

> Concerning code size, I am not sure how much that applies in practice
> since gathers are used relatively sporadically and vectorizer blows up
> the code a lot anyways, but certainly one can construct example with
> very many loops needing gather...
>
> My guess is that array prefetching data is annotated to the instruction
> cache and since gather produces a lot of loads, probably data simply does
> not fit. Opencoding the gather makes extra space for this info...
>
> Honza
>

[RFC] On Adding Support for Target-Dependent Loop-Specific Pragmas

2024-09-04 Thread Paul Iannetta

Hi,

Currently, pragma directives added by a backend only have access to the
information on the same line as the pragma, which is enough for modifying a
global state.

This means that a loop target pragma could look like this:
#pragma target begin keyword [options]

#pragma target end keyword

However, the coupling between the code in between the two pragmas and its
semantics is not preserved; moreover, it is not possible to know at parse time
whether a loop is included within the region.  The second problem is that all
the pragmas are resolved at parse time, hence it is not possible to observe the
state of the region when performing optimizations.

It would be nice to have something similar to what clang offers (Language
Extensions: Extensions for loop hint optimizations [1]), so that it's clear that
the pragma is tied to a loop and there is no need for "begin" and "end"
region-delimiters.

Leaving aside OpenMP's pragma directives, the only loop-specific pragmas
provided by GCC are "unroll" and "ivdep".  The parsing of those is very much
hard-wired into the front-ends.  The functions dealing with the parsing of loops
in the C and C++ front-ends all take "ivdep" and "unroll" as explicit
arguments.  The Fortran front-end deals with them a bit differently but they are
very much hardwired there as well.
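
As a reminder of how these two existing pragmas attach to a loop today, here is
a minimal sketch (the function names are invented for the example):

```c
#include <stddef.h>

/* "unroll" asks for the following loop to be unrolled 4 times.  */
int
sum_unrolled (const int *a, size_t n)
{
  int sum = 0;
#pragma GCC unroll 4
  for (size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}

/* "ivdep" asserts that the following loop has no loop-carried
   dependencies that would prevent vectorization.  */
void
scale_ivdep (int *a, const int *b, size_t n)
{
#pragma GCC ivdep
  for (size_t i = 0; i < n; ++i)
    a[i] = 2 * b[i];
}
```

In both cases the pragma applies to the loop that immediately follows it, with
no begin/end delimiters.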

This means that for each new loop-specific pragma all those functions should be
augmented with a new parameter, and the loop structure be updated with a new
field.

My current idea to implement this is:
  (1) Add a field "struct loop_info tinfo" to the loop class in cfgloop.h
  (2) The tinfo structure would contain the fields for the target independent
  loop information (that is unroll, and ivdep).
  (3) More fields could be added to this structure through a TARGET macro, it
  would look like:

struct loop_info {
  bool ivdep;
  unsigned short unroll;
  #ifdef TARGET_LOOP_INFO_FIELDS
TARGET_LOOP_INFO_FIELDS
  #endif
};

  (4) The annot_expr_kind enum should be extendable in the same way through a
  TARGET macro (TARGET_ANNOTATE_EXPR).
  (5) The callback hook processing the pragma should receive a pointer to the
  TINFO structure.
  (6) A new hook should be introduced for creating the ANNOTATE_EXPR on the loop
  condition.  This hook will be called by the loop parsing function.
  (7) The function replace_loop_annotate_in_block would iterate over the kinds
  described by the enum annot_expr_kind, and call a new hook to process the
  annotate expr.
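
Steps (1), (2), (3) and (5) could look roughly like the sketch below.
Everything target-specific in it (the my_target_* fields,
handle_my_target_pragma, and the loop_sketch stand-in for the loop class) is
hypothetical and only meant to show how the macro would splice fields into the
structure.

```c
#include <stdbool.h>

/* Hypothetical: what a backend could define in its target header.  */
#define TARGET_LOOP_INFO_FIELDS \
  bool my_target_hint;          \
  unsigned char my_target_level;

/* Target-independent fields plus whatever the target macro adds.  */
struct loop_info
{
  bool ivdep;
  unsigned short unroll;
#ifdef TARGET_LOOP_INFO_FIELDS
  TARGET_LOOP_INFO_FIELDS
#endif
};

/* Stand-in for the loop class from cfgloop.h, reduced to the new field.  */
struct loop_sketch
{
  struct loop_info tinfo;
};

/* Hypothetical pragma callback as in step (5): it receives a pointer to
   the TINFO structure and records what the pragma asked for.  */
void
handle_my_target_pragma (struct loop_info *tinfo, unsigned char level)
{
  tinfo->my_target_hint = true;
  tinfo->my_target_level = level;
}
```

The drawback is visible here: the macro has to carry raw struct members,
including the semicolons.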

In summary, it would require a new structure, the modification of the
prototypes of the loop parsing functions, two new TARGET macros, a new kind of
pragma hook for loop-specific pragmas, and two new TARGET hooks.

I don't really like the fact that TARGET_LOOP_INFO_FIELDS and
TARGET_ANNOTATE_EXPR contain enum and struct fragments; another approach could
be to use a hash-table.

Thanks,
Paul

[1]: https://clang.llvm.org/docs/LanguageExtensions.html#id39

nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'

2024-09-04 Thread Thomas Schwinge

Hi!

Pushed to trunk branch in commit fee2fbedbb43ad7a017a33ed2b820be79b75e7e5
"nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'", see
attached.


Grüße
 Thomas


From fee2fbedbb43ad7a017a33ed2b820be79b75e7e5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Jul 2024 10:49:16 +0200
Subject: [PATCH] nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of
 'int'

This allows getting rid of the respective type casts.  No change in behavior
intended.

	gcc/
	* config/nvptx/gen-opt.sh: Use 'enum ptx_isa' instead of 'int'.
	* config/nvptx/nvptx-gen.opt: Regenerate.
	* config/nvptx/nvptx.opt: Use 'enum ptx_version' instead of 'int'.
	* config/nvptx/nvptx-opts.h (enum ptx_isa): Add 'PTX_ISA_unset'.
	(enum ptx_version): Add 'PTX_VERSION_unset'.
	* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Adjust.
	* config/nvptx/nvptx.cc (default_ptx_version_option)
	(handle_ptx_version_option, nvptx_option_override)
	(nvptx_file_start): Likewise.
---
 gcc/config/nvptx/gen-opt.sh| 14 +-
 gcc/config/nvptx/nvptx-c.cc|  6 ++
 gcc/config/nvptx/nvptx-gen.opt |  2 +-
 gcc/config/nvptx/nvptx-opts.h  |  4 +++-
 gcc/config/nvptx/nvptx.cc  | 24 
 gcc/config/nvptx/nvptx.opt |  9 ++---
 6 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index 3f7838251d2..6022f51f897 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -38,12 +38,24 @@ echo
 
 . $gen_copyright_sh opt
 
+# Not emitting the following here (in addition to having it in 'nvptx.opt'), as
+# we'll otherwise run into:
+# 
+# gtyp-input.list:10: file [...]/gcc/config/nvptx/nvptx-opts.h specified more than once for language (all)
+# make[2]: *** [Makefile:2981: s-gtype] Error 1
+: ||
+cat .
 
 Enum
-Name(ptx_isa) Type(int)
+Name(ptx_isa) Type(enum ptx_isa)
 Known PTX ISA target architectures (for use with the -misa= option):
 
 EnumValue
diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index f8975327223..fb5147c143e 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -22,6 +22,7 @@
 
 enum ptx_isa
 {
+  PTX_ISA_unset,
 #define NVPTX_SM(XX, SEP) PTX_ISA_SM ## XX SEP
 #define NVPTX_SM_SEP ,
 #include "nvptx-sm.def"
@@ -31,7 +32,8 @@ enum ptx_isa
 
 enum ptx_version
 {
-  PTX_VERSION_default,
+  PTX_VERSION_unset,
+  PTX_VERSION_default = PTX_VERSION_unset,
   PTX_VERSION_3_0,
   PTX_VERSION_3_1,
   PTX_VERSION_4_2,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 2a8f713c680..144b8d0c874 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -231,8 +231,7 @@ first_ptx_version_supporting_sm (enum ptx_isa sm)
 static enum ptx_version
 default_ptx_version_option (void)
 {
-  enum ptx_version first
-= first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+  enum ptx_version first = first_ptx_version_supporting_sm (ptx_isa_option);
 
   /* Pick a version that supports the sm.  */
   enum ptx_version res = first;
@@ -311,20 +310,21 @@ sm_version_to_string (enum ptx_isa sm)
 static void
 handle_ptx_version_option (void)
 {
-  if (!OPTION_SET_P (ptx_version_option)
-  || ptx_version_option == PTX_VERSION_default)
+  if (!OPTION_SET_P (ptx_version_option))
+gcc_checking_assert (ptx_version_option == PTX_VERSION_default);
+
+  if (ptx_version_option == PTX_VERSION_default)
 {
   ptx_version_option = default_ptx_version_option ();
   return;
 }
 
-  enum ptx_version first
-= first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+  enum ptx_version first = first_ptx_version_supporting_sm (ptx_isa_option);
 
   if (ptx_version_option < first)
 error ("PTX version (%<-mptx%>) needs to be at least %s to support selected"
 	   " %<-misa%> (sm_%s)", ptx_version_to_string (first),
-	   sm_version_to_string ((enum ptx_isa)ptx_isa_option));
+	   sm_version_to_string (ptx_isa_option));
 }
 
 /* Implement TARGET_OPTION_OVERRIDE.  */
@@ -336,7 +336,9 @@ nvptx_option_override (void)
 
   /* Via nvptx 'OPTION_DEFAULT_SPECS', '-misa' always appears on the command
  line; but handle the case that the compiler is not run via the driver.  */
-  if (!OPTION_SET_P (ptx_isa_option))
+  gcc_checking_assert ((ptx_isa_option == PTX_ISA_unset)
+		   == (!OPTION_SET_P (ptx_isa_option)));
+  if (ptx_isa_option == PTX_ISA_unset)
 fatal_error (UNKNOWN_LOCATION, "%<-march=%> must be specified");
 
   handle_ptx_version_option ();
@@ -5953,13 +5955,11 @@ nvptx_file_start (void)
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
 
   fputs ("\t.version\t", asm_out_file);
-  fputs (ptx_version_to_string ((enum ptx_version)ptx_version_option),
-	 asm_out_file);
+  fputs (ptx_version_to_string (ptx_version_option), asm_out_file);
   fputs ("\n", asm_out_file);
 
   fputs ("\t.target\tsm_", asm_out_file);
-  fputs (sm_version_to_string ((enum pt

Re: [PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Evgeny Karpov

Monday, September 2, 2024
Martin Storsjö  wrote:

> The only non-obvious thing, is that for IMAGE_REL_ARM64_PAGEBASE_REL21,
> i.e. "adrp" instructions, the immediate that gets stored in the
> instruction, is the byte offset to the symbol.
>
> After linking, when the instruction is interpreted at execution time, the
> immediate in an adrp instruction denotes the offset in units of 2^12 bytes
> - but in relocatable object files, the unit of the immediate is in single
> bytes.

This is exactly the reason why the fix was introduced, and it resolves
the issues detected during testing.
Here is a more detailed explanation.

1. the code without the fix
adrp        x0, symbol + 256
add         x0, x0, symbol + 256

2. the code with the fix
adrp        x0, symbol
add         x0, x0, symbol
add         x0, x0, 256


Let's consider the following example, when symbol is located at 3072.

1. Example without the fix
compilation time
adrp        x0, (3072 + 256) & ~0xFFF // x0 = 0
add         x0, x0, (3072 + 256) & 0xFFF // x0 = 3328

linking time when symbol is relocated with offset 896
adrp        x0, (0 + 896) & ~0xFFF // x0 = 0
add         x0, x0, (3328 + 896) & 0xFFF; // x0 = 128
which is wrong: it should be x0 = 3072 + 896 + 256 = 4224

2. Example with the fix
compilation time
adrp        x0, 3072 & ~0xFFF // x0 = 0
add         x0, x0, 3072 & 0xFFF // x0 = 3072
add         x0, x0, 256 // x0 = 3328

linking time when symbol is relocated with offset 896
adrp        x0, (0 + 896) & ~0xFFF // x0 = 0
add         x0, x0, (3072 + 896) & 0xFFF // x0 = 3968
add         x0, x0, 256 // x0 = 4224
x0 contains the expected result.
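
The arithmetic above can be replayed with a small C model.  It is a sketch of
the simplified model used in this example (each stored immediate is adjusted by
the relocation offset and masked independently), not of what a real linker
does; the function names are made up.

```c
#include <stdint.h>

#define LOW12_MASK 0xFFFu

/* Without the fix: the addend is folded into both relocated operands,
   so a carry out of the low 12 bits at link time is lost.  */
uint32_t
resolve_folded (uint32_t symbol, uint32_t addend, uint32_t delta)
{
  /* Compilation time.  */
  uint32_t hi = (symbol + addend) & ~LOW12_MASK;	/* adrp */
  uint32_t lo = (symbol + addend) & LOW12_MASK;		/* add */
  /* Linking time: the symbol is relocated by DELTA.  */
  hi = (hi + delta) & ~LOW12_MASK;
  lo = (lo + delta) & LOW12_MASK;
  return hi + lo;
}

/* With the fix: only the bare symbol is relocated and the addend is
   applied by a separate add afterwards.  */
uint32_t
resolve_separate (uint32_t symbol, uint32_t addend, uint32_t delta)
{
  /* Compilation time.  */
  uint32_t hi = symbol & ~LOW12_MASK;			/* adrp */
  uint32_t lo = symbol & LOW12_MASK;			/* add */
  /* Linking time: the symbol is relocated by DELTA.  */
  hi = (hi + delta) & ~LOW12_MASK;
  lo = (lo + delta) & LOW12_MASK;
  return hi + lo + addend;				/* second add */
}
```

With symbol = 3072, addend = 256 and delta = 896, resolve_folded gives 128 and
resolve_separate gives 4224, matching the two examples.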

Theoretically, the issue can be solved by changing the alignment of segments
to 4096. It might require further investigation and a follow-up patch if it
works.
Even if it works, it has a downside as it increases the segment sizes.

Regards,
Evgeny

[PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-04 Thread Jason Merrill

Tested x86_64-pc-linux-gnu.  Any objections?

-- 8< --

Several PRs complain about -Wswitch warning about a case for a bitwise
combination of enumerators.  Clang has an attribute flag_enum to prevent
this; let's adopt that approach as well.

This also recognizes the attribute as [[clang::flag_enum]], introducing
handling of the clang attribute namespace.
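
A minimal sketch of the intended use, assuming the attribute is spelled as in
Clang.  The FLAG_ENUM macro, the enum and the function are invented for the
example; the macro expands to nothing on compilers that do not know the
attribute.

```c
/* Use the attribute only where the compiler reports support for it.  */
#if defined (__has_attribute)
# if __has_attribute (flag_enum)
#  define FLAG_ENUM __attribute__ ((flag_enum))
# endif
#endif
#ifndef FLAG_ENUM
# define FLAG_ENUM
#endif

enum FLAG_ENUM perm
{
  PERM_READ = 1,
  PERM_WRITE = 2,
  PERM_EXEC = 4
};

const char *
describe (enum perm p)
{
  switch (p)
    {
    case PERM_READ:
      return "read";
    case PERM_WRITE:
      return "write";
    case PERM_EXEC:
      return "exec";
    /* Not an enumerator, but a bitwise combination of two of them; this
       is the kind of case label that -Wswitch complains about when the
       enum is not marked as a flag enum.  */
    case PERM_READ | PERM_WRITE:
      return "read+write";
    default:
      return "other";
    }
}
```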

PR c++/46457

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_flag_enum_attribute): New.
(c_common_gnu_attributes): Add it.
(c_common_clang_attributes, c_common_clang_attribute_table): New.
* c-common.h: Declare c_common_clang_attribute_table.
* c-warn.cc (c_do_switch_warnings): Handle flag_enum.

gcc/c/ChangeLog:

* c-objc-common.h (c_objc_attribute_table): Add
c_common_clang_attribute_table.

gcc/cp/ChangeLog:

* cp-objcp-common.h (cp_objcp_attribute_table): Add
c_common_clang_attribute_table.

gcc/testsuite/ChangeLog:

* c-c++-common/attr-flag-enum-1.c: New test.

gcc/ChangeLog:

* doc/extend.texi: Document flag_enum attribute.
* doc/invoke.texi: Mention flag_enum in -Wswitch.

libstdc++-v3/ChangeLog:

* include/bits/regex_constants.h: Use flag_enum.
---
 gcc/doc/extend.texi   |  7 
 gcc/doc/invoke.texi   | 11 +++---
 gcc/c-family/c-common.h   |  1 +
 gcc/c/c-objc-common.h |  1 +
 gcc/cp/cp-objcp-common.h  |  1 +
 libstdc++-v3/include/bits/regex_constants.h   |  2 +-
 gcc/c-family/c-attribs.cc | 33 +
 gcc/c-family/c-warn.cc|  4 ++
 gcc/testsuite/c-c++-common/attr-flag-enum-1.c | 37 +++
 9 files changed, 91 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/attr-flag-enum-1.c

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5845bcedf6e..5b9d8c51059 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9187,6 +9187,13 @@ initialization will result in future breakage.
 GCC emits warnings based on this attribute by default; use
 @option{-Wno-designated-init} to suppress them.
 
+@cindex @code{flag_enum} type attribute
+@item flag_enum
+This attribute may be applied to an enumerated type to indicate that
+its enumerators are used in bitwise operations, so e.g. @option{-Wswitch}
+should not warn about a @code{case} that corresponds to a bitwise
+combination of enumerators.
+
 @cindex @code{hardbool} type attribute
 @item hardbool
 @itemx hardbool (@var{false_value})
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43afb0984e5..7c6175efbc0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -7672,9 +7672,9 @@ unless C++14 mode (or newer) is active.
 Warn whenever a @code{switch} statement has an index of enumerated type
 and lacks a @code{case} for one or more of the named codes of that
 enumeration.  (The presence of a @code{default} label prevents this
-warning.)  @code{case} labels outside the enumeration range also
-provoke warnings when this option is used (even if there is a
-@code{default} label).
+warning.)  @code{case} labels that do not correspond to enumerators also
+provoke warnings when this option is used, unless the enumeration is marked
+with the @code{flag_enum} attribute.
 This warning is enabled by @option{-Wall}.
 
 @opindex Wswitch-default
@@ -7688,8 +7688,9 @@ case.
 @item -Wswitch-enum
 Warn whenever a @code{switch} statement has an index of enumerated type
 and lacks a @code{case} for one or more of the named codes of that
-enumeration.  @code{case} labels outside the enumeration range also
-provoke warnings when this option is used.  The only difference
+enumeration.  @code{case} labels that do not correspond to enumerators also
+provoke warnings when this option is used, unless the enumeration is marked
+with the @code{flag_enum} attribute.  The only difference
 between @option{-Wswitch} and this option is that this option gives a
 warning about an omitted enumeration code even if there is a
 @code{default} label.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index d3827573a36..027f077d51b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -821,6 +821,7 @@ extern struct visibility_flags visibility_options;
 
 /* Attribute table common to the C front ends.  */
 extern const struct scoped_attribute_specs c_common_gnu_attribute_table;
+extern const struct scoped_attribute_specs c_common_clang_attribute_table;
 extern const struct scoped_attribute_specs c_common_format_attribute_table;
 
 /* Pointer to function to lazily generate the VAR_DECL for __FUNCTION__ etc.
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 20af5a5bb94..365b5938803 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -79,6 +79,7 @@ static const scoped_attribute_specs *const c_objc_attribute_table[] =
 {
   &std_attribute_table,
   &c_

[PATCH v1 7/9] aarch64: Disable the anchors

2024-09-04 Thread Evgeny Karpov

Monday, September 2, 2024
Andrew Pinski  wrote:

> Could you expand on this and why you think disabling is correct?
> It is so you could do:
>         adrp    x0, .LANCHOR0
>         add     x2, x0, :lo12:.LANCHOR0
>         ldr     w1, [x0, #:lo12:.LANCHOR0]
>         ldr     w0, [x2, 4]
>
> Rather than:
>         adrp    x1, t
>         adrp    x0, t1
>         ldr     w1, [x1, #:lo12:t]
>         ldr     w0, [x0, #:lo12:t1]
>         add     w0, w1, w0
>
> Notice how there is only one adrp in the anchor case.
> Could you expand on why the section anchors don't work for pe-coff?

The same explanation applies to adrp/ldr when an anchor is used with
an offset.
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662213.html
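
For context, C source of roughly the following shape would give the two
sequences quoted above when built with and without -fsection-anchors.  This is
an assumed testcase; the quoted mail does not show it, and the function name is
made up.

```c
/* Two adjacent globals; with section anchors both are addressed
   relative to one anchor symbol, i.e. as anchor + 0 and anchor + 4.  */
int t;
int t1;

int
sum_globals (void)
{
  return t + t1;
}
```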

Regards,
Evgeny

[PATCH v2 01/36] arm: [MVE intrinsics] improve comment for orrq shape

2024-09-04 Thread Christophe Lyon

Add a comment about the lack of "n" forms for floating-point and 8-bit
integers, to make it clearer why we use build_16_32 for MODE_n.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_orrq_def): Improve comment.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ba20c6a8f73..e01939469e3 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -865,7 +865,12 @@ SHAPE (binary_opt_n)
int16x8_t [__arm_]vorrq_m[_s16](int16x8_t inactive, int16x8_t a, int16x8_t b, mve_pred16_t p)
int16x8_t [__arm_]vorrq_x[_s16](int16x8_t a, int16x8_t b, mve_pred16_t p)
int16x8_t [__arm_]vorrq[_n_s16](int16x8_t a, const int16_t imm)
-   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)  */
+   int16x8_t [__arm_]vorrq_m_n[_s16](int16x8_t a, const int16_t imm, mve_pred16_t p)
+
+   No "_n" forms for floating-point, nor 8-bit integers:
+   float16x8_t [__arm_]vorrq[_f16](float16x8_t a, float16x8_t b)
+   float16x8_t [__arm_]vorrq_m[_f16](float16x8_t inactive, float16x8_t a, float16x8_t b, mve_pred16_t p)
+   float16x8_t [__arm_]vorrq_x[_f16](float16x8_t a, float16x8_t b, mve_pred16_t p)  */
 struct binary_orrq_def : public overloaded_base<0>
 {
   bool
-- 
2.34.1

[PATCH v2 00/36] arm: [MVE intrinsics] Re-implement more intrinsics

2024-09-04 Thread Christophe Lyon

Hi,

This is v2 of the patch series I sent in
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657065.html.

I have taken into account the feedback I received, and added more
patches to the series, converting more MVE intrinsics to the new
framework.

Changes v1-v2:

- I kept patch #1 as-is (so, no change): the comment is a bit verbose,
  but I don't think this causes any harm.
- patch #3: use conditionals in a few more places, making the code
  more compact and hopefully easier to read.
- patch #5: use "su64" instead of "ss8" for the immediate parameter.
- patch #6: restore alphabetical order in arm-mve-builtins-base.def
- patch #7: remove trailing ')' in comments in mve.md
- patch #10: remove trailing ')' in comments in mve.md
- patch #11: remove unused parameter names to avoid warnings.
- patch #13: fix a comment.
- patch #15: fix a comment.

Patches 16-36 are new:
- #16: rework vctp
- #17-21: rework v[id]dup + cleanups
- #22: fix checks of immediate arguments, noticed after the discussion
   on patch #5
- #23-27: rework v[id]wdup + cleanups
- #28-30: rework vshlc
- #31-35: rework vadc/vadci/vsbc/vsbci
- #36: introduce long_type_suffix and half_type_suffix helpers to
   avoid some code duplication.

Tested on arm-eabi with
--target_board=arm-qemu{-mthumb/-mfloat-abi=hard/-march=armv8.1-m.main+mve.fp+fp.dp}

Christophe Lyon (36):
  arm: [MVE intrinsics] improve comment for orrq shape
  arm: [MVE intrinsics] remove useless resolve from create shape
  arm: [MVE intrinsics] Cleanup arm-mve-builtins-functions.h
  arm: [MVE intrinsics] factorize vcvtq
  arm: [MVE intrinsics] add vcvt shape
  arm: [MVE intrinsics] rework vcvtq
  arm: [MVE intrinsics] factorize vcvtbq vcvttq
  arm: [MVE intrinsics] add vcvt_f16_f32 and vcvt_f32_f16 shapes
  arm: [MVE intrinsics] rework vcvtbq_f16_f32 vcvttq_f16_f32
vcvtbq_f32_f16 vcvttq_f32_f16
  arm: [MVE intrinsics] factorize vcvtaq vcvtmq vcvtnq vcvtpq
  arm: [MVE intrinsics] add vcvtx shape
  arm: [MVE intrinsics] rework vcvtaq vcvtmq vcvtnq vcvtpq
  arm: [MVE intrinsics] rework vbicq
  arm: [MVE intrinsics] factorize vorn
  arm: [MVE intrinsics] rework vorn
  arm: [MVE intrinsics] rework vctp
  arm: [MVE intrinsics] factorize vddup vidup
  arm: [MVE intrinsics] add viddup shape
  arm: [MVE intrinsics] rework vddup vidup
  arm: [MVE intrinsics] update v[id]dup tests
  arm: [MVE intrinsics] remove v[id]dup expanders
  arm: [MVE intrinsics] fix checks of immediate arguments
  arm: [MVE intrinsics] factorize vdwdup viwdup
  arm: [MVE intrinsics] add vidwdup shape
  arm: [MVE intrinsics] rework vdwdup viwdup
  arm: [MVE intrinsics] update v[id]wdup tests
  arm: [MVE intrinsics] remove useless v[id]wdup expanders
  arm: [MVE intrinsics] add vshlc shape
  arm: [MVE intrinsics] rework vshlcq
  arm: [MVE intrinsics] remove vshlcq useless expanders
  arm: [MVE intrinsics] add vadc_vsbc shape
  arm: [MVE intrinsics] factorize vadc vadci vsbc vsbci
  arm: [MVE intrinsics] rework vadciq
  arm: [MVE intrinsics] rework vadcq
  arm: [MVE intrinsics] rework vsbcq vsbciq
  arm: [MVE intrinsics] use long_type_suffix / half_type_suffix helpers

 gcc/config/arm/arm-builtins.cc|   20 -
 gcc/config/arm/arm-mve-builtins-base.cc   |  593 ++
 gcc/config/arm/arm-mve-builtins-base.def  |   44 +-
 gcc/config/arm/arm-mve-builtins-base.h|   22 +
 gcc/config/arm/arm-mve-builtins-functions.h   |  815 +--
 gcc/config/arm/arm-mve-builtins-shapes.cc |  645 +-
 gcc/config/arm/arm-mve-builtins-shapes.h  |9 +
 gcc/config/arm/arm-mve-builtins.cc|   95 +-
 gcc/config/arm/arm-mve-builtins.def   |1 +
 gcc/config/arm/arm-mve-builtins.h |   12 +-
 gcc/config/arm/arm_mve.h  | 6353 +++--
 gcc/config/arm/arm_mve_builtins.def   |   20 -
 gcc/config/arm/iterators.md   |   68 +-
 gcc/config/arm/mve.md |  832 +--
 .../arm/mve/intrinsics/vddupq_m_wb_u16.c  |   18 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u32.c  |   18 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u8.c   |   18 +-
 .../arm/mve/intrinsics/vddupq_wb_u16.c|   14 +-
 .../arm/mve/intrinsics/vddupq_wb_u32.c|   14 +-
 .../arm/mve/intrinsics/vddupq_wb_u8.c |   14 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u16.c  |   18 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u32.c  |   18 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u8.c   |   18 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u16.c |6 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u32.c |6 +-
 .../arm/mve/intrinsics/vdwdupq_m_wb_u8.c  |6 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u16.c   |6 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u32.c   |6 +-
 .../arm/mve/intrinsics/vdwdupq_wb_u8.c|6 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u16.c |6 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u32.c |6 +-
 .../arm/mve/intrinsics/vdwdupq_x_wb_u8.c  |6 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u16.c

[PATCH v2 10/36] arm: [MVE intrinsics] factorize vcvtaq vcvtmq vcvtnq vcvtpq

2024-09-04 Thread Christophe Lyon

Factorize vcvtaq vcvtmq vcvtnq vcvtpq builtins so that they use the
same parameterized names.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTAQ_M_S, VCVTAQ_M_U,
VCVTAQ_S, VCVTAQ_U, VCVTMQ_M_S, VCVTMQ_M_U, VCVTMQ_S, VCVTMQ_U,
VCVTNQ_M_S, VCVTNQ_M_U, VCVTNQ_S, VCVTNQ_U, VCVTPQ_M_S,
VCVTPQ_M_U, VCVTPQ_S, VCVTPQ_U.
(VCVTAQ, VCVTPQ, VCVTNQ, VCVTMQ, VCVTAQ_M, VCVTMQ_M, VCVTNQ_M)
(VCVTPQ_M): Delete.
(VCVTxQ, VCVTxQ_M): New.
* config/arm/mve.md (mve_vcvtpq_)
(mve_vcvtnq_, mve_vcvtmq_)
(mve_vcvtaq_): Merge into ...
(@mve_q_): ... this.
(mve_vcvtaq_m_, mve_vcvtmq_m_)
(mve_vcvtpq_m_, mve_vcvtnq_m_): Merge into
...
(@mve_q_m_): ... this.
---
 gcc/config/arm/iterators.md |  18 +++---
 gcc/config/arm/mve.md   | 121 +---
 2 files changed, 26 insertions(+), 113 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b9c39a98ca2..162c0d56bfb 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -964,10 +964,18 @@ (define_int_attr mve_insn [
 (VCMLAQ_M_F "vcmla") (VCMLAQ_ROT90_M_F "vcmla") (VCMLAQ_ROT180_M_F "vcmla") (VCMLAQ_ROT270_M_F "vcmla")
 (VCMULQ_M_F "vcmul") (VCMULQ_ROT90_M_F "vcmul") (VCMULQ_ROT180_M_F "vcmul") (VCMULQ_ROT270_M_F "vcmul")
 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F "vcreate")
+(VCVTAQ_M_S "vcvta") (VCVTAQ_M_U "vcvta")
+(VCVTAQ_S "vcvta") (VCVTAQ_U "vcvta")
 (VCVTBQ_F16_F32 "vcvtb") (VCVTTQ_F16_F32 "vcvtt")
 (VCVTBQ_F32_F16 "vcvtb") (VCVTTQ_F32_F16 "vcvtt")
 (VCVTBQ_M_F16_F32 "vcvtb") (VCVTTQ_M_F16_F32 "vcvtt")
 (VCVTBQ_M_F32_F16 "vcvtb") (VCVTTQ_M_F32_F16 "vcvtt")
+(VCVTMQ_M_S "vcvtm") (VCVTMQ_M_U "vcvtm")
+(VCVTMQ_S "vcvtm") (VCVTMQ_U "vcvtm")
+(VCVTNQ_M_S "vcvtn") (VCVTNQ_M_U "vcvtn")
+(VCVTNQ_S "vcvtn") (VCVTNQ_U "vcvtn")
+(VCVTPQ_M_S "vcvtp") (VCVTPQ_M_U "vcvtp")
+(VCVTPQ_S "vcvtp") (VCVTPQ_U "vcvtp")
 (VCVTQ_FROM_F_S "vcvt") (VCVTQ_FROM_F_U "vcvt")
 (VCVTQ_M_FROM_F_S "vcvt") (VCVTQ_M_FROM_F_U "vcvt")
 (VCVTQ_M_N_FROM_F_S "vcvt") (VCVTQ_M_N_FROM_F_U "vcvt")
@@ -2732,14 +2740,10 @@ (define_int_iterator VMVNQ_N [VMVNQ_N_U VMVNQ_N_S])
 (define_int_iterator VREV64Q [VREV64Q_S VREV64Q_U])
 (define_int_iterator VCVTQ_FROM_F [VCVTQ_FROM_F_S VCVTQ_FROM_F_U])
 (define_int_iterator VREV16Q [VREV16Q_U VREV16Q_S])
-(define_int_iterator VCVTAQ [VCVTAQ_U VCVTAQ_S])
 (define_int_iterator VDUPQ_N [VDUPQ_N_U VDUPQ_N_S])
 (define_int_iterator VADDVQ [VADDVQ_U VADDVQ_S])
 (define_int_iterator VREV32Q [VREV32Q_U VREV32Q_S])
 (define_int_iterator VMOVLxQ [VMOVLBQ_S VMOVLBQ_U VMOVLTQ_U VMOVLTQ_S])
-(define_int_iterator VCVTPQ [VCVTPQ_S VCVTPQ_U])
-(define_int_iterator VCVTNQ [VCVTNQ_S VCVTNQ_U])
-(define_int_iterator VCVTMQ [VCVTMQ_S VCVTMQ_U])
 (define_int_iterator VADDLVQ [VADDLVQ_U VADDLVQ_S])
 (define_int_iterator VCVTQ_N_TO_F [VCVTQ_N_TO_F_S VCVTQ_N_TO_F_U])
 (define_int_iterator VCREATEQ [VCREATEQ_U VCREATEQ_S])
@@ -2795,7 +2799,6 @@ (define_int_iterator VQMOVNTQ [VQMOVNTQ_U VQMOVNTQ_S])
 (define_int_iterator VSHLLxQ_N [VSHLLBQ_N_S VSHLLBQ_N_U VSHLLTQ_N_S VSHLLTQ_N_U])
 (define_int_iterator VRMLALDAVHQ [VRMLALDAVHQ_U VRMLALDAVHQ_S])
 (define_int_iterator VBICQ_M_N [VBICQ_M_N_S VBICQ_M_N_U])
-(define_int_iterator VCVTAQ_M [VCVTAQ_M_S VCVTAQ_M_U])
 (define_int_iterator VCVTQ_M_TO_F [VCVTQ_M_TO_F_S VCVTQ_M_TO_F_U])
 (define_int_iterator VQRSHRNBQ_N [VQRSHRNBQ_N_U VQRSHRNBQ_N_S])
 (define_int_iterator VABAVQ [VABAVQ_S VABAVQ_U])
@@ -2845,9 +2848,6 @@ (define_int_iterator VQMOVNTQ_M [VQMOVNTQ_M_U VQMOVNTQ_M_S])
 (define_int_iterator VMVNQ_M_N [VMVNQ_M_N_U VMVNQ_M_N_S])
 (define_int_iterator VQSHRNTQ_N [VQSHRNTQ_N_U VQSHRNTQ_N_S])
 (define_int_iterator VSHRNTQ_N [VSHRNTQ_N_S VSHRNTQ_N_U])
-(define_int_iterator VCVTMQ_M [VCVTMQ_M_S VCVTMQ_M_U])
-(define_int_iterator VCVTNQ_M [VCVTNQ_M_S VCVTNQ_M_U])
-(define_int_iterator VCVTPQ_M [VCVTPQ_M_S VCVTPQ_M_U])
 (define_int_iterator VCVTQ_M_N_FROM_F [VCVTQ_M_N_FROM_F_S VCVTQ_M_N_FROM_F_U])
 (define_int_iterator VCVTQ_M_FROM_F [VCVTQ_M_FROM_F_U VCVTQ_M_FROM_F_S])
 (define_int_iterator VRMLALDAVHQ_P [VRMLALDAVHQ_P_S VRMLALDAVHQ_P_U])
@@ -2956,6 +2956,8 @@ (define_int_iterator VCVTxQ_F16_F32 [VCVTBQ_F16_F32 VCVTTQ_F16_F32])
 (define_int_iterator VCVTxQ_F32_F16 [VCVTBQ_F32_F16 VCVTTQ_F32_F16])
 (define_int_iterator VCVTxQ_M_F16_F32 [VCVTBQ_M_F16_F32 VCVTTQ_M_F16_F32])
 (define_int_iterator VCVTxQ_M_F32_F16 [VCVTBQ_M_F32_F16 VCVTTQ_M_F32_F16])
+(define_int_iterator VCVTxQ [VCVTAQ_S VCVTAQ_U VCVTMQ_S VCVTMQ_U VCVTNQ_S VCVTNQ_U VCVTPQ_S VCVTPQ_U])
+(define_int_iterator VCVTxQ_M [VCVTAQ_M_S VCVTAQ_M_U VCVTMQ_M_S VCVTMQ_M_U

[PATCH v2 07/36] arm: [MVE intrinsics] factorize vcvtbq vcvttq

2024-09-04 Thread Christophe Lyon

Factorize vcvtbq, vcvttq so that they use the same parameterized
names.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTBQ_F16_F32,
VCVTTQ_F16_F32, VCVTBQ_F32_F16, VCVTTQ_F32_F16, VCVTBQ_M_F16_F32,
VCVTTQ_M_F16_F32, VCVTBQ_M_F32_F16, VCVTTQ_M_F32_F16.
(VCVTxQ_F16_F32): New iterator.
(VCVTxQ_F32_F16): Likewise.
(VCVTxQ_M_F16_F32): Likewise.
(VCVTxQ_M_F32_F16): Likewise.
* config/arm/mve.md (mve_vcvttq_f32_f16v4sf)
(mve_vcvtbq_f32_f16v4sf): Merge into ...
(@mve_q_f32_f16v4sf): ... this.
(mve_vcvtbq_f16_f32v8hf, mve_vcvttq_f16_f32v8hf): Merge into ...
(@mve_<mve_insn>q_f16_f32v8hf): ... this.
(mve_vcvtbq_m_f16_f32v8hf, mve_vcvttq_m_f16_f32v8hf): Merge into
...
(@mve_<mve_insn>q_m_f16_f32v8hf): ... this.
(mve_vcvtbq_m_f32_f16v4sf, mve_vcvttq_m_f32_f16v4sf): Merge into
...
(@mve_<mve_insn>q_m_f32_f16v4sf): ... this.
---
 gcc/config/arm/iterators.md |   8 +++
 gcc/config/arm/mve.md   | 112 +---
 2 files changed, 34 insertions(+), 86 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index bf800625fac..b9c39a98ca2 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -964,6 +964,10 @@ (define_int_attr mve_insn [
 (VCMLAQ_M_F "vcmla") (VCMLAQ_ROT90_M_F "vcmla") 
(VCMLAQ_ROT180_M_F "vcmla") (VCMLAQ_ROT270_M_F "vcmla")
 (VCMULQ_M_F "vcmul") (VCMULQ_ROT90_M_F "vcmul") 
(VCMULQ_ROT180_M_F "vcmul") (VCMULQ_ROT270_M_F "vcmul")
 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F 
"vcreate")
+(VCVTBQ_F16_F32 "vcvtb") (VCVTTQ_F16_F32 "vcvtt")
+(VCVTBQ_F32_F16 "vcvtb") (VCVTTQ_F32_F16 "vcvtt")
+(VCVTBQ_M_F16_F32 "vcvtb") (VCVTTQ_M_F16_F32 "vcvtt")
+(VCVTBQ_M_F32_F16 "vcvtb") (VCVTTQ_M_F32_F16 "vcvtt")
 (VCVTQ_FROM_F_S "vcvt") (VCVTQ_FROM_F_U "vcvt")
 (VCVTQ_M_FROM_F_S "vcvt") (VCVTQ_M_FROM_F_U "vcvt")
 (VCVTQ_M_N_FROM_F_S "vcvt") (VCVTQ_M_N_FROM_F_U "vcvt")
@@ -2948,6 +2952,10 @@ (define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
 (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
 (define_int_iterator VQSHLUQ_M_N [VQSHLUQ_M_N_S])
 (define_int_iterator VQSHLUQ_N [VQSHLUQ_N_S])
+(define_int_iterator VCVTxQ_F16_F32 [VCVTBQ_F16_F32 VCVTTQ_F16_F32])
+(define_int_iterator VCVTxQ_F32_F16 [VCVTBQ_F32_F16 VCVTTQ_F32_F16])
+(define_int_iterator VCVTxQ_M_F16_F32 [VCVTBQ_M_F16_F32 VCVTTQ_M_F16_F32])
+(define_int_iterator VCVTxQ_M_F32_F16 [VCVTBQ_M_F32_F16 VCVTTQ_M_F32_F16])
 (define_int_iterator DLSTP [DLSTP8 DLSTP16 DLSTP32
   DLSTP64])
 (define_int_iterator LETP [LETP8 LETP16 LETP32
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 95c615c1534..6e2f542cdae 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -217,33 +217,20 @@ (define_insn "@mve_q_f"
  [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_f"))
   (set_attr "type" "mve_move")
 ])
-;;
-;; [vcvttq_f32_f16])
-;;
-(define_insn "mve_vcvttq_f32_f16v4sf"
-  [
-   (set (match_operand:V4SF 0 "s_register_operand" "=w")
-   (unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
-VCVTTQ_F32_F16))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvtt.f32.f16\t%q0, %q1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvttq_f32_f16v4sf"))
-  (set_attr "type" "mve_move")
-])
 
 ;;
-;; [vcvtbq_f32_f16])
+;; [vcvtbq_f32_f16]
+;; [vcvttq_f32_f16]
 ;;
-(define_insn "mve_vcvtbq_f32_f16v4sf"
+(define_insn "@mve_<mve_insn>q_f32_f16v4sf"
   [
(set (match_operand:V4SF 0 "s_register_operand" "=w")
(unspec:V4SF [(match_operand:V8HF 1 "s_register_operand" "w")]
-VCVTBQ_F32_F16))
+VCVTxQ_F32_F16))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvtb.f32.f16\t%q0, %q1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvtbq_f32_f16v4sf"))
+  "<mve_insn>.f32.f16\t%q0, %q1"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_<mve_insn>q_f32_f16v4sf"))
   (set_attr "type" "mve_move")
 ])
 
@@ -1342,34 +1329,19 @@ (define_insn "mve_vctpq_m"
 ])
 
 ;;
-;; [vcvtbq_f16_f32])
-;;
-(define_insn "mve_vcvtbq_f16_f32v8hf"
-  [
-   (set (match_operand:V8HF 0 "s_register_operand" "=w")
-   (unspec:V8HF [(match_operand:V8HF 1 "s_register_operand" "0")
- (match_operand:V4SF 2 "s_register_operand" "w")]
-VCVTBQ_F16_F32))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvtb.f16.f32\t%q0, %q2"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvtbq_f16_f32v8hf"))
-  (set_attr "type" "mve_move")
-])
-
-;;
-;; [vcvttq_f16_f32])
+;; [vcvtbq_f16_f32]
+;; [vcvttq_f16_f32]
 ;;
-(define_insn "mve_vcvttq_f16_f32v8hf"
+(define_insn "@mve_<mve_insn>q_f16_f32v8hf"
   [
(set (match_operand:V8HF 0 "s_registe

[PATCH v2 09/36] arm: [MVE intrinsics] rework vcvtbq_f16_f32 vcvttq_f16_f32 vcvtbq_f32_f16 vcvttq_f32_f16

2024-09-04 Thread Christophe Lyon

Implement vcvtbq_f16_f32, vcvttq_f16_f32, vcvtbq_f32_f16 and
vcvttq_f32_f16 using the new MVE builtins framework.
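
The selection logic in the new vcvtxq_impl class can be modelled in isolation, as in the sketch below. It is only an approximation for readers: vcvtxq_model and its string labels are hypothetical, standing in for the four unspec codes the real class stores, and select () returns a label where the real expand () looks up an insn code and emits it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GCC's predication_index.
enum predication_index { PRED_none, PRED_m, PRED_x };

// Hypothetical model of vcvtxq_impl: strings replace the unspec codes.
struct vcvtxq_model
{
  std::string f16_f32, m_f16_f32, f32_f16, m_f32_f16;

  // Pick the unspec from the predication and from the element size of
  // type suffix 0 (16 bits means the result is f16, otherwise f32).
  std::string
  select (predication_index pred, unsigned int element_bits) const
  {
    const bool to_f16 = (element_bits == 16);
    switch (pred)
      {
      case PRED_none:
	return to_f16 ? f16_f32 : f32_f16;
      case PRED_m:
      case PRED_x:
	/* "m" and "x" share the predicated unspec; they differ only in
	   how the insn is emitted afterwards.  */
	return to_f16 ? m_f16_f32 : m_f32_f16;
      }
    return "";
  }
};

// Same argument order as the FUNCTION (vcvtbq, ...) entry in the patch.
static const vcvtxq_model vcvtbq_model = {
  "VCVTBQ_F16_F32", "VCVTBQ_M_F16_F32", "VCVTBQ_F32_F16", "VCVTBQ_M_F32_F16"
};
```

Sharing one class between vcvtbq and vcvttq works because the two intrinsics differ only in the unspec codes passed to the constructor.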

2024-07-11 Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (class vcvtxq_impl): New.
(vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins-base.def (vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins-base.h (vcvtbq, vcvttq): New.
* config/arm/arm-mve-builtins.cc (cvt_f16_f32, cvt_f32_f16): New
types.
(function_instance::has_inactive_argument): Support vcvtbq and
vcvttq.
* config/arm/arm_mve.h (vcvttq_f32): Delete.
(vcvtbq_f32): Delete.
(vcvtbq_m): Delete.
(vcvttq_m): Delete.
(vcvttq_f32_f16): Delete.
(vcvtbq_f32_f16): Delete.
(vcvttq_f16_f32): Delete.
(vcvtbq_f16_f32): Delete.
(vcvtbq_m_f16_f32): Delete.
(vcvtbq_m_f32_f16): Delete.
(vcvttq_m_f16_f32): Delete.
(vcvttq_m_f32_f16): Delete.
(vcvtbq_x_f32_f16): Delete.
(vcvttq_x_f32_f16): Delete.
(__arm_vcvttq_f32_f16): Delete.
(__arm_vcvtbq_f32_f16): Delete.
(__arm_vcvttq_f16_f32): Delete.
(__arm_vcvtbq_f16_f32): Delete.
(__arm_vcvtbq_m_f16_f32): Delete.
(__arm_vcvtbq_m_f32_f16): Delete.
(__arm_vcvttq_m_f16_f32): Delete.
(__arm_vcvttq_m_f32_f16): Delete.
(__arm_vcvtbq_x_f32_f16): Delete.
(__arm_vcvttq_x_f32_f16): Delete.
(__arm_vcvttq_f32): Delete.
(__arm_vcvtbq_f32): Delete.
(__arm_vcvtbq_m): Delete.
(__arm_vcvttq_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  56 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm-mve-builtins.cc   |  12 ++
 gcc/config/arm/arm_mve.h | 146 ---
 5 files changed, 74 insertions(+), 146 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index a780d686eb1..760378c91b1 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -251,6 +251,60 @@ public:
   }
 };
 
+  /* Implements vcvt[bt]q_f32_f16 and vcvt[bt]q_f16_f32
+ intrinsics.  */
+class vcvtxq_impl : public function_base
+{
+public:
+  CONSTEXPR vcvtxq_impl (int unspec_f16_f32, int unspec_for_m_f16_f32,
+int unspec_f32_f16, int unspec_for_m_f32_f16)
+: m_unspec_f16_f32 (unspec_f16_f32),
+  m_unspec_for_m_f16_f32 (unspec_for_m_f16_f32),
+  m_unspec_f32_f16 (unspec_f32_f16),
+  m_unspec_for_m_f32_f16 (unspec_for_m_f32_f16)
+  {}
+
+  /* The unspec code associated with vcvt[bt]q.  */
+  int m_unspec_f16_f32;
+  int m_unspec_for_m_f16_f32;
+  int m_unspec_f32_f16;
+  int m_unspec_for_m_f32_f16;
+
+  rtx
+  expand (function_expander &e) const override
+  {
+insn_code code;
+switch (e.pred)
+  {
+  case PRED_none:
+   /* No predicate.  */
+   if (e.type_suffix (0).element_bits == 16)
+ code = code_for_mve_q_f16_f32v8hf (m_unspec_f16_f32);
+   else
+ code = code_for_mve_q_f32_f16v4sf (m_unspec_f32_f16);
+   return e.use_exact_insn (code);
+
+  case PRED_m:
+  case PRED_x:
+   /* "m" or "x" predicate.  */
+   if (e.type_suffix (0).element_bits == 16)
+ code = code_for_mve_q_m_f16_f32v8hf (m_unspec_for_m_f16_f32);
+   else
+ code = code_for_mve_q_m_f32_f16v4sf (m_unspec_for_m_f32_f16);
+
+   if (e.pred == PRED_m)
+ return e.use_cond_insn (code, 0);
+   else
+ return e.use_pred_x_insn (code);
+
+  default:
+   gcc_unreachable ();
+  }
+
+gcc_unreachable ();
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
@@ -452,6 +506,8 @@ FUNCTION (vcmpcsq, 
unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GEU, UNK
 FUNCTION (vcmphiq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GTU, 
UNKNOWN, UNKNOWN, VCMPHIQ_M_U, UNKNOWN, UNKNOWN, VCMPHIQ_M_N_U, UNKNOWN))
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
 FUNCTION (vcvtq, vcvtq_impl,)
+FUNCTION (vcvtbq, vcvtxq_impl, (VCVTBQ_F16_F32, VCVTBQ_M_F16_F32, 
VCVTBQ_F32_F16, VCVTBQ_M_F32_F16))
+FUNCTION (vcvttq, vcvtxq_impl, (VCVTTQ_F16_F32, VCVTTQ_M_F16_F32, 
VCVTTQ_F32_F16, VCVTTQ_M_F32_F16))
 FUNCTION_ONLY_N (vdupq, VDUPQ)
 FUNCTION_WITH_RTX_M (veorq, XOR, VEORQ)
 FUNCTION (vfmaq, unspec_mve_function_exact_insn, (-1, -1, VFMAQ_F, -1, -1, 
VFMAQ_N_F, -1, -1, VFMAQ_M_F, -1, -1, VFMAQ_M_N_F))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 671f86b5096..85211d2adc2 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -179,7 +179,11 @@ DEF_MVE_FUNCTION (vcmulq_rot180, binary, all_float, 
mx_or_none)
 DEF_MVE_FUNCTION (vcmulq_rot270, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vcmulq_rot90, binary, all_float, mx_o

[PATCH v2 05/36] arm: [MVE intrinsics] add vcvt shape

2024-09-04 Thread Christophe Lyon

This patch adds the vcvt shape description.

This requires adding a new type_suffix_info parameter to
explicit_type_suffix_p (), because vcvt overloads the type suffixes
for integer to floating-point conversions, but not for floating-point
to integer conversions.
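
To make the rule concrete, here is a small standalone sketch of the predicate. It copies the condition used by vcvt_def::explicit_type_suffix_p in the diff, but the enum, the two-field type_suffix_info struct and the function name are simplified, hypothetical stand-ins rather than the real GCC declarations.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real types carry many more fields.
enum predication_index { PRED_none, PRED_m, PRED_x };

struct type_suffix_info
{
  bool integer_p;
  bool float_p;
};

// Type suffix I must be spelled out only when the function is not "_m"
// predicated and the conversion is floating-point to integer, that is
// when suffix 0 (the result) is an integer type or suffix 1 (the
// source) is a floating-point type.
static bool
vcvt_explicit_type_suffix_p (unsigned int i, predication_index pred,
			     type_suffix_info type_info)
{
  return pred != PRED_m
	 && ((i == 0 && type_info.integer_p)
	     || (i == 1 && type_info.float_p));
}
```

For vcvtq_s32_f32 both suffixes are explicit, so no overloaded name exists, whereas for vcvtq[_f32_s32] neither is, which matches the prototypes listed in the shape comment. The decision depends on the kind of each type suffix, which is why the predicate needs the extra parameter.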

2024-07-11 Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc
(nonoverloaded_base::explicit_type_suffix_p): Add unused
type_suffix_info parameter.
(overloaded_base::explicit_type_suffix_p): Likewise.
(unary_n_def::explicit_type_suffix_p): Likewise.
(vcvt): New.
* config/arm/arm-mve-builtins-shapes.h (vcvt): New.
* config/arm/arm-mve-builtins.cc (function_builder::get_name): Add
new type_suffix parameter.
(function_builder::add_overloaded_functions): Likewise.
* config/arm/arm-mve-builtins.h
(function_shape::explicit_type_suffix_p): Likewise.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 108 +-
 gcc/config/arm/arm-mve-builtins-shapes.h  |   1 +
 gcc/config/arm/arm-mve-builtins.cc|   9 +-
 gcc/config/arm/arm-mve-builtins.h |  10 +-
 4 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 0520a8331db..bc99a6a7c43 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -330,7 +330,8 @@ build_16_32 (function_builder &b, const char *signature,
 struct nonoverloaded_base : public function_shape
 {
   bool
-  explicit_type_suffix_p (unsigned int, enum predication_index, enum 
mode_suffix_index) const override
+  explicit_type_suffix_p (unsigned int, enum predication_index,
+ enum mode_suffix_index, type_suffix_info) const 
override
   {
 return true;
   }
@@ -360,7 +361,8 @@ template
 struct overloaded_base : public function_shape
 {
   bool
-  explicit_type_suffix_p (unsigned int i, enum predication_index, enum 
mode_suffix_index) const override
+  explicit_type_suffix_p (unsigned int i, enum predication_index,
+ enum mode_suffix_index, type_suffix_info) const 
override
   {
 return (EXPLICIT_MASK >> i) & 1;
   }
@@ -1856,7 +1858,7 @@ struct unary_n_def : public overloaded_base<0>
 {
   bool
   explicit_type_suffix_p (unsigned int, enum predication_index pred,
- enum mode_suffix_index) const override
+ enum mode_suffix_index, type_suffix_info) const 
override
   {
 return pred != PRED_m;
   }
@@ -1979,6 +1981,106 @@ struct unary_widen_acc_def : public overloaded_base<0>
 };
 SHAPE (unary_widen_acc)
 
+/* _t foo_t0[_t1](_t)
+   _t foo_t0_n[_t1](_t, const int)
+
+   Example: vcvtq.
+   float32x4_t [__arm_]vcvtq[_f32_s32](int32x4_t a)
+   float32x4_t [__arm_]vcvtq_m[_f32_s32](float32x4_t inactive, int32x4_t a, 
mve_pred16_t p)
+   float32x4_t [__arm_]vcvtq_x[_f32_s32](int32x4_t a, mve_pred16_t p)
+   float32x4_t [__arm_]vcvtq_n[_f32_s32](int32x4_t a, const int imm6)
+   float32x4_t [__arm_]vcvtq_m_n[_f32_s32](float32x4_t inactive, int32x4_t a, 
const int imm6, mve_pred16_t p)
+   float32x4_t [__arm_]vcvtq_x_n[_f32_s32](int32x4_t a, const int imm6, 
mve_pred16_t p)
+   int32x4_t [__arm_]vcvtq_s32_f32(float32x4_t a)
+   int32x4_t [__arm_]vcvtq_m[_s32_f32](int32x4_t inactive, float32x4_t a, 
mve_pred16_t p)
+   int32x4_t [__arm_]vcvtq_x_s32_f32(float32x4_t a, mve_pred16_t p)
+   int32x4_t [__arm_]vcvtq_n_s32_f32(float32x4_t a, const int imm6)
+   int32x4_t [__arm_]vcvtq_m_n[_s32_f32](int32x4_t inactive, float32x4_t a, 
const int imm6, mve_pred16_t p)
+   int32x4_t [__arm_]vcvtq_x_n_s32_f32(float32x4_t a, const int imm6, 
mve_pred16_t p)  */
+struct vcvt_def : public overloaded_base<0>
+{
+  bool
+  explicit_type_suffix_p (unsigned int i, enum predication_index pred,
+ enum mode_suffix_index,
+ type_suffix_info type_info) const override
+  {
+if (pred != PRED_m
+   && ((i == 0 && type_info.integer_p)
+   || (i == 1 && type_info.float_p)))
+  return true;
+return false;
+  }
+
+  bool
+  explicit_mode_suffix_p (enum predication_index,
+ enum mode_suffix_index) const override
+  {
+return true;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v1,su64", group, MODE_n, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index from_type;
+tree res;
+unsigned int nimm = (r.mode_suffix_id == MODE_none) ? 0 : 1;
+
+if (!r.check_gp_argument (1 + n

[PATCH v2 02/36] arm: [MVE intrinsics] remove useless resolve from create shape

2024-09-04 Thread Christophe Lyon

vcreateq has no overloaded forms, so there's no need for resolve ().

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (create_def): Remove
resolve.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index e01939469e3..0520a8331db 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1408,12 +1408,6 @@ struct create_def : public nonoverloaded_base
   {
 build_all (b, "v0,su64,su64", group, MODE_none, preserve_user_namespace);
   }
-
-  tree
-  resolve (function_resolver &r) const override
-  {
-return r.resolve_uniform (0, 2);
-  }
 };
 SHAPE (create)
 
-- 
2.34.1

[PATCH v2 08/36] arm: [MVE intrinsics] add vcvt_f16_f32 and vcvt_f32_f16 shapes

2024-09-04 Thread Christophe Lyon

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 shape descriptions.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vcvt_f16_f32)
(vcvt_f32_f16): New.
* config/arm/arm-mve-builtins-shapes.h (vcvt_f16_f32)
(vcvt_f32_f16): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 35 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  2 ++
 2 files changed, 37 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index bc99a6a7c43..5ebf666d954 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -2081,6 +2081,41 @@ struct vcvt_def : public overloaded_base<0>
 };
 SHAPE (vcvt)
 
+/* float16x8_t foo_f16_f32(float16x8_t, float32x4_t)
+
+   Example: vcvttq_f16_f32.
+   float16x8_t [__arm_]vcvttq_f16_f32(float16x8_t a, float32x4_t b)
+   float16x8_t [__arm_]vcvttq_m_f16_f32(float16x8_t a, float32x4_t b, 
mve_pred16_t p)
+*/
+struct vcvt_f16_f32_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+build_all (b, "v0,v0,v1", group, MODE_none, preserve_user_namespace);
+  }
+};
+SHAPE (vcvt_f16_f32)
+
+/* float32x4_t foo_f32_f16(float16x8_t)
+
+   Example: vcvttq_f32_f16.
+   float32x4_t [__arm_]vcvttq_f32_f16(float16x8_t a)
+   float32x4_t [__arm_]vcvttq_m_f32_f16(float32x4_t inactive, float16x8_t a, 
mve_pred16_t p)
+   float32x4_t [__arm_]vcvttq_x_f32_f16(float16x8_t a, mve_pred16_t p)
+*/
+struct vcvt_f32_f16_def : public nonoverloaded_base
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
+  }
+};
+SHAPE (vcvt_f32_f16)
+
 /* _t vfoo[_t0](_t, _t, mve_pred16_t)
 
i.e. a version of the standard ternary shape in which
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 9a112ceeb29..50157b57571 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -78,6 +78,8 @@ namespace arm_mve
 extern const function_shape *const unary_widen;
 extern const function_shape *const unary_widen_acc;
 extern const function_shape *const vcvt;
+extern const function_shape *const vcvt_f16_f32;
+extern const function_shape *const vcvt_f32_f16;
 extern const function_shape *const vpsel;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1

[PATCH v2 14/36] arm: [MVE intrinsics] factorize vorn

2024-09-04 Thread Christophe Lyon

Factorize vorn so that it uses parameterized names.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC): Add VORNQ_M_S,
VORNQ_M_U.
(MVE_FP_M_BINARY_LOGIC): Add VORNQ_M_F.
(mve_insn): Add VORNQ_M_S, VORNQ_M_U, VORNQ_M_F.
* config/arm/mve.md (mve_vornq_s): Rename into ...
(@mve_vornq_s): ... this.
(mve_vornq_u): Rename into ...
(@mve_vornq_u): ... this.
(mve_vornq_f): Rename into ...
(@mve_vornq_f): ... this.
(mve_vornq_m_): Merge into vand/vbic pattern.
(mve_vornq_m_f): Likewise.
---
 gcc/config/arm/iterators.md |  3 +++
 gcc/config/arm/mve.md   | 48 ++---
 2 files changed, 10 insertions(+), 41 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 162c0d56bfb..3a1825ebab2 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -444,6 +444,7 @@ (define_int_iterator MVE_INT_M_BINARY_LOGIC   [
 VANDQ_M_S VANDQ_M_U
 VBICQ_M_S VBICQ_M_U
 VEORQ_M_S VEORQ_M_U
+VORNQ_M_S VORNQ_M_U
 VORRQ_M_S VORRQ_M_U
 ])
 
@@ -594,6 +595,7 @@ (define_int_iterator MVE_FP_M_BINARY_LOGIC   [
 VANDQ_M_F
 VBICQ_M_F
 VEORQ_M_F
+VORNQ_M_F
 VORRQ_M_F
 ])
 
@@ -1094,6 +1096,7 @@ (define_int_attr mve_insn [
 (VMVNQ_N_S "vmvn") (VMVNQ_N_U "vmvn")
 (VNEGQ_M_F "vneg")
 (VNEGQ_M_S "vneg")
+(VORNQ_M_S "vorn") (VORNQ_M_U "vorn") (VORNQ_M_F "vorn")
 (VORRQ_M_N_S "vorr") (VORRQ_M_N_U "vorr")
 (VORRQ_M_S "vorr") (VORRQ_M_U "vorr") (VORRQ_M_F "vorr")
 (VORRQ_N_S "vorr") (VORRQ_N_U "vorr")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index c0dd4b9019e..3d8b199d9d6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1021,9 +1021,9 @@ (define_insn "mve_q"
 ])
 
 ;;
-;; [vornq_u, vornq_s])
+;; [vornq_u, vornq_s]
 ;;
-(define_insn "mve_vornq_s"
+(define_insn "@mve_vornq_s"
   [
(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(ior:MVE_2 (not:MVE_2 (match_operand:MVE_2 2 "s_register_operand" "w"))
@@ -1035,7 +1035,7 @@ (define_insn "mve_vornq_s"
   (set_attr "type" "mve_move")
 ])
 
-(define_expand "mve_vornq_u"
+(define_expand "@mve_vornq_u"
   [
(set (match_operand:MVE_2 0 "s_register_operand")
(ior:MVE_2 (not:MVE_2 (match_operand:MVE_2 2 "s_register_operand"))
@@ -1429,9 +1429,9 @@ (define_insn "mve_q_f"
 ])
 
 ;;
-;; [vornq_f])
+;; [vornq_f]
 ;;
-(define_insn "mve_vornq_f"
+(define_insn "@mve_vornq_f"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(ior:MVE_0 (not:MVE_0 (match_operand:MVE_0 2 "s_register_operand" "w"))
@@ -2710,6 +2710,7 @@ (define_insn "@mve_q_m_"
 ;; [vandq_m_u, vandq_m_s]
 ;; [vbicq_m_u, vbicq_m_s]
 ;; [veorq_m_u, veorq_m_s]
+;; [vornq_m_u, vornq_m_s]
 ;; [vorrq_m_u, vorrq_m_s]
 ;;
 (define_insn "@mve_q_m_"
@@ -2836,24 +2837,6 @@ (define_insn "@mve_q_int_m_"
   (set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vornq_m_u, vornq_m_s])
-;;
-(define_insn "mve_vornq_m_"
-  [
-   (set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
-  (match_operand:MVE_2 2 "s_register_operand" "w")
-  (match_operand:MVE_2 3 "s_register_operand" "w")
-  (match_operand: 4 "vpr_register_operand" 
"Up")]
-VORNQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vornt\t%q0, %q2, %q3"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vornq_"))
-  (set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vqshlq_m_n_s, vqshlq_m_n_u]
 ;; [vshlq_m_n_s, vshlq_m_n_u]
@@ -3108,6 +3091,7 @@ (define_insn "@mve_q_m_n_f"
 ;; [vandq_m_f]
 ;; [vbicq_m_f]
 ;; [veorq_m_f]
+;; [vornq_m_f]
 ;; [vorrq_m_f]
 ;;
 (define_insn "@mve_q_m_f"
@@ -3187,24 +3171,6 @@ (define_insn "@mve_q_m_f"
   (set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vornq_m_f])
-;;
-(define_insn "mve_vornq_m_f"
-  [
-   (set (match_operand:MVE_0 0 "s_register_operand" "=w")
-   (unspec:MVE_0 [(match_operand:MVE_0 1 "s_register_operand" "0")
-  (match_operand:MVE_0 2 "s_register_operand" "w")
-  (match_operand:MVE_0 3 "s_register_operand" "w")
-  (match_operand: 4 "vpr_register_operand" 
"Up")]
-VORNQ_M_F))
-  ]
-  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vpst\;vornt\t%q0, %q2, %q3"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vornq_f"))
-  (set_attr "type" "mve_move")
-   (set_attr "length""8")])
-
 ;;
 ;; [vstrbq_s vstrbq_u]
 ;;
-- 
2.34.1

[PATCH v2 06/36] arm: [MVE intrinsics] rework vcvtq

2024-09-04 Thread Christophe Lyon

Implement vcvtq using the new MVE builtins framework.

In config/arm/arm-mve-builtins-base.def, the patch also restores the
alphabetical order.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (class vcvtq_impl): New.
(vcvtq): New.
* config/arm/arm-mve-builtins-base.def (vcvtq): New.
* config/arm/arm-mve-builtins-base.h (vcvtq): New.
* config/arm/arm-mve-builtins.cc (cvt): New type.
* config/arm/arm_mve.h (vcvtq): Delete.
(vcvtq_n): Delete.
(vcvtq_m): Delete.
(vcvtq_m_n): Delete.
(vcvtq_x): Delete.
(vcvtq_x_n): Delete.
(vcvtq_f16_s16): Delete.
(vcvtq_f32_s32): Delete.
(vcvtq_f16_u16): Delete.
(vcvtq_f32_u32): Delete.
(vcvtq_s16_f16): Delete.
(vcvtq_s32_f32): Delete.
(vcvtq_u16_f16): Delete.
(vcvtq_u32_f32): Delete.
(vcvtq_n_f16_s16): Delete.
(vcvtq_n_f32_s32): Delete.
(vcvtq_n_f16_u16): Delete.
(vcvtq_n_f32_u32): Delete.
(vcvtq_n_s16_f16): Delete.
(vcvtq_n_s32_f32): Delete.
(vcvtq_n_u16_f16): Delete.
(vcvtq_n_u32_f32): Delete.
(vcvtq_m_f16_s16): Delete.
(vcvtq_m_f16_u16): Delete.
(vcvtq_m_f32_s32): Delete.
(vcvtq_m_f32_u32): Delete.
(vcvtq_m_s16_f16): Delete.
(vcvtq_m_u16_f16): Delete.
(vcvtq_m_s32_f32): Delete.
(vcvtq_m_u32_f32): Delete.
(vcvtq_m_n_f16_u16): Delete.
(vcvtq_m_n_f16_s16): Delete.
(vcvtq_m_n_f32_u32): Delete.
(vcvtq_m_n_f32_s32): Delete.
(vcvtq_m_n_s32_f32): Delete.
(vcvtq_m_n_s16_f16): Delete.
(vcvtq_m_n_u32_f32): Delete.
(vcvtq_m_n_u16_f16): Delete.
(vcvtq_x_f16_u16): Delete.
(vcvtq_x_f16_s16): Delete.
(vcvtq_x_f32_s32): Delete.
(vcvtq_x_f32_u32): Delete.
(vcvtq_x_n_f16_s16): Delete.
(vcvtq_x_n_f16_u16): Delete.
(vcvtq_x_n_f32_s32): Delete.
(vcvtq_x_n_f32_u32): Delete.
(vcvtq_x_s16_f16): Delete.
(vcvtq_x_s32_f32): Delete.
(vcvtq_x_u16_f16): Delete.
(vcvtq_x_u32_f32): Delete.
(vcvtq_x_n_s16_f16): Delete.
(vcvtq_x_n_s32_f32): Delete.
(vcvtq_x_n_u16_f16): Delete.
(vcvtq_x_n_u32_f32): Delete.
(__arm_vcvtq_f16_s16): Delete.
(__arm_vcvtq_f32_s32): Delete.
(__arm_vcvtq_f16_u16): Delete.
(__arm_vcvtq_f32_u32): Delete.
(__arm_vcvtq_s16_f16): Delete.
(__arm_vcvtq_s32_f32): Delete.
(__arm_vcvtq_u16_f16): Delete.
(__arm_vcvtq_u32_f32): Delete.
(__arm_vcvtq_n_f16_s16): Delete.
(__arm_vcvtq_n_f32_s32): Delete.
(__arm_vcvtq_n_f16_u16): Delete.
(__arm_vcvtq_n_f32_u32): Delete.
(__arm_vcvtq_n_s16_f16): Delete.
(__arm_vcvtq_n_s32_f32): Delete.
(__arm_vcvtq_n_u16_f16): Delete.
(__arm_vcvtq_n_u32_f32): Delete.
(__arm_vcvtq_m_f16_s16): Delete.
(__arm_vcvtq_m_f16_u16): Delete.
(__arm_vcvtq_m_f32_s32): Delete.
(__arm_vcvtq_m_f32_u32): Delete.
(__arm_vcvtq_m_s16_f16): Delete.
(__arm_vcvtq_m_u16_f16): Delete.
(__arm_vcvtq_m_s32_f32): Delete.
(__arm_vcvtq_m_u32_f32): Delete.
(__arm_vcvtq_m_n_f16_u16): Delete.
(__arm_vcvtq_m_n_f16_s16): Delete.
(__arm_vcvtq_m_n_f32_u32): Delete.
(__arm_vcvtq_m_n_f32_s32): Delete.
(__arm_vcvtq_m_n_s32_f32): Delete.
(__arm_vcvtq_m_n_s16_f16): Delete.
(__arm_vcvtq_m_n_u32_f32): Delete.
(__arm_vcvtq_m_n_u16_f16): Delete.
(__arm_vcvtq_x_f16_u16): Delete.
(__arm_vcvtq_x_f16_s16): Delete.
(__arm_vcvtq_x_f32_s32): Delete.
(__arm_vcvtq_x_f32_u32): Delete.
(__arm_vcvtq_x_n_f16_s16): Delete.
(__arm_vcvtq_x_n_f16_u16): Delete.
(__arm_vcvtq_x_n_f32_s32): Delete.
(__arm_vcvtq_x_n_f32_u32): Delete.
(__arm_vcvtq_x_s16_f16): Delete.
(__arm_vcvtq_x_s32_f32): Delete.
(__arm_vcvtq_x_u16_f16): Delete.
(__arm_vcvtq_x_u32_f32): Delete.
(__arm_vcvtq_x_n_s16_f16): Delete.
(__arm_vcvtq_x_n_s32_f32): Delete.
(__arm_vcvtq_x_n_u16_f16): Delete.
(__arm_vcvtq_x_n_u32_f32): Delete.
(__arm_vcvtq): Delete.
(__arm_vcvtq_n): Delete.
(__arm_vcvtq_m): Delete.
(__arm_vcvtq_m_n): Delete.
(__arm_vcvtq_x): Delete.
(__arm_vcvtq_x_n): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  | 113 
 gcc/config/arm/arm-mve-builtins-base.def |  19 +-
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm-mve-builtins.cc   |  15 +
 gcc/config/arm/arm_mve.h | 666 ---
 5 files changed, 139 insertions(+), 675 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index e0ae593a6c0..a780d686eb1 100644
---

[PATCH v2 04/36] arm: [MVE intrinsics] factorize vcvtq

2024-09-04 Thread Christophe Lyon

Factorize vcvtq so that it uses parameterized names.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VCVTQ_FROM_F_S,
VCVTQ_FROM_F_U, VCVTQ_M_FROM_F_S, VCVTQ_M_FROM_F_U,
VCVTQ_M_N_FROM_F_S, VCVTQ_M_N_FROM_F_U, VCVTQ_M_N_TO_F_S,
VCVTQ_M_N_TO_F_U, VCVTQ_M_TO_F_S, VCVTQ_M_TO_F_U,
VCVTQ_N_FROM_F_S, VCVTQ_N_FROM_F_U, VCVTQ_N_TO_F_S,
VCVTQ_N_TO_F_U, VCVTQ_TO_F_S, VCVTQ_TO_F_U.
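* config/arm/mve.md (mve_vcvtq_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_to_f_<supf><mode>.
(mve_vcvtq_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_from_f_<supf><mode>.
(mve_vcvtq_n_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_n_to_f_<supf><mode>.
(mve_vcvtq_n_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_n_from_f_<supf><mode>.
(mve_vcvtq_m_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_to_f_<supf><mode>.
(mve_vcvtq_m_n_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_n_from_f_<supf><mode>.
(mve_vcvtq_m_from_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_from_f_<supf><mode>.
(mve_vcvtq_m_n_to_f_<supf><mode>): Rename into
@mve_<mve_insn>q_m_n_to_f_<supf><mode>.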
---
 gcc/config/arm/iterators.md |  8 +
 gcc/config/arm/mve.md   | 64 ++---
 2 files changed, 40 insertions(+), 32 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b9ff01cb104..bf800625fac 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -964,6 +964,14 @@ (define_int_attr mve_insn [
 (VCMLAQ_M_F "vcmla") (VCMLAQ_ROT90_M_F "vcmla") 
(VCMLAQ_ROT180_M_F "vcmla") (VCMLAQ_ROT270_M_F "vcmla")
 (VCMULQ_M_F "vcmul") (VCMULQ_ROT90_M_F "vcmul") 
(VCMULQ_ROT180_M_F "vcmul") (VCMULQ_ROT270_M_F "vcmul")
 (VCREATEQ_S "vcreate") (VCREATEQ_U "vcreate") (VCREATEQ_F 
"vcreate")
+(VCVTQ_FROM_F_S "vcvt") (VCVTQ_FROM_F_U "vcvt")
+(VCVTQ_M_FROM_F_S "vcvt") (VCVTQ_M_FROM_F_U "vcvt")
+(VCVTQ_M_N_FROM_F_S "vcvt") (VCVTQ_M_N_FROM_F_U "vcvt")
+(VCVTQ_M_N_TO_F_S "vcvt") (VCVTQ_M_N_TO_F_U "vcvt")
+(VCVTQ_M_TO_F_S "vcvt") (VCVTQ_M_TO_F_U "vcvt")
+(VCVTQ_N_FROM_F_S "vcvt") (VCVTQ_N_FROM_F_U "vcvt")
+(VCVTQ_N_TO_F_S "vcvt") (VCVTQ_N_TO_F_U "vcvt")
+(VCVTQ_TO_F_S "vcvt") (VCVTQ_TO_F_U "vcvt")
 (VDUPQ_M_N_S "vdup") (VDUPQ_M_N_U "vdup") (VDUPQ_M_N_F "vdup")
 (VDUPQ_N_S "vdup") (VDUPQ_N_U "vdup") (VDUPQ_N_F "vdup")
 (VEORQ_M_S "veor") (VEORQ_M_U "veor") (VEORQ_M_F "veor")
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 706a45c7d66..95c615c1534 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -248,17 +248,17 @@ (define_insn "mve_vcvtbq_f32_f16v4sf"
 ])
 
 ;;
-;; [vcvtq_to_f_s, vcvtq_to_f_u])
+;; [vcvtq_to_f_s, vcvtq_to_f_u]
 ;;
-(define_insn "mve_vcvtq_to_f_"
+(define_insn "@mve_q_to_f_"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand: 1 "s_register_operand" "w")]
 VCVTQ_TO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvt.f%#.%#\t%q0, %q1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvtq_to_f_"))
+  ".f%#.%#\t%q0, %q1"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_to_f_"))
   (set_attr "type" "mve_move")
 ])
 
@@ -278,17 +278,17 @@ (define_insn "@mve_q_"
 ])
 
 ;;
-;; [vcvtq_from_f_s, vcvtq_from_f_u])
+;; [vcvtq_from_f_s, vcvtq_from_f_u]
 ;;
-(define_insn "mve_vcvtq_from_f_"
+(define_insn "@mve_q_from_f_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
(unspec:MVE_5 [(match_operand: 1 "s_register_operand" "w")]
 VCVTQ_FROM_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvt.%#.f%#\t%q0, %q1"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvtq_from_f_"))
+  ".%#.f%#\t%q0, %q1"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_from_f_"))
   (set_attr "type" "mve_move")
 ])
 
@@ -581,9 +581,9 @@ (define_insn "@mve_q_n_f"
 ])
 
 ;;
-;; [vcvtq_n_to_f_s, vcvtq_n_to_f_u])
+;; [vcvtq_n_to_f_s, vcvtq_n_to_f_u]
 ;;
-(define_insn "mve_vcvtq_n_to_f_"
+(define_insn "@mve_q_n_to_f_"
   [
(set (match_operand:MVE_0 0 "s_register_operand" "=w")
(unspec:MVE_0 [(match_operand: 1 "s_register_operand" "w")
@@ -591,8 +591,8 @@ (define_insn "mve_vcvtq_n_to_f_"
 VCVTQ_N_TO_F))
   ]
   "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
-  "vcvt.f.\t%q0, %q1, %2"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vcvtq_n_to_f_"))
+  ".f.\t%q0, %q1, %2"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_n_to_f_"))
   (set_attr "type" "mve_move")
 ])
 
@@ -679,9 +679,9 @@ (define_insn "mve_vshrq_n_u_imm"
 ])
 
 ;;
-;; [vcvtq_n_from_f_s, vcvtq_n_from_f_u])
+;; [vcvtq_n_from_f_s, vcvtq_n_from_f_u]
 ;;
-(define_insn "mve_vcvtq_n_from_f_"
+(define_insn "@mve_q_n_from_f_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
(unspec:MVE_5 [(match_operand:

[PATCH v2 11/36] arm: [MVE intrinsics] add vcvtx shape

2024-09-04 Thread Christophe Lyon

This patch adds the vcvtx shape description for vcvtaq, vcvtmq,
vcvtnq, vcvtpq.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vcvtx): New.
* config/arm/arm-mve-builtins-shapes.h (vcvtx): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 59 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 60 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 5ebf666d954..6632ee49067 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -2116,6 +2116,65 @@ struct vcvt_f32_f16_def : public nonoverloaded_base
 };
 SHAPE (vcvt_f32_f16)
 
+/* _t foo_t0[_t1](_t)
+
+   Example: vcvtaq.
+   int16x8_t [__arm_]vcvtaq_s16_f16(float16x8_t a)
+   int16x8_t [__arm_]vcvtaq_m[_s16_f16](int16x8_t inactive, float16x8_t a, 
mve_pred16_t p)
+   int16x8_t [__arm_]vcvtaq_x_s16_f16(float16x8_t a, mve_pred16_t p)
+*/
+struct vcvtx_def : public overloaded_base<0>
+{
+  bool
+  explicit_type_suffix_p (unsigned int, enum predication_index pred,
+ enum mode_suffix_index,
+ type_suffix_info) const override
+  {
+return pred != PRED_m;
+  }
+
+  bool
+  skip_overload_p (enum predication_index pred, enum mode_suffix_index)
+const override
+  {
+return pred != PRED_m;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v1", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index from_type;
+tree res;
+
+if (!r.check_gp_argument (1, i, nargs)
+   || (from_type
+   = r.infer_vector_type (i)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+type_suffix_index to_type;
+
+gcc_assert (r.pred == PRED_m);
+
+/* Get the return type from the 'inactive' argument.  */
+to_type = r.infer_vector_type (0);
+
+if ((res = r.lookup_form (r.mode_suffix_id, to_type, from_type)))
+   return res;
+
+return r.report_no_such_form (from_type);
+  }
+};
+SHAPE (vcvtx)
+
 /* _t vfoo[_t0](_t, _t, mve_pred16_t)
 
i.e. a version of the standard ternary shape in which
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 50157b57571..ef497b6c97a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -80,6 +80,7 @@ namespace arm_mve
 extern const function_shape *const vcvt;
 extern const function_shape *const vcvt_f16_f32;
 extern const function_shape *const vcvt_f32_f16;
+extern const function_shape *const vcvtx;
 extern const function_shape *const vpsel;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1

[PATCH v2 13/36] arm: [MVE intrinsics] rework vbicq

2024-09-04 Thread Christophe Lyon

Implement vbicq using the new MVE builtins framework.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vbicq): New.
* config/arm/arm-mve-builtins-base.def (vbicq): New.
* config/arm/arm-mve-builtins-base.h (vbicq): New.
* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_exact_insn_vbic): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Add support for vbicq.
* config/arm/arm_mve.h (vbicq): Delete.
(vbicq_m_n): Delete.
(vbicq_m): Delete.
(vbicq_x): Delete.
(vbicq_u8): Delete.
(vbicq_s8): Delete.
(vbicq_u16): Delete.
(vbicq_s16): Delete.
(vbicq_u32): Delete.
(vbicq_s32): Delete.
(vbicq_n_u16): Delete.
(vbicq_f16): Delete.
(vbicq_n_s16): Delete.
(vbicq_n_u32): Delete.
(vbicq_f32): Delete.
(vbicq_n_s32): Delete.
(vbicq_m_n_s16): Delete.
(vbicq_m_n_s32): Delete.
(vbicq_m_n_u16): Delete.
(vbicq_m_n_u32): Delete.
(vbicq_m_s8): Delete.
(vbicq_m_s32): Delete.
(vbicq_m_s16): Delete.
(vbicq_m_u8): Delete.
(vbicq_m_u32): Delete.
(vbicq_m_u16): Delete.
(vbicq_m_f32): Delete.
(vbicq_m_f16): Delete.
(vbicq_x_s8): Delete.
(vbicq_x_s16): Delete.
(vbicq_x_s32): Delete.
(vbicq_x_u8): Delete.
(vbicq_x_u16): Delete.
(vbicq_x_u32): Delete.
(vbicq_x_f16): Delete.
(vbicq_x_f32): Delete.
(__arm_vbicq_u8): Delete.
(__arm_vbicq_s8): Delete.
(__arm_vbicq_u16): Delete.
(__arm_vbicq_s16): Delete.
(__arm_vbicq_u32): Delete.
(__arm_vbicq_s32): Delete.
(__arm_vbicq_n_u16): Delete.
(__arm_vbicq_n_s16): Delete.
(__arm_vbicq_n_u32): Delete.
(__arm_vbicq_n_s32): Delete.
(__arm_vbicq_m_n_s16): Delete.
(__arm_vbicq_m_n_s32): Delete.
(__arm_vbicq_m_n_u16): Delete.
(__arm_vbicq_m_n_u32): Delete.
(__arm_vbicq_m_s8): Delete.
(__arm_vbicq_m_s32): Delete.
(__arm_vbicq_m_s16): Delete.
(__arm_vbicq_m_u8): Delete.
(__arm_vbicq_m_u32): Delete.
(__arm_vbicq_m_u16): Delete.
(__arm_vbicq_x_s8): Delete.
(__arm_vbicq_x_s16): Delete.
(__arm_vbicq_x_s32): Delete.
(__arm_vbicq_x_u8): Delete.
(__arm_vbicq_x_u16): Delete.
(__arm_vbicq_x_u32): Delete.
(__arm_vbicq_f16): Delete.
(__arm_vbicq_f32): Delete.
(__arm_vbicq_m_f32): Delete.
(__arm_vbicq_m_f16): Delete.
(__arm_vbicq_x_f16): Delete.
(__arm_vbicq_x_f32): Delete.
(__arm_vbicq): Delete.
(__arm_vbicq_m_n): Delete.
(__arm_vbicq_m): Delete.
(__arm_vbicq_x): Delete.
* config/arm/mve.md (mve_vbicq_u): Rename into ...
(@mve_vbicq_u): ... this.
(mve_vbicq_s): Rename into ...
(@mve_vbicq_s): ... this.
(mve_vbicq_f): Rename into ...
(@mve_vbicq_f): ... this.
---
 gcc/config/arm/arm-mve-builtins-base.cc |   1 +
 gcc/config/arm/arm-mve-builtins-base.def|   2 +
 gcc/config/arm/arm-mve-builtins-base.h  |   1 +
 gcc/config/arm/arm-mve-builtins-functions.h |  54 ++
 gcc/config/arm/arm-mve-builtins.cc  |   1 +
 gcc/config/arm/arm_mve.h| 574 
 gcc/config/arm/mve.md   |   6 +-
 7 files changed, 62 insertions(+), 577 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 281f3749bce..e33603ec1f3 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -481,6 +481,7 @@ FUNCTION_PRED_P_S_U (vaddlvq, VADDLVQ)
 FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
 FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
 FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
+FUNCTION (vbicq, unspec_based_mve_function_exact_insn_vbic, (VBICQ_N_S, 
VBICQ_N_U, VBICQ_M_S, VBICQ_M_U, VBICQ_M_F, VBICQ_M_N_S, VBICQ_M_N_U))
 FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
 FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD90, 
UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M, VCADDQ_ROT90_M, 
VCADDQ_ROT90_M_F))
 FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, (UNSPEC_VCADD270, 
UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M, VCADDQ_ROT270_M, 
VCADDQ_ROT270_M_F))
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index cf733f7627a..aa7b71387f9 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -27,6 +27,7 @@ DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, 
mx_or_none)
 DEF_MVE_FUNCTION (vaddvaq, unary_int32_acc, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vaddvq, unary_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vandq, bi

[PATCH v2 20/36] arm: [MVE intrinsics] update v[id]dup tests

2024-09-04 Thread Christophe Lyon

Testing v[id]dup overloads with '1' as argument for uint32_t* does not
make sense: instead of choosing the '_wb' overload, we choose the
'_n', but we already do that in the '_n' tests.

This patch removes all such bogus foo2 functions.
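
As a rough analogy only (plain C++ overloads, not the arm_mve.h
machinery), the sketch below shows why passing the literal '1' where
a uint32_t* is expected cannot reach a pointer-taking overload.  The
names which_overload and Variant are hypothetical and exist only for
this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two overload families; not the arm_mve.h API.
enum class Variant { n, wb };

// "_n"-like form: the start value is passed by value.
constexpr Variant which_overload (std::uint32_t /*start*/, int /*imm*/)
{
  return Variant::n;
}

// "_wb"-like form: the start value is passed by pointer for write-back.
constexpr Variant which_overload (std::uint32_t * /*start*/, int /*imm*/)
{
  return Variant::wb;
}

// The literal 1 is not a null pointer constant, so only the by-value
// overload is viable for this call.
static_assert (which_overload (1, 1) == Variant::n,
               "an integer argument selects the by-value form");
```

Calling which_overload (&a, 1) with a uint32_t variable 'a' selects
the pointer form, which is the case the remaining foo1 tests cover.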

2024-08-28  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vddupq_x_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_m_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_wb_u8.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u16.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u32.c: Remove foo2.
* gcc.target/arm/mve/intrinsics/vidupq_x_wb_u8.c: Remove foo2.
---
 .../arm/mve/intrinsics/vddupq_m_wb_u16.c   | 18 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u32.c   | 18 +-
 .../arm/mve/intrinsics/vddupq_m_wb_u8.c| 18 +-
 .../arm/mve/intrinsics/vddupq_wb_u16.c | 14 +-
 .../arm/mve/intrinsics/vddupq_wb_u32.c | 14 +-
 .../arm/mve/intrinsics/vddupq_wb_u8.c  | 14 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u16.c   | 18 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u32.c   | 18 +-
 .../arm/mve/intrinsics/vddupq_x_wb_u8.c| 18 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u16.c   | 18 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u32.c   | 18 +-
 .../arm/mve/intrinsics/vidupq_m_wb_u8.c| 18 +-
 .../arm/mve/intrinsics/vidupq_wb_u16.c | 14 +-
 .../arm/mve/intrinsics/vidupq_wb_u32.c | 14 +-
 .../arm/mve/intrinsics/vidupq_wb_u8.c  | 14 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u16.c   | 18 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u32.c   | 18 +-
 .../arm/mve/intrinsics/vidupq_x_wb_u8.c| 18 +-
 18 files changed, 18 insertions(+), 282 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
index 2a907417b40..d4391358fc2 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u16.c
@@ -42,24 +42,8 @@ foo1 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
   return vddupq_m (inactive, a, 1, p);
 }
 
-/*
-**foo2:
-** ...
-** vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
-** ...
-** vpst(?: @.*|)
-** ...
-** vddupt.u16  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
-** ...
-*/
-uint16x8_t
-foo2 (uint16x8_t inactive, mve_pred16_t p)
-{
-  return vddupq_m (inactive, 1, 1, p);
-}
-
 #ifdef __cplusplus
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
index ffaf3734923..58609dae29f 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u32.c
@@ -42,24 +42,8 @@ foo1 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
   return vddupq_m (inactive, a, 1, p);
 }
 
-/*
-**foo2:
-** ...
-** vmsrp0, (?:ip|fp|r[0-9]+)(?:@.*|)
-** ...
-** vpst(?: @.*|)
-** ...
-** vddupt.u32  q[0-9]+, (?:ip|fp|r[0-9]+), #[0-9]+(?:  @.*|)
-** ...
-*/
-uint32x4_t
-foo2 (uint32x4_t inactive, mve_pred16_t p)
-{
-  return vddupq_m (inactive, 1, 1, p);
-}
-
 #ifdef __cplusplus
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
index ae7a4e25fe2..a4d820b3628 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vddupq_m_wb_u8.c
+++ b/gc

[PATCH v2 15/36] arm: [MVE intrinsics] rework vorn

2024-09-04 Thread Christophe Lyon

Implement vorn using the new MVE builtins framework.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vornq): New.
* config/arm/arm-mve-builtins-base.def (vornq): New.
* config/arm/arm-mve-builtins-base.h (vornq): New.
* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_exact_insn_vorn): New.
* config/arm/arm_mve.h (vornq): Delete.
(vornq_m): Delete.
(vornq_x): Delete.
(vornq_u8): Delete.
(vornq_s8): Delete.
(vornq_u16): Delete.
(vornq_s16): Delete.
(vornq_u32): Delete.
(vornq_s32): Delete.
(vornq_f16): Delete.
(vornq_f32): Delete.
(vornq_m_s8): Delete.
(vornq_m_s32): Delete.
(vornq_m_s16): Delete.
(vornq_m_u8): Delete.
(vornq_m_u32): Delete.
(vornq_m_u16): Delete.
(vornq_m_f32): Delete.
(vornq_m_f16): Delete.
(vornq_x_s8): Delete.
(vornq_x_s16): Delete.
(vornq_x_s32): Delete.
(vornq_x_u8): Delete.
(vornq_x_u16): Delete.
(vornq_x_u32): Delete.
(vornq_x_f16): Delete.
(vornq_x_f32): Delete.
(__arm_vornq_u8): Delete.
(__arm_vornq_s8): Delete.
(__arm_vornq_u16): Delete.
(__arm_vornq_s16): Delete.
(__arm_vornq_u32): Delete.
(__arm_vornq_s32): Delete.
(__arm_vornq_m_s8): Delete.
(__arm_vornq_m_s32): Delete.
(__arm_vornq_m_s16): Delete.
(__arm_vornq_m_u8): Delete.
(__arm_vornq_m_u32): Delete.
(__arm_vornq_m_u16): Delete.
(__arm_vornq_x_s8): Delete.
(__arm_vornq_x_s16): Delete.
(__arm_vornq_x_s32): Delete.
(__arm_vornq_x_u8): Delete.
(__arm_vornq_x_u16): Delete.
(__arm_vornq_x_u32): Delete.
(__arm_vornq_f16): Delete.
(__arm_vornq_f32): Delete.
(__arm_vornq_m_f32): Delete.
(__arm_vornq_m_f16): Delete.
(__arm_vornq_x_f16): Delete.
(__arm_vornq_x_f32): Delete.
(__arm_vornq): Delete.
(__arm_vornq_m): Delete.
(__arm_vornq_x): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc |   1 +
 gcc/config/arm/arm-mve-builtins-base.def|   2 +
 gcc/config/arm/arm-mve-builtins-base.h  |   1 +
 gcc/config/arm/arm-mve-builtins-functions.h |  53 +++
 gcc/config/arm/arm_mve.h| 431 
 5 files changed, 57 insertions(+), 431 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index e33603ec1f3..f8260f5f483 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -568,6 +568,7 @@ FUNCTION_WITH_RTX_M_N (vmulq, MULT, VMULQ)
 FUNCTION_WITH_RTX_M_N_NO_F (vmvnq, NOT, VMVNQ)
 FUNCTION (vnegq, unspec_based_mve_function_exact_insn, (NEG, NEG, NEG, -1, -1, 
-1, VNEGQ_M_S, -1, VNEGQ_M_F, -1, -1, -1))
 FUNCTION_WITHOUT_M_N (vpselq, VPSELQ)
+FUNCTION (vornq, unspec_based_mve_function_exact_insn_vorn, (-1, -1, 
VORNQ_M_S, VORNQ_M_U, VORNQ_M_F, -1, -1))
 FUNCTION_WITH_RTX_M_N_NO_N_F (vorrq, IOR, VORRQ)
 FUNCTION_WITHOUT_N_NO_U_F (vqabsq, VQABSQ)
 FUNCTION_WITH_M_N_NO_F (vqaddq, VQADDQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index aa7b71387f9..cc76db3e0b9 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -87,6 +87,7 @@ DEF_MVE_FUNCTION (vmulltq_poly, binary_widen_poly, poly_8_16, 
mx_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vmvnq, mvn, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vnegq, unary, all_signed, mx_or_none)
+DEF_MVE_FUNCTION (vornq, binary_orrq, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vpselq, vpsel, all_integer_with_64, none)
 DEF_MVE_FUNCTION (vqabsq, unary, all_signed, m_or_none)
@@ -206,6 +207,7 @@ DEF_MVE_FUNCTION (vminnmq, binary, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vminnmvq, binary_maxvminv, all_float, p_or_none)
 DEF_MVE_FUNCTION (vmulq, binary_opt_n, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vnegq, unary, all_float, mx_or_none)
+DEF_MVE_FUNCTION (vornq, binary_orrq, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vorrq, binary_orrq, all_float, mx_or_none)
 DEF_MVE_FUNCTION (vpselq, vpsel, all_float, none)
 DEF_MVE_FUNCTION (vreinterpretq, unary_convert, reinterpret_float, none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index e6b828a4e1e..ad2647b6758 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -118,6 +118,7 @@ extern const function_base *const vmulltq_poly;
 extern const function_base *const vmulq;
 extern const function_base *const vmvnq;
 extern const function_base *const vnegq;
+extern const function_base *const vornq;
 extern c

[PATCH v2 03/36] arm: [MVE intrinsics] Cleanup arm-mve-builtins-functions.h

2024-09-04 Thread Christophe Lyon

This patch brings no functional change but removes some code
duplication in arm-mve-builtins-functions.h and makes it easier to
read and maintain.

It introduces a new expand_unspec () member of
unspec_based_mve_function_base and makes a few classes inherit from it
instead of function_base.

This adds 3 new members containing the unspec codes for signed-int,
unsigned-int and floating-point intrinsics (no mode, no predicate).
Depending on the derived class, these will be used instead of the 3
similar RTX codes.

The new expand_unspec () handles all the possible unspecs, some of
which may not be supported by a given intrinsics family: such code
paths won't be used in that case.  Similarly, codes specific to a
family (RTX, or PRED_p for instance) should be handled by the caller
of expand_unspec ().

Thanks to this, expand () for unspec_based_mve_function_exact_insn,
unspec_mve_function_exact_insn, unspec_mve_function_exact_insn_pred_p,
unspec_mve_function_exact_insn_vshl no longer duplicate a lot of code.

The patch also makes most of PRED_m and PRED_x handling use the same
code, and uses conditional operators when computing which RTX
code/unspec to use when calling code_for_mve_q_XXX.
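
A heavily simplified sketch of the idea, assuming nothing about the
real GCC classes: the base class stores one code per element kind and
selects among them once, with conditional operators, so derived
classes only supply the codes.  Kind, unspec_base_sketch, vfoo_sketch
and the numeric codes are all hypothetical.

```cpp
#include <cassert>

// Hypothetical, simplified model; names and values are not GCC's.
enum class Kind { sint, uint, fp };

class unspec_base_sketch
{
public:
  constexpr unspec_base_sketch (int for_sint, int for_uint, int for_fp)
    : m_for_sint (for_sint), m_for_uint (for_uint), m_for_fp (for_fp)
  {}

  // Shared selection logic, written once with conditional operators
  // instead of being repeated in every derived class.
  constexpr int pick (Kind k) const
  {
    return (k == Kind::fp ? m_for_fp
            : k == Kind::uint ? m_for_uint
            : m_for_sint);
  }

private:
  int m_for_sint;
  int m_for_uint;
  int m_for_fp;
};

// A derived class only records its codes and reuses pick ().
class vfoo_sketch : public unspec_base_sketch
{
public:
  constexpr vfoo_sketch () : unspec_base_sketch (10, 11, 12) {}
};

static_assert (vfoo_sketch ().pick (Kind::sint) == 10, "signed code");
```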

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-functions.h
(unspec_based_mve_function_base): Add m_unspec_for_sint,
m_unspec_for_uint, m_unspec_for_fp and expand_unspec members.
(unspec_based_mve_function_exact_insn): Inherit from
unspec_based_mve_function_base and use expand_unspec.
(unspec_mve_function_exact_insn): Likewise.
(unspec_mve_function_exact_insn_pred_p): Likewise.  Use
conditionals.
(unspec_mve_function_exact_insn_vshl): Likewise.
(unspec_based_mve_function_exact_insn_vcmp): Initialize new
inherited members.  Use conditionals.
(unspec_mve_function_exact_insn_rot): Merge PRED_m and PRED_x
handling.  Use conditionals.
(unspec_mve_function_exact_insn_vmull): Likewise.
(unspec_mve_function_exact_insn_vmull_poly): Likewise.
---
 gcc/config/arm/arm-mve-builtins-functions.h | 726 
 1 file changed, 286 insertions(+), 440 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-functions.h 
b/gcc/config/arm/arm-mve-builtins-functions.h
index ac2a731bff4..35cb5242b77 100644
--- a/gcc/config/arm/arm-mve-builtins-functions.h
+++ b/gcc/config/arm/arm-mve-builtins-functions.h
@@ -40,17 +40,23 @@ public:
 };
 
 /* An incomplete function_base for functions that have an associated
-   rtx_code for signed integers, unsigned integers and floating-point
-   values for the non-predicated, non-suffixed intrinsic, and unspec
-   codes, with separate codes for signed integers, unsigned integers
-   and floating-point values.  The class simply records information
-   about the mapping for derived classes to use.  */
+   rtx_code or an unspec for signed integers, unsigned integers and
+   floating-point values for the non-predicated, non-suffixed
+   intrinsics, and unspec codes, with separate codes for signed
+   integers, unsigned integers and floating-point values for
+   predicated and/or suffixed intrinsics.  The class simply records
+   information about the mapping for derived classes to use and
+   provides a generic expand_unspec () to avoid duplicating expansion
+   code in derived classes.  */
 class unspec_based_mve_function_base : public function_base
 {
 public:
   CONSTEXPR unspec_based_mve_function_base (rtx_code code_for_sint,
rtx_code code_for_uint,
rtx_code code_for_fp,
+   int unspec_for_sint,
+   int unspec_for_uint,
+   int unspec_for_fp,
int unspec_for_n_sint,
int unspec_for_n_uint,
int unspec_for_n_fp,
@@ -63,6 +69,9 @@ public:
 : m_code_for_sint (code_for_sint),
   m_code_for_uint (code_for_uint),
   m_code_for_fp (code_for_fp),
+  m_unspec_for_sint (unspec_for_sint),
+  m_unspec_for_uint (unspec_for_uint),
+  m_unspec_for_fp (unspec_for_fp),
   m_unspec_for_n_sint (unspec_for_n_sint),
   m_unspec_for_n_uint (unspec_for_n_uint),
   m_unspec_for_n_fp (unspec_for_n_fp),
@@ -83,6 +92,9 @@ public:
   /* The unspec code associated with signed-integer, unsigned-integer
  and floating-point operations respectively.  It covers the cases
  with the _n suffix, and/or the _m predicate.  */
+  int m_unspec_for_sint;
+  int m_unspec_for_uint;
+  int m_unspec_for_fp;
   int m_unspec_for_n_sint;
   int m_unspec_for_n_uint;
   int m_unspec_for_n_fp;
@@ -92,8 +104,101 @@ public:
   int m_unspec_for_m_n_sint;
   int m_unspec_for_m_n_uint;
   int m_unspec_for_m_n_fp;
+
+  rtx expand_unspec

[PATCH v2 16/36] arm: [MVE intrinsics] rework vctp

2024-09-04 Thread Christophe Lyon

Implement vctp using the new MVE builtins framework.

2024-08-21  Christophe Lyon  

gcc/ChangeLog:

* config/arm/arm-mve-builtins-base.cc (class vctpq_impl): New.
(vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-base.def (vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-base.h (vctp16q): New.
(vctp32q): New.
(vctp64q): New.
(vctp8q): New.
* config/arm/arm-mve-builtins-shapes.cc (vctp): New.
* config/arm/arm-mve-builtins-shapes.h (vctp): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Add support for vctp.
* config/arm/arm_mve.h (vctp16q): Delete.
(vctp32q): Delete.
(vctp64q): Delete.
(vctp8q): Delete.
(vctp8q_m): Delete.
(vctp64q_m): Delete.
(vctp32q_m): Delete.
(vctp16q_m): Delete.
(__arm_vctp16q): Delete.
(__arm_vctp32q): Delete.
(__arm_vctp64q): Delete.
(__arm_vctp8q): Delete.
(__arm_vctp8q_m): Delete.
(__arm_vctp64q_m): Delete.
(__arm_vctp32q_m): Delete.
(__arm_vctp16q_m): Delete.
* config/arm/mve.md (mve_vctpq): Add '@'
prefix.
(mve_vctpq_m): Likewise.
---
 gcc/config/arm/arm-mve-builtins-base.cc   | 48 +
 gcc/config/arm/arm-mve-builtins-base.def  |  4 ++
 gcc/config/arm/arm-mve-builtins-base.h|  4 ++
 gcc/config/arm/arm-mve-builtins-shapes.cc | 16 ++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 gcc/config/arm/arm-mve-builtins.cc|  4 ++
 gcc/config/arm/arm_mve.h  | 64 ---
 gcc/config/arm/mve.md |  4 +-
 8 files changed, 79 insertions(+), 66 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index f8260f5f483..89724320d43 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -139,6 +139,50 @@ public:
   }
 };
 
+  /* Implements vctp8q, vctp16q, vctp32q and vctp64q intrinsics.  */
+class vctpq_impl : public function_base
+{
+public:
+  CONSTEXPR vctpq_impl (machine_mode mode)
+: m_mode (mode)
+  {}
+
+  /* Mode this intrinsic operates on.  */
+  machine_mode m_mode;
+
+  rtx
+  expand (function_expander &e) const override
+  {
+insn_code code;
+rtx target;
+
+if (e.mode_suffix_id != MODE_none)
+  gcc_unreachable ();
+
+switch (e.pred)
+  {
+  case PRED_none:
+   /* No predicate, no suffix.  */
+   code = code_for_mve_vctpq (m_mode, m_mode);
+   target = e.use_exact_insn (code);
+   break;
+
+  case PRED_m:
+   /* No suffix, "m" predicate.  */
+   code = code_for_mve_vctpq_m (m_mode, m_mode);
+   target = e.use_cond_insn (code, 0);
+   break;
+
+  default:
+   gcc_unreachable ();
+  }
+
+rtx HItarget = gen_reg_rtx (HImode);
+emit_move_insn (HItarget, gen_lowpart (HImode, target));
+return HItarget;
+  }
+};
+
   /* Implements vcvtq intrinsics.  */
 class vcvtq_impl : public function_base
 {
@@ -506,6 +550,10 @@ FUNCTION (vcmpltq, 
unspec_based_mve_function_exact_insn_vcmp, (LT, UNKNOWN, LT,
 FUNCTION (vcmpcsq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GEU, 
UNKNOWN, UNKNOWN, VCMPCSQ_M_U, UNKNOWN, UNKNOWN, VCMPCSQ_M_N_U, UNKNOWN))
 FUNCTION (vcmphiq, unspec_based_mve_function_exact_insn_vcmp, (UNKNOWN, GTU, 
UNKNOWN, UNKNOWN, VCMPHIQ_M_U, UNKNOWN, UNKNOWN, VCMPHIQ_M_N_U, UNKNOWN))
 FUNCTION_WITHOUT_M_N (vcreateq, VCREATEQ)
+FUNCTION (vctp8q, vctpq_impl, (V16BImode))
+FUNCTION (vctp16q, vctpq_impl, (V8BImode))
+FUNCTION (vctp32q, vctpq_impl, (V4BImode))
+FUNCTION (vctp64q, vctpq_impl, (V2QImode))
 FUNCTION (vcvtq, vcvtq_impl,)
 FUNCTION_WITHOUT_N_NO_F (vcvtaq, VCVTAQ)
 FUNCTION_WITHOUT_N_NO_F (vcvtmq, VCVTMQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index cc76db3e0b9..dd46d882882 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -42,6 +42,10 @@ DEF_MVE_FUNCTION (vcmpleq, cmp, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vcmpltq, cmp, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vcmpneq, cmp, all_integer, m_or_none)
 DEF_MVE_FUNCTION (vcreateq, create, all_integer_with_64, none)
+DEF_MVE_FUNCTION (vctp16q, vctp, none, m_or_none)
+DEF_MVE_FUNCTION (vctp32q, vctp, none, m_or_none)
+DEF_MVE_FUNCTION (vctp64q, vctp, none, m_or_none)
+DEF_MVE_FUNCTION (vctp8q, vctp, none, m_or_none)
 DEF_MVE_FUNCTION (vdupq, unary_n, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (veorq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vhaddq, binary_opt_n, all_integer, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index ad2647b6758..41fcf666b11 100644
--- a/gcc/confi

[PATCH v2 22/36] arm: [MVE intrinsics] fix checks of immediate arguments

2024-09-04 Thread Christophe Lyon

As discussed in [1], it is better to use "su64" for immediates in
intrinsics signatures in order to provide better diagnostics
(erroneous constants are not truncated for instance).  This patch thus
uses su64 instead of ss32 in binary_lshift_unsigned,
binary_rshift_narrow, binary_rshift_narrow_unsigned, ternary_lshift,
ternary_rshift.

In addition, we fix cases where we called require_integer_immediate
whereas we just want to check that the argument is a scalar, and thus
use require_scalar_type in binary_acca_int32, binary_acca_int64,
unary_int32_acc.

Finally, in binary_lshift_unsigned we just want to check that 'imm' is
an immediate, not the optional predicates.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660262.html
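
To illustrate the truncation hazard mentioned above (this is only a
sketch, not the check GCC performs): if an out-of-range constant is
narrowed to 32 bits before the range check, it can wrap to a value
that looks valid.  Both helper functions and the [1, 8] range are
assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; not GCC functions.

// Range check performed after keeping only the low 32 bits.
constexpr bool
check_after_narrowing (std::uint64_t imm, std::uint32_t lo, std::uint32_t hi)
{
  return (static_cast<std::uint32_t> (imm) >= lo
          && static_cast<std::uint32_t> (imm) <= hi);
}

// Range check performed on the full 64-bit value.
constexpr bool
check_as_u64 (std::uint64_t imm, std::uint64_t lo, std::uint64_t hi)
{
  return imm >= lo && imm <= hi;
}

// 0x100000001 has low 32 bits equal to 1, so the narrowed check
// wrongly accepts it, while the 64-bit check rejects it.
static_assert (check_after_narrowing (0x100000001ULL, 1, 8),
               "truncated constant slips through");
static_assert (!check_as_u64 (0x100000001ULL, 1, 8),
               "full-width check reports the error");
```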

2024-08-21  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_acca_int32): Fix
check of scalar argument.
(binary_acca_int64): Likewise.
(binary_lshift_unsigned): Likewise.
(binary_rshift_narrow): Likewise.
(binary_rshift_narrow_unsigned): Likewise.
(ternary_lshift): Likewise.
(ternary_rshift): Likewise.
(unary_int32_acc): Likewise.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 47 +++
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 971e86a2727..a1d2e243128 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -477,18 +477,23 @@ struct binary_acca_int32_def : public overloaded_base<0>
   {
 unsigned int i, nargs;
 type_suffix_index type;
+const char *first_type_name;
+
 if (!r.check_gp_argument (3, i, nargs)
|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
   return error_mark_node;
 
+first_type_name = (type_suffixes[type].unsigned_p
+  ? "uint32_t"
+  : "int32_t");
+if (!r.require_scalar_type (0, first_type_name))
+  return error_mark_node;
+
 unsigned int last_arg = i + 1;
 for (i = 1; i < last_arg; i++)
   if (!r.require_matching_vector_type (i, type))
return error_mark_node;
 
-if (!r.require_integer_immediate (0))
-  return error_mark_node;
-
 return r.resolve_to (r.mode_suffix_id, type);
   }
 };
@@ -514,18 +519,24 @@ struct binary_acca_int64_def : public overloaded_base<0>
   {
 unsigned int i, nargs;
 type_suffix_index type;
+const char *first_type_name;
+
 if (!r.check_gp_argument (3, i, nargs)
|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
   return error_mark_node;
 
+
+first_type_name = (type_suffixes[type].unsigned_p
+  ? "uint64_t"
+  : "int64_t");
+if (!r.require_scalar_type (0, first_type_name))
+  return error_mark_node;
+
 unsigned int last_arg = i + 1;
 for (i = 1; i < last_arg; i++)
   if (!r.require_matching_vector_type (i, type))
return error_mark_node;
 
-if (!r.require_integer_immediate (0))
-  return error_mark_node;
-
 return r.resolve_to (r.mode_suffix_id, type);
   }
 };
@@ -613,7 +624,7 @@ struct binary_lshift_unsigned_def : public 
overloaded_base<0>
 bool preserve_user_namespace) const override
   {
 b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
-build_all (b, "vu0,vs0,ss32", group, MODE_n, preserve_user_namespace);
+build_all (b, "vu0,vs0,su64", group, MODE_n, preserve_user_namespace);
   }
 
   tree
@@ -622,6 +633,7 @@ struct binary_lshift_unsigned_def : public 
overloaded_base<0>
 unsigned int i, nargs;
 type_suffix_index type;
 if (!r.check_gp_argument (2, i, nargs)
+   || !r.require_integer_immediate (i)
|| (type = r.infer_vector_type (i-1)) == NUM_TYPE_SUFFIXES)
   return error_mark_node;
 
@@ -636,10 +648,6 @@ struct binary_lshift_unsigned_def : public 
overloaded_base<0>
  return error_mark_node;
   }
 
-for (; i < nargs; ++i)
-  if (!r.require_integer_immediate (i))
-   return error_mark_node;
-
 return r.resolve_to (r.mode_suffix_id, type);
   }
 
@@ -1097,7 +1105,7 @@ struct binary_rshift_narrow_def : public 
overloaded_base<0>
 bool preserve_user_namespace) const override
   {
 b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
-build_all (b, "vh0,vh0,v0,ss32", group, MODE_n, preserve_user_namespace);
+build_all (b, "vh0,vh0,v0,su64", group, MODE_n, preserve_user_namespace);
   }
 
   tree
@@ -1144,7 +1152,7 @@ struct binary_rshift_narrow_unsigned_def : public 
overloaded_base<0>
 bool preserve_user_namespace) const override
   {
 b.add_overloaded_functions (group, MODE_n, preserve_user_namespace);
-build_all (b, "vhu0,vhu0,v0,ss32", group, MODE_n, preserve_user_namespace);
+build_all (b, "vhu0,vhu0,v0,su64", group, MODE_n, preserve_user_namespace);
   }
 
   t

[PATCH v2 21/36] arm: [MVE intrinsics] remove v[id]dup expanders

2024-09-04 Thread Christophe Lyon

We use code_for_mve_q_u_insn, rather than the expanders used by the
previous implementation, so we can remove the expanders and their
declaration as builtins.

2024-08-21  Christophe Lyon  

gcc/
* config/arm/arm_mve_builtins.def (vddupq_n_u, vidupq_n_u)
(vddupq_m_n_u, vidupq_m_n_u): Delete.
* config/arm/mve.md (mve_vidupq_n_u, mve_vidupq_m_n_u)
(mve_vddupq_n_u, mve_vddupq_m_n_u): Delete.
---
 gcc/config/arm/arm_mve_builtins.def |  4 --
 gcc/config/arm/mve.md   | 73 -
 2 files changed, 77 deletions(-)

diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index f141aab816c..7e88db4e4c3 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -805,10 +805,6 @@ VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, 
viwdupq_m_wb_u, v16qi, v8hi, v4si
 VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_wb_u, v16qi, v8hi, 
v4si)
 VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, viwdupq_m_n_u, v16qi, v8hi, 
v4si)
 VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_n_u, v16qi, v8hi, 
v4si)
-VAR3 (BINOP_UNONE_UNONE_IMM, vddupq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_IMM, vidupq_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vddupq_m_n_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vidupq_m_n_u, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_n_u, v16qi, v4si, v8hi)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_n_u, v16qi, v4si, v8hi)
 VAR1 (STRSBWBU, vstrwq_scatter_base_wb_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 36117303fd6..3477bbdda7b 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -5088,22 +5088,6 @@ (define_insn 
"mve_vstrwq_scatter_shifted_offset_v4si_insn"
   (set_attr "length" "4")])
 
 ;;
-;; [vidupq_n_u])
-;;
-(define_expand "mve_vidupq_n_u"
- [(match_operand:MVE_2 0 "s_register_operand")
-  (match_operand:SI 1 "s_register_operand")
-  (match_operand:SI 2 "mve_imm_selective_upto_8")]
- "TARGET_HAVE_MVE"
-{
-  rtx temp = gen_reg_rtx (SImode);
-  emit_move_insn (temp, operands[1]);
-  rtx inc = gen_int_mode (INTVAL(operands[2]) * , SImode);
-  emit_insn (gen_mve_vidupq_u_insn (operands[0], temp, operands[1],
- operands[2], inc));
-  DONE;
-})
-
 ;;
 ;; [vddupq_u_insn, vidupq_u_insn]
 ;;
@@ -5118,26 +5102,6 @@ (define_insn "@mve_q_u_insn"
  "TARGET_HAVE_MVE"
  ".u%#\t%q0, %1, %3")
 
-;;
-;; [vidupq_m_n_u])
-;;
-(define_expand "mve_vidupq_m_n_u"
-  [(match_operand:MVE_2 0 "s_register_operand")
-   (match_operand:MVE_2 1 "s_register_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (match_operand:SI 3 "mve_imm_selective_upto_8")
-   (match_operand: 4 "vpr_register_operand")]
-  "TARGET_HAVE_MVE"
-{
-  rtx temp = gen_reg_rtx (SImode);
-  emit_move_insn (temp, operands[2]);
-  rtx inc = gen_int_mode (INTVAL(operands[3]) * , SImode);
-  emit_insn (gen_mve_vidupq_m_wb_u_insn(operands[0], operands[1], temp,
-operands[2], operands[3],
-operands[4], inc));
-  DONE;
-})
-
 ;;
 ;; [vddupq_m_wb_u_insn, vidupq_m_wb_u_insn]
 ;;
@@ -5156,43 +5120,6 @@ (define_insn "@mve_q_m_wb_u_insn"
  [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_u_insn"))
   (set_attr "length""8")])
 
-;;
-;; [vddupq_n_u])
-;;
-(define_expand "mve_vddupq_n_u"
- [(match_operand:MVE_2 0 "s_register_operand")
-  (match_operand:SI 1 "s_register_operand")
-  (match_operand:SI 2 "mve_imm_selective_upto_8")]
- "TARGET_HAVE_MVE"
-{
-  rtx temp = gen_reg_rtx (SImode);
-  emit_move_insn (temp, operands[1]);
-  rtx inc = gen_int_mode (INTVAL(operands[2]) * , SImode);
-  emit_insn (gen_mve_vddupq_u_insn (operands[0], temp, operands[1],
- operands[2], inc));
-  DONE;
-})
-
-;;
-;; [vddupq_m_n_u])
-;;
-(define_expand "mve_vddupq_m_n_u"
-  [(match_operand:MVE_2 0 "s_register_operand")
-   (match_operand:MVE_2 1 "s_register_operand")
-   (match_operand:SI 2 "s_register_operand")
-   (match_operand:SI 3 "mve_imm_selective_upto_8")
-   (match_operand: 4 "vpr_register_operand")]
-  "TARGET_HAVE_MVE"
-{
-  rtx temp = gen_reg_rtx (SImode);
-  emit_move_insn (temp, operands[2]);
-  rtx inc = gen_int_mode (INTVAL(operands[3]) * , SImode);
-  emit_insn (gen_mve_vddupq_m_wb_u_insn(operands[0], operands[1], temp,
-operands[2], operands[3],
-operands[4], inc));
-  DONE;
-})
-
 ;;
 ;; [vdwdupq_n_u])
 ;;
-- 
2.34.1

Re: [PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Martin Storsjö


On Wed, 4 Sep 2024, Evgeny Karpov wrote:


Monday, September 2, 2024
Martin Storsjö  wrote:


The only non-obvious thing is that for IMAGE_REL_ARM64_PAGEBASE_REL21,
i.e. "adrp" instructions, the immediate that gets stored in the
instruction is the byte offset to the symbol.

After linking, when the instruction is interpreted at execution time, the
immediate in an adrp instruction denotes the offset in units of 2^12 bytes
- but in relocatable object files, the unit of the immediate is in single
bytes.


This is exactly the reason why the fix was introduced, and it resolves
the issues detected during testing.
Here is a more detailed explanation.

1. the code without the fix
adrp        x0, symbol + 256
add         x0, x0, symbol + 256


I can attest that there are _no_ problems with using such a representation 
in COFF-ARM64. Whatever issues you are having must be caused by bugs in 
your assembler, or linker, or both.



Let's consider the following example, when symbol is located at 3072.

1. Example without the fix
compilation time
adrp        x0, (3072 + 256) & ~0xFFF // x0 = 0
add         x0, x0, (3072 + 256) & 0xFFF // x0 = 3328

linking time when symbol is relocated with offset 896
adrp        x0, (0 + 896) & ~0xFFF // x0 = 0


Why did the 3072 suddenly become 0 here?


add         x0, x0, (3328 + 896) & 0xFFF; // x0 = 128


Where did 3328 come from in your example? Wasn't "symbol" supposed to be 
at address 3072, and we're adding an offset of 896 to it?



In any case - you are misrepresenting how the relocations and immediates 
work.


If you have this on the assembly level:

adrp x0, symbol + 896

then the assembler should output an adrp instruction, with the instruction 
immediate encoding the offset 896, with an IMAGE_REL_ARM64_PAGEBASE_REL21 
pointing at "symbol".


When the linker links this object file, it will resolve the virtual 
address of "symbol", add the offset 896, and use this as the destination 
address to calculate the final page offset for the instruction.
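
The two steps described above can be sketched in C. This is only an illustration under stated assumptions, not code from any assembler or linker: the helper names are hypothetical, the page size is the 4 KiB that adrp works with, and the low 12 bits are assumed to be handled by the matching add relocation (IMAGE_REL_ARM64_PAGEOFFSET_12A in COFF).

```c
#include <assert.h>
#include <stdint.h>

/* adrp works on 4 KiB pages; the low 12 bits are handled by the add.  */
#define PAGE_SHIFT 12
#define PAGE_MASK  UINT64_C(0xFFF)

/* Hypothetical helper: the page delta a linker would write into the adrp
   immediate when resolving IMAGE_REL_ARM64_PAGEBASE_REL21.
   sym_va  - resolved virtual address of the symbol
   addend  - byte offset the assembler stored in the instruction immediate
   insn_va - virtual address of the adrp instruction itself  */
static int64_t
adrp_page_delta (uint64_t sym_va, uint64_t addend, uint64_t insn_va)
{
  /* Assumption: the addend came from the 21-bit signed immediate field,
     so a non-negative addend is below 2^20.  */
  assert (addend < (UINT64_C(1) << 20));
  uint64_t target = sym_va + addend;
  return (int64_t) (target >> PAGE_SHIFT) - (int64_t) (insn_va >> PAGE_SHIFT);
}

/* Hypothetical helper: the low 12 bits used by the matching add.  */
static uint32_t
add_page_offset (uint64_t sym_va, uint64_t addend)
{
  return (uint32_t) ((sym_va + addend) & PAGE_MASK);
}
```

With "symbol" at 3072 and an offset of 896, the destination is 3968: the page delta is 0 for an instruction in the same page, and the add receives 3968. The symbol address and the offset are summed first and only then split into page and page offset.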


I can produce a set of test data to showcase the various corner cases 
that can be relevant in handling these relocations, which work with 
both LLD and MS link.exe, to help you pinpoint the potential bug in your 
assembler or linker, and clear up potential misunderstandings about how 
these concepts work. I can hopefully have such a set of examples ready for 
you tonight.


// Martin

[PATCH v2 12/36] arm: [MVE intrinsics] rework vcvtaq vcvtmq vcvtnq vcvtpq

2024-09-04 Thread Christophe Lyon

Implement vcvtaq vcvtmq vcvtnq vcvtpq using the new MVE builtins
framework.

2024-07-11  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins-base.def (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins-base.h (vcvtaq): New.
(vcvtmq): New.
(vcvtnq): New.
(vcvtpq): New.
* config/arm/arm-mve-builtins.cc (cvtx): New type.
* config/arm/arm_mve.h (vcvtaq_m): Delete.
(vcvtmq_m): Delete.
(vcvtnq_m): Delete.
(vcvtpq_m): Delete.
(vcvtaq_s16_f16): Delete.
(vcvtaq_s32_f32): Delete.
(vcvtnq_s16_f16): Delete.
(vcvtnq_s32_f32): Delete.
(vcvtpq_s16_f16): Delete.
(vcvtpq_s32_f32): Delete.
(vcvtmq_s16_f16): Delete.
(vcvtmq_s32_f32): Delete.
(vcvtpq_u16_f16): Delete.
(vcvtpq_u32_f32): Delete.
(vcvtnq_u16_f16): Delete.
(vcvtnq_u32_f32): Delete.
(vcvtmq_u16_f16): Delete.
(vcvtmq_u32_f32): Delete.
(vcvtaq_u16_f16): Delete.
(vcvtaq_u32_f32): Delete.
(vcvtaq_m_s16_f16): Delete.
(vcvtaq_m_u16_f16): Delete.
(vcvtaq_m_s32_f32): Delete.
(vcvtaq_m_u32_f32): Delete.
(vcvtmq_m_s16_f16): Delete.
(vcvtnq_m_s16_f16): Delete.
(vcvtpq_m_s16_f16): Delete.
(vcvtmq_m_u16_f16): Delete.
(vcvtnq_m_u16_f16): Delete.
(vcvtpq_m_u16_f16): Delete.
(vcvtmq_m_s32_f32): Delete.
(vcvtnq_m_s32_f32): Delete.
(vcvtpq_m_s32_f32): Delete.
(vcvtmq_m_u32_f32): Delete.
(vcvtnq_m_u32_f32): Delete.
(vcvtpq_m_u32_f32): Delete.
(vcvtaq_x_s16_f16): Delete.
(vcvtaq_x_s32_f32): Delete.
(vcvtaq_x_u16_f16): Delete.
(vcvtaq_x_u32_f32): Delete.
(vcvtnq_x_s16_f16): Delete.
(vcvtnq_x_s32_f32): Delete.
(vcvtnq_x_u16_f16): Delete.
(vcvtnq_x_u32_f32): Delete.
(vcvtpq_x_s16_f16): Delete.
(vcvtpq_x_s32_f32): Delete.
(vcvtpq_x_u16_f16): Delete.
(vcvtpq_x_u32_f32): Delete.
(vcvtmq_x_s16_f16): Delete.
(vcvtmq_x_s32_f32): Delete.
(vcvtmq_x_u16_f16): Delete.
(vcvtmq_x_u32_f32): Delete.
(__arm_vcvtpq_u16_f16): Delete.
(__arm_vcvtpq_u32_f32): Delete.
(__arm_vcvtnq_u16_f16): Delete.
(__arm_vcvtnq_u32_f32): Delete.
(__arm_vcvtmq_u16_f16): Delete.
(__arm_vcvtmq_u32_f32): Delete.
(__arm_vcvtaq_u16_f16): Delete.
(__arm_vcvtaq_u32_f32): Delete.
(__arm_vcvtaq_s16_f16): Delete.
(__arm_vcvtaq_s32_f32): Delete.
(__arm_vcvtnq_s16_f16): Delete.
(__arm_vcvtnq_s32_f32): Delete.
(__arm_vcvtpq_s16_f16): Delete.
(__arm_vcvtpq_s32_f32): Delete.
(__arm_vcvtmq_s16_f16): Delete.
(__arm_vcvtmq_s32_f32): Delete.
(__arm_vcvtaq_m_s16_f16): Delete.
(__arm_vcvtaq_m_u16_f16): Delete.
(__arm_vcvtaq_m_s32_f32): Delete.
(__arm_vcvtaq_m_u32_f32): Delete.
(__arm_vcvtmq_m_s16_f16): Delete.
(__arm_vcvtnq_m_s16_f16): Delete.
(__arm_vcvtpq_m_s16_f16): Delete.
(__arm_vcvtmq_m_u16_f16): Delete.
(__arm_vcvtnq_m_u16_f16): Delete.
(__arm_vcvtpq_m_u16_f16): Delete.
(__arm_vcvtmq_m_s32_f32): Delete.
(__arm_vcvtnq_m_s32_f32): Delete.
(__arm_vcvtpq_m_s32_f32): Delete.
(__arm_vcvtmq_m_u32_f32): Delete.
(__arm_vcvtnq_m_u32_f32): Delete.
(__arm_vcvtpq_m_u32_f32): Delete.
(__arm_vcvtaq_x_s16_f16): Delete.
(__arm_vcvtaq_x_s32_f32): Delete.
(__arm_vcvtaq_x_u16_f16): Delete.
(__arm_vcvtaq_x_u32_f32): Delete.
(__arm_vcvtnq_x_s16_f16): Delete.
(__arm_vcvtnq_x_s32_f32): Delete.
(__arm_vcvtnq_x_u16_f16): Delete.
(__arm_vcvtnq_x_u32_f32): Delete.
(__arm_vcvtpq_x_s16_f16): Delete.
(__arm_vcvtpq_x_s32_f32): Delete.
(__arm_vcvtpq_x_u16_f16): Delete.
(__arm_vcvtpq_x_u32_f32): Delete.
(__arm_vcvtmq_x_s16_f16): Delete.
(__arm_vcvtmq_x_s32_f32): Delete.
(__arm_vcvtmq_x_u16_f16): Delete.
(__arm_vcvtmq_x_u32_f32): Delete.
(__arm_vcvtaq_m): Delete.
(__arm_vcvtmq_m): Delete.
(__arm_vcvtnq_m): Delete.
(__arm_vcvtpq_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   4 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm-mve-builtins.cc   |   9 +
 gcc/config/arm/arm_mve.h | 533 ---
 5 files changed, 21 insertions(+), 533 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 760378c91b1..281f3749bce 100644
--- a/gcc/con

[PATCH v2 19/36] arm: [MVE intrinsics] rework vddup vidup

2024-09-04 Thread Christophe Lyon

Implement vddup and vidup using the new MVE builtins framework.

We generate better code because we take advantage of the two outputs
produced by the v[id]dup instructions.

For instance, before:
ldr r3, [r0]
sub r2, r3, #8
str r2, [r0]
mov r2, r3
vddup.u16   q3, r2, #1

now:
ldr r2, [r0]
vddup.u16   q3, r2, #1
str r2, [r0]
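
The "two outputs" are the result vector and the written-back scalar register. A scalar C model of the write-back form, for illustration only: the function name is hypothetical, the lane count assumes 16-bit elements in a 128-bit vector, and the behaviour is a sketch of the intrinsic's documented semantics rather than code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit Q register holds eight 16-bit lanes.  */
#define LANES_U16 8

/* Hypothetical scalar model of vddupq_wb_u16: fill OUT with a decrementing
   sequence that starts at *A, then write the next start value back to *A.
   Both results come from the single vddup instruction.  */
static void
model_vddupq_wb_u16 (uint16_t out[LANES_U16], uint32_t *a, int imm)
{
  assert (imm == 1 || imm == 2 || imm == 4 || imm == 8);
  uint32_t start = *a;
  for (int i = 0; i < LANES_U16; i++)
    out[i] = (uint16_t) (start - (uint32_t) (i * imm));  /* truncate to lane */
  *a = start - (uint32_t) (LANES_U16 * imm);              /* write-back */
}
```

In the "before" sequence the compiler computed the write-back value itself (sub r2, r3, #8, i.e. 8 lanes times an immediate of 1); in the "now" sequence it stores the value the instruction leaves in r2.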

2024-08-21  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (class viddup_impl): New.
(vddup): New.
(vidup): New.
* config/arm/arm-mve-builtins-base.def (vddupq): New.
(vidupq): New.
* config/arm/arm-mve-builtins-base.h (vddupq): New.
(vidupq): New.
* config/arm/arm_mve.h (vddupq_m): Delete.
(vddupq_u8): Delete.
(vddupq_u32): Delete.
(vddupq_u16): Delete.
(vidupq_m): Delete.
(vidupq_u8): Delete.
(vidupq_u32): Delete.
(vidupq_u16): Delete.
(vddupq_x_u8): Delete.
(vddupq_x_u16): Delete.
(vddupq_x_u32): Delete.
(vidupq_x_u8): Delete.
(vidupq_x_u16): Delete.
(vidupq_x_u32): Delete.
(vddupq_m_n_u8): Delete.
(vddupq_m_n_u32): Delete.
(vddupq_m_n_u16): Delete.
(vddupq_m_wb_u8): Delete.
(vddupq_m_wb_u16): Delete.
(vddupq_m_wb_u32): Delete.
(vddupq_n_u8): Delete.
(vddupq_n_u32): Delete.
(vddupq_n_u16): Delete.
(vddupq_wb_u8): Delete.
(vddupq_wb_u16): Delete.
(vddupq_wb_u32): Delete.
(vidupq_m_n_u8): Delete.
(vidupq_m_n_u32): Delete.
(vidupq_m_n_u16): Delete.
(vidupq_m_wb_u8): Delete.
(vidupq_m_wb_u16): Delete.
(vidupq_m_wb_u32): Delete.
(vidupq_n_u8): Delete.
(vidupq_n_u32): Delete.
(vidupq_n_u16): Delete.
(vidupq_wb_u8): Delete.
(vidupq_wb_u16): Delete.
(vidupq_wb_u32): Delete.
(vddupq_x_n_u8): Delete.
(vddupq_x_n_u16): Delete.
(vddupq_x_n_u32): Delete.
(vddupq_x_wb_u8): Delete.
(vddupq_x_wb_u16): Delete.
(vddupq_x_wb_u32): Delete.
(vidupq_x_n_u8): Delete.
(vidupq_x_n_u16): Delete.
(vidupq_x_n_u32): Delete.
(vidupq_x_wb_u8): Delete.
(vidupq_x_wb_u16): Delete.
(vidupq_x_wb_u32): Delete.
(__arm_vddupq_m_n_u8): Delete.
(__arm_vddupq_m_n_u32): Delete.
(__arm_vddupq_m_n_u16): Delete.
(__arm_vddupq_m_wb_u8): Delete.
(__arm_vddupq_m_wb_u16): Delete.
(__arm_vddupq_m_wb_u32): Delete.
(__arm_vddupq_n_u8): Delete.
(__arm_vddupq_n_u32): Delete.
(__arm_vddupq_n_u16): Delete.
(__arm_vidupq_m_n_u8): Delete.
(__arm_vidupq_m_n_u32): Delete.
(__arm_vidupq_m_n_u16): Delete.
(__arm_vidupq_n_u8): Delete.
(__arm_vidupq_m_wb_u8): Delete.
(__arm_vidupq_m_wb_u16): Delete.
(__arm_vidupq_m_wb_u32): Delete.
(__arm_vidupq_n_u32): Delete.
(__arm_vidupq_n_u16): Delete.
(__arm_vidupq_wb_u8): Delete.
(__arm_vidupq_wb_u16): Delete.
(__arm_vidupq_wb_u32): Delete.
(__arm_vddupq_wb_u8): Delete.
(__arm_vddupq_wb_u16): Delete.
(__arm_vddupq_wb_u32): Delete.
(__arm_vddupq_x_n_u8): Delete.
(__arm_vddupq_x_n_u16): Delete.
(__arm_vddupq_x_n_u32): Delete.
(__arm_vddupq_x_wb_u8): Delete.
(__arm_vddupq_x_wb_u16): Delete.
(__arm_vddupq_x_wb_u32): Delete.
(__arm_vidupq_x_n_u8): Delete.
(__arm_vidupq_x_n_u16): Delete.
(__arm_vidupq_x_n_u32): Delete.
(__arm_vidupq_x_wb_u8): Delete.
(__arm_vidupq_x_wb_u16): Delete.
(__arm_vidupq_x_wb_u32): Delete.
(__arm_vddupq_m): Delete.
(__arm_vddupq_u8): Delete.
(__arm_vddupq_u32): Delete.
(__arm_vddupq_u16): Delete.
(__arm_vidupq_m): Delete.
(__arm_vidupq_u8): Delete.
(__arm_vidupq_u32): Delete.
(__arm_vidupq_u16): Delete.
(__arm_vddupq_x_u8): Delete.
(__arm_vddupq_x_u16): Delete.
(__arm_vddupq_x_u32): Delete.
(__arm_vidupq_x_u8): Delete.
(__arm_vidupq_x_u16): Delete.
(__arm_vidupq_x_u32): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  | 112 
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 676 ---
 4 files changed, 116 insertions(+), 676 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 89724320d43..3d8bcdabe24 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -30,6 +30,7 @@
 #include "basic-block.h"
 #include "function.h"
 #include "gimple.h"
+#include "emit-rtl.h"
 #include "arm-mve-builtins.h"
 #include "ar

[PATCH v2 17/36] arm: [MVE intrinsics] factorize vddup vidup

2024-09-04 Thread Christophe Lyon

Factorize vddup and vidup so that they use the same parameterized
names.

This patch updates only the (define_insn
"@mve_<mve_insn>q_u<mode>_insn") patterns and does not bother with the
(define_expand "mve_vidupq_n_u<mode>") ones, because a subsequent
patch avoids using them.

2024-08-21  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VIDUPQ, VDDUPQ,
VIDUPQ_M, VDDUPQ_M.
(viddupq_op): New.
(viddupq_m_op): New.
(VIDDUPQ): New.
(VIDDUPQ_M): New.
* config/arm/mve.md (mve_vddupq_u<mode>_insn)
(mve_vidupq_u<mode>_insn): Merge into ...
(mve_<mve_insn>q_u<mode>_insn): ... this.
(mve_vddupq_m_wb_u<mode>_insn, mve_vidupq_m_wb_u<mode>_insn):
Merge into ...
(mve_<mve_insn>q_m_wb_u<mode>_insn): ... this.
---
 gcc/config/arm/iterators.md |  7 +
 gcc/config/arm/mve.md   | 58 +
 2 files changed, 20 insertions(+), 45 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 3a1825ebab2..c0299117f26 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1007,6 +1007,8 @@ (define_int_attr mve_insn [
 (VHSUBQ_M_S "vhsub") (VHSUBQ_M_U "vhsub")
 (VHSUBQ_N_S "vhsub") (VHSUBQ_N_U "vhsub")
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
+(VIDUPQ "vidup") (VDDUPQ "vddup")
+(VIDUPQ_M "vidup") (VDDUPQ_M "vddup")
 (VMAXAQ_M_S "vmaxa")
 (VMAXAQ_S "vmaxa")
 (VMAXAVQ_P_S "vmaxav")
@@ -1340,6 +1342,9 @@ (define_int_attr mve_mnemo [
 (VRNDXQ_F "vrintx") (VRNDXQ_M_F "vrintx")
 ])
 
+(define_int_attr viddupq_op [ (VIDUPQ "plus") (VDDUPQ "minus")])
+(define_int_attr viddupq_m_op [ (VIDUPQ_M "plus") (VDDUPQ_M "minus")])
+
 ;; plus and minus are the only SHIFTABLE_OPS for which Thumb2 allows
 ;; a stack pointer operand.  The minus operation is a candidate for an rsub
 ;; and hence only plus is supported.
@@ -2961,6 +2966,8 @@ (define_int_iterator VCVTxQ_M_F16_F32 [VCVTBQ_M_F16_F32 
VCVTTQ_M_F16_F32])
 (define_int_iterator VCVTxQ_M_F32_F16 [VCVTBQ_M_F32_F16 VCVTTQ_M_F32_F16])
 (define_int_iterator VCVTxQ [VCVTAQ_S VCVTAQ_U VCVTMQ_S VCVTMQ_U VCVTNQ_S 
VCVTNQ_U VCVTPQ_S VCVTPQ_U])
 (define_int_iterator VCVTxQ_M [VCVTAQ_M_S VCVTAQ_M_U VCVTMQ_M_S VCVTMQ_M_U 
VCVTNQ_M_S VCVTNQ_M_U VCVTPQ_M_S VCVTPQ_M_U])
+(define_int_iterator VIDDUPQ [VIDUPQ VDDUPQ])
+(define_int_iterator VIDDUPQ_M [VIDUPQ_M VDDUPQ_M])
 (define_int_iterator DLSTP [DLSTP8 DLSTP16 DLSTP32
   DLSTP64])
 (define_int_iterator LETP [LETP8 LETP16 LETP32
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 62cffebd6ed..36117303fd6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -5105,18 +5105,18 @@ (define_expand "mve_vidupq_n_u"
 })
 
 ;;
-;; [vidupq_u_insn])
+;; [vddupq_u_insn, vidupq_u_insn]
 ;;
-(define_insn "mve_vidupq_u_insn"
+(define_insn "@mve_q_u_insn"
  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:SI 2 "s_register_operand" "1")
  (match_operand:SI 3 "mve_imm_selective_upto_8" "Rg")]
-VIDUPQ))
+   VIDDUPQ))
   (set (match_operand:SI 1 "s_register_operand" "=Te")
-   (plus:SI (match_dup 2)
-   (match_operand:SI 4 "immediate_operand" "i")))]
+   (:SI (match_dup 2)
+   (match_operand:SI 4 "immediate_operand" "i")))]
  "TARGET_HAVE_MVE"
- "vidup.u%#\t%q0, %1, %3")
+ ".u%#\t%q0, %1, %3")
 
 ;;
 ;; [vidupq_m_n_u])
@@ -5139,21 +5139,21 @@ (define_expand "mve_vidupq_m_n_u"
 })
 
 ;;
-;; [vidupq_m_wb_u_insn])
+;; [vddupq_m_wb_u_insn, vidupq_m_wb_u_insn]
 ;;
-(define_insn "mve_vidupq_m_wb_u_insn"
+(define_insn "@mve_q_m_wb_u_insn"
  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 1 "s_register_operand" "0")
  (match_operand:SI 3 "s_register_operand" "2")
  (match_operand:SI 4 "mve_imm_selective_upto_8" "Rg")
  (match_operand: 5 "vpr_register_operand" "Up")]
-   VIDUPQ_M))
+   VIDDUPQ_M))
   (set (match_operand:SI 2 "s_register_operand" "=Te")
-   (plus:SI (match_dup 3)
-   (match_operand:SI 6 "immediate_operand" "i")))]
+   (:SI (match_dup 3)
+ (match_operand:SI 6 "immediate_operand" "i")))]
  "TARGET_HAVE_MVE"
- "vpst\;\tvidupt.u%#\t%q0, %2, %4"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vidupq_u_insn"))
+ "vpst\;t.u%#\t%q0, %2, %4"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_u_insn"))
   (set_attr "length""8")])
 
 ;;
@@ -5173,20 +5173,6 @@ (define_expand "mve_vddupq_n_u"
   DONE;
 })
 
-;;
-;; [vddupq_u_insn])
-;;
-(define_insn "mve_vddupq_u_insn"
- [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:SI 2 "s_register_operand" "1")
- (match_operand:SI

[PATCH v2 27/36] arm: [MVE intrinsics] remove useless v[id]wdup expanders

2024-09-04 Thread Christophe Lyon

As with vddup/vidup, we use code_for_mve_q_wb_u_insn, so we can drop
the expanders and their builtin declarations, which are now unused.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-builtins.cc
(arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (viwdupq_wb_u, vdwdupq_wb_u)
(viwdupq_m_wb_u, vdwdupq_m_wb_u, viwdupq_m_n_u, vdwdupq_m_n_u)
(vdwdupq_n_u, viwdupq_n_u): Delete.
* config/arm/mve.md (mve_vdwdupq_n_u): Delete.
(mve_vdwdupq_wb_u): Delete.
(mve_vdwdupq_m_n_u): Delete.
(mve_vdwdupq_m_wb_u): Delete.
---
 gcc/config/arm/arm-builtins.cc  |  7 ---
 gcc/config/arm/arm_mve_builtins.def |  8 ---
 gcc/config/arm/mve.md   | 75 -
 3 files changed, 90 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index c9d50bf8fbb..697b91911dd 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -755,13 +755,6 @@ arm_ldru_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_pointer, qualifier_predicate};
 #define LDRU_Z_QUALIFIERS (arm_ldru_z_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-  qualifier_unsigned, qualifier_immediate, qualifier_predicate };
-#define QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED_QUALIFIERS \
-  (arm_quinop_unone_unone_unone_unone_imm_pred_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 7e88db4e4c3..f6962cd8cf5 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -799,14 +799,6 @@ VAR1 (STRSU_P, vstrdq_scatter_offset_p_u, v2di)
 VAR1 (STRSU_P, vstrdq_scatter_shifted_offset_p_u, v2di)
 VAR1 (STRSU_P, vstrwq_scatter_offset_p_u, v4si)
 VAR1 (STRSU_P, vstrwq_scatter_shifted_offset_p_u, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_wb_u, v16qi, v4si, v8hi)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_wb_u, v16qi, v4si, v8hi)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, viwdupq_m_wb_u, v16qi, v8hi, 
v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_wb_u, v16qi, v8hi, 
v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, viwdupq_m_n_u, v16qi, v8hi, 
v4si)
-VAR3 (QUINOP_UNONE_UNONE_UNONE_UNONE_IMM_PRED, vdwdupq_m_n_u, v16qi, v8hi, 
v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vdwdupq_n_u, v16qi, v4si, v8hi)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, viwdupq_n_u, v16qi, v4si, v8hi)
 VAR1 (STRSBWBU, vstrwq_scatter_base_wb_u, v4si)
 VAR1 (STRSBWBU, vstrdq_scatter_base_wb_u, v2di)
 VAR1 (STRSBWBU_P, vstrwq_scatter_base_wb_p_u, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 72a7e4dc868..0507e117f51 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -5120,41 +5120,6 @@ (define_insn "@mve_q_m_wb_u_insn"
  [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_u_insn"))
   (set_attr "length""8")])
 
-;;
-;; [vdwdupq_n_u])
-;;
-(define_expand "mve_vdwdupq_n_u"
- [(match_operand:MVE_2 0 "s_register_operand")
-  (match_operand:SI 1 "s_register_operand")
-  (match_operand:DI 2 "s_register_operand")
-  (match_operand:SI 3 "mve_imm_selective_upto_8")]
- "TARGET_HAVE_MVE"
-{
-  rtx ignore_wb = gen_reg_rtx (SImode);
-  emit_insn (gen_mve_vdwdupq_wb_u_insn (operands[0], ignore_wb,
- operands[1], operands[2],
- operands[3]));
-  DONE;
-})
-
-;;
-;; [vdwdupq_wb_u])
-;;
-(define_expand "mve_vdwdupq_wb_u"
- [(match_operand:SI 0 "s_register_operand")
-  (match_operand:SI 1 "s_register_operand")
-  (match_operand:DI 2 "s_register_operand")
-  (match_operand:SI 3 "mve_imm_selective_upto_8")
-  (unspec:MVE_2 [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
- "TARGET_HAVE_MVE"
-{
-  rtx ignore_vec = gen_reg_rtx (mode);
-  emit_insn (gen_mve_vdwdupq_wb_u_insn (ignore_vec, operands[0],
- operands[1], operands[2],
- operands[3]));
-  DONE;
-})
-
 ;;
 ;; [vdwdupq_wb_u_insn, viwdupq_wb_u_insn]
 ;;
@@ -5174,46 +5139,6 @@ (define_insn "@mve_q_wb_u_insn"
  [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_wb_u_insn"))
   (set_attr "type" "mve_move")])
 
-;;
-;; [vdwdupq_m_n_u])
-;;
-(define_expand "mve_vdwdupq_m_n_u"
- [(match_operand:MVE_2 0 "s_register_operand")
-  (match_operand:MVE_2 1 "s_register_operand")
-  (match_operand:SI 2 "s_register_operand")
-  (match_operand:DI 3 "s_register_operand")
-  (match_operand:SI 4 "mve_imm_selective_upto_8")
-  (match_operand: 5 "vpr_register_operand")]
- "TARGET_HAVE_MVE"
-{
-  rtx ignore_wb = gen_reg_

[PATCH v2 18/36] arm: [MVE intrinsics] add viddup shape

2024-09-04 Thread Christophe Lyon

This patch adds the viddup shape description for vidup and vddup.

This requires the addition of report_not_one_of and
function_checker::require_immediate_one_of to
gcc/config/arm/arm-mve-builtins.cc (they are copies of their aarch64 SVE
counterparts).

This patch also introduces MODE_wb.
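
The immediate check that the shape relies on (the value must be one of 1, 2, 4 or 8) can be sketched in plain C. This is a simplified stand-in, not the GCC implementation: the function names below are hypothetical, and the real function_checker::require_immediate_one_of works on the call's arguments and reports a diagnostic instead of returning a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a "one of" immediate check: return true if IMM
   equals one of the N values in ALLOWED.  */
static bool
immediate_is_one_of (long imm, const long *allowed, size_t n)
{
  assert (allowed != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (imm == allowed[i])
      return true;
  return false;
}

/* The set accepted by the viddup shape: 1, 2, 4 or 8.  */
static bool
viddup_imm_ok (long imm)
{
  static const long allowed[] = { 1, 2, 4, 8 };
  return immediate_is_one_of (imm, allowed,
                              sizeof allowed / sizeof allowed[0]);
}
```

A call such as vddupq_n_u8 (a, 3) would be rejected by this rule, while increments of 1, 2, 4 and 8 pass.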

2024-08-21  Christophe Lyon  

gcc/

* config/arm/arm-mve-builtins-shapes.cc (viddup): New.
* config/arm/arm-mve-builtins-shapes.h (viddup): New.
* config/arm/arm-mve-builtins.cc (report_not_one_of): New.
(function_checker::require_immediate_one_of): New.
* config/arm/arm-mve-builtins.def (wb): New mode.
* config/arm/arm-mve-builtins.h (function_checker): Add
require_immediate_one_of.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 85 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 gcc/config/arm/arm-mve-builtins.cc| 44 
 gcc/config/arm/arm-mve-builtins.def   |  1 +
 gcc/config/arm/arm-mve-builtins.h |  2 +
 5 files changed, 133 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 8a849c2bc02..971e86a2727 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -2191,6 +2191,91 @@ struct vcvtx_def : public overloaded_base<0>
 };
 SHAPE (vcvtx)
 
+/* _t vfoo[_n]_t0(uint32_t, const int)
+   _t vfoo[_wb]_t0(uint32_t *, const int)
+
+   Shape for vector increment or decrement and duplicate operations that take
+   an integer or pointer to integer first argument and an immediate, and
+   produce a vector.
+
+   Check that 'imm' is one of 1, 2, 4 or 8.
+
+   Example: vddupq.
+   uint8x16_t [__arm_]vddupq[_n]_u8(uint32_t a, const int imm)
+   uint8x16_t [__arm_]vddupq[_wb]_u8(uint32_t *a, const int imm)
+   uint8x16_t [__arm_]vddupq_m[_n_u8](uint8x16_t inactive, uint32_t a, const 
int imm, mve_pred16_t p)
+   uint8x16_t [__arm_]vddupq_m[_wb_u8](uint8x16_t inactive, uint32_t *a, const 
int imm, mve_pred16_t p)
+   uint8x16_t [__arm_]vddupq_x[_n]_u8(uint32_t a, const int imm, mve_pred16_t 
p)
+   uint8x16_t [__arm_]vddupq_x[_wb]_u8(uint32_t *a, const int imm, 
mve_pred16_t p)  */
+struct viddup_def : public overloaded_base<0>
+{
+  bool
+  explicit_type_suffix_p (unsigned int i, enum predication_index pred,
+ enum mode_suffix_index,
+ type_suffix_info) const override
+  {
+return ((i == 0) && (pred != PRED_m));
+  }
+
+  bool
+  skip_overload_p (enum predication_index, enum mode_suffix_index mode) const 
override
+  {
+/* For MODE_wb, share the overloaded instance with MODE_n.  */
+if (mode == MODE_wb)
+  return true;
+
+return false;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,su32,su64", group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,as,su64", group, MODE_wb, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type_suffix = NUM_TYPE_SUFFIXES;
+if (!r.check_gp_argument (2, i, nargs))
+  return error_mark_node;
+
+type_suffix = r.type_suffix_ids[0];
+/* With PRED_m, there is no type suffix, so infer it from the first (inactive)
+   argument.  */
+if (type_suffix == NUM_TYPE_SUFFIXES)
+  type_suffix = r.infer_vector_type (0);
+
+unsigned int last_arg = i - 1;
+/* Check that last_arg is either scalar or pointer.  */
+if (!r.scalar_argument_p (last_arg))
+  return error_mark_node;
+
+if (!r.require_integer_immediate (last_arg + 1))
+  return error_mark_node;
+
+/* With MODE_n we expect a scalar, with MODE_wb we expect a pointer.  */
+mode_suffix_index mode_suffix;
+if (POINTER_TYPE_P (r.get_argument_type (last_arg)))
+  mode_suffix = MODE_wb;
+else
+  mode_suffix = MODE_n;
+
+return r.resolve_to (mode_suffix, type_suffix);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+return c.require_immediate_one_of (1, 1, 2, 4, 8);
+  }
+};
+SHAPE (viddup)
+
 /* _t vfoo[_t0](_t, _t, mve_pred16_t)
 
i.e. a version of the standard ternary shape in which
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 80340dc33ec..186287c1620 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -82,6 +82,7 @@ namespace arm_mve
 extern const function_shape *const vcvt_f16_f32;
 extern const function_shape *const vcvt_f32_f16;
 extern const function_shape *const vcvtx;
+extern const function_shape *const viddup;
 extern const function_shape *const vpsel;
 
   } /* end namespace arm_mve::shapes */
diff --git a/gcc/config/arm/arm-mve-built

[PATCH v2 33/36] arm: [MVE intrinsics] rework vadciq

2024-09-04 Thread Christophe Lyon

Implement vadciq using the new MVE builtins framework.
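
The expander in this patch ends by reading FPSCR.NZCVQC, shifting the value right by 29, masking it with 1, and storing the result through the carry pointer. A scalar C sketch of that last step, with hypothetical helper names and assuming the carry flag is bit 29 of the value read:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position of the C (carry) flag in the value read from FPSCR.NZCVQC.  */
#define FPSCR_C_BIT 29

/* Hypothetical helper: isolate the carry flag as 0 or 1.  */
static uint32_t
carry_from_fpscr (uint32_t fpscr_nzcvqc)
{
  return (fpscr_nzcvqc >> FPSCR_C_BIT) & 1u;
}

/* Hypothetical model of the tail of the expansion: the carry is an implicit
   output of the instruction and is stored through the user's pointer.  */
static void
store_carry_out (uint32_t fpscr_nzcvqc, unsigned *carry_ptr)
{
  assert (carry_ptr != NULL);
  *carry_ptr = carry_from_fpscr (fpscr_nzcvqc);
}
```

This mirrors why the carry argument is removed from the builtin's operand list before expansion: the instruction never takes the pointer, only the code emitted after it does.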

2024-08-28  Christophe Lyon  
gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): New.
(vadciq): New.
* config/arm/arm-mve-builtins-base.def (vadciq): New.
* config/arm/arm-mve-builtins-base.h (vadciq): New.
* config/arm/arm_mve.h (vadciq): Delete.
(vadciq_m): Delete.
(vadciq_s32): Delete.
(vadciq_u32): Delete.
(vadciq_m_s32): Delete.
(vadciq_m_u32): Delete.
(__arm_vadciq_s32): Delete.
(__arm_vadciq_u32): Delete.
(__arm_vadciq_m_s32): Delete.
(__arm_vadciq_m_u32): Delete.
(__arm_vadciq): Delete.
(__arm_vadciq_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  | 93 
 gcc/config/arm/arm-mve-builtins-base.def |  1 +
 gcc/config/arm/arm-mve-builtins-base.h   |  1 +
 gcc/config/arm/arm_mve.h | 89 ---
 4 files changed, 95 insertions(+), 89 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 9f1f7e69c57..6f3b18c2915 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -554,6 +554,98 @@ public:
   }
 };
 
+/* Map the vadc and similar functions directly to CODE (UNSPEC, UNSPEC).  Take
+   care of the implicit carry argument.  */
+class vadc_vsbc_impl : public function_base
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+unsigned int flags = CP_WRITE_MEMORY | CP_READ_FPCR;
+return flags;
+  }
+
+  tree
+  memory_scalar_type (const function_instance &) const override
+  {
+/* carry is "unsigned int".  */
+return get_typenode_from_name ("unsigned int");
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+insn_code code;
+rtx insns, carry_ptr, carry_out;
+int carry_out_arg_no;
+int unspec;
+
+if (! e.type_suffix (0).integer_p)
+  gcc_unreachable ();
+
+if (e.mode_suffix_id != MODE_none)
+  gcc_unreachable ();
+
+/* Remove carry from arguments, it is implicit for the builtin.  */
+switch (e.pred)
+  {
+  case PRED_none:
+   carry_out_arg_no = 2;
+   break;
+
+  case PRED_m:
+   carry_out_arg_no = 3;
+   break;
+
+  default:
+   gcc_unreachable ();
+  }
+
+carry_ptr = e.args[carry_out_arg_no];
+e.args.ordered_remove (carry_out_arg_no);
+
+switch (e.pred)
+  {
+  case PRED_none:
+   /* No predicate.  */
+   unspec = e.type_suffix (0).unsigned_p
+ ? VADCIQ_U
+ : VADCIQ_S;
+   code = code_for_mve_q_v4si (unspec, unspec);
+   insns = e.use_exact_insn (code);
+   break;
+
+  case PRED_m:
+   /* "m" predicate.  */
+   unspec = e.type_suffix (0).unsigned_p
+ ? VADCIQ_M_U
+ : VADCIQ_M_S;
+   code = code_for_mve_q_m_v4si (unspec, unspec);
+   insns = e.use_cond_insn (code, 0);
+   break;
+
+  default:
+   gcc_unreachable ();
+  }
+
+/* Update carry_out.  */
+carry_out = gen_reg_rtx (SImode);
+emit_insn (gen_get_fpscr_nzcvqc (carry_out));
+emit_insn (gen_rtx_SET (carry_out,
+   gen_rtx_LSHIFTRT (SImode,
+ carry_out,
+ GEN_INT (29))));
+emit_insn (gen_rtx_SET (carry_out,
+   gen_rtx_AND (SImode,
+carry_out,
+GEN_INT (1))));
+emit_insn (gen_rtx_SET (gen_rtx_MEM (Pmode, carry_ptr), carry_out));
+
+return insns;
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
@@ -724,6 +816,7 @@ namespace arm_mve {
 FUNCTION_PRED_P_S_U (vabavq, VABAVQ)
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, 
-1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
+FUNCTION (vadciq, vadc_vsbc_impl,)
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_PRED_P_S_U (vaddlvaq, VADDLVAQ)
 FUNCTION_PRED_P_S_U (vaddlvq, VADDLVQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index bd69f06d7e4..72d6461c4e4 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -21,6 +21,7 @@
 DEF_MVE_FUNCTION (vabavq, binary_acca_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vabdq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vabsq, unary, all_signed, mx_or_none)
+DEF_MVE_FUNCTION (vadciq, vadc_vsbc, integer_32, m_or_none)
 DEF_MVE_FUNCTION (vaddlvaq, unary_widen_acc, integer_32, p_or_none)
 DEF_MVE_FUNCTION (vaddlvq, unary_acc, integer_32, p_or_none)
 DEF_MVE_FUNCTION (vaddq, binary_opt_n, all_integer, mx_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h 
b/gcc/config/arm/arm-mve-builtins-base.h
index 1eff50d

[PATCH v2 31/36] arm: [MVE intrinsics] add vadc_vsbc shape

2024-09-04 Thread Christophe Lyon

This patch adds the vadc_vsbc shape description.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vadc_vsbc): New.
* config/arm/arm-mve-builtins-shapes.h (vadc_vsbc): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 36 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 37 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index ee6b5b0a7b1..9deed178966 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -1996,6 +1996,42 @@ struct unary_widen_acc_def : public overloaded_base<0>
 };
 SHAPE (unary_widen_acc)
 
+/* _t vfoo[_t0](T0, T0, uint32_t*)
+
+   Example: vadcq.
+   int32x4_t [__arm_]vadcq[_s32](int32x4_t a, int32x4_t b, unsigned *carry)
+   int32x4_t [__arm_]vadcq_m[_s32](int32x4_t inactive, int32x4_t a, int32x4_t 
b, unsigned *carry, mve_pred16_t p)  */
+struct vadc_vsbc_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v0,v0,as", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+if (!r.require_matching_vector_type (1, type))
+  return error_mark_node;
+
+/* Check that last arg is a pointer.  */
+if (!POINTER_TYPE_P (r.get_argument_type (i)))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+};
+SHAPE (vadc_vsbc)
+
 /* mve_pred16_t foo_t0(uint32_t)
 
Example: vctp16q.
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index d73c74c8ad7..e53381d8f36 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -77,6 +77,7 @@ namespace arm_mve
 extern const function_shape *const unary_n;
 extern const function_shape *const unary_widen;
 extern const function_shape *const unary_widen_acc;
+extern const function_shape *const vadc_vsbc;
 extern const function_shape *const vctp;
 extern const function_shape *const vcvt;
 extern const function_shape *const vcvt_f16_f32;
-- 
2.34.1

[PATCH v2 24/36] arm: [MVE intrinsics] add vidwdup shape

2024-09-04 Thread Christophe Lyon

This patch adds the vidwdup shape description for vdwdup and viwdup.

It is very similar to viddup, but accounts for the additional 'wrap'
scalar parameter.
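
For readers unfamiliar with the shape machinery, the two decisions this
shape makes can be sketched on the host as below. The shape picks the "_n"
or "_wb" form depending on whether the first argument is a scalar or a
pointer, and it accepts only 1, 2, 4 or 8 as the immediate. This is a
minimal illustration, not the GCC code: Mode, pick_mode and imm_ok are
hypothetical names that do not exist in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the two mode suffixes the shape resolves to.
enum class Mode { n, wb };

// A pointer first argument selects the write-back form; a scalar selects
// the "_n" form.  Only the argument's type matters, not its value.
template <typename First>
constexpr Mode pick_mode (First)
{
  return std::is_pointer<First>::value ? Mode::wb : Mode::n;
}

// The shape only accepts 1, 2, 4 or 8 as the immediate step.
constexpr bool imm_ok (int imm)
{
  return imm == 1 || imm == 2 || imm == 4 || imm == 8;
}
```

With this model, pick_mode (&offset) yields Mode::wb for a uint32_t
variable 'offset', while pick_mode (uint32_t{1}) yields Mode::n.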

2024-08-21  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vidwdup): New.
* config/arm/arm-mve-builtins-shapes.h (vidwdup): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 88 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 89 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index a1d2e243128..510f15ae73a 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -2291,6 +2291,94 @@ struct viddup_def : public overloaded_base<0>
 };
 SHAPE (viddup)
 
+/* _t vfoo[_n]_t0(uint32_t, uint32_t, const int)
+   _t vfoo[_wb]_t0(uint32_t *, uint32_t, const int)
+
+   Shape for vector increment or decrement with wrap and duplicate operations
+   that take an integer or pointer to integer first argument, an integer second
+   argument and an immediate, and produce a vector.
+
+   Check that 'imm' is one of 1, 2, 4 or 8.
+
+   Example: vdwdupq.
+   uint8x16_t [__arm_]vdwdupq[_n]_u8(uint32_t a, uint32_t b, const int imm)
+   uint8x16_t [__arm_]vdwdupq[_wb]_u8(uint32_t *a, uint32_t b, const int imm)
+   uint8x16_t [__arm_]vdwdupq_m[_n_u8](uint8x16_t inactive, uint32_t a, uint32_t b, const int imm, mve_pred16_t p)
+   uint8x16_t [__arm_]vdwdupq_m[_wb_u8](uint8x16_t inactive, uint32_t *a, uint32_t b, const int imm, mve_pred16_t p)
+   uint8x16_t [__arm_]vdwdupq_x[_n]_u8(uint32_t a, uint32_t b, const int imm, mve_pred16_t p)
+   uint8x16_t [__arm_]vdwdupq_x[_wb]_u8(uint32_t *a, uint32_t b, const int imm, mve_pred16_t p)  */
+struct vidwdup_def : public overloaded_base<0>
+{
+  bool
+  explicit_type_suffix_p (unsigned int i, enum predication_index pred,
+ enum mode_suffix_index,
+ type_suffix_info) const override
+  {
+return ((i == 0) && (pred != PRED_m));
+  }
+
+  bool
+  skip_overload_p (enum predication_index, enum mode_suffix_index mode) const override
+  {
+/* For MODE_wb, share the overloaded instance with MODE_n.  */
+if (mode == MODE_wb)
+  return true;
+
+return false;
+  }
+
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,su32,su32,su64", group, MODE_n, preserve_user_namespace);
+build_all (b, "v0,as,su32,su64", group, MODE_wb, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type_suffix = NUM_TYPE_SUFFIXES;
+if (!r.check_gp_argument (3, i, nargs))
+  return error_mark_node;
+
+type_suffix = r.type_suffix_ids[0];
+/* With PRED_m, there is no type suffix, so infer it from the first (inactive)
+   argument.  */
+if (type_suffix == NUM_TYPE_SUFFIXES)
+  type_suffix = r.infer_vector_type (0);
+
+unsigned int last_arg = i - 2;
+/* Check that last_arg is either scalar or pointer.  */
+if (!r.scalar_argument_p (last_arg))
+  return error_mark_node;
+
+if (!r.scalar_argument_p (last_arg + 1))
+  return error_mark_node;
+
+if (!r.require_integer_immediate (last_arg + 2))
+  return error_mark_node;
+
+/* With MODE_n we expect a scalar, with MODE_wb we expect a pointer.  */
+mode_suffix_index mode_suffix;
+if (POINTER_TYPE_P (r.get_argument_type (last_arg)))
+  mode_suffix = MODE_wb;
+else
+  mode_suffix = MODE_n;
+
+return r.resolve_to (mode_suffix, type_suffix);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+return c.require_immediate_one_of (2, 1, 2, 4, 8);
+  }
+};
+SHAPE (vidwdup)
+
 /* _t vfoo[_t0](_t, _t, mve_pred16_t)
 
i.e. a version of the standard ternary shape in which
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index 186287c1620..b3d08ab3866 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -83,6 +83,7 @@ namespace arm_mve
 extern const function_shape *const vcvt_f32_f16;
 extern const function_shape *const vcvtx;
 extern const function_shape *const viddup;
+extern const function_shape *const vidwdup;
 extern const function_shape *const vpsel;
 
   } /* end namespace arm_mve::shapes */
-- 
2.34.1

[PATCH v2 36/36] arm: [MVE intrinsics] use long_type_suffix / half_type_suffix helpers

2024-09-04 Thread Christophe Lyon

In several places we are looking for a type twice or half as large as
the type suffix: this patch introduces helper functions to avoid code
duplication. long_type_suffix is similar to the SVE counterpart, but
adds an 'expected_tclass' parameter.  half_type_suffix is similar to
it, but does not exist in SVE.
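
The rule the two helpers encode can be modelled in isolation as follows.
This is a simplified sketch under stated assumptions, not the patch itself:
Suffix, TClass, long_suffix and half_suffix are hypothetical names, the
lookup in the real type_suffixes table is replaced by building the result
directly, and the caller passes the input's own class where the patch would
use SAME_TYPE_CLASS.

```cpp
#include <cassert>

// Hypothetical, much simplified model of a type suffix.
enum class TClass { Signed, Unsigned, Float };

struct Suffix
{
  TClass tclass;
  unsigned bits;
  bool valid;   // false models "no such form" (NUM_TYPE_SUFFIXES).
};

static Suffix no_suffix ()
{
  return {TClass::Signed, 0, false};
}

// Twice as wide: only for integer suffixes narrower than 64 bits.
static Suffix long_suffix (Suffix s, TClass expected)
{
  if (s.valid && s.tclass != TClass::Float && s.bits < 64)
    return {expected, s.bits * 2, true};
  return no_suffix ();
}

// Half as wide: only for integer suffixes wider than 8 bits.
static Suffix half_suffix (Suffix s, TClass expected)
{
  if (s.valid && s.tclass != TClass::Float && s.bits > 8)
    return {expected, s.bits / 2, true};
  return no_suffix ();
}
```

For example, halving a signed 16-bit suffix with an expected class of
Unsigned gives an unsigned 8-bit suffix, which corresponds to what
binary_move_narrow_unsigned_def asks for with TYPE_unsigned.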

2024-08-28  Christophe Lyon  

gcc/

* config/arm/arm-mve-builtins-shapes.cc (long_type_suffix): New.
(half_type_suffix): New.
(struct binary_move_narrow_def): Use new helper.
(struct binary_move_narrow_unsigned_def): Likewise.
(struct binary_rshift_narrow_def): Likewise.
(struct binary_rshift_narrow_unsigned_def): Likewise.
(struct binary_widen_def): Likewise.
(struct binary_widen_n_def): Likewise.
(struct binary_widen_opt_n_def): Likewise.
(struct unary_widen_def): Likewise.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 114 +-
 1 file changed, 68 insertions(+), 46 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 9deed178966..0a108cf0127 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -320,6 +320,45 @@ build_16_32 (function_builder &b, const char *signature,
 }
 }
 
+/* TYPE is the largest type suffix associated with the arguments of R, but the
+   result is twice as wide.  Return the associated type suffix of
+   EXPECTED_TCLASS if it exists, otherwise report an appropriate error and
+   return NUM_TYPE_SUFFIXES.  */
+static type_suffix_index
+long_type_suffix (function_resolver &r,
+ type_suffix_index type,
+ type_class_index expected_tclass)
+{
+  unsigned int element_bits = type_suffixes[type].element_bits;
+  if (expected_tclass == function_resolver::SAME_TYPE_CLASS)
+expected_tclass = type_suffixes[type].tclass;
+
+  if (type_suffixes[type].integer_p && element_bits < 64)
+return find_type_suffix (expected_tclass, element_bits * 2);
+
+  r.report_no_such_form (type);
+  return NUM_TYPE_SUFFIXES;
+}
+
+/* Return the type suffix half as wide as TYPE with EXPECTED_TCLASS if it
+   exists, otherwise report an appropriate error and return
+   NUM_TYPE_SUFFIXES.  */
+static type_suffix_index
+half_type_suffix (function_resolver &r,
+ type_suffix_index type,
+ type_class_index expected_tclass)
+{
+  unsigned int element_bits = type_suffixes[type].element_bits;
+  if (expected_tclass == function_resolver::SAME_TYPE_CLASS)
+expected_tclass = type_suffixes[type].tclass;
+
+  if (type_suffixes[type].integer_p && element_bits > 8)
+return find_type_suffix (expected_tclass, element_bits / 2);
+
+  r.report_no_such_form (type);
+  return NUM_TYPE_SUFFIXES;
+}
+
 /* Declare the function shape NAME, pointing it to an instance
of class _def.  */
 #define SHAPE(NAME) \
@@ -779,16 +818,13 @@ struct binary_move_narrow_def : public overloaded_base<0>
   resolve (function_resolver &r) const override
   {
 unsigned int i, nargs;
-type_suffix_index type;
+type_suffix_index type, narrow_suffix;
 if (!r.check_gp_argument (2, i, nargs)
-   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
+   || ((narrow_suffix = half_type_suffix (r, type, r.SAME_TYPE_CLASS))
+   == NUM_TYPE_SUFFIXES))
   return error_mark_node;
 
-type_suffix_index narrow_suffix
-  = find_type_suffix (type_suffixes[type].tclass,
- type_suffixes[type].element_bits / 2);
-
-
 if (!r.require_matching_vector_type (0, narrow_suffix))
   return error_mark_node;
 
@@ -816,15 +852,13 @@ struct binary_move_narrow_unsigned_def : public 
overloaded_base<0>
   resolve (function_resolver &r) const override
   {
 unsigned int i, nargs;
-type_suffix_index type;
+type_suffix_index type, narrow_suffix;
 if (!r.check_gp_argument (2, i, nargs)
-   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES)
+   || (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
+   || ((narrow_suffix = half_type_suffix (r, type, TYPE_unsigned))
+   == NUM_TYPE_SUFFIXES))
   return error_mark_node;
 
-type_suffix_index narrow_suffix
-  = find_type_suffix (TYPE_unsigned,
- type_suffixes[type].element_bits / 2);
-
 if (!r.require_matching_vector_type (0, narrow_suffix))
   return error_mark_node;
 
@@ -1112,16 +1146,14 @@ struct binary_rshift_narrow_def : public 
overloaded_base<0>
   resolve (function_resolver &r) const override
   {
 unsigned int i, nargs;
-type_suffix_index type;
+type_suffix_index type, narrow_suffix;
 if (!r.check_gp_argument (3, i, nargs)
|| (type = r.infer_vector_type (1)) == NUM_TYPE_SUFFIXES
+   || ((narrow_suffix = half_type_suffix (r, type, r.SAME_TYPE_CLASS))
+   == NUM_TYPE_SUFF

[PATCH v2 23/36] arm: [MVE intrinsics] factorize vdwdup viwdup

2024-09-04 Thread Christophe Lyon

Factorize vdwdup and viwdup so that they use the same parameterized
names.

As with vddup and vidup, we do not bother with the corresponding
expanders, as we stop using them in a subsequent patch.

The patch also adds the missing attributes to vdwdupq_wb_u_insn and
viwdupq_wb_u_insn patterns.

2024-08-21  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VIWDUPQ, VDWDUPQ,
VIWDUPQ_M, VDWDUPQ_M.
(VIDWDUPQ): New iterator.
(VIDWDUPQ_M): New iterator.
* config/arm/mve.md (mve_vdwdupq_wb_u_insn)
(mve_viwdupq_wb_u_insn): Merge into ...
(@mve_q_wb_u_insn): ... this. Add missing
mve_unpredicated_insn and mve_move attributes.
(mve_vdwdupq_m_wb_u_insn, mve_viwdupq_m_wb_u_insn):
Merge into ...
(@mve_q_m_wb_u_insn): ... this.
---
 gcc/config/arm/iterators.md |  4 +++
 gcc/config/arm/mve.md   | 68 +++--
 2 files changed, 17 insertions(+), 55 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c0299117f26..2fb3b25040f 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -1009,6 +1009,8 @@ (define_int_attr mve_insn [
 (VHSUBQ_S "vhsub") (VHSUBQ_U "vhsub")
 (VIDUPQ "vidup") (VDDUPQ "vddup")
 (VIDUPQ_M "vidup") (VDDUPQ_M "vddup")
+(VIWDUPQ "viwdup") (VDWDUPQ "vdwdup")
+(VIWDUPQ_M "viwdup") (VDWDUPQ_M "vdwdup")
 (VMAXAQ_M_S "vmaxa")
 (VMAXAQ_S "vmaxa")
 (VMAXAVQ_P_S "vmaxav")
@@ -2968,6 +2970,8 @@ (define_int_iterator VCVTxQ [VCVTAQ_S VCVTAQ_U VCVTMQ_S 
VCVTMQ_U VCVTNQ_S VCVTNQ
 (define_int_iterator VCVTxQ_M [VCVTAQ_M_S VCVTAQ_M_U VCVTMQ_M_S VCVTMQ_M_U 
VCVTNQ_M_S VCVTNQ_M_U VCVTPQ_M_S VCVTPQ_M_U])
 (define_int_iterator VIDDUPQ [VIDUPQ VDDUPQ])
 (define_int_iterator VIDDUPQ_M [VIDUPQ_M VDDUPQ_M])
+(define_int_iterator VIDWDUPQ [VIWDUPQ VDWDUPQ])
+(define_int_iterator VIDWDUPQ_M [VIWDUPQ_M VDWDUPQ_M])
 (define_int_iterator DLSTP [DLSTP8 DLSTP16 DLSTP32
   DLSTP64])
 (define_int_iterator LETP [LETP8 LETP16 LETP32
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 3477bbdda7b..be3be67a144 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -5156,22 +5156,23 @@ (define_expand "mve_vdwdupq_wb_u"
 })
 
 ;;
-;; [vdwdupq_wb_u_insn])
+;; [vdwdupq_wb_u_insn, viwdupq_wb_u_insn]
 ;;
-(define_insn "mve_vdwdupq_wb_u_insn"
+(define_insn "@mve_q_wb_u_insn"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:SI 2 "s_register_operand" "1")
   (subreg:SI (match_operand:DI 3 "s_register_operand" "r") 
4)
   (match_operand:SI 4 "mve_imm_selective_upto_8" "Rg")]
-VDWDUPQ))
+VIDWDUPQ))
(set (match_operand:SI 1 "s_register_operand" "=Te")
(unspec:SI [(match_dup 2)
(subreg:SI (match_dup 3) 4)
(match_dup 4)]
-VDWDUPQ))]
+VIDWDUPQ))]
   "TARGET_HAVE_MVE"
-  "vdwdup.u%#\t%q0, %2, %R3, %4"
-)
+  ".u%#\t%q0, %2, %R3, %4"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_wb_u_insn"))
+  (set_attr "type" "mve_move")])
 
 ;;
 ;; [vdwdupq_m_n_u])
@@ -5214,27 +5215,27 @@ (define_expand "mve_vdwdupq_m_wb_u"
 })
 
 ;;
-;; [vdwdupq_m_wb_u_insn])
+;; [vdwdupq_m_wb_u_insn, viwdupq_m_wb_u_insn]
 ;;
-(define_insn "mve_vdwdupq_m_wb_u_insn"
+(define_insn "@mve_q_m_wb_u_insn"
   [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
   (match_operand:SI 3 "s_register_operand" "1")
   (subreg:SI (match_operand:DI 4 "s_register_operand" "r") 
4)
   (match_operand:SI 5 "mve_imm_selective_upto_8" "Rg")
   (match_operand: 6 "vpr_register_operand" 
"Up")]
-VDWDUPQ_M))
+VIDWDUPQ_M))
(set (match_operand:SI 1 "s_register_operand" "=Te")
(unspec:SI [(match_dup 2)
(match_dup 3)
(subreg:SI (match_dup 4) 4)
(match_dup 5)
(match_dup 6)]
-VDWDUPQ_M))
+VIDWDUPQ_M))
   ]
   "TARGET_HAVE_MVE"
-  "vpst\;vdwdupt.u%#\t%q2, %3, %R4, %5"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vdwdupq_wb_u_insn"))
+  "vpst\;t.u%#\t%q2, %3, %R4, %5"
+ [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_q_wb_u_insn"))
   (set_attr "type" "mve_move")
   (set_attr "length""8")])
 
@@ -5273,24 +5274,6 @@ (define_expand "mve_viwdupq_wb_u"
   DONE;
 })
 
-;;
-;; [viwdupq_wb_u_insn])
-;;
-(define_insn "mve_viwdupq_wb_u_insn"
-  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
-   (unspec:MVE_2 [(match_operand:SI 2 "s_register_operand" "1")
-  (subreg:SI (match_operand:DI 3 "s_register_operand" "r")

[PATCH v2 26/36] arm: [MVE intrinsics] update v[id]wdup tests

2024-09-04 Thread Christophe Lyon

Testing v[id]wdup overloads with '1' as argument for uint32_t* does
not make sense: this patch adds a new 'uint32_t *a' parameter to foo2
in such tests.

The difference with v[id]dup tests (where we removed 'foo2') is that
in 'foo1' we test the overload with a variable 'wrap' parameter (b)
and we need foo2 to test the overload with an immediate (1).

2024-08-28  Christophe Lyon  

gcc/testsuite/

* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c: Use pointer
parameter in foo2.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c: Likewise.
---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c | 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_wb_u16.c  | 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_wb_u32.c  | 6 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_wb_u8.c | 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u16.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u32.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/vdwdupq_x_wb_u8.c | 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u16.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u32.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_m_wb_u8.c | 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_wb_u16.c  | 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_wb_u32.c  | 6 +++---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/viwdupq_wb_u8.c | 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u16.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u32.c| 6 +++---
 .../gcc.target/arm/mve/intrinsics/viwdupq_x_wb_u8.c | 6 +++---
 18 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
index b24e7a2f5af..e6004056c2c 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u16.c
@@ -53,13 +53,13 @@ foo1 (uint16x8_t inactive, uint32_t *a, uint32_t b, 
mve_pred16_t p)
 ** ...
 */
 uint16x8_t
-foo2 (uint16x8_t inactive, mve_pred16_t p)
+foo2 (uint16x8_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, 1, 1, 1, p);
+  return vdwdupq_m (inactive, a, 1, 1, p);
 }
 
 #ifdef __cplusplus
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
index 75c41450a38..b36dbcd8585 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u32.c
@@ -53,13 +53,13 @@ foo1 (uint32x4_t inactive, uint32_t *a, uint32_t b, 
mve_pred16_t p)
 ** ...
 */
 uint32x4_t
-foo2 (uint32x4_t inactive, mve_pred16_t p)
+foo2 (uint32x4_t inactive, uint32_t *a, mve_pred16_t p)
 {
-  return vdwdupq_m (inactive, 1, 1, 1, p);
+  return vdwdupq_m (inactive, a, 1, 1, p);
 }
 
 #ifdef __cplusplus
 }
 #endif
 
-/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
index 90d64671dcf..b1577065a48 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vdwdupq_m_wb_u8.c
@@ -53,13 +53,13 @@ foo1 (uint8x16_t i

[PATCH v2 28/36] arm: [MVE intrinsics] add vshlc shape

2024-09-04 Thread Christophe Lyon

This patch adds the vshlc shape description.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-shapes.cc (vshlc): New.
* config/arm/arm-mve-builtins-shapes.h (vshlc): New.
---
 gcc/config/arm/arm-mve-builtins-shapes.cc | 44 +++
 gcc/config/arm/arm-mve-builtins-shapes.h  |  1 +
 2 files changed, 45 insertions(+)

diff --git a/gcc/config/arm/arm-mve-builtins-shapes.cc 
b/gcc/config/arm/arm-mve-builtins-shapes.cc
index 510f15ae73a..ee6b5b0a7b1 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.cc
+++ b/gcc/config/arm/arm-mve-builtins-shapes.cc
@@ -2418,6 +2418,50 @@ struct vpsel_def : public overloaded_base<0>
 };
 SHAPE (vpsel)
 
+/* _t vfoo[_t0](T0, uint32_t* , const int)
+
+   Check that 'imm' is in [1..32].
+
+   Example: vshlcq.
+   uint8x16_t [__arm_]vshlcq[_u8](uint8x16_t a, uint32_t *b, const int imm)
+   uint8x16_t [__arm_]vshlcq_m[_u8](uint8x16_t a, uint32_t *b, const int imm, mve_pred16_t p)  */
+struct vshlc_def : public overloaded_base<0>
+{
+  void
+  build (function_builder &b, const function_group_info &group,
+bool preserve_user_namespace) const override
+  {
+b.add_overloaded_functions (group, MODE_none, preserve_user_namespace);
+build_all (b, "v0,v0,as,su64", group, MODE_none, preserve_user_namespace);
+  }
+
+  tree
+  resolve (function_resolver &r) const override
+  {
+unsigned int i, nargs;
+type_suffix_index type;
+if (!r.check_gp_argument (3, i, nargs)
+   || (type = r.infer_vector_type (0)) == NUM_TYPE_SUFFIXES)
+  return error_mark_node;
+
+/* Check that arg #2 is a pointer.  */
+if (!POINTER_TYPE_P (r.get_argument_type (i - 1)))
+  return error_mark_node;
+
+if (!r.require_integer_immediate (i))
+  return error_mark_node;
+
+return r.resolve_to (r.mode_suffix_id, type);
+  }
+
+  bool
+  check (function_checker &c) const override
+  {
+return c.require_immediate_range (2, 1, 32);
+  }
+};
+SHAPE (vshlc)
+
 } /* end namespace arm_mve */
 
 #undef SHAPE
diff --git a/gcc/config/arm/arm-mve-builtins-shapes.h 
b/gcc/config/arm/arm-mve-builtins-shapes.h
index b3d08ab3866..d73c74c8ad7 100644
--- a/gcc/config/arm/arm-mve-builtins-shapes.h
+++ b/gcc/config/arm/arm-mve-builtins-shapes.h
@@ -85,6 +85,7 @@ namespace arm_mve
 extern const function_shape *const viddup;
 extern const function_shape *const vidwdup;
 extern const function_shape *const vpsel;
+extern const function_shape *const vshlc;
 
   } /* end namespace arm_mve::shapes */
 } /* end namespace arm_mve */
-- 
2.34.1

[PATCH v2 25/36] arm: [MVE intrinsics] rework vdwdup viwdup

2024-09-04 Thread Christophe Lyon

Implement vdwdup and viwdup using the new MVE builtins framework.

In order to share more code with viddup_impl, the patch swaps operands
1 and 2 in @mve_v[id]wdupq_m_wb_u_insn, so that the parameter
order is similar to what @mve_v[id]dupq_m_wb_u_insn uses.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (viddup_impl): Add support
for wrapping versions.
(vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.def (vdwdupq): New.
(viwdupq): New.
* config/arm/arm-mve-builtins-base.h (vdwdupq): New.
(viwdupq): New.
* config/arm/arm_mve.h (vdwdupq_m): Delete.
(vdwdupq_u8): Delete.
(vdwdupq_u32): Delete.
(vdwdupq_u16): Delete.
(viwdupq_m): Delete.
(viwdupq_u8): Delete.
(viwdupq_u32): Delete.
(viwdupq_u16): Delete.
(vdwdupq_x_u8): Delete.
(vdwdupq_x_u16): Delete.
(vdwdupq_x_u32): Delete.
(viwdupq_x_u8): Delete.
(viwdupq_x_u16): Delete.
(viwdupq_x_u32): Delete.
(vdwdupq_m_n_u8): Delete.
(vdwdupq_m_n_u32): Delete.
(vdwdupq_m_n_u16): Delete.
(vdwdupq_m_wb_u8): Delete.
(vdwdupq_m_wb_u32): Delete.
(vdwdupq_m_wb_u16): Delete.
(vdwdupq_n_u8): Delete.
(vdwdupq_n_u32): Delete.
(vdwdupq_n_u16): Delete.
(vdwdupq_wb_u8): Delete.
(vdwdupq_wb_u32): Delete.
(vdwdupq_wb_u16): Delete.
(viwdupq_m_n_u8): Delete.
(viwdupq_m_n_u32): Delete.
(viwdupq_m_n_u16): Delete.
(viwdupq_m_wb_u8): Delete.
(viwdupq_m_wb_u32): Delete.
(viwdupq_m_wb_u16): Delete.
(viwdupq_n_u8): Delete.
(viwdupq_n_u32): Delete.
(viwdupq_n_u16): Delete.
(viwdupq_wb_u8): Delete.
(viwdupq_wb_u32): Delete.
(viwdupq_wb_u16): Delete.
(vdwdupq_x_n_u8): Delete.
(vdwdupq_x_n_u16): Delete.
(vdwdupq_x_n_u32): Delete.
(vdwdupq_x_wb_u8): Delete.
(vdwdupq_x_wb_u16): Delete.
(vdwdupq_x_wb_u32): Delete.
(viwdupq_x_n_u8): Delete.
(viwdupq_x_n_u16): Delete.
(viwdupq_x_n_u32): Delete.
(viwdupq_x_wb_u8): Delete.
(viwdupq_x_wb_u16): Delete.
(viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m_n_u8): Delete.
(__arm_vdwdupq_m_n_u32): Delete.
(__arm_vdwdupq_m_n_u16): Delete.
(__arm_vdwdupq_m_wb_u8): Delete.
(__arm_vdwdupq_m_wb_u32): Delete.
(__arm_vdwdupq_m_wb_u16): Delete.
(__arm_vdwdupq_n_u8): Delete.
(__arm_vdwdupq_n_u32): Delete.
(__arm_vdwdupq_n_u16): Delete.
(__arm_vdwdupq_wb_u8): Delete.
(__arm_vdwdupq_wb_u32): Delete.
(__arm_vdwdupq_wb_u16): Delete.
(__arm_viwdupq_m_n_u8): Delete.
(__arm_viwdupq_m_n_u32): Delete.
(__arm_viwdupq_m_n_u16): Delete.
(__arm_viwdupq_m_wb_u8): Delete.
(__arm_viwdupq_m_wb_u32): Delete.
(__arm_viwdupq_m_wb_u16): Delete.
(__arm_viwdupq_n_u8): Delete.
(__arm_viwdupq_n_u32): Delete.
(__arm_viwdupq_n_u16): Delete.
(__arm_viwdupq_wb_u8): Delete.
(__arm_viwdupq_wb_u32): Delete.
(__arm_viwdupq_wb_u16): Delete.
(__arm_vdwdupq_x_n_u8): Delete.
(__arm_vdwdupq_x_n_u16): Delete.
(__arm_vdwdupq_x_n_u32): Delete.
(__arm_vdwdupq_x_wb_u8): Delete.
(__arm_vdwdupq_x_wb_u16): Delete.
(__arm_vdwdupq_x_wb_u32): Delete.
(__arm_viwdupq_x_n_u8): Delete.
(__arm_viwdupq_x_n_u16): Delete.
(__arm_viwdupq_x_n_u32): Delete.
(__arm_viwdupq_x_wb_u8): Delete.
(__arm_viwdupq_x_wb_u16): Delete.
(__arm_viwdupq_x_wb_u32): Delete.
(__arm_vdwdupq_m): Delete.
(__arm_vdwdupq_u8): Delete.
(__arm_vdwdupq_u32): Delete.
(__arm_vdwdupq_u16): Delete.
(__arm_viwdupq_m): Delete.
(__arm_viwdupq_u8): Delete.
(__arm_viwdupq_u32): Delete.
(__arm_viwdupq_u16): Delete.
(__arm_vdwdupq_x_u8): Delete.
(__arm_vdwdupq_x_u16): Delete.
(__arm_vdwdupq_x_u32): Delete.
(__arm_viwdupq_x_u8): Delete.
(__arm_viwdupq_x_u16): Delete.
(__arm_viwdupq_x_u32): Delete.
* config/arm/mve.md (@mve_q_m_wb_u_insn): Swap
operands 1 and 2.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  62 +-
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 714 ---
 gcc/config/arm/mve.md|  10 +-
 5 files changed, 53 insertions(+), 737 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 3d8bcdabe24..eaf054d9823 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -354,16 +354,19 @@ public:
vector mode asso

[PATCH v2 29/36] arm: [MVE intrinsics] rework vshlcq

2024-09-04 Thread Christophe Lyon

Implement vshlc using the new MVE builtins framework.
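
The reason vshlc needs its own implementation class is that the intrinsic
reads the carry through its pointer argument and writes the new carry back
through the same pointer. A host-side model of that read, shift, write-back
pattern might look like the sketch below. The shift arithmetic is an
assumption about the instruction's behaviour (a left shift by 1 to 32 bits
with the carry propagated across 32-bit words); check it against the Arm
documentation before relying on it. Vec and shlc_model are hypothetical
names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model a 128-bit vector as four 32-bit words, lowest word first.
using Vec = std::array<std::uint32_t, 4>;

// Assumed semantics: each word is shifted left by imm (1..32); the low
// imm bits are filled from the incoming carry, and the bits shifted out
// become the carry for the next word.  The caller must keep imm in range.
Vec shlc_model (Vec v, std::uint32_t *carry, unsigned imm)
{
  const std::uint64_t mask = (std::uint64_t{1} << imm) - 1;
  std::uint32_t c = *carry;   // Read the carry through the pointer.

  for (auto &word : v)
    {
      std::uint64_t wide
	= (static_cast<std::uint64_t> (word) << imm) | (c & mask);
      word = static_cast<std::uint32_t> (wide);
      c = static_cast<std::uint32_t> (wide >> 32);
    }

  *carry = c;   // Write the final carry back.
  return v;
}
```

The 64-bit intermediate avoids an undefined 32-bit shift when imm is 32.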

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (class vshlc_impl): New.
(vshlc): New.
* config/arm/arm-mve-builtins-base.def (vshlcq): New.
* config/arm/arm-mve-builtins-base.h (vshlcq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vshlc.
* config/arm/arm_mve.h (vshlcq): Delete.
(vshlcq_m): Delete.
(vshlcq_s8): Delete.
(vshlcq_u8): Delete.
(vshlcq_s16): Delete.
(vshlcq_u16): Delete.
(vshlcq_s32): Delete.
(vshlcq_u32): Delete.
(vshlcq_m_s8): Delete.
(vshlcq_m_u8): Delete.
(vshlcq_m_s16): Delete.
(vshlcq_m_u16): Delete.
(vshlcq_m_s32): Delete.
(vshlcq_m_u32): Delete.
(__arm_vshlcq_s8): Delete.
(__arm_vshlcq_u8): Delete.
(__arm_vshlcq_s16): Delete.
(__arm_vshlcq_u16): Delete.
(__arm_vshlcq_s32): Delete.
(__arm_vshlcq_u32): Delete.
(__arm_vshlcq_m_s8): Delete.
(__arm_vshlcq_m_u8): Delete.
(__arm_vshlcq_m_s16): Delete.
(__arm_vshlcq_m_u16): Delete.
(__arm_vshlcq_m_s32): Delete.
(__arm_vshlcq_m_u32): Delete.
(__arm_vshlcq): Delete.
(__arm_vshlcq_m): Delete.
* config/arm/mve.md (mve_vshlcq_): Add '@' prefix.
(mve_vshlcq_m_): Likewise.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  72 +++
 gcc/config/arm/arm-mve-builtins-base.def |   1 +
 gcc/config/arm/arm-mve-builtins-base.h   |   1 +
 gcc/config/arm/arm-mve-builtins.cc   |   1 +
 gcc/config/arm/arm_mve.h | 233 ---
 gcc/config/arm/mve.md|   4 +-
 6 files changed, 77 insertions(+), 235 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index eaf054d9823..9f1f7e69c57 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -483,6 +483,77 @@ public:
   }
 };
 
+/* Map the vshlc function directly to CODE (UNSPEC, M) where M is the vector
+   mode associated with type suffix 0.  We need this special case because the
+   intrinsics dereference the second parameter and update its contents.  */
+class vshlc_impl : public function_base
+{
+public:
+  unsigned int
+  call_properties (const function_instance &) const override
+  {
+return CP_WRITE_MEMORY | CP_READ_MEMORY;
+  }
+
+  tree
+  memory_scalar_type (const function_instance &) const override
+  {
+return get_typenode_from_name (UINT32_TYPE);
+  }
+
+  rtx
+  expand (function_expander &e) const override
+  {
+machine_mode mode = e.vector_mode (0);
+insn_code code;
+rtx insns, carry_ptr, carry, new_carry;
+int carry_arg_no;
+
+if (! e.type_suffix (0).integer_p)
+  gcc_unreachable ();
+
+if (e.mode_suffix_id != MODE_none)
+  gcc_unreachable ();
+
+carry_arg_no = 1;
+
+carry = gen_reg_rtx (SImode);
+carry_ptr = e.args[carry_arg_no];
+emit_insn (gen_rtx_SET (carry, gen_rtx_MEM (SImode, carry_ptr)));
+e.args[carry_arg_no] = carry;
+
+new_carry = gen_reg_rtx (SImode);
+e.args.quick_insert (0, new_carry);
+
+switch (e.pred)
+  {
+  case PRED_none:
+   /* No predicate.  */
+   code = e.type_suffix (0).unsigned_p
+ ? code_for_mve_vshlcq (VSHLCQ_U, mode)
+ : code_for_mve_vshlcq (VSHLCQ_S, mode);
+   insns = e.use_exact_insn (code);
+   break;
+
+  case PRED_m:
+   /* "m" predicate.  */
+   code = e.type_suffix (0).unsigned_p
+ ? code_for_mve_vshlcq_m (VSHLCQ_M_U, mode)
+ : code_for_mve_vshlcq_m (VSHLCQ_M_S, mode);
+   insns = e.use_cond_insn (code, 0);
+   break;
+
+  default:
+   gcc_unreachable ();
+  }
+
+/* Update carry.  */
+emit_insn (gen_rtx_SET (gen_rtx_MEM (Pmode, carry_ptr), new_carry));
+
+return insns;
+  }
+};
+
 } /* end anonymous namespace */
 
 namespace arm_mve {
@@ -815,6 +886,7 @@ FUNCTION_WITH_M_N_NO_F (vrshlq, VRSHLQ)
 FUNCTION_ONLY_N_NO_F (vrshrnbq, VRSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vrshrntq, VRSHRNTQ)
 FUNCTION_ONLY_N_NO_F (vrshrq, VRSHRQ)
+FUNCTION (vshlcq, vshlc_impl,)
 FUNCTION_ONLY_N_NO_F (vshllbq, VSHLLBQ)
 FUNCTION_ONLY_N_NO_F (vshlltq, VSHLLTQ)
 FUNCTION_WITH_M_N_R (vshlq, VSHLQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index c5f1e8a197b..bd69f06d7e4 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -152,6 +152,7 @@ DEF_MVE_FUNCTION (vrshlq, binary_round_lshift, all_integer, 
mx_or_none)
 DEF_MVE_FUNCTION (vrshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vrshrntq, binary_rshift_narrow, integer_16_32, m_or_none)
 DEF_MVE_FUNCTION (vrshrq, binary_rshift, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vshlcq, vshlc, a

[PATCH v2 30/36] arm: [MVE intrinsics] remove vshlcq useless expanders

2024-09-04 Thread Christophe Lyon

Since we rewrote the implementation of vshlcq intrinsics, we no longer
need these expanders.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/arm-builtins.cc
(arm_ternop_unone_none_unone_imm_qualifiers)
(arm_ternop_none_none_unone_imm_qualifiers): Delete.
* config/arm/arm_mve_builtins.def (vshlcq_m_vec_s)
(vshlcq_m_carry_s, vshlcq_m_vec_u, vshlcq_m_carry_u): Delete.
* config/arm/mve.md (mve_vshlcq_vec_): Delete.
(mve_vshlcq_carry_): Delete.
(mve_vshlcq_m_vec_): Delete.
(mve_vshlcq_m_carry_): Delete.
---
 gcc/config/arm/arm-builtins.cc  | 13 ---
 gcc/config/arm/arm_mve_builtins.def |  8 
 gcc/config/arm/mve.md   | 60 -
 3 files changed, 81 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index 697b91911dd..621fffec6d3 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -476,19 +476,6 @@ 
arm_ternop_unone_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_NONE_NONE_QUALIFIERS \
   (arm_ternop_unone_unone_none_none_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_unone_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_unsigned,
-  qualifier_immediate };
-#define TERNOP_UNONE_NONE_UNONE_IMM_QUALIFIERS \
-  (arm_ternop_unone_none_unone_imm_qualifiers)
-
-static enum arm_type_qualifiers
-arm_ternop_none_none_unone_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_immediate 
};
-#define TERNOP_NONE_NONE_UNONE_IMM_QUALIFIERS \
-  (arm_ternop_none_none_unone_imm_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_unone_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none,
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index f6962cd8cf5..9cce644858d 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -288,15 +288,11 @@ VAR1 (TERNOP_UNONE_UNONE_UNONE_UNONE, vrmlaldavhaq_u, 
v4si)
 VAR2 (TERNOP_NONE_NONE_UNONE_PRED, vcvtq_m_to_f_u, v8hf, v4sf)
 VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtq_m_to_f_s, v8hf, v4sf)
 VAR2 (TERNOP_PRED_NONE_NONE_PRED, vcmpeqq_m_f, v8hf, v4sf)
-VAR3 (TERNOP_UNONE_NONE_UNONE_IMM, vshlcq_carry_s, v16qi, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_carry_u, v16qi, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_IMM, vqrshrunbq_n_s, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_NONE_NONE, vabavq_s, v16qi, v8hi, v4si)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_UNONE, vabavq_u, v16qi, v8hi, v4si)
 VAR2 (TERNOP_UNONE_UNONE_NONE_PRED, vcvtaq_m_u, v8hi, v4si)
 VAR2 (TERNOP_NONE_NONE_NONE_PRED, vcvtaq_m_s, v8hi, v4si)
-VAR3 (TERNOP_UNONE_UNONE_UNONE_IMM, vshlcq_vec_u, v16qi, v8hi, v4si)
-VAR3 (TERNOP_NONE_NONE_UNONE_IMM, vshlcq_vec_s, v16qi, v8hi, v4si)
 VAR4 (TERNOP_UNONE_UNONE_UNONE_PRED, vpselq_u, v16qi, v8hi, v4si, v2di)
 VAR4 (TERNOP_NONE_NONE_NONE_PRED, vpselq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (TERNOP_UNONE_UNONE_UNONE_PRED, vrev64q_m_u, v16qi, v8hi, v4si)
@@ -862,7 +858,3 @@ VAR1 (UQSHL, urshr_, si)
 VAR1 (UQSHL, urshrl_, di)
 VAR1 (UQSHL, uqshl_, si)
 VAR1 (UQSHL, uqshll_, di)
-VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_vec_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_UNONE_IMM_PRED, vshlcq_m_carry_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_vec_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_IMM_PRED, vshlcq_m_carry_u, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 83a1eb48533..eb603b3d9a7 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -1691,34 +1691,6 @@ (define_insn "@mve_q_"
 ;;
 ;; [vshlcq_u vshlcq_s]
 ;;
-(define_expand "mve_vshlcq_vec_"
- [(match_operand:MVE_2 0 "s_register_operand")
-  (match_operand:MVE_2 1 "s_register_operand")
-  (match_operand:SI 2 "s_register_operand")
-  (match_operand:SI 3 "mve_imm_32")
-  (unspec:MVE_2 [(const_int 0)] VSHLCQ)]
- "TARGET_HAVE_MVE"
-{
-  rtx ignore_wb = gen_reg_rtx (SImode);
-  emit_insn(gen_mve_vshlcq_(operands[0], ignore_wb, operands[1],
- operands[2], operands[3]));
-  DONE;
-})
-
-(define_expand "mve_vshlcq_carry_"
- [(match_operand:SI 0 "s_register_operand")
-  (match_operand:MVE_2 1 "s_register_operand")
-  (match_operand:SI 2 "s_register_operand")
-  (match_operand:SI 3 "mve_imm_32")
-  (unspec:MVE_2 [(const_int 0)] VSHLCQ)]
- "TARGET_HAVE_MVE"
-{
-  rtx ignore_vec = gen_reg_rtx (mode);
-  emit_insn(gen_mve_vshlcq_(ignore_vec, operands[0], operands[1],
- operands[2], operands[3]));
-  DONE;
-})
-
 (define_insn "@mve_vshlcq_"
  [(set (match_operand:MVE_2 0 "s_register_operand" "=w")
(unspec:MVE_2 [(match_operand:MVE_2 2 "s_register_operand" "0")
@@ -6247,38 +6219,6 @@ (define_insn "mve_sqs

[PATCH v2 34/36] arm: [MVE intrinsics] rework vadcq

2024-09-04 Thread Christophe Lyon

Implement vadcq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patch to support
vadciq: we just need to initialize carry from the input parameter.
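As a rough illustration of "initialize carry from the input parameter", the carry-in preparation can be modelled in plain C. This is only a sketch of the bit arithmetic, not the RTL the patch emits; the helper name and macros are made up for the example, and it assumes the carry flag is bit 29 of FPSCR (the C flag of NZCV), i.e. mask 1 << 29.

```c
#include <stdint.h>

/* Bit 29 of FPSCR is the carry flag (C) consumed by VADC/VSBC.  */
#define FPSCR_CARRY_BIT  29u
#define FPSCR_CARRY_MASK (UINT32_C (1) << FPSCR_CARRY_BIT)

/* Hypothetical helper, for illustration only: return the FPSCR value to
   install before a vadcq.  Every bit of the current FPSCR is kept except
   the carry flag, which is replaced by bit 0 of *carry.  */
static uint32_t
fpscr_with_carry_in (uint32_t fpscr, const unsigned *carry)
{
  uint32_t carry_in = ((uint32_t) *carry & 1u) << FPSCR_CARRY_BIT;
  return (fpscr & ~FPSCR_CARRY_MASK) | carry_in;
}
```

For vadciq this step is skipped, since the instruction itself starts from a zero carry.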

2024-08-28  Christophe Lyon  

gcc/

* config/arm/arm-mve-builtins-base.cc (vadcq_vsbc): Add support
for vadcq.
* config/arm/arm-mve-builtins-base.def (vadcq): New.
* config/arm/arm-mve-builtins-base.h (vadcq): New.
* config/arm/arm_mve.h (vadcq): Delete.
(vadcq_m): Delete.
(vadcq_s32): Delete.
(vadcq_u32): Delete.
(vadcq_m_s32): Delete.
(vadcq_m_u32): Delete.
(__arm_vadcq_s32): Delete.
(__arm_vadcq_u32): Delete.
(__arm_vadcq_m_s32): Delete.
(__arm_vadcq_m_u32): Delete.
(__arm_vadcq): Delete.
(__arm_vadcq_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  | 61 +++--
 gcc/config/arm/arm-mve-builtins-base.def |  1 +
 gcc/config/arm/arm-mve-builtins-base.h   |  1 +
 gcc/config/arm/arm_mve.h | 87 
 4 files changed, 56 insertions(+), 94 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 6f3b18c2915..9c2e11356ef 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -559,10 +559,19 @@ public:
 class vadc_vsbc_impl : public function_base
 {
 public:
+  CONSTEXPR vadc_vsbc_impl (bool init_carry)
+: m_init_carry (init_carry)
+  {}
+
+  /* Initialize carry with 0 (vadci).  */
+  bool m_init_carry;
+
   unsigned int
   call_properties (const function_instance &) const override
   {
 unsigned int flags = CP_WRITE_MEMORY | CP_READ_FPCR;
+if (!m_init_carry)
+  flags |= CP_READ_MEMORY;
 return flags;
   }
 
@@ -605,22 +614,59 @@ public:
 carry_ptr = e.args[carry_out_arg_no];
 e.args.ordered_remove (carry_out_arg_no);
 
+if (!m_init_carry)
+  {
+   /* Prepare carry in:
+  set_fpscr ( (fpscr & ~0x20000000u)
+  | ((*carry & 1u) << 29) )  */
+   rtx carry_in = gen_reg_rtx (SImode);
+   rtx fpscr = gen_reg_rtx (SImode);
+   emit_insn (gen_get_fpscr_nzcvqc (fpscr));
+   emit_insn (gen_rtx_SET (carry_in, gen_rtx_MEM (SImode, carry_ptr)));
+
+   emit_insn (gen_rtx_SET (carry_in,
+   gen_rtx_ASHIFT (SImode,
+   carry_in,
+   GEN_INT (29))));
+   emit_insn (gen_rtx_SET (carry_in,
+   gen_rtx_AND (SImode,
+carry_in,
+GEN_INT (0x20000000))));
+   emit_insn (gen_rtx_SET (fpscr,
+   gen_rtx_AND (SImode,
+fpscr,
+GEN_INT (~0x20000000))));
+   emit_insn (gen_rtx_SET (carry_in,
+   gen_rtx_IOR (SImode,
+carry_in,
+fpscr)));
+   emit_insn (gen_set_fpscr_nzcvqc (carry_in));
+  }
+
 switch (e.pred)
   {
   case PRED_none:
/* No predicate.  */
-   unspec = e.type_suffix (0).unsigned_p
- ? VADCIQ_U
- : VADCIQ_S;
+   unspec = m_init_carry
+ ? (e.type_suffix (0).unsigned_p
+? VADCIQ_U
+: VADCIQ_S)
+ : (e.type_suffix (0).unsigned_p
+? VADCQ_U
+: VADCQ_S);
code = code_for_mve_q_v4si (unspec, unspec);
insns = e.use_exact_insn (code);
break;
 
   case PRED_m:
/* "m" predicate.  */
-   unspec = e.type_suffix (0).unsigned_p
- ? VADCIQ_M_U
- : VADCIQ_M_S;
+   unspec = m_init_carry
+ ? (e.type_suffix (0).unsigned_p
+? VADCIQ_M_U
+: VADCIQ_M_S)
+ : (e.type_suffix (0).unsigned_p
+? VADCQ_M_U
+: VADCQ_M_S);
code = code_for_mve_q_m_v4si (unspec, unspec);
insns = e.use_cond_insn (code, 0);
break;
@@ -816,7 +862,8 @@ namespace arm_mve {
 FUNCTION_PRED_P_S_U (vabavq, VABAVQ)
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, 
-1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
-FUNCTION (vadciq, vadc_vsbc_impl,)
+FUNCTION (vadciq, vadc_vsbc_impl, (true))
+FUNCTION (vadcq, vadc_vsbc_impl, (false))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_PRED_P_S_U (vaddlvaq, VADDLVAQ)
 FUNCTION_PRED_P_S_U (vaddlvq, VADDLVQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def 
b/gcc/config/arm/arm-mve-builtins-base.def
index 72d6461c4e4..37efa6bf13e 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -22,6 +22,7 @@ DEF_MVE_FUNCTION (vabavq, binary_acca_i

[PATCH v2 32/36] arm: [MVE intrinsics] factorize vadc vadci vsbc vsbci

2024-09-04 Thread Christophe Lyon

Factorize vadc/vsbc and vadci/vsbci so that they use the same
parameterized names.

2024-08-28  Christophe Lyon  

gcc/
* config/arm/iterators.md (mve_insn): Add VADCIQ_M_S, VADCIQ_M_U,
VADCIQ_U, VADCIQ_S, VADCQ_M_S, VADCQ_M_U, VADCQ_S, VADCQ_U,
VSBCIQ_M_S, VSBCIQ_M_U, VSBCIQ_S, VSBCIQ_U, VSBCQ_M_S, VSBCQ_M_U,
VSBCQ_S, VSBCQ_U.
(VADCIQ, VSBCIQ): Merge into ...
(VxCIQ): ... this.
(VADCIQ_M, VSBCIQ_M): Merge into ...
(VxCIQ_M): ... this.
(VSBCQ, VADCQ): Merge into ...
(VxCQ): ... this.
(VSBCQ_M, VADCQ_M): Merge into ...
(VxCQ_M): ... this.
* config/arm/mve.md
(mve_vadciq_v4si, mve_vsbciq_v4si): Merge into ...
(@mve_q_v4si): ... this.
(mve_vadciq_m_v4si, mve_vsbciq_m_v4si): Merge into ...
(@mve_q_m_v4si): ... this.
(mve_vadcq_v4si, mve_vsbcq_v4si): Merge into ...
(@mve_q_v4si): ... this.
(mve_vadcq_m_v4si, mve_vsbcq_m_v4si): Merge into ...
(@mve_q_m_v4si): ... this.
---
 gcc/config/arm/iterators.md |  20 +++---
 gcc/config/arm/mve.md   | 131 +---
 2 files changed, 42 insertions(+), 109 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 2fb3b25040f..59e112b228c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -941,6 +941,10 @@ (define_int_attr mve_insn [
 (VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
 (VABSQ_M_F "vabs")
 (VABSQ_M_S "vabs")
+(VADCIQ_M_S "vadci") (VADCIQ_M_U "vadci")
+(VADCIQ_S "vadci") (VADCIQ_U "vadci")
+(VADCQ_M_S "vadc") (VADCQ_M_U "vadc")
+(VADCQ_S "vadc") (VADCQ_U "vadc")
 (VADDLVAQ_P_S "vaddlva") (VADDLVAQ_P_U "vaddlva")
 (VADDLVAQ_S "vaddlva") (VADDLVAQ_U "vaddlva")
 (VADDLVQ_P_S "vaddlv") (VADDLVQ_P_U "vaddlv")
@@ -1235,6 +1239,10 @@ (define_int_attr mve_insn [
 (VRSHRNTQ_N_S "vrshrnt") (VRSHRNTQ_N_U "vrshrnt")
 (VRSHRQ_M_N_S "vrshr") (VRSHRQ_M_N_U "vrshr")
 (VRSHRQ_N_S "vrshr") (VRSHRQ_N_U "vrshr")
+(VSBCIQ_M_S "vsbci") (VSBCIQ_M_U "vsbci")
+(VSBCIQ_S "vsbci") (VSBCIQ_U "vsbci")
+(VSBCQ_M_S "vsbc") (VSBCQ_M_U "vsbc")
+(VSBCQ_S "vsbc") (VSBCQ_U "vsbc")
 (VSHLLBQ_M_N_S "vshllb") (VSHLLBQ_M_N_U "vshllb")
 (VSHLLBQ_N_S "vshllb") (VSHLLBQ_N_U "vshllb")
 (VSHLLTQ_M_N_S "vshllt") (VSHLLTQ_M_N_U "vshllt")
@@ -2949,14 +2957,10 @@ (define_int_iterator VSTRWSBWBQ [VSTRWQSBWB_S 
VSTRWQSBWB_U])
 (define_int_iterator VLDRWGBWBQ [VLDRWQGBWB_S VLDRWQGBWB_U])
 (define_int_iterator VSTRDSBWBQ [VSTRDQSBWB_S VSTRDQSBWB_U])
 (define_int_iterator VLDRDGBWBQ [VLDRDQGBWB_S VLDRDQGBWB_U])
-(define_int_iterator VADCIQ [VADCIQ_U VADCIQ_S])
-(define_int_iterator VADCIQ_M [VADCIQ_M_U VADCIQ_M_S])
-(define_int_iterator VSBCQ [VSBCQ_U VSBCQ_S])
-(define_int_iterator VSBCQ_M [VSBCQ_M_U VSBCQ_M_S])
-(define_int_iterator VSBCIQ [VSBCIQ_U VSBCIQ_S])
-(define_int_iterator VSBCIQ_M [VSBCIQ_M_U VSBCIQ_M_S])
-(define_int_iterator VADCQ [VADCQ_U VADCQ_S])
-(define_int_iterator VADCQ_M [VADCQ_M_U VADCQ_M_S])
+(define_int_iterator VxCIQ [VADCIQ_U VADCIQ_S VSBCIQ_U VSBCIQ_S])
+(define_int_iterator VxCIQ_M [VADCIQ_M_U VADCIQ_M_S VSBCIQ_M_U VSBCIQ_M_S])
+(define_int_iterator VxCQ [VADCQ_U VADCQ_S  VSBCQ_U VSBCQ_S])
+(define_int_iterator VxCQ_M [VADCQ_M_U VADCQ_M_S  VSBCQ_M_U VSBCQ_M_S])
 (define_int_iterator UQRSHLLQ [UQRSHLL_64 UQRSHLL_48])
 (define_int_iterator SQRSHRLQ [SQRSHRL_64 SQRSHRL_48])
 (define_int_iterator VSHLCQ_M [VSHLCQ_M_S VSHLCQ_M_U])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index eb603b3d9a7..9c32d0e1033 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -5717,159 +5717,88 @@ (define_insn 
"mve_vldrdq_gather_base_wb_z_v2di_insn"
 }
  [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vldrdq_gather_base_wb_v2di_insn"))
   (set_attr "length" "8")])
-;;
-;; [vadciq_m_s, vadciq_m_u])
-;;
-(define_insn "mve_vadciq_m_v4si"
-  [(set (match_operand:V4SI 0 "s_register_operand" "=w")
-   (unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "0")
- (match_operand:V4SI 2 "s_register_operand" "w")
- (match_operand:V4SI 3 "s_register_operand" "w")
- (match_operand:V4BI 4 "vpr_register_operand" "Up")]
-VADCIQ_M))
-   (set (reg:SI VFPCC_REGNUM)
-   (unspec:SI [(const_int 0)]
-VADCIQ_M))
-  ]
-  "TARGET_HAVE_MVE"
-  "vpst\;vadcit.i32\t%q0, %q2, %q3"
- [(set (attr "mve_unpredicated_insn") (symbol_ref 
"CODE_FOR_mve_vadciq_v4si"))
-  (set_attr "type" "mve_move")
-   (set_attr "length" "8")])
 
 ;;
-;; [vadciq_u, vadciq_s])
+;; [vadciq_u, vadciq_s]
+;; [vsbciq_s, vsbciq_u]
 ;;
-(define_insn "mve_vadciq_v4si

[PATCH v2 35/36] arm: [MVE intrinsics] rework vsbcq vsbciq

2024-09-04 Thread Christophe Lyon

Implement vsbcq vsbciq using the new MVE builtins framework.

We re-use most of the code introduced by the previous patches.
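The way the two constructor flags pick an unspec can be modelled outside GCC as below. This is a standalone sketch: the enum is a stand-in with arbitrary values (only the names come from the patch), select_unspec is a hypothetical function, and only the unpredicated case is shown.

```c
#include <stdbool.h>

/* Stand-ins for the unpredicated unspec codes named in the patch; the
   numeric values here are arbitrary.  */
enum unspec_model
{
  VADCIQ_U, VADCIQ_S, VADCQ_U, VADCQ_S,
  VSBCIQ_U, VSBCIQ_S, VSBCQ_U, VSBCQ_S
};

/* Hypothetical helper mirroring the nested conditional: ADD chooses
   between the vadc and vsbc families, INIT_CARRY between the "i"
   (carry starts at 0) and plain variants, UNSIGNED_P the signedness.  */
static enum unspec_model
select_unspec (bool add, bool init_carry, bool unsigned_p)
{
  if (add)
    return init_carry ? (unsigned_p ? VADCIQ_U : VADCIQ_S)
		      : (unsigned_p ? VADCQ_U : VADCQ_S);
  return init_carry ? (unsigned_p ? VSBCIQ_U : VSBCIQ_S)
		    : (unsigned_p ? VSBCQ_U : VSBCQ_S);
}
```

With this model, vsbciq corresponds to (add = false, init_carry = true) and vsbcq to (add = false, init_carry = false).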

2024-08-28  Christophe Lyon  

gcc/

* config/arm/arm-mve-builtins-base.cc (class vadc_vsbc_impl): Add
support for vsbciq and vsbcq.
(vadciq, vadcq): Add new parameter.
(vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.def (vsbciq): New.
(vsbcq): New.
* config/arm/arm-mve-builtins-base.h (vsbciq): New.
(vsbcq): New.
* config/arm/arm_mve.h (vsbciq): Delete.
(vsbciq_m): Delete.
(vsbcq): Delete.
(vsbcq_m): Delete.
(vsbciq_s32): Delete.
(vsbciq_u32): Delete.
(vsbciq_m_s32): Delete.
(vsbciq_m_u32): Delete.
(vsbcq_s32): Delete.
(vsbcq_u32): Delete.
(vsbcq_m_s32): Delete.
(vsbcq_m_u32): Delete.
(__arm_vsbciq_s32): Delete.
(__arm_vsbciq_u32): Delete.
(__arm_vsbciq_m_s32): Delete.
(__arm_vsbciq_m_u32): Delete.
(__arm_vsbcq_s32): Delete.
(__arm_vsbcq_u32): Delete.
(__arm_vsbcq_m_s32): Delete.
(__arm_vsbcq_m_u32): Delete.
(__arm_vsbciq): Delete.
(__arm_vsbciq_m): Delete.
(__arm_vsbcq): Delete.
(__arm_vsbcq_m): Delete.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |  56 +---
 gcc/config/arm/arm-mve-builtins-base.def |   2 +
 gcc/config/arm/arm-mve-builtins-base.h   |   2 +
 gcc/config/arm/arm_mve.h | 170 ---
 4 files changed, 42 insertions(+), 188 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index 9c2e11356ef..02fccdcb71f 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -559,12 +559,14 @@ public:
 class vadc_vsbc_impl : public function_base
 {
 public:
-  CONSTEXPR vadc_vsbc_impl (bool init_carry)
-: m_init_carry (init_carry)
+  CONSTEXPR vadc_vsbc_impl (bool init_carry, bool add)
+: m_init_carry (init_carry), m_add (add)
   {}
 
   /* Initialize carry with 0 (vadci).  */
   bool m_init_carry;
+  /* Add (true) or Sub (false).  */
+  bool m_add;
 
   unsigned int
   call_properties (const function_instance &) const override
@@ -647,26 +649,42 @@ public:
   {
   case PRED_none:
/* No predicate.  */
-   unspec = m_init_carry
- ? (e.type_suffix (0).unsigned_p
-? VADCIQ_U
-: VADCIQ_S)
- : (e.type_suffix (0).unsigned_p
-? VADCQ_U
-: VADCQ_S);
+   unspec = m_add
+ ? (m_init_carry
+? (e.type_suffix (0).unsigned_p
+   ? VADCIQ_U
+   : VADCIQ_S)
+: (e.type_suffix (0).unsigned_p
+   ? VADCQ_U
+   : VADCQ_S))
+ : (m_init_carry
+? (e.type_suffix (0).unsigned_p
+   ? VSBCIQ_U
+   : VSBCIQ_S)
+: (e.type_suffix (0).unsigned_p
+   ? VSBCQ_U
+   : VSBCQ_S));
code = code_for_mve_q_v4si (unspec, unspec);
insns = e.use_exact_insn (code);
break;
 
   case PRED_m:
/* "m" predicate.  */
-   unspec = m_init_carry
- ? (e.type_suffix (0).unsigned_p
-? VADCIQ_M_U
-: VADCIQ_M_S)
- : (e.type_suffix (0).unsigned_p
-? VADCQ_M_U
-: VADCQ_M_S);
+   unspec = m_add
+ ? (m_init_carry
+? (e.type_suffix (0).unsigned_p
+   ? VADCIQ_M_U
+   : VADCIQ_M_S)
+: (e.type_suffix (0).unsigned_p
+   ? VADCQ_M_U
+   : VADCQ_M_S))
+ : (m_init_carry
+? (e.type_suffix (0).unsigned_p
+   ? VSBCIQ_M_U
+   : VSBCIQ_M_S)
+: (e.type_suffix (0).unsigned_p
+   ? VSBCQ_M_U
+   : VSBCQ_M_S));
code = code_for_mve_q_m_v4si (unspec, unspec);
insns = e.use_cond_insn (code, 0);
break;
@@ -862,8 +880,8 @@ namespace arm_mve {
 FUNCTION_PRED_P_S_U (vabavq, VABAVQ)
 FUNCTION_WITHOUT_N (vabdq, VABDQ)
 FUNCTION (vabsq, unspec_based_mve_function_exact_insn, (ABS, ABS, ABS, -1, -1, 
-1, VABSQ_M_S, -1, VABSQ_M_F, -1, -1, -1))
-FUNCTION (vadciq, vadc_vsbc_impl, (true))
-FUNCTION (vadcq, vadc_vsbc_impl, (false))
+FUNCTION (vadciq, vadc_vsbc_impl, (true, true))
+FUNCTION (vadcq, vadc_vsbc_impl, (false, true))
 FUNCTION_WITH_RTX_M_N (vaddq, PLUS, VADDQ)
 FUNCTION_PRED_P_S_U (vaddlvaq, VADDLVAQ)
 FUNCTION_PRED_P_S_U (vaddlvq, VADDLVQ)
@@ -1026,6 +1044,8 @@ FUNCTION_WITH_M_N_NO_F (vrshlq, VRSHLQ)
 FUNCTION_ONLY_N_NO_F (vrshrnbq, VRSHRNBQ)
 FUNCTION_ONLY_N_NO_F (vrshrntq, VRSHRNTQ)
 FUNCTION_ONLY_N_NO_F (vrshrq, VRSHRQ)
+FUNCTION (vsbciq, vadc_vsbc_impl, (true, false))
+FUNCTION (vsbcq, vadc_vsbc_impl, (false, false))
 FUNCTION (vshlcq, vshlc_impl,)
 FUNCTION_ONLY_N_NO_F (vshllbq, VSHLLBQ)
 FUNCTION_ONLY

[Bug tree-optimization/109429] [PATCH] ivopts: fixed complexities

2024-09-04 Thread Aleksandar Rakic

From 0130d3cb01fd9d5c1c997003245ed57bbdeb00a2 Mon Sep 17 00:00:00 2001
From: Aleksandar 
Date: Fri, 23 Aug 2024 11:36:50 +0200
Subject: [PATCH] [Bug tree-optimization/109429] ivopts: fixed complexities

This patch addresses a bug introduced in commit f9f69dd by
correcting the complexity calculation in ivopts. The fix reorders
the complexity computation and handles invariant variables in
address expressions properly. These changes align with the
approach used in parent commit c2b64ce. The improved complexity
calculations ensure better candidate selection and reduced code
size, particularly for RISC CPUs.

Signed-off-by: Aleksandar Rakic 
Signed-off-by: Jovan Dmitrovic 

gcc/ChangeLog:

* tree-ssa-loop-ivopts.cc (get_address_cost): Fixed
complexity calculation.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bug_tree-optimization_109429.c: New
test.
---
 .../tree-ssa/bug_tree-optimization_109429.c   | 25 +++
 gcc/tree-ssa-loop-ivopts.cc   | 15 +++
 2 files changed, 35 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bug_tree-optimization_109429.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bug_tree-optimization_109429.c 
b/gcc/testsuite/gcc.dg/tree-ssa/bug_tree-optimization_109429.c
new file mode 100644
index 00000000000..516ce39d486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/bug_tree-optimization_109429.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target mips64-r6-linux-gnu } } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+/* This test ensures that complexity must be greater than zero if there is
+   an invariant variable or an invariant expression in the address
+   expression.  */
+
+void daxpy (float *vector1, float *vector2, int n, float fp_const)
+{
+   for (int i = 0; i < n; ++i)
+   vector1[i] += fp_const * vector2[i];
+}
+
+void dgefa (float *vector, int m, int n, int l)
+{
+   for (int i = 0; i < n - 1; ++i)
+   {
+   for (int j = i + 1; j < n; ++j)
+   {
+   float t = vector[m * j + l];
+   daxpy (&vector[m * i + i + 1],
+   &vector[m * j + i + 1], n - (i + 1), t);
+   }
+   }
+}
+
+
+/* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*:(.*\n)*Group 1:\n  Type:.*ADDRESS(.*\n)*Group 1:\n  
cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t\\d+(, 
\\d+)*;\t.+\n)(.*\n)*Group 2:\n  
cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" 
"ivopts" { target { mips64-r6-linux-gnu } } } } */
+
+
+/* { dg-final { scan-tree-dump-not "Processing loop 3(.*\n)*:(.*\n)*Group 1:\n  Type:.*ADDRESS(.*\n)*Group 1:\n  
cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*(.+\t.+\t0\t.+\t\\d+(, 
\\d+)*\n)(.*\n)*Group 2:\n  
cand\tcost\tcompl\.\tinv\.expr\.\tinv\.vars\n(.*\n)*Selected IV set for loop 3" 
"ivopts" { target { mips64-r6-linux-gnu } } } } */
+
+
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7cae5bdefea..84c33103938 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -4733,6 +4733,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use 
*use,
   /* Only true if ratio != 1.  */
   bool ok_with_ratio_p = false;
   bool ok_without_ratio_p = false;
+  bool inv_present = false;
   code_helper code = ERROR_MARK;
 
   if (use->type == USE_PTR_ADDRESS)
@@ -4744,6 +4745,7 @@ get_address_cost (struct ivopts_data *data, struct iv_use 
*use,
 
   if (!aff_combination_const_p (aff_inv))
 {
+  inv_present = true;
   parts.index = integer_one_node;
   /* Addressing mode "base + index".  */
   ok_without_ratio_p = valid_mem_ref_p (mem_mode, as, &parts, code);
@@ -4755,8 +4757,8 @@ get_address_cost (struct ivopts_data *data, struct iv_use 
*use,
  if (!ok_with_ratio_p)
parts.step = NULL_TREE;
}
-  if (ok_with_ratio_p || ok_without_ratio_p)
-   {
+  if (!(ok_with_ratio_p || ok_without_ratio_p))
+parts.index = NULL_TREE;
  if (maybe_ne (aff_inv->offset, 0))
{
  parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
@@ -4770,7 +4772,10 @@ get_address_cost (struct ivopts_data *data, struct 
iv_use *use,
  move_fixed_address_to_symbol (&parts, aff_inv);
  /* Base is fixed address and is moved to symbol part.  */
  if (parts.symbol != NULL_TREE && aff_combination_zero_p (aff_inv))
+{
parts.base = NULL_TREE;
+   inv_present = false;
+}
 
  /* Addressing mode "symbol + base + index [<< scale] [+ offset]".  */
  if (parts.symbol != NULL_TREE
@@ -4783,10 +4788,8 @@ get_address_cost (struct ivopts_data *data, struct 
iv_use *use,
  simple_inv = false;
  /* Symbol part is moved back to base part, it can't be NULL.  */
  parts.base = integer_one_node;
+ inv_presen

[PATCH] Use dg-additional-options for gfortran.dg/vect/vect-8.f90 and RISC-V

2024-09-04 Thread Richard Biener

r14-9122-g67a29f99cc8138 disabled scheduling on a lot of testcases
for RISC-V for PR113249 but using dg-options.  This makes
gfortran.dg/vect/vect-8.f90 UNRESOLVED as it relies on default
flags to enable vectorization.

The following uses dg-additional-options instead.

Tested on riscv64-linux with qemu-user, pushed.

I didn't check all the other adjusted tests for similar issues.

* gfortran.dg/vect/vect-8.f90: Use dg-additional-options.
---
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index 283c36e0ebe..2a3fa90740e 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -2,7 +2,7 @@
 ! { dg-require-effective-target vect_double }
 ! { dg-additional-options "-fno-tree-loop-distribute-patterns 
-finline-matmul-limit=0" }
 ! PR113249
-! { dg-options "-fno-schedule-insns -fno-schedule-insns2" { target { 
riscv*-*-* } } }
+! { dg-additional-options "-fno-schedule-insns -fno-schedule-insns2" { target 
{ riscv*-*-* } } }
 
 module lfk_prec
  integer, parameter :: dp=kind(1.d0)
-- 
2.43.0

Re: [to-be-committed] [RISC-V][PR target/115921] Improve reassociation for rv64

2024-09-04 Thread Xi Ruoyao

Hi Jeff,

On Mon, 2024-09-02 at 12:53 -0600, Jeff Law wrote:
>  (define_insn_and_split "_shift_reverse"
>    [(set (match_operand:X 0 "register_operand" "=r")
>  (any_bitwise:X (ashift:X (match_operand:X 1 "register_operand" "r")
> @@ -2934,9 +2936,9 @@ (define_insn_and_split "_shift_reverse"
>    "(!SMALL_OPERAND (INTVAL (operands[3]))
>  && SMALL_OPERAND (INTVAL (operands[3]) >> INTVAL (operands[2]))
>  && popcount_hwi (INTVAL (operands[3])) > 1

I'm wondering why we need to check the popcount.  With this patch
applied:

long
test1 (long x)
{
  return (x & 0x110) << 12;
}

is compiled to:

test1:
        andi    a0,a0,272
        slli    a0,a0,12
        ret

as we've expected, but:

long
test2 (long x)
{
  return (x & 0x100) << 12;
}

is compiled to:

test2:
        li      a5,1048576
        slli    a0,a0,12
        and     a0,a0,a5
        ret

So why must we spend an instruction to load the single-bit immediate?
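For reference, the two instruction orders compute the same value; a small C model of the identity is below. It is a sketch only: the function names are invented, unsigned types are used to keep the shifts well defined, and fits_simm12 assumes the usual 12-bit signed immediate range that SMALL_OPERAND checks.

```c
#include <stdint.h>

/* Mask first, then shift: the form written in test1/test2.  */
static uint64_t
mask_then_shift (uint64_t x, uint64_t mask, unsigned shift)
{
  return (x & mask) << shift;
}

/* Shift first, then mask with the pre-shifted constant: this form
   needs (mask << shift) materialized when it is not a small operand.  */
static uint64_t
shift_then_mask (uint64_t x, uint64_t mask, unsigned shift)
{
  return (x << shift) & (mask << shift);
}

/* Assumed range of a 12-bit signed immediate (andi/ori/xori).  */
static int
fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}
```

Both 0x110 and 0x100 fit the immediate before the shift and neither fits after it, so the unshifted form saves the li in both cases regardless of the popcount.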

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

[PATCH] RISC-V Handle non-grouped stores as single-lane SLP

2024-09-04 Thread Richard Biener



The following enables single-lane loop SLP discovery for non-grouped stores
and adjusts vectorizable_store to properly handle those.

For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
not running into the "not falling back to strided accesses" bail-out.
I have not investigated in detail.

There is a set of i386 target assembler test FAILs,
gcc.target/i386/pr88531-2[bc].c in particular fail because the
target cannot identify SLP emulated gathers, see another mail from me.
Others need adjustment, I've adjusted one with this patch only.
In particular there are gcc.target/i386/cond_op_fma_*-1.c FAILs
that are because we no longer fold a VEC_COND_EXPR during the
region value-numbering we do after vectorization since we
code-generate a { 0.0, ... } constant in the VEC_COND_EXPR now
instead of having a separate statement which gets forwarded
and then triggers folding.  This leads to slightly different
code generation.  The solution is probably to use gimple_build
when building stmts or, in this case, directly emit .COND_FMA
instead of .FMA and a VEC_COND_EXPR.

gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single
lane contiguous store from one lane of the 8-lane load and we
expect to use load-lanes for this reason but the heuristic for
forcing single-lane rediscovery as implemented doesn't trigger
here as it treats both SLP instances separately.  FAILs on RISC-V.

gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving
scheme for group_size 12 (by extension using the group_size 3
scheme to reduce to 4 lanes and then continue with a pow2 scheme
would work);  we are also not considering load-lanes because of
the above reason, but aarch64 cannot do ld12.  FAILs on AARCH64
(load requires three vectors) and x86_64.

gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because
of "SLP induction not supported for variable-length vectors".

Bootstrapped and tested on x86_64-unknown-linux-gnu with the
caveats above.  Compared to previous versions there are some
adjustments to the testsuite adjustments and more comments on
the above FAILs.

* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
loop SLP discovery for non-grouped stores.
* tree-vect-stmts.cc (vectorizable_store): Always set
vec_num for SLP.

* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-19a.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-4-big-array.c: Likewise.
* gcc.dg/vect/slp-4.c: Likewise.
* gcc.dg/vect/slp-5.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-37.c: Likewise.
* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
initialization loop.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL.  SLP can handle
inner loop inductions with multiple vector stmt copies.
* gfortran.dg/vect/vect-8.f90: Adjust expected number of
vectorized loops.
* gcc.target/i386/vectorize1.c: Adjust what we scan for.
---
 gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c  |  2 +-
 .../gcc.dg/vect/fast-math-vect-call-1.c   |  2 +-
 .../gcc.dg/vect/fast-math-vect-call-2.c   |  2 +-
 .../gcc.dg/vect/no-scevccp-outer-12.c |  3 +--
 gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c |  5 ++--
 gcc/testsuite/gcc.dg/vect/slp-12b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-12c.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19a.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-37.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4-big-array.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-5.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-7.c |  4 ++--
 gcc/testsuite/gcc.dg/vect/slp-perm-7.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-5.c   |  3 ++-
 gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c  |  1 +
 gcc/testsuite/gcc.target/i386/vectorize1.c|  4 ++--
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  2 +-
 gcc/tree-vect-slp.cc  | 23 +++
 gcc/tree-vect-stmts.cc| 11 +
 21 files changed, 54 insertions(+), 26 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c 
b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
index c3f0f6dc1be..ddaac56cc0b 100644
--- a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
+++ b/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c
@@ -27,5 +27,5 @@ foo ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target 
vect_strided4 } } } */
-/* { dg-final { scan-tree-dump-times "v

[PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Evgeny Karpov

Monday, September 4, 2024
Martin Storsjö  wrote:

>> Let's consider the following example, when symbol is located at 3072.
>>
>> 1. Example without the fix
>> compilation time
>> adrp        x0, (3072 + 256) & ~0xFFF // x0 = 0
>> add         x0, x0, (3072 + 256) & 0xFFF // x0 = 3328
>>
>> linking time when symbol is relocated with offset 896
>> adrp        x0, (0 + 896) & ~0xFFF // x0 = 0
>
> Why did the 3072 suddenly become 0 here?

The test case which will be compiled.

adrp x0, symbol + 256
add  x0, x0, symbol + 256

The numbers which are presented in the example help to clarify relocation steps.
symbol is located at 3072.

compilation time
adrp x0, symbol + 256
90000000 adrp x0, 0
add  x0, x0, symbol + 256
91340000 add x0, x0, 3328

linking time when symbol is relocated with offset 896
compiled  90000000 adrp x0, 0
relocated 90000000 adrp x0, 0 // without change
((0 << 12) + 896) >> 12 = 0 // relocation calculation

>> add         x0, x0, (3328 + 896) & 0xFFF; // x0 = 128
>
> Where did 3328 come from in your example? Wasn't "symbol" supposed to be
> at address 3072, and we're adding an offset of 896 to it?

compiled  91340000 add x0, x0, 3328
relocated 91020000 add x0, x0, 128
(3328 + 896) & 0xFFF = 128 // relocation calculation
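The arithmetic above can be replayed with a small C model. This is a simplified sketch, not the actual relocation code: the struct and function names are invented for the example, and the PC page is assumed to be 0 so that the adrp immediate is just the target page number.

```c
#include <stdint.h>

/* Immediates held by an adrp/add pair: page number and low 12 bits.
   Field names are illustrative.  */
struct adrp_add
{
  uint64_t page;
  uint64_t lo12;
};

/* Split a full address into the two immediates.  */
static struct adrp_add
encode_address (uint64_t addr)
{
  struct adrp_add r = { addr >> 12, addr & 0xFFF };
  return r;
}

/* Relocate each instruction using only its own encoded immediate, as
   in the example: the carry out of the low 12 bits never reaches the
   adrp page.  */
static struct adrp_add
relocate_separately (struct adrp_add in, uint64_t delta)
{
  struct adrp_add r;
  r.page = ((in.page << 12) + delta) >> 12;
  r.lo12 = (in.lo12 + delta) & 0xFFF;
  return r;
}

/* Address the instruction pair ends up computing.  */
static uint64_t
materialize (struct adrp_add v)
{
  return (v.page << 12) + v.lo12;
}
```

With symbol + 256 = 3328 and a relocation offset of 896, relocating the two immediates separately yields 0 + 128 = 128, whereas encoding the final address 4224 directly yields page 1 and low bits 128.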

Regards,
Evgeny

Re: [PATCH] RISC-V: Handle unused-only-live stmts in SLP discovery

2024-09-04 Thread Palmer Dabbelt


On Wed, 04 Sep 2024 04:10:52 PDT (-0700), rguent...@suse.de wrote:

The following adds SLP discovery for roots that are only live but
otherwise unused.  These are usually inductions.  This allows a
few more testcases to be handled fully with SLP, for example
gcc.dg/vect/no-scevccp-pr86725-1.c

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

* tree-vect-slp.cc (vect_analyze_slp): Analyze SLP for live
but otherwise unused defs.
---
 gcc/tree-vect-slp.cc | 30 ++
 1 file changed, 30 insertions(+)


Are you putting the "RISC-V" in there just to kick the CI into running 
it?  If so you can also just CC  (or trip 
anything that matches the filter at [1]).  No big deal on my end, just 
worried non-RISC-V people are going to see the tag and think this is 
RISC-V-only and thus ignore it.


If you're looking for a RISC-V reviewer, I don't really know this stuff 
well enough to say much here.  Robin would probably be the best bet...


[1]: 
https://github.com/patrick-rivos/riscv-gnu-toolchain/blob/1496f76a9ad4081c0afdde8f7f8ffb22573a1789/scripts/create_patches_files.py#L89


diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 41bc92b138a..91d6927016d 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4704,6 +4704,36 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  saved_stmts.release ();
}
}
+
+  /* Make sure to vectorize only-live stmts, usually inductions.  */
+  for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
+   for (auto gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi);
+gsi_next (&gsi))
+ {
+   gphi *lc_phi = *gsi;
+   tree def = gimple_phi_arg_def_from_edge (lc_phi, e);
+   stmt_vec_info stmt_info;
+   if (TREE_CODE (def) == SSA_NAME
+   && !virtual_operand_p (def)
+   && (stmt_info = loop_vinfo->lookup_def (def))
+   && STMT_VINFO_RELEVANT (stmt_info) == vect_used_only_live
+   && STMT_VINFO_LIVE_P (stmt_info)
+   && (STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def
+   || (STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
+   && STMT_VINFO_REDUC_IDX (stmt_info) == -1)))
+ {
+   vec stmts;
+   vec roots = vNULL;
+   vec remain = vNULL;
+   stmts.create (1);
+   stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
+   vect_build_slp_instance (vinfo,
+slp_inst_kind_reduc_group,
+stmts, roots, remain,
+max_tree_size, &limit,
+bst_map, NULL);
+ }
+ }
 }

   hash_set visited_patterns;

Re: Zen5 tuning part 2: disable gather and scatter

2024-09-04 Thread Toon Moene


On 9/4/24 12:55, Jan Hubicka wrote:


On 9/3/24 15:07, Jan Hubicka wrote:


Hi,
We disable gathers for zen4.  It seems that gather has improved a bit compared
to zen4 and Zen5 optimization manual suggests "Avoid GATHER instructions when
the indices are known ahead of time. Vector loads followed by shuffles result
in a higher load bandwidth." however the situation seems to be more
complicated.


A small bit of "real world" experience (but for zen3):

Recently I switched to gfortran 14.2 for my weather forecasting.
A year ago I had changed "-march=native -mtune=native" (on my zen3 system)
to "-march=native -mtune=znver2" while using gfortran 13 - it had only a
small effect (but positive).

Last Monday I switched back to "-march=native -mtune=native", but that
consistently made a 12 hour computation around 6 minutes slower (i.e., about
1/120th, or 0.8 %). The most computationally intensive part of the code needs
gather (either instructions or inline expansions of them).


It would be nice to know what is causing this. Gathers can be enabled
using -mtune-ctrl=use_gather and I would be happy to know about real
world situations where they help.


Ah - one detail that I forgot to mention: our code is "special" in the 
sense that it uses 32-bit floats while it runs on 64-bit address space.


So its use of gather instructions is rather suboptimal, needing 2 gather 
instructions for each actual "gather operation".


Hope this helps,

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands

Re: [PATCH] c++: ICE with TTP [PR96097]

2024-09-04 Thread Jason Merrill


On 9/3/24 6:12 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?


The change to return bool seems like unrelated cleanup; please push that 
separately on trunk only.



+ /* We can also have:
+
+ template  typename X>
+ void func() {}
+ template 
+ struct Y {};
+ void g() { func(); }
+
+where we are not in a template, but the type of PARM is T::type
+and dependent_type_p doesn't want to see a TEMPLATE_TYPE_PARM
+outside a template.  */
+ ++processing_template_decl;
  tree t = tsubst (TREE_TYPE (parm), outer_args, complain, in_decl);
+ --processing_template_decl;
  if (!uses_template_parms (t)
  && !same_type_p (t, TREE_TYPE (arg)))


This looks like the pattern Patrick just removed from 
type_unification_real for PR101463.  Do we want to make the same change 
here?


Jason

Re: [PATCH] c++: noexcept and pointer to member function type [PR113108]

2024-09-04 Thread Jason Merrill


On 9/3/24 2:47 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?


OK.


-- >8 --
We ICE in nothrow_spec_p because it got a DEFERRED_NOEXCEPT.
This DEFERRED_NOEXCEPT was created in implicitly_declare_fn
when declaring

   Foo& operator=(Foo&&) = default;

in the test.  The problem is that in resolve_overloaded_unification
we call maybe_instantiate_noexcept before try_one_overload only in
the TEMPLATE_ID_EXPR case.

PR c++/113108

gcc/cp/ChangeLog:

* pt.cc (resolve_overloaded_unification): Call
maybe_instantiate_noexcept.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/noexcept-type28.C: New test.
---
  gcc/cp/pt.cc |  2 ++
  gcc/testsuite/g++.dg/cpp1z/noexcept-type28.C | 18 ++
  2 files changed, 20 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/noexcept-type28.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 024fa8a5529..747e627f547 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -23787,6 +23787,8 @@ resolve_overloaded_unification (tree tparms,
  for (lkp_iterator iter (arg); iter; ++iter)
{
tree fn = *iter;
+   if (flag_noexcept_type)
+ maybe_instantiate_noexcept (fn, tf_none);
if (try_one_overload (tparms, targs, tempargs, parm, TREE_TYPE (fn),
  strict, sub_strict, addr_p, explain_p)
&& (!goodfn || !decls_match (goodfn, fn)))
diff --git a/gcc/testsuite/g++.dg/cpp1z/noexcept-type28.C b/gcc/testsuite/g++.dg/cpp1z/noexcept-type28.C
new file mode 100644
index 00000000000..bf0b927b8ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/noexcept-type28.C
@@ -0,0 +1,18 @@
+// PR c++/113108
+// { dg-do compile { target c++17 } }
+
+template 
+struct Foo {
+Foo& operator=(Foo&&) = default;
+T data;
+};
+
+template 
+void consume(Foo& (Foo::*)(Foo&&) ) {}
+
+template 
+void consume(Foo& (Foo::*)(Foo&&) noexcept) {}
+
+int main() {
+consume(&Foo::operator=);
+}

base-commit: f0ab3de6ec0e3540f2e57f3f5628005f0a4e3fa5

[pushed] c++: add a testcase for [PR 108620]

2024-09-04 Thread Arsen Arsenović

Pushed as obvious.
-- >8 --
Fixed by r15-2540-g32e678b2ed7521.  Add a testcase, as the original ones
do not cover this particular failure mode.

gcc/testsuite/ChangeLog:

PR c++/108620
* g++.dg/coroutines/pr108620.C: New test.
---
 gcc/testsuite/g++.dg/coroutines/pr108620.C | 95 ++
 1 file changed, 95 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr108620.C

diff --git a/gcc/testsuite/g++.dg/coroutines/pr108620.C b/gcc/testsuite/g++.dg/coroutines/pr108620.C
new file mode 100644
index 000000000000..e8016b9f8a23
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr108620.C
@@ -0,0 +1,95 @@
+// https://gcc.gnu.org/PR108620
+#include 
+#include 
+#include 
+
+template
+struct task;
+
+template 
+struct task_private_data {
+  inline task_private_data() noexcept : data_(nullptr) {}
+  inline task_private_data(PrivateDataType* input) noexcept : data_(input) {}
+  inline task_private_data(task_private_data&& other) noexcept = default;
+  inline task_private_data& operator=(task_private_data&&) noexcept = default;
+  inline task_private_data(const task_private_data&) = delete;
+  inline task_private_data& operator=(const task_private_data&) = delete;
+  inline ~task_private_data() {}
+
+  inline bool await_ready() const noexcept { return true; }
+  inline PrivateDataType* await_resume() const noexcept { return data_; }
+  inline void await_suspend(std::coroutine_handle<>) noexcept {}
+
+  PrivateDataType* data_;
+};
+
+template
+struct task_context {
+PrivateDataType data_;
+};
+
+template
+struct task {
+using self_type = task;
+std::shared_ptr> context_;
+
+    task(const std::shared_ptr>& input): context_(input) {}
+
+    static auto yield_private_data() noexcept { return task_private_data{}; }
+
+struct promise_type {
+  std::shared_ptr> context_;
+
+  template
+  promise_type(Input&& input, Rest&&...) {
+context_ = std::make_shared>();
+context_->data_ = std::forward(input);
+  }
+
+  auto get_return_object() noexcept { return self_type{context_}; }
+  std::suspend_never initial_suspend() noexcept { return {}; }
+  std::suspend_never final_suspend() noexcept { return {}; }
+  void unhandled_exception() { throw; }
+
+  template
+  void return_value(ReturnType&&) {}
+
+  template 
+  inline task_private_data yield_value(
+  task_private_data&& input) noexcept {
+input.data_ = &context_->data_;
+return task_private_data(input.data_);
+  }
+};
+};
+
+template
+task call1(TArg&& arg, OutputType& output) {
+OutputType* ptr = co_yield task::yield_private_data();
+output = *ptr;
+co_return 0;
+}
+
+
+struct container {
+std::string* ptr;
+};
+
+template
+task call2(TArg&& arg, container& output) {
+output.ptr = co_yield task::yield_private_data();
+co_return 0;
+}
+
+int main() {
+  // success
+  std::string output1;
+  call1(std::string("hello1"), output1);
+  std::cout<< "output1: "<< output1<< std::endl;
+
+  // crash
+  container output2;
+  auto task2 = call2(std::string("hello2"), output2);
+  std::cout<< "output2: "<< *output2.ptr<< std::endl;
+  return 0;
+}
-- 
2.46.0

Re: [PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-04 Thread Marek Polacek

On Wed, Sep 04, 2024 at 08:15:25AM -0400, Jason Merrill wrote:
> Tested x86_64-pc-linux-gnu.  Any objections?

Looks good except...
 
> +/* Attributes also recognized in the clang:: namespace.  */
> +const struct attribute_spec c_common_clang_attributes[] = {
> +  { "flag_enum",   0, 0, false, true, false, false,
> +   handle_flag_enum_attribute, NULL }
> +};
> +
> +const struct scoped_attribute_specs c_common_clang_attribute_table =
> +{
> +  "clang", { c_common_clang_attributes }
> +};
> +
>  /* Give the specifications for the format attributes, used by C and all
> descendants.
>  
> @@ -5017,6 +5031,25 @@ handle_fd_arg_attribute (tree *node, tree name, tree args,
>return NULL_TREE;
>  }
>  
> +/* Handle the "flag_enum" attribute.  */
> +
> +static tree
> +handle_flag_enum_attribute (tree *node, tree ARG_UNUSED(name), tree args,
> + int ARG_UNUSED (flags), bool *no_add_attrs)
> +{
> +  if (args)
> +warning (OPT_Wattributes, "%qE attribute arguments ignored", name);

You don't need this check I think; if the # of args isn't correct, we
should not get here.  Then the goto can...go too.

I see that, like clang++, we accept the attribute on scoped enums too.

Marek
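
For readers unfamiliar with the warning under discussion, here is a minimal
sketch of the situation the attribute targets. The enum, its enumerators and
describe () are made up for illustration and are not taken from the patch or
its testcase; a compiler that does not know the attribute may warn about it
and ignore it.

```cpp
#include <string_view>

// Hypothetical flags-style enum, not from the patch.
enum [[clang::flag_enum]] Perm { Read = 1, Write = 2, Exec = 4 };

std::string_view describe (Perm p)
{
  switch (p)
    {
    case Read:  return "read";
    case Write: return "write";
    case Exec:  return "exec";
    // 3 is not an enumerator of Perm.  Without flag_enum, this is the kind
    // of label -Wswitch complains about.
    case Read | Write: return "read+write";
    default: return "other";
    }
}
```

Calling describe (static_cast<Perm> (Read | Write)) takes the combined case.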

Re: [PATCH] c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-04 Thread Jason Merrill


On 9/2/24 1:49 PM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled, because
get_member_function_from_ptrfunc
emits something like
(((FUNCTION.__pfn & 1) != 0)
  ? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
  : FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
or so, so FUNCTION tree is used there 5 times.  There is
if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
but in this case function doesn't have side-effects, just nested ARRAY_REFs.
Now, if all the FUNCTION trees would be shared, it would work fine,
FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
that isn't the case, both the BIT_AND_EXPR shortening and conversion to
bool done for build_conditional_expr actually unshare_expr that first
expression, but none of the other 4 are unshared.  With -fsanitize=bounds,
.UBSAN_BOUNDS calls are added to the ARRAY_REFs and use SAVE_EXPRs to avoid
evaluating the argument multiple times, but because that FUNCTION tree is
first used in the second argument of COND_EXPR (i.e. conditionally), the
SAVE_EXPR initialization is done just there and then the third argument
of COND_EXPR just uses the uninitialized temporary and so does the first
argument computation as well.


If there are SAVE_EXPRs in FUNCTION, why is TREE_SIDE_EFFECTS false?


The following patch fixes it by also unsharing the trees that end up
in the third COND_EXPR argument and unsharing what is passed as the first
argument too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Guess another possibility would be to somehow arrange for the BIT_AND_EXPR
and NE_EXPR not to do unshare_expr (but that would mean build2 most likely
rather than cp_build_binary_op), or force FUNCTION into a SAVE_EXPR whenever
it isn't say a tcc_declaration or tcc_constant even if it doesn't have
side-effects (but that would mean creating the SAVE_EXPR by hand) or create
a TARGET_EXPR instead (force_target_expr).


It seems desirable to only do the bounds-checking once.


2024-09-02  Jakub Jelinek  

PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Call unshare_expr
on *instance_ptrptr and e3.

* g++.dg/ubsan/pr116449.C: New test.

--- gcc/cp/typeck.cc.jj 2024-07-26 08:34:18.117159928 +0200
+++ gcc/cp/typeck.cc 2024-09-02 12:35:54.254380135 +0200
@@ -4255,8 +4255,11 @@ get_member_function_from_ptrfunc (tree *
/* ...and then the delta in the PMF.  */
instance_ptr = fold_build_pointer_plus (instance_ptr, delta);
  
-  /* Hand back the adjusted 'this' argument to our caller.  */
-  *instance_ptrptr = instance_ptr;
+  /* Hand back the adjusted 'this' argument to our caller.
+The e1 computation unfortunately can result in unshare_expr
+and we need to make sure the delta tree isn't first evaluated
+in the COND_EXPR branch.  */
+  *instance_ptrptr = unshare_expr (instance_ptr);
  
if (nonvirtual)
/* Now just return the pointer.  */
@@ -4283,6 +4286,9 @@ get_member_function_from_ptrfunc (tree *
 cp_build_addr_expr (e2, complain));
  
e2 = fold_convert (TREE_TYPE (e3), e2);
+  /* As e1 computation can result in unshare_expr, make sure e3 isn't
+shared with the e2 trees.  */
+  e3 = unshare_expr (e3);
e1 = build_conditional_expr (input_location, e1, e2, e3, complain);
if (e1 == error_mark_node)
return error_mark_node;
--- gcc/testsuite/g++.dg/ubsan/pr116449.C.jj 2024-09-02 12:34:18.545629851 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr116449.C 2024-09-02 12:31:49.070581617 +0200
@@ -0,0 +1,14 @@
+// PR c++/116449
+// { dg-do compile }
+// { dg-options "-O2 -Wall -fsanitize=undefined" }
+
+struct C { void foo (int); void bar (); int c[16]; };
+typedef void (C::*P) ();
+struct D { P d; };
+static D e[1] = { { &C::bar } };
+
+void
+C::foo (int x)
+{
+  (this->*e[c[x]].d) ();
+}

Jakub
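
A runnable variant of the testcase shape may help when reproducing this
locally. It is only a sketch: the call counter, the in-class initializers and
run_once () are additions for illustration and are not part of pr116449.C.
The expression e[c[x]].d is the FUNCTION operand described above, made of
nested ARRAY_REFs with no side effects.

```cpp
// Sketch based on the pr116449.C shape; 'calls' and run_once () are
// illustrative additions.
struct C
{
  void foo (int x);
  void bar () { ++calls; }
  int c[16] = {};
  int calls = 0;
};
typedef void (C::*P) ();
struct D { P d; };
static D e[1] = { { &C::bar } };

void
C::foo (int x)
{
  // Call through a pointer to member function selected by nested array
  // references.
  (this->*e[c[x]].d) ();
}

// Returns how many times bar () ran for a single foo (0) call.
int
run_once ()
{
  C obj;
  obj.foo (0);
  return obj.calls;
}
```

A correctly compiled program calls bar () exactly once per foo () call.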

Re: [PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Martin Storsjö


On Wed, 4 Sep 2024, Evgeny Karpov wrote:


Monday, September 4, 2024
Martin Storsjö  wrote:


Let's consider the following example, when symbol is located at 3072.

1. Example without the fix
compilation time
adrp        x0, (3072 + 256) & ~0xFFF // x0 = 0
add         x0, x0, (3072 + 256) & 0xFFF // x0 = 3328

linking time when symbol is relocated with offset 896
adrp        x0, (0 + 896) & ~0xFFF // x0 = 0


Why did the 3072 suddenly become 0 here?


The test case which will be compiled.

adrp x0, symbol + 256
add  x0, x0, symbol + 256

The numbers which are presented in the example help to clarify relocation steps.
symbol is located at 3072.

compilation time
adrp x0, symbol + 256
90000000 adrp x0, 0


This is your first error.

As the symbol offset is 256, you will need to encode the offset "256" in 
the instruction immediate field. Not "256 >> 12". This is the somewhat 
non-obvious part here, but this is the only way symbol offsets can work. 
This is how MS tools handle immediates in IMAGE_REL_ARM64_PAGEBASE_REL21, 
and LLVM has replicated this bit.


See 
https://github.com/llvm/llvm-project/commit/0b7bf7a2e3cb34086d6a05419319fd35ae8dd9a8#diff-502793e1256bca6339a09f5756111a947a2aeb5c600cdd22b2e1679db5ec48b0R162 
for the case where I implemented this bit in LLVM.



add  x0, x0, symbol + 256
91340000 add x0, x0, 3328

linking time when symbol is relocated with offset 896
compiled  90000000 adrp x0, 0
relocated 90000000 adrp x0, 0 // without change
((0 << 12) + 896) >> 12 = 0 // relocation calculation


This is the wrong calculation for how to apply a 
IMAGE_REL_ARM64_PAGEBASE_REL21 relocation.


If the instruction in the object file has the immediate obj_imm, and the 
instruction is at address instr_addr, the linker should update the 
instruction, setting the immediate to ((symbol_addr + obj_imm) >> 12) - 
(instr_addr >> 12).


See 
https://github.com/llvm/llvm-project/commit/38608c0975772513007ec08116a1a3fb6160722b 
for how this was implemented in LLD.


// Martin
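
The page calculation described in this message can be written out as a small
sketch. The function name and parameter names are hypothetical, and the
addresses in the usage note are illustrative rather than taken from a real
object file.

```cpp
#include <cstdint>

// Hypothetical helper.  obj_imm is the byte offset stored in the ADRP
// immediate field of the object file, symbol_addr the final symbol address
// and instr_addr the address of the ADRP instruction.
std::int64_t
pagebase_rel21 (std::uint64_t symbol_addr, std::uint64_t obj_imm,
                std::uint64_t instr_addr)
{
  // Add the offset in bytes first, then compare 4 KiB page numbers.
  std::uint64_t target = symbol_addr + obj_imm;
  return static_cast<std::int64_t> (target >> 12)
         - static_cast<std::int64_t> (instr_addr >> 12);
}
```

With a symbol at 3968, an offset of 256 and the instruction at address 0, the
target 4224 lies in page 1, so the result is 1. If the offset is dropped (or
pre-shifted to 0) before the addition, the result is 0, which points at the
wrong page.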

Re: [PATCH] c++: Add missing auto_diagnostic_groups

2024-09-04 Thread Jason Merrill


On 9/2/24 7:43 AM, Nathaniel Shead wrote:

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659796.html


OK.


For clarity's sake, here's the full patch with the adjustment I
mentioned earlier:

-- >8 --

This patch goes through all .cc files in gcc/cp and adds in any
auto_diagnostic_groups that seem to be missing by looking for any
'inform' calls that aren't grouped with their respective error/warning.
Now with SARIF output support this seems to be a bit more important.

The patch isn't complete; I've tried to also track helper functions used
for diagnostics to group them, but some may have been missed.
Additionally there are a few functions that are definitely missing
groupings but I wasn't able to see an obvious way to add them without
potentially grouping together unrelated messages.

This list includes:

- lazy_load_{binding,pendings} "during load of {binding,pendings} for"
- cp_finish_decomp "in initialization of structured binding variable"
- require_deduced_type "using __builtin_source_location"
- convert_nontype_argument "in template argument for type %qT"
- coerce_template_params "so any instantiation with a non-empty parameter pack"
- tsubst_default_argument "when instantiating default argument"
- invalid_nontype_parm_type_p "invalid template non-type parameter"

gcc/cp/ChangeLog:

* class.cc (add_method): Add missing auto_diagnostic_group.
(handle_using_decl): Likewise.
(maybe_warn_about_overly_private_class): Likewise.
(check_field_decl): Likewise.
(check_field_decls): Likewise.
(resolve_address_of_overloaded_function): Likewise.
(note_name_declared_in_class): Likewise.
* constraint.cc (associate_classtype_constraints): Likewise.
(diagnose_trait_expr): Clean up whitespace.
* coroutines.cc (find_coro_traits_template_decl): Add missing
auto_diagnostic_group.
(coro_promise_type_found_p): Likewise.
(coro_diagnose_throwing_fn): Likewise.
* cvt.cc (build_expr_type_conversion): Likewise.
* decl.cc (validate_constexpr_redeclaration): Likewise.
(duplicate_function_template_decls): Likewise.
(duplicate_decls): Likewise.
(lookup_label_1): Likewise.
(check_previous_goto_1): Likewise.
(check_goto_1): Likewise.
(make_typename_type): Likewise.
(make_unbound_class_template): Likewise.
(check_tag_decl): Likewise.
(start_decl): Likewise.
(maybe_commonize_var): Likewise.
(check_for_uninitialized_const_var): Likewise.
(reshape_init_class): Likewise.
(check_initializer): Likewise.
(cp_finish_decl): Likewise.
(find_decomp_class_base): Likewise.
(cp_finish_decomp): Likewise.
(expand_static_init): Likewise.
(grokfndecl): Likewise.
(grokdeclarator): Likewise.
(check_elaborated_type_specifier): Likewise.
(lookup_and_check_tag): Likewise.
(xref_tag): Likewise.
(cxx_simulate_enum_decl): Likewise.
(finish_function): Likewise.
* decl2.cc (check_classfn): Likewise.
(record_mangling): Likewise.
(mark_used): Likewise.
* error.cc (qualified_name_lookup_error): Likewise.
* except.cc (build_throw): Likewise.
* init.cc (get_nsdmi): Likewise.
(diagnose_uninitialized_cst_or_ref_member_1): Likewise.
(warn_placement_new_too_small): Likewise.
(build_new_1): Likewise.
(build_vec_delete_1): Likewise.
(build_delete): Likewise.
* lambda.cc (add_capture): Likewise.
(add_default_capture): Likewise.
* lex.cc (unqualified_fn_lookup_error): Likewise.
* method.cc (synthesize_method): Likewise.
(defaulted_late_check): Likewise.
* module.cc (trees_in::is_matching_decl): Likewise.
(trees_in::read_enum_def): Likewise.
(module_state::check_not_purview): Likewise.
(module_state::deferred_macro): Likewise.
(module_state::read_config): Likewise.
(module_state::check_read): Likewise.
(declare_module): Likewise.
(init_modules): Likewise.
* name-lookup.cc (diagnose_name_conflict): Likewise.
(lookup_using_decl): Likewise.
(set_decl_namespace): Likewise.
(finish_using_directive): Likewise.
(push_namespace): Likewise.
(add_imported_namespace): Likewise.
* parser.cc (cp_parser_check_for_definition_in_return_type): Likewise.
(cp_parser_userdef_numeric_literal): Likewise.
(cp_parser_nested_name_specifier_opt): Likewise.
(cp_parser_new_expression): Likewise.
(cp_parser_binary_expression): Likewise.
(cp_parser_lambda_introducer): Likewise.
(cp_parser_module_declaration): Likewise.
(cp_parser_import_declaration): Likewise, removing gotos to
support this.
(cp_parser_declaration): Add missing auto_diagnostic_group.
(cp_par

Re: Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN' (was: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2024-09-04 Thread David Malcolm

On Fri, 2024-06-28 at 15:06 +0200, Thomas Schwinge wrote:
> Hi!
> 
> As part of this:
> 
> On 2013-07-26T11:04:33-0400, David Malcolm 
> wrote:
> > This patch is the hand-written part of the conversion of passes
> > from
> > C structs to C++ classes.
> 
> > --- a/gcc/passes.c
> > +++ b/gcc/passes.c
> 
> ..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to always
> refer
> to the first instance of 'PASS':
> 
> >  #define PUSH_INSERT_PASSES_WITHIN(PASS) \
> >    { \
> > -    struct opt_pass **p = &(PASS).pass.sub;
> > +    struct opt_pass **p = &(PASS ## _1)->sub;
> 
> ..., however we did change 'NEXT_PASS(PASS, NUM)' to actually use
> 'NUM':
> 
> > -#define NEXT_PASS(PASS, NUM)  (p = next_pass_1 (p,
> > &((PASS).pass)))
> > +#define NEXT_PASS(PASS, NUM) \
> > +  do { \
> > +    gcc_assert (NULL == PASS ## _ ## NUM); \
> > +    if ((NUM) == 1)  \
> > +  PASS ## _1 = make_##PASS (ctxt_);  \
> > +    else \
> > +  {  \
> > +    gcc_assert (PASS ## _1); \
> > +    PASS ## _ ## NUM = PASS ## _1->clone (); \
> > +  }  \
> > +    p = next_pass_1 (p, PASS ## _ ## NUM);  \
> > +  } while (0)
> 
> This was never re-synchronized later on, and is problematic if you
> try to
> do something like this; change:
> 
>     [...]
>     NEXT_PASS (pass_postreload);
>     PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>     NEXT_PASS (pass_postreload_cse);
>     [...]
>     NEXT_PASS (pass_cprop_hardreg);
>     NEXT_PASS (pass_fast_rtl_dce);
>     NEXT_PASS (pass_reorder_blocks);
>     [...]
>     POP_INSERT_PASSES ()
>     [...]
> 
> ... into:
> 
>     [...]
>     NEXT_PASS (pass_postreload);
>     PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>     NEXT_PASS (pass_postreload_cse);
>     [...]
>     NEXT_PASS (pass_cprop_hardreg);
>     POP_INSERT_PASSES ()
>     NEXT_PASS (pass_fast_rtl_dce);
>     NEXT_PASS (pass_postreload);
>     PUSH_INSERT_PASSES_WITHIN (pass_postreload)
>     NEXT_PASS (pass_reorder_blocks);
>     [...]
>     POP_INSERT_PASSES ()
>     [...]
> 
> That is, interrupt the pass pipeline within 'pass_postreload', in
> order
> to unconditionally run 'pass_fast_rtl_dce' even if not running
> 'pass_postreload'.  What happens is that the second
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' overwrites the first
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' instead of applying to
> the
> second (preceding) 'NEXT_PASS (pass_postreload);'.
> 
> (I ran into this in context of what I tried in
> 
> "nvptx vs. [PATCH] Add a late-combine pass [PR106594]"; discuss that
> specific use case over there, not here.)
> 
> OK to address this with the attached
> "Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'"?
> 
> This depends on
> 
> "Rewrite usage comment at the top of 'gcc/passes.def'" to avoid
> running
> into the 'ERROR: Can't locate [...]' that I'm adding, while
> processing
> the 'PUSH_INSERT_PASSES_WITHIN (PASS)' in the usage comment at the
> top of
> 'gcc/passes.def', where 'NEXT_PASS (PASS)' only appears later.  ;-)
> 
> I've verified this does the expected thing for the main
> 'gcc/passes.def',
> and that 'PUSH_INSERT_PASSES_WITHIN' is not used/not applicable for
> 'PASSES_EXTRA' ('gcc/config/*/*-passes.def').

Thanks; patch LGTM.

Dave
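
For anyone skimming the thread, the token-pasting difference can be shown
with a toy sketch. Everything here is hypothetical: toy_pass stands in for
opt_pass, the two macro names are invented, and the NUM-aware variant is only
a guess at the shape of the attached patch, which is not shown in this
message.

```cpp
// Toy stand-ins; the real opt_pass class and pass objects live in GCC.
struct toy_pass { toy_pass *sub = nullptr; };

static toy_pass first_instance, second_instance;
static toy_pass *pass_postreload_1 = &first_instance;
static toy_pass *pass_postreload_2 = &second_instance;

// Shape of the 2013 macro body: always pastes "_1", whichever instance
// the preceding NEXT_PASS created.
#define SUB_LIST_OF_FIRST(PASS) (&(PASS ## _1)->sub)

// Hypothetical NUM-aware variant, pasting PASS ## _ ## NUM the same way
// NEXT_PASS does.
#define SUB_LIST_OF(PASS, NUM) (&(PASS ## _ ## NUM)->sub)
```

SUB_LIST_OF_FIRST (pass_postreload) always yields the sub list of the first
instance, while SUB_LIST_OF (pass_postreload, 2) reaches the second one.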

Re: [PATCH] c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-04 Thread Jakub Jelinek

On Wed, Sep 04, 2024 at 11:06:22AM -0400, Jason Merrill wrote:
> On 9/2/24 1:49 PM, Jakub Jelinek wrote:
> > Hi!
> > 
> > The following testcase is miscompiled, because
> > get_member_function_from_ptrfunc
> > emits something like
> > (((FUNCTION.__pfn & 1) != 0)
> >   ? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
> >   : FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
> > or so, so FUNCTION tree is used there 5 times.  There is
> > if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
> > but in this case function doesn't have side-effects, just nested ARRAY_REFs.
> > Now, if all the FUNCTION trees would be shared, it would work fine,
> > FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
> > that isn't the case, both the BIT_AND_EXPR shortening and conversion to
> > bool done for build_conditional_expr actually unshare_expr that first
> > expression, but none of the other 4 are unshared.  With -fsanitize=bounds,
> > .UBSAN_BOUNDS calls are added to the ARRAY_REFs and use SAVE_EXPRs to avoid
> > evaluating the argument multiple times, but because that FUNCTION tree is
> > first used in the second argument of COND_EXPR (i.e. conditionally), the
> > SAVE_EXPR initialization is done just there and then the third argument
> > of COND_EXPR just uses the uninitialized temporary and so does the first
> > argument computation as well.
> 
> If there are SAVE_EXPRs in FUNCTION, why is TREE_SIDE_EFFECTS false?

They aren't there when this function is called, they are added only during
cp_genericize when instrumenting the ARRAY_REFs with UBSAN.

And unlike this function, e.g. ubsan_instrument_bounds just calls save_expr
on the index and not if (TREE_SIDE_EFFECTS (index)) index = save_expr
(index).

> It seems desirable to only do the bounds-checking once.

So, one possibility would be to call save_expr unconditionally in
get_member_function_from_ptrfunc as well.

Or build a TARGET_EXPR (force_target_expr or similar).

Jakub

Re: [PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Martin Storsjö


On Wed, 4 Sep 2024, Martin Storsjö wrote:


On Wed, 4 Sep 2024, Evgeny Karpov wrote:


Monday, September 4, 2024
Martin Storsjö  wrote:


Let's consider the following example, when symbol is located at 3072.

1. Example without the fix
compilation time
adrp        x0, (3072 + 256) & ~0xFFF // x0 = 0
add         x0, x0, (3072 + 256) & 0xFFF // x0 = 3328

linking time when symbol is relocated with offset 896
adrp        x0, (0 + 896) & ~0xFFF // x0 = 0


Why did the 3072 suddenly become 0 here?


The test case which will be compiled.

adrp x0, symbol + 256
add  x0, x0, symbol + 256

The numbers which are presented in the example help to clarify 
relocation steps.

symbol is located at 3072.

compilation time
adrp x0, symbol + 256
90000000 adrp x0, 0


This is your first error.

As the symbol offset is 256, you will need to encode the offset "256" in 
the instruction immediate field. Not "256 >> 12". This is the somewhat 
non-obvious part here, but this is the only way symbol offsets can work. 
This is how MS tools handle immediates in IMAGE_REL_ARM64_PAGEBASE_REL21, 
and LLVM has replicated this bit.


See 
https://github.com/llvm/llvm-project/commit/0b7bf7a2e3cb34086d6a05419319fd35ae8dd9a8#diff-502793e1256bca6339a09f5756111a947a2aeb5c600cdd22b2e1679db5ec48b0R162 
for the case where I implemented this bit in LLVM.


To show this in action:

$ cat adrp.s
adrp x0, symbol  + 256
add x0, x0, :lo12:symbol  + 256
$ clang -target aarch64-windows -c adrp.s
$ llvm-objdump -d -r adrp.o

adrp.o: file format coff-arm64

Disassembly of section .text:

0000000000000000 <.text>:
       0: 90000800  adrp x0, 0x100000 <.text+0x100000>
		0000000000000000:  IMAGE_REL_ARM64_PAGEBASE_REL21   symbol
       4: 91040000  add x0, x0, #0x100
		0000000000000004:  IMAGE_REL_ARM64_PAGEOFFSET_12A   symbol

The disassembly tool doesn't interpret the immediate correctly (it's not 
0x100000, it's 0x100), but the opcode and relocation info is correct.


// Martin
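
As a cross-check on the opcodes, the immediate fields can be packed by hand.
This is a sketch based on the A64 encodings of ADRP and ADD (immediate); the
helper names are hypothetical. An ADRP x0 carrying 256 in its immediate field
comes out as 0x90000800, and an ADD x0, x0, #0x100 as 0x91040000.

```cpp
#include <cstdint>

// ADRP: bit 31 set, immlo in bits 30:29, 0b10000 in bits 28:24,
// immhi in bits 23:5, Rd in bits 4:0.
constexpr std::uint32_t
encode_adrp (unsigned rd, std::uint32_t imm21)
{
  return 0x90000000u
         | ((imm21 & 0x3u) << 29)
         | (((imm21 >> 2) & 0x7FFFFu) << 5)
         | (rd & 0x1Fu);
}

// ADD (immediate), 64-bit, unshifted: imm12 in bits 21:10, Rn in bits 9:5,
// Rd in bits 4:0.
constexpr std::uint32_t
encode_add_imm (unsigned rd, unsigned rn, std::uint32_t imm12)
{
  return 0x91000000u
         | ((imm12 & 0xFFFu) << 10)
         | ((rn & 0x1Fu) << 5)
         | (rd & 0x1Fu);
}
```

The raw field value 256 is what sits in the object file; a disassembler that
treats it as a page count prints it shifted left by 12.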

Re: [PATCH RFC] c-family: add attribute flag_enum [PR46457]

2024-09-04 Thread Eric Gallager

On Wed, Sep 4, 2024 at 8:18 AM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu.  Any objections?
>
> -- 8< --
>
> Several PRs complain about -Wswitch warning about a case for a bitwise
> combination of enumerators.  Clang has an attribute flag_enum to prevent
> this; let's adopt that approach as well.
>
> This also recognizes the attribute as [[clang::flag_enum]], introducing
> handling of the clang attribute namespace.
>
> PR c++/46457

Question about PR tagging: should PR c++/81665 be tagged here, too?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81665

>
> gcc/c-family/ChangeLog:
>
> * c-attribs.cc (handle_flag_enum_attribute): New.
> (c_common_gnu_attributes): Add it.
> (c_common_clang_attributes, c_common_clang_attribute_table): New.
> * c-common.h: Declare c_common_clang_attribute_table.
> * c-warn.cc (c_do_switch_warnings): Handle flag_enum.
>
> gcc/c/ChangeLog:
>
> * c-objc-common.h (c_objc_attribute_table): Add
> c_common_clang_attribute_table.
>
> gcc/cp/ChangeLog:
>
> * cp-objcp-common.h (cp_objcp_attribute_table): Add
> c_common_clang_attribute_table.
>
> gcc/testsuite/ChangeLog:
>
> * c-c++-common/attr-flag-enum-1.c: New test.
>
> gcc/ChangeLog:
>
> * doc/extend.texi: Document flag_enum attribute.
> * doc/invoke.texi: Mention flag_enum in -Wswitch.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/regex_constants.h: Use flag_enum.
> ---
>  gcc/doc/extend.texi   |  7 
>  gcc/doc/invoke.texi   | 11 +++---
>  gcc/c-family/c-common.h   |  1 +
>  gcc/c/c-objc-common.h |  1 +
>  gcc/cp/cp-objcp-common.h  |  1 +
>  libstdc++-v3/include/bits/regex_constants.h   |  2 +-
>  gcc/c-family/c-attribs.cc | 33 +
>  gcc/c-family/c-warn.cc|  4 ++
>  gcc/testsuite/c-c++-common/attr-flag-enum-1.c | 37 +++
>  9 files changed, 91 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/attr-flag-enum-1.c
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index 5845bcedf6e..5b9d8c51059 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -9187,6 +9187,13 @@ initialization will result in future breakage.
>  GCC emits warnings based on this attribute by default; use
>  @option{-Wno-designated-init} to suppress them.
>
> +@cindex @code{flag_enum} type attribute
> +@item flag_enum
> +This attribute may be applied to an enumerated type to indicate that
> +its enumerators are used in bitwise operations, so e.g. @option{-Wswitch}
> +should not warn about a @code{case} that corresponds to a bitwise
> +combination of enumerators.
> +
>  @cindex @code{hardbool} type attribute
>  @item hardbool
>  @itemx hardbool (@var{false_value})
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 43afb0984e5..7c6175efbc0 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -7672,9 +7672,9 @@ unless C++14 mode (or newer) is active.
>  Warn whenever a @code{switch} statement has an index of enumerated type
>  and lacks a @code{case} for one or more of the named codes of that
>  enumeration.  (The presence of a @code{default} label prevents this
> -warning.)  @code{case} labels outside the enumeration range also
> -provoke warnings when this option is used (even if there is a
> -@code{default} label).
> +warning.)  @code{case} labels that do not correspond to enumerators also
> +provoke warnings when this option is used, unless the enumeration is marked
> +with the @code{flag_enum} attribute.
>  This warning is enabled by @option{-Wall}.
>
>  @opindex Wswitch-default
> @@ -7688,8 +7688,9 @@ case.
>  @item -Wswitch-enum
>  Warn whenever a @code{switch} statement has an index of enumerated type
>  and lacks a @code{case} for one or more of the named codes of that
> -enumeration.  @code{case} labels outside the enumeration range also
> -provoke warnings when this option is used.  The only difference
> +enumeration.  @code{case} labels that do not correspond to enumerators also
> +provoke warnings when this option is used, unless the enumeration is marked
> +with the @code{flag_enum} attribute.  The only difference
>  between @option{-Wswitch} and this option is that this option gives a
>  warning about an omitted enumeration code even if there is a
>  @code{default} label.
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index d3827573a36..027f077d51b 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -821,6 +821,7 @@ extern struct visibility_flags visibility_options;
>
>  /* Attribute table common to the C front ends.  */
>  extern const struct scoped_attribute_specs c_common_gnu_attribute_table;
> +extern const struct scoped_attribute_specs c_common_clang_attribute_table;
>  extern const struct scoped_attribute_specs c_common_f

Re: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-09-04 Thread Jason Merrill


On 9/1/24 2:51 PM, Simon Martin wrote:

Hi Jason,

On 26 Aug 2024, at 19:23, Jason Merrill wrote:


On 8/25/24 12:37 PM, Simon Martin wrote:

On 24 Aug 2024, at 23:59, Simon Martin wrote:

On 24 Aug 2024, at 15:13, Jason Merrill wrote:


On 8/23/24 12:44 PM, Simon Martin wrote:

We currently emit an incorrect -Woverloaded-virtual warning upon the following
test case
=== cut here ===
struct A {
 virtual operator int() { return 42; }
 virtual operator char() = 0;
};
struct B : public A {
 operator char() { return 'A'; }
};
=== cut here ===

The problem is that warn_hidden relies on get_basefndecls to find the methods
in A possibly hidden by B's operator char(), and gets both the conversion
operator to int and to char. It eventually wrongly concludes that the
conversion to int is hidden.

This patch fixes this by filtering out conversion operators to different types
from the list returned by get_basefndecls.
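
A small sketch, using the test case above, of why the conversion to int is
not actually hidden. The helpers to_int () and to_char (), the virtual
destructor and the override keyword are additions for illustration.

```cpp
struct A {
  virtual operator int() { return 42; }
  virtual operator char() = 0;
  virtual ~A() = default;
};
struct B : public A {
  operator char() override { return 'A'; }
};

// Implicit conversion to int still finds A::operator int through B.
int to_int(B& b) { return b; }

// Implicit conversion to char uses the override in B.
char to_char(B& b) { return b; }
```

Both conversions resolve without ambiguity, so only A::operator char() is
overridden and nothing in A is hidden.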


Hmm, same_signature_p already tries to handle comparing conversion
operators, why isn't that working?


It does indeed.

However, `ovl_range (fns)` does not only contain `B::operator char()`,
for which `any_override` becomes true, but also `conv_op_marker`, for
which `any_override` stays false, causing `seen_non_override` to become
true. Because of that, we run the last loop, which emits a warning
for all `base_fndecls` (except `B::operator char()`, which has been
removed).

We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
the loop, but we’d still need to inspect the “converting to” type
in the last loop (for when `warn_overloaded_virtual` is 2). This would
make the code much more complex than the current patch.


Makes sense.


It would however probably be better if `get_basefndecls` only returned
the right conversion operator, not all of them. I’ll draft another
version of the patch that does that and submit it in this thread.


I have explored my suggestion further and it actually ends up more
complicated than the initial patch.


Yeah, you'd need to do lookup again for each member of fns.


Please find attached a new revision to fix the reported issue, as well
as new ones I discovered while testing with -Woverloaded-virtual=2.

It’s pretty close to the initial patch, but (1) adds a missing
“continue;” (2) fixes a location problem when -Woverloaded-virtual=2
(3) adds more test cases. The commit log is also more comprehensive, and
should describe well the various problems and why the patch is correct.



+   if (IDENTIFIER_CONV_OP_P (name)
+   && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
+DECL_CONV_FN_TYPE (base_fndecls[k])))
+ {
+   base_fndecls[k] = NULL_TREE;
+   continue;
+ }


So this removes base_fndecls[k] if it doesn't return the same type as
fndecl.  But what if there's another conversion op in fns that does
return the same type as base_fndecls[k]?

If I add an operator int() to both base and derived in
Woverloaded-virt7.C, the warning disappears.


That was an issue indeed. I’ve reworked the patch, and came up with
the attached latest version. It explicitly keeps track both of
overloaded and of hidden base methods (and the “hiding method” for
the latter), and uses those instead of juggling with bools and nullified
base_decls.

On top of fixing the issue the PR reports, it fixes a few that I came
across while investigating:
- wrongly emitting the warning if the base method is not virtual (the
lines added to Woverloaded-virt1.C would cause a warning without the
patch)
- wrongly emitting the warning when the derived class method is a
template, which is wrong since template members don’t override virtual
base methods (see the change in pr61945.C)


This change seems wrong to me; the warning is documented as "Warn when a 
function declaration hides virtual functions from a base class," and 
templates can certainly hide virtual base methods, as indeed they do in 
that testcase.
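
A minimal sketch of that point, assuming nothing beyond standard C++11: a member template never overrides a virtual function, yet its name still hides the base declaration for lookup through the derived class. The types `Base`, `Derived` and the detector `can_call_f_with_int` are hypothetical names for this illustration and are not the contents of pr61945.C.

```cpp
#include <type_traits>
#include <utility>

struct Base {
  virtual int f(int) { return 1; }
};

struct Derived : Base {
  // Does not override Base::f (templates are never virtual), but the name f
  // declared here still hides Base::f for lookup through Derived.
  template<typename T> int f(T, T) { return 2; }
};

// Detects whether `obj.f(0)` is well-formed for an object of type T.
template<typename T, typename = void>
struct can_call_f_with_int : std::false_type {};

template<typename T>
struct can_call_f_with_int<T, decltype(void(std::declval<T&>().f(0)))>
  : std::true_type {};

static_assert(can_call_f_with_int<Base>::value, "Base::f(int) is callable");
static_assert(!can_call_f_with_int<Derived>::value,
              "Base::f(int) is hidden by the template Derived::f");
```

The second assertion holds because lookup of `f` in Derived stops at the template, whose two-parameter signature cannot accept a single argument.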



- an invalid early return (see Woverloaded-virt9.C),
- reporting the first overload instead of the one that actually hides
(see the dg-note in Woverloaded-virt8.C that’d fail without the patch
because we’d report the int overload)

Successfully tested on x86_64-pc-linux-gnu; OK for trunk?


else if (TREE_CODE (t) == OVERLOAD)
+t = OVL_FIRST (t) != conv_op_marker ? OVL_FIRST (t) : OVL_CHAIN (t);


Usually OVL_CHAIN will be another OVERLOAD, you want OVL_FIRST
(OVL_CHAIN (t)) in that case.

Thanks. Even though not strictly needed anymore by the updated patch,
I’m still including the (fixed) change in the patch.

Simon

Re: [PATCH] c++, coroutines: Instrument missing return_void UB.

2024-09-04 Thread Jason Merrill


On 9/1/24 12:17 PM, Iain Sandoe wrote:

This came up in discussion of an earlier patch.

I'm in two minds as to whether it's a good idea or not - the underlying
issue being that libubsan does not yet (AFAICT) have the concept of a
coroutine, so that the diagnostics are not very specific and might appear
strange (i.e. "execution reached the end of a value-returning function
without returning a value" which is a bit of an odd diagnostic for
a missing return_void ()).

OTOH one might argue that some diagnostic is better than silent UB, but
I do not have cycles to address improving this in upstream libsanitizer.

The logic used here is intended to match cp_maybe_instrument_return ()
although it's not 100% clear that that is doing exactly what the comment
says - since it does not distinguish between -fno-sanitize=return and
the case that the user simply did not specify it.  As a result,
-fsanitize=unreachable is ignored both for -fno-sanitize=return and for
the unset case.


I think that's correct, what we care about is whether return 
sanitization is enabled, not which flags were used to specify that.



--- 8< ---

[dcl.fct.def.coroutine] / 6 Note 1:
"If return_void is found, flowing off the end of a coroutine is equivalent
to a co_return with no operand. Otherwise, flowing off the end of a
coroutine results in undefined behavior."

Here we implement this as a check for sanitized returns and call the ubsan
instrumentation; if that is not enabled we mark this as unreachable (which
might trap depending on the target settings).

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Instrument
the case where control flows off the end of a coroutine and the
user promise has no return_void entry.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc | 18 ++
  1 file changed, 18 insertions(+)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index e709d02b5f7..b67f4e3ef88 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -33,6 +33,9 @@ along with GCC; see the file COPYING3.  If not see
  #include "gcc-rich-location.h"
  #include "hash-map.h"
  #include "coroutines.h"
+#include "c-family/c-ubsan.h"
+#include "attribs.h" /* lookup_attribute */


I don't see any use of lookup_attribute?


+#include "asan.h" /* sanitize_flags_p */
  
  static bool coro_promise_type_found_p (tree, location_t);
  
@@ -4335,8 +4338,17 @@ cp_coroutine_transform::wrap_original_function_body ()

finish_expr_stmt (initial_await);
/* Append the original function body.  */
add_stmt (coroutine_body);
+  /* Flowing off the end of a coroutine is equivalent to calling
+promise.return_void () or is UB if the promise does not contain
+that.  Do not add an unreachable unless the user has asked for
+checking of such cases.  */


Let's mention that this is trying to parallel cp_maybe_instrument_return.

Or, actually, how about factoring out the actual statement building from 
that function so we can use it here as well?


Jason

Re: [PATCH] c++, coroutines: Revise promise construction/destruction.

2024-09-04 Thread Jason Merrill


On 8/31/24 12:37 PM, Iain Sandoe wrote:

tested on x86_64-darwin/linux powerpc64le-linux,
OK for trunk? alternate suggestions?
thanks,
Iain

--- 8< ---

In examining the coroutine testcases for unexpected diagnostic output
for 'Wall', I found a 'statement has no effect' warning for the promise
construction in one case.  In particular, the case is where the user's
promise type has an implicit CTOR but a user-provided DTOR. Further, the
type does not actually need constructing.

In very early versions of the coroutines code we used to check
TYPE_NEEDS_CONSTRUCTING() to determine whether to attempt to build
a constructor call for the promise.  During review, it was suggested
to use type_build_ctor_call () instead.

This latter call checks the constructors in the type (both user-defined
and implicit) and returns true, amongst other cases if any of the found
CTORs are marked as deprecated.

In a number of places (for example [class.copy.ctor] / 6) the standard
says that some version of an implicit CTOR is deprecated when the user
provides a DTOR.

Thus, for this specific arrangement of promise type, type_build_ctor_call
returns true, because of (for example) a deprecated implicit copy CTOR.

We are not going to use any of the deprecated CTORs and thus will not
see warnings from this - however, since the call returned true, we have
now determined that we should attempt to build a constructor call.

Note as above, the type does not actually require construction and thus
one might expect either a NULL_TREE or error_mark_node in response to
the build_special_member_call ().  However, in practice the function
returns the original instance object instead of a call or some error.

When we add that as a statement it triggers the 'statement has no effect'
warning.

The patch here rearranges the promise construction/destruction code to
allow for the case that a DTOR is required independently of a CTOR. In
addition, we check that the return from build_special_member_call () is
a call expression before we add it as a statement.

gcc/cp/ChangeLog:

* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Separate the
build of promise constructor and destructor.  When evaluating
the constructor, check that build_special_member_call returns
a valid call expression before adding the statement.

Signed-off-by: Iain Sandoe 
---
  gcc/cp/coroutines.cc | 26 --
  1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 20bda5520c0..8cf87f1c135 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -4893,16 +4893,12 @@ cp_coroutine_transform::build_ramp_function ()
tree p = build_class_member_access_expr (deref_fp, promise_m, NULL_TREE,
   false, tf_warning_or_error);
  
-  tree promise_dtor = NULL_TREE;

if (type_build_ctor_call (promise_type))
  {
-  /* Do a placement new constructor for the promise type (we never call
-the new operator, just the constructor on the object in place in the
-frame).
+  /* Construct the promise object [dcl.fct.def.coroutine] / 5.7.
  
-	 First try to find a constructor with the same parameter list as the

-original function (if it has params), failing that find a constructor
-with no parameter list.  */
+First try to find a constructor with an argument list comprised of
+the parameter copies.  */
  
if (DECL_ARGUMENTS (orig_fn_decl))

{
@@ -4914,19 +4910,29 @@ cp_coroutine_transform::build_ramp_function ()
else
r = NULL_TREE;
  
+  /* If that fails then the promise constructor argument list is empty.  */

if (r == NULL_TREE || r == error_mark_node)
r = build_special_member_call (p, complete_ctor_identifier, NULL,
   promise_type, LOOKUP_NORMAL,
   tf_warning_or_error);
  
-  finish_expr_stmt (r);

+  /* If type_build_ctor_call() encounters deprecated implicit CTORs it will
+return true, and therefore we will execute this code path.  However,
+we might well not actually require a CTOR and under those conditions
+the build call above will not return a call expression, but the
+original instance object.  Do not attempt to add the statement unless
+it is a valid call.  */
+  if (r && r != error_mark_node && TREE_CODE (r) == CALL_EXPR)


Maybe check TREE_SIDE_EFFECTS instead of for CALL_EXPR?  OK with that 
change.



+   finish_expr_stmt (r);
+}
  
+  tree promise_dtor = cxx_maybe_build_cleanup (p, tf_warning_or_error);

+  if (flag_exceptions && promise_dtor)
+{
r = build_modify_expr (loc, coro_promise_live, boolean_type_node,
 INIT_EXPR, loc, boolean_true_node,
 boolean_type_node);
finish_expr_stmt (r);
-
-

Re: [PATCH] c++: fn redecl in fn scope wrongly accepted [PR116239]

2024-09-04 Thread Jason Merrill


On 8/30/24 3:40 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
Redeclaration such as

   void f(void);
   consteval void f(void);

is invalid.  In a namespace scope, we detect the collision in
validate_constexpr_redeclaration, but not when one declaration is
at block scope.

When we have

   void f(void);
   void g() { consteval void f(void); }

we call pushdecl on the second f and call push_local_extern_decl_alias.
It finds the namespace-scope f:

 for (ovl_iterator iter (binding); iter; ++iter)
   if (decls_match (decl, *iter, /*record_versions*/false))
 {
   alias = *iter;
   break;
 }

but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
(and do not call another pushdecl leading to duplicate_decls which
would detect mismatching return types, for example).  I don't think
we want to change decls_match, so a simple fix is to detect the
problem in push_local_extern_decl_alias.

PR c++/116239

gcc/cp/ChangeLog:

* cp-tree.h (validate_constexpr_redeclaration): Declare.
* decl.cc (validate_constexpr_redeclaration): No longer static.
* name-lookup.cc (push_local_extern_decl_alias): Call
validate_constexpr_redeclaration.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/redeclaration-6.C: New test.
---
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl.cc|  2 +-
  gcc/cp/name-lookup.cc |  3 ++
  .../g++.dg/diagnostic/redeclaration-6.C   | 34 +++
  4 files changed, 39 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..1a763b683de 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6992,6 +6992,7 @@ extern bool member_like_constrained_friend_p  (tree);
  extern bool fns_correspond(tree, tree);
  extern int decls_match(tree, tree, bool = true);
  extern bool maybe_version_functions   (tree, tree, bool);
+extern bool validate_constexpr_redeclaration   (tree, tree);
  extern bool merge_default_template_args   (tree, tree, bool);
  extern tree duplicate_decls   (tree, tree,
 bool hiding = false,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 6458e96bded..4ad68d609d7 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1412,7 +1412,7 @@ check_redeclaration_exception_specification (tree new_decl,
  /* Return true if OLD_DECL and NEW_DECL agree on constexprness.
 Otherwise issue diagnostics.  */
  
-static bool

+bool
  validate_constexpr_redeclaration (tree old_decl, tree new_decl)
  {
old_decl = STRIP_TEMPLATE (old_decl);
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 70ad4cbf3b5..6777d97ac2e 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3708,6 +3708,9 @@ push_local_extern_decl_alias (tree decl)
}
  }
  
+  if (!validate_constexpr_redeclaration (alias, decl))

+return;
+
retrofit_lang_decl (decl);
DECL_LOCAL_DECL_ALIAS (decl) = alias;
  }


I don't think we need this in the case that we built a new alias and 
pushed it; in that case alias is built from decl, and should certainly 
match already.  So let's put this call before we decide to build a new 
alias, either in the loop or just after it.


Jason

Re: [PATCH v1 6/9] aarch64: Use symbols without offset to prevent relocation issues

2024-09-04 Thread Martin Storsjö


On Wed, 4 Sep 2024, Evgeny Karpov wrote:


Wednesday, September 4, 2024
Martin Storsjö  wrote:


compilation time
adrp x0, symbol + 256
90000000 adrp x0, 0


As the symbol offset is 256, you will need to encode the offset "256" in
the instruction immediate field. Not "256 >> 12". This is the somewhat
non-obvious part here, but this is the only way symbol offsets can work.
This is how MS tools handle immediates in IMAGE_REL_ARM64_PAGEBASE_REL21,
and LLVM has replicated this bit.


This approach does not allow addressing 4 GB; it can address only 1 MB.
This issue has been fixed in this patch series.

https://sourceware.org/pipermail/binutils/2024-August/136481.html


But this is not something you can redefine yourself! The relocation formats
and their behaviour are defined by Microsoft (your employer?); you can't
just change them within the scope of GNU tools because you disagree with them!



Yes, the immediate offset can be maximum 1 MB. But this doesn't mean that 
you can't address anywhere in the 4 GB address space of a PE image. It 
just means that an IMAGE_REL_ARM64_PAGEBASE_REL21 can point anywhere up to 
1 MB before/after a symbol. You can't have one single symbol at the start 
of an image and try to add a fixed 4 GB offset on top of that - that's not 
what any regular object file would do anyway.
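
As a rough illustration of where the 1 MB figure comes from, here is a sketch that packs a byte offset into the immediate field of an ADRP opcode. It assumes the standard A64 ADRP layout (immlo in bits 30:29, immhi in bits 23:5, Rd in bits 4:0); the helper names are made up for this example and do not correspond to any binutils, LLVM or MSVC function.

```cpp
#include <cstdint>

// A 21-bit signed field holds values in [-2^20, 2^20), i.e. +/- 1 MB when the
// field is interpreted as a byte offset from the symbol.
constexpr bool fits_imm21(std::int64_t v) {
  return v >= -(std::int64_t{1} << 20) && v < (std::int64_t{1} << 20);
}

// Pack a signed 21-bit value into the immlo/immhi fields of "adrp xRd, ...".
constexpr std::uint32_t encode_adrp(unsigned rd, std::int32_t imm21) {
  std::uint32_t u = static_cast<std::uint32_t>(imm21) & 0x1FFFFFu;
  std::uint32_t immlo = u & 0x3u;   // low 2 bits
  std::uint32_t immhi = u >> 2;     // high 19 bits
  return 0x90000000u | (immlo << 29) | (immhi << 5) | (rd & 0x1Fu);
}

// Recover the signed 21-bit value from an ADRP opcode.
constexpr std::int32_t decode_adrp_imm(std::uint32_t insn) {
  std::uint32_t immlo = (insn >> 29) & 0x3u;
  std::uint32_t immhi = (insn >> 5) & 0x7FFFFu;
  std::uint32_t u = (immhi << 2) | immlo;
  // Sign-extend from bit 20.
  return (u & 0x100000u) ? static_cast<std::int32_t>(u) - 0x200000
                         : static_cast<std::int32_t>(u);
}

// Offset 256 stored directly (not 256 >> 12) in an "adrp x8" opcode.
static_assert(encode_adrp(8, 0x100) == 0x90000808u, "offset 0x100, Rd = x8");
static_assert(decode_adrp_imm(0x90000808u) == 0x100, "round trip");
static_assert(fits_imm21(1048575) && !fits_imm21(1048576), "1 MB limit");
```

With the offset stored as a plain byte count, the field's range is what limits a single relocation to roughly 1 MB on either side of its symbol.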



Yes, it is possible to hit cases where you want an offset slightly larger 
than 1 MB - if you happen to have a very large object file. It's very rare 
though, but it can happen. In LLVM we fixed this by injecting extra label 
symbols with 1 MB intervals if it turns out that an individual section 
ends up larger than this, like this: 
https://github.com/llvm/llvm-project/commit/06d0d449d8555ae5f1ac33e8d4bb4ae40eb080d3



armasm produces the same opcode
for: adrp x0, symbol + 256
it will be: 90000000 adrp x0, 0


It seems like armasm64 doesn't handle this case correctly, and/or is 
inconsistent.


But let's see what MSVC cl.exe does, if you don't trust my other 
references.


$ cat adrp.c
extern char array[];
char *getPtr(void) {
return &array[256];
}
$ cl -c -O2 adrp.c -Fa
Microsoft (R) C/C++ Optimizing Compiler Version 19.41.34120 for ARM64
Copyright (C) Microsoft Corporation.  All rights reserved.

adrp.c
$ dumpbin -nologo -disasm adrp.obj

Dump of file adrp.obj

File Type: COFF OBJECT

getPtr:
  0000: 90000808  adrp        x8,array+#0x100
  0004: 91040100  add         x0,x8,array+#0x100
  0008: D65F03C0  ret
$ cat adrp.asm
[...]
AREA    |.text$mn|, CODE, ARM64
|getPtr| PROC
adrp    x8,array+#0x100
add     x0,x8,array+#0x100
ret
ENDP  ; |getPtr|
END

Unfortunately, it seems like armasm64 doesn't actually manage to assemble 
the output of MSVC in this case. If the # chars are removed, it can 
assemble it, but the offsets simply aren't encoded at all - neither for 
the adrp nor for the add. So it simply seems that armasm64 doesn't support 
immediates for symbol offsets at all.


Nevertheless, the object file format supports it just fine, MSVC cl.exe 
uses it, and link.exe handles it exactly like I've described.


// Martin

[PATCH] libstdc++: hashing support for chrono value classes (P2592R2)

2024-09-04 Thread Giuseppe D'Angelo


Hello,

The attached patch implements P2592, adding std::hash specializations 
for std::chrono classes.


One aspect I'm quite unhappy with is the hash combiner I've used. I'm 
not sure if there's some longer-term goal for libstdc++ here -- would 
you prefer to roll something à la Boost.HashCombine / P0814?


Not only is it likely to be cheaper, but it would also be more 
constexpr-friendly, in preparation for constexpr std::hash (P3372).


As usual, any feedback is appreciated :)

Thank you,
--
Giuseppe D'Angelo
From 7f1c88139c2b906982cb036f39bfa80db122c7af Mon Sep 17 00:00:00 2001
From: Giuseppe D'Angelo 
Date: Wed, 4 Sep 2024 12:57:51 +0200
Subject: [PATCH] libstdc++: hashing support for chrono value classes (P2592R2)

This commit implements [time.hash], added by P2592 for C++26.
The implementation of the various hash specializations is
mostly straightforward:

* duration hashes its representation (not the period);
* time_point hashes its duration;
* the calendaring classes (year, month, day, etc.) hash their
  values;
* zoned_time hashes the time zone pointer and its time point.

There are however a couple of challenges:

* The noexcept specifications for hashing duration, time_point,
  zoned_time are slightly more convoluted than expected, as
  their getters are noexcept(false) (e.g. calling count() on a
  duration will copy the representation and that may throw);

* [time.duration] says that "Rep shall be an arithmetic type or a
  class emulating an arithmetic type". Technically speaking, this
  means that `const int` is a valid Rep; but we can't use
  hash<const int>.

  I'm not sure if this is deliberate or not (cf. LWG951, LWG953),
  but I've decided to support it nonetheless.

* zoned_time and the calendar classes that combine several
  parts (e.g. month_day) need a hash combiner. The one
  available in _Hash_impl works on object representations,
  not values, and given the nature of the calendar classes
  I'm very afraid that I may accidentally be hashing padding
  bits. Therefore, I've added a helper convenience combiner.
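
To illustrate the padding concern in the last point, here is a small sketch of a value-based combiner. It uses the classic Boost-style mixing step rather than the `_Hash_combiner` helper added by this patch, and the type `month_day_like` plus both function names are hypothetical. For brevity the per-field hash is the identity, where the patch calls `std::hash` on each part.

```cpp
#include <cstddef>

// Boost-style mixing step: fold one more hash value into the running seed.
constexpr std::size_t hash_combine(std::size_t seed, std::size_t value_hash) {
  return seed ^ (value_hash + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Hypothetical two-field calendar-like type; the compiler will typically
// insert padding bytes between `month` and `day`.
struct month_day_like {
  unsigned char month;
  unsigned day;
};

// Hash the field values one by one, never the bytes of the whole object, so
// padding bytes cannot influence the result.
constexpr std::size_t hash_value(month_day_like md) {
  return hash_combine(hash_combine(0, md.month), md.day);
}

// The combiner is order-sensitive: (1, 2) and (2, 1) hash differently.
static_assert(hash_value(month_day_like{1, 2}) != hash_value(month_day_like{2, 1}),
              "field order matters");
```

Because only values enter the computation, two objects that compare equal always produce the same hash regardless of what their padding bytes contain, and the whole computation can run at compile time.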

libstdc++-v3/ChangeLog:

	* include/bits/functional_hash.h: Add a convenience hash
	  combiner.
	* include/bits/version.def: Bump the feature-testing macro.
	* include/bits/version.h: Regenerate.
	* include/std/chrono: Add std::hash specializations for the
	  value classes in namespace chrono.
	* testsuite/std/time/hash.cc: New test.

Signed-off-by: Giuseppe D'Angelo 
---
 libstdc++-v3/include/bits/functional_hash.h |  31 ++
 libstdc++-v3/include/bits/version.def   |   6 +
 libstdc++-v3/include/bits/version.h |   7 +-
 libstdc++-v3/include/std/chrono | 299 
 libstdc++-v3/testsuite/std/time/hash.cc | 225 +++
 5 files changed, 567 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/hash.cc

diff --git a/libstdc++-v3/include/bits/functional_hash.h b/libstdc++-v3/include/bits/functional_hash.h
index 3626ebe816b..c0f82601b4b 100644
--- a/libstdc++-v3/include/bits/functional_hash.h
+++ b/libstdc++-v3/include/bits/functional_hash.h
@@ -235,6 +235,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return hash(&__val, sizeof(__val), __hash); }
   };
 
+#if __cplusplus >= 201103L // C++11
+  // A convenience hash combiner
+  struct _Hash_combiner
+  {
+static void __hash_combine(size_t&) {}
+
+template<typename _Arg, typename... _Args>
+  static void
+  __hash_combine(size_t& __result,
+		 const _Arg& __arg,
+		 const _Args&... __args)
+  {
+	const size_t __arg_hash = hash<_Arg>{}(__arg);
+	__result = _Hash_impl::__hash_combine(__arg_hash, __result);
+	__hash_combine(__result, __args...);
+  }
+
+static size_t __hash()
+  { return 0; }
+
+template<typename _Arg, typename... _Args>
+  static size_t __hash(const _Arg& __arg, const _Args&... __args)
+  {
+	const size_t __arg_hash = hash<_Arg>{}(__arg);
+	size_t __result = _Hash_impl::hash(__arg_hash);
+	__hash_combine(__result, __args...);
+	return __result;
+  }
+  };
+#endif // C++11
+
   /// Specialization for float.
   template<>
 struct hash<float> : public __hash_base<size_t, float>
diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def
index 59b028c44af..2a2bbf54b3f 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -575,6 +575,12 @@ ftms = {
 
 ftms = {
   name = chrono;
+  values = {
+v = 202306;
+cxxmin = 26;
+hosted = yes;
+cxx11abi = yes;
+  };
   values = {
 v = 201907;
 cxxmin = 20;
diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h
index e465131d99b..c75cb009ada 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -644,7 +644,12 @@
 #undef __glibcxx_want_boyer_moore_searcher
 
 #if !defined(__cpp_lib_chrono)
-# if (__cplusplus >= 202002L) && _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_HOSTED
+# if (__cplusplus >  202302L) && _GLIBCXX_USE_CXX11_ABI && _GLIBCXX_HOSTED
+#  define __glibcxx_chrono 202306L
+#  if

Re: [PATCH] c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-04 Thread Jason Merrill


On 9/4/24 11:15 AM, Jakub Jelinek wrote:

On Wed, Sep 04, 2024 at 11:06:22AM -0400, Jason Merrill wrote:

On 9/2/24 1:49 PM, Jakub Jelinek wrote:

Hi!

The following testcase is miscompiled, because
get_member_function_from_ptrfunc emits something like
(((FUNCTION.__pfn & 1) != 0)
   ? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
   : FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
or so, so FUNCTION tree is used there 5 times.  There is
if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
but in this case function doesn't have side-effects, just nested ARRAY_REFs.
Now, if all the FUNCTION trees would be shared, it would work fine,
FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
that isn't the case, both the BIT_AND_EXPR shortening and conversion to
bool done for build_conditional_expr actually unshare_expr that first
expression, but none of the other 4 are unshared.  With -fsanitize=bounds,
.UBSAN_BOUNDS calls are added to the ARRAY_REFs and use SAVE_EXPRs to avoid
evaluating the argument multiple times, but because that FUNCTION tree is
first used in the second argument of COND_EXPR (i.e. conditionally), the
SAVE_EXPR initialization is done just there and then the third argument
of COND_EXPR just uses the uninitialized temporary and so does the first
argument computation as well.
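
For readers less familiar with the expansion above, here is a toy model of the decision it makes. It assumes the Itanium C++ ABI convention in which a pointer to member function is a (__pfn, __delta) pair and an odd __pfn marks a virtual function whose vtable offset is __pfn - 1 (some targets, such as ARM, put the flag elsewhere). The struct and function names are hypothetical and are not GCC internals.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the two-word pointer-to-member-function layout.
struct pmf_model {
  std::intptr_t pfn;     // function address, or 1 + vtable offset if virtual
  std::ptrdiff_t delta;  // adjustment applied to the object pointer
};

// Mirrors the condition ((FUNCTION.__pfn & 1) != 0) in the expansion.
constexpr bool selects_virtual(pmf_model f) { return (f.pfn & 1) != 0; }

// In the virtual arm the callee is found at byte offset __pfn - 1.
constexpr std::ptrdiff_t vtable_byte_offset(pmf_model f) {
  return static_cast<std::ptrdiff_t>(f.pfn - 1);
}

// Both arms pass ptr + __delta as the object argument.
constexpr std::ptrdiff_t this_adjustment(pmf_model f) { return f.delta; }

static_assert(selects_virtual(pmf_model{17, 8}), "odd pfn means virtual");
static_assert(vtable_byte_offset(pmf_model{17, 8}) == 16, "offset is pfn - 1");
static_assert(!selects_virtual(pmf_model{0x1000, 0}), "even pfn is a direct call");
```

Passing the pair by value here plays the role that save_expr plays in the compiler: the operand is computed once and every later use reads the saved copy, whichever arm of the conditional runs.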


If there are SAVE_EXPRs in FUNCTION, why is TREE_SIDE_EFFECTS false?


They aren't there when this function is called, they are added only during
cp_genericize when instrumenting the ARRAY_REFs with UBSAN.

And unlike this function, e.g. ubsan_instrument_bounds just calls save_expr
on the index and not if (TREE_SIDE_EFFECTS (index)) index = save_expr (index).


It seems desirable to only do the bounds-checking once.


So, one possibility would be to call save_expr unconditionally in
get_member_function_from_ptrfunc as well.

Or build a TARGET_EXPR (force_target_expr or similar).


Yes.  I don't have a strong preference between the two.

Jason

Re: [to-be-committed] [RISC-V][PR target/115921] Improve reassociation for rv64

2024-09-04 Thread Jeff Law





On 9/4/24 8:08 AM, Xi Ruoyao wrote:

Hi Jeff,

On Mon, 2024-09-02 at 12:53 -0600, Jeff Law wrote:

  (define_insn_and_split "_shift_reverse"
    [(set (match_operand:X 0 "register_operand" "=r")
  (any_bitwise:X (ashift:X (match_operand:X 1 "register_operand" "r")
@@ -2934,9 +2936,9 @@ (define_insn_and_split "_shift_reverse"
    "(!SMALL_OPERAND (INTVAL (operands[3]))
  && SMALL_OPERAND (INTVAL (operands[3]) >> INTVAL (operands[2]))
  && popcount_hwi (INTVAL (operands[3])) > 1


I'm wondering why we need to check the popcount.  With this patch
applied:

Zbs can do these things with the single bit manipulation instructions. 
It would be reasonable to make it (TARGET_ZBS && popcount_hwi ...)
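
As a sketch of what the three quoted conditions select, the snippet below models them on plain integers. It assumes that SMALL_OPERAND accepts 12-bit signed immediates (the RISC-V I-type range) and covers only the conditions visible in the quoted hunk; the function names are hypothetical, not GCC's.

```cpp
#include <cstdint>

// RISC-V I-type immediates are 12-bit signed values.
constexpr bool small_operand(std::int64_t v) {
  return v >= -2048 && v <= 2047;
}

// Count set bits by clearing the lowest one until none remain.
constexpr int popcount(std::uint64_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// The constant is too wide for an immediate, becomes small once shifted
// right, and has more than one bit set.
constexpr bool matches_quoted_condition(std::int64_t mask, int shift) {
  return !small_operand(mask)
         && small_operand(mask >> shift)
         && popcount(static_cast<std::uint64_t>(mask)) > 1;
}

static_assert(matches_quoted_condition(0x7ff0, 4), "multi-bit, shiftable");
static_assert(!matches_quoted_condition(0x800, 4), "single bit: popcount is 1");
static_assert(!matches_quoted_condition(0x7f, 4), "already a small operand");
```

The middle case is the one under discussion: a single-bit constant such as 0x800 is rejected by the popcount test, which is where the Zbs single-bit instructions would apply.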


jeff

[PATCH] c++, v2: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]

2024-09-04 Thread Jakub Jelinek

On Wed, Sep 04, 2024 at 12:34:04PM -0400, Jason Merrill wrote:
> > So, one possibility would be to call save_expr unconditionally in
> > get_member_function_from_ptrfunc as well.
> > 
> > Or build a TARGET_EXPR (force_target_expr or similar).
> 
> Yes.  I don't have a strong preference between the two.

Here is a patch that uses save_expr, but still does so conditionally: it
doesn't make sense to use it for the common case of plain decls, since
there is nothing to unshare in that case.

Passed the test, ok if it passes full bootstrap/regtest?

2024-09-04  Jakub Jelinek  

PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.

* g++.dg/ubsan/pr116449.C: New test.

--- gcc/cp/typeck.cc.jj 2024-09-02 17:07:30.115098114 +0200
+++ gcc/cp/typeck.cc 2024-09-04 19:08:24.127490242 +0200
@@ -4188,10 +4188,21 @@ get_member_function_from_ptrfunc (tree *
   if (!nonvirtual && is_dummy_object (instance_ptr))
nonvirtual = true;
 
-  if (TREE_SIDE_EFFECTS (instance_ptr))
-   instance_ptr = instance_save_expr = save_expr (instance_ptr);
+  /* Use save_expr even when instance_ptr doesn't have side-effects,
+unless it is a simple decl (save_expr won't do anything on
+constants), so that we don't ubsan instrument the expression
+multiple times.  See PR116449.  */
+  if (TREE_SIDE_EFFECTS (instance_ptr) || !DECL_P (instance_ptr))
+   {
+ instance_save_expr = save_expr (instance_ptr);
+ if (instance_save_expr == instance_ptr)
+   instance_save_expr = NULL_TREE;
+ else
+   instance_ptr = instance_save_expr;
+   }
 
-  if (TREE_SIDE_EFFECTS (function))
+  /* See above comment.  */
+  if (TREE_SIDE_EFFECTS (function) || !DECL_P (function))
function = save_expr (function);
 
   /* Start by extracting all the information from the PMF itself.  */
--- gcc/testsuite/g++.dg/ubsan/pr116449.C.jj 2024-09-04 18:58:46.106764285 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr116449.C 2024-09-04 18:58:46.106764285 +0200
@@ -0,0 +1,14 @@
+// PR c++/116449
+// { dg-do compile }
+// { dg-options "-O2 -Wall -fsanitize=undefined" }
+
+struct C { void foo (int); void bar (); int c[16]; };
+typedef void (C::*P) ();
+struct D { P d; };
+static D e[1] = { { &C::bar } };
+
+void
+C::foo (int x)
+{
+  (this->*e[c[x]].d) ();
+}


Jakub

[PATCH v2] c++: fn redecl in fn scope wrongly accepted [PR116239]

2024-09-04 Thread Marek Polacek

On Wed, Sep 04, 2024 at 12:28:49PM -0400, Jason Merrill wrote:
> On 8/30/24 3:40 PM, Marek Polacek wrote:
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > Redeclaration such as
> > 
> >void f(void);
> >consteval void f(void);
> > 
> > is invalid.  In a namespace scope, we detect the collision in
> > validate_constexpr_redeclaration, but not when one declaration is
> > at block scope.
> > 
> > When we have
> > 
> >void f(void);
> >void g() { consteval void f(void); }
> > 
> > we call pushdecl on the second f and call push_local_extern_decl_alias.
> > It finds the namespace-scope f:
> > 
> >  for (ovl_iterator iter (binding); iter; ++iter)
> >if (decls_match (decl, *iter, /*record_versions*/false))
> >  {
> >alias = *iter;
> >break;
> >  }
> > 
> > but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
> > (and do not call another pushdecl leading to duplicate_decls which
> > would detect mismatching return types, for example).  I don't think
> > we want to change decls_match, so a simple fix is to detect the
> > problem in push_local_extern_decl_alias.
> > 
> > PR c++/116239
> > 
> > gcc/cp/ChangeLog:
> > 
> > * cp-tree.h (validate_constexpr_redeclaration): Declare.
> > * decl.cc (validate_constexpr_redeclaration): No longer static.
> > * name-lookup.cc (push_local_extern_decl_alias): Call
> > validate_constexpr_redeclaration.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/diagnostic/redeclaration-6.C: New test.
> > ---
> >   gcc/cp/cp-tree.h  |  1 +
> >   gcc/cp/decl.cc|  2 +-
> >   gcc/cp/name-lookup.cc |  3 ++
> >   .../g++.dg/diagnostic/redeclaration-6.C   | 34 +++
> >   4 files changed, 39 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
> > 
> > diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> > index 2eeb5e3e8b1..1a763b683de 100644
> > --- a/gcc/cp/cp-tree.h
> > +++ b/gcc/cp/cp-tree.h
> > @@ -6992,6 +6992,7 @@ extern bool member_like_constrained_friend_p  (tree);
> >   extern bool fns_correspond(tree, tree);
> >   extern int decls_match(tree, tree, bool = true);
> >   extern bool maybe_version_functions   (tree, tree, bool);
> > +extern bool validate_constexpr_redeclaration   (tree, tree);
> >   extern bool merge_default_template_args   (tree, tree, bool);
> >   extern tree duplicate_decls   (tree, tree,
> >  bool hiding = false,
> > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
> > index 6458e96bded..4ad68d609d7 100644
> > --- a/gcc/cp/decl.cc
> > +++ b/gcc/cp/decl.cc
> > @@ -1412,7 +1412,7 @@ check_redeclaration_exception_specification (tree new_decl,
> >   /* Return true if OLD_DECL and NEW_DECL agree on constexprness.
> >  Otherwise issue diagnostics.  */
> > -static bool
> > +bool
> >   validate_constexpr_redeclaration (tree old_decl, tree new_decl)
> >   {
> > old_decl = STRIP_TEMPLATE (old_decl);
> > diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> > index 70ad4cbf3b5..6777d97ac2e 100644
> > --- a/gcc/cp/name-lookup.cc
> > +++ b/gcc/cp/name-lookup.cc
> > @@ -3708,6 +3708,9 @@ push_local_extern_decl_alias (tree decl)
> > }
> >   }
> > +  if (!validate_constexpr_redeclaration (alias, decl))
> > +return;
> > +
> > retrofit_lang_decl (decl);
> > DECL_LOCAL_DECL_ALIAS (decl) = alias;
> >   }
> 
> I don't think we need this in the case that we built a new alias and pushed
> it; in that case alias is built from decl, and should certainly match
> already.  So let's put this call before we decide to build a new alias,
> either in the loop or just after it.

That's right.  How about this?

dg.exp passed.

-- >8 --
Redeclaration such as

  void f(void);
  consteval void f(void);

is invalid.  In a namespace scope, we detect the collision in
validate_constexpr_redeclaration, but not when one declaration is
at block scope.

When we have

  void f(void);
  void g() { consteval void f(void); }

we call pushdecl on the second f and call push_local_extern_decl_alias.
It finds the namespace-scope f:

  for (ovl_iterator iter (binding); iter; ++iter)
    if (decls_match (decl, *iter, /*record_versions*/false))
      {
        alias = *iter;
        break;
      }

but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
(and do not call another pushdecl leading to duplicate_decls which
would detect mismatching return types, for example).  I don't think
we want to change decls_match, so a simple fix is to detect the
problem in push_local_extern_decl_alias.
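
For readers who want to see the shape of the problem, here is a minimal
sketch of a block-scope function declaration that refers to a
namespace-scope function.  It is an illustration only: the return type is
changed from void to int so the call has an observable result, and the
ill-formed consteval variant is kept in a comment because a conforming
compiler should reject it.

```cpp
#include <cassert>

// Namespace-scope declaration and definition: an ordinary function.
int f (void);
int f (void) { return 42; }

int g ()
{
  // Block-scope redeclaration that agrees with the namespace-scope f.
  // Lookup inside g finds this local declaration, which denotes the
  // same entity as ::f.
  int f (void);

  // The ill-formed variant described above would instead read:
  //   consteval int f (void);
  // It disagrees with ::f on constexpr-ness, so it is left commented out.
  return f ();
}
```

Calling g () here simply forwards to the namespace-scope f.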

PR c++/116239

gcc/cp/ChangeLog:

* cp-tree.h (validate_constexpr_redeclaration):

Re: [PATCH] c++, v2: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-04 Thread Jason Merrill


On 8/30/24 1:37 PM, Jakub Jelinek wrote:

On Wed, Aug 21, 2024 at 02:08:16PM -0400, Jason Merrill wrote:

I was concerned about the use of a single boolean to guard the destruction
of multiple objects, suspecting that it would break in obscure EH cases.
When I finally managed to construct a testcase that breaks, I was surprised
to see that it broke before this patch as well. And then I realized that it
breaks even without structured bindings:


Ouch.


In any case, we aren't going to address that in this patch.


+ if (processing_template_decl || error_operand_p (decl))
+   cp_finish_decomp (decl, &decomp);
+  && (processing_template_decl || error_operand_p (decl)))
  cp_finish_decomp (decl, decomp);


Rather than all the callers needing to check this, how about changing the
new test_p flag to be tristate so that the second call from cp_finish_decl
is also indicated, and let cp_finish_decomp itself decide whether to
actually do anything?


Here are 2 versions of the patch.
Included here is a version which uses RAII to call cp_finish_decomp from
cp_finish_decl and drops all cp_finish_decomp calls in the callers from
after the cp_finish_decl call (which I like better),


Sounds good.


and attached is one which introduces a tristate argument to
cp_finish_decomp.

So far I've bootstrapped/regtested on x86_64-linux and i686-linux the RAII
one.
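
As a rough illustration of the RAII approach (not the code from the patch),
the sketch below runs a "finish" step from a guard object's destructor, so
every return path of the enclosing function triggers it exactly once.  All
names here (decomp_info, finish_decomp, finish_decomp_guard, finish_decl,
events) are hypothetical stand-ins, not GCC's.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log so the order of operations can be observed.
std::vector<std::string> events;

// Hypothetical stand-in for the structured binding bookkeeping.
struct decomp_info { unsigned count; };

// Hypothetical stand-in for the real finishing work; it only logs.
void finish_decomp (decomp_info *d)
{
  (void) d;
  events.push_back ("finish_decomp");
}

// RAII guard: calls finish_decomp on scope exit unless decomp is null.
struct finish_decomp_guard
{
  decomp_info *decomp;
  explicit finish_decomp_guard (decomp_info *d) : decomp (d) {}
  ~finish_decomp_guard () { if (decomp) finish_decomp (decomp); }
  finish_decomp_guard (const finish_decomp_guard &) = delete;
  finish_decomp_guard &operator= (const finish_decomp_guard &) = delete;
};

// Hypothetical caller with an early return; the guard covers both paths.
void finish_decl (decomp_info *d, bool early_return)
{
  finish_decomp_guard guard (d);
  events.push_back ("finish_decl: start");
  if (early_return)
    return;
  events.push_back ("finish_decl: end");
}
```

Setting the guard's decomp member to null is how a caller would opt out of
the automatic call, which is the same kind of switch the later review
comments discuss.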

2024-08-30  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here if !processing_template_decl, first with test_p.  For
automatic structured binding bases if the test cp_finish_decomp
returned true wrap the initialization together with what non-test
cp_finish_decomp emits with a CLEANUP_POINT_EXPR, and if there are
any CLEANUP_STMTs needed, emit them around the whole
CLEANUP_POINT_EXPR with guard variables for the cleanups.  Call
cp_finish_decomp using RAII if not called with decomp != NULL
otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call.  Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-08-30 09:09:45.466623869 +0200
+++ gcc/cp/cp-tree.h 2024-08-30 11:00:39.861747964 +0200
@@ -7024,7 +7024,7 @@ extern void omp_declare_variant_finalize
  struct cp_decomp { tree decl; unsigned int count; };
  extern void cp_finish_decl(tree, tree, bool, tree, int, cp_decomp * = nullptr);
  extern tree lookup_decomp_type(tree);
-extern void cp_finish_decomp   (tree, cp_decomp *);
+extern bool cp_finish_decomp   (tree, cp_decomp *, bool = false);
  extern int cp_complete_array_type (tree *, tree, bool);
  extern int cp_complete_array_type_or_error(tree *, tree, bool, tsubst_flags_t);
  extern tree build_ptrmemfunc_type (tree);
--- gcc/cp/decl.cc.jj   2024-08-30 09:09:45.495623494 +0200
+++ gcc/cp/decl.cc  2024-08-30 11:11:51.554212784 +0200
@@ -103,7 +103,7 @@ static tree check_special_function_retur
  static tree push_cp_library_fn (enum tree_code, tree, int);
  static tree build_cp_library_fn (tree, enum tree_code, tree, int);
  static void store_parm_decls (tree);
-static void initialize_local_var (tree, tree);
+static void initialize_local_var (tree, tree, bool);
  static void expand_static_init (tree, tree);
  static location_t smallest_type_location (const cp_decl_specifier_seq*);
  static bool identify_goto (tree, location_t, const location_t *,
@@ -8050,14 +8050,13 @@ wrap_temporary_cleanups (tree init, tree
  /* Generate code to initialize DECL (a local variable).  */
  
  static void
-initialize_local_var (tree decl, tree init)
+initialize_local_var (tree decl, tree init, bool decomp)
  {
    tree type = TREE_TYPE (decl);
    tree cleanup;
    int already_used;
  
-  gcc_assert (VAR_P (decl)
- || TREE_CODE (decl) == RESULT_DECL);
+  gcc_assert (VAR_P (decl) || TREE_CODE (decl) == RESULT_DECL);
gcc_assert (!TREE_

[pushed] c++: cleanup coerce_template_template_parm

2024-09-04 Thread Marek Polacek

Split out from
https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662261.html
which was tested on x86_64-pc-linux-gnu.  I'm checking this in.

-- >8 --
This function could use some sprucing up.

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parm): Return bool instead of int.
---
 gcc/cp/pt.cc | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..1225c668e87 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7887,25 +7887,22 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
   return convert_from_reference (expr);
 }
 
-/* Subroutine of coerce_template_template_parms, which returns 1 if
-   PARM_PARM and ARG_PARM match using the rule for the template
-   parameters of template template parameters. Both PARM and ARG are
-   template parameters; the rest of the arguments are the same as for
-   coerce_template_template_parms.
- */
-static int
-coerce_template_template_parm (tree parm,
-  tree arg,
-  tsubst_flags_t complain,
-  tree in_decl,
-  tree outer_args)
+/* Subroutine of coerce_template_template_parms, which returns true if
+   PARM and ARG match using the rule for the template parameters of
+   template template parameters.  Both PARM and ARG are template parameters;
+   the rest of the arguments are the same as for
+   coerce_template_template_parms.  */
+
+static bool
+coerce_template_template_parm (tree parm, tree arg, tsubst_flags_t complain,
+  tree in_decl, tree outer_args)
 {
   if (arg == NULL_TREE || error_operand_p (arg)
   || parm == NULL_TREE || error_operand_p (parm))
-return 0;
+return false;
 
   if (TREE_CODE (arg) != TREE_CODE (parm))
-return 0;
+return false;
 
   switch (TREE_CODE (parm))
 {
@@ -7916,7 +7913,7 @@ coerce_template_template_parm (tree parm,
   {
if (!coerce_template_template_parms
(parm, arg, complain, in_decl, outer_args))
- return 0;
+ return false;
   }
   /* Fall through.  */
 
@@ -7924,7 +7921,7 @@ coerce_template_template_parm (tree parm,
   if (TEMPLATE_TYPE_PARAMETER_PACK (TREE_TYPE (arg))
  && !TEMPLATE_TYPE_PARAMETER_PACK (TREE_TYPE (parm)))
/* Argument is a parameter pack but parameter is not.  */
-   return 0;
+   return false;
   break;
 
 case PARM_DECL:
@@ -7940,13 +7937,13 @@ coerce_template_template_parm (tree parm,
  tree t = tsubst (TREE_TYPE (parm), outer_args, complain, in_decl);
  if (!uses_template_parms (t)
  && !same_type_p (t, TREE_TYPE (arg)))
-   return 0;
+   return false;
}
 
   if (TEMPLATE_PARM_PARAMETER_PACK (DECL_INITIAL (arg))
  && !TEMPLATE_PARM_PARAMETER_PACK (DECL_INITIAL (parm)))
/* Argument is a parameter pack but parameter is not.  */
-   return 0;
+   return false;
 
   break;
 
@@ -7954,7 +7951,7 @@ coerce_template_template_parm (tree parm,
   gcc_unreachable ();
 }
 
-  return 1;
+  return true;
 }
 
 /* Coerce template argument list ARGLIST for use with template

base-commit: c755c7a32590e2eef5a8b062b9756c1513910246
-- 
2.46.0

[PATCH] c++, v3: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-04 Thread Jakub Jelinek

On Wed, Sep 04, 2024 at 01:22:47PM -0400, Jason Merrill wrote:
> > @@ -8985,6 +9003,13 @@ cp_finish_decl (tree decl, tree init, bo
> > if (var_definition_p)
> > abstract_virtuals_error (decl, type);
> > +  if (decomp && !processing_template_decl)
> > +   {
> > + need_decomp_init = cp_finish_decomp (decl, decomp, true);
> > + if (!need_decomp_init)
> > +   decomp_cl.decomp = NULL;
> 
> 
> It seems like all tests of need_decomp_init could instead test
> decomp_cl.decomp.  And if we make that field a reference to the decomp
> parameter, we could refer to the parameter instead of ever referring to
> decomp_cl.

So like this (so far quickly tested on *decomp* dr2867*, full
bootstrap/regtest queued)?

2024-09-04  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here if !processing_template_decl, first with test_p.  For
automatic structured binding bases if the test cp_finish_decomp
returned true wrap the initialization together with what non-test
cp_finish_decomp emits with a CLEANUP_POINT_EXPR, and if there are
any CLEANUP_STMTs needed, emit them around the whole
CLEANUP_POINT_EXPR with guard variables for the cleanups.  Call
cp_finish_decomp using RAII if not called with decomp != NULL
otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call.  Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-08-30 09:09:45.466623869 +0200
+++ gcc/cp/cp-tree.h 2024-08-30 11:00:39.861747964 +0200
@@ -7024,7 +7024,7 @@ extern void omp_declare_variant_finalize
 struct cp_decomp { tree decl; unsigned int count; };
 extern void cp_finish_decl (tree, tree, bool, tree, int, cp_decomp * = nullptr);
 extern tree lookup_decomp_type (tree);
-extern void cp_finish_decomp   (tree, cp_decomp *);
+extern bool cp_finish_decomp   (tree, cp_decomp *, bool = false);
 extern int cp_complete_array_type  (tree *, tree, bool);
 extern int cp_complete_array_type_or_error (tree *, tree, bool, tsubst_flags_t);
 extern tree build_ptrmemfunc_type  (tree);
--- gcc/cp/decl.cc.jj   2024-08-30 09:09:45.495623494 +0200
+++ gcc/cp/decl.cc  2024-09-04 19:55:59.046491602 +0200
@@ -103,7 +103,7 @@ static tree check_special_function_retur
 static tree push_cp_library_fn (enum tree_code, tree, int);
 static tree build_cp_library_fn (tree, enum tree_code, tree, int);
 static void store_parm_decls (tree);
-static void initialize_local_var (tree, tree);
+static void initialize_local_var (tree, tree, bool);
 static void expand_static_init (tree, tree);
 static location_t smallest_type_location (const cp_decl_specifier_seq*);
 static bool identify_goto (tree, location_t, const location_t *,
@@ -8058,14 +8058,13 @@ wrap_temporary_cleanups (tree init, tree
 /* Generate code to initialize DECL (a local variable).  */

 static void
-initialize_local_var (tree decl, tree init)
+initialize_local_var (tree decl, tree init, bool decomp)
 {
   tree type = TREE_TYPE (decl);
   tree cleanup;
   int already_used;

-  gcc_assert (VAR_P (decl)
- || TREE_CODE (decl) == RESULT_DECL);
+  gcc_assert (VAR_P (decl) || TREE_CODE (decl) == RESULT_DECL);
   gcc_assert (!TREE_STATIC (decl));

   if (DECL_SIZE (decl) == NULL_TREE)
@@ -8085,7 +8084,8 @@ initialize_local_var (tree decl, tree in
 DECL_READ_P (decl) = 1;

   /* Generate a cleanup, if necessary.  */
-  cleanup = cxx_maybe_build_cleanup (decl, tf_warning_or_error);
+  cleanup = (decomp ? NULL_TREE
+: cxx_maybe_build_cleanup (decl, tf_warning_or_error));

   /* Perform the initialization.  */
   if (init)
@@ -8120,10 +8120,16 @@ initialize_local_var (tree decl, tree in

  gcc_assert (building_stmt_list_p ());
  saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
- current_stmt_tree ()->stmts_are_full_exprs_p = 1;
+ /* Avoid CLEA

[committed][RISC-V] Fix scan test output after recent path-splitting changes

2024-09-04 Thread Jeff Law



The recent path splitting changes from Andrew result in identifying more 
saturation idioms instead of just identifying an overflow check.  As a 
result many of the tests in the RISC-V port started failing a scan check 
on the .expand output.


As expected, identifying a saturation idiom is more helpful than 
identifying an overflow check and the resultant code is better based on 
my spot checks.


So the right thing to do is to expect more saturation intrinsics in the 
.expand output.
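
To make the distinction concrete, here is a small sketch contrasting a bare
unsigned overflow check with a saturating add built on the same comparison.
The function names are made up for illustration, and the sketch makes no
claim about which exact source forms a given GCC version recognizes as a
saturation idiom.

```cpp
#include <cassert>
#include <cstdint>

// Overflow check only: reports whether a + b wrapped around.
bool add_overflows (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;   // unsigned arithmetic wraps modulo 2^32
  return sum < a;         // a wrapped sum is smaller than either operand
}

// Saturation idiom: same check, but the result is clamped to the maximum.
uint32_t sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}
```

The second function is the kind of pattern a compiler can map to a single
saturating-add operation, which is why the expected counts in the scan
tests go up rather than down.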


I've verified this fixes the regressions for riscv32-elf and 
riscv64-elf.  Pushing to the trunk.


Jeff

commit 0455e85e4eda7d80bda967914d634fe5b71b7ffc
Author: Jeff Law 
Date:   Wed Sep 4 12:07:09 2024 -0600

[RISC-V] Fix scan test output after recent path-splitting changes

The recent path splitting changes from Andrew result in identifying more
saturation idioms instead of just identifying an overflow check.  As a result
many of the tests in the RISC-V port started failing a scan check on the
.expand output.

As expected, identifying a saturation idiom is more helpful than identifying an
overflow check and the resultant code is better based on my spot checks.

So the right thing to do is to expect more saturation intrinsics in the .expand
output.

I've verified this fixes the regressions for riscv32-elf and riscv64-elf.
Pushing to the trunk.

gcc/testsuite
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-13.c: Adjust
expected output.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-14.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-15.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-16.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-17.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-18.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-19.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add-20.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-1.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-9.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-10.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-13.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-14.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm_reconcile-15.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-9.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-10.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-11.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-12.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-13.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-14.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-15.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-16.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-17.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-18.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-19.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-20.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-21.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-22.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-23.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-24.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-33.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-34.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-35.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-36.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-37.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-38.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-39.c: Likewise.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub-40.c: Likewise.

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/bi

Re: [PATCH v3 0/5] aarch64: Fix intrinsic availability [PR112108]

2024-09-04 Thread Andrew Carlotti

On Mon, Aug 19, 2024 at 03:52:58PM +0100, Andrew Carlotti wrote:
> On Fri, Aug 16, 2024 at 07:17:24AM +, Kyrylo Tkachov wrote:
> > 
> > 
> > > On 15 Aug 2024, at 18:48, Andrew Carlotti  wrote:
> > > 
> > > On Thu, Aug 15, 2024 at 05:15:03PM +0100, Richard Sandiford wrote:
> > >> Andrew Carlotti  writes:
> > >>> This series of patches fixes issues with some intrinsics being incorrectly
> > >>> gated by global target options, instead of just using function-specific target
> > >>> options.  These issues have been present since the +tme, +memtag and +ls64
> > >>> intrinsics were introduced.
> > >>> 
> > >>> Compared to the previous version, this series no longer adds feature checks to
> > >>> the intrinsic expanders, and fixes various formatting issues pointed out by
> > >>> Richard Sandiford.
> > >>> 
> > >>> Additionally, the series now refactors the checking of TARGET_GENERAL_REGS_ONLY
> > >>> in check_required_extensions.  This refactor is included as a new patch (1/5)
> > >>> to make the diffs more readable.
> > >>> 
> > >>> 
> > >>> Bootstrapped and regression tested on aarch64.  Ok to merge?
> > >> 
> > >> LGTM, thanks.  OK if there are no other comments before the weekend.
> > >> 
> > >>> Also, ok for backports to affected versions (with regression tests)?
> > >> 
> > >> Hmm, it seems a bit invasive.  And if the GCC 11 tag in the PR is
> > >> anything to go by, it sounds like this is already unfixable behaviour
> > >> in at least one release series.
> > > 
> > > I think the impact is minimal prior to FMV support, so backporting is less
> > > important for older versions.  The series should backport cleanly to GCC 14,
> > > but would have conflicts in earlier versions, so I think it would be sensible to
> > > backport to GCC 14 and not further.
> > 
> > I think backporting only to GCC 14 is sensible. The intrinsics in question
> > tbh are or will be shipping hardware that I don’t expect will be used with
> > older compilers much to be worth the risk of adjusting the patches for
> > those branches.
> > Thanks,
> > Kyrill
> > 
> > 
> > > 
> > >> Let's see if anyone else has any opinions.
> > >> 
> > >> Richard
> > 
> 
> I've pushed this to master now (with a couple of Changelog fixes).  I'll
> backport it to GCC 14 next week if there are no issues.

Backported cleanly to GCC 14, and pushed after passing regression testing.
