Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richard and Richi.
> 
> This patch adds support for live vectorization by VEC_EXTRACT for LEN loop control.
> 
> Consider the following case:
> 
> #include 
> 
> #define EXTRACT_LAST(TYPE)                      \
>   TYPE __attribute__ ((noinline, noclone))      \
>   test_##TYPE (TYPE *x, int n, TYPE value)      \
>   {                                             \
>     TYPE last;                                  \
>     for (int j = 0; j < n; ++j)                 \
>       {                                         \
>         last = x[j];                            \
>         x[j] = last * value;                    \
>       }                                         \
>     return last;                                \
>   }
> 
> #define TEST_ALL(T)   \
>   T (uint8_t) \
> 
> TEST_ALL (EXTRACT_LAST)
> 
> ARM SVE IR:
> 
> Preheader:
>   max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });
> 
> Loop:
>   ...
>   # loop_mask_22 = PHI 
>   ...
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
>   ...
>   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
>   ...
> 
> Epilogue:
>   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);
> 
> For RVV, since we prefer LEN for loop control, this patch produces:
> 
> Loop:
>   ...
>   loop_len_22 = SELECT_VL;
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
>   ...
> 
> Epilogue:
>   _25 = .VEC_EXTRACT (loop_len_22 + bias - 1, vect_last_12.8_23);
> 
> Details of this approach:
> 
> 1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live
>    vectorization for LEN loop control.
> 
>    This function checks whether the target:
>      - Uses LEN as the loop control.
>      - Supports the VEC_EXTRACT optab.
> 
> 2. Step 2 - Record LEN for loop control if
>    'vect_can_vectorize_extract_last_with_len_p' is true.
> 
> 3. Step 3 - Generate VEC_EXTRACT (v, LEN + BIAS - 1).
> 
> The only difference between mask and len is that len uses the length
> generated by SELECT_VL and the VEC_EXTRACT pattern. The rest of the live
> vectorization is exactly the same as for ARM SVE.
> 
> Bootstrap and Regression on X86 passed.
> 
> Tested on ARM QEMU.
> 
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_can_vectorize_extract_last_with_len_p): New 
> function.
>   (vectorizable_live_operation): Add loop len control.
> 
> ---
>  gcc/tree-vect-loop.cc | 76 +++
>  1 file changed, 70 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bf8d677b584..809b73b966c 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8963,6 +8963,27 @@ vect_can_vectorize_without_simd_p (code_helper code)
> && vect_can_vectorize_without_simd_p (tree_code (code)));
>  }
>  
> +/* Return true if target supports extract last vectorization with LEN.  */
> +
> +static bool
> +vect_can_vectorize_extract_last_with_len_p (tree vectype)
> +{
> +  /* Return false if target doesn't support LEN in loop control.  */
> +  machine_mode vmode;
> +  machine_mode vec_mode = TYPE_MODE (vectype);
> +  if (!VECTOR_MODE_P (vec_mode))
> +return false;
> +  if (!get_len_load_store_mode (vec_mode, true).exists (&vmode)
> +  || !get_len_load_store_mode (vec_mode, false).exists (&vmode))
> +return false;

So this "hidden" bit in the end decides whether to ...

> +  /* Target need to support VEC_EXTRACT to extract the last active element.  */
> +  return convert_optab_handler (vec_extract_optab,
> + vec_mode,
> + TYPE_MODE (TREE_TYPE (vectype)))
> +  != CODE_FOR_nothing;
> +}
> +
>  /* Create vector init for vectorized iv.  */
>  static tree
>  vect_create_nonlinear_iv_init (gimple_seq* stmts, tree init_expr,
> @@ -10279,7 +10300,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
>if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
>   {
> if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
> -OPTIMIZE_FOR_SPEED))
> +OPTIMIZE_FOR_SPEED)
> +   && !vect_can_vectorize_extract_last_with_len_p (vectype))
>   {
> if (dump_enabled_p ())
>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -10308,9 +10330,14 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
> else
>   {
> gcc_assert (ncopies == 1 && !slp_node);
> -   vect_record_loop_mask (loop_vinfo,
> -  

[PATCH v1] RISC-V: Support RVV VFMADD rounding mode intrinsic API

2023-08-11 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds support for the rounding mode API for VFMADD,
as in the samples below.

* __riscv_vfmadd_vv_f32m1_rm
* __riscv_vfmadd_vv_f32m1_rm_m
* __riscv_vfmadd_vf_f32m1_rm
* __riscv_vfmadd_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmadd_frm): New class for vfmadd frm.
(vfmadd_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmadd_frm): New function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-madd.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 25 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 .../riscv/rvv/base/float-point-madd.c | 47 +++
 4 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-madd.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 60c6e16f6ae..7476cdc317d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -447,6 +447,29 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfmadd
+*/
+class vfmadd_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul_scalar (PLUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul (PLUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2211,6 +2234,7 @@ static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
 static CONSTEXPR const vfnmsac vfnmsac_obj;
 static CONSTEXPR const vfnmsac_frm vfnmsac_frm_obj;
 static CONSTEXPR const vfmadd vfmadd_obj;
+static CONSTEXPR const vfmadd_frm vfmadd_frm_obj;
 static CONSTEXPR const vfnmsub vfnmsub_obj;
 static CONSTEXPR const vfnmacc vfnmacc_obj;
 static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
@@ -2450,6 +2474,7 @@ BASE (vfmacc_frm)
 BASE (vfnmsac)
 BASE (vfnmsac_frm)
 BASE (vfmadd)
+BASE (vfmadd_frm)
 BASE (vfnmsub)
 BASE (vfnmacc)
 BASE (vfnmacc_frm)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 28eec2c3e99..5850ff0cf2e 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -164,6 +164,7 @@ extern const function_base *const vfmacc_frm;
 extern const function_base *const vfnmsac;
 extern const function_base *const vfnmsac_frm;
 extern const function_base *const vfmadd;
+extern const function_base *const vfmadd_frm;
 extern const function_base *const vfnmsub;
 extern const function_base *const vfnmacc;
 extern const function_base *const vfnmacc_frm;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index c84e052c1a9..04f3de1275c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -357,6 +357,8 @@ DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f_vvvv_ops)
 DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f_vvfv_ops)
 DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvvv_ops)
 DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvvv_ops)
+DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
 
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-madd.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-madd.c
new file mode 100644
index 000..00c9d002998
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-madd.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmadd_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmadd_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmadd_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfmadd_vf_f32m1_rm (vfloat32m1_t vd, float32_t o

Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> So how can we resolve the issue when a non-VL operation like
>> .VEC_EXTRACT is used for _len support?

Do you mean a non-VL extract-last operation? (Sorry, I am not sure whether I 
understand your question correctly.) 
If so, the answer is that for RVV we reuse the same flow as ARM SVE 
(the BIT_FIELD_REF approach); see the example below:

https://godbolt.org/z/cqrWrY8q4 

#define EXTRACT_LAST(TYPE)  \
  TYPE __attribute__ ((noinline, noclone))  \
  test_##TYPE (TYPE *x, int n, TYPE value)  \
  { \
TYPE last;  \
for (int j = 0; j < 64; ++j)\
  { \
last = x[j];\
x[j] = last * value;\
  } \
return last;\
  }

#define TEST_ALL(T) \
  T (uint8_t)   \

TEST_ALL (EXTRACT_LAST)

  vect_cst__22 = {value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D)};
  vect_last_11.6_3 = MEM  [(uint8_t *)x_10(D)];
  vect__4.7_23 = vect_last_11.6_3 * vect_cst__22;
  MEM  [(uint8_t *)x_10(D)] = vect__4.7_23;
  _21 = BIT_FIELD_REF ;

This approach works perfectly for both RVV and ARM SVE for non-VL and non-MASK 
EXTRACT_LAST operation.

>> So, why do we test for get_len_load_store_mode and not just for
>> VEC_EXTRACT?

Before answering this question, let me first explain how ARM SVE handles 
MASK EXTRACT_LAST.

Here is the example:
https://godbolt.org/z/8cTv1jqMb 

ARM SVE IR:

   [local count: 955630224]:
  # ivtmp_31 = PHI 

  # loop_mask_22 = PHI  -> For RVV, we want this to be loop_len = SELECT_VL;

  _7 = &MEM  [(uint8_t *)x_11(D) + ivtmp_31 * 1];
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ivtmp_32 = ivtmp_31 + POLY_INT_CST [16, 16];
  _1 = (unsigned int) ivtmp_32;

  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });

  if (next_mask_35 != { 0, ... })
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:

  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23); [tail call] -> Use the last mask generated in BB 4, so for RVV, we are using the loop_len.

So this patch tries to optimize the codegen by following the same flow as 
ARM SVE, but replacing 'loop_mask_22' (generated in BB 4) with 
'loop_len'.
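
To make the LEN flow concrete, here is a minimal scalar model in C of a 
length-controlled loop followed by the VEC_EXTRACT (v, LEN + BIAS - 1) 
epilogue. It is only a sketch under stated assumptions, not GCC code: VF, 
select_vl and test_len_loop are hypothetical names, select_vl is a simplified 
stand-in for SELECT_VL (the real operation may return other legal lengths), 
and BIAS is assumed to be 0.

```c
#include <stddef.h>
#include <stdint.h>

#define VF 16 /* hypothetical number of lanes per vector iteration */

/* Simplified stand-in for SELECT_VL: lanes to process this iteration.  */
static size_t select_vl(size_t remaining)
{
    return remaining < VF ? remaining : VF;
}

/* Scalar model of the length-controlled loop from the example.  */
static uint8_t test_len_loop(uint8_t *x, size_t n, uint8_t value)
{
    uint8_t vec[VF] = {0};    /* models vect_last: the last loaded vector */
    size_t len = 0;           /* models loop_len */
    const ptrdiff_t bias = 0; /* assumption: partial load/store bias is 0 */

    /* The original scalar loop leaves 'last' undefined for n == 0;
       return 0 here only to keep this sketch well-defined.  */
    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i += len) {
        len = select_vl(n - i);
        for (size_t l = 0; l < len; ++l) {
            vec[l] = x[i + l];                    /* length-controlled load  */
            x[i + l] = (uint8_t)(vec[l] * value); /* length-controlled store */
        }
    }

    /* Epilogue: VEC_EXTRACT (vec, len + bias - 1) picks the last active lane
       of the final iteration.  */
    return vec[(size_t)((ptrdiff_t)len + bias - 1)];
}
```

With n = 40 and VF = 16, the final iteration runs with len = 8, so the 
epilogue reads lane 7, which holds the original x[39].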

For ARM SVE, the vectorizer only checks whether the target supports the 
EXTRACT_LAST pattern. This pattern being supported means:

1. Target is using loop MASK as the partial vector loop control.
2. extract_last optab is enabled in the backend.

So for RVV, we check the equivalent conditions:

1. Target is using loop LEN as the partial vector loop control (I use 
get_len_load_store_mode to check whether target is using loop LEN as the 
partial vector loop control).
2. vec_extract optab is enabled in the backend.

An alternative approach is to add an EXTRACT_LAST_LEN internal function; 
then we would only need to check that, just as ARM SVE only checks EXTRACT_LAST.

>> can we double-check this on powerpc and s390?

Sure, I hope it can be beneficial to powerpc and s390.
Also, I think Richard's comments are very important, so I am going to wait for 
them.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-11 15:01
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; linkw; krebbel
Subject: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST 
vectorization

Re: [PATCH v1] RISC-V: Support RVV VFNMSAC rounding mode intrinsic API

2023-08-11 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-11 13:54
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFNMSAC rounding mode intrinsic API
From: Pan Li 
 
This patch adds support for the rounding mode API for VFNMSAC,
as in the samples below.
 
* __riscv_vfnmsac_vv_f32m1_rm
* __riscv_vfnmsac_vv_f32m1_rm_m
* __riscv_vfnmsac_vf_f32m1_rm
* __riscv_vfnmsac_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmsac_frm): New class for vfnmsac frm.
(vfnmsac_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmsac_frm): New function definition.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-nmsac.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-nmsac.c| 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 8d3970b28db..60c6e16f6ae 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -424,6 +424,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfnmsac
+*/
+class vfnmsac_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ true, code_for_pred_mul_neg_scalar (PLUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ true, code_for_pred_mul_neg (PLUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2186,6 +2209,7 @@ static CONSTEXPR const widen_binop_frm vfwmul_frm_obj;
static CONSTEXPR const vfmacc vfmacc_obj;
static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
static CONSTEXPR const vfnmsac vfnmsac_obj;
+static CONSTEXPR const vfnmsac_frm vfnmsac_frm_obj;
static CONSTEXPR const vfmadd vfmadd_obj;
static CONSTEXPR const vfnmsub vfnmsub_obj;
static CONSTEXPR const vfnmacc vfnmacc_obj;
@@ -2424,6 +2448,7 @@ BASE (vfwmul_frm)
BASE (vfmacc)
BASE (vfmacc_frm)
BASE (vfnmsac)
+BASE (vfnmsac_frm)
BASE (vfmadd)
BASE (vfnmsub)
BASE (vfnmacc)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index ca8a6dc1cc3..28eec2c3e99 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -162,6 +162,7 @@ extern const function_base *const vfwmul_frm;
extern const function_base *const vfmacc;
extern const function_base *const vfmacc_frm;
extern const function_base *const vfnmsac;
+extern const function_base *const vfnmsac_frm;
extern const function_base *const vfmadd;
extern const function_base *const vfnmsub;
extern const function_base *const vfnmacc;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 51a14e49075..c84e052c1a9 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -355,6 +355,8 @@ DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvvv_ops)
DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f_vvvv_ops)
DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvvv_ops)
+DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c
new file mode 100644
index 000..c3089234272
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfnmsac_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+ vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmsac_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfnmsac_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+ vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmsac_vv_f32m1_rm_m (mask, vd

Re: [PATCH v1] RISC-V: Support RVV VFMADD rounding mode intrinsic API

2023-08-11 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-11 15:17
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMADD rounding mode intrinsic API

Re: [PATCH] match.pd, v2: Implement missed optimization ((x ^ y) & z) | x -> (z & y) | x [PR109938]

2023-08-11 Thread Richard Biener via Gcc-patches
On Thu, Aug 10, 2023 at 5:43 PM Jakub Jelinek  wrote:
>
> Hi!
>
> On Thu, Aug 10, 2023 at 12:28:24PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > On Tue, Aug 08, 2023 at 03:18:51PM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > On Fri, Aug 4, 2023 at 11:49 PM Drew Ross via Gcc-patches
> > >  wrote:
> > > >
> > > > Adds a simplification for ((x ^ y) & z) | x to be folded into
> > > > (z & y) | x. Merges this simplification with ((x | y) & z) | x -> (z & 
> > > > y) | x
> > > > to prevent duplicate pattern. Tested successfully on x86_64 and x86 
> > > > targets.
> > >
> > > OK.
> >
> > Shouldn't
> >   (bit_ior:c (bit_and:cs (bit_ior:cs @0 @1) @2) @0)
> > be changed to
> >   (bit_ior:c (nop_convert1?:s
> >  (bit_and:cs (nop_convert2?:s (op:cs @0 @1)) @2)) @3)
> > rather than
> >   (bit_ior:c (nop_convert1? (bit_and:c (nop_convert2? (op:c @0 @1)) @2)) @3)
> > in the patch?
> > I mean the :s modifiers were there for a reason, if some of the
> > intermediates aren't a single use, then the simplification doesn't simplify
> > anything and can even make things larger.
>
> Here it is in patch form.  Bootstrapped/regtested on x86_64-linux and
> i686-linux, ok for trunk?

OK.
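
For readers following the thread, the Boolean identity behind the new 
pattern, ((x ^ y) & z) | x == (z & y) | x, together with the existing 
((x | y) & z) | x form, can be checked exhaustively on 8-bit values. The 
sketch below is only an illustration of the algebra in plain C; the function 
name identity_holds_u8 is hypothetical and it says nothing about how match.pd 
handles conversions or single-use constraints.

```c
#include <stdint.h>

/* Return 1 if both
     ((x ^ y) & z) | x == (z & y) | x
     ((x | y) & z) | x == (z & y) | x
   hold for every 8-bit x, y and z, and 0 otherwise.  */
static int identity_holds_u8(void)
{
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y)
            for (unsigned z = 0; z < 256; ++z) {
                uint8_t xor_form = (uint8_t)(((x ^ y) & z) | x);
                uint8_t ior_form = (uint8_t)(((x | y) & z) | x);
                uint8_t folded = (uint8_t)((z & y) | x);
                if (xor_form != folded || ior_form != folded)
                    return 0;
            }
    return 1;
}
```

Per bit, the reasoning is short: where x is 1 both sides are 1, and where x 
is 0 both x ^ y and x | y reduce to y, leaving y & z on each side.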

> 2023-08-10  Drew Ross  
> Jakub Jelinek  
>
> PR tree-optimization/109938
> * match.pd (((x ^ y) & z) | x -> (z & y) | x): New simplification.
>
> * gcc.c-torture/execute/pr109938.c: New test.
> * gcc.dg/tree-ssa/pr109938.c: New test.
>
> --- gcc/match.pd.jj 2023-08-10 09:26:19.390805079 +0200
> +++ gcc/match.pd 2023-08-10 13:33:17.959654775 +0200
> @@ -1972,10 +1972,14 @@ (define_operator_list SYNC_FETCH_AND_AND
>(if (bitwise_inverted_equal_p (@0, @2))
> (bitop @0 @1
>
> -/* ((x | y) & z) | x -> (z & y) | x */
> -(simplify
> -  (bit_ior:c (bit_and:cs (bit_ior:cs @0 @1) @2) @0)
> -  (bit_ior (bit_and @2 @1) @0))
> +/* ((x | y) & z) | x -> (z & y) | x
> +   ((x ^ y) & z) | x -> (z & y) | x  */
> +(for op (bit_ior bit_xor)
> + (simplify
> +  (bit_ior:c (nop_convert1?:s
> +  (bit_and:cs (nop_convert2?:s (op:cs @0 @1)) @2)) @3)
> +  (if (bitwise_equal_p (@0, @3))
> +   (convert (bit_ior (bit_and @1 (convert @2)) (convert @0)))))))
>
>  /* (x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2) */
>  (simplify
> --- gcc/testsuite/gcc.dg/tree-ssa/pr109938.c.jj 2023-08-10 13:22:19.513095403 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr109938.c 2023-08-10 13:35:24.428841774 +0200
> @@ -0,0 +1,125 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse2 -Wno-psabi" } */
> +
> +typedef int v4si __attribute__((vector_size(4 * sizeof(int))));
> +
> +/* Generic */
> +__attribute__((noipa)) int
> +t1 (int a, int b, int c)
> +{
> +  return ((a ^ c) & b) | a;
> +}
> +
> +__attribute__((noipa)) unsigned int
> +t2 (int a, unsigned int b, int c)
> +{
> +  return ((a ^ c) & b) | a;
> +}
> +
> +__attribute__((noipa)) unsigned long
> +t3 (unsigned long a, long b, unsigned long c)
> +{
> +  return ((a ^ c) & b) | a;
> +}
> +
> +__attribute__((noipa)) unsigned short
> +t4 (short a, unsigned short b, unsigned short c)
> +{
> +  return (unsigned short) ((a ^ c) & b) | a;
> +}
> +
> +__attribute__((noipa)) unsigned char
> +t5 (unsigned char a, signed char b, signed char c)
> +{
> +  return ((a ^ c) & b) | a;
> +}
> +
> +__attribute__((noipa)) long long
> +t6 (long long a, long long b, long long c)
> +{
> +  return ((a ^ c) & (unsigned long long) b) | a;
> +}
> +
> +/* Gimple */
> +__attribute__((noipa)) int
> +t7 (int a, int b, int c)
> +{
> +  int t1 = a ^ c;
> +  int t2 = t1 & b;
> +  int t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) int
> +t8 (int a, unsigned int b, unsigned int c)
> +{
> +  unsigned int t1 = a ^ c;
> +  int t2 = t1 & b;
> +  int t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned int
> +t9 (unsigned int a, unsigned int b, int c)
> +{
> +  unsigned int t1 = a ^ c;
> +  unsigned int t2 = t1 & b;
> +  unsigned int t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned long
> +t10 (unsigned long a, long b, unsigned long c)
> +{
> +  unsigned long t1 = a ^ c;
> +  unsigned long t2 = t1 & b;
> +  unsigned long t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned short
> +t11 (short a, unsigned short b, short c)
> +{
> +  short t1 = a ^ c;
> +  unsigned short t2 = t1 & b;
> +  unsigned short t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned char
> +t12 (signed char a, unsigned char b, signed char c)
> +{
> +  unsigned char t1 = a ^ c;
> +  unsigned char t2 = t1 & b;
> +  unsigned char t3 = t2 | a;
> +  return t3;
> +}
> +
> +__attribute__((noipa)) unsigned long long
> +t13 (unsigned long long a, long long b, unsigned long long c)
> +{
> +  long long t1 = a ^ c;
> +  long long t2 = t1 & b;
> +  unsigned long long t3 = t2 | a;
> +  return t3;
> +}
> +
> +/* Vectors */
> +__attribute__((noipa)) v4si
> +t14 (v4si a, v4si b, v4si c)
> +{
> +  return

RE: [PATCH v1] RISC-V: Support RVV VFMADD rounding mode intrinsic API

2023-08-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, August 11, 2023 3:30 PM
To: Li, Pan2 ; gcc-patches 
Cc: jeffreyalaw ; Li, Pan2 ; Wang, 
Yanzhang ; kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMADD rounding mode intrinsic API

LGTM


juzhe.zh...@rivai.ai

+++ b/gcc/testsuite/gcc.target/riscv/r

RE: [PATCH v1] RISC-V: Support RVV VFNMSAC rounding mode intrinsic API

2023-08-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai
Sent: Friday, August 11, 2023 3:30 PM
To: Li, Pan2; gcc-patches
Cc: jeffreyalaw; Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Support RVV VFNMSAC rounding mode intrinsic API

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-11 13:54
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFNMSAC rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch adds rounding mode API support for VFNMSAC,
covering the samples below.

* __riscv_vfnmsac_vv_f32m1_rm
* __riscv_vfnmsac_vv_f32m1_rm_m
* __riscv_vfnmsac_vf_f32m1_rm
* __riscv_vfnmsac_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmsac_frm): New class for vfnmsac frm.
(vfnmsac_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmsac_frm): New function definition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-nmsac.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-nmsac.c| 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 8d3970b28db..60c6e16f6ae 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -424,6 +424,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfnmsac
+*/
+class vfnmsac_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ true, code_for_pred_mul_neg_scalar (PLUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ true, code_for_pred_mul_neg (PLUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2186,6 +2209,7 @@ static CONSTEXPR const widen_binop_frm 
vfwmul_frm_obj;
static CONSTEXPR const vfmacc vfmacc_obj;
static CONSTEXPR const vfmacc_frm vfmacc_frm_obj;
static CONSTEXPR const vfnmsac vfnmsac_obj;
+static CONSTEXPR const vfnmsac_frm vfnmsac_frm_obj;
static CONSTEXPR const vfmadd vfmadd_obj;
static CONSTEXPR const vfnmsub vfnmsub_obj;
static CONSTEXPR const vfnmacc vfnmacc_obj;
@@ -2424,6 +2448,7 @@ BASE (vfwmul_frm)
BASE (vfmacc)
BASE (vfmacc_frm)
BASE (vfnmsac)
+BASE (vfnmsac_frm)
BASE (vfmadd)
BASE (vfnmsub)
BASE (vfnmacc)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index ca8a6dc1cc3..28eec2c3e99 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -162,6 +162,7 @@ extern const function_base *const vfwmul_frm;
extern const function_base *const vfmacc;
extern const function_base *const vfmacc_frm;
extern const function_base *const vfnmsac;
+extern const function_base *const vfnmsac_frm;
extern const function_base *const vfmadd;
extern const function_base *const vfnmsub;
extern const function_base *const vfnmacc;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 51a14e49075..c84e052c1a9 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -355,6 +355,8 @@ DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, 
f__ops)
DEF_RVV_FUNCTION (vfnmacc_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f__ops)
DEF_RVV_FUNCTION (vfmsac_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmsac.c
new file mode 100644
index 000..c3089234272
--- /dev/null
+++ b/gcc/testsuite/gcc.ta

Re: [PATCH] sso-string@gnu-versioned-namespace [PR83077]

2023-08-11 Thread Jonathan Wakely via Gcc-patches
On Fri, 11 Aug 2023, 06:44 François Dumont via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> I hadn't tested the most basic default configuration and it is failing,


I did wonder about that when you said which configurations you had tested :)




> I need some more time yet.
>

OK, no problem.

I actually have an idea for replacing the __cow_string hack with something
better, which I will try to work on next week. That might make things
simpler for you, as you won't need the __std_cow_string macro.





> François
>
>
> On 10/08/2023 07:13, François Dumont wrote:
> > Hi
> >
> > I've finally completed this work.
> >
> > This change will allow libstdc++ to be built without the dual ABI,
> > using only the cxx11 ABI. For the moment such a config is only accessible
> > through the --enable-symvers=gnu-versioned-namespace configuration.
> >
> > libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi [PR83077]
> >
> > Use cxx11 abi when activating versioned namespace mode.
> >
> > libstdcxx-v3/ChangeLog:
> >
> > PR libstdc++/83077
> > * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]:
> > Default to "new" libstdcxx abi.
> > * config/locale/dragonfly/monetary_members.cc
> > [!_GLIBCXX_USE_DUAL_ABI]: Define money_base
> > members.
> > * config/locale/generic/monetary_members.cc
> > [!_GLIBCXX_USE_DUAL_ABI]: Likewise.
> > * config/locale/gnu/monetary_members.cc
> > [!_GLIBCXX_USE_DUAL_ABI]: Likewise.
> > * config/locale/gnu/numeric_members.cc
> > [!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
> > * configure: Regenerate.
> > * include/bits/c++config
> > [_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11,
> > _GLIBCXX_BEGIN_NAMESPACE_CXX11):
> > Define empty.
> > [_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11,
> > _GLIBCXX_DEFAULT_ABI_TAG):
> > Likewise.
> > * include/bits/cow_string.h [!_GLIBCXX_USE_CXX11_ABI]:
> > Define a light version of COW
> > basic_string as __std_cow_string for use in stdexcept.
> > * include/std/stdexcept [_GLIBCXX_USE_CXX11_ABI]: Define
> > __cow_string.
> > (__cow_string(const char*)): New.
> > (__cow_string::c_str()): New.
> > * python/libstdcxx/v6/printers.py
> > (StdStringPrinter::__init__): Set self.new_string to True
> > when std::__8::basic_string type is found.
> > * src/Makefile.am
> > [ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define empty.
> > * src/Makefile.in: Regenerate.
> > * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
> > (dual_abi_sources): ...this. Also move cow-local_init.cc,
> > cxx11-hash_tr1.cc,
> > cxx11-ios_failure.cc entries to...
> > (sources): ...this.
> > (extra_string_inst_sources): Move cow-fstream-inst.cc,
> > cow-sstream-inst.cc, cow-string-inst.cc,
> > cow-string-io-inst.cc, cow-wstring-inst.cc,
> > cow-wstring-io-inst.cc, cxx11-locale-inst.cc,
> > cxx11-wlocale-inst.cc entries to...
> > (inst_sources): ...this.
> > * src/c++11/Makefile.in: Regenerate.
> > * src/c++11/cow-fstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cow-sstream-inst.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cow-stdexcept.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Include .
> > [_GLIBCXX_USE_DUAL_ABI ||
> > _GLIBCXX_USE_CXX11_ABI](__cow_string): Redefine before including
> > . Define
> > _GLIBCXX_DEFINE_STDEXCEPT_INSTANTIATIONS so that __cow_string definition
> > in  is skipped.
> > [_GLIBCXX_USE_CXX11_ABI]: Skip Transaction Memory TS
> > definitions.
> > * src/c++11/cow-string-inst.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cow-string-io-inst.cc
> > [_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
> > * src/c++11/cow-wstring-inst.cc [_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cow-wstring-io-inst.cc
> > [_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
> > * src/c++11/cxx11-hash_tr1.cc [!_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cxx11-ios_failure.cc
> > [!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
> > [!_GLIBCXX_USE_DUAL_ABI] (__ios_failure): Remove.
> > * src/c++11/cxx11-locale-inst.cc: Cleanup, just include
> > locale-inst.cc.
> > * src/c++11/cxx11-stdexcept.cc [!_GLIBCXX_USE_CXX11_ABI]:
> > Skip definitions.
> > * src/c++11/cxx11-wlocale-inst.cc
> > [!_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
> > * src/c++11/locale-inst-numeric.h
> > [!_GLIBCXX_USE_DUAL_ABI](std::

Re: [PATCH] MATCH: [PR110937/PR100798] (a ? ~b : b) should be optimized to b ^ -(a)

2023-08-11 Thread Christophe Lyon via Gcc-patches
On Thu, 10 Aug 2023 at 20:52, Andrew Pinski  wrote:

> On Thu, Aug 10, 2023 at 6:39 AM Christophe Lyon via Gcc-patches
>  wrote:
> >
> > Hi Andrew,
> >
> >
> > On Wed, 9 Aug 2023 at 21:20, Andrew Pinski via Gcc-patches <
> > gcc-patches@gcc.gnu.org> wrote:
> >
> > > This adds a simple match pattern for this case.
> > > I noticed it a couple of different places.
> > > One while I was looking at code generation of a parser and
> > > also while I was looking at locations where bitwise_inverted_equal_p
> > > should be used more.
> > >
> > > Committed as approved after bootstrapped and tested on x86_64-linux-gnu
> > > with no regressions.
> > >
> > > PR tree-optimization/110937
> > > PR tree-optimization/100798
> > >
> > > gcc/ChangeLog:
> > >
> > > * match.pd (`a ? ~b : b`): Handle this
> > > case.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.dg/tree-ssa/bool-14.c: New test.
> > > * gcc.dg/tree-ssa/bool-15.c: New test.
> > > * gcc.dg/tree-ssa/phi-opt-33.c: New test.
> > > * gcc.dg/tree-ssa/20030709-2.c: Update testcase
> > > so `a ? -1 : 0` is not used to hit the match
> > > pattern.
> > >
> >
> > Our CI noticed that your patch introduced regressions as follows on
> > aarch64:
> >
> >  Running gcc:gcc.target/aarch64/aarch64.exp ...
> > FAIL: gcc.target/aarch64/cond_op_imm_1.c scan-assembler csinv\tw[0-9]*.*
> > FAIL: gcc.target/aarch64/cond_op_imm_1.c scan-assembler csinv\tx[0-9]*.*
> >
> > Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-not \\tmov\\tz
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tneg\\tz[0-9]+\\.b, p[0-7]/m, 3
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tneg\\tz[0-9]+\\.h, p[0-7]/m, 2
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tneg\\tz[0-9]+\\.s, p[0-7]/m, 1
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tnot\\tz[0-9]+\\.b, p[0-7]/m, 3
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tnot\\tz[0-9]+\\.h, p[0-7]/m, 2
> > FAIL: gcc.target/aarch64/sve/cond_unary_5.c scan-assembler-times
> > \\tnot\\tz[0-9]+\\.s, p[0-7]/m, 1
> >
> > Hopefully you'll just need to update the testcases (I didn't check
> > manually, I think you can easily reproduce this on aarch64?)
>
> I have a few ideas of how to fix this properly inside isel without
> changing the testcases. I will start working on that starting
> tomorrow.
> In the meantime can you file a bug report? So we don't lose track of
> the regression?
>
> Hi Andrew,

Sure, I've just filed:  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110986

Thanks,

Christophe
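
For readers following the thread, the identity behind the match.pd pattern under discussion can be checked in plain C. This is only a sketch of the arithmetic, not of the GIMPLE transform itself; the function names select_form, xor_form and check_identity are made up for illustration, and the unsigned type is chosen so that negation and complement are well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* The source-level form that the pattern matches: a ? ~b : b.  */
static unsigned
select_form (bool a, unsigned b)
{
  return a ? ~b : b;
}

/* The rewritten form: b ^ -(a).  */
static unsigned
xor_form (bool a, unsigned b)
{
  /* -(unsigned) a is 0 when a is false and all-ones when a is true,
     so the XOR either leaves b alone or flips every bit.  */
  return b ^ -(unsigned) a;
}

/* Compare both forms on a few sample values for both values of a.  */
static void
check_identity (void)
{
  static const unsigned samples[] = { 0u, 1u, 0x1234u, 0x80000000u, ~0u };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
    for (int a = 0; a <= 1; ++a)
      assert (select_form (a, samples[i]) == xor_form (a, samples[i]));
}
```

Calling check_identity () from a test driver exercises both branches. Whether the XOR form or the conditional form yields better code is target dependent, which is the point of the aarch64 csinv failures reported above.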

> Thanks,
> Andrew
>
> >
> > Thanks,
> >
> > Christophe
> >
> >
> >
> >
> > > ---
> > >  gcc/match.pd   | 14 ++
> > >  gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c |  5 +++--
> > >  gcc/testsuite/gcc.dg/tree-ssa/bool-14.c| 15 +++
> > >  gcc/testsuite/gcc.dg/tree-ssa/bool-15.c| 18 ++
> > >  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c | 13 +
> > >  5 files changed, 63 insertions(+), 2 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-14.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bool-15.c
> > >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-33.c
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 9b4819e5be7..fc630b63563 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -6460,6 +6460,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >(if (cmp == NE_EXPR)
> > > { constant_boolean_node (true, type); })))
> > >
> > > +#if GIMPLE
> > > +/* a?~t:t -> (-(a))^t */
> > > +(simplify
> > > + (cond @0 @1 @2)
> > > + (if (INTEGRAL_TYPE_P (type)
> > > +  && bitwise_inverted_equal_p (@1, @2))
> > > +  (with {
> > > +auto prec = TYPE_PRECISION (type);
> > > +auto unsign = TYPE_UNSIGNED (type);
> > > +tree inttype = build_nonstandard_integer_type (prec, unsign);
> > > +   }
> > > +   (convert (bit_xor (negate (convert:inttype @0)) (convert:inttype @2))))))
> > > +#endif
> > > +
> > >  /* Simplify pointer equality compares using PTA.  */
> > >  (for neeq (ne eq)
> > >   (simplify
> > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> > > b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> > > index 5009cd69cfe..78938f919d4 100644
> > > --- a/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
> > > @@ -29,15 +29,16 @@ union tree_node
> > >  };
> > >  int make_decl_rtl (tree, int);
> > >  void *
> > > -get_alias_set (t)
> > > +get_alias_set (t, t1)
> > >   tree t;
> > > + void *t1;
> > >  {
> > >long set;
> > >if (t->decl.rtl)
> > >  return (t->decl.rtl->fld[1].rtmem
> > > ? 0
> > > : (((t->decl.rtl ? t->decl.rtl: (make_decl_rt

[PATCH] c, v3: Add stdckdint.h header for C23

2023-08-11 Thread Jakub Jelinek via Gcc-patches
On Thu, Aug 10, 2023 at 10:31:12PM +, Joseph Myers wrote:
> > The following patch (on top of the stdckdint.h patch and _BitInt patch
> > series) adds a test for _BitInt diagnostics of ckd_{add,sub,mul} macros.
> 
> I remain unconvinced that diagnosing use with types where it's clear what 
> the right semantics for the operation are is a good idea (given, in the 
> _BitInt case, that you've already implemented the built-in functions for 
> _BitInt types).  (Diagnosing for bool results *is* a good idea, since it's 
> not clear what the semantics should be.  Likewise for enums with fixed 
> underlying type bool, whether or not it's diagnosed for other enums.)

Ok, here is an updated patch without those diagnostics.
All that is diagnosed is when the result is bool or enum (any kind).  Even for
enums without bool underlying type, the question is what exactly it would
mean: checking whether the result fits into the range of the underlying type,
or the range between the smallest and largest enumerator, or the signed/unsigned
range with minimum precision to represent the smallest/largest enumerator, or
only whether the result matches some enumerator.  So I think punting on
those, as we have done for years for __builtin_*_overflow, is ok.
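
To make the semantics concrete, here is a small sketch of how the macros behave when the operand and result types differ.  It assumes GCC or Clang (for the __builtin_*_overflow built-ins); the my_ckd_* names, checked_sum and ckd_demo are hypothetical stand-ins so the snippet does not depend on the new header being installed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local stand-ins mirroring the macro shape used in the patch.  The my_
   prefix is hypothetical, chosen to avoid clashing with a real header.  */
#define my_ckd_add(r, a, b) ((_Bool) __builtin_add_overflow (a, b, r))
#define my_ckd_sub(r, a, b) ((_Bool) __builtin_sub_overflow (a, b, r))
#define my_ckd_mul(r, a, b) ((_Bool) __builtin_mul_overflow (a, b, r))

/* Returns true when a + b fits in int; the sum is stored in *out.  */
static bool
checked_sum (int a, int b, int *out)
{
  return !my_ckd_add (out, a, b);
}

static void
ckd_demo (void)
{
  int r;
  signed char c;

  assert (checked_sum (1, 2, &r) && r == 3);
  assert (!checked_sum (INT_MAX, 1, &r));

  /* The operation is evaluated as if in infinite precision and only then
     checked against the result type, so unsigned operands can produce a
     negative result without reporting overflow.  */
  assert (my_ckd_sub (&c, 8UL, 12ULL) == 0 && c == -4);

  /* 200 does not fit in signed char, so overflow is reported.  */
  assert (my_ckd_mul (&c, 100, 2) == 1);
}
```

The macros return true on overflow, which is why checked_sum negates the result to get a "success" flag.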

2023-08-11  Jakub Jelinek  

* Makefile.in (USER_H): Add stdckdint.h.
* ginclude/stdckdint.h: New file.

* gcc.dg/stdckdint-1.c: New test.
* gcc.dg/stdckdint-2.c: New test.

--- gcc/Makefile.in.jj  2023-08-10 17:23:55.502325592 +0200
+++ gcc/Makefile.in 2023-08-11 09:37:35.968944530 +0200
@@ -469,6 +469,7 @@ USER_H = $(srcdir)/ginclude/float.h \
 $(srcdir)/ginclude/stdnoreturn.h \
 $(srcdir)/ginclude/stdalign.h \
 $(srcdir)/ginclude/stdatomic.h \
+$(srcdir)/ginclude/stdckdint.h \
 $(EXTRA_HEADERS)
 
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
--- gcc/ginclude/stdckdint.h.jj 2023-08-11 09:37:35.968944530 +0200
+++ gcc/ginclude/stdckdint.h2023-08-11 09:39:50.383054196 +0200
@@ -0,0 +1,40 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+/* ISO C23: 7.20 Checked Integer Arithmetic <stdckdint.h>.  */
+
+#ifndef _STDCKDINT_H
+#define _STDCKDINT_H
+
+#define __STDC_VERSION_STDCKDINT_H__ 202311L
+
+#define ckd_add(r, a, b) ((_Bool) __builtin_add_overflow (a, b, r))
+#define ckd_sub(r, a, b) ((_Bool) __builtin_sub_overflow (a, b, r))
+#define ckd_mul(r, a, b) ((_Bool) __builtin_mul_overflow (a, b, r))
+
+/* Allow for the C library to add its part to the header.  */
+#if !defined (_LIBC_STDCKDINT_H) && __has_include_next (<stdckdint.h>)
+# include_next <stdckdint.h>
+#endif
+
+#endif /* stdckdint.h */
--- gcc/testsuite/gcc.dg/stdckdint-1.c.jj   2023-08-11 09:37:35.968944530 +0200
+++ gcc/testsuite/gcc.dg/stdckdint-1.c  2023-08-11 09:37:35.968944530 +0200
@@ -0,0 +1,61 @@
+/* Test C23 Checked Integer Arithmetic macros in <stdckdint.h>.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x" } */
+
+#include <stdckdint.h>
+
+#if __STDC_VERSION_STDCKDINT_H__ != 202311L
+# error __STDC_VERSION_STDCKDINT_H__ not defined to 202311L
+#endif
+
+extern void abort (void);
+
+int
+main ()
+{
+  unsigned int a;
+  if (ckd_add (&a, 1, 2) || a != 3)
+abort ();
+  if (ckd_add (&a, ~2U, 2) || a != ~0U)
+abort ();
+  if (!ckd_add (&a, ~2U, 4) || a != 1)
+abort ();
+  if (ckd_sub (&a, 42, 2) || a != 40)
+abort ();
+  if (!ckd_sub (&a, 11, ~0ULL) || a != 12)
+abort ();
+  if (ckd_mul (&a, 42, 16U) || a != 672)
+abort ();
+  if (ckd_mul (&a, ~0UL, 0) || a != 0)
+abort ();
+  if (ckd_mul (&a, 1, ~0U) || a != ~0U)
+abort ();
+  if (ckd_mul (&a, ~0UL, 1) != (~0UL > ~0U) || a != ~0U)
+abort ();
+  static_assert (_Generic (ckd_add (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_sub (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_mul (&a, 1, 1), bool: 1, default: 0));
+  signed char b;
+  if (ckd_add (&b, 8, 12) || b != 20)
+abort ();
+  if (ckd_sub (&b, 8UL, 12ULL) || b != -4)
+abort ();
+  if (ckd_mul (&b, 2, 3) || b != 6)
+abort ();
+  unsigned char c;
+  if (ckd_add (&c, 8, 12) || c != 20)
+abort ();
+  if (ckd_sub

[PATCH] dg-cmp-results: Escape slash from variant argument

2023-08-11 Thread Mikael Morin via Gcc-patches
Hello,

I ran into a bug recently, running dg-cmp-results.sh with variant
unix/-m32.  This fixes it.

OK for master?

-- >8 --

Escape slash characters in $header variable (coming from the variant
argument).  This avoids runs with say "unix/-m32" as variant resulting
in sed errors "unknown command: -".
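
The failure comes from sed treating the slash inside the variant name as the end of the /regex/ address, so whatever follows it is parsed as a command.  Wrapping each slash in a bracket expression keeps it out of the delimiter role.  As an illustration only, the same rewrite that the sed expression 's:/:[/]:g' performs can be sketched in C; the function name escape_slashes and its buffer interface are hypothetical and not part of the script.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy SRC into DST, replacing every '/' with "[/]", the same rewrite
   as sed 's:/:[/]:g'.  Returns 0 on success, -1 if DST is too small.  */
static int
escape_slashes (const char *src, char *dst, size_t dst_size)
{
  size_t used = 0;

  if (dst_size == 0)
    return -1;

  for (; *src != '\0'; ++src)
    {
      size_t need = (*src == '/') ? 3 : 1;

      /* Always keep one byte free for the terminating NUL.  */
      if (used + need + 1 > dst_size)
        return -1;

      if (*src == '/')
        memcpy (dst + used, "[/]", 3);
      else
        dst[used] = *src;
      used += need;
    }

  dst[used] = '\0';
  return 0;
}
```

With the input "unix/-m32" this produces "unix[/]-m32", which is safe to splice between the slashes of a sed address.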

contrib/ChangeLog:

* dg-cmp-results.sh: Escape slashes in $header to a new
variable.  Use the new variable in sed command.
---
 contrib/dg-cmp-results.sh | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 33e0605dc50..7d17772dc75 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -90,8 +90,11 @@ sed $E -e '/^[[:space:]]+===/,$d' $OFILE
 echo "Newer log file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
+# Escape occurrences of / in $header before passing through sed.
+header_pattern=`echo "$header" | sed $E -e 's:/:[/]:g'`
+
 # Create a temporary file from the old file's interesting section.
-sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
+sed $E -e "/$header_pattern/,/^[[:space:]]+===.*Summary ===/!d" \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \
@@ -101,7 +104,7 @@ sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
   >$TMPDIR/o$$-$OBASE
 
 # Create a temporary file from the new file's interesting section.
-sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
+sed $E -e "/$header_pattern/,/^[[:space:]]+===.*Summary ===/!d" \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \
-- 
2.40.1



[PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API

2023-08-11 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds rounding mode API support for VFNMADD,
covering the samples below.

* __riscv_vfnmadd_vv_f32m1_rm
* __riscv_vfnmadd_vv_f32m1_rm_m
* __riscv_vfnmadd_vf_f32m1_rm
* __riscv_vfnmadd_vf_f32m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmadd_frm): New class for vfnmadd frm.
(vfnmadd_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmadd_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-nmadd.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 25 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 .../riscv/rvv/base/float-point-nmadd.c| 47 +++
 4 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 7476cdc317d..b085ba4f52d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -470,6 +470,29 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfnmadd
+*/
+class vfnmadd_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul_neg_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul_neg (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2241,6 +2264,7 @@ static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
 static CONSTEXPR const vfmsac vfmsac_obj;
 static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
 static CONSTEXPR const vfnmadd vfnmadd_obj;
+static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
 static CONSTEXPR const vfmsub vfmsub_obj;
 static CONSTEXPR const vfwmacc vfwmacc_obj;
 static CONSTEXPR const vfwnmacc vfwnmacc_obj;
@@ -2481,6 +2505,7 @@ BASE (vfnmacc_frm)
 BASE (vfmsac)
 BASE (vfmsac_frm)
 BASE (vfnmadd)
+BASE (vfnmadd_frm)
 BASE (vfmsub)
 BASE (vfwmacc)
 BASE (vfwnmacc)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5850ff0cf2e..4ade0ace7b2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -171,6 +171,7 @@ extern const function_base *const vfnmacc_frm;
 extern const function_base *const vfmsac;
 extern const function_base *const vfmsac_frm;
 extern const function_base *const vfnmadd;
+extern const function_base *const vfnmadd_frm;
 extern const function_base *const vfmsub;
 extern const function_base *const vfwmacc;
 extern const function_base *const vfwnmacc;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 04f3de1275c..e9b16f99180 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -359,6 +359,8 @@ DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, 
f__ops)
 DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
 DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f__ops)
 DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
 
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
new file mode 100644
index 000..9332617641b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfnmadd_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+   vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmadd_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfnmadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+   vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmadd_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfnmadd_vf_f32m1_rm (vfloat32

Re: [PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API

2023-08-11 Thread juzhe.zh...@rivai.ai
LGTM 



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-11 16:11
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API
From: Pan Li 
 
This patch adds rounding mode API support for VFNMADD,
covering the samples below.
 
* __riscv_vfnmadd_vv_f32m1_rm
* __riscv_vfnmadd_vv_f32m1_rm_m
* __riscv_vfnmadd_vf_f32m1_rm
* __riscv_vfnmadd_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmadd_frm): New class for vfnmadd frm.
(vfnmadd_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmadd_frm): New function declaration.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-nmadd.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-nmadd.c| 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 7476cdc317d..b085ba4f52d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -470,6 +470,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfnmadd
+*/
+class vfnmadd_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_neg_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_neg (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2241,6 +2264,7 @@ static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
static CONSTEXPR const vfmsac vfmsac_obj;
static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
static CONSTEXPR const vfnmadd vfnmadd_obj;
+static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
static CONSTEXPR const vfmsub vfmsub_obj;
static CONSTEXPR const vfwmacc vfwmacc_obj;
static CONSTEXPR const vfwnmacc vfwnmacc_obj;
@@ -2481,6 +2505,7 @@ BASE (vfnmacc_frm)
BASE (vfmsac)
BASE (vfmsac_frm)
BASE (vfnmadd)
+BASE (vfnmadd_frm)
BASE (vfmsub)
BASE (vfwmacc)
BASE (vfwnmacc)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5850ff0cf2e..4ade0ace7b2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -171,6 +171,7 @@ extern const function_base *const vfnmacc_frm;
extern const function_base *const vfmsac;
extern const function_base *const vfmsac_frm;
extern const function_base *const vfnmadd;
+extern const function_base *const vfnmadd_frm;
extern const function_base *const vfmsub;
extern const function_base *const vfwmacc;
extern const function_base *const vfwnmacc;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 04f3de1275c..e9b16f99180 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -359,6 +359,8 @@ DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, 
f__ops)
DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f__ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
new file mode 100644
index 000..9332617641b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfnmadd_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+ vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmadd_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfnmadd_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+ vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfnmadd_vv_f32m1_rm_m (mask,

RE: [PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API

2023-08-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, August 11, 2023 4:12 PM
To: Li, Pan2 ; gcc-patches 
Cc: jeffreyalaw ; Li, Pan2 ; Wang, 
Yanzhang ; kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-11 16:11
To: gcc-patches
CC: juzhe.zhong; 
jeffreyalaw; pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFNMADD rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch adds support for the rounding mode API for
VFNMADD, as in the samples below.

* __riscv_vfnmadd_vv_f32m1_rm
* __riscv_vfnmadd_vv_f32m1_rm_m
* __riscv_vfnmadd_vf_f32m1_rm
* __riscv_vfnmadd_vf_f32m1_rm_m
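
For readers unfamiliar with the instruction, here is a minimal scalar sketch
of what the vv and vf forms compute, assuming the RVV definition
vd[i] = -(vs1[i] * vd[i]) - vs2[i]. The model_* names are hypothetical and not
part of the patch. The sketch ignores the rounding mode operand and is not
fused (the product and the sum are rounded separately), so it only illustrates
the operand order:

```c
#include <stddef.h>

/* Hypothetical scalar model of vfnmadd.vv: vd[i] = -(vs1[i] * vd[i]) - vs2[i].
   Not fused and rounding-mode agnostic; for illustration only.  */
void
model_vfnmadd_vv (float *vd, const float *vs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(vs1[i] * vd[i]) - vs2[i];
}

/* Hypothetical scalar model of vfnmadd.vf: the multiplier is one scalar
   that is applied to every element.  */
void
model_vfnmadd_vf (float *vd, float rs1, const float *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = -(rs1 * vd[i]) - vs2[i];
}
```

In the _rm intrinsics the extra integer argument placed before vl selects the
rounding mode; the new test passes 0 there.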

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfnmadd_frm): New class for vfnmadd frm.
(vfnmadd_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfnmadd_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-nmadd.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-nmadd.c| 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 7476cdc317d..b085ba4f52d 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -470,6 +470,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfnmadd
+*/
+class vfnmadd_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_neg_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_neg (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2241,6 +2264,7 @@ static CONSTEXPR const vfnmacc_frm vfnmacc_frm_obj;
static CONSTEXPR const vfmsac vfmsac_obj;
static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
static CONSTEXPR const vfnmadd vfnmadd_obj;
+static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
static CONSTEXPR const vfmsub vfmsub_obj;
static CONSTEXPR const vfwmacc vfwmacc_obj;
static CONSTEXPR const vfwnmacc vfwnmacc_obj;
@@ -2481,6 +2505,7 @@ BASE (vfnmacc_frm)
BASE (vfmsac)
BASE (vfmsac_frm)
BASE (vfnmadd)
+BASE (vfnmadd_frm)
BASE (vfmsub)
BASE (vfwmacc)
BASE (vfwnmacc)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 5850ff0cf2e..4ade0ace7b2 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -171,6 +171,7 @@ extern const function_base *const vfnmacc_frm;
extern const function_base *const vfmsac;
extern const function_base *const vfmsac_frm;
extern const function_base *const vfnmadd;
+extern const function_base *const vfnmadd_frm;
extern const function_base *const vfmsub;
extern const function_base *const vfwmacc;
extern const function_base *const vfwnmacc;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 04f3de1275c..e9b16f99180 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -359,6 +359,8 @@ DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, 
f__ops)
DEF_RVV_FUNCTION (vfnmsac_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f__ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-nmadd.c
new file mode 100644
index 000..9332617641b
--- /dev/null
+++ b/gcc/testsuite/gcc.

[PATCH] RISC-V: Fix vec_series expander[PR110985]

2023-08-11 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985

PR target/110985
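
As background, a scalar sketch of the value the expander is meant to build,
following the step comments in the diff (VECT_IV = BASE + I * STEP). The
function name and the int64_t element type are assumptions made for
illustration; the real expander emits RVV instructions rather than looping:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of expand_vec_series: dest[i] = base + i * step.
   Mirrors the three steps in the patch comments.  */
void
model_vec_series (int64_t *dest, size_t nunits, int64_t base, int64_t step)
{
  /* Step 1: vid.v, i.e. {0, 1, 2, ...}.  */
  for (size_t i = 0; i < nunits; i++)
    dest[i] = (int64_t) i;

  /* Special case {nunits - 1, nunits - 2, ..., 0}: one reverse subtract.  */
  if (step == -1 && base == (int64_t) nunits - 1)
    {
      for (size_t i = 0; i < nunits; i++)
        dest[i] = base - dest[i];
      return;
    }

  /* Step 2: I * STEP.  Nothing for 1, a shift for powers of 2, otherwise
     a multiply.  */
  if (step != 1)
    {
      int is_pow2 = step > 0 && (step & (step - 1)) == 0;
      int shift = 0;
      while (is_pow2 && ((int64_t) 1 << shift) != step)
        shift++;
      for (size_t i = 0; i < nunits; i++)
        dest[i] = is_pow2 ? (int64_t) ((uint64_t) dest[i] << shift)
                          : dest[i] * step;
    }

  /* Step 3: BASE + I * STEP.  Nothing to add when BASE is 0.  */
  if (base != 0)
    for (size_t i = 0; i < nunits; i++)
      dest[i] += base;
}
```

The shift and multiply branches give the same values; the split only reflects
which instruction the patch comments say is chosen.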

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Refactor the expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 74 +++
 .../riscv/rvv/autovec/vls-vlmax/pr110985.c| 90 +++
 2 files changed, 129 insertions(+), 35 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a3062c90618..5f9b296c92e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1309,6 +1309,7 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   machine_mode mode = GET_MODE (dest);
   poly_int64 nunits_m1 = GET_MODE_NUNITS (mode) - 1;
   poly_int64 value;
+  rtx result = register_operand (dest, mode) ? dest : gen_reg_rtx (mode);
 
   /* VECT_IV = BASE + I * STEP.  */
 
@@ -1317,15 +1318,10 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   rtx op[] = {vid};
   emit_vlmax_insn (code_for_pred_series (mode), RVV_MISC_OP, op);
 
-  /* Step 2: Generate I * STEP.
- - STEP is 1, we don't emit any instructions.
- - STEP is power of 2, we use vsll.vi/vsll.vx.
- - STEP is non-power of 2, we use vmul.vx.  */
   rtx step_adj;
-  if (rtx_equal_p (step, const1_rtx))
-step_adj = vid;
-  else if (rtx_equal_p (step, constm1_rtx) && poly_int_rtx_p (base, &value)
-  && known_eq (nunits_m1, value))
+  if (rtx_equal_p (step, constm1_rtx)
+  && poly_int_rtx_p (base, &value)
+  && known_eq (nunits_m1, value))
 {
   /* Special case:
   {nunits - 1, nunits - 2, ... , 0}.
@@ -1334,46 +1330,54 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 Code sequence:
   vid.v v
   vrsub nunits - 1, v.  */
-  rtx ops[] = {dest, vid, gen_int_mode (nunits_m1, GET_MODE_INNER (mode))};
+  rtx ops[]
+   = {result, vid, gen_int_mode (nunits_m1, GET_MODE_INNER (mode))};
   insn_code icode = code_for_pred_sub_reverse_scalar (mode);
   emit_vlmax_insn (icode, RVV_BINOP, ops);
-  return;
 }
   else
 {
-  step_adj = gen_reg_rtx (mode);
-  if (CONST_INT_P (step) && pow2p_hwi (INTVAL (step)))
+  /* Step 2: Generate I * STEP.
+- STEP is 1, we don't emit any instructions.
+- STEP is power of 2, we use vsll.vi/vsll.vx.
+- STEP is non-power of 2, we use vmul.vx.  */
+  if (rtx_equal_p (step, const1_rtx))
+   step_adj = vid;
+  else
{
- /* Emit logical left shift operation.  */
- int shift = exact_log2 (INTVAL (step));
- rtx shift_amount = gen_int_mode (shift, Pmode);
- insn_code icode = code_for_pred_scalar (ASHIFT, mode);
- rtx ops[] = {step_adj, vid, shift_amount};
- emit_vlmax_insn (icode, RVV_BINOP, ops);
+ step_adj = gen_reg_rtx (mode);
+ if (CONST_INT_P (step) && pow2p_hwi (INTVAL (step)))
+   {
+ /* Emit logical left shift operation.  */
+ int shift = exact_log2 (INTVAL (step));
+ rtx shift_amount = gen_int_mode (shift, Pmode);
+ insn_code icode = code_for_pred_scalar (ASHIFT, mode);
+ rtx ops[] = {step_adj, vid, shift_amount};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
+   }
+ else
+   {
+ insn_code icode = code_for_pred_scalar (MULT, mode);
+ rtx ops[] = {step_adj, vid, step};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
+   }
}
+
+  /* Step 3: Generate BASE + I * STEP.
+ - BASE is 0, use result of vid.
+ - BASE is not 0, we use vadd.vx/vadd.vi.  */
+  if (rtx_equal_p (base, const0_rtx))
+   emit_move_insn (result, step_adj);
   else
{
- insn_code icode = code_for_pred_scalar (MULT, mode);
- rtx ops[] = {step_adj, vid, step};
+ insn_code icode = code_for_pred_scalar (PLUS, mode);
+ rtx ops[] = {result, step_adj, base};
  emit_vlmax_insn (icode, RVV_BINOP, ops);
}
 }
 
-  /* Step 3: Generate BASE + I * STEP.
- - BASE is 0, use result of vid.
- - BASE is not 0, we use vadd.vx/vadd.vi.  */
-  if (rtx_equal_p (base, const0_rtx))
-{
-  emit_move_insn (dest, step_adj);
-}
-  else
-{
-  rtx result = gen_reg_rtx (mode);
-  insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[] = {result, step_adj, base};
-  emit_vlmax_insn (icode, RVV_BINOP, ops);
-  emit_move_insn (dest, result);
-}
+  if (result != dest)
+emit_move_insn (dest, result);
 }
 
 static void
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c
new file mode 100644
index 000..7710654c1bb
--- /dev/null

Re: [PATCH] c, c++, v2: Accept __builtin_classify_type (typename)

2023-08-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 11, 2023 at 01:13:32AM +0200, Jakub Jelinek wrote:
> Looking at the first uses of the builtin back in 90s in va*.h, it certainly
> relied on array/function decay there (the macros would abort e.g. on
> array_type_class, function_type_class and various other return values).
> Looking at older versions of tgmath.h, I see just checks for 8/9 (i.e.
> real/complex) and those wouldn't be affected by any promotions/decay.
> But newer versions of tgmath.h before __builtin_tgmath do check also for
> 1 and they would be upset if char wasn't promoted to int (including latest
> glibc).
> systemtap macros also use __builtin_classify_type and do check for pointers
> but those seem to be prepared to handle even arrays.

So to sum it up, I think at least the original use of the builtin had a
strong reason to do the array to pointer etc. decay and argument promotion,
because that is what happens with the varargs too and the builtin is still
documented in the internals manual just for that purpose.  It is true GCC
doesn't use the builtin for that reason anymore, but there are numerous
uses in the wild, some might cope well with changing the behavior, others
less so.
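
A small sketch of the expression-argument behavior described above, assuming
GCC (or a compatible compiler) and the class numbers mentioned in this thread:
1 for integers, 8 for real and 9 for complex types. The value 5 for pointers
is taken from GCC's typeclass.h. The helper names are made up for the example:

```c
/* Each helper classifies an expression, so decay and promotion apply
   before the builtin sees the type.  */
static char buf[8];

int
classify_array_expr (void)
{
  return __builtin_classify_type (buf);   /* array decays to a pointer */
}

int
classify_char_expr (void)
{
  char c = 'x';
  return __builtin_classify_type (c);     /* classified as an integer */
}

int
classify_float_expr (void)
{
  float f = 1.0f;
  return __builtin_classify_type (f);     /* real type */
}

int
classify_complex_expr (void)
{
  double _Complex z = 0;
  return __builtin_classify_type (z);     /* complex type */
}
```

With the typename form added by this patch, an array type passed directly
would not decay, which is the difference the discussion is about.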

> > > + cp_evaluated ev;
> > > + ++cp_unevaluated_operand;
> > > + ++c_inhibit_evaluation_warnings;
> > 
> > These three lines seem unnecessary for parsing a type.

I had a quick look at this and a reason to do at least some of this
is e.g. array types, __builtin_classify_type (int [foo () + whatever])
will not really evaluate foo () + whatever, all it will care about is that
it is an array, so emitting evaluation warnings for it would be weird.
The effects of cp_unevaluated_operand are harder to pin down,
but e.g. warnings for missing member initializers in such expressions
aren't needed either.

> > > + tentative_firewall firewall (parser);
> > 
> > I think you only need a tentative_firewall if you're going to call
> > cp_parser_commit_to_tentative_parse yourself, which you don't.
> 
> I think I've just copied this from elsewhere, will double check in the
> morning which ones aren't really needed.

I admit I still don't quite understand what it is doing, but it works
even without that in the limited testsuite coverage it has.

2023-08-11  Jakub Jelinek  

gcc/
* builtins.h (type_to_class): Declare.
* builtins.cc (type_to_class): No longer static.  Return
int rather than enum.
* doc/extend.texi (__builtin_classify_type): Document.
gcc/c/
* c-parser.cc (c_parser_postfix_expression_after_primary): Parse
__builtin_classify_type call with typename as argument.
gcc/cp/
* parser.cc (cp_parser_postfix_expression): Parse
__builtin_classify_type call with typename as argument.
* pt.cc (tsubst_copy_and_build): Handle __builtin_classify_type
with dependent typename as argument.
gcc/testsuite/
* c-c++-common/builtin-classify-type-1.c: New test.
* g++.dg/ext/builtin-classify-type-1.C: New test.
* g++.dg/ext/builtin-classify-type-2.C: New test.
* gcc.dg/builtin-classify-type-1.c: New test.

--- gcc/builtins.h.jj   2023-01-03 00:20:34.856089856 +0100
+++ gcc/builtins.h  2023-06-12 09:35:20.841902572 +0200
@@ -156,5 +156,6 @@ extern internal_fn associated_internal_f
 extern internal_fn replacement_internal_fn (gcall *);
 
 extern bool builtin_with_linkage_p (tree);
+extern int type_to_class (tree);
 
 #endif /* GCC_BUILTINS_H */
--- gcc/builtins.cc.jj  2023-05-20 15:31:09.03352 +0200
+++ gcc/builtins.cc 2023-06-12 09:35:31.709751296 +0200
@@ -113,7 +113,6 @@ static rtx expand_builtin_apply_args (vo
 static rtx expand_builtin_apply_args_1 (void);
 static rtx expand_builtin_apply (rtx, rtx, rtx);
 static void expand_builtin_return (rtx);
-static enum type_class type_to_class (tree);
 static rtx expand_builtin_classify_type (tree);
 static rtx expand_builtin_mathfn_3 (tree, rtx, rtx);
 static rtx expand_builtin_mathfn_ternary (tree, rtx, rtx);
@@ -1852,7 +1851,7 @@ expand_builtin_return (rtx result)
 
 /* Used by expand_builtin_classify_type and fold_builtin_classify_type.  */
 
-static enum type_class
+int
 type_to_class (tree type)
 {
   switch (TREE_CODE (type))
--- gcc/doc/extend.texi.jj  2023-06-10 19:58:26.197478291 +0200
+++ gcc/doc/extend.texi 2023-06-12 18:06:24.629413024 +0200
@@ -14354,6 +14354,30 @@ need not be a constant.  @xref{Object Si
 description of the function.
 @enddefbuiltin
 
+@defbuiltin{int __builtin_classify_type (@var{arg})}
+@defbuiltinx{int __builtin_classify_type (@var{type})}
+The @code{__builtin_classify_type} returns a small integer with a category
+of @var{arg} argument's type, like void type, integer type, enumeral type,
+boolean type, pointer type, reference type, offset type, real type, complex
+type, function type, method type, record type, union type, array type,
+string type, etc.  When the argument is an expression, for
+backwards compatibility reason the argum

[PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

2023-08-11 Thread Lehua Ding
Hi,

This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new
pattern to optimize the special case where the scalar operand is zero.

Currently, a broadcast pattern whose scalar operand is an immediate
is converted from vmv.s.x to vmv.v.i, and its mask operand is
converted from 00..01 to 11..11. The conversion has both advantages and
disadvantages; after discussing them offline with Juzhe, we chose not to
do this transform.

Before:

  Advantages: The vsetvli info required by vmv.s.x has better compatibility,
  since vmv.s.x only requires SEW and VLEN be zero or one. That means there
  are more opportunities to combine it with other vsetvl infos in the vsetvl
  pass.

  Disadvantages: For a non-zero scalar imm, one more `li rd, imm` instruction
  is needed.

After:

  Advantages: No `li rd, imm` instruction is needed, since vmv.v.i supports an
  imm operand.

  Disadvantages: The mirror image of the advantages above. Worse compatibility
  means more vsetvl instructions are needed.

Consider the C code below and its asm after autovec.
There is an extra insn (vsetivli zero, 1, e32, m1, ta, ma)
after converting vmv.s.x to vmv.v.i.

```
int foo1(int* restrict a, int* restrict b, int *restrict c, int n) {
int sum = 0;
for (int i = 0; i < n; i++)
  sum += a[i] * b[i];

return sum;
}
```

asm (Before):

```
foo1:
ble a3,zero,.L7
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L6:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L6
vsetvli a2,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs  v1,v1,v2
vmv.x.s a0,v1
ret
.L7:
li  a0,0
ret
```

asm (After):

```
foo1:
ble a3,zero,.L4
vsetvli a2,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a3,e32,m1,tu,ma
slli a4,a5,2
sub a3,a3,a5
vle32.v v2,0(a0)
vle32.v v3,0(a1)
add a0,a0,a4
add a1,a1,a4
vmacc.vv v1,v3,v2
bne a3,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.v.i v2,0
vsetvli a2,zero,e32,m1,ta,ma
vredsum.vs  v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li  a0,0
ret
```
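
Both sequences above give the same sum because the reduction only reads
element 0 of its scalar seed operand. A rough scalar sketch of that, assuming
the RVV semantics vredsum.vs vd, vs2, vs1: vd[0] = vs1[0] + sum(vs2[0..vl)).
The model_* helpers are hypothetical and only mimic the element writes:

```c
#include <stddef.h>
#include <stdint.h>

/* vmv.s.x vd, x: writes element 0 only (nothing when vl is 0).  */
void
model_vmv_s_x (int32_t *vd, size_t vl, int32_t x)
{
  if (vl > 0)
    vd[0] = x;
}

/* vmv.v.i vd, imm: writes elements 0 .. vl-1.  */
void
model_vmv_v_i (int32_t *vd, size_t vl, int32_t imm)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = imm;
}

/* vredsum.vs: seed taken from vs1[0], then all vl elements of vs2 added.
   Unsigned arithmetic models the wrapping add.  */
int32_t
model_vredsum_vs (const int32_t *vs2, const int32_t *vs1, size_t vl)
{
  uint32_t sum = (uint32_t) vs1[0];
  for (size_t i = 0; i < vl; i++)
    sum += (uint32_t) vs2[i];
  return (int32_t) sum;
}
```

Since only element 0 of the seed matters, the choice between the two
instructions comes down to the vsetvl cost discussed above.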

Best,
Lehua

Co-Authored-By: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/predicates.md (vector_const_0_operand): New.
* config/riscv/vector.md (*pred_broadcast_zero): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-5.c: Update.
* gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.

---
 gcc/config/riscv/predicates.md|  4 ++
 gcc/config/riscv/vector.md| 43 +--
 .../gcc.target/riscv/rvv/base/scalar_move-5.c | 20 +++--
 .../gcc.target/riscv/rvv/base/scalar_move-6.c | 22 --
 4 files changed, 70 insertions(+), 19 deletions(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index f2e406c718a..c102489d979 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -300,6 +300,10 @@
(match_test "satisfies_constraint_vi (op)
 || satisfies_constraint_Wc0 (op)")))
 
+(define_predicate "vector_const_0_operand"
+  (and (match_code "const_vector")
+   (match_test "satisfies_constraint_Wc0 (op)")))
+
 (define_predicate "vector_move_operand"
   (ior (match_operand 0 "nonimmediate_operand")
(and (match_code "const_vector")
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 508a3074080..4d98ab6f7e8 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1719,23 +1719,24 @@
  (match_operand:V_VLS 2 "vector_merge_operand")))]
   "TARGET_VECTOR"
 {
-  /* Handle vmv.s.x instruction which has memory scalar.  */
-  if (satisfies_constraint_Wdm (operands[3]) || riscv_vector::simm5_p (operands[3])
-  || rtx_equal_p (operands[3], CONST0_RTX (mode)))
+  /* Handle vmv.s.x instruction (Wb1 mask) which has memory scalar.  */
+  if (satisfies_constraint_Wdm (operands[3]))
 {
   if (satisfies_constraint_Wb1 (operands[1]))
-{
-  // Case 1: vmv.s.x (TA) ==> vlse.v (TA)
-  if (satisfies_constraint_vu (operands[2]))
-operands[1] = CONSTM1_RTX (mode);
-  else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode))
-{
- // Case 2: vmv.s.x (TU) ==> andi vl + vlse.v (TU) in RV32 system.
+   {
+ /* Case 1: vmv.s.x (TA, x == memory) ==> vlse.v (TA)  */
+ if (satisfies_constraint_vu (operands[2]))
+   operands[1] = CONSTM1_RTX (mode);
+ else if (GET_MODE_BITSIZE (mode) > GET_MODE_BITSIZE (Pmode))
+   {
+ /* Case 2: vmv.s.x (TU, x == memory) ==>
+  vl = 0 o

[PATCH] RISC-V: Revive test case PR 102957

2023-08-11 Thread Tsukasa OI via Gcc-patches
From: Tsukasa OI 

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions, and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change breaks the original intent of the PR 102957 test
case, because we wanted to make sure that we can parse a valid two-letter
extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit uses this extension name to revive the intent of the test case
for PR 102957.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957.c: Remove "dg-error" because we don't
need to test for error message.  Use the 'Zk' extension to continue
testing whether we can use valid two-letter extensions.
---
 gcc/testsuite/gcc.target/riscv/pr102957.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/pr102957.c 
b/gcc/testsuite/gcc.target/riscv/pr102957.c
index 5273ee6c5018..fe6241466354 100644
--- a/gcc/testsuite/gcc.target/riscv/pr102957.c
+++ b/gcc/testsuite/gcc.target/riscv/pr102957.c
@@ -1,7 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gzb -mabi=lp64" } */
+/* { dg-options "-march=rv64gzk -mabi=lp64" } */
 int foo()
 {
 }
-
-/* { dg-error "extension 'zb' starts with 'z' but is unsupported standard extension" "" { target *-*-* } 0 } */

base-commit: bcda361daaec8623c91d0dff3ea8e576373b5f50
-- 
2.41.0



[PATCH 1/2] PHI-OPT [PR 110984]: Add support for NE_EXPR/EQ_EXPR with casts to spaceship_replacement

2023-08-11 Thread Andrew Pinski via Gcc-patches
So with my next VRP patch, VRP causes:
```
  # c$_M_value_18 = PHI <-1(3), 0(2), 1(4)>
  _11 = (unsigned int) c$_M_value_18;
  _16 = _11 <= 1;
```
To be changed to:
```
  # c$_M_value_18 = PHI <-1(3), 0(2), 1(4)>
  _11 = (unsigned int) c$_M_value_18;
  _16 = _11 != 4294967295;
```

So let's add support for the above.
A few changes were needed. The first is to change
the range check of the rhs of the comparison to also allow
integer_all_onesp.

The next is to add support for the cast and EQ/NE case.
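
To see why the two IR forms above are interchangeable here, a small check,
assuming a 32-bit unsigned int (which is what the constant 4294967295 in the
dump implies). The function names are made up for the example:

```c
#include <stdbool.h>

/* Form before VRP: _16 = (unsigned int) c <= 1.  */
bool
cmp_before_vrp (int c)
{
  return (unsigned int) c <= 1u;
}

/* Form after VRP: _16 = (unsigned int) c != 4294967295.  */
bool
cmp_after_vrp (int c)
{
  return (unsigned int) c != 4294967295u;
}
```

The two agree for every value the PHI can take (-1, 0 and 1) and differ
outside that set, for example at 2, which is why the rewrite depends on the
known range of the PHI result.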

A note on the testcases: pr110984-1.c is basically pr94589-2.c but
with what the C++ code is doing with the signed char type;
pr110984-2.c is pr110984-1.c with the cast added to give an
explicit testcase to test against.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

PR tree-optimization/110984

gcc/ChangeLog:

* tree-ssa-phiopt.cc (spaceship_replacement): Add support for
NE/EQ for the cast case.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110984-1.c: New test.
* gcc.dg/pr110984-2.c: New test.
---
 gcc/testsuite/gcc.dg/pr110984-1.c | 37 +++
 gcc/testsuite/gcc.dg/pr110984-2.c | 21 ++
 gcc/tree-ssa-phiopt.cc| 19 +---
 3 files changed, 74 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr110984-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr110984-2.c

diff --git a/gcc/testsuite/gcc.dg/pr110984-1.c 
b/gcc/testsuite/gcc.dg/pr110984-1.c
new file mode 100644
index 000..85b19eb8279
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr110984-1.c
@@ -0,0 +1,37 @@
+/* PR tree-optimization/110984 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g0 -ffast-math -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
\[ij]_\[0-9]+\\(D\\)" 14 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "i_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
5\\.0" 14 "optimized" } } */
+
+/* This is similar to pr94589-2.c except use signed char as the type for the 
[-1,2] case */
+
+#define A __attribute__((noipa))
+A int f1 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c == 0; }
+A int f2 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c != 0; }
+A int f3 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c > 0; }
+A int f4 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c < 0; }
+A int f5 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c >= 0; }
+A int f6 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c <= 0; }
+A int f7 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c == -1; }
+A int f8 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c != -1; }
+A int f9 (double i, double j) { signed char c; if (i == j) c = 0; else if (i < 
j) c = -1; else if (i > j) c = 1; else c = 2; return c > -1; }
+A int f10 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
< j) c = -1; else if (i > j) c = 1; else c = 2; return c <= -1; }
+A int f11 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
< j) c = -1; else if (i > j) c = 1; else c = 2; return c == 1; }
+A int f12 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
< j) c = -1; else if (i > j) c = 1; else c = 2; return c != 1; }
+A int f13 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
< j) c = -1; else if (i > j) c = 1; else c = 2; return c < 1; }
+A int f14 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
< j) c = -1; else if (i > j) c = 1; else c = 2; return c >= 1; }
+A int f15 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c == 0; }
+A int f16 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c != 0; }
+A int f17 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c > 0; }
+A int f18 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c < 0; }
+A int f19 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c >= 0; }
+A int f20 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) c 
= -1; else if (i > 5.0) c = 1; else c = 2; return c <= 0; }
+A int f2

[PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-08-11 Thread Andrew Pinski via Gcc-patches
So it turns out there was a simpler way of starting to
improve VRP to start to fix PR 110131, PR 108360, and PR 108397.
That was to rewrite test_for_singularity to use range_op_handler
and Value_Range.
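
The idea can be sketched outside GCC as follows: intersect the known
multi-part range of a variable with the range implied by the comparison on
the chosen edge, and report a value only if exactly one survives. Everything
here (struct subrange, find_singularity, the int-only domain, disjoint
subranges) is an assumption for illustration and not the range_op_handler
API:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct subrange { long long lo, hi; };   /* inclusive bounds */
enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE };

/* Range of int values for which "x CODE rhs" holds on the true edge,
   or fails on the false edge.  */
static struct subrange
implied_range (enum cmp_code code, long long rhs, bool on_true_edge)
{
  struct subrange r = { INT_MIN, INT_MAX };
  if (!on_true_edge)
    switch (code)
      {
      case CMP_LT: code = CMP_GE; break;
      case CMP_LE: code = CMP_GT; break;
      case CMP_GT: code = CMP_LE; break;
      case CMP_GE: code = CMP_LT; break;
      }
  switch (code)
    {
    case CMP_LT: r.hi = rhs - 1; break;
    case CMP_LE: r.hi = rhs; break;
    case CMP_GT: r.lo = rhs + 1; break;
    case CMP_GE: r.lo = rhs; break;
    }
  return r;
}

/* Return true and set *value if exactly one value of the range VR[0..n)
   satisfies the comparison on the given edge.  */
bool
find_singularity (const struct subrange *vr, size_t n, enum cmp_code code,
                  long long rhs, bool on_true_edge, long long *value)
{
  struct subrange want = implied_range (code, rhs, on_true_edge);
  bool found = false;
  long long candidate = 0;
  for (size_t i = 0; i < n; i++)
    {
      long long lo = vr[i].lo > want.lo ? vr[i].lo : want.lo;
      long long hi = vr[i].hi < want.hi ? vr[i].hi : want.hi;
      if (lo > hi)
        continue;                         /* empty intersection */
      if (lo != hi || (found && candidate != lo))
        return false;                     /* more than one value left */
      found = true;
      candidate = lo;
    }
  if (found)
    *value = candidate;
  return found;
}
```

For the g function in the new vrp124.c test, the range {-100} U [0, INT_MAX]
intersected with a < 0 leaves only -100, so the comparison can become
a == -100.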

This patch implements that.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* vr-values.cc (test_for_singularity): Add edge argument
and rewrite using range_op_handler.
(simplify_compare_using_range_pairs): Use Value_Range
instead of value_range and update test_for_singularity call.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/vrp124.c: New test.
* gcc.dg/tree-ssa/vrp125.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
 gcc/testsuite/gcc.dg/tree-ssa/vrp125.c | 44 +
 gcc/vr-values.cc   | 91 --
 3 files changed, 114 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp125.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
new file mode 100644
index 000..6ccbda35d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a >= 0)
+;
+  else
+return 0;
+  return a < 0;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 100;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
new file mode 100644
index 000..f6c2f8e35f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a == -50 || a >= 0)
+;
+  else
+return 0;
+  return a < -50;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a == 50 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a == 50 || a > 100)
+;
+  else
+return 0;
+  return a < 25;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 50 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index a4fddd62841..7004b0224bd 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -907,66 +907,30 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
a known value range VR.
 
If there is one and only one value which will satisfy the
-   conditional, then return that value.  Else return NULL.
-
-   If signed overflow must be undefined for the value to satisfy
-   the conditional, then set *STRICT_OVERFLOW_P to true.  */
+   conditional on the EDGE, then return that value.
+   Else return NULL.  */
 
 static tree
 test_for_singularity (enum tree_code cond_code, tree op0,
- tree op1, const value_range *vr)
+ tree op1, Value_Range vr, bool edge)
 {
-  tree min = NULL;
-  tree max = NULL;
-
-  /* Extract minimum/maximum values which satisfy the conditional as it was
- written.  */
-  if (cond_code == LE_EXPR || cond_code == LT_EXPR)
+  /* This is already a singularity.  */
+  if (cond_code == NE_EXPR || cond_code == EQ_EXPR)
+return NULL;
+  auto range_op = range_op_handler (cond_code);
+  int_range<2> op1_range (TREE_TYPE (op0));
+  wide_int w = wi::to_wide (op1);
+  op1_range.set (TREE_TYPE (op1), w, w);
+  Value_Range vr1(TREE_TYPE (op0));
+  if (range_op.op1_range (vr1, TREE_TYPE (op0),
+ edge ? range_true () : range_false (),
+ op1_range))
 {
-  min = TYPE_MIN_VALUE (TREE_TYPE (op0));
-
-  max = op1;
-  if (cond_code == LT_EXPR)
-   {
- tree one = build_int_cst (TREE_TYPE (op0), 1);
- max = fold_build2 (MINUS_EXPR, TREE_TYPE (op0), max, one);
- /* Signal to compare_values_warnv this expr doesn't overflow.  */
- if (EXPR_P (max))
-   suppress_warning (max, OPT_Woverflow);
-   }
-}
-  else if (cond_code == GE_EXPR || cond_code == GT_EXPR)
-{
-  max = TYPE_MAX_VALUE (TREE_TYPE (op0));
-
-  min = op1;
-  if (cond_code == GT_EXPR)
-   {
- tree one = build_int_cst (TREE_TYPE (op0), 1);
- min = fold_build2 (PLUS_EXPR, TREE_TY

Re: [PATCH 1/2] PHI-OPT [PR 110984]: Add support for NE_EXPR/EQ_EXPR with casts to spaceship_replacement

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, Aug 11, 2023 at 11:17 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So with my next VRP patch, VRP causes:
> ```
>   # c$_M_value_18 = PHI <-1(3), 0(2), 1(4)>
>   _11 = (unsigned int) c$_M_value_18;
>   _16 = _11 <= 1;
> ```
> To be changed to:
> ```
>   # c$_M_value_18 = PHI <-1(3), 0(2), 1(4)>
>   _11 = (unsigned int) c$_M_value_18;
>   _16 = _11 != 4294967295;
> ```
>
> So let's add support for the above.
> A few changes was needed, first to change
> the range check of the rhs of the comparison to possibly
> integer_all_onesp also.
>
> The next is to add support for the cast and EQ/NE case.
>
> Note on the testcases pr110984-1.c is basically pr94589-2.c but
> with what the C++ code is doing with the signed char type;
> pr110984-2.c is pr110984-1.c with the cast added to give an
> explicit testcase to test against.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

LGTM.

> Thanks,
> Andrew Pinski
>
> PR tree-optimization/110984
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (spaceship_replacement): Add support for
> NE/EQ for the cast case.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/pr110984-1.c: New test.
> * gcc.dg/pr110984-2.c: New test.
> ---
>  gcc/testsuite/gcc.dg/pr110984-1.c | 37 +++
>  gcc/testsuite/gcc.dg/pr110984-2.c | 21 ++
>  gcc/tree-ssa-phiopt.cc| 19 +---
>  3 files changed, 74 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/pr110984-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/pr110984-2.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr110984-1.c 
> b/gcc/testsuite/gcc.dg/pr110984-1.c
> new file mode 100644
> index 000..85b19eb8279
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr110984-1.c
> @@ -0,0 +1,37 @@
> +/* PR tree-optimization/110984 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -g0 -ffast-math -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump-times "\[ij]_\[0-9]+\\(D\\) 
> (?:<|<=|==|!=|>|>=) \[ij]_\[0-9]+\\(D\\)" 14 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "i_\[0-9]+\\(D\\) (?:<|<=|==|!=|>|>=) 
> 5\\.0" 14 "optimized" } } */
> +
> +/* This is similar to pr94589-2.c except use signed char as the type for the 
> [-1,2] case */
> +
> +#define A __attribute__((noipa))
> +A int f1 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c == 0; }
> +A int f2 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c != 0; }
> +A int f3 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c > 0; }
> +A int f4 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c < 0; }
> +A int f5 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c >= 0; }
> +A int f6 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c <= 0; }
> +A int f7 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c == -1; }
> +A int f8 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c != -1; }
> +A int f9 (double i, double j) { signed char c; if (i == j) c = 0; else if (i 
> < j) c = -1; else if (i > j) c = 1; else c = 2; return c > -1; }
> +A int f10 (double i, double j) { signed char c; if (i == j) c = 0; else if 
> (i < j) c = -1; else if (i > j) c = 1; else c = 2; return c <= -1; }
> +A int f11 (double i, double j) { signed char c; if (i == j) c = 0; else if 
> (i < j) c = -1; else if (i > j) c = 1; else c = 2; return c == 1; }
> +A int f12 (double i, double j) { signed char c; if (i == j) c = 0; else if 
> (i < j) c = -1; else if (i > j) c = 1; else c = 2; return c != 1; }
> +A int f13 (double i, double j) { signed char c; if (i == j) c = 0; else if 
> (i < j) c = -1; else if (i > j) c = 1; else c = 2; return c < 1; }
> +A int f14 (double i, double j) { signed char c; if (i == j) c = 0; else if 
> (i < j) c = -1; else if (i > j) c = 1; else c = 2; return c >= 1; }
> +A int f15 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) 
> c = -1; else if (i > 5.0) c = 1; else c = 2; return c == 0; }
> +A int f16 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) 
> c = -1; else if (i > 5.0) c = 1; else c = 2; return c != 0; }
> +A int f17 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) 
> c = -1; else if (i > 5.0) c = 1; else c = 2; return c > 0; }
> +A int f18 (double i) { signed char c; if (i == 5.0) c = 0; else if (i < 5.0) 
> c = -1; else if (i > 5.0) c = 1; else c = 2; return c < 0; }
> +A int f19 (d

Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, Aug 11, 2023 at 11:17 AM Andrew Pinski via Gcc-patches
 wrote:
>
> So it turns out there was a simpler way of starting to
> improve VRP to start to fix PR 110131, PR 108360, and PR 108397.
> That was to rewrite test_for_singularity to use range_op_handler
> and Value_Range.
>
> This patch implements that and
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
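
To illustrate what a "singularity" means here, the sketch below pairs the function `g` from the new vrp124.c test with the form its comment says it should be optimized to. The name `g_simplified` is hypothetical and only serves the comparison; this says nothing about how the pass derives the result internally:

```c
/* g from vrp124.c: on the path reaching the final return, a is
   either -100 or non-negative, so "a < 0" holds for exactly one
   value, -100.  */
int
g (int a)
{
  if (a == -100 || a >= 0)
    ;
  else
    return 0;
  return a < 0;
}

/* The form the test comment expects: a single equality test.  */
int
g_simplified (int a)
{
  return a == -100;
}
```

Both functions return the same value for every `int` input, which is why the branch can disappear from the optimized dump.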

I'm hoping Andrew/Aldy can have a look here.

Richard.

> gcc/ChangeLog:
>
> * vr-values.cc (test_for_singularity): Add edge argument
> and rewrite using range_op_handler.
> (simplify_compare_using_range_pairs): Use Value_Range
> instead of value_range and update test_for_singularity call.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/vrp124.c: New test.
> * gcc.dg/tree-ssa/vrp125.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
>  gcc/testsuite/gcc.dg/tree-ssa/vrp125.c | 44 +
>  gcc/vr-values.cc   | 91 --
>  3 files changed, 114 insertions(+), 65 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> new file mode 100644
> index 000..6ccbda35d1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* Should be optimized to a == -100 */
> +int g(int a)
> +{
> +  if (a == -100 || a >= 0)
> +;
> +  else
> +return 0;
> +  return a < 0;
> +}
> +
> +/* Should optimize to a == 0 */
> +int f(int a)
> +{
> +  if (a == 0 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 50;
> +}
> +
> +/* Should be optimized to a == 0. */
> +int f2(int a)
> +{
> +  if (a == 0 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 100;
> +}
> +
> +/* Should optimize to a == 100 */
> +int f1(int a)
> +{
> +  if (a < 0 || a == 100)
> +;
> +  else
> +return 0;
> +  return a > 50;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
> new file mode 100644
> index 000..f6c2f8e35f1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +/* Should be optimized to a == -100 */
> +int g(int a)
> +{
> +  if (a == -100 || a == -50 || a >= 0)
> +;
> +  else
> +return 0;
> +  return a < -50;
> +}
> +
> +/* Should optimize to a == 0 */
> +int f(int a)
> +{
> +  if (a == 0 || a == 50 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 50;
> +}
> +
> +/* Should be optimized to a == 0. */
> +int f2(int a)
> +{
> +  if (a == 0 || a == 50 || a > 100)
> +;
> +  else
> +return 0;
> +  return a < 25;
> +}
> +
> +/* Should optimize to a == 100 */
> +int f1(int a)
> +{
> +  if (a < 0 || a == 50 || a == 100)
> +;
> +  else
> +return 0;
> +  return a > 50;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
> diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
> index a4fddd62841..7004b0224bd 100644
> --- a/gcc/vr-values.cc
> +++ b/gcc/vr-values.cc
> @@ -907,66 +907,30 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
> a known value range VR.
>
> If there is one and only one value which will satisfy the
> -   conditional, then return that value.  Else return NULL.
> -
> -   If signed overflow must be undefined for the value to satisfy
> -   the conditional, then set *STRICT_OVERFLOW_P to true.  */
> +   conditional on the EDGE, then return that value.
> +   Else return NULL.  */
>
>  static tree
>  test_for_singularity (enum tree_code cond_code, tree op0,
> - tree op1, const value_range *vr)
> + tree op1, Value_Range vr, bool edge)
>  {
> -  tree min = NULL;
> -  tree max = NULL;
> -
> -  /* Extract minimum/maximum values which satisfy the conditional as it was
> - written.  */
> -  if (cond_code == LE_EXPR || cond_code == LT_EXPR)
> +  /* This is already a singularity.  */
> +  if (cond_code == NE_EXPR || cond_code == EQ_EXPR)
> +return NULL;
> +  auto range_op = range_op_handler (cond_code);
> +  int_range<2> op1_range (TREE_TYPE (op0));
> +  wide_int w = wi::to_wide (op1);
> +  op1_range.set (TREE_TYPE (op1), w, w);
> +  Value_Range vr1(TREE_TYPE (op0));
> +  if (range_op.op1_range (vr1, TREE_TYPE (op0),
> + edge ? range_true () : range_false (),
> + op1_range))
>  {
> -  min = TYPE_MIN_VALUE (TREE_TYPE (op0));
> -
> -  max = op1;
> -  if (cond_code == LT_EXPR)
> -   {
> - tree one = build_int_cst (TREE_TYPE (op0), 1);
> - max = fold_build2 (MINUS_EXPR, TREE_TYPE (o

[PATCH V2] RISC-V: Allow CONST_VECTOR for VLS modes

2023-08-11 Thread Juzhe-Zhong
This patch enables CONST_VECTOR for VLS modes.

void foo1 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
  a[i] = 8;
}

void foo2 (int * __restrict a)
{
for (int i = 0; i < 16; i++)
  a[i] = i;
}

Compile option: -O3 --param=riscv-autovec-preference=scalable

Before this patch:

foo1:
        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,4,e32,m1,ta,ma
        addi    a4,a0,16
        vle32.v v1,0(a5)
        vse32.v v1,0(a0)
        vse32.v v1,0(a4)
        addi    a4,a0,32
        vse32.v v1,0(a4)
        addi    a0,a0,48
        vse32.v v1,0(a0)
        ret
foo2:
        lui     a5,%hi(.LC1)
        addi    a5,a5,%lo(.LC1)
        vsetivli        zero,4,e32,m1,ta,ma
        vle32.v v1,0(a5)
        lui     a5,%hi(.LC2)
        addi    a5,a5,%lo(.LC2)
        vse32.v v1,0(a0)
        vle32.v v1,0(a5)
        lui     a5,%hi(.LC3)
        addi    a4,a0,16
        addi    a5,a5,%lo(.LC3)
        vse32.v v1,0(a4)
        vle32.v v1,0(a5)
        addi    a4,a0,32
        lui     a5,%hi(.LC4)
        vse32.v v1,0(a4)
        addi    a0,a0,48
        addi    a5,a5,%lo(.LC4)
        vle32.v v1,0(a5)
        vse32.v v1,0(a0)
        ret

After this patch:

foo1:
        vsetivli        zero,16,e32,mf2,ta,ma
        vmv.v.i v1,8
        vse32.v v1,0(a0)
        ret
        .size   foo1, .-foo1
        .align  1
        .globl  foo2
        .type   foo2, @function
foo2:
        vsetivli        zero,16,e32,mf2,ta,ma
        vid.v   v1
        vse32.v v1,0(a0)
        ret

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS CONST_VECTOR.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS CONST_VECTOR tests.
* gcc.target/riscv/rvv/autovec/vls/const-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/const-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/series-4.c: New test.

---
 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv.cc |  2 +-
 gcc/config/riscv/vector.md|  8 ++--
 .../riscv/rvv/autovec/vls/const-1.c   | 40 +++
 .../riscv/rvv/autovec/vls/const-2.c   | 40 +++
 .../riscv/rvv/autovec/vls/const-3.c   | 40 +++
 .../riscv/rvv/autovec/vls/const-4.c   | 40 +++
 .../riscv/rvv/autovec/vls/const-5.c   | 40 +++
 .../gcc.target/riscv/rvv/autovec/vls/def.h| 14 +++
 .../riscv/rvv/autovec/vls/series-1.c  | 40 +++
 .../riscv/rvv/autovec/vls/series-2.c  | 40 +++
 .../riscv/rvv/autovec/vls/series-3.c  | 40 +++
 .../riscv/rvv/autovec/vls/series-4.c  | 40 +++
 13 files changed, 380 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-4.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 3b396a9a990..cf4efbae44f 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -318,7 +318,7 @@
 ;; -
 
 (define_expand "@vec_series"
-  [(match_operand:VI 0 "register_operand")
+  [(match_operand:V_VLSI 0 "register_operand")
(match_operand: 1 "reg_or_int_operand")
(match_operand: 2 "reg_or_int_operand")]
   "TARGET_VECTOR"
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5dc19ecd377..f9b7a9ee749 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1336,7 +1336,7 @@ riscv_const_insns (rtx x)
   out range of [-16, 15].
  - 3. const series vector.
  ...etc.  */
-   if (riscv_v_ext_vector_mode_p (GET_MODE (x)))
+   if (riscv_v_ext_mode_p (GET_MODE (x)))
  {
/* const series vec

Re: [PATCH V2] RISC-V: Allow CONST_VECTOR for VLS modes

2023-08-11 Thread Kito Cheng via Gcc-patches
LGTM

Juzhe-Zhong wrote on Fri, Aug 11, 2023 at 17:56:

> This patch enables CONST_VECTOR for VLS modes.
>
> void foo1 (int * __restrict a)
> {
> for (int i = 0; i < 16; i++)
>   a[i] = 8;
> }
>
> void foo2 (int * __restrict a)
> {
> for (int i = 0; i < 16; i++)
>   a[i] = i;
> }
>
> Compile option: -O3 --param=riscv-autovec-preference=scalable
>
> Before this patch:
>
> foo1:
>         lui     a5,%hi(.LC0)
>         addi    a5,a5,%lo(.LC0)
>         vsetivli        zero,4,e32,m1,ta,ma
>         addi    a4,a0,16
>         vle32.v v1,0(a5)
>         vse32.v v1,0(a0)
>         vse32.v v1,0(a4)
>         addi    a4,a0,32
>         vse32.v v1,0(a4)
>         addi    a0,a0,48
>         vse32.v v1,0(a0)
>         ret
> foo2:
>         lui     a5,%hi(.LC1)
>         addi    a5,a5,%lo(.LC1)
>         vsetivli        zero,4,e32,m1,ta,ma
>         vle32.v v1,0(a5)
>         lui     a5,%hi(.LC2)
>         addi    a5,a5,%lo(.LC2)
>         vse32.v v1,0(a0)
>         vle32.v v1,0(a5)
>         lui     a5,%hi(.LC3)
>         addi    a4,a0,16
>         addi    a5,a5,%lo(.LC3)
>         vse32.v v1,0(a4)
>         vle32.v v1,0(a5)
>         addi    a4,a0,32
>         lui     a5,%hi(.LC4)
>         vse32.v v1,0(a4)
>         addi    a0,a0,48
>         addi    a5,a5,%lo(.LC4)
>         vle32.v v1,0(a5)
>         vse32.v v1,0(a0)
>         ret
>
> After this patch:
>
> foo1:
>         vsetivli        zero,16,e32,mf2,ta,ma
>         vmv.v.i v1,8
>         vse32.v v1,0(a0)
>         ret
>         .size   foo1, .-foo1
>         .align  1
>         .globl  foo2
>         .type   foo2, @function
> foo2:
>         vsetivli        zero,16,e32,mf2,ta,ma
>         vid.v   v1
>         vse32.v v1,0(a0)
>         ret
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Add VLS CONST_VECTOR.
> * config/riscv/riscv.cc (riscv_const_insns): Ditto.
> * config/riscv/vector.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS CONST_VECTOR
> tests.
> * gcc.target/riscv/rvv/autovec/vls/const-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/const-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/const-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/const-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/const-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/series-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/series-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/series-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/series-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  2 +-
>  gcc/config/riscv/riscv.cc |  2 +-
>  gcc/config/riscv/vector.md|  8 ++--
>  .../riscv/rvv/autovec/vls/const-1.c   | 40 +++
>  .../riscv/rvv/autovec/vls/const-2.c   | 40 +++
>  .../riscv/rvv/autovec/vls/const-3.c   | 40 +++
>  .../riscv/rvv/autovec/vls/const-4.c   | 40 +++
>  .../riscv/rvv/autovec/vls/const-5.c   | 40 +++
>  .../gcc.target/riscv/rvv/autovec/vls/def.h| 14 +++
>  .../riscv/rvv/autovec/vls/series-1.c  | 40 +++
>  .../riscv/rvv/autovec/vls/series-2.c  | 40 +++
>  .../riscv/rvv/autovec/vls/series-3.c  | 40 +++
>  .../riscv/rvv/autovec/vls/series-4.c  | 40 +++
>  13 files changed, 380 insertions(+), 6 deletions(-)
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-2.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-3.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-4.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/const-5.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-1.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-2.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-3.c
>  create mode 100644
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/series-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 3b396a9a990..cf4efbae44f 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -318,7 +318,7 @@
>  ;;
> -
>
>  (define_expand "@vec_series"
> -  [(match_operand:VI 0 "register_operand")
> +  [(match_operand:V_VLSI 0 "register_operand")
> (match_operand: 1 "reg_or_int_operand")
> (match_operand: 2 "reg_or_int_operand")]
>"TARGET_VECTOR"
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 5dc19ecd377..f9b7a9ee749 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ 

[PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API

2023-08-11 Thread Pan Li via Gcc-patches
From: Pan Li 

This patch adds support for the rounding mode API for
VFMSUB, as in the samples below.

* __riscv_vfmsub_vv_f32m1_rm
* __riscv_vfmsub_vv_f32m1_rm_m
* __riscv_vfmsub_vf_f32m1_rm
* __riscv_vfmsub_vf_f32m1_rm_m
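
As a rough scalar reference for what the unmasked intrinsics listed above compute, the sketch below assumes the usual vfmsub semantics, vd[i] = (vd[i] * op1[i]) - op2[i]. The `ref_*` names are hypothetical, and the model is deliberately incomplete: it uses a separate multiply and subtract rather than a fused operation, and it does not model the rounding mode operand or masking at all.

```c
#include <stddef.h>

/* Scalar model of the vector-vector form: the destination is also
   the multiplicand.  Not fused, rounding mode not modeled.  */
void
ref_vfmsub_vv (float *vd, const float *op1, const float *op2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vd[i] * op1[i] - op2[i];
}

/* Scalar model of the vector-scalar form: op1 is one float that is
   applied to every element.  */
void
ref_vfmsub_vf (float *vd, float op1, const float *op2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = vd[i] * op1 - op2[i];
}
```

With small integer-valued inputs the results are exact, so the model can be compared element by element against the vector result regardless of rounding mode.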

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmsub_frm): New class for vfmsub frm.
(vfmsub_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmsub_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-msub.c: New test.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 25 ++
 .../riscv/riscv-vector-builtins-bases.h   |  1 +
 .../riscv/riscv-vector-builtins-functions.def |  2 +
 .../riscv/rvv/base/float-point-msub.c | 47 +++
 4 files changed, 75 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index b085ba4f52d..381bc72c784 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -493,6 +493,29 @@ public:
   }
 };
 
+/* Implements below instructions for frm
+   - vfmsub
+*/
+class vfmsub_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+   false, code_for_pred_mul (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
 /* Implements vrsub.  */
 class vrsub : public function_base
 {
@@ -2266,6 +2289,7 @@ static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
 static CONSTEXPR const vfnmadd vfnmadd_obj;
 static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
 static CONSTEXPR const vfmsub vfmsub_obj;
+static CONSTEXPR const vfmsub_frm vfmsub_frm_obj;
 static CONSTEXPR const vfwmacc vfwmacc_obj;
 static CONSTEXPR const vfwnmacc vfwnmacc_obj;
 static CONSTEXPR const vfwmsac vfwmsac_obj;
@@ -2507,6 +2531,7 @@ BASE (vfmsac_frm)
 BASE (vfnmadd)
 BASE (vfnmadd_frm)
 BASE (vfmsub)
+BASE (vfmsub_frm)
 BASE (vfwmacc)
 BASE (vfwnmacc)
 BASE (vfwmsac)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 4ade0ace7b2..99cfbfd78c8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -173,6 +173,7 @@ extern const function_base *const vfmsac_frm;
 extern const function_base *const vfnmadd;
 extern const function_base *const vfnmadd_frm;
 extern const function_base *const vfmsub;
+extern const function_base *const vfmsub_frm;
 extern const function_base *const vfwmacc;
 extern const function_base *const vfwnmacc;
 extern const function_base *const vfwmsac;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index e9b16f99180..75235ec01d3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -361,6 +361,8 @@ DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, 
f__ops)
 DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
 DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
 DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f_vvfv_ops)
 
 // 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
 DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
new file mode 100644
index 000..e58519d0742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmsub_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmsub_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+  vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmsub_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloat32m1_t
+test_vfmsub_vf_f32m1_rm (vfloat32m1_t vd, float32_t op1, vfloat32m

Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> So how can we resolve the issue when a non-VL operation like
> >> .VEC_EXTRACT is used for _len support?
> 
> Do you mean the non-VL extract-last operation? (Sorry, I am not sure whether I 
> understand your question correctly.) 
> If yes, the answer is that for RVV we reuse the same flow as ARM SVE 
> (the BIT_FIELD_REF approach); see the example below:
> 
> https://godbolt.org/z/cqrWrY8q4 
> 
> #define EXTRACT_LAST(TYPE)  \
>   TYPE __attribute__ ((noinline, noclone))  \
>   test_##TYPE (TYPE *x, int n, TYPE value)  \
>   { \
> TYPE last;  \
> for (int j = 0; j < 64; ++j)\
>   { \
> last = x[j];\
> x[j] = last * value;\
>   } \
> return last;\
>   }
> 
> #define TEST_ALL(T) \
>   T (uint8_t)   \
> 
> TEST_ALL (EXTRACT_LAST)
> 
>   vect_cst__22 = {value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D)};
>   vect_last_11.6_3 = MEM  [(uint8_t *)x_10(D)];
>   vect__4.7_23 = vect_last_11.6_3 * vect_cst__22;
>   MEM  [(uint8_t *)x_10(D)] = vect__4.7_23;
>   _21 = BIT_FIELD_REF ;
> 
> This approach works perfectly for both RVV and ARM SVE for non-VL and 
> non-MASK EXTRACT_LAST operation.
> 
> >> So, why do we test for get_len_load_store_mode and not just for
> >> VEC_EXTRACT?
> 
> Before answering this question, let me first explain how ARM SVE handles 
> MASK EXTRACT_LAST.
> 
> Here is the example:
> https://godbolt.org/z/8cTv1jqMb 
> 
> ARM SVE IR:
> 
>[local count: 955630224]:
>   # ivtmp_31 = PHI 
> 
>   # loop_mask_22 = PHI  -> For RVV, we 
> want this to be loop_len = SELECT_VL;
> 
>   _7 = &MEM  [(uint8_t *)x_11(D) + ivtmp_31 * 
> 1];
>   vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
>   vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
>   .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
>   ivtmp_32 = ivtmp_31 + POLY_INT_CST [16, 16];
>   _1 = (unsigned int) ivtmp_32;
> 
>   next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
> 
>   if (next_mask_35 != { 0, ... })
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
> 
>   _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23); [tail call] > 
> Use the last mask generated in BB 4, so for RVV, we are using the loop_len.
> 
> So this patch tries to optimize the codegen by following the same flow as 
> ARM SVE, but replacing 'loop_mask_22' (which is generated in BB 4) with 
> 'loop_len'.
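
The length-controlled flow described above can be modeled in scalar C. This is only a sketch under stated assumptions: `VF` is a hypothetical fixed vector length, `len` is modeled as min(remaining, VF) although a real SELECT_VL may choose differently, `bias` is taken as 0, and the early return for `n == 0` is an addition of this sketch (the original loop leaves `last` unset in that case).

```c
#include <stddef.h>
#include <stdint.h>

enum { VF = 16 };  /* hypothetical number of elements per vector */

/* Scalar model of the loop with LEN control followed by
   VEC_EXTRACT (vec, len + bias - 1) in the epilogue.  */
uint8_t
extract_last_with_len (uint8_t *x, size_t n, uint8_t value)
{
  uint8_t vec[VF] = { 0 };  /* models vect_last */
  size_t len = 0;           /* models loop_len */
  const int bias = 0;       /* assumption: zero bias */

  if (n == 0)
    return 0;               /* choice made for this sketch only */

  for (size_t i = 0; i < n; i += len)
    {
      size_t remaining = n - i;
      len = remaining < VF ? remaining : VF;
      for (size_t k = 0; k < len; k++)
        {
          vec[k] = x[i + k];                    /* length-limited load */
          x[i + k] = (uint8_t) (vec[k] * value); /* length-limited store */
        }
    }

  /* Epilogue: the last active element of the final iteration.  */
  return vec[len + bias - 1];
}
```

The point of the model is that the epilogue needs only the vector and the length from the final iteration, which is the role 'loop_mask_22' plays in the SVE version.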
> 
> For ARM SVE, the vectorizer only checks whether the target supports the 
> EXTRACT_LAST pattern. This pattern being supported means:
> 
> 1. Target is using loop MASK as the partial vector loop control.

I don't think it checks for this?

> 2. extract_last optab is enabled in the backend.
> 
> So for RVV, we are also checking the same conditions:
> 
> 1. Target is using loop LEN as the partial vector loop control (I use 
> get_len_load_store_mode to check whether target is using loop LEN as the 
> partial vector loop control).

But we don't really know this at this point?  The only thing we know
is that nothing set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false.

> 2. vec_extract optab is enabled in the backend.
> 
> An alternative approach is to add an EXTRACT_LAST_LEN internal FN, so that 
> we only need to check for it, just as ARM SVE only checks EXTRACT_LAST.

I think it should work to change the direct_internal_fn_supported_p
check for IFN_EXTRACT_LAST to a "positive" one guarding

  gcc_assert (ncopies == 1 && !slp_node);
  vect_record_loop_mask (loop_vinfo,
 &LOOP_VINFO_MASKS (loop_vinfo),
 1, vectype, NULL);

and in the else branch check for VEC_EXTRACT support and if present
record a loop len.  Just in this case this particular order would
be important.
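
The ordering suggested above can be sketched as a small decision function. Everything here is hypothetical illustration, not vectorizer code: the two booleans stand in for the target support checks, and the third outcome (falling back to no partial vectors) is an assumption about what happens when neither is available.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the check for a live operation.  */
enum live_loop_control
{
  LIVE_RECORD_MASK,        /* EXTRACT_LAST available: record a loop mask */
  LIVE_RECORD_LEN,         /* only VEC_EXTRACT available: record a loop len */
  LIVE_NO_PARTIAL_VECTORS  /* assumed fallback: no partial vectors */
};

/* The order matters: EXTRACT_LAST is tested first, and VEC_EXTRACT
   is consulted only in the else branch.  */
enum live_loop_control
choose_live_loop_control (bool extract_last_supported,
                          bool vec_extract_supported)
{
  if (extract_last_supported)
    return LIVE_RECORD_MASK;
  if (vec_extract_supported)
    return LIVE_RECORD_LEN;
  return LIVE_NO_PARTIAL_VECTORS;
}
```

A target that supports both would record a mask under this ordering, which is why the order of the two checks is significant.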

> >> can we double-check this on powerpc and s390?
> 
> Sure, I hope it can be beneficial to powerpc and s390.
> And, I think Richard's comments are also very important so I am gonna wait 
> for it.

Yeah, just to double-check the b

Re: [PATCH] VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, Juzhe-Zhong wrote:

> This patch adds vec_mask_len_{load_lanes,store_lanes} autovectorization 
> patterns.
> 
> Here we want to support the following autovectorization:
> 
> #include 
> void
> foo (int8_t *__restrict a, 
> int8_t *__restrict b,
> int8_t *__restrict cond,
> int n)
> {
>   for (intptr_t i = 0; i < n; ++i)
> {
>   if (cond[i])
> a[i] = b[i * 2] + b[i * 2 + 1];
> }
> }
> 
> ARM SVE IR:
> 
> https://godbolt.org/z/cro1Eqc6a
> 
>   # loop_mask_60 = PHI 
>   ...
>   mask__39.12_63 = vect__3.11_61 != { 0, ... };
>   vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
>   ...
>   vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
>   ...
> 
> For RVV, we would like to see IR:
>   
>   loop_len = SELECT_VL;
>   ...
>   mask__39.12_63 = vect__3.11_61 != { 0, ... };
>   ...
>   vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, 
> bias);
>   ...
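
A scalar model may help in reading the `.MASK_LEN_LOAD_LANES` call above. It follows the pseudo-code that the patch adds to md.texi, specialized to two lanes as in the `b[i * 2]` / `b[i * 2 + 1]` example. `LANES`, `MAX_VL`, and the function name are hypothetical; elements that are masked off or beyond `len + bias` are simply left untouched here, whereas the documentation calls them undefined.

```c
#include <stddef.h>
#include <stdint.h>

enum { LANES = 2, MAX_VL = 8 };  /* hypothetical sizes for the sketch */

/* For each element j below len + bias whose mask bit is set, load
   LANES consecutive source values and scatter them across the
   destination vectors (de-interleaving).  */
void
mask_len_load_lanes (int8_t dest[LANES][MAX_VL], const int8_t *src,
                     const uint8_t *mask, size_t len, int bias)
{
  size_t active = (size_t) ((ptrdiff_t) len + bias);
  for (size_t j = 0; j < active; j++)
    if (mask[j])
      for (size_t i = 0; i < LANES; i++)
        dest[i][j] = src[j * LANES + i];
}
```

With this model, the loop body of `foo` corresponds to `a[j] = dest[0][j] + dest[1][j]` for the active, unmasked elements.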
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk ?

LGTM.

> gcc/ChangeLog:
> 
> * doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
> * internal-fn.cc (expand_partial_load_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> * internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
> (MASK_LEN_STORE_LANES): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 34 ++
>  gcc/internal-fn.cc  |  6 --
>  gcc/internal-fn.def |  6 ++
>  gcc/optabs.def  |  2 ++
>  4 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9693b6bfe79..70590e68ffe 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4978,6 +4978,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
> +Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2), length operand (operand 3) as well as bias operand 
> (operand 4)
> +that specifies which elements of the destination vectors should be loaded.
> +Other elements of the destination vectors are undefined.  The operation is 
> equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < operand3 + operand4; j++)
> +  if (operand2[j])
> +for (i = 0; i < c; i++)
> +  operand0[i][j] = operand1[j * c + i];
> +@end smallexample
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
>  @item @samp{vec_store_lanes@var{m}@var{n}}
>  Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
> @@ -5011,6 +5028,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_len_store_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_len_store_lanes@var{m}@var{n}}
> +Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2), length operand (operand 3) as well as bias operand 
> (operand 4)
> +that specifies which elements of the source vectors should be stored.
> +The operation is equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < operand3 + operand4; j++)
> +  if (operand2[j])
> +for (i = 0; i < c; i++)
> +  operand0[j * c + i] = operand1[i][j];
> +@end smallexample
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{gather_load@var{m}@var{n}} instruction pattern
>  @item @samp{gather_load@var{m}@var{n}}
>  Load several separate memory locations into a vector of mode @var{m}.
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 7f5ede00c02..4f2b20a79e5 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2931,7 +2931,8 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall 
> *stmt, convert_optab optab)
>type = TREE_TYPE (lhs);
>rhs = expand_call_mem_ref (type, stmt, 0);
>  
> -  if (optab == vec_mask_load_lanes_optab)
> +  if (optab == vec_mask_load_lanes_optab
> +  || optab == vec_mask_len_load_lanes_optab)
>  icode = get_multi_vector_move (type, optab);
>else if (optab == len_load_optab)
>  icode = direct_optab_handler (optab, TYPE_MODE (type));
> @@ -2973,7 +2974,8 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall 
> *stmt, convert_optab optab
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> -  if (optab == vec_mask_store_lanes_optab)
> +  if (optab == vec_mask_store_lanes_optab
> +  || optab == vec_mask_len_store_lanes_optab)
>  icode = get_multi_vector_move (type, optab);
>else if (optab == len_store_optab)
>  icode = direct_optab_handler (optab, TYPE_MODE (type));
> diff --git a/gcc/internal-fn.def b/gcc/internal

Re: [PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API

2023-08-11 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-11 18:11
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API
From: Pan Li 
 
This patch adds support for the rounding mode API for
VFMSUB, as in the samples below.
 
* __riscv_vfmsub_vv_f32m1_rm
* __riscv_vfmsub_vv_f32m1_rm_m
* __riscv_vfmsub_vf_f32m1_rm
* __riscv_vfmsub_vf_f32m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(class vfmsub_frm): New class for vfmsub frm.
(vfmsub_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmsub_frm): New function declaration.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-msub.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-msub.c | 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index b085ba4f52d..381bc72c784 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -493,6 +493,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfmsub
+*/
+class vfmsub_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2266,6 +2289,7 @@ static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
static CONSTEXPR const vfnmadd vfnmadd_obj;
static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
static CONSTEXPR const vfmsub vfmsub_obj;
+static CONSTEXPR const vfmsub_frm vfmsub_frm_obj;
static CONSTEXPR const vfwmacc vfwmacc_obj;
static CONSTEXPR const vfwnmacc vfwnmacc_obj;
static CONSTEXPR const vfwmsac vfwmsac_obj;
@@ -2507,6 +2531,7 @@ BASE (vfmsac_frm)
BASE (vfnmadd)
BASE (vfnmadd_frm)
BASE (vfmsub)
+BASE (vfmsub_frm)
BASE (vfwmacc)
BASE (vfwnmacc)
BASE (vfwmsac)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 4ade0ace7b2..99cfbfd78c8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -173,6 +173,7 @@ extern const function_base *const vfmsac_frm;
extern const function_base *const vfnmadd;
extern const function_base *const vfnmadd_frm;
extern const function_base *const vfmsub;
+extern const function_base *const vfmsub_frm;
extern const function_base *const vfwmacc;
extern const function_base *const vfwnmacc;
extern const function_base *const vfwmsac;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index e9b16f99180..75235ec01d3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -361,6 +361,8 @@ DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, 
f__ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
new file mode 100644
index 000..e58519d0742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+typedef float float32_t;
+
+vfloat32m1_t
+test_riscv_vfmsub_vv_f32m1_rm (vfloat32m1_t vd, vfloat32m1_t op1,
+vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmsub_vv_f32m1_rm (vd, op1, op2, 0, vl);
+}
+
+vfloat32m1_t
+test_vfmsub_vv_f32m1_rm_m (vbool32_t mask, vfloat32m1_t vd, vfloat32m1_t op1,
+vfloat32m1_t op2, size_t vl) {
+  return __riscv_vfmsub_vv_f32m1_rm_m (mask, vd, op1, op2, 1, vl);
+}
+
+vfloa

RE: [PATCH] VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns

2023-08-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Richard Biener via Gcc-patches
Sent: Friday, August 11, 2023 6:23 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com
Subject: Re: [PATCH] VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns

On Fri, 11 Aug 2023, Juzhe-Zhong wrote:

> This patch adds vec_mask_len_{load_lanes,store_lanes} autovectorization
> patterns.
> 
> Here we want to support the following autovectorization:
> 
> #include 
> void
> foo (int8_t *__restrict a, 
> int8_t *__restrict b,
> int8_t *__restrict cond,
> int n)
> {
>   for (intptr_t i = 0; i < n; ++i)
> {
>   if (cond[i])
> a[i] = b[i * 2] + b[i * 2 + 1];
> }
> }
> 
> ARM SVE IR:
> 
> https://godbolt.org/z/cro1Eqc6a
> 
>   # loop_mask_60 = PHI 
>   ...
>   mask__39.12_63 = vect__3.11_61 != { 0, ... };
>   vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
>   ...
>   vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
>   ...
> 
> For RVV, we would like to see IR:
>   
>   loop_len = SELECT_VL;
>   ...
>   mask__39.12_63 = vect__3.11_61 != { 0, ... };
>   ...
>   vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, 
> bias);
>   ...
> 
> Bootstrap and Regression on X86 passed.
> 
> Ok for trunk ?

LGTM.
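
For readers following the md.texi hunk further down, the documented
semantics of vec_mask_len_load_lanes can be modelled in scalar C roughly as
follows. This is only a sketch of the pseudo-code in the documentation, not
code from the patch: the function name, the flat row-major layout of the
destination vectors, and the explicit c/nunits parameters are assumptions
made for illustration.

```c
#include <stdint.h>

/* Scalar model: de-interleave groups of C consecutive elements from SRC
   into C destination vectors.  DST holds the C vectors back to back, each
   NUNITS elements long (a layout chosen for this sketch only).  A lane j is
   loaded only if j < LEN + BIAS and MASK[j] is nonzero.  */
void
model_mask_len_load_lanes (int8_t *dst, const int8_t *src,
                           const uint8_t *mask, int len, int bias,
                           int c, int nunits)
{
  for (int j = 0; j < len + bias; j++)
    if (mask[j])
      for (int i = 0; i < c; i++)
        dst[i * nunits + j] = src[j * c + i];
}
```

The documentation says the remaining destination elements are undefined;
this model simply leaves them untouched.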

> gcc/ChangeLog:
> 
> * doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
> * internal-fn.cc (expand_partial_load_optab_fn): Ditto.
> (expand_partial_store_optab_fn): Ditto.
> * internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
> (MASK_LEN_STORE_LANES): Ditto.
> * optabs.def (OPTAB_CD): Ditto.
> 
> ---
>  gcc/doc/md.texi | 34 ++
>  gcc/internal-fn.cc  |  6 --
>  gcc/internal-fn.def |  6 ++
>  gcc/optabs.def  |  2 ++
>  4 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 9693b6bfe79..70590e68ffe 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4978,6 +4978,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
> +Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2), length operand (operand 3) as well as bias operand 
> (operand 4)
> +that specifies which elements of the destination vectors should be loaded.
> +Other elements of the destination vectors are undefined.  The operation is 
> equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < operand3 + operand4; j++)
> +  if (operand2[j])
> +for (i = 0; i < c; i++)
> +  operand0[i][j] = operand1[j * c + i];
> +@end smallexample
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
>  @item @samp{vec_store_lanes@var{m}@var{n}}
>  Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
> @@ -5011,6 +5028,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{vec_mask_len_store_lanes@var{m}@var{n}} instruction pattern
> +@item @samp{vec_mask_len_store_lanes@var{m}@var{n}}
> +Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
> +mask operand (operand 2), length operand (operand 3) as well as bias operand 
> (operand 4)
> +that specifies which elements of the source vectors should be stored.
> +The operation is equivalent to:
> +
> +@smallexample
> +int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
> +for (j = 0; j < operand3 + operand4; j++)
> +  if (operand2[j])
> +for (i = 0; i < c; i++)
> +  operand0[j * c + i] = operand1[i][j];
> +@end smallexample
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{gather_load@var{m}@var{n}} instruction pattern
>  @item @samp{gather_load@var{m}@var{n}}
>  Load several separate memory locations into a vector of mode @var{m}.
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 7f5ede00c02..4f2b20a79e5 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2931,7 +2931,8 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall 
> *stmt, convert_optab optab)
>type = TREE_TYPE (lhs);
>rhs = expand_call_mem_ref (type, stmt, 0);
>  
> -  if (optab == vec_mask_load_lanes_optab)
> +  if (optab == vec_mask_load_lanes_optab
> +  || optab == vec_mask_len_load_lanes_optab)
>  icode = get_multi_vector_move (type, optab);
>else if (optab == len_load_optab)
>  icode = direct_optab_handler (optab, TYPE_MODE (type));
> @@ -2973,7 +2974,8 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall 
> *stmt, convert_optab optab
>type = TREE_TYPE (rhs);
>lhs = expand_call_mem_ref (type, stmt, 0);
>  
> -  if (optab == vec_mask_store_la

Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread juzhe.zh...@rivai.ai
Hi, Richi.

> 1. Target is using loop MASK as the partial vector loop control.
>> I don't think it checks for this?

I am not sure whether I understand EXTRACT_LAST correctly.
But if the target doesn't use a loop MASK for partial vector loop control, how
does the target use EXTRACT_LAST?
EXTRACT_LAST always extracts the last element of the vector according to the
MASK operand.
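
To make the difference between the two forms concrete, here is a scalar
sketch of what each is taken to compute in this thread. The function names
and the fallback argument are hypothetical and exist only for illustration;
the length form follows the VEC_EXTRACT (v, LEN + BIAS - 1) expression from
the patch description.

```c
#include <stddef.h>
#include <stdint.h>

/* Mask-controlled form: return the element of VEC at the highest index
   whose MASK entry is nonzero.  FALLBACK is returned when no lane is
   active (a choice made for this sketch only).  */
uint8_t
model_extract_last (const uint8_t *mask, const uint8_t *vec, size_t nunits,
                    uint8_t fallback)
{
  for (size_t i = nunits; i > 0; i--)
    if (mask[i - 1])
      return vec[i - 1];
  return fallback;
}

/* Length-controlled form: the last active lane sits at index
   LEN + BIAS - 1.  Assumes LEN + BIAS >= 1.  */
uint8_t
model_vec_extract_last (const uint8_t *vec, int len, int bias)
{
  return vec[len + bias - 1];
}
```

When the mask is a prefix of set lanes, both forms pick the same element,
which is why a plain index is enough under length control.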

> But we don't really know this at this point?  The only thing we know
> is that nothing set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false.

Yes. So I am trying to use 'get_len_load_store' to check whether the target
supports LEN loop control.

Well, I admit it's not a good idea.


> I think it should work to change the direct_internal_fn_supported_p
> check for IFN_EXTRACT_LAST to a "positive" one guarding

>   gcc_assert (ncopies == 1 && !slp_node);
>   vect_record_loop_mask (loop_vinfo,
>  &LOOP_VINFO_MASKS (loop_vinfo),
>  1, vectype, NULL);

> and in the else branch check for VEC_EXTRACT support and if present
> record a loop len.  Just in this case this particular order would
> be important.

Do you mean changing the code as follows?

- if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
-  OPTIMIZE_FOR_SPEED))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't operate on partial vectors "
-"because the target doesn't support extract "
-"last reduction.\n");
- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
-   }
- else if (slp_node)
  if (slp_node)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors "
 "because an SLP statement is live after "
 "the loop.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
  else if (ncopies > 1)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors "
 "because ncopies is greater than 1.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
  else
{
  gcc_assert (ncopies == 1 && !slp_node);
  if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
  OPTIMIZE_FOR_SPEED))
vect_record_loop_mask (loop_vinfo,
   &LOOP_VINFO_MASKS (loop_vinfo),
   1, vectype, NULL);
  else
vect_record_loop_len (loop_vinfo,
  &LOOP_VINFO_LENS (loop_vinfo),
  1, vectype, 1);
}


Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-11 18:21
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw; krebbel
Subject: Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> >> So how can we resolve the issue when a non-VL operation like
> >> .VEC_EXTRACT is used for _len support?
> 
> Do you mean a non-VL extract-last operation (I am sorry, I am not sure whether
> I understand your question correctly)?
> If yes, the answer is that for RVV we are reusing the same flow as ARM SVE
> (the BIT_FIELD_REF approach); see the example below:
> 
> https://godbolt.org/z/cqrWrY8q4 
> 
> #define EXTRACT_LAST(TYPE)  \
>   TYPE __attribute__ ((noinline, noclone))  \
>   test_##TYPE (TYPE *x, int n, TYPE value)  \
>   { \
> TYPE last;  \
> for (int j = 0; j < 64; ++j)\
>   { \
> last = x[j];\
> x[j] = last * value;\
>   } \
> return last;\
>   }
> 
> #define TEST_ALL(T) \
>   T (uint8_t)   \
> 
> TEST_ALL (EXTRACT_LAST)
> 
>   vect_cst__22 = {value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), value_12(D), 
> value_12(D), value_12(D), value_12

RE: [PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API

2023-08-11 Thread Li, Pan2 via Gcc-patches
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, August 11, 2023 6:24 PM
To: Li, Pan2; gcc-patches
Cc: jeffreyalaw; Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API

LGTM


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-11 18:11
To: gcc-patches
CC: juzhe.zhong; jeffreyalaw; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFMSUB rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch adds support for the rounding mode API for VFMSUB, as in the
samples below.

* __riscv_vfmsub_vv_f32m1_rm
* __riscv_vfmsub_vv_f32m1_rm_m
* __riscv_vfmsub_vf_f32m1_rm
* __riscv_vfmsub_vf_f32m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class vfmsub_frm): New class for vfmsub frm.
(vfmsub_frm): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfmsub_frm): New function declaration.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-msub.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  | 25 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/rvv/base/float-point-msub.c | 47 +++
4 files changed, 75 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index b085ba4f52d..381bc72c784 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -493,6 +493,29 @@ public:
   }
};
+/* Implements below instructions for frm
+   - vfmsub
+*/
+class vfmsub_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  bool has_merge_operand_p () const override { return false; }
+
+  rtx expand (function_expander &e) const override
+  {
+if (e.op_info->op == OP_TYPE_vf)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul_scalar (MINUS, e.vector_mode ()));
+if (e.op_info->op == OP_TYPE_vv)
+  return e.use_ternop_insn (
+ false, code_for_pred_mul (MINUS, e.vector_mode ()));
+
+gcc_unreachable ();
+  }
+};
+
/* Implements vrsub.  */
class vrsub : public function_base
{
@@ -2266,6 +2289,7 @@ static CONSTEXPR const vfmsac_frm vfmsac_frm_obj;
static CONSTEXPR const vfnmadd vfnmadd_obj;
static CONSTEXPR const vfnmadd_frm vfnmadd_frm_obj;
static CONSTEXPR const vfmsub vfmsub_obj;
+static CONSTEXPR const vfmsub_frm vfmsub_frm_obj;
static CONSTEXPR const vfwmacc vfwmacc_obj;
static CONSTEXPR const vfwnmacc vfwnmacc_obj;
static CONSTEXPR const vfwmsac vfwmsac_obj;
@@ -2507,6 +2531,7 @@ BASE (vfmsac_frm)
BASE (vfnmadd)
BASE (vfnmadd_frm)
BASE (vfmsub)
+BASE (vfmsub_frm)
BASE (vfwmacc)
BASE (vfwnmacc)
BASE (vfwmsac)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 4ade0ace7b2..99cfbfd78c8 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -173,6 +173,7 @@ extern const function_base *const vfmsac_frm;
extern const function_base *const vfnmadd;
extern const function_base *const vfnmadd_frm;
extern const function_base *const vfmsub;
+extern const function_base *const vfmsub_frm;
extern const function_base *const vfwmacc;
extern const function_base *const vfwnmacc;
extern const function_base *const vfwmsac;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index e9b16f99180..75235ec01d3 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -361,6 +361,8 @@ DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, 
f__ops)
DEF_RVV_FUNCTION (vfmadd_frm, alu_frm, full_preds, f_vvfv_ops)
DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f__ops)
DEF_RVV_FUNCTION (vfnmadd_frm, alu_frm, full_preds, f_vvfv_ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f__ops)
+DEF_RVV_FUNCTION (vfmsub_frm, alu_frm, full_preds, f_vvfv_ops)
// 13.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
DEF_RVV_FUNCTION (vfwmacc, alu, full_preds, f_wwvv_ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-msub.c
new file mode 100644
index 000..e58519d0742
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float

[PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Biener via Gcc-patches
When we vectorize fold-left reductions with partial vectors but
no target operation available we use a vector conditional to force
excess elements to zero.  But that doesn't correctly preserve
the sign of zero.  The following patch disables partial vector
support in that case.
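
As an illustration of the problem (not part of the patch), the following
scalar sketch models an in-order sum processed in chunks of VF lanes, with
lanes past the end of the array forced to +0.0 the way the vector
conditional does. The function names and the chunking model are assumptions
made for this example.

```c
#include <math.h>

/* Scalar model of a fold-left sum over N doubles, processed in chunks of
   VF lanes.  Lanes past the end of the array contribute +0.0, mimicking
   the vector conditional that zeroes excess elements.  */
double
model_fold_left_sum_zero_padded (const double *a, int n, int vf, double init)
{
  double sum = init;
  for (int base = 0; base < n; base += vf)
    for (int lane = 0; lane < vf; lane++)
      {
        int idx = base + lane;
        sum += idx < n ? a[idx] : 0.0;
      }
  return sum;
}

/* Returns nonzero if summing A from an initial -0.0 keeps the sign bit.  */
int
sum_keeps_sign_bit (const double *a, int n, int vf)
{
  return signbit (model_fold_left_sum_zero_padded (a, n, vf, -0.0)) != 0;
}
```

With 20 elements that are all -0.0, a chunk size that divides 20 adds no
padding and the result stays -0.0. A chunk size of 8 adds four +0.0 lanes,
and under the default rounding mode -0.0 + 0.0 is +0.0, so the sign is lost.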

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Does this look OK?  With -frounding-math -fno-signed-zeros we are
happily using the masking again, but that's OK, right?  An additional
+ 0.0 shouldn't do anything here.

Thanks,
Richard.

PR tree-optimization/110979
* tree-vect-loop.cc (vectorizable_reduction): For
FOLD_LEFT_REDUCTION without target support make sure
we don't need to honor signed zeros.

* gcc.dg/torture/pr110979.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +
 gcc/tree-vect-loop.cc   | 11 +++
 2 files changed, 36 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c 
b/gcc/testsuite/gcc.dg/torture/pr110979.c
new file mode 100644
index 000..c25ad7a8a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
+
+#define FLT double
+#define N 20
+
+__attribute__((noipa))
+FLT
+foo3 (FLT *a)
+{
+  FLT sum = -0.0;
+  for (int i = 0; i != N; i++)
+sum += a[i];
+  return sum;
+}
+
+int main()
+{
+  FLT a[N];
+  for (int i = 0; i != N; i++)
+a[i] = -0.0;
+  if (!__builtin_signbit(foo3(a)))
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..741b5c20389 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 " no conditional operation is available.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
+  else if (reduction_type == FOLD_LEFT_REDUCTION
+  && reduc_fn == IFN_LAST
+  && FLOAT_TYPE_P (vectype_in)
+  && HONOR_SIGNED_ZEROS (vectype_in))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" signed zeros need to be handled.\n");
+ LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+   }
   else
{
  internal_fn mask_reduc_fn
-- 
2.35.3


[PATCH] Improve BB vectorization opt-info

2023-08-11 Thread Richard Biener via Gcc-patches
The following makes us more correctly print the used vector size
when doing BB vectorization and also print all involved SLP graph
roots, not just the random one we ended up picking as leader.
In particular the last bit improves diffing opt-info between
different GCC revs but it also requires some testsuite adjustments.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_slp_region): Provide opt-info for all SLP
subgraph entries.  Dump the used vector size based on the
SLP subgraph entry root vector type.

* g++.dg/vect/slp-pr87105.cc: Adjust.
* gcc.dg/vect/bb-slp-17.c: Likewise.
* gcc.dg/vect/bb-slp-20.c: Likewise.
* gcc.dg/vect/bb-slp-21.c: Likewise.
* gcc.dg/vect/bb-slp-22.c: Likewise.
* gcc.dg/vect/bb-slp-subgroups-2.c: Likewise.
---
 gcc/testsuite/g++.dg/vect/slp-pr87105.cc  |  2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-17.c |  5 ++-
 gcc/testsuite/gcc.dg/vect/bb-slp-20.c |  3 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-21.c |  3 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-22.c |  2 +-
 .../gcc.dg/vect/bb-slp-subgroups-2.c  |  2 +-
 gcc/tree-vect-slp.cc  | 37 ---
 7 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/gcc/testsuite/g++.dg/vect/slp-pr87105.cc 
b/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
index d07b1cd46b7..17017686792 100644
--- a/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
+++ b/gcc/testsuite/g++.dg/vect/slp-pr87105.cc
@@ -99,7 +99,7 @@ void quadBoundingBoxA(const Point bez[3], Box& bBox) noexcept 
{
 
 // We should have if-converted everything down to straight-line code
 // { dg-final { scan-tree-dump-times "" 1 "slp2" } }
-// { dg-final { scan-tree-dump-times "basic block part vectorized" 1 "slp2" { 
xfail { { ! vect_element_align } && { ! vect_hw_misalign } } } } }
+// { dg-final { scan-tree-dump-times "Basic block will be vectorized using 
SLP" 1 "slp2" { xfail { { ! vect_element_align } && { ! vect_hw_misalign } } } 
} }
 // It's a bit awkward to detect that all stores were vectorized but the
 // following more or less does the trick
 // { dg-final { scan-tree-dump "vect_\[^\r\m\]* = MIN" "slp2" { xfail { { ! 
vect_element_align } && { ! vect_hw_misalign } } } } }
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-17.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-17.c
index fc3ef42f51a..e7bb06bf816 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-17.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-17.c
@@ -58,5 +58,6 @@ int main (void)
 }
 
 /* We need V2SI vector add support for the b[] vectorization, if we don't
-   have that we might only see the store vectorized and thus 2 subgraphs.  */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { 
target { vect_int_mult && vect64 } } } } */
+   have that we might only see the store vectorized.  In any case we have
+   two subgraph entries.  */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { 
target { vect_int_mult && vect64 } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-20.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-20.c
index 134858c934a..7b25f91fbd3 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-20.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-20.c
@@ -63,6 +63,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { 
target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "Basic block will be vectorized using 
SLP" 1 "slp2" { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { 
target vect_int_mult } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" { 
target vect_int_mult } } } */
   
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-21.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-21.c
index d4c98d67f55..45054cd3c29 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-21.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-21.c
@@ -63,6 +63,7 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2"  } } */
+/* { dg-final { scan-tree-dump-times "Basic block will be vectorized using 
SLP" 1 "slp2"  } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2"  } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "slp2" } 
} */
   
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-22.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-22.c
index 92cc2a51abd..7bf7e2c4feb 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-22.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-22.c
@@ -63,5 +63,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "Basic block will be vectorized using 
SLP" 1 "slp2" } } */
 /* { dg-final { scan-tree-dump "vectorizing SLP node starting from: _\[0-9\]+ 
= _\[0-9\]+ \\\* a0" "slp2" { target vect_int_mult  } } } */
diff --git a/gc

Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> > 1. Target is using loop MASK as the partial vector loop control.
> >> I don't think it checks for this?
> 
> I am not sure whether I understand EXTRACT_LAST correctly.
> But if the target doesn't use a loop MASK for partial vector loop control, how
> does the target use EXTRACT_LAST?
> EXTRACT_LAST always extracts the last element of the vector according to the
> MASK operand.
> 
> > But we don't really know this at this point?  The only thing we know
> > is that nothing set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false.
> 
> Yes. So I am trying to use 'get_len_load_store' to check whether the target
> supports LEN loop control.
> 
> Well, I admit it's not a good idea.
> 
> 
> > I think it should work to change the direct_internal_fn_supported_p
> > check for IFN_EXTRACT_LAST to a "positive" one guarding
> 
> >   gcc_assert (ncopies == 1 && !slp_node);
> >   vect_record_loop_mask (loop_vinfo,
> >  &LOOP_VINFO_MASKS (loop_vinfo),
> >  1, vectype, NULL);
> 
> > and in the else branch check for VEC_EXTRACT support and if present
> > record a loop len.  Just in this case this particular order would
> > be important.
> 
> Do you mean changing the code as follows?
> 
> - if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
> -  OPTIMIZE_FOR_SPEED))
> -   {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"can't operate on partial vectors "
> -"because the target doesn't support extract "
> -"last reduction.\n");
> - LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -   }
> - else if (slp_node)
>   if (slp_node)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors "
>  "because an SLP statement is live after "
>  "the loop.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> }
>   else if (ncopies > 1)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors "
>  "because ncopies is greater than 1.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> }
>   else
> {
>   gcc_assert (ncopies == 1 && !slp_node);
>   if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
>   OPTIMIZE_FOR_SPEED))
> vect_record_loop_mask (loop_vinfo,
>&LOOP_VINFO_MASKS (loop_vinfo),
>1, vectype, NULL);
>   else

check here the target supports VEC_EXTRACT

> vect_record_loop_len (loop_vinfo,
>   &LOOP_VINFO_LENS (loop_vinfo),
>   1, vectype, 1);

else set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false with a
diagnostic.

> }
> 
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-11 18:21
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw; krebbel
> Subject: Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST 
> vectorization
> On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > >> So how can we resolve the issue when a non-VL operation like
> > >> .VEC_EXTRACT is used for _len support?
> > 
> > Do you mean a non-VL extract-last operation (I am sorry, I am not sure
> > whether I understand your question correctly)?
> > If yes, the answer is that for RVV we are reusing the same flow as ARM SVE
> > (the BIT_FIELD_REF approach); see the example below:
> > 
> > https://godbolt.org/z/cqrWrY8q4 
> > 
> > #define EXTRACT_LAST(TYPE)  \
> >   TYPE __attribute__ ((noinline, noclone))  \
> >   test_##TYPE (TYPE *x, int n, TYPE value)  \
> >   { \
> > TYPE last;  \
> > for (int j = 0; j < 64; ++j)\
> >   { \
> > last = x[j];\
> > x[j] = last * value;\
> >   } \
> > return last;\
> >   }
> > 
> > #define TEST_ALL(T) \
> >   T (uint8_t)   \
> > 
> > TEST_ALL (EXTRACT_LAST)
> > 
> >   vect_cst__22 = {value_12(D), value_12(D), value_12(D), value_12(D), 
> > value_12(D), value_12(D)

Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Alexander Monakov


On Fri, 11 Aug 2023, Richard Biener wrote:

> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Does this look OK?  With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

I think it converts SNaN to QNaN (when the partial vector has just one
element which is SNaN), so is a test for -fsignaling-nans missing?

In the default -fno-rounding-math -fno-signaling-nans mode I think we
can do the reduction by substituting negative zero for masked-off
elements; maybe it's worth diagnosing that case separately (i.e.
as "not yet implemented", not an incorrect transform)?

(note that in avx512 it's possible to materialize negative zeroes
by mask with a single vpternlog instruction, which is cheap)
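
A small check of the negative-zero substitution idea, assuming the default
round-to-nearest mode (under round-toward-negative, +0.0 + -0.0 yields -0.0,
so this reasoning would not carry over). The helper name is hypothetical,
and the sketch says nothing about signaling NaNs.

```c
#include <math.h>

/* Returns nonzero if X + PAD has the same value and the same sign bit as
   X, i.e. if PAD behaves as a neutral element for this particular X.  */
int
pad_is_neutral_for (double x, double pad)
{
  double r = x + pad;
  return r == x && (signbit (r) != 0) == (signbit (x) != 0);
}
```

For both +0.0 and -0.0 as X, a pad of -0.0 leaves the value and sign
unchanged, whereas a pad of +0.0 flips -0.0 to +0.0.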

Alexander


Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> check here the target supports VEC_EXTRACT
>> else set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false with a
>> diagnostic.

I am wondering about a target that has VEC_EXTRACT but no EXTRACT_LAST and
that uses MASK as the loop control.
It seems that this will cause an ICE for such a target? (I am not sure whether
we currently have such a target.)
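
The ordering under discussion can be sketched as a standalone decision
function. Everything here is hypothetical and only models the order of the
checks; the enum, the function name, and the boolean parameters stand in
for the real target queries and are not GCC identifiers.

```c
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not GCC identifiers.  */
enum live_control { CONTROL_MASK, CONTROL_LEN, CONTROL_NONE };

/* Models the proposed order: EXTRACT_LAST support is checked first and
   leads to recording a loop mask; otherwise VEC_EXTRACT support leads to
   recording a loop len; otherwise partial vectors are disabled.  */
enum live_control
choose_live_control (bool slp_node, int ncopies,
                     bool has_extract_last, bool has_vec_extract)
{
  if (slp_node || ncopies > 1)
    return CONTROL_NONE;      /* Partial vectors are disabled.  */
  if (has_extract_last)
    return CONTROL_MASK;      /* Would record a loop mask.  */
  if (has_vec_extract)
    return CONTROL_LEN;       /* Would record a loop len.  */
  return CONTROL_NONE;        /* Neither operation is available.  */
}
```

In this model, a target with VEC_EXTRACT but without EXTRACT_LAST always
ends up recording a loop len, whatever loop control it uses elsewhere,
which is the case the question above is about.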

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-11 19:10
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw; krebbel
Subject: Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST 
vectorization
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> > 1. Target is using loop MASK as the partial vector loop control.
> >> I don't think it checks for this?
> 
> I am not sure whether I understand EXTRACT_LAST correctly.
> But if the target doesn't use a loop MASK for partial vector loop control, how
> does the target use EXTRACT_LAST?
> EXTRACT_LAST always extracts the last element of the vector according to the
> MASK operand.
> 
> > But we don't really know this at this point?  The only thing we know
> > is that nothing set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false.
> 
> Yes. So I am trying to use 'get_len_load_store' to check whether the target 
> supports LEN loop control.
> 
> Well, I admit it's not a good idea.
> 
> 
> > I think it should work to change the direct_internal_fn_supported_p
> > check for IFN_EXTRACT_LAST to a "positive" one guarding
> 
> >   gcc_assert (ncopies == 1 && !slp_node);
> >   vect_record_loop_mask (loop_vinfo,
> >  &LOOP_VINFO_MASKS (loop_vinfo),
> >  1, vectype, NULL);
> 
> > and in the else branch check for VEC_EXTRACT support and if present
> > record a loop len.  Just in this case this particular order would
> > be important.
> 
> Do you mean changing the code as follows?
> 
> - if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
> -  OPTIMIZE_FOR_SPEED))
> -   {
> - if (dump_enabled_p ())
> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -"can't operate on partial vectors "
> -"because the target doesn't support extract "
> -"last reduction.\n");
> - LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> -   }
> - else if (slp_node)
>   if (slp_node)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors "
>  "because an SLP statement is live after "
>  "the loop.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> }
>   else if (ncopies > 1)
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "can't operate on partial vectors "
>  "because ncopies is greater than 1.\n");
>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> }
>   else
> {
>   gcc_assert (ncopies == 1 && !slp_node);
>   if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
>   OPTIMIZE_FOR_SPEED))
> vect_record_loop_mask (loop_vinfo,
>&LOOP_VINFO_MASKS (loop_vinfo),
>1, vectype, NULL);
>   else
 
check here the target supports VEC_EXTRACT
 
> vect_record_loop_len (loop_vinfo,
>   &LOOP_VINFO_LENS (loop_vinfo),
>   1, vectype, 1);
 
else set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false with a
diagnostic.
 
> }
> 
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-11 18:21
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw; krebbel
> Subject: Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST 
> vectorization
> On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > >> So how can we resolve the issue when a non-VL operation like
> > >> .VEC_EXTRACT is used for _len support?
> > 
> > Do you mean a non-VL extract-last operation?  (Sorry, I am not sure 
> > whether I understand your question correctly.) 
> > If yes, the answer is that for RVV we reuse the same flow as ARM SVE 
> > (the BIT_FIELD_REF approach); see the example below:
> > 
> > https://godbolt.org/z/cqrWrY8q4 
> > 
> > #define EXTRACT_LAST(TYPE)  \
> >   TYPE __attribute__ ((

Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> When we vectorize fold-left reductions with partial vectors but
> no target operation available we use a vector conditional to force
> excess elements to zero.  But that doesn't correctly preserve
> the sign of zero.  The following patch disables partial vector
> support in that case.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> Does this look OK?

LGTM.

> With -frounding-math -fno-signed-zeros we are
> happily using the masking again, but that's OK, right?  An additional
> + 0.0 shouldn't do anything here.

Yeah, I would hope so.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>   PR tree-optimization/110979
>   * tree-vect-loop.cc (vectorizable_reduction): For
>   FOLD_LEFT_REDUCTION without target support make sure
>   we don't need to honor signed zeros.
>
>   * gcc.dg/torture/pr110979.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +
>  gcc/tree-vect-loop.cc   | 11 +++
>  2 files changed, 36 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c b/gcc/testsuite/gcc.dg/torture/pr110979.c
> new file mode 100644
> index 000..c25ad7a8a31
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
> +
> +#define FLT double
> +#define N 20
> +
> +__attribute__((noipa))
> +FLT
> +foo3 (FLT *a)
> +{
> +  FLT sum = -0.0;
> +  for (int i = 0; i != N; i++)
> +sum += a[i];
> +  return sum;
> +}
> +
> +int main()
> +{
> +  FLT a[N];
> +  for (int i = 0; i != N; i++)
> +a[i] = -0.0;
> +  if (!__builtin_signbit(foo3(a)))
> +__builtin_abort();
> +  return 0;
> +}
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bf8d677b584..741b5c20389 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8037,6 +8037,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>" no conditional operation is available.\n");
> LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>   }
> +  else if (reduction_type == FOLD_LEFT_REDUCTION
> +&& reduc_fn == IFN_LAST
> +&& FLOAT_TYPE_P (vectype_in)
> +&& HONOR_SIGNED_ZEROS (vectype_in))
> + {
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +  "can't operate on partial vectors because"
> +  " signed zeros need to be handled.\n");
> +   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
> + }
>else
>   {
> internal_fn mask_reduc_fn


[PATCH] analyzer: New option fanalyzer-show-events-in-system-headers [PR110543]

2023-08-11 Thread Benjamin Priour via Gcc-patches
From: benjamin priour 

This patch introduces -fanalyzer-show-events-in-system-headers,
disabled by default.

This option reduces the noise in analyzer-emitted diagnostics
when dealing with system headers.
The new option only affects the display of the diagnostics,
but doesn't hinder the actual analysis.

Given a diagnostics path diving into a system header in the form
[
  prefix events...,
  system header call,
system header entry,
events within system headers...,
  system header return,
  suffix events...
]
then disabling the option (either by default or explicitly)
will shorten the path into:
[
  prefix events...,
  system header call,
  system header return,
  suffix events...
]
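
As a rough illustration of the shortening described above, here is a minimal
C sketch.  It is not the patch's implementation: the struct event type, its
fields and the function name are hypothetical, and it filters a flat array,
whereas the real code works on checker_path events and prunes whole frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event record.  */
struct event
{
  int id;                /* position in the original path */
  bool in_system_header; /* true for the entry and inner events */
};

/* Remove, in place, every event located inside a system header and
   return the new path length.  The call and return events live in
   user code, so they are kept.  */
static size_t
prune_system_header_events (struct event *path, size_t n)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (!path[i].in_system_header)
      path[kept++] = path[i];
  return kept;
}
```

Applied to the six-element path shown above (prefix, call, entry, inner
event, return, suffix), this keeps the four events outside the header.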

Signed-off-by: benjamin priour 

gcc/analyzer/ChangeLog:

PR analyzer/110543
* analyzer.cc (is_std_function_p): No longer static.
* analyzer.h (is_std_function_p): Add declaration.
* analyzer.opt: Add new option.
* diagnostic-manager.cc
(INCLUDE_VECTOR): Include vector from system.h
(diagnostic_manager::prune_path): Call prune_system_headers.
(prune_frame): New function that deletes all events in a frame.
(diagnostic_manager::prune_system_headers): New function.
* diagnostic-manager.h: Add prune_system_headers declaration.

gcc/testsuite/ChangeLog:

PR analyzer/110543
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers-default.C:
New test.
* g++.dg/analyzer/fanalyzer-show-events-in-system-headers-no.C:
New test.
---
 gcc/analyzer/analyzer.cc  |  2 +-
 gcc/analyzer/analyzer.h   |  1 +
 gcc/analyzer/analyzer.opt |  4 ++
 gcc/analyzer/diagnostic-manager.cc| 65 +++
 gcc/analyzer/diagnostic-manager.h |  1 +
 ...er-show-events-in-system-headers-default.C | 19 ++
 ...nalyzer-show-events-in-system-headers-no.C | 19 ++
 7 files changed, 110 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers-default.C
 create mode 100644 gcc/testsuite/g++.dg/analyzer/fanalyzer-show-events-in-system-headers-no.C

diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
index 5091fb7a583..b27d8e359db 100644
--- a/gcc/analyzer/analyzer.cc
+++ b/gcc/analyzer/analyzer.cc
@@ -274,7 +274,7 @@ is_named_call_p (const_tree fndecl, const char *funcname)
Compare with cp/typeck.cc: decl_in_std_namespace_p, but this doesn't
rely on being the C++ FE (or handle inline namespaces inside of std).  */
 
-static inline bool
+bool
 is_std_function_p (const_tree fndecl)
 {
   tree name_decl = DECL_NAME (fndecl);
diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
index 579517c23e6..31597079153 100644
--- a/gcc/analyzer/analyzer.h
+++ b/gcc/analyzer/analyzer.h
@@ -386,6 +386,7 @@ extern bool is_special_named_call_p (const gcall *call, const char *funcname,
 extern bool is_named_call_p (const_tree fndecl, const char *funcname);
 extern bool is_named_call_p (const_tree fndecl, const char *funcname,
 const gcall *call, unsigned int num_args);
+extern bool is_std_function_p (const_tree fndecl);
 extern bool is_std_named_call_p (const_tree fndecl, const char *funcname);
 extern bool is_std_named_call_p (const_tree fndecl, const char *funcname,
 const gcall *call, unsigned int num_args);
diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
index 2760aaa8151..d97cd569f52 100644
--- a/gcc/analyzer/analyzer.opt
+++ b/gcc/analyzer/analyzer.opt
@@ -290,6 +290,10 @@ fanalyzer-transitivity
 Common Var(flag_analyzer_transitivity) Init(0)
 Enable transitivity of constraints during analysis.
 
+fanalyzer-show-events-in-system-headers
+Common Var(flag_analyzer_show_events_in_system_headers) Init(0)
+Trim diagnostics path that are too long before emission.
+
 fanalyzer-call-summaries
 Common Var(flag_analyzer_call_summaries) Init(0)
 Approximate the effect of function calls to simplify analysis.
diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
index cfca305d552..2a9705a464f 100644
--- a/gcc/analyzer/diagnostic-manager.cc
+++ b/gcc/analyzer/diagnostic-manager.cc
@@ -20,9 +20,11 @@ along with GCC; see the file COPYING3.  If not see
 
 #include "config.h"
 #define INCLUDE_MEMORY
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "tree.h"
+#include "input.h"
 #include "pretty-print.h"
 #include "gcc-rich-location.h"
 #include "gimple-pretty-print.h"
@@ -2281,6 +2283,8 @@ diagnostic_manager::prune_path (checker_path *path,
   path->maybe_log (get_logger (), "path");
   prune_for_sm_diagnostic (path, sm, sval, state);
   prune_interproc_events (path);
+  if (! flag_analyzer_show_events_in_system_headers)
+prune_system_headers (path);
   consolidate_conditions (path);
   finish_pruning (path);
   path->maybe_log (get_logger

[PATCH] VECT: Fix ICE on MASK_LEN_{LOAD, STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Juzhe-Zhong
This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110989

This ICE is caused by the following situation:

mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
...
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0);

The MASK_LEN_LOAD uses a real MASK, produced by the EQ comparison, whereas
the LEN is a dummy LEN equal to the vectorization factor.

In this situation we never enter 'vect_record_loop_len', since there is no
LEN loop control.  'LOOP_VINFO_RGROUP_IV_TYPE' is then not a suitable type
for the 'build_int_cst' call that produces the LEN argument of
'MASK_LEN_LOAD', so use sizetype instead, which matches the RVV length
requirement.
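
For readers unfamiliar with the combined control, the following scalar C
sketch models a load governed by both a mask and a length.  It is an
illustration under assumptions, not GCC code: the name mask_len_load_model is
hypothetical, a lane is taken to be active when its index is below LEN + BIAS
and its mask bit is set, and inactive lanes are simply left untouched here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a load controlled by both a mask and a length.
   Lane I is loaded only if I < LEN + BIAS and MASK[I] is true.
   Hypothetical helper, for illustration only.  */
static void
mask_len_load_model (double *dest, const double *src, const bool *mask,
                     size_t nlanes, size_t len, int bias)
{
  assert (bias == 0 || bias == -1);
  for (size_t i = 0; i < nlanes; i++)
    if ((long long) i < (long long) len + bias && mask[i])
      dest[i] = src[i];
}
```

When LEN equals the number of lanes and BIAS is 0, the length test is always
true and only the mask decides which lanes are loaded.  That corresponds to
the dummy-LEN case described above.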

PR middle-end/110989

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Replace iv_type with 
sizetype.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr110989.c: New test.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c | 11 +++
 gcc/tree-vect-stmts.cc|  7 ++-
 2 files changed, 13 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
new file mode 100644
index 000..cf3b247e604
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast" } */
+
+int a, b, c;
+double *d;
+void e() {
+  double f;
+  for (; c; c++, d--)
+f = *d ?: *(&a + c);
+  b = f;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 398fbe945e5..e0e2083d022 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9126,9 +9126,8 @@ vectorizable_store (vec_info *vinfo,
{
  /* Pass VF value to 'len' argument of
 MASK_LEN_STORE if LOOP_LENS is invalid.  */
- tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
  final_len
-   = build_int_cst (iv_type,
+   = build_int_cst (sizetype,
 TYPE_VECTOR_SUBPARTS (vectype));
}
  if (!final_mask)
@@ -10713,10 +10712,8 @@ vectorizable_load (vec_info *vinfo,
  {
/* Pass VF value to 'len' argument of
   MASK_LEN_LOAD if LOOP_LENS is invalid.  */
-   tree iv_type
- = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
final_len
- = build_int_cst (iv_type,
+ = build_int_cst (sizetype,
   TYPE_VECTOR_SUBPARTS (vectype));
  }
if (!final_mask)
-- 
2.36.3



Re: [PATCH] analyzer: New option fanalyzer-show-events-in-system-headers [PR110543]

2023-08-11 Thread Benjamin Priour via Gcc-patches
I forgot to mention that this has been successfully bootstrapped and
regression-tested on top of trunk
54be338589ea93ad4ff53d22adde476a0582537b on x86_64-linux-gnu.

Is it OK for trunk?

Thanks,
Benjamin.


Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, Alexander Monakov wrote:

> 
> On Fri, 11 Aug 2023, Richard Biener wrote:
> 
> > When we vectorize fold-left reductions with partial vectors but
> > no target operation available we use a vector conditional to force
> > excess elements to zero.  But that doesn't correctly preserve
> > the sign of zero.  The following patch disables partial vector
> > support in that case.
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > 
> > Does this look OK?  With -frounding-math -fno-signed-zeros we are
> > happily using the masking again, but that's OK, right?  An additional
> > + 0.0 shouldn't do anything here.
> 
> I think it converts SNan to QNan (when the partial vector has just one
> element which is SNan), so is a test for -fsignaling-nans missing?

Hm, I guess that's a corner case that could happen when there's no
runtime profitability check on more than one element and when the
element accumulated is directly loaded from memory.  OTOH the
loop vectorizer always expects an initial value for the reduction
and thus we perform either no add (when the loop isn't entered)
or at least a single add (when it is).  So I think this particular
situation cannot occur?

> In the default -fno-rounding-math -fno-signaling-nans mode I think we
> can do the reduction by substituting negative zero for masked-off
> elements; maybe it's worth diagnosing that case separately (i.e.
> as "not yet implemented", not an incorrect transform)?

Ah, that's interesting.  So the only case we can't handle is
-frounding-math -fsigned-zeros then.  I'll see to adjust the patch
accordingly, like the following incremental patch:

> (note that in avx512 it's possible to materialize negative zeroes
> by mask with a single vpternlog instruction, which is cheap)

It ends up loading the { -0.0, ... } constant from memory, the
{ 0.0, ... } mask is handled by using a zero-masked load, so
indeed cheaper.

Richard.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 741b5c20389..bc3063c3615 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 
   tree vector_identity = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-vector_identity = build_zero_cst (vectype_out);
+{
+  vector_identity = build_zero_cst (vectype_out);
+  if (!HONOR_SIGNED_ZEROS (vectype_out))
+   ;
+  else
+   {
+ gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+ vector_identity = const_unop (NEGATE_EXPR, vectype_out,
+   vector_identity);
+   }
+}
 
   tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL);
   int i;
@@ -8040,12 +8050,13 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (reduction_type == FOLD_LEFT_REDUCTION
   && reduc_fn == IFN_LAST
   && FLOAT_TYPE_P (vectype_in)
-  && HONOR_SIGNED_ZEROS (vectype_in))
+  && HONOR_SIGNED_ZEROS (vectype_in)
+  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 "can't operate on partial vectors because"
-" signed zeros need to be handled.\n");
+" signed zeros cannot be preserved.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else



Re: [PATCH] VECT: Fix ICE on MASK_LEN_{LOAD,STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, Juzhe-Zhong wrote:

> This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110989
> 
> This ICE is caused by the following situation:
> 
> mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
> ...
> vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0);
> 
> The MASK_LEN_LOAD uses a real MASK, produced by the EQ comparison, whereas
> the LEN is a dummy LEN equal to the vectorization factor.
> 
> In this situation we never enter 'vect_record_loop_len', since there is no
> LEN loop control.  'LOOP_VINFO_RGROUP_IV_TYPE' is then not a suitable type
> for the 'build_int_cst' call that produces the LEN argument of
> 'MASK_LEN_LOAD', so use sizetype instead, which matches the RVV length
> requirement.

Looks reasonable.  Note there is size_int (...) you could use to shorten
the code.

Thanks,
Richard.


-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> check here the target supports VEC_EXTRACT
> >> else set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false with a
> >> diagnostic.
> 
> I am wondering target has VEC_EXTRACT but no EXTRACT_LAST, and such 
> target is using MASK as the loop control. It seems that it will cause 
> ICE for such target ? (Not sure whether we currently have such target so 
> far).

No, we'd record a length in that case and give up on partial vectors
because we'd have both length and masks recorded in the end.
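
The outcome described here can be sketched with a toy model in C.  This is
only an illustration of the ordering under discussion, not vectorizer code:
struct toy_loop_info and both functions are hypothetical stand-ins for the
loop_vec_info state and the checks mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-loop state discussed in the thread.  */
struct toy_loop_info
{
  bool can_use_partial_vectors;
  unsigned masks_recorded;
  unsigned lens_recorded;
};

/* Live-operation check in the proposed order: prefer EXTRACT_LAST
   (record a mask), else VEC_EXTRACT (record a length), else disable
   partial vectors.  */
static void
record_live_operation (struct toy_loop_info *info,
                       bool has_extract_last, bool has_vec_extract)
{
  if (has_extract_last)
    info->masks_recorded++;
  else if (has_vec_extract)
    info->lens_recorded++;
  else
    info->can_use_partial_vectors = false;
}

/* Final decision: a loop that ended up with both masks and lengths
   recorded gives up on partial vectors instead of mixing the two.  */
static bool
partial_vectors_usable (const struct toy_loop_info *info)
{
  return info->can_use_partial_vectors
         && !(info->masks_recorded > 0 && info->lens_recorded > 0);
}
```

For a mask-controlled target that has VEC_EXTRACT but no EXTRACT_LAST, the
loads and stores record masks and the live operation records a length, so the
final check rejects partial vectors and no ICE is reached.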

Richard.


[PATCH][v2] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Richard Biener via Gcc-patches
When we vectorize fold-left reductions with partial vectors but
no target operation available we use a vector conditional to force
excess elements to zero.  But that doesn't correctly preserve
the sign of zero.  The following patch disables partial vector
support when we have to do that and also need to honor rounding
modes other than round-to-nearest.  When round-to-nearest is in
effect and we have to preserve the sign of zero instead use
negative zero for the excess elements.
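
The resulting choice of padding value can be summarised by a small decision
function.  The C sketch below is illustrative only: the enum and function
names are hypothetical, and the two flags stand for what HONOR_SIGNED_ZEROS
and HONOR_SIGN_DEPENDENT_ROUNDING report for a floating-point fold-left
reduction that has no target operation available.

```c
#include <assert.h>
#include <stdbool.h>

enum pad_choice
{
  PAD_POSITIVE_ZERO, /* sign of zero does not matter */
  PAD_NEGATIVE_ZERO, /* -0.0 is the additive identity */
  PAD_UNSUPPORTED    /* partial vectors are disabled */
};

/* Pick the value used for excess lanes, following the description
   above.  Hypothetical helper, for illustration only.  */
static enum pad_choice
choose_padding (bool honor_signed_zeros, bool honor_sign_dependent_rounding)
{
  if (!honor_signed_zeros)
    return PAD_POSITIVE_ZERO;
  if (!honor_sign_dependent_rounding)
    return PAD_NEGATIVE_ZERO;
  return PAD_UNSUPPORTED;
}
```

Only the combination of signed zeros and sign-dependent rounding falls into
the unsupported case, because the correct padding would then depend on the
rounding mode in effect at run time.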

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/110979
* tree-vect-loop.cc (vectorizable_reduction): For
FOLD_LEFT_REDUCTION without target support make sure
we don't need to honor signed zeros and sign dependent rounding.

* gcc.dg/torture/pr110979.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr110979.c | 25 +
 gcc/tree-vect-loop.cc   | 24 +++-
 2 files changed, 48 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr110979.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr110979.c b/gcc/testsuite/gcc.dg/torture/pr110979.c
new file mode 100644
index 000..c25ad7a8a31
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr110979.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "--param vect-partial-vector-usage=2" } */
+
+#define FLT double
+#define N 20
+
+__attribute__((noipa))
+FLT
+foo3 (FLT *a)
+{
+  FLT sum = -0.0;
+  for (int i = 0; i != N; i++)
+sum += a[i];
+  return sum;
+}
+
+int main()
+{
+  FLT a[N];
+  for (int i = 0; i != N; i++)
+a[i] = -0.0;
+  if (!__builtin_signbit(foo3(a)))
+__builtin_abort();
+  return 0;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..bc3063c3615 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6905,7 +6905,17 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 
   tree vector_identity = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
-vector_identity = build_zero_cst (vectype_out);
+{
+  vector_identity = build_zero_cst (vectype_out);
+  if (!HONOR_SIGNED_ZEROS (vectype_out))
+   ;
+  else
+   {
+ gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+ vector_identity = const_unop (NEGATE_EXPR, vectype_out,
+   vector_identity);
+   }
+}
 
   tree scalar_dest_var = vect_create_destination_var (scalar_dest, NULL);
   int i;
@@ -8037,6 +8047,18 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
 " no conditional operation is available.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
+  else if (reduction_type == FOLD_LEFT_REDUCTION
+  && reduc_fn == IFN_LAST
+  && FLOAT_TYPE_P (vectype_in)
+  && HONOR_SIGNED_ZEROS (vectype_in)
+  && HONOR_SIGN_DEPENDENT_ROUNDING (vectype_in))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"can't operate on partial vectors because"
+" signed zeros cannot be preserved.\n");
+ LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
+   }
   else
{
  internal_fn mask_reduc_fn
-- 
2.35.3


Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-11 Thread Christophe Lyon via Gcc-patches
Hi Martin,


On Fri, 4 Aug 2023 at 18:26, Martin Jambor  wrote:

> Hello,
>
> On Wed, Aug 02 2023, Richard Biener wrote:
> > On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
> >>
> >> Hi,
> >>
> >> when IPA-SRA detects whether a parameter passed by reference is
> >> written to, it does not special case CLOBBERs which means it often
> >> bails out unnecessarily, especially when dealing with C++ destructors.
> >> Fixed by the obvious continue in the two relevant loops.
> >>
> >> The (slightly) more complex testcases in the PR need surprisingly more
> >> effort but the simple one can be fixed now easily by this patch and I'll
> >> work on the others incrementally.
> >>
> >> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
> >> if it passes too?
> >
> > LGTM, btw - how are the clobbers handled during transform?
>
> it turns out your question is spot on.  I assumed that the mini-DCE that
> I implemented into IPA-SRA transform would delete them, but I had a closer
> look and it is not invoked on split parameters, only on removed ones.
> What was actually happening is that the parameter got remapped to a
> default definition of a replacement VAR_DECL and we were thus
> gimple-clobbering a pointer pointing to nowhere.  The clobber then got
> DSEd and so I originally did not notice looking at the optimized dump.
>
> Still that is of course not ideal and so I added a simple function
> removing clobbers when splitting.  I was considering adding that
> functionality to ipa_param_body_adjustments::mark_dead_statements but
> that would make the function harder to read without much gain.
>
> So thanks again for the remark.  The following passes bootstrap and
> testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
> passes?
>
> Martin
>
>
>
> When IPA-SRA detects whether a parameter passed by reference is
> written to, it does not special case CLOBBERs which means it often
> bails out unnecessarily, especially when dealing with C++ destructors.
> Fixed by the obvious continue in the two relevant loops and by adding
> a simple function that marks the clobbers in the transformation code
> as statements to be removed.
>
>
Not sure if you noticed: I updated bugzilla because the new test fails on
arm, and I attached  pr110378-1.C.083i.sra there, to help you debug.

Thanks,

Christophe

> gcc/ChangeLog:
>
> 2023-08-04  Martin Jambor  
>
> PR ipa/110378
> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
> members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
> (ptr_parm_has_nonarg_uses): Likewise.
> * ipa-param-manipulation.cc
> (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New.
> (ipa_param_body_adjustments::mark_dead_statements): Move initial
> checks to get_ddef_if_exists_and_is_used.
> (ipa_param_body_adjustments::mark_clobbers_dead): New.
> (ipa_param_body_adjustments::common_initialization): Call
> mark_clobbers_dead when splitting.
>
> gcc/testsuite/ChangeLog:
>
> 2023-07-31  Martin Jambor  
>
> PR ipa/110378
> * g++.dg/ipa/pr110378-1.C: New test.
> ---
>  gcc/ipa-param-manipulation.cc | 44 +---
>  gcc/ipa-param-manipulation.h  |  2 ++
>  gcc/ipa-sra.cc|  6 ++--
>  gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
>  4 files changed, 94 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C
>
> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
> index a286af7f5d9..4a185ddbdf4 100644
> --- a/gcc/ipa-param-manipulation.cc
> +++ b/gcc/ipa-param-manipulation.cc
> @@ -1072,6 +1072,20 @@ ipa_param_body_adjustments::carry_over_param (tree
> t)
>return new_parm;
>  }
>
> +/* If DECL is a gimple register that has a default definition SSA name
> and that
> +   has some uses, return the default definition, otherwise return
> NULL_TREE.  */
> +
> +tree
> +ipa_param_body_adjustments::get_ddef_if_exists_and_is_used (tree decl)
> +{
> + if (!is_gimple_reg (decl))
> +return NULL_TREE;
> +  tree ddef = ssa_default_def (m_id->src_cfun, decl);
> +  if (!ddef || has_zero_uses (ddef))
> +return NULL_TREE;
> +  return ddef;
> +}
> +
>  /* Populate m_dead_stmts given that DEAD_PARAM is going to be removed
> without
> any replacement or splitting.  REPL is the replacement VAR_SECL to
> base any
> remaining uses of a removed parameter on.  Push all removed SSA names
> that
> @@ -1084,10 +1098,8 @@ ipa_param_body_adjustments::mark_dead_statements
> (tree dead_param,
>/* Current IPA analyses which remove unused parameters never remove a
>   non-gimple register ones which have any use except as parameters in
> other
>   calls, so we can safely leve them as they are.  */
> -  if (!is_gimple_reg (dead_param))
> -return;
> -  tree 

[PATCH v3] c++: extend cold, hot attributes to classes

2023-08-11 Thread Javier Martinez via Gcc-patches
Hi Jason,

Regarding the initialization example - no, the set of classes that we
consider cold is more loosely defined.

On Thu, Aug 10, 2023 at 11:01 PM Jason Merrill  wrote:
> Yes, but that's because the implicit op== isn't declared lazily like
> some other special member functions (CLASSTYPE_LAZY_*/lazily_declare_fn)
> which can happen after the class is complete.

I see, thanks. I have fixed this now by injecting it directly from
lazily_declare_fn, which works well. Doing it from grokclassfn instead seems to
be a nuisance because the explicit method attribute might be processed
after the class-propagated attribute is injected, which is the wrong way
around for the desired precedence.

> I think it would work to check for (flags & (ATTR_FLAG_FUNCTION_NEXT |
> ATTR_FLAG_DECL_NEXT)) and return without warning in that case.  You'd
> still set *no_add_attr.

Correct, done.

I have added the patch as an attachment; if that garbles it, I will use
git-send-email next time.

---
From 684ee3b19463fe7f447fbaa96a7b44522f1ce594 Mon Sep 17 00:00:00 2001
From: Javier Martinez 
Date: Thu, 10 Aug 2023 17:08:27 +0200
Subject: [PATCH v3] c++: extend cold, hot attributes to classes

Signed-off-by: Javier Martinez 

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_hot_attribute): Remove warning on RECORD_TYPE
and UNION_TYPE when in c_dialect_cxx.
(handle_cold_attribute): Likewise.

gcc/cp/ChangeLog:

* class.cc (propagate_class_warmth_attribute): New function.
(finish_struct): Propagate hot and cold attributes to all
FUNCTION_DECL when the record is marked hot or cold.
* cp-tree.h (maybe_propagate_warmth_attributes): New function.
* decl2.cc (maybe_propagate_warmth_attributes): New function.
* method.cc (lazily_declare_fn): Propagate hot and cold
attributes to lazily declared functions when the record is
marked hot or cold.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-hotness.C: New test.

---
 gcc/c-family/c-attribs.cc   | 48 +++--
 gcc/cp/class.cc | 29 +++
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/decl2.cc | 37 +++
 gcc/cp/method.cc|  6 
 gcc/testsuite/g++.dg/ext/attr-hotness.C | 16 +
 6 files changed, 135 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-hotness.C

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..bfd2ff194b5 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -452,10 +452,10 @@ const struct attribute_spec c_common_attribute_table[] =
   { "alloc_size",	  1, 2, false, true, true, false,
 			  handle_alloc_size_attribute,
 	  attr_alloc_exclusions },
-  { "cold",   0, 0, true,  false, false, false,
+  { "cold",		  0, 0, false,  false, false, false,
 			  handle_cold_attribute,
 	  attr_cold_hot_exclusions },
-  { "hot",0, 0, true,  false, false, false,
+  { "hot",		  0, 0, false,  false, false, false,
 			  handle_hot_attribute,
 	  attr_cold_hot_exclusions },
   { "no_address_safety_analysis",
@@ -1110,6 +1110,28 @@ handle_hot_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute hot processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	  && c_dialect_cxx ())
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree cold_attr = lookup_attribute ("cold", TYPE_ATTRIBUTES (*node));
+  if (cold_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+			"conflicts with attribute %qs", name, "cold");
+	  *no_add_attrs = true;
+	}
+}
+  else if (flags & ((int) ATTR_FLAG_FUNCTION_NEXT
+		| (int) ATTR_FLAG_DECL_NEXT))
+{
+	/* Avoid applying the attribute to a function return type when
+	   used as:  void __attribute ((hot)) foo (void).  It will be
+	   passed to the function.  */
+	*no_add_attrs = true;
+}
   else
 {
   warning (OPT_Wattributes, "%qE attribute ignored", name);
@@ -1131,6 +1153,28 @@ handle_cold_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   /* Attribute cold processing is done later with lookup_attribute.  */
 }
+  else if ((TREE_CODE (*node) == RECORD_TYPE
+	|| TREE_CODE (*node) == UNION_TYPE)
+	  && c_dialect_cxx ())
+{
+  /* Check conflict here as decl_attributes will otherwise only catch
+	 it late at the function when the attribute is used on a class.  */
+  tree hot_attr = lookup_attribute ("hot", TYPE_ATTRIBUTES (*node));
+  if (hot_attr)
+	{
+	  warning (OPT_Wattributes, "ignoring attribute %qE because it "
+			"conf

Re: [PATCH] RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/10/23 20:12, Jin Ma wrote:
> My fault, I'm very sorry for not replying to the patch follow-up, I just
> forgot this :)

No worries.  We're tracking it in patchwork and it also overlaps with
some work we had internally at Ventana.  So it was trivial to pick it up
once it was clear it'd fallen through the cracks.


jeff


Re: [PATCH V3] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Fri, 11 Aug 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, Richi.
>> 
>> > 1. Target is using loop MASK as the partial vector loop control.
>> >> I don't think it checks for this?
>> 
>> I am not sure whether I understand EXTRACT_LAST correctly.
>> But if target doesn't use loop MASK for partial vector loop control, how 
>> does target use EXTRACT_LAST?
>> Since EXTRACT_LAST is always extracting the last element of the vector 
>> according to MASK operand.
>> 
>> > But we don't really know this at this point?  The only thing we know
>> > is that nothing set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false.
>> 
>> Yes. So I am trying to use 'get_len_load_store' to check whether the target
>> supports LEN loop control.
>> 
>> Well, I admit it's not a good idea.
>> 
>> 
>> > I think it should work to change the direct_internal_fn_supported_p
>> > check for IFN_EXTRACT_LAST to a "positive" one guarding
>> 
>> >   gcc_assert (ncopies == 1 && !slp_node);
>> >   vect_record_loop_mask (loop_vinfo,
>> >  &LOOP_VINFO_MASKS (loop_vinfo),
>> >  1, vectype, NULL);
>> 
>> > and in the else branch check for VEC_EXTRACT support and if present
>> > record a loop len.  Just in this case this particular order would
>> > be important.
>> 
>> Do you mean changing the code as follows?
>> 
>> - if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
>> -  OPTIMIZE_FOR_SPEED))
>> -   {
>> - if (dump_enabled_p ())
>> -   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> -"can't operate on partial vectors "
>> -"because the target doesn't support extract "
>> -"last reduction.\n");
>> - LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>> -   }
>> - else if (slp_node)
>>   if (slp_node)
>> {
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>  "can't operate on partial vectors "
>>  "because an SLP statement is live after "
>>  "the loop.\n");
>>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>> }
>>   else if (ncopies > 1)
>> {
>>   if (dump_enabled_p ())
>> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>  "can't operate on partial vectors "
>>  "because ncopies is greater than 1.\n");
>>   LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
>> }
>>   else
>> {
>>   gcc_assert (ncopies == 1 && !slp_node);
>>   if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
>>   OPTIMIZE_FOR_SPEED))
>> vect_record_loop_mask (loop_vinfo,
>>&LOOP_VINFO_MASKS (loop_vinfo),
>>1, vectype, NULL);
>>   else
>
> check here the target supports VEC_EXTRACT
>
>> vect_record_loop_len (loop_vinfo,
>>   &LOOP_VINFO_LENS (loop_vinfo),
>>   1, vectype, 1);
>
> else set LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P to false with a
> diagnostic.

I agree with all this FWIW.  That is, the check should be based on
.VEC_EXTRACT alone, but .EXTRACT_LAST should take priority (not least
because SVE provides both .VEC_EXTRACT and .EXTRACT_LAST).
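
As a rough illustration only (this is not GCC code; `TargetCaps`,
`LiveLaneControl` and `choose_live_lane_control` are hypothetical names
invented for the sketch), the priority order described above could be
modelled like this:

```cpp
#include <cassert>

// Hypothetical stand-in for the target capability queries.  In GCC the real
// checks go through the internal-function/optab machinery instead.
struct TargetCaps {
  bool has_extract_last;  // target provides .EXTRACT_LAST
  bool has_vec_extract;   // target provides .VEC_EXTRACT
};

enum class LiveLaneControl { Mask, Len, NoPartialVectors };

// .EXTRACT_LAST is tested first, so a target that offers both operations
// keeps using mask control.  .VEC_EXTRACT on its own selects length control.
// With neither, partial vectors are not usable for the live value.
inline LiveLaneControl choose_live_lane_control(const TargetCaps &caps) {
  if (caps.has_extract_last)
    return LiveLaneControl::Mask;
  if (caps.has_vec_extract)
    return LiveLaneControl::Len;
  return LiveLaneControl::NoPartialVectors;
}
```

The point of the ordering is only that a target with both capabilities ends
up on the mask path, which is the SVE case mentioned above.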

Thanks,
Richard


Re: [PATCH] c, v3: Add stdckdint.h header for C23

2023-08-11 Thread Joseph Myers
On Fri, 11 Aug 2023, Jakub Jelinek wrote:

> All that is diagnosed is when result is bool or enum (any kind).  Even for

I'd suggest tests that other nonsense cases are diagnosed, such as 
floating-point or pointer arguments or results (hopefully such cases are 
already diagnosed and just need tests).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] RISC-V: Handle no_insn in TARGET_SCHED_VARIABLE_ISSUE.

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/10/23 21:45, Palmer Dabbelt wrote:
> OK, that seems like the way to go.  I still think it's likely we'll need
> to split up these types more, but that's something we can only deal with
> when there's HW that behaves oddly.

Yea, but I think we can fault this in as problematic hardware arrives.

>> No, it's really a target issue.  And what I was suggesting is that we
>> get to the point where we can enable the currently #if 0'd assert so
>> that if we introduce insns without an associated type, we get a nice
>> early warning.  I wasn't up for tackling that this week ;-)
>
> I was thinking of some sort of "TARGET_ALLOWS_UNKNOWN_INSNS" hook, but
> poking around the uses that might not be meaningfully simpler than just
> rejecting these in the backend -- certainly simpler if we're just
> worried about RISC-V ;)

Not all ports have types at all.  Some use types for things other than
scheduling.  It'd be a huge can of worms.

> This seems pretty mechanical: just scrub through our MDs to check for
> any un-typed insns, then add the assert and fix the failures.  You're
> more than welcome to have at it, but LMK if you want me to try and find
> some time for someone to do it -- certainly seems like a good way for
> someone new to dig in a bit.

Yes, definitely mechanical.  And yes, it's a good way for someone to
start to get familiar with these bits -- I used the lack of types on
some of the bitmanip insns to help ramp up Raphael and one of the RAU
guys in this space.


Jeff


Re: [PATCH] tree-optimization/110979 - fold-left reduction and partial vectors

2023-08-11 Thread Alexander Monakov


On Fri, 11 Aug 2023, Richard Biener wrote:

> > I think it converts SNan to QNan (when the partial vector has just one
> > element which is SNan), so is a test for -fsignaling-nans missing?
> 
> Hm, I guess that's a corner case that could happen when there's no
> runtime profitability check on more than one element and when the
> element accumulated is directly loaded from memory.  OTOH the
> loop vectorizer always expects an initial value for the reduction
> and thus we perform either no add (when the loop isn't entered)
> or at least a single add (when it is).  So I think this particular
> situation cannot occur?

Yes, that makes sense, thanks for the elaboration.
(it's a bit subtle so maybe worth a comment? not sure)

> > In the default -fno-rounding-math -fno-signaling-nans mode I think we
> > can do the reduction by substituting negative zero for masked-off
> > elements; maybe it's worth diagnosing that case separately (i.e.
> > as "not yet implemented", not an incorrect transform)?
> 
> Ah, that's interesting.  So the only case we can't handle is
> -frounding-math -fsigned-zeros then.  I'll see to adjust the patch
> accordingly, like the following incremental patch:

Yeah, nice!
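
For readers following along, here is a small standalone sketch (not the
vectorizer code; `masked_fold_left_sum` is a name made up for this example)
of why negative zero is the natural filler.  It assumes the default
round-to-nearest mode and no -ffast-math: adding -0.0 leaves every
accumulator value unchanged, whereas adding +0.0 turns a -0.0 accumulator
into +0.0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// In-order (fold-left) sum in which masked-off lanes contribute `filler`
// instead of their own value.
inline double masked_fold_left_sum(double init,
                                   const std::vector<double> &lanes,
                                   const std::vector<bool> &mask,
                                   double filler) {
  double acc = init;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    acc += mask[i] ? lanes[i] : filler;
  return acc;
}
```

With an initial value of -0.0 and every lane masked off, a -0.0 filler keeps
the sign of the result while a +0.0 filler loses it, which is the
signed-zeros problem discussed above.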

> > (note that in avx512 it's possible to materialize negative zeroes
> > by mask with a single vpternlog instruction, which is cheap)
> 
> It ends up loading the { -0.0, ... } constant from memory, the
> { 0.0, ... } mask is handled by using a zero-masked load, so
> indeed cheaper.

I was thinking it could be easily done without a memory load,
but got confused, sorry.

Alexander


Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-11 Thread Martin Jambor
Hello,

On Fri, Aug 11 2023, Christophe Lyon wrote:
> Hi Martin,
>
>
> On Fri, 4 Aug 2023 at 18:26, Martin Jambor  wrote:
>
>> Hello,
>>
>> On Wed, Aug 02 2023, Richard Biener wrote:
>> > On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor  wrote:
>> >>
>> >> Hi,
>> >>
>> >> when IPA-SRA detects whether a parameter passed by reference is
>> >> written to, it does not special case CLOBBERs which means it often
>> >> bails out unnecessarily, especially when dealing with C++ destructors.
>> >> Fixed by the obvious continue in the two relevant loops.
>> >>
>> >> The (slightly) more complex testcases in the PR need surprisingly more
>> >> effort but the simple one can be fixed now easily by this patch and I'll
>> >> work on the others incrementally.
>> >>
>> >> Bootstrapped and currently undergoing testsuite run on x86_64-linux.  OK
>> >> if it passes too?
>> >
>> > LGTM, btw - how are the clobbers handled during transform?
>>
>> it turns out your question is spot on.  I assumed that the mini-DCE that
>> I implemented into IPA-SRA transform would delete them, but I had a closer
>> look and it is not invoked on split parameters, only on removed ones.
>> What was actually happening is that the parameter got remapped to a
>> default definition of a replacement VAR_DECL and we were thus
>> gimple-clobbering a pointer pointing to nowhere.  The clobber then got
>> DSEd and so I originally did not notice looking at the optimized dump.
>>
>> Still that is of course not ideal and so I added a simple function
>> removing clobbers when splitting.  I was considering adding that
>> functionality to ipa_param_body_adjustments::mark_dead_statements but
>> that would make the function harder to read without much gain.
>>
>> So thanks again for the remark.  The following passes bootstrap and
>> testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
>> passes?
>>
>> Martin
>>
>>
>>
>> When IPA-SRA detects whether a parameter passed by reference is
>> written to, it does not special case CLOBBERs which means it often
>> bails out unnecessarily, especially when dealing with C++ destructors.
>> Fixed by the obvious continue in the two relevant loops and by adding
>> a simple function that marks the clobbers in the transformation code
>> as statements to be removed.
>>
>>
> Not sure if you noticed: I updated bugzilla because the new test fails on
> arm, and I attached  pr110378-1.C.083i.sra there, to help you debug.
>

I am aware and have actually started looking at the issue a while ago.
Sorry, I'm only slowly making my way through my TODO list.

The difference on 32-bit ARM is that destructors return the this pointer,
which means that IPA-SRA cannot just split the loaded bit, at least not
without a follow-up IPA analysis showing that the return value is unused,
and it does not currently take that into account.  But now that we remove
useless returns before splitting it should be doable.

Meanwhile, is there a dejagnu target macro for architectures with
destructors returning a value so that we could xfail the test there?

Thanks for bringing my attention to this.

Martin



> Thanks,
>
> Christophe
>
> gcc/ChangeLog:
>>
>> 2023-08-04  Martin Jambor  
>>
>> PR ipa/110378
>> * ipa-param-manipulation.h (class ipa_param_body_adjustments): New
>> members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
>> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
>> (ptr_parm_has_nonarg_uses): Likewise.
>> * ipa-param-manipulation.cc
>> (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used): New.
>> (ipa_param_body_adjustments::mark_dead_statements): Move initial
>> checks to get_ddef_if_exists_and_is_used.
>> (ipa_param_body_adjustments::mark_clobbers_dead): New.
>> (ipa_param_body_adjustments::common_initialization): Call
>> mark_clobbers_dead when splitting.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2023-07-31  Martin Jambor  
>>
>> PR ipa/110378
>> * g++.dg/ipa/pr110378-1.C: New test.
>> ---
>>  gcc/ipa-param-manipulation.cc | 44 +---
>>  gcc/ipa-param-manipulation.h  |  2 ++
>>  gcc/ipa-sra.cc|  6 ++--
>>  gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
>>  4 files changed, 94 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C
>>
>> diff --git a/gcc/ipa-param-manipulation.cc b/gcc/ipa-param-manipulation.cc
>> index a286af7f5d9..4a185ddbdf4 100644
>> --- a/gcc/ipa-param-manipulation.cc
>> +++ b/gcc/ipa-param-manipulation.cc
>> @@ -1072,6 +1072,20 @@ ipa_param_body_adjustments::carry_over_param (tree
>> t)
>>return new_parm;
>>  }
>>
>> +/* If DECL is a gimple register that has a default definition SSA name
>> and that
>> +   has some uses, return the default definition, otherwise return
>> NULL_TREE.  */
>> +
>> +tree
>> +ipa_param_body_adjustments::get_ddef_if_exists_and_is_used (tree decl)
>> +{
>> + if (!is_gimpl

Re: [PATCH] c++: bogus warning w/ deduction guide in anon ns [PR106604]

2023-08-11 Thread Patrick Palka via Gcc-patches
On Thu, 10 Aug 2023, Jason Merrill wrote:

> On 8/10/23 16:40, Patrick Palka wrote:
> > On Thu, 10 Aug 2023, Jason Merrill wrote:
> > 
> > > On 8/10/23 12:09, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > > > for
> > > > trunk and perhaps 13?
> > > > 
> > > > -- >8 --
> > > > 
> > > > We shouldn't issue a "declared static but never defined" warning
> > > > for a deduction guide (declared in an anonymous namespace).
> > > > 
> > > > PR c++/106604
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * decl.cc (wrapup_namespace_globals): Don't issue a
> > > > -Wunused-function warning for a deduction guide.
> > > 
> > > Maybe instead of special casing this here we could set DECL_INITIAL on
> > > deduction guides so they look defined?
> > 
> > That seems to work, but it requires some tweaks in duplicate_decls to keep
> > saying "declared" instead of "defined" when diagnosing a deduction guide
> > redeclaration.  I'm not sure which approach is preferable?
> 
> I'm not sure it matters which we say; the restriction that you can't repeat a
> deduction guide makes it more like a definition anyway (even if [basic.def]
> disagrees).  Is the diagnostic worse apart from that word?

Ah, makes sense.  So we can also remove the special case for them in the
redeclaration checking code after we give them a dummy DECL_INITIAL.
Like so?

Here's a before/after for the diagnostic with the below patch:

Before

src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:11:1: error: deduction guide 
‘S()-> S’ redeclared
   11 | S() -> S; // { dg-error "redefinition" }
  | ^
src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:10:1: note: ‘S()-> S’ 
previously declared here
   10 | S() -> S; // { dg-message "previously defined here|old 
declaration" }
  | ^

After

src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:11:1: error: redefinition of 
‘S()-> S’
   11 | S() -> S; // { dg-error "redefinition" }
  | ^
src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:10:1: note: ‘S()-> S’ 
previously defined here
   10 | S() -> S; // { dg-message "previously defined here|old 
declaration" }
  | ^

-- >8 --

Subject: [PATCH] c++: bogus warning w/ deduction guide in anon ns [PR106604]

Here we're unintentionally issuing a "declared static but never defined"
warning for a deduction guide declared in an anonymous namespace.
This patch fixes this by giving deduction guides a dummy DECL_INITIAL,
which suppresses the warning and also allows us to simplify redeclaration
checking for them.

Co-authored-by: Jason Merrill 

PR c++/106604

gcc/cp/ChangeLog:

* decl.cc (redeclaration_error_message): Remove special handling
for deduction guides.
(grokfndecl): Give deduction guides a dummy DECL_INITIAL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction74.C: Expect "defined" instead
of "declared" in diagnostics for a repeated deduction guide.
* g++.dg/cpp1z/class-deduction116.C: New test.
---
 gcc/cp/decl.cc  | 14 ++
 gcc/testsuite/g++.dg/cpp1z/class-deduction116.C |  8 
 gcc/testsuite/g++.dg/cpp1z/class-deduction74.C  | 14 +++---
 3 files changed, 21 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction116.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 792ab330dd0..3ada5516c58 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -3297,10 +3297,6 @@ redeclaration_error_message (tree newdecl, tree olddecl)
}
}
 
-  if (deduction_guide_p (olddecl)
- && deduction_guide_p (newdecl))
-   return G_("deduction guide %q+D redeclared");
-
   /* [class.compare.default]: A definition of a comparison operator as
 defaulted that appears in a class shall be the first declaration of
 that function.  */
@@ -3355,10 +3351,6 @@ redeclaration_error_message (tree newdecl, tree olddecl)
}
}
 
-  if (deduction_guide_p (olddecl)
- && deduction_guide_p (newdecl))
-   return G_("deduction guide %q+D redeclared");
-
   /* Core issue #226 (C++11):
 
If a friend function template declaration specifies a
@@ -10352,6 +10344,12 @@ grokfndecl (tree ctype,
   DECL_CXX_DESTRUCTOR_P (decl) = 1;
   DECL_NAME (decl) = dtor_identifier;
   break;
+case sfk_deduction_guide:
+  /* Give deduction guides a definition even though they don't really
+have one: the restriction that you can't repeat a deduction guide
+makes them more like a definition anyway.  */
+  DECL_INITIAL (decl) = void_node;
+  break;
 default:
   break;
 }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C
new file mode 100644
index 000..00f6d5fef41
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C
@@ -0,0 +1,8 @@
+// PR c++/10660

[PATCH V2] VECT: Fix ICE on MASK_LEN_{LOAD, STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Juzhe-Zhong
This ICE is caused by the following situation:

mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
...
vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0);

The MASK_LEN_LOAD uses a real MASK, produced by the EQ comparison, whereas
the LEN is a dummy LEN equal to the vectorization factor.

In this situation we never enter 'vect_record_loop_len', since there is no
LEN loop control.  'LOOP_VINFO_RGROUP_IV_TYPE' is then not a suitable type
for the 'build_int_cst' call that produces the LEN argument of
'MASK_LEN_LOAD', so use sizetype instead, which matches the RVV length
requirement.
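
To make the "real MASK, dummy LEN" case concrete, here is a toy model of a
combined mask+length load.  It is only a sketch under the assumption that
lane i is loaded when i < len + bias and mask[i] is set; the function name
`model_mask_len_load` and the use of an empty optional for "undefined" lanes
are inventions of this example, not GCC interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model: lane i is read from memory only when i < len + bias and
// mask[i] is set.  Every other lane is left empty, standing in for an
// undefined value.
inline std::vector<std::optional<double>>
model_mask_len_load(const std::vector<double> &mem,
                    const std::vector<bool> &mask,
                    std::size_t len, long bias) {
  std::vector<std::optional<double>> out(mask.size());
  const long active = static_cast<long>(len) + bias;
  for (std::size_t i = 0; i < mask.size() && i < mem.size(); ++i)
    if (static_cast<long>(i) < active && mask[i])
      out[i] = mem[i];
  return out;
}
```

When len equals the number of lanes (the dummy LEN case), the length test is
always true and the mask alone decides which lanes are loaded.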

PR middle-end/110989

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_store): Replace iv_type with 
sizetype.
(vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr110989.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr110989.c  | 11 +++
 gcc/tree-vect-stmts.cc   | 12 +++-
 2 files changed, 14 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
new file mode 100644
index 000..cf3b247e604
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast" } */
+
+int a, b, c;
+double *d;
+void e() {
+  double f;
+  for (; c; c++, d--)
+f = *d ?: *(&a + c);
+  b = f;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 398fbe945e5..89607a98f99 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9125,11 +9125,8 @@ vectorizable_store (vec_info *vinfo,
  if (!final_len)
{
  /* Pass VF value to 'len' argument of
-MASK_LEN_STORE if LOOP_LENS is invalid.  */
- tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
- final_len
-   = build_int_cst (iv_type,
-TYPE_VECTOR_SUBPARTS (vectype));
+MASK_LEN_STORE if LOOP_LENS is invalid.  */
+ final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype));
}
  if (!final_mask)
{
@@ -10713,11 +10710,8 @@ vectorizable_load (vec_info *vinfo,
  {
/* Pass VF value to 'len' argument of
   MASK_LEN_LOAD if LOOP_LENS is invalid.  */
-   tree iv_type
- = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
final_len
- = build_int_cst (iv_type,
-  TYPE_VECTOR_SUBPARTS (vectype));
+ = size_int (TYPE_VECTOR_SUBPARTS (vectype));
  }
if (!final_mask)
  {
-- 
2.36.3



Re: [PATCH V2] VECT: Fix ICE on MASK_LEN_{LOAD,STORE} when no LEN recorded[PR110989]

2023-08-11 Thread Richard Biener via Gcc-patches
On Fri, 11 Aug 2023, Juzhe-Zhong wrote:

> This ICE is caused by the following situation:
> 
> mask__49.21_99 = vect__17.19_96 == { 0.0, ... };
> ...
> vect__6.24_107 = .MASK_LEN_LOAD (vectp.22_105, 32B, mask__49.21_99, POLY_INT_CST [2, 2], 0);
> 
> The MASK_LEN_LOAD uses a real MASK, produced by the EQ comparison, whereas
> the LEN is a dummy LEN equal to the vectorization factor.
> 
> In this situation we never enter 'vect_record_loop_len', since there is no
> LEN loop control.  'LOOP_VINFO_RGROUP_IV_TYPE' is then not a suitable type
> for the 'build_int_cst' call that produces the LEN argument of
> 'MASK_LEN_LOAD', so use sizetype instead, which matches the RVV length
> requirement.

OK.

> PR middle-end/110989
> 
> gcc/ChangeLog:
> 
> * tree-vect-stmts.cc (vectorizable_store): Replace iv_type with 
> sizetype.
> (vectorizable_load): Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/pr110989.c: New test.
> 
> ---
>  .../gcc.target/riscv/rvv/autovec/pr110989.c  | 11 +++
>  gcc/tree-vect-stmts.cc   | 12 +++-
>  2 files changed, 14 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
> new file mode 100644
> index 000..cf3b247e604
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr110989.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=scalable -Ofast" } */
> +
> +int a, b, c;
> +double *d;
> +void e() {
> +  double f;
> +  for (; c; c++, d--)
> +f = *d ?: *(&a + c);
> +  b = f;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 398fbe945e5..89607a98f99 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -9125,11 +9125,8 @@ vectorizable_store (vec_info *vinfo,
> if (!final_len)
>   {
> /* Pass VF value to 'len' argument of
> -  MASK_LEN_STORE if LOOP_LENS is invalid.  */
> -   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> -   final_len
> - = build_int_cst (iv_type,
> -  TYPE_VECTOR_SUBPARTS (vectype));
> +  MASK_LEN_STORE if LOOP_LENS is invalid.  */
> +   final_len = size_int (TYPE_VECTOR_SUBPARTS (vectype));
>   }
> if (!final_mask)
>   {
> @@ -10713,11 +10710,8 @@ vectorizable_load (vec_info *vinfo,
> {
>   /* Pass VF value to 'len' argument of
>  MASK_LEN_LOAD if LOOP_LENS is invalid.  */
> - tree iv_type
> -   = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>   final_len
> -   = build_int_cst (iv_type,
> -TYPE_VECTOR_SUBPARTS (vectype));
> +   = size_int (TYPE_VECTOR_SUBPARTS (vectype));
> }
>   if (!final_mask)
> {
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 03:51, Richard Biener via Gcc-patches wrote:
> On Fri, Aug 11, 2023 at 11:17 AM Andrew Pinski via Gcc-patches wrote:
>>
>> So it turns out there was a simpler way of starting to
>> improve VRP to start to fix PR 110131, PR 108360, and PR 108397.
>> That was to rewrite test_for_singularity to use range_op_handler
>> and Value_Range.
>>
>> This patch implements that and
>>
>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> I'm hoping Andrew/Aldy can have a look here.

It's actually pretty simple stuff.  Instead of open-coding range
identification for op1 and simplification of that range using op0's
known range (VR), we generate a real range for op1 and intersect
that result with the known range for op0.  If the result is a singleton,
return it.  Simpler and more effective in the end.
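
As a toy illustration of that idea (plain closed integer intervals rather
than GCC's range classes; `Interval`, `Cmp`, `range_for` and `singleton_for`
are hypothetical names used only here), the "build a range for the
comparison, intersect with the known range, return the singleton" flow might
look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

struct Interval {
  std::int64_t lo;  // inclusive lower bound
  std::int64_t hi;  // inclusive upper bound
};

enum class Cmp { LT, LE, GT, GE };

// Range of x for which `x <cmp> bound` holds, or nullopt if no x satisfies it.
inline std::optional<Interval> range_for(Cmp cmp, std::int64_t bound) {
  constexpr std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
  constexpr std::int64_t kMax = std::numeric_limits<std::int64_t>::max();
  switch (cmp) {
    case Cmp::LT:
      if (bound == kMin) return std::nullopt;
      return Interval{kMin, bound - 1};
    case Cmp::LE:
      return Interval{kMin, bound};
    case Cmp::GT:
      if (bound == kMax) return std::nullopt;
      return Interval{bound + 1, kMax};
    case Cmp::GE:
      return Interval{bound, kMax};
  }
  return std::nullopt;
}

// Intersect the comparison's range with the known range of x and return the
// single remaining value, if exactly one value is left.
inline std::optional<std::int64_t> singleton_for(Cmp cmp, std::int64_t bound,
                                                 Interval known) {
  const std::optional<Interval> r = range_for(cmp, bound);
  if (!r) return std::nullopt;
  const std::int64_t lo = std::max(r->lo, known.lo);
  const std::int64_t hi = std::min(r->hi, known.hi);
  if (lo == hi) return lo;
  return std::nullopt;
}
```

For example, with x known to be in [0, 10], the test x >= 10 can only be
true for x == 10, so the comparison can be treated as an equality test
against that value.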


I guess the interactions with the warning subsystem are a non-issue in 
the updated code since it doesn't return an expression, but the 
singleton value (when in the hell did it start returning an expression, 
that just seems wrong given the result is supposed to be a singleton!)


LGTM.

Jeff


Re: [PATCH] c++: bogus warning w/ deduction guide in anon ns [PR106604]

2023-08-11 Thread Jason Merrill via Gcc-patches

On 8/11/23 09:54, Patrick Palka wrote:
> On Thu, 10 Aug 2023, Jason Merrill wrote:
>
> > On 8/10/23 16:40, Patrick Palka wrote:
> > > On Thu, 10 Aug 2023, Jason Merrill wrote:
> > >
> > > > On 8/10/23 12:09, Patrick Palka wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > > > > for
> > > > > trunk and perhaps 13?
> > > > >
> > > > > -- >8 --
> > > > >
> > > > > We shouldn't issue a "declared static but never defined" warning
> > > > > for a deduction guide (declared in an anonymous namespace).
> > > > >
> > > > > PR c++/106604
> > > > >
> > > > > gcc/cp/ChangeLog:
> > > > >
> > > > > * decl.cc (wrapup_namespace_globals): Don't issue a
> > > > > -Wunused-function warning for a deduction guide.
> > > >
> > > > Maybe instead of special casing this here we could set DECL_INITIAL on
> > > > deduction guides so they look defined?
> > >
> > > That seems to work, but it requires some tweaks in duplicate_decls to keep
> > > saying "declared" instead of "defined" when diagnosing a deduction guide
> > > redeclaration.  I'm not sure which approach is preferable?
> >
> > I'm not sure it matters which we say; the restriction that you can't repeat a
> > deduction guide makes it more like a definition anyway (even if [basic.def]
> > disagrees).  Is the diagnostic worse apart from that word?
>
> Ah, makes sense.  So we can also remove the special case for them in the
> redeclaration checking code after we give them a dummy DECL_INITIAL.
> Like so?

OK, thanks.


Here's a before/after for the diagnostic with the below patch:

Before

src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:11:1: error: deduction guide 
‘S()-> S’ redeclared
11 | S() -> S; // { dg-error "redefinition" }
   | ^
src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:10:1: note: ‘S()-> S’ 
previously declared here
10 | S() -> S; // { dg-message "previously defined here|old 
declaration" }
   | ^

After

src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:11:1: error: redefinition of 
‘S()-> S’
11 | S() -> S; // { dg-error "redefinition" }
   | ^
src/gcc/testsuite/g++.dg/cpp1z/class-deduction74.C:10:1: note: ‘S()-> S’ 
previously defined here
10 | S() -> S; // { dg-message "previously defined here|old 
declaration" }
   | ^

-- >8 --

Subject: [PATCH] c++: bogus warning w/ deduction guide in anon ns [PR106604]

Here we're unintentionally issuing a "declared static but never defined"
warning for a deduction guide declared in an anonymous namespace.
This patch fixes this by giving deduction guides a dummy DECL_INITIAL,
which suppresses the warning and also allows us to simplify redeclaration
checking for them.

Co-authored-by: Jason Merrill 

PR c++/106604

gcc/cp/ChangeLog:

* decl.cc (redeclaration_error_message): Remove special handling
for deduction guides.
(grokfndecl): Give deduction guides a dummy DECL_INITIAL.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction74.C: Expect "defined" instead
of "declared" in diagnostics for a repeated deduction guide.
* g++.dg/cpp1z/class-deduction116.C: New test.
---
  gcc/cp/decl.cc  | 14 ++
  gcc/testsuite/g++.dg/cpp1z/class-deduction116.C |  8 
  gcc/testsuite/g++.dg/cpp1z/class-deduction74.C  | 14 +++---
  3 files changed, 21 insertions(+), 15 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction116.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 792ab330dd0..3ada5516c58 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -3297,10 +3297,6 @@ redeclaration_error_message (tree newdecl, tree olddecl)
}
}
  
-  if (deduction_guide_p (olddecl)
- && deduction_guide_p (newdecl))
-   return G_("deduction guide %q+D redeclared");
-
/* [class.compare.default]: A definition of a comparison operator as
 defaulted that appears in a class shall be the first declaration of
 that function.  */
@@ -3355,10 +3351,6 @@ redeclaration_error_message (tree newdecl, tree olddecl)
}
}
  
-  if (deduction_guide_p (olddecl)
- && deduction_guide_p (newdecl))
-   return G_("deduction guide %q+D redeclared");
-
/* Core issue #226 (C++11):
  
 If a friend function template declaration specifies a

@@ -10352,6 +10344,12 @@ grokfndecl (tree ctype,
DECL_CXX_DESTRUCTOR_P (decl) = 1;
DECL_NAME (decl) = dtor_identifier;
break;
+case sfk_deduction_guide:
+  /* Give deduction guides a definition even though they don't really
+have one: the restriction that you can't repeat a deduction guide
+makes them more like a definition anyway.  */
+  DECL_INITIAL (decl) = void_node;
+  break;
  default:
break;
  }
diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C 
b/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C
new file mode 100644
index 000..00f6d5fef41
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction116.C
@@ -0,0 +1,8 @@
+// PR c++/106604
+// { dg-do compile { target c++17 } }
+// { dg-additional-options "-Wunused-functio

[committed] libstdc++: Revert accidentally committed change to bits/stl_iterator.h

2023-08-11 Thread Jonathan Wakely via Gcc-patches
As promised yesterday, this reverts the part of the change I didn't mean
to commit. Tested x86_64-linux. Pushed to trunk.

-- >8 --

In commit r14-3134-g9cb2a7c8d54b1f I only meant to change some uses of
__clamp_iter_cat to use __iter_category_t, I didn't mean to commit the
additional change introducing __clamped_iter_cat_t. This reverts that
part.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (__clamped_iter_cat_t): Remove.
---
 libstdc++-v3/include/bits/stl_iterator.h | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index d5ba05f3e22..b13f4f8ddbf 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -103,10 +103,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __clamp_iter_cat
= __conditional_t, _Limit, _Otherwise>;
 
-template
-  using __clamped_iter_cat_t
-   = __clamp_iter_cat<__iter_category_t<_Iter>, _Limit>;
-
 template
   concept __different_from
= !same_as, remove_cvref_t<_Up>>;
@@ -172,7 +168,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  random_access_iterator_tag,
  bidirectional_iterator_tag>;
   using iterator_category
-   = __detail::__clamped_iter_cat_t<_Iterator, random_access_iterator_tag>;
+   = __detail::__clamp_iter_cat;
   using value_type = iter_value_t<_Iterator>;
   using difference_type = iter_difference_t<_Iterator>;
   using reference = iter_reference_t<_Iterator>;
@@ -1433,7 +1430,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   struct __move_iter_cat<_Iterator>
   {
using iterator_category
- = __clamped_iter_cat_t<_Iterator, random_access_iterator_tag>;
+ = __clamp_iter_cat<__iter_category_t<_Iterator>,
+random_access_iterator_tag>;
   };
 #endif
   }
-- 
2.41.0



[committed] libstdc++: Handle invalid values in std::chrono pretty printers

2023-08-11 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, Pushed to trunk. I'll backport this to gcc-13 too.

-- >8 --

This avoids an IndexError exception when printing invalid chrono::month
or chrono::weekday values.

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (StdChronoCalendarPrinter):
Check for out-of-range month and weekday indices.
* testsuite/libstdc++-prettyprinters/chrono.cc: Check invalid
month and weekday values.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py  | 7 ++-
 libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc | 7 +++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index b4c427d487c..0187c4b60e6 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2021,11 +2021,16 @@ class StdChronoCalendarPrinter:
 if typ == 'std::chrono::day':
 return '{}'.format(int(val['_M_d']))
 if typ == 'std::chrono::month':
+if m < 1 or m >= len(months):
+return "%d is not a valid month" % m
 return months[m]
 if typ == 'std::chrono::year':
 return '{}y'.format(y)
 if typ == 'std::chrono::weekday':
-return '{}'.format(weekdays[val['_M_wd']])
+wd = val['_M_wd']
+if wd < 0 or wd >= len(weekdays):
+return "%d is not a valid weekday" % wd
+return '{}'.format(weekdays[wd])
 if typ == 'std::chrono::weekday_indexed':
 return '{}[{}]'.format(val['_M_wd'], int(val['_M_index']))
 if typ == 'std::chrono::weekday_last':
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc
index b5314e025cc..9aa284aea2f 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/chrono.cc
@@ -75,6 +75,13 @@ main()
   [[maybe_unused]] year_month_weekday_last donnerstag = 
2017y/July/Thursday[last];
   // { dg-final { note-test donnerstag {2017y/July/Thursday[last]} } }
 
+  [[maybe_unused]] month nam(13);
+  // { dg-final { note-test nam {13 is not a valid month} } }
+  [[maybe_unused]] month nam0(0);
+  // { dg-final { note-test nam0 {0 is not a valid month} } }
+  [[maybe_unused]] weekday nawd(8);
+  // { dg-final { note-test nawd {8 is not a valid weekday} } }
+  //
   hh_mm_ss hms(4h + 3min + 2s);
   // { dg-final { note-test hms {04:03:02} } }
 
-- 
2.41.0
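
The bounds checks added to StdChronoCalendarPrinter in the patch above can be tried outside GDB. The following is only a minimal sketch: the MONTHS and WEEKDAYS tables and the two helper names are hypothetical stand-ins, and the assumption that index 0 of the month table is a placeholder is inferred from the `m < 1` check rather than taken from printers.py itself.

```python
# Hypothetical stand-ins for the lookup tables used by the printer.
# Assumption: month values are 1-based, so index 0 is a placeholder.
MONTHS = [None, 'January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']
# Assumption: weekday values are 0-based, starting with Sunday.
WEEKDAYS = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
            'Thursday', 'Friday', 'Saturday']


def format_month(m):
    """Return the month name, or a message instead of raising IndexError."""
    if m < 1 or m >= len(MONTHS):
        return "%d is not a valid month" % m
    return MONTHS[m]


def format_weekday(wd):
    """Return the weekday name, or a message instead of raising IndexError."""
    if wd < 0 or wd >= len(WEEKDAYS):
        return "%d is not a valid weekday" % wd
    return WEEKDAYS[wd]
```

The strings returned for out-of-range values have the same shape as the ones the new note-test lines in chrono.cc expect.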



Re: [PATCH] RISC-V: Revive test case PR 102957

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 03:11, Tsukasa OI via Gcc-patches wrote:

From: Tsukasa OI 

Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
extensions") changed how we handle unknown extensions and
commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic messages
in testsuite") "fixed" test failures caused by that change (on pr102957.c,
by testing the error message after the first change).

However, the latter change will break the original intent of PR 102957 test
case because we wanted to make sure that we can parse a valid two-letter
extension name.

Fortunately, there is a valid two-letter extension name, 'Zk' (standard
scalar cryptography extension superset with NIST algorithm suite).

This commit uses this extension name to revive the intent of the test case
for PR 102957.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr102957.c: Remove "dg-error" because we don't
need to test for error message.  Use the 'Zk' extension to continue
testing whether we can use valid two-letter extensions.

This doesn't look right to me.  The whole point of this specific dg line
is to verify that we get an error with an invalid extension specification.


What might make more sense would be to split this into two tests.  One 
which continues to test that we get an error for something like zb and 
the other with everything else.


jeff


Re: [v2 PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-11 Thread Shung-Hsi Yu via Gcc-patches
Hi,

Thanks for working on the BPF backend!

I noticed a tiny typo while test compiling libbpf-tools[1]. (I have yet to
look into the cause of the failure in detail, though.)

On Thu, Aug 03, 2023 at 10:54:31AM +0100, Cupertino Miranda wrote:
> [snip]
> +
> +pack_type_fail:
> +  bpf_error_at (EXPR_LOC_OR_LOC (args[0], UNKNOWN_LOCATION),
> + "invelid first argument format for enum value builtin");
 ^^^

> +  ret.fail = true;
> +  return ret;
> +}
> +
> [snip]

Thanks,
Shung-Hsi

1: https://github.com/iovisor/bcc/tree/master/libbpf-tools


Re: [PATCH] ipa-sra: Don't consider CLOBBERS as writes preventing splitting

2023-08-11 Thread Christophe Lyon via Gcc-patches
On Fri, 11 Aug 2023 at 15:50, Martin Jambor  wrote:

> Hello,
>
> On Fri, Aug 11 2023, Christophe Lyon wrote:
> > Hi Martin,
> >
> >
> > On Fri, 4 Aug 2023 at 18:26, Martin Jambor  wrote:
> >
> >> Hello,
> >>
> >> On Wed, Aug 02 2023, Richard Biener wrote:
> >> > On Mon, Jul 31, 2023 at 7:05 PM Martin Jambor 
> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> when IPA-SRA detects whether a parameter passed by reference is
> >> >> written to, it does not special case CLOBBERs which means it often
> >> >> bails out unnecessarily, especially when dealing with C++
> destructors.
> >> >> Fixed by the obvious continue in the two relevant loops.
> >> >>
> >> >> The (slightly) more complex testcases in the PR need surprisingly
> more
> >> >> effort but the simple one can be fixed now easily by this patch and
> I'll
> >> >> work on the others incrementally.
> >> >>
> >> >> Bootstrapped and currently undergoing testsuite run on
> x86_64-linux.  OK
> >> >> if it passes too?
> >> >
> >> > LGTM, btw - how are the clobbers handled during transform?
> >>
> >> it turns out your question is spot on.  I assumed that the mini-DCE that
> >> I implemented into IPA-SRA transform would delete them, but I had a closer
> >> look and it is not invoked on split parameters, only on removed ones.
> >> What was actually happening is that the parameter got remapped to a
> >> default definition of a replacement VAR_DECL and we were thus
> >> gimple-clobbering a pointer pointing to nowhere.  The clobber then got
> >> DSEd and so I originally did not notice looking at the optimized dump.
> >>
> >> Still that is of course not ideal and so I added a simple function
> >> removing clobbers when splitting.  I was considering adding that
> >> functionality to ipa_param_body_adjustments::mark_dead_statements but
> >> that would make the function harder to read without much gain.
> >>
> >> So thanks again for the remark.  The following passes bootstrap and
> >> testing on x86_64-linux.  I am running LTO bootstrap now.  OK if it
> >> passes?
> >>
> >> Martin
> >>
> >>
> >>
> >> When IPA-SRA detects whether a parameter passed by reference is
> >> written to, it does not special case CLOBBERs which means it often
> >> bails out unnecessarily, especially when dealing with C++ destructors.
> >> Fixed by the obvious continue in the two relevant loops and by adding
> >> a simple function that marks the clobbers in the transformation code
> >> as statements to be removed.
> >>
> >>
> > Not sure if you noticed: I updated bugzilla because the new test fails on
> > arm, and I attached  pr110378-1.C.083i.sra there, to help you debug.
> >
>
> I am aware and have actually started looking at the issue a while ago.
> Sorry, I'm only slowly making my way through my TODO list.
>
No worries, thanks for confirming you are aware of the problem ;-)


>
> The difference on 32bit ARM is that the destructor returns the this pointer,
> which means that IPA-SRA cannot just split the loaded bit without a
> follow-up IPA analysis showing that the return value is unused, which it does
> not currently take into account.  But now that we remove useless returns
> before splitting it should be doable.
>
> Meanwhile, is there a dejagnu target macro for architectures with
> destructors returning a value so that we could xfail the test there?
>
I'm not aware of any at a quick glance.


>
> Thanks for bringing my attention to this.
>
> Martin
>
>
Thanks,

Christophe


>
>
> > Thanks,
> >
> > Christophe
> >
> > gcc/ChangeLog:
> >>
> >> 2023-08-04  Martin Jambor  
> >>
> >> PR ipa/110378
> >> * ipa-param-manipulation.h (class ipa_param_body_adjustments):
> New
> >> members get_ddef_if_exists_and_is_used and mark_clobbers_dead.
> >> * ipa-sra.cc (isra_track_scalar_value_uses): Ignore clobbers.
> >> (ptr_parm_has_nonarg_uses): Likewise.
> >> * ipa-param-manipulation.cc
> >> (ipa_param_body_adjustments::get_ddef_if_exists_and_is_used):
> New.
> >> (ipa_param_body_adjustments::mark_dead_statements): Move initial
> >> checks to get_ddef_if_exists_and_is_used.
> >> (ipa_param_body_adjustments::mark_clobbers_dead): New.
> >> (ipa_param_body_adjustments::common_initialization): Call
> >> mark_clobbers_dead when splitting.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2023-07-31  Martin Jambor  
> >>
> >> PR ipa/110378
> >> * g++.dg/ipa/pr110378-1.C: New test.
> >> ---
> >>  gcc/ipa-param-manipulation.cc | 44 +---
> >>  gcc/ipa-param-manipulation.h  |  2 ++
> >>  gcc/ipa-sra.cc|  6 ++--
> >>  gcc/testsuite/g++.dg/ipa/pr110378-1.C | 48 +++
> >>  4 files changed, 94 insertions(+), 6 deletions(-)
> >>  create mode 100644 gcc/testsuite/g++.dg/ipa/pr110378-1.C
> >>
> >> diff --git a/gcc/ipa-param-manipulation.cc
> b/gcc/ipa-param-manipulation.cc
> >> index a286af7f5d9..4a185ddbdf4 100644
> >> --- a/gcc/ipa-param-manipulation.cc
> >> +++ b

[PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds support for live vectorization via VEC_EXTRACT for LEN loop control.

Consider the following case:

#include 

#define EXTRACT_LAST(TYPE)  \
  TYPE __attribute__ ((noinline, noclone))  \
  test_##TYPE (TYPE *x, int n, TYPE value)  \
  { \
TYPE last;  \
for (int j = 0; j < n; ++j) \
  { \
last = x[j];\
x[j] = last * value;\
  } \
return last;\
  }

#define TEST_ALL(T) \
  T (uint8_t)   \

TEST_ALL (EXTRACT_LAST)

ARM SVE IR:

Preheader:
  max_mask_34 = .WHILE_ULT (0, bnd.5_6, { 0, ... });

Loop:
  ...
  # loop_mask_22 = PHI 
  ...
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_mask_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_mask_22, vect__4.9_27);
  ...
  next_mask_35 = .WHILE_ULT (_1, bnd.5_6, { 0, ... });
  ...

Epilogue:
  _25 = .EXTRACT_LAST (loop_mask_22, vect_last_12.8_23);

Since RVV prefers len for loop control, after this patch the RVV IR is:

Loop:
  ...
  loop_len_22 = SELECT_VL;
  vect_last_12.8_23 = .MASK_LOAD (_7, 8B, loop_len_22);
  vect__4.9_27 = vect_last_12.8_23 * vect_cst__26;
  .MASK_STORE (_7, 8B, loop_len_22, vect__4.9_27);
  ...

Epilogue:
  _25 = .VEC_EXTRACT (loop_len_22 + bias - 1, vect_last_12.8_23);

Details of this approach:

1. Step 1 - Add 'vect_can_vectorize_extract_last_with_len_p' to enable live
   vectorization for LEN loop control.

   This function checks whether the target supports:
   - Using LEN as the loop control.
   - The VEC_EXTRACT optab.

2. Step 2 - Record LEN for loop control if
   'vect_can_vectorize_extract_last_with_len_p' is true.

3. Step 3 - Generate VEC_EXTRACT (v, LEN + BIAS - 1).

The only difference between mask and len is that len uses the length generated
by SELECT_VL together with the VEC_EXTRACT pattern.  The rest of the live
vectorization is exactly the same as for ARM SVE.
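
To illustrate why the epilogue index is LEN + BIAS - 1, here is a rough scalar model of the length-controlled loop. It is a sketch only, not what the vectorizer emits: the function name, the `vf` parameter, the use of min() as a stand-in for SELECT_VL (which may legitimately return a smaller length), and BIAS being 0 are all assumptions made for illustration.

```python
BIAS = 0  # assumption: the target has no partial load/store bias


def extract_last_with_len(x, value, vf):
    """Model of test_uint8_t: scale x in place, return the last loaded element.

    x     -- list of ints in 0..255, modified in place (requires n > 0)
    value -- multiplier applied to every element
    vf    -- hypothetical number of lanes per vector iteration
    """
    if not x:
        raise ValueError("the scalar loop leaves 'last' undefined for n == 0")
    done = 0
    while done < len(x):
        # Stand-in for SELECT_VL: number of active lanes this iteration.
        loop_len = min(vf, len(x) - done)
        # Length-controlled load: only the first loop_len lanes are valid.
        vec = x[done:done + loop_len]
        # Length-controlled store of last * value, wrapped to uint8_t.
        x[done:done + loop_len] = [(e * value) & 0xFF for e in vec]
        done += loop_len
    # Epilogue: the live value is the last active lane of the final vector.
    return vec[loop_len + BIAS - 1]
```

With a mask, EXTRACT_LAST has to find the last set lane; with a length, the same lane is simply at position loop_len - 1 (adjusted by the bias), so a plain VEC_EXTRACT is enough.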

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_live_operation): Add loop len control.

---
 gcc/tree-vect-loop.cc | 78 ++-
 1 file changed, 62 insertions(+), 16 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..a011e2dacb2 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10278,17 +10278,7 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* No transformation required.  */
   if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
{
- if (!direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
-  OPTIMIZE_FOR_SPEED))
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"can't operate on partial vectors "
-"because the target doesn't support extract "
-"last reduction.\n");
- LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
-   }
- else if (slp_node)
+ if (slp_node)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -10308,9 +10298,28 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
  else
{
  gcc_assert (ncopies == 1 && !slp_node);
- vect_record_loop_mask (loop_vinfo,
-&LOOP_VINFO_MASKS (loop_vinfo),
-1, vectype, NULL);
+ if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
+ OPTIMIZE_FOR_SPEED))
+   vect_record_loop_mask (loop_vinfo,
+  &LOOP_VINFO_MASKS (loop_vinfo),
+  1, vectype, NULL);
+ else if (convert_optab_handler (vec_extract_optab,
+ TYPE_MODE (vectype),
+ TYPE_MODE (TREE_TYPE (vectype)))
+  != CODE_FOR_nothing)
+   vect_record_loop_len (loop_vinfo,
+ &LOOP_VINFO_LENS (loop_vinfo),
+ 1, vectype, 1);
+ else
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (
+ MSG_MISSED_OPTIMIZATION, vect_location,
+ "can't operate on partial vectors "
+ "because the target doesn't support extract "
+

Re: [PATCH] RISC-V: Revive test case PR 102957

2023-08-11 Thread Tsukasa OI via Gcc-patches
On 2023/08/11 23:15, Jeff Law wrote:
> 
> 
> On 8/11/23 03:11, Tsukasa OI via Gcc-patches wrote:
>> From: Tsukasa OI 
>>
>> Commit c283c4774d1c ("RISC-V: Throw compilation error for unknown
>> extensions") changed how we handle unknown extensions and
>> commit 6f709f79c915a ("[committed] [RISC-V] Fix expected diagnostic
>> messages
>> in testsuite") "fixed" test failures caused by that change (on
>> pr102957.c,
>> by testing the error message after the first change).
>>
>> However, the latter change will break the original intent of PR 102957
>> test
>> case because we wanted to make sure that we can parse a valid two-letter
>> extension name.
>>
>> Fortunately, there is a valid two-letter extension name, 'Zk' (standard
>> scalar cryptography extension superset with NIST algorithm suite).
>>
>> This commit uses this extension name to revive the intent of the
>> test case
>> for PR 102957.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr102957.c: Remove "dg-error" because we don't
>> need to test for error message.  Use the 'Zk' extension to continue
>> testing whether we can use valid two-letter extensions.
> This doesn't look right to me.  The whole point of this specific dg line
> is to verify that we get an error with an invalid extension specification.
> 
> What might make more sense would be to split this into two tests.  One
> which continues to test that we get an error for something like zb and
> the other with everything else.
> 
> jeff
> 

Originally, it tested that a two-letter extension ('Zb') is accepted by
GCC (because the background of PR 102957 was that GCC assumed multi-letter
'Z' extensions are three letters or more).

After we started rejecting unrecognized extensions, "dg-error" was added
**just to avoid the test failure** and that doesn't look right.  Yes, we now
don't have an ICE (like in the original report) but after the PR 102957 fix,
we simply accepted it rather than rejecting it.

Instead, we have a valid (recognized) two-letter 'Z' extension: 'Zk'.  I
think replacing "zb" with "zk" is more correct considering the original
bug report (PR 102957) and its assumption.

cf. 

Regards,
Tsukasa


Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar

On 2023-08-10 14:50, Siddhesh Poyarekar wrote:

  As a result, the only case for a potential security issue in all
  these cases is when it ends up generating vulnerable output for
  valid input source code.


I think this leaves open the interpretation "every wrong code bug
is potentially a security bug".  I suppose that's true in a trite sense,
but not in a useful sense.  As others said earlier in the thread,
whether a wrong code bug in GCC leads to a security bug in the object
code is too application-dependent to be a useful classification for GCC.

I think we should explicitly say that we don't generally consider wrong
code bugs to be security bugs.  Leaving it implicit is bound to lead
to misunderstanding.


I see what you mean, but the context-dependence of a bug is something 
GCC will have to deal with, similar to how libraries have to deal with 
bugs.  But I agree this probably needs some more expansion.  Let me try 
and come up with something more detailed for that last paragraph.


How's this:

As a result, the only case for a potential security issue in the 
compiler is when it generates vulnerable application code for valid, 
trusted input source code.  The output application code could be 
considered vulnerable if it produces an actual vulnerability in the 
target application, specifically in the following cases:


- The application dereferences an invalid memory location despite the 
application sources being valid.


- The application reads from or writes to a valid but incorrect memory 
location, resulting in an information integrity issue or an information 
leak.


- The application ends up running in an infinite loop or with severe 
degradation in performance despite the input sources having no such 
issue, resulting in a Denial of Service.  Note that correct but 
non-performant code is not a security issue candidate, this only applies 
to incorrect code that may result in performance degradation.


- The application crashes due to the generated incorrect code, resulting 
in a Denial of Service.




Re: [PATCH] RISC-V: Fix vec_series expander[PR110985]

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 02:45, Juzhe-Zhong wrote:

This patch fixes bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110985

PR target/110985

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Refactor the expander.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/pr110985.c: New test.

OK.  The wording on the ChangeLog could perhaps be improved -- typically
when one says "refactor" there's not supposed to be a functional change.
So perhaps "Refactor the expander and don't lose final assignment" or
something like that.


Also it's generally useful to reviewers to explain the core problem.  I 
can guess from the BZ that we lost an assignment and I can speculate it 
was a case when the destination wasn't initially a pseudo.  The 
refactoring ensured that the sequence always stores into a pseudo and if 
that pseudo is not the same as the ultimate target, then we copy from 
the pseudo to the ultimate target when expansion is done.


jeff


Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 03:01, Lehua Ding wrote:

Hi,

This patch reverts the conversion from vmv.s.x to vmv.v.i and adds a new pattern
to optimize the special case where the scalar operand is zero.

Currently, the broadcast pattern where the scalar operand is an imm
will be converted to vmv.v.i from vmv.s.x and the mask operand will be
converted from 00..01 to 11..11. There are advantages and
disadvantages both before and after the conversion; after discussing
with Juzhe offline, we chose not to do this transform.

Before:

   Advantages: The vsetvli info required by vmv.s.x has better compatibility,
   since vmv.s.x only requires SEW and VLEN be zero or one. That means there
   are more opportunities to combine with other vsetvl infos in the vsetvl pass.

   Disadvantages: For a non-zero scalar imm, one more `li rd, imm` instruction
   will be needed.

After:

   Advantages: No `li rd, imm` instruction is needed, since vmv.v.i supports an
   imm operand.

   Disadvantages: The inverse of the advantages above. Worse compatibility
   means more vsetvl instructions are needed.

I can't speak for other uarches, but as a guiding principle for Ventana
we're assuming vsetvl instructions are common and as a result need to be 
very cheap in hardware.   It's likely a good tradeoff for us.


I could see other uarches making different design choices though.  So at 
a high level, do we want this to be driven by cost modeling in some way?


Not a review yet.  Wanted to get that feedback to you now since the rest 
of my day is going to be fairly busy.


jeff


Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-08-11 Thread Andrew MacLeod via Gcc-patches



On 8/11/23 05:51, Richard Biener wrote:

On Fri, Aug 11, 2023 at 11:17 AM Andrew Pinski via Gcc-patches
 wrote:

So it turns out there was a simpler way to start
improving VRP and begin fixing PR 110131, PR 108360, and PR 108397:
rewrite test_for_singularity to use range_op_handler
and Value_Range.
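
As a rough illustration of what a "singularity" means here (this is not GCC's implementation, and it does not use range_op_handler), the idea can be modelled by intersecting the known range of a variable with the range implied by a comparison and checking whether exactly one value is left. The interval representation, the function names and the 32-bit int bounds below are assumptions made for this sketch, and only the edge on which the comparison is true is modelled.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumption: 32-bit int


def cond_range(op, c):
    """Closed interval of values v for which `v op c` is true."""
    if op == '<':
        return (INT_MIN, c - 1)
    if op == '<=':
        return (INT_MIN, c)
    if op == '>':
        return (c + 1, INT_MAX)
    if op == '>=':
        return (c, INT_MAX)
    raise ValueError("unsupported comparison: %r" % op)


def singularity(known, op, c):
    """Return the single value satisfying `a op c`, or None.

    known -- disjoint closed intervals [(lo, hi), ...] describing `a`
    """
    lo, hi = cond_range(op, c)
    # Intersect every known sub-range with the comparison's range.
    pieces = [(max(a, lo), min(b, hi)) for a, b in known]
    # Drop sub-ranges that became empty.
    pieces = [(a, b) for a, b in pieces if a <= b]
    if len(pieces) == 1 and pieces[0][0] == pieces[0][1]:
        return pieces[0][0]
    return None
```

For the function g in the vrp124.c test of this patch, `a` is known to be either -100 or non-negative, so `a < 0` can only hold for -100 and the comparison can be rewritten as `a == -100`.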

This patch implements that and

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

I'm hoping Andrew/Aldy can have a look here.

Richard.


gcc/ChangeLog:

 * vr-values.cc (test_for_singularity): Add edge argument
 and rewrite using range_op_handler.
 (simplify_compare_using_range_pairs): Use Value_Range
 instead of value_range and update test_for_singularity call.

gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/vrp124.c: New test.
 * gcc.dg/tree-ssa/vrp125.c: New test.
---
  gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
  gcc/testsuite/gcc.dg/tree-ssa/vrp125.c | 44 +
  gcc/vr-values.cc   | 91 --
  3 files changed, 114 insertions(+), 65 deletions(-)
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp125.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
new file mode 100644
index 000..6ccbda35d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a >= 0)
+;
+  else
+return 0;
+  return a < 0;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a > 100)
+;
+  else
+return 0;
+  return a < 100;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
new file mode 100644
index 000..f6c2f8e35f1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Should be optimized to a == -100 */
+int g(int a)
+{
+  if (a == -100 || a == -50 || a >= 0)
+;
+  else
+return 0;
+  return a < -50;
+}
+
+/* Should optimize to a == 0 */
+int f(int a)
+{
+  if (a == 0 || a == 50 || a > 100)
+;
+  else
+return 0;
+  return a < 50;
+}
+
+/* Should be optimized to a == 0. */
+int f2(int a)
+{
+  if (a == 0 || a == 50 || a > 100)
+;
+  else
+return 0;
+  return a < 25;
+}
+
+/* Should optimize to a == 100 */
+int f1(int a)
+{
+  if (a < 0 || a == 50 || a == 100)
+;
+  else
+return 0;
+  return a > 50;
+}
+
+/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
index a4fddd62841..7004b0224bd 100644
--- a/gcc/vr-values.cc
+++ b/gcc/vr-values.cc
@@ -907,66 +907,30 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
 a known value range VR.

 If there is one and only one value which will satisfy the
-   conditional, then return that value.  Else return NULL.
-
-   If signed overflow must be undefined for the value to satisfy
-   the conditional, then set *STRICT_OVERFLOW_P to true.  */
+   conditional on the EDGE, then return that value.
+   Else return NULL.  */

  static tree
  test_for_singularity (enum tree_code cond_code, tree op0,
- tree op1, const value_range *vr)
+ tree op1, Value_Range vr, bool edge)


VR should be a "vrange &".  This is the top-level base class for all
ranges of all types/kinds, and what we usually pass values around as if
we want them to be any kind.  If this is integer only, we'd pass an
'irange &'.


Value_Range is the opposite.  It's the sink that contains one of each kind
of range and can switch around between them as needed.  You do not want
to pass that by value!  The generic engine uses these so it can support
floats, ints, pointers, whatever...



  {
-  tree min = NULL;
-  tree max = NULL;
-
-  /* Extract minimum/maximum values which satisfy the conditional as it was
- written.  */
-  if (cond_code == LE_EXPR || cond_code == LT_EXPR)
+  /* This is already a singularity.  */
+  if (cond_code == NE_EXPR || cond_code == EQ_EXPR)
+return NULL;
+  auto range_op = range_op_handler (cond_code);
+  int_range<2> op1_range (TREE_TYPE (op0));
+  wide_int w = wi::to_wide (op1);
+  op1_range.set (TREE_TYPE (op1), w, w);


If this is only going to work with integers, you might want to check
that somewhere or switch to irange and int_range_max.


You can make it work with any kind (if you know op1 is a constant) by 
simpl

Re: [RFC] GCC Security policy

2023-08-11 Thread Paul Koning via Gcc-patches



> On Aug 11, 2023, at 10:36 AM, Siddhesh Poyarekar  wrote:
> 
> On 2023-08-10 14:50, Siddhesh Poyarekar wrote:
   As a result, the only case for a potential security issue in all
   these cases is when it ends up generating vulnerable output for
   valid input source code.
>>> 
>>> I think this leaves open the interpretation "every wrong code bug
>>> is potentially a security bug".  I suppose that's true in a trite sense,
>>> but not in a useful sense.  As others said earlier in the thread,
>>> whether a wrong code bug in GCC leads to a security bug in the object
>>> code is too application-dependent to be a useful classification for GCC.
>>> 
>>> I think we should explicitly say that we don't generally consider wrong
>>> code bugs to be security bugs.  Leaving it implicit is bound to lead
>>> to misunderstanding.
>> I see what you mean, but the context-dependence of a bug is something GCC 
>> will have to deal with, similar to how libraries have to deal with bugs.  
>> But I agree this probably needs some more expansion.  Let me try and come up 
>> with something more detailed for that last paragraph.
> 
> How's this:
> 
> As a result, the only case for a potential security issue in the compiler is 
> when it generates vulnerable application code for valid, trusted input source 
> code.  The output application code could be considered vulnerable if it 
> produces an actual vulnerability in the target application, specifically in 
> the following cases:

You might make it explicit that we're talking about wrong code errors here -- 
in other words, the source code is correct (conforms to the standard) and the 
algorithm expressed in the source code does not have a vulnerability, but the 
generated code has semantics that differ from those of the source code such 
that it does have a vulnerability.

> - The application dereferences an invalid memory location despite the 
> application sources being valid.
> 
> - The application reads from or writes to a valid but incorrect memory 
> location, resulting in an information integrity issue or an information leak.
> 
> - The application ends up running in an infinite loop or with severe 
> degradation in performance despite the input sources having no such issue, 
> resulting in a Denial of Service.  Note that correct but non-performant code 
> is not a security issue candidate, this only applies to incorrect code that 
> may result in performance degradation.

The last sentence somewhat contradicts the preceding one.  Perhaps "...may 
result in performance degradation severe enough to amount to a denial of 
service".

> - The application crashes due to the generated incorrect code, resulting in a 
> Denial of Service.

paul



Re: [PATCH] Use strtol instead of std::stoi in gensupport.cc

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/10/23 13:27, John David Anglin wrote:

Ping.

On 2023-07-19 2:59 p.m., John David Anglin wrote:

Tested on trunk with hppa64-hp-hpux11.11.

Okay?

Dave
---

Use strtol instead of std::stoi [PR110646]

Implementation of std::stoi was overlooked on hppa-hpux, so use
strtol instead.

2023-07-19  John David Anglin  

gcc/ChangeLog:

PR bootstrap/110646
* gensupport.cc (class conlist): Use strtol instead of std::stoi.

OK.  Sorry this got missed.

jeff
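
For readers unfamiliar with the replacement, here is a minimal sketch of strtol-based integer parsing with error checks. It is not the gensupport.cc change itself; `parse_int` is a hypothetical helper, and it is deliberately stricter than std::stoi in that it rejects trailing characters.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not the gensupport.cc code): parse a decimal
   integer from S with strtol.  Returns 0 and stores the value in *OUT
   on success; returns -1 if S has no digits, has trailing characters,
   or does not fit in an int.  */
int
parse_int (const char *s, int *out)
{
  assert (s != NULL && out != NULL);

  char *end;
  errno = 0;
  long v = strtol (s, &end, 10);

  if (end == s || *end != '\0')
    return -1;			/* No digits, or trailing junk.  */
  if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
    return -1;			/* Out of range for int.  */

  *out = (int) v;
  return 0;
}
```

Unlike std::stoi, strtol reports failure through its end pointer and errno rather than by throwing, so the caller has to check both.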


Re: [RFC] GCC Security policy

2023-08-11 Thread David Edelsohn via Gcc-patches
On Wed, Aug 9, 2023 at 1:33 PM Siddhesh Poyarekar 
wrote:

> On 2023-08-08 10:30, Siddhesh Poyarekar wrote:
> >> Do you have a suggestion for the language to address libgcc,
> >> libstdc++, etc. and libiberty, libbacktrace, etc.?
> >
> > I'll work on this a bit and share a draft.
>
> Hi David,
>
> Here's what I came up with for different parts of GCC, including the
> runtime libraries.  Over time we may find that specific parts of runtime
> libraries simply cannot be used safely in some contexts and flag that.
>
> Sid
>
> """
> What is a GCC security bug?
> ===========================
>
>  A security bug is one that threatens the security of a system or
>  network, or might compromise the security of data stored on it.
>  In the context of GCC there are multiple ways in which this might
>  happen and they're detailed below.
>
> Compiler drivers, programs, libgccjit and support libraries
> -----------------------------------------------------------
>
>  The compiler driver processes source code, invokes other programs
>  such as the assembler and linker and generates the output result,
>  which may be assembly code or machine code.  It is necessary that
>  all source code inputs to the compiler are trusted, since it is
>  impossible for the driver to validate input source code beyond
>  conformance to a programming language standard.
>
>  The GCC JIT implementation, libgccjit, is intended to be plugged
>  into applications to translate input source code in the application
>  context.  Limitations that apply to the compiler
>  driver, apply here too in terms of sanitizing inputs, so it is
>  recommended that inputs are either sanitized by an external program
>  to allow only trusted, safe execution in the context of the
>  application or the JIT execution context is appropriately sandboxed
>  to contain the effects of any bugs in the JIT or its generated code
>  to the sandboxed environment.
>
>  Support libraries such as libiberty, libcc1, libvtv and libcpp have
>  been developed separately to share code with other tools such as
>  binutils and gdb.  These libraries again have similar challenges to
>  compiler drivers.  While they are expected to be robust against
>  arbitrary input, they should only be used with trusted inputs.
>
>  Libraries such as zlib and libffi that are bundled into GCC to build it
>  will be treated the same as the compiler drivers and programs as far
>  as security coverage is concerned.
>
>  As a result, the only case for a potential security issue in all
>  these cases is when it ends up generating vulnerable output for
>  valid input source code.
>
> Language runtime libraries
> --------------------------
>
>  GCC also builds and distributes libraries that are intended to be
>  used widely to implement runtime support for various programming
>  languages.  These include the following:
>
>  * libada
>  * libatomic
>  * libbacktrace
>  * libcc1
>  * libcody
>  * libcpp
>  * libdecnumber
>  * libgcc
>  * libgfortran
>  * libgm2
>  * libgo
>  * libgomp
>  * libiberty
>  * libitm
>  * libobjc
>  * libphobos
>  * libquadmath
>  * libssp
>  * libstdc++
>
>  These libraries are intended to be used in arbitrary contexts and as
>  a result, bugs in these libraries may be evaluated for security
>  impact.  However, some of these libraries, e.g. libgo, libphobos,
>  etc.  are not maintained in the GCC project, due to which the GCC
>  project may not be the correct point of contact for them.  You are
>  encouraged to look at README files within those library directories
>  to locate the canonical security contact point for those projects.
>

Hi, Sid

The text above states "bugs in these libraries may be evaluated for
security impact", but there is no comment about the criteria for a security
impact, unlike the GLIBC SECURITY.md document.  The text seems to imply the
"What is a security bug?" definitions from GLIBC, but the definitions are
not explicitly stated in the GCC Security policy.

Should this "Language runtime libraries" section include some of the GLIBC
"What is a security bug?" text or should the GCC "What is a security bug?"
section earlier in this document include the text with a qualification that
issues like buffer overflow, memory leaks, information disclosure, etc.
specifically apply to "Language runtime libraries" and not all components
of GCC?

Thanks, David


>
> Diagnostic libraries
> --------------------
>
>  The sanitizer library bundled in GCC is intended to be used in
>  diagnostic cases and not intended for use in sensitive environments.
>  As a result, bugs in the sanitizer will not be considered security
>  sensitive.
>
> GCC plugins
> -----------
>
>  It should be noted that GCC may execute arbitrary code loaded by a
>

Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar

On 2023-08-11 11:09, Paul Koning wrote:




On Aug 11, 2023, at 10:36 AM, Siddhesh Poyarekar  wrote:

On 2023-08-10 14:50, Siddhesh Poyarekar wrote:

   As a result, the only case for a potential security issue in all
   these cases is when it ends up generating vulnerable output for
   valid input source code.


I think this leaves open the interpretation "every wrong code bug
is potentially a security bug".  I suppose that's true in a trite sense,
but not in a useful sense.  As others said earlier in the thread,
whether a wrong code bug in GCC leads to a security bug in the object
code is too application-dependent to be a useful classification for GCC.

I think we should explicitly say that we don't generally consider wrong
code bugs to be security bugs.  Leaving it implicit is bound to lead
to misunderstanding.

I see what you mean, but the context-dependence of a bug is something GCC will 
have to deal with, similar to how libraries have to deal with bugs.  But I 
agree this probably needs some more expansion.  Let me try and come up with 
something more detailed for that last paragraph.


How's this:

As a result, the only case for a potential security issue in the compiler is 
when it generates vulnerable application code for valid, trusted input source 
code.  The output application code could be considered vulnerable if it 
produces an actual vulnerability in the target application, specifically in the 
following cases:


You might make it explicit that we're talking about wrong code errors here -- 
in other words, the source code is correct (conforms to the standard) and the 
algorithm expressed in the source code does not have a vulnerability, but the 
generated code has semantics that differ from those of the source code such 
that it does have a vulnerability.


Ack, thanks for the suggestion.




- The application dereferences an invalid memory location despite the 
application sources being valid.

- The application reads from or writes to a valid but incorrect memory 
location, resulting in an information integrity issue or an information leak.

- The application ends up running in an infinite loop or with severe 
degradation in performance despite the input sources having no such issue, 
resulting in a Denial of Service.  Note that correct but non-performant code is 
not a security issue candidate, this only applies to incorrect code that may 
result in performance degradation.


The last sentence somewhat contradicts the preceding one.  Perhaps "...may result in 
performance degradation severe enough to amount to a denial of service".


Ack, will fix that up, thanks.

Sid


Re: [PATCH] preserve base pointer for __deregister_frame [PR110956]

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/10/23 05:33, Thomas Neumann via Gcc-patches wrote:

Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956
Rainer Orth successfully tested the patch on Solaris with a full bootstrap.



Some uncommon unwinding table encodings need to access the base pointer
for address computations. We do not have that information in calls to
__deregister_frame_info_bases, and previously simply used nullptr as
base pointer. That is usually fine, but for some Solaris i386 shared
libraries that results in wrong address computations.

To fix this problem we now associate the unwinding object with
the table pointer itself, which is always known, in addition to
the PC range. When deregistering a frame, we first locate the object
using the table pointer, and then use the base pointer stored within
the object to compute the PC range.

libgcc/ChangeLog:
 PR libgcc/110956
 * unwind-dw2-fde.c: Associate object with address of unwinding
 table.

Pushed to the trunk.  Thanks.

Jeff
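
The idea in the patch description, keying the unwinding object on the table pointer so that deregistration can recover the stored base pointers, can be sketched with a toy registry. This is only an illustration under stated assumptions: every name below is hypothetical, and the fixed-size array with linear search stands in for whatever lookup structure libgcc actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model; all names are hypothetical, not libgcc's.  */
struct unwind_object
{
  const void *table;	/* Address of the unwinding table: the lookup key.  */
  void *tbase;		/* Text base pointer recorded at registration.  */
  void *dbase;		/* Data base pointer recorded at registration.  */
  int in_use;
};

#define MAX_OBJECTS 16
static struct unwind_object registry[MAX_OBJECTS];

/* Record TABLE together with its base pointers.  Returns 0 on success,
   -1 if the registry is full.  */
int
register_table (const void *table, void *tbase, void *dbase)
{
  assert (table != NULL);
  for (size_t i = 0; i < MAX_OBJECTS; i++)
    if (!registry[i].in_use)
      {
	registry[i] = (struct unwind_object) { table, tbase, dbase, 1 };
	return 0;
      }
  return -1;
}

/* Deregistration only receives the table pointer.  Look the object up
   by that key and hand back the base pointers stored at registration
   time, instead of assuming they are null.  Returns 0 on success, -1 if
   TABLE was never registered.  */
int
deregister_table (const void *table, void **tbase, void **dbase)
{
  assert (tbase != NULL && dbase != NULL);
  for (size_t i = 0; i < MAX_OBJECTS; i++)
    if (registry[i].in_use && registry[i].table == table)
      {
	*tbase = registry[i].tbase;
	*dbase = registry[i].dbase;
	registry[i].in_use = 0;
	return 0;
      }
  return -1;
}
```

With the base pointers recovered this way, any address computation done during deregistration can use the same bases as registration did.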



Re: [RFC] GCC Security policy

2023-08-11 Thread Siddhesh Poyarekar

On 2023-08-11 11:12, David Edelsohn wrote:
The text above states "bugs in these libraries may be evaluated for 
security impact", but there is no comment about the criteria for a 
security impact, unlike the GLIBC SECURITY.md document.  The text seems 
to imply the "What is a security bug?" definitions from GLIBC, but the 
definitions are not explicitly stated in the GCC Security policy.


Should this "Language runtime libraries" section include some of the 
GLIBC "What is a security bug?" text or should the GCC "What is a 
security bug?" section earlier in this document include the text with a 
qualification that issues like buffer overflow, memory leaks, 
information disclosure, etc. specifically apply to "Language runtime 
libraries" and not all components of GCC?


Yes, that makes sense.  This part will likely evolve though, much like 
the glibc one did, based on reports we get over time.  I'll work it in 
and post an updated draft.


Thanks,
Sid


[pushed][LRA]: Implement output stack pointer reloads

2023-08-11 Thread Vladimir Makarov via Gcc-patches
Sorry, I had some problems with email.  As a result, some emails were
duplicated and were sent to g...@gcc.gnu.org instead of
gcc-patches@gcc.gnu.org.



On 8/9/23 16:54, Vladimir Makarov wrote:




On 8/9/23 07:15, senthilkumar.selva...@microchip.com wrote:

Hi,

   After turning on FP -> SP elimination after Vlad fixed
   an elimination issue in 
https://gcc.gnu.org/git?p=gcc.git;a=commit;h=2971ff7b1d564ac04b537d907c70e6093af70832,

   I'm now running into reload failure if arithmetic is done on SP.

I think we can permit stack pointer output reloads.  The only thing 
we need is to update the sp offset accurately for the original and reload 
insns.  I'll try to make the patch this week.



The following patch fixes the problem.  The patch was successfully 
bootstrapped and tested on x86_64, aarch64, and ppc64le.


The test case is actually one from GCC test suite.

commit c0121083d07ffd4a8424f4be50de769d9ad0386d
Author: Vladimir N. Makarov 
Date:   Fri Aug 11 07:57:37 2023 -0400

[LRA]: Implement output stack pointer reloads

LRA prohibited output stack pointer reloads but it resulted in LRA
failure for AVR target which has no arithmetic insns working with the
stack pointer register.  Given patch implements the output stack
pointer reloads.

gcc/ChangeLog:

* lra-constraints.cc (goal_alt_out_sp_reload_p): New flag.
(process_alt_operands): Set the flag.
(curr_insn_transform): Modify stack pointer offsets if output
stack pointer reload is generated.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 09ff6de1657..26239908747 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1466,6 +1466,8 @@ static int goal_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
 static bool goal_alt_swapped;
 /* The chosen insn alternative.	 */
 static int goal_alt_number;
+/* True if output reload of the stack pointer should be generated.  */
+static bool goal_alt_out_sp_reload_p;
 
 /* True if the corresponding operand is the result of an equivalence
substitution.  */
@@ -2128,6 +2130,9 @@ process_alt_operands (int only_alternative)
   int curr_alt_dont_inherit_ops_num;
   /* Numbers of operands whose reload pseudos should not be inherited.	*/
   int curr_alt_dont_inherit_ops[MAX_RECOG_OPERANDS];
+  /* True if output stack pointer reload should be generated for the current
+ alternative.  */
+  bool curr_alt_out_sp_reload_p;
   rtx op;
   /* The register when the operand is a subreg of register, otherwise the
  operand itself.  */
@@ -2211,7 +2216,8 @@ process_alt_operands (int only_alternative)
 	}
   reject += static_reject;
   early_clobbered_regs_num = 0;
-
+  curr_alt_out_sp_reload_p = false;
+  
   for (nop = 0; nop < n_operands; nop++)
 	{
 	  const char *p;
@@ -2682,12 +2688,10 @@ process_alt_operands (int only_alternative)
 	  bool no_regs_p;
 
 	  reject += op_reject;
-	  /* Never do output reload of stack pointer.  It makes
-		 impossible to do elimination when SP is changed in
-		 RTL.  */
-	  if (op == stack_pointer_rtx && ! frame_pointer_needed
+	  /* Mark output reload of the stack pointer.  */
+	  if (op == stack_pointer_rtx
 		  && curr_static_id->operand[nop].type != OP_IN)
-		goto fail;
+		curr_alt_out_sp_reload_p = true;
 
 	  /* If this alternative asks for a specific reg class, see if there
 		 is at least one allocatable register in that class.  */
@@ -3317,6 +3321,7 @@ process_alt_operands (int only_alternative)
 	  for (nop = 0; nop < curr_alt_dont_inherit_ops_num; nop++)
 	goal_alt_dont_inherit_ops[nop] = curr_alt_dont_inherit_ops[nop];
 	  goal_alt_swapped = curr_swapped;
+	  goal_alt_out_sp_reload_p = curr_alt_out_sp_reload_p;
 	  best_overall = overall;
 	  best_losers = losers;
 	  best_reload_nregs = reload_nregs;
@@ -4836,6 +4841,27 @@ curr_insn_transform (bool check_only_p)
 	lra_asm_insn_error (curr_insn);
 }
   lra_process_new_insns (curr_insn, before, after, "Inserting insn reload");
+  if (goal_alt_out_sp_reload_p)
+{
+  /* We have an output stack pointer reload -- update sp offset: */
+  rtx set;
+  bool done_p = false;
+  poly_int64 sp_offset = curr_id->sp_offset;
+  for (rtx_insn *insn = after; insn != NULL_RTX; insn = NEXT_INSN (insn))
+	if ((set = single_set (insn)) != NULL_RTX
+	&& SET_DEST (set) == stack_pointer_rtx)
+	  {
+	lra_assert (!done_p);
+	curr_id->sp_offset = 0;
+	lra_insn_recog_data_t id = lra_get_insn_recog_data (insn);
+	id->sp_offset = sp_offset;
+	if (lra_dump_file != NULL)
+	  fprintf (lra_dump_file,
+		   "Moving sp offset from insn %u to %u\n",
+		   INSN_UID (curr_insn), INSN_UID (insn));
+	  }
+  lra_assert (!done_p);
+}
   return change_p;
 }
 


Re: [PATCH] analyzer: New option fanalyzer-show-events-in-system-headers [PR110543]

2023-08-11 Thread David Malcolm via Gcc-patches
On Fri, 2023-08-11 at 13:51 +0200, priour...@gmail.com wrote:
> From: benjamin priour 

Hi Benjamin, thanks for the patch.

Overall, the patch is close to being ready, but see the various
comments inline below...

> 
> This patch introduces -fanalyzer-show-events-in-system-headers,
> disabled by default.
> 
> This option reduces the noise of the analyzer's emitted diagnostics
> when dealing with system headers.
> The new option only affects the display of the diagnostics,
> but doesn't hinder the actual analysis.
> 
> Given a diagnostics path diving into a system header in the form
> [
>   prefix events...,
>   system header call,
>     system header entry,
>     events within system headers...,
>   system header return,
>   suffix events...
> ]
> then disabling the option (either by default or explicitly)
> will shorten the path into:
> [
>   prefix events...,
>   system header call,
>   system header return,
>   suffix events...
> ]
> 
> Signed-off-by: benjamin priour 
> 

[...]
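
As a rough illustration of the path shortening described above, the sketch below compacts an event array in place and drops every event flagged as lying inside a system header. It is a simplification under stated assumptions: `struct path_event` and `prune_system_header_events` are hypothetical names, the call and return events are assumed to be attributed to user code, and the real patch works on the analyzer's checker_path and stack frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event record; the analyzer's checker_event is richer.  */
struct path_event
{
  int id;
  bool in_system_header;
};

/* Compact EVENTS in place, dropping those located inside system
   headers.  Returns the number of events kept; their relative order is
   preserved.  */
size_t
prune_system_header_events (struct path_event *events, size_t n)
{
  assert (events != NULL || n == 0);

  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (!events[i].in_system_header)
      events[kept++] = events[i];
  return kept;
}
```

For a path of prefix, call, entry, inner event, return, suffix, where only the entry and the inner event are flagged, this keeps prefix, call, return and suffix.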

> 
> diff --git a/gcc/analyzer/analyzer.cc b/gcc/analyzer/analyzer.cc
> index 5091fb7a583..b27d8e359db 100644
> --- a/gcc/analyzer/analyzer.cc
> +++ b/gcc/analyzer/analyzer.cc
> @@ -274,7 +274,7 @@ is_named_call_p (const_tree fndecl, const char *funcname)
>     Compare with cp/typeck.cc: decl_in_std_namespace_p, but this doesn't
>     rely on being the C++ FE (or handle inline namespaces inside of std).  */
>  
> -static inline bool
> +bool
>  is_std_function_p (const_tree fndecl)
>  {
>    tree name_decl = DECL_NAME (fndecl);
> diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
> index 579517c23e6..31597079153 100644
> --- a/gcc/analyzer/analyzer.h
> +++ b/gcc/analyzer/analyzer.h
> @@ -386,6 +386,7 @@ extern bool is_special_named_call_p (const gcall *call, const char *funcname,
>  extern bool is_named_call_p (const_tree fndecl, const char *funcname);
>  extern bool is_named_call_p (const_tree fndecl, const char *funcname,
>  const gcall *call, unsigned int num_args);
> +extern bool is_std_function_p (const_tree fndecl);

The analyzer.{cc|h} parts of the patch make is_std_function_p "extern",
but I didn't see any use of it in the rest of the patch.  Did I miss
something, or are the changes to is_std_function_p a vestige from an
earlier version of the patch?

[...]

> diff --git a/gcc/analyzer/analyzer.opt b/gcc/analyzer/analyzer.opt
> index 2760aaa8151..d97cd569f52 100644
> --- a/gcc/analyzer/analyzer.opt
> +++ b/gcc/analyzer/analyzer.opt
> @@ -290,6 +290,10 @@ fanalyzer-transitivity
>  Common Var(flag_analyzer_transitivity) Init(0)
>  Enable transitivity of constraints during analysis.
>  
> +fanalyzer-show-events-in-system-headers
> +Common Var(flag_analyzer_show_events_in_system_headers) Init(0)
> +Trim diagnostics path that are too long before emission.
> +

There's a mismatch here between the sense of the name of the option as
opposed to the sense of the description, and the wording isn't quite
accurate.

You could either

(A) rename the option to:
  fanalyzer-hide-events-in-system-headers
and make it be Init(1), and change the sense of the conditional in
diagnostic_manager::prune_path?
That way the user would supply:
  -fno-analyzer-hide-events-in-system-headers

or:

(B) change the wording to something like
"Show events within system headers in analyzer execution paths."
or somesuch

All options should have a corresponding entry in invoke.texi, so please
add one for the new option (have a look at the existing ones).

>  fanalyzer-call-summaries
>  Common Var(flag_analyzer_call_summaries) Init(0)
>  Approximate the effect of function calls to simplify analysis.
> diff --git a/gcc/analyzer/diagnostic-manager.cc b/gcc/analyzer/diagnostic-manager.cc
> index cfca305d552..2a9705a464f 100644
> --- a/gcc/analyzer/diagnostic-manager.cc
> +++ b/gcc/analyzer/diagnostic-manager.cc
> @@ -20,9 +20,11 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #include "config.h"
>  #define INCLUDE_MEMORY
> +#define INCLUDE_VECTOR

I don't see any use of std::vector in the patch; is this a vestige from
an earlier version of the patch?

>  #include "system.h"
>  #include "coretypes.h"
>  #include "tree.h"
> +#include "input.h"
>  #include "pretty-print.h"
>  #include "gcc-rich-location.h"
>  #include "gimple-pretty-print.h"
> @@ -2281,6 +2283,8 @@ diagnostic_manager::prune_path (checker_path *path,
>    path->maybe_log (get_logger (), "path");
>    prune_for_sm_diagnostic (path, sm, sval, state);
>    prune_interproc_events (path);
> +  if (! flag_analyzer_show_events_in_system_headers)
> +    prune_system_headers (path);
>    consolidate_conditions (path);
>    finish_pruning (path);
>    path->maybe_log (get_logger (), "pruned");
> @@ -2667,6 +2671,67 @@ diagnostic_manager::prune_interproc_events (checker_path *path) const
>    while (changed);
>  }
>  
> +/* Remove everything within [call point, IDX]. For consistency,
> +   IDX should represent the return event of the frame to delete,
> + 

Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

2023-08-11 Thread Lehua Ding
> I can't speak for other uarches, but as a guiding principle for Ventana
> we're assuming vsetvl instructions are common and as a result need to be
> very cheap in hardware.   It's likely a good tradeoff for us.


> I could see other uarches making different design choices though.  So at
> a high level, do we want this to be driven by cost modeling in some way?

> Not a review yet.  Wanted to get that feedback to you now since the rest
> of my day is going to be fairly busy.


Thanks for the feedback. We'll think about it some more.
Just out of curiosity, will the combination of vsetvli + vmv.v.x perform
better than li + vmv.s.x on Ventana's CPU? 






Re: [PATCH] RISC-V: Revert the convert from vmv.s.x to vmv.v.i

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 09:43, Lehua Ding wrote:

 > I can't speak for other uarches, but as a guiding principle for Ventana
 > we're assuming vsetvl instructions are common and as a result need to be
 > very cheap in hardware.   It's likely a good tradeoff for us.

 > I could see other uarches making different design choices though.  So at
 > a high level, do we want this to be driven by cost modeling in some way?

 > Not a review yet.  Wanted to get that feedback to you now since the rest
 > of my day is going to be fairly busy.

Thanks for the feedback. We'll think about it some more.
Just out of curiosity, will the combination of vsetvli + vmv.v.x perform
better than li + vmv.s.x on Ventana's CPU?

It's context dependent, but vsetvli+vmv would generally be better than 
li + vmv.



jeff


Re: [PATCH v9] RISC-V: Add the 'zfa' extension, version 0.2

2023-08-11 Thread Jin Ma via Gcc-patches
> Hi Jin Ma,
> 
> On 5/16/23 00:06, jinma via Gcc-patches wrote:
> > On 5/15/23 07:16, Jin Ma wrote:
> >>
> >> Do we also need to check Z[FDH]INX too?
> >>
> >> Otherwise it looks pretty good.  We just need to wait for everything to
> >> freeze and finalization on the assembler interface.
> >>
> >> jeff
> > Yes, you are right, we also need to check Z[FDH]INX. I will send a patch
> > again to fix it after others give some review comments.
> 
> Can we please revisit this and get this merged upstream.
> Seems like gcc is supporting frozen but not ratified extensions.
> 
> Thx,
> -Vineet

OK, I will check and resend a patch about this in a few days.

Thanks,
Jin

Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/8/23 21:54, Lehua Ding wrote:

Hi Jeff,

 > The pattern's operand 0 explicitly allows MEMs as do the constraints.
 > So forcing the operand into a register just seems like it's papering
 > over the real problem.

The force_reg code was added to address a problem that appears once the
combine error is fixed.  The stricter condition on the pattern forbids
the mem->mem form, which is produced at -O0.  I think the implementation
forgot to do this force_reg operation during intrinsic expansion.  The
problem was not exposed before because the reload pass converts mem->mem
into mem->reg; reg->mem based on the constraints.

So if the core issue is mem->mem, that is a common thing to avoid.

Basically in the expander you use a force_reg and then have a test like
!(MEM_P (op0) && MEM_P (op1)) in the define_insn's condition.

But the v1 had a much more complex condition.  It looks like that got 
cleaned up in the v2.  So I'll need to look at that one more closely.





 > This comment doesn't make sense in conjunction with your earlier details.
 > In particular combine doesn't run at -O0, so your earlier comment that
 > combine creates the problem seems inconsistent with the comment above.

As said above, the code addresses a problem that shows up after the
combine problem is fixed.

But combine doesn't run at -O0.  So something is inconsistent.  I 
certainly believe we need to avoid the mem->mem case, but that's 
independent of combine and affects all optimization levels.





 > Umm, wow.  I haven't thought deeply about this, but the complexity of
 > that insn condition is a huge red flag that our operand predicates
 > aren't correct for this pattern.

This condition is large because of the vsetvl info it needs (compared to
the scalar mov or *mov_whole patterns), but I think it is clear enough to
understand.  Let me explain briefly.


     (register_operand (operands[0], mode) && MEM_P (operands[3]))
     || (MEM_P (operands[0]) && register_operand(operands[3], mode))

These two conditions allow the mem->reg and reg->mem patterns.

I think we can simplify to just

 !(MEM_P (operands[0]) && MEM_P (operands[1]))



     (register_operand (operands[0], mode) && 
satisfies_constraint_Wc1 (operands[1]))


This condition means the mask must be all-true for the reg->reg_or_imm
pattern, since the reg->reg insn doesn't support a mask operand.

I would have expected those to be handled by the constraints rather than 
the pattern's condition.


Jeff



Re: [PATCH] c, c++, v2: Accept __builtin_classify_type (typename)

2023-08-11 Thread Jason Merrill via Gcc-patches

On 8/11/23 04:48, Jakub Jelinek wrote:

On Fri, Aug 11, 2023 at 01:13:32AM +0200, Jakub Jelinek wrote:

Looking at the first uses of the builtin back in 90s in va*.h, it certainly
relied on array/function decay there (the macros would abort e.g. on
array_type_class, function_type_class and various other return values).
Looking at older versions of tgmath.h, I see just checks for 8/9 (i.e.
real/complex) and those wouldn't be affected by any promotions/decay.
But newer versions of tgmath.h before __builtin_tgmath do check also for
1 and they would be upset if char wasn't promoted to int (including latest
glibc).
systemtap macros also use __builtin_classify_type and do check for pointers
but those seem to be prepared to handle even arrays.


So to sum it up, I think at least the original use of the builtin had a
strong reason to do the array to pointer etc. decay and argument promotion,
because that is what happens with the varargs too and the builtin is still
documented in the internals manual just for that purpose.  It is true GCC
doesn't use the builtin for that reason anymore, but there are numerous
uses in the wild, some might cope well with changing the behavior, others
less so.
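
A small sketch of the behaviour being discussed, for the expression form of the builtin: the argument undergoes the usual promotions and array-to-pointer decay before it is classified. The enumerator names are made up for this example; the numeric values are the ones mentioned in this thread (1, 8, 9) plus 5 for pointers, which is assumed here to match GCC's typeclass.h.

```c
#include <assert.h>

/* Local names for the class values; the names are hypothetical, the
   numbers are assumed to match GCC's typeclass.h.  */
enum
{
  INTEGER_CLASS = 1,
  POINTER_CLASS = 5,
  REAL_CLASS = 8,
  COMPLEX_CLASS = 9
};

/* A char expression is classified as an integer.  */
int
classify_char (void)
{
  char c = 'x';
  return __builtin_classify_type (c);
}

/* An array expression decays to a pointer before classification.  */
int
classify_array (void)
{
  int a[4] = { 0 };
  return __builtin_classify_type (a);
}

/* Floating-point and complex operands, as checked by older tgmath.h.  */
int
classify_double (void)
{
  return __builtin_classify_type (1.0);
}

int
classify_complex (void)
{
  double _Complex z = 1.0;
  return __builtin_classify_type (z);
}

/* Check all four cases at once.  */
void
check_decay_and_promotion (void)
{
  assert (classify_char () == INTEGER_CLASS);
  assert (classify_array () == POINTER_CLASS);
  assert (classify_double () == REAL_CLASS);
  assert (classify_complex () == COMPLEX_CLASS);
}
```

Existing users that test for these values on expressions rely on exactly this decay and promotion, which is why changing it for the expression form would be risky.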


+   cp_evaluated ev;
+   ++cp_unevaluated_operand;
+   ++c_inhibit_evaluation_warnings;


These three lines seem unnecessary for parsing a type.


I had a quick look at this and a reason to do at least some of this
is e.g. array types, __builtin_classify_type (int [foo () + whatever])
will not really evaluate foo () + whatever, all it will care about is that
it is an array, so emitting evaluation warnings for it would be weird.
For cp_unevaluated_operand it is harder to find out what all the effects
are, but e.g. warnings for missing member initializers in such expressions
aren't needed either.


Fair enough.  But you should only need a single line

cp_unevaluated ev;

The C++ bits are OK with that change.

Jason



Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-11 Thread Lehua Ding
> But combine doesn't run at -O0.  So something is inconsistent.  I
> certainly believe we need to avoid the mem->mem case, but that's
> independent of combine and affects all optimization levels.


This is a new bug that shows up when running all tests after fixing the combine bug.
I understand that maybe I should send a separate patch to fix the problem.
Maybe this problem was exposed after I changed the pattern. I will continue to 
track it.


> I think we can simplify to just
> !(MEM_P (operands[0]) && MEM_P (operands[1]))


> I would have expected those to be handled by the constraints rather than
> the pattern's condition.
Yes, the condition in V2 becomes much simpler after the split.









Re: [PATCH] RISC-V: Fix error combine of pred_mov pattern

2023-08-11 Thread Lehua Ding
>> But combine doesn't run at -O0.  So something is inconsistent.  I
>> certainly believe we need to avoid the mem->mem case, but that's
>> independent of combine and affects all optimization levels.

> This is a new bug when running all tests after fixing the combine bug.
> I understand that maybe I should send a separate patch to fix the problem.
> Maybe this problem was exposed after I changed the pattern. I will continue to
> track it.

Just now I debugged this and found that the -O0 problem seen after fixing
the combine error is caused by the pred_mov condition becoming stricter.
Before, it was
(MEM_P (operands[0]) || MEM_P (operands[3])  || CONST_VECTOR_P (operands[1])
That is, mem->mem was allowed. This faulty condition caused two problems
at once: one is the combine error, the other is that it hid the bad
pattern at -O0. Correcting the condition with this patch fixes the
combine error and also exposes the problem at -O0.
So I think force_reg still needs to go in together with this patch.





Re: [v2 PATCH 1/2] bpf: Implementation of BPF CO-RE builtins

2023-08-11 Thread Cupertino Miranda via Gcc-patches


Hi,

Thanks for catching this.
I will fix it in the upcoming patches.

Thanks,
Cupertino

Shung-Hsi Yu writes:

> Hi,
>
> Thanks for working on the BPF backend!
>
> I noticed a tiny typo while test compiling libbpf-tools[1]. (I have yet to look
> into the cause of the failure in detail, though.)
>
> On Thu, Aug 03, 2023 at 10:54:31AM +0100, Cupertino Miranda wrote:
>> [snip]
>> +
>> +pack_type_fail:
>> +  bpf_error_at (EXPR_LOC_OR_LOC (args[0], UNKNOWN_LOCATION),
>> +"invelid first argument format for enum value builtin");
>  ^^^
>
>> +  ret.fail = true;
>> +  return ret;
>> +}
>> +
>> [snip]
>
> Thanks,
> Shung-Hsi
>
> 1: https://github.com/iovisor/bcc/tree/master/libbpf-tools


[committed] libstdc++: Do not call log10(0.0) in std::format [PR110860]

2023-08-11 Thread Jonathan Wakely via Gcc-patches
Second attempt to fix this PR. Tested x86_64-linux, pushed to trunk.

-- >8 --

Calling log10(0.0) returns -inf which has undefined behaviour when
converted to an integer. We only need to use log10 for large values
anyway. If the value is zero then the larger buffer is only needed due
to a large precision, so we don't need to use log10 to estimate the
number of digits for the significand.

libstdc++-v3/ChangeLog:

PR libstdc++/110860
* include/std/format (__formatter_fp::format): Do not call log10
with zero values.
---
 libstdc++-v3/include/std/format | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
index 2fe430f75f6..23da6b008c5 100644
--- a/libstdc++-v3/include/std/format
+++ b/libstdc++-v3/include/std/format
@@ -1490,7 +1490,7 @@ namespace __format
  // If the buffer is too small it's probably because of a large
  // precision, or a very large value in fixed format.
  size_t __guess = 8 + __prec;
- if (__fmt == chars_format::fixed) // +ddd.prec
+ if (__fmt == chars_format::fixed && __v != 0) // +ddd.prec
{
  if constexpr (is_same_v<_Fp, float>)
__guess += __builtin_log10f(__v < 0.0f ? -__v : __v);
-- 
2.41.0



[COMMITTED] MAINTAINERS: Add myself to write after approval

2023-08-11 Thread Eric Feng via Gcc-patches
ChangeLog:

* MAINTAINERS: Add myself.

Signed-off-by: Eric Feng 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1e54844c905..7a3ad68bc42 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -411,6 +411,7 @@ Chris Fairles   

 Alessandro Fanfarillo  
 Changpeng Fang 
 Sam Feifer 
+Eric Feng  
 Li Feng
 Thomas Fitzsimmons 
 Alexander Fomin

-- 
2.30.2



[PATCH] c, v4: Add stdckdint.h header for C23

2023-08-11 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 11, 2023 at 01:25:38PM +, Joseph Myers wrote:
> On Fri, 11 Aug 2023, Jakub Jelinek wrote:
> 
> > All that is diagnosed is when result is bool or enum (any kind).  Even for
> 
> I'd suggest tests that other nonsense cases are diagnosed, such as 
> floating-point or pointer arguments or results (hopefully such cases are 
> already diagnosed and just need tests).

So like this then?

2023-08-11  Jakub Jelinek  

* Makefile.in (USER_H): Add stdckdint.h.
* ginclude/stdckdint.h: New file.

* gcc.dg/stdckdint-1.c: New test.
* gcc.dg/stdckdint-2.c: New test.
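
For context, a minimal usage sketch of checked arithmetic in the style of the macros this patch adds. So that it builds without <stdckdint.h>, it defines local stand-ins (`my_ckd_add`, `my_ckd_mul`) with the same shape on top of the GCC overflow builtins; those names and the `checked_alloc_size` helper are assumptions made for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins shaped like the header's macros: the result pointer
   comes first, and the value is true when the operation overflowed.  */
#define my_ckd_add(r, a, b) ((bool) __builtin_add_overflow (a, b, r))
#define my_ckd_mul(r, a, b) ((bool) __builtin_mul_overflow (a, b, r))

/* Hypothetical helper: compute COUNT * SIZE + HEADER into *OUT.
   Returns true if any step overflowed, in which case *OUT should not
   be used.  */
bool
checked_alloc_size (unsigned count, unsigned size, unsigned header,
		    unsigned *out)
{
  assert (out != NULL);

  unsigned bytes;
  if (my_ckd_mul (&bytes, count, size))
    return true;
  return my_ckd_add (out, bytes, header);
}
```

Note the argument order: the macros take the result pointer first, while the underlying builtins take it last.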

--- gcc/Makefile.in.jj  2023-08-11 10:15:49.669691051 +0200
+++ gcc/Makefile.in 2023-08-11 18:48:52.829964582 +0200
@@ -469,6 +469,7 @@ USER_H = $(srcdir)/ginclude/float.h \
 $(srcdir)/ginclude/stdnoreturn.h \
 $(srcdir)/ginclude/stdalign.h \
 $(srcdir)/ginclude/stdatomic.h \
+$(srcdir)/ginclude/stdckdint.h \
 $(EXTRA_HEADERS)
 
 USER_H_INC_NEXT_PRE = @user_headers_inc_next_pre@
--- gcc/ginclude/stdckdint.h.jj 2023-08-11 18:48:52.829964582 +0200
+++ gcc/ginclude/stdckdint.h    2023-08-11 18:48:52.829964582 +0200
@@ -0,0 +1,40 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* ISO C23: 7.20 Checked Integer Arithmetic <stdckdint.h>.  */
+
+#ifndef _STDCKDINT_H
+#define _STDCKDINT_H
+
+#define __STDC_VERSION_STDCKDINT_H__ 202311L
+
+#define ckd_add(r, a, b) ((_Bool) __builtin_add_overflow (a, b, r))
+#define ckd_sub(r, a, b) ((_Bool) __builtin_sub_overflow (a, b, r))
+#define ckd_mul(r, a, b) ((_Bool) __builtin_mul_overflow (a, b, r))
+
+/* Allow for the C library to add its part to the header.  */
+#if !defined (_LIBC_STDCKDINT_H) && __has_include_next (<stdckdint.h>)
+# include_next <stdckdint.h>
+#endif
+
+#endif /* stdckdint.h */
--- gcc/testsuite/gcc.dg/stdckdint-1.c.jj   2023-08-11 18:48:52.829964582 +0200
+++ gcc/testsuite/gcc.dg/stdckdint-1.c  2023-08-11 18:48:52.829964582 +0200
@@ -0,0 +1,61 @@
+/* Test C23 Checked Integer Arithmetic macros in <stdckdint.h>.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x" } */
+
+#include <stdckdint.h>
+
+#if __STDC_VERSION_STDCKDINT_H__ != 202311L
+# error __STDC_VERSION_STDCKDINT_H__ not defined to 202311L
+#endif
+
+extern void abort (void);
+
+int
+main ()
+{
+  unsigned int a;
+  if (ckd_add (&a, 1, 2) || a != 3)
+abort ();
+  if (ckd_add (&a, ~2U, 2) || a != ~0U)
+abort ();
+  if (!ckd_add (&a, ~2U, 4) || a != 1)
+abort ();
+  if (ckd_sub (&a, 42, 2) || a != 40)
+abort ();
+  if (!ckd_sub (&a, 11, ~0ULL) || a != 12)
+abort ();
+  if (ckd_mul (&a, 42, 16U) || a != 672)
+abort ();
+  if (ckd_mul (&a, ~0UL, 0) || a != 0)
+abort ();
+  if (ckd_mul (&a, 1, ~0U) || a != ~0U)
+abort ();
+  if (ckd_mul (&a, ~0UL, 1) != (~0UL > ~0U) || a != ~0U)
+abort ();
+  static_assert (_Generic (ckd_add (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_sub (&a, 1, 1), bool: 1, default: 0));
+  static_assert (_Generic (ckd_mul (&a, 1, 1), bool: 1, default: 0));
+  signed char b;
+  if (ckd_add (&b, 8, 12) || b != 20)
+abort ();
+  if (ckd_sub (&b, 8UL, 12ULL) || b != -4)
+abort ();
+  if (ckd_mul (&b, 2, 3) || b != 6)
+abort ();
+  unsigned char c;
+  if (ckd_add (&c, 8, 12) || c != 20)
+abort ();
+  if (ckd_sub (&c, 8UL, 12ULL) != (-4ULL > (unsigned char) -4U)
+  || c != (unsigned char) -4U)
+abort ();
+  if (ckd_mul (&c, 2, 3) || c != 6)
+abort ();
+  long long d;
+  if (ckd_add (&d, ~0U, ~0U) != (~0U + 1ULL < ~0U)
+  || d != (long long) (2 * (unsigned long long) ~0U))
+abort ();
+  if (ckd_sub (&d, 0, 0) || d != 0)
+abort ();
+  if (ckd_mul (&d, 16, 1) || d != 16)
+abort ();
+}
--- gcc/testsuite/gcc.dg/stdckdint-2.c.jj   2023-08-11 18:48:52.829964582 +0200
+++ gcc/testsuite/gcc.dg/stdckdint-2.c  2023-08-11 19:28:50.081643961 +0200
@@ -0,0 +1,87 @@
+/* Test C23 Checked Integer Arithmetic macros in <stdckdint.h>.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x" } */
+
+#include <stdckdint.h>
+
+int
+main ()
+{
+  char a;
+  bool b;
+  enum E { E1, E2 } c = E1;
+  int d;
+  int *e;
+  float f;
+  double g
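
For readers skimming the archive: the three macros in the header above are thin wrappers over GCC's overflow builtins, and each yields true when the operation overflowed the type of *r. A minimal sketch of how they compose follows. The my_ckd_* names are hypothetical stand-ins (same expansion as in the patch) so that the sketch also builds with compilers that do not ship <stdckdint.h> yet, and mul_add_checked is an illustrative helper, not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ckd_add and ckd_mul, expanded the same way
   as in the patch, so this builds without <stdckdint.h>.  */
#define my_ckd_add(r, a, b) ((bool) __builtin_add_overflow (a, b, r))
#define my_ckd_mul(r, a, b) ((bool) __builtin_mul_overflow (a, b, r))

/* Compute a * b + c in int.  Return true and store the result in *out
   if neither step overflows; return false as soon as one does.  */
static bool
mul_add_checked (int a, int b, int c, int *out)
{
  int prod;
  if (my_ckd_mul (&prod, a, b))
    return false;
  if (my_ckd_add (out, prod, c))
    return false;
  return true;
}
```

The macros report overflow through the return value and always store the wrapped result, so a caller that ignores the return value gets no diagnostic at run time.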

[PATCH] tree-pretty-print: delimit TREE_VEC with braces

2023-08-11 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK for trunk?

-- >8 --

This makes the generic pretty printer print braces around a TREE_VEC
like we do for CONSTRUCTOR.  This should improve readability of nested
TREE_VECs in particular.

gcc/ChangeLog:

* tree-pretty-print.cc (dump_generic_node) <case TREE_VEC>:
Delimit output with braces.
---
 gcc/tree-pretty-print.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 51a213529d1..579037b32c2 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -1900,6 +1900,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
 case TREE_VEC:
   {
size_t i;
+   pp_left_brace (pp);
if (TREE_VEC_LENGTH (node) > 0)
  {
size_t len = TREE_VEC_LENGTH (node);
@@ -1913,6 +1914,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
dump_generic_node (pp, TREE_VEC_ELT (node, len - 1), spc,
   flags, false);
  }
+   pp_right_brace (pp);
   }
   break;
 
-- 
2.42.0.rc1
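
To illustrate the readability point for nested vectors, here is a toy model, not the GCC pretty printer. The toy_node type, toy_dump, and the comma separator are all assumptions made up for this sketch; it only shows that without delimiters a nested vector is indistinguishable from a flat one in the output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node: either a named leaf or a vector of children.  */
struct toy_node
{
  const char *name;                    /* Non-NULL for a leaf.  */
  const struct toy_node *const *elts;  /* Children of a vector node.  */
  size_t len;                          /* Number of children.  */
};

/* Append SRC to the string in BUF (capacity SIZE), truncating if needed.  */
static void
toy_append (char *buf, size_t size, const char *src)
{
  size_t used = strlen (buf);
  if (used + 1 < size)
    strncat (buf, src, size - used - 1);
}

/* Dump N into BUF.  If BRACES is nonzero, delimit every vector node
   with braces, which is the behaviour the patch adds for TREE_VEC.  */
static void
toy_dump (const struct toy_node *n, char *buf, size_t size, int braces)
{
  if (n->name)
    {
      toy_append (buf, size, n->name);
      return;
    }
  if (braces)
    toy_append (buf, size, "{");
  for (size_t i = 0; i < n->len; i++)
    {
      if (i > 0)
	toy_append (buf, size, ",");
      toy_dump (n->elts[i], buf, size, braces);
    }
  if (braces)
    toy_append (buf, size, "}");
}
```

For a vector holding a leaf a and an inner vector of b and c, the braced dump keeps the nesting visible while the unbraced dump flattens it to a plain list.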



[COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc-patches
Thanks for the feedback! I've incorporated the changes (aside from
expanding test coverage, which I plan on releasing in a follow-up),
rebased, and performed a bootstrap and regtest on
aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
with nits fixed and no problems after rebase, the patch has now been pushed. 

Best,
Eric

---

This patch adds known function subclasses for Python/C API functions
PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
optional parameters for
region_model::get_or_create_region_for_heap_alloc, allowing for the
newly allocated region to immediately transition from the start state to
the assumed non-null state in the malloc state machine if desired.
Finally, it adds a new procedure, dg-require-python-h, intended as a
directive in Python-related analyzer tests, to append necessary Python
flags during the tests' build process.

The main warnings we gain in this patch with respect to the known function
subclasses mentioned are leak related. For example:

rc3.c: In function ‘create_py_object’:
rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
   21 |   return list;
  |  ^~~~
  ‘create_py_object’: events 1-4
|
|4 |   PyObject* item = PyLong_FromLong(10);
|  |^~~
|  ||
|  |(1) allocated here
|  |(2) when ‘PyLong_FromLong’ succeeds
|5 |   PyObject* list = PyList_New(2);
|  |~
|  ||
|  |(3) when ‘PyList_New’ fails
|..
|   21 |   return list;
|  |  
|  |  |
|  |  (4) ‘item’ leaks here; was allocated at (1)

Some concessions were made to
simplify the analysis process when comparing kf_PyList_Append with the
real implementation. In particular, PyList_Append performs some
optimization internally to try and avoid calls to realloc if
possible. For simplicity, we assume that realloc is called every time.
Also, we grow the size by just 1 (to ensure enough space for adding a
new element) rather than abide by the heuristics that the actual implementation
follows.

gcc/analyzer/ChangeLog:
PR analyzer/107646
* call-details.h: New function.
* region-model.cc (region_model::get_or_create_region_for_heap_alloc):
New optional parameters.
* region-model.h (class region_model): New optional parameters.
* sm-malloc.cc (on_realloc_with_move): New function.
(region_model::transition_ptr_sval_non_null): New function.

gcc/testsuite/ChangeLog:
PR analyzer/107646
* gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
PyList_New, PyList_Append, PyLong_FromLong
* gcc.dg/plugin/plugin.exp: New test.
* lib/target-supports.exp: New procedure.
* gcc.dg/plugin/cpython-plugin-test-2.c: New test.

Signed-off-by: Eric Feng 
---
 gcc/analyzer/call-details.h   |   4 +
 gcc/analyzer/region-model.cc  |  17 +-
 gcc/analyzer/region-model.h   |  14 +-
 gcc/analyzer/sm-malloc.cc |  42 +
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
 .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
 gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
 gcc/testsuite/lib/target-supports.exp |  25 +
 8 files changed, 899 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c

diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 24be2247e63..bf2601151ea 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -49,6 +49,10 @@ public:
 return POINTER_TYPE_P (get_arg_type (idx));
   }
   bool arg_is_size_p (unsigned idx) const;
+  bool arg_is_integral_p (unsigned idx) const
+  {
+return INTEGRAL_TYPE_P (get_arg_type (idx));
+  }
 
   const gcall *get_call_stmt () const { return m_call; }
   location_t get_location () const;
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 094b7af3dbc..aa9fe008b9d 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4991,11 +4991,16 @@ region_model::check_dynamic_size_for_floats (const svalue *size_in_bytes,
Use CTXT to complain about tainted sizes.
 
Reuse an existing heap_allocated_region if it's not being referenced by
-   this region_model; otherwise create a new one.  */
+   this region_model; otherwise create a new one.
+
+   Optionally (update_state_machine) transitions the pointer pointing to the
+   heap_allocated_region from start to assumed non-null.  */
 
 const region *
 region_model::get_or_create_region_for_heap_alloc (const svalue *size_in_bytes,
-  region_mod

Re: [PATCH] tree-pretty-print: delimit TREE_VEC with braces

2023-08-11 Thread Jason Merrill via Gcc-patches

On 8/11/23 13:35, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk?


OK.


-- >8 --

This makes the generic pretty printer print braces around a TREE_VEC
like we do for CONSTRUCTOR.  This should improve readability of nested
TREE_VECs in particular.

gcc/ChangeLog:

* tree-pretty-print.cc (dump_generic_node) <case TREE_VEC>:
Delimit output with braces.
---
  gcc/tree-pretty-print.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 51a213529d1..579037b32c2 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -1900,6 +1900,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
  case TREE_VEC:
{
size_t i;
+   pp_left_brace (pp);
if (TREE_VEC_LENGTH (node) > 0)
  {
size_t len = TREE_VEC_LENGTH (node);
@@ -1913,6 +1914,7 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
dump_generic_node (pp, TREE_VEC_ELT (node, len - 1), spc,
   flags, false);
  }
+   pp_right_brace (pp);
}
break;
  




[COMMITTED] bpf: allow exceeding max num of args in BPF when always_inline

2023-08-11 Thread Jose E. Marchesi via Gcc-patches
BPF currently limits the number of registers used to pass arguments to
functions to five registers.  There is a check for this at function
expansion time.  However, if a function is guaranteed to be always
inlined (and its body never generated) by virtue of the always_inline
attribute, it can "receive" any number of arguments.

Tested in host x86_64-linux-gnu and target bpf-unknown-none.

gcc/ChangeLog

* config/bpf/bpf.cc (bpf_function_arg_advance): Do not complain
about too many arguments if function is always inlined.

gcc/testsuite/ChangeLog

* gcc.target/bpf/diag-funargs-inline-1.c: New test.
* gcc.target/bpf/diag-funargs.c: Adapt test.
---
 gcc/config/bpf/bpf.cc |  9 +++-
 .../gcc.target/bpf/diag-funargs-inline-1.c| 21 +++
 gcc/testsuite/gcc.target/bpf/diag-funargs.c   |  8 ++-
 3 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/diag-funargs-inline-1.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index 33218b3a818..d27a971d0af 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -732,7 +732,14 @@ bpf_function_arg_advance (cumulative_args_t ca,
   unsigned num_words = CEIL (num_bytes, UNITS_PER_WORD);
 
   if (*cum <= 5 && *cum + num_words > 5)
-error ("too many function arguments for eBPF");
+{
+  /* Too many arguments for BPF.  However, if the function is
+ gonna be inline for sure, we let it pass.  Otherwise, issue
+ an error.  */
+  if (!lookup_attribute ("always_inline",
+ DECL_ATTRIBUTES (cfun->decl)))
+error ("too many function arguments for eBPF");
+}
 
   *cum += num_words;
 }
diff --git a/gcc/testsuite/gcc.target/bpf/diag-funargs-inline-1.c b/gcc/testsuite/gcc.target/bpf/diag-funargs-inline-1.c
new file mode 100644
index 000..e917ef1294e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/bpf/diag-funargs-inline-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+inline int __attribute__ ((always_inline))
+foo (int a1,
+ int a2,
+ int a3,
+ int a4,
+ int a5,
+ int a6)
+{
+  return a1 + a2 + a3 + a4 + a5 + a6;
+}
+
+int
+bar (int i1, int i2, int i3, int i4, int i5)
+{
+  return foo (i1, i2, i3, i4, i5, 10);
+}
+
+/* { dg-final { scan-assembler-not "call\t.*" } } */
diff --git a/gcc/testsuite/gcc.target/bpf/diag-funargs.c b/gcc/testsuite/gcc.target/bpf/diag-funargs.c
index d4e9c0683f2..42b5f05b67c 100644
--- a/gcc/testsuite/gcc.target/bpf/diag-funargs.c
+++ b/gcc/testsuite/gcc.target/bpf/diag-funargs.c
@@ -11,5 +11,11 @@ foo (int a1,  /* { dg-error "too many function arguments" } */
  int a5,
  int a6)
 {
-  return a6;
+  return a1 + a2 + a3 + a4 + a5 + a6;
+}
+
+int
+bar (int i1, int i2, int i3, int i4, int i5)
+{
+  return foo (i1, i2, i3, i4, i5, 10);
 }
-- 
2.30.2
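
As a rough model of the check in the bpf.cc hunk above: the condition only fires on the argument that crosses the five-word limit, so a function with many arguments is diagnosed once rather than once per extra argument, and the new attribute lookup suppresses that single diagnostic. The toy_* names below are hypothetical and exist only for this sketch; the real hook works on cumulative_args_t and calls error ().

```c
#include <stdbool.h>

#define TOY_MAX_ARG_WORDS 5

/* Hypothetical model of the check in bpf_function_arg_advance.  *CUM is
   the number of argument words consumed so far.  Return true if the
   "too many function arguments" error would be issued for this
   argument.  */
static bool
toy_arg_advance (int *cum, int num_words, bool always_inline)
{
  bool diagnosed = false;

  /* Only the argument that crosses the limit can trigger the error.  */
  if (*cum <= TOY_MAX_ARG_WORDS && *cum + num_words > TOY_MAX_ARG_WORDS)
    diagnosed = !always_inline;

  *cum += num_words;
  return diagnosed;
}

/* Count the diagnostics for a function taking NARGS one-word
   arguments.  */
static int
toy_count_errors (int nargs, bool always_inline)
{
  int cum = 0, errors = 0;
  for (int i = 0; i < nargs; i++)
    errors += toy_arg_advance (&cum, 1, always_inline);
  return errors;
}
```

Under this model a six-argument or eight-argument function yields exactly one error unless it is always_inline, in which case it yields none.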



[COMMITTED] bpf: liberate R9 for general register allocation

2023-08-11 Thread Jose E. Marchesi via Gcc-patches
We were reserving one of the hard registers in BPF in order to
implement dynamic stack allocation: alloca and VLAs. However, there is
kernel code that has inline assembly that requires all the non-fixed
registers to be available for register allocation.

This patch:

1. Liberates r9, which is now available for register allocation.

2. Adds a check to GCC so it errors out if the user tries to do
   dynamic stack allocation.  A couple of tests are added for this.

3. Changes xbpf so it no longer saves and restores callee-saved
   registers.  A couple of tests for this have been removed.

4. Adds bpf-*-* to the list of targets that do not support alloca in
   target-support.exp.

Tested in host x86_64-linux-gnu and target bpf-unknown-none.

gcc/ChangeLog

* config/bpf/bpf.md (allocate_stack): Define.
* config/bpf/bpf.h (FIRST_PSEUDO_REGISTER): Make room for fake
stack pointer register.
(FIXED_REGISTERS): Adjust accordingly.
(CALL_USED_REGISTERS): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(REGISTER_NAMES): Likewise.
* config/bpf/bpf.cc (bpf_compute_frame_layout): Do not reserve
space for callee-saved registers.
(bpf_expand_prologue): Do not save callee-saved registers in xbpf.
(bpf_expand_epilogue): Do not restore callee-saved registers in
xbpf.

gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_alloca): BPF
target does not support alloca.
* gcc.target/bpf/diag-alloca-1.c: New test.
* gcc.target/bpf/diag-alloca-2.c: Likewise.
* gcc.target/bpf/xbpf-callee-saved-regs-1.c: Remove test.
* gcc.target/bpf/xbpf-callee-saved-regs-2.c: Likewise.
* gcc.target/bpf/regs-availability-1.c: Likewise.
---
 gcc/config/bpf/bpf.cc | 128 ++
 gcc/config/bpf/bpf.h  |  23 ++--
 gcc/config/bpf/bpf.md |  13 ++
 gcc/testsuite/gcc.target/bpf/diag-alloca-1.c  |   9 ++
 gcc/testsuite/gcc.target/bpf/diag-alloca-2.c  |   9 ++
 .../gcc.target/bpf/regs-availability-1.c  |  21 +++
 .../gcc.target/bpf/xbpf-callee-saved-regs-1.c |  17 ---
 .../gcc.target/bpf/xbpf-callee-saved-regs-2.c |  17 ---
 gcc/testsuite/lib/target-supports.exp |   3 +
 9 files changed, 82 insertions(+), 158 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/bpf/diag-alloca-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/diag-alloca-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/regs-availability-1.c
 delete mode 100644 gcc/testsuite/gcc.target/bpf/xbpf-callee-saved-regs-1.c
 delete mode 100644 gcc/testsuite/gcc.target/bpf/xbpf-callee-saved-regs-2.c

diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index d27a971d0af..3516b79bce4 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -76,10 +76,6 @@ struct GTY(()) machine_function
 {
   /* Number of bytes saved on the stack for local variables.  */
   int local_vars_size;
-
-  /* Number of bytes saved on the stack for callee-saved
- registers.  */
-  int callee_saved_reg_size;
 };
 
 /* Handle an attribute requiring a FUNCTION_DECL;
@@ -346,7 +342,7 @@ static void
 bpf_compute_frame_layout (void)
 {
   int stack_alignment = STACK_BOUNDARY / BITS_PER_UNIT;
-  int padding_locals, regno;
+  int padding_locals;
 
   /* Set the space used in the stack by local variables.  This is
  rounded up to respect the minimum stack alignment.  */
@@ -358,23 +354,9 @@ bpf_compute_frame_layout (void)
 
   cfun->machine->local_vars_size += padding_locals;
 
-  if (TARGET_XBPF)
-{
-  /* Set the space used in the stack by callee-saved used
-registers in the current function.  There is no need to round
-up, since the registers are all 8 bytes wide.  */
-  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
-   if ((df_regs_ever_live_p (regno)
-&& !call_used_or_fixed_reg_p (regno))
-   || (cfun->calls_alloca
-   && regno == STACK_POINTER_REGNUM))
- cfun->machine->callee_saved_reg_size += 8;
-}
-
   /* Check that the total size of the frame doesn't exceed the limit
  imposed by eBPF.  */
-  if ((cfun->machine->local_vars_size
-   + cfun->machine->callee_saved_reg_size) > bpf_frame_limit)
+  if (cfun->machine->local_vars_size > bpf_frame_limit)
 {
   static int stack_limit_exceeded = 0;
 
@@ -393,69 +375,19 @@ bpf_compute_frame_layout (void)
 void
 bpf_expand_prologue (void)
 {
-  HOST_WIDE_INT size;
-
-  size = (cfun->machine->local_vars_size
- + cfun->machine->callee_saved_reg_size);
-
   /* The BPF "hardware" provides a fresh new set of registers for each
  called function, some of which are initialized to the values of
  the arguments passed in the first five registers.  In doing so,
- it saves the values of the registers of the caller, and restored
+ it saves the values of the registers of the caller, and resto

[committed] libstdc++: Implement C++20 std::chrono::parse [PR104167]

2023-08-11 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

This is a pure addition that only affects C++20 mode, so I'm considering
backporting it to gcc-13 at some point (once any dust has settled from
landing it on trunk).

-- >8 --

This adds the missing C++20 features to <chrono>.

I've implemented my proposed resolutions to LWG issues 3960, 3961, and
3962. There are some unimplemented flags such as %OI which I think are
not implementable in general. It might be possible to use nl_langinfo
with ALT_DIGITS, but that isn't available on all targets. I intend to
file another LWG issue about that.

libstdc++-v3/ChangeLog:

PR libstdc++/104167
* include/bits/chrono_io.h (operator|=, operator|): Add noexcept
to _ChronoParts operators.
(from_stream, parse): Define new functions.
(__detail::_Parse, __detail::_Parser): New class templates.
* include/std/chrono (__cpp_lib_chrono): Define to 201907L for
C++20.
* include/std/version (__cpp_lib_chrono): Likewise.
* testsuite/20_util/duration/arithmetic/constexpr_c++17.cc:
Adjust expected value of feature test macro.
* testsuite/20_util/duration/io.cc: Test parsing.
* testsuite/std/time/clock/file/io.cc: Likewise.
* testsuite/std/time/clock/gps/io.cc: Likewise.
* testsuite/std/time/clock/system/io.cc: Likewise.
* testsuite/std/time/clock/tai/io.cc: Likewise.
* testsuite/std/time/clock/utc/io.cc: Likewise.
* testsuite/std/time/day/io.cc: Likewise.
* testsuite/std/time/month/io.cc: Likewise.
* testsuite/std/time/month_day/io.cc: Likewise.
* testsuite/std/time/weekday/io.cc: Likewise.
* testsuite/std/time/year/io.cc: Likewise.
* testsuite/std/time/year_month/io.cc: Likewise.
* testsuite/std/time/year_month_day/io.cc: Likewise.
* testsuite/std/time/syn_c++20.cc: Check value of macro and for
the existence of parse and from_stream in namespace chrono.
* testsuite/std/time/clock/local/io.cc: New test.
* testsuite/std/time/parse.cc: New test.
---
 libstdc++-v3/include/bits/chrono_io.h | 1691 -
 libstdc++-v3/include/std/chrono   |5 +-
 libstdc++-v3/include/std/version  |4 +-
 .../duration/arithmetic/constexpr_c++17.cc|2 +-
 libstdc++-v3/testsuite/20_util/duration/io.cc |  102 +-
 .../testsuite/std/time/clock/file/io.cc   |   18 +
 .../testsuite/std/time/clock/gps/io.cc|   22 +-
 .../testsuite/std/time/clock/local/io.cc  |   42 +
 .../testsuite/std/time/clock/system/io.cc |   73 +
 .../testsuite/std/time/clock/tai/io.cc|   22 +-
 .../testsuite/std/time/clock/utc/io.cc|   31 +
 libstdc++-v3/testsuite/std/time/day/io.cc |   60 +-
 libstdc++-v3/testsuite/std/time/month/io.cc   |  122 +-
 .../testsuite/std/time/month_day/io.cc|   79 +-
 libstdc++-v3/testsuite/std/time/parse.cc  |  309 +++
 libstdc++-v3/testsuite/std/time/syn_c++20.cc  |9 +-
 libstdc++-v3/testsuite/std/time/weekday/io.cc |   78 +-
 libstdc++-v3/testsuite/std/time/year/io.cc|   74 +-
 .../testsuite/std/time/year_month/io.cc   |   50 +-
 .../testsuite/std/time/year_month_day/io.cc   |   65 +-
 20 files changed, 2816 insertions(+), 42 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/clock/local/io.cc
 create mode 100644 libstdc++-v3/testsuite/std/time/parse.cc

diff --git a/libstdc++-v3/include/bits/chrono_io.h b/libstdc++-v3/include/bits/chrono_io.h
index c95301361d8..84791d41fb1 100644
--- a/libstdc++-v3/include/bits/chrono_io.h
+++ b/libstdc++-v3/include/bits/chrono_io.h
@@ -235,9 +235,13 @@ namespace __format
   };
 
   constexpr _ChronoParts
-  operator|(_ChronoParts __x, _ChronoParts __y)
+  operator|(_ChronoParts __x, _ChronoParts __y) noexcept
   { return static_cast<_ChronoParts>((int)__x | (int)__y); }
 
+  constexpr _ChronoParts&
+  operator|=(_ChronoParts& __x, _ChronoParts __y) noexcept
+  { return __x = __x | __y; }
+
   // TODO rename this to chrono::__formatter? or chrono::__detail::__formatter?
   template
 struct __formatter_chrono
@@ -2136,18 +2140,150 @@ namespace chrono
 /// @addtogroup chrono
 /// @{
 
-  // TODO: from_stream for duration
-#if 0
+/// @cond undocumented
+namespace __detail
+{
+  template
+struct _Parser
+{
+  static_assert(is_same_v, _Duration>);
+
+  explicit
+  _Parser(__format::_ChronoParts __need) : _M_need(__need) { }
+
+  _Parser(_Parser&&) = delete;
+  void operator=(_Parser&&) = delete;
+
+  _Duration _M_time{}; // since midnight
+  sys_days _M_sys_days{};
+  year_month_day _M_ymd{};
+  weekday _M_wd{};
+  __format::_ChronoParts _M_need;
+
+  template
+   basic_istream<_CharT, _Traits>&
+   operator()(basic_istream<_CharT, _Traits>& __is, const _CharT* __fmt,
+  basic_string<_CharT, _Traits, _Alloc>* __abbrev = nullptr,
+  minutes* __offset = nullptr)

[PATCH] RISC-V: Specify -mabi for ztso testcases

2023-08-11 Thread Patrick O'Neill
On rv32 targets, this patch fixes ztso testcases errors like this:
cc1: error: ABI requires '-march=rv32'

2023-08-11 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add -mabi=lp64d
to dg-options.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-3.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-4.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-5.c   | 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-1.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-2.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-3.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-4.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-5.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-6.c| 2 +-
 .../gcc.target/riscv/amo-table-ztso-compare-exchange-7.c| 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-3.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-4.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-fence-5.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-1.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-2.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-load-3.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-1.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-2.c | 2 +-
 gcc/testsuite/gcc.target/riscv/amo-table-ztso-store-3.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c | 2 +-
 .../gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c | 2 +-
 28 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
index a7097e9aab9..a88d08eb3f4 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
+++ b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* Verify that atomic op mappings match the Ztso suggested mapping.  */
-/* { dg-options "-march=rv64id_ztso -O3" } */
+/* { dg-options "-march=rv64id_ztso -mabi=lp64d -O3" } */
 /* { dg-skip-if "" { *-*-* } { "-g" "-flto"} } */
 /* { dg-final { check-function-bodies "**" "" } } */

diff --git a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c b/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
index 8e993903439..ebd240f9dd2 100644
--- a/gcc/testsuite/gcc.target/riscv/amo-table-ztso-amo-add-2.c
+++ b/gcc/testsuite/gcc.ta

Re: [COMMITTED] analyzer: More features for CPython analyzer plugin [PR107646]

2023-08-11 Thread Eric Feng via Gcc-patches
I've noticed there were still some strange indentations in the last
patch ... however, I think I've finally figured out a sane formatting
solution for me (fingers crossed). I will address them in the
follow-up patch at the same time as adding more test coverage.

---

In case anyone else using VSCode has been having issues with
formatting according to GNU/GCC conventions, these are the relevant
formatting settings that I've found work for me. Assuming the C/C++
extension is installed, then in settings.json:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always, TabWidth: 8, IndentWidth: 8 }"

Just setting the base style to GNU formats everything correctly except
for the fact that indentation defaults to spaces (which is what I've
been struggling with fixing manually in the last few patches). The
rest of the settings are for replacing blocks of 8 spaces with tabs
(which is a requirement in check_GNU_style). In combination, this
works for everything except for header files for some reason, but I'll
defer that battle to another day.

On Fri, Aug 11, 2023 at 1:47 PM Eric Feng  wrote:
>
> Thanks for the feedback! I've incorporated the changes (aside from
> expanding test coverage, which I plan on releasing in a follow-up),
> rebased, and performed a bootstrap and regtest on
> aarch64-unknown-linux-gnu. Since you mentioned that it is good for trunk
> with nits fixed and no problems after rebase, the patch has now been pushed.
>
> Best,
> Eric
>
> ---
>
> This patch adds known function subclasses for Python/C API functions
> PyList_New, PyLong_FromLong, and PyList_Append. It also adds new
> optional parameters for
> region_model::get_or_create_region_for_heap_alloc, allowing for the
> newly allocated region to immediately transition from the start state to
> the assumed non-null state in the malloc state machine if desired.
> Finally, it adds a new procedure, dg-require-python-h, intended as a
> directive in Python-related analyzer tests, to append necessary Python
> flags during the tests' build process.
>
> The main warnings we gain in this patch with respect to the known function
> subclasses mentioned are leak related. For example:
>
> rc3.c: In function ‘create_py_object’:
> rc3.c:21:10: warning: leak of ‘item’ [CWE-401] [-Wanalyzer-malloc-leak]
>21 |   return list;
>   |  ^~~~
>   ‘create_py_object’: events 1-4
> |
> |4 |   PyObject* item = PyLong_FromLong(10);
> |  |^~~
> |  ||
> |  |(1) allocated here
> |  |(2) when ‘PyLong_FromLong’ succeeds
> |5 |   PyObject* list = PyList_New(2);
> |  |~
> |  ||
> |  |(3) when ‘PyList_New’ fails
> |..
> |   21 |   return list;
> |  |  
> |  |  |
> |  |  (4) ‘item’ leaks here; was allocated at (1)
>
> Some concessions were made to
> simplify the analysis process when comparing kf_PyList_Append with the
> real implementation. In particular, PyList_Append performs some
> optimization internally to try and avoid calls to realloc if
> possible. For simplicity, we assume that realloc is called every time.
> Also, we grow the size by just 1 (to ensure enough space for adding a
> new element) rather than abide by the heuristics that the actual 
> implementation
> follows.
>
> gcc/analyzer/ChangeLog:
> PR analyzer/107646
> * call-details.h: New function.
> * region-model.cc (region_model::get_or_create_region_for_heap_alloc):
> New optional parameters.
> * region-model.h (class region_model): New optional parameters.
> * sm-malloc.cc (on_realloc_with_move): New function.
> (region_model::transition_ptr_sval_non_null): New function.
>
> gcc/testsuite/ChangeLog:
> PR analyzer/107646
> * gcc.dg/plugin/analyzer_cpython_plugin.c: Analyzer support for
> PyList_New, PyList_Append, PyLong_FromLong
> * gcc.dg/plugin/plugin.exp: New test.
> * lib/target-supports.exp: New procedure.
> * gcc.dg/plugin/cpython-plugin-test-2.c: New test.
>
> Signed-off-by: Eric Feng 
> ---
>  gcc/analyzer/call-details.h   |   4 +
>  gcc/analyzer/region-model.cc  |  17 +-
>  gcc/analyzer/region-model.h   |  14 +-
>  gcc/analyzer/sm-malloc.cc |  42 +
>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 722 ++
>  .../gcc.dg/plugin/cpython-plugin-test-2.c |  78 ++
>  gcc/testsuite/gcc.dg/plugin/plugin.exp|   3 +-
>  gcc/testsuite/lib/target-supports.exp |  25 +
>  8 files changed, 899 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/plugin/cpython-plugin-test-2.c
>
> diff

Re: [PATCH] RISC-V: Specify -mabi for ztso testcases

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/11/23 13:15, Patrick O'Neill wrote:

On rv32 targets, this patch fixes ztso testcase errors like this:
cc1: error: ABI requires '-march=rv32'

2023-08-11 Patrick O'Neill 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/amo-table-ztso-amo-add-1.c: Add -mabi=lp64d
to dg-options.
* gcc.target/riscv/amo-table-ztso-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-amo-add-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-6.c: Ditto.
* gcc.target/riscv/amo-table-ztso-compare-exchange-7.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-fence-5.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-load-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-store-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-1.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-2.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-3.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-4.c: Ditto.
* gcc.target/riscv/amo-table-ztso-subword-amo-add-5.c: Ditto.

OK
jeff


[Committed] RISC-V: Specify -mabi for ztso testcases

2023-08-11 Thread Patrick O'Neill



On 8/11/23 13:44, Jeff Law wrote:



OK
jeff

Committed

Patrick


Re: [PATCH v3] Mode-Switching: Fix SET_SRC ICE when CLOBBER insn

2023-08-11 Thread Jeff Law via Gcc-patches




On 8/8/23 21:05, pan2...@intel.com wrote:

From: Pan Li 

In some cases, like gcc/testsuite/gcc.dg/pr78148.c on RISC-V, the pattern
has only 1 operand when SET_SRC is applied in create_pre_exit. For example:

(insn 13 9 14 2 (clobber (reg/i:TI 10 a0)) "gcc/testsuite/gcc.dg/pr78148.c":24:1 -1
     (expr_list:REG_UNUSED (reg/i:TI 10 a0)
        (nil)))

Unfortunately, SET_SRC requires at least 2 operands, so this results in a
segmentation fault. The SH4-specific code that triggers the fault looks
valid only when return_copy_pat is a load or something like that. Thus,
this patch tries to fix it by ignoring the CLOBBER insn for SH4.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* mode-switching.cc (create_pre_exit): Add CLOBBER check.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mode-switch-ice-1.c: New test.



---
  gcc/mode-switching.cc |  2 +-
  .../gcc.target/riscv/mode-switch-ice-1.c  | 22 +++
  2 files changed, 23 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/mode-switch-ice-1.c

diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index 64ae2bc29c3..b034cf7d437 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -392,7 +392,7 @@ create_pre_exit (int n_entities, int *entity_map, const int *num_modes)
                     && mode != targetm.mode_switching.exit (e))
                   break;
               }
-            if (j >= 0)
+            if (j >= 0 && GET_CODE (return_copy_pat) != CLOBBER)
               {
                 /* __builtin_return emits a sequence of loads to all
                    return registers.  One of them might require

I'd tend to prefer to guard the code a bit later so that the test for 
CLOBBERS is closer to the point where they're not allowed.  ie




    /* For the SH4, floating point loads depend on fpscr,
       thus we might need to put the final mode switch
       after the return value copy.  That is still OK,
       because a floating point return value does not
       conflict with address reloads.  */
    if (copy_start >= ret_start
        && copy_start + copy_num <= ret_end
        && OBJECT_P (SET_SRC (return_copy_pat)))
      forced_late_switch = true;
    break;

I'd put it in that code.  Probably something like

        && GET_CODE (return_copy_pat) == SET
        && OBJECT_P (SET_SRC (return_copy_pat)))

That way we make it clear that we should only be looking at SET_SRC of 
an actual SET.
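
To illustrate the ordering being suggested, here is a minimal standalone sketch. It does not use GCC's real rtx types: `mock_rtx`, `mock_code`, `mock_object_p` and `mock_forced_late_switch` are all hypothetical stand-ins invented for this example. The only point is that the code check comes first, so the second operand is never read for a CLOBBER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rtx codes; NOT GCC's real definitions.  */
enum mock_code { MOCK_SET, MOCK_CLOBBER, MOCK_REG };

struct mock_rtx {
  enum mock_code code;
  /* A SET fills both slots (dest, src); a CLOBBER only fills slot 0.  */
  struct mock_rtx *operands[2];
};

/* Rough analogue of an OBJECT_P-style test: true for a register leaf.  */
static bool mock_object_p(const struct mock_rtx *x) {
  return x != NULL && x->code == MOCK_REG;
}

/* Check the code first; && short-circuits, so operands[1] is only
   examined when the pattern really is a SET.  */
static bool mock_forced_late_switch(const struct mock_rtx *pat) {
  assert(pat != NULL);
  return pat->code == MOCK_SET && mock_object_p(pat->operands[1]);
}
```

With this shape, a CLOBBER pattern simply yields false instead of reaching into an operand it does not have.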


Is there some reason you put the guard earlier?

jeff


Re: [PATCH] c, v4: Add stdckdint.h header for C23

2023-08-11 Thread Joseph Myers
On Fri, 11 Aug 2023, Jakub Jelinek via Gcc-patches wrote:

> On Fri, Aug 11, 2023 at 01:25:38PM +, Joseph Myers wrote:
> > On Fri, 11 Aug 2023, Jakub Jelinek wrote:
> > 
> > > All that is diagnosed is when result is bool or enum (any kind).  Even for
> > 
> > I'd suggest tests that other nonsense cases are diagnosed, such as 
> > floating-point or pointer arguments or results (hopefully such cases are 
> > already diagnosed and just need tests).
> 
> So like this then?

I think it should also test the diagnostic for when *result is 
const-qualified.  OK with that change.
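
For context, a small sketch of what a well-formed checked addition looks like. It assumes a GCC-compatible compiler and uses the `__builtin_add_overflow` builtin directly instead of the new header, since `<stdckdint.h>` may not be available everywhere; `checked_add_int` is a hypothetical helper name for this example only. The comment notes the kind of misuse the requested tests are meant to cover.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *result and returns true if the
   mathematical sum did not fit in an int (i.e. on overflow).  */
static bool checked_add_int(int a, int b, int *result) {
  assert(result != NULL);
  /* The result pointer must point to a modifiable integer object.
     Passing a pointer to a const-qualified object, or a bool or enum
     result, is the sort of case the thread expects to be diagnosed.  */
  return __builtin_add_overflow(a, b, result);
}
```

A caller would test the return value before trusting `*result`; on overflow the stored value is the wrapped result, not the true sum.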

-- 
Joseph S. Myers
jos...@codesourcery.com

