date:20240102

Re: Re: [PATCH v5 1/2] RISC-V: Add crypto vector builtin function.

2024-01-02 Thread Feng Wang

2024-01-02 15:55 juzhe.zhong wrote:

>+/* Static information about a set of crypto vector functions.  */
>+struct crypto_function_group_info
>+{
>+  struct function_group_info rvv_function_group_info;
>+  /* Whether the function is available.  */
>+  unsigned int (*avail) (void);
>+};
>
>What is this used for?

Will delete it.

>
>juzhe.zh...@rivai.ai
>
>From: Feng Wang
>Date: 2024-01-02 15:47
>To: gcc-patches
>CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
>Subject: [PATCH v5 1/2] RISC-V: Add crypto vector builtin function.
>Patch v5:Rebase.
>Patch v4:Merge crypto vector function.def into vector.
>Patch v3:Define a shape for vaesz and merge vector-crypto-types.def
> into riscv-vector-builtins-types.def.
>Patch v2:Optimize function_shape class for crypto_vector.
>
>This patch adds the intrinsic functions of crypto vector based on the
>intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
>
>Co-Authored by: Songhe Zhu
>Co-Authored by: Ciyan Pan
>gcc/ChangeLog:
>
>* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
>Add new function_base for crypto vector.
>(class bitmanip): Ditto.
>(class b_reverse): Ditto.
>(class vwsll): Ditto.
>(class clmul): Ditto.
>(class vg_nhab): Ditto.
>(class crypto_vv): Ditto.
>(class crypto_vi): Ditto.
>(class vaeskf2_vsm3c): Ditto.
>(class vsm3me): Ditto.
>(BASE): Add BASE declaration for crypto vector.
>* config/riscv/riscv-vector-builtins-bases.h: Ditto.
>* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
>Add crypto vector intrinsic definition.
>(vbrev): Ditto.
>(vclz): Ditto.
>(vctz): Ditto.
>(vwsll): Ditto.
>(vandn): Ditto.
>(vbrev8): Ditto.
>(vrev8): Ditto.
>(vrol): Ditto.
>(vror): Ditto.
>(vclmul): Ditto.
>(vclmulh): Ditto.
>(vghsh): Ditto.
>(vgmul): Ditto.
>(vaesef): Ditto.
>(vaesem): Ditto.
>(vaesdf): Ditto.
>(vaesdm): Ditto.
>(vaesz): Ditto.
>(vaeskf1): Ditto.
>(vaeskf2): Ditto.
>(vsha2ms): Ditto.
>(vsha2ch): Ditto.
>(vsha2cl): Ditto.
>(vsm4k): Ditto.
>(vsm4r): Ditto.
>(vsm3me): Ditto.
>(vsm3c): Ditto.
>* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
>Add new function_shape for crypto vector.
>(struct crypto_vi_def): Ditto.
>(struct crypto_vv_no_op_type_def): Ditto.
>(SHAPE): Add SHAPE declaration of crypto vector.
>* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
>* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
>Add new data type for crypto vector.
>(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
>(vuint32mf2_t): Ditto.
>(vuint32m1_t): Ditto.
>(vuint32m2_t): Ditto.
>(vuint32m4_t): Ditto.
>(vuint32m8_t): Ditto.
>(vuint64m1_t): Ditto.
>(vuint64m2_t): Ditto.
>(vuint64m4_t): Ditto.
>(vuint64m8_t): Ditto.
>* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
>Add new data struct for crypto vector.
>(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
>(registered_function::overloaded_hash): Processing size_t uimm for C
>overloaded func.
>* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
>---
>.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
>.../riscv/riscv-vector-builtins-bases.h   |  28 ++
>.../riscv/riscv-vector-builtins-functions.def |  94 +++
>.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
>.../riscv/riscv-vector-builtins-shapes.h  |   4 +
>.../riscv/riscv-vector-builtins-types.def |  25 ++
>gcc/config/riscv/riscv-vector-builtins.cc | 133 -
>gcc/config/riscv/riscv-vector-builtins.def    |   1 +
>gcc/config/riscv/riscv-vector-builtins.h  |   8 +
>9 files changed, 641 insertions(+), 3 deletions(-)
>
>diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>index d70468542ee..d12bb89f91c 100644
>--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
>+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
>@@ -2127,6 +2127,212 @@ public:
>   }
>};
>+/* Below implements are vector crypto */
>+/* Implements vandn.[vv,vx] */
>+class vandn : public function_base
>+{
>+public:
>+  rtx expand (function_expander &e) const override
>+  {
>+    switch (e.op_info->op)
>+  {
>+  case OP_TYPE_vv:
>+    return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
>+  case OP_TYPE_vx:
>+    return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
>+  default:
>+    gcc_unreachable ();
>+  }
>+  }
>+};
>+
>+/* Implements vrol/vror/clz/ctz.  */
>+template<rtx_code CODE>
>+class bitmanip : public function_base
>+{
>+public:
>+  bool apply_tail_policy_p () const override
>+  {
>+    return (CODE == CLZ

[PATCH v6 1/2] RISC-V: Add crypto vector builtin function.

2024-01-02 Thread Feng Wang

Patch v6:Remove unused code.
Patch v5:Rebase.
Patch v4:Merge crypto vector function.def into vector.
Patch v3:Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2:Optimize function_shape class for crypto_vector.

This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).

Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto. 
(class b_reverse):Ditto. 
(class vwsll):   Ditto. 
(class clmul):   Ditto. 
(class vg_nhab):  Ditto. 
(class crypto_vv):Ditto. 
(class crypto_vi):Ditto. 
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def 
(REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def 
(DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Processing size_t uimm for C 
overloaded func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
 .../riscv/riscv-vector-builtins-bases.cc  | 264 +-
 .../riscv/riscv-vector-builtins-bases.h   |  28 ++
 .../riscv/riscv-vector-builtins-functions.def |  94 +++
 .../riscv/riscv-vector-builtins-shapes.cc |  87 +-
 .../riscv/riscv-vector-builtins-shapes.h  |   4 +
 .../riscv/riscv-vector-builtins-types.def |  25 ++
 gcc/config/riscv/riscv-vector-builtins.cc | 133 -
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 8 files changed, 633 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
 };
 
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+{
+  case OP_TYPE_v:

Re: [PATCH v6 1/2] RISC-V: Add crypto vector builtin function.

2024-01-02 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-02 17:18
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v6 1/2] RISC-V: Add crypto vector builtin function.
Patch v6:Remove unused code.
Patch v5:Rebase.
Patch v4:Merge crypto vector function.def into vector.
Patch v3:Define a shape for vaesz and merge vector-crypto-types.def
 into riscv-vector-builtins-types.def.
Patch v2:Optimize function_shape class for crypto_vector.
 
This patch adds the intrinsic functions of crypto vector based on the
intrinsic doc (https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md).
 
Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc (class vandn):
Add new function_base for crypto vector.
(class bitmanip): Ditto. 
(class b_reverse):Ditto. 
(class vwsll):   Ditto. 
(class clmul):   Ditto. 
(class vg_nhab):  Ditto. 
(class crypto_vv):Ditto. 
(class crypto_vi):Ditto. 
(class vaeskf2_vsm3c):Ditto.
(class vsm3me): Ditto.
(BASE): Add BASE declaration for crypto vector.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (REQUIRED_EXTENSIONS):
Add crypto vector intrinsic definition.
(vbrev): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vwsll): Ditto.
(vandn): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
(vghsh): Ditto.
(vgmul): Ditto.
(vaesef): Ditto.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
(vsm4k): Ditto.
(vsm4r): Ditto.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add new function_shape for crypto vector.
(struct crypto_vi_def): Ditto.
(struct crypto_vv_no_op_type_def): Ditto.
(SHAPE): Add SHAPE declaration of crypto vector.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data type for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_CRYPTO_SEW32_OPS):
Add new data struct for crypto vector.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
(registered_function::overloaded_hash): Processing size_t uimm for C overloaded 
func.
* config/riscv/riscv-vector-builtins.def (vi): Add vi OP_TYPE.
---
.../riscv/riscv-vector-builtins-bases.cc  | 264 +-
.../riscv/riscv-vector-builtins-bases.h   |  28 ++
.../riscv/riscv-vector-builtins-functions.def |  94 +++
.../riscv/riscv-vector-builtins-shapes.cc |  87 +-
.../riscv/riscv-vector-builtins-shapes.h  |   4 +
.../riscv/riscv-vector-builtins-types.def |  25 ++
gcc/config/riscv/riscv-vector-builtins.cc | 133 -
gcc/config/riscv/riscv-vector-builtins.def|   1 +
8 files changed, 633 insertions(+), 3 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index d70468542ee..d12bb89f91c 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2127,6 +2127,212 @@ public:
   }
};
+/* Below implements are vector crypto */
+/* Implements vandn.[vv,vx] */
+class vandn : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vandn (e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vandn_scalar (e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
+/* Implements vrol/vror/clz/ctz.  */
+template<rtx_code CODE>
+class bitmanip : public function_base
+{
+public:
+  bool apply_tail_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool apply_mask_policy_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  bool has_merge_operand_p () const override
+  {
+return (CODE == CLZ || CODE == CTZ) ? false : true;
+  }
+  
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+{
+  case OP_TYPE_v:
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_v (CODE, e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_v_scalar (CODE, e.vector_mode ()));
+  default:
+gcc_unreachable ();
+}
+  }
+};
+
+/* Implements vbrev/vbrev8/vrev8.  */
+template
+class b_reverse : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+  retu

[PATCH v3 00/12] [GCC] arm: vld1q vst1 vst1q vst1 intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

Add vld1q, vst1, vst1q and vld1 intrinsics to the arm port.

Ezra Sitorus (12):
  [GCC] arm: vld1q_types_x2 ACLE intrinsics
  [GCC] arm: vld1q_types_x3 ACLE intrinsics
  [GCC] arm: vld1q_types_x4 ACLE intrinsics
  [GCC] arm: vst1_types_x2 ACLE intrinsics
  [GCC] arm: vst1_types_x3 ACLE intrinsics
  [GCC] arm: vst1_types_x4 ACLE intrinsics
  [GCC] arm: vst1q_types_x2 ACLE intrinsics
  [GCC] arm: vst1q_types_x3 ACLE intrinsics
  [GCC] arm: vst1q_types_x4 ACLE intrinsics
  [GCC] arm: vld1_types_x2 ACLE intrinsics
  [GCC] arm: vld1_types_x3 ACLE intrinsics
  [GCC] arm: vld1_types_x4 ACLE intrinsics

 gcc/config/arm/arm_neon.h | 2032 ++---
 gcc/config/arm/arm_neon_builtins.def  |   12 +
 gcc/config/arm/iterators.md   |6 +
 gcc/config/arm/neon.md|  249 ++
 gcc/config/arm/unspecs.md |8 +
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  176 ++
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   23 +
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   23 +
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   23 +
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  183 ++
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   24 +
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   24 +
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   24 +
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  176 ++
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   22 +
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   23 +
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   23 +
 .../gcc.target/arm/simd/vst1q_base_xN_1.c |  185 ++
 .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   24 +
 .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   24 +
 .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   24 +
 21 files changed, 3018 insertions(+), 290 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_p64_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_p64_xN_1.c

-- 
2.25.1

[PATCH v3 01/12] [GCC] arm: vld1q_types_x2 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x2 variants of the vld1q intrinsic.
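
As a rough illustration of what an _x2 load does, the portable C sketch below models a q-register as four 32-bit lanes and fills two of them from consecutive memory, without de-interleaving. The names model_int32x4_t, model_int32x4x2_t and model_vld1q_s32_x2 are hypothetical stand-ins for this note only; they are not the arm_neon.h definitions, which go through a builtin rather than memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit q-register holding four int32 lanes.  */
typedef struct { int32_t lane[4]; } model_int32x4_t;

/* Hypothetical model of int32x4x2_t: two q-registers side by side.  */
typedef struct { model_int32x4_t val[2]; } model_int32x4x2_t;

/* Model of vld1q_s32_x2: read 8 consecutive int32 values, with no
   de-interleaving, into val[0] and then val[1].  */
static model_int32x4x2_t
model_vld1q_s32_x2 (const int32_t *a)
{
  model_int32x4x2_t r;
  assert (a != NULL);
  memcpy (r.val[0].lane, a, sizeof r.val[0].lane);
  memcpy (r.val[1].lane, a + 4, sizeof r.val[1].lane);
  return r;
}
```

Under this model, elements 0..3 of the source buffer end up in val[0] and elements 4..7 in val[1].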

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
(vld1q_f16_x2, vld1q_f32_x2): New.
(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
(vld1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
* config/arm/neon.md (vld1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  67 +
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |  14 ++
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |  14 ++
 7 files changed, 247 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index cdfdb44259a..3eb41c6bdc8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10403,6 +10403,15 @@ vld1q_p64 (const poly64_t * __a)
   return (poly64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ extern __inline poly64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10432,6 +10441,42 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ extern __inline int8x16x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x2 (const int8_t * __a)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x2 (const int16_t * __a)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x2 (const int32_t * __a)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x2 (const int64_t * __a)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10448,6 +10493,26 @@ vld1q_f32 (const float32_t * __a)
   return (float32x4_t)__builtin_neon_vld1v4sf ((const __builtin_neon_sf *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x2 (const float16_t * __a)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x2 (const float32_t * __a)
+{
+  union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __

[PATCH v3 03/12] [GCC] arm: vld1q_types_x4 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x4 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x4, vld1q_u16_x4, vld1q_u32_x4, vld1q_u64_x4): New.
(vld1q_s8_x4, vld1q_s16_x4, vld1q_s32_x4, vld1q_s64_x4): New.
(vld1q_f16_x4, vld1q_f32_x4): New.
(vld1q_p8_x4, vld1q_p16_x4, vld1q_p64_x4): New.
(vld1q_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vld1_x4): New entries.
* config/arm/neon.md
(neon_vld1_x4): New.
(neon_vld1x4qa, neon_vld1x4qb): New.
* config/arm/unspecs.md
(UNSPEC_VLD1X4A, UNSPEC_VLD1X4B): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Updated.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  48 +++
 gcc/config/arm/unspecs.md |   2 +
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  71 --
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   9 +-
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   9 +-
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   9 +-
 8 files changed, 263 insertions(+), 14 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 557873ac028..c03be9912f8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10421,6 +10421,15 @@ vld1q_p64_x3 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x4 (const poly64_t * __a)
+{
+  union { poly64x2x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10522,6 +10531,42 @@ vld1q_s64_x3 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x4 (const int8_t * __a)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x4 (const int16_t * __a)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x4 (const int32_t * __a)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x4 (const int64_t * __a)
+{
+  union { int64x2x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10578,6 +10623,26 @@ vld1q_f32_x3 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x4 (const float16_t * __a)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x4 (const float32_t * __a)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1q_u8 (const uint8_t * __a)
@@ -10678,6 +10743,42 @@ vld1q_u64_x3 (const uint64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline uint8x16x4_t
+__attribute__

[PATCH v3 05/12] [GCC] arm: vst1_types_x3 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x3 variants of the vst1 intrinsic.
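
For readers less familiar with the _xN stores, the portable C sketch below models what an _x3 store does: three d-register-sized vectors are written back to back, with no interleaving. The names model_int16x4_t, model_int16x4x3_t and model_vst1_s16_x3 are hypothetical and exist only for this illustration; the real intrinsic passes the tuple to a builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit d-register holding four int16 lanes.  */
typedef struct { int16_t lane[4]; } model_int16x4_t;

/* Hypothetical model of int16x4x3_t: three d-registers.  */
typedef struct { model_int16x4_t val[3]; } model_int16x4x3_t;

/* Model of vst1_s16_x3: write val[0], val[1], val[2] back to back,
   12 elements in total, with no interleaving.  */
static void
model_vst1_s16_x3 (int16_t *a, model_int16x4x3_t b)
{
  assert (a != NULL);
  for (size_t i = 0; i < 3; i++)
    memcpy (a + 4 * i, b.val[i].lane, sizeof b.val[i].lane);
}
```

The tuple is taken by value here to mirror the shape of the intrinsic signature in the patch.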

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x3, vst1_u16_x3, vst1_u32_x3, vst1_u64_x3): New.
(vst1_s8_x3, vst1_s16_x3, vst1_s32_x3, vst1_s64_x3): New.
(vst1_f16_x3, vst1_f32_x3): New.
(vst1_p8_x3, vst1_p16_x3, vst1_p64_x3): New.
(vst1_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vst1_x3): New entries.
* config/arm/neon.md (vst1_x3): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  63 +-
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
 7 files changed, 202 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 60f1077752c..e76be3516d9 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11250,6 +11250,14 @@ vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11311,6 +11319,38 @@ vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x3 (int16_t * __a, int16x4x3_t __b)
+{
+  union { int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x3 (int32_t * __a, int32x2x3_t __b)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
+{
+  union { int64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11345,6 +11385,24 @@ vst1_f32_x2 (float32_t * __a, float32x2x2_t __b)
   __builtin_neon_vst1_x2v2sf ((__builtin_neon_sf *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x3 (float16_t * __a, float16x4x3_t __b)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11405,6 +11463,38 @@ vst1_u64_x2 (uint64_t * __a, uint64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_u8_x3 (uint8_t * __a, uint8x8x3_t __b)
+{

[PATCH v3 10/12] [GCC] arm: vld1_types_x2 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x2 variants of the vld1 intrinsic.

The previous vld1_x2 builtin has been renamed to vld1q_x2, since it
operates on 4-word (128-bit) types. vld1_x2 now covers only
2-word (64-bit) types.
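
To make the word-length distinction concrete, here is a minimal portable C
model. It is a sketch only, not the arm_neon.h implementation: every
`model_*` name is a hypothetical stand-in for the corresponding ACLE type or
intrinsic, and it assumes a D register holds two 32-bit words and a Q
register holds four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for int32x2_t (D register, 2 words)
   and int32x4_t (Q register, 4 words).  */
typedef struct { int32_t lane[2]; } model_int32x2_t;
typedef struct { int32_t lane[4]; } model_int32x4_t;

/* Hypothetical stand-ins for int32x2x2_t and int32x4x2_t.  */
typedef struct { model_int32x2_t val[2]; } model_int32x2x2_t;
typedef struct { model_int32x4_t val[2]; } model_int32x4x2_t;

/* Models vld1_s32_x2: reads 4 consecutive words into two 2-word vectors.  */
static model_int32x2x2_t
model_vld1_s32_x2 (const int32_t *a)
{
  model_int32x2x2_t r;
  assert (a != NULL);
  for (int v = 0; v < 2; v++)
    for (int l = 0; l < 2; l++)
      r.val[v].lane[l] = a[v * 2 + l];
  return r;
}

/* Models vld1q_s32_x2: reads 8 consecutive words into two 4-word vectors.  */
static model_int32x4x2_t
model_vld1q_s32_x2 (const int32_t *a)
{
  model_int32x4x2_t r;
  assert (a != NULL);
  for (int v = 0; v < 2; v++)
    for (int l = 0; l < 4; l++)
      r.val[v].lane[l] = a[v * 4 + l];
  return r;
}
```

In this model the same source buffer yields 4 words through the non-q
variant and 8 words through the q variant, which is why the two need
separate builtins.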

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x2, vld1_u16_x2, vld1_u32_x2, vld1_u64_x2): New.
(vld1_s8_x2, vld1_s16_x2, vld1_s32_x2, vld1_s64_x2): New.
(vld1_f16_x2, vld1_f32_x2): New.
(vld1_p8_x2, vld1_p16_x2, vld1_p64_x2): New.
(vld1_bf16_x2): New.
(vld1q_types_x2): Updated to use vld1q_x2 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x2): Updated entries.
(vld1q_x2): New entries, taken from the old vld1_x2.
* config/arm/neon.md
(neon_vld1_x2): Updated from
neon_vld1_x2.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   7 +-
 gcc/config/arm/neon.md|  10 +-
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  66 
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |  13 ++
 7 files changed, 256 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index af1f747f262..669b8fffb40 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10307,6 +10307,15 @@ vld1_p64 (const poly64_t * __a)
   return (poly64x1_t) { *__a };
 }
 
+__extension__ extern __inline poly64x1x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10336,6 +10345,42 @@ vld1_s64 (const int64_t * __a)
   return (int64x1_t) { *__a };
 }
 
+__extension__ extern __inline int8x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x2 (const int8_t * __a)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x2 (const int16_t * __a)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x2 (const int32_t * __a)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x2 (const int64_t * __a)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10352,6 +10397,26 @@ vld1_f32 (const float32_t * __a)
   return (float32x2_t)__builtin_neon_vld1v2sf ((const __builtin_neon_sf *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x2 (const float16_t * __a)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__,

[PATCH v3 04/12] [GCC] arm: vst1_types_x2 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x2 variants of the vst1 intrinsic.
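
As a rough illustration of what an _x2 store does, here is a minimal
portable C sketch. It assumes the ACLE behaviour that the two vectors are
written back to back with no interleaving (unlike vst2). The `model_*`
names are hypothetical, and the byte array in the union is only a stand-in
for the opaque `__builtin_neon_ti` value that the real wrappers in
arm_neon.h pass to the builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for int8x8_t and int8x8x2_t.  */
typedef struct { int8_t lane[8]; } model_int8x8_t;
typedef struct { model_int8x8_t val[2]; } model_int8x8x2_t;

/* Models vst1_s8_x2: writes val[0] then val[1] to 16 consecutive bytes.
   The union mirrors the type-punning idiom of the arm_neon.h wrappers:
   the struct of vectors is reinterpreted as one opaque wide value.  */
static void
model_vst1_s8_x2 (int8_t *a, model_int8x8x2_t b)
{
  union
  {
    model_int8x8x2_t i;
    unsigned char o[sizeof (model_int8x8x2_t)];
  } bu = { b };

  assert (a != NULL);
  memcpy (a, bu.o, sizeof bu.o);
}
```

Passing the value through a union keeps the public signature in terms of
the ACLE struct type while the builtin sees a single wide operand.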

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x2, vst1_u16_x2, vst1_u32_x2, vst1_u64_x2): New.
(vst1_s8_x2, vst1_s16_x2, vst1_s32_x2, vst1_s64_x2): New.
(vst1_f16_x2, vst1_f32_x2): New.
(vst1_p8_x2, vst1_p16_x2, vst1_p64_x2): New.
(vst1_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vst1_x2): New entries.
* config/arm/neon.md (vst1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  67 ++
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |  13 ++
 7 files changed, 231 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c03be9912f8..60f1077752c 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11242,6 +11242,14 @@ vst1_p64 (poly64_t * __a, poly64x1_t __b)
   __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11271,6 +11279,38 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
   __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x2 (int8_t * __a, int8x8x2_t __b)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x2 (int16_t * __a, int16x4x2_t __b)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x2 (int32_t * __a, int32x2x2_t __b)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11287,6 +11327,24 @@ vst1_f32 (float32_t * __a, float32x2_t __b)
   __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x2 (float16_t * __a, float16x4x2_t __b)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x2 (float32_t * __a, float32x2x2_t __b)
+{
+  union { float32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11315,6 +11373,38 @@ vst1_u64 (uint64_t *

[PATCH v3 02/12] [GCC] arm: vld1q_types_x3 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1q intrinsic for the arm port. This patch adds the
_x3 variants of the vld1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x3, vld1q_u16_x3, vld1q_u32_x3, vld1q_u64_x3): New.
(vld1q_s8_x3, vld1q_s16_x3, vld1q_s32_x3, vld1q_s64_x3): New.
(vld1q_f16_x3, vld1q_f32_x3): New.
(vld1q_p8_x3, vld1q_p16_x3, vld1q_p64_x3): New.
(vld1q_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vld1_x3): New entries.
* config/arm/neon.md
(neon_vld1_x3): New.
(neon_vld1x3qa, neon_vld1x3qb): New.
* config/arm/unspecs.md
(UNSPEC_VLD1X3A, UNSPEC_VLD1X3B): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  48 +++
 gcc/config/arm/unspecs.md |   2 +
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  69 +-
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   8 +-
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   7 +-
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   7 +-
 8 files changed, 263 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3eb41c6bdc8..557873ac028 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10412,6 +10412,15 @@ vld1q_p64_x2 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x3 (const poly64_t * __a)
+{
+  union { poly64x2x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10477,6 +10486,42 @@ vld1q_s64_x2 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x16x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x3 (const int8_t * __a)
+{
+  union { int8x16x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x3 (const int16_t * __a)
+{
+  union { int16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x3 (const int32_t * __a)
+{
+  union { int32x4x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x3 (const int64_t * __a)
+{
+  union { int64x2x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10513,6 +10558,26 @@ vld1q_f32_x2 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x3 (const float16_t * __a)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x3 (const float32_t * __a)
+{
+  union { float32x4x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1q_u8 (const uint8_t * __a)
@@ -10577,6 +10642,42 @@ vld1q_u64_x2 (const uint64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline uint8

[PATCH v3 07/12] [GCC] arm: vst1q_types_x2 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x2 variants of the vst1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1q_u8_x2, vst1q_u16_x2, vst1q_u32_x2, vst1q_u64_x2): New.
(vst1q_s8_x2, vst1q_s16_x2, vst1q_s32_x2, vst1q_s64_x2): New.
(vst1q_f16_x2, vst1q_f32_x2): New.
(vst1q_p8_x2, vst1q_p16_x2, vst1q_p64_x2): New.
(vst1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vst1q_x2): New entries.
* config/arm/neon.md
(neon_vst1_x2): Updated from
neon_vst1_x2.
* config/arm/iterators.md
(VMEMX2): New mode iterator.
(VMEMX2_q): New mode attribute.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/iterators.md   |   6 +
 gcc/config/arm/neon.md|   6 +-
 .../gcc.target/arm/simd/vst1q_base_xN_1.c |  70 +++
 .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |  13 ++
 8 files changed, 233 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c9bdda39663..1c447b6d42f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11327,6 +11327,38 @@ vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x2 (int8_t * __a, int8x16x2_t __b)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x2 (int16_t * __a, int16x8x2_t __b)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x2 (int32_t * __a, int32x4x2_t __b)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x2 (int64_t * __a, int64x2x2_t __b)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11656,6 +11688,14 @@ vst1q_p64 (poly64_t * __a, poly64x2_t __b)
   __builtin_neon_vst1v2di ((__builtin_neon_di *) __a, (int64x2_t) __b);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t __b)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11701,6 +11741,24 @@ vst1q_f32 (float32_t * __a, float32x4_t __b)
   __builtin_neon_vst1v4sf ((__builtin_neon_sf *) __a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x2 (float16_t * __a, float16x8x2_t __b)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x2v8hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f32_x2 (float32_t * __a, float32x4x2_t __b)
+{
+  union { float32x4x2_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  _

[PATCH v3 09/12] [GCC] arm: vst1q_types_x4 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x4 variants of the vst1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1q_u8_x4, vst1q_u16_x4, vst1q_u32_x4, vst1q_u64_x4): New.
(vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_s64_x4): New.
(vst1q_f16_x4, vst1q_f32_x4): New.
(vst1q_p8_x4, vst1q_p16_x4, vst1q_p64_x4): New.
(vst1q_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vst1q_x4): New entries.
* config/arm/neon.md
(neon_vst1q_x4): New.
(neon_vst1x4qa, neon_vst1x4qb): New.
* config/arm/unspecs.md
(UNSPEC_VST1X4A, UNSPEC_VST1X4B): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1q_base_xN_1.c: Updated.
* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1q_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  47 
 gcc/config/arm/unspecs.md |   2 +
 .../gcc.target/arm/simd/vst1q_base_xN_1.c |  71 +--
 .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   9 +-
 .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   9 +-
 .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |  17 ++-
 8 files changed, 252 insertions(+), 18 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 5cec7dd876f..af1f747f262 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11391,6 +11391,38 @@ vst1q_s64_x3 (int64_t * __a, int64x2x3_t __b)
   __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x4 (int8_t * __a, int8x16x4_t __b)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x4 (int16_t * __a, int16x8x4_t __b)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x4 (int32_t * __a, int32x4x4_t __b)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x4 (int64_t * __a, int64x2x4_t __b)
+{
+  union { int64x2x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11736,6 +11768,14 @@ vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t __b)
   __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x4 (poly64_t * __a, poly64x2x4_t __b)
+{
+  union { poly64x2x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11817,6 +11857,24 @@ vst1q_f32_x3 (float32_t * __a, float32x4x3_t __b)
   __builtin_neon_vst1q_x3v4sf (__a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x4 (float16_t * __a, float16x8x4_t __b)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v8hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f32_x4 (float32_t * __a, float32x4x4_t __b)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __bu = { __b };
+  __builtin_neon_vst1q_x4v4sf (__a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1q_u8 (uint8_t * __a, uint8x16_t __b)
@@ -11909,6 +11967,38 @@ vst1q_u64_x3 (uint64_t * __a, uint64x2x3_t __b)
   __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu._

[PATCH v3 06/12] [GCC] arm: vst1_types_x4 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1 intrinsic for the arm port. This patch adds the
_x4 variants of the vst1 intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x4, vst1_u16_x4, vst1_u32_x4, vst1_u64_x4): New.
(vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_s64_x4): New.
(vst1_f16_x4, vst1_f32_x4): New.
(vst1_p8_x4, vst1_p16_x4, vst1_p64_x4): New.
(vst1_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vst1_x4): New entries.
* config/arm/neon.md (vst1_x4): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  62 +-
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   6 +-
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
 7 files changed, 200 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index e76be3516d9..c9bdda39663 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11258,6 +11258,14 @@ vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x4 (poly64_t * __a, poly64x1x4_t __b)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11351,6 +11359,38 @@ vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x4 (int8_t * __a, int8x8x4_t __b)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x4 (int16_t * __a, int16x4x4_t __b)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x4 (int32_t * __a, int32x2x4_t __b)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x4 (int64_t * __a, int64x1x4_t __b)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11403,6 +11443,24 @@ vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
   __builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x4 (float16_t * __a, float16x4x4_t __b)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x4 (float32_t * __a, float32x2x4_t __b)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11495,6 +11553,38 @@ vst1_u64_x3 (uint64_t * __a, uint64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_u8_x4 (uint8_t * __a, uint8x8x4_t __b)
+{

[PATCH v3 08/12] [GCC] arm: vst1q_types_x3 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vst1q intrinsic for the arm port. This patch adds the
_x3 variants of the vst1q intrinsic.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1q_u8_x3, vst1q_u16_x3, vst1q_u32_x3, vst1q_u64_x3): New.
(vst1q_s8_x3, vst1q_s16_x3, vst1q_s32_x3, vst1q_s64_x3): New.
(vst1q_f16_x3, vst1q_f32_x3): New.
(vst1q_p8_x3, vst1q_p16_x3, vst1q_p64_x3): New.
(vst1q_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vst1q_x3): New entries.
* config/arm/neon.md
(neon_vst1q_x3): New.
(neon_vst1x3qa, neon_vst1x3qb): New.
* config/arm/unspecs.md
(UNSPEC_VST1X3A, UNSPEC_VST1X3B): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1q_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  47 
 gcc/config/arm/unspecs.md |   2 +
 .../gcc.target/arm/simd/vst1q_base_xN_1.c |  68 ++-
 .../gcc.target/arm/simd/vst1q_bf16_xN_1.c |   8 +-
 .../gcc.target/arm/simd/vst1q_fp16_xN_1.c |   8 +-
 .../gcc.target/arm/simd/vst1q_p64_xN_1.c  |   8 +-
 8 files changed, 249 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 1c447b6d42f..5cec7dd876f 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11359,6 +11359,38 @@ vst1q_s64_x2 (int64_t * __a, int64x2x2_t __b)
   __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s8_x3 (int8_t * __a, int8x16x3_t __b)
+{
+  union { int8x16x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v16qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s16_x3 (int16_t * __a, int16x8x3_t __b)
+{
+  union { int16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v8hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s32_x3 (int32_t * __a, int32x4x3_t __b)
+{
+  union { int32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v4si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_s64_x3 (int64_t * __a, int64x2x3_t __b)
+{
+  union { int64x2x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
@@ -11696,6 +11728,14 @@ vst1q_p64_x2 (poly64_t * __a, poly64x2x2_t __b)
   __builtin_neon_vst1q_x2v2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_p64_x3 (poly64_t * __a, poly64x2x3_t __b)
+{
+  union { poly64x2x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11759,6 +11799,24 @@ vst1q_f32_x2 (float32_t * __a, float32x4x2_t __b)
   __builtin_neon_vst1q_x2v4sf (__a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f16_x3 (float16_t * __a, float16x8x3_t __b)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v8hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1q_f32_x3 (float32_t * __a, float32x4x3_t __b)
+{
+  union { float32x4x3_t __i; __builtin_neon_ci __o; } __bu = { __b };
+  __builtin_neon_vst1q_x3v4sf (__a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1q_u8 (uint8_t * __a, uint8x16_t __b)
@@ -11819,6 +11877,38 @@ vst1q_u64_x2 (uint64_t * __a, uint64x2x2_t __b)
   __builtin_neon_vst1q_x2v2d

[PATCH v3 11/12] [GCC] arm: vld1_types_x3 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x3 variants of the vld1 intrinsic.

The previous vld1_x3 has been updated to vld1q_x3 to take into
account that it works with 4-word-length types. vld1_x3 is now
only for 2-word-length types.
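
For readers unfamiliar with the _xN forms, here is a minimal plain-C model of the split described above. It is a sketch under stated assumptions, not the intrinsics themselves: the `model_*` names are hypothetical stand-ins for `int8x8x3_t`, `int8x16x3_t`, `vld1_s8_x3` and `vld1q_s8_x3`, and the model only shows that the three vectors are filled from consecutive memory (no de-interleaving), with 8 bytes per vector for the 2-word form and 16 bytes per vector for the 4-word form.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for int8x8x3_t (three 64-bit vectors) and
   int8x16x3_t (three 128-bit vectors).  */
typedef struct { int8_t val[3][8]; } model_int8x8x3;
typedef struct { int8_t val[3][16]; } model_int8x16x3;

/* Model of the 2-word (64-bit) form: 3 * 8 consecutive bytes.  */
static model_int8x8x3
model_vld1_s8_x3 (const int8_t *p)
{
  model_int8x8x3 r;
  assert (p != NULL);
  memcpy (r.val, p, sizeof r.val);
  return r;
}

/* Model of the 4-word (128-bit) form: 3 * 16 consecutive bytes.  */
static model_int8x16x3
model_vld1q_s8_x3 (const int8_t *p)
{
  model_int8x16x3 r;
  assert (p != NULL);
  memcpy (r.val, p, sizeof r.val);
  return r;
}
```

With a buffer holding 0, 1, 2, ... the second vector of the 64-bit model starts at element 8, while the second vector of the 128-bit model starts at element 16.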

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x3, vld1_u16_x3, vld1_u32_x3, vld1_u64_x3): New.
(vld1_s8_x3, vld1_s16_x3, vld1_s32_x3, vld1_s64_x3): New.
(vld1_f16_x3, vld1_f32_x3): New.
(vld1_p8_x3, vld1_p16_x3, vld1_p64_x3): New.
(vld1_bf16_x3): New.
(vld1q_types_x3): Updated to use vld1q_x3 from
arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x3): Updated entries.
(vld1q_x3): New entries, but comes from the old vld1_x3
* config/arm/neon.md
(neon_vld1q_x3): Updated from neon_vld1_x3.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   3 +-
 gcc/config/arm/neon.md|  12 +-
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  63 ++-
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   7 +-
 7 files changed, 232 insertions(+), 23 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 669b8fffb40..dbc37cafe28 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10316,6 +10316,15 @@ vld1_p64_x2 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x1x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x3 (const poly64_t * __a)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10381,6 +10390,42 @@ vld1_s64_x2 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x3 (const int8_t * __a)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x3 (const int16_t * __a)
+{
+  union { int16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x3 (const int32_t * __a)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x3 (const int64_t * __a)
+{
+  union { int64x1x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10417,6 +10462,26 @@ vld1_f32_x2 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x3 (const float16_t * __a)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f32_x3 (const float32_t * __a)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_u8 (const uint8_t * __a)
@@ -10481,6 +10546,42 @@ vld1_u6

[PATCH v3 12/12] [GCC] arm: vld1_types_x4 ACLE intrinsics

2024-01-02 Thread Ezra.Sitorus

From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN
variants of the vld1 intrinsic for the arm port. This patch adds the
_x4 variants of the vld1 intrinsic.

The previous vld1_x4 has been updated to vld1q_x4 to take into
account that it works with 4-word-length types. vld1_x4 is now
only for 2-word-length types.

ACLE documents:
https://developer.arm.com/documentation/ihi0053/latest/

ISA documents:
https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_u64_x4): New.
(vld1_s8_x4, vld1_s16_x4, vld1_s32_x4, vld1_s64_x4): New.
(vld1_f16_x4, vld1_f32_x4): New.
(vld1_p8_x4, vld1_p16_x4, vld1_p64_x4): New.
(vld1_bf16_x4): New.
(vld1q_types_x4): Updated to use vld1q_x4
from arm_neon_builtins.def
* config/arm/arm_neon_builtins.def
(vld1_x4): Updated entries.
(vld1q_x4): New entries, but comes from the old vld1_x4
* config/arm/neon.md
(neon_vld1q_x4): Updated from neon_vld1_x4.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1_base_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_bf16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_fp16_xN_1.c: Updated.
* gcc.target/arm/simd/vld1_p64_xN_1.c: Updated.
---
 gcc/config/arm/arm_neon.h | 156 --
 gcc/config/arm/arm_neon_builtins.def  |   3 +-
 gcc/config/arm/neon.md|  11 +-
 .../gcc.target/arm/simd/vld1_base_xN_1.c  |  63 ++-
 .../gcc.target/arm/simd/vld1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vld1_p64_xN_1.c   |   7 +-
 7 files changed, 231 insertions(+), 23 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index dbc37cafe28..8bcf1d6325e 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10325,6 +10325,15 @@ vld1_p64_x3 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x1x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_p64_x4 (const poly64_t * __a)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10426,6 +10435,42 @@ vld1_s64_x3 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s8_x4 (const int8_t * __a)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s16_x4 (const int16_t * __a)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s32_x4 (const int32_t * __a)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x1x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_s64_x4 (const int64_t * __a)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x4_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10482,6 +10527,26 @@ vld1_f32_x3 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f16_x4 (const float16_t * __a)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1_f32_x4 (const float32_t * __a)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1_u8 (const uint8_t * __a)
@@ -10582,6 +10647,42 @@ vld1_u6

Re: [PATCH v5 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-02 Thread juzhe.zh...@rivai.ai

Move all tests into gcc.target/riscv/rvv/base

All of these:
#include <riscv_vector.h>

change them into:
#include "riscv_vector.h"



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-02 15:47
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v5 2/2] RISC-V: Add crypto vector api-testing cases.
Patch v5: Rebase.
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
 
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/zvbb-intrinsic.c: New test.
* gcc.target/riscv/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/zvbc-intrinsic.c: New test.
* gcc.target/riscv/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/zvkb.c: New test.
* gcc.target/riscv/zvkg-intrinsic.c: New test.
* gcc.target/riscv/zvkned-intrinsic.c: New test.
* gcc.target/riscv/zvknha-intrinsic.c: New test.
* gcc.target/riscv/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/zvksed-intrinsic.c: New test.
* gcc.target/riscv/zvksh-intrinsic.c: New test.
---
.../gcc.target/riscv/zvbb-intrinsic.c | 179 ++
.../riscv/zvbb_vandn_vx_constraint.c  |  15 ++
.../gcc.target/riscv/zvbc-intrinsic.c |  62 ++
.../gcc.target/riscv/zvbc_vx_constraint-2.c   |  14 ++
.../gcc.target/riscv/zvbc_vx_constraint.c |  14 ++
gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
.../gcc.target/riscv/zvkg-intrinsic.c |  24 +++
.../gcc.target/riscv/zvkned-intrinsic.c   | 105 ++
.../gcc.target/riscv/zvknha-intrinsic.c   |  33 
.../gcc.target/riscv/zvknhb-intrinsic.c   |  33 
.../gcc.target/riscv/zvksed-intrinsic.c   |  33 
.../gcc.target/riscv/zvksh-intrinsic.c|  24 +++
12 files changed, 549 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvbb_vandn_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc_vx_constraint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvbc_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkg-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkned-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvknha-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvknhb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvksed-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvksh-intrinsic.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
new file mode 100644
index 000..7d436d2a43c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include <riscv_vector.h>
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u16mf4(vs2, vl);
+}
+
+vuint32m1_t test_vbrev8_v_u32m1_m(vbool32_t mask, vuint32m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u32m1_m(mask, vs2, vl);
+}
+
+vuint64m1_t test_vbrev8_v_u64m1_tumu(vbool64_t mask, vuint64m1_t maskedoff, vuint64m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u64m1_tumu(mask,

Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-02 Thread Richard Sandiford

Segher Boessenkool  writes:
> Hi!
>
> On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote:
>> This patch adds a combine pass that runs late in the pipeline.
>
> But it is not.  It is a completely new thing, and much closer to
> fwprop than to combine, too.

Well, it is a combine pass.  It's not a new instance of the pass in
combine.cc, but I don't think that's the implication.  We already have
two combine passes: the combine.cc one and the postreload one.
Similarly we have at least two DCE implementations.

> Could you rename it to something else, please?  Something less confusing
> to both users and maintainers :-)

Do you have any suggestions?

>> There are two instances: one between combine and split1,
>
> So, what kind of things does this do that the real combine does not?
> And, same question but for fwprop.  That would be the crucial motivation
> for why we want to have this new pass at all :-)

You've already responded (below) to the part where I explained that.

>> The pass currently has a single objective: remove definitions by
>> substituting into all uses.
>
> The easy case ;-)

And yet it's a case that no existing pass handles. :)  That's why I'm
trying to add something that does.

Richard

Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-02 Thread joshua


+  if (TARGET_XTHEADVECTOR)
+  {
+ emit_insn (gen_pred_th_whole_mov (mode, dest, src,
+   RVV_VLMAX, GEN_INT(VLMAX)));
+ return true;
+  }

Move it outside legitimize_move.
It should be:

if (TARGET_XTHEADVECTOR)
  {
    emit_th_move...
    DONE;
  }
else if (riscv_vector::legitimize_move (operands[0], &operands[1]))
  DONE;

If we move emit_insn (gen_pred_th_whole_mov...) outside legitimize_move,
we need to modify every pattern that calls legitimize_move ().
I have tried this, and it results in 5 extra changes in vector.md.





--
From: juzhe.zh...@rivai.ai 
Sent: Tuesday, January 2, 2024 10:00
To: "cooper.joshua"; 
"gcc-patches"
Cc: Jim Wilson; palmer; 
andrew; "philipp.tomsich"; 
jeffreyalaw; 
"christoph.muellner"; 
"cooper.joshua"; 
jinma; "cooper.qu"
Subject: Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector


+   if (TARGET_XTHEADVECTOR)
+    return false;



Move it to :
  if (TARGET_VECTOR && stringop_strategy & STRATEGY_VECTOR)
    {
      bool ok = riscv_vector::expand_block_move (dest, src, length);
      if (ok)
  return true;
    }





(define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
-   (match_operand 0 "const_csr_operand")))
+  (and (match_test "!TARGET_XTHEADVECTOR || rtx_equal_p (op, const0_rtx)")
+    (match_operand 0 "const_csr_operand"



It's hard to trace. Change it into:


(ior
1. TARGET_XTHEADVECTOR && rtx_equal_p (op, const0_rtx)
2. !TARGET_XTHEADVECTOR && const_csr_operand)
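
To check that the two formulations of the constant case accept the same values, the logic can be modelled in plain C. This is only a sketch: it assumes that const_csr_operand accepts integer constants in the range 0 to 31, it leaves out the pmode_register_operand alternative (which is unchanged in both forms), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: const_csr_operand accepts integer constants in [0, 31].  */
static bool
is_const_csr (long v)
{
  return v >= 0 && v <= 31;
}

/* Form used in the patch: (!XTHEADVECTOR || op == 0) && const_csr.  */
static bool
vl_const_ok_patch (bool xthead, long v)
{
  return (!xthead || v == 0) && is_const_csr (v);
}

/* Form suggested in the review: an explicit two-way split.  */
static bool
vl_const_ok_review (bool xthead, long v)
{
  return (xthead && v == 0) || (!xthead && is_const_csr (v));
}

/* Return true if both forms agree for every value in [lo, hi] and for
   both settings of the target flag.  */
static bool
forms_agree (long lo, long hi)
{
  for (long v = lo; v <= hi; v++)
    if (vl_const_ok_patch (false, v) != vl_const_ok_review (false, v)
        || vl_const_ok_patch (true, v) != vl_const_ok_review (true, v))
      return false;
  return true;
}
```

Under that assumption the two forms are equivalent, because zero is itself inside the accepted constant range; the review version only makes the two target cases easier to read.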


+  if (TARGET_XTHEADVECTOR)
+  {
+   emit_insn (gen_pred_th_whole_mov (mode, dest, src,
+     RVV_VLMAX, GEN_INT(VLMAX)));
+   return true;
+  }



Move it outside legitimize_move.
It should be:


if (TARGET_XTHEADVECTOR)
  {
    emit_th_move...
    DONE;
  }
else if (riscv_vector::legitimize_move (operands[0], &operands[1]))
  DONE;




vsetvli issues:
I wonder whether we can use ASM_OUTPUT_OPCODE to recognize 
"ta,ma"/"ta,mu"/"tu,ma"/"tu,mu" and replace these 4 variants
with "", so that the vsetvli ASM string carries no tail policy or mask policy.


An alternative approach is to change the vsetvli ASM rule:


"vset%i1vli\t%0,%1,e%2,%m3,t%p4,m%p5"


if (TARGET_XTHEADVECTOR)
...
else
        else if (code == CONST_INT)
          {
            /* Tail && Mask policy.  */
            asm_fprintf (file, "%s", IS_AGNOSTIC (UINTVAL (op)) ? "a" : "u");
          }



in riscv.cc.


The benefit is that we can avoid adding all th_vsetvl patterns and invasive 
code changes in the VSETVL pass.
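
As a rough illustration of the second approach, the sketch below formats a vsetvli string with or without the policy suffixes depending on a target flag. It is not riscv.cc code: `format_vsetvli` and its parameters are hypothetical, and the assumption that the XTheadVector form is simply the "th."-prefixed mnemonic with the tail and mask policies dropped is taken from the discussion above rather than checked against the final implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a vsetvli instruction into BUF (size N).
   For the standard vector extension the tail/mask policies are printed
   as "ta"/"tu" and "ma"/"mu"; for XTheadVector (assumed here) they are
   omitted and the mnemonic gets a "th." prefix.  Returns what snprintf
   returns.  */
static int
format_vsetvli (char *buf, size_t n, bool xthead, const char *rd,
                const char *avl, int sew, const char *lmul,
                bool tail_agnostic, bool mask_agnostic)
{
  assert (buf != NULL && n > 0);
  if (xthead)
    return snprintf (buf, n, "th.vsetvli\t%s,%s,e%d,%s",
                     rd, avl, sew, lmul);
  return snprintf (buf, n, "vsetvli\t%s,%s,e%d,%s,t%c,m%c",
                   rd, avl, sew, lmul,
                   tail_agnostic ? 'a' : 'u',
                   mask_agnostic ? 'a' : 'u');
}
```

Keeping the decision in one printing routine is what would let the existing vsetvl patterns be shared instead of duplicated as th_vsetvl patterns.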




juzhe.zh...@rivai.ai

 
From: Jun Sha (Joshua)
Date: 2023-12-29 12:21
To: gcc-patches
CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
Subject: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that map directly onto current
RVV1.0 instructions by simply adding the "th." prefix. For xtheadvector
instructions that have different names but share the same patterns as
RVV1.0 instructions, we will use an ASM target hook to rewrite the whole
string of the instructions in the following patches. 
 
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r and vsext.vf2.
 
gcc/ChangeLog:
 
* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (legitimize_move):
New expansion.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Like

RE: skip vector profiles multiple exits

2024-01-02 Thread Tamar Christina

> -Original Message-
> From: Jan Hubicka 
> Sent: Friday, December 29, 2023 10:32 PM
> To: Tamar Christina 
> Cc: rguent...@suse.de; GCC Patches ; nd
> 
> Subject: Re: skip vector profiles multiple exits
> 
> > Hi Honza,
> Hi,
> >
> > I wasn't sure what to do here so I figured I'd ask.
> >
> > In adding support for multiple exits to the vectorizer I didn't know how to
> > update this bit:
> >
> > https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-vect-loop-manip.cc#L3363
> >
> > Essentially, if skip_vector (i.e. not enough iterations to enter the vector
> > loop) then the previous code would update the new probability to be the same
> > as that of the exit edge.  This made sense because that's the only edge which
> > could bring you to the next loop preheader.
> >
> > With multiple exits this is no longer the case since any exit can bring you
> > to the preheader node.  I figured the new counts should simply be the sum of
> > all exit edges.  But that gives quite large count values compared to the rest
> > of the loop.
> The sum of all exit counts (not probabilities) relative to header count should
> give you estimated probability that the loop iterates at any given
> iteration.  I am not sure how good estimate this is for loop
> preconditioning to be true (without profile histograms it is really hard
> to tell).
Happy New Year!

Ah, so I need to subtract the loop header from the sum? I'll try 😊

> >
> > I then thought I would need to scale the counts by the probability of the
> > edge being taken.  The problem here is that the probabilities don't add up
> > to 100%
> 
> So you are summing exit_edge->count ()?
> I am not sure how useful summing probabilities would be, since they are
> conditional (relative to the probability of entering the BB you go to).
> How complicated a CFG do we now handle with vectorization?
> 

Yeah, I was trying to sum the edge counts.  The CFG can get quite complicated
because we allow vectorization of any arbitrary number of exits as long as
that exit leaves the loop body.

In this current version we force everything to the scalar epilog, so the merge
block can get any number of incoming edges now.  Aside from this we still
support versioning and skip_epilog so you have the additional edges coming
in from there too.

Regards,
Tamar

> Honza
> >
> > so the scaled counts also looked kinda wonky.   Any suggestions?
> >
> > If you want some small examples to look at, testcases
> > ./gcc/testsuite/gcc.dg/vect/vect-early-break_90.c to
> > ./gcc/testsuite/gcc.dg/vect/vect-early-break_93.c
> > should be relevant here.
> >
> > Thanks,
> > Tamar

Ping: Re: [PATCH] libiberty/buildargv: POSIX behaviour for backslash handling

2024-01-02 Thread Andrew Burgess



Ping!

Thanks,
Andrew

Andrew Burgess  writes:

> GDB makes use of the libiberty function buildargv for splitting the
> inferior (program being debugged) argument string in the case where
> the inferior is not being started under a shell.
>
> I have recently been working to improve this area of GDB, and have
> tracked down some of the unexpected behaviour to the libiberty
> function buildargv, and how it handles backslash escapes.
>
> For reference, I've been mostly reading:
>
>   https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html
>
> The issues that I would like to fix are:
>
>   1. Backslashes within single quotes should not be treated as an
>   escape, thus: '\a' should split to \a, retaining the backslash.
>
>   2. Backslashes within double quotes should only act as an escape if
>   they are immediately before one of the characters $ (dollar),
>   ` (backtick), " (double quote), \ (backslash), or \n (newline).  In
>   all other cases a backslash should not be treated as an escape
>   character.  Thus: "\a" should split to \a, but "\$" should split to
>   $.
>
>   3. A backslash-newline sequence should be treated as a line
>   continuation, both the backslash and the newline should be removed.
>
> I've updated libiberty and also added some tests.  All the existing
> libiberty tests continue to pass, but I'm not sure if there is more
> testing that should be done.  buildargv is used within lto-wrapper.cc,
> so maybe there's some testing that folks can suggest I run?
> ---
>  libiberty/argv.c  |  8 +--
>  libiberty/testsuite/test-expandargv.c | 34 +++
>  2 files changed, 40 insertions(+), 2 deletions(-)
>
> diff --git a/libiberty/argv.c b/libiberty/argv.c
> index c2823d3e4ba..6bae4ca2ee9 100644
> --- a/libiberty/argv.c
> +++ b/libiberty/argv.c
> @@ -224,9 +224,13 @@ char **buildargv (const char *input)
> if (bsquote)
>   {
> bsquote = 0;
> -   *arg++ = *input;
> +   if (*input != '\n')
> + *arg++ = *input;
>   }
> -   else if (*input == '\\')
> +   else if (*input == '\\'
> +&& !squote
> +&& (!dquote
> +|| strchr ("$`\"\\\n", *(input + 1)) != NULL))
>   {
> bsquote = 1;
>   }
> diff --git a/libiberty/testsuite/test-expandargv.c b/libiberty/testsuite/test-expandargv.c
> index 30f2337ef77..b8dcc6a269a 100644
> --- a/libiberty/testsuite/test-expandargv.c
> +++ b/libiberty/testsuite/test-expandargv.c
> @@ -142,6 +142,40 @@ const char *test_data[] = {
>"b",
>0,
>  
> +  /* Test 7 - No backslash removal within single quotes.  */
> +  "'a\\$VAR' '\\\"'",/* Test 7 data */
> +  ARGV0,
> +  "@test-expandargv-7.lst",
> +  0,
> +  ARGV0,
> +  "a\\$VAR",
> +  "\\\"",
> +  0,
> +
> +  /* Test 8 - Remove backslash / newline pairs.  */
> +  "\"ab\\\ncd\" ef\\\ngh",/* Test 8 data */
> +  ARGV0,
> +  "@test-expandargv-8.lst",
> +  0,
> +  ARGV0,
> +  "abcd",
> +  "efgh",
> +  0,
> +
> +  /* Test 9 - Backslash within double quotes.  */
> +  "\"\\$VAR\" \"\\`\" \"\\\"\" \"\\\\\" \"\\n\" \"\\t\"",/* Test 9 data */
> +  ARGV0,
> +  "@test-expandargv-9.lst",
> +  0,
> +  ARGV0,
> +  "$VAR",
> +  "`",
> +  "\"",
> +  "\\",
> +  "\\n",
> +  "\\t",
> +  0,
> +
>0 /* Test done marker, don't remove. */
>  };
>  
>
> base-commit: 458e7c937924bbcef80eb006af0b61420dbfc1c1
> -- 
> 2.25.4
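
The three rules listed in the patch description can be sketched as a small standalone tokenizer. This is an illustrative reimplementation under stated assumptions, not libiberty's buildargv: `next_arg` is a hypothetical helper that extracts one argument at a time into a caller-supplied buffer, and it silently truncates arguments that do not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not libiberty's buildargv: copy the next argument of
   INPUT into OUT (capacity CAP, including the terminating NUL) and return a
   pointer just past it, or NULL when no argument is left.  */
static const char *
next_arg (const char *input, char *out, size_t cap)
{
  int squote = 0, dquote = 0;
  size_t len = 0;

  assert (input != NULL && out != NULL && cap > 0);
  while (*input == ' ' || *input == '\t' || *input == '\n')
    input++;
  if (*input == '\0')
    return NULL;

  for (; *input != '\0'; input++)
    {
      char c = *input;

      /* Unquoted whitespace ends the argument.  */
      if (!squote && !dquote && (c == ' ' || c == '\t' || c == '\n'))
        break;

      if (c == '\\' && !squote
          && (!dquote
              || (input[1] != '\0' && strchr ("$`\"\\\n", input[1]) != NULL)))
        {
          /* Active escape: rule 1 excludes single quotes, rule 2 limits
             the escapable characters inside double quotes.  */
          input++;
          if (*input == '\0')
            break;              /* Trailing backslash: nothing to escape.  */
          if (*input == '\n')
            continue;           /* Rule 3: line continuation, drop both.  */
          c = *input;
        }
      else if (c == '\'' && !dquote)
        {
          squote = !squote;
          continue;
        }
      else if (c == '"' && !squote)
        {
          dquote = !dquote;
          continue;
        }

      if (len + 1 < cap)        /* Silently truncate overlong arguments.  */
        out[len++] = c;
    }
  out[len] = '\0';
  return input;
}
```

Calling `next_arg` repeatedly, each time passing the pointer it returned, walks through the arguments in the same way the new test data does: the Test 7 input yields `a\$VAR` and `\"`, and the Test 8 input yields `abcd` and `efgh`.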

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-02 Thread Vaseeharan Vinayagamoorthy

Hi Lipeng,

It looks like your draft patch to fix the builds for arm-none-eabi target is 
not merged yet, because our arm-none-eabi builds are still broken. Are you 
waiting for additional information, or would you be able to fix this issue?

Kind regards,
Vasee

From: Richard Earnshaw 
Sent: 15 December 2023 19:23
To: Lipeng Zhu ; Richard Earnshaw 
; ja...@redhat.com 
Cc: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org 
; hongjiu...@intel.com ; 
pan.d...@intel.com ; rep.dot@gmail.com 
; tianyou...@intel.com ; 
tkoe...@netcologne.de ; wangyang@intel.com 

Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock



On 15/12/2023 11:31, Lipeng Zhu wrote:
>
>
> On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:
>> On 09/12/2023 15:39, Lipeng Zhu wrote:
>>> This patch introduces an rwlock and splits the read/write access to the
>>> unit_root tree and unit_cache, using the rwlock instead of the mutex to
>>> increase CPU efficiency. In the get_gfc_unit function, the percentage
>>> of calls that step into the insert_unit function is around 30%; in most
>>> instances, we can get the unit while reading the unit_cache or unit_root
>>> tree. So splitting the read/write phases with an rwlock is an approach to
>>> make it more parallel.
>>>
>>> BTW, the IPC metrics can gain around 9x in our test
>>> server with 220 cores. The benchmark we used is
>>> https://github.com/rwesson/NEAT
>>>
>>> libgcc/ChangeLog:
>>>
>>> * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
>>> (__gthrw): New function.
>>> (__gthread_rwlock_rdlock): New function.
>>> (__gthread_rwlock_tryrdlock): New function.
>>> (__gthread_rwlock_wrlock): New function.
>>> (__gthread_rwlock_trywrlock): New function.
>>> (__gthread_rwlock_unlock): New function.
>>>
>>> libgfortran/ChangeLog:
>>>
>>> * io/async.c (DEBUG_LINE): New macro.
>>> * io/async.h (RWLOCK_DEBUG_ADD): New macro.
>>> (CHECK_RDLOCK): New macro.
>>> (CHECK_WRLOCK): New macro.
>>> (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
>>> (IN_RWLOCK_DEBUG_QUEUE): New macro.
>>> (RDLOCK): New macro.
>>> (WRLOCK): New macro.
>>> (RWUNLOCK): New macro.
>>> (RD_TO_WRLOCK): New macro.
>>> (INTERN_RDLOCK): New macro.
>>> (INTERN_WRLOCK): New macro.
>>> (INTERN_RWUNLOCK): New macro.
>>> * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
>>> a comment.
>>> (unit_lock): Remove including associated internal_proto.
>>> (unit_rwlock): New declarations including associated internal_proto.
>>> (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
>>> instead of __gthread_mutex_lock and __gthread_mutex_unlock on
>>> unit_lock.
>>> * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
>>> unit_rwlock instead of LOCK and UNLOCK on unit_lock.
>>> (st_write_done_worker): Likewise.
>>> * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
>>> comment. Use unit_rwlock variable instead of unit_lock variable.
>>> (get_gfc_unit_from_unit_root): New function.
>>> (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
>>> instead of LOCK and UNLOCK on unit_lock.
>>> (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
>>> LOCK and UNLOCK on unit_lock.
>>> (close_units): Likewise.
>>> (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
>>> unit_lock.
>>> * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
>>> instead of LOCK and UNLOCK on unit_lock.
>>> (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
>>> of LOCK and UNLOCK on unit_lock.
>>>
>>
>> It looks like this has broken builds on arm-none-eabi when using newlib:
>>
>> In file included from
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran
>> /runtime/error.c:27:
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In
>> function
>> ‘dec_waiting_unlocked’:
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: 
>> error
>> : implicit declaration of function ‘WRLOCK’
>> [-Wimplicit-function-declaration]
>>   1023 |   WRLOCK (&unit_rwlock);
>>|   ^~
>> /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: 
>> error
>> : implicit declaration of function ‘RWUNLOCK’
>> [-Wimplicit-function-declaration]
>>   1025 |   RWUNLOCK (&unit_rwlock);
>>|   ^~~~
>>
>>
>> R.
>
> Hi Richard,
>
> The root cause is that the macros WRLOCK and RWUNLOCK are not defined in
> io.h. The x86 build did not fail because HAVE_ATOMIC_FETCH_ADD is
> defined there, so those macros are never used. The code logic is shown
> below:
> #ifdef HAVE_ATOMIC_FETCH_ADD
>(void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
> #else
>WRLOCK (&unit_rwlock);
>u->waiting--;
>RWUNLOCK (&unit_rwlock);
> #endif
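
The conditional above can be sketched as a standalone file. This is only an illustration: the struct, the function name dec_waiting and the pthread-based WRLOCK/RWUNLOCK definitions are assumptions, not the libgfortran sources. HAVE_ATOMIC_FETCH_ADD is defined by hand here to mimic a target with atomics, which is why a missing WRLOCK definition would go unnoticed on such a target.

```c
/* Assumption: stands in for the configure-time macro; defining it selects
   the atomic branch, as on x86.  Remove it to compile the lock branch.  */
#define HAVE_ATOMIC_FETCH_ADD 1

#ifndef HAVE_ATOMIC_FETCH_ADD
/* Hypothetical fallback definitions, only needed without atomics.  */
#include <pthread.h>
static pthread_rwlock_t unit_rwlock = PTHREAD_RWLOCK_INITIALIZER;
#define WRLOCK(l) pthread_rwlock_wrlock (l)
#define RWUNLOCK(l) pthread_rwlock_unlock (l)
#endif

/* Minimal stand-in for the unit structure; only the counter is modeled.  */
struct unit
{
  int waiting;
};

/* Decrement the waiting counter, atomically if possible, else under the
   write lock.  */
static void
dec_waiting (struct unit *u)
{
#ifdef HAVE_ATOMIC_FETCH_ADD
  (void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
  WRLOCK (&unit_rwlock);
  u->waiting--;
  RWUNLOCK (&unit_rwlock);
#endif
}
```

With the macro defined, the preprocessor discards the #else branch before the compiler sees it, so an undeclared WRLOCK there cannot produce a diagnostic.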
>
> I just draft a patch try to fix this bug, because I didn't have arm
> p

[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-02 Thread Jun Sha (Joshua)

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of XTheadVector instructions that can be derived directly
from the current RVV 1.0 instructions by simply adding the "th." prefix.
For XTheadVector instructions that have different names but share the
same patterns as RVV 1.0 instructions, we will use an ASM target hook
to rewrite the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that XTheadVector does not support,
such as vmv1r and vsext.vf2.
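
As a toy illustration of the "th." prefix idea only: the helper below is hypothetical and is not how the patch or GCC implements the renaming. It assumes that any mnemonic starting with 'v' is a vector instruction that takes the prefix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the XTheadVector spelling of MNEMONIC into
   BUF (at most SIZE bytes).  Mnemonics starting with 'v' get a "th."
   prefix; everything else is copied unchanged.  Returns what snprintf
   returns.  */
static int
th_mnemonic (char *buf, size_t size, const char *mnemonic)
{
  if (mnemonic[0] == 'v')
    return snprintf (buf, size, "th.%s", mnemonic);
  return snprintf (buf, size, "%s", mnemonic);
}
```

Instructions without a same-pattern counterpart cannot be handled this way, which is why the patch disables those patterns instead.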

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   6 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 +++--
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  |  69 +++
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  55 --
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 16 files changed, 425 insertions(+), 208 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o 
riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles 
\$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 riscv_vector::expand_rawmemchr(mode, operands[0], operands[1],
   operands[2]);
diff --git a/gcc/config/riscv/predicates.md b/gcc/con

[PATCH v3] RISC-V: Bugfix for doesn't honor no-signed-zeros option

2024-01-02 Thread pan2 . li

From: Pan Li 

According to the semantics of the no-signed-zeros option, a backend
like RISC-V should treat the minus zero -0.0f as plus zero 0.0f.

Consider below example with option -fno-signed-zeros.

void
test (float *a)
{
  *a = -0.0;
}

We will generate code as below, which doesn't treat the minus zero
as plus zero.

test:
  lui  a5,%hi(.LC0)
  flw  fa5,%lo(.LC0)(a5)
  fsw  fa5,0(a0)
  ret

.LC0:
  .word -2147483648 // aka -0.0 (0x80000000 in hex)

This patch fixes the bug and treats the minus zero -0.0
as plus zero, aka +0.0. Thus after this patch we will have asm code
as below for the above sample code.

test:
  sw zero,0(a0)
  ret
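
To see why the constant pool entry is -2147483648, the bit pattern of -0.0f can be inspected directly. The sketch below assumes float is IEEE 754 binary32; is_any_zero is a hypothetical helper for illustration, not the riscv_float_const_zero_rtx_p function from the patch.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of F (assumes a 32-bit IEEE 754 float).  */
static uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Hypothetical helper: nonzero if F is +0.0 or -0.0, i.e. every bit
   except the sign bit is clear.  */
static int
is_any_zero (float f)
{
  return (float_bits (f) & 0x7fffffffu) == 0;
}
```

Only the sign bit distinguishes the two zeros, and they compare equal, so under -fno-signed-zeros storing integer zero is an acceptable replacement.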

This patch also fixes the run failure of the test case pr30957-1.c. The
tests below pass with this patch.

* The riscv regression tests.
* The pr30957-1.c run tests.

gcc/ChangeLog:

* config/riscv/constraints.md: Leverage func 
riscv_float_const_zero_rtx_p
for predicating the rtx is const zero float or not.
* config/riscv/predicates.md: Ditto.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
(riscv_float_const_zero_rtx_p): New func impl for predicating the rtx is
const zero float or not.
(riscv_const_zero_rtx_p): New func impl for predicating the rtx
is const zero (both int and fp) or not.
* config/riscv/riscv-protos.h (riscv_float_const_zero_rtx_p):
New func decl.
(riscv_const_zero_rtx_p): Ditto.
* config/riscv/riscv.md: Making sure the operand[1] of movfp is
CONST0_RTX when the operand[1] is const zero float.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/no-signed-zeros-0.c: New test.
* gcc.target/riscv/no-signed-zeros-1.c: New test.
* gcc.target/riscv/no-signed-zeros-2.c: New test.
* gcc.target/riscv/no-signed-zeros-3.c: New test.
* gcc.target/riscv/no-signed-zeros-4.c: New test.
* gcc.target/riscv/no-signed-zeros-5.c: New test.
* gcc.target/riscv/no-signed-zeros-run-0.c: New test.
* gcc.target/riscv/no-signed-zeros-run-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/constraints.md   |  2 +-
 gcc/config/riscv/predicates.md|  2 +-
 gcc/config/riscv/riscv-protos.h   |  2 +
 gcc/config/riscv/riscv.cc | 35 -
 gcc/config/riscv/riscv.md | 49 ---
 .../gcc.target/riscv/no-signed-zeros-0.c  | 26 ++
 .../gcc.target/riscv/no-signed-zeros-1.c  | 28 +++
 .../gcc.target/riscv/no-signed-zeros-2.c  | 26 ++
 .../gcc.target/riscv/no-signed-zeros-3.c  | 28 +++
 .../gcc.target/riscv/no-signed-zeros-4.c  | 26 ++
 .../gcc.target/riscv/no-signed-zeros-5.c  | 28 +++
 .../gcc.target/riscv/no-signed-zeros-run-0.c  | 36 ++
 .../gcc.target/riscv/no-signed-zeros-run-1.c  | 36 ++
 13 files changed, 314 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/no-signed-zeros-run-1.c

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index de4359af00d..db1d5e1385f 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -108,7 +108,7 @@ (define_constraint "DnS"
 (define_constraint "G"
   "@internal"
   (and (match_code "const_double")
-   (match_test "op == CONST0_RTX (mode)")))
+   (match_test "riscv_float_const_zero_rtx_p (op)")))
 
 (define_memory_constraint "A"
   "An address that is held in a general-purpose register."
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index b87a6900841..b428d842101 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -78,7 +78,7 @@ (define_predicate "sleu_operand"
 
 (define_predicate "const_0_operand"
   (and (match_code "const_int,const_wide_int,const_double,const_vector")
-   (match_test "op == CONST0_RTX (GET_MODE (op))")))
+   (match_test "riscv_const_zero_rtx_p (op)")))
 
 (define_predicate "const_1_operand"
   (and (match_code "const_int,const_wide_int,const_vector")
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..fcf30e084a3 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -131,6 +131,8 @@ extern void riscv_asm_output_external (FILE *, const tree, 
const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int

[PATCH] libsanitizer: Enable LSan and TSan for riscv64

2024-01-02 Thread Andreas Schwab

All new (tsan) tests are working as expected.

* configure.tgt (riscv64-*-linux*): Enable LSan and TSan.
---
 libsanitizer/configure.tgt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
index d24566a2343..38fc7001ff7 100644
--- a/libsanitizer/configure.tgt
+++ b/libsanitizer/configure.tgt
@@ -72,6 +72,11 @@ case "${target}" in
   x86_64-*-solaris2.11* | i?86-*-solaris2.11*)
;;
   riscv64-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x8; then
+   TSAN_SUPPORTED=yes
+   LSAN_SUPPORTED=yes
+   TSAN_TARGET_DEPENDENT_OBJECTS=tsan_rtl_riscv64.lo
+   fi
;;
   loongarch64-*-linux*)
;;
-- 
2.43.0


-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

[PATCH, OpenACC 2.7] Implement reductions for arrays and structs

2024-01-02 Thread Chung-Lin Tang

Hi Thomas, Andrew,
this patch implements reductions for arrays and structs for OpenACC. Following 
the pattern for OpenACC reductions, this is mostly in the respective NVPTX/GCN 
backends' *_goacc_reduction_setup/init/fini/teardown hooks, particularly in the 
fini part, and [nvptx/gcn]_reduction_update routines. The code is mostly 
similar between the two targets, the main difference being the lack of vector 
mode handling in GCN.

To Julian, there is a patch to the middle-end neutering, a hack actually, that 
detects SSA_NAMEs used in reduction array MEM_REFs, and avoids single->parallel 
copying (by moving those definitions before BUILT_IN_GOACC_SINGLE_COPY_START). 
This appears to work because reductions do their own initializing of the 
private copy.

As we discussed in our internal calls, the real proper way is to create the 
private array in a more appropriate stage, but that is too long a shot for now. 
The changes here are needed at least for some -O0 cases (when under 
optimization, propagation of the private copies' local address eliminates the 
SSA_NAME and things actually just work in that case). So please bear with this 
hack.

I believe the newly added libgomp testcases should be fairly complete. Though 
note that one case of reduction of * for double arrays has been commented out 
for now, because there appears to be a (presumably) unrelated issue causing this 
case to fail (it may have to do with the loop-based atomic form used by both 
NVPTX/GCN). It should maybe be XFAILed instead of commented out. Will do this 
in the next iteration.
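
For readers unfamiliar with the feature, an array reduction might look like the sketch below. The function name, NBINS and the data clauses are assumptions chosen for illustration, not taken from the patch's testcases. Compiled without -fopenacc the pragma is ignored and the loop runs serially with the same result.

```c
#define NBINS 4

/* Hypothetical example: add each of the N samples to the total of bin
   (sample % NBINS).  ACC is the reduction variable, a whole array.  */
static void
bin_totals (const unsigned *samples, int n, unsigned totals[NBINS])
{
  unsigned acc[NBINS] = { 0 };

  /* Array reduction: each parallel unit is expected to get a private
     copy of ACC, combined element-wise with + at the end.  */
#pragma acc parallel loop reduction(+:acc) copyin(samples[0:n])
  for (int i = 0; i < n; i++)
    acc[samples[i] % NBINS] += samples[i];

  for (int b = 0; b < NBINS; b++)
    totals[b] = acc[b];
}
```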

Thanks,
Chung-Lin

2024-01-02  Chung-Lin Tang  

gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* c-typeck.cc (c_oacc_reduction_defined_type_p): New function.
(c_oacc_reduction_code_name): Likewise.
(c_finish_omp_clauses): Handle OpenACC cases using new functions.

gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_reduction): Adjustments for
OpenACC-specific cases.
* semantics.cc (cp_oacc_reduction_defined_type_p): New function.
(cp_oacc_reduction_code_name): Likewise.
(finish_omp_reduction_clause): Handle OpenACC cases using new functions.

gcc/ChangeLog:
* config/gcn/gcn-tree.cc (gcn_reduction_update): Additions for
handling ARRAY_TYPE and RECORD_TYPE reductions.
(gcn_goacc_reduction_setup): Likewise.
(gcn_goacc_reduction_init): Likewise.
(gcn_goacc_reduction_fini): Likewise.
(gcn_goacc_reduction_teardown): Likewise.

* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Properly generate
V2SI shuffle using vec_extract op.
(nvptx_get_shared_red_addr): Adjust type/alignment calculations to
use TYPE_SIZE/ALIGN_UNIT instead of machine mode based.
(nvptx_reduction_update): Additions for handling ARRAY_TYPE and
RECORD_TYPE reductions.
(nvptx_goacc_reduction_setup): Likewise.
(nvptx_goacc_reduction_init): Likewise.
(nvptx_goacc_reduction_fini): Likewise.
(nvptx_goacc_reduction_teardown): Likewise.

* omp-low.cc (scan_sharing_clauses): Adjust ARRAY_REF pointer type
building to use decl type, rather than generic ptr_type_node.
(omp_reduction_init_op): Add ARRAY_TYPE and RECORD_TYPE init op
construction.
(lower_oacc_reductions): Add code to teardown/recover array access
MEM_REF in OMP_CLAUSE_DECL, to accommodate for lookup requirements.
Adjust type/alignment calculations to use TYPE_SIZE/ALIGN_UNIT
instead of machine mode based.

* omp-oacc-neuter-broadcast.cc (worker_single_copy):
Add 'hash_set *array_reduction_base_vars' parameter.
Add xxx.

(neuter_worker_single): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust recursive calls to self and worker_single_copy.
(oacc_do_neutering): Add 'hash_set *array_reduction_base_vars'
parameter. Adjust call to neuter_worker_single.
(execute_omp_oacc_neuter_broadcast): Add local
'hash_set array_reduction_base_vars' declaration. Collect MEM_REF
base-pointer SSA_NAMEs of arrays into array_reduction_base_vars. Add
'&array_reduction_base_vars' argument to call of oacc_do_neutering.

* omp-offload.cc (default_goacc_reduction): Add unshare_expr.

gcc/testsuite/ChangeLog:
* c-c++-common/goacc/reduction-9.c: New test.
* c-c++-common/goacc/reduction-10.c: New test.
* c-c++-common/goacc/reduction-11.c: New test.
* c-c++-common/goacc/reduction-12.c: New test.
* c-c++-common/goacc/reduction-13.c: New test.

libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/reduction.h
(check_reduction_array_xx): New macro.
(operator_apply): Likewise.
(check_reduction_array_op): Likewise.
(check_reduction_arraysec_op): Likewise.
(function_

Re: [PATCH] libsanitizer: Enable LSan and TSan for riscv64

2024-01-02 Thread Jeff Law





On 1/2/24 06:56, Andreas Schwab wrote:

All new (tsan) tests are working as expected.

* configure.tgt (riscv64-*-linux*): Enable LSan and TSan.

OK
Jeff

Re: [committed] RISC-V: Modify copyright year of vector-crypto.md

2024-01-02 Thread Jeff Law





On 1/1/24 19:25, Feng Wang wrote:

gcc/ChangeLog:
* config/riscv/vector-crypto.md: Modify copyright year.
---
  gcc/config/riscv/vector-crypto.md | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector-crypto.md 
b/gcc/config/riscv/vector-crypto.md
index e40b1543954..9625014e45e 100755
--- a/gcc/config/riscv/vector-crypto.md
+++ b/gcc/config/riscv/vector-crypto.md
@@ -1,5 +1,5 @@
  ;; Machine description for the RISC-V Vector Crypto  extensions.
-;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Copyright (C) 2024 Free Software Foundation, Inc.

Please don't change Copyright notices in the future.  There's very 
specific rules around those and we do them en-masse at the start of the 
year using existing scripts and such.


jeff

Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2024-01-02 Thread Jeff Law





On 12/30/23 21:34, YunQiang Su wrote:

Right.  But that's the whole point behind avoiding the narrowing subreg
and forcing use of a truncate operation.

So basically the question becomes is there a way to modify those bits in
a way that GCC doesn't know that it needs to truncate/extend?



I guess that this code may cause some problem.
int test(int val, unsigned char c, int pos) {
   ((unsigned char*)&val)[pos+0] = c;
   return val;
}
GCC avoids using bitops, instead it uses load/store for it.
Any ISA has INSERT_CHAR_VAR instruction?
  INSERT_CHAR_VAR $rN, $rM,$rX

ISAs that can do that via load certainly exist, but none (to the best of 
my knowledge) need to extend sub-words for correctness reasons.






So I guess that  known_lt may be a better choice
if (known_lt)
 no_truncate_or_extend_needed;
else
 add_truncate_or_extend;

I think Roger's argument is these cases can't happen.

jeff

Re: [PATCH] Improved RTL expansion of field assignments into promoted registers.

2024-01-02 Thread Jeff Law





On 12/28/23 12:35, Roger Sayle wrote:


Hi Jeff,
Thanks for the speedy review.


On 12/28/23 07:59, Roger Sayle wrote:

This patch fixes PR rtl-optimization/104914 by tweaking/improving the
way that fields are written into a pseudo register that needs to be
kept sign extended.

Well, I think "fixes" is a bit of a stretch.  We're avoiding the issue by
changing the early RTL generation, but if I understand what's going on in
the RTL optimizers and MIPS backend correctly, the core bug still remains.
Admittedly I haven't put it under a debugger, but that MIPS definition of
NOOP_TRUNCATION just seems badly wrong and is just waiting to rear its ugly
head again.


I think this really is the/a correct fix. The MIPS backend defines
NOOP_TRUNCATION to false, so it's not correct to use a SUBREG to convert
from DImode to SImode.  The problem then is where in the compiler
(middle-end or backend) is this invalid SUBREG being created and how can it
be fixed.  In this particular case, the fault is in RTL expansion.  There
may be other places where a SUBREG is inappropriately used instead of a
TRUNCATE, but this is the place where things go wrong for
PR rtl-optimization/104914.

Maybe a better way to put it is I think your patch is one piece of the 
solution, but we still have the potential for bugs due to the seemingly 
bogus definition of TRULY_NOOP_TRUNCATION in the mips port.






Once an inappropriate SImode SUBREG is in the RTL stream, it can remain
harmlessly latent (most of the time), unless it gets split, simplified or
spilled.  Copying this SImode expression into its own pseudo results in
incorrect code.  One approach might be to use an UNSPEC for places where
backend invariants are temporarily invalid, but in this case it's machine
independent middle-end code that's using SUBREGs as though the target was
an x86/pdp11.

So I agree that on the surface, both of these appear to be identical:

(set (reg:DI) (sign_extend:DI (truncate:SI (reg:DI))))
(set (reg:DI) (sign_extend:DI (subreg:SI (reg:DI))))


But should they get split or spilled by reload:

Even if they're not spilled, on a target like mips they're not 
equivalent.  It highlights the poor design around TRULY_NOOP_TRUNCATION. 
Essentially it's out of band information on how RTL needs to be 
interpreted.  We have similar problems with SHIFT_COUNT_TRUNCATED and 
other macros.
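
The difference can be modeled in plain C. This is a simplified sketch, assuming a MIPS64-like target where a 32-bit (SImode) value must be held sign-extended in a 64-bit register. The function names are hypothetical, and the int32_t conversions rely on GCC's modulo behaviour for out-of-range values.

```c
#include <stdint.h>

/* Nonzero if REG is a valid SImode representation in this model, i.e.
   bits 63..32 are all copies of bit 31.  */
static int
valid_si_rep_p (uint64_t reg)
{
  return reg == (uint64_t) (int64_t) (int32_t) (uint32_t) reg;
}

/* Model of (truncate:SI (reg:DI)): re-establish the invariant by
   sign-extending the low 32 bits.  */
static uint64_t
model_truncate_si (uint64_t reg)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) reg;
}

/* Model of (subreg:SI (reg:DI)): the register is reinterpreted and its
   bits are left untouched.  */
static uint64_t
model_subreg_si (uint64_t reg)
{
  return reg;
}
```

For a DImode value such as 0x0000000080000000 the truncate model yields 0xffffffff80000000, while the subreg model leaves a value that violates the invariant.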






2023-12-28  Roger Sayle  

gcc/ChangeLog
  PR rtl-optimization/104914
  * expr.cc (expand_assignment): When target is

SUBREG_PROMOTED_VAR_P

  a sign or zero extension is only required if the modified field
  overlaps the SUBREG's most significant bit.  On MODE_REP_EXTENDED
  targets, don't refer to the temporarily incorrectly extended value
  using a SUBREG, but instead generate an explicit TRUNCATE rtx.

[ ... ]



+ /* Check if the field overlaps the MSB, requiring extension.  */
+ else if (known_eq (bitpos + bitsize,
+GET_MODE_BITSIZE (GET_MODE (to_rtx

Do you need to look at the size of the field as well?  ie, the starting
position might be before the sign bit, but the width of the field might
cover the mode's sign bit?

I'm not real good in the RTL expansion code, so if I'm offbase on this,
just let me know.


There are two things that help here.  The first is that the most significant
bit never appears in the middle of a field, so we don't have to worry about
overlapping, nor writes to the paradoxical bits of the SUBREG.  And secondly,
bits are numbered from zero for least significant, to MODE_BITSIZE (mode) - 1
for most significant, irrespective of the endian-ness.  So the code only needs
to check the highest value bitpos + bitsize is the maximum value for the mode.
The above logic stays the same, but which byte insert requires extension will
change between mips64be and mips64le.  i.e. we test that the most significant
bit of the field/byte being written is the most significant bit of the SUBREG
target. [That's my understanding/rationalization, I could be wrong].

Yea, if the type is mode M, then we won't have a field that exceeds the 
size of M.  So we just need to know if the position + size lands 
squarely on the MSB.  Thanks for walking me through what should have been 
fairly obvious in retrospect.
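
The check discussed here can be sketched outside of GCC. The helpers below are hypothetical, work on a fixed 32-bit value, and number bits from 0 (least significant) as described above.

```c
#include <stdint.h>

/* Nonzero if a field of BITSIZE bits starting at BITPOS ends exactly at
   the most significant bit of a MODE_BITS-wide value.  */
static int
field_ends_at_msb_p (unsigned bitpos, unsigned bitsize, unsigned mode_bits)
{
  return bitpos + bitsize == mode_bits;
}

/* Store the low BITSIZE bits of FIELD into VAL at BITPOS.
   Requires 0 < BITSIZE and BITPOS + BITSIZE <= 32.  */
static uint32_t
insert_field (uint32_t val, uint32_t field, unsigned bitpos,
	      unsigned bitsize)
{
  uint32_t ones = bitsize == 32 ? 0xffffffffu : (1u << bitsize) - 1;
  uint32_t mask = ones << bitpos;
  return (val & ~mask) | ((field << bitpos) & mask);
}
```

Writing 0x80 into the top byte changes the sign of the 32-bit value, so an extension would need redoing. The same write into the low byte leaves the sign bit alone.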





One thing I could be more cautious about is using maybe_eq instead of
known_eq, but the rest of the code (including truly_noop_truncation) assumes
scalar integer modes, so variable length vectors aren't (yet) a concern.
Would using maybe_eq be better coding style?

It's probably better to use maybe_eq for future proofing.  OK with that 
change after the usual testing.


Thanks for diving into this.

jeff

Re: [PATCH] config-ml.in: Fix multi-os-dir search

2024-01-02 Thread Jeff Law





On 1/1/24 09:48, YunQiang Su wrote:

When building multilib libraries, CC/CXX etc. are set with an option
-B*/lib/, instead of -B/lib/.
This causes trouble in some cases, for example when building a
cross toolchain based on Debian's cross packages:

   If we have the libc6-dev-i386-amd64-cross package installed on
   a non-x86 machine, this package will have its files in
   /usr/x86_64-linux-gnu/lib32.  The following configure will fail
   when building libgcc for i386, complaining that the libc is not
   an i386 one:
  ../configure --enable-multilib --enable-multilib \
 --target=x86_64-linux-gnu

Let's insert a "-B*/lib/`CC ${flags} --print-multi-os-directory`"
before "-B*/lib/".

This patch is based on the patch used by Debian now.

ChangeLog

* config-ml.in: Insert an -B option with multi-os-dir into
compiler commands used to build libraries.

I would prefer this to wait for gcc-15.   I'll go ahead and ACK it for 
gcc-15 though.


What would also be valuable would be to extract out the rest of the 
multiarch patches from the Debian patches and get those into GCC 
proper.


Jeff

Re: [PATCH] RISC-V: RVV: add toggle to control vsetvl pass behavior

2024-01-02 Thread Jeff Law





On 12/22/23 12:45, Vineet Gupta wrote:

RVV requires VSET?VL? instructions to dynamically configure VLEN at
runtime. There's a custom pass to do that which has a simple mode
which generates a VSETVL for each V insn and a lazy/optimal mode which
uses LCM dataflow to move VSETVL around, identify/delete the redundant
ones.
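
The gap between the two modes can be illustrated with a toy count. This is not the pass's algorithm: the second function only drops a vsetvl when the previous instruction in the same straight-line block already requested the same configuration, whereas the real lazy mode uses LCM dataflow across basic blocks. The integer "configuration" values are placeholders.

```c
#include <stddef.h>

/* Simple strategy: one vsetvl per vector instruction, regardless of the
   configurations in CFG.  */
static size_t
count_vsetvl_simple (const int *cfg, size_t n)
{
  (void) cfg;
  return n;
}

/* Toy stand-in for redundancy elimination: emit a vsetvl only when the
   required configuration differs from the one currently in effect.  */
static size_t
count_vsetvl_local (const int *cfg, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (i == 0 || cfg[i] != cfg[i - 1])
      count++;
  return count;
}
```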

Currently simple mode is the default for !optimize invocations, while
lazy mode is the default otherwise.

This patch allows simple mode to be forced via a toggle independent of
the optimization level. A lot of gcc developers are currently doing this
in some form in their local setups, as in the initial phase of autovec
development issues are expected. It makes sense to provide this facility
upstream. It could potentially also be used by distro builders for
quick workarounds for future autovec bugs.

gcc/ChangeLog:
* config/riscv/riscv.opt: New -param=vsetvl-strategy.
* config/riscv/riscv-opts.h: New enum vsetvl_strategy_enum.
* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::pre_global_vsetvl_info): Use vsetvl_strategy.
(pass_vsetvl::execute): Use vsetvl_strategy.

OK if we mark them as undocumented since I would expect these are really 
just for developers to use during bring-up/debugging and ideally the 
param will disappear by gcc-15.



While I realize there are differing ideas on the set of knobs we may 
want to turn during the debugging phases, we can fault the different 
sets of knobs in as we need them.


jeff

Re: [PATCH] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-02 Thread Andrew Pinski

On Tue, Dec 12, 2023 at 12:22 AM Andrew Pinski  wrote:
>
> Ccmp is not used if the result of the and/ior is used by both
> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> here by using ccmp in this case.
> Two changes are required. First, we need to allow the outer statement's
> result to be used more than once.
> Second, during the expansion of the gimple, we need
> to try using ccmp. This is needed because we don't expand the SSA
> name of the lhs but rather expand directly from the gimple.
>
> A small note on the ccmp_4.c testcase, we should be able to get slightly
> better than with this patch but it is one extra instruction compared to
> before.
>
> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
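
A source-level shape that plausibly triggers the case described above is sketched here. It is a hypothetical example, not the ccmp_3.c or ccmp_4.c testcase, and foo is given a counting body only so the behaviour can be checked.

```c
/* Hypothetical callee; counts its calls.  */
static int foo_calls;

static void
foo (void)
{
  foo_calls++;
}

/* The combined condition C is consumed twice: once by the branch and
   once as the value that is returned.  */
static int
both_uses (int a, int b)
{
  int c = a == 0 && b == 2;
  if (c)
    foo ();
  return c;
}
```

Before the patch, the has_single_use check rejected such a result as a conditional-compare candidate. With the outer flag set, the outermost statement may have more than one use.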

Ping?

>
> PR target/100942
>
> gcc/ChangeLog:
>
> * ccmp.cc (ccmp_candidate_p): Add outer argument.
> Allow if the outer is true and the lhs is used more
> than once.
> (expand_ccmp_expr): Update call to ccmp_candidate_p.
> * cfgexpand.cc (expand_gimple_stmt_1): Try using ccmp
> for binary assignments.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/ccmp_3.c: New test.
> * gcc.target/aarch64/ccmp_4.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/ccmp.cc   |  9 +++---
>  gcc/cfgexpand.cc  | 25 
>  gcc/testsuite/gcc.target/aarch64/ccmp_3.c | 20 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_4.c | 35 +++
>  4 files changed, 85 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
>
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 1bd6fadea35..a274f8c3d53 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -92,7 +92,7 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
>
>  /* Check whether G is a potential conditional compare candidate.  */
>  static bool
> -ccmp_candidate_p (gimple *g)
> +ccmp_candidate_p (gimple *g, bool outer = false)
>  {
>tree lhs, op0, op1;
>gimple *gs0, *gs1;
> @@ -109,8 +109,9 @@ ccmp_candidate_p (gimple *g)
>lhs = gimple_assign_lhs (g);
>op0 = gimple_assign_rhs1 (g);
>op1 = gimple_assign_rhs2 (g);
> -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> -  || !has_single_use (lhs))
> +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> +return false;
> +  if (!outer && !has_single_use (lhs))
>  return false;
>
>bb = gimple_bb (g);
> @@ -284,7 +285,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
>rtx_insn *last;
>rtx tmp;
>
> -  if (!ccmp_candidate_p (g))
> +  if (!ccmp_candidate_p (g, true))
>  return NULL_RTX;
>
>last = get_last_insn ();
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index b860be8bb77..0f9aad8e3eb 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "ccmp.h"
>
>  /* Some systems use __main in a way incompatible with its use in gcc, in 
> these
> cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN 
> to
> @@ -3972,6 +3973,30 @@ expand_gimple_stmt_1 (gimple *stmt)
> if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
>   promoted = true;
>
> +   /* Try to expand conditonal compare.  */
> +   if (targetm.gen_ccmp_first
> +   && gimple_assign_rhs_class (assign_stmt) == GIMPLE_BINARY_RHS)
> + {
> +   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +   gcc_checking_assert (targetm.gen_ccmp_next != NULL);
> +   temp = expand_ccmp_expr (stmt, mode);
> +   if (temp)
> + {
> +   if (promoted)
> + {
> +   int unsignedp = SUBREG_PROMOTED_SIGN (target);
> +   convert_move (SUBREG_REG (target), temp, unsignedp);
> + }
> +   else
> +{
> +   temp = force_operand (temp, target);
> +   if (temp != target)
> + emit_move_insn (target, temp);
> + }
> +   return;
> + }
> + }
> +
> ops.code = gimple_assign_rhs_code (assign_stmt);
> ops.type = TREE_TYPE (lhs);
> switch (get_gimple_rhs_class (ops.code))
> diff --git a/gcc/testsuite/gcc.target/aarch64/ccmp_3.c 
> b/gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> new file mode 100644
> index 000..a2b47fbee14
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ccmp_3.c
> @@ -0,0 +1,20 @@
> +/* { dg-options "-O2" } */
> +/* PR target/100942 */
> +
> +void foo(void);
> +int f1(int a, int b

Re: [PATCH] testsuite: Reduce gcc.dg/torture/inline-mem-cpy-1.c by 11 for simulators

2024-01-02 Thread Jeff Law





On 1/1/24 20:22, Hans-Peter Nilsson wrote:

Tested mmix-knuth-mmixware (where all torture-variants of
gcc.dg/torture/inline-mem-cpy-1.c now pass) and native
x86_64-pc-linux-gnu.  Also stepped through the test for native,
w/wo. RUN_FRACTION defined to see that it worked as intended.

You may wonder what about the "sibling" tests inline-mem-cmp-1.c and
inline-mem-cpy-cmp-1.c.  Well, they FAIL, but not because of
timeouts(!)  To be continued

Ok to commit?

Or, other suggestions?

I'm pretty sure there's already a target selector for "simulator".  So 
you might be able to do this automagically with something like


dg-additional-options "-DRUN_FRACTION=11" { target { simulator } }

Or something close to that.
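
How a test might consume such a macro is sketched below. This is an assumption about the general pattern, not the code of inline-mem-cpy-1.c: the default of 1 and both function names are hypothetical.

```c
/* Hypothetical knob: when defined on the command line (for example
   -DRUN_FRACTION=11), only every RUN_FRACTION-th case is executed.  */
#ifndef RUN_FRACTION
#define RUN_FRACTION 1
#endif

/* Count how many of TOTAL cases run when only every FRACTION-th case
   (starting with case 0) is executed.  */
static int
cases_run (int total, int fraction)
{
  int n = 0;
  for (int i = 0; i < total; i++)
    if (i % fraction == 0)
      n++;
  return n;
}

/* Same, using the compile-time setting.  */
static int
default_cases_run (int total)
{
  return cases_run (total, RUN_FRACTION);
}
```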

jeff

[PATCH] libstdc++: testsuite: reduce max_size_type.cc exec time [PR113175]

2024-01-02 Thread Patrick Palka

Tested on x86_64-pc-linux-gnu, does this look OK for trunk and release
branches (r14-205 was backported everywhere)?

-- >8 --

The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
inadvertently increased the execution time of the test by over 5x due to
enabling the two main loops to actually run in the signed_p case instead
of being dead code.  This suggests that the current range of the loop is
far too big and the test too time consuming, especially when run on
simulators.

So this patch cuts the loop range by 10x as proposed in the PR.  This
shouldn't significantly weaken the test since the same important edge
cases are still checked in the new range.  On my x86_64 machine this
reduces the test execution time by 10x, which is 1.6x less time than
before r14-205.
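
The adjusted log2_limit follows from the static_assert in the test, which requires (1 << log2_limit) >= limit. A small sketch of that arithmetic, with a hypothetical helper name:

```c
/* Smallest K such that (1 << K) >= N, for 1 <= N <= 2^30.  */
static int
ceil_log2 (unsigned n)
{
  int k = 0;
  while ((1u << k) < n)
    k++;
  return k;
}
```

For the old limit of 1000 this gives 10 (1024 >= 1000), and for the new limit of 100 it gives 7 (128 >= 100).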

PR testsuite/113175

libstdc++-v3/ChangeLog:

* testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
(test03): Likewise.
---
 libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc 
b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
index a1fbc3241dc..27f25c758fe 100644
--- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
+++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
@@ -199,8 +199,8 @@ test02()
   using max_type = std::conditional_t;
   using shorten_type = std::conditional_t;
   const int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
-  const int limit = 1000;
-  const int log2_limit = 10;
+  const int limit = 100;
+  const int log2_limit = 7;
   static_assert((1 << log2_limit) >= limit);
   const int min = (signed_p ? -limit : 0);
   const int max = limit;
@@ -257,8 +257,8 @@ test03()
   using max_type = std::conditional_t;
   using base_type = std::conditional_t;
   constexpr int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
-  constexpr int limit = 1000;
-  constexpr int log2_limit = 10;
+  constexpr int limit = 100;
+  constexpr int log2_limit = 7;
   static_assert((1 << log2_limit) >= limit);
   const int min = (signed_p ? -limit : 0);
   const int max = limit;
-- 
2.43.0.232.ge79552d197

[PATCH]middle-end: check if target can do extract first for early breaks [PR113199]

2024-01-02 Thread Tamar Christina

Hi All,

I was generating the vector reverse mask without checking if the target
actually supported such an operation.

It also seems like more targets implement VEC_EXTRACT than permute on mask
registers.

So this adds a check for IFN_VEC_EXTRACT support when required and changes
the select first code to use it.

This is good for now since masks always come from whilelo.  But in the future
when masks can come from other sources we will need the old code back.
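
To illustrate why ignoring the mask is safe here, the following scalar
model may help.  It is only a sketch under stated assumptions: kLanes,
whilelo, extract_last and the two extract_first_* helpers are hypothetical
stand-ins written for this note, not GCC internals, and the behaviour of
extract_last on an all-false mask is this model's own choice.  The point
is that a WHILELO-style mask is always a prefix of active lanes, so
"reverse, then extract last" and "extract lane 0" give the same value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 4;            // hypothetical vector width
using Vec  = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Scalar model of a WHILELO-style mask: lane i is active while
// base + i < bound, so active lanes always form a prefix.
Mask whilelo(std::size_t base, std::size_t bound)
{
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i)
    m[i] = base + i < bound;
  return m;
}

// Scalar model of EXTRACT_LAST: value of the last active lane.
// Falls back to the final lane if no lane is active (model's own choice).
int extract_last(const Mask &m, const Vec &v)
{
  for (std::size_t i = kLanes; i-- > 0;)
    if (m[i])
      return v[i];
  return v[kLanes - 1];
}

// Old approach: reverse both mask and data, then take the last active lane.
int extract_first_by_reversal(const Mask &m, const Vec &v)
{
  Mask rm{};
  Vec rv{};
  for (std::size_t i = 0; i < kLanes; ++i)
    {
      rm[i] = m[kLanes - 1 - i];
      rv[i] = v[kLanes - 1 - i];
    }
  return extract_last(rm, rv);
}

// New approach: a plain element extract at index 0, ignoring the mask.
int extract_first_by_index(const Vec &v)
{
  return v[0];
}
```

For any non-empty prefix mask the two helpers agree.  With a mask from
another source (say, only lane 2 active) they would not, which is why the
old code would be needed again in that case.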

Bootstrapped and regtested on aarch64-none-linux-gnu and
x86_64-pc-linux-gnu with no issues, using --enable-checking=release
--enable-lto --with-build-config=bootstrap-O3
--enable-checking=yes,rtl,extra.
Also tested with a cross cc1 for amdgcn-amdhsa, where the issue is fixed.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR tree-optimization/113199
* tree-vect-loop.cc (vectorizable_live_operation_1): Use
IFN_VEC_EXTRACT.
(vectorizable_live_operation): Check for IFN_VEC_EXTRACT support.

gcc/testsuite/ChangeLog:

PR tree-optimization/113199
* gcc.target/gcn/pr113199.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/testsuite/gcc.target/gcn/pr113199.c 
b/gcc/testsuite/gcc.target/gcn/pr113199.c
new file mode 100644
index 
..8a641e5536e80e207ca0163cac66c0f4f6ca93f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/pr113199.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+
+typedef long unsigned int size_t;
+typedef int wchar_t;
+struct tm
+{
+  int tm_mon;
+  int tm_year;
+};
+int abs (int);
+struct lc_time_T { const char *month[12]; };
+struct __locale_t * __get_current_locale (void) { }
+const struct lc_time_T * __get_time_locale (struct __locale_t *locale) { }
+const wchar_t * __ctloc (wchar_t *buf, const char *elem, size_t *len_ret) { 
return buf; }
+size_t
+__strftime (wchar_t *s, size_t maxsize, const wchar_t *format,
+ const struct tm *tim_p, struct __locale_t *locale)
+{
+  size_t count = 0;
+  const wchar_t *ctloc;
+  wchar_t ctlocbuf[256];
+  size_t i, ctloclen;
+  const struct lc_time_T *_CurrentTimeLocale = __get_time_locale (locale);
+{
+  switch (*format)
+ {
+ case L'B':
+   (ctloc = __ctloc (ctlocbuf, _CurrentTimeLocale->month[tim_p->tm_mon], 
&ctloclen));
+   for (i = 0; i < ctloclen; i++)
+ {
+   if (count < maxsize - 1)
+  s[count++] = ctloc[i];
+   else
+  return 0;
+   {
+  int century = tim_p->tm_year >= 0
+? tim_p->tm_year / 100 + 1900 / 100
+: abs (tim_p->tm_year + 1900) / 100;
+   }
+   }
+ }
+}
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 
37f1be1101ffae779214056a0886411e0683e887..5aa92e67444e7aacf458fffa1428f1983c482374
 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10648,36 +10648,18 @@ vectorizable_live_operation_1 (loop_vec_info 
loop_vinfo,
  &LOOP_VINFO_MASKS (loop_vinfo),
  1, vectype, 0);
   tree scalar_res;
+  gimple_seq_add_seq (&stmts, tem);
 
   /* For an inverted control flow with early breaks we want EXTRACT_FIRST
-instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
+instead of EXTRACT_LAST.  For now since the mask always comes from a
+WHILELO we can get the first element ignoring the mask since CLZ of the
+mask will always be zero.  */
   if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
-   {
- /* First create the permuted mask.  */
- tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
- tree perm_dest = copy_ssa_name (mask);
- gimple *perm_stmt
-   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
-  mask, perm_mask);
- vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
-  &gsi);
- mask = perm_dest;
-
- /* Then permute the vector contents.  */
- tree perm_elem = perm_mask_for_reverse (vectype);
- perm_dest = copy_ssa_name (vec_lhs_phi);
- perm_stmt
-   = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
-  vec_lhs_phi, perm_elem);
- vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
-  &gsi);
- vec_lhs_phi = perm_dest;
-   }
-
-  gimple_seq_add_seq (&stmts, tem);
-
-  scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
-mask, vec_lhs_phi);
+   scalar_res = gimple_build (&stmts, CFN_VEC_EXTRACT, TREE_TYPE (vectype),
+  vec_lhs_phi, bitstart);
+  else
+   scalar_res = gimple_build (&stmts, CFN_EXTRACT_LAST, scalar_type,
+  mask, vec_lhs_phi);
 
   /* Convert the extracted vector element to the scalar type.  */
   new_tree = gimple_

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2024-01-02 Thread Harald Anlauf


Dear all,

we might want to update changes.html to reflect this.  How about:

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 403feb06..9b16f5e3 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -284,6 +284,11 @@ a work-in-progress.

 Fortran
 
+  The compiler now accepts the -std=f2023 option, which
+has been added in preparation of support of Fortran 2023.  This option
+increases the line-length limit for source in free-form to 10,000, and
+statements may have up to 1 million characters.
+  
With the -save-temps option, preprocessed files
 with the .fii extension will be generated from
 free-form source files such as .F90 and


Cheers,
Harald


On 17.11.23 at 12:38, Tobias Burnus wrote:

Hi Harald, hi all,

On 16.11.23 20:30, Harald Anlauf wrote:

On 11/16/23 14:01, Tobias Burnus wrote:

This adds -std=f2023, which is mostly a prep patch for future changes.

...

(B) In "6.3.2.6 Free form statements":
Fortran 2018: "A statement shall not have more than 255 continuation
lines."
Fortran 2023: "A statement shall not have more than one million
characters."


this is really a funny change: we're not really prepared to handle
this.


I can confirm this. I tried to get it working in scanner.cc but due to
the re-parsing it is quite difficult to get it right; the main problem
is that we keep reparsing code ("gfc_current_locus = old_loc"), such
that a simple count will be wrong.

→ Now tracked at https://gcc.gnu.org/PR112586



According to the standard one can have 999999 lines with only
"&" and then a ";", but then only 100 lines with 10000 characters.


I believe a single '&' is not valid, you either need '&&' or something
else + '&'; thus, you can have only half a million lines + 1.

In the code, I still use 1,000,000 but now with a comment.


There is a similar wording for fixed-form which you overlooked:

Oops - fixed.

If you think that we need testcases for fixed-form, add them,
or forget them.  I don't bother.

I added one.


- there are existing testcases continuation_5.f, continuation_6.f,
  thus I suggest to rename your new continuation_{5,6}.f90 to
  continuation_17.f90+ .


Done. We are rather inconsistent whether we enumerate .f{,90}
together or separately; as the suffix is shown, either works.



- I don't understand your new testcase line_length_14.f90 .
  This is supposed to test -std=gnu, but then -std=gnu is not a
  standard but a moving target, which is why you had to adjust
  existing testcases.
  So what does it buy us beyond line_length_1{2,3}.f90 ?


Well, it ensures that the warning is not only shown for -std=f2023 but
also for -std=f2028 and (current -std=gnu). In general, I think it is
useful to check the lower and the upper bound.

I have now removed it - as it is unlikely that we would regress on such
changes.


PPS: I did not bother adding .f23 as file extension; I believe that also
.f18 is unsupported.

I never use extensions other than .f90 for portable code.


Likewise  - especially as '.f95' starts out as Fortran code that
complies to -std=f95 but slowly Fortran 2003 or later code creeps in. I
think that's fine but then one can also directly use .f90. (Most code
does so.)

Unless there are follow up comments, I will commit it later today.

Thanks for the comments!

Tobias

PS: I fixed the wording issue in the subject line of the email and
header. I first wrote 'support' but that sounded a bit as if F2023 is
supported. Hence, I wrote 'Accept' and did not remove 'support'.

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2024-01-02 Thread Steve Kargl

On Tue, Jan 02, 2024 at 08:31:15PM +0100, Harald Anlauf wrote:
> 
> we might want to update changes.html to reflect this.  How about:
> 
> diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
> index 403feb06..9b16f5e3 100644
> --- a/htdocs/gcc-14/changes.html
> +++ b/htdocs/gcc-14/changes.html
> @@ -284,6 +284,11 @@ a work-in-progress.
> 
>  Fortran
>  
> +  The compiler now accepts the -std=f2023 option, which
> +has been added in preparation of support of Fortran 2023.  This option
> +increases the line-length limit for source in free-form to 10,000, and
> +statements may have up to 1 million characters.
> +  
> With the -save-temps option, preprocessed files
>  with the .fii extension will be generated from
>  free-form source files such as .F90 and
> 

LGTM.

-- 
Steve

[PATCH v2 0/2] asan: Align .LASANPC on function boundary

2024-01-02 Thread Ilya Leoshkevich

v1: 
https://inbox.sourceware.org/gcc-patches/20231207121005.3425208-1-...@linux.ibm.com/
v1 -> v2: Fix style issues (Jakub).
  Jakub has reviewed patch 2 and mentioned that he'd defer the
  patch 1 review to Jeff.



Hi,

this is another attempt to fix the .LASANPC alignment on s390x.
Currently it's not only inefficient ([1]-[5]), but also causes linker
errors in template-heavy code ([6]).

The previous attempts to add a new constant for minimum code alignment
value ([1]-[5]) did not arouse considerable enthusiasm, and fixing the
fallout ([6]) is probably just a wrong thing to do.

So here I'm taking another approach: making sure that .LASANPC is
aligned on function boundary in the first place.  This requires moving
the asan_function_start() invocation to ASM_OUTPUT_FUNCTION_LABEL().

Bootstrapped and regtested on x86_64-redhat-linux, ppc64le-redhat-linux
and s390x-redhat-linux.  Compile tested for platforms listed in [7].

Best regards,
Ilya

[1] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525016.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2019-July/525069.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548338.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549252.html
[5] https://patchwork.ozlabs.org/project/gcc/list/?series=320223
[6] https://patchwork.ozlabs.org/project/gcc/list/?series=297132
[7] http://toolchain.lug-owl.de/laminar/jobs

Ilya Leoshkevich (2):
  Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL
  asan: Align .LASANPC on function boundary

 gcc/asan.cc |  6 ++
 gcc/config/aarch64/aarch64.cc   |  2 +-
 gcc/config/alpha/alpha.cc   |  5 ++---
 gcc/config/arm/aout.h   |  2 +-
 gcc/config/arm/arm.cc   |  2 +-
 gcc/config/bfin/bfin.h  | 16 
 gcc/config/c6x/c6x.h|  2 +-
 gcc/config/gcn/gcn.cc   |  5 ++---
 gcc/config/h8300/h8300.h|  2 +-
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/ia64/ia64.cc |  5 ++---
 gcc/config/mcore/mcore-elf.h|  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc | 19 ++-
 gcc/config/pa/pa.cc |  3 ++-
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/config/rs6000/rs6000.cc |  4 ++--
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 14 ++
 22 files changed, 59 insertions(+), 48 deletions(-)

-- 
2.43.0

[PATCH v2 1/2] Implement ASM_DECLARE_FUNCTION_NAME using ASM_OUTPUT_FUNCTION_LABEL

2024-01-02 Thread Ilya Leoshkevich

gccint recommends using ASM_OUTPUT_FUNCTION_LABEL in
ASM_DECLARE_FUNCTION_NAME, but many implementations use
ASM_OUTPUT_LABEL instead.  It's inconsistent and prevents changes to
ASM_OUTPUT_FUNCTION_LABEL from affecting the respective targets.
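
The effect described above can be shown with a small stand-alone model.
This is only a sketch: the DEMO_* macros and both functions are
hypothetical names invented for illustration and do not correspond to any
GCC macro or target hook.  It assumes one generic label macro and one
function-label hook that has been overridden, and shows that a port which
calls the generic macro directly never sees the override.

```cpp
#include <cassert>
#include <string>

// Stand-in for a generic label macro: always emits "name:".
// All DEMO_* names are hypothetical.
#define DEMO_OUTPUT_LABEL(OUT, NAME) ((OUT) += std::string (NAME) + ":\n")

// Stand-in for an overridden function-label hook.  The override adds a
// marker line so the difference is visible in the output.
#define DEMO_OUTPUT_FUNCTION_LABEL(OUT, NAME) \
  ((OUT) += std::string (NAME) + ":\n\t# function-label hook ran\n")

// Port A bypasses the hook, so the override never reaches it.
std::string declare_function_bypassing_hook(const char *name)
{
  std::string out;
  DEMO_OUTPUT_LABEL(out, name);
  return out;
}

// Port B goes through the hook and picks up the override.
std::string declare_function_via_hook(const char *name)
{
  std::string out;
  DEMO_OUTPUT_FUNCTION_LABEL(out, name);
  return out;
}
```

Calling both with "foo" yields "foo:" alone in the first case and "foo:"
followed by the marker line in the second.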
---
 gcc/config/aarch64/aarch64.cc   |  2 +-
 gcc/config/alpha/alpha.cc   |  5 ++---
 gcc/config/arm/aout.h   |  2 +-
 gcc/config/arm/arm.cc   |  2 +-
 gcc/config/bfin/bfin.h  | 16 
 gcc/config/c6x/c6x.h|  2 +-
 gcc/config/gcn/gcn.cc   |  5 ++---
 gcc/config/h8300/h8300.h|  2 +-
 gcc/config/ia64/ia64.cc |  5 ++---
 gcc/config/mcore/mcore-elf.h|  2 +-
 gcc/config/microblaze/microblaze.cc |  3 +--
 gcc/config/mips/mips.cc | 19 ++-
 gcc/config/pa/pa.cc |  3 ++-
 gcc/config/riscv/riscv.cc   |  2 +-
 gcc/config/rs6000/rs6000.cc |  4 ++--
 15 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 298477d88bb..e3c72f60d4e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24207,7 +24207,7 @@ aarch64_declare_function_name (FILE *stream, const 
char* name,
 
   /* Don't forget the type directive for ELF.  */
   ASM_OUTPUT_TYPE_DIRECTIVE (stream, name, "function");
-  ASM_OUTPUT_LABEL (stream, name);
+  ASM_OUTPUT_FUNCTION_LABEL (stream, name, fndecl);
 
   cfun->machine->label_is_assembled = true;
 }
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6aa93783226..8118255e737 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -7986,8 +7986,7 @@ int num_source_filenames = 0;
 /* Output the textual info surrounding the prologue.  */
 
 void
-alpha_start_function (FILE *file, const char *fnname,
- tree decl ATTRIBUTE_UNUSED)
+alpha_start_function (FILE *file, const char *fnname, tree decl)
 {
   unsigned long imask, fmask;
   /* Complete stack size needed.  */
@@ -8052,7 +8051,7 @@ alpha_start_function (FILE *file, const char *fnname,
   if (TARGET_ABI_OPEN_VMS)
 strcat (entry_label, "..en");
 
-  ASM_OUTPUT_LABEL (file, entry_label);
+  ASM_OUTPUT_FUNCTION_LABEL (file, entry_label, decl);
   inside_function = TRUE;
 
   if (TARGET_ABI_OPEN_VMS)
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 49896bb9620..380147aed7d 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -152,7 +152,7 @@
   do   \
 {  \
   ARM_DECLARE_FUNCTION_NAME (STREAM, NAME, DECL);   \
-  ASM_OUTPUT_LABEL (STREAM, NAME); \
+  ASM_OUTPUT_FUNCTION_LABEL (STREAM, NAME, DECL);  \
 }  \
   while (0)
 #endif
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 0c0cb14a8a4..7ca607b3de1 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -21800,7 +21800,7 @@ arm_asm_declare_function_name (FILE *file, const char 
*name, tree decl)
   ARM_DECLARE_FUNCTION_NAME (file, name, decl);
   ASM_OUTPUT_TYPE_DIRECTIVE (file, name, "function");
   ASM_DECLARE_RESULT (file, DECL_RESULT (decl));
-  ASM_OUTPUT_LABEL (file, name);
+  ASM_OUTPUT_FUNCTION_LABEL (file, name, decl);
 
   if (cmse_name)
 ASM_OUTPUT_LABEL (file, cmse_name);
diff --git a/gcc/config/bfin/bfin.h b/gcc/config/bfin/bfin.h
index c25f41f6839..60a8d716819 100644
--- a/gcc/config/bfin/bfin.h
+++ b/gcc/config/bfin/bfin.h
@@ -995,14 +995,14 @@ typedef enum directives {
 fputc ('\n',FILE); \
   } while (0)
 
-#define ASM_DECLARE_FUNCTION_NAME(FILE,NAME,DECL) \
-  do { \
-fputs (".type ", FILE);\
-assemble_name (FILE, NAME); \
-fputs (", STT_FUNC", FILE); \
-fputc (';',FILE);   \
-fputc ('\n',FILE); \
-ASM_OUTPUT_LABEL(FILE, NAME);  \
+#define ASM_DECLARE_FUNCTION_NAME(FILE, NAME, DECL)\
+  do { \
+fputs (".type ", FILE);\
+assemble_name (FILE, NAME);\
+fputs (", STT_FUNC", FILE);\
+fputc (';', FILE); \
+fputc ('\n', FILE);\
+ASM_OUTPUT_FUNCTION_LABEL (FILE, NAME, DECL);  \
   } while (0)
 
 #define ASM_OUTPUT_LABEL(FILE, NAME)\
diff --git a/gcc/config/c6x/c6x.h b/gcc/config/c6x/c6x.h
index 26b2f2f0700..790b9627ebe 100644
--- a/gcc/config/c6x/c6x.h
+++ b/gcc/config/c6x/c6x.h
@@ -459,7 +459,7 @@ struct GTY(()) machine_function
   c6x_output_file_unwind (FILE);   \
   ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "function");  \
   ASM_DECLARE_RESULT (FILE, D

[PATCH v2 2/2] asan: Align .LASANPC on function boundary

2024-01-02 Thread Ilya Leoshkevich

GCC can emit code between the function label and the .LASANPC label,
making the latter unaligned.  Some architectures cannot load unaligned
labels directly and require literal pool entries, which is inefficient.

Move the invocation of asan_function_start to
ASM_OUTPUT_FUNCTION_LABEL, which guarantees that no additional code is
emitted.  This allows setting the .LASANPC label alignment to the
respective function alignment.
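
As a rough illustration of the alignment argument, consider the sketch
below.  The helper names, the start address 0x1000, the 8-byte alignment
and the 6-byte offset are all made-up values for this example and are not
taken from any target.  It assumes the function start is aligned and shows
that a label placed directly at the start inherits that alignment, while a
label placed after some emitted bytes generally does not.

```cpp
#include <cassert>
#include <cstdint>

// True if ADDR is a multiple of ALIGN (ALIGN must be a power of two).
// Hypothetical helper for illustration only.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t align)
{
  return (addr & (align - 1)) == 0;
}

// Address of a label placed BYTES_BEFORE bytes after the function start.
constexpr std::uint64_t label_address(std::uint64_t function_start,
                                      std::uint64_t bytes_before)
{
  return function_start + bytes_before;
}

// Label emitted right at the function start: shares its alignment.
static_assert(is_aligned(label_address(0x1000, 0), 8), "offset 0");
// Label emitted after 6 bytes of code: no longer 8-byte aligned.
static_assert(!is_aligned(label_address(0x1000, 6), 8), "offset 6");
```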
---
 gcc/asan.cc |  6 ++
 gcc/config/i386/i386.cc |  2 +-
 gcc/config/s390/s390.cc |  2 +-
 gcc/defaults.h  |  2 +-
 gcc/final.cc|  3 ---
 gcc/output.h|  4 
 gcc/varasm.cc   | 14 ++
 7 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/asan.cc b/gcc/asan.cc
index 8d0ffb497cc..48738244aba 100644
--- a/gcc/asan.cc
+++ b/gcc/asan.cc
@@ -1481,10 +1481,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 void
 asan_function_start (void)
 {
-  section *fnsec = function_section (current_function_decl);
-  switch_to_section (fnsec);
-  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC",
-current_function_funcdef_no);
+  ASM_OUTPUT_DEBUG_LABEL (asm_out_file, "LASANPC", 
current_function_funcdef_no);
 }
 
 /* Return number of shadow bytes that are occupied by a local variable
@@ -2006,6 +2003,7 @@ asan_emit_stack_protection (rtx base, rtx pbase, unsigned 
int alignb,
   DECL_INITIAL (decl) = decl;
   TREE_ASM_WRITTEN (decl) = 1;
   TREE_ASM_WRITTEN (id) = 1;
+  DECL_ALIGN_RAW (decl) = DECL_ALIGN_RAW (current_function_decl);
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
  gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38d515dac04..09fc2b63ee3 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1640,7 +1640,7 @@ ix86_asm_output_function_label (FILE *out_file, const 
char *fname,
   SUBTARGET_ASM_UNWIND_INIT (out_file);
 #endif
 
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
 
   /* Output magic byte marker, if hot-patch attribute is set.  */
   if (is_ms_hook)
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index a5c36b43972..c871a10506a 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -8323,7 +8323,7 @@ s390_asm_output_function_label (FILE *out_file, const 
char *fname,
   asm_fprintf (out_file, "\t# fn:%s wd%d\n", fname,
   s390_warn_dynamicstack_p);
 }
-  ASM_OUTPUT_LABEL (out_file, fname);
+  assemble_function_label_raw (out_file, fname);
   if (hw_after > 0)
 asm_fprintf (out_file,
 "\t# post-label NOPs for hotpatch (%d halfwords)\n",
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 6f095969410..b76734908cd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -150,7 +150,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 
 #ifndef ASM_OUTPUT_FUNCTION_LABEL
 #define ASM_OUTPUT_FUNCTION_LABEL(FILE, NAME, DECL) \
-  ASM_OUTPUT_LABEL ((FILE), (NAME))
+  assemble_function_label_raw ((FILE), (NAME))
 #endif
 
 /* Output the definition of a compiler-generated label named NAME.  */
diff --git a/gcc/final.cc b/gcc/final.cc
index e6f1b1e166b..5e21aedf8ed 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -1686,9 +1686,6 @@ final_start_function_1 (rtx_insn **firstp, FILE *file, 
int *seen,
 
   high_block_linenum = high_function_linenum = last_linenum;
 
-  if (flag_sanitize & SANITIZE_ADDRESS)
-asan_function_start ();
-
   rtx_insn *first = *firstp;
   if (in_initial_view_p (first))
 {
diff --git a/gcc/output.h b/gcc/output.h
index 76cfd58c1e6..bfdecc5ea74 100644
--- a/gcc/output.h
+++ b/gcc/output.h
@@ -178,6 +178,10 @@ extern void assemble_asm (tree);
 /* Get the function's name from a decl, as described by its RTL.  */
 extern const char *get_fnname_from_decl (tree);
 
+/* Output function label, possibly with accompanying metadata.  No additional
+   code or data is output after the label.  */
+extern void assemble_function_label_raw (FILE *, const char *);
+
 /* Output assembler code for the constant pool of a function and associated
with defining the name of the function.  DECL describes the function.
NAME is the function's name.  For the constant pool, we use the current
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 69f8f8ee018..d0d670d009c 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "toplev.h"
 #include "opts.h"
+#include "asan.h"
 
 /* The (assembler) name of the first globally-visible object output.  */
 extern GTY(()) const char *first_global_object_name;
@@ -1835,6 +1836,19 @@ get_fnname_from_decl (tree decl)
   return XSTR (x, 0);
 }
 
+/* Output function label, possibly with accompanying metadata.  No additional
+

Patch ping: Fix for PR 112560

2024-01-02 Thread Uros Bizjak

Hello!

I have sent an explanation of the ICE in try_combine on pr112494.c [1], and
an argument for why we can safely ignore non-COMPARISON_P
mode changes [2].

Can we proceed with the proposed solution?

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638726.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639137.html

Thanks,
Uros.

Re: [PATCH v4] RISC-V: Change csr_operand into vector_length_operand for vsetvl patterns.

2024-01-02 Thread Christoph Müllner

On Tue, Jan 2, 2024 at 2:35 AM juzhe.zh...@rivai.ai
 wrote:
>
> LGTM, assuming it has passed regression testing.

Committed.
I've rebased this patch, validated that there are no regressions with the patch,
and reworded the commit message a bit before that.

>
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Jun Sha (Joshua)
> Date: 2023-12-29 12:10
> To: gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; jeffreyalaw; 
> christoph.muellner; juzhe.zhong; Jun Sha (Joshua); Jin Ma; Xianmiao Qu
> Subject: [PATCH v4] RISC-V: Change csr_operand into vector_length_operand for 
> vsetvl patterns.
> This patch uses vector_length_operand instead of csr_operand for
> vsetvl patterns, so that changes for vector will not affect scalar
> patterns using csr_operand in riscv.md.
>
> gcc/ChangeLog:
>
> * config/riscv/vector.md:
> Use vector_length_operand for vsetvl patterns.
>
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
> gcc/config/riscv/vector.md | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index f607d768b26..b5a9055cdc4 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -1496,7 +1496,7 @@
> (define_insn "@vsetvl"
>[(set (match_operand:P 0 "register_operand" "=r")
> - (unspec:P [(match_operand:P 1 "csr_operand" "rK")
> + (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
>(match_operand 2 "const_int_operand" "i")
>(match_operand 3 "const_int_operand" "i")
>(match_operand 4 "const_int_operand" "i")
> @@ -1542,7 +1542,7 @@
> ;; in vsetvl instruction pattern.
> (define_insn "@vsetvl_discard_result"
>[(set (reg:SI VL_REGNUM)
> - (unspec:SI [(match_operand:P 0 "csr_operand" "rK")
> + (unspec:SI [(match_operand:P 0 "vector_length_operand" "rK")
> (match_operand 1 "const_int_operand" "i")
> (match_operand 2 "const_int_operand" "i")] UNSPEC_VSETVL))
> (set (reg:SI VTYPE_REGNUM)
> @@ -1564,7 +1564,7 @@
> ;; such pattern can allow us gain benefits of these optimizations.
> (define_insn_and_split "@vsetvl_no_side_effects"
>[(set (match_operand:P 0 "register_operand" "=r")
> - (unspec:P [(match_operand:P 1 "csr_operand" "rK")
> + (unspec:P [(match_operand:P 1 "vector_length_operand" "rK")
>(match_operand 2 "const_int_operand" "i")
>(match_operand 3 "const_int_operand" "i")
>(match_operand 4 "const_int_operand" "i")
> @@ -1608,7 +1608,7 @@
>[(set (match_operand:DI 0 "register_operand")
>  (sign_extend:DI
>(subreg:SI
> - (unspec:DI [(match_operand:P 1 "csr_operand")
> + (unspec:DI [(match_operand:P 1 "vector_length_operand")
> (match_operand 2 "const_int_operand")
> (match_operand 3 "const_int_operand")
> (match_operand 4 "const_int_operand")
> --
> 2.17.1
>
>

Re: [Patch] Fortran: Accept -std=f2023, update line-length for Fortran 2023

2024-01-02 Thread Harald Anlauf


On 02.01.24 at 20:37, Steve Kargl wrote:

On Tue, Jan 02, 2024 at 08:31:15PM +0100, Harald Anlauf wrote:


we might want to update changes.html to reflect this.  How about:

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 403feb06..9b16f5e3 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -284,6 +284,11 @@ a work-in-progress.

  Fortran
  
+  The compiler now accepts the -std=f2023 option, which
+has been added in preparation of support of Fortran 2023.  This option
+increases the line-length limit for source in free-form to 10,000, and
+statements may have up to 1 million characters.
+  
 With the -save-temps option, preprocessed files
  with the .fii extension will be generated from
  free-form source files such as .F90 and



LGTM.



Thanks, this is now pushed.

[PATCH] RISC-V: Implement ZACAS extensions

2024-01-02 Thread trdthg47

From: trdthg 

This patch adds support for the Zacas extension.
It includes the instructions' machine descriptions and built-in functions.
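
For readers unfamiliar with compare-and-swap, the operation that Zacas
provides in hardware can be modelled in portable C++ roughly as follows.
This is a sketch only: cas32 is a hypothetical helper written for this
note, it uses std::atomic rather than any built-in added by the patch, and
it makes no claim about which instructions a compiler would select for it.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Compare-and-swap on a 32-bit word: if MEM holds EXPECTED, store
// DESIRED; in both cases return the value that was in memory.
// cas32 is a hypothetical name, not a built-in from the patch.
std::uint32_t cas32(std::atomic<std::uint32_t> &mem, std::uint32_t expected,
                    std::uint32_t desired)
{
  // compare_exchange_strong writes the observed value back into EXPECTED
  // on failure and leaves it unchanged on success, so EXPECTED always
  // ends up holding the old memory contents.
  mem.compare_exchange_strong(expected, desired);
  return expected;
}
```

A caller can tell whether the swap happened by comparing the returned
value with the expected one it passed in.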

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_implied_info): Add zacas extensions.
(riscv_ext_version_table): Likewise.
* config/riscv/arch-canonicalize
(IMPLIED_EXT): Add zacas extensions.
* config/riscv/iterators.md
(SIDI): New iterator.
(SIDITI): Likewise.
(amocas): New attribute.
* config/riscv/riscv-builtins.cc
(AVAIL): Add new.
* config/riscv/riscv-ftypes.def: Add new type for zacas instructions.
* config/riscv/riscv-zacas.def: Add ZACAS extension's built-in function 
file.
* config/riscv/riscv.md: Add new type for zacas instructions.
* config/riscv/riscv.opt: Add introduction of riscv_zacas_subext.
* config/riscv/zacas.md: Add ZACAS extension's machine description file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zacas32.c: New test.
* gcc.target/riscv/zacas64.c: New test.
* gcc.target/riscv/zacas128.c: New test.

Signed-off-by: trdthg 
---
 gcc/common/config/riscv/riscv-common.cc   |  5 ++
 gcc/config/riscv/arch-canonicalize|  1 +
 gcc/config/riscv/iterators.md |  9 +++
 gcc/config/riscv/riscv-builtins.cc| 85 ++-
 gcc/config/riscv/riscv-ftypes.def |  3 +
 gcc/config/riscv/riscv-zacas.def  | 11 +++
 gcc/config/riscv/riscv.md |  5 +-
 gcc/config/riscv/riscv.opt|  2 +
 gcc/config/riscv/zacas.md | 52 ++
 gcc/testsuite/gcc.target/riscv/zacas128.c | 19 +
 gcc/testsuite/gcc.target/riscv/zacas32.c  | 34 +
 gcc/testsuite/gcc.target/riscv/zacas64.c  | 34 +
 12 files changed, 257 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-zacas.def
 create mode 100644 gcc/config/riscv/zacas.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/zacas128.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zacas32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zacas64.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index f20d179568d..14de9968c9e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -77,6 +77,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"f", "zicsr"},
   {"d", "zicsr"},
 
+  {"zacas", "a"},
+
   {"zdinx", "zfinx"},
   {"zfinx", "zicsr"},
   {"zdinx", "zicsr"},
@@ -251,6 +253,8 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"zacas", ISA_SPEC_CLASS_NONE, 1, 0},
+
   {"zba", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zbc", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1624,6 +1628,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
 
   {"zawrs", &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
+  {"zacas", &gcc_options::x_riscv_za_subext, MASK_ZACAS},
 
   {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
   {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
diff --git a/gcc/config/riscv/arch-canonicalize 
b/gcc/config/riscv/arch-canonicalize
index a8f47a1752b..616e0ed4726 100755
--- a/gcc/config/riscv/arch-canonicalize
+++ b/gcc/config/riscv/arch-canonicalize
@@ -41,6 +41,7 @@ LONG_EXT_PREFIXES = ['z', 's', 'h', 'x']
 IMPLIED_EXT = {
   "d" : ["f", "zicsr"],
   "f" : ["zicsr"],
+  "zacas" : ["a"],
   "zdinx" : ["zfinx", "zicsr"],
   "zfinx" : ["zicsr"],
   "zhinx" : ["zhinxmin", "zfinx", "zicsr"],
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index f332fba7031..b16b3892969 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -53,6 +53,12 @@ (define_mode_iterator SHORT [QI HI])
 ;; Iterator for HImode constant generation.
 (define_mode_iterator HISI [HI SI])
 
+;; Iterator for SImode and DImode constant generation.
+(define_mode_iterator SIDI [SI DI])
+
+;; Iterator for SImode, DImode and TImode constant generation.
+(define_mode_iterator SIDITI [SI DI TI])
+
 ;; Iterator for QImode extension patterns.
 (define_mode_iterator SUPERQI [HI SI (DI "TARGET_64BIT")])
 
@@ -113,6 +119,9 @@ (define_mode_attr ifmt [(SI "w") (DI "l")])
 ;; This attribute gives the format suffix for atomic memory operations.
 (define_mode_attr amo [(SI "w") (DI "d")])
 
+;; This attribute gives the format suffix for amocas operations.
+(define_mode_attr amocas [(SI "w") (DI "d") (TI "q")])
+
 ;; This attribute gives the upper-case mode name for one unit of a
 ;; floating-point mode.
 (define_mode_attr UNITMODE [(HF "HF") (SF "SF") (DF "DF")])
diff --git a/gcc/config/riscv/riscv-builtins.cc 
b/gcc/config/riscv/riscv-builtins.cc
index 5ee11ebe3bc..5074acfd69b 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-bui

Re: [PATCH] testsuite: Reduce gcc.dg/torture/inline-mem-cpy-1.c by 11 for simulators

2024-01-02 Thread Hans-Peter Nilsson

On Tue, 2 Jan 2024, Jeff Law wrote:
> 
> On 1/1/24 20:22, Hans-Peter Nilsson wrote:
> > Tested mmix-knuth-mmixware (where all torture-variants of
> > gcc.dg/torture/inline-mem-cpy-1.c now pass) and native
> > x86_64-pc-linux-gnu.  Also stepped through the test for native,
> > w/wo. RUN_FRACTION defined to see that it worked as intended.
> > 
> > You may wonder what about the "sibling" tests inline-mem-cmp-1.c and
> > inline-mem-cpy-cmp-1.c.  Well, they FAIL, but not because of
> > timeouts(!)  To be continued
> > 
> > Ok to commit?
> > 
> > Or, other suggestions?
> I'm pretty sure there's already a target selector for "simulator".  So you
> might be able to do this automagically with something like
> 
> dg-additional-options "-DRUN_FRACTION=11" { target { simulator } }
> 
> Or something close to that.

Hm...  But that's exactly what the one-line patch to 
gcc.dg/torture/inline-mem-cpy-1.c looked like, last in the 
submitted commit.  I had to double-check my sent-mail folder 
that I didn't miss that part. :)

I'm mostly worried about the patch to gcc.dg/memcpy-1.c.
Does that mean all-ok?

brgds, H-P
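
For readers without the original patch at hand, here is a minimal sketch of how a -DRUN_FRACTION=N knob could thin out a loop-heavy test. The submitted change is not quoted in this message, so the macro's exact semantics are an assumption (taken here to mean "run roughly one iteration in N"), and iterations_run is a hypothetical helper, not something from the testsuite. It is written in C++ for illustration.

```cpp
// Assumption: RUN_FRACTION=N means "run roughly one iteration in N".
// Without -DRUN_FRACTION=... every iteration runs, as on native targets.
#ifndef RUN_FRACTION
#define RUN_FRACTION 1
#endif

// Hypothetical stand-in for a test's main loop: returns how many of the
// TOTAL iterations actually execute the (omitted) test body.
inline int iterations_run(int total)
{
  int run = 0;
  for (int i = 0; i < total; ++i)
    {
      if (i % RUN_FRACTION != 0)
        continue;   // skipped when the test is thinned for simulators
      ++run;        // the real test body would go here
    }
  return run;
}
```

With the default of 1 nothing is skipped; compiling with -DRUN_FRACTION=11 would keep every eleventh iteration under this assumed scheme.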

Re: [PATCH] libstdc++: testsuite: reduce max_size_type.cc exec time [PR113175]

2024-01-02 Thread Jonathan Wakely

On Tue, 2 Jan 2024, 17:49 Patrick Palka,  wrote:

> Tested on x86_64-pc-linux-gnu, does this look OK for trunk and release
> branches (r14-205 was backported everywhere)?
>

Yes, thanks.



> -- >8 --
>
> The adjustment to max_size_type.cc in r14-205-g83470a5cd4c3d2
> inadvertently increased the execution time of the test by over 5x due to
> enabling the two main loops to actually run in the signed_p case instead
> of being dead code.  This suggests that the current range of the loop is
> far too big and the test too time consuming, especially when run on
> simulators.
>
> So this patch cuts the loop range by 10x as proposed in the PR.  This
> shouldn't significantly weaken the test since the same important edge
> cases are still checked in the new range.  On my x86_64 machine this
> reduces the test execution time by 10x, which is also 1.6x less time than
> it took before r14-205.
>
> PR testsuite/113175
>
> libstdc++-v3/ChangeLog:
>
> * testsuite/std/ranges/iota/max_size_type.cc (test02): Reduce
> 'limit' to 100 from 1000 and adjust 'log2_limit' accordingly.
> (test03): Likewise.
> ---
>  libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> index a1fbc3241dc..27f25c758fe 100644
> --- a/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/iota/max_size_type.cc
> @@ -199,8 +199,8 @@ test02()
>using max_type = std::conditional_t;
>using shorten_type = std::conditional_t;
>const int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
> -  const int limit = 1000;
> -  const int log2_limit = 10;
> +  const unsigned limit = 100;
> +  const int log2_limit = 7;
>static_assert((1 << log2_limit) >= limit);
>const int min = (signed_p ? -limit : 0);
>const int max = limit;
> @@ -257,8 +257,8 @@ test03()
>using max_type = std::conditional_t;
>using base_type = std::conditional_t;
>constexpr int hw_type_bit_size = sizeof(hw_type) * __CHAR_BIT__;
> -  constexpr int limit = 1000;
> -  constexpr int log2_limit = 10;
> +  constexpr unsigned limit = 100;
> +  constexpr int log2_limit = 7;
>static_assert((1 << log2_limit) >= limit);
>const int min = (signed_p ? -limit : 0);
>const int max = limit;
> --
> 2.43.0.232.ge79552d197
>
>
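
The relation between 'limit' and 'log2_limit' that the test's static_assert enforces can be checked in isolation. A small sketch follows; ceil_log2 is a hypothetical helper introduced only for illustration and is not part of the test.

```cpp
// Smallest n such that (1 << n) >= limit, i.e. the smallest log2_limit
// that satisfies the test's static_assert for a given limit.
constexpr int ceil_log2(unsigned limit)
{
  int n = 0;
  while ((1u << n) < limit)
    ++n;
  return n;
}

// The old and new pairs of constants from the patch.
static_assert(ceil_log2(1000) == 10, "1 << 10 == 1024 covers limit 1000");
static_assert(ceil_log2(100) == 7, "1 << 7 == 128 covers limit 100");
```

So 7 is the tightest value that still satisfies the assertion for limit == 100, since 1 << 6 is only 64.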

c++/modules: Emit definitions of ODR-used static members imported from modules [PR112899]

2024-01-02 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Static data members marked 'inline' should be emitted in TUs where they
are ODR-used.  We need to make sure that statics imported from modules
are correctly added to the 'pending_statics' map so that they get
emitted if needed, otherwise the attached testcase fails to link.

PR c++/112899

gcc/cp/ChangeLog:

* cp-tree.h (note_variable_template_instantiation): Rename to...
(note_static_storage_variable): ...this.
* decl2.cc (note_variable_template_instantiation): Rename to...
(note_static_storage_variable): ...this.
* pt.cc (instantiate_decl): Rename usage of above function.
* module.cc (trees_in::read_var_def): Remember pending statics
that we stream in.

gcc/testsuite/ChangeLog:

* g++.dg/modules/init-4_a.C: New test.
* g++.dg/modules/init-4_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h                        |  2 +-
 gcc/cp/decl2.cc                         |  4 ++--
 gcc/cp/module.cc                        |  4 ++++
 gcc/cp/pt.cc                            |  2 +-
 gcc/testsuite/g++.dg/modules/init-4_a.C |  9 +++++++++
 gcc/testsuite/g++.dg/modules/init-4_b.C | 11 +++++++++++
 6 files changed, 28 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/init-4_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/init-4_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 1979572c365..ebd2850599a 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7113,7 +7113,7 @@ extern tree maybe_get_tls_wrapper_call(tree);
 extern void mark_needed(tree);
 extern bool decl_needed_p  (tree);
 extern void note_vague_linkage_fn  (tree);
-extern void note_variable_template_instantiation (tree);
+extern void note_static_storage_variable   (tree);
 extern tree build_artificial_parm  (tree, tree, tree);
 extern bool possibly_inlined_p (tree);
 extern int parm_index   (tree);
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 0850d3f5bce..241216b0dfe 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -910,10 +910,10 @@ note_vague_linkage_fn (tree decl)
   vec_safe_push (deferred_fns, decl);
 }
 
-/* As above, but for variable template instantiations.  */
+/* As above, but for variables with static storage duration.  */
 
 void
-note_variable_template_instantiation (tree decl)
+note_static_storage_variable (tree decl)
 {
   vec_safe_push (pending_statics, decl);
 }
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 0bd46414da9..14818131a70 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11752,6 +11752,10 @@ trees_in::read_var_def (tree decl, tree maybe_template)
  DECL_INITIALIZED_P (decl) = true;
  if (maybe_dup && DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (maybe_dup))
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = true;
+ if (DECL_CONTEXT (decl)
+ && RECORD_OR_UNION_TYPE_P (DECL_CONTEXT (decl))
+ && !DECL_TEMPLATE_INFO (decl))
+   note_static_storage_variable (decl);
}
   DECL_INITIAL (decl) = init;
   if (!dyn_init)
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f7063e09581..ce498750758 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -27150,7 +27150,7 @@ instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
 {
   set_instantiating_module (d);
   if (variable_template_p (gen_tmpl))
-   note_variable_template_instantiation (d);
+   note_static_storage_variable (d);
   instantiate_body (td, args, d, false);
 }
 
diff --git a/gcc/testsuite/g++.dg/modules/init-4_a.C b/gcc/testsuite/g++.dg/modules/init-4_a.C
new file mode 100644
index 00000000000..e0eb97b474e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/init-4_a.C
@@ -0,0 +1,9 @@
+// PR c++/112899
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M }
+
+export module M;
+
+export struct A {
+  static constexpr int x = -1;
+};
diff --git a/gcc/testsuite/g++.dg/modules/init-4_b.C b/gcc/testsuite/g++.dg/modules/init-4_b.C
new file mode 100644
index 00000000000..d28017a1d14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/init-4_b.C
@@ -0,0 +1,11 @@
+// PR c++/112899
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import M;
+
+int main() {
+  const int& x = A::x;
+  if (x != -1)
+    __builtin_abort();
+}
-- 
2.43.0
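
For reference, the ODR-use that the testcase relies on can be seen without modules. This is a single-file sketch, assuming C++17 or later (where a static constexpr data member is implicitly inline); odr_use_x and value_only are hypothetical names used only for illustration.

```cpp
struct A {
  static constexpr int x = -1;   // implicitly inline since C++17
};

// Binding a reference to A::x odr-uses it: the program now needs a real
// object with an address, so some TU has to emit the definition.
inline const int& odr_use_x() { return A::x; }

// Reading only the value does not odr-use A::x; the constant can be
// folded and no definition needs to be emitted for this function.
constexpr int value_only() { return A::x; }
```

The module testcase does the same reference binding in main, which is why a missing definition shows up as a link failure rather than a compile error.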

[PATCH] c++/modules: Fix ICE when writing nontrivial variable initializers

2024-01-02 Thread Nathaniel Shead

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The attached testcase Patrick found in PR c++/112899 ICEs because it is
attempting to write a variable initializer that is no longer in the
static_aggregates map.

The issue is that, for non-header modules, the loop in
c_parse_final_cleanups prunes the static_aggregates list, which means
that by the time we get to emitting module information those
initialisers have been lost.

However, we don't actually need to write non-trivial initialisers for
non-header modules, because they've already been emitted as part of the
module TU itself.  Instead let's just only write the initializers from
header modules (which skipped writing them in c_parse_final_cleanups).

gcc/cp/ChangeLog:

* module.cc (trees_out::write_var_def): Only write initializers
in header modules.

gcc/testsuite/ChangeLog:

* g++.dg/modules/init-5_a.C: New test.
* g++.dg/modules/init-5_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc                        |  3 ++-
 gcc/testsuite/g++.dg/modules/init-5_a.C |  9 +++++++++
 gcc/testsuite/g++.dg/modules/init-5_b.C | 10 ++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/init-5_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/init-5_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 14818131a70..82b61a2c2ad 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -11707,7 +11707,8 @@ trees_out::write_var_def (tree decl)
 {
   tree dyn_init = NULL_TREE;
 
-  if (DECL_NONTRIVIALLY_INITIALIZED_P (decl))
+  /* We only need to write initializers in header modules.  */
+  if (header_module_p () && DECL_NONTRIVIALLY_INITIALIZED_P (decl))
{
  dyn_init = value_member (decl,
   CP_DECL_THREAD_LOCAL_P (decl)
diff --git a/gcc/testsuite/g++.dg/modules/init-5_a.C b/gcc/testsuite/g++.dg/modules/init-5_a.C
new file mode 100644
index 00000000000..466b120b5a0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/init-5_a.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi M }
+
+export module M;
+
+export struct A {
+  static int f() { return -1; }
+  static inline int x = f();
+};
diff --git a/gcc/testsuite/g++.dg/modules/init-5_b.C b/gcc/testsuite/g++.dg/modules/init-5_b.C
new file mode 100644
index 00000000000..40973cc6936
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/init-5_b.C
@@ -0,0 +1,10 @@
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import M;
+
+int main() {
+  const int& x = A::x;
+  if (x != -1)
+    __builtin_abort();
+}
-- 
2.43.0
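
As a non-module illustration of what a "nontrivial" initializer means here: the member below needs code to run at startup, which is the work that has to be emitted exactly once somewhere. This is only a sketch; Counter is a hypothetical helper added so that the run of the initializer can be observed.

```cpp
// Hypothetical helper that records how often the initializer ran.
struct Counter {
  static inline int calls = 0;            // constant-initialized
  static int next() { return ++calls; }
};

struct A {
  // Not a constant expression: the value is produced by running
  // Counter::next() during dynamic initialization.
  static inline int x = Counter::next();
};
```

By the time the variable is used, A::x is 1 and the initializer has run once, however many translation units include the definition.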

Re: [PATCH] aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]

2024-01-02 Thread Richard Sandiford

Andrew Pinski  writes:
> Ccmp is not used if the result of the and/ior is used by both
> a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation
> here by using ccmp in this case.
> Two changes are required.  First, we need to allow the outer statement's
> result to be used more than once.
> The second change is that during the expansion of the gimple, we need
> to try using ccmp.  This is needed because we don't expand the SSA
> name of the lhs but rather expand directly from the gimple.
>
> A small note on the ccmp_4.c testcase, we should be able to get slightly
> better than with this patch but it is one extra instruction compared to
> before.
>
> Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
>   PR target/100942
>
> gcc/ChangeLog:
>
>   * ccmp.cc (ccmp_candidate_p): Add outer argument.
>   Allow if the outer is true and the lhs is used more
>   than once.
>   (expand_ccmp_expr): Update call to ccmp_candidate_p.
>   * cfgexpand.cc (expand_gimple_stmt_1): Try using ccmp
>   for binary assignments.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/ccmp_3.c: New test.
>   * gcc.target/aarch64/ccmp_4.c: New test.
>
> Signed-off-by: Andrew Pinski 

TBH I was hoping someone more familiar with the code would comment,
since it looks like the current ccmp decision might have been deliberate.
I agree it doesn't seem to make much sense though.  It's not specific
to a use in a GIMPLE_ASSIGN + GIMPLE_COND, it applies even to uses
in two GIMPLE_ASSIGNs.  E.g.:

int f1(int a, int b, _Bool *x)
{
  x[0] = x[1] = a == 0 || b == 0;
}

int f2(int a, int b, int *x)
{
  x[0] = x[1] = a == 0 || b == 0;
}

produces:

f1:
        cmp     w0, 0
        cset    w3, eq
        cmp     w1, 0
        cset    w1, eq
        orr     w1, w3, w1
        strb    w1, [x2]
        strb    w1, [x2, 1]
        ret

f2:
        cmp     w0, 0
        ccmp    w1, 0, 4, ne
        cset    w1, eq
        stp     w1, w1, [x2]
        ret

because f2 has a cast to int (making the || single-use) whereas f1
doesn't.  Might be nice to include that as a ccmp_5.c.
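
A possible starting point for such a test, shown here as C++ rather than the C a ccmp_5.c would use. This is only a sketch: the void return types are a deliberate change from the snippets above so that both functions are well-formed without a return statement, and the dg directives are left out because the exact scan-assembler patterns would have to be checked against real compiler output.

```cpp
// Both functions store the result of a two-comparison "or" twice.
// f1 stores it as bool (the || result is used twice directly);
// f2 stores it as int (the conversion makes the || single-use).
void f1(int a, int b, bool *x)
{
  x[0] = x[1] = a == 0 || b == 0;
}

void f2(int a, int b, int *x)
{
  x[0] = x[1] = a == 0 || b == 0;
}
```

The two functions compute the same value, so any difference in the generated code comes only from how many times the || result is used.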

> ---
>  gcc/ccmp.cc   |  9 +++---
>  gcc/cfgexpand.cc  | 25 
>  gcc/testsuite/gcc.target/aarch64/ccmp_3.c | 20 +
>  gcc/testsuite/gcc.target/aarch64/ccmp_4.c | 35 +++
>  4 files changed, 85 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_3.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ccmp_4.c
>
> diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> index 1bd6fadea35..a274f8c3d53 100644
> --- a/gcc/ccmp.cc
> +++ b/gcc/ccmp.cc
> @@ -92,7 +92,7 @@ ccmp_tree_comparison_p (tree t, basic_block bb)
>  
>  /* Check whether G is a potential conditional compare candidate.  */
>  static bool
> -ccmp_candidate_p (gimple *g)
> +ccmp_candidate_p (gimple *g, bool outer = false)

The function comment should describe the new parameter.

>  {
>tree lhs, op0, op1;
>gimple *gs0, *gs1;
> @@ -109,8 +109,9 @@ ccmp_candidate_p (gimple *g)
>lhs = gimple_assign_lhs (g);
>op0 = gimple_assign_rhs1 (g);
>op1 = gimple_assign_rhs2 (g);
> -  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
> -  || !has_single_use (lhs))
> +  if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME))
> +return false;
> +  if (!outer && !has_single_use (lhs))
>  return false;
>  
>bb = gimple_bb (g);
> @@ -284,7 +285,7 @@ expand_ccmp_expr (gimple *g, machine_mode mode)
>rtx_insn *last;
>rtx tmp;
>  
> -  if (!ccmp_candidate_p (g))
> +  if (!ccmp_candidate_p (g, true))
>  return NULL_RTX;
>  
>last = get_last_insn ();
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index b860be8bb77..0f9aad8e3eb 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "ccmp.h"
>  
>  /* Some systems use __main in a way incompatible with its use in gcc, in these
>     cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
> @@ -3972,6 +3973,30 @@ expand_gimple_stmt_1 (gimple *stmt)
>   if (GET_CODE (target) == SUBREG && SUBREG_PROMOTED_VAR_P (target))
> promoted = true;
>  
> + /* Try to expand conditional compare.  */
> + if (targetm.gen_ccmp_first
> + && gimple_assign_rhs_class (assign_stmt) == GIMPLE_BINARY_RHS)
> +   {
> + machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> + gcc_checking_assert (targetm.gen_ccmp_next != NULL);
> + temp = expand_ccmp_expr (stmt, mode);
> + if (temp)
> +   {
> + if (promoted)
> +   {
> + int unsignedp = SUBREG_PROMOTED_SIGN (target);
> + convert_move (SUBREG_REG (target),

[PATCH] c++/modules: Emit definitions of ODR-used static members imported from modules [PR112899]

2024-01-02 Thread Nathaniel Shead

(Whoops, forgot the '[PATCH]', fixed the subject in email.)

On Wed, Jan 03, 2024 at 09:40:55AM +1100, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> Static data members marked 'inline' should be emitted in TUs where they
> are ODR-used.  We need to make sure that statics imported from modules
> are correctly added to the 'pending_statics' map so that they get
> emitted if needed, otherwise the attached testcase fails to link.
> 
>   PR c++/112899
> 
> gcc/cp/ChangeLog:
> 
>   * cp-tree.h (note_variable_template_instantiation): Rename to...
>   (note_static_storage_variable): ...this.
>   * decl2.cc (note_variable_template_instantiation): Rename to...
>   (note_static_storage_variable): ...this.
>   * pt.cc (instantiate_decl): Rename usage of above function.
>   * module.cc (trees_in::read_var_def): Remember pending statics
>   that we stream in.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/init-4_a.C: New test.
>   * g++.dg/modules/init-4_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/cp-tree.h                        |  2 +-
>  gcc/cp/decl2.cc                         |  4 ++--
>  gcc/cp/module.cc                        |  4 ++++
>  gcc/cp/pt.cc                            |  2 +-
>  gcc/testsuite/g++.dg/modules/init-4_a.C |  9 +++++++++
>  gcc/testsuite/g++.dg/modules/init-4_b.C | 11 +++++++++++
>  6 files changed, 28 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/init-4_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/init-4_b.C
> 
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 1979572c365..ebd2850599a 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -7113,7 +7113,7 @@ extern tree maybe_get_tls_wrapper_call  (tree);
>  extern void mark_needed  (tree);
>  extern bool decl_needed_p(tree);
>  extern void note_vague_linkage_fn(tree);
> -extern void note_variable_template_instantiation (tree);
> +extern void note_static_storage_variable (tree);
>  extern tree build_artificial_parm(tree, tree, tree);
>  extern bool possibly_inlined_p   (tree);
>  extern int parm_index   (tree);
> diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
> index 0850d3f5bce..241216b0dfe 100644
> --- a/gcc/cp/decl2.cc
> +++ b/gcc/cp/decl2.cc
> @@ -910,10 +910,10 @@ note_vague_linkage_fn (tree decl)
>vec_safe_push (deferred_fns, decl);
>  }
>  
> -/* As above, but for variable template instantiations.  */
> +/* As above, but for variables with static storage duration.  */
>  
>  void
> -note_variable_template_instantiation (tree decl)
> +note_static_storage_variable (tree decl)
>  {
>vec_safe_push (pending_statics, decl);
>  }
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 0bd46414da9..14818131a70 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -11752,6 +11752,10 @@ trees_in::read_var_def (tree decl, tree maybe_template)
> DECL_INITIALIZED_P (decl) = true;
> if (maybe_dup && DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (maybe_dup))
>   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = true;
> +   if (DECL_CONTEXT (decl)
> +   && RECORD_OR_UNION_TYPE_P (DECL_CONTEXT (decl))
> +   && !DECL_TEMPLATE_INFO (decl))
> + note_static_storage_variable (decl);
>   }
>DECL_INITIAL (decl) = init;
>if (!dyn_init)
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index f7063e09581..ce498750758 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -27150,7 +27150,7 @@ instantiate_decl (tree d, bool defer_ok, bool expl_inst_class_mem_p)
>  {
>set_instantiating_module (d);
>if (variable_template_p (gen_tmpl))
> - note_variable_template_instantiation (d);
> + note_static_storage_variable (d);
>instantiate_body (td, args, d, false);
>  }
>  
> diff --git a/gcc/testsuite/g++.dg/modules/init-4_a.C b/gcc/testsuite/g++.dg/modules/init-4_a.C
> new file mode 100644
> index 00000000000..e0eb97b474e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/init-4_a.C
> @@ -0,0 +1,9 @@
> +// PR c++/112899
> +// { dg-additional-options "-fmodules-ts" }
> +// { dg-module-cmi M }
> +
> +export module M;
> +
> +export struct A {
> +  static constexpr int x = -1;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/init-4_b.C b/gcc/testsuite/g++.dg/modules/init-4_b.C
> new file mode 100644
> index 00000000000..d28017a1d14
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/init-4_b.C
> @@ -0,0 +1,11 @@
> +// PR c++/112899
> +// { dg-module-do run }
> +// { dg-additional-options "-fmodules-ts" }
> +
> +import M;
> +
> +int main() {
> +  const int& x = A::x;
> +  if (x != -1)
> +    __builtin_abort();
> +}
> -- 
> 2.43.0
>

Re: [PATCH] c++/modules: Prevent overwriting arguments when merging duplicates [PR112588]

2024-01-02 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637768.html.

On Sat, Dec 16, 2023 at 09:50:10PM +1100, Nathaniel Shead wrote:
> Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637768.html.
> (I've changed the summary message a little from that email but the patch
> is otherwise unchanged.)
> 
> On Wed, Nov 22, 2023 at 10:33:15PM +1100, Nathaniel Shead wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu. I don't have write
> > access.
> > 
> > -- >8 --
> > 
> > When merging duplicate instantiations of function templates, currently
> > > read_function_def overwrites the arguments with those of the existing
> > duplicate. This is problematic, however, since this means that the
> > PARM_DECLs in the body of the function definition no longer match with
> > the PARM_DECLs in the argument list, which causes issues when it comes
> > to generating RTL.
> > 
> > There doesn't seem to be any reason to do this replacement, so this
> > patch removes that logic.
> > 
> > PR c++/112588
> > 
> > gcc/cp/ChangeLog:
> > 
> > * module.cc (trees_in::read_function_def): Don't overwrite
> > arguments.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/merge-16.h: New test.
> > * g++.dg/modules/merge-16_a.C: New test.
> > * g++.dg/modules/merge-16_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> > >  gcc/cp/module.cc                          |  2 --
> > >  gcc/testsuite/g++.dg/modules/merge-16.h   | 10 ++++++++++
> > >  gcc/testsuite/g++.dg/modules/merge-16_a.C |  7 +++++++
> > >  gcc/testsuite/g++.dg/modules/merge-16_b.C |  5 +++++
> >  4 files changed, 22 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.dg/modules/merge-16.h
> >  create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_a.C
> >  create mode 100644 gcc/testsuite/g++.dg/modules/merge-16_b.C
> > 
> > diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> > index 4f5b6e2747a..2520ab659cc 100644
> > --- a/gcc/cp/module.cc
> > +++ b/gcc/cp/module.cc
> > @@ -11665,8 +11665,6 @@ trees_in::read_function_def (tree decl, tree 
> > maybe_template)
> >DECL_RESULT (decl) = result;
> >DECL_INITIAL (decl) = initial;
> >DECL_SAVED_TREE (decl) = saved;
> > -  if (maybe_dup)
> > -   DECL_ARGUMENTS (decl) = DECL_ARGUMENTS (maybe_dup);
> >  
> >if (context)
> > SET_DECL_FRIEND_CONTEXT (decl, context);
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16.h b/gcc/testsuite/g++.dg/modules/merge-16.h
> > > new file mode 100644
> > > index 00000000000..fdb38551103
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16.h
> > @@ -0,0 +1,10 @@
> > +// PR c++/112588
> > +
> > +void f(int*);
> > +
> > +template 
> > +struct S {
> > +  void g(int n) { f(&n); }
> > +};
> > +
> > +template struct S;
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_a.C b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > > new file mode 100644
> > > index 00000000000..c243224c875
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16_a.C
> > @@ -0,0 +1,7 @@
> > +// PR c++/112588
> > +// { dg-additional-options "-fmodules-ts" }
> > +// { dg-module-cmi merge16 }
> > +
> > +module;
> > +#include "merge-16.h"
> > +export module merge16;
> > > diff --git a/gcc/testsuite/g++.dg/modules/merge-16_b.C b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > > new file mode 100644
> > > index 00000000000..8c7b1f0511f
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/modules/merge-16_b.C
> > @@ -0,0 +1,5 @@
> > +// PR c++/112588
> > +// { dg-additional-options "-fmodules-ts" }
> > +
> > +#include "merge-16.h"
> > +import merge16;
> > -- 
> > 2.42.0
> >

Re: [PATCH v2] c++: Follow module grammar more closely [PR110808]

2024-01-02 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638089.html.

On Fri, Nov 24, 2023 at 10:32:13PM +1100, Nathaniel Shead wrote:
> On Thu, Nov 23, 2023 at 12:11:58PM -0500, Nathan Sidwell wrote:
> > On 11/14/23 01:24, Nathaniel Shead wrote:
> > > I'll also note that the comments above the parsing functions here no
> > > longer exactly match with the grammar in the standard, should they be
> > > updated as well?
> > 
> > please.
> > 
> 
> As I was attempting to rewrite the comments I ended up splitting up the
> work that was being done by cp_parser_module_name a lot to better match
> the grammar, and also caught a few other segfaults that were occurring
> along the way.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> This patch cleans up the parsing of module-declarations and
> import-declarations to more closely follow the grammar defined by the
> standard.
> 
> For instance, currently we allow declarations like 'import A:B', even
> from an unrelated source file (not part of module A), which causes
> errors in merging declarations. However, the syntax in [module.import]
> doesn't even allow this form of import, so this patch prevents this from
> parsing at all and avoids the error that way.
> 
> Additionally, we sometimes allow statements like 'import :X' or
> 'module :X' even when not in a named module, and this causes segfaults,
> so we disallow this too.
> 
>   PR c++/110808
> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_parser_module_name): Rewrite to handle
>   module-names and module-partitions independently.
>   (cp_parser_module_partition): New function.
>   (cp_parser_module_declaration): Parse module partitions
>   explicitly. Don't change state if parsing module decl failed.
>   (cp_parser_import_declaration): Handle different kinds of
>   import-declarations locally.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/part-hdr-1_c.C: Fix syntax.
>   * g++.dg/modules/part-mac-1_c.C: Likewise.
>   * g++.dg/modules/mod-invalid-1.C: New test.
>   * g++.dg/modules/part-8_a.C: New test.
>   * g++.dg/modules/part-8_b.C: New test.
>   * g++.dg/modules/part-8_c.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/parser.cc | 100 ---
>  gcc/testsuite/g++.dg/modules/mod-invalid-1.C |   7 ++
>  gcc/testsuite/g++.dg/modules/part-8_a.C  |   6 ++
>  gcc/testsuite/g++.dg/modules/part-8_b.C  |   6 ++
>  gcc/testsuite/g++.dg/modules/part-8_c.C  |   8 ++
>  gcc/testsuite/g++.dg/modules/part-hdr-1_c.C  |   2 +-
>  gcc/testsuite/g++.dg/modules/part-mac-1_c.C  |   2 +-
>  7 files changed, 95 insertions(+), 36 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/mod-invalid-1.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/part-8_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/part-8_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/part-8_c.C
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index f6d088bc73f..20bd8d45a08 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -14853,58 +14853,64 @@ cp_parser_already_scoped_statement (cp_parser* parser, bool *if_p,
>  
>  /* Modules */
>  
> -/* Parse a module-name,
> -   identifier
> -   module-name . identifier
> -   header-name
> +/* Parse a module-name or module-partition.
>  
> -   Returns a pointer to module object, NULL.   */
> +   module-name:
> + module-name-qualifier [opt] identifier
>  
> -static module_state *
> -cp_parser_module_name (cp_parser *parser)
> -{
> -  cp_token *token = cp_lexer_peek_token (parser->lexer);
> -  if (token->type == CPP_HEADER_NAME)
> -{
> -  cp_lexer_consume_token (parser->lexer);
> +   module-partition:
> + : module-name-qualifier [opt] identifier
>  
> -  return get_module (token->u.value);
> -}
> +   module-name-qualifier:
> + identifier .
> + module-name-qualifier identifier . 
>  
> -  module_state *parent = NULL;
> -  bool partitioned = false;
> -  if (token->type == CPP_COLON && named_module_p ())
> -{
> -  partitioned = true;
> -  cp_lexer_consume_token (parser->lexer);
> -}
> +   Returns a pointer to the module object, or NULL on failure.
> +   For PARTITION_P, PARENT is the module this is a partition of.  */
> +
> +static module_state *
> +cp_parser_module_name (cp_parser *parser, bool partition_p = false,
> +module_state *parent = NULL)
> +{
> +  if (partition_p
> +  && cp_lexer_consume_token (parser->lexer)->type != CPP_COLON)
> +return NULL;
>  
>for (;;)
>  {
>if (cp_lexer_peek_token (parser->lexer)->type != CPP_NAME)
>   {
> -   cp_parser_error (parser, "expected module-name");
> -   break;
> +   if (partition_p)
> + cp_parser_error (parser, "expected module-partition");
> +   else
> + cp_parser_error (parser, "expected module-name");
> +   return NULL;
>   }
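
The grammar quoted in the new function comment can be exercised on plain strings. The following is a rough sketch only: it works on text rather than on the preprocessor tokens the real parser sees, accepts ASCII identifiers only, and every function name in it is hypothetical.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits "a.b.c" into identifiers, rejecting empty
// components and characters that cannot appear in an (ASCII) identifier.
// This covers "module-name-qualifier [opt] identifier".
inline std::optional<std::vector<std::string>>
parse_dotted_name(std::string_view text)
{
  std::vector<std::string> parts;
  std::string current;
  for (char c : text)
    {
      const unsigned char uc = static_cast<unsigned char>(c);
      if (c == '.')
        {
          if (current.empty())
            return std::nullopt;          // ".a" or "a..b"
          parts.push_back(current);
          current.clear();
        }
      else if (std::isalpha(uc) || c == '_'
               || (!current.empty() && std::isdigit(uc)))
        current.push_back(c);
      else
        return std::nullopt;              // e.g. ':' inside a name
    }
  if (current.empty())
    return std::nullopt;                  // "" or a trailing '.'
  parts.push_back(current);
  return parts;
}

// module-name: module-name-qualifier [opt] identifier
inline bool is_module_name(std::string_view text)
{
  return parse_dotted_name(text).has_value();
}

// module-partition: ':' module-name-qualifier [opt] identifier
inline bool is_module_partition(std::string_view text)
{
  return !text.empty() && text.front() == ':'
         && parse_dotted_name(text.substr(1)).has_value();
}
```

Under this grammar "A:B" is neither a module-name nor a module-partition on its own, which matches the commit message's point that 'import A:B' should not parse as an import-declaration.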

Re: [PATCH] c++/modules: Prevent treating suppressed debug info as extern template [PR112820]

2024-01-02 Thread Nathaniel Shead

Ping for https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639082.html.

On Sun, Dec 03, 2023 at 11:46:36PM +1100, Nathaniel Shead wrote:
> Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> 
> -- >8 --
> 
> The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same
> underlying bit. This is causing confusion when attempting to determine
> the interface for a streamed-in class type, since the modules code
> currently assumes that all DECL_EXTERNAL types are extern templates.
> However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence
> DECL_EXTERNAL) is marked on various other kinds of declarations, such as
> vtables, which causes them to never be emitted.
> 
> This patch constrains the checks for DECL_EXTERNAL for this to only
> consider template instantiations, thus avoiding the issue.
> 
>   PR c++/102607
>   PR c++/112820
> 
> gcc/cp/ChangeLog:
> 
>   * module.cc (trees_in::read_class_def): Only set interface for
>   template instantiations.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/debug-2_a.C: New test.
>   * g++.dg/modules/debug-2_b.C: New test.
>   * g++.dg/modules/debug-2_c.C: New test.
>   * g++.dg/modules/debug-3_a.C: New test.
>   * g++.dg/modules/debug-3_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/module.cc                         | 4 +++-
>  gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 +++++++++
>  gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 ++++++++
>  gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 +++++++++
>  gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 ++++++++
>  gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 +++++++++
>  6 files changed, 46 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C
> 
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 33fcf396875..257f39421d0 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -12041,7 +12041,9 @@ trees_in::read_class_def (tree defn, tree maybe_template)
>bool installing = maybe_dup && !TYPE_SIZE (type);
>if (installing)
>  {
> -  if (DECL_EXTERNAL (defn) && TYPE_LANG_SPECIFIC (type))
> +  if (DECL_EXTERNAL (defn)
> +   && TYPE_LANG_SPECIFIC (type)
> +   && CLASSTYPE_TEMPLATE_INSTANTIATION (type))
>   {
> /* We don't deal with not-really-extern, because, for a
>module you want the import to be the interface, and for a
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> new file mode 100644
> index 00000000000..eed0905542b
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C
> @@ -0,0 +1,9 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +// { dg-module-cmi io }
> +
> +export module io;
> +
> +export struct error {
> +  virtual const char* what() const noexcept;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> new file mode 100644
> index 00000000000..fc9afbc02e0
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C
> @@ -0,0 +1,8 @@
> +// PR c++/112820
> +// { dg-additional-options "-fmodules-ts -g" }
> +
> +module io;
> +
> +const char* error::what() const noexcept {
> +  return "bla";
> +}
> diff --git a/gcc/testsuite/g++.dg/modules/debug-2_c.C b/gcc/testsuite/g++.dg/modules/debug-2_c.C
> new file mode 100644
> index 00000000000..37117f69dcd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-2_c.C
> @@ -0,0 +1,9 @@
> +// PR c++/112820
> +// { dg-module-do link }
> +// { dg-additional-options "-fmodules-ts -g" }
> +
> +import io;
> +
> +int main() {
> +  error{};
> +}
> diff --git a/gcc/testsuite/g++.dg/modules/debug-3_a.C b/gcc/testsuite/g++.dg/modules/debug-3_a.C
> new file mode 100644
> index 00000000000..9e33d8260fd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-3_a.C
> @@ -0,0 +1,8 @@
> +// PR c++/102607
> +// { dg-additional-options "-fmodules-ts -g" }
> +// { dg-module-cmi mod }
> +
> +export module mod;
> +export struct B {
> +  virtual ~B() = default;
> +};
> diff --git a/gcc/testsuite/g++.dg/modules/debug-3_b.C b/gcc/testsuite/g++.dg/modules/debug-3_b.C
> new file mode 100644
> index 00000000000..03c78b71b5d
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/debug-3_b.C
> @@ -0,0 +1,9 @@
> +// PR c++/102607
> +// { dg-module-do link }
> +// { dg-additional-options "-fmodules-ts -g" }
> +
> +import mod;
> +int main() {
> +  struct D : B {};
> +  (void)D{};
> +}
> -- 
> 2.42.0
>

Re: [PATCH 4/4] libbacktrace: get debug information for loaded dlls

2024-01-02 Thread Björn Schäpers


On 30.11.2023 at 20:53, Ian Lance Taylor wrote:

On Fri, Jan 20, 2023 at 2:55 AM Björn Schäpers  wrote:


From: Björn Schäpers 

Fixes https://github.com/ianlancetaylor/libbacktrace/issues/53, except
that libraries loaded after the backtrace_initialize are not handled.
But as far as I can see that's the same for elf.


Thanks, but I don't want a patch that loops using goto statements.
Please rewrite to avoid that.  It may be simpler to call a function.

Also starting with a module count of 1000 seems like a lot.  Do
typical Windows programs load that many modules?

Ian




Rewritten using a function.

If that is committed, could you attribute that commit to me (--author="Björn 
Schäpers ")?


Thanks and kind regards,
Björn.
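
For readers skimming the thread: the rewritten get_all_modules in the patch
below is a grow-and-retry loop.  A minimal portable sketch of that pattern
follows.  It assumes a hypothetical enumerate callback standing in for
EnumProcessModules, and plain malloc/free standing in for
backtrace_alloc/backtrace_free; get_all_items and enumerate_five are
illustrative names, not part of libbacktrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EnumProcessModules: copies at most CAPACITY
   items into BUF and stores the total number available in *NEEDED.
   Returns 0 on failure, nonzero on success.  */
typedef int (*enumerate_fn) (int *buf, size_t capacity, size_t *needed);

/* Allocate a buffer, enumerate into it, and retry with a larger buffer
   until everything fits.  On success, return the buffer (release it with
   free) and store the item count in *COUNT; on failure, return NULL.
   INITIAL_CAPACITY is only a first guess.  */
static int *
get_all_items (enumerate_fn enumerate, size_t initial_capacity, size_t *count)
{
  size_t capacity = initial_capacity;

  assert (enumerate != NULL && count != NULL && initial_capacity > 0);

  for (;;)
    {
      size_t needed = 0;
      int *buf = malloc (capacity * sizeof *buf);

      if (buf == NULL)
        return NULL;

      if (!enumerate (buf, capacity, &needed))
        {
          free (buf);
          return NULL;
        }

      if (needed <= capacity)
        {
          *count = needed;
          return buf;
        }

      /* Too small: free and retry with a little slack, in case the set
         grows between the two calls.  */
      free (buf);
      capacity = needed + 2;
    }
}

/* Example enumerator for illustration: pretends there are five items,
   numbered 0 to 4.  */
static int
enumerate_five (int *buf, size_t capacity, size_t *needed)
{
  size_t i;

  for (i = 0; i < 5 && i < capacity; i++)
    buf[i] = (int) i;
  *needed = 5;
  return 1;
}
```

Because the loop sizes the retry from what the enumerator reports, the
initial guess only affects how often a second pass is needed, which is why
a smaller starting count than 1000 costs little.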
From bd552716ee7937cad9d54d4966532d6ea6dbc1bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bj=C3=B6rn=20Sch=C3=A4pers?= 
Date: Sun, 30 Apr 2023 23:54:32 +0200
Subject: [PATCH] libbacktrace: get debug information for loaded dlls
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixes https://github.com/ianlancetaylor/libbacktrace/issues/53, except
that libraries loaded after the backtrace_initialize are not handled.
But as far as I can see that's the same for elf.

Tested on x86_64-linux and i686-w64-mingw32.

-- >8 --

* pecoff.c (coff_add): New argument for the module handle of the
file, to get the base address.
* pecoff.c (backtrace_initialize): Iterate over loaded libraries
and call coff_add.

Signed-off-by: Björn Schäpers 
---
 libbacktrace/pecoff.c | 104 ++
 1 file changed, 96 insertions(+), 8 deletions(-)

diff --git a/libbacktrace/pecoff.c b/libbacktrace/pecoff.c
index f976a963bf3..3eb9c4a4853 100644
--- a/libbacktrace/pecoff.c
+++ b/libbacktrace/pecoff.c
@@ -49,6 +49,7 @@ POSSIBILITY OF SUCH DAMAGE.  */
 #endif
 
 #include 
+#include 
 #endif
 
 /* Coff file header.  */
@@ -592,7 +593,8 @@ coff_syminfo (struct backtrace_state *state, uintptr_t addr,
 static int
 coff_add (struct backtrace_state *state, int descriptor,
  backtrace_error_callback error_callback, void *data,
- fileline *fileline_fn, int *found_sym, int *found_dwarf)
+ fileline *fileline_fn, int *found_sym, int *found_dwarf,
+ uintptr_t module_handle ATTRIBUTE_UNUSED)
 {
   struct backtrace_view fhdr_view;
   off_t fhdr_off;
@@ -870,12 +872,7 @@ coff_add (struct backtrace_state *state, int descriptor,
 }
 
 #ifdef HAVE_WINDOWS_H
-  {
-uintptr_t module_handle;
-
-module_handle = (uintptr_t) GetModuleHandle (NULL);
-base_address = module_handle - image_base;
-  }
+  base_address = module_handle - image_base;
 #endif
 
   if (!backtrace_dwarf_add (state, base_address, &dwarf_sections,
@@ -903,6 +900,53 @@ coff_add (struct backtrace_state *state, int descriptor,
   return 0;
 }
 
+#ifdef HAVE_WINDOWS_H
+static void
+free_modules (struct backtrace_state *state,
+ backtrace_error_callback error_callback, void *data,
+ HMODULE **modules, DWORD bytes_allocated)
+{
+  backtrace_free (state, *modules, bytes_allocated, error_callback, data);
+  *modules = NULL;
+}
+
+static void
+get_all_modules (struct backtrace_state *state,
+backtrace_error_callback error_callback, void *data, 
+HMODULE **modules, DWORD *module_count, DWORD *bytes_allocated)
+{
+  DWORD bytes_needed = 0;
+
+  for (;;)
+{
+  *bytes_allocated = *module_count * sizeof(HMODULE);
+  *modules = backtrace_alloc (state, *bytes_allocated, error_callback, data);
+
+  if (*modules == NULL)
+   return;
+
+  if (!EnumProcessModules (GetCurrentProcess (), *modules, *module_count,
+  &bytes_needed))
+   {
+ error_callback(data, "Could not enumerate process modules",
+(int) GetLastError ());
+ free_modules (state, error_callback, data, modules, *bytes_allocated);
+ return;
+   }
+
+  *module_count = bytes_needed / sizeof(HMODULE);
+  if (bytes_needed <= *bytes_allocated)
+   {
+ return;
+   }
+
+  free_modules (state, error_callback, data, modules, *bytes_allocated);
+  // Add an extra of 2, in case some module is loaded in another thread.
+  *module_count += 2;
+}
+}
+#endif
+
 /* Initialize the backtrace data we need from an ELF executable.  At
the ELF level, all we need to do is find the debug info
sections.  */
@@ -917,12 +961,56 @@ backtrace_initialize (struct backtrace_state *state,
   int found_sym;
   int found_dwarf;
   fileline coff_fileline_fn;
+  uintptr_t module_handle = 0;
+
+#ifdef HAVE_WINDOWS_H
+  DWORD i;
+  DWORD module_count = 100;
+  DWORD bytes_allocated_for_modules = 0;
+  HMODULE *modules = NULL;
+  char module_name[MAX_PATH];
+  int module_found_sym;
+  fileline module_fileline_fn;
+
+  module_handle = (uintptr_t) GetModuleHandle (NULL);
+#endif
 
   ret = coff_add (state, descriptor, erro

[PATCH 1/4; v4] options: add gcc/regenerate-opt-urls.py

2024-01-02 Thread David Malcolm

On Wed, 2023-12-20 at 00:24 +, Joseph Myers wrote:

Thanks for the review.

> On Thu, 14 Dec 2023, David Malcolm wrote:
> 
> > diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> > index 26a7e9c35070..9a394b3e2c77 100644
> > --- a/gcc/doc/sourcebuild.texi
> > +++ b/gcc/doc/sourcebuild.texi
> > @@ -813,6 +813,10 @@ options supported by this target (@pxref{Run-time 
> > Target, , Run-time
> >  Target Specification}).  This means both entries in the summary table
> >  of options and details of the individual options.
> >  @item
> > +An entry in @file{gcc/regenerate-opt-urls.py}'s TARGET_SPECIFIC_PAGES
> > +dictionary mapping from target-specific HTML documentation pages
> > +to the target specific source directory.
> 
> There should probably also be something under Front End to indicate when 
> anything needs to be done (based on a front end having its own option 
> index? one with options also present in other manuals?), outside the front 
> end's own directory, for that front end's option URLs.

I've added an item to "Anatomy of a Language Front End" for this, and
added a PER_LANGUAGE_OPTION_INDEXES to regenerate-opt-urls.py to make
it easier to do this for people unfamiliar with Python.


> 
> > +def add_entry(self, matched_text, url_suffix, language, verbose=False):
> > +# TODO: use language
> 
> This TODO seems out of date.

Removed.

> 
> > +#print(f'{url_suffix=} {index_text=}')
> 
> Various commented-out or "if 0" debugging code like this should probably 
> be removed (or made into an actual runtime conditional if desired).

I've removed them all.

Here's an updated version of this patch within the kit (patch 1/4) with
the above changes.

I've rebased on top of today's r14-6884-g046cea56fd1e8b.
I was able to use this to regenerate the generated patch (2 of 4),
and have successfully bootstrapped and regression-tested the resulting
kit on x86_64-pc-linux-gnu.

Is this updated version of the patch OK?  (I see that you approved the
remainder of the kit once this patch is ready)

Thanks
Dave
 

Changed in v4:
- added PER_LANGUAGE_OPTION_INDEXES
- added info to sourcebuild.texi on adding a new front end
- removed TODOs and out-of-date comment

Changed in v3:
- Makefile.in: added OPT_URLS_HTML_DEPS and a comment

Changed in v2:
- added convenience targets to Makefile for regenerating the .opt.urls
  files, and for running unit tests for the generation code
- parse gdc and gfortran documentation, and create LangUrlSuffix_{lang}
directives for language-specific URLs.
- add documentation to sourcebuild.texi

gcc/ChangeLog:
* Makefile.in (OPT_URLS_HTML_DEPS): New.
(regenerate-opt-urls): New target.
(regenerate-opt-urls-unit-test): New target.
* doc/options.texi (Option properties): Add UrlSuffix and
description of regenerate-opt-urls.py.  Add LangUrlSuffix_*.
* doc/sourcebuild.texi (Anatomy of a Language Front End): Add
reference to regenerate-opt-urls.py's PER_LANGUAGE_OPTION_INDEXES
and Makefile.in's OPT_URLS_HTML_DEPS.
(Anatomy of a Target Back End): Add
reference to regenerate-opt-urls.py's TARGET_SPECIFIC_PAGES.
* regenerate-opt-urls.py: New file.

Signed-off-by: David Malcolm 
---
 gcc/Makefile.in|  16 ++
 gcc/doc/options.texi   |  26 +++
 gcc/doc/sourcebuild.texi   |   9 +
 gcc/regenerate-opt-urls.py | 400 +
 4 files changed, 451 insertions(+)
 create mode 100755 gcc/regenerate-opt-urls.py

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 937380001877..07f4646ca58f 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3612,6 +3612,22 @@ $(build_htmldir)/gccinstall/index.html: 
$(TEXI_GCCINSTALL_FILES)
DESTDIR=$(@D) \
$(SHELL) $(srcdir)/doc/install.texi2html
 
+# Regenerate the .opt.urls files from the generated html, and from the .opt
+# files.  Doing so requires all languages that have their own HTML manuals
+# to be enabled.
+.PHONY: regenerate-opt-urls
+OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \
+   $(build_htmldir)/gdc/Option-Index.html \
+   $(build_htmldir)/gfortran/Option-Index.html
+
+regenerate-opt-urls: $(srcdir)/regenerate-opt-urls.py $(OPT_URLS_HTML_DEPS)
+   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
$(srcdir))
+
+# Run the unit tests for regenerate-opt-urls.py
+.PHONY: regenerate-opt-urls-unit-test
+regenerate-opt-urls-unit-test: $(OPT_URLS_HTML_DEPS)
+   $(srcdir)/regenerate-opt-urls.py $(build_htmldir) $(shell dirname 
$(srcdir)) --unit-test
+
 MANFILES = doc/gcov.1 doc/cpp.1 doc/gcc.1 doc/gfdl.7 doc/gpl.7 \
doc/fsf-funding.7 doc/gcov-tool.1 doc/gcov-dump.1 \
   $(if $(filter yes,@enable_lto@),doc/lto-dump.1)
diff --git a/gcc/doc/options.texi b/gcc/doc/options.texi
index 715f0a1479c7..37d7ecc1477d 100644
--- a/gcc/doc/options.texi
+++ b/gcc/doc/options.texi
@@ -597,4 +597,30 @@ This warning option corresponds to @co

Re: Re: [committed] RISC-V: Modify copyright year of vector-crypto.md

2024-01-02 Thread Feng Wang

2024-01-03 00:32 Jeff Law  wrote:



>On 1/1/24 19:25, Feng Wang wrote:
>> gcc/ChangeLog:
>>  * config/riscv/vector-crypto.md: Modify copyright year.
>> ---
>>   gcc/config/riscv/vector-crypto.md | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/gcc/config/riscv/vector-crypto.md 
>> b/gcc/config/riscv/vector-crypto.md
>> index e40b1543954..9625014e45e 100755
>> --- a/gcc/config/riscv/vector-crypto.md
>> +++ b/gcc/config/riscv/vector-crypto.md
>> @@ -1,5 +1,5 @@
>>   ;; Machine description for the RISC-V Vector Crypto  extensions.
>> -;; Copyright (C) 2023 Free Software Foundation, Inc.
>> +;; Copyright (C) 2024 Free Software Foundation, Inc.
>
>Please don't change Copyright notices in the future.  There's very 
>specific rules around those and we do them en-masse at the start of the 
>year using existing scripts and such.
>
>jeff

OK, got it. Thanks.

Re: [PATCH 1/4; v4] options: add gcc/regenerate-opt-urls.py

2024-01-02 Thread Joseph Myers

On Tue, 2 Jan 2024, David Malcolm wrote:

> > > +#print(f'{url_suffix=} {index_text=}')
> > 
> > Various commented-out or "if 0" debugging code like this should probably 
> > be removed (or made into an actual runtime conditional if desired).
> 
> I've removed them all.

There are still a few left.  The patch is OK with the remaining 
commented-out code removed or made into runtime conditionals.

> +#print(f'{url_suffix=} {index_text=}')

Here.

> +class OptFile:
> +def __init__(self, opt_path, rel_path):
> +"""
> +Parse a .opt file.  Similar to opt-gather.awk.
> +"""
> +self.rel_path = rel_path
> +assert rel_path.startswith('gcc')
> +# self.filename = os.path.basename(path)
> +self.records = []
> +with open(opt_path) as f:
> +flag = 0
> +for line in f:
> +#print(repr(line))

And these two here (well, the commented-out setting of self.filename may 
be something else rather than debugging code, but much the same still 
applies).

-- 
Joseph S. Myers
j...@polyomino.org.uk

Re: [PATCH v7] libgfortran: Replace mutex with rwlock

2024-01-02 Thread Lipeng Zhu





On 2024/1/2 11:57, Vaseeharan Vinayagamoorthy wrote:

Hi Lipeng,

It looks like your draft patch to fix the builds for arm-none-eabi target is 
not merged yet, because our arm-none-eabi builds are still broken. Are you 
waiting for additional information, or would you be able to fix this issue?

Kind regards,
Vasee


Hi Vasee,

Actually, I have already sent a patch 
https://inbox.sourceware.org/gcc-patches/20231222023605.3894839-1-lipeng@intel.com/ 
to fix the build failure; it is now waiting for community review.


Lipeng Zhu


From: Richard Earnshaw 
Sent: 15 December 2023 19:23
To: Lipeng Zhu ; Richard Earnshaw ; 
ja...@redhat.com 
Cc: fort...@gcc.gnu.org ; gcc-patches@gcc.gnu.org ; 
hongjiu...@intel.com ; pan.d...@intel.com ; rep.dot@gmail.com 
; tianyou...@intel.com ; tkoe...@netcologne.de 
; wangyang@intel.com 
Subject: Re: [PATCH v7] libgfortran: Replace mutex with rwlock



On 15/12/2023 11:31, Lipeng Zhu wrote:



On 2023/12/14 23:50, Richard Earnshaw (lists) wrote:

On 09/12/2023 15:39, Lipeng Zhu wrote:

This patch introduces an rwlock and splits reads and writes of the
unit_root tree and unit_cache, using the rwlock instead of the mutex to
increase CPU efficiency. In the get_gfc_unit function, only around 30%
of calls step into the insert_unit function; in most instances, we can
get the unit while reading the unit_cache or unit_root tree. So
splitting the read and write phases with an rwlock is one approach to
make it more parallel.

BTW, the IPC metrics can gain around 9x in our test
server with 220 cores. The benchmark we used is
https://github.com/rwesson/NEAT

libgcc/ChangeLog:

 * gthr-posix.h (__GTHREAD_RWLOCK_INIT): New macro.
 (__gthrw): New function.
 (__gthread_rwlock_rdlock): New function.
 (__gthread_rwlock_tryrdlock): New function.
 (__gthread_rwlock_wrlock): New function.
 (__gthread_rwlock_trywrlock): New function.
 (__gthread_rwlock_unlock): New function.

libgfortran/ChangeLog:

 * io/async.c (DEBUG_LINE): New macro.
 * io/async.h (RWLOCK_DEBUG_ADD): New macro.
 (CHECK_RDLOCK): New macro.
 (CHECK_WRLOCK): New macro.
 (TAIL_RWLOCK_DEBUG_QUEUE): New macro.
 (IN_RWLOCK_DEBUG_QUEUE): New macro.
 (RDLOCK): New macro.
 (WRLOCK): New macro.
 (RWUNLOCK): New macro.
 (RD_TO_WRLOCK): New macro.
 (INTERN_RDLOCK): New macro.
 (INTERN_WRLOCK): New macro.
 (INTERN_RWUNLOCK): New macro.
 * io/io.h (struct gfc_unit): Change UNIT_LOCK to UNIT_RWLOCK in
 a comment.
 (unit_lock): Remove including associated internal_proto.
 (unit_rwlock): New declarations including associated internal_proto.
 (dec_waiting_unlocked): Use WRLOCK and RWUNLOCK on unit_rwlock
 instead of __gthread_mutex_lock and __gthread_mutex_unlock on
 unit_lock.
 * io/transfer.c (st_read_done_worker): Use WRLOCK and RWUNLOCK on
 unit_rwlock instead of LOCK and UNLOCK on unit_lock.
 (st_write_done_worker): Likewise.
 * io/unit.c: Change UNIT_LOCK to UNIT_RWLOCK in 'IO locking rules'
 comment. Use unit_rwlock variable instead of unit_lock variable.
 (get_gfc_unit_from_unit_root): New function.
 (get_gfc_unit): Use RDLOCK, WRLOCK and RWUNLOCK on unit_rwlock
 instead of LOCK and UNLOCK on unit_lock.
 (close_unit_1): Use WRLOCK and RWUNLOCK on unit_rwlock instead of
 LOCK and UNLOCK on unit_lock.
 (close_units): Likewise.
 (newunit_alloc): Use RWUNLOCK on unit_rwlock instead of UNLOCK on
 unit_lock.
 * io/unix.c (find_file): Use RDLOCK and RWUNLOCK on unit_rwlock
 instead of LOCK and UNLOCK on unit_lock.
 (flush_all_units): Use WRLOCK and RWUNLOCK on unit_rwlock instead
 of LOCK and UNLOCK on unit_lock.



It looks like this has broken builds on arm-none-eabi when using newlib:

In file included from /work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/runtime/error.c:27:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h: In function ‘dec_waiting_unlocked’:
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1023:3: error: implicit declaration of function ‘WRLOCK’ [-Wimplicit-function-declaration]
 1023 |   WRLOCK (&unit_rwlock);
      |   ^~
/work/rearnsha/gnusrc/nightly/gcc-cross/master/libgfortran/io/io.h:1025:3: error: implicit declaration of function ‘RWUNLOCK’ [-Wimplicit-function-declaration]
 1025 |   RWUNLOCK (&unit_rwlock);
      |   ^~~~


R.


Hi Richard,

The root cause is that the macros WRLOCK and RWUNLOCK are not defined in
io.h. The x86 platform did not fail because HAVE_ATOMIC_FETCH_ADD is
defined there, so the above macros are never used. The code logic is
shown below:
#ifdef HAVE_ATOMIC_FETCH_ADD
(void) __atomic_fetch_add (&u->waiting, -1, __ATOMIC_RELAXED);
#else
WRLOCK (&unit_rwlock);
u->waiting--;
RWUNLOCK (&unit_rwlock);
#endif

I just draft a patch try to fix this bug, because I didn't have arm
platform, woul

Re: [PATCH v6 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-02 Thread juzhe.zh...@rivai.ai

\ No newline at end of file

All files need a newline at the end.


juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-03 09:01
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v6 2/2] RISC-V: Add crypto vector api-testing cases.

[PATCH v6 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-02 Thread Feng Wang

Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c

This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
 .../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
 .../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
 .../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
 .../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
 .../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
 .../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
 .../riscv/rvv/base/zvkned-intrinsic.c | 105 ++
 .../riscv/rvv/base/zvknha-intrinsic.c |  33 
 .../riscv/rvv/base/zvknhb-intrinsic.c |  33 
 .../riscv/rvv/base/zvksed-intrinsic.c |  33 
 .../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
 gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
 12 files changed, 549 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t 
vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t 
rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, 
vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, 
vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, 
vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u16mf4(vs2, vl);
+}
+
+vuint32m1_t test_vbrev8_v_u32m1_m(vbool32_t mask, vuint32m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u32m1_m(mask, vs2, vl);
+}
+
+vuint64m1_t test_vbrev8_v_u64m1_tumu(vbool64_t mask, vuint64

[PATCH v3 0/3] Libatomic: Add LSE128 atomics support for AArch64

2024-01-02 Thread Victor Do Nascimento

v3 updates:

   1. In the absence of the `HWCAP_LSE128' feature bit in the current
   Linux Kernel release, the feature check continues to rely on a user
   space-issued `mrs' instruction.  Since the ABI for exporting
   the AArch64 CPU ID/feature registers to userspace relies on
   FEAT_IDST [1], we make the ID_AA64ISAR0_EL1-mediated feature check
   contingent on having the HWCAP_CPUID bit set, ensuring FEAT_IDST
   support, avoiding potential runtime errors.

   2. It is established that, given LSE2 is mandatory from Armv8.4
   onward, LSE128 as introduced from Armv9.4 necessarily implies LSE2,
   such that a separate check for LSE2 is not necessary.

   3. Given that the infrastructure for exposing `mrs' to userspace
   hooks into the exception handler, the feature-register read is
   relatively expensive and ought to be avoided where possible.
   Consequently, where we can ascertain whether prerequisites are met
   via HWCAPS, we query these as a way of returning early where we
   know unequivocally that a given feature cannot be implemented due
   to unmet dependencies.  Such checks are added to both `has_lse2'
   and `has_lse128'.

Regression-tested on aarch64-none-linux-gnu on Cortex-A72 and
LSE128-enabled Armv-A Base RevC AEM FVP.

[1] https://www.kernel.org/doc/html/v6.6/arch/arm64/cpu-feature-registers.html

---

Building upon Wilco Dijkstra's work on AArch64 128-bit atomics for
Libatomic, namely the patches from [1] and [2],  this patch series
extends the library's  capabilities to dynamically select and emit
Armv9.4-a LSE128 implementations of atomic operations via ifuncs at
run-time whenever architectural support is present.

Regression tested on aarch64-linux-gnu target with LSE128-support.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620529.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626358.html

Victor Do Nascimento (3):
  libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface
  libatomic: Enable LSE128 128-bit atomics for armv9.4-a
  aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 ++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 251 ---
 libatomic/config/linux/aarch64/host-config.h |  36 ++-
 libatomic/configure  |  59 -
 libatomic/configure.ac   |   1 +
 8 files changed, 331 insertions(+), 42 deletions(-)

-- 
2.42.0

[PATCH v3 2/3] libatomic: Enable LSE128 128-bit atomics for armv9.4-a

2024-01-02 Thread Victor Do Nascimento

The armv9.4-a architectural revision adds three new atomic operations
associated with the LSE128 feature:

  * LDCLRP - Atomic AND NOT (bitclear) of a location with 128-bit
  value held in a pair of registers, with original data loaded into
  the same 2 registers.
  * LDSETP - Atomic OR (bitset) of a location with 128-bit value held
  in a pair of registers, with original data loaded into the same 2
  registers.
  * SWPP - Atomic swap of one 128-bit value with 128-bit value held
  in a pair of registers.

This patch adds the logic required to make use of these when the
architectural feature is present and a suitable assembler available.

In order to do this, the following changes are made:

  1. Add a configure-time check to check for LSE128 support in the
  assembler.
  2. Edit host-config.h so that when N == 16, nifunc = 2.
  3. Where available due to LSE128, implement the second ifunc, making
  use of the novel instructions.
  4. For atomic functions unable to make use of these new
  instructions, define a new alias which causes the _i1 function
  variant to point ahead to the corresponding _i2 implementation.

libatomic/ChangeLog:

* Makefile.am (AM_CPPFLAGS): Add conditional setting of
-DHAVE_FEAT_LSE128.
* acinclude.m4 (LIBAT_TEST_FEAT_LSE128): New.
* config/linux/aarch64/atomic_16.S (LSE128): New macro
definition.
(libat_exchange_16): New LSE128 variant.
(libat_fetch_or_16): Likewise.
(libat_or_fetch_16): Likewise.
(libat_fetch_and_16): Likewise.
(libat_and_fetch_16): Likewise.
* config/linux/aarch64/host-config.h (IFUNC_COND_2): New.
(IFUNC_NCOND): Add operand size checking.
(has_lse2): Renamed from `ifunc1`.
(has_lse128): New.
(HAS_LSE128): Likewise.
* libatomic/configure.ac: Add call to LIBAT_TEST_FEAT_LSE128.
* configure (ac_subst_vars): Regenerated via autoreconf.
* libatomic/Makefile.in: Likewise.
* libatomic/auto-config.h.in: Likewise.
---
 libatomic/Makefile.am|   3 +
 libatomic/Makefile.in|   1 +
 libatomic/acinclude.m4   |  19 +++
 libatomic/auto-config.h.in   |   3 +
 libatomic/config/linux/aarch64/atomic_16.S   | 170 ++-
 libatomic/config/linux/aarch64/host-config.h |  29 +++-
 libatomic/configure  |  59 ++-
 libatomic/configure.ac   |   1 +
 8 files changed, 276 insertions(+), 9 deletions(-)

diff --git a/libatomic/Makefile.am b/libatomic/Makefile.am
index c0b8dea5037..24e843db67d 100644
--- a/libatomic/Makefile.am
+++ b/libatomic/Makefile.am
@@ -130,6 +130,9 @@ libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix 
_$(s)_.lo,$(SIZEOBJS)))
 ## On a target-specific basis, include alternates to be selected by IFUNC.
 if HAVE_IFUNC
 if ARCH_AARCH64_LINUX
+if ARCH_AARCH64_HAVE_LSE128
+AM_CPPFLAGS = -DHAVE_FEAT_LSE128
+endif
 IFUNC_OPTIONS   = -march=armv8-a+lse
 libatomic_la_LIBADD += $(foreach s,$(SIZES),$(addsuffix 
_$(s)_1_.lo,$(SIZEOBJS)))
 libatomic_la_SOURCES += atomic_16.S
diff --git a/libatomic/Makefile.in b/libatomic/Makefile.in
index dc2330b91fd..cd48fa21334 100644
--- a/libatomic/Makefile.in
+++ b/libatomic/Makefile.in
@@ -452,6 +452,7 @@ M_SRC = $(firstword $(filter %/$(M_FILE), $(all_c_files)))
 libatomic_la_LIBADD = $(foreach s,$(SIZES),$(addsuffix \
_$(s)_.lo,$(SIZEOBJS))) $(am__append_1) $(am__append_3) \
$(am__append_4) $(am__append_5)
+@ARCH_AARCH64_HAVE_LSE128_TRUE@@ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@AM_CPPFLAGS
 = -DHAVE_FEAT_LSE128
 @ARCH_AARCH64_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv8-a+lse
 @ARCH_ARM_LINUX_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=armv7-a+fp 
-DHAVE_KERNEL64
 @ARCH_I386_TRUE@@HAVE_IFUNC_TRUE@IFUNC_OPTIONS = -march=i586
diff --git a/libatomic/acinclude.m4 b/libatomic/acinclude.m4
index f35ab5b60a5..4197db8f404 100644
--- a/libatomic/acinclude.m4
+++ b/libatomic/acinclude.m4
@@ -83,6 +83,25 @@ AC_DEFUN([LIBAT_TEST_ATOMIC_BUILTIN],[
   ])
 ])
 
+dnl
+dnl Test if the host assembler supports armv9.4-a LSE128 isns.
+dnl
+AC_DEFUN([LIBAT_TEST_FEAT_LSE128],[
+  AC_CACHE_CHECK([for armv9.4-a LSE128 insn support],
+[libat_cv_have_feat_lse128],[
+AC_LANG_CONFTEST([AC_LANG_PROGRAM([],[asm(".arch armv9-a+lse128")])])
+if AC_TRY_EVAL(ac_link); then
+  eval libat_cv_have_feat_lse128=yes
+else
+  eval libat_cv_have_feat_lse128=no
+fi
+rm -f conftest*
+  ])
+  LIBAT_DEFINE_YESNO([HAVE_FEAT_LSE128], [$libat_cv_have_feat_lse128],
+   [Have LSE128 support for 16 byte integers.])
+  AM_CONDITIONAL([ARCH_AARCH64_HAVE_LSE128], [test x$libat_cv_have_feat_lse128 
= xyes])
+])
+
 dnl
 dnl Test if we have __atomic_load and __atomic_store for mode $1, size $2
 dnl
diff --git a/libatomic/auto-config.h.in b/libatomic/auto-config.h.in
index ab3424a759e..7c78933b07d 100644
--- a/libatomic/auto-confi

[PATCH v3 1/3] libatomic: atomic_16.S: Improve ENTRY, END and ALIAS macro interface

2024-01-02 Thread Victor Do Nascimento

The introduction of further architectural-feature dependent ifuncs
for AArch64 makes hard-coding ifunc `_i' suffixes to functions
cumbersome to work with.  It is awkward to remember which ifunc maps
onto which arch feature and makes the code harder to maintain when new
ifuncs are added and their suffixes possibly altered.

This patch uses pre-processor `#define' statements to map each suffix to
a descriptive feature name macro, for example:

  #define LSE2 _i1

and reconstructs function names with the pre-processor's token
concatenation feature, such that for `MACRO(_i)', we would
now have `MACRO_FEAT(name, feature)' and in the macro definition body
we replace `name` with `name##feature`.

Consequently, for base functionality, where the ifunc suffix is
absent, the macro interface remains the same.  For example, the entry
and endpoints of `libat_store_16' remain defined by:

  - ENTRY (libat_store_16)
and
  - END (libat_store_16)

For the LSE2 implementation of the same 16-byte atomic store, we now
have:

  - ENTRY_FEAT (libat_store_16, LSE2)
and
  - END_FEAT (libat_store_16, LSE2)

For the aliasing of ifunc names, we define the following new
implementation of the ALIAS macro:

  - ALIAS (FN_BASE_NAME, FROM_SUFFIX, TO_SUFFIX)

With the base feature-name macro `CORE' defined as the empty string,
we alias the LSE2 `libat_exchange_16' to its base implementation
with:

  - ALIAS (libat_exchange_16, LSE2, CORE)
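
The reason for the extra `*_FEAT1' level can be shown with a plain-C
analogue.  This is a sketch, not the assembler macros from the patch:
`DEFINE_FEAT', `DEFINE_FEAT1', `DEFINE_DIRECT' and `demo_store_16' are
hypothetical names used only here.  An argument that is an operand of
`##' is not macro-expanded, so the outer macro exists to expand `LSE2'
to `_i1' before the inner macro pastes it.

```c
#include <string.h>

/* Feature-name macros: CORE maps to the empty suffix, LSE2 to _i1.  */
#define CORE
#define LSE2 _i1

/* Two-level expansion: DEFINE_FEAT expands FEAT (LSE2 -> _i1) before
   DEFINE_FEAT1 pastes it onto NAME.  Each generated function returns
   its own name as a string so the result can be inspected.  */
#define DEFINE_FEAT(name, feat) DEFINE_FEAT1 (name, feat)
#define DEFINE_FEAT1(name, feat) \
  static const char *name##feat (void) { return #name #feat; }

/* One-level variant, shown only for contrast: FEAT is pasted
   unexpanded, so the literal token LSE2 ends up in the name.  */
#define DEFINE_DIRECT(name, feat) \
  static const char *name##feat (void) { return #name #feat; }

DEFINE_FEAT (libat_store_16, CORE)   /* defines libat_store_16 */
DEFINE_FEAT (libat_store_16, LSE2)   /* defines libat_store_16_i1 */
DEFINE_DIRECT (demo_store_16, LSE2)  /* defines demo_store_16LSE2 */
```

Calling the three generated functions returns "libat_store_16",
"libat_store_16_i1" and "demo_store_16LSE2" respectively, which is why
a single-level macro would not produce the intended `_i1' symbol.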

libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (CORE): New macro.
(LSE2): Likewise.
(ENTRY_FEAT): Likewise.
(END_FEAT): Likewise.
(ENTRY_FEAT1): Likewise.
(END_FEAT1): Likewise.
(ALIAS): Modify macro to take in `arch' arguments.
---
 libatomic/config/linux/aarch64/atomic_16.S | 83 +-
 1 file changed, 49 insertions(+), 34 deletions(-)

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index a099037179b..eb8e749b8a2 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -40,22 +40,38 @@
 
.arch   armv8-a+lse
 
-#define ENTRY(name)\
-   .global name;   \
-   .hidden name;   \
-   .type name,%function;   \
-   .p2align 4; \
-name:  \
-   .cfi_startproc; \
+#define ENTRY(name) ENTRY_FEAT (name, CORE)
+
+#define ENTRY_FEAT(name, feat) \
+   ENTRY_FEAT1(name, feat)
+
+#define ENTRY_FEAT1(name, feat)\
+   .global name##feat; \
+   .hidden name##feat; \
+   .type name##feat,%function; \
+   .p2align 4; \
+name##feat:\
+   .cfi_startproc; \
hint34  // bti c
 
-#define END(name)  \
-   .cfi_endproc;   \
-   .size name, .-name;
+#define END(name) END_FEAT (name, CORE)
 
-#define ALIAS(alias,name)  \
-   .global alias;  \
-   .set alias, name;
+#define END_FEAT(name, feat)   \
+   END_FEAT1(name, feat)
+
+#define END_FEAT1(name, feat)  \
+   .cfi_endproc;   \
+   .size name##feat, .-name##feat;
+
+#define ALIAS(alias, from, to) \
+   ALIAS1(alias,from,to)
+
+#define ALIAS1(alias, from, to)\
+   .global alias##from;\
+   .set alias##from, alias##to;
+
+#define CORE
+#define LSE2   _i1
 
 #define res0 x0
 #define res1 x1
@@ -108,7 +124,7 @@ ENTRY (libat_load_16)
 END (libat_load_16)
 
 
-ENTRY (libat_load_16_i1)
+ENTRY_FEAT (libat_load_16, LSE2)
cbnzw1, 1f
 
/* RELAXED.  */
@@ -128,7 +144,7 @@ ENTRY (libat_load_16_i1)
ldp res0, res1, [x0]
dmb ishld
ret
-END (libat_load_16_i1)
+END_FEAT (libat_load_16, LSE2)
 
 
 ENTRY (libat_store_16)
@@ -148,7 +164,7 @@ ENTRY (libat_store_16)
 END (libat_store_16)
 
 
-ENTRY (libat_store_16_i1)
+ENTRY_FEAT (libat_store_16, LSE2)
cbnzw4, 1f
 
/* RELAXED.  */
@@ -160,7 +176,7 @@ ENTRY (libat_store_16_i1)
stlxp   w4, in0, in1, [x0]
cbnzw4, 1b
ret
-END (libat_store_16_i1)
+END_FEAT (libat_store_16, LSE2)
 
 
 ENTRY (libat_exchange_16)
@@ -237,7 +253,7 @@ ENTRY (libat_compare_exchange_16)
 END (libat_compare_exchange_16)
 
 
-ENTRY (libat_compare_exchange_16_i1)
+ENTRY_FEAT (libat_compare_exchange_16, LSE2)
ldp exp0, exp1, [x1]
mov tmp0, exp0
mov tmp1, exp1
@@ -270,7 +286,7 @@ ENTRY (libat_compare_exchange_16_i1)
/* ACQ_REL/SEQ_CST.  */
 4: caspal  exp0, exp1, in0, in1, [x0]
b   0b
-END (libat_compare_exchange_16_i1)
+END_FEAT (libat_compare_exchange_16, LSE2)
 
 
 ENTRY (libat_fetch_add_16)
@@ -556,21 +572,20 @@ END (libat_test_and_set_16)
 
 /* Alias entry points which are the same in baseline and LSE2.  */
 
-AL

[PATCH v3 3/3] aarch64: Add explicit checks for implicit LSE/LSE2 requirements.

2024-01-02 Thread Victor Do Nascimento

At present, evaluation of both `has_lse2(hwcap)' and
`has_lse128(hwcap)' may require issuing an `mrs' instruction to query
a system register.  This instruction, when issued from user-space,
is trapped by the kernel, which then returns the value read from the
system register.  Given the computational expense of the context
switch, it is important to forgo the operation wherever possible.

In light of this, given how other architectural requirements serving
as prerequisites have long been assigned HWCAP bits by the kernel, we
can inexpensively query for their availability before attempting to
read any system registers.  Where one of these early tests fails, we
can assert that the main feature of interest (be it LSE2 or LSE128)
cannot be present, allowing us to return from the function early and
skip the unnecessary expensive kernel-mediated access to system
registers.
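
The ordering argument can be sketched in isolation.  The code below is
an illustration under stated assumptions, not the host-config.h
implementation: the `CAP_*' bits, `expensive_probe' and `has_feature'
are hypothetical, and the probe merely counts calls where the real
code would issue the trapping `mrs'.

```c
#include <stdbool.h>

/* Hypothetical capability bits; the real HWCAP_* values come from the
   kernel headers and are not reproduced here.  */
#define CAP_PREREQ  (1UL << 0)  /* prerequisite feature is present */
#define CAP_CPUID   (1UL << 1)  /* ID registers readable from user space */
#define CAP_DIRECT  (1UL << 2)  /* feature itself is advertised directly */

/* Counts how often the expensive path is taken.  */
static int probe_calls;

/* Stand-in for the kernel-mediated system-register read.  */
static unsigned long
expensive_probe (void)
{
  probe_calls++;
  return 3;  /* pretend the ID field reports the feature */
}

static bool
has_feature (unsigned long hwcap)
{
  /* Cheapest answer first: the feature has its own bit.  */
  if (hwcap & CAP_DIRECT)
    return true;
  /* Without the prerequisite the feature cannot be present.  */
  if (!(hwcap & CAP_PREREQ))
    return false;
  /* Without ID register access there is nothing to probe.  */
  if (!(hwcap & CAP_CPUID))
    return false;
  /* Only now pay for the trap.  */
  return expensive_probe () >= 3;
}
```

Only the call with both the prerequisite and the ID-register bit set
reaches `expensive_probe'; every other combination returns from the
bit tests alone.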

libatomic/ChangeLog:

* config/linux/aarch64/host-config.h (has_lse2): Add test for LSE.
(has_lse128): Add test for LSE2.
---
 libatomic/config/linux/aarch64/host-config.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index c5485d63855..3be4db6e5f8 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -53,8 +53,13 @@
 static inline bool
 has_lse2 (unsigned long hwcap)
 {
+  /* Check for LSE2.  */
   if (hwcap & HWCAP_USCAT)
 return true;
+  /* No point checking further for atomic 128-bit load/store if LSE
+ prerequisite not met.  */
+  if (!(hwcap & HWCAP_ATOMICS))
+return false;
   if (!(hwcap & HWCAP_CPUID))
 return false;
 
@@ -76,12 +81,14 @@ has_lse2 (unsigned long hwcap)
 static inline bool
 has_lse128 (unsigned long hwcap)
 {
-  if (!(hwcap & HWCAP_CPUID))
-return false;
+  /* In the absence of HWCAP_CPUID, we are unable to check for LSE128, return.
+ If feature check available, check LSE2 prerequisite before proceeding.  */
+  if (!(hwcap & HWCAP_CPUID) || !(hwcap & HWCAP_USCAT))
+ return false;
   unsigned long isar0;
   asm volatile ("mrs %0, ID_AA64ISAR0_EL1" : "=r" (isar0));
   if (AT_FEAT_FIELD (isar0) >= 3)
-return true;
+  return true;
   return false;
 }
 
-- 
2.42.0

[PATCH v4] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-02 Thread Jun Sha (Joshua)

For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.

To handle this issue, we use the "group_overlap" and "enabled"
attributes to disable some alternatives for xtheadvector.
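
The effect of the new "th" value can be modelled in C.  This is a
simplified sketch of the attribute logic, not generated GCC code:
`alternative_enabled', `count_enabled' and the `GO_*' enumerators are
hypothetical names, and the other group_overlap values (W21, W42, ...)
are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-alternative attribute value; only the two values relevant to
   this sketch are modelled.  */
enum group_overlap { GO_NONE, GO_TH };

/* An alternative marked "th" is disabled when compiling for
   XTheadVector and left enabled otherwise.  */
static bool
alternative_enabled (enum group_overlap attr, bool target_xtheadvector)
{
  return !(attr == GO_TH && target_xtheadvector);
}

/* Count how many of the N alternatives of a pattern remain usable.  */
static size_t
count_enabled (const enum group_overlap *alts, size_t n,
               bool target_xtheadvector)
{
  size_t enabled = 0;

  for (size_t i = 0; i < n; i++)
    if (alternative_enabled (alts[i], target_xtheadvector))
      enabled++;
  return enabled;
}
```

For a pattern annotated "th,none,none", this model leaves two usable
alternatives under XTheadVector and all three otherwise.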

gcc/ChangeLog:

* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,th):
* config/riscv/vector.md:

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md  |   6 +-
 gcc/config/riscv/vector.md | 314 +
 2 files changed, 185 insertions(+), 135 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 68f7203b676..d736501784d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -504,7 +504,7 @@
 ;; Widening instructions have group-overlap constraints.  Those are only
 ;; valid for certain register-group sizes.  This attribute marks the
 ;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,th"
   (const_string "none"))
 
 (define_attr "group_overlap_valid" "no,yes"
@@ -543,6 +543,10 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
> 1"))
 (const_string "no")
+
+ (and (eq_attr "group_overlap" "th")
+ (match_test "TARGET_XTHEADVECTOR"))
+(const_string "no")
 ]
(const_string "yes")))
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 5fa30716143..77eaba16c97 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3255,7 +3255,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none,none")])
 
 (define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, &vr")
@@ -3274,7 +3275,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,th,none")])
 
 (define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3294,7 +3296,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3314,7 +3317,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3363,7 +3367,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3384,7 +3389,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3433,7 +3439,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_msbc_extended_scalar"
   [(set (match_operand: 0 "register_operand"  "=vr, &vr")
@@ -3454,7 +3461,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_madc_overflow"
   [(set (match_operand: 0 "register_operand" "=vr, &vr, &vr")
@@ -3472,7 +3480,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "3")
-   (set (attr "avl_type_idx") (const_int 4))])
+   (set (attr "avl_type_idx") (const_int 4))
+   (set_attr "group_overlap" "th,none,none")])
 
 (define_insn "@pred_msbc_overflow"
   [(set (m

[PATCH v4] RISC-V: Rewrite some instructions using ASM targethook

2024-01-02 Thread Jun Sha (Joshua)

Some xtheadvector instructions differ from RVV1.0 by more than the
"th." prefix.  For example, RVV1.0 load/store instructions carry the
SEW while xtheadvector ones do not; RVV1.0 uses "o" for
indexed-ordered store instructions while xtheadvector does not; and
xtheadvector and RVV1.0 use different vnsrl/vnsra/vfncvt suffixes
(vv/vx/vi vs. wv/wx/wi).

To address this issue without duplicating patterns, we use the ASM
target hook to rewrite the whole instruction string.  We identify the
different instructions by their type attribute.
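
As an illustration of the kind of rewriting involved, the unit-stride
case can be sketched as a standalone string transformation.  This is
not the hook from the patch: `rewrite_opcode' is a hypothetical helper
that decides from the mnemonic text alone, whereas the patch consults
the insn type attribute, and strided, indexed and segment forms are
deliberately left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Rewrite a unit-stride load/store mnemonic such as "vle32.v" or
   "vse8.v" by adding the "th." prefix and dropping the element width.
   Any other mnemonic starting with 'v' only gets the prefix; anything
   else is copied unchanged.  The result is written to OUT.  */
static void
rewrite_opcode (const char *p, char *out, size_t outsz)
{
  if (p[0] == 'v' && (p[1] == 'l' || p[1] == 's') && p[2] == 'e'
      && isdigit ((unsigned char) p[3]))
    {
      /* Skip the width digits that follow "vle" or "vse".  */
      const char *rest = p + 3;
      while (isdigit ((unsigned char) *rest))
        rest++;
      snprintf (out, outsz, "th.%.3s%s", p, rest);
    }
  else if (p[0] == 'v')
    snprintf (out, outsz, "th.%s", p);
  else
    snprintf (out, outsz, "%s", p);
}
```

Under these assumptions "vle32.v" becomes "th.vle.v", "vadd.vv"
becomes "th.vadd.vv", and a scalar "add" is left alone.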

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_asm_output_opcode):

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.cc | 213 +-
 1 file changed, 210 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a80bf8d1a74..13cdfc4ee27 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5646,9 +5646,216 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char 
*p)
 {
   /* We need to add th. prefix to all the xtheadvector
  insturctions here.*/
-  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
-  p[0] == 'v')
-fputs ("th.", asm_out_file);
+  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX)
+{
+  if (get_attr_type (current_output_insn) == TYPE_VLDE ||
+ get_attr_type (current_output_insn) == TYPE_VSTE ||
+ get_attr_type (current_output_insn) == TYPE_VLDFF)
+   {
+ if (strstr (p, "e8") || strstr (p, "e16") ||
+ strstr (p, "e32") || strstr (p, "e64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTE
+ ? fputs ("th.vse", asm_out_file)
+ : fputs ("th.vle", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+4;
+ else
+   return p+5;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLDS ||
+ get_attr_type (current_output_insn) == TYPE_VSTS)
+   {
+ if (strstr (p, "vle8") || strstr (p, "vse8") ||
+ strstr (p, "vle16") || strstr (p, "vse16") ||
+ strstr (p, "vle32") || strstr (p, "vse32") ||
+ strstr (p, "vle64") || strstr (p, "vse64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTS
+ ? fputs ("th.vse", asm_out_file)
+ : fputs ("th.vle", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+4;
+ else
+   return p+5;
+   }
+ else if (strstr (p, "vlse8") || strstr (p, "vsse8") ||
+  strstr (p, "vlse16") || strstr (p, "vsse16") ||
+  strstr (p, "vlse32") || strstr (p, "vsse32") ||
+  strstr (p, "vlse64") || strstr (p, "vsse64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTS
+ ? fputs ("th.vsse", asm_out_file)
+ : fputs ("th.vlse", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+5;
+ else
+   return p+6;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLDUX ||
+ get_attr_type (current_output_insn) == TYPE_VLDOX)
+   {
+ if (strstr (p, "ei"))
+   {
+ fputs ("th.vlxe", asm_out_file);
+ if (strstr (p, "ei8"))
+   return p+7;
+ else
+   return p+8;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VSTUX ||
+ get_attr_type (current_output_insn) == TYPE_VSTOX)
+   {
+ if (strstr (p, "ei"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTUX
+   ? fputs ("th.vsuxe", asm_out_file)
+   : fputs ("th.vsxe", asm_out_file);
+ if (strstr (p, "ei8"))
+   return p+7;
+ else
+   return p+8;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLSEGDE ||
+ get_attr_type (current_output_insn) == TYPE_VSSEGTE ||
+ get_attr_type (current_output_insn) == TYPE_VLSEGDFF)
+   {
+ get_attr_type (current_output_insn) == TYPE_VSSEGTE
+   ? fputs ("th.vsseg", asm_out_file)
+   : fputs ("th.vlseg", asm_out_file);
+ asm_fprintf (asm_out_file, "%c", p[5]);
+ fputs ("e", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+8;
+ else
+   return p+9;
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLSEGDS ||
+ get_attr_type (current_output_insn) == TYPE_VSSEGTS)
+   {
+ get_attr_type (current

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread Andrew Pinski

On Mon, Jan 1, 2024 at 2:59 PM 钟居哲  wrote:
>
> This is Ok from my side.
> But before commit this patch, I think we need this patch first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> I will be back to work so I will take a look at other patches today.


Note I hate it.  It would be better to use something like `%^' (see
`~' for an example of how that works) instead of hacking
riscv_asm_output_opcode.  In fact, that is how other targets
implement this kind of thing.
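
For readers unfamiliar with the idea, a punctuation code in an output
template can be sketched as follows.  This is a toy expander under
stated assumptions, not GCC's operand printer: `expand_template' is a
hypothetical function, and in GCC the equivalent logic would sit
behind the target's print-operand hooks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy TMPL to OUT, replacing each "%^" with PREFIX when USE_PREFIX is
   set and with nothing otherwise.  Output is truncated to fit OUTSZ
   and is always NUL-terminated when OUTSZ is non-zero.  */
static void
expand_template (const char *tmpl, bool use_prefix, const char *prefix,
                 char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return;
  for (const char *p = tmpl; *p != '\0' && n + 1 < outsz; p++)
    {
      if (p[0] == '%' && p[1] == '^')
        {
          if (use_prefix)
            {
              size_t len = strlen (prefix);
              if (len > outsz - 1 - n)
                len = outsz - 1 - n;
              memcpy (out + n, prefix, len);
              n += len;
            }
          p++;  /* skip the '^' as well */
        }
      else
        out[n++] = *p;
    }
  out[n] = '\0';
}
```

A template written as "%^vadd.vv" would then print as "th.vadd.vv"
when the prefix is enabled and as "vadd.vv" otherwise, at the cost of
touching every pattern's template string.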

Thanks,
Andrew Pinski

> 
> juzhe.zh...@rivai.ai
>
>
> From: Jeff Law
> Date: 2024-01-01 01:43
> To: Jun Sha (Joshua); gcc-patches
> CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> juzhe.zhong; Jin Ma; Xianmiao Qu
> Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of 
> XTheadVector.
>
>
> On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > This patch adds th. prefix to all XTheadVector instructions by
> > implementing new assembly output functions. We only check the
> > prefix is 'v', so that no extra attribute is needed.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> > New function to add assembler insn code prefix/suffix.
> > * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> > * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> >
> > Co-authored-by: Jin Ma 
> > Co-authored-by: Xianmiao Qu 
> > Co-authored-by: Christoph Müllner 
> > ---
> >   gcc/config/riscv/riscv-protos.h|  1 +
> >   gcc/config/riscv/riscv.cc  | 14 ++
> >   gcc/config/riscv/riscv.h   |  4 
> >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
> >   4 files changed, 31 insertions(+)
> >   create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> >
> > diff --git a/gcc/config/riscv/riscv-protos.h 
> > b/gcc/config/riscv/riscv-protos.h
> > index 31049ef7523..5ea54b45703 100644
> > --- a/gcc/config/riscv/riscv-protos.h
> > +++ b/gcc/config/riscv/riscv-protos.h
> > @@ -102,6 +102,7 @@ struct riscv_address_info {
> >   };
> >
> >   /* Routines implemented in riscv.cc.  */
> > +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char 
> > *p);
> >   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
> >   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
> >   extern int riscv_float_const_rtx_index_for_fli (rtx);
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 0d1cbc5cb5f..ea1d59d9cf2 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> > return lmul;
> >   }
> >
> > +/* Define ASM_OUTPUT_OPCODE to do anything special before
> > +   emitting an opcode.  */
> > +const char *
> > +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> > +{
> > +  /* We need to add th. prefix to all the xtheadvector
> > + insturctions here.*/
> > +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> > +  p[0] == 'v')
> > +fputs ("th.", asm_out_file);
> > +
> > +  return p;
> Just a formatting nit. The GNU standards break lines before the
> operator, not after.  So
>if (TARGET_XTHEADVECTOR
>&& current_output_insn != NULL
>&& p[0] == 'v')
>
> Note that current_output_insn is "extern rtx_insn *", so use NULL, not
> NULL_RTX.
>
> Neither of these nits require a new version for review.  Just fix them.
>
> If Juzhe is fine with this, so am I.  We can refine it if necessary later.
>
> jeff
>

Re: [PATCH v4] RISC-V: Rewrite some instructions using ASM targethook

2024-01-02 Thread Kito Cheng

Please move those code logic to thead.cc, e.g.

if (TARGET_XTHEADVECTOR)
  return th_asm_output_opcode (asm_out_file, p);

And then implement th_asm_output_opcode in thead.cc.


On Wed, Jan 3, 2024 at 10:39 AM Jun Sha (Joshua)
 wrote:
>
> Some xtheadvector instructions differ from RVV1.0 by more than the
> "th." prefix.  For example, RVV1.0 load/store instructions carry the
> SEW while xtheadvector ones do not; RVV1.0 uses "o" for
> indexed-ordered store instructions while xtheadvector does not; and
> xtheadvector and RVV1.0 use different vnsrl/vnsra/vfncvt suffixes
> (vv/vx/vi vs. wv/wx/wi).
>
> To address this issue without duplicating patterns, we use the ASM
> target hook to rewrite the whole instruction string.  We identify the
> different instructions by their type attribute.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_asm_output_opcode):
>
> Co-authored-by: Jin Ma 
> Co-authored-by: Xianmiao Qu 
> Co-authored-by: Christoph Müllner 
> ---
>  gcc/config/riscv/riscv.cc | 213 +-
>  1 file changed, 210 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index a80bf8d1a74..13cdfc4ee27 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -5646,9 +5646,216 @@ riscv_asm_output_opcode (FILE *asm_out_file, const 
> char *p)
>  {
>/* We need to add th. prefix to all the xtheadvector
>   insturctions here.*/
> -  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> -  p[0] == 'v')
> -fputs ("th.", asm_out_file);
> +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX)
> +{
> +  if (get_attr_type (current_output_insn) == TYPE_VLDE ||
> + get_attr_type (current_output_insn) == TYPE_VSTE ||
> + get_attr_type (current_output_insn) == TYPE_VLDFF)
> +   {
> + if (strstr (p, "e8") || strstr (p, "e16") ||
> + strstr (p, "e32") || strstr (p, "e64"))
> +   {
> + get_attr_type (current_output_insn) == TYPE_VSTE
> + ? fputs ("th.vse", asm_out_file)
> + : fputs ("th.vle", asm_out_file);
> + if (strstr (p, "e8"))
> +   return p+4;
> + else
> +   return p+5;
> +   }
> +   }
> +
> +  if (get_attr_type (current_output_insn) == TYPE_VLDS ||
> + get_attr_type (current_output_insn) == TYPE_VSTS)
> +   {
> + if (strstr (p, "vle8") || strstr (p, "vse8") ||
> + strstr (p, "vle16") || strstr (p, "vse16") ||
> + strstr (p, "vle32") || strstr (p, "vse32") ||
> + strstr (p, "vle64") || strstr (p, "vse64"))
> +   {
> + get_attr_type (current_output_insn) == TYPE_VSTS
> + ? fputs ("th.vse", asm_out_file)
> + : fputs ("th.vle", asm_out_file);
> + if (strstr (p, "e8"))
> +   return p+4;
> + else
> +   return p+5;
> +   }
> + else if (strstr (p, "vlse8") || strstr (p, "vsse8") ||
> +  strstr (p, "vlse16") || strstr (p, "vsse16") ||
> +  strstr (p, "vlse32") || strstr (p, "vsse32") ||
> +  strstr (p, "vlse64") || strstr (p, "vsse64"))
> +   {
> + get_attr_type (current_output_insn) == TYPE_VSTS
> + ? fputs ("th.vsse", asm_out_file)
> + : fputs ("th.vlse", asm_out_file);
> + if (strstr (p, "e8"))
> +   return p+5;
> + else
> +   return p+6;
> +   }
> +   }
> +
> +  if (get_attr_type (current_output_insn) == TYPE_VLDUX ||
> + get_attr_type (current_output_insn) == TYPE_VLDOX)
> +   {
> + if (strstr (p, "ei"))
> +   {
> + fputs ("th.vlxe", asm_out_file);
> + if (strstr (p, "ei8"))
> +   return p+7;
> + else
> +   return p+8;
> +   }
> +   }
> +
> +  if (get_attr_type (current_output_insn) == TYPE_VSTUX ||
> + get_attr_type (current_output_insn) == TYPE_VSTOX)
> +   {
> + if (strstr (p, "ei"))
> +   {
> + get_attr_type (current_output_insn) == TYPE_VSTUX
> +   ? fputs ("th.vsuxe", asm_out_file)
> +   : fputs ("th.vsxe", asm_out_file);
> + if (strstr (p, "ei8"))
> +   return p+7;
> + else
> +   return p+8;
> +   }
> +   }
> +
> +  if (get_attr_type (current_output_insn) == TYPE_VLSEGDE ||
> + get_attr_type (current_output_insn) == TYPE_VSSEGTE ||
> + get_attr_type (current_output_insn) == TYPE_VLSEGDFF)
> +   {
> + get_attr_type (current_output_insn) == TYPE_VSSEGTE
> +

Re: [PATCH] RISC-V: Implement ZACAS extensions

2024-01-02 Thread Jeff Law





On 1/2/24 13:17, trdth...@gmail.com wrote:
> From: trdthg 
>
> This patch supports Zacas extension.
> It includes instruction's machine description and built-in functions.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc
> (riscv_implied_info): Add zacas extensions.
> (riscv_ext_version_table): Likewise.
> * config/riscv/arch-canonicalize
> (IMPLIED_EXT): Add zacas extensions.
> * config/riscv/iterators.md
> (SIDI): New iterator.
> (SIDITI): Likewise.
> (amocas): New attribute.
> * config/riscv/riscv-builtins.cc
> (AVAIL): Add new.
> * config/riscv/riscv-ftypes.def: Add new type for zacas instructions.
> * config/riscv/riscv-zacas.def: Add ZACAS extension's built-in function 
> file.
> * config/riscv/riscv.md: Add new type for zacas instructions.
> * config/riscv/riscv.opt: Add introduction of riscv_zacas_subext.
> * config/riscv/zacas.md: Add ZACAS extension's machine description file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zacas32.c: New test.
> * gcc.target/riscv/zacas64.c: New test.
> * gcc.target/riscv/zacas128.c: New test.

Just a note.  I'm deferring to gcc-15.  We're well past the point where 
new features should be accepted for gcc-14.


jeff

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread juzhe.zh...@rivai.ai

We have no choice.  You should know that theadvector is totally unrelated
to the RVV1.0 standard ISA.

Adding `%^' for a totally unrelated ISA makes no sense to me.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2024-01-03 10:54
To: 钟居哲
CC: Jeff Law; cooper.joshua; gcc-patches; jim.wilson.gcc; palmer; andrew; 
philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
of XTheadVector.
On Mon, Jan 1, 2024 at 2:59 PM 钟居哲  wrote:
>
> This is Ok from my side.
> But before commit this patch, I think we need this patch first:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
>
> I will be back to work so I will take a look at other patches today.
 
 
Note I hate it.  It would be better to use something like `%^' (see
`~' for an example of how that works) instead of hacking
riscv_asm_output_opcode.  In fact, that is how other targets
implement this kind of thing.
 
Thanks,
Andrew Pinski
 

Re: [PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-02 Thread Kito Cheng

> diff --git a/gcc/config/riscv/riscv_th_vector.h 
> b/gcc/config/riscv/riscv_th_vector.h
> new file mode 100644
> index 000..6f47e0c90a4
> --- /dev/null
> +++ b/gcc/config/riscv/riscv_th_vector.h
> @@ -0,0 +1,49 @@
> +/* RISC-V 'XTheadVector' Extension intrinsics include file.
> +   Copyright (C) 2022-2023 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#ifndef __RISCV_TH_VECTOR_H
> +#define __RISCV_TH_VECTOR_H
> +
> +#include 
> +#include 
> +
> +#ifndef __riscv_xtheadvector
> +#error "XTheadVector intrinsics require the xtheadvector extension."
> +#else
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/* NOTE: This implementation of riscv_th_vector.h is intentionally short.  
> It does
> +   not define the RVV types and intrinsic functions directly in C and C++
> +   code, but instead uses the following pragma to tell GCC to insert the
> +   necessary type and function definitions itself.  The net effect is the
> +   same, and the file is a complete implementation of riscv_th_vector.h.  */
> +#pragma riscv intrinsic "vector"

#pragma riscv intrinsic "theadvector"

Don't reuse `#pragma riscv intrinsic "vector"`, so that including
riscv_vector.h does not work with __riscv_xtheadvector.
I know we already guard with ifndef __riscv_xtheadvector and ifndef
__riscv_vector for now,
but we will eventually remove those guards because of multi-version
function support.

e.g.

a.c, compiled with -march=rv64gc

a.c:

#include 

void foo(){
...
}

void foo_vector () __attribute__((target("arch=+v")));
void foo_vector () {
// Use vector intrinsics to implement something
}

Re: [PATCH] RISC-V: Implement ZACAS extensions

2024-01-02 Thread Trd thg

Got it.

Jeff Law wrote on Wed, Jan 3, 2024 at 11:05 AM:

>
>
> On 1/2/24 13:17, trdth...@gmail.com wrote:
> > From: trdthg 
> >
> > This patch supports Zacas extension.
> > It includes instruction's machine description and built-in functions.
> >
> > gcc/ChangeLog:
> >
> >   * common/config/riscv/riscv-common.cc
> >   (riscv_implied_info): Add zacas extensions.
> >   (riscv_ext_version_table): Likewise.
> >   * config/riscv/arch-canonicalize
> >   (IMPLIED_EXT): Add zacas extensions.
> >   * config/riscv/iterators.md
> >   (SIDI): New iterator.
> >   (SIDITI): Likewise.
> >   (amocas): New attribute.
> >   * config/riscv/riscv-builtins.cc
> >   (AVAIL): Add new.
> >   * config/riscv/riscv-ftypes.def: Add new type for zacas
> instructions.
> >   * config/riscv/riscv-zacas.def: Add ZACAS extension's built-in
> function file.
> >   * config/riscv/riscv.md: Add new type for zacas instructions.
> >   * config/riscv/riscv.opt: Add introduction of riscv_zacas_subext.
> >   * config/riscv/zacas.md: Add ZACAS extension's machine description
> file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/zacas32.c: New test.
> >   * gcc.target/riscv/zacas64.c: New test.
> >   * gcc.target/riscv/zacas128.c: New test.
> Just a note.  I'm deferring to gcc-15.  We're well past the point where
> new features should be accepted for gcc-14.
>
> jeff
>

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread Andrew Pinski

On Tue, Jan 2, 2024 at 7:07 PM juzhe.zh...@rivai.ai
 wrote:
>
> We have no choice. You should know theadvector is totally unrelated to the
> RVV1.0 standard ISA.
>
> Adding `%^', which mixes in a totally unrelated ISA, makes no sense to me.

No, it implements it in a different way.
Basically, every supported pattern changes its output template from
"v*" to "%^v*". You then change riscv_print_operand_punct_valid_p to
allow '^' and add '^' support to riscv_print_operand (the way '~' is
handled there).

And the next patch adds a few more '%' codes to print different
strings depending on whether XTheadVector is enabled.

This is how almost all other targets handle this kind of thing,
instead of hacking ASM_OUTPUT_OPCODE.
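
To make the `%^` idea concrete, here is a minimal standalone sketch in C. It is not GCC code: sketch_expand_template, append_char and their parameters are hypothetical names, and the sketch only models a `%^` punctuation code that prints "th." when XTheadVector is enabled and nothing otherwise. Every other character of the template is copied through untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: append C to OUT (capacity CAP, current length *LEN),
   always leaving room for the terminating NUL.  Returns false when full.  */
static bool
append_char (char *out, size_t cap, size_t *len, char c)
{
  if (*len + 1 >= cap)
    return false;
  out[(*len)++] = c;
  return true;
}

/* Hypothetical model of a '%^' punctuation code: copy TMPL into OUT,
   replacing each "%^" with "th." when XTHEADVECTOR is true and with
   nothing otherwise.  Returns false if OUT was too small.  */
bool
sketch_expand_template (const char *tmpl, bool xtheadvector,
                        char *out, size_t cap)
{
  size_t len = 0;
  bool ok = cap > 0;

  for (const char *p = tmpl; ok && *p != '\0'; p++)
    {
      if (p[0] == '%' && p[1] == '^')
        {
          if (xtheadvector)
            for (const char *q = "th."; ok && *q != '\0'; q++)
              ok = append_char (out, cap, &len, *q);
          p++;  /* Step onto the '^'; the loop increment moves past it.  */
        }
      else
        ok = append_char (out, cap, &len, *p);
    }

  if (cap > 0)
    out[len] = '\0';
  return ok;
}
```

With this model, the template "%^vadd.vv" expands to "th.vadd.vv" when the flag is set and to "vadd.vv" when it is not, which is why each pattern's template has to carry the `%^` marker explicitly.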

Thanks,
Andrew Pinski


>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Andrew Pinski
> Date: 2024-01-03 10:54
> To: 钟居哲
> CC: Jeff Law; cooper.joshua; gcc-patches; jim.wilson.gcc; palmer; andrew; 
> philipp.tomsich; Christoph Müllner; jinma; Cooper Qu
> Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the 
> instructions of XTheadVector.
> On Mon, Jan 1, 2024 at 2:59 PM 钟居哲  wrote:
> >
> > This is OK from my side.
> > But before committing this patch, I think we need this patch first:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-December/641533.html
> >
> > I will be back at work, so I will take a look at the other patches today.
>
>
> Note I hate it. It would be better if you used something like `%^' (see
> `~` for an example of how that works) instead of hacking
> riscv_asm_output_opcode, really. In fact that is how other targets
> implement this kind of thing.
>
> Thanks,
> Andrew Pinski
>
> > 
> > juzhe.zh...@rivai.ai
> >
> >
> > From: Jeff Law
> > Date: 2024-01-01 01:43
> > To: Jun Sha (Joshua); gcc-patches
> > CC: jim.wilson.gcc; palmer; andrew; philipp.tomsich; christoph.muellner; 
> > juzhe.zhong; Jin Ma; Xianmiao Qu
> > Subject: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
> > of XTheadVector.
> >
> >
> > On 12/28/23 21:19, Jun Sha (Joshua) wrote:
> > > This patch adds th. prefix to all XTheadVector instructions by
> > > implementing new assembly output functions. We only check the
> > > prefix is 'v', so that no extra attribute is needed.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/riscv-protos.h (riscv_asm_output_opcode):
> > > New function to add assembler insn code prefix/suffix.
> > > * config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise.
> > > * config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
> > >
> > > Co-authored-by: Jin Ma 
> > > Co-authored-by: Xianmiao Qu 
> > > Co-authored-by: Christoph Müllner 
> > > ---
> > >   gcc/config/riscv/riscv-protos.h|  1 +
> > >   gcc/config/riscv/riscv.cc  | 14 ++
> > >   gcc/config/riscv/riscv.h   |  4 
> > >   .../gcc.target/riscv/rvv/xtheadvector/prefix.c | 12 
> > >   4 files changed, 31 insertions(+)
> > >   create mode 100644 
> > > gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
> > >
> > > diff --git a/gcc/config/riscv/riscv-protos.h 
> > > b/gcc/config/riscv/riscv-protos.h
> > > index 31049ef7523..5ea54b45703 100644
> > > --- a/gcc/config/riscv/riscv-protos.h
> > > +++ b/gcc/config/riscv/riscv-protos.h
> > > @@ -102,6 +102,7 @@ struct riscv_address_info {
> > >   };
> > >
> > >   /* Routines implemented in riscv.cc.  */
> > > +extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const 
> > > char *p);
> > >   extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
> > >   extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
> > >   extern int riscv_float_const_rtx_index_for_fli (rtx);
> > > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > > index 0d1cbc5cb5f..ea1d59d9cf2 100644
> > > --- a/gcc/config/riscv/riscv.cc
> > > +++ b/gcc/config/riscv/riscv.cc
> > > @@ -5636,6 +5636,20 @@ riscv_get_v_regno_alignment (machine_mode mode)
> > > return lmul;
> > >   }
> > >
> > > +/* Define ASM_OUTPUT_OPCODE to do anything special before
> > > +   emitting an opcode.  */
> > > +const char *
> > > +riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
> > > +{
> > > +  /* We need to add th. prefix to all the xtheadvector
> > > + instructions here.  */
> > > +  if (TARGET_XTHEADVECTOR && current_output_insn != NULL_RTX &&
> > > +  p[0] == 'v')
> > > +fputs ("th.", asm_out_file);
> > > +
> > > +  return p;
> > Just a formatting nit. The GNU standards break lines before the
> > operator, not after.  So
> >if (TARGET_XTHEADVECTOR
> >&& current_output_insn != NULL
> >&& p[0] == 'v')
> >
> > Note that current_output_insn is "extern rtx_insn *", so use NULL, not
> > NULL_RTX.
> >
> > Neither of these nits require a new version for review.  Just fix them.
> >
> > If Juzhe is fine with this, so am I.  We can refi

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread juzhe.zh...@rivai.ai

No. It would require changing all the patterns in vector.md.
It's a nightmare.

Note that I will refine vector.md in GCC 15; mixing theadvector things in
would make it impossible for me to maintain RVV1.0.



juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2024-01-03 11:19
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; cooper.joshua; gcc-patches; Jim Wilson; palmer; andrew; 
philipp.tomsich; christoph.muellner; jinma; cooper.qu
Subject: Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions 
of XTheadVector.
On Tue, Jan 2, 2024 at 7:07 PM juzhe.zh...@rivai.ai
 wrote:
>
> We have no choice. You should know theadvector is totally unrelated to the
> RVV1.0 standard ISA.
>
> Adding `%^', which mixes in a totally unrelated ISA, makes no sense to me.
 
No, it implements it in a different way.
Basically, every supported pattern changes its output template from
"v*" to "%^v*". You then change riscv_print_operand_punct_valid_p to
allow '^' and add '^' support to riscv_print_operand (the way '~' is
handled there).

And the next patch adds a few more '%' codes to print different
strings depending on whether XTheadVector is enabled.

This is how almost all other targets handle this kind of thing,
instead of hacking ASM_OUTPUT_OPCODE.
 
Thanks,
Andrew Pinski
 
 

Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread Andrew Pinski

On Tue, Jan 2, 2024 at 7:26 PM juzhe.zh...@rivai.ai
 wrote:
>
> No. It would require changing all the patterns in vector.md.
> It's a nightmare.
>
> Note that I will refine vector.md in GCC 15; mixing theadvector things in
> would make it impossible for me to maintain RVV1.0.

Then we should not support theadvector if things are getting this
messy. Both ways are hacks, really.
Either way, we need a better way of implementing this. Hacking in
theadvector support by rewriting opcodes is wrong and not maintainable
either.
I suspect we should wait on supporting theadvector until GCC 15 anyway.

Thanks,
Andrew Pinski


Re: Re: [PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread Kito Cheng

Hi Andrew:

This is a compromise and a trade-off around the T-Head vector support: we
would like to accept it, but without disturbing the vector 1.0
implementation too much. T-Head vector is a transitional product, and it
will stay frozen as it is, without being extended with new features such
as vector bfloat or vector crypto.

So we think using ASM_OUTPUT_OPCODE is better in this case than
adding %^ to every vector pattern for T-Head vector.
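
For comparison, the opcode-hook approach can be sketched in standalone C as follows. This is only an illustration under stated assumptions, not the patch itself: sketch_opcode_prefix and sketch_asm_output_opcode are hypothetical names, and the two bool parameters stand in for TARGET_XTHEADVECTOR and for the `current_output_insn != NULL` check in the posted code. The condition is laid out with the line breaks before `&&`, as requested in review.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: choose the prefix to print before MNEMONIC.
   XTHEADVECTOR stands in for TARGET_XTHEADVECTOR and IN_INSN for the
   "current_output_insn != NULL" check in the posted patch.  */
const char *
sketch_opcode_prefix (bool xtheadvector, bool in_insn, const char *mnemonic)
{
  if (xtheadvector
      && in_insn
      && mnemonic[0] == 'v')
    return "th.";
  return "";
}

/* Shaped like the hook in the patch: print any prefix to OUT, then hand
   the mnemonic back unchanged so that the caller prints it.  */
const char *
sketch_asm_output_opcode (FILE *out, bool xtheadvector, bool in_insn,
                          const char *mnemonic)
{
  fputs (sketch_opcode_prefix (xtheadvector, in_insn, mnemonic), out);
  return mnemonic;
}
```

The point of this design is that no pattern in vector.md has to change; the cost is that the decision rests entirely on the first character of the mnemonic.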

On Wed, Jan 3, 2024 at 11:26 AM juzhe.zh...@rivai.ai
 wrote:
>
> No. It would require changing all the patterns in vector.md.
> It's a nightmare.
>
> Note that I will refine vector.md in GCC 15; mixing theadvector things in
> would make it impossible for me to maintain RVV1.0.
>

Re: [PATCH] Add a late-combine pass [PR106594]

2024-01-02 Thread Jeff Law





On 10/24/23 12:49, Richard Sandiford wrote:
> This patch adds a combine pass that runs late in the pipeline.
> There are two instances: one between combine and split1, and one
> after postreload.

So have you done any investigation on cases caught by your new pass
between combine and split1 to characterize them?  In particular do they
point at solvable problems in combine?  Or do you foresee this subsuming
the old combiner pass at some point in the distant future?


rth and I sketched out an SSA based RTL combine at some point in the 
deep past.  The key goal we were trying to achieve was combining across 
blocks.  We didn't have a functioning RTL SSA form at the time, so it 
never went to any implementation work.  It looks like yours would solve 
the class of problems rth and I were considering.


[ ... ]



> The patch therefore enables the pass by default only on AArch64.
> However, I did test the patch with it enabled on x86_64-linux-gnu
> as well, which was useful for debugging.
>
> Bootstrapped & regression-tested on aarch64-linux-gnu and
> x86_64-linux-gnu (as posted, with no regressions, and with the
> pass enabled by default, with some gcc.target/i386 regressions).
> OK to install?

I'm going to adjust this slightly so that it's enabled across the board
and throw it into the tester tomorrow (tester is busy tonight).  Even if
we make it opt-in on a per-port basis, the alternate target testing does
seem to find stuff that needs fixing ;-)






> Richard
>
>
> gcc/
> PR rtl-optimization/106594
> * Makefile.in (OBJS): Add late-combine.o.
> * common.opt (flate-combine-instructions): New option.
> * doc/invoke.texi: Document it.
> * common/config/aarch64/aarch64-common.cc: Enable it by default
> at -O2 and above.
> * tree-pass.h (make_pass_late_combine): Declare.
> * late-combine.cc: New file.
> * passes.def: Add two instances of late_combine.
>
> gcc/testsuite/
> PR rtl-optimization/106594
> * gcc.dg/ira-shrinkwrap-prep-1.c: Restrict XFAIL to non-aarch64
> targets.
> * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
> * gcc.dg/stack-check-4.c: Add -fno-shrink-wrap.
> * gcc.target/aarch64/sve/cond_asrd_3.c: Remove XFAILs.
> * gcc.target/aarch64/sve/cond_convert_3.c: Likewise.
> * gcc.target/aarch64/sve/cond_fabd_5.c: Likewise.
> * gcc.target/aarch64/sve/cond_convert_6.c: Expect the MOVPRFX /Zs
> described in the comment.
> * gcc.target/aarch64/sve/cond_unary_4.c: Likewise.
> * gcc.target/aarch64/pr106594_1.c: New test.





> +
> +  // Don't substitute into a non-local goto, since it can then be
> +  // treated as a jump to local label, e.g. in shorten_branches.
> +  // ??? But this shouldn't be necessary.
> +  if (use_insn->is_jump ()
> +      && find_reg_note (use_insn->rtl (), REG_NON_LOCAL_GOTO, NULL_RTX))
> +    return false;

Agreed that this shouldn't be necessary.  In fact, if you can substitute
it's potentially very profitable.  Then again, I doubt it happens all
that much in practice, particularly since I think gimple does squash out
some of these.


Nothing jumps out as horribly wrong.  You might want/need to reject
frame related insns in optimizable_set, though I guess if the dwarf2
writer isn't complaining, then we haven't mucked things up too bad.

Re: [PATCH DejaGNU/GCC 0/1] Support per-test execution timeout factor

2024-01-02 Thread Hans-Peter Nilsson

On Tue, 12 Dec 2023, Maciej W. Rozycki wrote:

> Hi,
> 
>  This patch quasi-series makes it possible for individual test cases 
> identified as being slow to request more time via the GCC test harness by 
> providing a test execution timeout factor, applied to the tool execution 
> timeout set globally for all the test cases.  This is to avoid excessive 
> testsuite run times where other test cases do hang as it would be the 
> case if the timeout set globally was to be increased.
> 
>  The test execution timeout is different from the tool execution timeout 
> where it is GCC execution that is being guarded against taking excessive 
> amount of time on the test host rather than the resulting test case 
> executable run on the target afterwards, as concerned here.  GCC already 
> has a `dg-timeout-factor' setting for the tool execution timeout, but has 
> no means to increase the test execution timeout.  The GCC side of these 
> changes adds a corresponding `dg-test-timeout-factor' setting.

Hmm.  I think it would be more correct to emphasize that the 
existing dg-timeout-factor affects both the tool execution *and* 
the test execution, whereas your new dg-test-timeout-factor only 
affects the test execution.  (And still measured on the host.)
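
For reference, a test that asks for more time with the existing directive might look like the sketch below. The factor 4 and the slow_sum body are made up for illustration. The proposed dg-test-timeout-factor is not shown as a directive because it only exists in the patch under discussion; presumably it would be written the same way.

```c
/* { dg-do run } */
/* { dg-timeout-factor 4 } */

/* A loop standing in for a long-running test body.  */
static unsigned long
slow_sum (unsigned long n)
{
  volatile unsigned long acc = 0;
  for (unsigned long i = 0; i < n; i++)
    acc += i;
  return acc;
}

int
main (void)
{
  /* 0 + 1 + ... + 999 == 499500; exit status 0 means the test passed.  */
  return slow_sum (1000) == 499500UL ? 0 : 1;
}
```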

Usually the compilation time is close to 0, so is this based on 
an actual need more than an itchy "wart"?

Or did I miss something?

brgds, H-P

[PATCH v7 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-02 Thread Feng Wang

Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcases.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c

This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
 .../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
 .../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
 .../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
 .../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
 .../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
 .../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
 .../riscv/rvv/base/zvkned-intrinsic.c | 104 ++
 .../riscv/rvv/base/zvknha-intrinsic.c |  33 
 .../riscv/rvv/base/zvknhb-intrinsic.c |  33 
 .../riscv/rvv/base/zvksed-intrinsic.c |  33 
 .../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
 gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
 12 files changed, 548 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u16mf4(vs2, vl);
+}
+
+vuint32m1_t test_vbrev8_v_u32m1_m(vbool32_t mask, vuint32m1_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u32m1_m(mask, vs2, vl);
+}
+
+vuint64m1_t test_v

Re: [PATCH 5/5][_Hashtable] Prefer to insert after last node

2024-01-02 Thread François Dumont




On 21/12/2023 22:55, Jonathan Wakely wrote:

> I think this should wait for the next stage 1. It's a big patch
> affecting the default -std mode (not just experimental C++20/23/26
> material), and was first posted after the end of stage 1.

The idea behind this patch was in the air long before that, but I agree
that its form has changed a lot.


> Do we really need the changes for versioned namespace? How much
> difference does that extra member make to performance, compared with
> the version for the default config?


It is huge, as demonstrated by the bench results below. You can see that
the number of calls to the hash functor is divided by 2. With such a
member you only need to compute 1 hash code when inserting in an empty
bucket (the perfect hasher case) whereas we currently need 2 to properly
maintain the bucket pointing to the before-begin base node.
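
One rough way to observe this kind of difference is to wrap the hasher in a counting functor and compare the totals with and without the patch. This is only a sketch and is not taken from the patch or from the libstdc++ performance testsuite: `CountingHash` and `count_hash_calls` are hypothetical names, and the count depends on the standard library implementation and version, so no figure is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical counting hasher: forwards to std::hash<int> and records
// how many times the container invoked it.
struct CountingHash
{
  std::size_t* calls;

  std::size_t operator()(int v) const noexcept
  {
    ++*calls;
    return std::hash<int>{}(v);
  }
};

// Insert the distinct keys 0..n-1 one at a time and return the number of
// hash functor calls the container made while doing so.
std::size_t count_hash_calls(int n)
{
  std::size_t calls = 0;
  std::unordered_set<int, CountingHash> s(16, CountingHash{&calls});
  for (int i = 0; i < n; ++i)
    s.insert(i);
  // Sanity check: every key was distinct, so all of them were inserted.
  assert(s.size() == static_cast<std::size_t>(n));
  return calls;
}
```

Calling `count_hash_calls(1000)` against two builds of the library gives a number comparable to the "calls" column of the bench output, at least in spirit.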


This is also why I am proposing it now: I still hope that the patch
moving the gnu versioned namespace to the cxx11 abi will be integrated,
along with the version bump it implies.





On Wed, 20 Dec 2023 at 06:10, François Dumont  wrote:

Here is a new version of this patch.

The previous one had some flaws that went unnoticed by the testsuite tests;
only the performance tests were spotting them. So I'm adding checks on the
consistency of the unordered containers in this patch.

I also forgot to mention that after this patch the gnu versioned namespace
version is supposed to be bumped. But I hope that's already the plan because
of the move to the cxx11 abi in this mode.

Note for reviewers: after applying the patch, a 'git diff -b' is
much more readable.

And some benchmark results:

before:

unordered_set_range_insert.cc-thread:
hash code NOT cached 2 X 100 inserts individually   1990 calls   44r   44u   0s    95999760mem   0pf
hash code NOT cached 2 X 100 inserts in range       2000 calls   43r   43u   0s    95999760mem   0pf
hash code NOT cached 100 X inserts individually     1990 calls   44r   44u   0s    95999760mem   0pf
hash code NOT cached 100 X inserts in range         2000 calls   43r   43u   0s    95999760mem   0pf
hash code cached 2 X 100 inserts individually       1000 calls   30r   30u   0s   111999328mem   0pf
hash code cached 2 X 100 inserts in range           1010 calls   33r   32u   0s   111999328mem   0pf
hash code cached 100 X inserts individually         1000 calls   30r   31u   0s   111999328mem   0pf
hash code cached 100 X inserts in range             1010 calls   32r   32u   0s   111999328mem   0pf

after:

unordered_set_range_insert.cc-thread:
hash code NOT cached 2 X 100 inserts individually   1990 calls   44r   44u   0s    95999760mem   0pf
hash code NOT cached 2 X 100 inserts in range       1020 calls   26r   25u   0s    95999760mem   0pf
hash code NOT cached 100 X inserts individually     1990 calls   43r   44u   0s    95999760mem   0pf
hash code NOT cached 100 X inserts in range         1020 calls   26r   26u   0s    95999760mem   0pf
hash code cached 2 X 100 inserts individually       1000 calls   35r   35u   0s   111999328mem   0pf
hash code cached 2 X 100 inserts in range           1010 calls   32r   33u   0s   111999328mem   0pf
hash code cached 100 X inserts individually         1000 calls   31r   32u   0s   111999328mem   0pf
hash code cached 100 X inserts in range             1010 calls   31r   31u   0s   111999328mem   0pf


  libstdc++: [_Hashtable] Prefer to insert after last node

  When inserting an element into an empty bucket we currently insert the new
  node after the before-begin node, so in first position. The drawback of
  doing this is that we are forced to update the bucket that was containing
  this before-begin node to point to the newly inserted node. To do so we
  need at best to do a modulo to find this bucket and at worst, when the hash
  code is not cached, also compute it.

  To avoid this side effect it is better to insert after the last node. To do
  so we are introducing a helper type _HintPolicy that has 3 responsibilities.

  1. When the gnu versioned namespace is used we add a _M_last member to
  _Hashtable, _HintPolicy is then in charge of maintaining it. For this
  purpose _HintPolicy is using the RAII pattern, it resets the _M_last on
  destruction. It also maintains its own _M_last, all mutating operations
  update it when needed.

  2. When the gnu versioned namespace is

[PATCH v4] RISC-V: Adds the prefix "th." for the instructions of XTheadVector.

2024-01-02 Thread Jun Sha (Joshua)

This patch adds the "th." prefix to all XTheadVector instructions by
implementing new assembly output functions. We only check that the
opcode starts with 'v', so no extra attribute is needed.
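
The rule described above can be modelled outside the compiler. The following is a standalone sketch, not GCC code: `add_th_prefix` is a hypothetical helper, the `xtheadvector` flag stands in for TARGET_XTHEADVECTOR, and the real hook additionally checks that `current_output_insn` is non-NULL and writes to the assembly stream instead of returning a string.

```cpp
#include <cassert>
#include <string>

// Standalone model of the prefix rule: an opcode that starts with 'v'
// is treated as a vector instruction and gets the "th." prefix when
// XTheadVector is enabled.  Everything else is passed through.
std::string add_th_prefix(const std::string& opcode, bool xtheadvector)
{
  if (xtheadvector && !opcode.empty() && opcode[0] == 'v')
    return "th." + opcode;
  return opcode;
}

// Small usage example for the model above.
void demo_add_th_prefix()
{
  assert(add_th_prefix("vadd.vv", true) == "th.vadd.vv");
  assert(add_th_prefix("addi", true) == "addi");
}
```

Because the decision is made on the first character alone, the approach relies on no non-vector opcode starting with 'v' when XTheadVector is enabled.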

gcc/ChangeLog:

* config/riscv/riscv-protos.h (riscv_asm_output_opcode):
New function to add assembler insn code prefix/suffix.
(th_asm_output_opcode):
Thead function to add assembler insn code prefix/suffix.
* config/riscv/riscv.cc (riscv_asm_output_opcode): Likewise
* config/riscv/riscv.h (ASM_OUTPUT_OPCODE): Likewise.
* config/riscv/thead.cc (th_asm_output_opcode): Likewise

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/xtheadvector/prefix.c: New test.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv-protos.h |  2 ++
 gcc/config/riscv/riscv.cc   | 11 +++
 gcc/config/riscv/riscv.h|  4 
 gcc/config/riscv/thead.cc   | 13 +
 .../gcc.target/riscv/rvv/xtheadvector/prefix.c  | 12 
 5 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 31049ef7523..71724dabdb5 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -102,6 +102,7 @@ struct riscv_address_info {
 };
 
 /* Routines implemented in riscv.cc.  */
+extern const char *riscv_asm_output_opcode (FILE *asm_out_file, const char *p);
 extern enum riscv_symbol_type riscv_classify_symbolic_expression (rtx);
 extern bool riscv_symbolic_constant_p (rtx, enum riscv_symbol_type *);
 extern int riscv_float_const_rtx_index_for_fli (rtx);
@@ -717,6 +718,7 @@ extern void th_mempair_prepare_save_restore_operands (rtx[4], bool,
  int, HOST_WIDE_INT,
  int, HOST_WIDE_INT);
 extern void th_mempair_save_restore_regs (rtx[4], bool, machine_mode);
+extern const char *th_asm_output_opcode (FILE *asm_out_file, const char *p);
 #ifdef RTX_CODE
 extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..51878797287 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5636,6 +5636,17 @@ riscv_get_v_regno_alignment (machine_mode mode)
   return lmul;
 }
 
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+riscv_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  if (TARGET_XTHEADVECTOR)
+return th_asm_output_opcode (asm_out_file, p);
+
+  return p;
+}
+
 /* Implement TARGET_PRINT_OPERAND.  The RISCV-specific operand codes are:
 
'h' Print the high-part relocation associated with OP, after stripping
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 6df9ec73c5e..c33361a254d 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -826,6 +826,10 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
   asm_fprintf ((FILE), "%U%s", (NAME));\
   } while (0)
 
+#undef ASM_OUTPUT_OPCODE
+#define ASM_OUTPUT_OPCODE(STREAM, PTR) \
+  (PTR) = riscv_asm_output_opcode(STREAM, PTR)
+
 #define JUMP_TABLES_IN_TEXT_SECTION 0
 #define CASE_VECTOR_MODE SImode
 #define CASE_VECTOR_PC_RELATIVE (riscv_cmodel != CM_MEDLOW)
diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index 20353995931..dc3aed3904d 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -883,6 +883,19 @@ th_output_move (rtx dest, rtx src)
   return NULL;
 }
 
+/* Define ASM_OUTPUT_OPCODE to do anything special before
+   emitting an opcode.  */
+const char *
+th_asm_output_opcode (FILE *asm_out_file, const char *p)
+{
+  /* We need to add th. prefix to all the xtheadvector
+ instructions here.*/
+  if (current_output_insn != NULL && p[0] == 'v')
+fputs ("th.", asm_out_file);
+
+  return p;
+}
+
 /* Implement TARGET_PRINT_OPERAND_ADDRESS for XTheadMemIdx.  */
 
 bool
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
new file mode 100644
index 000..eee727ef6b4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/xtheadvector/prefix.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_xtheadvector -mabi=ilp32 -O0" } */
+
+#include "riscv_vector.h"
+
+vint32m1_t
+prefix (vint32m1_t vx, vint32m1_t vy, size_t vl)
+{
+  return __riscv_vadd_vv_i32m1 (vx, vy, vl);
+}
+
+/* { dg-final { scan-assembler {\mth\.v\M} } } */
-- 
2.17.1

[PATCH v4] RISC-V: Handle differences between XTheadvector and Vector

2024-01-02 Thread Jun Sha (Joshua)

This patch handles the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
the subset of xtheadvector instructions that can reuse the current
RVV1.0 patterns directly by simply adding the "th." prefix. For
xtheadvector instructions that have different names but share the same
patterns as RVV1.0 instructions, we will use the ASM targethook to
rewrite the whole instruction string in the following patches.

For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md so that we do
not generate instructions that xtheadvector does not support,
such as vmv1r and vsext.vf2.

gcc/ChangeLog:

* config.gcc:  Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/riscv-c.cc: Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (expand_block_move):
Guard XTheadVector.
(get_prefer_tail_policy): Give specific value for tail.
(get_prefer_mask_policy): Give specific value for mask.
(vls_mode_valid_p): Avoid autovec.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
(build_one): New function.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION):
(DEF_THEAD_RVV_FUNCTION): Add new macros.
(check_required_extensions):
(handle_pragma_vector):
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_VECTOR):
(RVV_REQUIRE_XTHEADVECTOR):
Add RVV_REQUIRE_VECTOR and RVV_REQUIRE_XTHEADVECTOR.
(struct function_group_info):
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv-vsetvl.cc: Add functions for xtheadvector.
* config/riscv/riscv.cc (riscv_v_ext_vls_mode_p):
Guard XTheadVector.
(riscv_v_adjust_bytesize): Likewise.
(riscv_preferred_simd_mode): Likewise.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/vector-iterators.md: Remove fractional LMUL.
* config/riscv/vector.md: Include thead-vector.md.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/autovec.md   |   2 +-
 gcc/config/riscv/predicates.md|   4 +-
 gcc/config/riscv/riscv-c.cc   |   3 +-
 gcc/config/riscv/riscv-string.cc  |   3 +-
 gcc/config/riscv/riscv-v.cc   |   6 +-
 .../riscv/riscv-vector-builtins-bases.cc  |  48 +++--
 .../riscv/riscv-vector-builtins-shapes.cc |  23 +++
 gcc/config/riscv/riscv-vector-switch.def  | 150 +++---
 gcc/config/riscv/riscv.cc |  20 +-
 gcc/config/riscv/riscv_th_vector.h|  49 +
 gcc/config/riscv/thead-vector.md  |  69 +++
 gcc/config/riscv/vector-iterators.md  | 186 +-
 gcc/config/riscv/vector.md|  55 --
 .../gcc.target/riscv/rvv/base/abi-1.c |   2 +-
 .../gcc.target/riscv/rvv/base/pragma-1.c  |   2 +-
 gcc/testsuite/lib/target-supports.exp |  12 ++
 17 files changed, 427 insertions(+), 209 deletions(-)
 create mode 100644 gcc/config/riscv/riscv_th_vector.h
 create mode 100644 gcc/config/riscv/thead-vector.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index f0676c830e8..1445d98c147 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -549,7 +549,7 @@ riscv*)
extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
extra_objs="${extra_objs} thead.o riscv-target-attr.o"
d_target_objs="riscv-d.o"
-   extra_headers="riscv_vector.h"
+   extra_headers="riscv_vector.h riscv_th_vector.h"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.h"
;;
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 8b8a92f10a1..1fac56c7095 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2579,7 +2579,7 @@
   [(match_operand  0 "register_operand")
(match_operand  1 "memory_operand")
(match_operand:ANYI 2 "const_int_operand")]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && !TARGET_XTHEADVECTOR"
   {
 riscv_vector::expand_rawmemchr(mode, operands

Re: [PATCH v7 2/2] RISC-V: Add crypto vector api-testing cases.

2024-01-02 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Feng Wang
Date: 2024-01-03 13:21
To: gcc-patches
CC: kito.cheng; jeffreyalaw; juzhe.zhong; Feng Wang
Subject: [PATCH v7 2/2] RISC-V: Add crypto vector api-testing cases.
Patch v7: Add newline at the end of file.
Patch v6: Move intrinsic tests into rvv/base.
Patch v5: Rebase
Patch v4: Add some RV32 vx constraint testcase.
Patch v3: Refine crypto vector api-testing cases.
Patch v2: Update march info according to the change of riscv-common.c
 
This patch adds crypto vector api-testing cases based on
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvbb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c: New test.
* gcc.target/riscv/rvv/base/zvbc-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: New test.
* gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: New test.
* gcc.target/riscv/rvv/base/zvkg-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvkned-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknha-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksed-intrinsic.c: New test.
* gcc.target/riscv/rvv/base/zvksh-intrinsic.c: New test.
* gcc.target/riscv/zvkb.c: New test.
---
.../riscv/rvv/base/zvbb-intrinsic.c   | 179 ++
.../riscv/rvv/base/zvbb_vandn_vx_constraint.c |  15 ++
.../riscv/rvv/base/zvbc-intrinsic.c   |  62 ++
.../riscv/rvv/base/zvbc_vx_constraint-1.c |  14 ++
.../riscv/rvv/base/zvbc_vx_constraint-2.c |  14 ++
.../riscv/rvv/base/zvkg-intrinsic.c   |  24 +++
.../riscv/rvv/base/zvkned-intrinsic.c | 104 ++
.../riscv/rvv/base/zvknha-intrinsic.c |  33 
.../riscv/rvv/base/zvknhb-intrinsic.c |  33 
.../riscv/rvv/base/zvksed-intrinsic.c |  33 
.../riscv/rvv/base/zvksh-intrinsic.c  |  24 +++
gcc/testsuite/gcc.target/riscv/zvkb.c |  13 ++
12 files changed, 548 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbb_vandn_vx_constraint.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkg-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvkned-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknha-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvknhb-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksed-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvksh-intrinsic.c
create mode 100644 gcc/testsuite/gcc.target/riscv/zvkb.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
new file mode 100644
index 000..b7e25bfe819
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvbb-intrinsic.c
@@ -0,0 +1,179 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zvbb_zve64x -mabi=lp64d -Wno-psabi" } */
+#include "riscv_vector.h"
+
+vuint8mf8_t test_vandn_vv_u8mf8(vuint8mf8_t vs2, vuint8mf8_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u8mf8(vs2, vs1, vl);
+}
+
+vuint32m1_t test_vandn_vx_u32m1(vuint32m1_t vs2, uint32_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u32m1(vs2, rs1, vl);
+}
+
+vuint32m2_t test_vandn_vv_u32m2_m(vbool16_t mask, vuint32m2_t vs2, vuint32m2_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m2_m(mask, vs2, vs1, vl);
+}
+
+vuint16mf2_t test_vandn_vx_u16mf2_m(vbool32_t mask, vuint16mf2_t vs2, uint16_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u16mf2_m(mask, vs2, rs1, vl);
+}
+
+vuint32m4_t test_vandn_vv_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, vuint32m4_t vs1, size_t vl) {
+  return __riscv_vandn_vv_u32m4_tumu(mask, maskedoff, vs2, vs1, vl);
+}
+
+vuint64m4_t test_vandn_vx_u64m4_tumu(vbool16_t mask, vuint64m4_t maskedoff, vuint64m4_t vs2, uint64_t rs1, size_t vl) {
+  return __riscv_vandn_vx_u64m4_tumu(mask, maskedoff, vs2, rs1, vl);
+}
+
+vuint8m8_t test_vbrev_v_u8m8(vuint8m8_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u8m8(vs2, vl);
+}
+
+vuint16m1_t test_vbrev_v_u16m1_m(vbool16_t mask, vuint16m1_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u16m1_m(mask, vs2, vl);
+}
+
+vuint32m4_t test_vbrev_v_u32m4_tumu(vbool8_t mask, vuint32m4_t maskedoff, vuint32m4_t vs2, size_t vl) {
+  return __riscv_vbrev_v_u32m4_tumu(mask, maskedoff, vs2, vl);
+}
+
+vuint16mf4_t test_vbrev8_v_u16mf4(vuint16mf4_t vs2, size_t vl) {
+  return __riscv_vbrev8_v_u16mf4(vs2, vl);
+}
+
+vuint32m1_t test_vbrev8_v_u32m1_m(vbool32_t mask, vuint32m1_t vs2

RE: [PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'

2024-01-02 Thread Di Zhao OS

> -----Original Message-----
> From: Richard Sandiford 
> Sent: Friday, December 29, 2023 6:24 PM
> To: Di Zhao OS 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: add 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA'
> 
> Di Zhao OS  writes:
> > This patch adds a new tuning option 
> > 'AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA',
> > to consider fully pipelined FMAs in reassociation. Also, set this option
> > by default for Ampere CPUs.
> >
> > Tested on aarch64-unknown-linux-gnu. Is this OK for trunk?
> >
> > Thanks,
> > Di Zhao
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
> > New tuning option AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
> > * config/aarch64/aarch64.cc (aarch64_override_options_internal): Set
> > param_fully_pipelined_fma according to tuning option.
> > * config/aarch64/tuning_models/ampere1.h: Add
> > AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA to tune_flags.
> > * config/aarch64/tuning_models/ampere1a.h: Likewise.
> > * config/aarch64/tuning_models/ampere1b.h: Likewise.
> >
> > ---
> >  gcc/config/aarch64/aarch64-tuning-flags.def | 2 ++
> >  gcc/config/aarch64/aarch64.cc   | 6 ++
> >  gcc/config/aarch64/tuning_models/ampere1.h  | 3 ++-
> >  gcc/config/aarch64/tuning_models/ampere1a.h | 3 ++-
> >  gcc/config/aarch64/tuning_models/ampere1b.h | 3 ++-
> >  5 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def
> b/gcc/config/aarch64/aarch64-tuning-flags.def
> > index f28a73839a6..256f17bad60 100644
> > --- a/gcc/config/aarch64/aarch64-tuning-flags.def
> > +++ b/gcc/config/aarch64/aarch64-tuning-flags.def
> > @@ -49,4 +49,6 @@ AARCH64_EXTRA_TUNING_OPTION ("matched_vector_throughput",
> MATCHED_VECTOR_THROUGH
> >
> >  AARCH64_EXTRA_TUNING_OPTION ("avoid_cross_loop_fma", AVOID_CROSS_LOOP_FMA)
> >
> > +AARCH64_EXTRA_TUNING_OPTION ("fully_pipelined_FMA", FULLY_PIPELINED_FMA)
> 
> Could you change this to all-lowercase, i.e. fully_pipelined_fma,
> for consistency with avoid_cross_loop_fma above?
> 
> > +
> >  #undef AARCH64_EXTRA_TUNING_OPTION
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index f9850320f61..1b3b288cdf9 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -18289,6 +18289,12 @@ aarch64_override_options_internal (struct
> gcc_options *opts)
> >  SET_OPTION_IF_UNSET (opts, &global_options_set,
> param_avoid_fma_max_bits,
> >  512);
> >
> > +  /* Consider fully pipelined FMA in reassociation.  */
> > +  if (aarch64_tune_params.extra_tuning_flags
> > +  & AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA)
> > +SET_OPTION_IF_UNSET (opts, &global_options_set,
> param_fully_pipelined_fma,
> > +1);
> > +
> >aarch64_override_options_after_change_1 (opts);
> >  }
> >
> > diff --git a/gcc/config/aarch64/tuning_models/ampere1.h
> b/gcc/config/aarch64/tuning_models/ampere1.h
> > index a144e8f94b3..d63788528a7 100644
> > --- a/gcc/config/aarch64/tuning_models/ampere1.h
> > +++ b/gcc/config/aarch64/tuning_models/ampere1.h
> > @@ -104,7 +104,8 @@ static const struct tune_params ampere1_tunings =
> >2,   /* min_div_recip_mul_df.  */
> >0,   /* max_case_values.  */
> >tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> > -  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
> > +  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
> > +   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags.  */
> 
> Formatting nit, but GCC style is to put the "|" at the start of the
> following line:
> 
>   (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA
>| AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA), /* tune_flags.  */
> 
> Same for the others.
> 
> OK with those changes, thanks.

Fixed the problems and committed to master.

Thanks,
Di

> 
> Richard
> 
> >&ampere1_prefetch_tune,
> >AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> >AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
> > diff --git a/gcc/config/aarch64/tuning_models/ampere1a.h
> b/gcc/config/aarch64/tuning_models/ampere1a.h
> > index f688ed08a79..63506e1d1c6 100644
> > --- a/gcc/config/aarch64/tuning_models/ampere1a.h
> > +++ b/gcc/config/aarch64/tuning_models/ampere1a.h
> > @@ -56,7 +56,8 @@ static const struct tune_params ampere1a_tunings =
> >2,   /* min_div_recip_mul_df.  */
> >0,   /* max_case_values.  */
> >tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> > -  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA),   /* tune_flags.  */
> > +  (AARCH64_EXTRA_TUNE_AVOID_CROSS_LOOP_FMA |
> > +   AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA),/* tune_flags.  */
> >&ampere1_prefetch_tune,
> >AARCH64_LDP_STP_POLICY_ALIGNED,   /* ldp_policy_model.  */
> >AARCH64_LDP_STP_POLICY_ALIGNED/* stp_policy_model.  */
> > diff --git a/gcc/config/aarch64/tun

[PATCH v4] RISC-V: Fix register overlap issue for some xtheadvector instructions

2024-01-02 Thread Jun Sha (Joshua)

For th.vmadc/th.vmsbc as well as narrowing arithmetic instructions
and floating-point compare instructions, an illegal instruction
exception will be raised if the destination vector register overlaps
a source vector register group.

To handle this issue, we use the "group_overlap" and "enabled"
attributes to disable some alternatives for xtheadvector.
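
The effect of the new attribute value can be modelled as a simple predicate. This is a simplified sketch rather than GCC code: `GroupOverlap` and `alternative_enabled` are hypothetical names, only the "none" and "th" values are modelled (the W21..W0 values from riscv.md are left out), and `xtheadvector` stands in for TARGET_XTHEADVECTOR.

```cpp
#include <cassert>

// Only the two attribute values relevant to this patch are modelled.
enum class GroupOverlap { None, Th };

// Models the new "group_overlap_valid" case: an alternative tagged "th"
// is rejected when compiling for XTheadVector and accepted otherwise.
bool alternative_enabled(GroupOverlap overlap, bool xtheadvector)
{
  if (overlap == GroupOverlap::Th && xtheadvector)
    return false;
  return true;
}

// Usage example for a pattern tagged "th,none": with XTheadVector only
// the second alternative remains available.
void demo_alternative_enabled()
{
  assert(!alternative_enabled(GroupOverlap::Th, true));
  assert(alternative_enabled(GroupOverlap::None, true));
}
```

In the patterns touched by the patch, the alternatives tagged "th" are the ones whose destination constraint is not earlyclobber ("vr" rather than "&vr"), so disabling them leaves only alternatives in which the destination cannot overlap a source.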

gcc/ChangeLog:

* config/riscv/riscv.md (none,W21,W42,W84,W43,W86,W87,W0):
(none,W21,W42,W84,W43,W86,W87,W0,th):
Add group-overlap constraint for xtheadvector.
* config/riscv/vector.md:
Disable alternatives in which the destination register overlaps
a source register group for xtheadvector.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/riscv.md  |   6 +-
 gcc/config/riscv/vector.md | 314 +
 2 files changed, 185 insertions(+), 135 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 68f7203b676..d736501784d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -504,7 +504,7 @@
 ;; Widening instructions have group-overlap constraints.  Those are only
 ;; valid for certain register-group sizes.  This attribute marks the
 ;; alternatives not matching the required register-group size as disabled.
-(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0"
+(define_attr "group_overlap" "none,W21,W42,W84,W43,W86,W87,W0,th"
   (const_string "none"))
 
 (define_attr "group_overlap_valid" "no,yes"
@@ -543,6 +543,10 @@
  (and (eq_attr "group_overlap" "W0")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) > 1"))
 (const_string "no")
+
+ (and (eq_attr "group_overlap" "th")
+ (match_test "TARGET_XTHEADVECTOR"))
+(const_string "no")
 ]
(const_string "yes")))
 
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index cb30c9ae97c..63d0573d4aa 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -3248,7 +3248,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none,none")])
 
 (define_insn "@pred_msbc"
   [(set (match_operand: 0 "register_operand""=vr, vr, &vr")
@@ -3267,7 +3268,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,th,none")])
 
 (define_insn "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3287,7 +3289,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3307,7 +3310,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_madc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3356,7 +3360,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_madc_extended_scalar"
   [(set (match_operand: 0 "register_operand" "=vr, &vr")
@@ -3377,7 +3382,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_expand "@pred_msbc_scalar"
   [(set (match_operand: 0 "register_operand")
@@ -3426,7 +3432,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "*pred_msbc_extended_scalar"
   [(set (match_operand: 0 "register_operand"  "=vr, &vr")
@@ -3447,7 +3454,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "4")
-   (set (attr "avl_type_idx") (const_int 5))])
+   (set (attr "avl_type_idx") (const_int 5))
+   (set_attr "group_overlap" "th,none")])
 
 (define_insn "@pred_madc_overflow"
   [(set (match_operand: 0 "register_operand" "=vr, &vr, &vr")
@@ -3465,7 +3473,8 @@
   [(set_attr "type" "vicalu")
(set_attr "mode" "")
(set_attr "vl_op_idx" "3")
-   (set (attr "avl_typ

[PATCH v4] RISC-V: Rewrite some instructions using ASM targethook

2024-01-02 Thread Jun Sha (Joshua)

Some xtheadvector instructions differ from RVV1.0 by more than the
"th." prefix. For example, RVV1.0 load/store instructions encode
the SEW in the mnemonic while xtheadvector ones do not; RVV1.0 uses
"o" for indexed-ordered store instructions while xtheadvector does
not; and the vnsrl/vnsra/vfncvt suffixes differ between xtheadvector
and RVV1.0 (vv/vx/vi vs wv/wx/wi).

To address this issue without duplicating patterns, we use the ASM
targethook to rewrite the whole instruction string. We identify the
instructions to rewrite from the corresponding insn attribute.
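
As an illustration of the SEW case, here is a standalone sketch of the string rewrite for unit-stride loads and stores. It is a simplified model, not the code from thead.cc: `rewrite_unit_stride` is a hypothetical helper that decides from the mnemonic alone, whereas the patch selects the instructions by their type attribute and returns an advanced pointer into the template string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Rewrite an RVV 1.0 unit-stride load/store mnemonic such as "vle32.v"
// into the XTheadVector spelling "th.vle.v" by dropping the element
// width.  Mnemonics that do not match are returned unchanged.
std::string rewrite_unit_stride(const std::string& op)
{
  const bool is_load = op.compare(0, 3, "vle") == 0;
  const bool is_store = op.compare(0, 3, "vse") == 0;
  if (!is_load && !is_store)
    return op;

  // Skip the digits of the element width (8, 16, 32 or 64).
  std::size_t pos = 3;
  while (pos < op.size()
         && std::isdigit(static_cast<unsigned char>(op[pos])))
    ++pos;
  if (pos == 3)
    return op;  // No width present (e.g. "vsetvli"), nothing to rewrite.

  return std::string(is_load ? "th.vle" : "th.vse") + op.substr(pos);
}

// Usage example for the model above.
void demo_rewrite_unit_stride()
{
  assert(rewrite_unit_stride("vle32.v") == "th.vle.v");
  assert(rewrite_unit_stride("vse8.v") == "th.vse.v");
}
```

Keeping the rewrite in the output hook means the machine description patterns stay shared between RVV1.0 and xtheadvector, at the cost of string matching on the opcode at output time.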

gcc/ChangeLog:

* config/riscv/thead.cc
(th_asm_output_opcode): Rewrite some instructions.

Co-authored-by: Jin Ma 
Co-authored-by: Xianmiao Qu 
Co-authored-by: Christoph Müllner 
---
 gcc/config/riscv/thead.cc | 215 +-
 1 file changed, 213 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/thead.cc b/gcc/config/riscv/thead.cc
index dc3aed3904d..fb088ebff02 100644
--- a/gcc/config/riscv/thead.cc
+++ b/gcc/config/riscv/thead.cc
@@ -27,6 +27,7 @@
 #include "backend.h"
 #include "tree.h"
 #include "rtl.h"
+#include "insn-attr.h"
 #include "explow.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
@@ -890,8 +891,218 @@ th_asm_output_opcode (FILE *asm_out_file, const char *p)
 {
   /* We need to add th. prefix to all the xtheadvector
  instructions here.*/
-  if (current_output_insn != NULL && p[0] == 'v')
-fputs ("th.", asm_out_file);
+  if (current_output_insn != NULL)
+{
+  if (get_attr_type (current_output_insn) == TYPE_VLDE ||
+ get_attr_type (current_output_insn) == TYPE_VSTE ||
+ get_attr_type (current_output_insn) == TYPE_VLDFF)
+   {
+ if (strstr (p, "e8") || strstr (p, "e16") ||
+ strstr (p, "e32") || strstr (p, "e64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTE
+ ? fputs ("th.vse", asm_out_file)
+ : fputs ("th.vle", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+4;
+ else
+   return p+5;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLDS ||
+ get_attr_type (current_output_insn) == TYPE_VSTS)
+   {
+ if (strstr (p, "vle8") || strstr (p, "vse8") ||
+ strstr (p, "vle16") || strstr (p, "vse16") ||
+ strstr (p, "vle32") || strstr (p, "vse32") ||
+ strstr (p, "vle64") || strstr (p, "vse64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTS
+ ? fputs ("th.vse", asm_out_file)
+ : fputs ("th.vle", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+4;
+ else
+   return p+5;
+   }
+ else if (strstr (p, "vlse8") || strstr (p, "vsse8") ||
+  strstr (p, "vlse16") || strstr (p, "vsse16") ||
+  strstr (p, "vlse32") || strstr (p, "vsse32") ||
+  strstr (p, "vlse64") || strstr (p, "vsse64"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTS
+ ? fputs ("th.vsse", asm_out_file)
+ : fputs ("th.vlse", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+5;
+ else
+   return p+6;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLDUX ||
+ get_attr_type (current_output_insn) == TYPE_VLDOX)
+   {
+ if (strstr (p, "ei"))
+   {
+ fputs ("th.vlxe", asm_out_file);
+ if (strstr (p, "ei8"))
+   return p+7;
+ else
+   return p+8;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VSTUX ||
+ get_attr_type (current_output_insn) == TYPE_VSTOX)
+   {
+ if (strstr (p, "ei"))
+   {
+ get_attr_type (current_output_insn) == TYPE_VSTUX
+   ? fputs ("th.vsuxe", asm_out_file)
+   : fputs ("th.vsxe", asm_out_file);
+ if (strstr (p, "ei8"))
+   return p+7;
+ else
+   return p+8;
+   }
+   }
+
+  if (get_attr_type (current_output_insn) == TYPE_VLSEGDE ||
+ get_attr_type (current_output_insn) == TYPE_VSSEGTE ||
+ get_attr_type (current_output_insn) == TYPE_VLSEGDFF)
+   {
+ get_attr_type (current_output_insn) == TYPE_VSSEGTE
+   ? fputs ("th.vsseg", asm_out_file)
+   : fputs ("th.vlseg", asm_out_file);
+ asm_fprintf (asm_out_file, "%c", p[5]);
+ fputs ("e", asm_out_file);
+ if (strstr (p, "e8"))
+   return p+8;
+ else
+   return p+9;
+   }
+
+  if (get_attr_type (current_outp
