Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, 21 Aug 2023, Juzhe-Zhong wrote:

> This patch exports 'compute_antinout_edge' and 'compute_earliest' at global
> scope so that they can be used in the VSETVL PASS of the RISC-V backend.
> 
> Demand fusion is the fusion of VSETVL information so that we emit a VSETVL
> which dominates, and pre-configures for, most of the RVV instructions, in
> order to elide redundant VSETVLs.
> 
> For example:
> 
> for
>  for
>   for
>     if (cond)
>       VSETVL demand 1: SEW/LMUL = 16 and TU policy
>     else
>       VSETVL demand 2: SEW = 32
> 
> The VSETVL pass should be able to fuse demand 1 and demand 2 into a new
> demand: SEW = 32, LMUL = M2, TU policy.
> Then emit such a VSETVL at the outermost level of the for loops to get the
> best codegen and run-time execution.
> 
> Currently the VSETVL PASS Phase 3 (demand fusion) is really messy and
> unreliable as well as unmaintainable.
> I recently read the dragon book and Morgan's book again, and found that
> "earliest" allows us to do the demand fusion in a very reliable and
> optimal way.
> 
> So, this patch exports these 2 functions, which are very helpful for the
> VSETVL pass.

It would be nice to put these internal functions into a class or a
namespace given their non-LCM names.  I don't see how you are going
to use these intermediate DF functions - they are just necessary
to compute pre_edge_lcm_avs, which I see you already do.  Just to say,
aren't you possibly going to blow up the compile-time complexity of your
VSETVL dataflow problem?

> gcc/ChangeLog:
> 
>   * lcm.cc (compute_antinout_edge): Export as global use.
>   (compute_earliest): Ditto.
>   (compute_rev_insert_delete): Ditto.
>   * lcm.h (compute_antinout_edge): Ditto.
>   (compute_earliest): Ditto.
> 
> ---
>  gcc/lcm.cc | 7 ++-
>  gcc/lcm.h  | 3 +++
>  2 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/lcm.cc b/gcc/lcm.cc
> index 94a3ed43aea..03421e490e4 100644
> --- a/gcc/lcm.cc
> +++ b/gcc/lcm.cc
> @@ -56,9 +56,6 @@ along with GCC; see the file COPYING3.  If not see
>  #include "lcm.h"
>  
>  /* Edge based LCM routines.  */
> -static void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
> -static void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
> -   sbitmap *, sbitmap *, sbitmap *);
>  static void compute_laterin (struct edge_list *, sbitmap *, sbitmap *,
>sbitmap *, sbitmap *);
>  static void compute_insert_delete (struct edge_list *edge_list, sbitmap *,
> @@ -79,7 +76,7 @@ static void compute_rev_insert_delete (struct edge_list *edge_list, sbitmap *,
> This is done based on the flow graph, and not on the pred-succ lists.
> Other than that, its pretty much identical to compute_antinout.  */
>  
> -static void
> +void
>  compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
>  sbitmap *antout)
>  {
> @@ -170,7 +167,7 @@ compute_antinout_edge (sbitmap *antloc, sbitmap *transp, sbitmap *antin,
>  
>  /* Compute the earliest vector for edge based lcm.  */
>  
> -static void
> +void
>  compute_earliest (struct edge_list *edge_list, int n_exprs, sbitmap *antin,
> sbitmap *antout, sbitmap *avout, sbitmap *kill,
> sbitmap *earliest)
> diff --git a/gcc/lcm.h b/gcc/lcm.h
> index e08339352e0..7145d6fc46d 100644
> --- a/gcc/lcm.h
> +++ b/gcc/lcm.h
> @@ -31,4 +31,7 @@ extern struct edge_list *pre_edge_rev_lcm (int, sbitmap *,
>  sbitmap *, sbitmap *,
>  sbitmap *, sbitmap **,
>  sbitmap **);
> +extern void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
> +extern void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *,
> +   sbitmap *, sbitmap *, sbitmap *);
>  #endif /* GCC_LCM_H */
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-21 Thread Li, Pan2 via Gcc-patches
To double-check, do you mean this declaration?

+static CONSTEXPR const widen_freducop vfwredusum_frm_obj;

Pan

From: juzhe.zh...@rivai.ai 
Sent: Monday, August 21, 2023 2:40 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode 
intrinsic API

Why does this patch not have HAS_FRM?


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-17 16:05
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic 
API
From: Pan Li <pan2...@intel.com>

This patch adds the rounding mode API for VFWREDUSUM.VS,
as in the samples below:

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  2 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  1 +
.../riscv/rvv/base/float-point-wredusum.c | 33 +++
4 files changed, 37 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index abf03bab0da..5ee7d3119db 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2548,6 +2548,7 @@ static CONSTEXPR const freducop 
vfredosum_frm_obj;
static CONSTEXPR const reducop vfredmax_obj;
static CONSTEXPR const reducop vfredmin_obj;
static CONSTEXPR const widen_freducop vfwredusum_obj;
+static CONSTEXPR const widen_freducop 
vfwredusum_frm_obj;
static CONSTEXPR const widen_freducop vfwredosum_obj;
static CONSTEXPR const widen_freducop 
vfwredosum_frm_obj;
static CONSTEXPR const vmv vmv_x_obj;
@@ -2810,6 +2811,7 @@ BASE (vfredmin)
BASE (vfwredosum)
BASE (vfwredosum_frm)
BASE (vfwredusum)
+BASE (vfwredusum_frm)
BASE (vmv_x)
BASE (vmv_s)
BASE (vfmv_f)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index c1bb164a712..69d4562091f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -247,6 +247,7 @@ extern const function_base *const vfredmin;
extern const function_base *const vfwredosum;
extern const function_base *const vfwredosum_frm;
extern const function_base *const vfwredusum;
+extern const function_base *const vfwredusum_frm;
extern const function_base *const vmv_x;
extern const function_base *const vmv_s;
extern const function_base *const vfmv_f;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index da1157f5a56..3ce06dc60b7 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -508,6 +508,7 @@ DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, 
wf_vs_ops)
DEF_RVV_FUNCTION (vfwredusum, reduc_alu, no_mu_preds, wf_vs_ops)
DEF_RVV_FUNCTION (vfwredosum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
+DEF_RVV_FUNCTION (vfwredusum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
/* 15. Vector Mask Instructions.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
new file mode 100644
index 000..6c888c10c0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1_rm (vfloat32m1_t op1, vfloat64m1_t op2,
+ size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm (op1, op2, 0, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_rm_m (vbool32_t mask, vfloat32m1_t op1,
+  vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1 (vfloat32m1_t op1, vfloat64m1_t op2,
+   size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1 (op1, op2, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_m (vbool32_t mask, vfloat32m1_t op1,
+   vfloat64m1_t op2, size_t vl) {
+  retur

Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread juzhe.zh...@rivai.ai
Hi, Richi.

This patch is how I use LCM functions:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627953.html 

>> they are just necessary
>> to compute pre_edge_lcm_avs which I see you already do.  

In Phase 4 I use pre_edge_lcm_av to do PRE for the current VSETVL CFG information.
However, that is not enough, since I need Phase 3 to fuse VSETVL information to get
better codegen.

This is how I use the functions:

+  /* Compute global availability.  */
   compute_available (m_vector_manager->vector_comp,
 m_vector_manager->vector_kill,
 m_vector_manager->vector_avout,
 m_vector_manager->vector_avin);
-  changed_p |= cleanup_illegal_dirty_blocks ();
+  /* Compute global anticipatability.  */
+  compute_antinout_edge (m_vector_manager->vector_antic,
+m_vector_manager->vector_transp,
+m_vector_manager->vector_antin,
+m_vector_manager->vector_antout);
+  /* Compute earliestness.  */
+  compute_earliest (m_vector_manager->vector_edge_list,
+   m_vector_manager->vector_exprs.length (),
+   m_vector_manager->vector_antin,
+   m_vector_manager->vector_antout,
+   m_vector_manager->vector_avout,
+   m_vector_manager->vector_kill,
+   m_vector_manager->vector_earliest);
+  changed_p |= earliest_fusion ();
You can see I explicitly call 'compute_earliest' followed by 'earliest_fusion'.
I need the result from 'compute_earliest' to do the VSETVL fusion; that is the
information 'pre_edge_lcm_av' does not give us.
>> Just to say
>> you are possibly going to blow up compile-time complexity of your
>> VSETVL dataflow problem?

No, I export 'compute_earliest' as global because 'pre_edge_lcm_av' does not give
us the 'earliest' result that I need.
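
For readers unfamiliar with the "earliest" set, here is a minimal sketch of the
classic edge-based formula, EARLIEST(i,j) = ANTIN(j) & ~AVOUT(i) & (KILL(i) |
~ANTOUT(i)), on a toy CFG. It is an illustration only and not the lcm.cc
code: plain integer bitmasks stand in for sbitmaps, the names BlockSets, Edge
and compute_earliest_toy are made up, and the special handling of entry and
exit edges is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: each bit of a Bits word stands for one expression (here, one
// VSETVL demand).  Illustrative names only; this is not the GCC API.
using Bits = std::uint32_t;

struct BlockSets {
  Bits antin;   // anticipatable at block entry
  Bits antout;  // anticipatable at block exit
  Bits avout;   // available at block exit
  Bits kill;    // expressions invalidated by the block
};

struct Edge {
  std::size_t pred;  // index of the predecessor block
  std::size_t succ;  // index of the successor block
};

// EARLIEST on one ordinary edge pred -> succ.
inline Bits earliest_on_edge (const BlockSets &pred, const BlockSets &succ)
{
  return succ.antin & ~pred.avout & (pred.kill | ~pred.antout);
}

// One EARLIEST word per edge, in the order the edges are given.
std::vector<Bits> compute_earliest_toy (const std::vector<BlockSets> &blocks,
                                        const std::vector<Edge> &edges)
{
  std::vector<Bits> earliest;
  earliest.reserve (edges.size ());
  for (const Edge &e : edges)
    {
      assert (e.pred < blocks.size () && e.succ < blocks.size ());
      earliest.push_back (earliest_on_edge (blocks[e.pred], blocks[e.succ]));
    }
  return earliest;
}
```

A bit that is set on an edge marks the earliest point at which the
corresponding demand can be placed; a bit that is clear because the demand is
still anticipatable at the predecessor's exit means it can move further up.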

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-21 15:09
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS 
use in RISC-V backend

[PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-21 Thread Pan Li via Gcc-patches
From: Pan Li 

We already have an EMIT hook in mode switching, which inserts the
insn before in most cases. However, some architectures such as RISC-V
require an additional insn to be inserted after when a call is met.

   |
   | <- EMIT HOOK, insert the insn before.
 +-----------+
 | ptr->insn |
 +-----------+
   | <- EMIT_AFTER HOOK, insert the insn after.
   |

Thus, this patch adds an optional EMIT_AFTER hook, which
will try to insert the emitted insn after. The end-user can either
implement this hook or leave it NULL as is.

If the backend ignores this optional hook, there is no impact on the
original mode switching behavior. If the backend implements this optional
hook, mode switching will try to insert the insn after. Please note that
EMIT_AFTER doesn't have any impact on the EMIT hook.
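
The before/after behavior described above can be sketched with a toy model.
This is only an illustration under assumptions: Insn, InsnList,
ModeSwitchHooks and switch_mode_at are hypothetical stand-ins rather than GCC
types, and the hooks return a string instead of emitting RTL.

```cpp
#include <cassert>
#include <functional>
#include <iterator>
#include <list>
#include <string>

// Toy stand-ins for illustration only; none of these are GCC types or hooks.
using Insn = std::string;
using InsnList = std::list<Insn>;

struct ModeSwitchHooks {
  // Mandatory: builds the mode-set insn placed before the anchor insn.
  std::function<Insn (int entity, int mode)> emit;
  // Optional: left empty (the analogue of NULL) when the backend opts out.
  std::function<Insn (int entity, int mode)> emit_after;
};

// Insert the mode switch around *anchor, which must be dereferenceable.
void switch_mode_at (InsnList &insns, InsnList::iterator anchor,
                     const ModeSwitchHooks &hooks, int entity, int mode)
{
  assert (hooks.emit);  // the EMIT analogue is required
  insns.insert (anchor, hooks.emit (entity, mode));
  // Only touch the position after the anchor when the optional hook is set.
  if (hooks.emit_after)
    insns.insert (std::next (anchor), hooks.emit_after (entity, mode));
}
```

With emit_after left empty the list only gains the insn in front of the
anchor, which corresponds to the "no impact" case for backends that do not
implement the hook.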

Passed both regression and bootstrap tests on x86.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* doc/tm.texi: Add hook def and update the description.
* doc/tm.texi.in: Ditto.
* mode-switching.cc (optimize_mode_switching): Insert the
emitted insn after ptr->insn.
* target.def (insn): Define emit_after hook.
---
 gcc/doc/tm.texi   | 12 ++--
 gcc/doc/tm.texi.in|  6 --
 gcc/mode-switching.cc | 45 +++
 gcc/target.def|  9 +
 4 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index e4d0cc43f41..a0798d4468b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10334,8 +10334,8 @@ return nonzero for any @var{entity} that needs 
mode-switching.
 If you define this macro, you also have to define
 @code{NUM_MODES_FOR_MODE_SWITCHING}, @code{TARGET_MODE_NEEDED},
 @code{TARGET_MODE_PRIORITY} and @code{TARGET_MODE_EMIT}.
-@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, and @code{TARGET_MODE_EXIT}
-are optional.
+@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, 
@code{TARGET_MODE_EMIT_AFTER},
+and @code{TARGET_MODE_EXIT} are optional.
 @end defmac
 
 @defmac NUM_MODES_FOR_MODE_SWITCHING
@@ -10359,6 +10359,14 @@ to switch from. Sets of a lower numbered entity will 
be emitted before
 sets of a higher numbered entity to a mode of the same or lower priority.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_MODE_EMIT_AFTER (int @var{entity}, int 
@var{mode}, int @var{prev_mode}, HARD_REG_SET @var{regs_live})
+Generate one or more insns to set @var{entity} to @var{mode}.
+@var{hard_reg_live} is the set of hard registers live at the point where
+the insn(s) are to be inserted after. @var{prev_mode} indicates the mode
+to switch from. Sets of a lower numbered entity will be emitted before
+sets of a higher numbered entity to a mode of the same or lower priority.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_MODE_NEEDED (int @var{entity}, rtx_insn 
*@var{insn})
 @var{entity} is an integer specifying a mode-switched entity.
 If @code{OPTIMIZE_MODE_SWITCHING} is defined, you must define this macro
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..2942ce0be3b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -6911,8 +6911,8 @@ return nonzero for any @var{entity} that needs 
mode-switching.
 If you define this macro, you also have to define
 @code{NUM_MODES_FOR_MODE_SWITCHING}, @code{TARGET_MODE_NEEDED},
 @code{TARGET_MODE_PRIORITY} and @code{TARGET_MODE_EMIT}.
-@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, and @code{TARGET_MODE_EXIT}
-are optional.
+@code{TARGET_MODE_AFTER}, @code{TARGET_MODE_ENTRY}, 
@code{TARGET_MODE_EMIT_AFTER},
+and @code{TARGET_MODE_EXIT} are optional.
 @end defmac
 
 @defmac NUM_MODES_FOR_MODE_SWITCHING
@@ -6930,6 +6930,8 @@ switch is needed / supplied.
 
 @hook TARGET_MODE_EMIT
 
+@hook TARGET_MODE_EMIT_AFTER
+
 @hook TARGET_MODE_NEEDED
 
 @hook TARGET_MODE_AFTER
diff --git a/gcc/mode-switching.cc b/gcc/mode-switching.cc
index f483c831c35..98051dff487 100644
--- a/gcc/mode-switching.cc
+++ b/gcc/mode-switching.cc
@@ -34,6 +34,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "lcm.h"
 #include "cfgcleanup.h"
 #include "tree-pass.h"
+#include "cfgbuild.h"
+#include "gcse.h"
 
 /* We want target macros for the mode switching code to be able to refer
to instruction attribute values.  */
@@ -831,6 +833,49 @@ optimize_mode_switching (void)
emit_insn_before (mode_set, ptr->insn_ptr);
}
 
+ if (targetm.mode_switching.emit_after)
+   {
+ if (control_flow_insn_p (ptr->insn_ptr)
+   && ptr->insn_ptr == BB_END (bb))
+   {
+ edge eg;
+ edge_iterator eg_iterator;
+
+ FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
+   {
+ start_sequence ();
+ targetm.mode_switching.emit_after (entity_map[j],
+   

Re: [PATCH] Testsuite, LTO: silence warning to make test pass on Darwin

2023-08-21 Thread Richard Biener via Gcc-patches
On Sun, Aug 20, 2023 at 12:24 PM FX Coudert via Gcc-patches
 wrote:
>
> Hi,
>
> On darwin (both x86_64-apple-darwin and aarch64-apple-darwin) we see the 
> following test failure:
>
> FAIL: gcc.dg/lto/20091013-1 c_lto_20091013-1_2.o assemble, -fPIC -r -nostdlib 
> -O2 -flto
>
> which is due to this extra warning:
>
> In function 'fontcmp',
> inlined from 'find_in_cache' at 
> /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:140:13,
> inlined from 'WineEngCreateFontInstance' at 
> /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:160:15:
> /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:107:8: warning: 
> 'memcmp' specified bound 4 exceeds source size 0 [-Wstringop-overread]
> /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c: In function 
> 'WineEngCreateFontInstance':
> /tmp/gcc-darwin-arm64/gcc/testsuite/gcc.dg/lto/20091013-1_2.c:66:20: note: 
> source object allocated here
>
> Now, the main file for the test has:
>
> /* { dg-extra-ld-options "-flinker-output=nolto-rel -Wno-stringop-overread" } 
> */
>
> and I believe the intent of -Wno-stringop-overread is to silence this 
> warning, but that only applies to the linker, and the warning on darwin is 
> produced by the compiler (in addition to the linker). Adding the flag to the 
> compilation of the source file makes the test pass on darwin.

In the end this is because Darwin uses -ffat-lto-objects and does not use
the linker plugin(?)

> OK to commit?

OK.

Thanks,
Richard.

> FX
>
>


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches
 wrote:
>
> On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > With the proposed design of these switches, how would I restrict AVX10.1
> > to particular AVX-512 subsets?
> We can't, avx10.1 is taken as an indivisible ISA which contains all
> AVX512 related instructions.
>
> > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > allowed, so in some cases it might prove difficult to guarantee this).
> Intel SDE supports the avx10.1-256 target, which can be used to validate the
> binary (i.e. whether an invalid 512-bit vector register or 64-bit kmask
> register is used).
> > I don’t see any other way of doing what you want within the constraints of 
> > this design.
> It looks like the requirement is that we want a
> -mavx10-vector-width=256 (or maybe reuse -mprefer-vector-width=256)
> option that acts on the original -mavx512XXX options to produce an
> avx10.1-256 compatible binary. We can't use -mavx10.1-256 since it may
> include avx512fp16 directives and thus not be backward compatible with
> SKX/CLX/ICX.

Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
flag that would restrict AVX512VL to 256bit, possibly using a common internal
flag for this and the -mavx10.1-256 vector size effect.

Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
-mavx512vl-256?  Having written these out, the last looks most sensible to me.
Note it should combine with -mavx512vl to -mavx512vl-256 to make
-march=native -mavx512vl-256 work (I think we should also allow the
flag together with -mavx10.1*?)

mavx512vl-256
Target ...
Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
the 256bit vector ISA subset of AVX512.

Richard.

> >
> > For example, usage of the |_mm256_rol_epi32| intrinsic should be
> > compatible on any AVX10/256 implementation, /as well as /any AVX-512VL
> > without AVX10 implementation (e.g. Skylake-X).  But how do I signal that
> > I want compatibility with both these targets?
> >
> >   * |-mavx512vl| lets the compiler use 512-bit registers -> incompatible
> > with 256-bit AVX10.
> >   * |-mavx512vl -mprefer-vector-width=256| might steer the compiler away
> > from 512-bit registers, but I don't think it guarantees it.
> >   * |-mavx10.1-256| lets the compiler use all Sapphire Rapids AVX-512
> > features at 256-bit wide (so in theory, it could choose to compile
> > it with |vpshldd|) -> incompatible with Skylake-X.
> >   * |-mavx10.1-256 -mno-avx512fp16 -mno-avx512...| will emit a warning
> > and ignore the attempts at disabling AVX-512 subsets.
> >   * |-mavx10.1-256 -mavx512vl| takes the /union/ of the features, not
> > the /intersection./
> >
> > Is there something like |-mavx512vl -mmax-vector-width=256|, or am I
> > misunderstanding the situation?
> >
> > Thanks!
>
>
>
> --
> BR,
> Hongtao


Re: [PATCH] MATCH: [PR111002] Sink view_convert for vec_cond

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 5:22 AM Andrew Pinski via Gcc-patches
 wrote:
>
> Like convert, we can sink view_convert into vec_cond, but
> we can only do it if the element type conversions are nop_conversions.
> This is to allow conversion between signed and unsigned types only,
> rather than between integer and float types, which would mess up the vec_cond
> so that isel no longer understands that `a?-1:0` is still that.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.

OK.
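
For context, a scalar sketch of what the approved pattern relies on. It is
not GCC code: view_convert and vec_cond below are hypothetical lane-wise
models of VIEW_CONVERT_EXPR and VEC_COND_EXPR. The point it illustrates is
that a signed-to-unsigned reinterpretation commutes with the select and keeps
the `-1 : 0` arms as all-ones/zero lanes, while reinterpreting as float does
not leave a recognizable -1; that is presumably one reason for the
nop_conversion restriction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reinterpret the bits of each lane, like VIEW_CONVERT_EXPR on a vector.
template <typename To, typename From>
std::vector<To> view_convert (const std::vector<From> &v)
{
  static_assert (sizeof (To) == sizeof (From), "lane sizes must match");
  std::vector<To> out (v.size ());
  if (!v.empty ())
    std::memcpy (out.data (), v.data (), v.size () * sizeof (To));
  return out;
}

// Lane-wise select, a scalar model of VEC_COND_EXPR.
template <typename T>
std::vector<T> vec_cond (const std::vector<bool> &mask,
                         const std::vector<T> &a, const std::vector<T> &b)
{
  assert (mask.size () == a.size () && a.size () == b.size ());
  std::vector<T> out (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    out[i] = mask[i] ? a[i] : b[i];
  return out;
}
```

Converting after the select and selecting between converted operands give
the same lanes for int32_t to uint32_t, which is the shape
`(vec_cond @0 (view_convert! @1) (view_convert! @2))` produces.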

> PR tree-optimization/111002
>
> gcc/ChangeLog:
>
> * match.pd (view_convert(vec_cond(a,b,c))): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/sve/cond_convert_8.c: New test.
> ---
>  gcc/match.pd  |  9 
>  .../gcc.target/aarch64/sve/cond_convert_8.c   | 22 +++
>  2 files changed, 31 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 851f1af6eac..81666f28465 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4718,6 +4718,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& types_match (TREE_TYPE (@0), truth_type_for (type)))
>(vec_cond @0 (convert! @1) (convert! @2
>
> +/* Likewise for view_convert of nop_conversions. */
> +(simplify
> + (view_convert (vec_cond:s @0 @1 @2))
> + (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@1))
> +  && known_eq (TYPE_VECTOR_SUBPARTS (type),
> +  TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
> +  && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
> (@1
> +  (vec_cond @0 (view_convert! @1) (view_convert! @2
> +
>  /* Sink binary operation to branches, but only if we can fold it.  */
>  (for op (tcc_comparison plus minus mult bit_and bit_ior bit_xor
>  lshift rshift rdiv trunc_div ceil_div floor_div round_div
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
> new file mode 100644
> index 000..d8b96e5fcfb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/cond_convert_8.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize -moverride=sve_width=256 
> -fdump-tree-optimized" } */
> +/* PR tree-optimization/111002 */
> +
> +/* We should be able to remove the neg. */
> +
> +void __attribute__ ((noipa))
> +f (int *__restrict r,
> +   int *__restrict a,
> +   short *__restrict pred)
> +{
> +  for (int i = 0; i < 1024; ++i)
> +r[i] = pred[i] != 0 ? -1 : 0;
> +}
> +
> +
> +/* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.h, p[0-7]+/z, #-1} 1 } 
> } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.[hs], p[0-7]+/z, #1} } } 
> */
> +
> +/* { dg-final { scan-tree-dump-not "VIEW_CONVERT_EXPR " "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " = -" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not " = \\\(vector" "optimized" } } */
> --
> 2.31.1
>


Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread juzhe.zh...@rivai.ai
Hi, Richi.
I'd like to share more details about what I want to do in the VSETVL PASS.

Consider the following case:

for
  for
    for
      ...
        for
          VSETVL demand: RATIO = 32 and TU policy.

For this simple case, 'pre_edge_lcm_av' works perfectly for us: it will hoist
"vsetvli e32,tu" to the outermost loop.

However, for this case:
for
  for
    for
      ...
        for
          if (...)
            VSETVL 1 demand: RATIO = 32 and TU policy.
          else if (...)
            VSETVL 2 demand: SEW = 16.
          else
            VSETVL 3 demand: MU policy.

'pre_edge_lcm_av' is not sufficient to give us optimal codegen, since VSETVL 1,
VSETVL 2 and VSETVL 3 are 3 different VSETVL demands and 'pre_edge_lcm_av' can
only hoist one of them. I can easily produce such a case with RVV intrinsics,
and such cases are already in our RVV testsuite.

To get optimal codegen for this case, we need what I call "demand fusion":
fusing all "compatible" VSETVLs into a single VSETVL and emitting that one, to
avoid redundant VSETVLs.

In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 into a
single new VSETVL demand: SEW = 16, LMUL = MF2, TU, MU.
Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands (VSETVL 1/2/3), I give it
only the single new VSETVL demand.
Then LCM PRE can hoist the fused VSETVL to the outermost loop, so the
program will be transformed into:

VSETVL SEW = 16, LMUL = MF2, TU, MU
for
  for
    for
      ...
        for
          if (...)
            no vsetvl insn.
          else if (...)
            no vsetvl insn.
          else
            no vsetvl insn.
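
As a rough illustration of the fusion step on the three demands above, here
is a sketch under simplifying assumptions. It is not the riscv-vsetvl.cc
implementation: the Demand struct and the fuse, merge_field and
lmul_in_eighths helpers are hypothetical, and the real compatibility rules
are more involved than "unconstrained or equal".

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified model of a VSETVL demand; not the GCC data
// structure.  An empty optional means "this demand does not constrain the
// field".
struct Demand {
  std::optional<int> sew;    // element width in bits
  std::optional<int> ratio;  // SEW / LMUL
  std::optional<bool> tu;    // true: tail-undisturbed policy demanded
  std::optional<bool> mu;    // true: mask-undisturbed policy demanded
};

// Merge one field; fails only when both sides constrain it differently.
template <typename T>
bool merge_field (std::optional<T> &dst, const std::optional<T> &src)
{
  if (!src)
    return true;
  if (!dst)
    {
      dst = src;
      return true;
    }
  return *dst == *src;
}

// Fuse two demands, or return nullopt when they are incompatible.
std::optional<Demand> fuse (Demand a, const Demand &b)
{
  if (merge_field (a.sew, b.sew) && merge_field (a.ratio, b.ratio)
      && merge_field (a.tu, b.tu) && merge_field (a.mu, b.mu))
    return a;
  return std::nullopt;
}

// LMUL in eighths (MF8 = 1, MF2 = 4, M1 = 8, M2 = 16), from
// LMUL = SEW / RATIO.  Requires both fields to be constrained.
int lmul_in_eighths (const Demand &d)
{
  assert (d.sew && d.ratio);
  return 8 * *d.sew / *d.ratio;
}
```

Fusing "RATIO = 32, TU", "SEW = 16" and "MU" in this model yields SEW = 16,
RATIO = 32, TU, MU, and LMUL = 16 / 32 = 1/2, i.e. MF2, matching the fused
demand in the text.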

So, how do we do the demand fusion in this case?
Before this patch and the following RISC-V refactoring patch, I did it explicitly
with my own decision algorithm, meaning I calculated which location in the
program is correct and optimal for the VSETVL fusion.

However, I found that "compute_earliest" can do the job of calculating the
location in the program at which to do the VSETVL fusion, and it turns out to be
a much more reliable and reasonable approach than mine.

That's why I export those 2 functions for use in Phase 3 (demand fusion) of the
RISC-V backend VSETVL PASS.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-21 15:09
To: Juzhe-Zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS 
use in RISC-V backend

Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-21 Thread Richard Biener via Gcc-patches
On Fri, Aug 18, 2023 at 7:13 PM Bradley Lucier  wrote:
>
> On 8/17/23 3:54 AM, Richard Biener wrote:
> > I think it needs a new category, 'inline' is probably the "closest" 
> > existing one
> > but that also tends to be noisy.  Maybe 'call' would be a good name?  We 
> > could
> > report things like tail-recursion optimization, tail-calling and sibling 
> > calling
> > optimizations there, possibly also return/argument copy elision.
>
> OK, thanks.
>
> I have two questions:
>
> 1.  Is the information dumped by -fopt-info intended for compiler
> developers, to see something of the internal logic of gcc, or for end users?

-fopt-info is for the user.  Note that when you dump -all instead of
-missed or -optimized
you will get a lot of extra and possibly less useful things.  Basically
-fopt-info splices out info from passes that is useful to users, plus
things that are mostly interesting to compiler developers (that's -all).

> 2.  You say that "'inline' ... tends to be noisy".  Most of the output I
> see from -fopt-info-missed is basically
>
> _io.c:103829:4: missed:   not inlinable: ___H___io/396 ->
> __builtin_expect/2486, function body not available
>
> Is ___builtin_expect truly a function whose body is not available, or
> should -fopt-info-missed not report these instances?

Yeah, that looks like a mis-classification to me.  It's using
the assigned CIF code (cif-code.def), and yes, we don't
have a function body for __builtin_expect but that's not the
reason we should present to users.  Instead of
MSG_MISSED_OPTIMIZATION for this class of callees
we should use MSG_NOTE.  At least to me a missed
optimization should point out places we would eventually
expect to be optimized either by providing enough of
the constraints required in the source or by fixing a missed
optimization capability in the compiler.

Richard.

> Brad


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread ZiNgA BuRgA via Gcc-patches

Thanks for the responses!

It'd be unfortunate if AVX10 adoption is desired, yet there's no way to 
compile existing 256-bit code to be compatible with it.

Relying on SDE to check the output isn't a particularly viable solution.

It looks like `-mavx512vl -mprefer-vector-width=256` is my best bet 
under this design, and hope it works.  Fortunately, I'm not relying on 
3rd party code here, so I control all intrinsics used.


Something like a `-mmax-vector-width=256` option sounds preferable 
though, particularly for those using 3rd party code which checks the 
`__AVX512VL__` define, and assumes 512-bit vectors are available.
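
To make the concern concrete, here is a minimal sketch of the kind of third-party check described above. The helper name `assumed_vector_bits` is hypothetical and not taken from any real library; only the predefined macros `__AVX512VL__` and `__AVX2__` are real.

```cpp
#include <cstddef>

// Hypothetical third-party helper (name invented for illustration):
// it picks a vector width purely from predefined feature macros.
constexpr std::size_t assumed_vector_bits ()
{
#if defined (__AVX512VL__)
  // The problematic assumption: seeing AVX512VL, the code takes
  // 512-bit registers for granted.
  return 512;
#elif defined (__AVX2__)
  return 256;
#else
  return 128;
#endif
}
```

On a 256-bit-only target that still defines `__AVX512VL__`, code like this would select 512-bit paths unless some option both caps the width and is visible to such checks.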


Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, 21 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi. Richi.
> I'd like to share more details that I want to do in VSETVL PASS.
> 
> Consider this following case:
> 
> for
>   for 
> for
>   ...
>  for
>VSETVL demand: RATIO = 32 and TU policy.
> 
> For this simple case, 'pre_edge_lcm_av' can perfectly work for us, will hoist 
> "vsetvli e32,tu" to the outer-most loop.
> 
> However, for this case:
>   for
>   for 
> for
>   ...
>  for
>  if (...)
>VSETVL 1 demand: RATIO = 32 and TU policy.
>  else if (...)
>VSETVL 2 demand: SEW = 16.
>  else
>VSETVL 3 demand: MU policy.
> 
> 'pre_edge_lcm_av' is not sufficient to give us optimal codegen since VSETVL 
> 1,  VSETVL 2 and VSETVL 3 are 3 different VSETVL demands
> 'pre_edge_lcm_av' can only hoist one of them. Such case I can easily produce 
> by RVV intrinsic and they are already in our RVV testsuite.
> 
> To get the optimal codegen for this case,  We need I call it as "Demand 
> fusion" which is fusing all "compatible" VSETVLs into a single VSETVL
> then set them to avoid redundant VSETVLs.
> 
> In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 into 
> new VSETVL demand : SEW = 16, LMUL = MF2, TU, MU into a single 
> new VSETVL demand. Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands 
> (VSETVL 1/2/3). I give 'pre_edge_lcm_av' only single 1 new VSETVL demand.
> Then, LCM PRE can hoist such fused VSETVL to the outer-most loop. So the 
> program will be transformed as:
> 
> VSETVL SEW = 16, LMUL = MF2, TU, MU
>   for
>   for 
> for
>   ...
>  for
>  if (...) 
>.   no vsetvl insn.
>  else if (...)
>  no vsetvl insn.
> 
>  else
>  no vsetvl insn.
> 
> So, how to do the demand fusion in this case? 
> Before this patch and following RISC-V refactor patch, I do it explictly with 
> my own decide algorithm.
> Meaning I calculate which location of the program to do the VSETVL fusion is 
> correct and optimal.
> 
> However, I found "compute_earliest" can help us to do the job for calculating 
> the location of the program to do VSETVL fusion and
> turns out it's a quite more reliable and reasonable approach than I do.
> 
> So that's why I export those 2 functions for us to be use in Phase 3 (Demand 
> fusion) in RISC-V backend VSETVL PASS.

Thanks for the explanation, exporting the functions is OK.

Richard.
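
As a rough illustration of the demand fusion discussed in this message, the following sketch fuses the three example demands (RATIO = 32 with TU, SEW = 16, MU) into one. The `Demand` type, the `fuse` rule and the helper names are assumptions made for this example; the compatibility rules in the actual RISC-V VSETVL pass are more involved. It assumes C++17.

```cpp
#include <numeric>
#include <optional>

// Hypothetical model of a VSETVL demand; an unset field means "don't care".
struct Demand
{
  std::optional<int> sew;    // element width in bits
  std::optional<int> ratio;  // SEW / LMUL
  bool tu = false;           // tail-undisturbed policy demanded
  bool mu = false;           // mask-undisturbed policy demanded
};

// LMUL as a reduced fraction num/den, e.g. 1/2 for MF2 and 2/1 for M2.
struct Lmul { int num; int den; };

// Fuse two demands; returns nullopt when they conflict (assumed rule:
// fields conflict only if both are set and differ).
std::optional<Demand> fuse (const Demand &a, const Demand &b)
{
  if (a.sew && b.sew && *a.sew != *b.sew)
    return std::nullopt;
  if (a.ratio && b.ratio && *a.ratio != *b.ratio)
    return std::nullopt;
  Demand r;
  r.sew = a.sew ? a.sew : b.sew;
  r.ratio = a.ratio ? a.ratio : b.ratio;
  r.tu = a.tu || b.tu;
  r.mu = a.mu || b.mu;
  return r;
}

// RATIO = SEW / LMUL, hence LMUL = SEW / RATIO.
Lmul lmul_of (int sew, int ratio)
{
  int g = std::gcd (sew, ratio);
  return {sew / g, ratio / g};
}

// The three demands from the example in this thread, fused pairwise.
std::optional<Demand> fuse_thread_example ()
{
  Demand d1; d1.ratio = 32; d1.tu = true;  // VSETVL 1
  Demand d2; d2.sew = 16;                  // VSETVL 2
  Demand d3; d3.mu = true;                 // VSETVL 3
  std::optional<Demand> f12 = fuse (d1, d2);
  if (!f12)
    return std::nullopt;
  return fuse (*f12, d3);
}
```

With SEW = 16 and RATIO = 32, `lmul_of` yields 1/2, which corresponds to the LMUL = MF2 of the fused demand in the message.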

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-21 15:09
> To: Juzhe-Zhong
> CC: gcc-patches; jeffreyalaw
> Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
> PASS use in RISC-V backend
> On Mon, 21 Aug 2023, Juzhe-Zhong wrote:
>  
> > This patch exports 'compute_antinout_edge' and 'compute_earliest' as global 
> > scope
> > which is going to be used in VSETVL PASS of RISC-V backend.
> > 
> > The demand fusion is the fusion of VSETVL information to emit VSETVL which 
> > dominate and pre-config for most
> > of the RVV instructions in order to elide redundant VSETVLs.
> > 
> > For exmaple:
> > 
> > for
> >  for
> >   for
> > if (cond}
> >   VSETVL demand 1: SEW/LMUL = 16 and TU policy
> > else
> >   VSETVL demand 2: SEW = 32
> > 
> > VSETVL pass should be able to fuse demand 1 and demand 2 into new demand: 
> > SEW = 32, LMUL = M2, TU policy.
> > Then emit such VSETVL at the outmost of the for loop to get the most 
> > optimal codegen and run-time execution.
> > 
> > Currenty the VSETVL PASS Phase 3 (demand fusion) is really messy and 
> > un-reliable as well as un-maintainable.
> > And, I recently read dragon book and morgan's book again, I found there 
> > "earliest" can allow us to do the
> > demand fusion in a very reliable and optimal way.
> > 
> > So, this patch exports these 2 functions which are very helpful for VSETVL 
> > pass.
>  
> It would be nice to put these internal functions into a class or a
> namespace given their non LCM name.  I don't see how you are going
> to use these intermediate DF functions - they are just necessary
> to compute pre_edge_lcm_avs which I see you already do.  Just to say
> you are possibly going to blow up compile-time complexity of your
> VSETVL dataflow problem?
>  
> > gcc/ChangeLog:
> > 
> > * lcm.cc (compute_antinout_edge): Export as global use.
> > (compute_earliest): Ditto.
> > (compute_rev_insert_delete): Ditto.
> > * lcm.h (compute_antinout_edge): Ditto.
> > (compute_earliest): Ditto.
> > 
> > ---
> >  gcc/lcm.cc | 7 ++-
> >  gcc/lcm.h  | 3 +++
> >  2 files changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/gcc/lcm.cc b/gcc/lcm.cc
> > index 94a3ed43aea..03421e490e4 100644
> > --- a/gcc/lcm.cc
> > +++ b/gcc/lcm.cc
> > @@ -56,9 +56,6 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "lcm.h"
> >  
> >  /* Edge based LCM routines.  */
> > -static void compute_antinout_edge (sbitmap *, sbitmap *, 

Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread juzhe.zh...@rivai.ai
>> Thanks for the explanation, exporting the functions is OK.
Thanks.

>> It would be nice to put these internal functions into a class or a
>> namespace given their non LCM name.
Hi, Richi. I saw there is a function "extern void compute_available (sbitmap *, 
sbitmap *, sbitmap *, sbitmap *);"
which is already exported as global.

Do you mean I should add those 2 functions (the ones this patch exports) and 
"compute_available", which has already been exported,
into namespace lcm like this:

namespace lcm
{
  compute_available
  compute_antinout_edge
  compute_earliest
}
?

Thanks.
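
For readers following along, the per-edge "earliest" set that compute_earliest produces can be sketched as below. This is a simplified stand-alone model written from my reading of the edge-based LCM formulation, not code copied from lcm.cc: it uses a 64-bit mask in place of sbitmap, and the `BlockSets`, `EdgeKind` and `earliest_on_edge` names are invented for the example.

```cpp
#include <cstdint>

using Bits = std::uint64_t;  // one bit per expression; stand-in for sbitmap

// Per-block dataflow sets (hypothetical container for this sketch).
struct BlockSets
{
  Bits antin;   // anticipatable at block entry
  Bits antout;  // anticipatable at block exit
  Bits avout;   // available at block exit
  Bits kill;    // killed inside the block
};

enum class EdgeKind { FromEntry, ToExit, Normal };

// Earliest placement set for the edge pred -> succ.
Bits earliest_on_edge (EdgeKind kind, const BlockSets &pred,
                       const BlockSets &succ)
{
  switch (kind)
    {
    case EdgeKind::FromEntry:
      // Nothing is available before the entry block.
      return succ.antin;
    case EdgeKind::ToExit:
      // Nothing needs to be placed on an edge into the exit block.
      return 0;
    case EdgeKind::Normal:
      break;
    }
  // Anticipated in succ but not already available out of pred ...
  Bits difference = succ.antin & ~pred.avout;
  // ... and either killed in pred or not anticipatable at its end,
  // so the computation cannot be moved any earlier than this edge.
  return difference & (pred.kill | ~pred.antout);
}
```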


juzhe.zh...@rivai.ai
 

[PATCH] tree-optimization/111070 - fix ICE with recent ifcombine fix

2023-08-21 Thread Richard Biener via Gcc-patches
We now got test coverage for non-SSA name bits so the following amends
the SSA_NAME_OCCURS_IN_ABNORMAL_PHI checks.
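
The shape of the fix can be modelled outside GCC with a small sketch. `Node`, `TreeCode` and the two functions below are invented stand-ins, not GCC's tree API; the only point is that the kind check must precede an accessor that is valid for one node kind only.

```cpp
#include <cassert>

enum class TreeCode { SsaName, IntegerCst };

struct Node
{
  TreeCode code;
  bool abnormal_phi_flag;  // only meaningful for SSA names
};

// Stand-in for an accessor that is valid only on SSA names; the
// assertion plays the role of a checking failure on other node kinds.
bool occurs_in_abnormal_phi (const Node &n)
{
  assert (n.code == TreeCode::SsaName);
  return n.abnormal_phi_flag;
}

// Guarded form: test the node kind first, so the accessor is never
// reached for non-SSA operands.
bool blocks_combining (const Node &n)
{
  return n.code == TreeCode::SsaName && occurs_in_abnormal_phi (n);
}
```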

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR tree-optimization/111070
* tree-ssa-ifcombine.cc (ifcombine_ifandif): Check we have
an SSA name before checking SSA_NAME_OCCURS_IN_ABNORMAL_PHI.

* gcc.dg/pr111070.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr111070.c | 20 
 gcc/tree-ssa-ifcombine.cc   |  9 ++---
 2 files changed, 26 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr111070.c

diff --git a/gcc/testsuite/gcc.dg/pr111070.c b/gcc/testsuite/gcc.dg/pr111070.c
new file mode 100644
index 000..1ebc7adf782
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111070.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+/* common */
+char c;
+/* arrays must be 8 byte aligned, regardless of size */
+char c_ary[1];
+
+/* data */
+char d = 1;
+char d_ary[1] = {1};
+
+int main ()
+{
+  if (((unsigned long)&c_ary[0] & 7) != 0)
+return 1;
+  if (((unsigned long)&d_ary[0] & 7) != 0)
+return 1;
+  return 0;
+}
diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
index d5701e8c407..46b076804f4 100644
--- a/gcc/tree-ssa-ifcombine.cc
+++ b/gcc/tree-ssa-ifcombine.cc
@@ -430,7 +430,8 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool 
inner_inv,
 {
   tree t, t2;
 
-  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name1))
+  if (TREE_CODE (name1) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name1))
return false;
 
   /* Do it.  */
@@ -489,8 +490,10 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool 
inner_inv,
   gimple_stmt_iterator gsi;
   tree t;
 
-  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name1)
- || SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name2))
+  if ((TREE_CODE (name1) == SSA_NAME
+  && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name1))
+ || (TREE_CODE (name2) == SSA_NAME
+ && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (name2)))
return false;
 
   /* Find the common name which is bit-tested.  */
-- 
2.35.3


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches wrote:
> > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > With the proposed design of these switches, how would I restrict AVX10.1
> > > to particular AVX-512 subsets?
> > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > AVX512 related instructions.
> >
> > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > allowed, so in some cases it might prove difficult to guarantee this).
> > intel sde support avx10.1-256 target which can be used to validate the
> > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > register is used).
> > > I don’t see any other way of doing what you want within the constraints 
> > > of this design.
> > It looks like the requirement is that we want a
> > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > option that acts on the original -mavx512XXX option to produce
> > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > include avx512fp16 directives and thus not be backward compatible
> > SKX/CLX/ICX.
> 
> Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> flag that would restrict AVX512VL to 256bit, possibly using a common internal
> flag for this and the -mavx10.1-256 vector size effect.
> 
> Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> -mavx512vl-256?  Writing these the last looks most sensible to me?
> Note it should combine with -mavx512vl to -mavx512vl-256 to make
> -march=native -mavx512vl-256 work (I think we should also allow the
> flag together with -mavx10.1*?)
> 
> mavx512vl-256
> Target ...
> Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> the 256bit vector ISA subset of AVX512.

Wouldn't it be better to have it similarly to other ISA options as something
positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
EVEX.128)?
Have -mavx512f (and anything that implies it right now) imply also -mevex512
but allow -mno-evex512 which wouldn't unset everything dependent on
-mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
nothing is left.
TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
on 512-bit vector registers or 64-bit mask registers (in addition to the
other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
512-bit modes can be used etc.

Jakub
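
The option interaction proposed above can be modelled with a small sketch. The `IsaRequest` and `IsaState` types and the `resolve` function are hypothetical and only mirror the rules stated in this message; the line treating AVX512VL as implying AVX512F is my assumption about existing behaviour, and none of this is GCC's option-handling code.

```cpp
// What the user asked for on the command line (hypothetical model).
struct IsaRequest
{
  bool avx512f;
  bool avx512vl;
  bool no_evex512;  // -mno-evex512 given
};

// What ends up enabled after resolving implications.
struct IsaState
{
  bool avx512f;
  bool avx512vl;
  bool evex512;
};

IsaState resolve (IsaRequest req)
{
  IsaState s{};
  s.avx512vl = req.avx512vl;
  // Assumption: AVX512VL implies AVX512F, as with the existing options.
  s.avx512f = req.avx512f || req.avx512vl;
  // AVX512F implies EVEX512 unless it is explicitly turned off.
  s.evex512 = s.avx512f && !req.no_evex512;
  // The gotcha: without 512-bit encodings and without AVX512VL,
  // nothing of AVX512F is left, so it is disabled as a whole.
  if (!s.evex512 && !s.avx512vl)
    s.avx512f = false;
  return s;
}

// Guard for anything touching 512-bit vectors or 64-bit masks.
bool can_use_512bit (const IsaState &s)
{
  return s.avx512f && s.evex512;
}
```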



Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
>
> On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> wrote:
> > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > With the proposed design of these switches, how would I restrict AVX10.1
> > > > to particular AVX-512 subsets?
> > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > AVX512 related instructions.
> > >
> > > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > > allowed, so in some cases it might prove difficult to guarantee this).
> > > intel sde support avx10.1-256 target which can be used to validate the
> > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > register is used).
> > > > I don’t see any other way of doing what you want within the constraints 
> > > > of this design.
> > > It looks like the requirement is that we want a
> > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > option that acts on the original -mavx512XXX option to produce
> > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > include avx512fp16 directives and thus not be backward compatible
> > > SKX/CLX/ICX.
> >
> > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > internal
> > flag for this and the -mavx10.1-256 vector size effect.
> >
> > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > -march=native -mavx512vl-256 work (I think we should also allow the
> > flag together with -mavx10.1*?)
> >
> > mavx512vl-256
> > Target ...
> > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > the 256bit vector ISA subset of AVX512.
>
> Wouldn't it be better to have it similarly to other ISA options as something
> positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> EVEX.128)?
> Have -mavx512f (and anything that implies it right now) imply also -mevex512
> but allow -mno-evex512 which wouldn't unset everything dependent on
> -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> nothing is left.
> TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> on 512-bit vector registers or 64-bit mask registers (in addition to the
> other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> 512-bit modes can be used etc.
We have an undocumented option mavx10-max-512bit.

;; Only for implementation use
mavx10-max-512bit
Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
Indicates 512 bit vector width support for AVX10.

Currently it's used only for AVX10; maybe we can extend it to the
existing AVX512*** flags,
so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
compatible binaries.

From the implementation perspective, we need to restrict all 512-bit
vector patterns/builtins/intrinsics under both AVX512XXX and
TARGET_AVX10_512BIT,
and similarly for register allocation, parameter passing, return value,
vector_mode_supported_p, gather/scatter hook, and all other hooks.
After that, -mavx10-max-512bit will divide the existing AVX512 into 2
parts, AVX512XXX-256 and AVX512XXX-512.


>
> Jakub
>


-- 
BR,
Hongtao


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> We have an undocumented option mavx10-max-512bit.

How it is called internally is one thing, but it is weird to use
avx10 in an option name which would be meant for finding common subset
of -mavx512xxx and -mavx10.1-256.

Jakub



Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread juzhe.zh...@rivai.ai
Hi, Richi.

I found when I try this in lcm.h:

namespace lcm {
void compute_available (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *, sbitmap *,
   sbitmap *, sbitmap *);
} // namespace lcm


Then I need to add namespace lcm for these 3 functions in lcm.cc too.
However, they are not located in the same location. So I need to do this:

namespace lcm {
compute_antinout_edge
compute_earliest
}
...
namespace lcm {
compute_available
}

I think it's a little bit ugly since some functions in lcm.cc belong to the 
lcm namespace and some do not.

And we already have compute_available, which has a non-LCM name.
Maybe this patch is better as is, and OK?
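
For reference, C++ offers two ways to define functions that were declared inside a namespace when their definitions are scattered through a file. The sketch below uses made-up names and `int` parameters instead of the real lcm.h signatures; the second form is an alternative I am pointing out, not something proposed in this thread.

```cpp
namespace lcm_demo
{
// Declarations, as they might appear in a header.
int compute_available (int);
int compute_earliest (int);
} // namespace lcm_demo

// Option 1: reopen the namespace wherever a definition lives.
namespace lcm_demo
{
int compute_earliest (int x) { return x + 1; }
} // namespace lcm_demo

// A file-local helper that stays outside the namespace.
static int local_helper (int x) { return x * 2; }

// Option 2: define with a qualified name; no reopening is needed, and
// the definition can sit anywhere after the declaration.
int lcm_demo::compute_available (int x) { return local_helper (x); }
```

With the qualified form, the .cc file would not need several separate namespace blocks, though every caller would still have to spell the `lcm_demo::` prefix or add a using-declaration.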

Thanks.


juzhe.zh...@rivai.ai
 

Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 4:38 PM Jakub Jelinek  wrote:
>
> On Mon, Aug 21, 2023 at 04:28:20PM +0800, Hongtao Liu wrote:
> > We have an undocumented option mavx10-max-512bit.
>
> How it is called internally is one thing, but it is weird to use
> avx10 in an option name which would be meant for finding common subset
> of -mavx512xxx and -mavx10.1-256.
We can have an alias for the name, but internally use the same bit
since they're doing the same thing.
And the option is somewhat orthogonal to AVX512XXX/AVX10; it only
cares about vector/kmask size.
>
> Jakub
>


-- 
BR,
Hongtao


Re: RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-21 Thread juzhe.zh...@rivai.ai
Yes. I wonder why some floating-point rounding mode declarations have HAS_FRM 
and some don't?



juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-08-21 15:10
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode 
intrinsic API
To double-check, do you mean this declaration?
 
+static CONSTEXPR const widen_freducop 
vfwredusum_frm_obj;
 
Pan
 
From: juzhe.zh...@rivai.ai  
Sent: Monday, August 21, 2023 2:40 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode 
intrinsic API
 
Why does this patch not have HAS_FRM?
 


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-08-17 16:05
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic 
API
From: Pan Li 
 
This patch would like to support the rounding mode API for the
VFWREDUSUM.VS as the below samples
 
* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  2 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  1 +
.../riscv/rvv/base/float-point-wredusum.c | 33 +++
4 files changed, 37 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index abf03bab0da..5ee7d3119db 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2548,6 +2548,7 @@ static CONSTEXPR const freducop 
vfredosum_frm_obj;
static CONSTEXPR const reducop vfredmax_obj;
static CONSTEXPR const reducop vfredmin_obj;
static CONSTEXPR const widen_freducop vfwredusum_obj;
+static CONSTEXPR const widen_freducop 
vfwredusum_frm_obj;
static CONSTEXPR const widen_freducop vfwredosum_obj;
static CONSTEXPR const widen_freducop 
vfwredosum_frm_obj;
static CONSTEXPR const vmv vmv_x_obj;
@@ -2810,6 +2811,7 @@ BASE (vfredmin)
BASE (vfwredosum)
BASE (vfwredosum_frm)
BASE (vfwredusum)
+BASE (vfwredusum_frm)
BASE (vmv_x)
BASE (vmv_s)
BASE (vfmv_f)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index c1bb164a712..69d4562091f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -247,6 +247,7 @@ extern const function_base *const vfredmin;
extern const function_base *const vfwredosum;
extern const function_base *const vfwredosum_frm;
extern const function_base *const vfwredusum;
+extern const function_base *const vfwredusum_frm;
extern const function_base *const vmv_x;
extern const function_base *const vmv_s;
extern const function_base *const vfmv_f;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index da1157f5a56..3ce06dc60b7 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -508,6 +508,7 @@ DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, 
wf_vs_ops)
DEF_RVV_FUNCTION (vfwredusum, reduc_alu, no_mu_preds, wf_vs_ops)
DEF_RVV_FUNCTION (vfwredosum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
+DEF_RVV_FUNCTION (vfwredusum_frm, reduc_alu_frm, no_mu_preds, wf_vs_ops)
/* 15. Vector Mask Instructions.  */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
new file mode 100644
index 000..6c888c10c0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
+
+#include "riscv_vector.h"
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1_rm (vfloat32m1_t op1, vfloat64m1_t op2,
+ size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm (op1, op2, 0, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_rm_m (vbool32_t mask, vfloat32m1_t op1,
+  vfloat64m1_t op2, size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1_rm_m (mask, op1, op2, 1, vl);
+}
+
+vfloat64m1_t
+test_riscv_vfwredusum_vs_f32m1_f64m1 (vfloat32m1_t op1, vfloat64m1_t op2,
+   size_t vl) {
+  return __riscv_vfwredusum_vs_f32m1_f64m1 (op1, op2, vl);
+}
+
+vfloat64m1_t
+test_vfwredusum_vs_f32m1_f64m1_m (vbool32_t mask, vfloat32m1_t op1,
+   vfloat64

[PATCH] debug/111080 - avoid outputting debug info for unused restrict qualified type

2023-08-21 Thread Richard Biener via Gcc-patches
The following applies some maintenance with respect to type qualifiers
and kinds added by later DWARF standards to prune_unused_types_walk.
The particular case in the bug is not handling (thus marking required)
all restrict qualified type DIEs.  I've found more DW_TAG_*_type that
are unhandled, looked up the DWARF docs and added them as well based
on common sense.
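
A toy model may help show why an unhandled type tag keeps otherwise unused DIEs alive. The `Die` structure, the `Tag` enum and `prune_walk` below are simplifications invented for illustration (assuming C++17); they do not reproduce dwarf2out.cc, only the idea that tags listed as "only when referenced" are not used as marking roots.

```cpp
#include <cstddef>
#include <set>
#include <vector>

enum class Tag { Structure, Pointer, Restrict };

struct Die
{
  Tag tag;
  std::vector<std::size_t> refs;  // indices of DIEs this one references
  bool mark = false;
};

// Mark a DIE and everything it references.
static void mark_reachable (std::vector<Die> &dies, std::size_t i)
{
  if (dies[i].mark)
    return;
  dies[i].mark = true;
  for (std::size_t r : dies[i].refs)
    mark_reachable (dies, r);
}

// DIEs whose tag is in ONLY_WHEN_REFERENCED are not roots; any other
// DIE is treated as required and drags in what it references.
void prune_walk (std::vector<Die> &dies,
                 const std::set<Tag> &only_when_referenced)
{
  for (std::size_t i = 0; i < dies.size (); ++i)
    if (!only_when_referenced.count (dies[i].tag))
      mark_reachable (dies, i);
}

// Chain restrict -> pointer -> struct with no user of the restrict type.
bool struct_survives (bool restrict_handled)
{
  std::vector<Die> dies = {
    {Tag::Structure, {}},
    {Tag::Pointer, {0}},
    {Tag::Restrict, {1}},
  };
  std::set<Tag> handled = {Tag::Structure, Tag::Pointer};
  if (restrict_handled)
    handled.insert (Tag::Restrict);
  prune_walk (dies, handled);
  return dies[0].mark;
}
```

In this model the structure DIE is marked only when the restrict tag is missing from the handled set, which mirrors the situation the testcase checks for.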

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR debug/111080
* dwarf2out.cc (prune_unused_types_walk): Handle
DW_TAG_restrict_type, DW_TAG_shared_type, DW_TAG_atomic_type,
DW_TAG_immutable_type, DW_TAG_coarray_type, DW_TAG_unspecified_type
and DW_TAG_dynamic_type as to only output them when referenced.

* gcc.dg/debug/dwarf2/pr111080.c: New testcase.
---
 gcc/dwarf2out.cc |  7 +++
 gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c | 18 ++
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index fa0fe4c41bb..69018bde238 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -30141,8 +30141,13 @@ prune_unused_types_walk (dw_die_ref die)
 case DW_TAG_reference_type:
 case DW_TAG_rvalue_reference_type:
 case DW_TAG_volatile_type:
+case DW_TAG_restrict_type:
+case DW_TAG_shared_type:
+case DW_TAG_atomic_type:
+case DW_TAG_immutable_type:
 case DW_TAG_typedef:
 case DW_TAG_array_type:
+case DW_TAG_coarray_type:
 case DW_TAG_friend:
 case DW_TAG_enumeration_type:
 case DW_TAG_subroutine_type:
@@ -30151,6 +30156,8 @@ prune_unused_types_walk (dw_die_ref die)
 case DW_TAG_subrange_type:
 case DW_TAG_ptr_to_member_type:
 case DW_TAG_file_type:
+case DW_TAG_unspecified_type:
+case DW_TAG_dynamic_type:
   /* Type nodes are useful only when other DIEs reference them --- don't
 mark them.  */
   /* FALLTHROUGH */
diff --git a/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
new file mode 100644
index 000..3949d7e7c64
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/debug/dwarf2/pr111080.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -gdwarf-3 -dA" } */
+
+struct foo {
+int field_number_1;
+int field_number_2;
+int field_number_3;
+int field_number_4;
+int field_number_5;
+};
+
+typedef int fun_t(struct foo *restrict);
+
+int main() {
+return 0;
+}
+
+/* { dg-final { scan-assembler-not "DW_TAG_structure_type" } } */
-- 
2.35.3
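
Editorial sketch of the pruning behaviour the patch above relies on. This is a toy model, not the dwarf2out.cc code: `Tag`, `Die`, `surviving_dies` and the `restrict_is_type_only` switch are made-up names, and the DIE list only loosely mirrors pr111080.c. The assumption illustrated is that a type DIE survives only when some kept DIE references it, so a restrict-qualified type that is always marked drags its pointee along.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the mark phase of type pruning; all names are hypothetical.
enum class Tag { Subprogram, StructureType, PointerType, RestrictType };

struct Die {
  Tag tag;
  std::vector<std::size_t> refs;  // indices of the DIEs this DIE references
};

// Type DIEs are kept only when a kept DIE references them.
// restrict_is_type_only == false models the pre-patch behaviour, where the
// restrict qualifier fell into the "always mark" default case.
static bool type_only (Tag tag, bool restrict_is_type_only)
{
  switch (tag)
    {
    case Tag::StructureType:
    case Tag::PointerType:
      return true;
    case Tag::RestrictType:
      return restrict_is_type_only;
    default:
      return false;
    }
}

// Mark DIE i and everything reachable from it through references.
static void mark (const std::vector<Die> &dies, std::vector<bool> &kept,
                  std::size_t i)
{
  assert (i < dies.size ());
  if (kept[i])
    return;
  kept[i] = true;
  for (std::size_t r : dies[i].refs)
    mark (dies, kept, r);
}

std::vector<bool> surviving_dies (const std::vector<Die> &dies,
                                  bool restrict_is_type_only)
{
  std::vector<bool> kept (dies.size (), false);
  for (std::size_t i = 0; i < dies.size (); ++i)
    if (!type_only (dies[i].tag, restrict_is_type_only))
      mark (dies, kept, i);
  return kept;
}

// DIEs loosely mirroring pr111080.c.
std::vector<Die> pr111080_like_dies ()
{
  return {
    { Tag::Subprogram, {} },       // 0: main
    { Tag::StructureType, {} },    // 1: struct foo
    { Tag::PointerType, { 1 } },   // 2: struct foo *
    { Tag::RestrictType, { 2 } },  // 3: struct foo *restrict
  };
}
```

In this model, calling `surviving_dies (pr111080_like_dies (), false)` keeps `struct foo` alive through the restrict DIE, while passing `true` leaves only `main` marked, which is what the new scan-assembler-not directive checks for.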


Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, 21 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> I found when I try this in lcm.h:
> 
> namespace lcm {
> void compute_available (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
> void compute_antinout_edge (sbitmap *, sbitmap *, sbitmap *, sbitmap *);
> void compute_earliest (struct edge_list *, int, sbitmap *, sbitmap *, sbitmap *,
>                        sbitmap *, sbitmap *);
> } // namespace lcm
> 
> 
> Then I need to add namespace lcm for these 3 functions in lcm.cc too.
> However, they are not located in the same location. So I need to do this:
> 
> namespace lcm {
> compute_antinout_edge
> compute_earliest
> }
> ...
> namespace lcm {
> compute_available
> }
> 
> I think it's a little bit ugly, since some functions in lcm.cc would belong to
> the lcm namespace and some would not.
> 
> And we already have compute_available, which has a non-LCM name.
> Maybe this patch is better and OK?

The original patch is OK.

Richard.
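
Editorial sketch on the namespace question quoted above (not what was committed; the original patch without a namespace was approved). In C++, once a function is declared inside a namespace, its definition can use a qualified name anywhere in the source file, so the definitions would not need to be wrapped in several scattered namespace blocks. The namespace `lcm_demo`, the `int` signatures and the function bodies are stand-ins, not the real lcm.h declarations.

```cpp
#include <cassert>

// Header side: declarations grouped once.  The namespace name and the
// int-based signatures are stand-ins for illustration only.
namespace lcm_demo {
int compute_available (int avloc, int kill);
int compute_antinout_edge (int antloc, int transp);
}  // namespace lcm_demo

// Source side: definitions may sit anywhere in the file, in any order.
// A qualified name is enough; no enclosing namespace block is required.
int lcm_demo::compute_antinout_edge (int antloc, int transp)
{
  return antloc & transp;
}

// Unrelated code may sit between the two definitions.
static int unrelated_helper (int x) { return x + 1; }

int lcm_demo::compute_available (int avloc, int kill)
{
  return unrelated_helper (avloc & ~kill) - 1;
}
```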

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zh...@rivai.ai
> Date: 2023-08-21 16:06
> To: rguenther
> CC: gcc-patches; jeffreyalaw
> Subject: Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
> PASS use in RISC-V backend
> >> Thanks for the explanation, exporting the functions is OK.
> Thanks.
> 
> >> It would be nice to put these internal functions into a class or a
> >> namespace given their non LCM name.
> Hi, Richi. I saw there is a function "extern void compute_available (sbitmap 
> *, sbitmap *, sbitmap *, sbitmap *);"
> which is already exported as global.
> 
> Do you mean I should put those 2 functions (the ones this patch exports) and 
> "compute_available", which has already been exported,
> into namespace lcm like this:
> 
> namespace lcm
> {
>   compute_available
>   compute_antinout_edge
>   compute_earliest
> }
> ?
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-21 15:50
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; jeffreyalaw
> Subject: Re: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
> PASS use in RISC-V backend
> On Mon, 21 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi. Richi.
> > I'd like to share more details that I want to do in VSETVL PASS.
> > 
> > Consider the following case:
> > 
> > for
> >   for
> >     for
> >       ...
> >         for
> >           VSETVL demand: RATIO = 32 and TU policy.
> > 
> > For this simple case, 'pre_edge_lcm_av' works perfectly for us and will 
> > hoist "vsetvli e32,tu" to the outer-most loop.
> > 
> > However, for this case:
> > 
> > for
> >   for
> >     for
> >       ...
> >         for
> >           if (...)
> >             VSETVL 1 demand: RATIO = 32 and TU policy.
> >           else if (...)
> >             VSETVL 2 demand: SEW = 16.
> >           else
> >             VSETVL 3 demand: MU policy.
> > 
> > 'pre_edge_lcm_av' is not sufficient to give us optimal codegen, since VSETVL 
> > 1, VSETVL 2 and VSETVL 3 are 3 different VSETVL demands and
> > 'pre_edge_lcm_av' can only hoist one of them. I can easily produce such 
> > cases with RVV intrinsics, and they are already in our RVV testsuite.
> > 
> > To get optimal codegen for this case, we need what I call "demand 
> > fusion": fusing all "compatible" VSETVLs into a single VSETVL
> > and emitting only that one, to avoid redundant VSETVLs.
> > 
> > In this case, we should be able to fuse VSETVL 1, VSETVL 2 and VSETVL 3 
> > into a single new VSETVL demand: SEW = 16, LMUL = MF2, TU, MU.
> > Instead of giving 'pre_edge_lcm_av' 3 VSETVL demands 
> > (VSETVL 1/2/3), I give 'pre_edge_lcm_av' only the 1 new VSETVL demand.
> > Then, LCM PRE can hoist the fused VSETVL to the outer-most loop, so the 
> > program will be transformed into:
> > 
> > VSETVL SEW = 16, LMUL = MF2, TU, MU
> > for
> >   for
> >     for
> >       ...
> >         for
> >           if (...)
> >             no vsetvl insn.
> >           else if (...)
> >             no vsetvl insn.
> >           else
> >             no vsetvl insn.
> > 
> > So, how do we do the demand fusion in this case? 
> > Before this patch and the following RISC-V refactoring patch, I did it 
> > explicitly with my own decision algorithm,
> > meaning I calculated which location in the program is correct and optimal 
> > for the VSETVL fusion.
> > 
> > However, I found that "compute_earliest" can do the job of 
> > calculating the location in the program for the VSETVL fusion, and it
> > turns out to be a more reliable and reasonable approach than mine.
> > 
> > So that's why I export those 2 functions, to be used in Phase 3 
> > (demand fusion) of the RISC-V backend VSETVL pass.
>  
> Thanks for the explanation, exporting the functions is OK.
>  
> Richard.
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-08-21 15:09
> > To: Juzhe-Zhong
> > CC: gcc-patches; jeffreyalaw
> > Subject: Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL 
> > PASS use in RISC-V backend
> > On Mon, 21 Aug 2023, Juzhe-Zhong wrote:
> >  
> > > This patch exports 'compute_
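
Editorial sketch of the demand-fusion arithmetic described in the quoted mail. It is a minimal model under stated assumptions, not the VSETVL pass: the `Demand` struct, `fuse` and `lmul_times_8` are hypothetical names, a value of 0 stands for "no demand", and the real compatibility rules in the backend are richer than a field-by-field merge. It only shows how RATIO = 32 plus SEW = 16 pins LMUL to MF2, since LMUL = SEW / RATIO = 16 / 32 = 1/2.

```cpp
#include <cassert>

// Toy VSETVL demand; field names are hypothetical.
struct Demand {
  int sew = 0;    // element width in bits, 0 = no demand
  int ratio = 0;  // SEW / LMUL, 0 = no demand
  bool tail_undisturbed = false;  // TU policy requested
  bool mask_undisturbed = false;  // MU policy requested
};

// Merge one field; fails when both sides demand different values.
static bool merge (int &into, int from)
{
  if (from == 0)
    return true;
  if (into != 0 && into != from)
    return false;
  into = from;
  return true;
}

// Fuse A and B into OUT.  Returns false when the demands are incompatible,
// in which case OUT is left untouched.
bool fuse (const Demand &a, const Demand &b, Demand &out)
{
  Demand r = a;
  if (!merge (r.sew, b.sew) || !merge (r.ratio, b.ratio))
    return false;
  r.tail_undisturbed = a.tail_undisturbed || b.tail_undisturbed;
  r.mask_undisturbed = a.mask_undisturbed || b.mask_undisturbed;
  out = r;
  return true;
}

// LMUL in eighths (4 = MF2, 8 = M1, 16 = M2), from LMUL = SEW / RATIO.
int lmul_times_8 (const Demand &d)
{
  assert (d.sew != 0 && d.ratio != 0);
  return d.sew * 8 / d.ratio;
}
```

Fusing the three demands from the example (RATIO = 32 with TU, SEW = 16, MU) in this model yields SEW = 16, `lmul_times_8` equal to 4 (MF2), TU and MU, matching the fused demand named in the mail. The example from the original patch description (SEW/LMUL = 16 fused with SEW = 32) gives 16 eighths, that is M2.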

Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread ZiNgA BuRgA via Gcc-patches
Another way (not saying this is better, just throwing out ideas) is to 
break AVX10.1 into all the AVX-512 subsets.

So you'd have something like -mavx10.1-256-vl, -mavx10.1-512-vbmi etc.

* -mavx10.1-256  would effectively be an alias for all the 128+256-bit 
subsets, and set the __AVX10_1__ define
* -mavx512vbmi  would effectively be an alias for `-mavx10.1-128-vbmi 
-mavx10.1-256-vbmi -mavx10.1-512-vbmi` and set the __AVX512VBMI__ define 
(`-mavx10.1-512-vl` might not make much sense unless it implies AVX512F?)
* -mno-avx512vbmi  would similarly be an alias for 
`-mno-avx10.1-128-vbmi -mno-avx10.1-256-vbmi -mno-avx10.1-512-vbmi`; 
with this, `-mavx10.1-256 -mno-avx512vbmi` would make sense, even if 
unusual (enable all AVX10.1 but disable all VBMI)
* -mavx10.2-256  would act as a single feature, cementing in AVX10.2 
like the current AVX10.1 proposal, and AVX-512 subsets can't be turned off
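
Editorial sketch of the alias idea in the list above, assuming each option expands to a set of (vector width, subset) pairs that are applied in command-line order. None of this is GCC code: `Feature`, `expand` and the three-entry subset list are illustrative only, and the option spellings are the ones proposed in this mail, not existing flags.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A feature is a (vector width in bits, AVX-512 subset name) pair.
using Feature = std::pair<int, std::string>;

// Illustrative subset list only; the real set of subsets is much longer.
static const std::vector<std::string> subsets = { "f", "vl", "vbmi" };
static const int widths[] = { 128, 256, 512 };

// Apply the proposed alias options left to right.
std::set<Feature> expand (const std::vector<std::string> &flags)
{
  std::set<Feature> enabled;
  for (const std::string &flag : flags)
    {
      if (flag == "-mavx10.1-256")
        {
          // Alias for every 128-bit and 256-bit subset.
          for (const std::string &s : subsets)
            {
              enabled.insert (Feature (128, s));
              enabled.insert (Feature (256, s));
            }
        }
      else if (flag == "-mavx512vbmi")
        {
          for (int w : widths)
            enabled.insert (Feature (w, "vbmi"));
        }
      else if (flag == "-mno-avx512vbmi")
        {
          for (int w : widths)
            enabled.erase (Feature (w, "vbmi"));
        }
    }
  return enabled;
}
```

With this model, `expand ({"-mavx10.1-256", "-mno-avx512vbmi"})` keeps the 128-bit and 256-bit "f" and "vl" entries and drops every "vbmi" entry, which is the "enable all AVX10.1 but disable all VBMI" combination mentioned above.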



On 21/08/2023 5:36 pm, Richard Biener wrote:

> On Mon, Aug 21, 2023 at 3:20 AM Hongtao Liu via Gcc-patches wrote:
>
> Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since that
> would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> flag that would restrict AVX512VL to 256bit, possibly using a common internal
> flag for this and the -mavx10.1-256 vector size effect.
>
> Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> -mavx512vl-256?  Writing these the last looks most sensible to me?
> Note it should combine with -mavx512vl to -mavx512vl-256 to make
> -march=native -mavx512vl-256 work (I think we should also allow the
> flag together with -mavx10.1*?)
>
> mavx512vl-256
> Target ...
> Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> the 256bit vector ISA subset of AVX512.
>
> Richard.





Re: [PATCH] LCM: Export 2 helpful functions as global for VSETVL PASS use in RISC-V backend

2023-08-21 Thread Lehua Ding

Committed, thanks Richard.

On 2023/8/21 17:12, Richard Biener via Gcc-patches wrote:

> The original patch is OK.

Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
>
> On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> >
> > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > wrote:
> > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > With the proposed design of these switches, how would I restrict 
> > > > > AVX10.1
> > > > > to particular AVX-512 subsets?
> > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > AVX512 related instructions.
> > > >
> > > > > We’ve been taking these cases as bugs (but yes, intrinsics are still 
> > > > > allowed, so in some cases it might prove difficult to guarantee this).
> > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > register is used).
> > > > > I don’t see any other way of doing what you want within the 
> > > > > constraints of this design.
> > > > It looks like the requirement is that we want a
> > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > option that acts on the original -mavx512XXX option to produce
> > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > include avx512fp16 directives and thus not be backward compatible
> > > > SKX/CLX/ICX.
> > >
> > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > that
> > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a new
> > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > internal
> > > flag for this and the -mavx10.1-256 vector size effect.
> > >
> > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > flag together with -mavx10.1*?)
> > >
> > > mavx512vl-256
> > > Target ...
> > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > the 256bit vector ISA subset of AVX512.
> >
> > Wouldn't it be better to have it similarly to other ISA options as something
> > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > EVEX.128)?
> > Have -mavx512f (and anything that implies it right now) imply also -mevex512
> > but allow -mno-evex512 which wouldn't unset everything dependent on
> > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > nothing is left.
> > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which operate
> > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > 512-bit modes can be used etc.
> We have an undocumented option mavx10-max-512bit.
>
> ;; Only for implementation use
> mavx10-max-512bit
> Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> Indicates 512 bit vector width support for AVX10.

Ah, missed that, but ...

> Currently it's only used for AVX10 only, maybe we can extend it to
> existing AVX512*** FLAGS.
> so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> compatible binaries.

... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
apply, so what is it then?

If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
-mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
I think we opened up too many holes here and the options should be fixed
to decouple the size from the base ISA.

What variable we map this to internally doesn't really matter but yes,
we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 512-enabled-flag

Richard.

> From the implementation perspective, we need to restrict all 512-bit
> vector patterns/builtins/intrinsics under both AVX512XXX and
> TARGET_AVX10_512BIT.
> similar for register allocation, parameter passing, return value,
> vector_mode_supported_p, gather/scatter hook, and all other hooks.
> After that, the -mavx10-max-512bit will divide existing AVX512 into 2
> parts, AVX512XXX-256, AVX512XXX-512.
>
>
> >
> > Jakub
> >
>
>
> --
> BR,
> Hongtao
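
Editorial sketch of the guard mentioned in this message, "(AVX512VL || AVX10) && 512-enabled-flag", written as a plain predicate. The `IsaFlags` struct and its field names are hypothetical stand-ins; which internal variable the size flag maps to is left open in the thread.

```cpp
#include <cassert>

// Hypothetical stand-ins for the target flags discussed in the thread.
struct IsaFlags {
  bool avx512vl = false;
  bool avx10 = false;
  bool width512_enabled = false;  // size flag decoupled from the base ISA
};

// Guard for 512-bit patterns: (AVX512VL || AVX10) && 512-enabled-flag.
bool allow_512bit_patterns (const IsaFlags &f)
{
  return (f.avx512vl || f.avx10) && f.width512_enabled;
}
```

Under this model a configuration such as "AVX512VL with the 512-bit flag off" keeps the base ISA enabled while rejecting every 512-bit pattern, which is the decoupling of size from base ISA argued for above.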


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, Aug 21, 2023 at 11:34 AM Richard Biener
 wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > > wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict 
> > > > > > AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are 
> > > > > > still allowed, so in some cases it might prove difficult to 
> > > > > > guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the 
> > > > > > constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > > that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a 
> > > > new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > > internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as 
> > > something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also 
> > > -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which 
> > > operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.

Like how about -mavx10.1 -mavx10.2 plus a -mavx10-512 where
-mavx10.[12...] enables just 256 bits (the intended default as Intel thinks)
and -mavx10-512 will enable 512 bits but for the whole selected ISA
(maybe have it enable -mavx10.1 if that wasn't specified, maybe not).
We can then allow -mno-avx10-512 also with AVX512?

>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 
> 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we need to restrict all 512-bit
> > vector patterns/builtins/intrinsics under both AVX512XXX and
> > TARGET_AVX10_512BIT.
> > similar for register allocati

[committed] libstdc++: Remove reliance on unspecified behaviour in std::rethrow_if_nested test

2023-08-21 Thread Jonathan Wakely via Gcc-patches
This is the patch resolving the non-portable test that Iain raised in:
https://gcc.gnu.org/pipermail/libstdc++/2023-August/056534.html

Tested x86_64-linux. Pushed to trunk.

Backports would be OK, but I don't think they are needed.

-- >8 --

This test case calls std::set_terminate while there is an active
exception. Since LWG 2111 it is unspecified which terminate handler is
used when std::nested_exception::rethrow_nested() calls std::terminate.
With libsupc++ the global handler changed by std::set_terminate is used,
but libc++abi uses the active exception's handler (the one that was
current when the exception was first thrown).

Adjust the test case so that it works with either implementation choice.
So that the process doesn't exit cleanly if std::terminate happens
sooner than expected, use a global variable to control when the "clean
terminate" behaviour happens.

libstdc++-v3/ChangeLog:

* testsuite/18_support/nested_exception/rethrow_if_nested-term.cc:
Call std::set_terminate before throwing the nested exception.
---
 .../nested_exception/rethrow_if_nested-term.cc | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc b/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
index 3bfc7ab9943..b221eea3178 100644
--- a/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
+++ b/libstdc++-v3/testsuite/18_support/nested_exception/rethrow_if_nested-term.cc
@@ -4,25 +4,33 @@
 #include 
 #include 
 
-[[noreturn]] void terminate_cleanly() noexcept { std::exit(0); }
+int exit_status = 1;
+[[noreturn]] void terminate_cleanly() noexcept { std::exit(exit_status); }
 
 struct A { virtual ~A() = default; };
 
 int main()
 {
+  std::set_terminate(terminate_cleanly);
   try
   {
 // At this point std::current_exception() == nullptr so the
 // std::nested_exception object is empty.
 std::throw_with_nested(A{});
+
+// Should not reach this point.
+std::abort();
   }
   catch (const A& a)
   {
-std::set_terminate(terminate_cleanly);
+// This means the expected std::terminate() call will exit cleanly,
+// so this test will PASS.
+exit_status = 0;
+
 std::rethrow_if_nested(a);
 #if __cpp_rtti
 // No nested exception, so trying to rethrow it calls std::terminate()
-// which calls std::exit(0). Shoud not reach this point.
+// which calls std::exit(0). Should not reach this point.
 std::abort();
 #else
 // Without RTTI we can't dynamic_cast(&a)
-- 
2.41.0
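
Editorial sketch of the situation the test above sets up, reduced to the part that can be observed without terminating the process. It assumes only standard library behaviour: when `std::throw_with_nested` is called while no exception is being handled, the `std::nested_exception` base of the thrown object holds a null `exception_ptr`. The function name `nested_part_is_empty` is made up for illustration.

```cpp
#include <cassert>
#include <exception>

struct A { virtual ~A() = default; };

// Returns true when the nested_exception created by throw_with_nested
// holds no nested exception.
bool nested_part_is_empty ()
{
  try
    {
      // No exception is being handled here, so std::current_exception()
      // is null and the nested_exception base captures a null pointer.
      std::throw_with_nested (A{});
    }
  catch (const std::nested_exception &ne)
    {
      // Calling ne.rethrow_nested() at this point would call
      // std::terminate(), which is the path the test exercises.
      return ne.nested_ptr () == nullptr;
    }
  return false;
}
```

The committed test goes one step further and lets `std::rethrow_if_nested` reach `std::terminate`; installing the handler before the throw means it is the handler in use under either reading of LWG 2111.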



[PATCH] tree-optimization/111082 - bogus promoted min

2023-08-21 Thread Richard Biener via Gcc-patches
vectorize_slp_instance_root_stmt promotes operations with undefined
overflow to unsigned arithmetic but fails to consider operations
that cannot overflow, like MIN, which it turned into a MIN with the wrong
signedness and, in the case of the PR, an unsupported operation.
The following rectifies this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111082
* tree-vect-slp.cc (vectorize_slp_instance_root_stmt): Only
pun operations that can overflow.

* gcc.dg/pr111082.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr111082.c | 10 ++
 gcc/tree-vect-slp.cc|  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr111082.c

diff --git a/gcc/testsuite/gcc.dg/pr111082.c b/gcc/testsuite/gcc.dg/pr111082.c
new file mode 100644
index 000..46e36e320d1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr111082.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fno-vect-cost-model" } */
+/* { dg-additional-options "-mavx512f" { target { x86_64-*-* i?86-*-* } } } */
+
+long minarray2(const long *input)
+{
+  if (input[0] < input[1])
+return input[0] ;
+  return input[1];
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 51f3466805c..e8484401bc9 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9180,7 +9180,8 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
   tree vectype = TREE_TYPE (vec_def);
   tree compute_vectype = vectype;
   bool pun_for_overflow_p = (ANY_INTEGRAL_TYPE_P (vectype)
-&& TYPE_OVERFLOW_UNDEFINED (vectype));
+&& TYPE_OVERFLOW_UNDEFINED (vectype)
+&& operation_can_overflow (reduc_code));
   if (pun_for_overflow_p)
{
  compute_vectype = unsigned_type_for (vectype);
-- 
2.35.3
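
Editorial sketch of why punning to the unsigned type is fine for an operation that can overflow but wrong for MIN. It is scalar code with made-up function names, not the vectorizer; it assumes the two's-complement conversion behaviour that GCC provides (and that C++20 guarantees) when converting back to the signed type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Addition computed in the unsigned type and converted back yields the same
// bits as the signed result, without relying on undefined signed overflow.
std::int64_t add_via_unsigned (std::int64_t a, std::int64_t b)
{
  return static_cast<std::int64_t> (static_cast<std::uint64_t> (a)
                                    + static_cast<std::uint64_t> (b));
}

// MIN computed in the unsigned type compares with the wrong signedness:
// a negative value turns into a huge unsigned value and loses.
std::int64_t min_via_unsigned (std::int64_t a, std::int64_t b)
{
  return static_cast<std::int64_t> (std::min (static_cast<std::uint64_t> (a),
                                              static_cast<std::uint64_t> (b)));
}
```

For the inputs -1 and 1, `add_via_unsigned` returns 0 as expected, while `min_via_unsigned` returns 1 where the signed minimum is -1. MIN cannot overflow in the first place, so there is nothing to gain from the pun, which is what the added `operation_can_overflow` check expresses.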


Re: Intel AVX10.1 Compiler Design and Support

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 5:35 PM Richard Biener
 wrote:
>
> On Mon, Aug 21, 2023 at 10:28 AM Hongtao Liu  wrote:
> >
> > On Mon, Aug 21, 2023 at 4:09 PM Jakub Jelinek  wrote:
> > >
> > > On Mon, Aug 21, 2023 at 09:36:16AM +0200, Richard Biener via Gcc-patches 
> > > wrote:
> > > > > On Sun, Aug 20, 2023 at 6:44 AM ZiNgA BuRgA via Gcc-patches
> > > > >  wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > With the proposed design of these switches, how would I restrict 
> > > > > > AVX10.1
> > > > > > to particular AVX-512 subsets?
> > > > > We can't, avx10.1 is taken as an indivisible ISA which contains all
> > > > > AVX512 related instructions.
> > > > >
> > > > > > We’ve been taking these cases as bugs (but yes, intrinsics are 
> > > > > > still allowed, so in some cases it might prove difficult to 
> > > > > > guarantee this).
> > > > > intel sde support avx10.1-256 target which can be used to validate the
> > > > > binary(if there's invalid 512-bit vector register or 64-bit kmask
> > > > > register is used).
> > > > > > I don’t see any other way of doing what you want within the 
> > > > > > constraints of this design.
> > > > > It looks like the requirement is that we want a
> > > > > -mavx10-vector-width=256(or maybe reuse -mprefer-vector-width=256)
> > > > > option that acts on the original -mavx512XXX option to produce
> > > > > avx10.1-256 compatible binary. we can't use -mavx10.1-256 since it may
> > > > > include avx512fp16 directives and thus not be backward compatible
> > > > > SKX/CLX/ICX.
> > > >
> > > > Yes.  Note we cannot really re-purpose -mprefer-vector-width=256 since 
> > > > that
> > > > would also make uses of 512bit intrinsics ill-formed.  So we'd need a 
> > > > new
> > > > flag that would restrict AVX512VL to 256bit, possibly using a common 
> > > > internal
> > > > flag for this and the -mavx10.1-256 vector size effect.
> > > >
> > > > Maybe -mdisable-vector-width-512 or -mavx512vl-for-avx10.1-256 or
> > > > -mavx512vl-256?  Writing these the last looks most sensible to me?
> > > > Note it should combine with -mavx512vl to -mavx512vl-256 to make
> > > > -march=native -mavx512vl-256 work (I think we should also allow the
> > > > flag together with -mavx10.1*?)
> > > >
> > > > mavx512vl-256
> > > > Target ...
> > > > Disable the 512bit vector ISA subset of AVX512 or AVX10, enable
> > > > the 256bit vector ISA subset of AVX512.
> > >
> > > Wouldn't it be better to have it similarly to other ISA options as 
> > > something
> > > positive, say -mevex512 (the ISA docs talk about EVEX.512, EVEX.256 and
> > > EVEX.128)?
> > > Have -mavx512f (and anything that implies it right now) imply also 
> > > -mevex512
> > > but allow -mno-evex512 which wouldn't unset everything dependent on
> > > -mavx512f.  There is one gotcha, if -mavx512vl isn't enabled in the end,
> > > then -mavx512f -mno-evex512 should disable whole TARGET_AVX512F because
> > > nothing is left.
> > > TARGET_EVEX512 then would guard all TARGET_AVX512* intrinsics which 
> > > operate
> > > on 512-bit vector registers or 64-bit mask registers (in addition to the
> > > other TARGET_AVX512* options, perhaps except TARGET_AVX512F), whether the
> > > 512-bit modes can be used etc.
> > We have an undocumented option mavx10-max-512bit.
> >
> > ;; Only for implementation use
> > mavx10-max-512bit
> > Target Mask(ISA2_AVX10_512BIT) Var(ix86_isa_flags2) Undocumented Save
> > Indicates 512 bit vector width support for AVX10.
>
> Ah, missed that, but ...
>
> > Currently it's only used for AVX10 only, maybe we can extend it to
> > existing AVX512*** FLAGS.
> > so users can use -mavx512XXX -mno-avx10-max-512bit to get avx10.1-256
> > compatible binaries.
>
> ... -mno-avx10-max-512bit sounds awkward, no-..-max implies the max doesn't
> apply, so what is it then?
>
> If you think -mavx512vl-256 isn't good then maybe -mavx-width-512
> and -mno-avx-width-512 would be better (applying to both avx512 and avx10).
> I chose -mavx512vl-256 because of the existing -mavx10.1-256.  Btw,
> will we then have -mavx10.2-256 as well?  Do we allow -mavx10.1-512
> -mavx10.2-256 then, thus just enable 256bit for 10.2 extensions to 10.1?!
We're only allowing a single vector width.
-mavx10.1-512 -mavx10.2-256 will only enable -mavx10.2-256 + -mavx10.1-256.
> I think we opened up too many holes here and the options should be fixed
> to decouple the size from the base ISA.
I see, we can try to use -mavx-max-512bit (maybe another name) to
decouple the size from the base ISA.
And make
 -mavx10.1-256 just imply all -mavx512XXX + -mno-avx-max-512bit,
 -mavx10.1-512 imply -mavx512XXX + -mavx-max-512bit.
Then -mavx512vl-256 is just equal to -mavx512vl + -mno-avx-max-512bit.

Lots of work to do, but still not too late for GCC 14.1.
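
To make the proposed decomposition concrete, here is a minimal C++ sketch. It
is not GCC's option machinery: IsaState and resolve are hypothetical names, the
width flag uses the tentative -mavx-max-512bit spelling from above,
-mavx10.2-256 is only the hypothetical option raised in this thread, and
"the last width setting wins" is an assumption made for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: the base ISA and the maximum vector width are kept
// as two independent settings instead of being fused into one option name.
struct IsaState {
  bool avx512 = false;     // stands in for the -mavx512XXX family
  bool max_512bit = true;  // stands in for the proposed -mavx-max-512bit
};

inline bool operator==(const IsaState &a, const IsaState &b) {
  return a.avx512 == b.avx512 && a.max_512bit == b.max_512bit;
}

// Apply options left to right. Assumption: a later width setting overrides
// an earlier one, so only a single vector width is ever in effect.
IsaState resolve(const std::vector<std::string> &opts) {
  IsaState s;
  for (const std::string &o : opts) {
    if (o == "-mavx512vl") {
      s.avx512 = true;  // base ISA only, width untouched
    } else if (o == "-mavx-max-512bit") {
      s.max_512bit = true;
    } else if (o == "-mno-avx-max-512bit") {
      s.max_512bit = false;
    } else if (o == "-mavx512vl-256" || o == "-mavx10.1-256"
               || o == "-mavx10.2-256") {
      s.avx512 = true;  // base ISA plus the 256-bit width limit
      s.max_512bit = false;
    } else if (o == "-mavx10.1-512") {
      s.avx512 = true;  // base ISA plus 512-bit width
      s.max_512bit = true;
    }
  }
  return s;
}
```

With this model, -mavx512vl-256 resolves to the same state as -mavx512vl
followed by -mno-avx-max-512bit, and -mavx10.1-512 -mavx10.2-256 ends up
limited to 256 bits.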
>
> What variable we map this to internally doesn't really matter but yes,
> we'd need to guard 512bit patterns with (AVX512VL || AVX10) && 
> 512-enabled-flag
>
> Richard.
>
> > From the implementation perspective, we need

[PATCH] Adjust testcase for Intel GDS.

2023-08-21 Thread liuhongt via Gcc-patches
gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512f-pr88464-2.c: Add -mgather to
options.
* gcc.target/i386/avx512f-pr88464-3.c: Ditto.
* gcc.target/i386/avx512f-pr88464-4.c: Ditto.
* gcc.target/i386/avx512f-pr88464-6.c: Ditto.
* gcc.target/i386/avx512f-pr88464-7.c: Ditto.
* gcc.target/i386/avx512f-pr88464-8.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-10.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-12.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-13.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-14.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-15.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-16.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-2.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-4.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-5.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-6.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-7.c: Ditto.
* gcc.target/i386/avx512vl-pr88464-8.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-2.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-3.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-4.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-6.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-7.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512f-pr88464-8.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-10.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-12.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-13.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-14.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-15.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-16.c | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-2.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-4.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-5.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-6.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-7.c  | 2 +-
 gcc/testsuite/gcc.target/i386/avx512vl-pr88464-8.c  | 2 +-
 18 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-2.c b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-2.c
index 845bf509d82..28827dbd75d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-2.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/88464 */
 /* { dg-do run { target { avx512f } } } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512 -mgather" } */
 
 #include "avx512f-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-3.c b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-3.c
index 9eda4aa9b13..2df64bfa063 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-3.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-3.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/88464 */
 /* { dg-do compile } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512 -fdump-tree-vect-details" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512 -fdump-tree-vect-details -mgather" } */
 /* { dg-final { scan-tree-dump-times "loop vectorized using 64 byte vectors" 4 "vect" } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 4 "vect" } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-4.c b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-4.c
index e347e63b17a..173858aadd5 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-4.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-4.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/88464 */
 /* { dg-do run { target { avx512f } } } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512 -mgather" } */
 
 #include "avx512f-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-6.c b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-6.c
index 9ebb72a5bae..0adf3b6726a 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-6.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-6.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/88464 */
 /* { dg-do run { target { avx512f } } } */
-/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512" } */
+/* { dg-options "-O3 -mavx512f -mprefer-vector-width=512 -mtune=skylake-avx512 -mgather" } */
 
 #include "avx512f-check.h"
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-7.c b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-7.c
index 738640c2bf5..471ebc1676d 100644
--- a/gcc/testsuite/gcc.target/i386/avx512f-pr88464-7.c
+++ b/gcc/testsuite/gcc.target/i386/avx512f-pr88464-7.c
@@ -1,6 +1,6 @@
 /* PR tree-optimization/88464 */
 /* { dg-do compile

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 21 Aug 2023 at 12:26, Richard Biener  wrote:
>
> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>
> > On Fri, 18 Aug 2023 at 14:52, Richard Biener  wrote:
> > >
> > > On Fri, 18 Aug 2023, Richard Sandiford wrote:
> > >
> > > > Richard Biener  writes:
> > > > > The following avoids running into somehow flawed logic in 
> > > > > fold_vec_perm
> > > > > for non-VLA vectors.
> > > > >
> > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> > > > >
> > > > > Richard.
> > > > >
> > > > > PR tree-optimization/111048
> > > > > * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
> > > > > vectors first.
> > > > >
> > > > > * gcc.dg/torture/pr111048.c: New testcase.
> > > >
> > > > Please don't do this as a permanent thing.  It was a deliberate choice
> > > > to have the is_constant be the fallback, so that the "generic" (VLA+VLS)
> > > > logic gets more coverage.  Like you say, if something is wrong for VLS
> > > > then the chances are that it's also wrong for VLA.
> > >
> > > Sure, feel free to undo this change together with the fix for the
> > > VLA case.
> > Hi,
> > The attached patch reverts the workaround, and fixes the issue.
> > Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
> > x86_64-linux-gnu.
> > OK to commit ?
>
> OK.
Thanks, committed to trunk in 649388462e9a3c2de0b90ce525de8044704cc521

Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > > Thanks,
> > > > Richard
> > > >
> > > >
> > > > > ---
> > > > >  gcc/fold-const.cc   | 12 ++--
> > > > >  gcc/testsuite/gcc.dg/torture/pr111048.c | 24 
> > > > >  2 files changed, 30 insertions(+), 6 deletions(-)
> > > > >  create mode 100644 gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > >
> > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> > > > > index 5c51c9d91be..144fd7481b3 100644
> > > > > --- a/gcc/fold-const.cc
> > > > > +++ b/gcc/fold-const.cc
> > > > > @@ -10625,6 +10625,11 @@ fold_vec_perm_cst (tree type, tree arg0, 
> > > > > tree arg1, const vec_perm_indices &sel,
> > > > >unsigned res_npatterns, res_nelts_per_pattern;
> > > > >unsigned HOST_WIDE_INT res_nelts;
> > > > >
> > > > > +  if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > > +{
> > > > > +  res_npatterns = res_nelts;
> > > > > +  res_nelts_per_pattern = 1;
> > > > > +}
> > > > >/* (1) If SEL is a suitable mask as determined by
> > > > >   valid_mask_for_fold_vec_perm_cst_p, then:
> > > > >   res_npatterns = max of npatterns between ARG0, ARG1, and SEL
> > > > > @@ -10634,7 +10639,7 @@ fold_vec_perm_cst (tree type, tree arg0, tree 
> > > > > arg1, const vec_perm_indices &sel,
> > > > >   res_npatterns = nelts in result vector.
> > > > >   res_nelts_per_pattern = 1.
> > > > >   This exception is made so that VLS ARG0, ARG1 and SEL work as 
> > > > > before.  */
> > > > > -  if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
> > > > > +  else if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, 
> > > > > reason))
> > > > >  {
> > > > >res_npatterns
> > > > > = std::max (VECTOR_CST_NPATTERNS (arg0),
> > > > > @@ -10648,11 +10653,6 @@ fold_vec_perm_cst (tree type, tree arg0, 
> > > > > tree arg1, const vec_perm_indices &sel,
> > > > >
> > > > >res_nelts = res_npatterns * res_nelts_per_pattern;
> > > > >  }
> > > > > -  else if (TYPE_VECTOR_SUBPARTS (type).is_constant (&res_nelts))
> > > > > -{
> > > > > -  res_npatterns = res_nelts;
> > > > > -  res_nelts_per_pattern = 1;
> > > > > -}
> > > > >else
> > > > >  return NULL_TREE;
> > > > >
> > > > > diff --git a/gcc/testsuite/gcc.dg/torture/pr111048.c 
> > > > > b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > > new file mode 100644
> > > > > index 000..475978aae2b
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/torture/pr111048.c
> > > > > @@ -0,0 +1,24 @@
> > > > > +/* { dg-do run } */
> > > > > +/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
> > > > > +
> > > > > +typedef unsigned char u8;
> > > > > +
> > > > > +__attribute__((noipa))
> > > > > +static void check(const u8 * v) {
> > > > > +if (*v != 15) __builtin_trap();
> > > > > +}
> > > > > +
> > > > > +__attribute__((noipa))
> > > > > +static void bug(void) {
> > > > > +u8 in_lanes[32];
> > > > > +for (unsigned i = 0; i < 32; i += 2) {
> > > > > +  in_lanes[i + 0] = 0;
> > > > > +  in_lanes[i + 1] = ((u8)0xff) >> (i & 7);
> > > > > +}
> > > > > +
> > > > > +check(&in_lanes[13]);
> > > > > +  }
> > > > > +
> > > > > +int main() {
> > > > > +bug();
> > > > > +}
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener 
> SUSE Software Solutions German

Re: [PATCH] tree-optimization/111048 - avoid flawed logic in fold_vec_perm

2023-08-21 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> On Mon, 21 Aug 2023 at 12:26, Richard Biener  wrote:
>>
>> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>>
>> > On Fri, 18 Aug 2023 at 14:52, Richard Biener  wrote:
>> > >
>> > > On Fri, 18 Aug 2023, Richard Sandiford wrote:
>> > >
>> > > > Richard Biener  writes:
>> > > > > The following avoids running into somehow flawed logic in 
>> > > > > fold_vec_perm
>> > > > > for non-VLA vectors.
>> > > > >
>> > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>> > > > >
>> > > > > Richard.
>> > > > >
>> > > > > PR tree-optimization/111048
>> > > > > * fold-const.cc (fold_vec_perm_cst): Check for non-VLA
>> > > > > vectors first.
>> > > > >
>> > > > > * gcc.dg/torture/pr111048.c: New testcase.
>> > > >
>> > > > Please don't do this as a permanent thing.  It was a deliberate choice
>> > > > to have the is_constant be the fallback, so that the "generic" 
>> > > > (VLA+VLS)
>> > > > logic gets more coverage.  Like you say, if something is wrong for VLS
>> > > > then the chances are that it's also wrong for VLA.
>> > >
>> > > Sure, feel free to undo this change together with the fix for the
>> > > VLA case.
>> > Hi,
>> > The attached patch reverts the workaround, and fixes the issue.
>> > Bootstrapped+tested on aarch64-linux-gnu with and without SVE, and
>> > x86_64-linux-gnu.
>> > OK to commit ?
>>
>> OK.
> Thanks, committed to trunk in 649388462e9a3c2de0b90ce525de8044704cc521

Thanks for the patch.  Please remember to close the PR too.

Richard


[PATCH V5] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-21 Thread Juzhe-Zhong
Co-Authored-By: Kewen.Lin 

Hi, @Richi and @Richard, based on the previous discussion, I simply fixed the
issues for powerpc and s390 following your suggestions:

-  machine_mode len_load_mode = get_len_load_store_mode
-(loop_vinfo->vector_mode, true).require ();
-  machine_mode len_store_mode = get_len_load_store_mode
-(loop_vinfo->vector_mode, false).require ();
+  machine_mode len_load_mode, len_store_mode;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
+.exists (&len_load_mode))
+return false;
+  if (!get_len_load_store_mode (loop_vinfo->vector_mode, false)
+.exists (&len_store_mode))
+return false;
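
The difference between the two styles can be sketched outside GCC. This is a
minimal illustration using std::optional as a stand-in for GCC's opt_mode;
Mode, get_len_mode and verify_loop_lens are hypothetical names, and the point
is only that the checked form lets the caller bail out instead of asserting
that a length mode exists.

```cpp
#include <cassert>
#include <optional>

// Stand-in for a machine mode; hypothetical, not GCC's machine_mode.
enum class Mode { V16QI };

// Hypothetical query: yields a length mode only when the target provides
// length-controlled loads/stores for the vector mode in question.
std::optional<Mode> get_len_mode(bool target_has_len_ops) {
  if (target_has_len_ops)
    return Mode::V16QI;
  return std::nullopt;
}

// Checked variant, analogous to the ".exists (&mode)" pattern in the hunk
// above: a missing mode makes verification fail gracefully.  The older
// ".require ()" style would correspond to calling value() unconditionally.
bool verify_loop_lens(bool load_ok, bool store_ok) {
  std::optional<Mode> len_load_mode = get_len_mode(load_ok);
  if (!len_load_mode)
    return false;
  std::optional<Mode> len_store_mode = get_len_mode(store_ok);
  if (!len_store_mode)
    return false;
  return true;
}
```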

Hi, @Kewen and @Stefan

Could you test this patch again? Thanks.

Co-Authored-By: Kewen.Lin 

gcc/ChangeLog:

* tree-vect-loop.cc (vect_verify_loop_lens): Add exists check.
(vectorizable_live_operation): Add live vectorization for length loop control.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/partial/live-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/live_run-1.c: New test.

---
 .../riscv/rvv/autovec/partial/live-1.c| 34 +++
 .../riscv/rvv/autovec/partial/live_run-1.c| 35 
 gcc/tree-vect-loop.cc | 89 ++-
 3 files changed, 138 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
new file mode 100644
index 000..75fa2eba8cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
+
+#include 
+
+#define EXTRACT_LAST(TYPE)                    \
+  TYPE __attribute__ ((noinline, noclone))    \
+  test_##TYPE (TYPE *x, int n, TYPE value)    \
+  {                                           \
+    TYPE last;                                \
+    for (int j = 0; j < n; ++j)               \
+      {                                       \
+        last = x[j];                          \
+        x[j] = last * value;                  \
+      }                                       \
+    return last;                              \
+  }
+
+#define TEST_ALL(T)                           \
+  T (int8_t)                                  \
+  T (int16_t)                                 \
+  T (int32_t)                                 \
+  T (int64_t)                                 \
+  T (uint8_t)                                 \
+  T (uint16_t)                                \
+  T (uint32_t)                                \
+  T (uint64_t)                                \
+  T (_Float16)                                \
+  T (float)                                   \
+  T (double)
+
+TEST_ALL (EXTRACT_LAST)
+
+/* { dg-final { scan-tree-dump-times "\.VEC_EXTRACT" 10 "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
new file mode 100644
index 000..42913a112c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
@@ -0,0 +1,35 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
+
+#include "live-1.c"
+
+#define N 107
+#define OP 70
+
+#define TEST_LOOP(TYPE)\
+  {\
+TYPE a[N]; \
+for (int i = 0; i < N; ++i)\
+  {\
+   a[i] = i * 2 + (i % 3); \
+   asm volatile ("" ::: "memory"); \
+  }\
+TYPE expected = a[N - 1];  \
+TYPE res = test_##TYPE (a, N, OP); \
+if (res != expected)   \
+  __builtin_abort ();  

[PATCH] Fix gcc.dg/vect/bb-slp-46.c FAIL

2023-08-21 Thread Richard Biener via Gcc-patches
When relaxing vectorization of possibly overflowing reductions I
failed to update a testcase that will now vectorize and no longer
test for what it was written for.  The following replaces the
vectorizable add with a division.

Tested on x86_64-unknown-linux-gnu, pushed.

* gcc.dg/vect/bb-slp-46.c: Use division instead of addition
to avoid reduction vectorization.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-46.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
index 98b29062a19..4eceea44efc 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
@@ -15,7 +15,7 @@ int foo ()
   a[1] = tem1;
   a[2] = tem2;
   a[3] = tem3;
-  return temx + temy;
+  return temx / temy;
 }
 
 /* We should extract the live lane from the vectorized add rather than
-- 
2.35.3


Re: [PATCH][RFC] tree-optimization/92335 - Improve sinking heuristics for vectorization

2023-08-21 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 21 Aug 2023 at 12:27, Richard Biener  wrote:
>
> On Sat, 19 Aug 2023, Prathamesh Kulkarni wrote:
>
> > On Fri, 18 Aug 2023 at 17:11, Richard Biener  wrote:
> > >
> > > On Fri, 18 Aug 2023, Richard Biener wrote:
> > >
> > > > On Thu, 17 Aug 2023, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Tue, 15 Aug 2023 at 14:28, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Richard Biener  writes:
> > > > > > > On Mon, 14 Aug 2023, Prathamesh Kulkarni wrote:
> > > > > > >> On Mon, 7 Aug 2023 at 13:19, Richard Biener 
> > > > > > >>  wrote:
> > > > > > >> > It doesn't seem to make a difference for x86.  That said, the 
> > > > > > >> > "fix" is
> > > > > > >> > probably sticking the correct target on the dump-check, it 
> > > > > > >> > seems
> > > > > > >> > that vect_fold_extract_last is no longer correct here.
> > > > > > >> Um sorry, I did go thru various checks in target-supports.exp, 
> > > > > > >> but not
> > > > > > >> sure which one will be appropriate for this case,
> > > > > > >> and am stuck here :/ Could you please suggest how to proceed ?
> > > > > > >
> > > > > > > Maybe Richard S. knows the magic thing to test, he originally
> > > > > > > implemented the direct conversion support.  I suggest to implement
> > > > > > > such dg-checks if they are not present (I can't find them),
> > > > > > > possibly quite specific to the modes involved (like we have
> > > > > > > other checks with _qi_to_hi suffixes, for float modes maybe
> > > > > > > just _float).
> > > > > >
> > > > > > Yeah, can't remember specific selectors for that feature.  TBH I 
> > > > > > think
> > > > > > most (all?) of the tests were AArch64-specific.
> > > > > Hi,
> > > > > As Richi mentioned above, the test now vectorizes on AArch64 because
> > > > > it has support for direct conversion
> > > > > between vectors while x86 doesn't. IIUC this is because
> > > > > supportable_convert_operation returns true
> > > > > for V4HI -> V4SI on Aarch64 since it can use extend_v4hiv4si2 for
> > > > > doing the conversion ?
> > > > >
> > > > > In the attached patch, I added a new target check vect_extend which
> > > > > (currently) returns 1 only for aarch64*-*-*,
> > > > > which makes the test PASS on both the targets, altho I am not sure if
> > > > > this is entirely correct.
> > > > > Does the patch look OK ?
> > > >
> > > > Can you make vect_extend more specific, say vect_extend_hi_si or
> > > > what is specifically needed here?  Note I'll have to investigate
> > > > why x86 cannot vectorize here since in fact it does have
> > > > the extend operation ... it might be also worth splitting the
> > > > sign/zero extend case, so - vect_sign_extend_hi_si or
> > > > vect_extend_short_int?
> > >
> > > And now having analyzed _why_ x86 doesn't vectorize it's rather
> > > why we get this vectorized with NEON which is because
> > >
> > > static opt_machine_mode
> > > aarch64_vectorize_related_mode (machine_mode vector_mode,
> > > scalar_mode element_mode,
> > > poly_uint64 nunits)
> > > {
> > > ...
> > >   /* Prefer to use 1 128-bit vector instead of 2 64-bit vectors.  */
> > >   if (TARGET_SIMD
> > >   && (vec_flags & VEC_ADVSIMD)
> > >   && known_eq (nunits, 0U)
> > >   && known_eq (GET_MODE_BITSIZE (vector_mode), 64U)
> > >   && maybe_ge (GET_MODE_BITSIZE (element_mode)
> > >* GET_MODE_NUNITS (vector_mode), 128U))
> > > {
> > >   machine_mode res = aarch64_simd_container_mode (element_mode, 128);
> > >   if (VECTOR_MODE_P (res))
> > > return res;
> > >
> > > which makes us get a V4SImode vector for a V4HImode loop vector_mode.
> > Thanks for the explanation!
> > >
> > > So I think the appropriate effective dejagnu target is
> > > aarch64-*-* (there's none specifically to advsimd, not sure if one
> > > can disable that?)
> > The attached patch uses aarch64*-*-* target check, and additionally
> > for SVE (and other targets supporting vect_fold_extract_last) it
> > checks
> > if the condition reduction was carried out using FOLD_EXTRACT_LAST.
> > Does that look OK ?
>
> Works for me.
Thanks, committed to trunk in dd606dc7c7e49feb7a900902ec6d35b421789173

Thanks,
Prathamesh
>
> Richard.
>
> > Thanks,
> > Prathamesh
> > >
> >
> > > Richard.
> > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> >
>
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] Fix gcc.dg/vect/bb-slp-subgroups-2.c with 256bit vectors

2023-08-21 Thread Richard Biener via Gcc-patches
The following adds vect128, vect256 and vect512 effective targets
and adjusts gcc.dg/vect/bb-slp-subgroups-2.c accordingly.

Tested on x86_64-unknown-linux-gnu, pushed.

* gcc.dg/vect/bb-slp-subgroups-2.c: Properly handle the
vect256 case.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c |  3 ++-
 gcc/testsuite/lib/target-supports.exp  | 18 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
index ead8d92f202..9431bcb9d5c 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c
@@ -39,4 +39,5 @@ main (int argc, char **argv)
 }
 
 /* { dg-final { scan-tree-dump-times "Basic block will be vectorized using SLP" 1 "slp2" } } */
-/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! vect256 } } } } */
+/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { target { vect256 } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5b5f8655184..d4623ee6b45 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8634,6 +8634,24 @@ proc check_effective_target_vect_variable_length { } {
 return [expr { [lindex [available_vector_sizes] 0] == 0 }]
 }
 
+# Return 1 if the target supports vectors of 512 bits.
+
+proc check_effective_target_vect512 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 512] >= 0 }]
+}
+
+# Return 1 if the target supports vectors of 256 bits.
+
+proc check_effective_target_vect256 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 256] >= 0 }]
+}
+
+# Return 1 if the target supports vectors of 128 bits.
+
+proc check_effective_target_vect128 { } {
+return [expr { [lsearch -exact [available_vector_sizes] 128] >= 0 }]
+}
+
 # Return 1 if the target supports vectors of 64 bits.
 
 proc check_effective_target_vect64 { } {
-- 
2.35.3
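
For readers less used to Tcl, the new effective-target procs in the patch
above amount to a membership test on the list of available vector sizes. A
rough C++ rendering follows; has_vector_size and expected_bb_count are
hypothetical names, and the sizes vector merely stands in for whatever
available_vector_sizes returns on a given target.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mirrors "[lsearch -exact [available_vector_sizes] N] >= 0": true when
// the list of supported vector sizes (in bits) contains the given size.
bool has_vector_size(const std::vector<unsigned> &sizes, unsigned bits) {
  return std::find(sizes.begin(), sizes.end(), bits) != sizes.end();
}

// Mirrors the adjusted dg-final lines: the test expects one
// "optimized: basic block" message on vect256 targets and two otherwise.
int expected_bb_count(const std::vector<unsigned> &sizes) {
  return has_vector_size(sizes, 256) ? 1 : 2;
}
```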


[PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

2023-08-21 Thread Richard Biener via Gcc-patches
The following fixes the gcc.target/i386/pr87007-5.c testcase which
changed code generation again after the recent sinking improvements.
We now have

vxorps  %xmm0, %xmm0, %xmm0
vsqrtsd d2(%rip), %xmm0, %xmm0

and an unnecessary xor again in one case, the other vsqrtsd has
a register source and a properly zeroing load:

vmovsd  d3(%rip), %xmm0
testl   %esi, %esi
jg  .L11
.L3:
vsqrtsd %xmm0, %xmm0, %xmm0

the following patch XFAILs the scan.  I'm not sure what's at
fault here, there are no loops in the CFG, but somehow
r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
doesn't.  I guess I don't really understand what
remove_partial_avx_dependency is supposed to do so can't
really assess whether the pxor is necessary or not.

OK?

* gcc.target/i386/pr87007-5.c: Update comment, XFAIL
subtest.
---
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index a6cdf11522e..5902616d1f1 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
-/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
+/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
+   are sunk out of the loop and the loop is elided.  One vsqrtsd with
+   memory operand will need a xor to avoid partial dependence.  */
 
 #include
 
@@ -17,4 +19,4 @@ foo (int n, int k)
 
 /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
 /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.35.3


RE: RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-21 Thread Li, Pan2 via Gcc-patches
By design, HAS_FRM must be present if the insn honors FRM.
For example, if an insn doesn't honor FRM, there should be only one
declaration, as below.

static CONSTEXPR const binop vfmax_obj;

But if an insn honors FRM, there will be two declarations, as below, for code reuse.

static CONSTEXPR const binop vfsub_obj;
static CONSTEXPR const binop_frm vfadd_frm_obj;

Pan
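
As a rough illustration of the convention described above (not the actual
classes in riscv-vector-builtins-bases.cc, whose template arguments do not
appear in this mail), a single template can serve both forms through a
defaulted flag. The names frm_kind, binop_sketch and the *_sketch objects are
hypothetical.

```cpp
#include <cassert>

// Hypothetical flag: whether the intrinsic carries an explicit rounding mode.
enum frm_kind { NO_FRM, HAS_FRM };

// One implementation shared by both variants; the flag defaults to NO_FRM,
// so insns that ignore FRM need only the plain declaration.
template <frm_kind FRM = NO_FRM>
struct binop_sketch {
  // True when the intrinsic takes an explicit rounding-mode operand.
  constexpr bool has_rounding_mode_operand() const { return FRM == HAS_FRM; }
};

// Insn that does not honor FRM: a single object.
static constexpr binop_sketch<> vfmax_sketch{};

// Insn that honors FRM: the plain form plus the rounding-mode form,
// both reusing the same template.
static constexpr binop_sketch<> vfadd_sketch{};
static constexpr binop_sketch<HAS_FRM> vfadd_frm_sketch{};

static_assert(!vfadd_sketch.has_rounding_mode_operand(), "plain form");
static_assert(vfadd_frm_sketch.has_rounding_mode_operand(), "_rm form");
```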

From: juzhe.zh...@rivai.ai 
Sent: Monday, August 21, 2023 4:48 PM
To: Li, Pan2 ; gcc-patches 
Cc: Wang, Yanzhang ; kito.cheng 
Subject: Re: RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode 
intrinsic API

Yes. I wonder why some floating-point rounding mode declarations have HAS_FRM
and some don't?


juzhe.zh...@rivai.ai

From: Li, Pan2
Date: 2023-08-21 15:10
To: juzhe.zh...@rivai.ai; gcc-patches
CC: Wang, Yanzhang; kito.cheng
Subject: RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API
To double confirm, you mean this declaration?

+static CONSTEXPR const widen_freducop vfwredusum_frm_obj;

Pan

From: juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai>
Sent: Monday, August 21, 2023 2:40 PM
To: Li, Pan2 <pan2...@intel.com>; gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang <yanzhang.w...@intel.com>; kito.cheng <kito.ch...@gmail.com>
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

Why does this patch not have HAS_FRM?


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-08-17 16:05
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API
From: Pan Li <pan2...@intel.com>

This patch would like to support the rounding mode API for the
VFWREDUSUM.VS as the below samples

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
---
.../riscv/riscv-vector-builtins-bases.cc  |  2 ++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  1 +
.../riscv/rvv/base/float-point-wredusum.c | 33 +++
4 files changed, 37 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-wredusum.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index abf03bab0da..5ee7d3119db 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2548,6 +2548,7 @@ static CONSTEXPR const freducop 
vfredosum_frm_obj;
static CONSTEXPR const reducop vfredmax_obj;
static CONSTEXPR const reducop vfredmin_obj;
static CONSTEXPR const widen_freducop vfwredusum_obj;
+static CONSTEXPR const widen_freducop 
vfwredusum_frm_obj;
static CONSTEXPR const widen_freducop vfwredosum_obj;
static CONSTEXPR const widen_freducop 
vfwredosum_frm_obj;
static CONSTEXPR const vmv vmv_x_obj;
@@ -2810,6 +2811,7 @@ BASE (vfredmin)
BASE (vfwredosum)
BASE (vfwredosum_frm)
BASE (vfwredusum)
+BASE (vfwredusum_frm)
BASE (vmv_x)
BASE (vmv_s)
BASE (vfmv_f)
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index c1bb164a712..69d4562091f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.h
@@ -247,6 +247,7 @@ extern const function_base *const vfredmin;
extern const function_base *const vfwredosum;
extern const function_base *const vfwredosum_frm;
extern const function_base *const vfwredusum;
+extern const function_base *const vfwredusum_frm;
extern const function_base *const vmv_x;
extern const function_base *const vmv_s;
extern const function_base *const vfmv_f;
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def b/gcc/config/riscv/riscv-vector-builtins-functions.def
index da1157f5a56..3ce06dc60b7 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -508,6 +508,7 @@ DEF_RVV_FUNCTION (vfwredosum, reduc_alu, no_mu_preds, wf_vs_ops)
DEF_RVV_FUNCTION (vfwredusum, reduc_alu,

Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches
 wrote:
>
> The following fixes the gcc.target/i386/pr87007-5.c testcase which
> changed code generation again after the recent sinking improvements.
> We now have
>
> vxorps  %xmm0, %xmm0, %xmm0
> vsqrtsd d2(%rip), %xmm0, %xmm0
>
> and an unnecessary xor again in one case, the other vsqrtsd has
> a register source and a properly zeroing load:
>
> vmovsd  d3(%rip), %xmm0
> testl   %esi, %esi
> jg  .L11
> .L3:
> vsqrtsd %xmm0, %xmm0, %xmm0
>
> the following patch XFAILs the scan.  I'm not sure what's at
> fault here, there are no loops in the CFG, but somehow
> r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> doesn't.  I guess I don't really understand what
> remove_partial_avx_dependency is supposed to do so can't
> really assess whether the pxor is necessary or not.
There's a false dependency on xmm0 when the source operand in the
pattern is memory: the pattern only takes xmm0 as dest, but the output
instruction also takes xmm0 as input (the second source operand);
that's why we need a pxor here.
When the source operand in the pattern is a register_operand, we can
reuse that register for the second source operand. The
instructions here are not very obvious; a more representative one
would be vsqrtsd %xmm1, %xmm1 (reused one), %xmm0.
>
> OK?
Can we add -fno-XXX to disable the optimization to make the assembly
more stable?
Or, if the current codegen is already optimal (for the sinking), the patch is OK.

>
> * gcc.target/i386/pr87007-5.c: Update comment, XFAIL
> subtest.
> ---
>  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
> b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index a6cdf11522e..5902616d1f1 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" 
> } */
> -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to 
> avoid partial dependence.  */
> +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> +   memory operand will need a xor to avoid partial dependence.  */
>
>  #include
>
> @@ -17,4 +19,4 @@ foo (int n, int k)
>
>  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
>  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> --
> 2.35.3



-- 
BR,
Hongtao


Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 8:40 PM Hongtao Liu  wrote:
>
> On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches wrote:
> >
> > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > changed code generation again after the recent sinking improvements.
> > We now have
> >
> > vxorps  %xmm0, %xmm0, %xmm0
> > vsqrtsd d2(%rip), %xmm0, %xmm0
> >
> > and an unnecessary xor again in one case, the other vsqrtsd has
> > a register source and a properly zeroing load:
> >
> > vmovsd  d3(%rip), %xmm0
> > testl   %esi, %esi
> > jg  .L11
> > .L3:
> > vsqrtsd %xmm0, %xmm0, %xmm0
> >
> > the following patch XFAILs the scan.  I'm not sure what's at
> > fault here, there are no loops in the CFG, but somehow
> > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > doesn't.  I guess I don't really understand what
> > remove_partial_avx_dependency is supposed to do so can't
> > really assess whether the pxor is necessary or not.
> There's a false dependency on xmm0 when the source operand in the
> pattern is memory: the pattern only takes xmm0 as dest, but the output
> instruction also takes xmm0 as input (the second source operand),
> which is why we need a pxor here.
> When the source operand in the pattern is a register_operand, we can
> reuse that register for the second source operand. The instructions
> here are not very illustrative; a more representative one would be
> vsqrtsd %xmm1, %xmm1 (the reused one), %xmm0.
And there's no false dependence here.
> >
> > OK?
> Can we add -fno-XXX to disable the optimization to make the assembly
> more stable?
> Or, if the current codegen is already optimal (for the sinking), the patch is OK.
>
> >
> > * gcc.target/i386/pr87007-5.c: Update comment, XFAIL
> > subtest.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
> > b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > index a6cdf11522e..5902616d1f1 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> > @@ -1,6 +1,8 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> > -fno-tree-vectorize -fdump-tree-cddce3-details 
> > -fdump-tree-lsplit-optimized" } */
> > -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to 
> > avoid partial dependence.  */
> > +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> > +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> > +   memory operand will need a xor to avoid partial dependence.  */
> >
> >  #include
> >
> > @@ -17,4 +19,4 @@ foo (int n, int k)
> >
> >  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
> >  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> > -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> > +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> > --
> > 2.35.3
>
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, 21 Aug 2023, Hongtao Liu wrote:

> On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches wrote:
> >
> > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > changed code generation again after the recent sinking improvements.
> > We now have
> >
> > vxorps  %xmm0, %xmm0, %xmm0
> > vsqrtsd d2(%rip), %xmm0, %xmm0
> >
> > and an unnecessary xor again in one case, the other vsqrtsd has
> > a register source and a properly zeroing load:
> >
> > vmovsd  d3(%rip), %xmm0
> > testl   %esi, %esi
> > jg  .L11
> > .L3:
> > vsqrtsd %xmm0, %xmm0, %xmm0
> >
> > the following patch XFAILs the scan.  I'm not sure what's at
> > fault here, there are no loops in the CFG, but somehow
> > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > doesn't.  I guess I don't really understand what
> > remove_partial_avx_dependency is supposed to do so can't
> > really assess whether the pxor is necessary or not.
> There's a false dependency on xmm0 when the source operand in the
> pattern is memory: the pattern only takes xmm0 as dest, but the output
> instruction also takes xmm0 as input (the second source operand),
> which is why we need a pxor here.

OK, so XFAIL is wrong, we should instead scan for one xorps then
(like it was in the past).

> When the source operand in the pattern is a register_operand, we can
> reuse that register for the second source operand. The instructions
> here are not very illustrative; a more representative one would be
> vsqrtsd %xmm1, %xmm1 (the reused one), %xmm0.
> >
> > OK?
> Can we add -fno-XXX to disable the optimization to make the assembly
> more stable?

Not sure, we could feed GIMPLE IR to RTL expansion instead of
feeding a complex testcase through the pipeline, but I'm not sure
what we were originally supposed to test (the PR trail is a bit
large).

> Or, if the current codegen is already optimal (for the sinking), the patch is OK.

So like the following (I've just adjusted the comments to reflect that
the pxor is necessary).

OK?

Richard.

From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Mon, 21 Aug 2023 14:09:48 +0200
Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
To: gcc-patches@gcc.gnu.org

The following fixes the gcc.target/i386/pr87007-5.c testcase which
changed code generation again after the recent sinking improvements.
We now have

vxorps  %xmm0, %xmm0, %xmm0
vsqrtsd d2(%rip), %xmm0, %xmm0

and a necessary xor again in one case, the other vsqrtsd has
a register source and a properly zeroing load:

vmovsd  d3(%rip), %xmm0
testl   %esi, %esi
jg  .L11
.L3:
vsqrtsd %xmm0, %xmm0, %xmm0

the following patch adjusts the scan.

* gcc.target/i386/pr87007-5.c: Update comment, adjust subtest.
---
 gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c
index a6cdf11522e..8f2dc947f6c 100644
--- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
@@ -1,6 +1,8 @@
 /* { dg-do compile } */
 /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */
-/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to avoid partial dependence.  */
+/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
+   are sunk out of the loop and the loop is elided.  One vsqrtsd with
+   memory operand needs a xor to avoid partial dependence.  */
 
 #include
 
@@ -17,4 +19,4 @@ foo (int n, int k)
 
 /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
 /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
-/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
-- 
2.35.3



Re: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c

2023-08-21 Thread Hongtao Liu via Gcc-patches
On Mon, Aug 21, 2023 at 8:59 PM Richard Biener  wrote:
>
> On Mon, 21 Aug 2023, Hongtao Liu wrote:
>
> > On Mon, Aug 21, 2023 at 8:25 PM Richard Biener via Gcc-patches wrote:
> > >
> > > The following fixes the gcc.target/i386/pr87007-5.c testcase which
> > > changed code generation again after the recent sinking improvements.
> > > We now have
> > >
> > > vxorps  %xmm0, %xmm0, %xmm0
> > > vsqrtsd d2(%rip), %xmm0, %xmm0
> > >
> > > and an unnecessary xor again in one case, the other vsqrtsd has
> > > a register source and a properly zeroing load:
> > >
> > > vmovsd  d3(%rip), %xmm0
> > > testl   %esi, %esi
> > > jg  .L11
> > > .L3:
> > > vsqrtsd %xmm0, %xmm0, %xmm0
> > >
> > > the following patch XFAILs the scan.  I'm not sure what's at
> > > fault here, there are no loops in the CFG, but somehow
> > > r84:DF=sqrt(['d2']) gets a pxor but r84:DF=sqrt(r83:DF)
> > > doesn't.  I guess I don't really understand what
> > > remove_partial_avx_dependency is supposed to do so can't
> > > really assess whether the pxor is necessary or not.
> > There's a false dependency on xmm0 when the source operand in the
> > pattern is memory: the pattern only takes xmm0 as dest, but the output
> > instruction also takes xmm0 as input (the second source operand),
> > which is why we need a pxor here.
>
> OK, so XFAIL is wrong, we should instead scan for one xorps then
> (like it was in the past).
>
> > When the source operand in the pattern is a register_operand, we can
> > reuse that register for the second source operand. The instructions
> > here are not very illustrative; a more representative one would be
> > vsqrtsd %xmm1, %xmm1 (the reused one), %xmm0.
> > >
> > > OK?
> > Can we add -fno-XXX to disable the optimization to make the assembly
> > more stable?
>
> Not sure, we could feed GIMPLE IR to RTL expansion instead of
> feeding a complex testcase through the pipeline, but I'm not sure
> what we were originally supposed to test (the PR trail is a bit
> large).
>
> > Or, if the current codegen is already optimal (for the sinking), the
> > patch is OK.
>
> So like the following (I've just adjusted the comments to reflect the
> pxor is necessary).
>
> OK?
OK.
>
> Richard.
>
> From 7bed9399ae736c20a677ccf7e7fc4d2751a32327 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Mon, 21 Aug 2023 14:09:48 +0200
> Subject: [PATCH] Fix FAIL: gcc.target/i386/pr87007-5.c
> To: gcc-patches@gcc.gnu.org
>
> The following fixes the gcc.target/i386/pr87007-5.c testcase which
> changed code generation again after the recent sinking improvements.
> We now have
>
> vxorps  %xmm0, %xmm0, %xmm0
> vsqrtsd d2(%rip), %xmm0, %xmm0
>
> and a necessary xor again in one case, the other vsqrtsd has
> a register source and a properly zeroing load:
>
> vmovsd  d3(%rip), %xmm0
> testl   %esi, %esi
> jg  .L11
> .L3:
> vsqrtsd %xmm0, %xmm0, %xmm0
>
> the following patch adjusts the scan.
>
> * gcc.target/i386/pr87007-5.c: Update comment, adjust subtest.
> ---
>  gcc/testsuite/gcc.target/i386/pr87007-5.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c 
> b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> index a6cdf11522e..8f2dc947f6c 100644
> --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c
> +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c
> @@ -1,6 +1,8 @@
>  /* { dg-do compile } */
>  /* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse 
> -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" 
> } */
> -/* Load of d2/d3 is hoisted out, vrndscalesd will reuse loades register to 
> avoid partial dependence.  */
> +/* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt
> +   are sunk out of the loop and the loop is elided.  One vsqrtsd with
> +   memory operand needs a xor to avoid partial dependence.  */
>
>  #include
>
> @@ -17,4 +19,4 @@ foo (int n, int k)
>
>  /* { dg-final { scan-tree-dump "optimized: loop split" "lsplit" } } */
>  /* { dg-final { scan-tree-dump-times "removing loop" 2 "cddce3" } } */
> -/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 0 } } */
> +/* { dg-final { scan-assembler-times "vxorps\[^\n\r\]*xmm\[0-9\]" 1 } } */
> --
> 2.35.3
>


-- 
BR,
Hongtao


Re: [PATCH] RISC-V/testsuite: Add missing conversion tests.

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/18/23 13:32, Robin Dapp wrote:

Hi,

this patch adds some missing tests for vf[nw]cvt.

Regards
  Robin

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c:
Add tests.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h:
Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c:
Ditto.

OK.

jeff


Re: [PATCH 1/2] RISC-V: Add quotes to #error messages

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/17/23 21:52, Tsukasa OI wrote:

From: Tsukasa OI 

In commit 1aaf3a64e92a ("[PATCH] RISC-V: Deduplicate #error messages in
testsuite"), the author made a mistake to miss the test after adding
quotes around extension names.  To avoid future errors and for consistency
with other #error uses in the RISC-V testsuite, this commit quotes #error
messages where necessary to avoid current test case failures.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn.c: Quote #error messages.
* gcc.target/riscv/zvkn-1.c: Ditto.
* gcc.target/riscv/zvknc.c: Ditto.
* gcc.target/riscv/zvknc-1.c: Ditto.
* gcc.target/riscv/zvknc-2.c: Ditto.
* gcc.target/riscv/zvkng.c: Ditto.
* gcc.target/riscv/zvkng-1.c: Ditto.
* gcc.target/riscv/zvkng-2.c: Ditto.
* gcc.target/riscv/zvks.c: Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvksc.c: Ditto.
* gcc.target/riscv/zvksc-1.c: Ditto.
* gcc.target/riscv/zvksc-2.c: Ditto.
* gcc.target/riscv/zvksg.c: Ditto.
* gcc.target/riscv/zvksg-1.c: Ditto.
* gcc.target/riscv/zvksg-2.c: Ditto.

Thanks.  I've pushed both patches in this series to the trunk.
jeff


Re: [committed][RISC-V] Fix 20010221-1.c with zicond

2023-08-21 Thread Maciej W. Rozycki
On Tue, 8 Aug 2023, Jeff Law wrote:

> >   I wonder however why do we need so much more code, including the middle
> > end too, to support this ISA extension than we do for the very same set of
> > MIPSr6 instructions under ISA_HAS_SEL, hmm...
> Because it doesn't handle as many cases as we're handling in the RISC-V port.
> 
> I'd bet if you take Xiao's testcases and run them on a mips cross many, if not
> most, won't optimize down into the mips equivalents.

 Actually most do, except for the arithmetic ones, so good point here, 
thanks.  As this is in the middle end it would be good to expand coverage 
for the relevant non-RISC-V ports, perhaps in the same commit, or maybe 
better yet, by committing the middle-end piece separately, immediately 
followed by per-port individual testsuite updates.

 Of the non-arithmetic ones, interestingly enough, MOVN and SELEQZ are
correctly produced with MIPS IV and MIPS64r6 compilation respectively,
however MOVZ and SELNEZ are missed in a couple of cases in favour of a
branched sequence which, given the complete symmetry of the operations,
suggests a silly bug in the backend somewhere.

> One such example would be
> 
> (set (target)
>  (if_then_else (eq (reg A) (const_int 0))
>(reg A)
>(reg B)))
> 
> This is just one example obviously, but there are others.

 This I believe corresponds to `primitiveSemantics_06' and it works with 
the MIPS IV ISA producing MOVZ, but fails with the MIPS64r6 ISA where 
SELNEZ is expected.  Since the MIPS64r6 machine description pretends it 
has a full conditional-move machine operation and emulates it via an RTL 
expander with SELEQZ and SELNEZ combined with OR as required I still think 
this particular expression is supposed to work with our tree without the 
changes and it's probably due to a bug in the backend too, possibly one 
considered in the previous paragraph.
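
For readers less familiar with the MIPSr6 select instructions, the emulation mentioned above can be modelled in a few lines of C++. This is only a sketch of the instruction semantics; the helper names `cmove` and `eq_zero_select` are assumed for illustration and are not names from the GCC sources.

```cpp
#include <cassert>
#include <cstdint>

// SELEQZ rd, rs, rt: rd = rs if rt is zero, otherwise 0.
constexpr std::uint64_t seleqz(std::uint64_t rs, std::uint64_t rt) {
  return rt == 0 ? rs : 0;
}

// SELNEZ rd, rs, rt: rd = rs if rt is non-zero, otherwise 0.
constexpr std::uint64_t selnez(std::uint64_t rs, std::uint64_t rt) {
  return rt != 0 ? rs : 0;
}

// Full conditional move "cond != 0 ? a : b", emulated as
// SELNEZ + SELEQZ combined with OR (exactly one side is non-zeroed).
constexpr std::uint64_t cmove(std::uint64_t cond, std::uint64_t a,
                              std::uint64_t b) {
  return selnez(a, cond) | seleqz(b, cond);
}

// The quoted RTL, (if_then_else (eq A 0) A B): when A is zero the
// result is A, i.e. 0, so a single SELNEZ of B on A is sufficient.
constexpr std::uint64_t eq_zero_select(std::uint64_t a, std::uint64_t b) {
  return selnez(b, a);
}

static_assert(cmove(1, 7, 9) == 7, "condition true selects a");
static_assert(eq_zero_select(0, 9) == 0, "A == 0 yields A");
```

The last helper shows why a lone SELNEZ is the expected output for that expression: the "then" arm is known to be zero whenever it is selected.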

 To double-check the plausibility of the hypothesis I have then tried Xiao
Zeng's proposed patch, but it has caused an ICE:

during RTL pass: ce1
src/gcc/gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c: In function 'conditionalArithmetic_compare_reg_return_imm_reg_08':
src/gcc/gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c:85:1: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.cc:1022
   85 | }
  | ^

As this was with GCC 12.0.1 20220404 (one I had handy, so quick to check) 
I chose to retry with the top of the tree, i.e. 14.0.0 20230820.  But the 
ICE is still there:

during RTL pass: ce1
src/gcc/gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c: In function 'conditionalArithmetic_compare_reg_return_imm_reg_08':
src/gcc/gcc/testsuite/gcc.target/riscv/zicond-conditionalArithmetic_compare_reg_return_imm_reg.c:85:1: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.cc:1031
   85 | }
  | ^

And furthermore many of the test cases do not produce any of the
conditional moves anymore (whether with or without Xiao Zeng's patch).
This is with a `mips64-linux-gnu', `--with-abi=64' cross-compiler and 
compilations made with `-mips4' and `-mips64r6' as appropriate.

 E.g. with GCC 12:

$ grep -c movn zicond-*-mips4.s
zicond-conditionalArithmetic_compare_0_return_imm_reg-mips4.s:0
zicond-conditionalArithmetic_compare_0_return_reg_reg-mips4.s:0
zicond-conditionalArithmetic_compare_imm_return_imm_reg-mips4.s:0
zicond-conditionalArithmetic_compare_imm_return_reg_reg-mips4.s:0
zicond-conditionalArithmetic_compare_reg_return_imm_reg-mips4.s:0
zicond-conditionalArithmetic_compare_reg_return_reg_reg-mips4.s:0
zicond-primitiveSemantics-mips4.s:6
zicond-primitiveSemantics_compare_imm-mips4.s:6
zicond-primitiveSemantics_compare_imm_return_0_imm-mips4.s:6
zicond-primitiveSemantics_compare_imm_return_imm_imm-mips4.s:6
zicond-primitiveSemantics_compare_imm_return_imm_reg-mips4.s:6
zicond-primitiveSemantics_compare_imm_return_reg_reg-mips4.s:6
zicond-primitiveSemantics_compare_reg-mips4.s:6
zicond-primitiveSemantics_compare_reg_return_0_imm-mips4.s:6
zicond-primitiveSemantics_compare_reg_return_imm_imm-mips4.s:6
zicond-primitiveSemantics_compare_reg_return_imm_reg-mips4.s:6
zicond-primitiveSemantics_compare_reg_return_reg_reg-mips4.s:6
zicond-primitiveSemantics_return_0_imm-mips4.s:6
zicond-primitiveSemantics_return_imm_imm-mips4.s:6
zicond-primitiveSemantics_return_imm_reg-mips4.s:6
zicond-primitiveSemantics_return_reg_reg-mips4.s:6
$ grep -c seleqz zicond-*-mips64r6.s
zicond-conditionalArithmetic_compare_0_return_imm_reg-mips64r6.s:0
zicond-conditionalArithmetic_compare_0_return_reg_reg-mips64r6.s:0
zicond-conditionalArithmetic_compare_imm_return_imm_reg-mips64r6.s:0
zicond-conditionalArithmetic_compare_imm_return_reg_reg-mips64r6.s:0
zicond-conditionalArithmetic_compare_reg_return_imm_reg-mips64r6.s:0
zicond-conditionalArithmetic_com

Re: RISC-V: Added support for CRC.

2023-08-21 Thread Mariam Harutyunyan via Gcc-patches
Thank you for the review.
I'm already working on suggested changes.
The answers to the few questions are attached here.

Thanks,
Mariam

On Wed, Aug 16, 2023 at 8:59 AM Jeff Law  wrote:

>
>
> On 8/3/23 13:37, Mariam Harutyunyan via Gcc-patches wrote:
> > This patch adds CRC support for the RISC-V architecture. It adds internal
> > functions and built-ins specifically designed to handle CRC computations
> > efficiently.
> >
> > If the target is ZBC, the clmul instruction is used for the CRC code
> > generation; otherwise, table-based CRC is generated.  A table with 256
> > elements is used to store precomputed CRCs.
> >
> > These CRC calculation algorithms have higher performance than the naive
> > CRC calculation algorithm.
> [ ... ]
> Various comments attached.
>


reply-to-comments
Description: Binary data


Re: [PATCH V5] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-21 Thread Richard Biener via Gcc-patches
On Mon, 21 Aug 2023, Juzhe-Zhong wrote:

> Co-Authored-By: Kewen.Lin 
> 
> Hi, @Richi and @Richard, based on the previous discussion, I simply fixed
> the issues for powerpc and s390 with your suggestions:
> 
> -  machine_mode len_load_mode = get_len_load_store_mode
> -(loop_vinfo->vector_mode, true).require ();
> -  machine_mode len_store_mode = get_len_load_store_mode
> -(loop_vinfo->vector_mode, false).require ();
> +  machine_mode len_load_mode, len_store_mode;
> +  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
> +.exists (&len_load_mode))
> +return false;
> +  if (!get_len_load_store_mode (loop_vinfo->vector_mode, false)
> +.exists (&len_store_mode))
> +return false;

LGTM.

Richard.
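
The difference between the two query styles in the hunk above can be sketched with a small stand-in type. `OptMode`, `Mode` and `get_len_mode` below are hypothetical simplifications written for this note, not GCC's actual `opt_mode` or `get_len_load_store_mode`; only the `require`/`exists` naming mirrors the quoted code.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

enum class Mode { QI, HI, SI };

// Hypothetical stand-in for an optional machine mode.
class OptMode {
  std::optional<Mode> mode_;

public:
  OptMode() = default;
  OptMode(Mode m) : mode_(m) {}

  // Asserting accessor: aborts when no mode is available.
  Mode require() const {
    if (!mode_)
      std::abort();
    return *mode_;
  }

  // Checking accessor: reports availability and stores the mode on success.
  bool exists(Mode *out) const {
    if (!mode_)
      return false;
    *out = *mode_;
    return true;
  }
};

// Hypothetical target query: an empty result means "not supported".
OptMode get_len_mode(bool supported) {
  return supported ? OptMode(Mode::QI) : OptMode();
}

// With exists (), an unsupported target makes verification fail
// gracefully instead of aborting the compiler as require () would.
bool verify_loop_lens(bool has_len_load, bool has_len_store) {
  Mode load_mode, store_mode;
  if (!get_len_mode(has_len_load).exists(&load_mode))
    return false;
  if (!get_len_mode(has_len_store).exists(&store_mode))
    return false;
  (void)load_mode;
  (void)store_mode;
  return true;
}
```

The point of the change is the early `return false`: a target lacking one of the two length-controlled operations simply opts out of this vectorization strategy.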

> Hi, @Kewen and @Stefan
> 
> Could you test this patch again ? Thanks.
> 
> Co-Authored-By: Kewen.Lin 
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_verify_loop_lens): Add exists check.
>   (vectorizable_live_operation): Add live vectorization for length loop 
> control.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/partial/live-1.c: New test.
>   * gcc.target/riscv/rvv/autovec/partial/live_run-1.c: New test.
> 
> ---
>  .../riscv/rvv/autovec/partial/live-1.c| 34 +++
>  .../riscv/rvv/autovec/partial/live_run-1.c| 35 
>  gcc/tree-vect-loop.cc | 89 ++-
>  3 files changed, 138 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
> new file mode 100644
> index 000..75fa2eba8cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include 
> +
> +#define EXTRACT_LAST(TYPE)                                                     \
> +  TYPE __attribute__ ((noinline, noclone))                                     \
> +  test_##TYPE (TYPE *x, int n, TYPE value)                                     \
> +  {                                                                            \
> +    TYPE last;                                                                 \
> +    for (int j = 0; j < n; ++j)                                                \
> +      {                                                                        \
> +        last = x[j];                                                           \
> +        x[j] = last * value;                                                   \
> +      }                                                                        \
> +    return last;                                                               \
> +  }
> +
> +#define TEST_ALL(T)                                                            \
> +  T (int8_t)                                                                   \
> +  T (int16_t)                                                                  \
> +  T (int32_t)                                                                  \
> +  T (int64_t)                                                                  \
> +  T (uint8_t)                                                                  \
> +  T (uint16_t)                                                                 \
> +  T (uint32_t)                                                                 \
> +  T (uint64_t)                                                                 \
> +  T (_Float16)                                                                 \
> +  T (float)                                                                    \
> +  T (double)
> +
> +TEST_ALL (EXTRACT_LAST)
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_EXTRACT" 10 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> new file mode 100644
> index 000..42913a112c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "live-1.c"
> +
> +#define N 107
> +#define OP 70
> +
> +#define TEST_LOOP(TYPE)  \
> +  {  \
> +TYPE a[N];   \
> +for (int i = 0; i < N; ++i)  \
> +  {  \
> + a[i] = i * 2 + (i % 3);   

[COMMITTED] [frange] Return false if nothing changed in union_nans().

2023-08-21 Thread Aldy Hernandez via Gcc-patches
When one operand is a known NAN, we always return TRUE from
union_nans(), even if no change occurred.  This patch fixes the
oversight.

gcc/ChangeLog:

* value-range.cc (frange::union_nans): Return false if nothing
changed.
(range_tests_floats): New test.
---
 gcc/value-range.cc | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 76f88d91046..60180c80e55 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -540,16 +540,26 @@ frange::union_nans (const frange &r)
 {
   gcc_checking_assert (known_isnan () || r.known_isnan ());
 
-  if (known_isnan ())
+  bool changed = false;
+  if (known_isnan () && m_kind != r.m_kind)
 {
   m_kind = r.m_kind;
   m_min = r.m_min;
   m_max = r.m_max;
+  changed = true;
 }
-  m_pos_nan |= r.m_pos_nan;
-  m_neg_nan |= r.m_neg_nan;
-  normalize_kind ();
-  return true;
+  if (m_pos_nan != r.m_pos_nan || m_neg_nan != r.m_neg_nan)
+{
+  m_pos_nan |= r.m_pos_nan;
+  m_neg_nan |= r.m_neg_nan;
+  changed = true;
+}
+  if (changed)
+{
+  normalize_kind ();
+  return true;
+}
+  return false;
 }
 
 bool
@@ -2715,6 +2725,22 @@ range_tests_nan ()
   ASSERT_TRUE (real_identical (&r, &r0.upper_bound ()));
   ASSERT_TRUE (!r0.signbit_p (signbit));
   ASSERT_TRUE (r0.maybe_isnan ());
+
+  // NAN U NAN shouldn't change anything.
+  r0.set_nan (float_type_node);
+  r1.set_nan (float_type_node);
+  ASSERT_FALSE (r0.union_ (r1));
+
+  // [3,5] NAN U NAN shouldn't change anything.
+  r0 = frange_float ("3", "5");
+  r1.set_nan (float_type_node);
+  ASSERT_FALSE (r0.union_ (r1));
+
+  // [3,5] U NAN *does* trigger a change.
+  r0 = frange_float ("3", "5");
+  r0.clear_nan ();
+  r1.set_nan (float_type_node);
+  ASSERT_TRUE (r0.union_ (r1));
 }
 
 static void
-- 
2.41.0
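
As a reading aid, the "return true only on change" contract that the patch restores can be shown on a much smaller model. The `NanBits` struct below is a hypothetical simplification that tracks only the two NaN sign flags; it is not `frange`, and it detects change by comparing the old and new state rather than reproducing the patch's exact conditions.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model: only the possible NaN signs.
struct NanBits {
  bool pos_nan = false;
  bool neg_nan = false;

  // Union in the NaN possibilities of OTHER.  Returns true only if
  // *this actually changed, which is what a fixed-point propagator
  // relies on to know when to stop iterating.
  bool union_with(const NanBits &other) {
    const bool new_pos = pos_nan || other.pos_nan;
    const bool new_neg = neg_nan || other.neg_nan;
    const bool changed = new_pos != pos_nan || new_neg != neg_nan;
    pos_nan = new_pos;
    neg_nan = new_neg;
    return changed;
  }
};
```

Returning true unconditionally is not incorrect for the result itself, but it tells callers that something changed when nothing did, which can trigger needless re-propagation.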



Re: [RFC PATCH v2 1/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/16/23 02:33, Philipp Tomsich wrote:



Could we use the underlying 'fence' instruction (unless the assembler
rejects the specific form that is needed) instead of the hex-insn?
Should this also check HAVE_AS_MARCH_ZIHINTPAUSE (which must then also
be added to configure.ac)?
It seems reasonable from an encoding standpoint.   But I haven't been 
able to convince the assembler to encode a 0 into the pred/succ fields.


Jeff
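
For reference, the relationship between PAUSE and FENCE at the encoding level can be checked with a few lines of C++. The field layout follows the base ISA's FENCE format (fm, pred, succ above rs1/funct3/rd/opcode); the helper `encode_fence` and the constant names are assumed for this sketch and are not part of GCC or binutils.

```cpp
#include <cassert>
#include <cstdint>

// Bits of the FENCE predecessor/successor sets, from high to low: I O R W.
constexpr std::uint32_t kFenceI = 8, kFenceO = 4, kFenceR = 2, kFenceW = 1;

// Encode FENCE with rs1 = rd = x0 and funct3 = 0.
// Layout: fm[31:28] pred[27:24] succ[23:20] ... opcode[6:0] = MISC-MEM (0x0F).
constexpr std::uint32_t encode_fence(std::uint32_t fm, std::uint32_t pred,
                                     std::uint32_t succ) {
  return (fm << 28) | (pred << 24) | (succ << 20) | 0x0Fu;
}

// PAUSE is the FENCE encoding with pred = W, succ = 0, fm = 0.
static_assert(encode_fence(0, kFenceW, 0) == 0x0100000Fu, "PAUSE");

// An ordinary "fence iorw, iorw" for comparison.
static_assert(encode_fence(0, kFenceI | kFenceO | kFenceR | kFenceW,
                           kFenceI | kFenceO | kFenceR | kFenceW)
                == 0x0FF0000Fu, "fence iorw, iorw");
```

The empty successor set is the part an assembler may refuse to accept in source form, which matches the difficulty described above and explains why a raw instruction word is a tempting fallback.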


Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-21 Thread Richard Sandiford via Gcc-patches
Juzhe-Zhong  writes:
> Hi, Richard and Richi.
>
> Currently, GCC supports COND_LEN_FMA for floating-point without -ffast-math;
> this is handled in tree-ssa-math-opts.cc.  However, GCC fails to support
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider the following case:
> #define TEST_TYPE(TYPE)                                                        \
>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,            \
>                                               TYPE *__restrict a,              \
>                                               TYPE *__restrict b, int n)       \
>   {                                                                            \
>     for (int i = 0; i < n; i++)                                                \
>       dst[i] -= a[i] * b[i];                                                   \
>   }
>
> #define TEST_ALL()                                                             \
>   TEST_TYPE (float)                                                            \
>
> TEST_ALL ()
>
> Gimple IR for RVV:
>
> ...
> _39 = -vect__8.14_26;
> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
> ...
>
> This is because of the following piece of code in tree-ssa-math-opts.cc:
>
>   if (len)
>     fma_stmt
>       = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>                                     addop, else_value, len, bias);
>   else if (cond)
>     fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>                                            op2, addop, else_value);
>   else
>     fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>   gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>   gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>                                                                use_stmt));
>   gsi_replace (&gsi, fma_stmt, true);
>   /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>      regardless of where the negation occurs.  */
>   gimple *orig_stmt = gsi_stmt (gsi);
>   if (fold_stmt (&gsi, follow_all_ssa_edges))
>     {
>       if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>         gcc_unreachable ();
>       update_stmt (gsi_stmt (gsi));
>     }
>
> 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.
>
> This patch supports folding the statement into:
>
> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);
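
The identity behind this fold can be checked on a scalar model. The sketch below assumes the usual reading of the conditional length functions (a lane is active when its index is below len + bias and its mask bit is set, inactive lanes take the else value); the helper names are invented for illustration and the code is not taken from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// Scalar model of a COND_LEN_FMA-style operation: a * b + c on active lanes.
Vec cond_len_fma(const std::vector<bool> &mask, const Vec &a, const Vec &b,
                 const Vec &c, const Vec &else_value, std::size_t len,
                 int bias) {
  Vec out(a.size());
  const long long active = static_cast<long long>(len) + bias;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool on = static_cast<long long>(i) < active && mask[i];
    out[i] = on ? std::fma(a[i], b[i], c[i]) : else_value[i];
  }
  return out;
}

// Same shape for FNMA: -(a * b) + c on active lanes.
Vec cond_len_fnma(const std::vector<bool> &mask, const Vec &a, const Vec &b,
                  const Vec &c, const Vec &else_value, std::size_t len,
                  int bias) {
  Vec out(a.size());
  const long long active = static_cast<long long>(len) + bias;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const bool on = static_cast<long long>(i) < active && mask[i];
    out[i] = on ? std::fma(-a[i], b[i], c[i]) : else_value[i];
  }
  return out;
}

// Element-wise negation, standing in for the separate NEGATE_EXPR.
Vec negate(const Vec &v) {
  Vec out(v.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = -v[i];
  return out;
}
```

Feeding `negate(b)` into `cond_len_fma` gives the same lanes as calling `cond_len_fnma` on `b` directly, which is why the negation can be absorbed into the internal function.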
>
> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
>
> Extend maximum num ops:
> -  static const unsigned int MAX_NUM_OPS = 5;
> +  static const unsigned int MAX_NUM_OPS = 7;
>
> Bootstrap and Regtest on X86 passed.
>
> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
>
> Testing on aarch64 is in progress.
>
> gcc/ChangeLog:
>
> * genmatch.cc (decision_tree::gen): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> * gimple-match-exports.cc (gimple_simplify): Ditto.
> (gimple_resimplify6): New function.
> (gimple_resimplify7): New function.
> (gimple_match_op::resimplify): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> (convert_conditional_op): Ditto.
> (build_call_internal): Ditto.
> (try_conditional_simplification): Ditto.
> (gimple_extract): Ditto.
> * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
> * internal-fn.cc (CASE): Ditto.
>
> ---
>  gcc/genmatch.cc |   2 +-
>  gcc/gimple-match-exports.cc | 124 ++--
>  gcc/gimple-match.h  |  19 +-
>  gcc/internal-fn.cc  |  11 ++--
>  4 files changed, 144 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f46d2e1520d..a1925a747a7 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec  &files, bool gimple)
>  }
>fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>  
> -  for (unsigned n = 1; n <= 5; ++n)
> +  for (unsigned n = 1; n <= 7; ++n)
>  {
>bool has_kids_p = false;
>  
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index 7aeb4ddb152..895950309b7 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree);
>  extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree, tree);
> +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
> +  code_helper, tree, tree, 

Re: [PATCH] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-21 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Wed, 16 Aug 2023, Juzhe-Zhong wrote:
>
>> Hi, Richard and Richi.
>> 
>> Currently, GCC supports COND_LEN_FMA for floating-point without -ffast-math.
>> It's supported in tree-ssa-math-opts.cc. However, GCC fails to support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>> 
>> Consider this following case:
>> #define TEST_TYPE(TYPE)                                                      \
>>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,         \
>>                                               TYPE *__restrict a,           \
>>                                               TYPE *__restrict b, int n)    \
>>   {                                                                          \
>>     for (int i = 0; i < n; i++)                                              \
>>       dst[i] -= a[i] * b[i];                                                 \
>>   }
>> 
>> #define TEST_ALL()                                                           \
>>   TEST_TYPE (float)                                                          \
>> 
>> TEST_ALL ()
>> 
>> Gimple IR for RVV:
>> 
>> ...
>> _39 = -vect__8.14_26;
>> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, 
>> vect__4.8_34, vect__4.8_34, _46, 0);
>> ...
>> 
>> This is because of the following piece of code in tree-ssa-math-opts.cc:
>> 
>>   if (len)
>>  fma_stmt
>>= gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>>  addop, else_value, len, bias);
>>   else if (cond)
>>  fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>> op2, addop, else_value);
>>   else
>>  fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>>   gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>>   gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>> use_stmt));
>>   gsi_replace (&gsi, fma_stmt, true);
>>   /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>>   regardless of where the negation occurs.  */
>>   gimple *orig_stmt = gsi_stmt (gsi);
>>   if (fold_stmt (&gsi, follow_all_ssa_edges))
>>  {
>>if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>>  gcc_unreachable ();
>>update_stmt (gsi_stmt (gsi));
>>  }
>> 
>> 'fold_stmt' fails to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.
>> 
>> This patch supports folding the STMT into:
>> 
>> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, 
>> vect__4.8_34, { 0.0, ... }, _46, 0);
>> 
>> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
>> 
>> Extend maximum num ops:
>> -  static const unsigned int MAX_NUM_OPS = 5;
>> +  static const unsigned int MAX_NUM_OPS = 7;
>> 
>> Bootstrap and Regtest on X86 passed.
>> 
>> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
>> 
>> Testing on aarch64 is in progress.
>> 
>> gcc/ChangeLog:
>> 
>> * genmatch.cc (decision_tree::gen): Support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
>> * gimple-match-exports.cc (gimple_simplify): Ditto.
>> (gimple_resimplify6): New function.
>> (gimple_resimplify7): New function.
>> (gimple_match_op::resimplify): Support 
>> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
>> (convert_conditional_op): Ditto.
>> (build_call_internal): Ditto.
>> (try_conditional_simplification): Ditto.
>> (gimple_extract): Ditto.
>> * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
>> * internal-fn.cc (CASE): Ditto.
>> 
>> ---
>>  gcc/genmatch.cc |   2 +-
>>  gcc/gimple-match-exports.cc | 124 ++--
>>  gcc/gimple-match.h  |  19 +-
>>  gcc/internal-fn.cc  |  11 ++--
>>  4 files changed, 144 insertions(+), 12 deletions(-)
>> 
>> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
>> index f46d2e1520d..a1925a747a7 100644
>> --- a/gcc/genmatch.cc
>> +++ b/gcc/genmatch.cc
>> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec  &files, bool gimple)
>>  }
>>fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>>  
>> -  for (unsigned n = 1; n <= 5; ++n)
>> +  for (unsigned n = 1; n <= 7; ++n)
>>  {
>>bool has_kids_p = false;
>>  
>> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
>> index 7aeb4ddb152..895950309b7 100644
>> --- a/gcc/gimple-match-exports.cc
>> +++ b/gcc/gimple-match-exports.cc
>> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, 
>> gimple_seq *, tree (*)(tree),
>>   code_helper, tree, tree, tree, tree, tree);
>>  extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree 
>> (*)(tree),
>>   code_he

Re: [PATCH v1] Mode-Switching: Add optional EMIT_AFTER hook

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/21/23 01:26, pan2...@intel.com wrote:

From: Pan Li 

We already have an EMIT hook in mode switching, which inserts the
insn before ptr->insn in most cases. However, some architectures such
as RISC-V require an additional insn to be inserted after it when a
call is encountered.
|
| <- EMIT HOOK, insert the insn before.
  +---+
  | ptr->insn |
  +---+
| <- EMIT_AFTER HOOK, insert the insn after.
|

Thus, this patch adds one optional EMIT_AFTER hook, which will try to
insert the emitted insn after ptr->insn. A target can either
implement this hook or leave it NULL.

If the backend ignores this optional hook, there is no impact on the
existing mode-switching behavior. If the backend implements this
optional hook, mode switching will try to insert the insn after. Please
note that EMIT_AFTER doesn't have any impact on the EMIT hook.

Passed both the regression and bootstrap test in x86.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* doc/tm.texi: Add hook def and update the description.
* doc/tm.texi.in: Ditto.
* mode-switching.cc (optimize_mode_switching): Insert the
emitted insn after ptr->insn.
* target.def (insn): Define emit_after hook.
Not a full review.  I think I need to know a bit more about why you need 
these additional hooks.


Presumably you can't use the current ".emit" hook because it doesn't 
give you access to the block or insn that you can then iterate on for 
insertion on the outgoing edges?





@@ -831,6 +833,49 @@ optimize_mode_switching (void)
emit_insn_before (mode_set, ptr->insn_ptr);
}
  
+		  if (targetm.mode_switching.emit_after)

+   {
+ if (control_flow_insn_p (ptr->insn_ptr)
+   && ptr->insn_ptr == BB_END (bb))
I'm not aware of a case where we can have an insn with control flow that 
isn't the end of the block.  So perhaps turn that second conditional 
into an assertion inside the true arm?




+   {
+ edge eg;
+ edge_iterator eg_iterator;
+
+ FOR_EACH_EDGE (eg, eg_iterator, bb->succs)
+   {
+ start_sequence ();
+ targetm.mode_switching.emit_after (entity_map[j],
+   ptr->mode, cur_mode, ptr->regs_live);
+ mode_set = get_insns ();
+ end_sequence ();
+
+ if (mode_set != NULL_RTX)
+   {
+ if (eg->flags & EDGE_ABNORMAL)
+   insert_insn_end_basic_block (mode_set, bb);
+ else
+   insert_insn_on_edge (mode_set, eg);
Is this really correct for EDGE_ABNORMAL?  If the abnormal edge is 
created by, say a nonlocal goto, exception handling, etc, then the insn 
you insert at the end of the block will never be executed.


This is a classic problem with these classes of algorithms and I suspect 
there's code elsewhere to deal with these cases.




Jeff


Re: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/17/23 20:53, Pan Li via Gcc-patches wrote:

From: Pan Li 

As suggested by kito, we will add new frm_opt_type template arg
to the op class, to avoid the duplicated function expand.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.
So in the expand method, you added a case for OP_TYPE_vx.  I assume that 
was intentional -- but it's not mentioned anywhere in the ChangeLog.  So 
please update the ChangeLog if it was intentional or remove the change 
if it wasn't intentional.  Pre-approved with whichever change is 
appropriate.


Thanks,
Jeff


Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/17/23 02:05, Pan Li via Gcc-patches wrote:

From: Pan Li 

This patch would like to support the rounding mode API for the
VFWREDUSUM.VS as the below samples

* __riscv_vfwredusum_vs_f32m1_f64m1_rm
* __riscv_vfwredusum_vs_f32m1_f64m1_rm_m

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc
(vfwredusum_frm_obj): New declaration.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def
(vfwredusum_frm): New intrinsic function def.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.

OK
jeff


Re: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

2023-08-21 Thread Kito Cheng via Gcc-patches
Just one nit from me: please add an assertion to the OP_TYPE_vx case to
make sure FRM_OP == HAS_FRM never occurs there.

On Mon, Aug 21, 2023 at 11:04 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/17/23 20:53, Pan Li via Gcc-patches wrote:
> > From: Pan Li 
> >
> > As suggested by kito, we will add new frm_opt_type template arg
> > to the op class, to avoid the duplicated function expand.
> >
> > Signed-off-by: Pan Li 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-vector-builtins-bases.cc
> >   (class binop_frm): Removed.
> >   (class reverse_binop_frm): Ditto.
> >   (class widen_binop_frm): Ditto.
> >   (class vfmacc_frm): Ditto.
> >   (class vfnmacc_frm): Ditto.
> >   (class vfmsac_frm): Ditto.
> >   (class vfnmsac_frm): Ditto.
> >   (class vfmadd_frm): Ditto.
> >   (class vfnmadd_frm): Ditto.
> >   (class vfmsub_frm): Ditto.
> >   (class vfnmsub_frm): Ditto.
> >   (class vfwmacc_frm): Ditto.
> >   (class vfwnmacc_frm): Ditto.
> >   (class vfwmsac_frm): Ditto.
> >   (class vfwnmsac_frm): Ditto.
> >   (class unop_frm): Ditto.
> >   (class vfrec7_frm): Ditto.
> >   (class binop): Add frm_op_type template arg.
> >   (class unop): Ditto.
> >   (class widen_binop): Ditto.
> >   (class widen_binop_fp): Ditto.
> >   (class reverse_binop): Ditto.
> >   (class vfmacc): Ditto.
> >   (class vfnmsac): Ditto.
> >   (class vfmadd): Ditto.
> >   (class vfnmsub): Ditto.
> >   (class vfnmacc): Ditto.
> >   (class vfmsac): Ditto.
> >   (class vfnmadd): Ditto.
> >   (class vfmsub): Ditto.
> >   (class vfwmacc): Ditto.
> >   (class vfwnmacc): Ditto.
> >   (class vfwmsac): Ditto.
> >   (class vfwnmsac): Ditto.
> >   (class float_misc): Ditto.
> So in the expand method, you added a case for OP_TYPE_vx.  I assume that
> was intentional -- but it's not mentioned anywhere in the ChangeLog.  So
> please update the ChangeLog if it was intentional or remove the change
> if it wasn't intentional.  Pre-approved with whichever change is
> appropriate.
>
> Thanks,
> Jeff


Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-21 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

thanks, this is a reasonable approach and improves readability noticeably.
LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
looked closely into the vsetvl pass before and cannot entirely review it
quickly.  As we already have good test coverage there is not much that
can go wrong IMHO.

Regards
 Robin


Patch ping Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-21 Thread Jakub Jelinek via Gcc-patches
Hi!

On Wed, Aug 09, 2023 at 08:14:14PM +0200, Jakub Jelinek via Gcc-patches wrote:
> Jakub Jelinek (12):
>   expr: Small optimization [PR102989]
>   lto-streamer-in: Adjust assert [PR102989]
>   phiopt: Fix phiopt ICE on vops [PR102989]
>   Middle-end _BitInt support [PR102989]
>   _BitInt lowering support [PR102989]
>   i386: Enable _BitInt on x86-64 [PR102989]
>   ubsan: _BitInt -fsanitize=undefined support [PR102989]
>   libgcc: Generated tables for _BitInt <-> _Decimal* conversions [PR102989]
>   libgcc _BitInt support [PR102989]
>   C _BitInt support [PR102989]
>   testsuite part 1 for _BitInt support [PR102989]
>   testsuite part 2 for _BitInt support [PR102989]

+   C _BitInt incremental fixes [PR102989]

I'd like to ping this patch series.
First 3 patches are committed, the rest awaits patch review.

Joseph, could I ask now at least for an overall design review of the
C patches (8-10,13) whether its interfaces with middle-end are ok,
so that Richi can review the middle-end parts?

Thanks.

Jakub



[PATCH] s390: Fix some builtin definitions

2023-08-21 Thread Stefan Schulze Frielinghaus via Gcc-patches
Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtins.def (s390_vec_signed_flt): Fix
builtin flag.
(s390_vec_unsigned_flt): Ditto.
(s390_vec_revb_flt): Ditto.
(s390_vec_reve_flt): Ditto.
(s390_vclfnhs): Fix operand flags.
(s390_vclfnls): Ditto.
(s390_vcrnfs): Ditto.
(s390_vcfn): Ditto.
(s390_vcnf): Ditto.
---
 gcc/config/s390/s390-builtins.def | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index c829f445a11..964d86c74a0 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -2846,12 +2846,12 @@ B_DEF  (s390_vcelfb,
floatunsv4siv4sf2,  0,
 B_DEF  (s390_vcdlgb,floatunsv2div2df2,  0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DF_UV2DI)
 
 OB_DEF (s390_vec_signed,
s390_vec_signed_flt,s390_vec_signed_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
-OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, 0, 
 B_VXE2, BT_OV_V4SI_V4SF)
+OB_DEF_VAR (s390_vec_signed_flt,s390_vcfeb, B_VXE2,
 0,  BT_OV_V4SI_V4SF)
 OB_DEF_VAR (s390_vec_signed_dbl,s390_vcgdb, 0, 
 0,  BT_OV_V2DI_V2DF)
 
 OB_DEF (s390_vec_unsigned,  
s390_vec_unsigned_flt,s390_vec_unsigned_dbl,B_VX,   BT_FN_OV4SI_OV4SI)
-OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,0,
B_VXE2, BT_OV_UV4SI_V4SF)
-OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0,
0,  BT_OV_UV2DI_V2DF)
+OB_DEF_VAR (s390_vec_unsigned_flt,  s390_vclfeb,B_VXE2,
 0,  BT_OV_UV4SI_V4SF)
+OB_DEF_VAR (s390_vec_unsigned_dbl,  s390_vclgdb,0, 
 0,  BT_OV_UV2DI_V2DF)
 
 B_DEF  (s390_vcfeb, fix_truncv4sfv4si2, 0, 
 B_VXE2, O2_U4 | O3_U3,  BT_FN_V4SI_V4SF)
 B_DEF  (s390_vcgdb, fix_truncv2dfv2di2, 0, 
 B_VX,   O2_U4 | O3_U3,  BT_FN_V2DI_V2DF)
@@ -2929,7 +2929,7 @@ OB_DEF_VAR (s390_vec_revb_s32,  s390_vlbrf,   
  0,
 OB_DEF_VAR (s390_vec_revb_u32,  s390_vlbrf, 0, 
 0,  BT_OV_UV4SI_UV4SI)
 OB_DEF_VAR (s390_vec_revb_s64,  s390_vlbrg, 0, 
 0,  BT_OV_V2DI_V2DI)
 OB_DEF_VAR (s390_vec_revb_u64,  s390_vlbrg, 0, 
 0,  BT_OV_UV2DI_UV2DI)
-OB_DEF_VAR (s390_vec_revb_flt,  s390_vlbrf_flt, 0, 
 B_VXE,  BT_OV_V4SF_V4SF)
+OB_DEF_VAR (s390_vec_revb_flt,  s390_vlbrf_flt, B_VXE, 
 0,  BT_OV_V4SF_V4SF)
 OB_DEF_VAR (s390_vec_revb_dbl,  s390_vlbrg_dbl, 0, 
 0,  BT_OV_V2DF_V2DF)
 
 B_DEF  (s390_vlbrh, bswapv8hi,  0, 
 B_VX,   0,   BT_FN_V8HI_V8HI)
@@ -2960,7 +2960,7 @@ OB_DEF_VAR (s390_vec_reve_u32,  s390_vlerf,   
  0,
 OB_DEF_VAR (s390_vec_reve_b64,  s390_vlerg, 0, 
 0,  BT_OV_BV2DI_BV2DI)
 OB_DEF_VAR (s390_vec_reve_s64,  s390_vlerg, 0, 
 0,  BT_OV_V2DI_V2DI)
 OB_DEF_VAR (s390_vec_reve_u64,  s390_vlerg, 0, 
 0,  BT_OV_UV2DI_UV2DI)
-OB_DEF_VAR (s390_vec_reve_flt,  s390_vlerf_flt, 0, 
 B_VXE,  BT_OV_V4SF_V4SF)
+OB_DEF_VAR (s390_vec_reve_flt,  s390_vlerf_flt, B_VXE, 
 0,  BT_OV_V4SF_V4SF)
 OB_DEF_VAR (s390_vec_reve_dbl,  s390_vlerg_dbl, 0, 
 0,  BT_OV_V2DF_V2DF)
 
 B_DEF  (s390_vlerb, eltswapv16qi,   0, 
 B_VX,   0,   BT_FN_V16QI_V16QI)
@@ -3037,10 +3037,10 @@ B_DEF  (s390_vstrszf,vstrszv4si,
0,
 
 /* arch 14 builtins */
 
-B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0, 
 B_NNPA, O3_U4,  BT_FN_V4SF_V8HI_UINT)
-B_DEF  (s390_vclfnls,vclfnls_v8hi,  0, 
 B_NNPA, O3_U4,  BT_FN_V4SF_V8HI_UINT)
+B_DEF  (s390_vclfnhs,vclfnhs_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_V8HI_UINT)
+B_DEF  (s390_vclfnls,vclfnls_v8hi,  0, 
 B_NNPA, O2_U4,  BT_FN_V4SF_V8HI_UINT)
 
-B_DEF  (s390_vcrnfs, vcrnfs_v8hi,   0,  

[PATCH] s390: Fix builtins vec_rli and verll

2023-08-21 Thread Stefan Schulze Frielinghaus via Gcc-patches
The second argument of these builtins is an unsigned immediate.  For
vec_rli the API allows immediates up to 64 bits whereas the instruction
verll only allows immediates up to 32 bits.  Since the shift count
equals the immediate modulo vector element size, truncating those
immediates is fine.

Bootstrapped and regtested on s390.  Ok for mainline?

gcc/ChangeLog:

* config/s390/s390-builtins.def (O_U64): New.
(O1_U64): Ditto.
(O2_U64): Ditto.
(O3_U64): Ditto.
(O4_U64): Ditto.
(O_M12): Change bit position.
(O_S2): Ditto.
(O_S3): Ditto.
(O_S4): Ditto.
(O_S5): Ditto.
(O_S8): Ditto.
(O_S12): Ditto.
(O_S16): Ditto.
(O_S32): Ditto.
(O_ELEM): Ditto.
(O_LIT): Ditto.
(OB_DEF_VAR): Add operand constraints.
(B_DEF): Ditto.
* config/s390/s390.cc (s390_const_operand_ok): Honour 64 bit
operands.
---
 gcc/config/s390/s390-builtins.def | 60 ++-
 gcc/config/s390/s390.cc   |  6 ++--
 2 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index a16983b18bd..c829f445a11 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -28,6 +28,7 @@
 #undef O_U12
 #undef O_U16
 #undef O_U32
+#undef O_U64
 
 #undef O_M12
 
@@ -88,6 +89,11 @@
 #undef O3_U32
 #undef O4_U32
 
+#undef O1_U64
+#undef O2_U64
+#undef O3_U64
+#undef O4_U64
+
 #undef O1_M12
 #undef O2_M12
 #undef O3_M12
@@ -157,20 +163,21 @@
 #define O_U12    7 /* unsigned 16 bit literal */
 #define O_U16    8 /* unsigned 16 bit literal */
 #define O_U32    9 /* unsigned 32 bit literal */
+#define O_U64   10 /* unsigned 64 bit literal */
 
-#define O_M12   10 /* matches bitmask of 12 */
+#define O_M12   11 /* matches bitmask of 12 */
 
-#define O_S2    11 /* signed  2 bit literal */
-#define O_S3    12 /* signed  3 bit literal */
-#define O_S4    13 /* signed  4 bit literal */
-#define O_S5    14 /* signed  5 bit literal */
-#define O_S8    15 /* signed  8 bit literal */
-#define O_S12   16 /* signed 12 bit literal */
-#define O_S16   17 /* signed 16 bit literal */
-#define O_S32   18 /* signed 32 bit literal */
+#define O_S2    12 /* signed  2 bit literal */
+#define O_S3    13 /* signed  3 bit literal */
+#define O_S4    14 /* signed  4 bit literal */
+#define O_S5    15 /* signed  5 bit literal */
+#define O_S8    16 /* signed  8 bit literal */
+#define O_S12   17 /* signed 12 bit literal */
+#define O_S16   18 /* signed 16 bit literal */
+#define O_S32   19 /* signed 32 bit literal */
 
-#define O_ELEM  19 /* Element selector requiring modulo arithmetic. */
-#define O_LIT   20 /* Operand must be a literal fitting the target type.  */
+#define O_ELEM  20 /* Element selector requiring modulo arithmetic. */
+#define O_LIT   21 /* Operand must be a literal fitting the target type.  */
 
 #define O_SHIFT 5
 
@@ -223,6 +230,11 @@
 #define O3_U32 (O_U32 << (2 * O_SHIFT))
 #define O4_U32 (O_U32 << (3 * O_SHIFT))
 
+#define O1_U64 O_U64
+#define O2_U64 (O_U64 << O_SHIFT)
+#define O3_U64 (O_U64 << (2 * O_SHIFT))
+#define O4_U64 (O_U64 << (3 * O_SHIFT))
+
 #define O1_M12 O_M12
 #define O2_M12 (O_M12 << O_SHIFT)
 #define O3_M12 (O_M12 << (2 * O_SHIFT))
@@ -1989,19 +2001,19 @@ B_DEF  (s390_verllvf,   vrotlv4si3, 
0,
 B_DEF  (s390_verllvg,   vrotlv2di3, 0, 
 B_VX,   0,  BT_FN_UV2DI_UV2DI_UV2DI)
 
 OB_DEF (s390_vec_rli,   s390_vec_rli_u8,s390_vec_rli_s64,  
 B_VX,   BT_FN_OV4SI_OV4SI_ULONG)
-OB_DEF_VAR (s390_vec_rli_u8,s390_verllb,0, 
 0,  BT_OV_UV16QI_UV16QI_ULONG)
-OB_DEF_VAR (s390_vec_rli_s8,s390_verllb,0, 
 0,  BT_OV_V16QI_V16QI_ULONG)
-OB_DEF_VAR (s390_vec_rli_u16,   s390_verllh,0, 
 0,  BT_OV_UV8HI_UV8HI_ULONG)
-OB_DEF_VAR (s390_vec_rli_s16,   s390_verllh,0, 
 0,  BT_OV_V8HI_V8HI_ULONG)
-OB_DEF_VAR (s390_vec_rli_u32,   s390_verllf,0, 
 0,  BT_OV_UV4SI_UV4SI_ULONG)
-OB_DEF_VAR (s390_vec_rli_s32,   s390_verllf,0, 
 0,  BT_OV_V4SI_V4SI_ULONG)
-OB_DEF_VAR (s390_vec_rli_u64,   s390_verllg,0, 
 0,  BT_OV_UV2DI_UV2DI_ULONG)
-OB_DEF_VAR (s390_vec_rli_s64,   s390_verllg,0, 
 0,  BT_OV_V2DI_V2DI_ULONG)
-
-B_DEF  (s390_verllb,rotlv16qi3, 0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI_UINT)
-B_DEF  (s390_verllh,rotlv8hi3,  0, 
 B_VX,   0,  BT_FN_UV8HI_UV

Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-21 Thread Kito Cheng via Gcc-patches
I think I could do some details review tomorrow on the plane, I am free
from the meeting hell tomorrow :p



Robin Dapp via Gcc-patches  wrote on Mon, Aug 21, 2023 at 23:24:

> Hi Juzhe,
>
> thanks, this is a reasonable approach and improves readability noticeably.
> LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
> looked closely into the vsetvl pass before and cannot entirely review it
> quickly.  As we already have good test coverage there is not much that
> can go wrong IMHO.
>
> Regards
>  Robin
>


RE: [PING][PATCH] arm: Remove unsigned variant of vcaddq_m

2023-08-21 Thread Kyrylo Tkachov via Gcc-patches
Ok.
Thanks,
Kyrill

From: Stam Markianos-Wright  
Sent: Saturday, August 19, 2023 12:42 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw 

Subject: [PING][PATCH] arm: Remove unsigned variant of vcaddq_m



(Pinging since I realised that this is required for my later Low Overhead Loop 
patch series to work)

Ok for trunk with the updated changelog that Christophe mentioned?

Thanks,
Stamatis/Stam Markianos-Wright 


From: Stam Markianos-Wright
Sent: Tuesday, August 1, 2023 6:21 PM
To: mailto:gcc-patches@gcc.gnu.org 
Cc: Richard Earnshaw ; Kyrylo Tkachov 

Subject: arm: Remove unsigned variant of vcaddq_m 
 
Hi all,

The unsigned variants of the vcaddq_m operation are not needed within the
compiler, as the assembly output of the signed and unsigned versions of the
ops is identical: with a `.i` suffix (as opposed to separate `.s` and `.u`
suffixes).

Tested with baremetal arm-none-eabi on Arm's fastmodels.

Ok for trunk?

Thanks,
Stamatis Markianos-Wright

gcc/ChangeLog:

     * config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270):
       Use common insn for signed and unsigned front-end definitions.
     * config/arm/arm_mve_builtins.def
       (vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
       (vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
     * config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
       (isu): Likewise.
       (rot): Likewise.
       (mve_rot): Likewise.
       (supf): Likewise.
       (VxCADDQ_M): Likewise.
     * config/arm/unspecs.md (unspec): Likewise.
---
  gcc/config/arm/arm-mve-builtins-base.cc |  4 ++--
  gcc/config/arm/arm_mve_builtins.def |  6 ++---
  gcc/config/arm/iterators.md | 30 +++--
  gcc/config/arm/mve.md   |  4 ++--
  gcc/config/arm/unspecs.md   |  6 ++---
  5 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc 
b/gcc/config/arm/arm-mve-builtins-base.cc
index e31095ae112..426a87e9852 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -260,8 +260,8 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
  FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
  FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
-FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S, 
VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
-FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S, 
VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M, 
VCADDQ_ROT90_M, VCADDQ_ROT90_M_F))
+FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot, 
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M, 
VCADDQ_ROT270_M, VCADDQ_ROT270_M_F))
  FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F))
  FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F))
  FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1, 
UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F))
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 43dacc3dda1..6ac1812c697 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -523,8 +523,8 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, 
vhsubq_m_n_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi, 
v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi, 
v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi, 
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_, v16qi, 
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_, v16qi, 
v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si)
@@ -587,8 +587,6 @@ VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, 
vhcaddq_rot270_m_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi, 
v4

[OpenMP/offloading][RFC] How to handle target/device-specifics with C pre-processor (in general, inside 'omp declare variant')

2023-08-21 Thread Tobias Burnus

RFC: any idea how best to handle this in GCC? See the two examples
below for what we would like to support.

* * *

In GCC, we handle OpenMP (and OpenACC) by parsing the input file once,
produce an internal representation (in LTO format) for offloading code
and only at link time process it by passing it via the LTO wrapper to
the offloading-device compilers (mkoffload / device lto1).
See https://gcc.gnu.org/wiki/Offloading

This works okayish, even though it causes some issues, such as with
metadirectives (they are implemented on the OG13 branch, however),
and with declare variant or a nohost version, where getting rid of
the host version is not that easy as it has to be in there until
omp-offload.cc's functions are run, which comes rather late.

There are currently already some issues like with -ffast-math
and GLIBC's finite math functions, which are not available
on the device side when using newlib's libm.
(However, GLIBC has removed those.)

Likewise, it would be nice to do like Clang+LLVM does: Auto-enable
some device-specific math functions. (Albeit that won't work well
with Fortran.)


However, with OpenMP 5.1, there is a real issue. In 5.1, Appendix B
it reads as:
"For C/C++, the declare variant directive was extended to support elision
of preprocessed code and to allow enclosed function definitions to be
interpreted as variant functions (see Section 7.5)."

The problem is the "elision of preprocessed" as it permits code like the
following:

#ifdef _OPENMP
#pragma omp begin declare variant match(device={arch=NVPTX})
#include "cuda/math.h"
#pragma omp begin declare variant match(device={isa=sm70})
#include "cuda/sm70/math.h"
#pragma omp end declare variant
#pragma omp end declare variant
#pragma omp begin declare variant match(arch=AMD)
#include "amdgpu/math.h"
#pragma omp end declare variant
#endif

And such code needs to keep working if there is a '#define ABC ...' in
one file and an '#ifndef ABC / #define ABC ...' in the other file.

Additionally, it would be neat if it would handle target-specific defines
like '#if __PTX_SM__ == 350' for the relevant parts (here: arch=nvptx).
(We already do support context selectors via the
gcc/config/*/t-omp-device files; see also
https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Context-Selectors.html )

Thoughts?

* * *

The question is also what to support – "just" function declarations which
are specific to a device or some generic replacement of the kind:

#pragma omp begin declare variant match(device={arch=NVPTX})
  #define NUM_THREADS 128
#pragma omp end declare variant
#pragma omp begin declare variant match(device={arch=AMDGCN})
  #define NUM_THREADS 64
#pragma omp end declare variant

#ifndef NUM_THREADS
  #define NUM_THREADS 16
#endif

...
printf ("Running with %d threads\n", NUM_THREADS);
#pragma omp parallel for num_threads(NUM_THREADS)

* * *

If we only handle 'begin/end declare variant', the following
works in principle:
- Parse the file once with only host-code parsing but
- keep track of delimited 'omp begin declare variant'
  where the context selector matches one of the supported
  offload targets.
- Parse the file n times again, but this time set the
  target-specific #defines (an extended version of
  gcc/config/*/t-omp-device to make them available?)
- When doing so, ignore all non-offloading bits (issue: implicit
  'declare target' + have the data available for variant resolution).
- Store this in some way.

But it is not really clear to me how to do this in actual code.

Any suggestion?

Tobias

PS: I would like to have some input before the Cauldron, but we might want
to additionally discuss this in detail during the cauldron, possibly some
brainstorming before the BoF and then surely also in the BoF.



Re: [PATCH] aarch64: fix format specifier

2023-08-21 Thread Richard Earnshaw (lists) via Gcc-patches

On 18/08/2023 17:37, FX Coudert via Gcc-patches wrote:

A rather trivial fix for fprintf() specifier of a HOST_WIDE_INT value.
Tested on aarch64-apple-darwin. OK to commit?

FX



OK.

R.


[PATCH] RISC-V: Add Types to Missing Bitmanip Instructions:

2023-08-21 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the bitmanip instructions to ensure that no insn is left
without a type attribute. Updates a total of 8 insns to have type "bitmanip"

Tested for regressions using rv32/64 multilib with newlib/linux. 

gcc/ChangeLog:

* config/riscv/bitmanip.md: Added bitmanip type to insns
that are missing types

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/bitmanip.md | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index c42e7b890db..0c99152ffc8 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -103,7 +103,8 @@ (define_insn_and_split "*andi_add.uw"
   (match_dup 4)))]
 {
operands[3] = GEN_INT (INTVAL (operands[3]) >> INTVAL (operands[2]));
-})
+}
+[(set_attr "type" "bitmanip")])
 
 (define_insn "*shNadduw"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -533,7 +534,9 @@ (define_insn_and_split "*minmax"
   "&& reload_completed"
   [(set (match_dup 3) (sign_extend:DI (match_dup 1)))
(set (match_dup 4) (match_dup 2))
-   (set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))])
+   (set (match_dup 0) (:DI (match_dup 3) (match_dup 4)))]
+  ""
+  [(set_attr "type" "bitmanip")])
 
 ;; ZBS extension.
 
@@ -628,7 +631,8 @@ (define_insn_and_split "*bclri_nottwobits"
 
operands[3] = GEN_INT (~bits | topbit);
operands[4] = GEN_INT (~topbit);
-})
+}
+[(set_attr "type" "bitmanip")])
 
 ;; In case of a paradoxical subreg, the sign bit and the high bits are
 ;; not allowed to be changed
@@ -648,7 +652,8 @@ (define_insn_and_split "*bclridisi_nottwobits"
 
operands[3] = GEN_INT (~bits | topbit);
operands[4] = GEN_INT (~topbit);
-})
+}
+[(set_attr "type" "bitmanip")])
 
 (define_insn "*binv"
   [(set (match_operand:X 0 "register_operand" "=r")
@@ -743,7 +748,8 @@ (define_insn_and_split "*i_extrabit"
 
operands[3] = GEN_INT (bits &~ topbit);
operands[4] = GEN_INT (topbit);
-})
+}
+[(set_attr "type" "bitmanip")])
 
 ;; Same to use blcri + andi and blcri + bclri
 (define_insn_and_split "*andi_extrabit"
@@ -761,7 +767,8 @@ (define_insn_and_split "*andi_extrabit"
 
operands[3] = GEN_INT (bits | topbit);
operands[4] = GEN_INT (~topbit);
-})
+}
+[(set_attr "type" "bitmanip")])
 
 ;; IF_THEN_ELSE: test for 2 bits of opposite polarity
 (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
@@ -803,7 +810,8 @@ (define_insn_and_split "*branch_mask_twobits_equals_singlebit"
 
operands[8] = GEN_INT (setbit);
operands[9] = GEN_INT (clearbit);
-})
+}
+[(set_attr "type" "bitmanip")])
 
 ;; IF_THEN_ELSE: test for (a & (1 << BIT_NO))
 (define_insn_and_split "*branch_bext"
@@ -826,7 +834,9 @@ (define_insn_and_split "*branch_bext"
(zero_extend:X (match_dup 3
(set (pc) (if_then_else (match_op_dup 1 [(match_dup 4) (const_int 0)])
   (label_ref (match_dup 0))
-  (pc)))])
+  (pc)))]
+   ""
+  [(set_attr "type" "bitmanip")])
 
 ;; ZBKC or ZBC extension
 (define_insn "riscv_clmul_"
-- 
2.41.0



[PATCH] RISC-V: Add Types to Un-Typed Sync Instructions:

2023-08-21 Thread Edwin Lu
Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the sync instructions to ensure that no insn is left
without a type attribute. It updates a total of 6 insns to have type "atomic".

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/ChangeLog:

* config/riscv/sync-rvwmo.md: Add atomic type to insns
missing types.
* config/riscv/sync-ztso.md: Likewise.
* config/riscv/sync.md: Likewise.

Signed-off-by: Edwin Lu 
---
 gcc/config/riscv/sync-rvwmo.md |  3 ++-
 gcc/config/riscv/sync-ztso.md  |  5 +++--
 gcc/config/riscv/sync.md   | 12 
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index 1fc7cf16b5b..4970d561211 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -41,7 +41,8 @@ (define_insn "mem_thread_fence_rvwmo"
 else
gcc_unreachable ();
   }
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index 91c2a48c069..c8968d01488 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -35,7 +35,8 @@ (define_insn "mem_thread_fence_ztso"
 else
gcc_unreachable ();
   }
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
@@ -77,4 +78,4 @@ (define_insn "atomic_store_ztso"
   return "s\t%z1,%0";
   }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
\ No newline at end of file
+   (set (attr "length") (const_int 8))])
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 2f85951508f..d6c44afd9ca 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -136,7 +136,8 @@ (define_insn "subword_atomic_fetch_strong_"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
   }
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 28)) ])
 
 (define_expand "atomic_fetch_nand"
   [(match_operand:SHORT 0 "register_operand");; old value at mem
@@ -203,7 +204,8 @@ (define_insn "subword_atomic_fetch_strong_nand"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
   }
-  [(set (attr "length") (const_int 32))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 32)) ])
 
 (define_expand "atomic_fetch_"
   [(match_operand:SHORT 0 "register_operand")   ;; old value at mem
@@ -310,7 +312,8 @@ (define_insn "subword_atomic_exchange_strong"
   "sc.w%J3\t%5, %5, %1\;"
   "bnez\t%5, 1b";
   }
-  [(set (attr "length") (const_int 20))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 20))])
 
 (define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -497,7 +500,8 @@ (define_insn "subword_atomic_cas_strong"
   "bnez\t%7, 1b\;"
   "1:";
   }
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 28))])
 
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "") ;; bool output
-- 
2.34.1



[PATCH] Fix tests sensitive to internal library allocations

2023-08-21 Thread François Dumont via Gcc-patches

Hi

Here is a proposal to fix tests sensitive to libstdc++ internal allocations.

Tested by restoring allocation in tzdb.cc.

As announced, I'm also adding a test to detect such allocations. If it is
OK, let me know if you prefer to see it in a different place.


    libstdc++: Fix tests relying on operator new/delete overload

    Fix tests that are checking for an allocation plan. They are failing if
    an allocation is taking place outside the test.

    libstdc++-v3/ChangeLog

    * testsuite/util/replacement_memory_operators.h
    (counter::_M_pre_enter_count): New.
    (counter::enter, counter::exit): New static methods to call on
    main() enter/exit.
    * testsuite/23_containers/unordered_map/96088.cc (main):
    Call __gnu_test::counter::enter/exit.
    * testsuite/23_containers/unordered_multimap/96088.cc (main): Likewise.
    * testsuite/23_containers/unordered_multiset/96088.cc (main): Likewise.
    * testsuite/23_containers/unordered_set/96088.cc (main): Likewise.
    * testsuite/ext/malloc_allocator/deallocate_local.cc (main): Likewise.
    * testsuite/ext/new_allocator/deallocate_local.cc (main): Likewise.
    * testsuite/ext/throw_allocator/deallocate_local.cc (main): Likewise.
    * testsuite/ext/pool_allocator/allocate_chunk.cc (started): New global.
    (operator new(size_t)): Check started.
    (main): Set/Unset started.
    * testsuite/ext/no_library_allocation.cc: New test case.

OK to commit?

François
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
index c6d50c20fbf..bcae891e5ec 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
@@ -268,6 +268,7 @@ test03()
 int
 main()
 {
+  __gnu_test::counter::enter();
   test01();
   test02();
   test11();
@@ -275,5 +276,6 @@ main()
   test21();
   test22();
   test03();
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
index 214bc91a559..9f16ad68218 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
@@ -61,7 +61,9 @@ test02()
 int
 main()
 {
+  __gnu_test::counter::enter();
   test01();
   test02();
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
index 838ce8d5bc5..b34cfe67092 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
@@ -61,7 +61,9 @@ test02()
 int
 main()
 {
+  __gnu_test::counter::enter();
   test01();
   test02();
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
index 0f7dce2b38c..d5717fcec2b 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
@@ -269,6 +269,7 @@ test03()
 int
 main()
 {
+  __gnu_test::counter::enter();
   test01();
   test02();
   test11();
@@ -277,5 +278,6 @@ main()
   test22();
   test23();
   test03();
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/ext/malloc_allocator/deallocate_local.cc b/libstdc++-v3/testsuite/ext/malloc_allocator/deallocate_local.cc
index 79b583bd716..3aa65f298b1 100644
--- a/libstdc++-v3/testsuite/ext/malloc_allocator/deallocate_local.cc
+++ b/libstdc++-v3/testsuite/ext/malloc_allocator/deallocate_local.cc
@@ -27,6 +27,7 @@ typedef std::basic_string string_t;
 
 int main()
 {
+  __gnu_test::counter::enter();
   {
 string_t s;
 s += "bayou bend";
@@ -34,5 +35,7 @@ int main()
 
   if (__gnu_test::counter::count() != 0)
 throw std::runtime_error("count not zero");
+
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/ext/new_allocator/deallocate_local.cc b/libstdc++-v3/testsuite/ext/new_allocator/deallocate_local.cc
index fcde46e6e10..ac4996698c7 100644
--- a/libstdc++-v3/testsuite/ext/new_allocator/deallocate_local.cc
+++ b/libstdc++-v3/testsuite/ext/new_allocator/deallocate_local.cc
@@ -27,6 +27,7 @@ typedef std::basic_string string_t;
 
 int main()
 {
+  __gnu_test::counter::enter();
   {
 string_t s;
 s += "bayou bend";
@@ -34,5 +35,7 @@ int main()
 
   if (__gnu_test::counter::count() != 0)
 throw std::runtime_error("count not zero");
+
+  __gnu_test::counter::exit();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/ext/no_library_allocation.cc b/libstdc++-v3/testsuite/ext/no_library_allocation.cc
new f


[RISCV][committed] Remove spurious newline in ztso sequence

2023-08-21 Thread Jeff Law via Gcc-patches


amo-table-ztso-load-3 fails on the coordination branch after merging up the 
Ztso changes, due to a spurious newline in the output that causes 
scan-function-body to fail.  There's probably an over-zealous .* or 
similar regexp in the framework.  I didn't see it in a quick scan, but 
could easily have missed it.


Regardless, fixing the extraneous newline is easy :-)
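
For context, a toy model of why a trailing `\;` matters: in an output template, `\;` separates instructions and turns into a line break in the emitted assembly, so a separator with nothing after it leaves an empty line behind. The sketch below only imitates that expansion and is not GCC code; `expand_template` and `count_lines` are hypothetical names, and the mnemonic `lw` is an assumed stand-in for the load in the template.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model: expand each "\;" separator into newline + tab, as a
// stand-in for what happens to an output template.  Not GCC code.
std::string expand_template (const std::string &tmpl)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size (); ++i)
    {
      if (tmpl[i] == '\\' && i + 1 < tmpl.size () && tmpl[i + 1] == ';')
        {
          out += "\n\t";
          ++i;  // skip the ';' that belongs to the separator
        }
      else
        out += tmpl[i];
    }
  return out;
}

// Count the lines in the expanded text; a trailing line that holds only
// the tab left by a final separator is counted as well.
std::size_t count_lines (const std::string &text)
{
  std::size_t lines = 1;
  for (char c : text)
    if (c == '\n')
      ++lines;
  return lines;
}
```

With these helpers, a two-instruction template ending in a separator expands to three lines, while the same template without the final separator expands to two.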

Committed to the trunk,
Jeff
commit 39491441a3aca7725d5a6dfeea4b01229d30c899
Author: Jeff Law 
Date:   Mon Aug 21 11:20:28 2023 -0600

[RISCV][committed] Remove spurious newline in ztso sequence

amo-table-ztso-load-3 fails on the coordination branch after merging up the
Ztso changes due to a spurious newline in the output causing
scan-function-body to fail.
There's probably an over-zealous .* or similar regexp in the framework.  I
didn't see it in a quick scan, but could have easily missed it.

Regardless, fixing the extraneous newline is easy :-)

gcc/
* config/riscv/sync-ztso.md (atomic_load_ztso): Avoid extraneous
newline.

diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index 91c2a48c069..ed94471b96b 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -52,7 +52,7 @@ (define_insn "atomic_load_ztso"
 
 if (model == MEMMODEL_SEQ_CST)
   return "fence\trw,rw\;"
-"l\t%0,%1\;";
+"l\t%0,%1";
 else
   return "l\t%0,%1";
   }
@@ -77,4 +77,4 @@ (define_insn "atomic_store_ztso"
   return "s\t%z1,%0";
   }
   [(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
\ No newline at end of file
+   (set (attr "length") (const_int 8))])


Re: Patch ping Re: [PATCH 0/12] GCC _BitInt support [PR102989]

2023-08-21 Thread Joseph Myers
On Mon, 21 Aug 2023, Jakub Jelinek via Gcc-patches wrote:

> Joseph, could I ask now at least for an overall design review of the
> C patches (8-10,13) whether its interfaces with middle-end are ok,
> so that Richi can review the middle-end parts?

I am fine with the interface to the middle-end parts.

I think the libgcc functions (i.e. those exported by libgcc, to which 
references are generated by the compiler) need documenting in libgcc.texi.  
Internal functions or macros in the libgcc patch need appropriate comments 
specifying their semantics; especially FP_TO_BITINT and FP_FROM_BITINT 
which have a lot of arguments and no comments saying what the semantics of 
the macros and their arguments are supposed to be.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v7 4/5] OpenMP/OpenACC: Unordered/non-constant component offset runtime diagnostic

2023-08-21 Thread Julian Brown
On Fri, 18 Aug 2023 15:47:50 -0700
Julian Brown  wrote:

> This version of the patch scales back the previously-posted version to
> merely add a diagnostic for incorrect usage of component accesses with
> variably-indexed arrays of structs: the only permitted variant is
> where we have multiple indices that are the same, but we could not
> prove so at compile time.  Rather than silently producing the wrong
> result for cases where the indices are in fact different, we error
> out (e.g., "map(dtarr(i)%arrptr, dtarr(j)%arrptr(4:8))", for
> different i/j).

Here's a small followup fix for this one that hopefully addresses the
issue discovered by Linaro's automated pre-commit tester (reported to
me via Maxim, thanks!).

This is probably obvious if the parent patch is OK.

Thanks,

Julian
commit e3b84ec499ae128320b948d07d258322902e6e70
Author: Julian Brown 
Date:   Mon Aug 21 17:51:01 2023 +

OpenMP: Fix map-arrayofstruct-{2,3}.c tests for shared-memory systems

This is a small fix for two testcases when offload_device_nonshared_as
returns false, e.g. for systems not using GPU offloading.

diff --git a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
index 81f7efc27c98..ff7ce0eb1622 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-2.c
@@ -53,6 +53,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-output "(\n|\r|\r\n)" } */
-/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
+/* { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } } */
+/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } } */
 /* { dg-shouldfail "" { offload_device_nonshared_as } } */
diff --git a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
index 639a0d2bc1e3..770ac2ae1aa6 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/map-arrayofstruct-3.c
@@ -63,6 +63,6 @@ int main (void)
   return 0;
 }
 
-/* { dg-output "(\n|\r|\r\n)" } */
-/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" } */
+/* { dg-output "(\n|\r|\r\n)" { target offload_device_nonshared_as } } */
+/* { dg-output "libgomp: Mapped array elements must be the same .*(\n|\r|\r\n)+" { target offload_device_nonshared_as } } */
 /* { dg-shouldfail "" { offload_device_nonshared_as } } */


[PATCH V2] Emit funcall external declarations only if actually used.

2023-08-21 Thread Jose E. Marchesi via Gcc-patches
[Differences from V1:
- Prototype for call_from_call_insn moved before comment block.
- Reuse the `call' flag for SYMBOL_REF_LIBCALL.
- Fallback to check REG_CALL_DECL in non-direct calls.
- New test to check correct behavior for non-direct calls.]

There are many places in GCC where alternative local sequences are
tried in order to determine which is the cheapest or best alternative
to use on the current target.  When any of these sequences involves a
libcall, the current implementation of emit_library_call_value_1
has a side effect: it emits an external declaration for the funcall
(such as __divdi3), and that declaration is emitted even if the
sequence that does the libcall is not retained.

This is problematic on targets such as BPF, because the kernel loader
chokes on the spurious symbol __divdi3 and makes the resulting BPF
object unloadable.  Note that BPF objects are not linked before being
loaded.

This patch changes emit_library_call_value_1 to mark the target
SYMBOL_REF as a libcall.  Then, the emission of the external
declaration is done in the first loop of final.cc:shorten_branches.
This happens only if the corresponding sequence has been kept.
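
The deferral described above can be sketched in isolation as follows.  This is a simplified model and not the patch's code: `symbol`, `call_insn`, `mark_libcall` and `emit_external_decls` are hypothetical names standing in for a SYMBOL_REF, a call insn, the SYMBOL_REF_LIBCALL marking and the loop in shorten_branches.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a SYMBOL_REF.
struct symbol
{
  std::string name;
  bool is_libcall;   // plays the role of SYMBOL_REF_LIBCALL
};

// Hypothetical stand-in for a call insn that survived to the final pass.
struct call_insn
{
  const symbol *callee;
};

// Tentative expansion only marks the symbol; nothing is emitted yet.
void mark_libcall (symbol &sym)
{
  sym.is_libcall = true;
}

// Final pass: collect one external declaration per libcall that is still
// referenced by a surviving call insn.  Symbols that were only marked
// while trying out a discarded sequence never show up here.
std::vector<std::string>
emit_external_decls (const std::vector<call_insn> &kept_insns)
{
  std::set<std::string> seen;
  std::vector<std::string> decls;
  for (const call_insn &insn : kept_insns)
    if (insn.callee && insn.callee->is_libcall
        && seen.insert (insn.callee->name).second)
      decls.push_back (insn.callee->name);
  return decls;
}
```

In this model, marking both __divdi3 and __moddi3 but keeping only calls to __moddi3 yields a single declaration, for __moddi3.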

Regtested in x86_64-linux-gnu.
Tested with host x86_64-linux-gnu with target bpf-unknown-none.

gcc/ChangeLog

* rtl.h (SYMBOL_REF_LIBCALL): Define.
* calls.cc (emit_library_call_value_1): Do not emit external
libcall declaration here.
* final.cc (shorten_branches): Do it here.

gcc/testsuite/ChangeLog

* gcc.target/bpf/divmod-libcall-1.c: New test.
* gcc.target/bpf/divmod-libcall-2.c: Likewise.
* gcc.c-torture/compile/libcall-2.c: Likewise.
---
 gcc/calls.cc  |  9 +++---
 gcc/final.cc  | 30 +++
 gcc/rtl.h |  5 
 .../gcc.c-torture/compile/libcall-2.c |  8 +
 .../gcc.target/bpf/divmod-libcall-1.c | 19 
 .../gcc.target/bpf/divmod-libcall-2.c | 16 ++
 6 files changed, 83 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/libcall-2.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-1.c
 create mode 100644 gcc/testsuite/gcc.target/bpf/divmod-libcall-2.c

diff --git a/gcc/calls.cc b/gcc/calls.cc
index 1f3a6d5c450..219ea599b16 100644
--- a/gcc/calls.cc
+++ b/gcc/calls.cc
@@ -4388,9 +4388,10 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
|| argvec[i].partial != 0)
   update_stack_alignment_for_call (&argvec[i].locate);
 
-  /* If this machine requires an external definition for library
- functions, write one out.  */
-  assemble_external_libcall (fun);
+  /* Mark the emitted target as a libcall.  This will be used by final
+ in order to emit an external symbol declaration if the libcall is
+ ever used.  */
+  SYMBOL_REF_LIBCALL (fun) = 1;
 
   original_args_size = args_size;
   args_size.constant = (aligned_upper_bound (args_size.constant
@@ -4735,7 +4736,7 @@ emit_library_call_value_1 (int retval, rtx orgfun, rtx value,
   valreg,
   old_inhibit_defer_pop + 1, call_fusage, flags, args_so_far);
 
-  if (flag_ipa_ra)
+  if (flag_ipa_ra || SYMBOL_REF_LIBCALL (orgfun))
 {
   rtx datum = orgfun;
   gcc_assert (GET_CODE (datum) == SYMBOL_REF);
diff --git a/gcc/final.cc b/gcc/final.cc
index dd3e22547ac..2041e43fdd1 100644
--- a/gcc/final.cc
+++ b/gcc/final.cc
@@ -804,6 +804,8 @@ make_pass_compute_alignments (gcc::context *ctxt)
 }
 
 
+static rtx call_from_call_insn (rtx_call_insn *insn);
+
 /* Make a pass over all insns and compute their actual lengths by shortening
any branches of variable length if possible.  */
 
@@ -850,6 +852,34 @@ shorten_branches (rtx_insn *first)
   for (insn = get_insns (), i = 1; insn; insn = NEXT_INSN (insn))
 {
   INSN_SHUID (insn) = i++;
+
+  /* If this is a `call' instruction implementing a libcall, and
+ this machine requires an external definition for library
+ functions, write one out.  */
+  if (CALL_P (insn))
+{
+  rtx x;
+
+  if ((x = call_from_call_insn (dyn_cast <rtx_call_insn *> (insn)))
+  && (x = XEXP (x, 0))
+  && MEM_P (x)
+  && (x = XEXP (x, 0))
+  && SYMBOL_REF_P (x)
+  && SYMBOL_REF_LIBCALL (x))
+{
+  /* Direct call.  */
+  assemble_external_libcall (x);
+}
+  else if ((x = find_reg_note (insn, REG_CALL_DECL, NULL_RTX))
+   && (x = XEXP (x, 0)))
+{
+  /* Indirect call with REG_CALL_DECL note.  */
+  gcc_assert (SYMBOL_REF_P (x));
+  if (SYMBOL_REF_LIBCALL (x))
+assemble_external_libcall (x);
+}
+}
+
   if (INSN_P (insn))
continue;
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e1c51156f90..28be708a55f 10

Re: [PATCH] Fix tests sensitive to internal library allocations

2023-08-21 Thread Jonathan Wakely via Gcc-patches
On Mon, 21 Aug 2023 at 18:05, François Dumont via Libstdc++
 wrote:
>
> Hi
>
> Here is a proposal to fix tests sensitive to libstdc++ internal allocations.

Surely the enter() and exit() calls should be a constructor and destructor?

The constructor could use count() to get the count, and then restore
it in the destructor. Something like:

--- a/libstdc++-v3/testsuite/util/replacement_memory_operators.h
+++ b/libstdc++-v3/testsuite/util/replacement_memory_operators.h
@@ -75,12 +75,30 @@ namespace __gnu_test
  counter& cntr = get();
  cntr._M_increments = cntr._M_decrements = 0;
}
+
+struct scope
+{
+  scope() : _M_count(counter::count()) { }
+  ~scope() { counter::get()._M_count = _M_count; }
+
+private:
+  std::size_t _M_count;
+
+#if __cplusplus >= 201103L
+  scope(const scope&) = delete;
+  scope& operator=(const scope&) = delete;
+#else
+  scope(const scope&);
+  scope& operator=(const scope&);
+#endif
+};
  };

  template
bool
check_new(Alloc a = Alloc())
{
+  __gnu_test::counter::scope s;
  __gnu_test::counter::exceptions(false);
  __gnu_test::counter::reset();
  (void) a.allocate(10);
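
The same save-and-restore idea in a self-contained form, for readers without the testsuite header at hand. This is a sketch only: `alloc_counter` is a hypothetical stand-in for `__gnu_test::counter`, `count_inside_scope` is an invented example region, and C++11 is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for __gnu_test::counter: one global count.
struct alloc_counter
{
  static std::size_t& count ()
  {
    static std::size_t c = 0;
    return c;
  }

  // Saves the count on construction and restores it on destruction, so
  // whatever happens inside the guarded region does not leak into later
  // checks, even if the region exits early.
  struct scope
  {
    scope () : saved_ (count ()) { }
    ~scope () { count () = saved_; }

    scope (const scope&) = delete;
    scope& operator= (const scope&) = delete;

  private:
    std::size_t saved_;
  };
};

// Example region: pretend three allocations happen while guarded, and
// report the count as seen from inside the region.
std::size_t count_inside_scope ()
{
  alloc_counter::scope guard;
  alloc_counter::count () += 3;
  return alloc_counter::count ();
}
```

Compared with paired enter()/exit() calls, the guard cannot be left unbalanced by an early return or an exception.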






>
> Tested by restoring allocation in tzdb.cc.
>
> As announced I'm also adding a test to detect such allocations. If it is
> ok let me know if you prefer to see it in a different place.

The test is a good idea. I think 17_intro/no_library_allocation.cc
would be a better place for it.

>
>  libstdc++: Fix tests relying on operator new/delete overload
>
>  Fix tests that are checking for an allocation plan. They are failing if
>  an allocation is taking place outside the test.
>
>  libstdc++-v3/ChangeLog
>
>  * testsuite/util/replacement_memory_operators.h
>  (counter::_M_pre_enter_count): New.
>  (counter::enter, counter::exit): New static methods to call
> on main() enter/exit.
>  * testsuite/23_containers/unordered_map/96088.cc (main):
>  Call __gnu_test::counter::enter/exit.
>  * testsuite/23_containers/unordered_multimap/96088.cc
> (main): Likewise.
>  * testsuite/23_containers/unordered_multiset/96088.cc
> (main): Likewise.
>  * testsuite/23_containers/unordered_set/96088.cc (main):
> Likewise.
>  * testsuite/ext/malloc_allocator/deallocate_local.cc
> (main): Likewise.
>  * testsuite/ext/new_allocator/deallocate_local.cc (main):
> Likewise.
>  * testsuite/ext/throw_allocator/deallocate_local.cc (main):
> Likewise.
>  * testsuite/ext/pool_allocator/allocate_chunk.cc (started):
> New global.
>  (operator new(size_t)): Check started.
>  (main): Set/Unset started.
>  * testsuite/ext/no_library_allocation.cc: New test case.
>
> Ok to commit ?
>
> François


Re: [PATCH] testsuite: Adjust g++.dg/gomp/pr58567.C to new compiler message

2023-08-21 Thread Thiago Jung Bauermann via Gcc-patches


Hello Tobias,

Tobias Burnus  writes:

> On 18.08.23 23:24, Thiago Jung Bauermann wrote:
>> Tobias Burnus  writes:
>>> the patch looks good to me. Thanks! Can you commit the patch yourself or
>>> do you need someone to do this for you?
>> Thank you! I don't have commit access, so I would need someone to do
>> this for me.
>
> Done now in commit r14-3344-g40a6803c6d8ca2.

Thank you!

-- 
Thiago


Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread Tobias Burnus

Hi FX,
On 20.08.23 21:37, FX Coudert wrote:

> testsuite/libgomp.c/simd-math-1.c calls nonstandard functions that are not 
> available on darwin (and possibly other systems?).


Namely:

* "gamma" which has been replaced by tgamma and lgamma as BSD's gamma ==
tgamma while glibc's gamma == lgamma.

* __builtin_scalb{,f,l} – where "scalb" (only double version) was in
POSIX.1-2001 but was replaced in favour of scalb{l,}n{,f,l} which take
an int/long instead of a floating point number for the exponent argument
(2nd arg).

* __builtin_significand{,f,l} – where the man page states: "This
function exists mainly for use in certain standardized tests for IEEE
754 conformance." and "These functions are nonstandard; the double
version is available on a number of other systems."

(BTW: The testcase does not test the long-double versions – which makes
sense as it evolved as nvptx/gcn SIMD test.)
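
As an illustration only (not part of the patch), the standard replacements for the functions listed above can be sketched like this. The helper names are hypothetical; `significand_portable` follows the equivalence given in the man page, where significand(x) behaves like scalb(x, -ilogb(x)).

```cpp
#include <cassert>
#include <cmath>

// significand(x): mantissa of x scaled into [1, 2), built from standard
// functions only.  Hypothetical helper, meant for finite nonzero x.
double significand_portable (double x)
{
  return std::scalbn (x, -std::ilogb (x));
}

// scalb(x, e) takes a floating-point exponent; the standard scalbn
// takes an int exponent instead and computes x * 2^exponent.
double scalb_portable (double x, int exponent)
{
  return std::scalbn (x, exponent);
}

// The two meanings of the historical "gamma": BSD's is tgamma, glibc's
// is lgamma.  For x = 5, tgamma gives 4! = 24 and lgamma gives log(24).
bool gamma_meanings_differ (double x)
{
  return std::fabs (std::tgamma (x) - std::lgamma (x)) > 1e-9;
}
```

For example, 48 is 1.5 * 2^5, so the portable significand of 48.0 is 1.5, and scaling 3.0 by an exponent of 4 gives 48.0.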

* * *

Looking at the testcase, I wonder:

(a) why there is no test for scalb{l,}n{,f,l} (= scalb* but with int or
long as second argument for 'exp'). (Requires a new macro taking a
second type.)

(b) whether the tgamma test shouldn't be only TEST_FUN_XFAIL for '#if
defined(__AMDGCN__) || defined(__nvptx__)' and TEST_FUN otherwise. Other
platforms could then still add themselves to XFAIL as needed.


> Because I did not want to disable their testing completely, I suggest we simply 
> use preprocessor macros to avoid them on darwin.

That makes sense.

> This fixes the test failure on aarch64-apple-darwin.
> OK to commit?


OK. — I'd prefer if you also changed + tested a fix for my (a) + (b)
remarks, but as those are unrelated, I understand if you don't and just
commit your Darwin patch.

Thanks,

Tobias


 From bc7f4862b9301c9490c7e80a58aa21c7a9727bcd Mon Sep 17 00:00:00 2001
From: Francois-Xavier Coudert
Date: Sun, 20 Aug 2023 21:32:18 +0200
Subject: [PATCH] libgomp, testsuite: Do not call nonstandard functions on
  darwin

The following functions are not standard, and not always available on
darwin. They should not be called there: gamma, gammaf, scalb, scalbf,
significand, and significandf.

libgomp/ChangeLog:

  * testsuite/libgomp.c/simd-math-1.c: Avoid calling nonstandard
  functions on darwin.
---
  libgomp/testsuite/libgomp.c/simd-math-1.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
index dd2077cc597..b6127c118d1 100644
--- a/libgomp/testsuite/libgomp.c/simd-math-1.c
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -160,7 +160,9 @@ int main (void)
TEST_FUN (float, -10.0, 10.0, expf);
TEST_FUN (float, -10.0, 10.0, exp2f);
TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf);
+#if !defined(__APPLE__)
TEST_FUN (float, -10.0, 10.0, gammaf);
+#endif
TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf);
TEST_FUN (float, -10.0, 10.0, lgammaf);
TEST_FUN (float, -1.0, 50.0, logf);
@@ -169,8 +171,10 @@ int main (void)
TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
TEST_FUN (float, -50.0, 50.0, rintf);
+#if !defined(__APPLE__)
TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
+#endif
TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf);
TEST_FUN (float, -0.1, 1.0, sqrtf);
@@ -193,7 +197,9 @@ int main (void)
TEST_FUN (double, -10.0, 10.0, exp);
TEST_FUN (double, -10.0, 10.0, exp2);
TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod);
+#if !defined(__APPLE__)
TEST_FUN (double, -10.0, 10.0, gamma);
+#endif
TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot);
TEST_FUN (double, -10.0, 10.0, lgamma);
TEST_FUN (double, -1.0, 50.0, log);
@@ -202,8 +208,10 @@ int main (void)
TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
TEST_FUN (double, -50.0, 50.0, rint);
+#if !defined(__APPLE__)
TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
TEST_FUN (double, -10.0, 10.0, __builtin_significand);
+#endif
TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
TEST_FUN (double, -3.14159265359, 3.14159265359, sinh);
TEST_FUN (double, -0.1, 1.0, sqrt);
-- 2.39.2 (Apple Git-143)


Re: [PATCH] Remove XFAIL from gcc/testsuite/gcc.dg/unroll-7.c

2023-08-21 Thread Richard Sandiford via Gcc-patches
Thiago Jung Bauermann via Gcc-patches  writes:
> This test passes since commit e41103081bfa "Fix undefined behaviour in
> profile_count::differs_from_p", so remove the xfail annotation.
>
> Tested on aarch64-linux-gnu, armv8l-linux-gnueabihf and x86_64-linux-gnu.
>
> gcc/testsuite/ChangeLog:
>   * gcc.dg/unroll-7.c: Remove xfail.

Thanks, pushed to trunk.  Sorry for the slow response.

Richard

> ---
>  gcc/testsuite/gcc.dg/unroll-7.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/unroll-7.c b/gcc/testsuite/gcc.dg/unroll-7.c
> index 650448df5db1..17c5e533c2cb 100644
> --- a/gcc/testsuite/gcc.dg/unroll-7.c
> +++ b/gcc/testsuite/gcc.dg/unroll-7.c
> @@ -15,4 +15,4 @@ int t(void)
>  /* { dg-final { scan-rtl-dump "upper bound: 99" "loop2_unroll" } } */
>  /* { dg-final { scan-rtl-dump "realistic bound: 99" "loop2_unroll" } } */
>  /* { dg-final { scan-rtl-dump "considering unrolling loop with constant 
> number of iterations" "loop2_unroll" } } */
> -/* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" {xfail *-*-* 
> } } } */
> +/* { dg-final { scan-rtl-dump-not "Invalid sum" "loop2_unroll" } } */
>
> base-commit: 5da4c0b85a97727e6802eaf3a0d47bcdb8da5f51


Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 09:12:09PM +0200, Tobias Burnus wrote:
> OK. — I'd prefer if you also changed + tested a fix for my (a) + (b)
> remarks, but as those are unrelated, I understand if you don't and just
> commit your Darwin patch.

I don't like the #if !defined(__APPLE__) conditionals everywhere in the
test; I think it would be much cleaner to add an effective target that
tests for those functions (ideally checking that calls to all of them
link, all of them at once) and then use
{ dg-additional-options "-DWHATEVER" { target whatever } }
and use #ifdef WHATEVER conditionals instead.
That way any other target which doesn't have all of these will not suffer
from it.
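
A sketch of the shape this could take in the test source. Both the effective-target name `nonstandard_math` and the macro `HAVE_NONSTANDARD_MATH` are placeholders for the "whatever" above, and the effective target itself would still have to be defined in the testsuite support code, which is not shown; `tested_function_count` is a hypothetical helper that only makes the effect of the guard visible.

```cpp
/* { dg-additional-options "-DHAVE_NONSTANDARD_MATH" { target nonstandard_math } } */

#include <cassert>
#include <cmath>

// Number of functions exercised.  The nonstandard ones are only reached
// when the placeholder macro was passed on the command line, i.e. when
// the (placeholder) effective target matched.
int tested_function_count ()
{
  int n = 0;
  (void) std::lgamma (2.0); ++n;   // standard, always tested
  (void) std::rint (2.5); ++n;     // standard, always tested
#ifdef HAVE_NONSTANDARD_MATH
  (void) __builtin_scalb (1.0, 2.0); ++n;
  (void) __builtin_significand (3.0); ++n;
#endif
  return n;
}
```

The guard then depends on what the target provides rather than on which target it is, so no per-platform list has to be maintained in the test.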

Jakub



[PATCH] Fortran: implement vector sections in DATA statements [PR49588]

2023-08-21 Thread Harald Anlauf via Gcc-patches
Dear all,

the attached patch implements vector sections in DATA statements.

The implementation is simpler than the size of the patch suggests,
as part of the changes tries to clean up the existing code to make it
easier to understand, since ordinary sections (start:end:stride)
and vector sections may actually share some common code.

The basic idea of the implementation is that one needs a
temporary vector that keeps track of the offsets into the
array constructors for the indices in the array reference
that are vectors.
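
A rough, standalone model of that idea, written in C++ rather than against the gfortran data structures: one position per dimension is kept in a temporary vector and advanced like an odometer, first dimension fastest. All names here (`advance`, `section_offsets`) are hypothetical, every dimension is assumed to use a vector subscript with lower bound 1, and the code is a sketch of the technique, not of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advance the per-dimension positions like an odometer, first dimension
// fastest (Fortran order).  Returns false once all combinations are used.
bool advance (std::vector<std::size_t> &pos,
              const std::vector<std::vector<int> > &subscripts)
{
  for (std::size_t d = 0; d < pos.size (); ++d)
    {
      if (++pos[d] < subscripts[d].size ())
        return true;
      pos[d] = 0;   // wrap around and carry into the next dimension
    }
  return false;
}

// Enumerate the zero-based, column-major element offsets selected by one
// vector subscript per dimension (1-based index values).
std::vector<std::size_t>
section_offsets (const std::vector<std::vector<int> > &subscripts,
                 const std::vector<std::size_t> &extents)
{
  std::vector<std::size_t> result;
  for (std::size_t d = 0; d < subscripts.size (); ++d)
    if (subscripts[d].empty ())
      return result;   // an empty subscript selects nothing

  // The temporary vector: current offset into each index vector.
  std::vector<std::size_t> pos (subscripts.size (), 0);
  do
    {
      std::size_t offset = 0, stride = 1;
      for (std::size_t d = 0; d < pos.size (); ++d)
        {
          offset += static_cast<std::size_t> (subscripts[d][pos[d]] - 1)
                    * stride;
          stride *= extents[d];
        }
      result.push_back (offset);
    }
  while (advance (pos, subscripts));
  return result;
}
```

For an array of shape (3,2), the section a([3,1],[2]) selects elements (3,2) and (1,2), which in this model are offsets 5 and 3, in that order.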

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 96cc0333cdaa8459ef516ae8e74158cdb6302853 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 21 Aug 2023 21:23:57 +0200
Subject: [PATCH] Fortran: implement vector sections in DATA statements
 [PR49588]

gcc/fortran/ChangeLog:

	PR fortran/49588
	* data.cc (gfc_advance_section): Derive next index set and next offset
	into DATA variable also for array references using vector sections.
	Use auxiliary array to keep track of offsets into indexing vectors.
	(gfc_get_section_index): Set up initial indices also for DATA variables
	with array references using vector sections.
	* data.h (gfc_get_section_index): Adjust prototype.
	(gfc_advance_section): Likewise.
	* resolve.cc (check_data_variable): Pass vector offsets.

gcc/testsuite/ChangeLog:

	PR fortran/49588
	* gfortran.dg/data_vector_section.f90: New test.
---
 gcc/fortran/data.cc   | 161 +++---
 gcc/fortran/data.h|   4 +-
 gcc/fortran/resolve.cc|   5 +-
 .../gfortran.dg/data_vector_section.f90   |  26 +++
 4 files changed, 134 insertions(+), 62 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/data_vector_section.f90

diff --git a/gcc/fortran/data.cc b/gcc/fortran/data.cc
index d29eb12c1b1..7c2537dd3f0 100644
--- a/gcc/fortran/data.cc
+++ b/gcc/fortran/data.cc
@@ -634,65 +634,102 @@ abort:

 void
 gfc_advance_section (mpz_t *section_index, gfc_array_ref *ar,
-		 mpz_t *offset_ret)
+		 mpz_t *offset_ret, int *vector_offset)
 {
   int i;
   mpz_t delta;
   mpz_t tmp;
   bool forwards;
   int cmp;
-  gfc_expr *start, *end, *stride;
+  gfc_expr *start, *end, *stride, *elem;
+  gfc_constructor_base base;

   for (i = 0; i < ar->dimen; i++)
 {
-  if (ar->dimen_type[i] != DIMEN_RANGE)
-	continue;
+  bool advance = false;

-  if (ar->stride[i])
+  switch (ar->dimen_type[i])
 	{
-	  stride = gfc_copy_expr(ar->stride[i]);
-	  if(!gfc_simplify_expr(stride, 1))
-	gfc_internal_error("Simplification error");
-	  mpz_add (section_index[i], section_index[i],
-		   stride->value.integer);
-	  if (mpz_cmp_si (stride->value.integer, 0) >= 0)
-	forwards = true;
+	case DIMEN_ELEMENT:
+	  /* Loop to advance the next index.  */
+	  advance = true;
+	  break;
+
+	case DIMEN_RANGE:
+	  if (ar->stride[i])
+	{
+	  stride = gfc_copy_expr(ar->stride[i]);
+	  if(!gfc_simplify_expr(stride, 1))
+		gfc_internal_error("Simplification error");
+	  mpz_add (section_index[i], section_index[i],
+		   stride->value.integer);
+	  if (mpz_cmp_si (stride->value.integer, 0) >= 0)
+		forwards = true;
+	  else
+		forwards = false;
+	  gfc_free_expr(stride);
+	}
 	  else
-	forwards = false;
-	  gfc_free_expr(stride);
-	}
-  else
-	{
-	  mpz_add_ui (section_index[i], section_index[i], 1);
-	  forwards = true;
-	}
+	{
+	  mpz_add_ui (section_index[i], section_index[i], 1);
+	  forwards = true;
+	}

-  if (ar->end[i])
-{
-	  end = gfc_copy_expr(ar->end[i]);
-	  if(!gfc_simplify_expr(end, 1))
-	gfc_internal_error("Simplification error");
-	  cmp = mpz_cmp (section_index[i], end->value.integer);
-	  gfc_free_expr(end);
-	}
-  else
-	cmp = mpz_cmp (section_index[i], ar->as->upper[i]->value.integer);
+	  if (ar->end[i])
+	{
+	  end = gfc_copy_expr(ar->end[i]);
+	  if(!gfc_simplify_expr(end, 1))
+		gfc_internal_error("Simplification error");
+	  cmp = mpz_cmp (section_index[i], end->value.integer);
+	  gfc_free_expr(end);
+	}
+	  else
+	cmp = mpz_cmp (section_index[i], ar->as->upper[i]->value.integer);

-  if ((cmp > 0 && forwards) || (cmp < 0 && !forwards))
-	{
-	  /* Reset index to start, then loop to advance the next index.  */
-	  if (ar->start[i])
+	  if ((cmp > 0 && forwards) || (cmp < 0 && !forwards))
 	{
-	  start = gfc_copy_expr(ar->start[i]);
-	  if(!gfc_simplify_expr(start, 1))
-	gfc_internal_error("Simplification error");
+	  /* Reset index to start, then loop to advance the next index.  */
+	  if (ar->start[i])
+		{
+		  start = gfc_copy_expr(ar->start[i]);
+		  if(!gfc_simplify_expr(start, 1))
+		gfc_internal_error("Simplification error");
+		  mpz_set (section_index[i], start->value.integer);
+		  gfc_free_expr(start);
+		}
+	  else
+		mpz_set (section_index[i], ar->as->lower[i]->value.integer);
+	  advance = true;
+	}
+

Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread FX Coudert via Gcc-patches
> I don't like the #if !defined(__APPLE__) conditionals everywhere in the
> test, I think much cleaner would be to add an effective target to test
> for those functions

I understand, I wanted to not just report the issue but propose an option. It 
seems a bit heavy to design an effective target just for one test, though, no?

Another possibility would be to replace #if !defined(__APPLE__) by #if 
defined(__linux__), or glibc?

FX




Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 09:50:37PM +0200, FX Coudert wrote:
> > I don't like the #if !defined(__APPLE__) conditionals everywhere in the
> > test, I think much cleaner would be to add an effective target to test
> > for those functions
> 
> I understand, I wanted to not just report the issue but propose an option. It 
> seems a bit heavy to design an effective target just for one test, though, no?

It has the advantage of getting it right on all current and future targets.

> Another possibility would be to replace #if !defined(__APPLE__) by #if 
> defined(__linux__), or glibc?

If we do it, I'd still prefer one specific macro for all those spots,
TEST_NONSTANDARD_MATH_FNS or whatever and then at the start of the test
you can do either
#if !defined(__APPLE__) or #if defined(__linux__) or whatever else around
its definition.
That has the advantage of only touching one spot if one wants to add or
remove those on some other target.
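
A minimal sketch of that single-macro arrangement, written as stand-alone
C++ rather than as the actual libgomp test: the macro name is the one
floated above, while the __GLIBC__ condition and the run_math_checks
helper are assumptions chosen only for illustration.

```cpp
#include <cmath>

// One switch for the whole file.  Which targets provide the nonstandard
// functions is decided here and nowhere else; the __GLIBC__ test is an
// assumption for this sketch, not the condition proposed for the real test.
#if defined(__GLIBC__)
# define TEST_NONSTANDARD_MATH_FNS 1
#else
# define TEST_NONSTANDARD_MATH_FNS 0
#endif

// Hypothetical helper: returns how many checks ran and passed.
int run_math_checks ()
{
  int passed = 0;

  // Standard function, exercised on every target.
  if (std::sqrt (16.0) == 4.0)
    ++passed;

#if TEST_NONSTANDARD_MATH_FNS
  // Nonstandard function, exercised only where the macro enables it.
  // significand (8.0) is 8 * 2^-3, i.e. 1.0.
  if (::significand (8.0) == 1.0)
    ++passed;
#endif

  return passed;
}
```

Every later use in the file then reads "#if TEST_NONSTANDARD_MATH_FNS", so
adding or removing a target means editing only the definition at the top.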

Jakub



Re: [PATCH] RISC-V: Support simplify (-1-x) for vector.

2023-08-21 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 16 Aug 2023 at 14:12, yanzhang.wang--- via Gcc-patches
 wrote:
>
> From: Yanzhang Wang 
>
> The pattern is enabled for scalar but not for vector. The patch tries to
> make it consistent and converts the code below,
(CCing Richard S.)
Hi,
Sorry if this comment is not relevant to the patch, but I was wondering if it
should also fold -1 - x --> ~x for the following test, or is the test
written incorrectly?

svint32_t f(svint32_t x)
{
  return svsub_s32_x (svptrue_b8 (), svdup_s32 (-1), x);
}

expand dump shows:
(insn 2 4 3 2 (set (reg/v:VNx4SI 93 [ x ])
(reg:VNx4SI 32 v0 [ x ])) "foo.c":9:1 -1
 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:VNx4SI 94)
(const_vector:VNx4SI repeat [
(const_int -1 [0x])
])) "foo.c":10:10 -1
 (nil))
(insn 7 6 11 2 (set (reg:VNx4SI 92 [  ])
(minus:VNx4SI (reg:VNx4SI 94)
(reg/v:VNx4SI 93 [ x ]))) "foo.c":10:10 -1
 (nil))
(insn 11 7 12 2 (set (reg/i:VNx4SI 32 v0)
(reg:VNx4SI 92 [  ])) "foo.c":11:1 -1
 (nil))
(insn 12 11 0 2 (use (reg/i:VNx4SI 32 v0)) "foo.c":11:1 -1
 (nil))

and results in following code-gen:
f:
mov z31.b, #-1
sub z0.s, z31.s, z0.s
ret

Although I suppose at the TREE level the above call to svsub_s32_x could be
folded by implementing the same transform (-1 - x -> ~x) in svsub_impl::fold?

Thanks,
Prathamesh




>
> shortcut_for_riscv_vrsub_case_1_32:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vrsub.viv1,v1,-1
> vs1r.v  v1,0(a0)
> ret
>
> to,
>
> shortcut_for_riscv_vrsub_case_1_32:
> vl1re32.v   v1,0(a1)
> vsetvli zero,a2,e32,m1,ta,ma
> vnot.v  v1,v1
> vs1r.v  v1,0(a0)
> ret
>
> gcc/ChangeLog:
>
> * simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
> Get -1 with mode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/simplify-vrsub.c: New test.
>
> Signed-off-by: Yanzhang Wang 
> ---
>  gcc/simplify-rtx.cc|  2 +-
>  .../gcc.target/riscv/rvv/base/simplify-vrsub.c | 18 ++
>  2 files changed, 19 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/simplify-vrsub.c
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index d7315d82aa3..eb1ac120832 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -3071,7 +3071,7 @@ simplify_context::simplify_binary_operation_1 (rtx_code code,
>/* (-1 - a) is ~a, unless the expression contains symbolic
>  constants, in which case not retaining additions and
>  subtractions could cause invalid assembly to be produced.  */
> -  if (trueop0 == constm1_rtx
> +  if (trueop0 == CONSTM1_RTX (mode)
>   && !contains_symbolic_reference_p (op1))
> return simplify_gen_unary (NOT, mode, op1, mode);
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/simplify-vrsub.c b/gcc/testsuite/gcc.target/riscv/rvv/base/simplify-vrsub.c
> new file mode 100644
> index 000..df87ed94ea4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/simplify-vrsub.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64 -O3 -Wno-psabi" } */
> +
> +#include "riscv_vector.h"
> +
> +#define VRSUB_WITH_LMUL(LMUL, DTYPE)\
> +  vint##DTYPE##m##LMUL##_t  \
> +  shortcut_for_riscv_vrsub_case_##LMUL##_##DTYPE\
> +  (vint##DTYPE##m##LMUL##_t v1, \
> +   size_t vl)   \
> +  { \
> +return __riscv_vrsub_vx_i##DTYPE##m##LMUL (v1, -1, vl); \
> +  }
> +
> +VRSUB_WITH_LMUL (1, 16)
> +VRSUB_WITH_LMUL (1, 32)
> +
> +/* { dg-final { scan-assembler-times {vnot\.v} 2 } } */
> --
> 2.41.0
>


Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread FX Coudert via Gcc-patches
>> I understand, I wanted to not just report the issue but propose an option. 
>> It seems a bit heavy to design an effective target just for one test, 
>> though, no?
> 
> It has the advantage of getting it right on all current and future targets.

Something like this? (not tested yet)


diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 2f9e538278f..85d467434e9 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -377,6 +377,24 @@ proc offload_target_to_openacc_device_type { offload_target } {
 }
 }
 
+# Return 1 if certain nonstandard math functions are available
+# on the target.
+proc libgomp_check_effective_target_nonstandard_math_functions { } {
+return [check_no_compiler_messages nonstandard_math_functions executable {
+#include 
+int main() {
+  float x = 42;
+  double y = 42;
+  x = gammaf (x);
+  x = __builtin_scalbf (x, 2.f);
+  x =__builtin_significandf (x);
+  y = gamma (y);
+  y = __builtin_scalb (y, 2.);
+  y =__builtin_significand (y);
+  return 0;
+} } "-lm" ]
+}
+
 # Return 1 if compiling for the specified offload target
 # Takes -foffload=... into account by checking OFFLOAD_TARGET_NAMES=
 # in the -v compiler output.



Re: [PATCH] libgomp, testsuite: Do not call nonstandard functions on darwin

2023-08-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Aug 21, 2023 at 10:00:30PM +0200, FX Coudert wrote:
> >> I understand, I wanted to not just report the issue but propose an option. 
> >> It seems a bit heavy to design an effective target just for one test, 
> >> though, no?
> > 
> > It has the advantage of getting it right on all current and future targets.
> 
> Something like this? (not tested yet)

Yes (of course if it works on linux and is true there and on darwin and
expectedly is false there).

> --- a/libgomp/testsuite/lib/libgomp.exp
> +++ b/libgomp/testsuite/lib/libgomp.exp
> @@ -377,6 +377,24 @@ proc offload_target_to_openacc_device_type { offload_target } {
>  }
>  }
>  
> +# Return 1 if certain nonstandard math functions are available
> +# on the target.
> +proc libgomp_check_effective_target_nonstandard_math_functions { } {
> +return [check_no_compiler_messages nonstandard_math_functions executable {
> +#include 
> +int main() {
> +  float x = 42;
> +  double y = 42;
> +  x = gammaf (x);
> +  x = __builtin_scalbf (x, 2.f);
> +  x =__builtin_significandf (x);
> +  y = gamma (y);
> +  y = __builtin_scalb (y, 2.);
> +  y =__builtin_significand (y);
> +  return 0;
> +} } "-lm" ]
> +}
> +
>  # Return 1 if compiling for the specified offload target
>  # Takes -foffload=... into account by checking OFFLOAD_TARGET_NAMES=
>  # in the -v compiler output.

Jakub



Re: [PATCH] Fix tests sensitive to internal library allocations

2023-08-21 Thread François Dumont via Gcc-patches

Here is the updated and tested patch.

On 21/08/2023 20:07, Jonathan Wakely wrote:

On Mon, 21 Aug 2023 at 18:05, François Dumont via Libstdc++
 wrote:

Hi

Here is a proposal to fix tests sensitive to libstdc++ internal allocations.

Surely the enter() and exit() calls should be a constructor and destructor?

The constructor could use count() to get the count, and then restore
it in the destructor. Something like:

--- a/libstdc++-v3/testsuite/util/replacement_memory_operators.h
+++ b/libstdc++-v3/testsuite/util/replacement_memory_operators.h
@@ -75,12 +75,30 @@ namespace __gnu_test
   counter& cntr = get();
   cntr._M_increments = cntr._M_decrements = 0;
 }
+
+struct scope
+{
+  scope() : _M_count(counter::count()) { }
+  ~scope() { counter::get()._M_count = _M_count; }
+
+private:
+  std::size_t _M_count;
+
+#if __cplusplus >= 201103L
+  scope(const scope&) = delete;
+  scope& operator=(const scope&) = delete;
+#else
+  scope(const scope&);
+  scope& operator=(const scope&);
+#endif
+};
   };

   template
 bool
 check_new(Alloc a = Alloc())
 {
+  __gnu_test::counter::scope s;
   __gnu_test::counter::exceptions(false);
   __gnu_test::counter::reset();
   (void) a.allocate(10);







Tested by restoring allocation in tzdb.cc.

As announced I'm also adding a test to detect such allocations. If it is
ok let me know if you prefer to see it in a different place.

The test is a good idea. I think 17_intro/no_library_allocation.cc
would be a better place for it.


  libstdc++: Fix tests relying on operator new/delete overload

  Fix tests that are checking for an allocation plan. They are failing if
  an allocation is taking place outside the test.

  libstdc++-v3/ChangeLog

  * testsuite/util/replacement_memory_operators.h
  (counter::_M_pre_enter_count): New.
  (counter::enter, counter::exit): New static methods to call
on main() enter/exit.
  * testsuite/23_containers/unordered_map/96088.cc (main):
  Call __gnu_test::counter::enter/exit.
  * testsuite/23_containers/unordered_multimap/96088.cc
(main): Likewise.
  * testsuite/23_containers/unordered_multiset/96088.cc
(main): Likewise.
  * testsuite/23_containers/unordered_set/96088.cc (main):
Likewise.
  * testsuite/ext/malloc_allocator/deallocate_local.cc
(main): Likewise.
  * testsuite/ext/new_allocator/deallocate_local.cc (main):
Likewise.
  * testsuite/ext/throw_allocator/deallocate_local.cc (main):
Likewise.
  * testsuite/ext/pool_allocator/allocate_chunk.cc (started):
New global.
  (operator new(size_t)): Check started.
  (main): Set/Unset started.
  * testsuite/ext/no_library_allocation.cc: New test case.

Ok to commit ?

François

diff --git a/libstdc++-v3/testsuite/17_intro/no_library_allocation.cc b/libstdc++-v3/testsuite/17_intro/no_library_allocation.cc
new file mode 100644
index 000..278d4757c93
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/no_library_allocation.cc
@@ -0,0 +1,8 @@
+#include 
+#include 
+
+int main()
+{
+  VERIFY( __gnu_test::counter::count() == 0 );
+  return 0;
+}
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
index c6d50c20fbf..cdf00c93d80 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/96088.cc
@@ -268,6 +268,7 @@ test03()
 int
 main()
 {
+  __gnu_test::counter::scope s;
   test01();
   test02();
   test11();
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
index 214bc91a559..d8b9a40c174 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multimap/96088.cc
@@ -61,6 +61,7 @@ test02()
 int
 main()
 {
+  __gnu_test::counter::scope s;
   test01();
   test02();
   return 0;
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
index 838ce8d5bc5..db17cda0ddd 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_multiset/96088.cc
@@ -61,6 +61,7 @@ test02()
 int
 main()
 {
+  __gnu_test::counter::scope s;
   test01();
   test02();
   return 0;
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
index 0f7dce2b38c..831f2aa1210 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_set/96088.cc
@@ -269,6 +269,7 @@ test03()
 int
 main()
 {
+  __gnu_test::counter::scope s;
   test01();
   test02();

Re: RISCV test infrastructure for d / v / zfh extensions

2023-08-21 Thread Robin Dapp via Gcc-patches
Hi Joern.

> Hmm, you are right.  I personally prefer my version because it allows
> consistent naming of the
> different tests, also easily extendible when new extensions need testing.
> Although the riscv_vector name has the advantage that it is better
> legible for people who are
> not used to dealing with RISC_V extension names.  If we keep
> riscv_vector, it would make
> sense to name the other tests also something more verbose, e.g. change
> riscv_d into
> riscv_double_fp or even riscv_double_precision_floating_point .
> It would be nice to hear other people's opinions on the naming.

I can live with either, with a preference for your naming scheme, i.e.
calling the extensions directly by their name for consistency reasons.
A more verbose scheme might lead to misconceptions later in case we
have several closely related extensions.  There will probably already be
ample discussion during ratification about naming and IMHO we should
not repeat that just to make names more accessible.  If needed we can
still add comments in the respective tests to clarify.
Vector is usually special among architecture extensions but we're not
even consistent with naming in the source itself, so...  

>> Would it make sense to skip the first check here
>> (check_effective_target_riscv_v) so we have a proper runtime check?
> 
> My starting point was that the changing of global testsuite variables around -
> as the original RISC-V vector patches did - is wrong.  The user asked to test
> a particular target (or set targets, for multilibs), and that target
> is the one to test,
> so we can't just assume it has other hardware features that are not implied by
> the target.
> Contrarily, the target that the user requested to test can be assumed to be
> available for testing.  Testing that it actually works is a part of
> the point of the
> test.  If I ask for a dejagnu test for a target that has vector support, I 
> would
> hope that the vector support is also tested, not backing off if it finds that
> there is a problem with the target,
> The way I look at things, when the macro  __riscv_v is defined,
> the compiler asserts that it is compiling for a target that has vector 
> support,
> because it was instructed by configuration / options to emit code for that
> target.  Which we can take as evidence that dejagnu is run with options
> to select that target (either explicitly or by default due to the
> configuration of
> the compiler under test)

Yes, I largely agree with that.  Where I was coming from is that several other
effective target checks will not short circuit the check but always perform it
fully (i.e. interpreting the effective target as the full chain up to 
execution).
Yet, I can see the appeal of the short circuit as well and in the end it really
doesn't matter all that much.

I would have preferred to replace the existing checks right away in order to
immediately have proper coverage but let's not dwell on that, therefore
LGTM, thanks. 

Regards
 Robin


[PATCH 0/2] libstdc++: Documentation fixes.

2023-08-21 Thread Bruno Victal
This small patch-series fixes the 'doc-install-info' rule
and updates the URI used for docbook-xsl.

Bruno Victal (2):
  libstdc++: Fix 'doc-install-info' rule.
  libstdc++: Update docbook xsl URI.

 libstdc++-v3/acinclude.m4| 2 +-
 libstdc++-v3/doc/Makefile.am | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)


base-commit: f9ff6fa58217294d63f255dd02abfcc8a074f509
-- 
2.40.1



[PATCH 2/2] libstdc++: Update docbook xsl URI.

2023-08-21 Thread Bruno Victal
The URI for namespaced docbook-xsl was updated to reflect the current
DocBook upstream at .

libstdc++-v3/Changelog:
* acinclude.m4: Update docbook xsl URI.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b25378eaace..152811fd00d 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -604,7 +604,7 @@ dnl  XSL_STYLE_DIR
 dnl
 AC_DEFUN([GLIBCXX_CONFIGURE_DOCBOOK], [
 
-glibcxx_docbook_url=http://docbook.sourceforge.net/release/xsl-ns/current/
+glibcxx_docbook_url=http://cdn.docbook.org/release/xsl/current/
 
 AC_MSG_CHECKING([for local stylesheet directory])
 glibcxx_local_stylesheets=no
-- 
2.40.1



[PATCH 1/2] libstdc++: Fix 'doc-install-info' rule.

2023-08-21 Thread Bruno Victal
The info manual isn't moved to the expected location after
generation which causes the install rule for it to fail.

libstdc++-v3/Changelog:

* doc/Makefile.in: Regenerate.
* doc/Makefile.am: Fix 'doc-install-info' rule.
Fix typo in comment.
---
 libstdc++-v3/doc/Makefile.am | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/doc/Makefile.am b/libstdc++-v3/doc/Makefile.am
index 8371441c62e..373522d593d 100644
--- a/libstdc++-v3/doc/Makefile.am
+++ b/libstdc++-v3/doc/Makefile.am
@@ -598,7 +598,7 @@ stamp-pdf-docbook: doc-pdf-docbook-pre doc-xml-single-docbook
 doc-pdf-docbook: stamp-pdf-docbook
 
 # TEXINFO, via docbook2X
-# NB: Both experimental and tempermental
+# NB: Both experimental and temperamental
 manual_texi = ${docbook_outdir}/texinfo/libstdc++-manual.texi
 manual_info = ${docbook_outdir}/texinfo/libstdc++-manual.info
 DB2TEXI_FLAGS = \
@@ -615,7 +615,7 @@ stamp-texinfo-docbook: stamp-xml-single-docbook ${docbook_outdir}/texinfo
 
 stamp-info-docbook: stamp-texinfo-docbook
@echo "Generating info files..."
-   $(MAKEINFO) $(MAKEINFOFLAGS) ${manual_texi}
+   $(MAKEINFO) $(MAKEINFOFLAGS) ${manual_texi} -o ${manual_info}
$(STAMP) stamp-info-docbook
 
 doc-texinfo-docbook: stamp-texinfo-docbook

base-commit: f9ff6fa58217294d63f255dd02abfcc8a074f509
-- 
2.40.1



Re: [PATCH 0/2] libstdc++: Documentation fixes.

2023-08-21 Thread Jonathan Wakely via Gcc-patches
On Mon, 21 Aug 2023, 21:33 Bruno Victal,  wrote:

> This small patch-series fixes the 'doc-install-info' rule
> and updates the URI used for docbook-xsl.
>

Thanks! I'll get these committed tomorrow.



> Bruno Victal (2):
>   libstdc++: Fix 'doc-install-info' rule.
>   libstdc++: Update docbook xsl URI.
>
>  libstdc++-v3/acinclude.m4| 2 +-
>  libstdc++-v3/doc/Makefile.am | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
>
> base-commit: f9ff6fa58217294d63f255dd02abfcc8a074f509
> --
> 2.40.1
>
>


Re: [PATCH 2/2] VR-VALUES: Rewrite test_for_singularity using range_op_handler

2023-08-21 Thread Andrew Pinski via Gcc-patches
On Fri, Aug 11, 2023 at 8:08 AM Andrew MacLeod via Gcc-patches
 wrote:
>
>
> On 8/11/23 05:51, Richard Biener wrote:
> > On Fri, Aug 11, 2023 at 11:17 AM Andrew Pinski via Gcc-patches
> >  wrote:
> >> So it turns out there was a simpler way to start improving VRP
> >> and begin fixing PR 110131, PR 108360, and PR 108397:
> >> rewrite test_for_singularity to use range_op_handler
> >> and Value_Range.
> >>
> >> This patch implements that and
> >>
> >> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > I'm hoping Andrew/Aldy can have a look here.
> >
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >>  * vr-values.cc (test_for_singularity): Add edge argument
> >>  and rewrite using range_op_handler.
> >>  (simplify_compare_using_range_pairs): Use Value_Range
> >>  instead of value_range and update test_for_singularity call.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  * gcc.dg/tree-ssa/vrp124.c: New test.
> >>  * gcc.dg/tree-ssa/vrp125.c: New test.
> >> ---
> >>   gcc/testsuite/gcc.dg/tree-ssa/vrp124.c | 44 +
> >>   gcc/testsuite/gcc.dg/tree-ssa/vrp125.c | 44 +
> >>   gcc/vr-values.cc   | 91 --
> >>   3 files changed, 114 insertions(+), 65 deletions(-)
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> >>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> >> new file mode 100644
> >> index 000..6ccbda35d1b
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp124.c
> >> @@ -0,0 +1,44 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >> +
> >> +/* Should be optimized to a == -100 */
> >> +int g(int a)
> >> +{
> >> +  if (a == -100 || a >= 0)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < 0;
> >> +}
> >> +
> >> +/* Should optimize to a == 0 */
> >> +int f(int a)
> >> +{
> >> +  if (a == 0 || a > 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < 50;
> >> +}
> >> +
> >> +/* Should be optimized to a == 0. */
> >> +int f2(int a)
> >> +{
> >> +  if (a == 0 || a > 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < 100;
> >> +}
> >> +
> >> +/* Should optimize to a == 100 */
> >> +int f1(int a)
> >> +{
> >> +  if (a < 0 || a == 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a > 50;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
> >> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
> >> new file mode 100644
> >> index 000..f6c2f8e35f1
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp125.c
> >> @@ -0,0 +1,44 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >> +
> >> +/* Should be optimized to a == -100 */
> >> +int g(int a)
> >> +{
> >> +  if (a == -100 || a == -50 || a >= 0)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < -50;
> >> +}
> >> +
> >> +/* Should optimize to a == 0 */
> >> +int f(int a)
> >> +{
> >> +  if (a == 0 || a == 50 || a > 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < 50;
> >> +}
> >> +
> >> +/* Should be optimized to a == 0. */
> >> +int f2(int a)
> >> +{
> >> +  if (a == 0 || a == 50 || a > 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a < 25;
> >> +}
> >> +
> >> +/* Should optimize to a == 100 */
> >> +int f1(int a)
> >> +{
> >> +  if (a < 0 || a == 50 || a == 100)
> >> +;
> >> +  else
> >> +return 0;
> >> +  return a > 50;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-not "goto " "optimized" } } */
> >> diff --git a/gcc/vr-values.cc b/gcc/vr-values.cc
> >> index a4fddd62841..7004b0224bd 100644
> >> --- a/gcc/vr-values.cc
> >> +++ b/gcc/vr-values.cc
> >> @@ -907,66 +907,30 @@ simplify_using_ranges::simplify_bit_ops_using_ranges
> >>  a known value range VR.
> >>
> >>  If there is one and only one value which will satisfy the
> >> -   conditional, then return that value.  Else return NULL.
> >> -
> >> -   If signed overflow must be undefined for the value to satisfy
> >> -   the conditional, then set *STRICT_OVERFLOW_P to true.  */
> >> +   conditional on the EDGE, then return that value.
> >> +   Else return NULL.  */
> >>
> >>   static tree
> >>   test_for_singularity (enum tree_code cond_code, tree op0,
> >> - tree op1, const value_range *vr)
> >> + tree op1, Value_Range vr, bool edge)
>
> VR should be a "vrange &".  This is the top-level base class for all
> ranges of all types/kinds, and what we usually pass values around as if
> we want them to be any kind.  If this is integer only, we'd pass an
> 'irange &'.
>
> Value_Range is the opposite.  It's the sink that contains one of each kind
> of range and can switch around betw

[PATCH] bpf: neg instruction does not accept an immediate

2023-08-21 Thread David Faust via Gcc-patches
The BPF virtual machine supports neither neg nor neg32 instructions with
an immediate.

The erroneous instructions were removed from binutils:
https://sourceware.org/pipermail/binutils/2023-August/129135.html

Change the define_insn so that an immediate cannot be accepted.

From testing, a neg-immediate was probably never chosen over a
mov-immediate anyway.

Tested on x86_64-linux-gnu host for bpf-unknown-none target.

gcc/

* config/bpf/bpf.md (neg): Second operand must be a register.
---
 gcc/config/bpf/bpf.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
index a64de1095ed..e87d72182bb 100644
--- a/gcc/config/bpf/bpf.md
+++ b/gcc/config/bpf/bpf.md
@@ -163,8 +163,8 @@ (define_insn "sub3"
 
 ;;; Negation
 (define_insn "neg2"
-  [(set (match_operand:AM 0 "register_operand"   "=r,r")
-(neg:AM (match_operand:AM 1 "reg_or_imm_operand" " 0,I")))]
+  [(set (match_operand:AM 0 "register_operand" "=r")
+(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
   ""
   "{neg\t%0|%w0 = -%w1}"
   [(set_attr "type" "")])
-- 
2.40.1



Re: [PATCH] RISC-V: Add Types to Missing Bitmanip Instructions:

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/21/23 10:37, Edwin Lu wrote:

This patch updates the bitmanip instructions to ensure that no insn is left
without a type attribute. Updates a total of 8 insns to have type "bitmanip"

Tested for regressions using rv32/64 multilib with newlib/linux.

gcc/Changelog:

* config/riscv/bitmanip.md: Added bitmanip type to insns
 that are missing types

Thanks.  I pushed this to the trunk.

Just an FYI.  We have a bit of leeway on the types for these 
define_insn_and_split patterns.  So while I think that 'branch' is a 
better choice for the last one in this patch, I don't think it's going 
to matter in practice.


Essentially, since these define_insn_and_split patterns are always split
after reload, the type will only be used for the first scheduling pass.
By the time we run the second scheduling pass the pattern should have 
been split into its component insns.  Meaning that these types won't get 
used for the final schedule.


But it's still useful to get a type on them, mostly because it'll allow 
us to turn on that assert I mentioned last week.



jeff


Re: [PATCH] Fix tests sensitive to internal library allocations

2023-08-21 Thread Jonathan Wakely via Gcc-patches
On Mon, 21 Aug 2023 at 21:20, François Dumont  wrote:
>
> Here is the updated and tested patch.

OK for trunk, thanks.

We could consider it for the branches too (I'm going to remove the
global strings on the gcc-13 branch tomorrow).


>
> On 21/08/2023 20:07, Jonathan Wakely wrote:
> > On Mon, 21 Aug 2023 at 18:05, François Dumont via Libstdc++
> >  wrote:
> >> Hi
> >>
> >> Here is a proposal to fix tests sensitive to libstdc++ internal
> >> allocations.
> > Surely the enter() and exit() calls should be a constructor and destructor?
> >
> > The constructor could use count() to get the count, and then restore
> > it in the destructor. Something like:
> >
> > --- a/libstdc++-v3/testsuite/util/replacement_memory_operators.h
> > +++ b/libstdc++-v3/testsuite/util/replacement_memory_operators.h
> > @@ -75,12 +75,30 @@ namespace __gnu_test
> >counter& cntr = get();
> >cntr._M_increments = cntr._M_decrements = 0;
> >  }
> > +
> > +struct scope
> > +{
> > +  scope() : _M_count(counter::count()) { }
> > +  ~scope() { counter::get()._M_count = _M_count; }
> > +
> > +private:
> > +  std::size_t _M_count;
> > +
> > +#if __cplusplus >= 201103L
> > +  scope(const scope&) = delete;
> > +  scope& operator=(const scope&) = delete;
> > +#else
> > +  scope(const scope&);
> > +  scope& operator=(const scope&);
> > +#endif
> > +};
> >};
> >
> >template
> >  bool
> >  check_new(Alloc a = Alloc())
> >  {
> > +  __gnu_test::counter::scope s;
> >__gnu_test::counter::exceptions(false);
> >__gnu_test::counter::reset();
> >(void) a.allocate(10);
> >
> >
> >
> >
> >
> >
> >> Tested by restoring allocation in tzdb.cc.
> >>
> >> As announced I'm also adding a test to detect such allocations. If it is
> >> ok let me know if you prefer to see it in a different place.
> > The test is a good idea. I think 17_intro/no_library_allocation.cc
> > would be a better place for it.
> >
> >>   libstdc++: Fix tests relying on operator new/delete overload
> >>
> >>   Fix tests that are checking for an allocation plan. They are failing 
> >> if
> >>   an allocation is taking place outside the test.
> >>
> >>   libstdc++-v3/ChangeLog
> >>
> >>   * testsuite/util/replacement_memory_operators.h
> >>   (counter::_M_pre_enter_count): New.
> >>   (counter::enter, counter::exit): New static methods to call
> >> on main() enter/exit.
> >>   * testsuite/23_containers/unordered_map/96088.cc (main):
> >>   Call __gnu_test::counter::enter/exit.
> >>   * testsuite/23_containers/unordered_multimap/96088.cc
> >> (main): Likewise.
> >>   * testsuite/23_containers/unordered_multiset/96088.cc
> >> (main): Likewise.
> >>   * testsuite/23_containers/unordered_set/96088.cc (main):
> >> Likewise.
> >>   * testsuite/ext/malloc_allocator/deallocate_local.cc
> >> (main): Likewise.
> >>   * testsuite/ext/new_allocator/deallocate_local.cc (main):
> >> Likewise.
> >>   * testsuite/ext/throw_allocator/deallocate_local.cc (main):
> >> Likewise.
> >>   * testsuite/ext/pool_allocator/allocate_chunk.cc (started):
> >> New global.
> >>   (operator new(size_t)): Check started.
> >>   (main): Set/Unset started.
> >>   * testsuite/ext/no_library_allocation.cc: New test case.
> >>
> >> Ok to commit ?
> >>
> >> François
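
The save-and-restore idea suggested in the quoted review can be sketched on its own, outside the testsuite header. This is only a minimal illustration: the `counter` type, its `count()` accessor returning a reference, and `run_isolated_check` are hypothetical stand-ins, not the actual contents of replacement_memory_operators.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the testsuite's allocation counter.
struct counter
{
  // Access to the single global count (assumed interface for this sketch).
  static std::size_t& count()
  {
    static std::size_t n = 0;
    return n;
  }

  // RAII guard: remembers the count on entry and restores it on exit,
  // so allocations made before the check do not disturb its bookkeeping.
  struct scope
  {
    scope() : saved_(counter::count()) { }
    ~scope() { counter::count() = saved_; }

    scope(const scope&) = delete;
    scope& operator=(const scope&) = delete;

  private:
    std::size_t saved_;
  };
};

// Simulates a check that resets the counter, "allocates" once, and
// returns the count it observed while the guard was active.
inline std::size_t run_isolated_check()
{
  counter::scope guard;
  counter::count() = 0;   // the check starts from a clean slate
  ++counter::count();     // pretend one allocation happened
  return counter::count();
}
```

With a pre-existing count of 5, `run_isolated_check()` would observe 1 internally, and the count would be back at 5 once the guard is destroyed.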


Re: [PATCH] RISC-V: Add Types to Un-Typed Sync Instructions:

2023-08-21 Thread Jeff Law via Gcc-patches




On 8/21/23 10:51, Edwin Lu wrote:

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the sync instructions to ensure that no insn is left
without a type attribute. Updates a total of 6 insns to have type "atomic"

Tested for regressions using rv32/64 multilib with newlib/linux.
gcc/Changelog:

* config/riscv/sync-rvwmo.md: Added atomic type to insns
 missing types
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise

Signed-off-by: Edwin Lu 
---
  gcc/config/riscv/sync-rvwmo.md |  3 ++-
  gcc/config/riscv/sync-ztso.md  |  5 +++--
  gcc/config/riscv/sync.md   | 12 
  3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index 1fc7cf16b5b..4970d561211 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -41,7 +41,8 @@ (define_insn "mem_thread_fence_rvwmo"
  else
gcc_unreachable ();
}
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
  
  ;; Atomic memory operations.
  
diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md

index 91c2a48c069..c8968d01488 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -35,7 +35,8 @@ (define_insn "mem_thread_fence_ztso"
  else
gcc_unreachable ();
}
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
  
  ;; Atomic memory operations.

Those two are definitely OK.

  
@@ -77,4 +78,4 @@ (define_insn "atomic_store_ztso"

return "s\t%z1,%0";
}
[(set_attr "type" "atomic")
-   (set (attr "length") (const_int 8))])
\ No newline at end of file
+   (set (attr "length") (const_int 8))])
This raises a question.  We're likely better off using "multi" for a 
define_insn which generates multiple instructions.




diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 2f85951508f..d6c44afd9ca 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -136,7 +136,8 @@ (define_insn "subword_atomic_fetch_strong_"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
}
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 28)) ])

Similarly.

  
  (define_expand "atomic_fetch_nand"

[(match_operand:SHORT 0 "register_operand") ;; old value at mem
@@ -203,7 +204,8 @@ (define_insn "subword_atomic_fetch_strong_nand"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
}
-  [(set (attr "length") (const_int 32))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 32)) ])

Similarly.

  
  (define_expand "atomic_fetch_"

[(match_operand:SHORT 0 "register_operand");; old value at mem
@@ -310,7 +312,8 @@ (define_insn "subword_atomic_exchange_strong"
   "sc.w%J3\t%5, %5, %1\;"
   "bnez\t%5, 1b";
}
-  [(set (attr "length") (const_int 20))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 20))])

Similarly.

  
  (define_insn "atomic_cas_value_strong"

[(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -497,7 +500,8 @@ (define_insn "subword_atomic_cas_strong"
   "bnez\t%7, 1b\;"
   "1:";
}
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 28))])

Similarly.


Can you respin changing atomic to multi for those cases where we're 
generating more than one instruction out of a define_insn?


Thanks,
jeff


Re: [PATCH] bpf: neg instruction does not accept an immediate

2023-08-21 Thread Jose E. Marchesi via Gcc-patches


> The BPF virtual machine does not support neg nor neg32 instructions with
> an immediate.
>
> The erroneous instructions were removed from binutils:
> https://sourceware.org/pipermail/binutils/2023-August/129135.html
>
> Change the define_insn so that an immediate cannot be accepted.
>
> From testing, a neg-immediate was probably never chosen over a
> mov-immediate anyway.

OK.
Thanks!

>
> Tested on x86_64-linux-gnu host for bpf-unknown-none target.
>
> gcc/
>
>   * config/bpf/bpf.md (neg): Second operand must be a register.
> ---
>  gcc/config/bpf/bpf.md | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/bpf/bpf.md b/gcc/config/bpf/bpf.md
> index a64de1095ed..e87d72182bb 100644
> --- a/gcc/config/bpf/bpf.md
> +++ b/gcc/config/bpf/bpf.md
> @@ -163,8 +163,8 @@ (define_insn "sub3"
>  
>  ;;; Negation
>  (define_insn "neg2"
> -  [(set (match_operand:AM 0 "register_operand"   "=r,r")
> -(neg:AM (match_operand:AM 1 "reg_or_imm_operand" " 0,I")))]
> +  [(set (match_operand:AM 0 "register_operand" "=r")
> +(neg:AM (match_operand:AM 1 "register_operand" " 0")))]
>""
>"{neg\t%0|%w0 = -%w1}"
>[(set_attr "type" "")])


Re: [PATCH] Remove XFAIL from gcc/testsuite/gcc.dg/unroll-7.c

2023-08-21 Thread Thiago Jung Bauermann via Gcc-patches


Richard Sandiford  writes:

> Thiago Jung Bauermann via Gcc-patches  writes:
>> This test passes since commit e41103081bfa "Fix undefined behaviour in
>> profile_count::differs_from_p", so remove the xfail annotation.
>>
>> Tested on aarch64-linux-gnu, armv8l-linux-gnueabihf and x86_64-linux-gnu.
>>
>> gcc/testsuite/ChangeLog:
>>  * gcc.dg/unroll-7.c: Remove xfail.
>
> Thanks, pushed to trunk.  Sorry for the slow response.

Thank you! No problem.

-- 
Thiago


RE: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode intrinsic API

2023-08-21 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law  
Sent: Monday, August 21, 2023 11:06 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Support RVV VFWREDUSUM.VS rounding mode 
intrinsic API



On 8/17/23 02:05, Pan Li via Gcc-patches wrote:
> From: Pan Li 
> 
> This patch would like to support the rounding mode API for the
> VFWREDUSUM.VS as the below samples
> 
> * __riscv_vfwredusum_vs_f32m1_f64m1_rm
> * __riscv_vfwredusum_vs_f32m1_f64m1_rm_m
> 
> Signed-off-by: Pan Li 
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv-vector-builtins-bases.cc
>   (vfwredusum_frm_obj): New declaration.
>   (BASE): Ditto.
>   * config/riscv/riscv-vector-builtins-bases.h: Ditto.
>   * config/riscv/riscv-vector-builtins-functions.def
>   (vfwredusum_frm): New intrinsic function def.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/base/float-point-wredusum.c: New test.
OK
jeff


RE: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

2023-08-21 Thread Li, Pan2 via Gcc-patches
Thanks Kito and Jeff for the comments; I will double-check and address them in
v2.

Pan

-----Original Message-----
From: Kito Cheng  
Sent: Monday, August 21, 2023 11:07 PM
To: Jeff Law 
Cc: Li, Pan2 ; gcc-patches@gcc.gnu.org; 
juzhe.zh...@rivai.ai; Wang, Yanzhang 
Subject: Re: [PATCH v1] RISC-V: Refactor RVV class by frm_op_type template arg

Just one nit from me: please add an assertion to the OP_TYPE_vx case to make
sure that FRM_OP == HAS_FRM never occurs there.

On Mon, Aug 21, 2023 at 11:04 PM Jeff Law via Gcc-patches
 wrote:
>
>
>
> On 8/17/23 20:53, Pan Li via Gcc-patches wrote:
> > From: Pan Li 
> >
> > As suggested by kito, we will add new frm_opt_type template arg
> > to the op class, to avoid the duplicated function expand.
> >
> > Signed-off-by: Pan Li 
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv-vector-builtins-bases.cc
> >   (class binop_frm): Removed.
> >   (class reverse_binop_frm): Ditto.
> >   (class widen_binop_frm): Ditto.
> >   (class vfmacc_frm): Ditto.
> >   (class vfnmacc_frm): Ditto.
> >   (class vfmsac_frm): Ditto.
> >   (class vfnmsac_frm): Ditto.
> >   (class vfmadd_frm): Ditto.
> >   (class vfnmadd_frm): Ditto.
> >   (class vfmsub_frm): Ditto.
> >   (class vfnmsub_frm): Ditto.
> >   (class vfwmacc_frm): Ditto.
> >   (class vfwnmacc_frm): Ditto.
> >   (class vfwmsac_frm): Ditto.
> >   (class vfwnmsac_frm): Ditto.
> >   (class unop_frm): Ditto.
> >   (class vfrec7_frm): Ditto.
> >   (class binop): Add frm_op_type template arg.
> >   (class unop): Ditto.
> >   (class widen_binop): Ditto.
> >   (class widen_binop_fp): Ditto.
> >   (class reverse_binop): Ditto.
> >   (class vfmacc): Ditto.
> >   (class vfnmsac): Ditto.
> >   (class vfmadd): Ditto.
> >   (class vfnmsub): Ditto.
> >   (class vfnmacc): Ditto.
> >   (class vfmsac): Ditto.
> >   (class vfnmadd): Ditto.
> >   (class vfmsub): Ditto.
> >   (class vfwmacc): Ditto.
> >   (class vfwnmacc): Ditto.
> >   (class vfwmsac): Ditto.
> >   (class vfwnmsac): Ditto.
> >   (class float_misc): Ditto.
> So in the expand method, you added a case for OP_TYPE_vx.  I assume that
> was intentional -- but it's not mentioned anywhere in the ChangeLog.  So
> please update the ChangeLog if it was intentional or remove the change
> if it wasn't intentional.  Pre-approved with whichever change is
> appropriate.
>
> Thanks,
> Jeff


Re: [PATCH V5] VECT: Support loop len control on EXTRACT_LAST vectorization

2023-08-21 Thread Kewen.Lin via Gcc-patches
Hi Juzhe,

on 2023/8/21 18:59, Juzhe-Zhong wrote:
> Co-Authored-By: Kewen.Lin 
> 
> Hi, @Richi and @Richard, based on the previous discussion, I simply fixed the
> issues for powerpc and s390 with your suggestions:
> 
> -  machine_mode len_load_mode = get_len_load_store_mode
> -(loop_vinfo->vector_mode, true).require ();
> -  machine_mode len_store_mode = get_len_load_store_mode
> -(loop_vinfo->vector_mode, false).require ();
> +  machine_mode len_load_mode, len_store_mode;
> +  if (!get_len_load_store_mode (loop_vinfo->vector_mode, true)
> +.exists (&len_load_mode))
> +return false;
> +  if (!get_len_load_store_mode (loop_vinfo->vector_mode, false)
> +.exists (&len_store_mode))
> +return false;
> 
> Hi, @Kewen and @Stefan
> 
> Could you test this patch again ? Thanks.

I confirmed it's bootstrapped and regress-tested on
powerpc64le-linux-gnu P9/P10.  Thanks!

BR,
Kewen
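
The difference between the two query styles in the quoted hunk can be shown with a small self-contained sketch. Everything here is a simplified assumption for illustration: `opt_mode_sketch`, `get_len_mode_sketch` and `verify_lens_sketch` are hypothetical names, and the real GCC mode classes are considerably richer than this.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a machine mode enumeration.
enum machine_mode { VOIDmode, QImode, SImode };

// Hypothetical optional-mode wrapper with the two access styles.
class opt_mode_sketch
{
public:
  opt_mode_sketch() : m_mode(VOIDmode), m_valid(false) { }
  explicit opt_mode_sketch(machine_mode m) : m_mode(m), m_valid(true) { }

  // Asserts that a mode is present; fails hard if the target lacks one.
  machine_mode require() const
  {
    assert(m_valid);
    return m_mode;
  }

  // Reports whether a mode is present and, if so, stores it in *out.
  bool exists(machine_mode* out) const
  {
    if (m_valid)
      *out = m_mode;
    return m_valid;
  }

private:
  machine_mode m_mode;
  bool m_valid;
};

// Hypothetical query: a target without length-controlled loads/stores
// yields an empty result.
inline opt_mode_sketch get_len_mode_sketch(bool has_len_support)
{
  return has_len_support ? opt_mode_sketch(QImode) : opt_mode_sketch();
}

// Mirrors the shape of the fixed check: bail out with false instead of
// asserting when no suitable mode exists.
inline bool verify_lens_sketch(bool has_len_support)
{
  machine_mode len_mode;
  if (!get_len_mode_sketch(has_len_support).exists(&len_mode))
    return false;
  return len_mode == QImode;
}
```

In this sketch a target without support simply makes `verify_lens_sketch(false)` return false, whereas calling `require()` on the empty result would trip the assertion.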

> 
> Co-Authored-By: Kewen.Lin 
> 
> gcc/ChangeLog:
> 
>   * tree-vect-loop.cc (vect_verify_loop_lens): Add exists check.
>   (vectorizable_live_operation): Add live vectorization for length loop control.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/partial/live-1.c: New test.
>   * gcc.target/riscv/rvv/autovec/partial/live_run-1.c: New test.
> 
> ---
>  .../riscv/rvv/autovec/partial/live-1.c| 34 +++
>  .../riscv/rvv/autovec/partial/live_run-1.c| 35 
>  gcc/tree-vect-loop.cc | 89 ++-
>  3 files changed, 138 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> 
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
> new file mode 100644
> index 000..75fa2eba8cc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live-1.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fdump-tree-optimized-details" } */
> +
> +#include 
> +
> +#define EXTRACT_LAST(TYPE)                    \
> +  TYPE __attribute__ ((noinline, noclone))    \
> +  test_##TYPE (TYPE *x, int n, TYPE value)    \
> +  {                                           \
> +    TYPE last;                                \
> +    for (int j = 0; j < n; ++j)               \
> +      {                                       \
> +        last = x[j];                          \
> +        x[j] = last * value;                  \
> +      }                                       \
> +    return last;                              \
> +  }
> +
> +#define TEST_ALL(T)                           \
> +  T (int8_t)                                  \
> +  T (int16_t)                                 \
> +  T (int32_t)                                 \
> +  T (int64_t)                                 \
> +  T (uint8_t)                                 \
> +  T (uint16_t)                                \
> +  T (uint32_t)                                \
> +  T (uint64_t)                                \
> +  T (_Float16)                                \
> +  T (float)                                   \
> +  T (double)
> +
> +TEST_ALL (EXTRACT_LAST)
> +
> +/* { dg-final { scan-tree-dump-times "\.VEC_EXTRACT" 10 "optimized" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> new file mode 100644
> index 000..42913a112c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/live_run-1.c
> @@ -0,0 +1,35 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "--param riscv-autovec-preference=scalable" } */
> +
> +#include "live-1.c"
> +
> +#define N 107
> +#define OP 70
> +
> +#define TEST_LOOP(TYPE)  \
> +  {  \
> +TYPE a[N];   \
> +for (int i = 0; i < N; ++i)  

[pushed 1/6] analyzer: convert note_adding_context to annotating_context

2023-08-21 Thread David Malcolm via Gcc-patches
This is enabling work towards the context being able to inject
events into diagnostic paths, rather than just notes after the
warning.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3371-ge40a935db29cfd.


gcc/analyzer/ChangeLog:
* region-model.cc
(class check_external_function_for_access_attr::annotating_ctxt):
Convert to an annotating_context.
* region-model.h (class note_adding_context): Rename to...
(class annotating_context): ...this, updating the "warn" method.
(note_adding_context::make_note): Replace with...
(annotating_context::add_annotations): ...this.
---
 gcc/analyzer/region-model.cc | 12 ++--
 gcc/analyzer/region-model.h  | 14 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 494a9cdf149e..5c165ff127f8 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1641,23 +1641,23 @@ check_external_function_for_access_attr (const gcall *call,
   if (access->mode == access_write_only
  || access->mode == access_read_write)
{
- /* Subclass of decorated_region_model_context that
+ /* Subclass of annotating_context that
 adds a note about the attr access to any saved diagnostics.  */
- class annotating_ctxt : public note_adding_context
+ class annotating_ctxt : public annotating_context
  {
  public:
annotating_ctxt (tree callee_fndecl,
 const attr_access &access,
 region_model_context *ctxt)
-   : note_adding_context (ctxt),
+   : annotating_context (ctxt),
  m_callee_fndecl (callee_fndecl),
  m_access (access)
{
}
-   std::unique_ptr make_note () final override
+   void add_annotations () final override
{
- return make_unique
-   (m_callee_fndecl, m_access);
+ add_note (make_unique
+   (m_callee_fndecl, m_access));
}
  private:
tree m_callee_fndecl;
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 4f09f2e585ac..88772655bc5b 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -922,28 +922,28 @@ protected:
   region_model_context *m_inner;
 };
 
-/* Subclass of region_model_context_decorator that adds a note
-   when saving diagnostics.  */
+/* Subclass of region_model_context_decorator with a hook for adding
+   notes/events when saving diagnostics.  */
 
-class note_adding_context : public region_model_context_decorator
+class annotating_context : public region_model_context_decorator
 {
 public:
   bool warn (std::unique_ptr d) override
   {
 if (m_inner->warn (std::move (d)))
   {
-   add_note (make_note ());
+   add_annotations ();
return true;
   }
 else
   return false;
   }
 
-  /* Hook to make the new note.  */
-  virtual std::unique_ptr make_note () = 0;
+  /* Hook to add new event(s)/note(s)  */
+  virtual void add_annotations () = 0;
 
 protected:
-  note_adding_context (region_model_context *inner)
+  annotating_context (region_model_context *inner)
   : region_model_context_decorator (inner)
   {
   }
-- 
2.26.3
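
The shape of the change, a decorator whose warn method calls a subclass hook only when the inner context accepted the diagnostic, can be sketched independently of the analyzer. The types below (`context`, `recording_context`, `annotating_context_sketch`, `access_attr_context`) and the string-based diagnostics are hypothetical simplifications, not the analyzer's real interfaces.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal context interface (strings stand in for the
// diagnostic and note objects).
struct context
{
  virtual ~context() { }
  virtual bool warn(const std::string& msg) = 0;
  virtual void add_note(const std::string& note) = 0;
};

// Innermost context: records what it is given; can reject warnings.
struct recording_context : context
{
  std::vector<std::string> log;
  bool enabled = true;

  bool warn(const std::string& msg) override
  {
    if (!enabled)
      return false;
    log.push_back("warning: " + msg);
    return true;
  }

  void add_note(const std::string& note) override
  {
    log.push_back("note: " + note);
  }
};

// Decorator with a hook: annotations are added only after the inner
// context has accepted the warning.
class annotating_context_sketch : public context
{
public:
  bool warn(const std::string& msg) override
  {
    if (m_inner->warn(msg))
      {
        add_annotations();
        return true;
      }
    return false;
  }

  void add_note(const std::string& note) override
  {
    m_inner->add_note(note);
  }

  // Hook for subclasses; may add any number of notes (or none).
  virtual void add_annotations() = 0;

protected:
  explicit annotating_context_sketch(context* inner) : m_inner(inner) { }
  context* m_inner;
};

// Example subclass that attaches a single note.
struct access_attr_context : annotating_context_sketch
{
  explicit access_attr_context(context* inner)
    : annotating_context_sketch(inner) { }

  void add_annotations() override
  {
    add_note("parameter marked with attribute 'access'");
  }
};
```

Compared with a hook that returns exactly one note, a void hook leaves the subclass free to add several annotations of different kinds, which seems to be the flexibility the commit message is after.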



[pushed 3/6] analyzer: handle NULL inner context in region_model_context_decorator

2023-08-21 Thread David Malcolm via Gcc-patches
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3373-g1e7b0a5d7a45dc.

gcc/analyzer/ChangeLog:
* region-model.cc (region_model_context_decorator::add_event):
Handle m_inner being NULL.
* region-model.h (class region_model_context_decorator): Likewise.
(annotating_context::warn): Likewise.
---
 gcc/analyzer/region-model.cc |  3 +-
 gcc/analyzer/region-model.h  | 86 
 2 files changed, 60 insertions(+), 29 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index fa30193943d2..ed93fb89f933 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5875,7 +5875,8 @@ noop_region_model_context::terminate_path ()
 void
 region_model_context_decorator::add_event (std::unique_ptr event)
 {
-  m_inner->add_event (std::move (event));
+  if (m_inner)
+m_inner->add_event (std::move (event));
 }
 
 /* struct model_merger.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index cdfce0727cf7..a01399c8e85a 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -813,93 +813,118 @@ class region_model_context_decorator : public region_model_context
  public:
   bool warn (std::unique_ptr d) override
   {
-return m_inner->warn (std::move (d));
+if (m_inner)
+  return m_inner->warn (std::move (d));
+else
+  return false;
   }
 
   void add_note (std::unique_ptr pn) override
   {
-m_inner->add_note (std::move (pn));
+if (m_inner)
+  m_inner->add_note (std::move (pn));
   }
   void add_event (std::unique_ptr event) override;
 
   void on_svalue_leak (const svalue *sval) override
   {
-m_inner->on_svalue_leak (sval);
+if (m_inner)
+  m_inner->on_svalue_leak (sval);
   }
 
   void on_liveness_change (const svalue_set &live_svalues,
   const region_model *model) override
   {
-m_inner->on_liveness_change (live_svalues, model);
+if (m_inner)
+  m_inner->on_liveness_change (live_svalues, model);
   }
 
   logger *get_logger () override
   {
-return m_inner->get_logger ();
+if (m_inner)
+  return m_inner->get_logger ();
+else
+  return nullptr;
   }
 
   void on_condition (const svalue *lhs,
 enum tree_code op,
 const svalue *rhs) override
   {
-m_inner->on_condition (lhs, op, rhs);
+if (m_inner)
+  m_inner->on_condition (lhs, op, rhs);
   }
 
   void on_bounded_ranges (const svalue &sval,
  const bounded_ranges &ranges) override
   {
-m_inner->on_bounded_ranges (sval, ranges);
+if (m_inner)
+  m_inner->on_bounded_ranges (sval, ranges);
   }
 
   void on_pop_frame (const frame_region *frame_reg) override
   {
-m_inner->on_pop_frame (frame_reg);
+if (m_inner)
+  m_inner->on_pop_frame (frame_reg);
   }
 
   void on_unknown_change (const svalue *sval, bool is_mutable) override
   {
-m_inner->on_unknown_change (sval, is_mutable);
+if (m_inner)
+  m_inner->on_unknown_change (sval, is_mutable);
   }
 
   void on_phi (const gphi *phi, tree rhs) override
   {
-m_inner->on_phi (phi, rhs);
+if (m_inner)
+  m_inner->on_phi (phi, rhs);
   }
 
   void on_unexpected_tree_code (tree t,
const dump_location_t &loc) override
   {
-m_inner->on_unexpected_tree_code (t, loc);
+if (m_inner)
+  m_inner->on_unexpected_tree_code (t, loc);
   }
 
   void on_escaped_function (tree fndecl) override
   {
-m_inner->on_escaped_function (fndecl);
+if (m_inner)
+  m_inner->on_escaped_function (fndecl);
   }
 
   uncertainty_t *get_uncertainty () override
   {
-return m_inner->get_uncertainty ();
+if (m_inner)
+  return m_inner->get_uncertainty ();
+else
+  return nullptr;
   }
 
   void purge_state_involving (const svalue *sval) override
   {
-m_inner->purge_state_involving (sval);
+if (m_inner)
+  m_inner->purge_state_involving (sval);
   }
 
   void bifurcate (std::unique_ptr info) override
   {
-m_inner->bifurcate (std::move (info));
+if (m_inner)
+  m_inner->bifurcate (std::move (info));
   }
 
   void terminate_path () override
   {
-m_inner->terminate_path ();
+if (m_inner)
+  m_inner->terminate_path ();
   }
 
   const extrinsic_state *get_ext_state () const override
   {
-return m_inner->get_ext_state ();
+if (m_inner)
+  return m_inner->get_ext_state ();
+else
+  return nullptr;
   }
 
   bool get_state_map_by_name (const char *name,
@@ -909,20 +934,25 @@ class region_model_context_decorator : public region_model_context
  std::unique_ptr *out_sm_context)
 override
   {
-return m_inner->get_state_map_by_name (name, out_smap, out_sm, out_sm_idx,
-  out_sm_context);
+if (m_inner)
+  return m_inner->get_state_map_by_

[pushed 5/6] analyzer: add kf_fopen

2023-08-21 Thread David Malcolm via Gcc-patches
Add checking to -fanalyzer that both params of calls to "fopen" are
valid null-terminated strings.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3375-g4325c82736d9e8.

gcc/analyzer/ChangeLog:
* kf.cc (class kf_fopen): New.
(register_known_functions): Register it.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/fopen-1.c: New test.
---
 gcc/analyzer/kf.cc  | 28 +++
 gcc/testsuite/gcc.dg/analyzer/fopen-1.c | 66 +
 2 files changed, 94 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/fopen-1.c

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 6b2db8613768..1601cf15c685 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -420,6 +420,33 @@ kf_error::impl_call_pre (const call_details &cd) const
   model->check_for_null_terminated_string_arg (cd, fmt_arg_idx);
 }
 
+/* Handler for fopen.
+ FILE *fopen (const char *filename, const char *mode);
+   See e.g. https://en.cppreference.com/w/c/io/fopen
+   https://www.man7.org/linux/man-pages/man3/fopen.3.html
+   https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170  */
+
+class kf_fopen : public known_function
+{
+public:
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+return (cd.num_args () == 2
+   && cd.arg_is_pointer_p (0)
+   && cd.arg_is_pointer_p (1));
+  }
+
+  void impl_call_pre (const call_details &cd) const final override
+  {
+cd.check_for_null_terminated_string_arg (0);
+cd.check_for_null_terminated_string_arg (1);
+cd.set_any_lhs_with_defaults ();
+
+/* fopen's mode param is effectively a mini-DSL, but there are various
+   non-standard extensions, so we don't bother to check it.  */
+  }
+};
+
 /* Handler for "free", after sm-handling.
 
If the ptr points to an underlying heap region, delete the region,
@@ -1422,6 +1449,7 @@ register_known_functions (known_function_manager &kfm)
 
   /* Known POSIX functions, and some non-standard extensions.  */
   {
+kfm.add ("fopen", make_unique ());
 kfm.add ("putenv", make_unique ());
 
 register_known_fd_functions (kfm);
diff --git a/gcc/testsuite/gcc.dg/analyzer/fopen-1.c 
b/gcc/testsuite/gcc.dg/analyzer/fopen-1.c
new file mode 100644
index ..e5b00e93b6da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/fopen-1.c
@@ -0,0 +1,66 @@
+typedef struct FILE FILE;
+FILE *fopen (const char *pathname, const char *mode);
+#define NULL ((void *)0)
+
+FILE *
+test_passthrough (const char *pathname, const char *mode)
+{
+  return fopen (pathname, mode);
+}
+
+FILE *
+test_null_pathname (const char *pathname, const char *mode)
+{
+  return fopen (NULL, mode);
+}
+
+FILE *
+test_null_mode (const char *pathname)
+{
+  return fopen (pathname, NULL);
+}
+
+FILE *
+test_simple_r (void)
+{
+  return fopen ("foo.txt", "r");
+}
+
+FILE *
+test_swapped_args (void)
+{
+  return fopen ("r", "foo.txt"); /* TODO: would be nice to detect this.  */
+}
+
+FILE *
+test_unterminated_pathname (const char *mode)
+{
+  char buf[3] = "abc";
+  return fopen (buf, mode); /* { dg-warning "stack-based buffer over-read" } */
+  /* { dg-message "while looking for null terminator for argument 1 \\('&buf'\\) of 'fopen'..." "event" { target *-*-* } .-1 } */
+}
+
+FILE *
+test_unterminated_mode (const char *filename)
+{
+  char buf[3] = "abc";
+  return fopen (filename, buf);  /* { dg-warning "stack-based buffer over-read" } */
+  /* { dg-message "while looking for null terminator for argument 2 \\('&buf'\\) of 'fopen'..." "event" { target *-*-* } .-1 } */
+}
+
+FILE *
+test_uninitialized_pathname (const char *mode)
+{
+  char buf[10];
+  return fopen (buf, mode); /* { dg-warning "use of uninitialized value 'buf\\\[0\\\]'" } */
+  /* { dg-message "while looking for null terminator for argument 1 \\('&buf'\\) of 'fopen'..." "event" { target *-*-* } .-1 } */
+}
+
+FILE *
+test_uninitialized_mode (const char *filename)
+{
+  char buf[10];
+  return fopen (filename, buf); /* { dg-warning "use of uninitialized value 'buf\\\[0\\\]'" } */
+  /* { dg-message "while looking for null terminator for argument 2 \\('&buf'\\) of 'fopen'..." "event" { target *-*-* } .-1 } */
+}
+
-- 
2.26.3
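
As a runtime analogue of what the unterminated-buffer tests exercise, the following sketch checks whether a fixed-size buffer contains a terminator within its bounds. It is only an illustration under stated assumptions: `has_terminator` and the two helper functions are hypothetical, and the analyzer itself reasons symbolically at compile time rather than scanning memory like this. Note that the C initializer `char buf[3] = "abc"` used in the test is not valid C++, so the sketch spells the characters out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Returns true if a '\0' occurs within the first `size` bytes of `buf`.
// A string function reading past `size` without finding one would be
// performing the kind of over-read the tests expect a warning for.
inline bool has_terminator(const char* buf, std::size_t size)
{
  return std::memchr(buf, '\0', size) != nullptr;
}

// Three characters in a three-byte buffer: no room for a terminator.
inline bool unterminated_case()
{
  const char buf[3] = {'a', 'b', 'c'};
  return has_terminator(buf, sizeof buf);
}

// Same characters in a four-byte buffer: the literal's '\0' fits.
inline bool terminated_case()
{
  const char buf[4] = "abc";
  return has_terminator(buf, sizeof buf);
}
```

Here `unterminated_case()` reports that no terminator is present, which corresponds to the "stack-based buffer over-read" expectation in the test, while `terminated_case()` finds one.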


