date:20230420

Re: [PATCH] Less warnings for parameters declared as arrays [PR98541, PR98536]

2023-04-20 Thread Martin Uecker via Gcc-patches

On Tuesday, 04.04.2023 at 19:31 -0600, Jeff Law wrote:
> 
> On 4/3/23 13:34, Martin Uecker via Gcc-patches wrote:
> > 
> > 
> > With the relatively new warnings (since GCC 11) affecting VLA bounds,
> > I now get a lot of false positives with -Wall. In general, I find
> > the new warnings very useful, but they seem a bit too
> > aggressive and some minor tweaks are needed, otherwise they are
> > too noisy.  This patch suggests two changes:
> > 
> > 1. For VLA bounds non-null is implied only when 'static' is
> > used (similar to clang) and not already when a bound > 0 is
> > specified:
> > 
> > int foo(int n, char buf[static n]);
> > 
> > foo(10, 0); // warning with 'static' but not without.
> > 
> > 
> > (It also seems problematic to require a size of 0 to indicate
> > that the pointer may be null, because 0 is not allowed in
> > ISO C as a size. It is also inconsistent with how arrays with
> > static bound behave.)
> > 
> > There seems to be agreement about this change in PR98541.
> > 
> > 
> > 2. GCC always warns when the number of unspecified
> > bounds is different between two declarations:
> > 
> > int foo(int n, char buf[*]);
> > int foo(int n, char buf[n]);
> > 
> > or
> > 
> > int foo(int n, char buf[n]);
> > int foo(int n, char buf[*]);
> > 
> > But the first version is useful if the size expression
> > cannot be specified in a header (e.g. because it uses
> > a macro or variable not available there) and there is
> > currently no easy way to avoid this.  The warning for
> > both cases was by design, but I suggest limiting the
> > warning to the second case.
> > 
> > Note that the logic currently applied by GCC is too
> > simplistic anyway, as GCC does not warn for
> > 
> > int foo(int x, int y, double m[*][y]);
> > int foo(int x, int y, double m[x][*]);
> > 
> > because the number of specified / unspecified bounds
> > is the same.  So I suggest going with the attached
> > patch now and adding more precise warnings later
> > if there is more experience with these warnings
> > in general and if this then still seems desirable.
> > 
> > 
> > Martin
> > 
> > 
> >  Less warnings for parameters declared as arrays [PR98541, PR98536]
> >
> >  To avoid false positives, tune the warnings for parameters declared
> >  as arrays with size expressions.  Only warn about null arguments with
> >  'static'.  Also do not warn when more bounds are specified in the new
> >  declaration than before.
> >
> >  PR c/98541
> >  PR c/98536
> >
> >  c-family/
> >  * c-warn.cc (warn_parm_array_mismatch): Do not warn if more
> >  bounds are specified.
> >
> >  gcc/
> >  * gimple-ssa-warn-access.cc
> >  (pass_waccess::maybe_check_access_sizes): For VLA bounds
> >  in parameters, only warn about null pointers with 'static'.
> >
> >  gcc/testsuite:
> >  * gcc.dg/Wnonnull-4.c: Adapt test.
> >  * gcc.dg/Wstringop-overflow-40.c: Adapt test.
> >  * gcc.dg/Wvla-parameter-4.c: Adapt test.
> >  * gcc.dg/attr-access-2.c: Adapt test.
> Neither appears to be a regression.  Seems like it should defer to gcc-14.

Then ok for trunk now?

Martin
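The two proposed changes can be illustrated with a short C sketch. The function names (`sum_static`, `sum_plain`, `fill_x`) are hypothetical and not taken from the patch or its testcases. For point 1: with `static n` the caller promises that `buf` points to at least `n` elements, so passing NULL is undefined and a warning fits; with a plain `n` the bound only documents the expected size, and under the proposed change GCC would stop warning about a call such as `sum_plain (10, NULL)`. For point 2: `[*]` is only allowed in prototypes, so a header can leave the bound unspecified while the definition spells it out; this "[*] first, [n] later" order is the one the patch would no longer warn about. In this toy version the bound is just `n`, whereas the motivating case uses an expression not visible in the header.

```c
#include <stddef.h>

/* Point 1: 'static n' promises BUF points to at least N chars,
   so a null argument is undefined behavior.  */
int sum_static (int n, char buf[static n])
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += buf[i];
  return s;
}

/* Point 1: plain 'n' only documents the expected size; NULL is still
   a valid argument, so the function checks for it.  */
int sum_plain (int n, char buf[n])
{
  if (buf == NULL)
    return 0;
  int s = 0;
  for (int i = 0; i < n; i++)
    s += buf[i];
  return s;
}

/* Point 2: header-style prototype with an unspecified bound.
   '[*]' is only valid in function prototype scope.  */
int fill_x (int n, char buf[*]);

/* Point 2: the definition specifies the bound.  */
int fill_x (int n, char buf[n])
{
  for (int i = 0; i < n; i++)
    buf[i] = 'x';
  return n;
}
```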

[PATCH] tree-vect-patterns: Pattern recognize ctz or ffs using clz, popcount or ctz [PR109011]

2023-04-20 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch allows vectorizing __builtin_ffs*/.FFS even if
we just have vector .CTZ support, or __builtin_ffs*/.FFS/__builtin_ctz*/.CTZ
if we just have vector .CLZ or .POPCOUNT support.
It uses various expansions from Hacker's Delight book as well as GCC's
expansion, in particular:
.CTZ (X) = PREC - .CLZ ((X - 1) & ~X)
.CTZ (X) = .POPCOUNT ((X - 1) & ~X)
.CTZ (X) = (PREC - 1) - .CLZ (X & -X)
.FFS (X) = PREC - .CLZ (X & -X)
.CTZ (X) = PREC - .POPCOUNT (X | -X)
.FFS (X) = (PREC + 1) - .POPCOUNT (X | -X)
.FFS (X) = .CTZ (X) + 1
where the first one can only be used if both CTZ and CLZ have a value
defined at zero (kind 2) and both have the value PREC there.
For the other forms, if the original has a value defined at zero and the
replacement doesn't, or doesn't have a matching value there, a COND_EXPR
is added afterwards to handle that case.
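The identities above can be spot-checked for PREC = 32 with a small C sketch. All helper names (`ctz_ref`, `ffs_ref`, `clz32`, `popcount32`, `identities_hold`) are illustrative. `clz32` models a CLZ whose value at zero is defined as PREC; that is an assumption made for the check, since `__builtin_clz (0)` itself is undefined. The three forms guarded by `x != 0` are the ones that give the wrong answer at zero and therefore need the COND_EXPR fix-up.

```c
#include <stdint.h>

#define PREC 32

/* Reference ctz computed bit by bit; ctz_ref (0) == PREC.  */
static int
ctz_ref (uint32_t x)
{
  int n = 0;
  while (n < PREC && !(x & ((uint32_t) 1 << n)))
    n++;
  return n;
}

/* Reference ffs; ffs_ref (0) == 0.  */
static int
ffs_ref (uint32_t x)
{
  return x ? ctz_ref (x) + 1 : 0;
}

/* Assumed CLZ with a defined value of PREC at zero.  */
static int
clz32 (uint32_t x)
{
  return x ? __builtin_clz (x) : PREC;
}

static int
popcount32 (uint32_t x)
{
  return __builtin_popcount (x);
}

/* Return 1 if every rewrite from the list matches the reference for X.  */
static int
identities_hold (uint32_t x)
{
  uint32_t tmask = (x - 1) & ~x; /* ones below the lowest set bit */
  uint32_t low = x & -x;         /* lowest set bit isolated */
  uint32_t up = x | -x;          /* lowest set bit and all bits above */
  int ok = 1;

  /* These agree for every x, given CTZ (0) == CLZ (0) == PREC.  */
  ok &= (ctz_ref (x) == PREC - clz32 (tmask));
  ok &= (ctz_ref (x) == popcount32 (tmask));
  ok &= (ctz_ref (x) == PREC - popcount32 (up));
  ok &= (ffs_ref (x) == PREC - clz32 (low));

  /* These are wrong at zero and need a fix-up there.  */
  if (x != 0)
    {
      ok &= (ctz_ref (x) == (PREC - 1) - clz32 (low));
      ok &= (ffs_ref (x) == (PREC + 1) - popcount32 (up));
      ok &= (ffs_ref (x) == ctz_ref (x) + 1);
    }
  return ok;
}
```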

The patch also modifies vect_recog_popcount_clz_ctz_ffs_pattern
such that the two can work together.
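For context, a scalar loop of the kind this pattern is aimed at might look like the following. It is a hypothetical example, not one of the new pr109011 testcases. Whether it is vectorized, and via which of the expansions, depends on which vector optabs the target provides; `-O3 -fdump-tree-vect-details` shows the vectorizer's decisions.

```c
/* One __builtin_ffs per element; with only vector CLZ, POPCOUNT or CTZ
   available, the new pattern could rewrite the call in vector form.  */
void
ffs_loop (int *restrict out, const int *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = __builtin_ffs (in[i]);
}
```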

Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested
on the testcases on powerpc64le-linux and s390x-linux crosses, ok for trunk?

2023-04-20  Jakub Jelinek  

PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): New function.
(vect_recog_popcount_clz_ctz_ffs_pattern): Move vect_pattern_detected
call later.  Don't punt for IFN_CTZ or IFN_FFS if it doesn't have
direct optab support, but has instead IFN_CLZ, IFN_POPCOUNT or
for IFN_FFS IFN_CTZ support, use vect_recog_ctz_ffs_pattern for that
case.
(vect_vect_recog_func_ptrs): Add ctz_ffs entry.

* gcc.dg/vect/pr109011-1.c: Remove -mpower9-vector from
dg-additional-options.
(baz, qux): Remove functions and corresponding dg-final.
* gcc.dg/vect/pr109011-2.c: New test.
* gcc.dg/vect/pr109011-3.c: New test.
* gcc.dg/vect/pr109011-4.c: New test.
* gcc.dg/vect/pr109011-5.c: New test.

--- gcc/tree-vect-patterns.cc.jj   2023-04-19 11:14:17.445843870 +0200
+++ gcc/tree-vect-patterns.cc   2023-04-19 20:49:27.946432713 +0200
@@ -1501,6 +1501,266 @@ vect_recog_widen_minus_pattern (vec_info
  "vect_recog_widen_minus_pattern");
 }
 
+/* Function vect_recog_ctz_ffs_pattern
+
+   Try to find the following pattern:
+
+   TYPE1 A;
+   TYPE1 B;
+
+   B = __builtin_ctz{,l,ll} (A);
+
+   or
+
+   B = __builtin_ffs{,l,ll} (A);
+
+   Input:
+
+   * STMT_VINFO: The stmt from which the pattern search begins.
+   here it starts with B = __builtin_* (A);
+
+   Output:
+
+   * TYPE_OUT: The vector type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the sequence of
+   stmts that constitute the pattern, using clz or popcount builtins.  */
+
+static gimple *
+vect_recog_ctz_ffs_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
+                            tree *type_out)
+{
+  gimple *call_stmt = stmt_vinfo->stmt;
+  gimple *pattern_stmt;
+  tree rhs_oprnd, rhs_type, lhs_oprnd, lhs_type, vec_type, vec_rhs_type;
+  tree new_var;
+  internal_fn ifn = IFN_LAST, ifnnew = IFN_LAST;
+  bool defined_at_zero = true, defined_at_zero_new = false;
+  int val = 0, val_new = 0;
+  int prec;
+  int sub = 0, add = 0;
+  location_t loc;
+
+  if (!is_gimple_call (call_stmt))
+    return NULL;
+
+  if (gimple_call_num_args (call_stmt) != 1)
+    return NULL;
+
+  rhs_oprnd = gimple_call_arg (call_stmt, 0);
+  rhs_type = TREE_TYPE (rhs_oprnd);
+  lhs_oprnd = gimple_call_lhs (call_stmt);
+  if (!lhs_oprnd)
+    return NULL;
+  lhs_type = TREE_TYPE (lhs_oprnd);
+  if (!INTEGRAL_TYPE_P (lhs_type)
+      || !INTEGRAL_TYPE_P (rhs_type)
+      || !type_has_mode_precision_p (rhs_type)
+      || TREE_CODE (rhs_oprnd) != SSA_NAME)
+    return NULL;
+
+  switch (gimple_call_combined_fn (call_stmt))
+    {
+    CASE_CFN_CTZ:
+      ifn = IFN_CTZ;
+      if (!gimple_call_internal_p (call_stmt)
+          || CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (rhs_type),
+                                        val) != 2)
+        defined_at_zero = false;
+      break;
+    CASE_CFN_FFS:
+      ifn = IFN_FFS;
+      break;
+    default:
+      return NULL;
+    }
+
+  prec = TYPE_PRECISION (rhs_type);
+  loc = gimple_location (call_stmt);
+
+  vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
+  if (!vec_type)
+    return NULL;
+
+  vec_rhs_type = get_vectype_for_scalar_type (vinfo, rhs_type);
+  if (!vec_rhs_type)
+    return NULL;
+
+  /* Do it only if the backend doesn't have ctz2 or
+     ffs2 pattern but does have clz2 or
+     popcount2.  */
+  if (!vec_type
+      || direct_internal_fn_supported_p (ifn, vec_rhs_type,
+                                         OPTIMIZE_FOR_SPEED))
+    return NULL;
+
+  if (ifn == IFN_FFS
+      && direct_internal_fn_supported_p (IFN_CTZ, vec_rhs_type,
+                                         OPTIMIZE_FOR_SPEED))
+    {
+      ifnnew = IFN_CTZ;
+      defined_at_zero_new
+        = CTZ_DEFINED_VALUE_AT

Re: [PATCH 2/1, rs6000] make ppc_cpu_supports_hw as effective target keyword [PR108728]

2023-04-20 Thread Kewen.Lin via Gcc-patches

Hi,

on 2023/4/20 14:04, HAO CHEN GUI wrote:
> Hi,
>   This patch adds ppc_cpu_supports_hw into explicit name checking in
> proc is-effective-target-keyword. So ppc_cpu_supports_hw can be used
> as a target selector in test directives. It's required by patch2 of
> this issue.

OK for trunk, thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> testsuite: make ppc_cpu_supports_hw as effective target keyword [PR108728]
> 
> gcc/testsuite/
>   PR target/108728
>   * lib/target-supports.exp (is-effective-target-keyword): Add
>   ppc_cpu_supports_hw.
> 
> 
> patch.diff
> diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
> index 1d6cc6f8d88..e65b447663f 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -9170,6 +9170,7 @@ proc is-effective-target-keyword { arg } {
> "named_sections" { return 1 }
> "gc_sections" { return 1 }
> "cxa_atexit" { return 1 }
> +   "ppc_cpu_supports_hw" { return 1 }
> default  { return 0 }
>   }
>  }

Re: [PATCH 2/2, rs6000] xfail float128 comparison test case that fails on powerpc64 [PR108728]

2023-04-20 Thread Kewen.Lin via Gcc-patches

Hi,

on 2023/4/20 14:04, HAO CHEN GUI wrote:
> Hi,
>   This patch xfails a float128 comparison test case on powerpc64
> that fails due to a longstanding issue with floating-point
> compares.
> 
>   See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more
> information.
> 
>   The patch passed regression test on Power Linux platforms.
> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: xfail float128 comparison test case that fails on powerpc64.
> 
> This patch xfails a float128 comparison test cases on powerpc64 that

s/cases/case/

> fails due to a longstanding issue with floating-point compares.
> 
> See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58684 for more information.

You can just use PR58684 for short.  :)

> 
> When float128 hardware is enabled (-mfloat128-hardware), xscmpuqp is
> generated for comparison which is unexpected.   When float128 software
> simulation is enabled (-mno-float128-hardware), we still have to xfail

s/simulation/emulation/

OK for trunk, thanks!

BR,
Kewen

> the hardware version (__lekf2_hw) which finally invokes xscmpuqp.
> 
> gcc/testsuite/
>   PR target/108728
>   * gcc.dg/torture/float128-cmp-invalid.c: Add xfail.
> 
> patch.diff
> diff --git a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> index 1f675efdd61..a86592b3328 100644
> --- a/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> +++ b/gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c
> @@ -1,5 +1,6 @@
>  /* Test for "invalid" exceptions from __float128 comparisons.  */
>  /* { dg-do run } */
> +/* { dg-xfail-run-if "ppc float128_hw" { ppc_float128_hw || { ppc_cpu_supports_hw && p9vector_hw } } } */
>  /* { dg-options "" } */
>  /* { dg-require-effective-target __float128 } */
>  /* { dg-require-effective-target base_quadfloat_support } */

Re: [13 PATCH RFA] c++: fix 'unsigned __int128_t' semantics [PR108099]

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Wed, Apr 19, 2023 at 11:20:09AM -0400, Jason Merrill wrote:
>   * g++.dg/ext/int128-8.C: New test.

The testcase needs to be restricted to int128 effective targets;
as expected, it fails on i386 and other 32-bit targets.
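Outside the testsuite, C code can make the same distinction with the `__SIZEOF_INT128__` macro, which GCC predefines only on targets that support `__int128`. The sketch below is illustrative (the name `mul_high_u64` is hypothetical): it returns the high 64 bits of a 64x64-bit product, using `unsigned __int128` when available and a 32-bit-limb fallback otherwise. On 64-bit hosts only the first branch is compiled.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product A * B.  */
static uint64_t
mul_high_u64 (uint64_t a, uint64_t b)
{
#ifdef __SIZEOF_INT128__
  return (uint64_t) (((unsigned __int128) a * b) >> 64);
#else
  /* Schoolbook multiply on 32-bit halves; CROSS cannot overflow.  */
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;
  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;
  uint64_t cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;
  return hi_hi + (hi_lo >> 32) + (cross >> 32);
#endif
}
```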

Tested using
GXX_TESTSUITE_STDS=98,11,14,17,20,2b make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} dg.exp=int128*.C'
on x86_64-linux before/after, committed to trunk and cherry-picked to
12 branch as obvious.

2023-04-20  Jakub Jelinek  

PR c++/108099
PR testsuite/109560
* g++.dg/ext/int128-8.C: Require int128 effective target.

--- gcc/testsuite/g++.dg/ext/int128-8.C.jj  2023-04-20 09:36:09.106375587 +0200
+++ gcc/testsuite/g++.dg/ext/int128-8.C 2023-04-20 09:37:02.429592525 +0200
@@ -1,5 +1,5 @@
 // PR c++/108099
-// { dg-do compile { target c++11 } }
+// { dg-do compile { target { c++11 && int128 } } }
 // { dg-options "" }
 
 using u128 = unsigned __int128_t;

Jakub

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Sandiford via Gcc-patches

钟居哲  writes:
> Hi, Richards.
> Since GCC 14 is open and this patch has been bootstrapped && tested on X86.
> Is this patch supporting variable IV OK for the trunk ?

Doesn't the patch need updating based on the previous discussion?
I thought the outcome was that WHILE_LEN isn't a simple MIN operation
(contrary to the documentation in the patch) and that pointer IVs
would also need to be updated by a variable amount, given that even
non-final iterations might process fewer than VF elements.

Thanks,
Richard
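A scalar C model of the point about variable-amount IVs (all names are hypothetical). It assumes a vsetvli-like length computation following the RVV 1.0 rules for setting `vl`: when AVL < 2 * VLMAX, an implementation may return any value between ceil(AVL / 2) and VLMAX. The model picks the lower bound, so a non-final iteration can be partial, and the pointers must then advance by the length actually processed rather than by VF. (The quoted assembly advances `a4`/`a3` by `vlenb` on every iteration.)

```c
#include <stddef.h>

/* Hypothetical vsetvli-like length: AVL if it fits, VLMAX if
   AVL >= 2 * VLMAX, else the smallest legal value ceil(AVL / 2).  */
static size_t
model_while_len (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;
  return vf;
}

/* Copy loop with a decrementing element count.  VF must be nonzero.
   Pointers advance by LEN, not by VF.  */
static void
copy_by_len (int *dest, const int *src, size_t n, size_t vf)
{
  while (n > 0)
    {
      size_t len = model_while_len (n, vf);
      for (size_t i = 0; i < len; i++)   /* stands in for vle32/vse32 */
        dest[i] = src[i];
      src += len;
      dest += len;
      n -= len;
    }
}
```

With n = 10 and VF = 4 the model processes 4, then 3, then 3 elements, so the second (non-final) iteration is partial.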

> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-04-07 09:47
> To: gcc-patches
> CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
> Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> From: Juzhe-Zhong 
>  
> This patch is to add WHILE_LEN pattern.
> It's inspired by RVV ISA simple "vvaddint32.s" example:
> https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
>  
> More details are in "vect_set_loop_controls_by_while_len" implementation
> and comments.
>  
> Consider such following case:
> #define N 16
> int src[N];
> int dest[N];
>  
> void
> foo (int n)
> {
>   for (int i = 0; i < n; i++)
>     dest[i] = src[i];
> }
>  
> -march=rv64gcv -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -fno-tree-loop-distribute-patterns:
>  
> foo:
>         ble     a0,zero,.L1
>         lui     a4,%hi(.LANCHOR0)
>         addi    a4,a4,%lo(.LANCHOR0)
>         addi    a3,a4,64
>         csrr    a2,vlenb
> .L3:
>         vsetvli a5,a0,e32,m1,ta,ma
>         vle32.v v1,0(a4)
>         sub     a0,a0,a5
>         vse32.v v1,0(a3)
>         add     a4,a4,a2
>         add     a3,a3,a2
>         bne     a0,zero,.L3
> .L1:
>         ret
>  
> gcc/ChangeLog:
>  
> * doc/md.texi: Add WHILE_LEN support.
> * internal-fn.cc (while_len_direct): Ditto.
> (expand_while_len_optab_fn): Ditto.
> (direct_while_len_optab_supported_p): Ditto.
> * internal-fn.def (WHILE_LEN): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-ssa-loop-manip.cc (create_iv): Ditto.
> * tree-ssa-loop-manip.h (create_iv): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): Ditto.
> (vect_set_loop_condition_partial_vectors): Ditto.
> * tree-vect-loop.cc (vect_get_loop_len): Ditto.
> * tree-vect-stmts.cc (vectorizable_store): Ditto.
> (vectorizable_load): Ditto.
> * tree-vectorizer.h (vect_get_loop_len): Ditto.
>  
> ---
> gcc/doc/md.texi |  14 +++
> gcc/internal-fn.cc  |  29 ++
> gcc/internal-fn.def |   1 +
> gcc/optabs.def  |   1 +
> gcc/tree-ssa-loop-manip.cc  |   4 +-
> gcc/tree-ssa-loop-manip.h   |   2 +-
> gcc/tree-vect-loop-manip.cc | 186 ++--
> gcc/tree-vect-loop.cc   |  35 +--
> gcc/tree-vect-stmts.cc  |   9 +-
> gcc/tree-vectorizer.h   |   4 +-
> 10 files changed, 264 insertions(+), 21 deletions(-)
>  
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 8e3113599fd..72178ab014c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
> @end smallexample
> +@cindex @code{while_len@var{m}@var{n}} instruction pattern
> +@item @code{while_len@var{m}@var{n}}
> +Set operand 0 to the number of active elements in vector will be updated value.
> +operand 1 is the total elements need to be updated value.
> +operand 2 is the vectorization factor.
> +The operation is equivalent to:
> +
> +@smallexample
> +operand0 = MIN (operand1, operand2);
> +operand2 can be const_poly_int or poly_int related to vector mode size.
> +Some target like RISC-V has a standalone instruction to get MIN (n, MODE SIZE) so
> +that we can reduce a use of general purpose register.
> +@end smallexample
> +
> @cindex @code{check_raw_ptrs@var{m}} instruction pattern
> @item @samp{check_raw_ptrs@var{m}}
> Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 6e81dc05e0e..5f44def90d3 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -127,6 +127,7 @@ init_internal_fns ()
> #define cond_binary_direct { 1, 1, true }
> #define cond_ternary_direct { 1, 1, true }
> #define while_direct { 0, 2, false }
> +#define while_len_direct { 0, 0, false }
> #define fold_extract_direct { 2, 2, false }
> #define fold_left_direct { 1, 1, false }
> #define mask_fold_left_direct { 1, 1, false }
> @@ -3702,6 +3703,33 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>  emit_move_insn (lhs_rtx, ops[0].value);
> }
> +/* Expand WHILE_LEN call STMT using optab OPTAB.  */
> +static void
> +expand_while_len_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  expand_operand ops[3];
> +  tree rhs_type[2];
> +
> +  tree lh

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread juzhe.zh...@rivai.ai

Thanks, Richard, for reminding me. I originally thought the community would not allow me to
support a variable-amount IV and wanted me to do this in the RISC-V backend.
It seems that I can do that in the middle-end. Thank you so much. I will update the
patch. Really appreciate it!



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-04-20 16:52
To: 钟居哲
CC: gcc-patches; rguenther; Jeff Law
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread Robin Dapp via Gcc-patches

> $ riscv64-unknown-linux-gnu-gcc
> --param=riscv-autovec-preference=fixed-vlmax
> gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c -O2 -march=rv64gcv
> -S
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:
> In function 'stach_check_alloca_1':
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:41:1:
> error: insn does not satisfy its constraints:
>41 | }
>   | ^
> (insn 37 26 40 2 (set (reg:VNx8QI 120 v24 [orig:158 data ] [158])
> (reg:VNx8QI 10 a0 [ data ]))
> "../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c":28:1
> 727 {*movvnx8qi_whole}
>  (nil))
> during RTL pass: reload
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:41:1:
> internal compiler error: in extract_constrain_insn, at recog.cc:2692

For a slightly adjusted testcase

#include <stdint.h>

void
foo0 (int32_t *__restrict f, int32_t *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    {
      f[i * 2 + 0] = 1;
      f[i * 2 + 1] = 2;
      d[i] = 3;
    }
}

compiled with -fno-vect-cost-model --param=riscv-autovec-preference=scalable
I see an ICE:

during GIMPLE pass: vect
dump file: foo3.c.172t.vect
foo3.c: In function 'foo0':
foo3.c:4:1: internal compiler error: in exact_div, at poly-int.h:2232
4 | foo0 (int32_t *__restrict f, int32_t *__restrict d, int n)
  | ^~~~
0x7bb237 poly_int<2u, poly_result::is_poly>::type, poly_coeff_pair_traits::is_poly>::type>::result_kind>::type> 
exact_div<2u, unsigned long, int>(poly_int_pod<2u, unsigned long> const&, int)
../../gcc/poly-int.h:2232
0x7bbf91 poly_int<2u, poly_result::is_poly>::type, poly_coeff_pair_traits::is_poly>::type>::result_kind>::type> 
exact_div<2u, unsigned long, int>(poly_int_pod<2u, unsigned long> const&, int)
../../gcc/tree.h:3663
0x7bbf91 can_duplicate_and_interleave_p(vec_info*, unsigned int, tree_node*, 
unsigned int*, tree_node**, tree_node**)
../../gcc/tree-vect-slp.cc:437
[..]

With --param=riscv-autovec-preference=fixed-vlmax, however, the output is
reasonable.  BTW please use --param instead of -param in the description to
avoid confusion.

The patches don't explicitly note that they only work for certain -march values,
configurations or so, but in any case they shouldn't introduce ICEs for
unsupported configurations.

Are the "fixed-vlmax" vs "scalable" names based on ARM's SVE?  I haven't thought
this through but I think I'd prefer "fixed" vs "varying" or more explicitly
"fixed vector size" vs "dynamic vector size".  Certainly room for discussion here.
What about the -mriscv-vector-bits=... (which would be vlen in v-spec parlance)
from your "rvv-next" branch?  Is this orthogonal to the new parameter here?  Are you
thinking of introducing this as well?

Regards
 Robin

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread juzhe.zh...@rivai.ai

>> With --param=riscv-autovec-preference=fixed-vlmax, however, the output is
>> reasonable.  BTW please use --param instead of -param in the description to
>> avoid confusion.
>> Now the patches don't explicitly note that they only work for certain marchs,
>> configurations or so but they certainly shouldn't introduce ICEs for
>> unsupported configurations.

I will address the comments and fix that soon. Thank you so much.

>> Are the "fixed-vlmax" vs "scalable" names based on ARM's SVE?  I haven't thought
>> this through but I think I'd prefer "fixed" vs "varying" or more explicitly
>> "fixed vector size" vs "dynamic vector size".  Certainly room for discussion here.
>> What about the -mriscv-vector-bits=... (which would be vlen in v-spec parlance)
>> from your "rvv-next" branch?  Is this orthogonal to the new parameter here?  Are you
>> thinking of introducing this as well?

The current compile options were suggested by Kito. They are internal GCC
compile options.
I tried to add -mriscv-vector-bits=; however, the LLVM community
objected to it.
https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/33

As for the compile options, Kito may have more comments, since he is the
maintainer of the RISC-V ABI and toolchain conventions.
I developed this patch following his direction.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-04-20 16:58
To: Kito Cheng; juzhe.zhong
CC: gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
> Thanks, Richard, for reminding me. I originally thought the community would not allow me to
> support a variable-amount IV and wanted me to do this in the RISC-V backend.

No, I think that part should and needs to be done in the middle-end,
since if the initial IVs are incorrect, it's very difficult to fix
them up later.

But with the patch as originally presented, WHILE_LEN was just a
simple minimum operation, with only the final iteration being partial.
It didn't make sense IMO for that to be its own IFN.  It was only later
that you said that non-final iterations might be partial too.

And there was pushback against WHILE_LEN having an effect on global
state, rather than being a simple "how many elements should I process?"
calculation.  That last bit -- the global effect of VSETVL -- was the bit
that needed to be kept local to the RISC-V backend.

Thanks,
Richard

> It seems that I can do that in the middle-end. Thank you so much. I will update
> the patch. Really appreciate it!
>
>
>
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-04-20 16:52
> To: 钟居哲
> CC: gcc-patches; rguenther; Jeff Law
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
> auto-vectorization
> 钟居哲  writes:
>> Hi, Richards.
>> Since GCC 14 is open and this patch has been boostraped && tested on X86.
>> Is this patch supporting variable IV OK for the trunk ?
>  
> Doesn't the patch need updating based on the previous discussion?
> I thought the outcome was that WHILE_LEN isn't a simple MIN operation
> (contrary to the documentation in the patch) and that pointer IVs
> would also need to be updated by a variable amount, given that even
> non-final iterations might process fewer than VF elements.
>  
> Thanks,
> Richard
>  
>> juzhe.zh...@rivai.ai
>>  
>> From: juzhe.zhong
>> Date: 2023-04-07 09:47
>> To: gcc-patches
>> CC: richard.sandiford; rguenther; jeffreyalaw; Juzhe-Zhong
>> Subject: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
>> auto-vectorization
>> From: Juzhe-Zhong 
>>  
>> This patch is to add WHILE_LEN pattern.
>> It's inspired by RVV ISA simple "vvaddint32.s" example:
>> https://github.com/riscv/riscv-v-spec/blob/master/example/vvaddint32.s
>>  
>> More details are in "vect_set_loop_controls_by_while_len" implementation
>> and comments.
>>  
>> Consider such following case:
>> #define N 16
>> int src[N];
>> int dest[N];
>>  
>> void
>> foo (int n)
>> {
>>   for (int i = 0; i < n; i++)
>> dest[i] = src[i];
>> }
>>  
>> -march=rv64gcv -O3 --param riscv-autovec-preference=scalable 
>> -fno-vect-cost-model -fno-tree-loop-distribute-patterns:
>>  
>> foo:
>> ble a0,zero,.L1
>> lui a4,%hi(.LANCHOR0)
>> addia4,a4,%lo(.LANCHOR0)
>> addia3,a4,64
>> csrra2,vlenb
>> .L3:
>> vsetvli a5,a0,e32,m1,ta,ma
>> vle32.v v1,0(a4)
>> sub a0,a0,a5
>> vse32.v v1,0(a3)
>> add a4,a4,a2
>> add a3,a3,a2
>> bne a0,zero,.L3
>> .L1:
>> ret
>>  
>> gcc/ChangeLog:
>>  
>> * doc/md.texi: Add WHILE_LEN support.
>> * internal-fn.cc (while_len_direct): Ditto.
>> (expand_while_len_optab_fn): Ditto.
>> (direct_while_len_optab_supported_p): Ditto.
>> * internal-fn.def (WHILE_LEN): Ditto.
>> * optabs.def (OPTAB_D): Ditto.
>> * tree-ssa-loop-manip.cc (create_iv): Ditto.
>> * tree-ssa-loop-manip.h (create_iv): Ditto.
>> * tree-vect-loop-manip.cc (vect_set_loop_controls_by_while_len): 
>> Ditto.
>> (vect_set_loop_condition_partial_vectors): Ditto.
>> * tree-vect-loop.cc (vect_get_loop_len): Ditto.
>> * tree-vect-stmts.cc (vectorizable_store): Ditto.
>> (vectorizable_load): Ditto.
>> * tree-vectorizer.h (vect_get_loop_len): Ditto.
>>  
>> ---
>> gcc/doc/md.texi |  14 +++
>> gcc/internal-fn.cc  |  29 ++
>> gcc/internal-fn.def |   1 +
>> gcc/optabs.def  |   1 +
>> gcc/tree-ssa-loop-manip.cc  |   4 +-
>> gcc/tree-ssa-loop-manip.h   |   2 +-
>> gcc/tree-vect-loop-manip.cc | 186 ++--
>> gcc/tree-vect-loop.cc   |  35 +--
>> gcc/tree-vect-stmts.cc  |   9 +-
>> gcc/tree-vectorizer.h   |   4 +-
>> 10 files changed, 264 insertions(+), 21 deletions(-)
>>  
>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
>> index 8e3113599fd..72178ab014c 100644
>> --- a/gcc/doc/md.texi
>> +++ b/gcc/doc/md.texi
>> @@ -4965,6 +4965,20 @@ for (i = 1; i < operand3; i++)
>>operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
>> @end smallexample
>> +@cindex @code{while_len@var{m}@var{n}} instruction pattern
>> +@item @code{while_len@var{m}@var{n}}
>> +Set operand 0 to the number of active elements in vector will be updated value.
>> +operand 1 is the total elements need to be updated value.
>> +operand 2 is the vectorization factor.
>> +The operation is equivalent

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread juzhe.zh...@rivai.ai

OK. Thanks Richard.
So let me conclude:
1. Community agree that I should support variable IV in the middle-end.
2. We can keep WHILE_LEN pattern when "not only final iteration is partial".
And I should describe it more clearly in the doc.

I should do these 2 things in the later update patch.
Is that right? Feel free to correct me.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-04-20 17:11
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther; jeffreyalaw
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
"juzhe.zh...@rivai.ai"  writes:
> Thanks, Richard, for reminding me. I originally thought the community would not
> allow me to support a variable-amount IV and wanted me to do this in the RISC-V backend.
 
No, I think that part should and needs to be done in the middle-end,
since if the initial IVs are incorrect, it's very difficult to fix
them up later.
 
But with the patch as originally presented, WHILE_LEN was just a
simple minimum operation, with only the final iteration being partial.
It didn't make sense IMO for that to be its own IFN.  It was only later
that you said that non-final iterations might be partial too.
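A minimal C sketch of the point about non-final iterations. choose_vl() and copy_strip_mined() are hypothetical names, and choose_vl() models just one behaviour we understand RVV 1.0 to allow for vsetvli: when VLMAX < AVL < 2*VLMAX it may return any vl between ceil(AVL/2) and VLMAX, so an iteration that is not the last can still process fewer than VLMAX elements. The remaining count and both pointers therefore advance by the vl actually returned, not by a fixed VF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vsetvli-style length choice.  For
   VLMAX < avl < 2 * VLMAX we pick ceil (avl / 2), one of the values we
   understand RVV 1.0 to permit; other choices are equally legal.  */
static size_t
choose_vl (size_t avl, size_t vlmax)
{
  if (avl <= vlmax)
    return avl;                 /* last iteration, possibly partial */
  if (avl < 2 * vlmax)
    return (avl + 1) / 2;       /* may split the remainder over two iterations */
  return vlmax;
}

/* Strip-mined copy of N ints.  The remaining count and both pointers
   advance by the vl chosen for this iteration, not by a fixed VF.
   Returns the number of iterations executed.  */
static size_t
copy_strip_mined (int *dest, const int *src, size_t n, size_t vlmax)
{
  size_t iters = 0;
  while (n > 0)
    {
      size_t vl = choose_vl (n, vlmax);
      for (size_t i = 0; i < vl; i++)   /* stands in for vle32.v/vse32.v */
        dest[i] = src[i];
      src += vl;
      dest += vl;
      n -= vl;
      iters++;
    }
  return iters;
}
```

With n = 10 and VLMAX = 4 this model picks vl = 4, 3, 3, so the second iteration is partial even though it is not the last; a plain MIN (n, VLMAX) would instead give 4, 4, 2.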
 
And there was pushback against WHILE_LEN having an effect on global
state, rather than being a simple "how many elements should I process?"
calculation.  That last bit -- the global effect of VSETVL -- was the bit
that needed to be kept local to the RISC-V backend.
 
Thanks,
Richard
 

Re: [PATCH] tree-vect-patterns: Pattern recognize ctz or ffs using clz, popcount or ctz [PR109011]

2023-04-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Apr 2023, Jakub Jelinek wrote:

> Hi!
> 
> The following patch makes it possible to vectorize __builtin_ffs*/.FFS even if
> we only have vector .CTZ support, or __builtin_ffs*/.FFS/__builtin_ctz*/.CTZ
> if we only have vector .CLZ or .POPCOUNT support.
> It uses various expansions from the Hacker's Delight book as well as GCC's
> expansion, in particular:
> .CTZ (X) = PREC - .CLZ ((X - 1) & ~X)
> .CTZ (X) = .POPCOUNT ((X - 1) & ~X)
> .CTZ (X) = (PREC - 1) - .CLZ (X & -X)
> .FFS (X) = PREC - .CLZ (X & -X)
> .CTZ (X) = PREC - .POPCOUNT (X | -X)
> .FFS (X) = (PREC + 1) - .POPCOUNT (X | -X)
> .FFS (X) = .CTZ (X) + 1
> where the first one can only be used if both CTZ and CLZ have a value
> defined at zero (kind 2) and both have the value PREC there.
> For the other forms, if the original has a value defined at zero and the
> replacement doesn't, or if the values at zero don't match, a COND_EXPR
> is added afterwards to handle that case.
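As a cross-check, the identities above can be exercised on unsigned int values with GCC's __builtin_clz, __builtin_ctz, __builtin_popcount and __builtin_ffs. In this sketch the clz0/ctz0 wrappers are hypothetical helpers that define both functions as PREC at zero (the "kind 2, value PREC" precondition of the first form); the x == 0 branch shows why the .FFS forms built from .POPCOUNT or .CTZ still need a COND_EXPR.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define PREC ((int) (sizeof (unsigned) * CHAR_BIT))

/* Hypothetical helpers: clz/ctz with the value at zero defined as PREC.  */
static int clz0 (unsigned x) { return x ? __builtin_clz (x) : PREC; }
static int ctz0 (unsigned x) { return x ? __builtin_ctz (x) : PREC; }

/* Check the .CTZ/.FFS rewrites from the patch description for one X.  */
static bool
ctz_ffs_identities_hold (unsigned x)
{
  int ctz = ctz0 (x);
  int ffs = __builtin_ffs ((int) x);
  bool ok = true;

  /* Valid for every X, given CLZ (0) == CTZ (0) == PREC.  */
  ok &= ctz == PREC - clz0 ((x - 1) & ~x);
  ok &= ctz == __builtin_popcount ((x - 1) & ~x);
  ok &= ctz == PREC - __builtin_popcount (x | -x);
  ok &= ffs == PREC - clz0 (x & -x);

  if (x != 0)
    {
      ok &= ctz == (PREC - 1) - clz0 (x & -x);
      ok &= ffs == (PREC + 1) - __builtin_popcount (x | -x);
      ok &= ffs == ctz + 1;
    }
  else
    /* At zero both .FFS forms above yield PREC + 1 instead of
       FFS (0) == 0, hence the COND_EXPR fix-up.  */
    ok &= ffs == 0 && ctz + 1 == PREC + 1
          && (PREC + 1) - __builtin_popcount (x | -x) == PREC + 1;

  return ok;
}
```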
> 
> The patch also modifies vect_recog_popcount_clz_ctz_ffs_pattern
> such that the two can work together.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, plus tested
> on the testcases on powerpc64le-linux and s390x-linux crosses, ok for trunk?

OK.

Thanks,
Richard.

> 2023-04-20  Jakub Jelinek  
> 
>   PR tree-optimization/109011
>   * tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): New function.
>   (vect_recog_popcount_clz_ctz_ffs_pattern): Move vect_pattern_detected
>   call later.  Don't punt for IFN_CTZ or IFN_FFS if it doesn't have
>   direct optab support, but has instead IFN_CLZ, IFN_POPCOUNT or
>   for IFN_FFS IFN_CTZ support, use vect_recog_ctz_ffs_pattern for that
>   case.
>   (vect_vect_recog_func_ptrs): Add ctz_ffs entry.
> 
>   * gcc.dg/vect/pr109011-1.c: Remove -mpower9-vector from
>   dg-additional-options.
>   (baz, qux): Remove functions and corresponding dg-final.
>   * gcc.dg/vect/pr109011-2.c: New test.
>   * gcc.dg/vect/pr109011-3.c: New test.
>   * gcc.dg/vect/pr109011-4.c: New test.
>   * gcc.dg/vect/pr109011-5.c: New test.
> 
> --- gcc/tree-vect-patterns.cc.jj  2023-04-19 11:14:17.445843870 +0200
> +++ gcc/tree-vect-patterns.cc 2023-04-19 20:49:27.946432713 +0200
> @@ -1501,6 +1501,266 @@ vect_recog_widen_minus_pattern (vec_info
> "vect_recog_widen_minus_pattern");
>  }
>  
> +/* Function vect_recog_ctz_ffs_pattern
> +
> +   Try to find the following pattern:
> +
> +   TYPE1 A;
> +   TYPE1 B;
> +
> +   B = __builtin_ctz{,l,ll} (A);
> +
> +   or
> +
> +   B = __builtin_ffs{,l,ll} (A);
> +
> +   Input:
> +
> +   * STMT_VINFO: The stmt from which the pattern search begins.
> +   here it starts with B = __builtin_* (A);
> +
> +   Output:
> +
> +   * TYPE_OUT: The vector type of the output of this pattern.
> +
> +   * Return value: A new stmt that will be used to replace the sequence of
> +   stmts that constitute the pattern, using clz or popcount builtins.  */
> +
> +static gimple *
> +vect_recog_ctz_ffs_pattern (vec_info *vinfo, stmt_vec_info stmt_vinfo,
> +                            tree *type_out)
> +{
> +  gimple *call_stmt = stmt_vinfo->stmt;
> +  gimple *pattern_stmt;
> +  tree rhs_oprnd, rhs_type, lhs_oprnd, lhs_type, vec_type, vec_rhs_type;
> +  tree new_var;
> +  internal_fn ifn = IFN_LAST, ifnnew = IFN_LAST;
> +  bool defined_at_zero = true, defined_at_zero_new = false;
> +  int val = 0, val_new = 0;
> +  int prec;
> +  int sub = 0, add = 0;
> +  location_t loc;
> +
> +  if (!is_gimple_call (call_stmt))
> +    return NULL;
> +
> +  if (gimple_call_num_args (call_stmt) != 1)
> +    return NULL;
> +
> +  rhs_oprnd = gimple_call_arg (call_stmt, 0);
> +  rhs_type = TREE_TYPE (rhs_oprnd);
> +  lhs_oprnd = gimple_call_lhs (call_stmt);
> +  if (!lhs_oprnd)
> +    return NULL;
> +  lhs_type = TREE_TYPE (lhs_oprnd);
> +  if (!INTEGRAL_TYPE_P (lhs_type)
> +      || !INTEGRAL_TYPE_P (rhs_type)
> +      || !type_has_mode_precision_p (rhs_type)
> +      || TREE_CODE (rhs_oprnd) != SSA_NAME)
> +    return NULL;
> +
> +  switch (gimple_call_combined_fn (call_stmt))
> +    {
> +    CASE_CFN_CTZ:
> +      ifn = IFN_CTZ;
> +      if (!gimple_call_internal_p (call_stmt)
> +          || CTZ_DEFINED_VALUE_AT_ZERO (SCALAR_INT_TYPE_MODE (rhs_type),
> +                                        val) != 2)
> +        defined_at_zero = false;
> +      break;
> +    CASE_CFN_FFS:
> +      ifn = IFN_FFS;
> +      break;
> +    default:
> +      return NULL;
> +    }
> +
> +  prec = TYPE_PRECISION (rhs_type);
> +  loc = gimple_location (call_stmt);
> +
> +  vec_type = get_vectype_for_scalar_type (vinfo, lhs_type);
> +  if (!vec_type)
> +    return NULL;
> +
> +  vec_rhs_type = get_vectype_for_scalar_type (vinfo, rhs_type);
> +  if (!vec_rhs_type)
> +    return NULL;
> +
> +  /* Do it only if the backend doesn't have ctz2 or
> + ffs2 pattern but does have clz2 or
> + popcount2.  */
> +  if (!vec_type
> +      || direct_internal_fn_supported_p (ifn, vec_rhs_type,
> +

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Sandiford via Gcc-patches

"juzhe.zh...@rivai.ai"  writes:
> OK. Thanks Richard.
> So let me conclude:
> 1. Community agree that I should support variable IV in the middle-end.
> 2. We can keep WHILE_LEN pattern when "not only final iteration is partial".
> And I should describe it more clearly in the doc.
>
> I should do these 2 things in the later update patch.

Sounds good to me, but Richi is the maintainer.

Thanks,
Richard

[PATCH] Remove duplicate DFS walks from DF init

2023-04-20 Thread Richard Biener via Gcc-patches

The following removes unused CFG order computations from
rest_of_handle_df_initialize.  The CFG orders are computed by df_analyze ().
This also removes code duplication that would have to be kept in sync.

Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages,
pushed.

* df-core.cc (rest_of_handle_df_initialize): Remove
computation of df->postorder, df->postorder_inverted and
df->n_blocks.
---
 gcc/df-core.cc | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/df-core.cc b/gcc/df-core.cc
index 3286ffda2ce..de5cbd0c622 100644
--- a/gcc/df-core.cc
+++ b/gcc/df-core.cc
@@ -701,11 +701,6 @@ rest_of_handle_df_initialize (void)
   if (optimize > 1)
 df_live_add_problem ();
 
-  df->postorder = XNEWVEC (int, last_basic_block_for_fn (cfun));
-  df->n_blocks = post_order_compute (df->postorder, true, true);
-  inverted_post_order_compute (&df->postorder_inverted);
-  gcc_assert ((unsigned) df->n_blocks == df->postorder_inverted.length ());
-
   df->hard_regs_live_count = XCNEWVEC (unsigned int, FIRST_PSEUDO_REGISTER);
 
   df_hard_reg_init ();
-- 
2.35.3

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread juzhe.zh...@rivai.ai

Hi, Kito. Can you give us more comments on the compile options?
I think I should fix this patch after we have finished all discussions of the
compile options for choosing vector length, LMUL and auto-vectorization mode (VLA/VLS).

I just received Richard Sandiford's comments on the "WHILE_LEN" pattern.
Overall, the global reviewers accept our RVV loop control mechanism in the
middle-end, so I am going to support the RVV loop control mechanism in the
middle-end first. Then we can have perfect codegen like the RVV ISA example soon.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-04-20 16:58
To: Kito Cheng; juzhe.zhong
CC: gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV
> $ riscv64-unknown-linux-gnu-gcc
> --param=riscv-autovec-preference=fixed-vlmax
> gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c -O2 -march=rv64gcv
> -S
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:
> In function 'stach_check_alloca_1':
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:41:1:
> error: insn does not satisfy its constraints:
>41 | }
>   | ^
> (insn 37 26 40 2 (set (reg:VNx8QI 120 v24 [orig:158 data ] [158])
> (reg:VNx8QI 10 a0 [ data ]))
> "../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c":28:1
> 727 {*movvnx8qi_whole}
>  (nil))
> during RTL pass: reload
> ../riscv-gnu-toolchain-trunk/riscv-gcc/gcc/testsuite/gcc.target/riscv/rvv/base/spill-10.c:41:1:
> internal compiler error: in extract_constrain_insn, at recog.cc:2692
 
For a slightly adjusted testcase
 
#include <stdint.h>

void
foo0 (int32_t *__restrict f, int32_t *__restrict d, int n)
{
  for (int i = 0; i < n; ++i)
    {
      f[i * 2 + 0] = 1;
      f[i * 2 + 1] = 2;
      d[i] = 3;
    }
}
 
compiled with -fno-vect-cost-model --param=riscv-autovec-preference=scalable
I see an ICE:
 
during GIMPLE pass: vect
dump file: foo3.c.172t.vect
foo3.c: In function 'foo0':
foo3.c:4:1: internal compiler error: in exact_div, at poly-int.h:2232
4 | foo0 (int32_t *__restrict f, int32_t *__restrict d, int n)
  | ^~~~
0x7bb237 poly_int<2u, poly_result::is_poly>::type, poly_coeff_pair_traits::is_poly>::type>::result_kind>::type> 
exact_div<2u, unsigned long, int>(poly_int_pod<2u, unsigned long> const&, int)
../../gcc/poly-int.h:2232
0x7bbf91 poly_int<2u, poly_result::is_poly>::type, poly_coeff_pair_traits::is_poly>::type>::result_kind>::type> 
exact_div<2u, unsigned long, int>(poly_int_pod<2u, unsigned long> const&, int)
../../gcc/tree.h:3663
0x7bbf91 can_duplicate_and_interleave_p(vec_info*, unsigned int, tree_node*, 
unsigned int*, tree_node**, tree_node**)
../../gcc/tree-vect-slp.cc:437
[..]
 
With --param=riscv-autovec-preference=fixed-vlmax, however, the output is
reasonable.  BTW please use --param instead of -param in the description to
avoid confusion.
 
Now the patches don't explicitly note that they only work for certain marchs,
configurations or so but they certainly shouldn't introduce ICEs for
unsupported configurations.
 
Are the "fixed-vlmax" vs "scalable" names based on ARM's SVE?  I haven't thought
this through but I think I'd prefer "fixed" vs "varying" or more explicitly
"fixed vector size" vs "dynamic vector size".  Certainly room for discussion 
here.
What about the -mriscv-vector-bits=... (which would be vlen in v-spec parlance)
from your "rvv-next" branch?  Is this orthogonal to the new parameter here?  Are you
thinking of introducing this as well?
 
Regards
Robin

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread Kito Cheng via Gcc-patches

On Thu, Apr 20, 2023 at 5:07 PM juzhe.zh...@rivai.ai
 wrote:
>
> >> With --param=riscv-autovec-preference=fixed-vlmax, however, the output is
> >> reasonable.  BTW please use --param instead of -param in the description to
> >> avoid confusion.
> >> Now the patches don't explicitly note that they only work for certain marchs,
> >> configurations or so but they certainly shouldn't introduce ICEs for
> >> unsupported configurations.
>
> I will address the comments and fix that soon. Thank you so much.
>
> >> Are the "fixed-vlmax" vs "scalable" names based on ARM's SVE?  I haven't thought
> >> this through but I think I'd prefer "fixed" vs "varying" or more explicitly
> >> "fixed vector size" vs "dynamic vector size".  Certainly room for discussion here.
> >> What about the -mriscv-vector-bits=... (which would be vlen in v-spec parlance)
> >> from your "rvv-next" branch?  Is this orthogonal to the new parameter here?  Are you
> >> thinking of introducing this as well?
>
> The current compile options were suggested by Kito. They are internal GCC
> compile options.
> I was trying to add -mriscv-vector-bits=; however, the LLVM community
> objected to it.
> https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/33


Wait, LLVM didn't object to -mriscv-vector-bits=; what they objected
to is the LMUL option.
The LLVM community has tried to implement that:
https://reviews.llvm.org/D145088

But personally I would prefer not to rush to implement that feature upstream;
we could implement it, have more conversation with the LLVM community, and then
document it in https://github.com/riscv-non-isa/rvv-intrinsic-doc
or https://github.com/riscv-non-isa/riscv-toolchain-conventions

> I think that for compile options, Kito may give more comments since he is
> the RISC-V ABI and convention maintainer.
> I developed this patch following his direction.

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread juzhe.zh...@rivai.ai

Ahhh.  Thanks, Kito.

Can you give more comments about Robin's opinion that he want to change into
"fixed" vs "varying" or "fixed vector size" vs "dynamic vector size" ?

I am OK with any of them.



juzhe.zh...@rivai.ai
 

Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread Robin Dapp via Gcc-patches

> Can you give more comments about Robin's opinion that he want to change into
> "fixed" vs "varying" or "fixed vector size" vs "dynamic vector size" ?

It's not necessary to decide on this now as --params are not supposed
to be stable and can be changed quickly.  I was just curious if this had
already been discussed or finalized elsewhere.

Regards
 Robin

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread juzhe.zh...@rivai.ai

Ahhh. These compile options are not finalized.
I just asked Kito to provide me with some compile options so that I can specify
LMUL, auto-vectorization mode and vector length (scalable or fixed-length),
in order to be able to test auto-vectorization fully, for example fully
testing LMUL = 1/2/4/8 auto-vectorization (you can see the code in rvv.exp).
Then he suggested adding --param options.

I can change the compile options as you suggested.

Thanks.


juzhe.zh...@rivai.ai
 

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Apr 2023, Richard Sandiford wrote:

> "juzhe.zh...@rivai.ai"  writes:
> > OK. Thanks Richard.
> > So let me conclude:
> > 1. Community agree that I should support variable IV in the middle-end.
> > 2. We can keep WHILE_LEN pattern when "not only final iteration is partial".
> > And I should describe it more clearly in the doc.
> >
> > I should do these 2 things in the later update patch.
> 
> Sounds good to me, but Richi is the maintainer.

But I happily defer to you for designing VL stuff.  I suppose it's
time to make you maintainer as well ... (you are global reviewer).

That said, I will look at an updated patch.

Richard.

Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Thu, 20 Apr 2023, Richard Sandiford wrote:
>
>> "juzhe.zh...@rivai.ai"  writes:
>> > OK. Thanks Richard.
>> > So let me conclude:
>> > 1. Community agree that I should support variable IV in the middle-end.
>> > 2. We can keep WHILE_LEN pattern when "not only final iteration is 
>> > partial".
>> > And I should describe it more clearly in the doc.
>> >
>> > I should do these 2 things in the later update patch.
>> 
>> Sounds good to me, but Richi is the maintainer.
>
> But I happily defer to you for designing VL stuff.  I suppose it's
> time to make you maintainer as well ... (you are global reviewer).

Heh, wasn't trying to bag an extra maintainership :-)  I just got a
bit lost in the thread and wasn't sure whether I was contradicting
something you'd said (in which case I'd defer to that).

Richard

Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-20 Thread Richard Sandiford via Gcc-patches

 writes:
> Yes, as Kito said.
> We won't enable VNx1DImode in auto-vectorization, so it's meaningless to fix
> it here.
> We dynamically adjust the minimum vector length for different '-march' values
> according to the RVV ISA specification.
> So we strongly suggest that we drop this fix.

I think the patch should go in regardless.  If we have a port with
a VNx1 mode then the exact_div is at best dubious and at worst wrong.
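To illustrate why, here is a simplified C model of a two-coefficient size c0 + c1*x; it is not GCC's poly_int API, and poly2/poly2_exact_div are hypothetical names. Assuming a VNx1 mode has 1 + 1x units, that count has no exact half, which looks like the same kind of failure as the exact_div assertion in can_duplicate_and_interleave_p from Robin's ICE report earlier in this thread. Checking divisibility first turns the assertion into a graceful failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for a two-coefficient poly_int: c0 + c1 * x,
   where x is a runtime invariant >= 0.  Not GCC's poly_int API.  */
struct poly2
{
  long c0, c1;
};

/* c0 + c1 * x is divisible by D for every x only if D divides both
   coefficients; report failure instead of asserting.  */
static bool
poly2_exact_div (struct poly2 a, long d, struct poly2 *out)
{
  if (a.c0 % d != 0 || a.c1 % d != 0)
    return false;
  out->c0 = a.c0 / d;
  out->c1 = a.c1 / d;
  return true;
}
```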

Thanks,
Richard

[committed v2] gcc-13: Add release note for RISC-V

2023-04-20 Thread Kito Cheng via Gcc-patches

---
 htdocs/gcc-13/changes.html | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index f6941534..4515a6af 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -636,7 +636,34 @@ a work-in-progress.
 
 RISC-V
 
-New ISA extension support for zawrs.
+Support for vector intrinsics as specified in version 0.11 of the
+   RISC-V vector intrinsic specification (https://github.com/riscv-non-isa/rvv-intrinsic-doc/tree/v0.11.x),
+   thanks to Ju-Zhe Zhong from RiVAI (https://rivai-ic.com.cn/)
+   for contributing most of the implementation.
+
+Support for the following standard extensions has been added:
+  
+Zawrs
+Zbkb
+Zbkc
+Zbkx
+Zdinx
+Zfinx
+Zfh
+Zfhmin
+Zhinx
+Zhinxmin
+Zicbom
+Zicbop
+Zicboz
+Zknd
+Zkne
+Zksed
+Zksh
+Zmmul
+  
+
 Support for the following vendor extensions has been added:
   
 XTheadBa
@@ -659,6 +686,11 @@ a work-in-progress.
 T-Head's XuanTie C906 (thead-c906).
   
 
+Improves the multi-lib selection mechanism for the bare-metal toolchain
+   (riscv*-elf*). GCC will now automatically select the best-fit multi-lib
+   candidate instead of requiring all possible reuse rules to be listed at
+   build time.
+
 
 
 
-- 
2.39.2

Re: Re: [PATCH 2/3 V2] RISC-V: Enable basic auto-vectorization for RVV

2023-04-20 Thread Kito Cheng via Gcc-patches

Hi Robin:

Sharing more context with you that I've discussed with Ju-Zhe, and looking
for comments from you :)

There are 3 different auto-vectorization flavors:
- VLA
- VLS fixed-vlmax (name TBD)
- (Traditional) VLS

I don't think I need to explain much about VLA.
So let's focus on the second and third:

VLS fixed-vlmax: that's something like -mriscv-vector-bits= or
-msve-vector-bits. It assumes VLEN is a static length and evaluates
scalable vector modes as fixed-length vector modes (e.g. it evaluates
(8x + 8) bytes to 16 bytes), so that stack allocation can be done
statically instead of being computed from the vlenb register, and vlmax
can be evaluated to a static value too. But the code generated in this
mode is not portable: when you compile with -mriscv-vector-bits=128,
the code can't run on a machine whose VLEN is not exactly 128.

(Traditional) VLS: vectorizes to something like int32x4_t. Stack
allocation can also be determined statically since the size is fixed,
but the vector register size is still a poly_int16 value (scalable
vector), not evaluated to a fixed-length vector as in VLS fixed-vlmax
mode. This mode could be useful to handle loops that can't be vectorized
in VLA mode, or for use by the SLP vectorizer, and it is more portable
than VLS fixed-vlmax mode since it only requires VLEN to be no smaller
than a specific length, rather than exactly equal to it.
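Kito's size arithmetic and the portability difference between the two VLS flavours can be sketched in C. The helpers below are hypothetical, and the mapping x = VLEN/64 - 1 is an assumption chosen only because it reproduces his example of (8x + 8) bytes evaluating to 16 bytes for VLEN = 128; GCC's actual RVV mode sizes may use a different base.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: a scalable mode of c0 + c1 * x bytes, with x assumed to
   be VLEN / 64 - 1 (so (8x + 8) bytes is 16 bytes at VLEN = 128).
   Fixed-vlmax evaluates this once, at compile time.  */
static long
fixed_vlmax_size_bytes (long c0, long c1, long vlen_bits)
{
  long x = vlen_bits / 64 - 1;
  return c0 + c1 * x;
}

/* Fixed-vlmax code assumes the hardware VLEN equals the compiled one.  */
static bool
fixed_vlmax_runs_on (long compiled_vlen, long hw_vlen)
{
  return hw_vlen == compiled_vlen;
}

/* Traditional VLS (int32x4_t-style vectors of VLS_BITS) only needs the
   hardware VLEN to be at least that large.  */
static bool
vls_runs_on (long vls_bits, long hw_vlen)
{
  return hw_vlen >= vls_bits;
}
```

Under these assumptions, code built for fixed-vlmax with VLEN = 128 is rejected on a VLEN = 256 machine, while 128-bit traditional VLS code still runs there.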






Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread juzhe.zh...@rivai.ai

Thanks, Richards (Sandiford and Biener).
I have a technical question:
To support variable IVs for memory address calculation, is it right that I
should make the output of WHILE_LEN visible in tree-ssa-loop-ivopts.cc,
since the address calculation is not in the loop control handling
function?

Thanks.

juzhe.zh...@rivai.ai


Re: [PATCH 1/3] RISC-V: Add auto-vectorization compile option for RVV

2023-04-20 Thread Richard Biener via Gcc-patches

On Wed, Apr 19, 2023 at 6:38 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> This patch adds 2 compile options for RVV auto-vectorization.
> 1. -param=riscv-autovec-preference=
>    This option specifies the auto-vectorization approach for RVV.
>    Currently, we only support scalable and fixed-vlmax.
>
> - scalable means VLA auto-vectorization. The vector length is unknown
>   to the compiler and is a runtime invariant. This approach allows the
>   compiled code to run on an RVV CPU with any vector length.
>
> - fixed-vlmax means the compiler knows the RVV CPU vector length and
>   performs fixed-length VLS auto-vectorization. For example, if we specify
>   vector-length=512, the executable can only run on an RVV CPU with
>   vector length = 512.
>
> - TODO: we may need to support min-length VLS auto-vectorization, meaning
>   the executable can also run on RVV CPUs with a larger vector length.

Just as a generic comment - if the option should be exposed to users
rather than just used
for testsuite or development purposes it should eventually become a
-mautovec-preference=
flag (no need to prefix with riscv).
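For reference, a hypothetical C sketch of how the strings the patch registers in riscv.opt (none, scalable, fixed-vlmax) map onto its riscv_autovec_preference_enum values. GCC generates this handling from the .opt EnumValue records, so parse_autovec_preference() is illustrative only; the mapping would be the same if the option were renamed to -mautovec-preference= as suggested.

```c
#include <assert.h>
#include <string.h>

/* Copied from the riscv-opts.h hunk of the patch.  */
enum riscv_autovec_preference_enum {
  NO_AUTOVEC,
  RVV_SCALABLE,
  RVV_FIXED_VLMAX
};

/* Hypothetical stand-in for the lookup GCC generates from the
   EnumValue records in riscv.opt.  Returns -1 for an unknown string.  */
static int
parse_autovec_preference (const char *arg)
{
  static const struct
  {
    const char *name;
    enum riscv_autovec_preference_enum value;
  } table[] = {
    { "none", NO_AUTOVEC },
    { "scalable", RVV_SCALABLE },
    { "fixed-vlmax", RVV_FIXED_VLMAX },
  };

  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (arg, table[i].name) == 0)
      return (int) table[i].value;
  return -1;
}
```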

> gcc/ChangeLog:
>
> * config/riscv/riscv-opts.h (enum riscv_autovec_preference_enum): Add 
> enum for auto-vectorization preference.
> (enum riscv_autovec_lmul_enum): Add enum for choosing LMUL of RVV 
> auto-vectorization.
> * config/riscv/riscv.opt: Add compile option for RVV 
> auto-vectorization.
>
> ---
>  gcc/config/riscv/riscv-opts.h | 15 ++
>  gcc/config/riscv/riscv.opt    | 37 +++
>  2 files changed, 52 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
> index cf0cd669be4..4207db240ea 100644
> --- a/gcc/config/riscv/riscv-opts.h
> +++ b/gcc/config/riscv/riscv-opts.h
> @@ -67,6 +67,21 @@ enum stack_protector_guard {
>SSP_GLOBAL   /* global canary */
>  };
>
> +/* RISC-V auto-vectorization preference.  */
> +enum riscv_autovec_preference_enum {
> +  NO_AUTOVEC,
> +  RVV_SCALABLE,
> +  RVV_FIXED_VLMAX
> +};
> +
> +/* RISC-V auto-vectorization RVV LMUL.  */
> +enum riscv_autovec_lmul_enum {
> +  RVV_M1 = 1,
> +  RVV_M2 = 2,
> +  RVV_M4 = 4,
> +  RVV_M8 = 8
> +};
> +
>  #define MASK_ZICSR    (1 << 0)
>  #define MASK_ZIFENCEI (1 << 1)
>
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index ff1dd4ddd4f..ef1bdfcfe28 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -254,3 +254,40 @@ Enum(isa_spec_class) String(20191213) Value(ISA_SPEC_CLASS_20191213)
>  misa-spec=
>  Target RejectNegative Joined Enum(isa_spec_class) Var(riscv_isa_spec) Init(TARGET_DEFAULT_ISA_SPEC)
>  Set the version of RISC-V ISA spec.
> +
> +Enum
> +Name(riscv_autovec_preference) Type(enum riscv_autovec_preference_enum)
> +The RISC-V auto-vectorization preference:
> +
> +EnumValue
> +Enum(riscv_autovec_preference) String(none) Value(NO_AUTOVEC)
> +
> +EnumValue
> +Enum(riscv_autovec_preference) String(scalable) Value(RVV_SCALABLE)
> +
> +EnumValue
> +Enum(riscv_autovec_preference) String(fixed-vlmax) Value(RVV_FIXED_VLMAX)
> +
> +-param=riscv-autovec-preference=
> +Target RejectNegative Joined Enum(riscv_autovec_preference) Var(riscv_autovec_preference) Init(NO_AUTOVEC)
> +-param=riscv-autovec-preference=   Set the preference of auto-vectorization in the RISC-V port.
> +
> +Enum
> +Name(riscv_autovec_lmul) Type(enum riscv_autovec_lmul_enum)
> +The RVV possible LMUL:
> +
> +EnumValue
> +Enum(riscv_autovec_lmul) String(m1) Value(RVV_M1)
> +
> +EnumValue
> +Enum(riscv_autovec_lmul) String(m2) Value(RVV_M2)
> +
> +EnumValue
> +Enum(riscv_autovec_lmul) String(m4) Value(RVV_M4)
> +
> +EnumValue
> +Enum(riscv_autovec_lmul) String(m8) Value(RVV_M8)
> +
> +-param=riscv-autovec-lmul=
> +Target RejectNegative Joined Enum(riscv_autovec_lmul) Var(riscv_autovec_lmul) Init(RVV_M1)
> +-param=riscv-autovec-lmul= Set the RVV LMUL of auto-vectorization in the RISC-V port.
> --
> 2.36.3
>
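For fixed-vlmax, the compiler knows how many elements one vector register group holds. As a rough illustration (a hypothetical helper, not part of the patch), the RVV specification defines VLMAX = LMUL * VLEN / SEW. Because the riscv_autovec_lmul_enum values above equal the integral LMUL multiplier, they could be used directly in such a computation. Fractional LMUL is not modelled in this sketch.

```c
#include <assert.h>

/* Local copy of the enum from the patch; each value equals the LMUL
   multiplier.  */
enum riscv_autovec_lmul_enum {
  RVV_M1 = 1,
  RVV_M2 = 2,
  RVV_M4 = 4,
  RVV_M8 = 8
};

/* Hypothetical helper: number of elements in one register group for a
   fixed VLEN (bits), element width SEW (bits) and integral LMUL,
   following the RVV formula VLMAX = LMUL * VLEN / SEW.  */
static unsigned
toy_rvv_vlmax (unsigned vlen_bits, unsigned sew_bits,
               enum riscv_autovec_lmul_enum lmul)
{
  assert (sew_bits == 8 || sew_bits == 16 || sew_bits == 32
          || sew_bits == 64);
  assert (vlen_bits >= sew_bits);
  return (unsigned) lmul * vlen_bits / sew_bits;
}
```

With vector-length=512 and 32-bit elements, m1 gives 16 elements and m8 gives 128. A binary vectorized for exactly that element count bakes it into the code, which is why a fixed-vlmax executable only runs on a CPU with that vector length.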

[PATCH] tree-optimization/109564 - avoid equivalences from PHIs in most cases

2023-04-20 Thread Richard Biener via Gcc-patches

The following avoids registering two-way equivalences from PHIs when
UNDEFINED arguments are involved.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

As noted this causes missed optimizations for the cases where
we have unreachable edges rather than UNDEFINED ranges.
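The bookkeeping the patch adds to range_of_phi can be modelled in isolation. The sketch below is hypothetical: toy_phi_arg and toy_phi_single_arg are illustrative names, not GCC API, and SSA names are plain non-negative integers. UNDEFINED arguments are still skipped when looking for a single common argument, but they are remembered, and that memory alone decides whether the equivalence may be registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PHI argument: an SSA name number (>= 0) and
   whether its range was computed as UNDEFINED.  */
struct toy_phi_arg {
  int ssa_name;
  bool undefined;
};

/* Return the single SSA name shared by all non-UNDEFINED arguments, or
   -1 if there is none.  Set *MAY_REGISTER to whether PHIDEF == name may
   be registered as a (two-way) equivalence.  */
static int
toy_phi_single_arg (const struct toy_phi_arg *args, size_t nargs,
                    bool *may_register)
{
  int single_arg = -1;
  bool seen_arg = false;
  bool seen_undefined = false;

  assert (may_register != NULL);
  for (size_t i = 0; i < nargs; i++)
    {
      if (args[i].undefined)
        {
          /* Ignored for the single-argument search, but remembered.  */
          seen_undefined = true;
          continue;
        }
      if (!seen_arg)
        {
          seen_arg = true;
          single_arg = args[i].ssa_name;
        }
      else if (single_arg != args[i].ssa_name)
        single_arg = -1;  /* Stays -1: valid names are >= 0.  */
    }

  /* With an ignored UNDEFINED argument, PHIDEF == name holds only one
     way, so no equivalence is registered.  */
  *may_register = single_arg >= 0 && !seen_undefined;
  return single_arg;
}
```

In the model, as in the patch, the single argument is still found when an UNDEFINED argument is present. Only the relation is suppressed.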

OK for trunk / branch?  Do we want to try to tackle the unreachable
edges case separately?

Thanks,
Richard.

PR tree-optimization/109564
* gimple-range-fold.cc (fold_using_range::range_of_phi):
Track whether we've seen any UNDEFINED argument and avoid
registering equivalences in that case.

* gcc.dg/torture/pr109564-1.c: New testcase.
* gcc.dg/torture/pr109564-2.c: Likewise.
* gcc.dg/tree-ssa/evrp-ignore.c: Adjust.
* gcc.dg/tree-ssa/vrp06.c: Likewise.
---
 gcc/gimple-range-fold.cc| 27 +++-
 gcc/testsuite/gcc.dg/torture/pr109564-1.c   | 74 +
 gcc/testsuite/gcc.dg/torture/pr109564-2.c   | 33 +
 gcc/testsuite/gcc.dg/tree-ssa/evrp-ignore.c |  2 +-
 gcc/testsuite/gcc.dg/tree-ssa/vrp06.c   |  2 +-
 5 files changed, 119 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr109564-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr109564-2.c

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 429734f954a..d6b79600cea 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -743,6 +743,7 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
   // Track if all executable arguments are the same.
   tree single_arg = NULL_TREE;
   bool seen_arg = false;
+  bool seen_undefined = false;
 
   // Start with an empty range, unioning in each argument's range.
   r.set_undefined ();
@@ -781,6 +782,8 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
  else if (single_arg != arg)
single_arg = NULL_TREE;
}
+  else
+   seen_undefined = true;
 
   // Once the value reaches varying, stop looking.
   if (r.varying_p () && single_arg == NULL_TREE)
@@ -798,23 +801,13 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
// Symbolic arguments can be equivalences.
if (gimple_range_ssa_p (single_arg))
  {
-   // Only allow the equivalence if the PHI definition does not
-   // dominate any incoming edge for SINGLE_ARG.
-   // See PR 108139 and 109462.
-   basic_block bb = gimple_bb (phi);
-   if (!dom_info_available_p (CDI_DOMINATORS))
- single_arg = NULL;
-   else
- for (x = 0; x < gimple_phi_num_args (phi); x++)
-   if (gimple_phi_arg_def (phi, x) == single_arg
-   && dominated_by_p (CDI_DOMINATORS,
-   gimple_phi_arg_edge (phi, x)->src,
-   bb))
- {
-   single_arg = NULL;
-   break;
- }
-   if (single_arg)
+   // Only allow the equivalence if there isn't any UNDEFINED
+   // argument we ignored.  Such equivalences are one way
+   // PHIDEF == name, but name == PHIDEF might not hold.
+   // See PR 108139, 109462 and 109564.
+   // ???  This misses cases with not executable edges such
+   // as gcc.dg/tree-ssa/vrp06.c
+   if (!seen_undefined)
  src.register_relation (phi, VREL_EQ, phi_def, single_arg);
  }
else if (src.get_operand (arg_range, single_arg)
diff --git a/gcc/testsuite/gcc.dg/torture/pr109564-1.c b/gcc/testsuite/gcc.dg/torture/pr109564-1.c
new file mode 100644
index 000..e7c855f1edf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr109564-1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+
+struct libkeccak_spec {
+long int bitrate;
+};
+
+struct libkeccak_generalised_spec {
+long int bitrate;
+long int state_size;
+long int word_size;
+};
+
+int __attribute__((noipa))
+libkeccak_degeneralise_spec(struct libkeccak_generalised_spec *restrict spec,
+   struct libkeccak_spec *restrict output_spec)
+{
+  long int state_size, word_size, bitrate, output;
+  const int have_state_size = spec->state_size != (-65536L);
+  const int have_word_size = spec->word_size != (-65536L);
+  const int have_bitrate = spec->bitrate != (-65536L);
+
+  if (have_state_size)
+{
+  state_size = spec->state_size;
+  if (state_size <= 0)
+   return 1;
+  if (state_size > 1600)
+   return 2;
+}
+
+  if (have_word_size)
+{
+  word_size = spec->word_size;
+  if (word_size <= 0)
+   return 4;
+  if (word_size > 64)
+   return 5;
+  if (have_state_size && state_size != word_size * 25)
+   return 6;
+  else if (!have_state_size) {
+ spec->state_size = 1;
+ state_size = word_size * 25;
+  }
+}
+
+  if (have_bitrate)
+bit

Re: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization

2023-04-20 Thread Richard Biener via Gcc-patches

On Thu, 20 Apr 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richards (Sandiford && Biener).
> I have a technical question:
> When supporting variable IVs for memory address calculation, is it right
> that I should make the output of WHILE_LEN visible in
> tree-ssa-loop-ivopts.cc? The address calculation is not in the loop
> control handling function.

Well, it is visible, no?  It's going to be not different from SVE
testing the mask for all zeros.

I'd leave trying to make IVOPTs eliminate other counting IVs with
the WHILE_LEN result as a followup exercise - you will likely have
address IVs that do not depend on the WHILE_LEN result for all of
the memory accesses.
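To make the question concrete, here is a scalar C model of a decrement-IV loop whose address IVs advance by the per-iteration length rather than by a constant VF. toy_while_len and toy_vla_copy are hypothetical. The model assumes WHILE_LEN yields min (remaining, VF). The RVV specification additionally lets vsetvl return any length between ceil(AVL/2) and VLMAX when the remaining count AVL lies between VLMAX and 2*VLMAX, which is the "not only final iteration is partial" case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of WHILE_LEN: how many of the REMAINING
   elements to process in this iteration, at most VF.  */
static size_t
toy_while_len (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Copy N ints using a decrementing remaining-count IV.  The address IVs
   are bumped by LEN, i.e. they have a variable step.  */
static void
toy_vla_copy (int *dst, const int *src, size_t n, size_t vf)
{
  size_t remaining = n;

  assert (vf > 0);
  while (remaining > 0)
    {
      size_t len = toy_while_len (remaining, vf);

      /* Stands in for one vector load/store of LEN elements.  */
      for (size_t i = 0; i < len; i++)
        dst[i] = src[i];

      dst += len;        /* Variable-step address IVs.  */
      src += len;
      remaining -= len;  /* Decrement IV.  */
    }
}
```

If only the final iteration can be partial, the address IVs could keep a fixed VF stride. Once earlier iterations may also be partial, they have to step by the length actually returned.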

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Sandiford
> Date: 2023-04-20 17:54
> To: Richard Biener
> CC: juzhe.zhong@rivai.ai; gcc-patches; jeffreyalaw
> Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for auto-vectorization
> Richard Biener  writes:
> > On Thu, 20 Apr 2023, Richard Sandiford wrote:
> >
> >> "juzhe.zh...@rivai.ai"  writes:
> >> > OK. Thanks Richard.
> >> > So let me conclude:
> >> > 1. Community agree that I should support variable IV in the middle-end.
> >> > 2. We can keep WHILE_LEN pattern when "not only final iteration is 
> >> > partial".
> >> > And I should describe it more clearly in the doc.
> >> >
> >> > I should do these 2 things in the later update patch.
> >> 
> >> Sounds good to me, but Richi is the maintainer.
> >
> > But I happily defer to you for designing VL stuff.  I suppose it's
> > time to make you maintainer as well ... (you are global reviewer).
>  
> Heh, wasn't trying to bag an extra maintainership :-)  I just got a
> bit lost in the thread and wasn't sure whether I was contradicting
> something you'd said (in which case I'd defer to that).
>  
> Richard
>  
>  
>  
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

[committed] amdgcn: update target-supports.exp

2023-04-20 Thread Andrew Stubbs

Recent patches have enabled new capabilities on AMD GCN, but not all the 
testsuite features were enabled. The hardfp divide patch actually had a 
test regression because the expected results were too conservative.


This patch corrects both issues.

Andrew

amdgcn: update target-supports.exp

The backend can now vectorize more things.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_vect_call_copysignf): Add amdgcn.
(check_effective_target_vect_call_sqrtf): Add amdgcn.
(check_effective_target_vect_call_ceilf): Add amdgcn.
(check_effective_target_vect_call_floor): Add amdgcn.
(check_effective_target_vect_logical_reduc): Add amdgcn.

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index ad68af51f91..868e2c4f092 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8555,7 +8555,8 @@ proc check_effective_target_vect_call_copysignf { } {
 return [check_cached_effective_target_indexed vect_call_copysignf {
   expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
 || [istarget powerpc*-*-*]
-|| [istarget aarch64*-*-*] }}]
+|| [istarget aarch64*-*-*]
+ || [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports hardware square root instructions.
@@ -8591,7 +8592,8 @@ proc check_effective_target_vect_call_sqrtf { } {
 || [istarget i?86-*-*] || [istarget x86_64-*-*]
 || ([istarget powerpc*-*-*] && [check_vsx_hw_available])
 || ([istarget s390*-*-*]
-&& [check_effective_target_s390_vx]) }}]
+&& [check_effective_target_s390_vx])
+ || [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector lrint calls.
@@ -8636,14 +8638,16 @@ proc check_effective_target_vect_call_ceil { } {
 
 proc check_effective_target_vect_call_ceilf { } {
 return [check_cached_effective_target_indexed vect_call_ceilf {
-  expr { [istarget aarch64*-*-*] }}]
+  expr { [istarget aarch64*-*-*]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector floor calls.
 
 proc check_effective_target_vect_call_floor { } {
 return [check_cached_effective_target_indexed vect_call_floor {
-  expr { [istarget aarch64*-*-*] }}]
+  expr { [istarget aarch64*-*-*]
+|| [istarget amdgcn-*-*] }}]
 }
 
 # Return 1 if the target supports vector floorf calls.
@@ -8699,7 +8703,8 @@ proc check_effective_target_vect_call_roundf { } {
 # Return 1 if the target supports AND, OR and XOR reduction.
 
 proc check_effective_target_vect_logical_reduc { } {
-return [check_effective_target_aarch64_sve]
+return [expr { [check_effective_target_aarch64_sve]
+  || [istarget amdgcn-*-*] }]
 }
 
 # Return 1 if the target supports the fold_extract_last optab.

[committed] amdgcn: bug fix ldexp insn

2023-04-20 Thread Andrew Stubbs

The hardfp division patch exposed a flaw in the ldexp pattern at -O0; 
the compiler was trying to use out-of-range immediates on VOP3 
instruction encodings.


This patch changes the constraints appropriately, and also takes the 
opportunity to combine the two patterns into one using the newly 
available SV_FP iterator.


Andrew

amdgcn: bug fix ldexp insn

The vop3 instructions don't support B constraint immediates.
Also, use the SV_FP iterator to delete a redundant pattern.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (vnsi, VnSI): Add scalar modes.
(ldexp3): Delete.
(ldexp3): Change "B" to "A".

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 4a40a9d8d4c..44c48468dd6 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -208,7 +208,9 @@ (define_mode_attr SCALAR_MODE
(V64HF "HF") (V64SF "SF") (V64DI "DI") (V64DF "DF")])
 
 (define_mode_attr vnsi
-  [(V2QI "v2si") (V2HI "v2si") (V2HF "v2si") (V2SI "v2si")
+  [(QI "si") (HI "si") (SI "si")
+   (HF "si") (SF "si") (DI "si") (DF "si")
+   (V2QI "v2si") (V2HI "v2si") (V2HF "v2si") (V2SI "v2si")
(V2SF "v2si") (V2DI "v2si") (V2DF "v2si")
(V4QI "v4si") (V4HI "v4si") (V4HF "v4si") (V4SI "v4si")
(V4SF "v4si") (V4DI "v4si") (V4DF "v4si")
@@ -222,7 +224,9 @@ (define_mode_attr vnsi
(V64SF "v64si") (V64DI "v64si") (V64DF "v64si")])
 
 (define_mode_attr VnSI
-  [(V2QI "V2SI") (V2HI "V2SI") (V2HF "V2SI") (V2SI "V2SI")
+  [(QI "SI") (HI "SI") (SI "SI")
+   (HF "SI") (SF "SI") (DI "SI") (DF "SI")
+   (V2QI "V2SI") (V2HI "V2SI") (V2HF "V2SI") (V2SI "V2SI")
(V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI")
(V4QI "V4SI") (V4HI "V4SI") (V4HF "V4SI") (V4SI "V4SI")
(V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI")
@@ -3043,21 +3047,10 @@ (define_expand "2"
 
 ; Implement ldexp pattern
 
-(define_insn "ldexp3"
-  [(set (match_operand:FP 0 "register_operand"  "=v")
-   (unspec:FP
- [(match_operand:FP 1 "gcn_alu_operand" "vB")
-  (match_operand:SI 2 "gcn_alu_operand" "vSvA")]
- UNSPEC_LDEXP))]
-  ""
-  "v_ldexp%i0\t%0, %1, %2"
-  [(set_attr "type" "vop3a")
-   (set_attr "length" "8")])
-
 (define_insn "ldexp3"
-  [(set (match_operand:V_FP 0 "register_operand" "=  v")
-   (unspec:V_FP
- [(match_operand:V_FP 1 "gcn_alu_operand"   "  vB")
+  [(set (match_operand:SV_FP 0 "register_operand" "=  v")
+   (unspec:SV_FP
+ [(match_operand:SV_FP 1 "gcn_alu_operand"   "  vA")
   (match_operand: 2 "gcn_alu_operand" "vSvA")]
  UNSPEC_LDEXP))]
   ""

Re: [PATCH] [i386] Support type _Float16/__bf16 independent of SSE2.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Wed, Apr 19, 2023 at 03:15:51PM +0800, liuhongt wrote:
ChangeLog nits have already been reported.

> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -817,6 +817,43 @@ ix86_target_macros (void)
>if (!TARGET_80387)
>  cpp_define (parse_in, "_SOFT_FLOAT");
>  
> +  /* HFmode/BFmode is supported without depending any isa
> + in scalar_mode_supported_p and libgcc_floating_mode_supported_p,
> + but according to psABI, they're really supported w/ SSE2 and above.
> + Since libstdc++ uses __STDCPP_FLOAT16_T__ and __STDCPP_BFLOAT16_T__
> + for backend support of the types, undef the macros to avoid
> + build failure, see PR109504.  */
> +  if (!TARGET_SSE2)
> +{
> +  if (c_dialect_cxx ()
> +   && cxx_dialect > cxx20)

Formatting, both conditions are short, so just put them on one line.

> + {
> +   cpp_undef (parse_in, "__STDCPP_FLOAT16_T__");
> +   cpp_undef (parse_in, "__STDCPP_BFLOAT16_T__");
> + }

But for the C++23 macros, more importantly I think we really should
also in ix86_target_macros_internal add
  if (c_dialect_cxx ()
  && cxx_dialect > cxx20
  && (isa_flag & OPTION_MASK_ISA_SSE2))
{
  def_or_undef (parse_in, "__STDCPP_FLOAT16_T__");
  def_or_undef (parse_in, "__STDCPP_BFLOAT16_T__");
}
plus associated libstdc++ changes.  It can be done incrementally though.

> +
> +  if (flag_building_libgcc)
> + {
> +   /* libbid uses __LIBGCC_HAS_HF_MODE__ and __LIBGCC_HAS_BF_MODE__
> +  to check backend support of _Float16 and __bf16 type.  */

That is actually the case only for HFmode, but not for BFmode right now.
So, we need further work.  One is to add the BFmode support in there,
and another one is make sure the _Float16 <-> _Decimal* and __bf16 <->
_Decimal* conversions are compiled in also if not -msse2 by default.
One way to do that is to wrap the HF and BF mode related functions on x86
#ifndef __SSE2__ into the pragmas like intrin headers use (but then
perhaps we don't need to undef this stuff here); another is not to provide
the hf/bf support in that case from the TUs where they are provided now,
but from a different one which would be compiled with -msse2.

> +   cpp_undef (parse_in, "__LIBGCC_HAS_HF_MODE__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_FUNC_EXT__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_MANT_DIG__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_EXCESS_PRECISION__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_EPSILON__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_MAX__");
> +   cpp_undef (parse_in, "__LIBGCC_HF_MIN__");
> +
> +   cpp_undef (parse_in, "__LIBGCC_HAS_BF_MODE__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_FUNC_EXT__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_MANT_DIG__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_EXCESS_PRECISION__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_EPSILON__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_MAX__");
> +   cpp_undef (parse_in, "__LIBGCC_BF_MIN__");
> + }
> +}
> +

> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -2651,7 +2651,10 @@ construct_container (machine_mode mode, machine_mode orig_mode,
>  
>/* We allowed the user to turn off SSE for kernel mode.  Don't crash if
>   some less clueful developer tries to use floating-point anyway.  */
> -  if (needed_sseregs && !TARGET_SSE)
> +  if (needed_sseregs
> +  && (!TARGET_SSE
> +   || (VALID_SSE2_TYPE_MODE (mode)
> +   && !TARGET_SSE2)))

Formatting, no need to split this up that much.
  if (needed_sseregs
  && (!TARGET_SSE
  || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
or even better
  if (needed_sseregs
  && (!TARGET_SSE || (VALID_SSE2_TYPE_MODE (mode) && !TARGET_SSE2)))
will do it.

> @@ -22805,9 +22827,10 @@ ix86_emit_support_tinfos (emit_support_tinfos_callback callback)
>  
>if (!TARGET_SSE2)
>  {
> -  gcc_checking_assert (!float16_type_node && !bfloat16_type_node);
> -  float16_type_node = ix86_float16_type_node;
> -  bfloat16_type_node = ix86_bf16_type_node;
> +  float16_type_node
> + = float16_type_node ? float16_type_node : ix86_float16_type_node;
> +  bfloat16_type_node
> + = bfloat16_type_node ? bfloat16_type_node : ix86_bf16_type_node;
>callback (float16_type_node);
>callback (bfloat16_type_node);

Instead of this, just use
  if (!float16_type_node)
{
  float16_type_node = ix86_float16_type_node;
  callback (float16_type_node);
  float16_type_node = NULL_TREE;
}
  if (!bfloat16_type_node)
{
  bfloat16_type_node = ix86_bf16_type_node;
  callback (bfloat16_type_node);
  bfloat16_type_node = NULL_TREE;
}
?
> +/* Return the diagnostic message string if conversion from FROMTYPE to
> +   TOTYPE is not allowed, NULL otherwise.  */
> +
> +static const char *
> +ix86_invalid_conversion (const_tree fromtype, c

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Tue, Apr 18, 2023 at 03:12:50PM +0200, Aldy Hernandez wrote:
> [I don't know why I keep poking at floats.  I must really like the pain.
> Jakub, are you OK with this patch for trunk?]

Thanks for working on this.  Though expectedly here we are running
again into the discussions we had in November about math properties of the
functions vs. numeric properties in their implementations, how big maximum
error shall we expect for the functions (and whether we should hardcode
it for all implementations, or have some more fine-grained list of expected
ulp errors for each implementation), whether the implementations at least
guarantee the basic mathematical properties of the functions even if they
have some errors (say for sin/cos, if they really never return > 1.0 or <
-1.0) and the same questions repeated for -frounding-math, what kind of
extra errors to expect when using non-default rounding and whether say sin
could ever return nextafter (1.0, 2.0) or even larger value say when
using non-default rounding mode.
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606466.html
was my attempt to get at least some numbers on some targets, I'm afraid
for most implementations we aren't going to get any numerical proofs of
maximum errors and the like.  For sin/cos to check whether the implementation
really never returns > 1.0 or < -1.0, perhaps instead of using randomized
testing we could exhaustively check some range around 0, M_PI, 3*M_PI,
-M_PI, -3*M_PI, and say some much larger multiples of M_PI, say 50 ulps
in each direction around those points, and similarly for sin around M_PI/2
etc., in all rounding modes.

> This is the range-op entry for sin/cos.  It is meant to serve as an
> example of what we can do for glibc math functions.  It is by no means
> exhaustive, just a stub to restrict the return range from sin/cos to
> [-1.0, 1.0] with appropriate smarts of NANs.
> 
> As can be seen in the testcase, we see sin() as well as
> __builtin_sin() in the IL, and can resolve the resulting range
> accordingly.
> 
> gcc/ChangeLog:
> 
>   * gimple-range-op.cc (class cfn_sincos): New.
>   (gimple_range_op_handler::maybe_builtin_call): Add case for sin/cos.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-ssa/range-sincos.c: New test.
> ---
>  gcc/gimple-range-op.cc   | 63 
>  gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c | 40 +
>  2 files changed, 103 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/range-sincos.c
> 
> diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
> index 4ca32a7b5d5..36390f2645e 100644
> --- a/gcc/gimple-range-op.cc
> +++ b/gcc/gimple-range-op.cc
> @@ -402,6 +402,60 @@ public:
>}
>  } op_cfn_copysign;
>  
> +class cfn_sincos : public range_operator_float
> +{
> +public:
> +  using range_operator_float::fold_range;
> +  using range_operator_float::op1_range;
> +  virtual bool fold_range (frange &r, tree type,
> +const frange &lh, const frange &,
> +relation_trio) const final override
> +  {
> +if (lh.known_isnan () || lh.known_isinf ())
> +  {
> + r.set_nan (type);
> + return true;
> +  }
> +r.set (type, dconstm1, dconst1);

See above, are we sure we can use [-1., 1.] range safely, or should that be
[-1.-Nulps, 1.+Nulps] for some kind of expected worst-case error margin of the
implementation?  And ditto for -frounding-math, shall we increase that
interval in that case, or is [-1., 1.] going to be ok?

> +if (!lh.maybe_isnan ())

This condition isn't sufficient: even if lh can't be NAN but may be
+Inf or -Inf, the result needs to include a possible NAN.
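To make the point concrete, here is a sketch on a toy range type. toy_frange and toy_sincos_fold are hypothetical and are not GCC's frange API. NaN is cleared from the result only when the input excludes both NaN and the infinities. The [-1.0, 1.0] bound itself is taken as an assumption here, which is exactly the open question about ulp errors above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical, much simplified stand-in for GCC's frange.  */
struct toy_frange {
  bool has_numbers;  /* false: no non-NaN values (NaN-only range) */
  double lo, hi;     /* closed bounds of the non-NaN part */
  bool maybe_nan;    /* true if NaN is possible */
};

/* Toy fold_range for sin/cos, assuming results lie in [-1.0, 1.0].  */
static struct toy_frange
toy_sincos_fold (struct toy_frange x)
{
  struct toy_frange r = { true, -1.0, 1.0, false };

  /* A range with no numbers must at least allow NaN.  */
  assert (x.has_numbers || x.maybe_nan);

  /* Only NaN and/or a single infinity: the result can only be NaN.  */
  if (!x.has_numbers || (x.lo == x.hi && isinf (x.lo)))
    {
      r.has_numbers = false;
      r.maybe_nan = true;
      return r;
    }

  /* Ruling out NaN needs a non-NaN input that also cannot be +-Inf,
     since sin and cos of +-Inf are NaN.  */
  r.maybe_nan = x.maybe_nan || x.lo == -INFINITY || x.hi == INFINITY;
  return r;
}
```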

> +  r.clear_nan ();
> +return true;

Incrementally, if we decide what to do with the maximum allowed errors in
ulps, if lh's range is smaller than 2*M_PI (upper_bound () - lower_bound
()), we could narrow it down further by computing the exact values
for the bounds and any local maximums or minimums in between if any
and creating a range out of that.

> +  }
> +  virtual bool op1_range (frange &r, tree type,
> +   const frange &lhs, const frange &,
> +   relation_trio) const final override
> +  {
> +if (!lhs.maybe_isnan ())
> +  {
> + // If NAN is not valid result, the input cannot include either
> + // a NAN nor a +-INF.
> + REAL_VALUE_TYPE lb = real_min_representable (type);
> + REAL_VALUE_TYPE ub = real_max_representable (type);
> + r.set (type, lb, ub, nan_state (false, false));
> + return true;
> +  }
> +// A known NAN means the input is [-INF,-INF][+INF,+INF] U +-NAN,
> +// which we can't currently represent.
> +if (lhs.known_isnan ())
> +  {
> + r.set_varying (type);
> + return true;
> +  }
> +// Results outside of [-1.0, +1.0] are impossible.
> +REAL_VALUE_TYPE lb = lhs.lower_bound ();
> +REAL_VALUE_TYPE ub = lhs.upper_bound ();
> +if (real_less (&

Re: [PATCH v3] libgfortran: Replace mutex with rwlock

2023-04-20 Thread Zhu, Lipeng via Gcc-patches


Hi Bernhard,

Thanks for your questions and suggestions.
The rwlock allows multiple threads to have concurrent read-only
access to the cache/unit list, while only a single writer is allowed.

The write lock will not be acquired until all read locks are released.
I didn't change the mutex scope when refactoring the code; I only made a
finer-grained distinction between reading and writing the cache/unit list.


I completed the comment according to your template. I will insert the
comment into the source code in the next version of the patch, together
with the other refinements you suggested.

"
Now we did not find the unit in the cache or the unit list, so we need
to create a new unit and add it to the cache and the unit list.
Before updating the cache and the list, we need to release all read
locks and then immediately acquire the write lock, to ensure exclusive
access while updating the cache and the unit list.
Either way, we will manipulate the cache and/or the unit list, so we
must take a write lock now.
We don't take the write bit in *addition* to the read lock because:
1. It would needlessly complicate releasing the respective locks;
2. Separating the read and write locks greatly reduces contention on
the read path, while the write path is not always needed and is unlikely
once the unit hits in the cache;
3. We try to balance implementation complexity against the performance
gains for the cases we have observed.
"

Best Regards,
Zhu, Lipeng

On 1/1/1970 8:00 AM, Bernhard Reutner-Fischer wrote:

On 19 April 2023 09:06:28 CEST, Lipeng Zhu via Fortran wrote:

This patch tries to introduce the rwlock and split the read/write access
to the unit_root tree and unit_cache, using an rwlock instead of the mutex
to increase CPU efficiency. In the get_gfc_unit function, the percentage
of calls that step into the insert_unit function is around 30%; in most
instances, we can get the unit while reading the unit_cache or the
unit_root tree. So splitting the read/write phases with an rwlock would be
an approach to make it more parallel.

BTW, the IPC metric can gain around 9x on our test server with 220
cores. The benchmark we used is https://github.com/rwesson/NEAT




+#define RD_TO_WRLOCK(rwlock) \
+  RWUNLOCK (rwlock);\
+  WRLOCK (rwlock);
+#endif
+




diff --git a/libgfortran/io/unit.c b/libgfortran/io/unit.c
index 82664dc5f98..4312c5f36de 100644
--- a/libgfortran/io/unit.c
+++ b/libgfortran/io/unit.c



@@ -329,7 +335,7 @@ get_gfc_unit (int n, int do_create)
   int c, created = 0;

   NOTE ("Unit n=%d, do_create = %d", n, do_create);
-  LOCK (&unit_lock);
+  RDLOCK (&unit_rwlock);

retry:
   for (c = 0; c < CACHE_SIZE; c++)
@@ -350,6 +356,7 @@ retry:
   if (c == 0)
break;
 }
+  RD_TO_WRLOCK (&unit_rwlock);


So I'm trying to convince myself why it's safe to unlock and only then
take the write lock.

Can you please elaborate/confirm why that's ok?

I wouldn't mind a comment like
We can release the unit and cache read lock now. We might have to
allocate a (locked) unit, below in do_create.
Either way, we will manipulate the cache and/or the unit list so we have
to take a write lock now.

We don't take the write bit in *addition* to the read lock because..

(that needlessly complicates releasing the respective locks / it triggers
too much contention when we.. / ...?)

thanks,



   if (p == NULL && do_create)
 {
@@ -368,8 +375,8 @@ retry:
   if (created)
 {
   /* Newly created units have their lock held already
-from insert_unit.  Just unlock UNIT_LOCK and return.  */
-  UNLOCK (&unit_lock);
+from insert_unit.  Just unlock UNIT_RWLOCK and return.  */
+  RWUNLOCK (&unit_rwlock);
   return p;
 }

@@ -380,7 +387,7 @@ found:
   if (! TRYLOCK (&p->lock))
{
  /* assert (p->closed == 0); */
- UNLOCK (&unit_lock);
+ RWUNLOCK (&unit_rwlock);
  return p;
}

@@ -388,14 +395,14 @@ found:
 }


-  UNLOCK (&unit_lock);
+  RWUNLOCK (&unit_rwlock);

   if (p != NULL && (p->child_dtio == 0))
 {
   LOCK (&p->lock);
   if (p->closed)
{
- LOCK (&unit_lock);
+ WRLOCK (&unit_rwlock);
  UNLOCK (&p->lock);
  if (predec_waiting_locked (p) == 0)
destroy_unit_mutex (p);
@@ -593,8 +600,8 @@ init_units (void)
#endif
#endif

-#ifndef __GTHREAD_MUTEX_INIT
-  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_lock);
+#if (!defined(__GTHREAD_RWLOCK_INIT) && !defined(__GTHREAD_MUTEX_INIT))
+  __GTHREAD_MUTEX_INIT_FUNCTION (&unit_rwlock);
#endif

   if (sizeof (max_offset) == 8)

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Siddhesh Poyarekar


On 2023-04-20 08:59, Jakub Jelinek via Gcc-patches wrote:

+r.set (type, dconstm1, dconst1);


See above, are we sure we can use [-1., 1.] range safely, or should that be
[-1.-Nulps, 1.+Nulps] for some kind of expected worst-case error margin of the
implementation?  And ditto for -frounding-math, shall we increase that
interval in that case, or is [-1., 1.] going to be ok?


Do any math implementations generate results outside of [-1., 1.]?  If 
yes, then it's a bug in those implementations IMO, not in the range 
assumption.  It feels wrong to cater for what ought to be trivially 
fixable in libraries if they ever happen to generate such results.


Thanks,
Sid

Re: [PATCH] RISC-V: Fix bug of PR109535

2023-04-20 Thread Kito Cheng via Gcc-patches

Committed, thanks!

On Wed, Apr 19, 2023 at 6:42 PM  wrote:
>
> From: Ju-Zhe Zhong 
>
> Testcase coming from Kito.
>
> Co-authored-by: kito-cheng 
> Co-authored-by: kito-cheng 
>
> PR 109535
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New 
> function.
> (pass_vsetvl::cleanup_insns): Fix bug.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/riscv/rvv/base/pr109535.C: New test.
> * gcc.target/riscv/rvv/base/pr109535.c: New test.
>
> Signed-off-by: Ju-Zhe Zhong 
> Co-authored-by: kito-cheng 
> Co-authored-by: kito-cheng 
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc  |  14 +-
>  .../g++.target/riscv/rvv/base/pr109535.C  | 144 ++
>  .../gcc.target/riscv/rvv/base/pr109535.c  |  11 ++
>  3 files changed, 168 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr109535.C
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr109535.c
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
> index 1b66e3b9eeb..9c356ce5157 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1592,6 +1592,18 @@ backward_propagate_worthwhile_p (const basic_block cfg_bb,
>return true;
>  }
>
> +/* Count the number of REGNO in RINSN.  */
> +static int
> +count_regno_occurrences (rtx_insn *rinsn, unsigned int regno)
> +{
> +  int count = 0;
> +  extract_insn (rinsn);
> +  for (int i = 0; i < recog_data.n_operands; i++)
> +if (refers_to_regno_p (regno, recog_data.operand[i]))
> +  count++;
> +  return count;
> +}
> +
>  avl_info::avl_info (const avl_info &other)
>  {
>m_value = other.get_value ();
> @@ -3924,7 +3936,7 @@ pass_vsetvl::cleanup_insns (void) const
>   if (!has_vl_op (rinsn) || !REG_P (get_vl (rinsn)))
> continue;
>   rtx avl = get_vl (rinsn);
> - if (count_occurrences (PATTERN (rinsn), avl, 0) == 1)
> + if (count_regno_occurrences (rinsn, REGNO (avl)) == 1)
> {
>   /* Get the list of uses for the new instruction.  */
>   auto attempt = crtl->ssa->new_change_attempt ();
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr109535.C b/gcc/testsuite/g++.target/riscv/rvv/base/pr109535.C
> new file mode 100644
> index 000..7013cfcf4ee
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr109535.C
> @@ -0,0 +1,144 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
> +
> +typedef long size_t;
> +typedef signed char int8_t;
> +typedef  char uint8_t
> +
> +;
> +template < typename > struct Relations{ using Unsigned = uint8_t; };
> +template < typename T > using MakeUnsigned = typename Relations< T >::Unsigned;
> +#pragma riscv intrinsic "vector"
> +size_t ScaleByPower() {  return 0;}
> +template < typename Lane, size_t , int > struct Simd {
> +using T = Lane;
> +
> +template < typename NewT > using Rebind = Simd< NewT, 1, 0 >;
> +};
> +template < typename T > struct ClampNAndPow2 {
> +using type = Simd< T, 65536, 0 >
> +;
> +};
> +struct CappedTagChecker {
> +using type = ClampNAndPow2< signed char >::type;
> +};
> +template < typename , size_t , int >
> +using CappedTag = CappedTagChecker::type;
> +template < class D > using TFromD = typename D::T;
> +template < class T, class D > using Rebind = typename D::Rebind< T >;
> +template < class D >
> +using RebindToUnsigned = Rebind< MakeUnsigned<  D  >, D >;
> +template < size_t N >
> +size_t
> +Lanes(Simd< uint8_t, N, 0 > ) {
> +size_t kFull = 0;
> +size_t kCap ;
> +size_t actual =
> +__riscv_vsetvl_e8m1(kCap);
> +return actual;
> +}
> +template < size_t N >
> +size_t
> +Lanes(Simd< int8_t, N, 0 > ) {
> +size_t kFull  ;
> +size_t kCap ;
> +size_t actual =
> +__riscv_vsetvl_e8m1(kCap);
> +return actual;
> +}
> +template < size_t N >
> +vuint8m1_t
> +Set(Simd< uint8_t, N, 0 > d, uint8_t arg) {
> +size_t __trans_tmp_1 = Lanes(d);
> +return __riscv_vmv_v_x_u8m1(arg, __trans_tmp_1);
> +}
> +template < size_t N >
> +vint8m1_t Set(Simd< int8_t, N, 0 > , int8_t );
> +template < class D > using VFromD = decltype(Set(D(), TFromD< D >()));
> +template < class D >
> +VFromD< D >
> +Zero(D )
> +;
> +
> +template < size_t N >
> +vint8m1_t
> +BitCastFromByte(Simd< int8_t, N, 0 >, vuint8m1_t v) {
> +return __riscv_vreinterpret_v_u8m1_i8m1(v);
> +}
> +template < class D, class FromV >
> +VFromD< D >
> +BitCast(D d, FromV v) {
> +return BitCastFromByte(d, v)
> +
> +;
> +}
> +template < size_t N >
> +void
> +Store(vint8m1_t v, Simd< int8_t, N, 0 > d) {
> +int8_t *p ;
> +__riscv_vse8_v_i8m1(p, v, Lanes(d));
> +}
> +template < class V, class D >
> +void
> +StoreU(V v, D d) {
> +Store(v, d)
> +;
> +}
> +template < class D > using Vec = decltype(Zero(D()));
> +size_t Generate_count;
> +template < class D, class Func>
> +void Generate(D d, Func func) {
> +RebindToUnsigned< D > du
> +;
> +size_t N = Lanes(d)

Re: [ping][vect-patterns] Refactor widen_plus/widen_minus as internal_fns

2023-04-20 Thread Andre Vieira (lists) via Gcc-patches


Rebased all three patches and made some small changes to the second one:
- removed sub and abd optabs from commutative_optab_p; I suspect this
was a copy-paste mistake,
- removed what I believe to be a superfluous switch case in vectorizable 
conversion, the one that was here:

+  if (code.is_fn_code ())
+ {
+  internal_fn ifn = as_internal_fn (code.as_fn_code ());
+  int ecf_flags = internal_fn_flags (ifn);
+  gcc_assert (ecf_flags & ECF_MULTI);
+
+  switch (code.as_fn_code ())
+   {
+   case CFN_VEC_WIDEN_PLUS:
+ break;
+   case CFN_VEC_WIDEN_MINUS:
+ break;
+   case CFN_LAST:
+   default:
+ return false;
+   }
+
+  internal_fn lo, hi;
+  lookup_multi_internal_fn (ifn, &lo, &hi);
+  *code1 = as_combined_fn (lo);
+  *code2 = as_combined_fn (hi);
+  optab1 = lookup_multi_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
+  optab2 = lookup_multi_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
 }

I don't think we need to check that they are a specific fn code, as we look up
optabs, and if those lookups succeed then surely we can vectorize?


OK for trunk?

Kind regards,
Andre
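For readers new to code_helper: the vect_gimple_build helper added by the patch below returns nothing when op0 is missing, builds an assignment when the code_helper holds a tree code, and otherwise builds an internal-function call with one or two arguments. The following is a toy C model of that decision logic only; toy_code_helper, toy_stmt and toy_build are hypothetical names and no GCC API is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for code_helper: either a tree code or an
   internal function, distinguished by a flag.  */
struct toy_code_helper
{
  bool is_tree_code;
  int code;
};

enum toy_stmt_kind { TOY_ASSIGN, TOY_CALL };

/* Hypothetical stand-in for the statement that would be built.  */
struct toy_stmt
{
  enum toy_stmt_kind kind;
  int code;
  int nargs;   /* 1 when OP1 is absent, 2 otherwise.  */
};

/* Mirror the shape of vect_gimple_build: fail without OP0, build an
   "assignment" for a tree code, otherwise a "call" whose argument count
   depends on whether OP1 is present.  */
static bool
toy_build (struct toy_code_helper ch, const void *op0, const void *op1,
           struct toy_stmt *out)
{
  if (op0 == NULL)
    return false;
  out->kind = ch.is_tree_code ? TOY_ASSIGN : TOY_CALL;
  out->code = ch.code;
  out->nargs = op1 == NULL ? 1 : 2;
  return true;
}
```

The unary and binary cases share one entry point, which is what lets callers such as vect_recog_widen_op_pattern stay agnostic about whether wide_code is a tree code or an internal function.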
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 8802141cd6edb298866025b8a55843eae1f0eb17..68dfba266d679c9738a3d5d70551a91cbdafcf66 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimple-fold.h"
 #include "ssa.h"
 #include "expmed.h"
 #include "optabs-tree.h"
@@ -1391,7 +1393,7 @@ vect_recog_sad_pattern (vec_info *vinfo,
 static gimple *
 vect_recog_widen_op_pattern (vec_info *vinfo,
 stmt_vec_info last_stmt_info, tree *type_out,
-tree_code orig_code, tree_code wide_code,
+tree_code orig_code, code_helper wide_code,
 bool shift_p, const char *name)
 {
   gimple *last_stmt = last_stmt_info->stmt;
@@ -1434,7 +1436,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
   vecctype = get_vectype_for_scalar_type (vinfo, ctype);
 }
 
-  enum tree_code dummy_code;
+  code_helper dummy_code;
   int dummy_int;
   auto_vec dummy_vec;
   if (!vectype
@@ -1455,8 +1457,7 @@ vect_recog_widen_op_pattern (vec_info *vinfo,
   2, oprnd, half_type, unprom, vectype);
 
   tree var = vect_recog_temp_ssa_var (itype, NULL);
-  gimple *pattern_stmt = gimple_build_assign (var, wide_code,
- oprnd[0], oprnd[1]);
+  gimple *pattern_stmt = vect_gimple_build (var, wide_code, oprnd[0], oprnd[1]);
 
   if (vecctype != vecitype)
 pattern_stmt = vect_convert_output (vinfo, last_stmt_info, ctype,
@@ -6406,3 +6407,28 @@ vect_pattern_recog (vec_info *vinfo)
   /* After this no more add_stmt calls are allowed.  */
   vinfo->stmt_vec_info_ro = true;
 }
+
+/* Build a GIMPLE_ASSIGN or GIMPLE_CALL with the tree_code,
+   or internal_fn contained in ch, respectively.  */
+gimple *
+vect_gimple_build (tree lhs, code_helper ch, tree op0, tree op1)
+{
+  if (op0 == NULL_TREE)
+    return NULL;
+  if (ch.is_tree_code ())
+    return op1 == NULL_TREE ? gimple_build_assign (lhs, ch.safe_as_tree_code (),
+                                                   op0) :
+                              gimple_build_assign (lhs, ch.safe_as_tree_code (),
+                                                   op0, op1);
+  else
+  {
+    internal_fn fn = as_internal_fn (ch.safe_as_fn_code ());
+    gimple* stmt;
+    if (op1 == NULL_TREE)
+      stmt = gimple_build_call_internal (fn, 1, op0);
+    else
+      stmt = gimple_build_call_internal (fn, 2, op0, op1);
+    gimple_call_set_lhs (stmt, lhs);
+    return stmt;
+  }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 6b7dbfd4a231baec24e740ffe0ce0b0bf7a1de6b..715ec2e30a4de620b8a5076c0e7f2f7fd1b0654e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4768,7 +4768,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
STMT_INFO is the original scalar stmt that we are vectorizing.  */
 
 static gimple *
-vect_gen_widened_results_half (vec_info *vinfo, enum tree_code code,
+vect_gen_widened_results_half (vec_info *vinfo, code_helper ch,
tree vec_oprnd0, tree vec_oprnd1, int op_type,
   tree vec_dest, gimple_stmt_iterator *gsi,
   stmt_vec_info stmt_info)
@@ -4777,12 +4777,11 @@ vect_gen_widened_results_half (vec_info *vinfo, enum tree_code code,
   tree new_temp;
 
   /* Generate half of the widened result:  */
-  gcc_assert (op_type == TREE_CODE_LENGTH (code));
   if (op_type != binary_op)
 vec_oprnd1 = NULL;
-  new_stmt = gimple_build_assign (vec_dest, code, vec_oprnd0, vec_oprnd1);
+  new_stmt = vect_gimple_build (vec_dest, ch, vec_

[committed] RISC-V: Fix simplify_ior_optimization.c on rv32

2023-04-20 Thread Kito Cheng via Gcc-patches

GCC will complain if the target ABI doesn't have a corresponding multilib on a
glibc toolchain; use stdint-gcc.h to suppress that.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/simplify_ior_optimization.c: Use stdint-gcc.h
rather than stdint.h
---
 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
index ec3bd0baf03f..b94e1ee25ab7 100644
--- a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
+++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
 
-#include <stdint.h>
+#include "stdint-gcc.h"
 
 uint8_t test_simplify_ior_scalar_case_0 (uint8_t a)
 {
-- 
2.39.2

[committed] RISC-V: Fix riscv/arch-19.c with different ISA spec version

2023-04-20 Thread Kito Cheng via Gcc-patches

In newer ISA specs, F implies zicsr; add zicsr to the -march option to
avoid different test results with different default -misa-spec versions.

gcc/testsuite/

* gcc.target/riscv/arch-19.c: Add -misa-spec.
---
 gcc/testsuite/gcc.target/riscv/arch-19.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/arch-19.c b/gcc/testsuite/gcc.target/riscv/arch-19.c
index b042e1a49fe6..95204ede26a6 100644
--- a/gcc/testsuite/gcc.target/riscv/arch-19.c
+++ b/gcc/testsuite/gcc.target/riscv/arch-19.c
@@ -1,4 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64if_zfinx -mabi=lp64" } */
+/* { dg-options "-march=rv64if_zicsr_zfinx -mabi=lp64" } */
 int foo() {}
-/* { dg-error "'-march=rv64if_zfinx': z\\*inx conflicts with floating-point extensions" "" { target *-*-* } 0 } */
+/* { dg-error "'-march=rv64if_zicsr_zfinx': z\\*inx conflicts with floating-point extensions" "" { target *-*-* } 0 } */
-- 
2.39.2

[committed][OG10] amdgcn, openmp: Fix concurrency in low-latency allocator

2023-04-20 Thread Andrew Stubbs


I've committed this to the devel/omp/gcc-12 branch.

The patch fixes a concurrency issue where the spin-locks didn't work 
well if many GPU threads tried to free low-latency memory all at once.


Adding a short sleep instruction is enough for the hardware thread to 
yield and allow another to proceed. The alloc routine already had this 
feature, so this just corrects an accidental omission.
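A minimal host-side sketch of the spin-and-yield pattern described above, assuming C11 atomics and using POSIX sched_yield () as a stand-in for BASIC_ALLOC_YIELD (on the GPU the patch relies on a short sleep instruction instead). TOY_YIELD, toy_lock_acquire and the other names are hypothetical; this is not libgomp's allocator code.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for BASIC_ALLOC_YIELD.  */
#define TOY_YIELD() sched_yield ()

static atomic_flag toy_lock = ATOMIC_FLAG_INIT;
static int toy_counter;

static void
toy_lock_acquire (void)
{
  /* Retry until the flag was previously clear; yield after every failed
     attempt so the current holder gets a chance to run and release it.  */
  while (atomic_flag_test_and_set_explicit (&toy_lock, memory_order_acquire))
    TOY_YIELD ();
}

static void
toy_lock_release (void)
{
  atomic_flag_clear_explicit (&toy_lock, memory_order_release);
}

/* A critical section protected by the spin lock.  */
static void
toy_locked_increment (void)
{
  toy_lock_acquire ();
  toy_counter++;
  toy_lock_release ();
}
```

The only difference from a bare spin loop is the yield on the retry path, which corresponds to where the patch inserts BASIC_ALLOC_YIELD after the /* Spin.  */ comment.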


This patch will get folded into the previous OG12 patch series when I 
repost it for mainline.


Andrew

amdgcn, openmp: Fix concurrency in low-latency allocator

The previous code works fine on Fiji and Vega 10 devices, but bogs down in the
spin locks on Vega 20 or newer.  Adding the sleep instructions fixes the
problem.

libgomp/ChangeLog:

* basic-allocator.c (basic_alloc_free): Use BASIC_ALLOC_YIELD.
(basic_alloc_realloc): Use BASIC_ALLOC_YIELD.

diff --git a/libgomp/basic-allocator.c b/libgomp/basic-allocator.c
index b4b9e4ba13a..a61828e48a0 100644
--- a/libgomp/basic-allocator.c
+++ b/libgomp/basic-allocator.c
@@ -188,6 +188,7 @@ basic_alloc_free (char *heap, void *addr, size_t size)
  break;
}
   /* Spin.  */
+  BASIC_ALLOC_YIELD;
 }
   while (1);
 
@@ -267,6 +268,7 @@ basic_alloc_realloc (char *heap, void *addr, size_t oldsize,
  break;
}
   /* Spin.  */
+  BASIC_ALLOC_YIELD;
 }
   while (1);

Re: [PATCH] RISC-V: Fix reg order of RVV registers.

2023-04-20 Thread Kito Cheng via Gcc-patches

Committed to trunk, thanks :)

On Tue, Apr 18, 2023 at 9:50 PM Jeff Law  wrote:
>
>
>
> On 3/13/23 02:19, juzhe.zh...@rivai.ai wrote:
> > From: Ju-Zhe Zhong 
> >
> > Co-authored-by: kito-cheng 
> > Co-authored-by: kito-cheng 
> >
> > Consider this case:
> > void f19 (void *base,void *base2,void *out,size_t vl, int n)
> > {
> >  vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
> >  for (int i = 0; i < n; i++){
> >vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
> >vuint64m8_t v = __riscv_vluxei64_v_u64m8_m(m,base,bindex,vl);
> >vuint64m8_t v2 = __riscv_vle64_v_u64m8_tu (v, base2 + i, vl);
> >vint8m1_t v3 = __riscv_vluxei64_v_i8m1_m(m,base,v,vl);
> >vint8m1_t v4 = __riscv_vluxei64_v_i8m1_m(m,base,v2,vl);
> >__riscv_vse8_v_i8m1 (out + 100*i,v3,vl);
> >__riscv_vse8_v_i8m1 (out + 222*i,v4,vl);
> >  }
> > }
> >
> > Due to the current unreasonable reg order, this case produces unnecessary
> > register spills.
> >
> > Fixing the order helps RA.
> Note that this is likely a losing game -- over time you're likely to
> find that one ordering works better for one set of inputs while another
> ordering works better for a different set of inputs.
>
> So while I don't object to the patch, in general we try to find a
> reasonable setting, knowing that it's likely not to be optimal in all cases.
>
> Probably the most important aspect of this patch in my mind is moving
> the vector mask register to the end so that it's only used for vectors
> when we've exhausted the whole vector register file.  Thus it's more
> likely to be usable as a mask when we need it for that purpose.
>
> OK for the trunk and backporting to the shared RISC-V sub-branch off
> gcc-13 (once it's created).
>
> jeff
>
> >

[PATCH 1/2] c++: make strip_typedefs generalize strip_typedefs_expr

2023-04-20 Thread Patrick Palka via Gcc-patches

If we have a TREE_VEC of types that we want to strip of typedefs, we
unintuitively need to call strip_typedefs_expr instead of strip_typedefs
since only strip_typedefs_expr handles TREE_VEC, and it also dispatches
to strip_typedefs when given a type.  But this seems backwards: arguably
strip_typedefs_expr should be the more specialized function, which
strip_typedefs dispatches to (and thus generalizes).

This patch makes strip_typedefs generalize strip_typedefs_expr, which
allows for some simplifications.
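The new arrangement, where the general entry point forwards non-types to the specialized function and the specialized function calls back into the general one for every element without checking TYPE_P itself, can be modelled in a few lines of C. All names here (toy_node, toy_strip, toy_strip_expr) are hypothetical, and unlike GCC, which builds a new TREE_VEC when an element changes, this toy rewrites the vector in place.

```c
#include <stddef.h>

/* Hypothetical node kinds; not GCC's tree representation.  */
enum toy_kind { TOY_TYPE, TOY_TYPEDEF, TOY_VEC };

struct toy_node
{
  enum toy_kind kind;
  struct toy_node *target;   /* TOY_TYPEDEF: the aliased node.  */
  struct toy_node **elts;    /* TOY_VEC: element array.  */
  size_t len;                /* TOY_VEC: number of elements.  */
};

static struct toy_node *toy_strip_expr (struct toy_node *t);

/* General entry point (the strip_typedefs role): non-types are
   forwarded to the specialized function, types have their typedef
   chain followed to the underlying type.  */
static struct toy_node *
toy_strip (struct toy_node *t)
{
  if (t == NULL)
    return t;
  if (t->kind == TOY_VEC)
    return toy_strip_expr (t);
  while (t->kind == TOY_TYPEDEF)
    t = t->target;
  return t;
}

/* Specialized function (the strip_typedefs_expr role): it simply calls
   the general entry point on each element, whatever its kind.  */
static struct toy_node *
toy_strip_expr (struct toy_node *t)
{
  for (size_t i = 0; i < t->len; i++)
    t->elts[i] = toy_strip (t->elts[i]);
  return t;
}
```

Because the dispatch lives in one place, callers holding a vector of mixed types and non-types no longer need to pick the right function per element, which is the simplification the patch describes.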

gcc/cp/ChangeLog:

* tree.cc (strip_typedefs): Move TREE_LIST handling to
strip_typedefs_expr.  Dispatch to strip_typedefs_expr
for a non-type 't'.
: Remove manual dispatching to
strip_typedefs_expr.
: Likewise.
(strip_typedefs_expr): Replace calls to strip_typedefs_expr
with strip_typedefs throughout.  Don't dispatch to strip_typedefs
for a type 't'.
: Replace this with the better version from
strip_typedefs.
---
 gcc/cp/tree.cc | 83 +++---
 1 file changed, 24 insertions(+), 59 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 2c22fac17ee..f0fb78fe69d 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -1562,7 +1562,8 @@ apply_identity_attributes (tree result, tree attribs, bool *remove_attributes)
 
 /* Builds a qualified variant of T that is either not a typedef variant
(the default behavior) or not a typedef variant of a user-facing type
-   (if FLAGS contains STF_USER_FACING).
+   (if FLAGS contains STF_USER_FACING).  If T is not a type, then this
+   just calls strip_typedefs_expr.
 
E.g. consider the following declarations:
  typedef const int ConstInt;
@@ -1596,25 +1597,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = NULL */,
   if (!t || t == error_mark_node)
 return t;
 
-  if (TREE_CODE (t) == TREE_LIST)
-{
-  bool changed = false;
-  releasing_vec vec;
-  tree r = t;
-  for (; t; t = TREE_CHAIN (t))
-   {
- gcc_assert (!TREE_PURPOSE (t));
- tree elt = strip_typedefs (TREE_VALUE (t), remove_attributes, flags);
- if (elt != TREE_VALUE (t))
-   changed = true;
- vec_safe_push (vec, elt);
-   }
-  if (changed)
-   r = build_tree_list_vec (vec);
-  return r;
-}
-
-  gcc_assert (TYPE_P (t));
+  if (!TYPE_P (t))
+return strip_typedefs_expr (t, remove_attributes, flags);
 
   if (t == TYPE_CANONICAL (t))
 return t;
@@ -1747,12 +1731,7 @@ strip_typedefs (tree t, bool *remove_attributes /* = NULL */,
for (int i = 0; i < TREE_VEC_LENGTH (args); ++i)
  {
tree arg = TREE_VEC_ELT (args, i);
-   tree strip_arg;
-   if (TYPE_P (arg))
- strip_arg = strip_typedefs (arg, remove_attributes, flags);
-   else
- strip_arg = strip_typedefs_expr (arg, remove_attributes,
-  flags);
+   tree strip_arg = strip_typedefs (arg, remove_attributes, flags);
TREE_VEC_ELT (new_args, i) = strip_arg;
if (strip_arg != arg)
  changed = true;
@@ -1792,11 +1771,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = NULL */,
   break;
 case TRAIT_TYPE:
   {
-   tree type1 = TRAIT_TYPE_TYPE1 (t);
-   if (TYPE_P (type1))
- type1 = strip_typedefs (type1, remove_attributes, flags);
-   else
- type1 = strip_typedefs_expr (type1, remove_attributes, flags);
+   tree type1 = strip_typedefs (TRAIT_TYPE_TYPE1 (t),
+remove_attributes, flags);
tree type2 = strip_typedefs (TRAIT_TYPE_TYPE2 (t),
 remove_attributes, flags);
if (type1 == TRAIT_TYPE_TYPE1 (t) && type2 == TRAIT_TYPE_TYPE2 (t))
@@ -1883,7 +1859,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = NULL */,
   return cp_build_qualified_type (result, cp_type_quals (t));
 }
 
-/* Like strip_typedefs above, but works on expressions, so that in
+/* Like strip_typedefs above, but works on expressions (and other non-types
+   such as TREE_VEC), so that in
 
template struct A
{
@@ -1908,11 +1885,6 @@ strip_typedefs_expr (tree t, bool *remove_attributes, unsigned int flags)
   if (DECL_P (t) || CONSTANT_CLASS_P (t))
 return t;
 
-  /* Some expressions have type operands, so let's handle types here rather
- than check TYPE_P in multiple places below.  */
-  if (TYPE_P (t))
-return strip_typedefs (t, remove_attributes, flags);
-
   code = TREE_CODE (t);
   switch (code)
 {
@@ -1940,26 +1912,19 @@ strip_typedefs_expr (tree t, bool *remove_attributes, unsigned int flags)
 
 case TREE_LIST:
   {
-   releasing_vec vec;
bool changed = false;
-   tree it;
-   for (it = t; it; it = TREE_CHAIN (it))
+   releasing_vec vec;
+   r =

[PATCH 2/2] c++: use TREE_VEC for trailing args of variadic built-in traits

2023-04-20 Thread Patrick Palka via Gcc-patches

This patch makes us use a TREE_VEC instead of TREE_LIST to represent the
trailing arguments of a variadic built-in trait.  These built-ins are
typically passed a simple pack expansion as the second argument, e.g.

   __is_constructible(T, Ts...)

so the main benefit of this representation change is that expanding
such an argument list at substitution time is now basically free, since
argument packs are also TREE_VECs and tsubst_template_args makes sure
we reuse this TREE_VEC when expanding such pack expansions.  Previously,
we would perform the expansion via tsubst_tree_list which converts the
expanded pack expansion into a TREE_LIST.

Note, after this patch an empty set of trailing arguments is now
represented as an empty TREE_VEC instead of NULL_TREE, so
TRAIT_TYPE/EXPR_TYPE2 should be empty only for unary traits now.

(This patch slightly depends on "c++: make strip_typedefs generalize
strip_typedefs_expr".  Without it, strip_typedefs 
would need to conditionally dispatch to strip_typedefs_expr for
non-TYPE_P TRAIT_TYPE_TYPE2 since it could now be a TREE_VEC which
only strip_typedefs_expr handles.)

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?
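The constraint.cc hunk below converts the TREE_VEC of trailing arguments back into a TREE_LIST for the pretty printer by consing each element onto the front of a list and then reversing it once (tree_cons followed by nreverse). A standalone C sketch of that prepend-then-reverse idiom, with a hypothetical toy_list node standing in for a TREE_LIST cell and an int array standing in for the TREE_VEC:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked cell standing in for a TREE_LIST node.  */
struct toy_list
{
  int value;
  struct toy_list *next;
};

/* Analogue of tree_cons: allocate a new cell in front of LIST.  */
static struct toy_list *
toy_cons (int value, struct toy_list *list)
{
  struct toy_list *cell = malloc (sizeof *cell);
  if (cell == NULL)
    abort ();
  cell->value = value;
  cell->next = list;
  return cell;
}

/* Analogue of nreverse: reverse LIST in place and return the new head.  */
static struct toy_list *
toy_nreverse (struct toy_list *list)
{
  struct toy_list *prev = NULL;
  while (list != NULL)
    {
      struct toy_list *next = list->next;
      list->next = prev;
      prev = list;
      list = next;
    }
  return prev;
}

/* Build a list in the array's original order: prepending reverses the
   order, so a single reversal at the end restores it.  */
static struct toy_list *
toy_vec_to_list (const int *vec, size_t len)
{
  struct toy_list *list = NULL;
  for (size_t i = 0; i < len; i++)
    list = toy_cons (vec[i], list);
  return toy_nreverse (list);
}
```

Prepending keeps each step O(1), and the final reversal is O(n), so the whole conversion stays linear; an empty vector yields an empty (NULL) list.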

gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Convert a TREE_VEC
of arguments into a TREE_LIST for sake of pretty printing.
* cxx-pretty-print.cc (pp_cxx_trait): Handle TREE_VEC
instead of TREE_LIST of variadic trait arguments.
* method.cc (constructible_expr): Likewise.
(is_xible_helper): Likewise.
* parser.cc (cp_parser_trait): Represent variadic trait
arguments as a TREE_VEC instead of TREE_LIST.
* pt.cc (value_dependent_expression_p): Handle TREE_VEC
instead of TREE_LIST of variadic trait arguments.
* semantics.cc (finish_type_pack_element): Likewise.
(check_trait_type): Likewise.
---
 gcc/cp/constraint.cc   | 10 ++
 gcc/cp/cxx-pretty-print.cc |  6 +++---
 gcc/cp/method.cc   | 17 +
 gcc/cp/parser.cc   | 10 ++
 gcc/cp/pt.cc   |  9 -
 gcc/cp/semantics.cc| 15 +--
 6 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 273d15ab097..dfead28e8c7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3675,6 +3675,16 @@ diagnose_trait_expr (tree expr, tree args)
 
   tree t1 = TRAIT_EXPR_TYPE1 (expr);
   tree t2 = TRAIT_EXPR_TYPE2 (expr);
+  if (t2 && TREE_CODE (t2) == TREE_VEC)
+{
+  /* Convert the TREE_VEC of arguments into a TREE_LIST, since the
+pretty printer cannot directly print a TREE_VEC but it can a
+TREE_LIST via the E format specifier.  */
+  tree list = NULL_TREE;
+  for (tree t : tree_vec_range (t2))
+   list = tree_cons (NULL_TREE, t, list);
+  t2 = nreverse (list);
+}
   switch (TRAIT_EXPR_KIND (expr))
 {
 case CPTK_HAS_NOTHROW_ASSIGN:
diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
index c33919873f1..4cda27f2b30 100644
--- a/gcc/cp/cxx-pretty-print.cc
+++ b/gcc/cp/cxx-pretty-print.cc
@@ -2640,16 +2640,16 @@ pp_cxx_trait (cxx_pretty_printer *pp, tree t)
 }
   if (type2)
 {
-  if (TREE_CODE (type2) != TREE_LIST)
+  if (TREE_CODE (type2) != TREE_VEC)
{
  pp_cxx_separate_with (pp, ',');
  pp->type_id (type2);
}
   else
-   for (tree arg = type2; arg; arg = TREE_CHAIN (arg))
+   for (tree arg : tree_vec_range (type2))
  {
pp_cxx_separate_with (pp, ',');
-   pp->type_id (TREE_VALUE (arg));
+   pp->type_id (arg);
  }
 }
   if (kind == CPTK_TYPE_PACK_ELEMENT)
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 225ec456143..00eae56eb5b 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2075,8 +2075,9 @@ constructible_expr (tree to, tree from)
   if (!TYPE_REF_P (to))
to = cp_build_reference_type (to, /*rval*/false);
   tree ob = build_stub_object (to);
-  for (; from; from = TREE_CHAIN (from))
-   vec_safe_push (args, build_stub_object (TREE_VALUE (from)));
+  vec_alloc (args, TREE_VEC_LENGTH (from));
+  for (tree arg : tree_vec_range (from))
+   args->quick_push (build_stub_object (arg));
   expr = build_special_member_call (ob, complete_ctor_identifier, &args,
ctype, LOOKUP_NORMAL, tf_none);
   if (expr == error_mark_node)
@@ -2096,9 +2097,9 @@ constructible_expr (tree to, tree from)
 }
   else
 {
-  if (from == NULL_TREE)
+  const int len = TREE_VEC_LENGTH (from);
+  if (len == 0)
return build_value_init (strip_array_types (to), tf_none);
-  const int len = list_length (from);
   if (len > 1)
{
  if (cxx_dialect < cxx20)
@@ -2112,9 +2113,9 @@ constructible_expr (tree to, tree from)
 should be true.  */
  vec *v;

RE: Re: [PATCH v2] RISC-V: Bugfix for RVV vbool*_t vn_reference_equal.

2023-04-20 Thread Li, Pan2 via Gcc-patches

Hi Kito,

There is one patch that has already been reviewed, and I suppose it will be OK
to commit once GCC 14 opens. Could you please help double-check it?

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Wednesday, March 29, 2023 6:39 PM
To: juzhe.zh...@rivai.ai; rguenther 
Cc: gcc-patches ; Kito.cheng ; 
Wang, Yanzhang 
Subject: RE: Re: [PATCH v2] RISC-V: Bugfix for RVV vbool*_t vn_reference_equal.

Cool. Thank you all for this, have a nice day!

Pan

From: juzhe.zh...@rivai.ai 
Sent: Wednesday, March 29, 2023 5:35 PM
To: rguenther ; Li, Pan2 
Cc: gcc-patches ; Kito.cheng ; 
Wang, Yanzhang 
Subject: Re: Re: [PATCH v2] RISC-V: Bugfix for RVV vbool*_t vn_reference_equal.

Thanks Richard && Pan.

Pan has passed the bootstrap and I will merge this patch when GCC 14 is open (I 
have write access now).


juzhe.zh...@rivai.ai

From: Richard Biener
Date: 2023-03-29 17:24
To: pan2.li
CC: gcc-patches; 
juzhe.zhong; 
kito.cheng; 
yanzhang.wang
Subject: Re: [PATCH v2] RISC-V: Bugfix for RVV vbool*_t vn_reference_equal.
On Wed, 29 Mar 2023, pan2...@intel.com wrote:

> From: Pan Li <pan2...@intel.com>
>
> In most architectures the precision_size of vbool*_t types is
> calculated as a multiple of the type size.  For example:
> precision_size = type_size * 8 (i.e., 8 bits per byte).
>
> Unfortunately, some architectures like RISC-V adjust the
> precision_size of vbool*_t to match the ISA.  For example:
> type_size  = [1, 1, 1, 1,  2,  4,  8]
> precision_size = [1, 2, 4, 8, 16, 32, 64]
>
> So the precision_size of RISC-V vbool*_t is not always
> type_size * 8.  This patch handles this case when comparing
> vn_references.
>
> Given we have the below code:
> void test_vbool8_then_vbool16(int8_t * restrict in, int8_t * restrict out) {
> vbool8_t v1 = *(vbool8_t*)in;
> vbool16_t v2 = *(vbool16_t*)in;
>
> *(vbool8_t*)(out + 100) = v1;
> *(vbool16_t*)(out + 200) = v2;
> }
>
> Before this PATCH:
> csrr    t0,vlenb
> slli    t1,t0,1
> csrr    a3,vlenb
> sub     sp,sp,t1
> slli    a4,a3,1
> add     a4,a4,sp
> addi    a2,a1,100
> vsetvli a5,zero,e8,m1,ta,ma
> sub     a3,a4,a3
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a2)
> vsm.v   v24,0(a3)
> addi    a1,a1,200
> csrr    t0,vlenb
> vsetvli a4,zero,e8,mf2,ta,ma
> slli    t1,t0,1
> vlm.v   v24,0(a3)
> vsm.v   v24,0(a1)
> add     sp,sp,t1
> jr      ra
>
> After this PATCH:
> addi    a3,a1,100
> vsetvli a4,zero,e8,m1,ta,ma
> addi    a1,a1,200
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a3)
> vsetvli a5,zero,e8,mf2,ta,ma
> vlm.v   v24,0(a0)
> vsm.v   v24,0(a1)
> ret

OK if this passes bootstrap / regtest.

Thanks,
Richard.
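To make the quoted table concrete: its first four entries share type_size 1 but have different precisions, so size alone cannot tell two such references apart. The sketch below encodes the table and a hypothetical equality rule that also compares precision. It illustrates the idea only; it is not the tree-ssa-sccvn.cc change, whose body is not shown in this excerpt, and toy_refs_equal is an invented name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Type sizes (bytes) and precisions (bits) as listed in the quoted mail.  */
static const unsigned vbool_type_size[] = { 1, 1, 1, 1, 2, 4, 8 };
static const unsigned vbool_precision[] = { 1, 2, 4, 8, 16, 32, 64 };

/* True when the precision is what the usual type_size * 8 rule predicts.  */
static bool
precision_matches_size (unsigned size_bytes, unsigned precision_bits)
{
  return precision_bits == size_bytes * 8;
}

/* Count the table entries that follow the usual rule.  */
static size_t
count_size_derived_precisions (void)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof vbool_precision / sizeof vbool_precision[0]; i++)
    if (precision_matches_size (vbool_type_size[i], vbool_precision[i]))
      n++;
  return n;
}

/* Hypothetical equality rule: two mask references are treated as equal
   only if both the size and the precision agree.  */
static bool
toy_refs_equal (unsigned size_a, unsigned prec_a,
                unsigned size_b, unsigned prec_b)
{
  return size_a == size_b && prec_a == prec_b;
}
```

Only the last four entries satisfy the size * 8 rule; the first four all have size 1, which is why a comparison that looks at size alone would conflate them.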

> PR 109272
>
> gcc/ChangeLog:
>
> * tree-ssa-sccvn.cc (vn_reference_eq):
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/pr108185-4.c:
> * gcc.target/riscv/rvv/base/pr108185-5.c:
> * gcc.target/riscv/rvv/base/pr108185-6.c:
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  .../gcc.target/riscv/rvv/base/pr108185-4.c|  2 +-
>  .../gcc.target/riscv/rvv/base/pr108185-5.c|  2 +-
>  .../gcc.target/riscv/rvv/base/pr108185-6.c|  2 +-
>  gcc/tree-ssa-sccvn.cc | 20 +++
>  4 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c
> index ea3c360d756..e70284fada8 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-4.c
> @@ -65,4 +65,4 @@ test_vbool8_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>  /* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>  /* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf8,\s*ta,\s*ma} 1 } } */
>  /* { dg-final { scan-assembler-times {vlm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> -/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 15 } } */
> +/* { dg-final { scan-assembler-times {vsm\.v\s+v[0-9]+,\s*0\([a-x][0-9]+\)} 12 } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c
> index 9fc659d2402..575a7842cdf 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr108185-5.c
> @@ -65,4 +65,4 @@ test_vbool16_then_vbool64(int8_t * restrict in, int8_t * restrict out) {
>  /* { dg-final { scan-assembler-times {vsetvli\s+[a-x][0-9]+,\s*zero,\s*e8,\s*mf4,\s*ta,\s*ma} 1 } } */
>  /* { dg-final { scan-assembler-times 
> {vsetvli\s+[a-x][0-9]+,

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 09:17:10AM -0400, Siddhesh Poyarekar wrote:
> On 2023-04-20 08:59, Jakub Jelinek via Gcc-patches wrote:
> > > +r.set (type, dconstm1, dconst1);
> > 
> > See above, are we sure we can use [-1., 1.] range safely, or should that be
> > [-1.-Nulps, 1.+Nulps] for some kind of expected worse error margin of the
> > implementation?  And ditto for -frounding-math, shall we increase that
> > interval in that case, or is [-1., 1.] going to be ok?
> 
> Do any math implementations generate results outside of [-1., 1.]?  If yes,

Clearly they do.

> then it's a bug in those implementations IMO, not in the range assumption.
> It feels wrong to cater for what ought to be trivially fixable in libraries
> if they ever happen to generate such results.

So, I wrote the following test.

On x86_64-linux with glibc 2.35, I see
for i in FLOAT DOUBLE LDOUBLE FLOAT128; do for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do gcc -D$i -DROUND=FE_$j -g -O1 -o /tmp/sincos{,.c} -lm; /tmp/sincos || echo $i $j; done; done
Aborted (core dumped)
FLOAT UPWARD
Aborted (core dumped)
FLOAT DOWNWARD
On sparc-sun-solaris2.11 I see
for i in FLOAT DOUBLE LDOUBLE; do for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do gcc -D$i -DROUND=FE_$j -g -O1 -o sincos{,.c} -lm; ./sincos || echo $i $j; done; done
Abort (core dumped)
DOUBLE UPWARD
Abort (core dumped)
DOUBLE DOWNWARD
Haven't tried anything else.  So that shows (but doesn't prove) that
maybe the [-1., 1.] interval is fine for -fno-rounding-math on those, but not
for -frounding-math.

Jakub
#define _GNU_SOURCE
#include <math.h>
#include <fenv.h>

#ifdef FLOAT
#define TYPE float
#define SIN sinf
#define COS cosf
#ifdef M_PI_2f
#define PI2 M_PI_2f
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define NEXTAFTER nextafterf
#elif defined DOUBLE
#define TYPE double
#define SIN sin
#define COS cos
#ifdef M_PI_2
#define PI2 M_PI_2
#else
#define PI2 1.570796326794896619231321691639751442
#endif
#define NEXTAFTER nextafter
#elif defined LDOUBLE
#define TYPE long double
#define SIN sinl
#define COS cosl
#ifdef M_PI_2l
#define PI2 M_PI_2l
#else
#define PI2 1.570796326794896619231321691639751442L
#endif
#define NEXTAFTER nextafterl
#elif defined FLOAT128
#define TYPE _Float128
#define SIN sinf128
#define COS cosf128
#ifdef M_PI_2f128
#define PI2 M_PI_2f128
#else
#define PI2 1.570796326794896619231321691639751442f128
#endif
#define NEXTAFTER nextafterf128
#endif

int
main ()
{
#ifdef ROUND
  fesetround (ROUND);
#endif
  for (int i = -1024; i <= 1024; i++)
for (int j = -1; j <= 1; j += 2)
  {
TYPE val = ((TYPE) i) * PI2;
TYPE inf = j * __builtin_inf ();
for (int k = 0; k < 1000; k++)
  {
TYPE res = SIN (val);
if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
  __builtin_abort ();
res = COS (val);
if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
  __builtin_abort ();
val = NEXTAFTER (val, inf);
  }
  }
}
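The open question in the thread, whether the range should be [-1.-Nulps, 1.+Nulps] rather than [-1., 1.], can be made concrete for double. The sketch below uses hypothetical helpers, not GCC's range-op code; it relies only on the fact that doubles with magnitude in [1, 2) are spaced DBL_EPSILON apart, so widening by N ulps on each side is plain arithmetic. The value of N is left open, as it is in the thread.

```c
#include <float.h>
#include <stdbool.h>

/* Upper end of [-1, 1] widened by N_ULPS units in the last place.
   Doubles in [1, 2) are DBL_EPSILON apart, so for small N_ULPS
   (below 2^52) the sum below is exact.  */
static double
widened_unit_bound (unsigned n_ulps)
{
  return 1.0 + (double) n_ulps * DBL_EPSILON;
}

/* Accept RES if it lies in [-bound, bound]; N_ULPS == 0 gives the plain
   [-1, 1] range that the quoted r.set (type, dconstm1, dconst1) uses.  */
static bool
in_widened_range (double res, unsigned n_ulps)
{
  double bound = widened_unit_bound (n_ulps);
  return res >= -bound && res <= bound;
}
```

A checker like the test above could call in_widened_range (res, N) in place of the strict comparison to measure how many ulps a given libm and rounding mode actually need.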

Ping * 3: Fwd: [V6][PATCH 0/2] Handle component_ref to a structure/union field including FAM for builtin_object_size

2023-04-20 Thread Qing Zhao via Gcc-patches

This is the 3rd ping for the 6th version of the patches.

Now, GCC14 is open. Is it ready to commit these patches to GCC14?

Kees has tested this version of the patch with the Linux kernel; everything
works, and it resolved many false positives for bounds checking.

Notes on the review history of these patches (2 patches):
1.  The patch 1/2: Handle component_ref to a structure/union field including
flexible array member [PR101832]

The C front-end part has been approved by Joseph.
For the middle-end, most of the change has been reviewed by Richard (and
modified based on his comments and suggestions), except the change in
tree-object-size.cc, which needs Jakub’s review and approval.

Jakub, could you review the middle-end part of the change to see whether it’s
ready for trunk?

2. The patch 2/2: Update documentation to clarify a GCC extension

This is basically a C FE and documentation change, I have updated it based 
on previous comments and suggestions.
Joseph, could you review it to see whether this version is ready to go?

Thanks a lot.

Qing

Begin forwarded message:

From: Qing Zhao <qing.z...@oracle.com>
Subject: [V6][PATCH 0/2] Handle component_ref to a structure/union field 
including FAM for builtin_object_size
Date: March 28, 2023 at 11:49:42 AM EDT
To: ja...@redhat.com, jos...@codesourcery.com
Cc: richard.guent...@gmail.com, keesc...@chromium.org, siddh...@gotplt.org,
gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>

Hi, Joseph and Jakub,

This is the 6th version of the patch.
Compared to the 5th version, the major changes are:

1. Update the documentation per Joseph's comments;
2. Change the name of the new warning option per Jakub's suggestions;
3. Update the test cases per the above change.

These changes are all in the 2nd patch (2/2 Update documentation to
clarify a GCC extension).

The first patch (1/2 Handle component_ref to a structure/union field
including flexible array member [PR101832]) is unchanged.

For the record, Joseph has approved the C front-end change in the first
patch; I only need a review from Jakub for the middle-end part.

Bootstrapped and regression tested on aarch64 and x86.

Okay for commit?

thanks.

Qing

=

Qing Zhao (2):
 Handle component_ref to a structure/union field including flexible
   array member [PR101832]
 Update documentation to clarify a GCC extension

gcc/c-family/c.opt|   5 +
gcc/c/c-decl.cc   |  20 +++
gcc/doc/extend.texi   |  45 +-
gcc/lto/lto-common.cc |   5 +-
gcc/print-tree.cc |   5 +
.../gcc.dg/builtin-object-size-pr101832.c | 134 ++
.../gcc.dg/variable-sized-type-flex-array.c   |  31 
gcc/tree-core.h   |   2 +
gcc/tree-object-size.cc   |  23 ++-
gcc/tree-streamer-in.cc   |   5 +-
gcc/tree-streamer-out.cc  |   5 +-
gcc/tree.h|   7 +-
12 files changed, 281 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/builtin-object-size-pr101832.c
create mode 100644 gcc/testsuite/gcc.dg/variable-sized-type-flex-array.c

--
2.31.1

Ping * 3: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]

2023-04-20 Thread Qing Zhao via Gcc-patches

Hi,

Is this patch ready for GCC14?

Thanks.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Fwd: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]
Date: April 11, 2023 at 9:37:18 AM EDT
To: Jakub Jelinek <ja...@redhat.com>
Cc: Joseph Myers <jos...@codesourcery.com>, Richard Biener <richard.guent...@gmail.com>, Kees Cook <keesc...@chromium.org>, Siddhesh Poyarekar <siddh...@gotplt.org>, gcc Patches <gcc-patches@gcc.gnu.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Hi, Jakub,

This is the 2nd ping to the 6th version of the patches -:)

Please let me know if you have any further comments on this patch, and whether
it’s okay to commit it to trunk.

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Fwd: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]
Date: April 4, 2023 at 9:06:37 AM EDT
To: Jakub Jelinek <ja...@redhat.com>
Cc: Joseph Myers <jos...@codesourcery.com>, Richard Biener <richard.guent...@gmail.com>, Kees Cook <keesc...@chromium.org>, Siddhesh Poyarekar <siddh...@gotplt.org>, gcc Patches <gcc-patches@gcc.gnu.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Ping…

Qing

Begin forwarded message:

From: Qing Zhao <qing.z...@oracle.com>
Subject: [V6][PATCH 1/2] Handle component_ref to a structre/union field including flexible array member [PR101832]
Date: March 28, 2023 at 11:49:43 AM EDT
To: ja...@redhat.com, jos...@codesourcery.com
Cc: richard.guent...@gmail.com, keesc...@chromium.org, siddh...@gotplt.org, gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>

The C front-end part has been approved by Joseph.

Jakub, could you please review the middle-end part of this patch?

The major change is in tree-object-size.cc (addr_object_size), which now
uses the new TYPE_INCLUDE_FLEXARRAY information.

This patch fixes PR101832 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832)
and is needed for Linux kernel security.  It would be best to put it into GCC 13.

Thanks a lot!

Qing

==

A GCC extension accepts the case where a struct with a flexible array member
is embedded into another struct or union (possibly recursively).
__builtin_object_size should treat such a struct as having flexible size,
according to -fstrict-flex-arrays.
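To make the described behaviour concrete, here is a minimal sketch, assuming a GCC that accepts the nested flexible-array extension. The helper names (alloc_out_flex, nested_struct_size, nested_data_size) are hypothetical; the types mirror the example in the documentation patch. The exact values __builtin_object_size returns depend on the GCC version, the -fstrict-flex-arrays level and the optimization level, so the comments describe the intent of the patch rather than guaranteed output.

```c
#include <stddef.h>
#include <stdlib.h>

/* Types from the documentation example: struct flex ends in a flexible
   array member and is itself the last member of out_flex_struct
   (accepted as a GCC extension).  */
struct flex { int length; char data[]; };
struct out_flex_struct { int m; struct flex flex_data; };

/* Hypothetical helper: allocate room for EXTRA bytes of flex_data.data[].  */
struct out_flex_struct *alloc_out_flex (size_t extra)
{
  return malloc (sizeof (struct out_flex_struct) + extra);
}

/* Mode 1 asks for the closest surrounding subobject.  This is the
   PR101832 situation: with the patch, GCC should no longer cap the
   answer at sizeof (struct flex); (size_t) -1 means "unknown".  */
size_t nested_struct_size (struct out_flex_struct *p)
{
  return __builtin_object_size (&p->flex_data, 1);
}

/* Same query for the nested flexible array itself.  */
size_t nested_data_size (struct out_flex_struct *p)
{
  return __builtin_object_size (p->flex_data.data, 1);
}
```

The assertions one can safely make are weak on purpose: either the size is unknown, or it is at least the declared size of the embedded struct.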

gcc/c/ChangeLog:

PR tree-optimization/101832
* c-decl.cc (finish_struct): Set TYPE_INCLUDE_FLEXARRAY for
struct/union type.

gcc/lto/ChangeLog:

PR tree-optimization/101832
* lto-common.cc (compare_tree_sccs_1): Compare bit
TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDE_FLEXARRAY properly
for its corresponding type.

gcc/ChangeLog:

PR tree-optimization/101832
* print-tree.cc (print_node): Print new bit type_include_flexarray.
* tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p
as type_include_flexarray for RECORD_TYPE or UNION_TYPE.
* tree-object-size.cc (addr_object_size): Handle structure/union type
when it has flexible size.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream
in bit no_named_args_stdarg_p properly for its corresponding type.
* tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream
out bit no_named_args_stdarg_p properly for its corresponding type.
* tree.h (TYPE_INCLUDE_FLEXARRAY): New macro TYPE_INCLUDE_FLEXARRAY.
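A toy model of how a TYPE_INCLUDE_FLEXARRAY-like flag can be propagated as each struct or union is finished. This is not GCC's tree representation: the type descriptor, finish_aggregate and the union rule (any member counts, following the documentation's "a union containing such a structure") are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified type model (not GCC's tree nodes).  */
enum kind { K_SCALAR, K_ARRAY, K_RECORD, K_UNION };

struct type
{
  enum kind kind;
  bool flexible;              /* K_ARRAY: declared as T[] (flexible).  */
  const struct type **fields; /* K_RECORD/K_UNION: member types in order.  */
  size_t nfields;
  bool include_flexarray;     /* Stand-in for TYPE_INCLUDE_FLEXARRAY.  */
};

/* Does a member of type T make its enclosing aggregate flexible?  */
static bool member_is_flexible (const struct type *t)
{
  if (t->kind == K_ARRAY)
    return t->flexible;
  if (t->kind == K_RECORD || t->kind == K_UNION)
    return t->include_flexarray;  /* Already computed for inner types.  */
  return false;
}

/* Assumed rule: a struct is flexible if its last member is; a union is
   flexible if any member is.  Inner types must be finished first, which
   is naturally the case for nested declarations.  */
void finish_aggregate (struct type *t)
{
  t->include_flexarray = false;
  if (t->kind == K_RECORD && t->nfields > 0)
    t->include_flexarray = member_is_flexible (t->fields[t->nfields - 1]);
  else if (t->kind == K_UNION)
    for (size_t i = 0; i < t->nfields; i++)
      if (member_is_flexible (t->fields[i]))
        t->include_flexarray = true;
}
```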

Ping * 3: [V6][PATCH 2/2] Update documentation to clarify a GCC extension

2023-04-20 Thread Qing Zhao via Gcc-patches

Hi,

Is this patch ready for GCC14?

Thanks.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Fwd: [V6][PATCH 2/2] Update documentation to clarify a GCC extension
Date: April 11, 2023 at 9:38:29 AM EDT
To: Joseph Myers <jos...@codesourcery.com>
Cc: Jakub Jelinek <ja...@redhat.com>, Richard Biener <richard.guent...@gmail.com>, Kees Cook <keesc...@chromium.org>, Siddhesh Poyarekar <siddh...@gotplt.org>, gcc Patches <gcc-patches@gcc.gnu.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Hi, Joseph,

This is the 2nd ping to the 6th version of the patch -:)

Please let me know if you have any further comments on the patch, and whether
it’s okay to commit it to trunk.

Thanks a lot for the help.

Qing

Begin forwarded message:

From: Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Fwd: [V6][PATCH 2/2] Update documentation to clarify a GCC extension
Date: April 4, 2023 at 9:07:55 AM EDT
To: Joseph Myers <jos...@codesourcery.com>
Cc: Jakub Jelinek <ja...@redhat.com>, Richard Biener <richard.guent...@gmail.com>, Kees Cook <keesc...@chromium.org>, Siddhesh Poyarekar <siddh...@gotplt.org>, gcc Patches <gcc-patches@gcc.gnu.org>
Reply-To: Qing Zhao <qing.z...@oracle.com>

Ping….

Qing

Begin forwarded message:

From: Qing Zhao <qing.z...@oracle.com>
Subject: [PATCH 2/2] Update documentation to clarify a GCC extension
Date: March 28, 2023 at 11:49:44 AM EDT
To: ja...@redhat.com, jos...@codesourcery.com
Cc: richard.guent...@gmail.com, keesc...@chromium.org, siddh...@gotplt.org, gcc-patches@gcc.gnu.org, Qing Zhao <qing.z...@oracle.com>

This patch clarifies the GCC extension on a structure with a C99 flexible
array member being nested in another structure (PR77650):

"A GCC extension accepts a structure containing an ISO C99 "flexible array
member", or a union containing such a structure (possibly recursively),
as a member of a structure.

There are two situations:

* A structure or a union with a C99 flexible array member is the last
  field of another structure, for example:

   struct flex  { int length; char data[]; };
   union union_flex { int others; struct flex f; };

   struct out_flex_struct { int m; struct flex flex_data; };
   struct out_flex_union { int n; union union_flex flex_data; };

  In the above, both 'out_flex_struct.flex_data.data[]' and
  'out_flex_union.flex_data.f.data[]' are also considered flexible
  arrays.

* A structure or a union with a C99 flexible array member is the
  middle field of another structure, for example:

   struct flex  { int length; char data[]; };

   struct mid_flex { int m; struct flex flex_data; int n; };

  In the above, accessing 'mid_flex.flex_data.data[]' has undefined
  behavior.  Compilers do not handle such a case consistently; any code
  relying on it should be modified to ensure that flexible array
  members only end up at the ends of structures.

  Please use the warning option '-Wflex-array-member-not-at-end' to
  identify all such cases in the source code and modify them.  This
  warning will be on by default starting from GCC 14.
"

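A short sketch of both situations from the documentation text, assuming GCC's extension is accepted (GCC diagnoses it under -pedantic). make_out_flex_union and struct mid_flex_fixed are hypothetical names: the first uses trailing storage through the nested union, the second shows the kind of rewrite that -Wflex-array-member-not-at-end is meant to prompt for struct mid_flex.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct flex  { int length; char data[]; };
union union_flex { int others; struct flex f; };
struct out_flex_union { int n; union union_flex flex_data; };

/* First situation: the union holding struct flex is the last member, so
   bytes allocated past sizeof (struct out_flex_union) are reached through
   flex_data.f.data[].  Copies LEN bytes of SRC there.  */
struct out_flex_union *make_out_flex_union (const char *src, int len)
{
  struct out_flex_union *p = malloc (sizeof *p + (size_t) len);
  if (p == NULL)
    return NULL;
  p->n = 0;
  p->flex_data.f.length = len;
  memcpy (p->flex_data.f.data, src, (size_t) len);
  return p;
}

/* Second situation, rewritten: instead of
     struct mid_flex { int m; struct flex flex_data; int n; };
   move the structure with the flexible array member to the end.  */
struct mid_flex_fixed { int m; int n; struct flex flex_data; };
```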
gcc/c-family/ChangeLog:

* c.opt: New option -Wflex-array-member-not-at-end.

gcc/c/ChangeLog:

* c-decl.cc (finish_struct): Issue warnings for new option.

gcc/ChangeLog:

* doc/extend.texi: Document GCC extension on a structure containing
a flexible array member to be a member of another structure.

gcc/testsuite/ChangeLog:

* gcc.dg/variable-sized-type-flex-array.c: New test.
---
gcc/c-family/c.opt                            |  5 +++
gcc/c/c-decl.cc                               |  9
gcc/doc/extend.texi                           | 45 ++-
.../gcc.dg/variable-sized-type-flex-array.c   | 31 +
4 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuit

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 04:02:02PM +0200, Jakub Jelinek via Gcc-patches wrote:
> So, I wrote following test.

Slightly adjusted to see more info:

x86_64-linux glibc 2.35:
for i in FLOAT DOUBLE LDOUBLE FLOAT128; do
  for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do
    gcc -D$i -DROUND=FE_$j -g -O1 -o /tmp/sincos{,.c} -lm
    /tmp/sincos || echo $i $j
  done
done
sin -0x1.2d97c800p+2 0x1.0200p+0
sin -0x1.2d97c800p+2 0x1.0200p+0
sin 0x1.f9cbe200p+7 0x1.0200p+0
FLOAT UPWARD
cos -0x1.f9cbe200p+8 -0x1.0200p+0
sin -0x1.f9cbe200p+7 -0x1.0200p+0
sin 0x1.2d97c800p+2 -0x1.0200p+0
sin 0x1.2d97c800p+2 -0x1.0200p+0
cos 0x1.f9cbe200p+8 -0x1.0200p+0
cos 0x1.f9cbe200p+8 -0x1.0200p+0
FLOAT DOWNWARD
sparc-sun-solaris2.11 results are too large to post in detail, but are
sin -0x1.2d97c7f3321d2000p+2 0x1.1000p+0
...
sin 0x1.f6a7a29553c45000p+2 0x1.1000p+0
DOUBLE UPWARD
sin -0x1.f6a7a2955385e000p+2 -0x1.1000p+0
...
sin 0x1.2d97c7f3325b9000p+2 -0x1.1000p+0
DOUBLE DOWNWARD
where all the DOUBLE UPWARD values have 0x1.1000p+0
results and all DOUBLE DOWNWARD values -0x1.1000p+0.
So, I think that is 1ulp in all cases in both directions for
-frounding-math.

Jakub
#define _GNU_SOURCE
#include <fenv.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef FLOAT
#define TYPE float
#define SIN sinf
#define COS cosf
#ifdef M_PI_2f
#define PI2 M_PI_2f
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define PRINT(str) printf ("%s %.20a %.20a\n", str, val, res)
#define NEXTAFTER nextafterf
#elif defined DOUBLE
#define TYPE double
#define SIN sin
#define COS cos
#ifdef M_PI_2
#define PI2 M_PI_2
#else
#define PI2 1.570796326794896619231321691639751442
#endif
#define NEXTAFTER nextafter
#define PRINT(str) printf ("%s %.20a %.20a\n", str, val, res)
#elif defined LDOUBLE
#define TYPE long double
#define SIN sinl
#define COS cosl
#ifdef M_PI_2l
#define PI2 M_PI_2l
#else
#define PI2 1.570796326794896619231321691639751442L
#endif
#define NEXTAFTER nextafterl
#define PRINT(str) printf ("%s %.20La %.20La\n", str, val, res)
#elif defined FLOAT128
#define TYPE _Float128
#define SIN sinf128
#define COS cosf128
#ifdef M_PI_2f128
#define PI2 M_PI_2f128
#else
#define PI2 1.570796326794896619231321691639751442f128
#endif
#define NEXTAFTER nextafterf128
/* printf has no conversion for _Float128, so just fail loudly.  */
#define PRINT(str) __builtin_abort ()
#endif

int
main ()
{
  int ret = 0;
#ifdef ROUND
  fesetround (ROUND);
#endif
  /* Start at each multiple of pi/2 and walk 1000 representable values
     towards -Inf (j == -1) and towards +Inf (j == 1), checking that
     sin and cos stay within [-1, 1].  */
  for (int i = -1024; i <= 1024; i++)
    for (int j = -1; j <= 1; j += 2)
      {
        TYPE val = ((TYPE) i) * PI2;
        TYPE inf = j * __builtin_inf ();
        for (int k = 0; k < 1000; k++)
          {
            TYPE res = SIN (val);
            if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
              { PRINT ("sin"); ret = 1; }
            res = COS (val);
            if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
              { PRINT ("cos"); ret = 1; }
            val = NEXTAFTER (val, inf);
          }
      }
  return ret;
}
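For reference, the widened interval [-1 - 1ulp, 1 + 1ulp] that these results point to can be computed with nextafter. This is a sketch assuming IEEE binary64 doubles; the function names are hypothetical and it is not code from the range-op patch.

```c
#include <math.h>
#include <stdbool.h>

/* Largest accepted value: the next double above 1.0 (1 + 2^-52).  */
double sincos_upper_bound (void)
{
  return nextafter (1.0, INFINITY);
}

/* Smallest accepted value: the next double below -1.0.  */
double sincos_lower_bound (void)
{
  return nextafter (-1.0, -INFINITY);
}

/* Does R lie within [-1 - 1ulp, 1 + 1ulp]?  */
bool sincos_result_ok (double r)
{
  return r >= sincos_lower_bound () && r <= sincos_upper_bound ();
}
```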

Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Richard Sandiford via Gcc-patches

"Andre Vieira (lists)"  writes:
> Hi all,
>
> This is a series of patches/RFCs to implement support in GCC to be able 
> to target AArch64's libmvec functions that will be/are being added to glibc.
> We have chosen to use the omp pragma '#pragma omp declare variant ...' 
> with a simd construct as the way for glibc to inform GCC what functions 
> are available.
>
> For example, if we would like to supply a vector version of the scalar 
> 'cosf' we would have an include file with something like:
> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
> typedef __SVFloat32_t __sv_f32_t;
> typedef __SVBool_t __sv_bool_t;
> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
> #pragma omp declare variant(_ZGVnN4v_cosf) \
>  match(construct = {simd(notinbranch, simdlen(4))}, device = {isa("simd")})
> #pragma omp declare variant(_ZGVnN2v_cosf) \
>  match(construct = {simd(notinbranch, simdlen(2))}, device = {isa("simd")})
> #pragma omp declare variant(_ZGVsMxv_cosf) \
>  match(construct = {simd(inbranch)}, device = {isa("sve")})
> extern float cosf (float);
>
> The BETA ABI can be found in the vfabia64 subdir of 
> https://github.com/ARM-software/abi-aa/
> This currently disagrees with how this patch series implements 'omp 
> declare simd' for SVE and I also do not see a need for the 'omp declare 
> variant' scalable extension constructs. I will make changes to the ABI 
> once we've finalized the co-design of the ABI and this implementation.
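For readers decoding the variant names in the header example: they follow the AArch64 vector function ABI mangling _ZGV<isa><mask><vlen><parameters>_<name>. The sketch below handles only the letters that appear above ('n' Advanced SIMD, 's' SVE, 'N'/'M' unmasked/masked, a decimal length or 'x' for scalable, 'v' for a vector parameter); vfabi_parse and struct vfabi_name are hypothetical, and the ABI itself defines many more parameter kinds.

```c
#include <stdbool.h>
#include <string.h>

struct vfabi_name
{
  char isa;           /* 'n' = Advanced SIMD, 's' = SVE.  */
  bool masked;        /* 'M' = masked, 'N' = unmasked.  */
  unsigned vlen;      /* Number of lanes; 0 means scalable ('x').  */
  unsigned nparams;   /* Count of 'v' (vector) parameters.  */
  const char *scalar; /* Scalar function name, points into the input.  */
};

bool vfabi_parse (const char *s, struct vfabi_name *out)
{
  if (strncmp (s, "_ZGV", 4) != 0)
    return false;
  s += 4;
  if (*s != 'n' && *s != 's')
    return false;
  out->isa = *s++;
  if (*s != 'N' && *s != 'M')
    return false;
  out->masked = (*s++ == 'M');
  out->vlen = 0;
  if (*s == 'x')
    s++;
  else
    {
      if (*s < '0' || *s > '9')
        return false;
      while (*s >= '0' && *s <= '9')
        out->vlen = out->vlen * 10 + (unsigned) (*s++ - '0');
    }
  out->nparams = 0;
  while (*s == 'v')
    {
      out->nparams++;
      s++;
    }
  if (*s != '_' || s[1] == '\0')
    return false;
  out->scalar = s + 1;
  return true;
}
```

Note that the mask of _ZGVsMxv_cosf is not a 'v' parameter in the name; it is implied by 'M' and shows up as the extra __sv_bool_t argument in the prototype.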

I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.
Relaxing that for an explicit extension seems better than doing it
only based on the ISA (which should in general be a free-form string).
Having "scalable" in the definition also helps to make the intent clearer.

Any change to the declare simd behaviour should probably be agreed
with the LLVM folks first.  Like you say, we already know that GCC
can do your version, since it already does the equivalent thing for x86.

I'm not sure, but I'm guessing the declare simd VFABI was written
that way because, at the time (several years ago), there were
concerns about switching SVE on and off on a function-by-function
basis in LLVM.

But I'm not sure it makes sense to ignore -msve-vector-bits= when
compiling the SVE version (which is what patch 4 seems to do).
If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
features in the Advanced SIMD routines.  Why should we ignore
SVE-related target information for the SVE routines?

Of course, the fact that we take command-line options into account
means that omp simd/variant clauses on linkonce/comdat group functions
are an ODR violation waiting to happen.  But the same is true for the
original scalar functions that the clauses are attached to.

Thanks,
Richard

> The patch series has three main steps:
> 1) Add SVE support for 'omp declare simd', see PR 96342
> 2) Enable GCC to use omp declare variants with simd constructs as simd 
> clones during auto-vectorization.
> 3) Add SLP support for vectorizable_simd_clone_call (This sounded like a 
> nice thing to add as we want to move away from non-slp vectorization).
>
> Below you can see the list of current Patches/RFCs, the difference being 
> on how confident I am of the proposed changes. For the RFC I am hoping 
> to get early comments on the approach, rather than more in-depth
> code-reviews.
>
> I appreciate we are still in Stage 4, so I can completely understand if 
> you don't have time to review this now, but I thought it can't hurt to 
> post these early.
>
> Andre Vieira:
> [PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
> [PATCH] parloops: Copy target and optimizations when creating a function 
> clone
> [PATCH] parloops: Allow poly nit and bound
> [RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
> [RFC] omp: Create simd clones from 'omp declare variant's
> [RFC] omp: Allow creation of simd clones from omp declare variant with 
> -fopenmp-simd flag
>
> Work in progress:
> [RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call

i386: Handle sign-extract for QImode operations with high registers [PR78952]

2023-04-20 Thread Uros Bizjak via Gcc-patches

Introduce the extract_operator predicate to handle both zero-extract and
sign-extract operations in expressions like:

(subreg:QI
  (zero_extract:SWI248
    (match_operand 1 "int248_register_operand" "0")
    (const_int 8)
    (const_int 8)) 0)

As shown in the testcase, this will enable generation of QImode
instructions with high registers when signed arguments are used.
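In C terms, the two RTL forms read bits 8 to 15 of a register (the byte that the %h operand modifier names, e.g. %ah), either zero- or sign-extended. A minimal sketch of the semantics, assuming 32-bit values and GCC's modulo behaviour for conversions to int8_t; the function names are hypothetical and this is not the testcase from the patch.

```c
#include <stdint.h>

/* (zero_extract x (const_int 8) (const_int 8)): 8 bits starting at bit 8.  */
uint32_t high_byte_zext (uint32_t x)
{
  return (x >> 8) & 0xffu;
}

/* (sign_extract x (const_int 8) (const_int 8)): same bits, sign-extended.
   Converting an out-of-range value to int8_t is implementation-defined;
   GCC reduces it modulo 2^8, which yields the sign extension.  */
int32_t high_byte_sext (uint32_t x)
{
  return (int8_t) (x >> 8);
}
```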

gcc/ChangeLog:

PR target/78952
* config/i386/predicates.md (extract_operator): New predicate.
* config/i386/i386.md (any_extract): Remove code iterator.
(*cmpqi_ext_1_mem_rex64): Use extract_operator predicate.
(*cmpqi_ext_1): Ditto.
(*cmpqi_ext_2): Ditto.
(*cmpqi_ext_3_mem_rex64): Ditto.
(*cmpqi_ext_3): Ditto.
(*cmpqi_ext_4): Ditto.
(*extzvqi_mem_rex64): Ditto.
(*extzvqi): Ditto.
(*insvqi_2): Ditto.
(*extendqi_ext_1): Ditto.
(*addqi_ext_0): Ditto.
(*addqi_ext_1): Ditto.
(*addqi_ext_2): Ditto.
(*subqi_ext_0): Ditto.
(*subqi_ext_2): Ditto.
(*testqi_ext_1): Ditto.
(*testqi_ext_2): Ditto.
(*andqi_ext_0): Ditto.
(*andqi_ext_1): Ditto.
(*andqi_ext_1_cc): Ditto.
(*andqi_ext_2): Ditto.
(*qi_ext_0): Ditto.
(*qi_ext_1): Ditto.
(*qi_ext_2): Ditto.
(*xorqi_ext_1_cc): Ditto.
(*negqi_ext_2): Ditto.
(*ashlqi_ext_2): Ditto.
(*qi_ext_2): Ditto.

gcc/testsuite/ChangeLog:

PR target/78952
* gcc.target/i386/pr78952-4.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 0f95d8e8918..337702f5a9b 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1005,9 +1005,6 @@ (define_code_attr absneg_mnemonic [(abs "fabs") (neg 
"fchs")])
 ;; Mapping of extend operators
 (define_code_iterator any_extend [sign_extend zero_extend])
 
-;; Mapping of extract operators
-(define_code_iterator any_extract [sign_extract zero_extract])
-
 ;; Mapping of highpart multiply operators
 (define_code_iterator any_mul_highpart [smul_highpart umul_highpart])
 
@@ -1462,10 +1459,10 @@ (define_insn "*cmpqi_ext_1_mem_rex64"
(compare
  (match_operand:QI 0 "norex_memory_operand" "Bn")
  (subreg:QI
-   (any_extract:SWI248
- (match_operand 1 "int248_register_operand" "Q")
- (const_int 8)
- (const_int 8)) 0)))]
+   (match_operator:SWI248 2 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)))]
   "TARGET_64BIT && reload_completed
&& ix86_match_ccmode (insn, CCmode)"
   "cmp{b}\t{%h1, %0|%0, %h1}"
@@ -1477,10 +1474,10 @@ (define_insn "*cmpqi_ext_1"
(compare
  (match_operand:QI 0 "nonimmediate_operand" "QBc,m")
  (subreg:QI
-   (any_extract:SWI248
- (match_operand 1 "int248_register_operand" "Q,Q")
- (const_int 8)
- (const_int 8)) 0)))]
+   (match_operator:SWI248 2 "extract_operator"
+ [(match_operand 1 "int248_register_operand" "Q,Q")
+  (const_int 8)
+  (const_int 8)]) 0)))]
   "ix86_match_ccmode (insn, CCmode)"
   "cmp{b}\t{%h1, %0|%0, %h1}"
   [(set_attr "isa" "*,nox64")
@@ -1494,29 +1491,29 @@ (define_peephole2
(match_operator 4 "compare_operator"
  [(match_dup 0)
   (subreg:QI
-(any_extract:SWI248
-  (match_operand 2 "int248_register_operand")
-  (const_int 8)
-  (const_int 8)) 0)]))]
+(match_operator:SWI248 5 "extract_operator"
+  [(match_operand 2 "int248_register_operand")
+   (const_int 8)
+   (const_int 8)]) 0)]))]
   "TARGET_64BIT
&& peep2_reg_dead_p (2, operands[0])"
   [(set (match_dup 3)
(match_op_dup 4
  [(match_dup 1)
   (subreg:QI
-(any_extract:SWI248
-  (match_dup 2)
-  (const_int 8)
-  (const_int 8)) 0)]))])
+(match_op_dup 5
+  [(match_dup 2)
+   (const_int 8)
+   (const_int 8)]) 0)]))])
 
 (define_insn "*cmpqi_ext_2"
   [(set (reg FLAGS_REG)
(compare
  (subreg:QI
-   (any_extract:SWI248
- (match_operand 0 "int248_register_operand" "Q")
- (const_int 8)
- (const_int 8)) 0)
+   (match_operator:SWI248 2 "extract_operator"
+ [(match_operand 0 "int248_register_operand" "Q")
+  (const_int 8)
+  (const_int 8)]) 0)
  (match_operand:QI 1 "const0_operand")))]
   "ix86_match_ccmode (insn, CCNOmode)"
   "test{b}\t%h0, %h0"
@@ -1538,10 +1535,10 @@ (define_insn "*cmpqi_ext_3_mem_rex64"
   [(set (reg FLAGS_REG)
(compare
  (subreg:QI
-   (any_extract:SWI248
- (match_operand 0 "int248_register_operand" "Q")
- (const_int 8)
-

[PATCH] arch: Use VIRTUAL_REGISTER_P predicate.

2023-04-20 Thread Uros Bizjak via Gcc-patches

gcc/ChangeLog:

* config/arm/arm.cc (thumb1_legitimate_address_p):
Use VIRTUAL_REGISTER_P predicate.
(arm_eliminable_register): Ditto.
* config/avr/avr.md (push_1): Ditto.
* config/bfin/predicates.md (register_no_elim_operand): Ditto.
* config/h8300/predicates.md (register_no_sp_elim_operand): Ditto.
* config/i386/predicates.md (register_no_elim_operand): Ditto.
* config/iq2000/predicates.md (call_insn_operand): Ditto.
* config/microblaze/microblaze.h (CALL_INSN_OP): Ditto.

Tested by building cc1 and compiling a hello-world.c application for
all affected arches.
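The open-coded range checks being replaced and a single IN_RANGE-style comparison are equivalent because GCC defines FIRST_VIRTUAL_REGISTER as FIRST_PSEUDO_REGISTER. A self-contained model with made-up numbering (FIRST_PSEUDO_REGISTER = 76 and six virtual registers are assumptions for illustration) shows the equivalence:

```c
#include <stdbool.h>

/* Made-up register numbering, for illustration only.  */
#define FIRST_PSEUDO_REGISTER 76
#define FIRST_VIRTUAL_REGISTER (FIRST_PSEUDO_REGISTER)
#define LAST_VIRTUAL_REGISTER (FIRST_VIRTUAL_REGISTER + 5)

/* One unsigned comparison, in the style of GCC's IN_RANGE: values below
   LOWER wrap around to huge numbers and therefore fail the test.  */
#define IN_RANGE(VALUE, LOWER, UPPER) \
  ((unsigned long) (VALUE) - (unsigned long) (LOWER) \
   <= (unsigned long) (UPPER) - (unsigned long) (LOWER))

/* Models the register-number part of VIRTUAL_REGISTER_P.  */
bool virtual_regno_p (unsigned int regno)
{
  return IN_RANGE (regno, FIRST_VIRTUAL_REGISTER, LAST_VIRTUAL_REGISTER);
}

/* The open-coded form removed from the bfin, h8300 and i386 predicates.  */
bool virtual_regno_p_open_coded (unsigned int regno)
{
  return regno >= FIRST_PSEUDO_REGISTER && regno <= LAST_VIRTUAL_REGISTER;
}
```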

Pushed to master as an obvious patch.

Uros.
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index bf7ff9a9704..1164119a300 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -9105,9 +9105,7 @@ thumb1_legitimate_address_p (machine_mode mode, rtx x, 
int strict_p)
   else if (REG_P (XEXP (x, 0))
   && (REGNO (XEXP (x, 0)) == FRAME_POINTER_REGNUM
   || REGNO (XEXP (x, 0)) == ARG_POINTER_REGNUM
-  || (REGNO (XEXP (x, 0)) >= FIRST_VIRTUAL_REGISTER
-  && REGNO (XEXP (x, 0))
- <= LAST_VIRTUAL_POINTER_REGISTER))
+  || VIRTUAL_REGISTER_P (XEXP (x, 0)))
   && GET_MODE_SIZE (mode) >= 4
   && CONST_INT_P (XEXP (x, 1))
   && (INTVAL (XEXP (x, 1)) & 3) == 0)
@@ -13905,8 +13903,7 @@ arm_eliminable_register (rtx x)
 {
   return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM
   || REGNO (x) == ARG_POINTER_REGNUM
-  || (REGNO (x) >= FIRST_VIRTUAL_REGISTER
-  && REGNO (x) <= LAST_VIRTUAL_REGISTER));
+  || VIRTUAL_REGISTER_P (x));
 }
 
 /* Return GENERAL_REGS if a scratch register required to reload x to/from
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index e581e959e57..43b75046384 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -417,8 +417,7 @@ (define_expand "push1"
 operands[0] = copy_to_mode_reg (mode, operands[0]);
   }
 else if (REG_P (operands[0])
- && IN_RANGE (REGNO (operands[0]), FIRST_VIRTUAL_REGISTER,
-  LAST_VIRTUAL_REGISTER))
+ && VIRTUAL_REGISTER_P (operands[0]))
   {
 // Byte-wise pushing of virtual regs might result in something like
 //
diff --git a/gcc/config/bfin/predicates.md b/gcc/config/bfin/predicates.md
index 09ec5a4bd86..632634eb0f7 100644
--- a/gcc/config/bfin/predicates.md
+++ b/gcc/config/bfin/predicates.md
@@ -175,7 +175,7 @@ (define_predicate "symbolic_or_const_operand"
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
-;; True for any non-virtual or eliminable register.  Used in places where
+;; True for any non-virtual and non-eliminable register.  Used in places where
 ;; instantiation of such a register may cause the pattern to not be recognized.
 (define_predicate "register_no_elim_operand"
   (match_operand 0 "register_operand")
@@ -184,8 +184,7 @@ (define_predicate "register_no_elim_operand"
 op = SUBREG_REG (op);
   return !(op == arg_pointer_rtx
   || op == frame_pointer_rtx
-  || (REGNO (op) >= FIRST_PSEUDO_REGISTER
-  && REGNO (op) <= LAST_VIRTUAL_REGISTER));
+  || VIRTUAL_REGISTER_P (op));
 })
 
 ;; Test for an operator valid in a BImode conditional branch
diff --git a/gcc/config/h8300/predicates.md b/gcc/config/h8300/predicates.md
index 02da8aa413a..486c4d7ce66 100644
--- a/gcc/config/h8300/predicates.md
+++ b/gcc/config/h8300/predicates.md
@@ -378,8 +378,7 @@ (define_predicate "register_no_sp_elim_operand"
   return !(op == stack_pointer_rtx
   || op == arg_pointer_rtx
   || op == frame_pointer_rtx
-  || IN_RANGE (REGNO (op),
-   FIRST_PSEUDO_REGISTER, LAST_VIRTUAL_REGISTER));
+  || VIRTUAL_REGISTER_P (op));
 })
 
 ;; Return nonzero if X is a constant whose absolute value is greater
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index b4d9ab40ab9..746101a1755 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -702,7 +702,7 @@ (define_predicate "call_register_no_elim_operand"
   return register_no_elim_operand (op, mode);
 })
 
-;; True for any non-virtual or eliminable register.  Used in places where
+;; True for any non-virtual and non-eliminable register.  Used in places where
 ;; instantiation of such a register may cause the pattern to not be recognized.
 (define_predicate "register_no_elim_operand"
   (match_operand 0 "register_operand")
@@ -717,8 +717,7 @@ (define_predicate "register_no_elim_operand"
 
   return !(op == arg_pointer_rtx
   || op == frame_pointer_rtx
-  || IN_RANGE (REGNO (op),
-   FIRST_PSEUDO_REGISTER, LAST_VIRTUAL_REGISTER));
+  || VIRTUAL_REGISTER_P (op));
 })
 
 ;; Similarly, but i

[pushed] [LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in asm in different mode

2023-04-20 Thread Vladimir Makarov via Gcc-patches

The following patch fixes the failure of test 20030222-1.c on the moxie port.
The problem can occur on other targets too.  The patch actually
implements the old reload approach for this test case.


The patch was successfully tested and bootstrapped on x86-64, aarch64, 
and ppc64le.


commit 51703ac3c722cd94011ab5b499921f6c9fe9fab5
Author: Vladimir N. Makarov 
Date:   Thu Apr 20 10:02:13 2023 -0400

[LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in 
asm in different mode

See gcc.c-torture/execute/20030222-1.c.  Consider the code for a 32-bit
(e.g. BE) target:
  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
We generate the following RTL with reload insns:
  1. subreg:si(x:di, 0) = 0;
  2. subreg:si(x:di, 4) = v:si;
  3. t:di = x:di, dead x;
  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
  5. i:si = subreg:si(t:di,4);
If we assign the hard reg of x to t, dead code elimination will remove insn #2
and we will use an uninitialized hard reg.  So exclude the hard reg of x for t.
We could ignore this problem for a non-empty asm using the whole x value, but
it is hard to check that the asm is expanded into insns really using x and
setting r.  The old reload pass used the same approach.
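The situation can be reproduced with a small program in the spirit of that testcase: an int output tied with a "0" matching constraint to a wider integer input, where the int is expected to receive the low part. This is a sketch, not the actual contents of 20030222-1.c; ll_to_int_value is a hypothetical helper.

```c
/* The asm body is empty, so i simply takes what the shared register
   holds.  Storing through a volatile int forces a real truncation to
   int, even when one register is wide enough for a long long.  */
void ll_to_int (long long x, volatile int *p)
{
  int i;
  __asm__ ("" : "=r" (i) : "0" (x));
  *p = i;
}

/* Hypothetical helper: round-trip V through ll_to_int.  */
int ll_to_int_value (int v)
{
  volatile int i;
  ll_to_int ((long long) v, &i);
  return i;
}
```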

gcc/ChangeLog

* lra-constraints.cc (match_reload): Exclude some hard regs for
multi-reg inout reload pseudos used in asm in different mode.

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index b231cb60529..4dc2d70c402 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -1022,6 +1022,34 @@ match_reload (signed char out, signed char *ins, signed 
char *outs,
 are ordered.  */
   if (partial_subreg_p (outmode, inmode))
{
+ bool asm_p = asm_noperands (PATTERN (curr_insn)) >= 0;
+ int hr;
+ HARD_REG_SET temp_hard_reg_set;
+ 
+ if (asm_p && (hr = get_hard_regno (out_rtx)) >= 0
+ && hard_regno_nregs (hr, inmode) > 1)
+   {
+ /* See gcc.c-torture/execute/20030222-1.c.
+Consider the code for 32-bit (e.g. BE) target:
+  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
+We generate the following RTL with reload insns:
+  1. subreg:si(x:di, 0) = 0;
+  2. subreg:si(x:di, 4) = v:si;
+  3. t:di = x:di, dead x;
+  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
+  5. i:si = subreg:si(t:di,4);
+If we assign hard reg of x to t, dead code elimination
+will remove insn #2 and we will use unitialized hard reg.
+So exclude the hard reg of x for t.  We could ignore this
+problem for non-empty asm using all x value but it is hard to
+check that the asm are expanded into insn realy using x
+and setting r.  */
+ CLEAR_HARD_REG_SET (temp_hard_reg_set);
+ if (exclude_start_hard_regs != NULL)
+   temp_hard_reg_set = *exclude_start_hard_regs;
+ SET_HARD_REG_BIT (temp_hard_reg_set, hr);
+ exclude_start_hard_regs = &temp_hard_reg_set;
+   }
  reg = new_in_reg
= lra_create_new_reg_with_unique_value (inmode, in_rtx, goal_class,
exclude_start_hard_regs,

Re: [PATCH 1/X] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

2023-04-20 Thread Richard Sandiford via Gcc-patches

"Andre Vieira (lists)"  writes:
> Hi,
>
> This patch replaces the uses of simd_clone_subparts with
> TYPE_VECTOR_SUBPARTS and removes the definition of the former.
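Why the replacement matters: the removed helper returned TYPE_VECTOR_SUBPARTS (...).to_constant (), which is only valid for fixed-length vectors, whereas SVE vectors have a runtime-dependent element count, so comparisons have to go through known_eq, known_lt, multiple_p and friends. A toy two-coefficient model (value = c0 + c1 * N with N >= 0 known only at run time; names hypothetical, and far simpler than GCC's poly_int) illustrates those semantics:

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy poly value c0 + c1 * N, N >= 0.  Fixed-length vectors have c1 == 0.  */
typedef struct { uint64_t c0, c1; } poly2;

/* True if a == b for every N.  */
bool p_known_eq (poly2 a, poly2 b)
{
  return a.c0 == b.c0 && a.c1 == b.c1;
}

/* True if a < b for every N >= 0: it must hold at N == 0 (c0) and must
   not be overtaken as N grows (c1).  */
bool p_known_lt (poly2 a, poly2 b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

/* The precondition of to_constant (), i.e. of the removed helper.  */
bool p_is_constant (poly2 a)
{
  return a.c1 == 0;
}
```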
>
> gcc/ChangeLog:
>
>  * omp-simd-clone.cc (simd_clone_subparts): Remove.
>  (simd_clone_init_simd_arrays): Replace simd_clone_subparts with 
> TYPE_VECTOR_SUBPARTS.
>  (ipa_simd_modify_function_body): Likewise.
>  * tree-vect-stmts.cc (simd_clone_subparts): Remove.
>  (vectorizable_simd_clone_call): Replace simd_clone_subparts 
> with TYPE_VECTOR_SUBPARTS.
>
> diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
> index 
> 0949b8ba288dfc7e7692403bfc600983faddf5dd..48b480e7556d9ad8e5502e10e513ec36b17b9cbb
>  100644
> --- a/gcc/omp-simd-clone.cc
> +++ b/gcc/omp-simd-clone.cc
> @@ -255,16 +255,6 @@ ok_for_auto_simd_clone (struct cgraph_node *node)
>return true;
>  }
>  
> -
> -/* Return the number of elements in vector type VECTYPE, which is associated
> -   with a SIMD clone.  At present these always have a constant length.  */
> -
> -static unsigned HOST_WIDE_INT
> -simd_clone_subparts (tree vectype)
> -{
> -  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> -}
> -
>  /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
> of arguments to reserve space for.  */
>  
> @@ -1027,7 +1017,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
>   }
> continue;
>   }
> -  if (known_eq (simd_clone_subparts (TREE_TYPE (arg)),
> +  if (known_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)),
>   node->simdclone->simdlen))
>   {
> tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> @@ -1039,7 +1029,7 @@ simd_clone_init_simd_arrays (struct cgraph_node *node,
>   }
>else
>   {
> -   unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
> +   poly_uint64 simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
> unsigned int times = vector_unroll_factor (node->simdclone->simdlen,
>simdlen);
> tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
> @@ -1225,9 +1215,9 @@ ipa_simd_modify_function_body (struct cgraph_node *node,
> iter, NULL_TREE, NULL_TREE);
>adjustments->register_replacement (&(*adjustments->m_adj_params)[j], 
> r);
>  
> -  if (multiple_p (node->simdclone->simdlen, simd_clone_subparts 
> (vectype)))
> +  if (multiple_p (node->simdclone->simdlen, TYPE_VECTOR_SUBPARTS 
> (vectype)))
>   j += vector_unroll_factor (node->simdclone->simdlen,
> -simd_clone_subparts (vectype)) - 1;
> +TYPE_VECTOR_SUBPARTS (vectype)) - 1;
>  }
>adjustments->sort_replacements ();
>  
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 
> df6239a1c61c7213ad3c1468723bc1adf70bc02c..c85b6babc4bc5bc3111ef326dcc8f32bb25333f6
>  100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -3964,16 +3964,6 @@ vect_simd_lane_linear (tree op, class loop *loop,
>  }
>  }
>  
> -/* Return the number of elements in vector type VECTYPE, which is associated
> -   with a SIMD clone.  At present these vectors always have a constant
> -   length.  */
> -
> -static unsigned HOST_WIDE_INT
> -simd_clone_subparts (tree vectype)
> -{
> -  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> -}
> -
>  /* Function vectorizable_simd_clone_call.
>  
> Check if STMT_INFO performs a function call that can be vectorized
> @@ -4251,7 +4241,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
> slp_node);
>   if (arginfo[i].vectype == NULL
>   || !constant_multiple_p (bestn->simdclone->simdlen,
> -  simd_clone_subparts (arginfo[i].vectype)))
> +  TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)))
> return false;
>}
>  
> @@ -4349,15 +4339,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, 
> stmt_vec_info stmt_info,
>   case SIMD_CLONE_ARG_TYPE_VECTOR:
> atype = bestn->simdclone->args[i].vector_type;
> o = vector_unroll_factor (nunits,
> - simd_clone_subparts (atype));
> + TYPE_VECTOR_SUBPARTS (atype));
> for (m = j * o; m < (j + 1) * o; m++)
>   {
> -   if (simd_clone_subparts (atype)
> -   < simd_clone_subparts (arginfo[i].vectype))
> +   poly_uint64 atype_subparts = TYPE_VECTOR_SUBPARTS (atype);
> +   poly_uint64 arginfo_subparts
> + = TYPE_VECTOR_SUBPARTS (arginfo[i].vectype);
> +   if (known_lt (atype_subparts, arginfo_subparts))
>   {
> poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
>

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Siddhesh Poyarekar


On 2023-04-20 10:02, Jakub Jelinek wrote:

On x86_64-linux with glibc 2.35, I see
for i in FLOAT DOUBLE LDOUBLE FLOAT128; do
  for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do
    gcc -D$i -DROUND=FE_$j -g -O1 -o /tmp/sincos{,.c} -lm
    /tmp/sincos || echo $i $j
  done
done
Aborted (core dumped)
FLOAT UPWARD
Aborted (core dumped)
FLOAT DOWNWARD
On sparc-sun-solaris2.11 I see
for i in FLOAT DOUBLE LDOUBLE; do
  for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do
    gcc -D$i -DROUND=FE_$j -g -O1 -o sincos{,.c} -lm
    ./sincos || echo $i $j
  done
done
Abort (core dumped)
DOUBLE UPWARD
Abort (core dumped)
DOUBLE DOWNWARD
Haven't tried anything else.  So that shows (but doesn't prove) that
maybe [-1., 1.] interval is fine for -fno-rounding-math on those, but not
for -frounding-math.


Would there be a reason to not consider these as bugs?  I feel like 
these should be fixed in glibc, or any math implementation that ends up 
doing this.


I suppose one reason could be the overhead of an additional branch to 
check for result bounds, but is that serious enough to allow this 
imprecision?  The alternative of output range being defined as 
[-1.0-ulp, 1.0+ulp] avoids that conversation I guess.
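The alternative output range [-1.0-ulp, 1.0+ulp] can be made concrete. Below is a minimal sketch that assumes IEEE-754 binary64 `double`. The helper names are hypothetical and are not part of GCC's range-op code. Each bound is widened by one step away from zero by bumping its bit pattern, which avoids a libm dependency.

```cpp
#include <cassert>
#include <cfloat>   // DBL_EPSILON, the spacing of doubles just above 1.0
#include <cmath>
#include <cstdint>
#include <cstring>

// Move a finite, nonzero double one ulp further from zero.  Incrementing
// the IEEE-754 bit pattern increases the magnitude and keeps the sign.
static double step_away_from_zero (double x)
{
  assert (std::isfinite (x) && x != 0.0);
  uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  ++bits;
  std::memcpy (&x, &bits, sizeof bits);
  return x;
}

// The range currently claimed for sin/cos: [-1, 1].
static bool in_unit_interval (double r)
{
  return r >= -1.0 && r <= 1.0;
}

// The widened range [-1 - ulp, 1 + ulp], tolerating results rounded one
// step past the exact bound (e.g. under directed rounding modes).
static bool in_widened_unit_interval (double r)
{
  static const double hi = step_away_from_zero (1.0);
  static const double lo = step_away_from_zero (-1.0);
  return r >= lo && r <= hi;
}
```

For binary64 the step just above 1.0 is DBL_EPSILON (2^-52), so the widened interval is [-1 - DBL_EPSILON, 1 + DBL_EPSILON]. A float version would use FLT_EPSILON instead.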


Thanks,
Sid

Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Andre Vieira (lists) via Gcc-patches





On 20/04/2023 15:51, Richard Sandiford wrote:
> "Andre Vieira (lists)"  writes:
>> Hi all,
>>
>> This is a series of patches/RFCs to implement support in GCC to be able
>> to target AArch64's libmvec functions that will be/are being added to glibc.
>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>> with a simd construct as the way for glibc to inform GCC what functions
>> are available.
>>
>> For example, if we would like to supply a vector version of the scalar
>> 'cosf' we would have an include file with something like:
>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>> typedef __SVFloat32_t __sv_f32_t;
>> typedef __SVBool_t __sv_bool_t;
>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>   match(construct = {simd(notinbranch, simdlen(4))}, device = {isa("simd")})
>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>   match(construct = {simd(notinbranch, simdlen(2))}, device = {isa("simd")})
>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>   match(construct = {simd(inbranch)}, device = {isa("sve")})
>> extern float cosf (float);
>>
>> The BETA ABI can be found in the vfabia64 subdir of
>> https://github.com/ARM-software/abi-aa/
>> This currently disagrees with how this patch series implements 'omp
>> declare simd' for SVE and I also do not see a need for the 'omp declare
>> variant' scalable extension constructs. I will make changes to the ABI
>> once we've finalized the co-design of the ABI and this implementation.
>
> I don't see a good reason for dropping the extension("scalable").
> The problem is that since the base spec requires a simdlen clause,
> GCC should in general raise an error if simdlen is omitted.

Where can you find this in the specs? I tried to find it but couldn't.

Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector 
ABI defines behaviour for this. But I couldn't find what it meant for a 
omp declare variant, obviously can't be the same as for declare simd, as 
that is defined to mean 'define a set of clones' and only one clone can 
be associated to a declare variant.


> But I'm not sure it makes sense to ignore -msve-vector-bits= when
> compiling the SVE version (which is what patch 4 seems to do).
> If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
> features in the Advanced SIMD routines.  Why should we ignore
> SVE-related target information for the SVE routines?

Not sure I understand what you mean.  The vector ABI defines that if a 
simdlen is omitted that (other than the NEON clones) a SVE VLA clone is 
available. So how would I take -msve-vector-bits into consideration? Do 
you mean I ought to add them as options to pass to the function so that 
it gets used when doing the codegen for the clone (if a function body is 
available)?


This is where things get a bit iffy for me though... We purposefully 
generate a SVE simdclone regardless of command-line options, just like 
x86 does, so why would these options affect simd clone generation but 
not the actual availability of SVE? Just seems a bit odd...


A viable alternative would be to rely on declare variant for such 
behaviour, where we could use function attributes to pass specific 
target options to the variant's prototype to be able to add more 
specific tuning options per variant.  Not sure it will work but I can 
try it with my rebased patches at some point. I have to admit though, it 
is not a feature we are looking to use, so not sure it's worth the 
effort. The SVE simdclone codegen (with function bodies) is already 
pretty bad, so if we do believe there is a use case for these, that might
be something we should focus on before this sort of more specific tuning.


> Of course, the fact that we take command-line options into account
> means that omp simd/variant clauses on linkonce/comdat group functions
> are an ODR violation waiting to happen.  But the same is true for the
> original scalar functions that the clauses are attached to.

Can't find proper definitions of linkonce/comdat group functions so 
can't comment.




> Thanks,
> Richard
>
>> The patch series has three main steps:
>> 1) Add SVE support for 'omp declare simd', see PR 96342
>> 2) Enable GCC to use omp declare variants with simd constructs as simd
>> clones during auto-vectorization.
>> 3) Add SLP support for vectorizable_simd_clone_call (This sounded like a
>> nice thing to add as we want to move away from non-slp vectorization).
>>
>> Below you can see the list of current Patches/RFCs, the difference being
>> on how confident I am of the proposed changes. For the RFC I am hoping
>> to get early comments on the approach, rather than more in-depth
>> code-reviews.
>>
>> I appreciate we are still in Stage 4, so I can completely understand if
>> you don't have time to review this now, but I thought it can't hurt to
>> post these early.
>>
>> Andre Vieira:
>> [PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

[PATCH] c++: improve template parameter level lowering

2023-04-20 Thread Patrick Palka via Gcc-patches

1. Now that we no longer substitute the constraints of an auto, we can
   get rid of the infinite recursion loop breaker during level lowering
   of a constrained auto and we can also use the TEMPLATE_PARM_DESCENDANTS
   cache in this case.
2. Don't bother recursing when level lowering a cv-qualified type template
   parameter.
3. Use TEMPLATE_PARM_DESCENDANTS when level lowering a non-type template
   parameter too.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (tsubst) : Remove infinite
recursion loop breaker in the level lowering case for
constrained autos.  Use the TEMPLATE_PARM_DESCENDANTS cache in
this case as well.
: Use the TEMPLATE_PARM_INDEX cache
when level lowering a non-type template parameter.
---
 gcc/cp/pt.cc | 42 --
 1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f65f2d58b28..07e9736cdce 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -16228,33 +16228,23 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
/* If we get here, we must have been looking at a parm for a
   more deeply nested template.  Make a new version of this
   template parameter, but with a lower level.  */
+   int quals;
switch (code)
  {
  case TEMPLATE_TYPE_PARM:
  case TEMPLATE_TEMPLATE_PARM:
-   if (cp_type_quals (t))
+   quals = cp_type_quals (t);
+   if (quals)
  {
-   r = tsubst (TYPE_MAIN_VARIANT (t), args, complain, in_decl);
-   r = cp_build_qualified_type
- (r, cp_type_quals (t),
-  complain | (code == TEMPLATE_TYPE_PARM
-  ? tf_ignore_bad_quals : 0));
+   gcc_checking_assert (code == TEMPLATE_TYPE_PARM);
+   t = TYPE_MAIN_VARIANT (t);
  }
-   else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
-&& PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
-&& (r = (TEMPLATE_PARM_DESCENDANTS
- (TEMPLATE_TYPE_PARM_INDEX (t
-&& (r = TREE_TYPE (r))
-&& !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (r))
- /* Break infinite recursion when substituting the constraints
-of a constrained placeholder.  */;
-   else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
-&& !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
-&& (arg = TEMPLATE_TYPE_PARM_INDEX (t),
-r = TEMPLATE_PARM_DESCENDANTS (arg))
-&& (TEMPLATE_PARM_LEVEL (r)
-== TEMPLATE_PARM_LEVEL (arg) - levels))
-   /* Cache the simple case of lowering a type parameter.  */
+   if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
+   && (arg = TEMPLATE_TYPE_PARM_INDEX (t),
+   r = TEMPLATE_PARM_DESCENDANTS (arg))
+   && (TEMPLATE_PARM_LEVEL (r)
+   == TEMPLATE_PARM_LEVEL (arg) - levels))
+ /* Cache the simple case of lowering a type parameter.  */
  r = TREE_TYPE (r);
else
  {
@@ -16278,6 +16268,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
else
  TYPE_CANONICAL (r) = canonical_type_parameter (r);
  }
+   if (quals)
+ r = cp_build_qualified_type (r, quals,
+  complain | tf_ignore_bad_quals);
break;
 
  case BOUND_TEMPLATE_TEMPLATE_PARM:
@@ -16307,7 +16300,12 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
type = tsubst (type, args, complain, in_decl);
if (type == error_mark_node)
  return error_mark_node;
-   r = reduce_template_parm_level (t, type, levels, args, complain);
+   if ((r = TEMPLATE_PARM_DESCENDANTS (t))
+   && (TEMPLATE_PARM_LEVEL (r) == TEMPLATE_PARM_LEVEL (t) - levels)
+   && TREE_TYPE (r) == type)
+ /* Cache the simple case of lowering a non-type parameter.  */;
+   else
+ r = reduce_template_parm_level (t, type, levels, args, complain);
break;
 
  default:
-- 
2.40.0.352.g667fcf4e15

Re: [PATCH] c++: improve template parameter level lowering

2023-04-20 Thread Patrick Palka via Gcc-patches

On Thu, 20 Apr 2023, Patrick Palka wrote:

> 1. Now that we no longer substitute the constraints of an auto, we can
>get rid of the infinite recursion loop breaker during level lowering
>of a constrained auto and we can also use the TEMPLATE_PARM_DESCENDANTS
>cache in this case.
> 2. Don't bother recursing when level lowering a cv-qualified type template
>parameter.
> 3. Use TEMPLATE_PARM_DESCENDANTS when level lowering a non-type template
>parameter too.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk?
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst) : Remove infinite
>   recursion loop breaker in the level lowering case for
>   constrained autos.  Use the TEMPLATE_PARM_DESCENDANTS cache in
>   this case as well.
>   : Use the TEMPLATE_PARM_INDEX cache
>   when level lowering a non-type template parameter.
> ---
>  gcc/cp/pt.cc | 42 --
>  1 file changed, 20 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index f65f2d58b28..07e9736cdce 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -16228,33 +16228,23 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
> tree in_decl)
>   /* If we get here, we must have been looking at a parm for a
>  more deeply nested template.  Make a new version of this
>  template parameter, but with a lower level.  */
> + int quals;
>   switch (code)
> {
> case TEMPLATE_TYPE_PARM:
> case TEMPLATE_TEMPLATE_PARM:
> - if (cp_type_quals (t))
> + quals = cp_type_quals (t);
> + if (quals)
> {
> - r = tsubst (TYPE_MAIN_VARIANT (t), args, complain, in_decl);
> - r = cp_build_qualified_type
> -   (r, cp_type_quals (t),
> -complain | (code == TEMPLATE_TYPE_PARM
> -? tf_ignore_bad_quals : 0));
> + gcc_checking_assert (code == TEMPLATE_TYPE_PARM);
> + t = TYPE_MAIN_VARIANT (t);
> }
> - else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
> -  && PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
> -  && (r = (TEMPLATE_PARM_DESCENDANTS
> -   (TEMPLATE_TYPE_PARM_INDEX (t
> -  && (r = TREE_TYPE (r))
> -  && !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (r))
> -   /* Break infinite recursion when substituting the constraints
> -  of a constrained placeholder.  */;
> - else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
> -  && !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
> -  && (arg = TEMPLATE_TYPE_PARM_INDEX (t),
> -  r = TEMPLATE_PARM_DESCENDANTS (arg))
> -  && (TEMPLATE_PARM_LEVEL (r)
> -  == TEMPLATE_PARM_LEVEL (arg) - levels))
> - /* Cache the simple case of lowering a type parameter.  */
> + if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
> + && (arg = TEMPLATE_TYPE_PARM_INDEX (t),
> + r = TEMPLATE_PARM_DESCENDANTS (arg))
> + && (TEMPLATE_PARM_LEVEL (r)
> + == TEMPLATE_PARM_LEVEL (arg) - levels))
> +   /* Cache the simple case of lowering a type parameter.  */
> r = TREE_TYPE (r);
>   else
> {
> @@ -16278,6 +16268,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
> tree in_decl)
>   else
> TYPE_CANONICAL (r) = canonical_type_parameter (r);
> }
> + if (quals)
> +   r = cp_build_qualified_type (r, quals,
> +complain | tf_ignore_bad_quals);
>   break;
>  
> case BOUND_TEMPLATE_TEMPLATE_PARM:
> @@ -16307,7 +16300,12 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
> tree in_decl)
>   type = tsubst (type, args, complain, in_decl);
>   if (type == error_mark_node)
> return error_mark_node;
> - r = reduce_template_parm_level (t, type, levels, args, complain);
> + if ((r = TEMPLATE_PARM_DESCENDANTS (t))
> + && (TEMPLATE_PARM_LEVEL (r) == TEMPLATE_PARM_LEVEL (t) - levels)
> + && TREE_TYPE (r) == type)
> +   /* Cache the simple case of lowering a non-type parameter.  */;
> + else
> +   r = reduce_template_parm_level (t, type, levels, args, complain);

D'oh, this hunk is totally redundant since reduce_template_parm_level
already checks TEMPLATE_PARM_DESCENDANTS, and so we've been caching
level-lowering of non-type template parameters this whole time.

Please consider this patch instead, which removes this hunk and
therefore only changes TEMPLATE_TYPE_PARM level lowering:

-- >8 --

Subject: [PATCH] c++: improve TEMPLATE_TYPE_PARM level lowering

1. Don't bother recursing when level lowering a cv-qualified type template
   parameter.
2. Get rid of the infin

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 11:22:24AM -0400, Siddhesh Poyarekar wrote:
> On 2023-04-20 10:02, Jakub Jelinek wrote:
> > On x86_64-linux with glibc 2.35, I see
> > for i in FLOAT DOUBLE LDOUBLE FLOAT128; do for j in TONEAREST UPWARD 
> > DOWNWARD TOWARDZERO; do gcc -D$i -DROUND=FE_$j -g -O1 -o /tmp/sincos{,.c} 
> > -lm; /tmp/sincos || echo $i $j; done; done
> > Aborted (core dumped)
> > FLOAT UPWARD
> > Aborted (core dumped)
> > FLOAT DOWNWARD
> > On sparc-sun-solaris2.11 I see
> > for i in FLOAT DOUBLE LDOUBLE; do for j in TONEAREST UPWARD DOWNWARD 
> > TOWARDZERO; do gcc -D$i -DROUND=FE_$j -g -O1 -o sincos{,.c} -lm; ./sincos 
> > || echo $i $j; done; done
> > Abort (core dumped)
> > DOUBLE UPWARD
> > Abort (core dumped)
> > DOUBLE DOWNWARD
> > Haven't tried anything else.  So that shows (but doesn't prove) that
> > maybe [-1., 1.] interval is fine for -fno-rounding-math on those, but not
> > for -frounding-math.
> 
> Would there be a reason to not consider these as bugs?  I feel like these
> should be fixed in glibc, or any math implementation that ends up doing
> this.

Why?  Unless an implementation guarantees <= 0.5ulps errors, it can be one
or more ulps off, so why is an error at or near 1.0 or -1.0 any worse
than similar errors for other values?
Similarly for other functions which have other ranges, perhaps not with such
nice round numbers.  Say asin has a [-pi/2, pi/2] range; those numbers aren't
exactly representable, but is it any worse to round those values to -inf or
+inf, or worse, give something 1-5 ulps further from that interval, compared
to other 1-5ulps errors?
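Measuring these errors in a common unit makes the comparison in the question above concrete. Below is a minimal sketch of an ulp-distance measure that assumes IEEE-754 binary64 and finite arguments. The function names are illustrative only.

```cpp
#include <cassert>
#include <cfloat>   // DBL_EPSILON
#include <cstdint>
#include <cstring>

// Map a double to an integer key that is monotonic in the value, so that
// adjacent representable doubles differ by exactly 1.  Both -0.0 and +0.0
// map to 0.
static int64_t ordered_key (double x)
{
  uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  const uint64_t magnitude = bits & 0x7fffffffffffffffULL;
  return (bits >> 63) ? -(int64_t) magnitude : (int64_t) magnitude;
}

// Number of representable doubles between a and b (finite inputs assumed).
static uint64_t ulp_distance (double a, double b)
{
  const int64_t ka = ordered_key (a), kb = ordered_key (b);
  // Subtract in unsigned arithmetic so that opposite-sign inputs far
  // apart cannot overflow a signed difference.
  return ka > kb ? (uint64_t) ka - (uint64_t) kb
                 : (uint64_t) kb - (uint64_t) ka;
}
```

By this measure, a cos result of 1.0 + DBL_EPSILON lies exactly 1 ulp from the bound 1.0. That is the same size of error as a 1-ulp miss anywhere else in the range.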

Jakub

Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 04:22:50PM +0100, Andre Vieira (lists) wrote:
> > I don't see a good reason for dropping the extension("scalable").
> > The problem is that since the base spec requires a simdlen clause,
> > GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
> 
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector ABI
> defines behaviour for this. But I couldn't find what it meant for a omp
> declare variant, obviously can't be the same as for declare simd, as that is
> defined to mean 'define a set of clones' and only one clone can be
> associated to a declare variant.

For missing simdlen on omp declare simd, OpenMP 5.2 says [202:14-15]:
"If a SIMD version is created and the simdlen clause is not specified, the 
number of concurrent
arguments for the function is implementation defined."
Nobody says it must be a constant when not specified, when specified it has
to be a constant.
declare variant is function call specialization based on lots of different
aspects.  If you specify simd among construct selectors, then the
implementation is allowed (and kind of expected but not currently implemented in
GCC) to change the calling convention based on the declare simd ABIs, but
again, simdlen might be specified (then it has to have constant number in
it) or not, then I bet it is supposed to be derived from the actual
differences in the calling convention to which match it is.
But as I said, this part isn't implemented yet even on other targets.
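To make the distinction concrete, here is a sketch of the two forms of `declare simd` discussed above: one without simdlen and one with an explicit constant simdlen. The function names are hypothetical. When the code is compiled without `-fopenmp` or `-fopenmp-simd`, the pragmas are ignored, so the computed results are the same either way.

```cpp
#include <cassert>

// No simdlen: how many lanes each SIMD clone processes is implementation
// defined, and need not be a compile-time constant.
#pragma omp declare simd uniform(a) notinbranch
static float axpy_any (float a, float x, float y)
{
  return a * x + y;
}

// Explicit simdlen: the argument must be a constant, here 4 lanes.
#pragma omp declare simd uniform(a) notinbranch simdlen(4)
static float axpy_4 (float a, float x, float y)
{
  return a * x + y;
}

// A loop the vectorizer may map onto the simdlen(4) clone.
static void axpy_loop (float a, const float *x, float *y, int n)
{
#pragma omp simd
  for (int i = 0; i < n; ++i)
    y[i] = axpy_4 (a, x[i], y[i]);
}
```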

Jakub

Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Richard Sandiford via Gcc-patches

"Andre Vieira (lists)"  writes:
> On 20/04/2023 15:51, Richard Sandiford wrote:
>> "Andre Vieira (lists)"  writes:
>>> Hi all,
>>>
>>> This is a series of patches/RFCs to implement support in GCC to be able
>>> to target AArch64's libmvec functions that will be/are being added to glibc.
>>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>>> with a simd construct as the way for glibc to inform GCC what functions
>>> are available.
>>>
>>> For example, if we would like to supply a vector version of the scalar
>>> 'cosf' we would have an include file with something like:
>>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>>> typedef __SVFloat32_t __sv_f32_t;
>>> typedef __SVBool_t __sv_bool_t;
>>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>>   match(construct = {simd(notinbranch, simdlen(4))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>>   match(construct = {simd(notinbranch, simdlen(2))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>>   match(construct = {simd(inbranch)}, device = {isa("sve")})
>>> extern float cosf (float);
>>>
>>> The BETA ABI can be found in the vfabia64 subdir of
>>> https://github.com/ARM-software/abi-aa/
>>> This currently disagrees with how this patch series implements 'omp
>>> declare simd' for SVE and I also do not see a need for the 'omp declare
>>> variant' scalable extension constructs. I will make changes to the ABI
>>> once we've finalized the co-design of the ABI and this implementation.
>> 
>> I don't see a good reason for dropping the extension("scalable").
>> The problem is that since the base spec requires a simdlen clause,
>> GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
>
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector 
> ABI defines behaviour for this. But I couldn't find what it meant for a 
> omp declare variant, obviously can't be the same as for declare simd, as 
> that is defined to mean 'define a set of clones' and only one clone can 
> be associated to a declare variant.

I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
which says:

  The simd trait can be further defined with properties that match the
  clauses accepted by the declare simd directive with the same name and
  semantics. The simd trait must define at least the simdlen property and
  one of the inbranch or notinbranch properties.

(probably best to read it in the original -- it's almost incomprehensible
without markup)
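The inbranch/notinbranch property determines whether a vector variant receives a mask. That is why the SVE `_ZGVsMxv_cosf` declaration quoted earlier in this thread takes an extra predicate argument. Below is a hedged sketch that uses generic vector types. `scalar_fn`, `fn_x4` and `fn_x4_masked` are hypothetical and do not model the real libmvec routines.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang generic vector types standing in for __f32x4_t and a 4-lane
// mask (an assumption made so the sketch is not AArch64-specific).
typedef float f32x4 __attribute__ ((vector_size (16)));
typedef int32_t m32x4 __attribute__ ((vector_size (16)));

// Hypothetical scalar function standing in for cosf (no libm needed).
static float scalar_fn (float x)
{
  return 2.0f * x + 1.0f;
}

// notinbranch: no mask; every lane is computed.
static f32x4 fn_x4 (f32x4 x)
{
  f32x4 r = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int i = 0; i < 4; ++i)
    r[i] = scalar_fn (x[i]);
  return r;
}

// inbranch: the caller passes a mask and only active lanes are computed.
// Inactive lanes keep their input value here; that is a choice made for
// this sketch, not something the vector ABI specifies.
static f32x4 fn_x4_masked (f32x4 x, m32x4 mask)
{
  f32x4 r = x;
  for (int i = 0; i < 4; ++i)
    if (mask[i])
      r[i] = scalar_fn (x[i]);
  return r;
}
```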

Richard

Re: [PATCH] Silence some -Wnarrowing errors

2023-04-20 Thread Jeff Law via Gcc-patches





On 12/2/22 00:26, Eric Gallager via Gcc-patches wrote:
> I tried turning -Wnarrowing back on earlier this year, but
> unfortunately it didn't work due to triggering a bunch of new errors.
> This patch silences at least some of them, but there will still be
> more left even after applying it. (When compiling with clang,
> technically the warning flag is -Wc++11-narrowing, but it's pretty
> much the same thing as gcc's -Wnarrowing, albeit with fixit hints,
> which I made use of to insert the casts here.)
>
> gcc/ChangeLog:
>
>   * ipa-modref.cc (modref_lattice::add_escape_point): Use a
>   static_cast to silence -Wnarrowing.
>   (modref_eaf_analysis::record_escape_points): Likewise.
>   (update_escape_summary_1): Likewise.
>   * rtl-ssa/changes.cc (function_info::temp_access_array): Likewise.
>   * rtl-ssa/member-fns.inl: Likewise.
>   * tree-ssa-structalias.cc (push_fields_onto_fieldstack): Likewise.
>   * tree-vect-slp.cc (vect_prologue_cost_for_slp): Likewise.
>   * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Likewise.
>   (vectorizable_operation): Likewise.

Would it make sense to instead fix the APIs so that instead of passing 
an "int" they instead pass a suitable enum type?


So for example, modref_lattice::add_escape_point passes "min_flags" as
an int.  It probably should be an eaf_flags_t.  That may (of course)
bleed out into other places.  So I'd probably suggest you pick one case
such as this add_escape_point, make its API change, fix the fallout and
submit that as a patch.


Then proceed to the next case where you added a static_cast and do the 
same thing there.


In addition to fixing the warnings, this should make the codebase 
cleaner and catch more errors using the typesystem.
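The difference between silencing the warning with a cast and fixing the API can be sketched as follows. The names are hypothetical, and `flags_t` is only assumed to be narrower than `int`. This is not the actual ipa-modref code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical narrow flags type and record; they stand in for the real
// modref types, whose exact definitions are not reproduced here.
typedef uint8_t flags_t;

struct escape_point
{
  int arg;
  flags_t min_flags;
};

// Before: the API traffics in int, so brace-initializing the narrower
// field needs a cast.  Writing {arg, min_flags} here is a narrowing
// conversion that -Wnarrowing diagnoses.
static escape_point make_point_int (int arg, int min_flags)
{
  return {arg, static_cast<flags_t> (min_flags)};
}

// After: the parameter already has the field's type, so the initializer
// needs no cast and the signature documents the value's real width.
static escape_point make_point_typed (int arg, flags_t min_flags)
{
  return {arg, min_flags};
}
```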


jeff

Re: [PATCH] c: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device [PR107041]

2023-04-20 Thread Marek Polacek via Gcc-patches

On Wed, Apr 19, 2023 at 11:02:53AM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The new -Wenum-int-mismatch warning triggers with -Wsystem-headers in
> , for obvious reasons the builtin acc_on_device uses int
> type argument rather than enum which isn't defined yet when the builtin
> is created, while the OpenACC spec requires it to have acc_device_t
> enum argument.  The header makes sure it has int underlying type by using
> negative and __INT_MAX__ enumerators.
> 
> I've tried to make the builtin typegeneric or just varargs, but that
> changes behavior e.g. when one calls it with some C++ class which has
> cast operator to acc_device_t, so the following patch instead disables
> the warning for this builtin.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and 13.2?
> 
> 2023-04-19  Jakub Jelinek  
> 
>   PR c/107041
>   * c-decl.cc (diagnose_mismatched_decls): Avoid -Wenum-int-mismatch
>   warning on acc_on_device declaration.
> 
>   * gcc.dg/goacc/pr107041.c: New test.
> 
> --- gcc/c/c-decl.cc.jj2023-03-10 10:10:17.918387120 +0100
> +++ gcc/c/c-decl.cc   2023-04-18 10:29:33.340793562 +0200
> @@ -2219,7 +2219,14 @@ diagnose_mismatched_decls (tree newdecl,
>  }
>/* Warn about enum/integer type mismatches.  They are compatible types
>   (C2X 6.7.2.2/5), but may pose portability problems.  */
> -  else if (enum_and_int_p && TREE_CODE (newdecl) != TYPE_DECL)
> +  else if (enum_and_int_p
> +&& TREE_CODE (newdecl) != TYPE_DECL
> +/* Don't warn about about acc_on_device builtin redeclaration,

"built-in"

> +   the builtin is declared with int rather than enum because

"built-in"

> +   the enum isn't intrinsic.  */
> +&& !(TREE_CODE (olddecl) == FUNCTION_DECL
> + && fndecl_built_in_p (olddecl, BUILT_IN_ACC_ON_DEVICE)
> + && !C_DECL_DECLARED_BUILTIN (olddecl)))

What do you think about adding an (UN)LIKELY here?  This seems like a very
special case.  On the other hand we're not on a hot path here so it
hardly matters.

OK either way, thanks.

>  warned = warning_at (DECL_SOURCE_LOCATION (newdecl),
>OPT_Wenum_int_mismatch,
>"conflicting types for %q+D due to enum/integer "
> --- gcc/testsuite/gcc.dg/goacc/pr107041.c.jj  2023-04-18 10:18:07.039754258 +0200
> +++ gcc/testsuite/gcc.dg/goacc/pr107041.c 2023-04-18 10:17:21.252418797 +0200
> @@ -0,0 +1,23 @@
> +/* PR c/107041 */
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Wenum-int-mismatch" } */
> +
> +typedef enum acc_device_t {
> +  acc_device_current = -1,
> +  acc_device_none = 0,
> +  acc_device_default = 1,
> +  acc_device_host = 2,
> +  acc_device_not_host = 4,
> +  acc_device_nvidia = 5,
> +  acc_device_radeon = 8,
> +  _ACC_highest = __INT_MAX__
> +} acc_device_t;
> +
> +int acc_on_device (acc_device_t);/* { dg-bogus "conflicting types for 'acc_on_device' due to enum/integer mismatch; have 'int\\\(acc_device_t\\\)'" } */
> +int acc_on_device (acc_device_t);
> +
> +int
> +foo (void)
> +{
> +  return acc_on_device (acc_device_host);
> +}
> 
>   Jakub
> 

Marek

Re: [PATCH] riscv: generate builtin macro for compilation with strict alignment

2023-04-20 Thread Jeff Law via Gcc-patches





On 1/17/23 15:59, Vineet Gupta wrote:
> This could be useful for library writers who want to write code variants
> for fast vs. slow unaligned accesses.
>
> We distinguish explicit -mstrict-align (1) vs. slow_unaligned_access
> cpu tune param (2) for even more code diversity.
>
> gcc/ChangeLog:
>
> * config/riscv-c.cc (riscv_cpu_cpp_builtins):
>   Generate __riscv_strict_align with value 1 or 2.
> * config/riscv/riscv.cc: Define riscv_user_wants_strict_align.
>   (riscv_option_override) Set riscv_user_wants_strict_align to
>   TARGET_STRICT_ALIGN.
> * config/riscv/riscv.h: Declare riscv_user_wants_strict_align.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute.c: Check for
>   __riscv_strict_align=1.
> * gcc.target/riscv/predef-align-1.c: New test.
> * gcc.target/riscv/predef-align-2.c: New test.
> * gcc.target/riscv/predef-align-3.c: New test.
> * gcc.target/riscv/predef-align-4.c: New test.
> * gcc.target/riscv/predef-align-5.c: New test.
>
> Signed-off-by: Vineet Gupta
> ---
>   gcc/config/riscv/riscv-c.cc | 11 +++
>   gcc/config/riscv/riscv.cc   |  9 +
>   gcc/config/riscv/riscv.h|  1 +
>   gcc/testsuite/gcc.target/riscv/attribute-4.c|  9 +
>   gcc/testsuite/gcc.target/riscv/predef-align-1.c | 12 
>   gcc/testsuite/gcc.target/riscv/predef-align-2.c | 11 +++
>   gcc/testsuite/gcc.target/riscv/predef-align-3.c | 15 +++
>   gcc/testsuite/gcc.target/riscv/predef-align-4.c | 16 
>   gcc/testsuite/gcc.target/riscv/predef-align-5.c | 16 
>   9 files changed, 100 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-1.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-2.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-3.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-4.c
>   create mode 100644 gcc/testsuite/gcc.target/riscv/predef-align-5.c
>
> diff --git a/gcc/config/riscv/riscv-c.cc b/gcc/config/riscv/riscv-c.cc
> index 826ae0067bb8..47a396501d74 100644
> --- a/gcc/config/riscv/riscv-c.cc
> +++ b/gcc/config/riscv/riscv-c.cc
> @@ -102,6 +102,17 @@ riscv_cpu_cpp_builtins (cpp_reader *pfile)
>   
>   }
>   
> +  /* TARGET_STRICT_ALIGN does not cover all cases.  */
> +  if (riscv_slow_unaligned_access_p)
> +{
> +  /* Explicit -mstruct-align preceedes cpu tune param
> + slow_unaligned_access=true.  */

Did you mean "-mstrict-align" above?



> +  if (riscv_user_wants_strict_align)
> +builtin_define_with_int_value ("__riscv_strict_align", 1);
> +  else
> +builtin_define_with_int_value ("__riscv_strict_align", 2);

So I don't understand why we're testing "riscv_user_wants_strict_align" 
instead of TARGET_STRICT_ALIGN here.  AFAICT they're equivalent.  But 
maybe there's something subtle I'm missing.
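For context, library code might consume the macro proposed in this patch roughly as follows. This is a sketch only. `__riscv_strict_align` is the name proposed above (value 1 for an explicit -mstrict-align, 2 for the slow_unaligned_access tuning) and is not assumed to exist in any released compiler. `load_u32_le` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read a little-endian 32-bit value from a possibly unaligned address.
static uint32_t load_u32_le (const unsigned char *p)
{
#if defined (__riscv_strict_align)
  // Strict (1) or slow (2) unaligned access: build the value from
  // single-byte loads so no unaligned word access is requested.
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
#else
  // Otherwise let the compiler choose the access; memcpy is well-defined
  // for any alignment.
  uint32_t v;
  std::memcpy (&v, p, sizeof v);
#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  v = __builtin_bswap32 (v);
#endif
  return v;
#endif
}
```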


Jeff

[PATCH] MAINTAINERS: add Vineet Gupta to write after approval

2023-04-20 Thread Vineet Gupta

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.

(Ref: <680c7bbe-5d6e-07cd-8468-247afc65e...@gmail.com>)

Signed-off-by: Vineet Gupta 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cebf45d49e56..5f25617212a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -434,6 +434,7 @@ Haochen Gui 

 Jiufu Guo  
 Xuepeng Guo
 Wei Guozhi 
+Vineet Gupta   
 Naveen H.S 
 Mostafa Hagog  
 Andrew Haley   
-- 
2.34.1

Re: [PATCH 1/2] c++: make strip_typedefs generalize strip_typedefs_expr

2023-04-20 Thread Jason Merrill via Gcc-patches


On 4/20/23 09:56, Patrick Palka wrote:

If we have a TREE_VEC of types that we want to strip of typedefs, we
unintuitively need to call strip_typedefs_expr instead of strip_typedefs
since only strip_typedefs_expr handles TREE_VEC, and it also dispatches
to strip_typedefs when given a type.  But this seems backwards: arguably
strip_typedefs_expr should be the more specialized function, which
strip_typedefs dispatches to (and thus generalizes).

This patch makes strip_typedefs generalize strip_typedefs_expr, which
allows for some simplifications.
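The restructuring can be pictured with a toy model. The general function handles types itself and forwards everything else to the specialized one, which in turn recurses through the general function. Callers therefore never need to check `TYPE_P` themselves. This is a hypothetical sketch. `node`, `strip` and `strip_expr` only mimic the shape of `strip_typedefs`/`strip_typedefs_expr`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy node: a type (optionally a typedef of another type),
// an expression, or a vector of nodes.  Not GCC's tree representation.
struct node
{
  enum kind_t { TYPE, EXPR, VEC } kind;
  std::string name;
  const node *typedef_of;          // non-null for typedef types
  std::vector<const node *> elts;  // elements of a VEC
};

static std::string strip_expr (const node *t);

// General entry point: strips typedefs from types itself and forwards
// every non-type (expressions, vectors) to the specialized function.
static std::string strip (const node *t)
{
  if (t->kind != node::TYPE)
    return strip_expr (t);
  while (t->typedef_of)
    t = t->typedef_of;
  return t->name;
}

// Specialized for non-types.  Elements go back through the general entry
// point, so a vector may mix types and expressions without the caller
// checking which is which.
static std::string strip_expr (const node *t)
{
  assert (t->kind != node::TYPE);
  if (t->kind == node::EXPR)
    return t->name;
  std::string out = "<";
  for (std::size_t i = 0; i < t->elts.size (); ++i)
    {
      if (i)
        out += ", ";
      out += strip (t->elts[i]);
    }
  return out + ">";
}
```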


OK.


gcc/cp/ChangeLog:

* tree.cc (strip_typedefs): Move TREE_LIST handling to
strip_typedefs_expr.  Dispatch to strip_typedefs_expr
for a non-type 't'.
: Remove manual dispatching to
strip_typedefs_expr.
: Likewise.
(strip_typedefs_expr): Replaces calls to strip_typedefs_expr
with strip_typedefs throughout.  Don't dispatch to strip_typedefs
for a type 't'.
: Replace this with the better version from
strip_typedefs.
---
  gcc/cp/tree.cc | 83 +++---
  1 file changed, 24 insertions(+), 59 deletions(-)

diff --git a/gcc/cp/tree.cc b/gcc/cp/tree.cc
index 2c22fac17ee..f0fb78fe69d 100644
--- a/gcc/cp/tree.cc
+++ b/gcc/cp/tree.cc
@@ -1562,7 +1562,8 @@ apply_identity_attributes (tree result, tree attribs, 
bool *remove_attributes)
  
  /* Builds a qualified variant of T that is either not a typedef variant
 (the default behavior) or not a typedef variant of a user-facing type
-   (if FLAGS contains STF_USER_FACING).
+   (if FLAGS contains STF_USER_FACING).  If T is not a type, then this
+   just calls strip_typedefs_expr.
  
 E.g. consider the following declarations:

   typedef const int ConstInt;
@@ -1596,25 +1597,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = 
NULL */,
if (!t || t == error_mark_node)
  return t;
  
-  if (TREE_CODE (t) == TREE_LIST)
-{
-  bool changed = false;
-  releasing_vec vec;
-  tree r = t;
-  for (; t; t = TREE_CHAIN (t))
-   {
- gcc_assert (!TREE_PURPOSE (t));
- tree elt = strip_typedefs (TREE_VALUE (t), remove_attributes, flags);
- if (elt != TREE_VALUE (t))
-   changed = true;
- vec_safe_push (vec, elt);
-   }
-  if (changed)
-   r = build_tree_list_vec (vec);
-  return r;
-}
-
-  gcc_assert (TYPE_P (t));
+  if (!TYPE_P (t))
+return strip_typedefs_expr (t, remove_attributes, flags);
  
if (t == TYPE_CANONICAL (t))
  return t;
@@ -1747,12 +1731,7 @@ strip_typedefs (tree t, bool *remove_attributes /* = 
NULL */,
for (int i = 0; i < TREE_VEC_LENGTH (args); ++i)
  {
tree arg = TREE_VEC_ELT (args, i);
-   tree strip_arg;
-   if (TYPE_P (arg))
- strip_arg = strip_typedefs (arg, remove_attributes, flags);
-   else
- strip_arg = strip_typedefs_expr (arg, remove_attributes,
-  flags);
+   tree strip_arg = strip_typedefs (arg, remove_attributes, flags);
TREE_VEC_ELT (new_args, i) = strip_arg;
if (strip_arg != arg)
  changed = true;
@@ -1792,11 +1771,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = 
NULL */,
break;
  case TRAIT_TYPE:
{
-   tree type1 = TRAIT_TYPE_TYPE1 (t);
-   if (TYPE_P (type1))
- type1 = strip_typedefs (type1, remove_attributes, flags);
-   else
- type1 = strip_typedefs_expr (type1, remove_attributes, flags);
+   tree type1 = strip_typedefs (TRAIT_TYPE_TYPE1 (t),
+remove_attributes, flags);
tree type2 = strip_typedefs (TRAIT_TYPE_TYPE2 (t),
 remove_attributes, flags);
if (type1 == TRAIT_TYPE_TYPE1 (t) && type2 == TRAIT_TYPE_TYPE2 (t))
@@ -1883,7 +1859,8 @@ strip_typedefs (tree t, bool *remove_attributes /* = NULL 
*/,
return cp_build_qualified_type (result, cp_type_quals (t));
  }
  
-/* Like strip_typedefs above, but works on expressions, so that in
+/* Like strip_typedefs above, but works on expressions (and other non-types
+   such as TREE_VEC), so that in
  
 template struct A
 {
@@ -1908,11 +1885,6 @@ strip_typedefs_expr (tree t, bool *remove_attributes, unsigned int flags)
if (DECL_P (t) || CONSTANT_CLASS_P (t))
  return t;
  
-  /* Some expressions have type operands, so let's handle types here rather
- than check TYPE_P in multiple places below.  */
-  if (TYPE_P (t))
-return strip_typedefs (t, remove_attributes, flags);
-
code = TREE_CODE (t);
switch (code)
  {
@@ -1940,26 +1912,19 @@ strip_typedefs_expr (tree t, bool *remove_attributes, unsigned int flags)
  
  case TREE_LIST:
{
-   releasing_vec vec;
bool changed = false;
-   tree it

Re: [PATCH 2/2] c++: use TREE_VEC for trailing args of variadic built-in traits

2023-04-20 Thread Jason Merrill via Gcc-patches


On 4/20/23 09:56, Patrick Palka wrote:

This patch makes us use a TREE_VEC instead of TREE_LIST to represent the
trailing arguments of a variadic built-in trait.  These built-ins are
typically passed a simple pack expansion as the second argument, e.g.

__is_constructible(T, Ts...)

so the main benefit of this representation change is that expanding
such an argument list at substitution time is now basically free, since
argument packs are also TREE_VECs and tsubst_template_args makes sure
we reuse this TREE_VEC when expanding such pack expansions.  Previously,
we would perform the expansion via tsubst_tree_list which converts the
expanded pack expansion into a TREE_LIST.

Note, after this patch an empty set of trailing arguments is now
represented as an empty TREE_VEC instead of NULL_TREE, so
TRAIT_TYPE/EXPR_TYPE2 should now be NULL_TREE only for unary traits.

(This patch slightly depends on "c++: make strip_typedefs generalize
strip_typedefs_expr".  Without it, strip_typedefs 
would need to conditionally dispatch to strip_typedefs_expr for
non-TYPE_P TRAIT_TYPE_TYPE2 since it could now be a TREE_VEC which
only strip_typedefs_expr handles.)
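For readers less familiar with the two tree codes, the difference is roughly that between an array and a singly linked list. The following is a minimal C sketch using hypothetical types and helpers (`struct arg_vec`, `struct arg_list`, `list_cons`, `list_reverse`, `vec_to_list`, `list_free`; none of these are GCC's `tree` API) of turning a vector of arguments back into an order-preserving list by prepending each element and reversing once, the same cons-then-nreverse pattern the constraint.cc hunk of this patch uses for pretty printing.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for GCC trees: a TREE_VEC-like array of
   arguments and a TREE_LIST-like singly linked list.  */
struct arg_vec
{
  size_t len;
  const char **elts;
};

struct arg_list
{
  const char *value;
  struct arg_list *chain;
};

/* Prepend VALUE to LIST, in the spirit of tree_cons (NULL_TREE, value, list).  */
static struct arg_list *
list_cons (const char *value, struct arg_list *list)
{
  struct arg_list *node = malloc (sizeof *node);
  if (!node)
    abort ();
  node->value = value;
  node->chain = list;
  return node;
}

/* Reverse LIST in place, in the spirit of nreverse.  */
static struct arg_list *
list_reverse (struct arg_list *list)
{
  struct arg_list *prev = NULL;
  while (list)
    {
      struct arg_list *next = list->chain;
      list->chain = prev;
      prev = list;
      list = next;
    }
  return prev;
}

/* Convert VEC into a list with the same element order.  Prepending
   yields the elements backwards, so reverse once at the end.  */
static struct arg_list *
vec_to_list (const struct arg_vec *vec)
{
  struct arg_list *list = NULL;
  for (size_t i = 0; i < vec->len; i++)
    list = list_cons (vec->elts[i], list);
  return list_reverse (list);
}

/* Release every node of LIST.  */
static void
list_free (struct arg_list *list)
{
  while (list)
    {
      struct arg_list *next = list->chain;
      free (list);
      list = next;
    }
}
```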

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?


OK.


gcc/cp/ChangeLog:

* constraint.cc (diagnose_trait_expr): Convert a TREE_VEC
of arguments into a TREE_LIST for sake of pretty printing.
* cxx-pretty-print.cc (pp_cxx_trait): Handle TREE_VEC
instead of TREE_LIST of variadic trait arguments.
* method.cc (constructible_expr): Likewise.
(is_xible_helper): Likewise.
* parser.cc (cp_parser_trait): Represent variadic trait
arguments as a TREE_VEC instead of TREE_LIST.
* pt.cc (value_dependent_expression_p): Handle TREE_VEC
instead of TREE_LIST of variadic trait arguments.
* semantics.cc (finish_type_pack_element): Likewise.
(check_trait_type): Likewise.
---
  gcc/cp/constraint.cc   | 10 ++
  gcc/cp/cxx-pretty-print.cc |  6 +++---
  gcc/cp/method.cc   | 17 +
  gcc/cp/parser.cc   | 10 ++
  gcc/cp/pt.cc   |  9 -
  gcc/cp/semantics.cc| 15 +--
  6 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 273d15ab097..dfead28e8c7 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3675,6 +3675,16 @@ diagnose_trait_expr (tree expr, tree args)
  
tree t1 = TRAIT_EXPR_TYPE1 (expr);
tree t2 = TRAIT_EXPR_TYPE2 (expr);
+  if (t2 && TREE_CODE (t2) == TREE_VEC)
+{
+  /* Convert the TREE_VEC of arguments into a TREE_LIST, since the
+pretty printer cannot directly print a TREE_VEC but it can a
+TREE_LIST via the E format specifier.  */
+  tree list = NULL_TREE;
+  for (tree t : tree_vec_range (t2))
+   list = tree_cons (NULL_TREE, t, list);
+  t2 = nreverse (list);
+}
switch (TRAIT_EXPR_KIND (expr))
  {
  case CPTK_HAS_NOTHROW_ASSIGN:
diff --git a/gcc/cp/cxx-pretty-print.cc b/gcc/cp/cxx-pretty-print.cc
index c33919873f1..4cda27f2b30 100644
--- a/gcc/cp/cxx-pretty-print.cc
+++ b/gcc/cp/cxx-pretty-print.cc
@@ -2640,16 +2640,16 @@ pp_cxx_trait (cxx_pretty_printer *pp, tree t)
  }
if (type2)
  {
-  if (TREE_CODE (type2) != TREE_LIST)
+  if (TREE_CODE (type2) != TREE_VEC)
{
  pp_cxx_separate_with (pp, ',');
  pp->type_id (type2);
}
else
-   for (tree arg = type2; arg; arg = TREE_CHAIN (arg))
+   for (tree arg : tree_vec_range (type2))
  {
pp_cxx_separate_with (pp, ',');
-   pp->type_id (TREE_VALUE (arg));
+   pp->type_id (arg);
  }
  }
if (kind == CPTK_TYPE_PACK_ELEMENT)
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 225ec456143..00eae56eb5b 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -2075,8 +2075,9 @@ constructible_expr (tree to, tree from)
if (!TYPE_REF_P (to))
to = cp_build_reference_type (to, /*rval*/false);
tree ob = build_stub_object (to);
-  for (; from; from = TREE_CHAIN (from))
-   vec_safe_push (args, build_stub_object (TREE_VALUE (from)));
+  vec_alloc (args, TREE_VEC_LENGTH (from));
+  for (tree arg : tree_vec_range (from))
+   args->quick_push (build_stub_object (arg));
expr = build_special_member_call (ob, complete_ctor_identifier, &args,
ctype, LOOKUP_NORMAL, tf_none);
if (expr == error_mark_node)
@@ -2096,9 +2097,9 @@ constructible_expr (tree to, tree from)
  }
else
  {
-  if (from == NULL_TREE)
+  const int len = TREE_VEC_LENGTH (from);
+  if (len == 0)
return build_value_init (strip_array_types (to), tf_none);
-  const int len = list_length (from);
if (len > 1)
{
  if (cxx_dialect < cxx20)
@@ -2112,9 +2113,9 @@ constructible_ex

Re: [PATCH] c++: improve template parameter level lowering

2023-04-20 Thread Jason Merrill via Gcc-patches


On 4/20/23 11:44, Patrick Palka wrote:

On Thu, 20 Apr 2023, Patrick Palka wrote:


1. Now that we no longer substitute the constraints of an auto, we can
get rid of the infinite recursion loop breaker during level lowering
of a constrained auto and we can also use the TEMPLATE_PARM_DESCENDANTS
cache in this case.
2. Don't bother recursing when level lowering a cv-qualified type template
parameter.
3. Use TEMPLATE_PARM_DESCENDANTS when level lowering a non-type template
parameter too.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

gcc/cp/ChangeLog:

* pt.cc (tsubst) : Remove infinite
recursion loop breaker in the level lowering case for
constrained autos.  Use the TEMPLATE_PARM_DESCENDANTS cache in
this case as well.
: Use the TEMPLATE_PARM_INDEX cache
when level lowering a non-type template parameter.
---
  gcc/cp/pt.cc | 42 --
  1 file changed, 20 insertions(+), 22 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index f65f2d58b28..07e9736cdce 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -16228,33 +16228,23 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
/* If we get here, we must have been looking at a parm for a
   more deeply nested template.  Make a new version of this
   template parameter, but with a lower level.  */
+   int quals;
switch (code)
  {
  case TEMPLATE_TYPE_PARM:
  case TEMPLATE_TEMPLATE_PARM:
-   if (cp_type_quals (t))
+   quals = cp_type_quals (t);
+   if (quals)
  {
-   r = tsubst (TYPE_MAIN_VARIANT (t), args, complain, in_decl);
-   r = cp_build_qualified_type
- (r, cp_type_quals (t),
-  complain | (code == TEMPLATE_TYPE_PARM
-  ? tf_ignore_bad_quals : 0));
+   gcc_checking_assert (code == TEMPLATE_TYPE_PARM);
+   t = TYPE_MAIN_VARIANT (t);
  }
-   else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
-&& PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
-&& (r = (TEMPLATE_PARM_DESCENDANTS
- (TEMPLATE_TYPE_PARM_INDEX (t
-&& (r = TREE_TYPE (r))
-&& !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (r))
- /* Break infinite recursion when substituting the constraints
-of a constrained placeholder.  */;
-   else if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
-&& !PLACEHOLDER_TYPE_CONSTRAINTS_INFO (t)
-&& (arg = TEMPLATE_TYPE_PARM_INDEX (t),
-r = TEMPLATE_PARM_DESCENDANTS (arg))
-&& (TEMPLATE_PARM_LEVEL (r)
-== TEMPLATE_PARM_LEVEL (arg) - levels))
-   /* Cache the simple case of lowering a type parameter.  */
+   if (TREE_CODE (t) == TEMPLATE_TYPE_PARM
+   && (arg = TEMPLATE_TYPE_PARM_INDEX (t),
+   r = TEMPLATE_PARM_DESCENDANTS (arg))
+   && (TEMPLATE_PARM_LEVEL (r)
+   == TEMPLATE_PARM_LEVEL (arg) - levels))
+ /* Cache the simple case of lowering a type parameter.  */
  r = TREE_TYPE (r);
else
  {
@@ -16278,6 +16268,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
else
  TYPE_CANONICAL (r) = canonical_type_parameter (r);
  }
+   if (quals)
+ r = cp_build_qualified_type (r, quals,
+  complain | tf_ignore_bad_quals);
break;
  
  	  case BOUND_TEMPLATE_TEMPLATE_PARM:
@@ -16307,7 +16300,12 @@ tsubst (tree t, tree args, tsubst_flags_t complain, tree in_decl)
type = tsubst (type, args, complain, in_decl);
if (type == error_mark_node)
  return error_mark_node;
-   r = reduce_template_parm_level (t, type, levels, args, complain);
+   if ((r = TEMPLATE_PARM_DESCENDANTS (t))
+   && (TEMPLATE_PARM_LEVEL (r) == TEMPLATE_PARM_LEVEL (t) - levels)
+   && TREE_TYPE (r) == type)
+ /* Cache the simple case of lowering a non-type parameter.  */;
+   else
+ r = reduce_template_parm_level (t, type, levels, args, complain);


D'oh, this hunk is totally redundant since reduce_template_parm_level
already checks TEMPLATE_PARM_DESCENDANTS, and so we've been caching
level-lowering of non-type template parameters this whole time.

Please consider this patch instead, which removes this hunk and
therefore only changes TEMPLATE_TYPE_PARM level lowering:


OK.


-- >8 --

Subject: [PATCH] c++: improve TEMPLATE_TYPE_PARM level lowering

1. Don't bother recursing when level lowering a cv-qualified type template
parameter.
2. Get rid of the infinite

Ping: [PATCH v2] C, ObjC: Add -Wunterminated-string-initialization

2023-04-20 Thread Alejandro Colomar via Gcc-patches

Hi David,

On 3/24/23 18:58, David Malcolm wrote:
> On Fri, 2023-03-24 at 18:45 +0100, Alejandro Colomar wrote:
>> Hi David,
>>
>> On 3/24/23 15:53, David Malcolm wrote:
>>> On Fri, 2023-03-24 at 14:39 +0100, Alejandro Colomar via Gcc-patches
>>> wrote:
 Warn about the following:

     char  s[3] = "foo";
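As background for the proposed warning: in C (unlike C++) this initializer is valid, and because the literal's three characters exactly fill the array, no terminating null character is stored (C11/C17 6.7.9). A minimal sketch of the consequence, with a hypothetical helper `is_terminated`: the array is not a string, so only length-bounded functions such as memchr may safely be used on it.

```c
#include <string.h>

/* Valid C: "foo" occupies four bytes including its '\0', but only the
   first three fit, so s holds 'f', 'o', 'o' and no terminator.
   C++ rejects this initializer.  */
static char s[3] = "foo";

/* Hypothetical helper: report whether BUF contains a '\0' within its
   first LEN bytes, never reading beyond them (strlen (s) would).  */
static int
is_terminated (const char *buf, size_t len)
{
  return memchr (buf, '\0', len) != NULL;
}
```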

>> [...]
>>
 ---

 Hi,
>>>
>>> Hi Alex, thanks for the patch.
>>
>> :)
>>
>>>

 I sent v1 to the wrong list.  This time I've made sure to write to
 gcc-patches@.
>>>
>>> Note that right now we're deep in bug-fixing/stabilization for GCC 13
>>> (and trunk is still tracking that effort), so your patch might be more
>>> suitable for GCC 14.
>>
>> Sure, no problem.  Do you have a "next" branch where you pick patches
>> for after the release, or should I resend after the release?  
> 
> We don't; resending it after release is probably best.
> 
>> Is
>> discussion of a patch reasonable now, or is it too much distracting
>> from your stabilization efforts?
> 
> FWIW I'd prefer to postpone the discussion until after we branch for
> the release.

Sure.  AFAIK it's fair game already to propose patches to GCC 14,
right?  Would you please have a look into this?  Thanks!

> 
>>
>>>

 v2 adds some draft of a test, as suggested by Martin.  However, I don't
 know yet how to write those, so the test is just a draft.  But I did
 test the feature, by compiling GCC and compiling some small program
 with it.
>>>
>>> Unfortunately the answer to the question "how do I run just one
>>> testcase in GCC's testsuite" is rather non-trivial; FWIW I've written
>>> up some notes on working with the GCC testsuite here:
>>> https://gcc-newbies-guide.readthedocs.io/en/latest/working-with-the-testsuite.html
>>
>> Hmm, I'll try following that; thanks!  Is there anything obvious that
>> I might have missed, at first glance?
> 
> The main thing is that there's a difference between compiling the test
> case "by hand", versus doing it through the test harness - the latter
> sets up the environment in a particular way, injects a particular set
> of flags, etc etc.

I forgot about this; I'll have a look into it when I find some time.

Cheers,
Alex

> 
> Dave
> 

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5



[PATCH] doc: tfix

2023-04-20 Thread Alejandro Colomar via Gcc-patches

Remove repeated word (typo).

Signed-off-by: Alejandro Colomar 
---
 gcc/doc/extend.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index fd3745c5608..cdfb25ff272 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3756,7 +3756,7 @@ take function pointer arguments.
 The @code{optimize} attribute is used to specify that a function is to
 be compiled with different optimization options than specified on the
 command line.  The optimize attribute arguments of a function behave
-behave as if appended to the command-line.
+as if appended to the command-line.
 
 Valid arguments are constant non-negative integers and
 strings.  Each numeric argument specifies an optimization @var{level}.
-- 
2.40.0
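For context on the sentence being fixed, a short sketch of the attribute in use, assuming GCC (other compilers may ignore the attribute with a warning; `sum_to` is just an illustrative name): the options given in the attribute apply to that one function as if appended to the command line. The GCC manual itself describes the attribute as intended for debugging rather than production code.

```c
/* Declare sum_to so that GCC compiles it as if -O0 were appended to the
   command line, whatever -O level the rest of the file uses.  */
static int sum_to (int n) __attribute__ ((optimize ("O0")));

/* Sum of 1..n; with GCC the attribute above keeps this loop unoptimized.  */
static int
sum_to (int n)
{
  int total = 0;
  for (int i = 1; i <= n; i++)
    total += i;
  return total;
}
```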

[PATCH] PR tree-optimization/109564 - Do not ignore UNDEFINED ranges when determining PHI equivalences.

2023-04-20 Thread Andrew MacLeod via Gcc-patches

This removes special-casing of UNDEFINED ranges when we are checking to see
if all arguments are the same and registering an equivalence.


Previously, if there were two different names and one was undefined, we
ignored it and created an equivalence with the other one.  As observed,
this is not a two-way relationship, and as such we shouldn't do it this
way.  This removes the bypass for undefined ranges when checking whether
arguments are the same symbol.
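A minimal C model of the change, with hypothetical types (`struct phi_arg` holds an SSA name number and whether its range is UNDEFINED; this is not GCC's ranger API). The `skip_undefined` flag reproduces the old bypass: for PHI <a_1, b_2> with a_1 undefined, the old logic reports b_2 as the single argument, and hence an equivalence, while the new logic reports none.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PHI argument: an SSA name number and
   whether its range is UNDEFINED.  Not GCC's API.  */
struct phi_arg
{
  int name;
  bool undefined;
};

/* Return the SSA name shared by every argument, or -1 if the arguments
   differ (or none were considered).  SKIP_UNDEFINED mimics the old
   behavior of leaving UNDEFINED arguments out of the comparison.  */
static int
single_phi_arg (const struct phi_arg *args, size_t n, bool skip_undefined)
{
  bool seen_arg = false;
  int single_arg = -1;
  for (size_t i = 0; i < n; i++)
    {
      if (skip_undefined && args[i].undefined)
        continue;
      if (!seen_arg)
        {
          seen_arg = true;
          single_arg = args[i].name;
        }
      else if (single_arg != args[i].name)
        return -1;
    }
  return single_arg;
}
```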


Bootstrapped/regtested successfully on x86_64-linux and i686-linux.  OK 
for trunk?


Andrew


commit 26f20f4446531225b362b9ec7b473ce4f0822a0a
Author: Andrew MacLeod 
Date:   Thu Apr 20 13:10:40 2023 -0400

Do not ignore UNDEFINED ranges when determining PHI equivalences.

Do not ignore UNDEFINED name arguments when registering two-way equivalences
from PHIs.

PR tree-optimization/109564
gcc/
* gimple-range-fold.cc (fold_using_range::range_of_phi):
Do not ignore UNDEFINED range names when deciding if all the names
on a PHI are the same.

gcc/testsuite/
* gcc.dg/torture/pr109564-1.c: New testcase.
* gcc.dg/torture/pr109564-2.c: Likewise.
* gcc.dg/tree-ssa/evrp-ignore.c: XFAIL.
* gcc.dg/tree-ssa/vrp06.c: Likewise.
---

diff --git a/gcc/gimple-range-fold.cc b/gcc/gimple-range-fold.cc
index 429734f954a..180f349eda9 100644
--- a/gcc/gimple-range-fold.cc
+++ b/gcc/gimple-range-fold.cc
@@ -771,16 +771,16 @@ fold_using_range::range_of_phi (vrange &r, gphi *phi, fur_source &src)
 
 	  if (gimple_range_ssa_p (arg) && src.gori ())
 	src.gori ()->register_dependency (phi_def, arg);
+	}
 
-	  // Track if all arguments are the same.
-	  if (!seen_arg)
-	{
-	  seen_arg = true;
-	  single_arg = arg;
-	}
-	  else if (single_arg != arg)
-	single_arg = NULL_TREE;
+  // Track if all arguments are the same.
+  if (!seen_arg)
+	{
+	  seen_arg = true;
+	  single_arg = arg;
 	}
+  else if (single_arg != arg)
+	single_arg = NULL_TREE;
 
   // Once the value reaches varying, stop looking.
   if (r.varying_p () && single_arg == NULL_TREE)
diff --git a/gcc/testsuite/gcc.dg/torture/pr109564-1.c b/gcc/testsuite/gcc.dg/torture/pr109564-1.c
new file mode 100644
index 000..e7c855f1edf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr109564-1.c
@@ -0,0 +1,74 @@
+/* { dg-do run } */
+
+struct libkeccak_spec {
+long int bitrate;
+};
+
+struct libkeccak_generalised_spec {
+long int bitrate;
+long int state_size;
+long int word_size;
+};
+
+int __attribute__((noipa))
+libkeccak_degeneralise_spec(struct libkeccak_generalised_spec *restrict spec,
+			struct libkeccak_spec *restrict output_spec)
+{
+  long int state_size, word_size, bitrate, output;
+  const int have_state_size = spec->state_size != (-65536L);
+  const int have_word_size = spec->word_size != (-65536L);
+  const int have_bitrate = spec->bitrate != (-65536L);
+
+  if (have_state_size)
+{
+  state_size = spec->state_size;
+  if (state_size <= 0)
+	return 1;
+  if (state_size > 1600)
+	return 2;
+}
+
+  if (have_word_size)
+{
+  word_size = spec->word_size;
+  if (word_size <= 0)
+	return 4;
+  if (word_size > 64)
+	return 5;
+  if (have_state_size && state_size != word_size * 25)
+	return 6;
+  else if (!have_state_size) {
+	  spec->state_size = 1;
+	  state_size = word_size * 25;
+  }
+}
+
+  if (have_bitrate)
+bitrate = spec->bitrate;
+
+  if (!have_bitrate)
+{
+  state_size = (have_state_size ? state_size : (1600L));
+  output = ((state_size << 5) / 100L + 7L) & ~0x07L;
+  bitrate = output << 1;
+}
+
+  output_spec->bitrate = bitrate;
+
+  return 0;
+}
+
+int main ()
+{
+  struct libkeccak_generalised_spec gspec;
+  struct libkeccak_spec spec;
+  spec.bitrate = -1;
+  gspec.bitrate = -65536;
+  gspec.state_size = -65536;
+  gspec.word_size = -65536;
+  if (libkeccak_degeneralise_spec(&gspec, &spec))
+__builtin_abort ();
+  if (spec.bitrate != 1024)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr109564-2.c b/gcc/testsuite/gcc.dg/torture/pr109564-2.c
new file mode 100644
index 000..eeab437c0b3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr109564-2.c
@@ -0,0 +1,33 @@
+/* { dg-do run } */
+
+struct libkeccak_generalised_spec {
+  int state_size;
+  int word_size;
+} main_gspec;
+
+long gvar;
+
+int libkeccak_degeneralise_spec(struct libkeccak_generalised_spec *spec)
+{
+  int state_size;
+  int have_state_size = spec->state_size != -1;
+  int have_word_size = spec->word_size;
+
+  if (have_state_size)
+state_size = spec->state_size;
+  if (have_word_size)
+gvar = 12345;
+  if (have_state_size && state_size != spec->word_size)
+return 1;
+  if (spec)
+gvar++;
+  return 0;
+}
+
+int main()
+{
+  main_gspec.state_size = -1;
+  if (libkeccak_degeneralise_spec(&main_gspec))
+__built

Re: [PATCH] c: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device [PR107041]

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 12:48:57PM -0400, Marek Polacek wrote:
> > -  else if (enum_and_int_p && TREE_CODE (newdecl) != TYPE_DECL)
> > +  else if (enum_and_int_p
> > +  && TREE_CODE (newdecl) != TYPE_DECL
> > +  /* Don't warn about about acc_on_device builtin redeclaration,
> 
> "built-in"
> 
> > + the builtin is declared with int rather than enum because
> 
> "built-in"

Changing.
> 
> > + the enum isn't intrinsic.  */
> > +  && !(TREE_CODE (olddecl) == FUNCTION_DECL
> > +   && fndecl_built_in_p (olddecl, BUILT_IN_ACC_ON_DEVICE)
> > +   && !C_DECL_DECLARED_BUILTIN (olddecl)))
> 
> What do you think about adding an (UN)LIKELY here?  This seems a rather
> very special case.  On the other hand we're not on a hot path here so it
> hardly matters.

If anything, I'd add it either as UNLIKELY (enum_and_int_p) because that
whole thing is unlikely, or add UNLIKELY (flag_openacc) && before this
acc_on_device stuff (but then users of -fopenacc might complain that it is
likely for them).

Jakub

Re: [PATCH] c: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device [PR107041]

2023-04-20 Thread Marek Polacek via Gcc-patches

On Thu, Apr 20, 2023 at 07:24:29PM +0200, Jakub Jelinek wrote:
> On Thu, Apr 20, 2023 at 12:48:57PM -0400, Marek Polacek wrote:
> > > -  else if (enum_and_int_p && TREE_CODE (newdecl) != TYPE_DECL)
> > > +  else if (enum_and_int_p
> > > +&& TREE_CODE (newdecl) != TYPE_DECL
> > > +/* Don't warn about about acc_on_device builtin redeclaration,
> > 
> > "built-in"
> > 
> > > +   the builtin is declared with int rather than enum because
> > 
> > "built-in"
> 
> Changing.
> > 
> > > +   the enum isn't intrinsic.  */
> > > +&& !(TREE_CODE (olddecl) == FUNCTION_DECL
> > > + && fndecl_built_in_p (olddecl, BUILT_IN_ACC_ON_DEVICE)
> > > + && !C_DECL_DECLARED_BUILTIN (olddecl)))
> > 
> > What do you think about adding an (UN)LIKELY here?  This seems a rather
> > very special case.  On the other hand we're not on a hot path here so it
> > hardly matters.
> 
> If anything, I'd add it either as UNLIKELY (enum_and_int_p) because that
> whole thing is unlikely,

Might as well.

> or add UNLIKELY (flag_openacc) && before this
> acc_on_device stuff (but then users of -fopenacc might complain that it is
> likely for them).

Ok.

Marek

Re: [PATCH] MAINTAINERS: add Vineet Gupta to write after approval

2023-04-20 Thread Palmer Dabbelt


On Thu, 20 Apr 2023 09:55:23 PDT (-0700), Vineet Gupta wrote:

ChangeLog:

* MAINTAINERS (Write After Approval): Add myself.

(Ref: <680c7bbe-5d6e-07cd-8468-247afc65e...@gmail.com>)

Signed-off-by: Vineet Gupta 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cebf45d49e56..5f25617212a5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -434,6 +434,7 @@ Haochen Gui 
 Jiufu Guo  
 Xuepeng Guo
 Wei Guozhi 
+Vineet Gupta   
 Naveen H.S 
 Mostafa Hagog  
 Andrew Haley   


Acked-by: Palmer Dabbelt 

Though not sure if I can do that, maybe we need a global reviewer?

Re: [PATCH] PR tree-optimization/109564 - Do not ignore UNDEFINED ranges when determining PHI equivalences.

2023-04-20 Thread Richard Biener via Gcc-patches




> Am 20.04.2023 um 19:22 schrieb Andrew MacLeod :
> 
> This removes special-casing of UNDEFINED ranges when we are checking to see if
> all arguments are the same and registering an equivalence.
> 
> Previously, if there were two different names and one was undefined, we ignored
> it and created an equivalence with the other one.  As observed, this is not a two-
> way relationship, and as such we shouldn't do it this way.  This removes the
> bypass for undefined ranges when checking whether arguments are the same symbol.
> 
> Bootstrapped/regtested successfully on x86_64-linux and i686-linux.  OK for 
> trunk?

LGTM

Richard.

> Andrew
> 
> 
> <564.patch>

[PATCH] tree-vect-patterns: One small vect_recog_ctz_ffs_pattern tweak [PR109011]

2023-04-20 Thread Jakub Jelinek via Gcc-patches

Hi!

I've noticed I made a typo: this late in the function, ifn is always
only IFN_CTZ or IFN_FFS, never IFN_CLZ.

Due to this typo, we weren't using the originally intended
.CTZ (X) = .POPCOUNT ((X - 1) & ~X)
but
.CTZ (X) = PREC - .POPCOUNT (X | -X)
instead when we want to emit __builtin_ctz*/.CTZ using .POPCOUNT.
Both compute the same value, both are defined at 0 with the
same value (PREC), and both have the same number of GIMPLE statements,
but I think the former ought to be preferred, because lots of targets
have andn as a single operation rather than two, and also putting
a -1 constant into a vector register is often cheaper than putting in
a vector with the power-of-two value PREC broadcast.
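The two identities can be checked with a small scalar C sketch, assuming a 32-bit PREC and using a portable loop (`popcount32`, a hypothetical helper) in place of .POPCOUNT. Both agree with a reference count-trailing-zeros, including the value PREC at zero; the `(x - 1) & ~x` form is the one that maps onto an and-not instruction.

```c
#include <stdint.h>

#define PREC 32

/* Portable population count: clear the lowest set bit until none remain.  */
static int
popcount32 (uint32_t x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}

/* Reference count-trailing-zeros, defined as PREC for x == 0.  */
static int
ctz_ref (uint32_t x)
{
  if (x == 0)
    return PREC;
  int n = 0;
  while (!(x & 1))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Preferred form: (x - 1) & ~x sets exactly the trailing-zero bits
   (all PREC bits when x == 0).  */
static int
ctz_via_andn (uint32_t x)
{
  return popcount32 ((x - 1) & ~x);
}

/* Alternative form: x | -x sets every bit from the lowest set bit up,
   so the remaining zeros are the trailing zeros.  */
static int
ctz_via_or_neg (uint32_t x)
{
  return PREC - popcount32 (x | -x);
}
```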

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-04-20  Jakub Jelinek  

PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): Use
.CTZ (X) = .POPCOUNT ((X - 1) & ~X) in preference to
.CTZ (X) = PREC - .POPCOUNT (X | -X).

--- gcc/tree-vect-patterns.cc.jj	2023-04-20 11:55:03.576154120 +0200
+++ gcc/tree-vect-patterns.cc   2023-04-20 12:09:17.884633795 +0200
@@ -1630,7 +1630,7 @@ vect_recog_ctz_ffs_pattern (vec_info *vi
&& defined_at_zero_new
&& val == prec
&& val_new == prec)
-  || (ifnnew == IFN_POPCOUNT && ifn == IFN_CLZ))
+  || (ifnnew == IFN_POPCOUNT && ifn == IFN_CTZ))
 {
   /* .CTZ (X) = PREC - .CLZ ((X - 1) & ~X)
 .CTZ (X) = .POPCOUNT ((X - 1) & ~X).  */

Jakub

Re: [PATCH v6] RISC-V: Add support for experimental zfa extension.

2023-04-20 Thread Jeff Law via Gcc-patches





On 3/10/23 05:40, Jin Ma via Gcc-patches wrote:

This patch adds the 'Zfa' extension for riscv, which is based on:
  
https://github.com/riscv/riscv-isa-manual/commit/d74d99e22d5f68832f70982d867614e2149a3bd7
latest 'Zfa' change on the master branch of the RISC-V ISA Manual as
of this writing.

The Wiki Page (details):
  https://github.com/a4lg/binutils-gdb/wiki/riscv_zfa

The binutils-gdb for 'Zfa' extension:
  https://sourceware.org/pipermail/binutils/2022-September/122938.html

Implementation of zfa extension on LLVM:
   https://reviews.llvm.org/rGc0947dc44109252fcc0f68a542fc6ef250d4d3a9

There are three points that need to be discussed here.
1. According to riscv-spec, "The FCVTMOD.W.D instruction was added principally to
   accelerate the processing of JavaScript Numbers.", so it seems that no implementation
   is required in the compiler.
2. The FROUND and FROUNDNX instructions in this patch use related functions in the math
   library, such as round, floor, ceil, etc. Since there is no interface for half-precision in
   the math library, the instructions FROUND.H and FROUNDNX.H have not been implemented for
   the time being. Is it necessary to add a built-in interface belonging to riscv such as
   __builtin_roundhf or __builtin_roundf16 to generate half floating point instructions?
3. As far as I know, the FMINM and FMAXM instructions correspond to the C23 library functions
   fminimum and fmaximum. Therefore, I have not dealt with such instructions for the time being,
   but have simply implemented the pattern of fminm3 and fmaxm3. Is it necessary to
   add a built-in interface belonging to riscv such as __builtin_fminm to generate half
   floating-point instructions?
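For reference on point 3, a portable C sketch of the C23 fminimum/fmaximum semantics, under the assumption (taken from the text above) that FMINM/FMAXM match them: unlike C99 fmin/fmax, which return the non-NaN operand, a NaN operand makes the result NaN, and -0.0 orders below +0.0. The `_sketch` functions are hypothetical and are not the patterns in riscv.md.

```c
#include <math.h>

/* C23-style minimum: NaN propagates, and -0.0 counts as less than +0.0.  */
static double
fminimum_sketch (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;               /* any arithmetic on a NaN yields a NaN */
  if (x == y)                   /* also true for -0.0 == +0.0 */
    return signbit (x) ? x : y;
  return x < y ? x : y;
}

/* C23-style maximum: NaN propagates, and +0.0 counts as greater than -0.0.  */
static double
fmaximum_sketch (double x, double y)
{
  if (isnan (x) || isnan (y))
    return x + y;
  if (x == y)
    return signbit (x) ? y : x;
  return x > y ? x : y;
}
```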

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zfa extension.
* config/riscv/constraints.md (Zf): Constrain the floating point number 
that the FLI instruction can load.
* config/riscv/iterators.md (round_pattern): New.
* config/riscv/predicates.md: Predicate the floating point number that 
the FLI instruction can load.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): 
Get the index of the
   floating-point number that the FLI instruction can load.
* config/riscv/riscv.cc (find_index_in_array): New.
(riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): Likewise.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): Exclude floating point numbers that can be loaded by FLI instructions.
(riscv_output_move): Likewise.
(riscv_memmodel_needs_release_fence): Likewise.
(riscv_print_operand): Likewise.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.h (GP_REG_RTX_P): New.
* config/riscv/riscv.md (fminm3): New.
(fmaxm3): New.
(2): New.
(rint2): New.
(f_quiet4_zfa): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zfa-fleq-fltq-rv32.c: New test.
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh-rv32.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp-rv32.c: New test.
* gcc.target/riscv/zfa-fround-rv32.c: New test.
* gcc.target/riscv/zfa-fround.c: New test.
---
  gcc/common/config/riscv/riscv-common.cc   |   4 +
  gcc/config/riscv/constraints.md   |   7 +
  gcc/config/riscv/iterators.md |   5 +
  gcc/config/riscv/predicates.md|   4 +
  gcc/config/riscv/riscv-opts.h |   3 +
  gcc/config/riscv/riscv-protos.h   |   1 +
  gcc/config/riscv/riscv.cc | 168 +-
  gcc/config/riscv/riscv.h  |   1 +
  gcc/config/riscv/riscv.md | 112 +---
  .../gcc.target/riscv/zfa-fleq-fltq-rv32.c |  19 ++
  .../gcc.target/riscv/zfa-fleq-fltq.c  |  19 ++
  gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c |  79 
  .../gcc.target/riscv/zfa-fli-zfh-rv32.c   |  41 +
  gcc/testsuite/gcc.target/riscv/zfa-fli-zfh.c  |  41 +
  gcc/testsuite/gcc.target/riscv/zfa-fli.c  |  79 
  .../gcc.target/riscv/zfa-fmovh-fmovp-rv32.c   |  10 ++
  .../gcc.target/riscv/zfa-fround-rv32.c|  42 +
  gcc/testsuite/gcc.target/riscv/zfa-fround.c   |  42 +
  18 files changed, 654 insertions(+), 23 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fleq-fltq.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-rv32.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/zfa-fli-zfh-rv32.c
  create mode

Re: [PATCH] tree-vect-patterns: One small vect_recog_ctz_ffs_pattern tweak [PR109011]

2023-04-20 Thread Richard Biener via Gcc-patches




> Am 20.04.2023 um 19:40 schrieb Jakub Jelinek via Gcc-patches 
> :
> 
> Hi!
> 
> I've noticed I made a typo: this late in the function, ifn is always
> only IFN_CTZ or IFN_FFS, never IFN_CLZ.
> 
> Due to this typo, we weren't using the originally intended
> .CTZ (X) = .POPCOUNT ((X - 1) & ~X)
> but
> .CTZ (X) = PREC - .POPCOUNT (X | -X)
> instead when we want to emit __builtin_ctz*/.CTZ using .POPCOUNT.
> Both compute the same value, both are defined at 0 with the
> same value (PREC), and both have the same number of GIMPLE statements,
> but I think the former ought to be preferred, because lots of targets
> have andn as a single operation rather than two, and also putting
> a -1 constant into a vector register is often cheaper than putting in
> a vector with the power-of-two value PREC broadcast.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

> 2023-04-20  Jakub Jelinek  
> 
>PR tree-optimization/109011
>* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): Use
>.CTZ (X) = .POPCOUNT ((X - 1) & ~X) in preference to
>.CTZ (X) = PREC - .POPCOUNT (X | -X).
> 
> --- gcc/tree-vect-patterns.cc.jj	2023-04-20 11:55:03.576154120 +0200
> +++ gcc/tree-vect-patterns.cc	2023-04-20 12:09:17.884633795 +0200
> @@ -1630,7 +1630,7 @@ vect_recog_ctz_ffs_pattern (vec_info *vi
>&& defined_at_zero_new
>&& val == prec
>&& val_new == prec)
> -  || (ifnnew == IFN_POPCOUNT && ifn == IFN_CLZ))
> +  || (ifnnew == IFN_POPCOUNT && ifn == IFN_CTZ))
> {
>   /* .CTZ (X) = PREC - .CLZ ((X - 1) & ~X)
> .CTZ (X) = .POPCOUNT ((X - 1) & ~X).  */
> 
>Jakub
>

Re: [RFC PATCH v1 02/10] RISC-V: Recognize Zicond (conditional operations) extension

2023-04-20 Thread Jeff Law via Gcc-patches





On 2/10/23 15:41, Philipp Tomsich wrote:

This adds the RISC-V Zicond extension to the option parsing.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Recognize "zicond"
as part of an architecture string.
* config/riscv/riscv-opts.h (MASK_ZICOND): Define.
(TARGET_ZICOND): Define.

OK
jeff

Re: [RFC PATCH v1 03/10] RISC-V: Generate czero.eqz/nez on noce_try_store_flag_mask if-conversion

2023-04-20 Thread Jeff Law via Gcc-patches





On 2/10/23 15:41, Philipp Tomsich wrote:

Adds a pattern to map the output of noce_try_store_flag_mask
if-conversion in the combiner onto vt.maskc; the input patterns
supported are similar to the following:
   (set (reg/v/f:DI 75 [  ])
(and:DI (neg:DI (ne:DI (reg:DI 82)
   (const_int 0 [0])))
   (reg/v/f:DI 75 [  ])))

To ensure that the combine-pass doesn't get confused about
profitability, we recognize the idiom as requiring a single
instruction when the Zicond extension is present.
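The equivalence this pattern relies on can be written out in C. A minimal sketch, assuming 64-bit registers and the Zicond definition of czero.eqz (rd = 0 if rs2 is zero, else rs1); the function names are hypothetical: negating the 0/1 result of (ne c 0) yields an all-zeros or all-ones mask, so the (and (neg ...)) idiom selects y or 0 exactly as a single czero.eqz does.

```c
#include <stdint.h>

/* Zicond czero.eqz rd, rs1, rs2: rd = 0 if rs2 == 0, else rs1.  */
static int64_t
czero_eqz (int64_t rs1, int64_t rs2)
{
  return rs2 == 0 ? 0 : rs1;
}

/* The combiner input (and (neg (ne c 0)) y): -(c != 0) is 0 or -1
   (all ones), so the AND yields either 0 or y.  */
static int64_t
store_flag_mask (int64_t c, int64_t y)
{
  return -(int64_t) (c != 0) & y;
}
```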

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_rtx_costs): Recognize the idiom
for conditional-zero as a single instruction for TARGET_ZICOND.
* config/riscv/riscv.md: Include zicond.md.
* config/riscv/zicond.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ne-03.c: New test.
* gcc.target/riscv/zicond-ne-04.c: New test.

As we've discussed earlier on the list, conceptually I think we've
agreed that an if-then-else style of conditional zero is probably a
better model.


So that will have some impact on this patch since it digs into the RTL 
looking for the (and (neg ...)) form.   But I don't think it changes 
anything conceptually in this patch, just the implementation details.


So I'm OK with this patch once it's updated for the updated form we want 
to be using.


jeff

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Siddhesh Poyarekar


On 2023-04-20 11:52, Jakub Jelinek wrote:

Why?  Unless an implementation guarantees errors of <= 0.5 ulps, it can be one
or more ulps off, so why is an error at or near 1.0 or -1.0 any worse
than similar errors for other values?


In a general sense, maybe not, but in the sense of breaching the bounds 
of admissible values, especially when it can be reasonably corrected, it 
seems worse IMO to let the error slide.



Similarly for other functions which have other ranges, perhaps not with such
nice round numbers.  Say asin has the range [-pi/2, pi/2]; those numbers aren't
exactly representable, but is it any worse to round those values toward -inf or
+inf, or worse, give something 1-5 ulps further from that interval, compared
to other 1-5 ulp errors?


I agree the argument in favour of allowing errors breaching the 
mathematical bounds is certainly stronger for bounds that are not 
exactly representable.  I just feel like the implementation ought to 
take the additional effort when the bounds are representable and make it 
easier for the compiler.


For bounds that aren't representable, one could get error bounds from 
libm-test-ulps data in glibc, although I reckon those won't be 
exhaustive.  From a quick peek at the sin/cos data, the arc target seems 
to be among the worst performers at about 7ulps, although if you include 
the complex routines we get close to 13 ulps.  The very worst 
imprecision among all math routines (that's gamma) is at 16 ulps for 
power in glibc tests, so maybe allowing about 25-30 ulps error in bounds 
might work across the board.
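To make the "N ulps of slack" idea concrete, here is a minimal sketch that widens a representable bound such as [-1.0, 1.0] outward by a given number of ulps. The helpers are hypothetical (not GCC's frange or range-op API) and assume IEEE 754 binary64 doubles and finite inputs; stepping is done on the bit pattern so no libm call is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Next double above X, assuming X is finite IEEE 754 binary64.  */
static double next_up (double x)
{
  uint64_t bits;

  if (x == 0.0)
    bits = 1;                   /* Smallest positive subnormal.  */
  else
    {
      memcpy (&bits, &x, sizeof bits);
      if (x > 0.0)
        bits++;                 /* Larger magnitude, larger value.  */
      else
        bits--;                 /* Smaller magnitude, toward zero.  */
    }
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Next double below X.  */
static double next_down (double x)
{
  return -next_up (-x);
}

/* Widen [*LO, *HI] outward by ULPS representable steps on each side.  */
static void widen_by_ulps (double *lo, double *hi, int ulps)
{
  assert (ulps >= 0);
  for (int i = 0; i < ulps; i++)
    {
      *lo = next_down (*lo);
      *hi = next_up (*hi);
    }
}
```

With a slack of 25 ulps, for example, the upper bound for sin/cos would move 25 representable steps above 1.0, and the lower bound 25 steps below -1.0.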


But yeah, it's probably going to be guesswork.

Thanks,
Sid

Re: [RFC PATCH v1 04/10] RISC-V: Support immediates in Zicond

2023-04-20 Thread Jeff Law via Gcc-patches





On 2/10/23 15:41, Philipp Tomsich wrote:

When if-conversion encounters sequences using immediates, the
sequences can't trivially map back onto czero.eqz/czero.nez (even if
beneficial) due to czero.eqz/czero.nez not having immediate forms.

This adds a splitter to rewrite opportunities for Zicond that operate
on an immediate by first putting the immediate into a register to
enable the non-immediate czero.eqz/czero.nez instructions to operate
on the value.

Consider code, such as

   long func2 (long a, long c)
   {
     if (c)
       a = 2;
     else
       a = 5;
     return a;
   }

which will be converted to

   func2:
        seqz      a0,a2
        neg       a0,a0
        andi      a0,a0,3
        addi      a0,a0,2
        ret

Following this change, we generate

        li        a0,3
        czero.nez a0,a0,a2
        addi      a0,a0,2
        ret

This commit also introduces a simple unit test for if-conversion with
immediate (literal) values as the sources for simple sets in the THEN
and ELSE blocks. The test checks that the conditional-zero instruction
(czero.eqz/nez) is emitted as part of the resulting branchless
instruction sequence.
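The two sequences above can be compared with a C model. This is a sketch with hypothetical helper names; it assumes Zicond's czero.nez rd, rs1, rs2 gives rd = (rs2 != 0) ? 0 : rs1, and it mirrors each instruction line by line. The immediate 3 is the difference of the two arms (5 - 2), which is why it must first be loaded into a register with li before czero.nez can use it.

```c
#include <assert.h>

/* Model of Zicond czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
static long czero_nez (long rs1, long rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* func2 from the message: 2 if c is nonzero, 5 otherwise.  */
static long func2_ref (long c)
{
  return c ? 2 : 5;
}

/* Base-ISA sequence: seqz, neg, andi 3, addi 2.  */
static long func2_seqz (long c)
{
  long t = (c == 0);    /* seqz: 1 if c == 0, else 0.  */
  t = -t;               /* neg: all ones if c == 0, else 0.  */
  t &= 3;               /* andi: 3 if c == 0, else 0.  */
  return t + 2;         /* addi.  */
}

/* Zicond sequence: the immediate is loaded into a register first.  */
static long func2_zicond (long c)
{
  long t = 3;              /* li.  */
  t = czero_nez (t, c);    /* czero.nez: 0 if c != 0.  */
  return t + 2;            /* addi.  */
}

/* Both sequences must match the reference semantics.  */
static void check_func2 (long c)
{
  assert (func2_seqz (c) == func2_ref (c));
  assert (func2_zicond (c) == func2_ref (c));
}
```

The Zicond form saves one instruction because the seqz/neg/andi mask construction collapses into li plus a single czero.nez.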

gcc/ChangeLog:

* config/riscv/zicond.md: Support immediates for
czero.eqz/czero.nez through a splitter.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zicond-ifconv-imm.c: New test.

Same comment & resolution as with patch #3 in this series.

A note though.  I've got Raphael looking at wiring this capability into 
the movcc expander as well.  When complete that *may* make this 
patch largely obsolete.  But I don't think we necessarily need to wait 
for that to work to land this patch.


jeff

Re: [PATCH] doc: tfix

2023-04-20 Thread Arsen Arsenović via Gcc-patches


Alejandro Colomar via Gcc-patches  writes:

> Remove repeated word (typo).
>
> Signed-off-by: Alejandro Colomar 
> ---
>  gcc/doc/extend.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index fd3745c5608..cdfb25ff272 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3756,7 +3756,7 @@ take function pointer arguments.
>  The @code{optimize} attribute is used to specify that a function is to
>  be compiled with different optimization options than specified on the
>  command line.  The optimize attribute arguments of a function behave
> -behave as if appended to the command-line.
> +as if appended to the command-line.
>  
>  Valid arguments are constant non-negative integers and
>  strings.  Each numeric argument specifies an optimization @var{level}.

Please include a ChangeLog like the following:

gcc/ChangeLog:

* doc/extend.texi (Common Function Attributes): Remove duplicate
word.

I can add that and push for you, if you agree.

Thanks, have a most lovely night!
-- 
Arsen Arsenović



[PATCH] Fortran: function results never have the ALLOCATABLE attribute [PR109500]

2023-04-20 Thread Harald Anlauf via Gcc-patches

Dear all,

Fortran 2018 added a clarification that the *result* of a function
whose result *variable* has the ALLOCATABLE attribute is a *value*
that itself does not have the ALLOCATABLE attribute.

For those interested: there was a thread on the J3 mailing list
some time ago (for links see the PR).

The patch which implements a related check was co-authored with
Steve and regtested by him.  Testcase verified against NAG.

OK for mainline (gcc-14)?

Thanks,
Harald & Steve

From 2cebc8f9e7b399b7747c9ad0392831de91851b5b Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 20 Apr 2023 21:47:34 +0200
Subject: [PATCH] Fortran: function results never have the ALLOCATABLE
 attribute [PR109500]

Fortran 2018 8.5.3 (ALLOCATABLE attribute) explains in Note 1 that the
result of referencing a function whose result variable has the ALLOCATABLE
attribute is a value that does not itself have the ALLOCATABLE attribute.

gcc/fortran/ChangeLog:

	PR fortran/109500
	* interface.cc (gfc_compare_actual_formal): Reject allocatable
	functions being used as actual argument for allocatable dummy.

gcc/testsuite/ChangeLog:

	PR fortran/109500
	* gfortran.dg/allocatable_function_11.f90: New test.

Co-authored-by: Steven G. Kargl 
---
 gcc/fortran/interface.cc  | 12 +++
 .../gfortran.dg/allocatable_function_11.f90   | 36 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/allocatable_function_11.f90

diff --git a/gcc/fortran/interface.cc b/gcc/fortran/interface.cc
index e9843e9549c..968ee193c07 100644
--- a/gcc/fortran/interface.cc
+++ b/gcc/fortran/interface.cc
@@ -3638,6 +3638,18 @@ gfc_compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
 	  goto match;
 	}

+  if (a->expr->expr_type == EXPR_FUNCTION
+	  && a->expr->value.function.esym
+	  && f->sym->attr.allocatable)
+	{
+	  if (where)
+	gfc_error ("Actual argument for %qs at %L is a function result "
+		   "and the dummy argument is ALLOCATABLE",
+		   f->sym->name, &a->expr->where);
+	  ok = false;
+	  goto match;
+	}
+
   /* Check intent = OUT/INOUT for definable actual argument.  */
   if (!in_statement_function
 	  && (f->sym->attr.intent == INTENT_OUT
diff --git a/gcc/testsuite/gfortran.dg/allocatable_function_11.f90 b/gcc/testsuite/gfortran.dg/allocatable_function_11.f90
new file mode 100644
index 000..1a2831e186f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/allocatable_function_11.f90
@@ -0,0 +1,36 @@
+! { dg-do compile }
+! PR fortran/109500 - check F2018:8.5.3 Note 1
+!
+! The result of referencing a function whose result variable has the
+! ALLOCATABLE attribute is a value that does not itself have the
+! ALLOCATABLE attribute.
+
+program main
+  implicit none
+  integer, allocatable  :: p
+  procedure(f), pointer :: pp
+  pp => f
+  p = f()
+  print *, allocated (p)
+  print *, is_allocated (p)
+  print *, is_allocated (f())  ! { dg-error "is a function result" }
+  print *, is_allocated (pp()) ! { dg-error "is a function result" }
+  call s (p)
+  call s (f())  ! { dg-error "is a function result" }
+  call s (pp()) ! { dg-error "is a function result" }
+
+contains
+  subroutine s(p)
+    integer, allocatable :: p
+  end subroutine s
+
+  function f()
+    integer, allocatable :: f
+    allocate (f, source=42)
+  end function
+
+  logical function is_allocated(p)
+    integer, allocatable :: p
+    is_allocated = allocated(p)
+  end function
+end program
--
2.35.3

RISC-V: avoid splitting small constants in bclri_nottwobits patterns

2023-04-20 Thread Jivan Hakobyan via Gcc-patches

Hi all.

I have noticed that when we try to clear two bits using a small
constant and ZBS is enabled, GCC splits the operation into two "andi"
instructions.
For example for the following C code:
  int foo(int a) {
    return a & ~0x101;
  }

GCC generates the following:
  foo:
 andi a0,a0,-2
 andi a0,a0,-257
 ret

but should be this one:
  foo:
 andi a0,a0,-258
 ret


This patch solves the issue by keeping constants that already fit in an
"andi" immediate out of the *bclri_nottwobits splitters.
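The predicate logic can be modelled in plain C to see which constants the patterns still split. This is a sketch under stated assumptions: it uses the 64-bit (DImode) view of the constant, approximates arith_operand for a CONST_INT by the signed 12-bit immediate range of andi, and the function names are hypothetical rather than GCC internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable popcount for a 64-bit value.  */
static int popcount64 (uint64_t x)
{
  int n = 0;
  while (x)
    {
      x &= x - 1;   /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}

/* Stand-in for arith_operand on a CONST_INT: fits andi's simm12.  */
static bool fits_simm12 (int64_t v)
{
  return v >= -2048 && v <= 2047;
}

/* Stand-in for const_nottwobits_operand: exactly two bits are clear.  */
static bool nottwobits_p (int64_t v)
{
  return popcount64 (~(uint64_t) v) == 2;
}

/* Stand-in for the new const_nottwobits_not_arith_operand.  */
static bool nottwobits_not_arith_p (int64_t v)
{
  return nottwobits_p (v) && !fits_simm12 (v);
}

/* Constants from the message and the new test.  */
static void check_patch_constants (void)
{
  assert (nottwobits_p (~(int64_t) 0x101));             /* -258.  */
  assert (!nottwobits_not_arith_p (~(int64_t) 0x101));  /* One andi.  */
  assert (!nottwobits_not_arith_p (~(int64_t) 3));      /* One andi.  */
  assert (nottwobits_not_arith_p (~(int64_t) 0x4001));  /* Still split.  */
}
```

~0x101 is -258, which fits in the 12-bit immediate, so a single "andi a0,a0,-258" is enough; ~0x4001 does not fit, so the splitter still applies there.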


-- 
With the best regards
Jivan Hakobyan
RISC-V: avoid splitting small constant in *bclri_nottwobits
and *bclridisi_nottwobits patterns

gcc/
* config/riscv/bitmanip.md (*bclri_nottwobits, *bclridisi_nottwobits):
Update predicates.
* config/riscv/predicates.md (not_uimm_extra_bit_or_nottwobits):
Adjust predicate to avoid splitting arith constants.
(const_nottwobits_not_arith_operand): New predicate.

gcc/testsuite
* gcc.target/riscv/zbs-bclri-nottwobits.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 388ef662820..f3d29a466e7 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -507,7 +507,7 @@
 (define_insn_and_split "*bclri_nottwobits"
   [(set (match_operand:X 0 "register_operand" "=r")
 	(and:X (match_operand:X 1 "register_operand" "r")
-	   (match_operand:X 2 "const_nottwobits_operand" "i")))]
+	   (match_operand:X 2 "const_nottwobits_not_arith_operand" "i")))]
   "TARGET_ZBS && !paradoxical_subreg_p (operands[1])"
   "#"
   "&& reload_completed"
@@ -526,7 +526,7 @@
 (define_insn_and_split "*bclridisi_nottwobits"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(and:DI (match_operand:DI 1 "register_operand" "r")
-		(match_operand:DI 2 "const_nottwobits_operand" "i")))]
+		(match_operand:DI 2 "const_nottwobits_not_arith_operand" "i")))]
   "TARGET_64BIT && TARGET_ZBS
&& clz_hwi (~UINTVAL (operands[2])) > 33"
   "#"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc5943..e5adf06fa25 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -366,6 +366,11 @@
   (and (match_code "const_int")
(match_test "popcount_hwi (~UINTVAL (op)) == 2")))
 
+(define_predicate "const_nottwobits_not_arith_operand"
+  (and (match_code "const_int")
+   (and (not (match_operand 0 "arith_operand"))
+	(match_operand 0 "const_nottwobits_operand"))))
+
 ;; A CONST_INT operand that consists of a single run of 32 consecutive
 ;; set bits.
 (define_predicate "consecutive_bits32_operand"
@@ -411,4 +416,4 @@
 (define_predicate "not_uimm_extra_bit_or_nottwobits"
   (and (match_code "const_int")
(ior (match_operand 0 "not_uimm_extra_bit_operand")
-	(match_operand 0 "const_nottwobits_operand"))))
+	(match_operand 0 "const_nottwobits_not_arith_operand"))))
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bclri-nottwobits.c b/gcc/testsuite/gcc.target/riscv/zbs-bclri-nottwobits.c
new file mode 100644
index 000..5a58e0a1185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bclri-nottwobits.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+int and_two_bit(int idx) {
+  return idx & ~3;
+}
+
+int and_bclr_two_bit(int idx) {
+  return idx & ~(0x4001);
+}
+
+/* { dg-final { scan-assembler-times "\tandi\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tbclri\t" 1 } } */

Re: [PATCH] update_web_docs_git: Add updated Texinfo to PATH

2023-04-20 Thread Gerald Pfeifer

Hi Arsen,

On Fri, 14 Apr 2023, Arsen Arsenović wrote:
>> Did you intentionally not implement the following part of my suggestion
>>
>>if [ x${MAKEINFO}x = xx ]; then
>>:
> > that is, allowing to override from the command-line (or crontab)?
> (answering both the questions)
> 
> This := operator is a handy "default assign" operator.  It's a bit of an
> oddity of the POSIX shell, but it works well.  The line:
> 
>   : "${foo:=bar}"
> 
> is a convenient way of spelling "if foo is unset or null, set it to
> bar".  the initial ':' there serves to discard the result of this
> evaluation (so that only its side effect of updating foo if necessary is
> kept)

I understand, just am wondering whether and why the : is required? I 
don't think we are using this construct anywhere else?

(I was aware of the ${foo:=bar} syntax, just caught up by you pushing
that part of the logic to the lowest level whereas I had it at the top
level. That's purely on me.)

Please go ahead and push this (or a variant without the : commands) and
I'll then pick it up from there.

Gerald

Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-20 Thread Michael Collison


Hi Kito,

I will remove the unused UNSPECs, thank you for finding them.

I removed the include of "vector-iterators.md" because "riscv.md" 
already includes it and I was receiving multiple definition errors.


On 4/18/23 21:19, Kito Cheng wrote:

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70ad85b661b..7fae87968d7 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -34,6 +34,8 @@
UNSPEC_VMULHU
UNSPEC_VMULHSU

+  UNSPEC_VADD
+  UNSPEC_VSUB

Defined but unused?


UNSPEC_VADC
UNSPEC_VSBC
UNSPEC_VMADC
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
  ;; - Auto-vectorization (TBD)
  ;; - Combine optimization (TBD)

-(include "vector-iterators.md")
-

Why remove this?

Re: [PATCH] update_web_docs_git: Add updated Texinfo to PATH

2023-04-20 Thread Arsen Arsenović via Gcc-patches


Gerald Pfeifer  writes:

> Hi Arsen,
>
> On Fri, 14 Apr 2023, Arsen Arsenović wrote:
>>> Did you intentionally not implement the following part of my suggestion
>>>
>>>if [ x${MAKEINFO}x = xx ]; then
>>>:
>> > that is, allowing to override from the command-line (or crontab)?
>> (answering both the questions)
>> 
>> This := operator is a handy "default assign" operator.  It's a bit of an
>> oddity of the POSIX shell, but it works well.  The line:
>> 
>>   : "${foo:=bar}"
>> 
>> is a convenient way of spelling "if foo is unset or null, set it to
>> bar".  the initial ':' there serves to discard the result of this
>> evaluation (so that only its side effect of updating foo if necessary is
>> kept)
>
> I understand, just am wondering whether and why the : is required? I 
> don't think we are using this construct anywhere else?

Without them, this would happen:

  ~$ "${foo:=foo}"
  bash: foo: command not found
  ~ 127 $ unset foo
  ~$ echo "${foo:=foo}"
  foo
  ~$ 

> (I was aware of the ${foo:=bar} syntax, just caught up by you pushing
> that part of the logic to the lowest level whereas I had it at the top
> level. That's purely on me.)
>
> Please go ahead and push this (or a variant without the : commands) and
> I'll then pick it up from there.

Thank you!  Hopefully we get this just in time for 13 :)

Pushed.
-- 
Arsen Arsenović



Re: [PATCH v3 3/4] ree: Main functionality to Improve ree pass for rs6000 target

2023-04-20 Thread Ajit Agarwal via Gcc-patches

Hello Jeff:

On 20/04/23 3:23 am, Jeff Law wrote:
> 
> 
> On 4/19/23 12:00, Ajit Agarwal wrote:
>> Hello All:
>>
>> This is patch-3 to improve ree pass for rs6000 target.
>> Main functionality routines to imprve ree pass.
>>
>> Bootstrapped and regtested on powerpc64-gnu-linux.
>>
>> Thanks & Regards
>> Ajit
>>
>> ree: Improve ree pass for rs6000 target.
>>
>> For rs6000 target we see redundant zero and sign
>> extension and done to improve ree pass to eliminate
>> such redundant zero and sign extension. Support of
>> zero_extend/sign_extend/AND.
>>
>> 2023-04-19  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
>> elimination across and within basic blocks.
>> (def_arith_p): New function to check definition has arithmetic
>> operation.
>> (combine_set_extension): Modification to incorporate AND
>> and current zero_extend and sign_extend instruction.
>> (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
>> zero_extend sign_extend and AND instruction.
>> (rtx_is_zext_p): New function.
>> (reg_used_set_between_p): New function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * g++.target/powerpc/zext-elim.C: New testcase.
>> * g++.target/powerpc/zext-elim-1.C: New testcase.
>> * g++.target/powerpc/zext-elim-2.C: New testcase.
>> * g++.target/powerpc/sext-elim.C: New testcase.
>> ---
>>   gcc/ree.cc    | 451 --
>>   gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
>>   .../g++.target/powerpc/zext-elim-1.C  |  19 +
>>   .../g++.target/powerpc/zext-elim-2.C  |  11 +
>>   gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
>>   5 files changed, 482 insertions(+), 47 deletions(-)
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
>>
>> diff --git a/gcc/ree.cc b/gcc/ree.cc
>> index 413aec7c8eb..053db2e8ff3 100644
>> --- a/gcc/ree.cc
>> +++ b/gcc/ree.cc
>> @@ -253,6 +253,71 @@ struct ext_cand
>>     static int max_insn_uid;
>>   +bool
>> +reg_used_set_between_p (rtx set, rtx_insn *def_insn, rtx_insn *insn
>> +{
>> +  if (reg_used_between_p (set, def_insn, insn)
>> +  || reg_set_between_p (set, def_insn, insn))
>> +    return true;
>> +
>> +  return false;
>> +}
> This seems general enough that it should go into the same file as 
> reg_used_between_p and reg_set_between_p.  It needs a function comment as 
> well.
> 
> 
>> +static unsigned int
>> +rtx_is_zext_p (rtx insn)
>> +{
>> +  if (GET_CODE (insn) == AND)
>> +    {
>> +  rtx set = XEXP (insn, 0);
>> +  if (REG_P (set))
>> +    {
>> +  if (XEXP (insn, 1) == const1_rtx)
>> +    return 1;
>> +    }
>> +  else
>> +    return 0;
>> +    }
>> +
>> +  return 0;
>> +}
> So my comment from the prior version stands.  Testing for const1_rtx is just 
> wrong.  The optimization you're trying to perform (If I understand it 
> correctly) works for many other constants and the set of constants supported 
> will vary based on the input and output modes.
> 
> Similarly in rtx_is_zext_p.
> 
> You still have numerous formatting issues which makes reading the patch more 
> difficult than it should be.  Please review the formatting guidelines and 
> follow them.   In particular please review how to indent multi-line 
> conditionals.
> 
> 

Currently I support AND with const1_rtx, which is equivalent to the zero
extension instruction in the Power instruction set. Regarding the many other
constants you mention, could you please specify which other constants need
to be supported and how to determine them from the input and output modes?
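Jeff's point that the usable constants depend on the input and output modes can be illustrated with a hypothetical helper (not part of ree.cc or the patch). An AND with 0xff, 0xffff or 0xffffffff behaves like a zero extension from QImode, HImode or SImode, but only when the result mode is wider than that source mode; the sketch assumes only those three source widths matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: if (and:M x MASK), in a mode OUT_BITS wide, keeps
   exactly the low W bits with W in {8, 16, 32} and W < OUT_BITS, return W
   (the AND acts as a zero extension from that narrower mode).  Otherwise
   return 0.  */
static int and_mask_zext_width (uint64_t mask, int out_bits)
{
  static const int widths[] = { 8, 16, 32 };

  assert (out_bits > 0 && out_bits <= 64);
  for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++)
    {
      int w = widths[i];
      if (w < out_bits && mask == (((uint64_t) 1 << w) - 1))
        return w;
    }
  return 0;
}
```

The same mask can qualify in one output mode and not in another (0xffffffff in a 64-bit mode versus a 32-bit mode), which is why a single fixed constant test cannot cover the general case.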
> 
> 
> 
> You sti
>> @@ -698,6 +777,226 @@ get_sub_rtx (rtx_insn *def_insn)
>>     return sub_rtx;
>>   }
>>   +/* Check if the def insn is ASHIFT and LSHIFTRT.
>> +  Inputs: insn for which def has to be checked.
>> +  source operand rtx.
>> +   Output: True or false if def has arithmetic
>> +   peration like ASHIFT and LSHIFTRT.  */
> This still needs work.  Between the comments and code, I still don't know 
> what you're really trying to do here.  I can make some guesses, but it's 
> really your job to write clear comments about what you're doing so that a 
> review or someone looking at the code in the future don't have to guess.
> 
> It looks like you want to look at all the reaching definitions of INSN for 
> ORIG_SRC and if they are ASHIFT/LSHIFTRT do...  what?
> 
> Why are ASHIFT/LSHIFTRT interesting here?  Why are you looking for them?
> 
> 
> 
>> +
>> +/* Find feasibility of extension elimination
>> +   across basic blocks.
>> +   Input: candiate to check the feasibility.
>> +  def_insn of candidate.
>> +   Output: Returns true or false if feasible or not.  */
> Function comments sh

Re: [PATCH v3 3/4] ree: Main functionality to Improve ree pass for rs6000 target

2023-04-20 Thread Ajit Agarwal via Gcc-patches

Hello Jeff:


On 21/04/23 2:33 am, Ajit Agarwal via Gcc-patches wrote:
> Hello Jeff:
> 
> On 20/04/23 3:23 am, Jeff Law wrote:
>>
>>
>> On 4/19/23 12:00, Ajit Agarwal wrote:
>>> Hello All:
>>>
>>> This is patch-3 to improve ree pass for rs6000 target.
>>> Main functionality routines to imprve ree pass.
>>>
>>> Bootstrapped and regtested on powerpc64-gnu-linux.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>> ree: Improve ree pass for rs6000 target.
>>>
>>> For rs6000 target we see redundant zero and sign
>>> extension and done to improve ree pass to eliminate
>>> such redundant zero and sign extension. Support of
>>> zero_extend/sign_extend/AND.
>>>
>>> 2023-04-19  Ajit Kumar Agarwal  
>>>
>>> gcc/ChangeLog:
>>>
>>> * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
>>> elimination across and within basic blocks.
>>> (def_arith_p): New function to check definition has arithmetic
>>> operation.
>>> (combine_set_extension): Modification to incorporate AND
>>> and current zero_extend and sign_extend instruction.
>>> (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
>>> zero_extend sign_extend and AND instruction.
>>> (rtx_is_zext_p): New function.
>>> (reg_used_set_between_p): New function.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.target/powerpc/zext-elim.C: New testcase.
>>> * g++.target/powerpc/zext-elim-1.C: New testcase.
>>> * g++.target/powerpc/zext-elim-2.C: New testcase.
>>> * g++.target/powerpc/sext-elim.C: New testcase.
>>> ---
>>>   gcc/ree.cc    | 451 --
>>>   gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
>>>   .../g++.target/powerpc/zext-elim-1.C  |  19 +
>>>   .../g++.target/powerpc/zext-elim-2.C  |  11 +
>>>   gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
>>>   5 files changed, 482 insertions(+), 47 deletions(-)
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
>>>
>>> diff --git a/gcc/ree.cc b/gcc/ree.cc
>>> index 413aec7c8eb..053db2e8ff3 100644
>>> --- a/gcc/ree.cc
>>> +++ b/gcc/ree.cc
>>> @@ -253,6 +253,71 @@ struct ext_cand
>>>     static int max_insn_uid;
>>>   +bool
>>> +reg_used_set_between_p (rtx set, rtx_insn *def_insn, rtx_insn *insn
>>> +{
>>> +  if (reg_used_between_p (set, def_insn, insn)
>>> +  || reg_set_between_p (set, def_insn, insn))
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>> This seems general enough that it should go into the same file as 
>> reg_used_between_p and reg_set_between_p.  It needs a function comment as 
>> well.
>>
>>
>>> +static unsigned int
>>> +rtx_is_zext_p (rtx insn)
>>> +{
>>> +  if (GET_CODE (insn) == AND)
>>> +    {
>>> +  rtx set = XEXP (insn, 0);
>>> +  if (REG_P (set))
>>> +    {
>>> +  if (XEXP (insn, 1) == const1_rtx)
>>> +    return 1;
>>> +    }
>>> +  else
>>> +    return 0;
>>> +    }
>>> +
>>> +  return 0;
>>> +}
>> So my comment from the prior version stands.  Testing for const1_rtx is just 
>> wrong.  The optimization you're trying to perform (If I understand it 
>> correctly) works for many other constants and the set of constants supported 
>> will vary based on the input and output modes.
>>
>> Similarly in rtx_is_zext_p.
>>
>> You still have numerous formatting issues which makes reading the patch more 
>> difficult than it should be.  Please review the formatting guidelines and 
>> follow them.   In particular please review how to indent multi-line 
>> conditionals.
>>
>>
> 
> Currently I support AND with const1_rtx. This is what is equivalent to zero 
> extension instruction in power instruction set. When you specify many other 
> constants and Could you please specify what other constants needs to be 
> supported and how to determine on the Input and output modes.
>>
On top of that, I support eliminating zero_extend and sign_extend when the
result mode of the def insn is not equal to the mode of the source operand
of the zero_extend or sign_extend.

Thanks & Regards
Ajit
>>
>>
>> You sti
>>> @@ -698,6 +777,226 @@ get_sub_rtx (rtx_insn *def_insn)
>>>     return sub_rtx;
>>>   }
>>>   +/* Check if the def insn is ASHIFT and LSHIFTRT.
>>> +  Inputs: insn for which def has to be checked.
>>> +  source operand rtx.
>>> +   Output: True or false if def has arithmetic
>>> +   peration like ASHIFT and LSHIFTRT.  */
>> This still needs work.  Between the comments and code, I still don't know 
>> what you're really trying to do here.  I can make some guesses, but it's 
>> really your job to write clear comments about what you're doing so that a 
>> review or someone looking at the code in the future don't have to guess.
>>
>> It looks like you want to look at all the reaching definitions of INSN for 
>> ORIG_S

Re: [PATCH v3 3/4] ree: Main functionality to Improve ree pass for rs6000 target

2023-04-20 Thread Ajit Agarwal via Gcc-patches

Hello Jeff:

On 21/04/23 2:33 am, Ajit Agarwal wrote:
> Hello Jeff:
> 
> On 20/04/23 3:23 am, Jeff Law wrote:
>>
>>
>> On 4/19/23 12:00, Ajit Agarwal wrote:
>>> Hello All:
>>>
>>> This is patch-3 to improve ree pass for rs6000 target.
>>> Main functionality routines to imprve ree pass.
>>>
>>> Bootstrapped and regtested on powerpc64-gnu-linux.
>>>
>>> Thanks & Regards
>>> Ajit
>>>
>>> ree: Improve ree pass for rs6000 target.
>>>
>>> For rs6000 target we see redundant zero and sign
>>> extension and done to improve ree pass to eliminate
>>> such redundant zero and sign extension. Support of
>>> zero_extend/sign_extend/AND.
>>>
>>> 2023-04-19  Ajit Kumar Agarwal  
>>>
>>> gcc/ChangeLog:
>>>
>>> * ree.cc (eliminate_across_bbs_p): Add checks to enable extension
>>> elimination across and within basic blocks.
>>> (def_arith_p): New function to check definition has arithmetic
>>> operation.
>>> (combine_set_extension): Modification to incorporate AND
>>> and current zero_extend and sign_extend instruction.
>>> (merge_def_and_ext): Add calls to eliminate_across_bbs_p and
>>> zero_extend sign_extend and AND instruction.
>>> (rtx_is_zext_p): New function.
>>> (reg_used_set_between_p): New function.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * g++.target/powerpc/zext-elim.C: New testcase.
>>> * g++.target/powerpc/zext-elim-1.C: New testcase.
>>> * g++.target/powerpc/zext-elim-2.C: New testcase.
>>> * g++.target/powerpc/sext-elim.C: New testcase.
>>> ---
>>>   gcc/ree.cc    | 451 --
>>>   gcc/testsuite/g++.target/powerpc/sext-elim.C  |  18 +
>>>   .../g++.target/powerpc/zext-elim-1.C  |  19 +
>>>   .../g++.target/powerpc/zext-elim-2.C  |  11 +
>>>   gcc/testsuite/g++.target/powerpc/zext-elim.C  |  30 ++
>>>   5 files changed, 482 insertions(+), 47 deletions(-)
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/sext-elim.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-1.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim-2.C
>>>   create mode 100644 gcc/testsuite/g++.target/powerpc/zext-elim.C
>>>
>>> diff --git a/gcc/ree.cc b/gcc/ree.cc
>>> index 413aec7c8eb..053db2e8ff3 100644
>>> --- a/gcc/ree.cc
>>> +++ b/gcc/ree.cc
>>> @@ -253,6 +253,71 @@ struct ext_cand
>>>     static int max_insn_uid;
>>>   +bool
>>> +reg_used_set_between_p (rtx set, rtx_insn *def_insn, rtx_insn *insn
>>> +{
>>> +  if (reg_used_between_p (set, def_insn, insn)
>>> +  || reg_set_between_p (set, def_insn, insn))
>>> +    return true;
>>> +
>>> +  return false;
>>> +}
>> This seems general enough that it should go into the same file as 
>> reg_used_between_p and reg_set_between_p.  It needs a function comment as 
>> well.
>>
>>
>>> +static unsigned int
>>> +rtx_is_zext_p (rtx insn)
>>> +{
>>> +  if (GET_CODE (insn) == AND)
>>> +    {
>>> +  rtx set = XEXP (insn, 0);
>>> +  if (REG_P (set))
>>> +    {
>>> +  if (XEXP (insn, 1) == const1_rtx)
>>> +    return 1;
>>> +    }
>>> +  else
>>> +    return 0;
>>> +    }
>>> +
>>> +  return 0;
>>> +}
>> So my comment from the prior version stands.  Testing for const1_rtx is just 
>> wrong.  The optimization you're trying to perform (If I understand it 
>> correctly) works for many other constants and the set of constants supported 
>> will vary based on the input and output modes.
>>
>> Similarly in rtx_is_zext_p.
>>
>> You still have numerous formatting issues which makes reading the patch more 
>> difficult than it should be.  Please review the formatting guidelines and 
>> follow them.   In particular please review how to indent multi-line 
>> conditionals.
>>
>>
> 
> Currently I support AND with const1_rtx. This is what is equivalent to zero 
> extension instruction in power instruction set. When you specify many other 
> constants and Could you please specify what other constants needs to be 
> supported and how to determine on the Input and output modes.

On top of that, I support eliminating zero_extend and sign_extend when the
result mode of the def insn is not equal to the mode of the source operand
of the zero_extend or sign_extend.

Thanks & Regards
Ajit
>>
>>
>>
>> You sti
>>> @@ -698,6 +777,226 @@ get_sub_rtx (rtx_insn *def_insn)
>>>     return sub_rtx;
>>>   }
>>>   +/* Check if the def insn is ASHIFT and LSHIFTRT.
>>> +  Inputs: insn for which def has to be checked.
>>> +  source operand rtx.
>>> +   Output: True or false if def has arithmetic
>>> +   peration like ASHIFT and LSHIFTRT.  */
>> This still needs work.  Between the comments and code, I still don't know 
>> what you're really trying to do here.  I can make some guesses, but it's 
>> really your job to write clear comments about what you're doing so that a 
>> review or someone looking at the code in the future don't have to guess.
>>
>> It looks like you want to look at all the reaching definitions of INSN for 
>> ORIG_SRC and if they a

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Siddhesh Poyarekar


On 2023-04-20 13:57, Siddhesh Poyarekar wrote:
For bounds that aren't representable, one could get error bounds from 
libm-test-ulps data in glibc, although I reckon those won't be 
exhaustive.  From a quick peek at the sin/cos data, the arc target seems 
to be among the worst performers at about 7ulps, although if you include 
the complex routines we get close to 13 ulps.  The very worst 
imprecision among all math routines (that's gamma) is at 16 ulps for 
power in glibc tests, so maybe allowing about 25-30 ulps error in bounds 
might work across the board.


I was thinking about this a bit more, and it seems better to limit ranges
to targets that can generate sane results (i.e. error bounds within,
say, 5-6 ulps) and, for the rest, avoid emitting the ranges altogether.
Emitting a bad range for all architectures seems like a net worse
solution again.


Thanks,
Sid

[PATCH] libstdc++: Synchronize PSTL with upstream (2nd attempt)

2023-04-20 Thread Thomas Rodgers via Gcc-patches

The attached (gzipped) patch brings libstdc++'s PSTL implementation up to
the current upstream version.

Tested x86_64-pc-linux-gnu, specifically with TBB 2020.3 (Fedora 37 +
tbb-devel).


0001-libstdc-Synchronize-PSTL-with-upstream.patch.gz
Description: application/gzip

Re: Gcc-patches Digest, Vol 38, Issue 368

2023-04-20 Thread bot via Gcc-patches




[committed v2] RISC-V: Handle multi-lib path correctly for linux [DRAFT]

2023-04-20 Thread Kito Cheng via Gcc-patches

---
 gcc/common/config/riscv/riscv-common.cc | 118 
 gcc/config/riscv/linux.h|  13 ++-
 2 files changed, 90 insertions(+), 41 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 309a52def75f..75bfe198d4c6 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1597,6 +1597,73 @@ riscv_check_conds (
   return match_score + ok_count * 100;
 }
 
+static const char *
+riscv_select_multilib_by_abi (
+  const std::string &riscv_current_arch_str,
+  const std::string &riscv_current_abi_str,
+  const riscv_subset_list *subset_list,
+  const struct switchstr *switches,
+  int n_switches,
+  const std::vector &multilib_infos
+   )
+{
+  for (size_t i = 0; i < multilib_infos.size (); ++i)
+    if (riscv_current_abi_str == multilib_infos[i].abi_str)
+      return xstrdup (multilib_infos[i].path.c_str ());
+
+  return NULL;
+}
+
+
+static const char *
+riscv_select_multilib (
+  const std::string &riscv_current_arch_str,
+  const std::string &riscv_current_abi_str,
+  const riscv_subset_list *subset_list,
+  const struct switchstr *switches,
+  int n_switches,
+  const std::vector<riscv_multi_lib_info_t> &multilib_infos
+   )
+{
+  int match_score = 0;
+  int max_match_score = 0;
+  int best_match_multi_lib = -1;
+  /* Try to decision which set we should used.  */
+  /* We have 3 level decision tree here, ABI, check input arch/ABI must
+     be superset of multi-lib arch, and other rest option checking.  */
+  for (size_t i = 0; i < multilib_infos.size (); ++i)
+    {
+      /* Check ABI is same first.  */
+      if (riscv_current_abi_str != multilib_infos[i].abi_str)
+        continue;
+
+      /* Found a potential compatible multi-lib setting!
+         Calculate the match score.  */
+      match_score = subset_list->match_score (multilib_infos[i].subset_list);
+
+      /* Checking other cond in the multi-lib setting.  */
+      match_score = riscv_check_conds (switches,
+                                       n_switches,
+                                       match_score,
+                                       multilib_infos[i].conds);
+
+      /* Record highest match score multi-lib setting.  */
+      if (match_score > max_match_score)
+        {
+          best_match_multi_lib = i;
+          max_match_score = match_score;
+        }
+    }
+
+  if (best_match_multi_lib == -1)
+    {
+      riscv_no_matched_multi_lib = true;
+      return NULL;
+    }
+  else
+    return xstrdup (multilib_infos[best_match_multi_lib].path.c_str ());
+}
+
 /* Implement TARGET_COMPUTE_MULTILIB.  */
 static const char *
 riscv_compute_multilib (
@@ -1621,6 +1688,12 @@ riscv_compute_multilib (
   std::string option_cond;
   riscv_multi_lib_info_t multilib_info;
 
+  bool check_abi_only = false;
+
+#if TARGET_LINUX == 1
+  check_abi_only = true;
+#endif
+
   /* Already found suitable, multi-lib, just use that.  */
   if (multilib_dir != NULL)
 return multilib_dir;
@@ -1672,7 +1745,11 @@ riscv_compute_multilib (
}
 
   this_path_len = p - this_path;
-  multilib_info.path = std::string (this_path, this_path_len);
+  const char *multi_os_dir_pos = (const char*)memchr (this_path, ':', this_path_len);
+  if (multi_os_dir_pos)
+    multilib_info.path = std::string (this_path, multi_os_dir_pos - this_path);
+  else
+    multilib_info.path = std::string (this_path, this_path_len);
 
   option_conds.clear ();
   /* Pasrse option check list into vector.
@@ -1707,43 +1784,10 @@ riscv_compute_multilib (
   p++;
 }
 
-  int match_score = 0;
-  int max_match_score = 0;
-  int best_match_multi_lib = -1;
-  /* Try to decision which set we should used.  */
-  /* We have 3 level decision tree here, ABI, check input arch/ABI must
- be superset of multi-lib arch, and other rest option checking.  */
-  for (size_t i = 0; i < multilib_infos.size (); ++i)
-{
-  /* Check ABI is same first.  */
-  if (riscv_current_abi_str != multilib_infos[i].abi_str)
-   continue;
-
-  /* Found a potential compatible multi-lib setting!
-Calculate the match score.  */
-  match_score = subset_list->match_score (multilib_infos[i].subset_list);
-
-  /* Checking other cond in the multi-lib setting.  */
-  match_score = riscv_check_conds (switches,
-  n_switches,
-  match_score,
-  multilib_infos[i].conds);
-
-  /* Record highest match score multi-lib setting.  */
-  if (match_score > max_match_score)
-   {
- best_match_multi_lib = i;
- max_match_score = match_score;
-   }
-}
-
-  if (best_match_multi_lib == -1)
-{
-  riscv_no_matched_multi_lib = true;
-  return multilib_dir;
-}
+  if (check_abi_only)
+    return riscv_select_multilib_by_abi (riscv_current_arch_str, riscv_current_abi_str, subset_lis
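A minimal, self-contained sketch of the two selection strategies in this draft, to show the control flow without GCC's internals. The names multilib_entry, select_by_abi, select_best and strip_os_dir are hypothetical; the single score field stands in for what the patch computes with subset_list->match_score () plus riscv_check_conds (), and the text after ':' is assumed to be the OS directory, as the variable name multi_os_dir_pos suggests.

```cpp
#include <cstddef>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for riscv_multi_lib_info_t.
struct multilib_entry
{
  std::string path;
  std::string abi_str;
  int score;  // Stands in for the arch-subset and option-condition scoring.
};

// Keep only the part of "dir:os_dir" before the first ':' (the memchr step).
static std::string strip_os_dir (const char *p, std::size_t len)
{
  const char *colon = static_cast<const char *> (std::memchr (p, ':', len));
  return std::string (p, colon ? static_cast<std::size_t> (colon - p) : len);
}

// TARGET_LINUX path: the first entry with the same ABI wins.
static std::optional<std::string>
select_by_abi (const std::string &abi, const std::vector<multilib_entry> &infos)
{
  for (const auto &e : infos)
    if (e.abi_str == abi)
      return e.path;
  return std::nullopt;
}

// Default path: among entries with the same ABI, pick the highest score;
// like the patch, a score must be strictly greater than 0 to be accepted.
static std::optional<std::string>
select_best (const std::string &abi, const std::vector<multilib_entry> &infos)
{
  int best = -1;
  int max_score = 0;
  for (std::size_t i = 0; i < infos.size (); ++i)
    {
      if (infos[i].abi_str != abi)
        continue;
      if (infos[i].score > max_score)
        {
          best = static_cast<int> (i);
          max_score = infos[i].score;
        }
    }
  if (best == -1)
    return std::nullopt;
  return infos[best].path;
}
```

The design difference is that select_by_abi ignores the ISA subset entirely, which is what the check_abi_only flag selects on Linux, while select_best can still report "no match" when every same-ABI entry scores 0 or less.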

Re: [committed v2] RISC-V: Handle multi-lib path correctly for linux [DRAFT]

2023-04-20 Thread Kito Cheng via Gcc-patches

Sorry, I didn't actually commit this; it was sent by accident because I gave
the wrong sha1.

On Fri, Apr 21, 2023 at 2:47 PM Kito Cheng via Gcc-patches
 wrote:

[committed v2] RISC-V: Add local user vsetvl instruction elimination [PR109547]

2023-04-20 Thread Kito Cheng via Gcc-patches

From: Juzhe-Zhong 

This patch improves code generation for auto-vectorization by eliminating
redundant local user vsetvl instructions.

Before this patch:

Loop:
vsetvl a5,a2...
vsetvl zero,a5...
vle

After this patch:

Loop:
vsetvl a5,a2
vle
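The before/after transformation above can be illustrated with a toy model. This is a sketch under heavy simplifying assumptions, not the pass itself: toy_insn and local_eliminate are hypothetical, one basic block is a plain vector, and "compatible" is reduced to having the same SEW/LMUL ratio (so VLMAX, and hence the value in a5, is unchanged), with the later vtype adopted. That check stands in for what the patch does with skip_avl_compatible_p () and merge (..., LOCAL_MERGE) before calling change_vsetvl_insn () and eliminate_insn ().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy instruction; not GCC's insn_info or RTL.
struct toy_insn
{
  enum kind_t { VSETVL, VECTOR_OP, SCALAR } kind;
  std::string dest;   // VSETVL: VL destination ("zero" = result discarded);
                      // SCALAR: register it writes.
  std::string avl;    // VSETVL only: register holding the AVL.
  std::string vtype;  // VSETVL only, e.g. "e32,m1".
  int ratio = 0;      // VSETVL only: SEW/LMUL ratio, which fixes VLMAX.
  bool clobbers_vl_vtype = false;  // e.g. a call or inline asm.
};

// Within one basic block, fold
//   vsetvl rX,avl,T1 ... vsetvl zero,rX,T2
// into "vsetvl rX,avl,T2" when nothing in between writes VL/VTYPE or rX,
// no vector instruction comes first, and T1/T2 have the same SEW/LMUL ratio.
static void local_eliminate (std::vector<toy_insn> &bb)
{
  for (std::size_t i = 0; i < bb.size (); ++i)
    {
      toy_insn &first = bb[i];
      if (first.kind != toy_insn::VSETVL || first.dest == "zero")
        continue;
      for (std::size_t j = i + 1; j < bb.size (); ++j)
        {
          const toy_insn &cur = bb[j];
          if (cur.clobbers_vl_vtype || cur.kind == toy_insn::VECTOR_OP)
            break;
          if (cur.kind == toy_insn::SCALAR)
            {
              if (cur.dest == first.dest)
                break;  // rX no longer holds the first vsetvl's result.
              continue;
            }
          // cur is another vsetvl: fold it if it only re-reads rX.
          if (cur.dest == "zero" && cur.avl == first.dest
              && cur.ratio == first.ratio)
            {
              first.vtype = cur.vtype;  // Adopt the later vtype.
              bb.erase (bb.begin () + static_cast<std::ptrdiff_t> (j));
            }
          break;  // Stop at any vsetvl: it redefines VL/VTYPE.
        }
    }
}
```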

gcc/ChangeLog:

PR target/109547
* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New function.
(vector_insn_info::skip_avl_compatible_p): Ditto.
(vector_insn_info::merge): Remove default value.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

PR target/109547
* gcc.target/riscv/rvv/vsetvl/pr109547.c: New.
* gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Update scan
condition.
---
 gcc/config/riscv/riscv-vsetvl.cc  | 71 ++-
 gcc/config/riscv/riscv-vsetvl.h   |  1 +
 .../gcc.target/riscv/rvv/vsetvl/pr109547.c| 14 
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-17.c   |  2 +-
 4 files changed, 85 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109547.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 9c356ce51579..2406931dac01 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1054,6 +1054,51 @@ change_vsetvl_insn (const insn_info *insn, const vector_insn_info &info)
   change_insn (rinsn, new_pat);
 }
 
+static void
+local_eliminate_vsetvl_insn (const vector_insn_info &dem)
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+    return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+    {
+      rtx vl = get_vl (rinsn);
+      for (insn_info *i = insn->next_nondebug_insn ();
+           real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+        {
+          if (i->is_call () || i->is_asm ()
+              || find_access (i->defs (), VL_REGNUM)
+              || find_access (i->defs (), VTYPE_REGNUM))
+            return;
+
+          if (has_vtype_op (i->rtl ()))
+            {
+              if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl ())))
+                return;
+              rtx avl = get_avl (i->rtl ());
+              if (avl != vl)
+                return;
+              set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+              if (def->insn () != insn)
+                return;
+
+              vector_insn_info new_info;
+              new_info.parse_insn (i);
+              if (!new_info.skip_avl_compatible_p (dem))
+                return;
+
+              new_info.set_avl_info (dem.get_avl_info ());
+              new_info = dem.merge (new_info, LOCAL_MERGE);
+              change_vsetvl_insn (insn, new_info);
+              eliminate_insn (PREV_INSN (i->rtl ()));
+              return;
+            }
+        }
+    }
+}
+
 static bool
 source_equal_p (insn_info *insn1, insn_info *insn2)
 {
@@ -1996,6 +2041,19 @@ vector_insn_info::compatible_p (const vector_insn_info &other) const
   return true;
 }
 
+bool
+vector_insn_info::skip_avl_compatible_p (const vector_insn_info &other) const
+{
+  gcc_assert (valid_or_dirty_p () && other.valid_or_dirty_p ()
+              && "Can't compare invalid demanded infos");
+  unsigned array_size = sizeof (incompatible_conds) / sizeof (demands_cond);
+  /* Bypass AVL incompatible cases.  */
+  for (unsigned i = 1; i < array_size; i++)
+    if (incompatible_conds[i].dual_incompatible_p (*this, other))
+      return false;
+  return true;
+}
+
 bool
 vector_insn_info::compatible_avl_p (const vl_vtype_info &other) const
 {
@@ -2190,7 +2248,7 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
 
 vector_insn_info
 vector_insn_info::merge (const vector_insn_info &merge_info,
-enum merge_type type = LOCAL_MERGE) const
+enum merge_type type) const
 {
   if (!vsetvl_insn_p (get_insn ()->rtl ()))
 gcc_assert (this->compatible_p (merge_info)
@@ -2696,7 +2754,7 @@ pass_vsetvl::compute_local_backward_infos (const bb_info *bb)
&& !reg_available_p (insn, change))
  && change.compatible_p (info))
{
- info = change.merge (info);
+ info = change.merge (info, LOCAL_MERGE);
  /* Fix PR109399, we should update user vsetvl instruction
 if there is a change in demand fusion.  */
  if (vsetvl_insn_p (insn->rtl ()))
@@ -3925,6 +3983,15 @@ pass_vsetvl::cleanup_insns (void) const
   for (insn_info *insn : bb->real_nondebug_insns ())
{
  rtx_insn *rinsn = insn->rtl ();
+ const auto &dem = m_vector_manager->vector_insn_infos[insn->uid ()];
+ /* Eliminate local vsetvl:
+  bb 0:
+  vsetvl a5,a6,...
+  vsetvl zero,a5.
+
+Eliminate

Re: [PATCH] RISC-V: Add local user vsetvl instruction elimination

2023-04-20 Thread Kito Cheng via Gcc-patches

Committed with an extra testcase from PR109547

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616363.html

On Fri, Apr 7, 2023 at 9:34 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is to enhance optimization for auto-vectorization.
>
> Before this patch:
>
> Loop:
> vsetvl a5,a2...
> vsetvl zero,a5...
> vle
>
> After this patch:
>
> Loop:
> vsetvl a5,a2
> vle
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New 
> function.
> (vector_insn_info::skip_avl_compatible_p): Ditto.
> (vector_insn_info::merge): Remove default value.
> (pass_vsetvl::compute_local_backward_infos): Ditto.
> (pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
> * config/riscv/riscv-vsetvl.h: Ditto.
>
> ---
>  gcc/config/riscv/riscv-vsetvl.cc | 71 +++-
>  gcc/config/riscv/riscv-vsetvl.h  |  1 +
>  2 files changed, 70 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vsetvl.cc 
> b/gcc/config/riscv/riscv-vsetvl.cc
> index 7e8a5376705..b402035f7a5 100644
> --- a/gcc/config/riscv/riscv-vsetvl.cc
> +++ b/gcc/config/riscv/riscv-vsetvl.cc
> @@ -1054,6 +1054,51 @@ change_vsetvl_insn (const insn_info *insn, const 
> vector_insn_info &info)
>change_insn (rinsn, new_pat);
>  }
>
> +static void
> +local_eliminate_vsetvl_insn (const vector_insn_info &dem)
> +{
> +  const insn_info *insn = dem.get_insn ();
> +  if (!insn || insn->is_artificial ())
> +return;
> +  rtx_insn *rinsn = insn->rtl ();
> +  const bb_info *bb = insn->bb ();
> +  if (vsetvl_insn_p (rinsn))
> +{
> +  rtx vl = get_vl (rinsn);
> +  for (insn_info *i = insn->next_nondebug_insn ();
> +  real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
> +   {
> + if (i->is_call () || i->is_asm ()
> + || find_access (i->defs (), VL_REGNUM)
> + || find_access (i->defs (), VTYPE_REGNUM))
> +   return;
> +
> + if (has_vtype_op (i->rtl ()))
> +   {
> + if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl ())))
> +   return;
> + rtx avl = get_avl (i->rtl ());
> + if (avl != vl)
> +   return;
> + set_info *def = find_access (i->uses (), REGNO (avl))->def ();
> + if (def->insn () != insn)
> +   return;
> +
> + vector_insn_info new_info;
> + new_info.parse_insn (i);
> + if (!new_info.skip_avl_compatible_p (dem))
> +   return;
> +
> + new_info.set_avl_info (dem.get_avl_info ());
> + new_info = dem.merge (new_info, LOCAL_MERGE);
> + change_vsetvl_insn (insn, new_info);
> + eliminate_insn (PREV_INSN (i->rtl ()));
> + return;
> +   }
> +   }
> +}
> +}
> +
>  static bool
>  source_equal_p (insn_info *insn1, insn_info *insn2)
>  {
> @@ -1984,6 +2029,19 @@ vector_insn_info::compatible_p (const vector_insn_info 
> &other) const
>return true;
>  }
>
> +bool
> +vector_insn_info::skip_avl_compatible_p (const vector_insn_info &other) const
> +{
> +  gcc_assert (valid_or_dirty_p () && other.valid_or_dirty_p ()
> + && "Can't compare invalid demanded infos");
> +  unsigned array_size = sizeof (incompatible_conds) / sizeof (demands_cond);
> +  /* Bypass AVL incompatible cases.  */
> +  for (unsigned i = 1; i < array_size; i++)
> +if (incompatible_conds[i].dual_incompatible_p (*this, other))
> +  return false;
> +  return true;
> +}
> +
>  bool
>  vector_insn_info::compatible_avl_p (const vl_vtype_info &other) const
>  {
> @@ -2178,7 +2236,7 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type = LOCAL_MERGE) const
> +enum merge_type type) const
>  {
>if (!vsetvl_insn_p (get_insn ()->rtl ()))
>  gcc_assert (this->compatible_p (merge_info)
> @@ -2716,7 +2774,7 @@ pass_vsetvl::compute_local_backward_infos (const 
> bb_info *bb)
> && !reg_available_p (insn, change))
>   && change.compatible_p (info))
> {
> - info = change.merge (info);
> + info = change.merge (info, LOCAL_MERGE);
>   /* Fix PR109399, we should update user vsetvl instruction
>  if there is a change in demand fusion.  */
>   if (vsetvl_insn_p (insn->rtl ()))
> @@ -3998,6 +4056,15 @@ pass_vsetvl::cleanup_insns (void) const
>for (insn_info *insn : bb->real_nondebug_insns ())
> {
>   rtx_insn *rinsn = insn->rtl ();
> + const auto &dem = m_vector_manager->vector_insn_infos[insn->uid ()];
> + /* Eliminate local vsetvl:
> +  bb 0:
> +  vsetvl a5,a6,...
> +  vsetvl zero,a5.
> +
> +

Re: [committed v2] RISC-V: Add local user vsetvl instruction elimination [PR109547]

2023-04-20 Thread juzhe.zh...@rivai.ai

LGTM.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-04-21 14:49
To: gcc-patches
CC: Juzhe-Zhong
Subject: [committed v2] RISC-V: Add local user vsetvl instruction elimination [PR109547]
From: Juzhe-Zhong 
 
This patch is to enhance optimization for auto-vectorization.
 
Before this patch:
 
Loop:
vsetvl a5,a2...
vsetvl zero,a5...
vle
 
After this patch:
 
Loop:
vsetvl a5,a2
vle
 
gcc/ChangeLog:
 
PR target/109547
* config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New function.
(vector_insn_info::skip_avl_compatible_p): Ditto.
(vector_insn_info::merge): Remove default value.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
* config/riscv/riscv-vsetvl.h: Ditto.
 
gcc/testsuite/ChangeLog:
 
PR target/109547
* gcc.target/riscv/rvv/vsetvl/pr109547.c: New.
* gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Update scan
condition.
---
gcc/config/riscv/riscv-vsetvl.cc  | 71 ++-
gcc/config/riscv/riscv-vsetvl.h   |  1 +
.../gcc.target/riscv/rvv/vsetvl/pr109547.c| 14 
.../gcc.target/riscv/rvv/vsetvl/vsetvl-17.c   |  2 +-
4 files changed, 85 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109547.c
 
diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 9c356ce51579..2406931dac01 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1054,6 +1054,51 @@ change_vsetvl_insn (const insn_info *insn, const 
vector_insn_info &info)
   change_insn (rinsn, new_pat);
}
+static void
+local_eliminate_vsetvl_insn (const vector_insn_info &dem)
+{
+  const insn_info *insn = dem.get_insn ();
+  if (!insn || insn->is_artificial ())
+return;
+  rtx_insn *rinsn = insn->rtl ();
+  const bb_info *bb = insn->bb ();
+  if (vsetvl_insn_p (rinsn))
+{
+  rtx vl = get_vl (rinsn);
+  for (insn_info *i = insn->next_nondebug_insn ();
+real_insn_and_same_bb_p (i, bb); i = i->next_nondebug_insn ())
+ {
+   if (i->is_call () || i->is_asm ()
+   || find_access (i->defs (), VL_REGNUM)
+   || find_access (i->defs (), VTYPE_REGNUM))
+ return;
+
+   if (has_vtype_op (i->rtl ()))
+ {
+   if (!vsetvl_discard_result_insn_p (PREV_INSN (i->rtl ())))
+ return;
+   rtx avl = get_avl (i->rtl ());
+   if (avl != vl)
+ return;
+   set_info *def = find_access (i->uses (), REGNO (avl))->def ();
+   if (def->insn () != insn)
+ return;
+
+   vector_insn_info new_info;
+   new_info.parse_insn (i);
+   if (!new_info.skip_avl_compatible_p (dem))
+ return;
+
+   new_info.set_avl_info (dem.get_avl_info ());
+   new_info = dem.merge (new_info, LOCAL_MERGE);
+   change_vsetvl_insn (insn, new_info);
+   eliminate_insn (PREV_INSN (i->rtl ()));
+   return;
+ }
+ }
+}
+}
+
static bool
source_equal_p (insn_info *insn1, insn_info *insn2)
{
@@ -1996,6 +2041,19 @@ vector_insn_info::compatible_p (const vector_insn_info 
&other) const
   return true;
}
+bool
+vector_insn_info::skip_avl_compatible_p (const vector_insn_info &other) const
+{
+  gcc_assert (valid_or_dirty_p () && other.valid_or_dirty_p ()
+   && "Can't compare invalid demanded infos");
+  unsigned array_size = sizeof (incompatible_conds) / sizeof (demands_cond);
+  /* Bypass AVL incompatible cases.  */
+  for (unsigned i = 1; i < array_size; i++)
+if (incompatible_conds[i].dual_incompatible_p (*this, other))
+  return false;
+  return true;
+}
+
bool
vector_insn_info::compatible_avl_p (const vl_vtype_info &other) const
{
@@ -2190,7 +2248,7 @@ vector_insn_info::fuse_mask_policy (const 
vector_insn_info &info1,
vector_insn_info
vector_insn_info::merge (const vector_insn_info &merge_info,
- enum merge_type type = LOCAL_MERGE) const
+ enum merge_type type) const
{
   if (!vsetvl_insn_p (get_insn ()->rtl ()))
 gcc_assert (this->compatible_p (merge_info)
@@ -2696,7 +2754,7 @@ pass_vsetvl::compute_local_backward_infos (const bb_info 
*bb)
&& !reg_available_p (insn, change))
  && change.compatible_p (info))
{
-   info = change.merge (info);
+   info = change.merge (info, LOCAL_MERGE);
  /* Fix PR109399, we should update user vsetvl instruction
 if there is a change in demand fusion.  */
  if (vsetvl_insn_p (insn->rtl ()))
@@ -3925,6 +3983,15 @@ pass_vsetvl::cleanup_insns (void) const
   for (insn_info *insn : bb->real_nondebug_insns ())
{
  rtx_insn *rinsn = insn->rtl ();
+   const auto &dem = m_vector_manager->vector_insn_infos[insn->uid ()];
+   /* Eliminate local vsetvl:
+bb 0:
+vsetvl a5,a6,...
+vsetvl zero,a5.
+
+  Eliminate vsetvl in bb2 when a5 is only coming from
+  bb 0.  */
+   local_eliminate_vsetvl_insn (dem);
  if (vlmax_avl_insn_p (rinsn))
{
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 237381f7026b..4fe08cfc789d 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h

Re: [PATCH] Implement range-op entry for sin/cos.

2023-04-20 Thread Jakub Jelinek via Gcc-patches

On Thu, Apr 20, 2023 at 09:14:10PM -0400, Siddhesh Poyarekar wrote:
> On 2023-04-20 13:57, Siddhesh Poyarekar wrote:
> > For bounds that aren't representable, one could get error bounds from
> > libm-test-ulps data in glibc, although I reckon those won't be
> > exhaustive.  From a quick peek at the sin/cos data, the arc target seems
> > to be among the worst performers at about 7ulps, although if you include
> > the complex routines we get close to 13 ulps.  The very worst
> > imprecision among all math routines (that's gamma) is at 16 ulps for
> > power in glibc tests, so maybe allowing about 25-30 ulps error in bounds
> > might work across the board.
> 
> I was thinking about this a bit more and it seems like limiting ranges to
> targets that can generate sane results (i.e. error bounds within, say, 5-6
> ulps) and for the rest, avoid emitting the ranges altogether. Emitting a bad
> range for all architectures seems like a net worse solution again.

Well, at least for basic arithmetic, when libm functions aren't involved,
there is no point in disabling ranges altogether.
And for libm functions, my plan was to introduce a target hook which
would take a combined_fn argument to tell which function is queried,
a machine_mode to say which floating-point format, and perhaps a bool saying
whether the question is about ulps at the basic math boundaries or for results
somewhere in between; it would return the ulps as an unsigned int, with 0
meaning 0.5ulps precision.
So, for CASE_CFN_SIN: CASE_CFN_COS:, the glibc handler could say
that ulps is, say, 3 inside of the ranges and 0 on the boundaries if
!flag_rounding_math, and 6 and 2 with flag_rounding_math, or whatever.
The generic code would then not assume anything if ulps is, say, 100 or more.
The hooks would need to return the union (i.e. the worst case) of the
precision of all supported versions of the library through its history,
including, say, libmvec, because function calls could be vectorized.
The default could be infinite precision.
Back in November I posted a proglet that can generate some ulps from
random number testing, plus on glibc we could pick maximums from the ulps files.
And if needed, say powerpc*-linux could override the generic glibc
version for some subset of functions and call the default otherwise (say at
least for __ibm128).
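One way to picture the proposed hook, as a self-contained C++ sketch. Everything here is hypothetical: toy_fn and toy_mode stand in for combined_fn and machine_mode, toy_rounding_math for flag_rounding_math, and the sin/cos figures are the illustrative numbers from this mail rather than measured glibc ulps. Encoding "infinite precision" in the default hook as 0 is an assumption of the sketch.

```cpp
#include <optional>

// Hypothetical stand-ins for GCC's combined_fn and machine_mode.
enum class toy_fn { SIN, COS, OTHER };
enum class toy_mode { SF, DF };

// Hypothetical stand-in for flag_rounding_math.
static bool toy_rounding_math = false;

// Hook type: worst-case error in ulps for FN in MODE; BOUNDARY selects the
// basic math boundaries (e.g. sin/cos reaching +-1) rather than results in
// between.  0 stands for 0.5 ulps, i.e. correctly rounded.
using libm_ulps_hook = unsigned (*) (toy_fn, toy_mode, bool);

// Default hook: assume infinite precision (encoded here as 0).
static unsigned default_libm_ulps (toy_fn, toy_mode, bool)
{
  return 0;
}

// glibc-flavoured hook; the sin/cos figures are the mail's examples.
// Other functions fall back to the default.
static unsigned glibc_libm_ulps (toy_fn fn, toy_mode mode, bool boundary)
{
  switch (fn)
    {
    case toy_fn::SIN:
    case toy_fn::COS:
      if (!toy_rounding_math)
        return boundary ? 0 : 3;
      return boundary ? 2 : 6;
    default:
      return default_libm_ulps (fn, mode, boundary);
    }
}

// Generic side: std::nullopt means "assume nothing about the result";
// otherwise, the number of ulps by which known bounds must be widened.
static std::optional<unsigned>
usable_ulps (libm_ulps_hook hook, toy_fn fn, toy_mode mode, bool boundary)
{
  unsigned ulps = hook (fn, mode, boundary);
  if (ulps >= 100)
    return std::nullopt;
  return ulps;
}
```

Passing the hook as a function pointer mirrors how a target such as powerpc*-linux could override the glibc version for some functions and defer to it otherwise.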

Jakub
