Re: [PATCH v9 5/5] Add the 6th argument to .ACCESS_WITH_SIZE

2024-05-29 Thread Richard Biener
On Tue, May 28, 2024 at 11:10 PM Qing Zhao  wrote:
>
>
>
> > On May 28, 2024, at 03:43, Richard Biener  
> > wrote:
> >
> > On Fri, Apr 12, 2024 at 3:55 PM Qing Zhao  wrote:
> >>
> >> to carry the TYPE of the flexible array.
> >>
> >> Such information is needed during tree-object-size.cc.
> >>
> >> We cannot use the result type or the type of the 1st argument
> >> of the routine .ACCESS_WITH_SIZE to decide the element type
> >> of the original array due to possible type casting in the
> >> source code.
> >
> > OK.  I guess technically an empty CONSTRUCTOR of the array type
> > would work as well (as aggregate it's fine to have it in the call) but a
> > constant zero pointer might be cheaper to have as it's shared across
> > multiple calls.
>
> So, I consider this as an approval? -:)

yes

> thanks.
>
> Qing
> >
> > Richard.
> >
> >> gcc/c/ChangeLog:
> >>
> >>* c-typeck.cc (build_access_with_size_for_counted_by): Add the 6th
> >>argument to .ACCESS_WITH_SIZE.
> >>
> >> gcc/ChangeLog:
> >>
> >>* tree-object-size.cc (access_with_size_object_size): Use the type
> >>of the 6th argument for the type of the element.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* gcc.dg/flex-array-counted-by-6.c: New test.
> >> ---
> >> gcc/c/c-typeck.cc | 11 +++--
> >> gcc/internal-fn.cc|  2 +
> >> .../gcc.dg/flex-array-counted-by-6.c  | 46 +++
> >> gcc/tree-object-size.cc   | 16 ---
> >> 4 files changed, 66 insertions(+), 9 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
> >>
> >> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> >> index ff6685c6c4ba..0ea3b75355a4 100644
> >> --- a/gcc/c/c-typeck.cc
> >> +++ b/gcc/c/c-typeck.cc
> >> @@ -2640,7 +2640,8 @@ build_counted_by_ref (tree datum, tree subdatum, 
> >> tree *counted_by_type)
> >>
> >>to:
> >>
> >> -   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1))
> >> +   (*.ACCESS_WITH_SIZE (REF, COUNTED_BY_REF, 1, (TYPE_OF_SIZE)0, -1,
> >> +   (TYPE_OF_ARRAY *)0))
> >>
> >>NOTE: The return type of this function is the POINTER type pointing
> >>to the original flexible array type.
> >> @@ -2652,6 +2653,9 @@ build_counted_by_ref (tree datum, tree subdatum, 
> >> tree *counted_by_type)
> >>The 4th argument of the call is a constant 0 with the TYPE of the
> >>object pointed by COUNTED_BY_REF.
> >>
> >> +   The 6th argument of the call is a constant 0 with the pointer TYPE
> >> +   to the original flexible array type.
> >> +
> >>   */
> >> static tree
> >> build_access_with_size_for_counted_by (location_t loc, tree ref,
> >> @@ -2664,12 +2668,13 @@ build_access_with_size_for_counted_by (location_t 
> >> loc, tree ref,
> >>
> >>   tree call
> >> = build_call_expr_internal_loc (loc, IFN_ACCESS_WITH_SIZE,
> >> -   result_type, 5,
> >> +   result_type, 6,
> >>array_to_pointer_conversion (loc, ref),
> >>counted_by_ref,
> >>build_int_cst (integer_type_node, 1),
> >>build_int_cst (counted_by_type, 0),
> >> -   build_int_cst (integer_type_node, -1));
> >> +   build_int_cst (integer_type_node, -1),
> >> +   build_int_cst (result_type, 0));
> >>   /* Wrap the call with an INDIRECT_REF with the flexible array type.  */
> >>   call = build1 (INDIRECT_REF, TREE_TYPE (ref), call);
> >>   SET_EXPR_LOCATION (call, loc);
> >> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> >> index e744080ee670..34e4a4aea534 100644
> >> --- a/gcc/internal-fn.cc
> >> +++ b/gcc/internal-fn.cc
> >> @@ -3411,6 +3411,8 @@ expand_DEFERRED_INIT (internal_fn, gcall *stmt)
> >>  1: read_only
> >>  2: write_only
> >>  3: read_write
> >> +   6th argument: A constant 0 with the pointer TYPE to the original 
> >> flexible
> >> + array type.
> >>
> >>Both the return type and the type of the first argument of this
> >>function have been converted from the incomplete array type to
> >> diff --git a/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c 
> >> b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
> >> new file mode 100644
> >> index ..65fa01443d95
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/flex-array-counted-by-6.c
> >> @@ -0,0 +1,46 @@
> >> +/* Test the attribute counted_by and its usage in
> >> + * __builtin_dynamic_object_size: when the type of the flexible array 
> >> member
> >> + * is casting to another type.  */
> >> +/* { dg-do run } */
> >> +/* { dg-options "-O2" } */
> >> +
> >> +#include "builtin-object-size-common.h"
> >> +
> >> +typedef unsigned short u16;
> >> +
> >> +struct info {
> >> +   u16 data_len;
> >> +   char data[] _
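
The reason a separate type-carrying argument is needed can be illustrated outside of GCC. The following is a minimal sketch, not code from the patch: `counted_bytes`, `size_from_declared_type` and `size_from_cast_type` are hypothetical helpers, and a one-element array stands in for the flexible array member. It only shows how the computed size changes when the element type is taken from a cast pointer type instead of the declared array type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes occupied by COUNT elements of ELEM_SIZE bytes.
constexpr std::size_t
counted_bytes (std::size_t count, std::size_t elem_size)
{
  return count * elem_size;
}

struct info
{
  std::uint16_t data_len;
  char data[1]; // stands in for a flexible array member counted by data_len
};

// Size derived from the element type of the array as declared (char).
inline std::size_t
size_from_declared_type (const info &p)
{
  return counted_bytes (p.data_len, sizeof (p.data[0]));
}

// Size one would get by trusting the type the access was cast to (u16).
inline std::size_t
size_from_cast_type (const info &p)
{
  return counted_bytes (p.data_len, sizeof (std::uint16_t));
}

// The two element types disagree by a factor of two.
static_assert (counted_bytes (8, sizeof (char)) == 8, "declared type");
static_assert (counted_bytes (8, sizeof (std::uint16_t)) == 16, "cast type");
```

Only the first result matches the storage described by the count, which is why the element type has to travel with the call rather than be inferred from the result type.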

[PATCHv5] Optab: add isnormal_optab for __builtin_isnormal

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds an optab for __builtin_isnormal. On rs6000 the normal
check can be implemented with a single instruction, but an optab is needed
so that the builtin can be expanded to that instruction sequence.

  Subsequent patches will implement the expander on rs6000.

  Compared to the previous version, the main change is to specify that the
return value of the optab must be either 0 or 1.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652865.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is this OK for trunk?

Thanks
Gui Haochen

ChangeLog
optab: Add isnormal_optab for isnormal builtin

gcc/
* builtins.cc (interclass_mathfn_icode): Set optab to isnormal_optab
for isnormal builtin.
* optabs.def (isnormal_optab): New.
* doc/md.texi (isnormal): Document.


patch.diff
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 53e9d210541..89ba56abf17 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2463,6 +2463,8 @@ interclass_mathfn_icode (tree arg, tree fndecl)
   builtin_optab = isfinite_optab;
   break;
 case BUILT_IN_ISNORMAL:
+  builtin_optab = isnormal_optab;
+  break;
 CASE_FLT_FN (BUILT_IN_FINITE):
 case BUILT_IN_FINITED32:
 case BUILT_IN_FINITED64:
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 3eb4216141e..4fd7da095fe 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8563,6 +8563,12 @@ Return 1 if operand 1 is a finite floating point number 
and 0
 otherwise.  @var{m} is a scalar floating point mode.  Operand 0
 has mode @code{SImode}, and operand 1 has mode @var{m}.

+@cindex @code{isnormal@var{m}2} instruction pattern
+@item @samp{isnormal@var{m}2}
+Return 1 if operand 1 is a normal floating point number and 0
+otherwise.  @var{m} is a scalar floating point mode.  Operand 0
+has mode @code{SImode}, and operand 1 has mode @var{m}.
+
 @end table

 @end ifset
diff --git a/gcc/optabs.def b/gcc/optabs.def
index dcd77315c2a..3c401fc0b4c 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -353,6 +353,7 @@ OPTAB_D (hypot_optab, "hypot$a3")
 OPTAB_D (ilogb_optab, "ilogb$a2")
 OPTAB_D (isinf_optab, "isinf$a2")
 OPTAB_D (isfinite_optab, "isfinite$a2")
+OPTAB_D (isnormal_optab, "isnormal$a2")
 OPTAB_D (issignaling_optab, "issignaling$a2")
 OPTAB_D (ldexp_optab, "ldexp$a3")
 OPTAB_D (log10_optab, "log10$a2")
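
To illustrate the 0-or-1 contract documented above, here is a sketch of a portable normal check on the bit pattern, assuming an IEEE 754 binary64 `double`. The function `isnormal_bits` is hypothetical and not part of the patch; it only shows what a target pattern is expected to compute.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Return 1 if X is a normal floating point number and 0 otherwise.
// Assumes IEEE 754 binary64: 1 sign bit, 11 exponent bits, 52 mantissa bits.
inline int
isnormal_bits (double x)
{
  static_assert (std::numeric_limits<double>::is_iec559
                 && sizeof (double) == sizeof (std::uint64_t),
                 "sketch assumes IEEE 754 binary64");
  std::uint64_t bits;
  std::memcpy (&bits, &x, sizeof bits);
  unsigned exponent = static_cast<unsigned> ((bits >> 52) & 0x7ff);
  // Exponent 0 means zero or subnormal; all ones means infinity or NaN.
  return (exponent != 0 && exponent != 0x7ff) ? 1 : 0;
}
```

The result is always exactly 0 or 1, matching the wording added to md.texi, so callers can use it directly as an SImode truth value.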


Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Richard Biener
On Tue, May 28, 2024 at 6:11 PM Feng Xue OS  wrote:
>
> The bbs of loop_vec_info has to be allocated via old-fashioned
> XCNEWVEC in order to receive the result from dfs_enumerate_from(),
> so bb_vec_info has to be aligned with loop_vec_info and use
> basic_block * instead of vec<basic_block>. Another reason is that
> some loop vectorizer code assumes that bbs is a pointer, for example
> when using LOOP_VINFO_BBS() to free the bbs area directly.

I think dfs_enumerate_from is fine with receiving bbs.address ()
(if you first grow the vector, of course).  There might be other code
that needs changing, sure.

> Encapsulating bbs in an array_slice might make the changed code
> more wordy, so I still chose basic_block * as its type. Updated the
> patch by removing bbs_as_vector.

The updated patch looks good to me.  Lifetime management of
the base class bbs done differently by _loop_vec_info and _bb_vec_info
is a bit ugly but it's a well isolated fact.

Thus, OK.

I do think we can turn the basic_block * back to a vec<> but this
can be done as followup if anybody has spare cycles.

Thanks,
Richard.
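
The "grow first, then pass the address" idea can be sketched with standard C++. This is an analogy under stated assumptions, not GCC code: `std::vector` stands in for GCC's `vec<>`, `resize` and `data` stand in for growing the vector and calling `address ()`, and `enumerate_into` is a hypothetical C-style enumerator in the spirit of dfs_enumerate_from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style enumerator: writes up to MAX entries into OUT and
// returns the number of entries written.
inline std::size_t
enumerate_into (int *out, std::size_t max)
{
  std::size_t n = 0;
  for (; n < max; ++n)
    out[n] = static_cast<int> (n) * 10;
  return n;
}

// Collect EXPECTED entries into a vector through the raw-pointer interface.
inline std::vector<int>
collect (std::size_t expected)
{
  std::vector<int> bbs;
  bbs.resize (expected);  // grow first, so the storage actually exists
  std::size_t n = enumerate_into (bbs.data (), bbs.size ());
  assert (n == expected);
  (void) n;               // keep builds with NDEBUG warning-free
  return bbs;
}
```

With this shape the container owns the storage, so no separate free of the raw array is needed.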

> Feng.
> 
> gcc/
> * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Move
> initialization of bbs to explicit construction code.  Adjust the
> definition of nbbs.
> (update_epilogue_loop_vinfo): Update nbbs for epilog vinfo.
> * tree-vect-patterns.cc (vect_determine_precisions): Make
> loop_vec_info and bb_vec_info share same code.
> (vect_pattern_recog): Remove duplicated vect_pattern_recog_1 loop.
> * tree-vect-slp.cc (vect_get_and_check_slp_defs): Access to bbs[0]
> via base vec_info class.
> (_bb_vec_info::_bb_vec_info): Initialize bbs and nbbs using data
> fields of input auto_vec<> bbs.
> (vect_slp_region): Use access to nbbs to replace original
> bbs.length().
> (vect_schedule_slp_node): Access to bbs[0] via base vec_info class.
> * tree-vectorizer.cc (vec_info::vec_info): Add initialization of
> bbs and nbbs.
> (vec_info::insert_seq_on_entry): Access to bbs[0] via base vec_info
> class.
> * tree-vectorizer.h (vec_info): Add new fields bbs and nbbs.
> (LOOP_VINFO_NBBS): New macro.
> (BB_VINFO_BBS): Rename BB_VINFO_BB to BB_VINFO_BBS.
> (BB_VINFO_NBBS): New macro.
> (_loop_vec_info): Remove field bbs.
> (_bb_vec_info): Rename field bbs.
> ---
>  gcc/tree-vect-loop.cc |   7 +-
>  gcc/tree-vect-patterns.cc | 142 +++---
>  gcc/tree-vect-slp.cc  |  23 +++---
>  gcc/tree-vectorizer.cc|   7 +-
>  gcc/tree-vectorizer.h |  19 +++--
>  5 files changed, 70 insertions(+), 128 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 3b94bb13a8b..04a9ac64df7 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -1028,7 +1028,6 @@ bb_in_loop_p (const_basic_block bb, const void *data)
>  _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
>: vec_info (vec_info::loop, shared),
>  loop (loop_in),
> -bbs (XCNEWVEC (basic_block, loop->num_nodes)),
>  num_itersm1 (NULL_TREE),
>  num_iters (NULL_TREE),
>  num_iters_unchanged (NULL_TREE),
> @@ -1079,8 +1078,9 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
> vec_info_shared *shared)
>   case of the loop forms we allow, a dfs order of the BBs would the same
>   as reversed postorder traversal, so we are safe.  */
>
> -  unsigned int nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
> - bbs, loop->num_nodes, loop);
> +  bbs = XCNEWVEC (basic_block, loop->num_nodes);
> +  nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p, bbs,
> +loop->num_nodes, loop);
>gcc_assert (nbbs == loop->num_nodes);
>
>for (unsigned int i = 0; i < nbbs; i++)
> @@ -11667,6 +11667,7 @@ update_epilogue_loop_vinfo (class loop *epilogue, 
> tree advance)
>
>free (LOOP_VINFO_BBS (epilogue_vinfo));
>LOOP_VINFO_BBS (epilogue_vinfo) = epilogue_bbs;
> +  LOOP_VINFO_NBBS (epilogue_vinfo) = epilogue->num_nodes;
>
>/* Advance data_reference's with the number of iterations of the previous
>   loop and its prologue.  */
> diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> index 8929e5aa7f3..88e7e34d78d 100644
> --- a/gcc/tree-vect-patterns.cc
> +++ b/gcc/tree-vect-patterns.cc
> @@ -6925,81 +6925,41 @@ vect_determine_stmt_precisions (vec_info *vinfo, 
> stmt_vec_info stmt_info)
>  void
>  vect_determine_precisions (vec_info *vinfo)
>  {
> +  basic_block *bbs = vinfo->bbs;
> +  unsigned int nbbs = vinfo->nbbs;
> +
>DUMP_VECT_SCOPE ("vect_determine_precisions");
>
> -  if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> +  for (unsigned int i = 0; i < nbbs; i++)
>  {
> -  class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  basic_block *bbs = LOOP_VINFO_BB

Re: [COMMITTED] tree-optimization/115221 - Do not invoke SCEV if it will use a different range query.

2024-05-29 Thread Richard Biener
On Tue, May 28, 2024 at 8:57 PM Andrew MacLeod  wrote:
>
> The original patch causing the PR made ranger's cache re-entrant to
> enable SCEV to use the current range_query when called from within ranger.
>
> SCEV uses the currently active range query (via get_range_query()) for
> picking up values.  fold_using_range is the general purpose stmt folder
> many components use, and it takes a range_query to use for folding.
> When propagating values in the cache, we need to ensure no new queries
> are invoked, and when the cache is propagating and calculating outgoing
> edges, it switches to a read-only range_query which uses what it knows
> about global values to come up with the best result using the current state.
>
> SCEV is unaware of what the caller is using for a range_query, so when
> attempting to fold a PHI node, it re-invokes the current query
> during propagation, which is undesired behavior.  This patch tells
> fold_using_range to not use SCEV if the range_query being used is not
> the same as the one SCEV is going to use.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Can we dump a hint to an active dump-file if this happens?  I suppose it's
an unwanted situation, like the pass not setting the active ranger?  Sth
like

   if (src.query () != get_range_query (cfun)
       && dump_file)
     fprintf (dump_file, "Using a range query different from the "
              "installed one\n");

(or better wording).

Btw, could we install src.query () as the global range query around the
relevant recursion or is the place around where we'd need to do this
not so clear-cut?

Richard.
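
A self-contained sketch of the guard plus dump hint discussed here, using stand-in types rather than ranger's real classes. `range_query_stub`, `installed_query` and `may_use_helper` are hypothetical names chosen for illustration; the real check compares `src.query ()` against `get_range_query (cfun)`.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in for a range query object; only its identity matters here.
struct range_query_stub
{
  int id;
};

// Stand-in for the globally installed query (get_range_query analogue).
static range_query_stub *installed_query = nullptr;

// Return true if the helper (the SCEV analogue) may be used, i.e. the
// caller's query is the installed one.  Otherwise optionally leave a hint
// in the dump file and return false.
inline bool
may_use_helper (const range_query_stub *src_query, std::FILE *dump_file)
{
  if (src_query == installed_query)
    return true;
  if (dump_file)
    std::fprintf (dump_file,
                  "Using a range query different from the installed one\n");
  return false;
}
```

The hint is only written when a dump file is active, so the normal path stays silent.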

> Andrew


Re: [PATCH v2 2/2] Prevent divide-by-zero

2024-05-29 Thread Richard Biener
On Wed, May 29, 2024 at 1:39 AM Patrick O'Neill  wrote:
>
> From: Greg McGary 
>
> gcc/ChangeLog:
> * gcc/tree-vect-stmts.cc (vectorizable_load): Prevent
> divide-by-zero.
> * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove xfail.
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
>  gcc/tree-vect-stmts.cc  | 3 ++-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> index fd996a27501..79d03612a22 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
> @@ -1,6 +1,5 @@
>  /* { dg-do compile } */
>  /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
> -mno-autovec-segment" } */
> -/* { xfail *-*-* } */
>
>  enum e { c, d };
>  enum g { f };
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 4219ad832db..34f5736ba00 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
>  - (vec_num * j + i) * nunits);
> /* remain should now be > 0 and < nunits.  */

^^^

> unsigned num;
> -   if (constant_multiple_p (nunits, remain, &num))
> +   if (known_gt (remain, 0)

So this shouldn't happen.  Do you have a testcase where this triggers?
If < nunits doesn't hold things will also go wrong.

Richard.


> +   && constant_multiple_p (nunits, remain, &num))
>   {
> tree ptype;
> new_vtype
> --
> 2.43.2
>
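
The shape of the proposed guard can be shown with plain unsigned integers. This is only a scalar analogy: the real `known_gt` and `constant_multiple_p` operate on poly_int values, and `multiple_of` and `guarded_multiple_of` below are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical scalar analogue of constant_multiple_p: return true and set
// *N if A == B * N for some integer N.  The caller must ensure B != 0,
// because the modulo below is undefined for a zero divisor.
inline bool
multiple_of (unsigned a, unsigned b, unsigned *n)
{
  if (a % b != 0)
    return false;
  *n = a / b;
  return true;
}

// Guarded form mirroring the patch: test that REMAIN is positive first, so
// the short-circuit && never reaches the division with a zero divisor.
inline bool
guarded_multiple_of (unsigned nunits, unsigned remain, unsigned *num)
{
  return remain > 0 && multiple_of (nunits, remain, num);
}
```

As the review points out, a zero `remain` should not occur if the invariant in the comment holds, so the guard hides the symptom rather than explaining how the invariant was broken.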


Re: [Patch, PR Fortran/90069] Polymorphic Return Type Memory Leak Without Intermediate Variable

2024-05-29 Thread Andre Vehreschild
Hi Harald,

thanks for the review. Very much appreciated.

Committed as 2f97d98d174e3ef9f3a9a83c179d787abde5e066.

I have some patches for memory leaks that I will post in the next few days. I am
inclined to backport them together to the 14 branch if no new bugs arise.

About the SAVE_EXPR, Richard Biener shed some light. Thank you very much for
that.

Regards,
Andre

On Tue, 28 May 2024 21:45:56 +0200
Harald Anlauf  wrote:

> Hi Andre,
>
> On 5/28/24 14:10, Andre Vehreschild wrote:
> > Hi all,
> >
> > the attached patch fixes a memory leak with unlimited polymorphic return
> > types. The leak occurred because an expression with side effects was
> > evaluated twice. I have replaced the check for non-variable expressions,
> > which was followed by creating a SAVE_EXPR, with a check for trees with
> > side effects that creates a temporary variable and frees the memory.
>
> this looks good to me.  It also solves the runtime memory leak in
> testcase pr114012.f90 .  Nice!
>
> > Btw, I do not get the SAVE_EXPR in the old code. Is there something missing
> > to manifest it or is a SAVE_EXPR not meant to be evaluated twice?
>
> I was assuming that the comment in gcc/tree.h applies here:
>
> /* save_expr (EXP) returns an expression equivalent to EXP
> but it can be used multiple times within context CTX
> and only evaluate EXP once.  */
>
> I do not know what the practical difference between a SAVE_EXPR
> and a temporary explicitly evaluated once (which you have now)
> is, except that you can free the temporary cleanly.
>
> > Anyway, regtested ok on Linux-x86_64-Fedora_39. Ok for master?
>
> Yes, this is fine from my side.  If you are inclined to backport
> to e.g. 14-branch after a grace period, that would be great.
>
> > This work is funded by the Sovereign Tech Fund. Yes, the funding has been
> > granted and Nicolas, Mikael and I will be working on some Fortran topics in
> > the next 12-18 months.
>
> This is really great news!
>
> > Regards,
> > Andre
>
> Thanks for the patch!
>
> Harald
>
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>


--
Andre Vehreschild * Email: vehre ad gmx dot de
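
The leak pattern described in this thread, an expression with side effects evaluated twice, can be sketched without any Fortran or GCC internals. Everything below is hypothetical illustration: `acquire` and `release` only count live resources instead of allocating memory, so the "leak" is visible in the counter `live`.

```cpp
#include <cassert>

// Counter of resources that were acquired but not yet released.
static int live = 0;

struct counted_resource
{
  int value;
};

// Expression with a side effect: every evaluation creates a new resource.
inline counted_resource
acquire (int v)
{
  ++live;
  return counted_resource { v };
}

inline void
release (const counted_resource &)
{
  --live;
}

// Leaky shape: the acquiring expression is evaluated twice, but only the
// second result is released.
inline int
use_twice_leaky ()
{
  int a = acquire (21).value;        // first evaluation, never released
  counted_resource r = acquire (21); // second evaluation
  int b = r.value;
  release (r);
  return a + b;
}

// Fixed shape: evaluate once into a temporary, use it twice, release it.
inline int
use_twice_with_temp ()
{
  counted_resource tmp = acquire (21);
  int sum = tmp.value + tmp.value;
  release (tmp);
  return sum;
}
```

Both functions return the same value; only the second leaves the resource count balanced, which corresponds to the explicit temporary that can be freed cleanly.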


Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Feng Xue OS
Ok. Then I will add a TODO comment on "bbs" field to describe it.

Thanks,
Feng




Re: [PATCH] vect: Unify bbs in loop_vec_info and bb_vec_info

2024-05-29 Thread Richard Biener
On Wed, May 29, 2024 at 10:39 AM Feng Xue OS
 wrote:
>
> Ok. Then I will add a TODO comment on "bbs" field to describe it.

Fine with me.

Thanks,
Richard.

> Thanks,
> Feng
>

Re: [PATCH] i386: Fix ix86_option override after change [PR 113719]

2024-05-29 Thread Hongtao Liu
On Thu, May 16, 2024 at 5:15 PM Hongyu Wang  wrote:
>
> Richard Biener wrote on Thu, May 16, 2024 at 15:05:
>
> >
> > On Thu, May 16, 2024 at 8:25 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > In ix86_override_options_after_change, calls to ix86_default_align
> > > and ix86_recompute_optlev_based_flags will cause mismatched target
> > > opt_set when doing cl_optimization_restore. Move them back to
> > > ix86_option_override_internal to solve the issue.
> > >
> > > Bootstrapped & regtested on x86_64-pc-linux-gnu, and Rainer helped to
> > > test with i386-pc-solaris2.11 which also passed 32/64bit tests.
> >
> > Since this is a tricky area apparently without too much test coverage can
> > we have a testcase for this?
>
> This is a fix for my previous change for PR 107692, which moved these 2
> functions to ix86_override_options_after_change and caused the
> PR113719 regression. The PR103696 test is the one that exposes the
> issue. The previous change causes these 2 functions to be
> called in cl_optimization_restore,
> which is redundant and incorrect. I cannot find another test that exposes
> other functional regressions.
>
> >
> > > Ok for trunk and backport down to gcc12?
Ok.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/113719
> > > * config/i386/i386-options.cc 
> > > (ix86_override_options_after_change):
> > > Remove call to ix86_default_align and
> > > ix86_recompute_optlev_based_flags.
> > > (ix86_option_override_internal): Call ix86_default_align and
> > > ix86_recompute_optlev_based_flags.
> > > ---
> > >  gcc/config/i386/i386-options.cc | 10 +-
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-options.cc 
> > > b/gcc/config/i386/i386-options.cc
> > > index ac48b5c61c4..d97464f2c74 100644
> > > --- a/gcc/config/i386/i386-options.cc
> > > +++ b/gcc/config/i386/i386-options.cc
> > > @@ -1930,11 +1930,6 @@ ix86_recompute_optlev_based_flags (struct 
> > > gcc_options *opts,
> > >  void
> > >  ix86_override_options_after_change (void)
> > >  {
> > > -  /* Default align_* from the processor table.  */
> > > -  ix86_default_align (&global_options);
> > > -
> > > -  ix86_recompute_optlev_based_flags (&global_options, 
> > > &global_options_set);
> > > -
> > >/* Disable unrolling small loops when there's explicit
> > >   -f{,no}unroll-loop.  */
> > >if ((OPTION_SET_P (flag_unroll_loops))
> > > @@ -2530,6 +2525,8 @@ ix86_option_override_internal (bool main_args_p,
> > >
> > >set_ix86_tune_features (opts, ix86_tune, opts->x_ix86_dump_tunes);
> > >
> > > +  ix86_recompute_optlev_based_flags (opts, opts_set);
> > > +
> > >ix86_override_options_after_change ();
> > >
> > >ix86_tune_cost = processor_cost_table[ix86_tune];
> > > @@ -2565,6 +2562,9 @@ ix86_option_override_internal (bool main_args_p,
> > >|| TARGET_64BIT_P (opts->x_ix86_isa_flags))
> > >  opts->x_ix86_regparm = REGPARM_MAX;
> > >
> > > +  /* Default align_* from the processor table.  */
> > > +  ix86_default_align (&global_options);
> > > +
> > >/* Provide default for -mbranch-cost= value.  */
> > >SET_OPTION_IF_UNSET (opts, opts_set, ix86_branch_cost,
> > >ix86_tune_cost->branch_cost);
> > > --
> > > 2.31.1
> > >



-- 
BR,
Hongtao


[PATCH 2/3 v2] vect: Support v4hi -> v4qi.

2024-05-29 Thread Hu, Lin1
Apart from adding TARGET_MMX_WITH_SSE, I merged the two patterns.

BRs,
Lin
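
For readers unfamiliar with the conversion being vectorized, here is a small sketch of the v4hi -> v4qi truncation written with the GNU vector extensions, in the same style as the testcase functions in this patch. It assumes a compiler that provides `__builtin_convertvector` (GCC and Clang do); the typedefs and `trunc_v4hi_v4qi` are local to this example. Whether a single `vpmovwb` is emitted depends on the target flags and is not checked here.

```cpp
#include <cassert>

// Four 16-bit lanes (8 bytes) and four 8-bit lanes (4 bytes).
typedef short v4hi __attribute__ ((vector_size (8)));
typedef signed char v4qi __attribute__ ((vector_size (4)));

// Element-wise truncation: each 16-bit lane is converted to an 8-bit lane,
// keeping the low 8 bits of the value.
inline v4qi
trunc_v4hi_v4qi (v4hi a)
{
  return __builtin_convertvector (a, v4qi);
}
```

Each lane is converted independently, so a value such as 0x101 becomes 1 in the narrower lane.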

gcc/ChangeLog:

PR target/107432
* config/i386/mmx.md
(VI2_32_64): New mode iterator.
(mmxhalfmode): New mode attr.
(mmxhalfmodelower): Ditto.
(truncv2hiv2qi2): Extend mode v4hi and change name from
truncv2hiv2qi to trunc<mode><mmxhalfmodelower>2.

gcc/testsuite/ChangeLog:

PR target/107432
* gcc.target/i386/pr107432-1.c: Modify test.
* gcc.target/i386/pr107432-6.c: Add test.
---
 gcc/config/i386/mmx.md | 17 +
 gcc/testsuite/gcc.target/i386/pr107432-1.c | 13 -
 gcc/testsuite/gcc.target/i386/pr107432-6.c | 19 ---
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5f342497885..27b080bfeb6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -67,6 +67,9 @@ (define_mode_iterator V2F_32 [V2HF V2BF])
 ;; 4-byte integer vector modes
 (define_mode_iterator VI_32 [V4QI V2HI])
 
+;; 8-byte and 4-byte HImode vector modes
+(define_mode_iterator VI2_32_64 [(V4HI "TARGET_MMX_WITH_SSE") V2HI])
+
 ;; 4-byte and 2-byte integer vector modes
 (define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
 
@@ -106,6 +109,12 @@ (define_mode_attr mmxinsnmode
 (define_mode_attr mmxdoublemode
   [(V8QI "V8HI") (V4HI "V4SI")])
 
+(define_mode_attr mmxhalfmode
+  [(V4HI "V4QI") (V2HI "V2QI")])
+
+(define_mode_attr mmxhalfmodelower
+  [(V4HI "v4qi") (V2HI "v2qi")])
+
 ;; Mapping of vector float modes to an integer mode of the same size
 (define_mode_attr mmxintvecmode
   [(V2SF "V2SI") (V2SI "V2SI") (V4HI "V4HI") (V8QI "V8QI")
@@ -4873,10 +4882,10 @@ (define_expand "v2qiv2hi2"
   DONE;
 })
 
-(define_insn "truncv2hiv2qi2"
-  [(set (match_operand:V2QI 0 "register_operand" "=v")
-   (truncate:V2QI
- (match_operand:V2HI 1 "register_operand" "v")))]
+(define_insn "trunc<mode><mmxhalfmodelower>2"
+  [(set (match_operand:<mmxhalfmode> 0 "register_operand" "=v")
+   (truncate:<mmxhalfmode>
+ (match_operand:VI2_32_64 1 "register_operand" "v")))]
   "TARGET_AVX512VL && TARGET_AVX512BW"
   "vpmovwb\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
b/gcc/testsuite/gcc.target/i386/pr107432-1.c
index a4f37447eb4..afdf367afe2 100644
--- a/gcc/testsuite/gcc.target/i386/pr107432-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
@@ -7,7 +7,8 @@
 /* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 8 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 10 { target { ! ia32 } } } } */
 
 #include 
 
@@ -113,6 +114,11 @@ __v2qi mm32_cvtepi16_epi8_builtin_convertvector(__v2hi 
a)
   return __builtin_convertvector((__v2hi)a, __v2qi);
 }
 
+__v4qi mm64_cvtepi16_epi8_builtin_convertvector(__v4hi a)
+{
+  return __builtin_convertvector((__v4hi)a, __v4qi);
+}
+
 __v8qi mm_cvtepi16_epi8_builtin_convertvector(__m128i a)
 {
   return __builtin_convertvector((__v8hi)a, __v8qi);
@@ -218,6 +224,11 @@ __v2qu mm32_cvtepu16_epu8_builtin_convertvector(__v2hu 
a)
   return __builtin_convertvector((__v2hu)a, __v2qu);
 }
 
+__v4qu mm64_cvtepu16_epu8_builtin_convertvector(__v4hu a)
+{
+  return __builtin_convertvector((__v4hu)a, __v4qu);
+}
+
 __v8qu mm_cvtepu16_epu8_builtin_convertvector(__m128i a)
 {
   return __builtin_convertvector((__v8hu)a, __v8qu);
diff --git a/gcc/testsuite/gcc.target/i386/pr107432-6.c 
b/gcc/testsuite/gcc.target/i386/pr107432-6.c
index 4a68a10b089..7d3717d45bc 100644
--- a/gcc/testsuite/gcc.target/i386/pr107432-6.c
+++ b/gcc/testsuite/gcc.target/i386/pr107432-6.c
@@ -8,11 +8,14 @@
 /* { dg-final { scan-assembler-times "vcvttps2dq" 4 { target { ! ia32 } } } } 
*/
 /* { dg-final { scan-assembler-times "vcvttps2udq" 3 { target { ia32 } } } } */
 /* { dg-final { scan-assembler-times "vcvttps2udq" 4 { target { ! ia32 } } } } 
*/
-/* { dg-final { scan-assembler-times "vcvttph2w" 4 } } */
-/* { dg-final { scan-assembler-times "vcvttph2uw" 4 } } */
+/* { dg-final { scan-assembler-times "vcvttph2w" 4 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vcvttph2w" 5 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw" 4 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vcvttph2uw" 5 { target { ! ia32 } } } } 
*/
 /* { dg-final { scan-assembler-times "vpmovdb" 10 { target { ia32 } } } } */
 /* { dg-final { scan-assembler-times "vpmovdb" 14 { target { ! ia32 } } } } */
-/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 8 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "vpmovwb" 10 { target { ! ia32 } } } } */
 
 #include 
 
@@ -103,6 +106,11 @@ __v2qi mm32_cvtph_epi8_builtin

Re: [PATCH 2/3 v2] vect: Support v4hi -> v4qi.

2024-05-29 Thread Hongtao Liu
On Wed, May 29, 2024 at 4:56 PM Hu, Lin1  wrote:
>
> Exclude add TARGET_MMX_WITH_SSE, I merge two patterns.
Ok.
>
> BRs,
> Lin
>
> gcc/ChangeLog:
>
> PR target/107432
> * config/i386/mmx.md
> (VI2_32_64): New mode iterator.
> (mmxhalfmode): New mode attr.
> (mmxhalfmodelower): Ditto.
> (truncv2hiv2qi2): Extend to mode v4hi and rename from
> truncv2hiv2qi2 to trunc<mode><mmxhalfmodelower>2.
>
> gcc/testsuite/ChangeLog:
>
> PR target/107432
> * gcc.target/i386/pr107432-1.c: Modify test.
> * gcc.target/i386/pr107432-6.c: Add test.
> ---
>  gcc/config/i386/mmx.md | 17 +
>  gcc/testsuite/gcc.target/i386/pr107432-1.c | 13 -
>  gcc/testsuite/gcc.target/i386/pr107432-6.c | 19 ---
>  3 files changed, 41 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 5f342497885..27b080bfeb6 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -67,6 +67,9 @@ (define_mode_iterator V2F_32 [V2HF V2BF])
>  ;; 4-byte integer vector modes
>  (define_mode_iterator VI_32 [V4QI V2HI])
>
> +;; 8-byte and 4-byte HImode vector modes
> +(define_mode_iterator VI2_32_64 [(V4HI "TARGET_MMX_WITH_SSE") V2HI])
> +
>  ;; 4-byte and 2-byte integer vector modes
>  (define_mode_iterator VI_16_32 [V4QI V2QI V2HI])
>
> @@ -106,6 +109,12 @@ (define_mode_attr mmxinsnmode
>  (define_mode_attr mmxdoublemode
>[(V8QI "V8HI") (V4HI "V4SI")])
>
> +(define_mode_attr mmxhalfmode
> +  [(V4HI "V4QI") (V2HI "V2QI")])
> +
> +(define_mode_attr mmxhalfmodelower
> +  [(V4HI "v4qi") (V2HI "v2qi")])
> +
>  ;; Mapping of vector float modes to an integer mode of the same size
>  (define_mode_attr mmxintvecmode
>[(V2SF "V2SI") (V2SI "V2SI") (V4HI "V4HI") (V8QI "V8QI")
> @@ -4873,10 +4882,10 @@ (define_expand "v2qiv2hi2"
>DONE;
>  })
>
> -(define_insn "truncv2hiv2qi2"
> -  [(set (match_operand:V2QI 0 "register_operand" "=v")
> -   (truncate:V2QI
> - (match_operand:V2HI 1 "register_operand" "v")))]
> +(define_insn "trunc<mode><mmxhalfmodelower>2"
> +  [(set (match_operand:<mmxhalfmode> 0 "register_operand" "=v")
> +   (truncate:<mmxhalfmode>
> + (match_operand:VI2_32_64 1 "register_operand" "v")))]
>"TARGET_AVX512VL && TARGET_AVX512BW"
>"vpmovwb\t{%1, %0|%0, %1}"
>[(set_attr "type" "ssemov")
> diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
> b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> index a4f37447eb4..afdf367afe2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr107432-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> @@ -7,7 +7,8 @@
>  /* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
>  /* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
>  /* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
> -/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
> +/* { dg-final { scan-assembler-times "vpmovwb" 8 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovwb" 10 { target { ! ia32 } } } } 
> */
>
>  #include 
>
> @@ -113,6 +114,11 @@ __v2qi 
> mm32_cvtepi16_epi8_builtin_convertvector(__v2hi a)
>return __builtin_convertvector((__v2hi)a, __v2qi);
>  }
>
> +__v4qi mm64_cvtepi16_epi8_builtin_convertvector(__v4hi a)
> +{
> +  return __builtin_convertvector((__v4hi)a, __v4qi);
> +}
> +
>  __v8qi mm_cvtepi16_epi8_builtin_convertvector(__m128i a)
>  {
>return __builtin_convertvector((__v8hi)a, __v8qi);
> @@ -218,6 +224,11 @@ __v2qu 
> mm32_cvtepu16_epu8_builtin_convertvector(__v2hu a)
>return __builtin_convertvector((__v2hu)a, __v2qu);
>  }
>
> +__v4qu mm64_cvtepu16_epu8_builtin_convertvector(__v4hu a)
> +{
> +  return __builtin_convertvector((__v4hu)a, __v4qu);
> +}
> +
>  __v8qu mm_cvtepu16_epu8_builtin_convertvector(__m128i a)
>  {
>return __builtin_convertvector((__v8hu)a, __v8qu);
> diff --git a/gcc/testsuite/gcc.target/i386/pr107432-6.c 
> b/gcc/testsuite/gcc.target/i386/pr107432-6.c
> index 4a68a10b089..7d3717d45bc 100644
> --- a/gcc/testsuite/gcc.target/i386/pr107432-6.c
> +++ b/gcc/testsuite/gcc.target/i386/pr107432-6.c
> @@ -8,11 +8,14 @@
>  /* { dg-final { scan-assembler-times "vcvttps2dq" 4 { target { ! ia32 } } } 
> } */
>  /* { dg-final { scan-assembler-times "vcvttps2udq" 3 { target { ia32 } } } } 
> */
>  /* { dg-final { scan-assembler-times "vcvttps2udq" 4 { target { ! ia32 } } } 
> } */
> -/* { dg-final { scan-assembler-times "vcvttph2w" 4 } } */
> -/* { dg-final { scan-assembler-times "vcvttph2uw" 4 } } */
> +/* { dg-final { scan-assembler-times "vcvttph2w" 4 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vcvttph2w" 5 { target { ! ia32 } } } } 
> */
> +/* { dg-final { scan-assembler-times "vcvttph2uw" 4 { target { ia32 } } } } 
> */
> +/* { dg-final { scan-assembler-times "vcvttph2uw" 5 { target { ! ia32 } } } 
> } */
>  /* { dg-final { scan-assembler-times "vpmovdb" 10 { target { ia32 } } } } */
>  /* { dg-final { scan-assembler-times "vpmovdb" 14 { target {

[PATCH 3/3 v2] vect: support direct conversion under x86-64-v3.

2024-05-29 Thread Hu, Lin1
Following Hongtao's suggestion, I added support for some truncations in
mmx.md under x86-64-v3 and optimized ix86_expand_trunc_with_avx2_noavx512f.

BRs,
Lin
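
To make the idea behind ix86_expand_trunc_with_avx2_noavx512f easier to follow, here is a minimal host-side sketch of the permutation it sets up, using the v4di -> v4si case named in the function comment. It is not the RTL expander: `build_trunc_perm` and `trunc_v4di_to_v4si` are hypothetical names, plain arrays stand in for vector registers, and the byte-level demonstration assumes a little-endian host (as on x86).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the loop that fills d.perm in the patch.
//   nelt          elements in the intermediate (cvt_mode) vector
//   out_nunits    elements in the narrow result
//   in_innersize  size in bytes of one input element
//   out_innersize size in bytes of one output element
void build_trunc_perm (unsigned perm[], unsigned nelt, unsigned out_nunits,
                       unsigned in_innersize, unsigned out_innersize)
{
  assert (out_nunits <= nelt);
  assert (out_innersize > 0 && in_innersize % out_innersize == 0);
  const unsigned stride = in_innersize / out_innersize;
  // The first out_nunits slots pick the low part of each wide element;
  // the remaining slots keep their identity index ("don't care" lanes).
  for (unsigned i = 0; i < nelt; ++i)
    perm[i] = (i < out_nunits) ? i * stride : i;
}

// v4di -> v4si: view the 4 x 64-bit input as 8 x 32-bit, permute,
// then take the low half of the permuted vector as the result.
void trunc_v4di_to_v4si (std::uint32_t out[4], const std::uint64_t in[4])
{
  std::uint32_t view[8];
  std::uint32_t shuffled[8];
  unsigned perm[8];

  std::memcpy (view, in, sizeof view);   // 32 bytes, like a lowpart subreg
  build_trunc_perm (perm, 8, 4, 8, 4);   // yields 0,2,4,6,4,5,6,7
  for (unsigned i = 0; i < 8; ++i)
    shuffled[i] = view[perm[i]];
  std::memcpy (out, shuffled, 4 * sizeof (std::uint32_t));
}
```

In the patch itself the permutation is handed to ix86_expand_vec_perm_const_1, which chooses the actual shuffle instructions; the sketch only illustrates why the chosen indices implement a truncation.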

gcc/ChangeLog:

PR 107432
* config/i386/i386-expand.cc (ix86_expand_trunc_with_avx2_noavx512f):
New function to generate a series of suitable insns.
* config/i386/i386-protos.h (ix86_expand_trunc_with_avx2_noavx512f):
Define new function.
* config/i386/sse.md: Extend trunc2 for x86-64-v3.
(ssebytemode): Add V8HI.
(PMOV_DST_MODE_2_AVX2): New mode iterator.
(PMOV_SRC_MODE_3_AVX2): Ditto.
* config/i386/mmx.md
(trunc2): Ditto.
(avx512vl_trunc2): Ditto.
(truncv2si2): Ditto.
(avx512vl_truncv2si2): Ditto.
(mmxbytemode): New mode attr.

gcc/testsuite/ChangeLog:

PR 107432
* gcc.target/i386/pr107432-8.c: New test.
* gcc.target/i386/pr107432-9.c: Ditto.
* gcc.target/i386/pr92645-4.c: Modify test.
---
 gcc/config/i386/i386-expand.cc |  44 ++-
 gcc/config/i386/i386-protos.h  |   3 +
 gcc/config/i386/mmx.md |  35 +-
 gcc/config/i386/sse.md |  88 ++
 gcc/testsuite/gcc.target/i386/pr107432-8.c |  94 +++
 gcc/testsuite/gcc.target/i386/pr107432-9.c | 129 +
 gcc/testsuite/gcc.target/i386/pr92645-4.c  |   2 -
 7 files changed, 363 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-9.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 2f27bfb484c..90705803d29 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -1896,10 +1896,6 @@ ix86_split_convert_uns_si_sse (rtx operands[])
   emit_insn (gen_xorv4si3 (value, value, large));
 }
 
-static bool ix86_expand_vector_init_one_nonzero (bool mmx_ok,
-machine_mode mode, rtx target,
-rtx var, int one_var);
-
 /* Convert an unsigned DImode value into a DFmode, using only SSE.
Expects the 64-bit DImode to be supplied in a pair of integral
registers.  Requires SSE2; will use SSE3 if available.  For x86_32,
@@ -16418,7 +16414,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
whose ONE_VAR element is VAR, and other elements are zero.  Return true
if successful.  */
 
-static bool
+bool
 ix86_expand_vector_init_one_nonzero (bool mmx_ok, machine_mode mode,
 rtx target, rtx var, int one_var)
 {
@@ -25551,4 +25547,42 @@ ix86_expand_fast_convert_bf_to_sf (rtx val)
   return ret;
 }
 
+/* Truncate a vector to a narrower vector, e.g. v4di -> v4si.  */
+
+void
+ix86_expand_trunc_with_avx2_noavx512f (rtx output, rtx input, machine_mode 
cvt_mode)
+{
+  machine_mode out_mode = GET_MODE (output);
+  machine_mode in_mode = GET_MODE (input);
+  int len = GET_MODE_SIZE (in_mode);
+  gcc_assert (len == GET_MODE_SIZE (cvt_mode)
+ && GET_MODE_INNER (out_mode) == GET_MODE_INNER (cvt_mode)
+ && (REG_P (input) || SUBREG_P (input)));
+  scalar_mode inner_out_mode = GET_MODE_INNER (out_mode);
+  int in_innersize = GET_MODE_SIZE (GET_MODE_INNER (in_mode));
+  int out_innersize = GET_MODE_SIZE (inner_out_mode);
+
+  struct expand_vec_perm_d d;
+  d.target = gen_reg_rtx (cvt_mode);
+  d.op0 = lowpart_subreg (cvt_mode, force_reg(in_mode, input), in_mode);
+  d.op1 = d.op0;
+  d.vmode = cvt_mode;
+  d.nelt = GET_MODE_NUNITS (cvt_mode);
+  d.testing_p = false;
+  d.one_operand_p = true;
+
+  /* Init perm. Put the needed bits of input in order and
+ fill the rest of bits by default.  */
+  for (int i = 0; i < d.nelt; ++i)
+{
+  d.perm[i] = i;
+  if (i < GET_MODE_NUNITS (out_mode))
+   d.perm[i] = i * (in_innersize / out_innersize);
+}
+
+  bool ok = ix86_expand_vec_perm_const_1(&d);
+  gcc_assert (ok);
+  emit_move_insn (output, gen_lowpart (out_mode, d.target));
+}
+
 #include "gt-i386-expand.h"
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index dbc861fb1ea..aa826f4864f 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -242,6 +242,7 @@ extern void ix86_expand_atomic_fetch_op_loop (rtx, rtx, 
rtx, enum rtx_code,
 extern void ix86_expand_cmpxchg_loop (rtx *, rtx, rtx, rtx, rtx, rtx,
  bool, rtx_code_label *);
 extern rtx ix86_expand_fast_convert_bf_to_sf (rtx);
+extern void ix86_expand_trunc_with_avx2_noavx512f (rtx, rtx, machine_mode);
 extern rtx ix86_memtag_untagged_pointer (rtx, rtx);
 extern bool ix86_memtag_can_tag_addresses (void);
 
@@ -288,6 +289,8 @@ extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);
 extern void ix86_expand_sse2_abs (rtx, rtx);
 extern bool ix86_expand_vector_init_duplicate (bool, machine_mode, rtx,

Re: [PATCH 1/5] Do single-lane SLP discovery for reductions

2024-05-29 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 24 May 2024, Richard Biener wrote:
>
>> This is the second merge proposed from the SLP vectorizer branch.
>> I have again managed without adding and using --param vect-single-lane-slp
>> but instead this provides always enabled functionality.
>> 
>> This makes us use SLP reductions (a group of reductions) for the
>> case where the group size is one.  This basically means we try
>> to use SLP for all reductions.
>> 
>> I've kept the series close to changes how they are on the branch
>> but in the end I'll squash it, having separate commits for review
>> eventually helps identifying common issues we will run into.  In
>> particular we lack full SLP support for several reduction kinds
>> and the branch has more enabling patches than in this series.
>> For example 4/5 makes sure we use shifts and direct opcode
>> reductions in the reduction epilog for SLP reductions but doesn't
>> bother to try covering the general case but enables it only
>> for the single-element group case to avoid regressions
>> in gcc.dg/vect/reduc-{mul,or}_[12].c testcases.
>> 
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
>> successfully built SPEC CPU 2017.  This posting should trigger
>> arm & riscv pre-checkin CI.
>> 
>> There's one ICE in gcc.target/i386/pr51235.c I discovered late
>> that I will investigate and address after the weekend.
>
> I've fixed this now.
>
> On aarch64 and arm there's
>
> FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "VEC_PERM_EXPR" 
> 0
>
> which is a testism, I _think_ due to a bogus vect_load_lanes check
> in that line.  The code is as expected not using a SLP reduction of
> two lanes due to the widen-sum pattern used.  It might be that we
> somehow fail to use load-lanes when vectorizing the load with SLP
> which means that for SLP reductions we fail to consider
> load-lanes as override.  I think we should leave this FAIL, we need to
> work to get load-lanes vectorization from SLP anyway.  To fix this
> the load-permutation followup I have in the works will be necessary.

Sounds good to me FWIW.

> I also see
>
> FAIL: gcc.target/aarch64/sve/dot_1.c scan-assembler-times \\twhilelo\\t 8
> FAIL: gcc.target/aarch64/sve/reduc_4.c scan-assembler-not \\tfadd\\t
> FAIL: gcc.target/aarch64/sve/sad_1.c scan-assembler-times 
> \\tudot\\tz[0-9]+\\.s, z[0-9]+\\.b, z[0-9]+\\.b\\n 2
>
> but scan-assemblers are not my favorite.  For example dot_1.c has
> twice as many whilelo, but I'm not sure what goes wrong.
>
> There are quite some regressions reported for RISC-V, I looked at the
> ICEs and fixed them but I did not investigate any of the assembly
> scanning FAILs.
>
> I'll re-spin the series with the fixes tomorrow.
> If anybody wants to point out something I should investigate please
> speak up.

Thanks for checking the aarch64 results.  I'll look at the three SVE
failures once the patch is in.  Many of the tests are supposed to ensure
that we generate correct code for a given set of choices.  Sometimes
it's necessary to update the flags to retain the same set of choices,
e.g. due to costing changes or general vectoriser improvements.

That is, the point of these tests isn't necessarily to make sure that we
get the "best" SVE code for the source -- especially since there isn't
really an abstract, objective "best" that applies to all targets.
The tests are instead recognising that we have multiple techniques for
doing some things, and are trying to make sure that each of those
techniques works individually.

I realise that kind of test isn't popular with everyone.  The quid
pro quo is that we (AArch64 folks) get to look at the tests when
failures show up :)

Richard

>
> Thanks,
> Richard.
>
>> This change should be more straight-forward than the previous one,
>> still comments are of course welcome.  After pushed I will followup
>> with changes to enable single-lane SLP reductions for various
>> COND_EXPR reductions as well as double-reduction support and
>> in-order reduction support (also all restricted to single-lane
>> for the moment).
>>
>> Thanks,
>> Richard.
>> 
>> --
>> 
>> The following performs single-lane SLP discovery for reductions.
>> This exposes a latent issue with reduction SLP in outer loop
>> vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
>> execution.
>> 
>>  * tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
>>  discoveries are reduction chains and need special backedge
>>  treatment.
>>  (vect_analyze_slp): Fall back to single-lane SLP discovery
>>  for reductions.  Make sure to try single-lane SLP reduction
>>  for all reductions as fallback.
>> ---
>>  gcc/tree-vect-slp.cc | 71 +---
>>  1 file changed, 54 insertions(+), 17 deletions(-)
>> 
>> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
>> index c7ed520b629..73cc69d85ce 100644
>> --- a/gcc/tree-vect-slp.cc
>> +++ b/gcc/tree-vect-slp.cc
>> @@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (v

Re: [PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-29 Thread Richard Sandiford
Richard Biener  writes:
> Code generation for contiguous load vectorization can already deal
> with generalized avoidance of loading from a gap.  The following
> extends detection of peeling for gaps requirement with that,
> gets rid of the old special casing of a half load and makes sure
> when we do access the gap we have peeling for gaps enabled.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> This is the first patch in a series to improve peeling for gaps,
> it turned out into an improvement for code rather than just doing
> the (delayed from stage3) removal of the "old" half-vector codepath.
>
> I'll wait for the pre-CI testing for pushing so you also have time
> for some comments.

LGTM FWIW (some trivia below).

Out of interest, how far are we off being able to load:

a[i*8+0]
a[i*8+1]
a[i*8+3]
a[i*8+4]

as two half vectors?  It doesn't look like we're quite there yet,
but I might have misread.
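
To make the access pattern concrete, a scalar loop with exactly those four loads per iteration could look like the sketch below. The function name and the packed output layout are made up for illustration; the point is only that each group of 8 elements is read at offsets 0, 1, 3 and 4, leaving a gap at 2 and a trailing gap at 5..7.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: for each group of 8 ints, copy elements
// 0, 1, 3 and 4 into a packed output of 4 ints per group.
void gather_gapped (int *out, const int *a, std::size_t n_groups)
{
  assert (out != 0 && a != 0);
  for (std::size_t i = 0; i < n_groups; ++i)
    {
      out[i * 4 + 0] = a[i * 8 + 0];
      out[i * 4 + 1] = a[i * 8 + 1];   // first pair: a[i*8+0..1]
      out[i * 4 + 2] = a[i * 8 + 3];
      out[i * 4 + 3] = a[i * 8 + 4];   // second pair: a[i*8+3..4]
    }
}
```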

It would be nice if we could eventually integrate the overrun_p checks
with the vectorizable_load code that the code is trying to predict.
E.g. we could run through the vectorizable_load code during the
analysis phase and record overruns, similarly to Kewen's costing
patches.  As it stands, it seems difficult to make sure that the two
checks are exactly in sync, especially when the structure is so
different.

> Richard.
>
>   PR tree-optimization/115252
>   * tree-vect-stmts.cc (get_group_load_store_type): Enhance
>   detecting the number of cases where we can avoid accessing a gap
>   during code generation.
>   (vectorizable_load): Remove old half-vector peeling for gap
>   avoidance which is now redundant.  Add gap-aligned case where
>   it's OK to access the gap.  Add assert that we have peeling for
>   gaps enabled when we access a gap.
>
>   * gcc.dg/vect/slp-gap-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +
>  gcc/tree-vect-stmts.cc| 58 +--
>  2 files changed, 46 insertions(+), 30 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> new file mode 100644
> index 000..36463ca22c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +typedef unsigned char uint8_t;
> +typedef short int16_t;
> +void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
> +  for (int y = 0; y < 4; y++) {
> +for (int x = 0; x < 4; x++)
> +  diff[x + y * 4] = pix1[x] - pix2[x];
> +pix1 += 16;
> +pix2 += 32;
> +  }
> +}
> +
> +/* We can vectorize this without peeling for gaps and thus without epilogue,
> +   but the only thing we can reliably scan is the zero-padding trick for the
> +   partial loads.  */
> +/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target 
> vect64 } } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index a01099d3456..b26cc74f417 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, 
> stmt_vec_info stmt_info,
> dr_alignment_support alss;
> int misalign = dr_misalignment (first_dr_info, vectype);
> tree half_vtype;
> +   poly_uint64 remain;
> +   unsigned HOST_WIDE_INT tem, num;
> if (overrun_p
> && !masked_p
> && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
> vectype, misalign)))
>  == dr_aligned
> || alss == dr_unaligned_supported)
> -   && known_eq (nunits, (group_size - gap) * 2)
> -   && known_eq (nunits, group_size)
> -   && (vector_vector_composition_type (vectype, 2, &half_vtype)
> -   != NULL_TREE))
> +   && can_div_trunc_p (group_size
> +   * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
> +   nunits, &tem, &remain)
> +   && (known_eq (remain, 0u)
> +   || (constant_multiple_p (nunits, remain, &num)
> +   && (vector_vector_composition_type (vectype, num,
> +   &half_vtype)
> +   != NULL_TREE
>   overrun_p = false;

Might be worth renaming half_vtype now that it isn't necessarily
a strict half.
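
For what it's worth, the arithmetic in the new condition can be sketched with plain integers as below. This is a simplification under stated assumptions: all quantities are compile-time constants rather than poly_ints, the alignment and masking checks are left out, and the target query vector_vector_composition_type is assumed to succeed. `gap_avoidance_pieces` is a hypothetical name, not a function in the patch.

```cpp
#include <cassert>

// Returns 1 if the accessed elements fill whole vectors (remain == 0),
// N > 1 if the last vector can be built from pieces of nunits / N
// elements, and 0 if neither holds, i.e. the overrun is not avoided
// by this check.
unsigned gap_avoidance_pieces (unsigned group_size, unsigned vf,
                               unsigned gap, unsigned nunits)
{
  assert (nunits > 0 && group_size * vf >= gap);
  // Elements actually accessed across the unrolled group, modulo the
  // vector length (the "remain" of can_div_trunc_p).
  const unsigned remain = (group_size * vf - gap) % nunits;
  if (remain == 0)
    return 1;
  // Mirrors constant_multiple_p (nunits, remain, &num).
  if (nunits % remain == 0)
    return nunits / remain;
  return 0;
}
```

With group_size 8, VF 1, gap 4 and nunits 8 this gives 2, which is the old half-vector case; gap 6 gives 4 (quarter vectors), which the removed known_eq checks did not cover.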

>  
> if (overrun_p && !can_overrun_p)
> @@ -11533,33 +11539,14 @@ vectorizable_load (vec_info *vinfo,
>   unsigned HOST_WIDE_INT gap = DR_GROUP_GAP (first_stmt_info);
>   unsigned int vect_align
> = vect_known_alignment_in_bytes (first_dr_info, vectype);
> - unsigned int scalar_dr_size
> -   = vect_get_scalar_dr_

Re: [PATCH] Avoid vector -Wfree-nonheap-object warnings

2024-05-29 Thread Jonathan Wakely
On Tue, 28 May 2024 at 21:55, François Dumont  wrote:
>
> I can indeed restore _M_initialize_dispatch as it was before. It was not
> fixing my initial problem. I simply kept the code simplification.
>
>  libstdc++: Use RAII to replace try/catch blocks
>
>  Move _Guard into std::vector declaration and use it to guard all
> calls to
>  vector _M_allocate.
>
>  Doing so the compiler has more visibility on what is done with the
> pointers
>  and do not raise anymore the -Wfree-nonheap-object warning.
>
>  libstdc++-v3/ChangeLog:
>
>  * include/bits/vector.tcc (_Guard): Move all the nested
> duplicated class...
>  * include/bits/stl_vector.h (_Guard_alloc): ...here and rename.
>  (_M_allocate_and_copy): Use latter.
>  (_M_initialize_dispatch): Small code simplification.
>  (_M_range_initialize): Likewise and set _M_finish first
> from the result
>  of __uninitialize_fill_n_a that can throw.
>
> Tested under Linux x86_64.
>
> Ok to commit ?

OK, thanks
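
For readers skimming the thread, the RAII idea being approved is roughly the following. This is a simplified sketch, not the libstdc++ code: `guard_alloc`, `counting_allocator` and this free-standing `allocate_and_copy` are hypothetical names, written C++98-style because the patch has to work in that mode too.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Hypothetical allocator that counts outstanding allocations so the
// effect of the guard can be observed.
template<typename T>
struct counting_allocator
{
  typedef T value_type;
  int *live;
  explicit counting_allocator (int *counter) : live (counter) { }
  T *allocate (std::size_t n)
  { ++*live; return static_cast<T *> (::operator new (n * sizeof (T))); }
  void deallocate (T *p, std::size_t)
  { --*live; ::operator delete (p); }
};

// Owns freshly allocated storage until release () is called; if an
// exception propagates first, the destructor gives the storage back.
template<typename T, typename Alloc>
struct guard_alloc
{
  T *storage;
  std::size_t len;
  Alloc &alloc;

  guard_alloc (T *p, std::size_t n, Alloc &a)
  : storage (p), len (n), alloc (a) { }
  ~guard_alloc () { if (storage) alloc.deallocate (storage, len); }

  T *release () { T *p = storage; storage = 0; return p; }

private:
  guard_alloc (const guard_alloc &);             // non-copyable
  guard_alloc &operator= (const guard_alloc &);
};

// Allocate, copy, and hand over ownership only once nothing can throw.
template<typename Alloc>
int *allocate_and_copy (Alloc &alloc, const int *first, std::size_t n,
                        bool simulate_throw)
{
  assert (first != 0 && n > 0);
  guard_alloc<int, Alloc> guard (alloc.allocate (n), n, alloc);
  for (std::size_t i = 0; i < n; ++i)
    {
      if (simulate_throw && i == n / 2)
        throw std::runtime_error ("copy failed");  // stands in for a throwing copy
      ::new (static_cast<void *> (guard.storage + i)) int (first[i]);
    }
  return guard.release ();   // no try/catch needed
}
```

The sketch only shows how the guard replaces a try/catch block; it makes no claim about how the compiler's -Wfree-nonheap-object analysis treats either form.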


>
> François
>
> On 28/05/2024 12:30, Jonathan Wakely wrote:
> > On Mon, 27 May 2024 at 05:37, François Dumont  wrote:
> >> Here is a new version working also in C++98.
> > Can we use a different solution that doesn't involve an explicit
> > template argument list for that __uninitialized_fill_n_a call?
> >
> > -+this->_M_impl._M_finish = std::__uninitialized_fill_n_a
> > ++this->_M_impl._M_finish =
> > ++  std::__uninitialized_fill_n_a
> > +  (__start, __n, __value, _M_get_Tp_allocator());
> >
> > Using _M_fill_initialize solves the problem :-)
> >
> >
> >
> >> Note that I have this failure:
> >>
> >> FAIL: 23_containers/vector/types/1.cc  -std=gnu++98 (test for excess 
> >> errors)
> >>
> >> but it's already failing on master, my patch do not change anything.
> > Yes, that's been failing for ages.
> >
> >> Tested under Linux x64,
> >>
> >> still ok to commit ?
> >>
> >> François
> >>
> >> On 24/05/2024 16:17, Jonathan Wakely wrote:
> >>> On Thu, 23 May 2024 at 18:38, François Dumont  
> >>> wrote:
>  On 23/05/2024 15:31, Jonathan Wakely wrote:
> > On 23/05/24 06:55 +0200, François Dumont wrote:
> >> As explained in this email:
> >>
> >> https://gcc.gnu.org/pipermail/libstdc++/2024-April/058552.html
> >>
> >> I experimented -Wfree-nonheap-object because of my enhancements on
> >> algos.
> >>
> >> So here is a patch to extend the usage of the _Guard type to other
> >> parts of vector.
> > Nice, that fixes the warning you were seeing?
>  Yes ! I indeed forgot to say so :-)
> 
> 
> > We recently got a bug report about -Wfree-nonheap-object in
> > std::vector, but that is coming from _M_realloc_append which already
> > uses the RAII guard :-(
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115016
>  Note that I also had to move call to __uninitialized_copy_a before
>  assigning this->_M_impl._M_start so get rid of the -Wfree-nonheap-object
>  warn. But _M_realloc_append is already doing potentially throwing
>  operations before assigning this->_M_impl so it must be something else.
> 
>  Though it made me notice another occurence of _Guard in this method. Now
>  replaced too in this new patch.
> 
> libstdc++: Use RAII to replace try/catch blocks
> 
> Move _Guard into std::vector declaration and use it to guard all
>  calls to
> vector _M_allocate.
> 
> Doing so the compiler has more visibility on what is done with the
>  pointers
> and do not raise anymore the -Wfree-nonheap-object warning.
> 
> libstdc++-v3/ChangeLog:
> 
> * include/bits/vector.tcc (_Guard): Move all the nested
>  duplicated class...
> * include/bits/stl_vector.h (_Guard_alloc): ...here.
> (_M_allocate_and_copy): Use latter.
> (_M_initialize_dispatch): Likewise and set _M_finish first
>  from the result
> of __uninitialize_fill_n_a that can throw.
> (_M_range_initialize): Likewise.
> 
> >> diff --git a/libstdc++-v3/include/bits/stl_vector.h
> >> b/libstdc++-v3/include/bits/stl_vector.h
> >> index 31169711a48..4ea74e3339a 100644
> >> --- a/libstdc++-v3/include/bits/stl_vector.h
> >> +++ b/libstdc++-v3/include/bits/stl_vector.h
> >> @@ -1607,6 +1607,39 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
> >> clear() _GLIBCXX_NOEXCEPT
> >> { _M_erase_at_end(this->_M_impl._M_start); }
> >>
> >> +private:
> >> +  // RAII guard for allocated storage.
> >> +  struct _Guard
> > If it's being defined at class scope instead of locally in a member
> > function, I think a better name would be good. Maybe _Ptr_guard or
> > _Dealloc_guard or something.
>  _Guard_alloc chosen.
> >

Re: [PATCH 1/3] vect: generate suitable convert insn for int -> int, float -> float and int <-> float.

2024-05-29 Thread Richard Biener
On Thu, 23 May 2024, Hu, Lin1 wrote:

> gcc/ChangeLog:
> 
>   PR target/107432
>   * tree-vect-generic.cc
>   (supportable_indirect_narrowing_operation): New function for
>   support indirect narrowing convert.
>   (supportable_indirect_widening_operation): New function for
>   support indirect widening convert.
>   (expand_vector_conversion): Support convert for int -> int,
>   float -> float and int <-> float.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/107432
>   * gcc.target/i386/pr107432-1.c: New test.
>   * gcc.target/i386/pr107432-2.c: Ditto.
>   * gcc.target/i386/pr107432-3.c: Ditto.
>   * gcc.target/i386/pr107432-4.c: Ditto.
>   * gcc.target/i386/pr107432-5.c: Ditto.
>   * gcc.target/i386/pr107432-6.c: Ditto.
>   * gcc.target/i386/pr107432-7.c: Ditto.
> ---
>  gcc/testsuite/gcc.target/i386/pr107432-1.c | 234 +
>  gcc/testsuite/gcc.target/i386/pr107432-2.c | 105 +
>  gcc/testsuite/gcc.target/i386/pr107432-3.c |  55 +
>  gcc/testsuite/gcc.target/i386/pr107432-4.c |  56 +
>  gcc/testsuite/gcc.target/i386/pr107432-5.c |  72 +++
>  gcc/testsuite/gcc.target/i386/pr107432-6.c | 139 
>  gcc/testsuite/gcc.target/i386/pr107432-7.c | 156 ++
>  gcc/tree-vect-generic.cc   | 157 +-
>  8 files changed, 968 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-5.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-6.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr107432-7.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr107432-1.c 
> b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> new file mode 100644
> index 000..a4f37447eb4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr107432-1.c
> @@ -0,0 +1,234 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64 -mavx512bw -mavx512vl -O3" } */
> +/* { dg-final { scan-assembler-times "vpmovqd" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovqw" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovqb" 6 } } */
> +/* { dg-final { scan-assembler-times "vpmovdw" 6 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdw" 8 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdb" 6 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovdb" 8 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpmovwb" 8 } } */
> +
> +#include 
> +
> +typedef short __v2hi __attribute__ ((__vector_size__ (4)));
> +typedef char __v2qi __attribute__ ((__vector_size__ (2)));
> +typedef char __v4qi __attribute__ ((__vector_size__ (4)));
> +typedef char __v8qi __attribute__ ((__vector_size__ (8)));
> +
> +typedef unsigned short __v2hu __attribute__ ((__vector_size__ (4)));
> +typedef unsigned short __v4hu __attribute__ ((__vector_size__ (8)));
> +typedef unsigned char __v2qu __attribute__ ((__vector_size__ (2)));
> +typedef unsigned char __v4qu __attribute__ ((__vector_size__ (4)));
> +typedef unsigned char __v8qu __attribute__ ((__vector_size__ (8)));
> +typedef unsigned int __v2su __attribute__ ((__vector_size__ (8)));
> +
> +__v2si mm_cvtepi64_epi32_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2si);
> +}
> +
> +__m128i  mm256_cvtepi64_epi32_builtin_convertvector(__m256i a)
> +{
> +  return (__m128i)__builtin_convertvector((__v4di)a, __v4si);
> +}
> +
> +__m256i  mm512_cvtepi64_epi32_builtin_convertvector(__m512i a)
> +{
> +  return (__m256i)__builtin_convertvector((__v8di)a, __v8si);
> +}
> +
> +__v2hi   mm_cvtepi64_epi16_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2hi);
> +}
> +
> +__v4hi   mm256_cvtepi64_epi16_builtin_convertvector(__m256i a)
> +{
> +  return __builtin_convertvector((__v4di)a, __v4hi);
> +}
> +
> +__m128i  mm512_cvtepi64_epi16_builtin_convertvector(__m512i a)
> +{
> +  return (__m128i)__builtin_convertvector((__v8di)a, __v8hi);
> +}
> +
> +__v2qi   mm_cvtepi64_epi8_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v2di)a, __v2qi);
> +}
> +
> +__v4qi   mm256_cvtepi64_epi8_builtin_convertvector(__m256i a)
> +{
> +  return __builtin_convertvector((__v4di)a, __v4qi);
> +}
> +
> +__v8qi   mm512_cvtepi64_epi8_builtin_convertvector(__m512i a)
> +{
> +  return __builtin_convertvector((__v8di)a, __v8qi);
> +}
> +
> +__v2hi   mm64_cvtepi32_epi16_builtin_convertvector(__v2si a)
> +{
> +  return __builtin_convertvector((__v2si)a, __v2hi);
> +}
> +
> +__v4hi   mm_cvtepi32_epi16_builtin_convertvector(__m128i a)
> +{
> +  return __builtin_convertvector((__v4si)a, __v

Re: [PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-29 Thread Richard Biener
On Wed, 29 May 2024, Richard Sandiford wrote:

> Richard Biener  writes:
> > Code generation for contiguous load vectorization can already deal
> > with generalized avoidance of loading from a gap.  The following
> > extends detection of peeling for gaps requirement with that,
> > gets rid of the old special casing of a half load and makes sure
> > when we do access the gap we have peeling for gaps enabled.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > This is the first patch in a series to improve peeling for gaps,
> > it turned out into an improvement for code rather than just doing
> > the (delayed from stage3) removal of the "old" half-vector codepath.
> >
> > I'll wait for the pre-CI testing for pushing so you also have time
> > for some comments.
> 
> LGTM FWIW (some trivia below).
> 
> Out of interest, how far are we off being able to load:
> 
> a[i*8+0]
> a[i*8+1]
> a[i*8+3]
> a[i*8+4]
> 
> as two half vectors?  It doesn't look like we're quite there yet,
> but I might have misread.

The code in vectorizable_load that eventually would do this only
triggers when we run into the final "gap" part.  We do not look
at the intermediate gaps at all (if the above is what we see
in the loop body).  Extending the code to handle the case
where the intermediate gap is produced because of unrolling (VF > 1)
should be possible - we'd simply need to check whether the currently
loaded elements have unused ones at the end.
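
As an illustration of the final "gap" case discussed above, here is a minimal C sketch.  The names GROUP_STRIDE, GROUP_USED, min_elems_for_groups and sum_group_heads are made up for this note and do not come from the patch.  Only the first 4 elements of every 8-element group are read, so a full 8-element vector load in the last iteration would touch 4 elements the scalar code never reads.  That is the situation where the vectorizer either peels for gaps or uses narrower loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each group spans GROUP_STRIDE elements, but only
   the first GROUP_USED of them are read; the rest form the gap.  */
enum { GROUP_STRIDE = 8, GROUP_USED = 4 };

/* Smallest array length for which every scalar access made for N groups
   is in bounds.  The trailing gap of the last group is not included.  */
size_t
min_elems_for_groups (size_t n)
{
  return n == 0 ? 0 : (n - 1) * GROUP_STRIDE + GROUP_USED;
}

/* Scalar reference loop: sums the used head of every group and never
   reads the gap elements.  */
int32_t
sum_group_heads (const int16_t *a, size_t n)
{
  assert (a != NULL || n == 0);
  int32_t sum = 0;
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < GROUP_USED; j++)
      sum += a[i * GROUP_STRIDE + j];
  return sum;
}
```

With n == 2 the array only needs 12 elements, while two full 8-element loads would cover 16.  The 4 extra elements are the overrun that the overrun_p checks try to predict.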

> It would be nice if we could eventually integrate the overrun_p checks
> with the vectorizable_load code that the code is trying to predict.
> E.g. we could run through the vectorizable_load code during the
> analysis phase and record overruns, similarly to Kewen's costing
> patches.  As it stands, it seems difficult to make sure that the two
> checks are exactly in sync, especially when the structure is so
> different.

Yeah - that's why I put the assert in now (which I do expect to
trigger - also thanks to poly-ints may vs. must...)

Richard.

> > Richard.
> >
> > PR tree-optimization/115252
> > * tree-vect-stmts.cc (get_group_load_store_type): Enhance
> > detecting the number of cases where we can avoid accessing a gap
> > during code generation.
> > (vectorizable_load): Remove old half-vector peeling for gap
> > avoidance which is now redundant.  Add gap-aligned case where
> > it's OK to access the gap.  Add assert that we have peeling for
> > gaps enabled when we access a gap.
> >
> > * gcc.dg/vect/slp-gap-1.c: New testcase.
> > ---
> >  gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +
> >  gcc/tree-vect-stmts.cc| 58 +--
> >  2 files changed, 46 insertions(+), 30 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c 
> > b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> > new file mode 100644
> > index 000..36463ca22c5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do compile } */
> > +/* { dg-additional-options "-O3" } */
> > +
> > +typedef unsigned char uint8_t;
> > +typedef short int16_t;
> > +void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t 
> > *pix2) {
> > +  for (int y = 0; y < 4; y++) {
> > +for (int x = 0; x < 4; x++)
> > +  diff[x + y * 4] = pix1[x] - pix2[x];
> > +pix1 += 16;
> > +pix2 += 32;
> > +  }
> > +}
> > +
> > +/* We can vectorize this without peeling for gaps and thus without 
> > epilogue,
> > +   but the only thing we can reliably scan is the zero-padding trick for 
> > the
> > +   partial loads.  */
> > +/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target 
> > vect64 } } } */
> > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > index a01099d3456..b26cc74f417 100644
> > --- a/gcc/tree-vect-stmts.cc
> > +++ b/gcc/tree-vect-stmts.cc
> > @@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, 
> > stmt_vec_info stmt_info,
> >   dr_alignment_support alss;
> >   int misalign = dr_misalignment (first_dr_info, vectype);
> >   tree half_vtype;
> > + poly_uint64 remain;
> > + unsigned HOST_WIDE_INT tem, num;
> >   if (overrun_p
> >   && !masked_p
> >   && (((alss = vect_supportable_dr_alignment (vinfo, first_dr_info,
> >   vectype, misalign)))
> >== dr_aligned
> >   || alss == dr_unaligned_supported)
> > - && known_eq (nunits, (group_size - gap) * 2)
> > - && known_eq (nunits, group_size)
> > - && (vector_vector_composition_type (vectype, 2, &half_vtype)
> > - != NULL_TREE))
> > + && can_div_trunc_p (group_size
> > + * LOOP_VINFO_VECT_FACTOR (loop_vinfo) - gap,
> > + nunits, &tem, &remain)
> > + && (known_eq (remain, 0u)

[PATCH] libgcc/aarch64: also provide AT_HWCAP2 fallback

2024-05-29 Thread Jan Beulich
Much like AT_HWCAP is already provided in case the platform headers
don't have the value (yet).

libgcc/

* config/aarch64/cpuinfo.c: Provide AT_HWCAP2.
---
Observed as build failure with 14.1.0, so may want backporting there.

--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -146,6 +146,9 @@ struct {
 #define HWCAP_PACG (1UL << 31)
 #endif
 
+#ifndef AT_HWCAP2
+#define AT_HWCAP2 26
+#endif
 #ifndef HWCAP2_DCPODP
 #define HWCAP2_DCPODP (1 << 0)
 #endif
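
For context, a minimal sketch of how such a fallback is typically consumed, assuming a Linux target with getauxval available.  The helpers read_hwcap2 and hwcap_has are hypothetical names for this note and are not part of cpuinfo.c.  The two fallback values are the ones shown in the diff above.

```c
#include <assert.h>
#if defined(__linux__)
#include <sys/auxv.h>	/* getauxval */
#endif

/* Fallbacks for platform headers that do not (yet) provide the values.  */
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#ifndef HWCAP2_DCPODP
#define HWCAP2_DCPODP (1 << 0)
#endif

/* Hypothetical helper: read the second hardware-capability word from
   the auxiliary vector, or 0 when it is not available.  */
unsigned long
read_hwcap2 (void)
{
#if defined(__linux__)
  return getauxval (AT_HWCAP2);
#else
  return 0;
#endif
}

/* Hypothetical helper: test whether any bit of MASK is set in HWCAP.  */
int
hwcap_has (unsigned long hwcap, unsigned long mask)
{
  assert (mask != 0);
  return (hwcap & mask) != 0;
}
```

A caller would then write something like hwcap_has (read_hwcap2 (), HWCAP2_DCPODP).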


[Ada] Fix PR ada/115270

2024-05-29 Thread Eric Botcazou
This fixes the link failure of the GNAT tools on 32-bit SPARC/Linux (as well 
as on 32-bit PowerPC/Linux probably) coming from an incorrect binding to the 
64-bit compare-and-exchange builtin.

Tested by Rainer on 32-bit SPARC/Linux, applied on mainline and 14 branch.


2024-05-29  Eric Botcazou  

PR ada/115270
* Makefile.rtl (PowerPC/Linux): Use libgnat/s-atopri__32.ads for
the 32-bit library.
(SPARC/Linux): Likewise.

-- 
Eric Botcazou
diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 570d0b2703d..0f5ebb87d73 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2266,15 +2266,18 @@ ifeq ($(strip $(filter-out powerpc% linux%,$(target_cpu) $(target_os))),)
   system.ads

[PATCH] tree-optimization/114435 - pcom left around copies confusing SLP

2024-05-29 Thread Richard Biener
The following arranges for the pre-SLP vectorization scalar cleanup
to be run when predictive commoning was applied to a loop in the
function.  This is similar to the complete unroll situation and
facilitates SLP vectorization.  Avoiding the SSA copies in predictive
commoning itself isn't easy (and predcom also sometimes unrolls,
asking for scalar cleanup).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/114435
* tree-predcom.cc (tree_predictive_commoning): Queue
the next scalar cleanup sub-pipeline to be run when we
did something.

* gcc.dg/vect/bb-slp-pr114435.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c | 37 +
 gcc/tree-predcom.cc |  3 ++
 2 files changed, 40 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c
new file mode 100644
index 000..d1eecf7979a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr114435.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* Predictive commoning is supposed to happen.  */
+/* { dg-additional-options "-O3 -fdump-tree-pcom" } */
+
+struct res {
+double r0;
+double r1;
+double r2;
+double r3;
+};
+
+struct pxl {
+double v0;
+double v1;
+double v2;
+double v3;
+};
+
+#define IS_NAN(x) ((x) == (x))
+
+void fold(struct res *r, struct pxl *in, double k, int sz)
+{
+  int i;
+
+  for (i = 0; i < sz; i++) {
+  if (IS_NAN(k)) continue;
+  r->r0 += in[i].v0 * k;
+  r->r1 += in[i].v1 * k;
+  r->r2 += in[i].v2 * k;
+  r->r3 += in[i].v3 * k;
+  }
+}
+
+/* { dg-final { scan-tree-dump "# r__r0_lsm\[^\r\n\]* = PHI" "pcom" } } */
+/* { dg-final { scan-tree-dump "optimized: basic block part vectorized" "slp1" 
} } */
+/* { dg-final { scan-tree-dump "# vect\[^\r\n\]* = PHI" "slp1" } } */
diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
index 75a4c85164c..9844fee1e97 100644
--- a/gcc/tree-predcom.cc
+++ b/gcc/tree-predcom.cc
@@ -3522,6 +3522,9 @@ tree_predictive_commoning (bool allow_unroll_p)
}
 }
 
+  if (ret != 0)
+cfun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
+
   return ret;
 }
 
-- 
2.35.3


Re: [PATCH] tree-optimization/115252 - enhance peeling for gaps avoidance

2024-05-29 Thread Richard Biener
On Wed, 29 May 2024, Richard Biener wrote:

> On Wed, 29 May 2024, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > Code generation for contiguous load vectorization can already deal
> > > with generalized avoidance of loading from a gap.  The following
> > > extends detection of peeling for gaps requirement with that,
> > > gets rid of the old special casing of a half load and makes sure
> > > when we do access the gap we have peeling for gaps enabled.
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > >
> > > This is the first patch in a series to improve peeling for gaps,
> > > it turned out into an improvement for code rather than just doing
> > > the (delayed from stage3) removal of the "old" half-vector codepath.
> > >
> > > I'll wait for the pre-CI testing for pushing so you also have time
> > > for some comments.
> > 
> > LGTM FWIW (some trivia below).
> > 
> > Out of interest, how far are we off being able to load:
> > 
> > a[i*8+0]
> > a[i*8+1]
> > a[i*8+3]
> > a[i*8+4]
> > 
> > as two half vectors?  It doesn't look like we're quite there yet,
> > but I might have misread.
> 
> The code in vectorizable_load that eventually would do this only
> triggers when we run into the final "gap" part.  We do not look
> at the intermediate gaps at all (if the above is what we see
> in the loop body).  Extending the code to handle the case
> where the intermediate gap is produced because of unrolling (VF > 1)
> should be possible - we'd simply need to check whether the currently
> loaded elements have unused ones at the end.
> 
> > It would be nice if we could eventually integrate the overrun_p checks
> > with the vectorizable_load code that the code is trying to predict.
> > E.g. we could run through the vectorizable_load code during the
> > analysis phase and record overruns, similarly to Kewen's costing
> > patches.  As it stands, it seems difficult to make sure that the two
> > checks are exactly in sync, especially when the structure is so
> > different.
> 
> Yeah - that's why I put the assert in now (which I do expect to
> trigger - also thanks to poly-ints may vs. must...)

I quickly looked, and while it should be possible to set
LOOP_VINFO_PEELING_FOR_GAPS from the loops generating the actual
accesses (we run through those now after the costing refactoring)
there are quite a lot of paths that would (possibly) need to be
covered.  We could possibly set a stmt-local 
LOOP_VINFO_PEELING_FOR_GAPS flag conservatively and clear it
in the few places we handle the gap, only updating the global
LOOP_VINFO_PEELING_FOR_GAPS at the end, but it's still going to
be tricky to not forget a path here.

I've amended my TODO accordingly.

Richard.

> Richard.
> 
> > > Richard.
> > >
> > >   PR tree-optimization/115252
> > >   * tree-vect-stmts.cc (get_group_load_store_type): Enhance
> > >   detecting the number of cases where we can avoid accessing a gap
> > >   during code generation.
> > >   (vectorizable_load): Remove old half-vector peeling for gap
> > >   avoidance which is now redundant.  Add gap-aligned case where
> > >   it's OK to access the gap.  Add assert that we have peeling for
> > >   gaps enabled when we access a gap.
> > >
> > >   * gcc.dg/vect/slp-gap-1.c: New testcase.
> > > ---
> > >  gcc/testsuite/gcc.dg/vect/slp-gap-1.c | 18 +
> > >  gcc/tree-vect-stmts.cc| 58 +--
> > >  2 files changed, 46 insertions(+), 30 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> > >
> > > diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c 
> > > b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> > > new file mode 100644
> > > index 000..36463ca22c5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
> > > @@ -0,0 +1,18 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-additional-options "-O3" } */
> > > +
> > > +typedef unsigned char uint8_t;
> > > +typedef short int16_t;
> > > +void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t 
> > > *pix2) {
> > > +  for (int y = 0; y < 4; y++) {
> > > +for (int x = 0; x < 4; x++)
> > > +  diff[x + y * 4] = pix1[x] - pix2[x];
> > > +pix1 += 16;
> > > +pix2 += 32;
> > > +  }
> > > +}
> > > +
> > > +/* We can vectorize this without peeling for gaps and thus without 
> > > epilogue,
> > > +   but the only thing we can reliably scan is the zero-padding trick for 
> > > the
> > > +   partial loads.  */
> > > +/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target 
> > > vect64 } } } */
> > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > index a01099d3456..b26cc74f417 100644
> > > --- a/gcc/tree-vect-stmts.cc
> > > +++ b/gcc/tree-vect-stmts.cc
> > > @@ -2072,16 +2072,22 @@ get_group_load_store_type (vec_info *vinfo, 
> > > stmt_vec_info stmt_info,
> > > dr_alignment_support alss;
> > > int misalign = dr_misalignment (first_dr_info, vectype);
> > > tree half_vtype;
> > > +

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-29 Thread Mariam Arutunian
On Tue, May 28, 2024 at 8:20 AM Jeff Law  wrote:

>
>
> On 5/24/24 2:42 AM, Mariam Arutunian wrote:
> > This patch adds a new compiler pass aimed at identifying naive CRC
> > implementations,
> > characterized by the presence of a loop calculating a CRC (polynomial
> > long division).
> > Upon detection of a potential CRC, the pass prints an informational message.
> >
> > The pass performs the CRC optimization when the optimization level is >= 2,
> > except when optimizing for size or when -fno-gimple-crc-optimization is given.
> >
> > This pass is added for the detection and optimization of naive CRC
> > implementations,
> > improving the efficiency of CRC-related computations.
> >
> > This patch includes only initial fast checks for filtering out non-CRCs,
> > detected possible CRCs verification and optimization parts will be
> > provided in subsequent patches.
> >
> >gcc/
> >
> >  * Makefile.in (OBJS): Add gimple-crc-optimization.o.
> >  * common.opt (fgimple-crc-optimization): New option.
> >  * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.
> >  * gimple-crc-optimization.cc: New file.
> >  * gimple.cc (set_phi_stmts_not_visited): New function.
> >  (set_gimple_stmts_not_visited): Likewise.
> >  (set_bbs_stmts_not_visited): Likewise.
> >  * gimple.h (set_gimple_stmts_not_visited): New extern function
> > declaration.
> >  (set_phi_stmts_not_visited): New extern function declaration.
> >  (set_bbs_stmts_not_visited): New extern function declaration.
> >  * opts.cc (default_options_table): Add OPT_fgimple_crc_optimization.
> >  (enable_fdo_optimizations): Enable gimple-crc-optimization.
> >  * passes.def (pass_crc_optimization): Add new pass.
> >  * timevar.def (TV_GIMPLE_CRC_OPTIMIZATION): New timevar.
> >  * tree-pass.h (make_pass_crc_optimization): New extern function
> > declaration.
> >
> > Signed-off-by: Mariam Arutunian
> >
>
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 2c078fdd1f8..53f7ab255dd 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -1757,6 +1757,12 @@ Common Var(flag_gcse_after_reload) Optimization
> >  Perform global common subexpression elimination after register
> allocation has
> >  finished.
> >
> > +fgimple-crc-optimization
> > +Common Var(flag_gimple_crc_optimization) Optimization
> > +Detect loops calculating CRC and replace with faster implementation.
> > +If the target supports carry-less-multiplication instruction, generate
> CRC using
> > +it; otherwise generate table-based CRC.
> This probably needs a minor update since your code can also generate a
> CRC instruction on x86 when it detects a CRC loop with the right
> polynomial.
>


Thanks. I forgot to modify this part.


>
>

>
>
> > +
> > +  /* Returns true if there is only two conditional blocks in the loop
> > + (one may be for the CRC bit check and the other for the loop
> counter).
> > + This may filter out some real CRCs, where more than one condition
> > + is checked for the CRC calculation.  */
> > +  static bool loop_contains_two_conditional_bb (basic_block *loop_bbs,
> > + unsigned loop_num_nodes);
> It's been a while, so if we're rehashing something we already worked
> through, I apologize.
>
> IIRC we looked at the problem of canonicalizing the loop into a form
> where we didn't necessarily have conditional blocks, instead we had
> branchless sequences for the conditional xor and dealing with the high
> bit in the crc.  My recollection was that the coremark CRC loop would
> always canonicalize, but that in general we still saw multiple CRC
> implementations that did not canonicalize and thus we still needed the
> more complex matching.  Correct?
>
>
The loop in CoreMark is not fully canonicalized in that form,
as there are still branches present for the conditional XOR operation.
I checked that using the -O2 and -O3 flags.
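
For readers following the discussion, here is a minimal sketch of the kind of naive bitwise CRC loop being talked about, with the conditional XOR still written as a branch.  It is a generic CRC-8 with polynomial 0x07 chosen for this note; it is not the CoreMark loop and not code from the patch, and the names crc8_naive_byte and crc8_naive are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed polynomial for this sketch: x^8 + x^2 + x + 1.  */
#define CRC8_POLY 0x07u

/* One byte of a naive MSB-first CRC.  The loop has two conditions:
   the loop counter check and the CRC bit check.  */
uint8_t
crc8_naive_byte (uint8_t crc, uint8_t data)
{
  crc ^= data;
  for (int i = 0; i < 8; i++)
    {
      if (crc & 0x80)
	/* Conditional XOR with the polynomial after the shift.  */
	crc = (uint8_t) ((crc << 1) ^ CRC8_POLY);
      else
	crc = (uint8_t) (crc << 1);
    }
  return crc;
}

/* CRC over a buffer, starting from an initial value of 0.  */
uint8_t
crc8_naive (const uint8_t *buf, size_t len)
{
  assert (buf != NULL || len == 0);
  uint8_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc8_naive_byte (crc, buf[i]);
  return crc;
}
```

The shift plus the branch on the top bit is the shape the fast checks look for; a branchless variant would compute a mask from the top bit and XOR with (CRC8_POLY & mask) instead.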

>
> > +
> > +  /* Checks whether found XOR_STMT is for calculating CRC.
> > + The function CRC_FUN calculates CRC only if there is a shift
> operation
> > + in the crc loop.  */
> > +  bool xor_calculates_crc (function *crc_fun, basic_block *loop_bbs,
> > +const gimple *xor_stmt);
> So the second sentence in the comment doesn't really seem to relate to
> this function.  It also seems like we potentially have two xors in a CRC
> loop.  One which xors one bit of the data with one bit of the crc.  The
> other is the conditional xor.  Which does this refer to?  I'm guessing
> the latter since the former can likely hoist out of the loop given a
> sufficiently smart loop optimizer.
>
>

I'll try to provide a better description of this function.
Here, XOR refers to the conditional XOR.

>
>
> > +
> > +  /* Checks that the variable used in the condition COND is the assumed
> CRC
> > + (or depends on the assumed CRC).
> > + Also sets data member m_phi_for_data if it isn't set and exists.

Re: [V3 PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-29 Thread Richard Biener
On Fri, May 24, 2024 at 9:29 AM liuhongt  wrote:
>
> Update in V3:
> > Since this was about vectorization can you instead add a testcase to
> > gcc.dg/vect/ and check for
> > vectorization to happen?
> Move to vect/pr112325.c.
> >
> > I believe the if (unr_insn <= 0) check can go as well.
> Removed.
>
> > as said, you want to do
> >
> >   curolli = false;
> >
> > after the above since we are iterating and for a subsequent unrolling
> > of an outer loop
> > of an unrolled inner loop we _do_ want to apply the 2/3 reduction
> > since there's likely
> > inter-loop redundancies exposed (as happens in SPEC calculix for example).
> >
> > Not sure if that changes any of the testsuite outcome - it possibly avoids 
> > the
> > gcc.dg/vect/pr69783.c FAIL?
> Yes, it avoids that, cunrolli is set to false when CHANGED is true.
>
> > Not sure about the arm fallout.
> It's the same reason as for pr69783.c: there is a subsequent unrolling of an
> outer loop of an unrolled inner loop, and since the inner loop is completely
> unrolled, outer_loop->inner is false and the loop escapes the check.
> The change also fixes the 2 arm fallouts.

Perfect!

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

Can you place a comment before the

 cunrolli = false;

line indicating that we do not want to restrict subsequent outer (now
innermost) loop unrollings?

OK with such added comment.

Thanks,
Richard.
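
To make the condition under discussion concrete, here is a toy model of the size estimate, assuming only what the ChangeLog below states: the 2/3 reduction is applied when (!cunrolli || loop->inner).  The function name and parameters are hypothetical and this is not the code in tree-ssa-loop-ivcanon.cc.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy model: estimated instruction count after complete unrolling.
   UNROLLED_INSNS is the raw unrolled size.  The 2/3 reduction models
   expected follow-up simplification and is skipped for an innermost
   loop while still in the cunrolli pass.  */
unsigned
estimated_size_after_unroll (unsigned unrolled_insns, bool cunrolli,
			     bool has_inner_loop)
{
  /* Guard against overflow in the multiplication below.  */
  assert (unrolled_insns <= UINT_MAX / 2);
  if (!cunrolli || has_inner_loop)
    return unrolled_insns * 2 / 3;
  return unrolled_insns;
}
```

Under this model a 90-insn innermost loop is costed at 90 during cunrolli, but at 60 once cunrolli has been cleared, which is why the flag is reset after a change so that subsequent outer (now innermost) loops still get the reduction.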

> For the innermost loop, complete unrolling will most likely not be able
> to reduce the body size to 2/3.  The current 2/3 reduction makes some of
> the larger loops completely unrolled during cunrolli, which then prevents
> them from being vectorized.  It also increases the register pressure.
>
> The patch moves the 2/3 reduction from estimated_unrolled_size to
> tree_unroll_loops_completely.
>
> gcc/ChangeLog:
>
> PR tree-optimization/112325
> * tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Move the
> 2 / 3 loop body size reduction to ..
> (try_unroll_loop_completely): .. here, add it for the check of
> body size shrink, and the check of comparison against
> param_max_completely_peeled_insns when
> (!cunrolli ||loop->inner).
> (canonicalize_loop_induction_variables): Add new parameter
> cunrolli and pass down.
> (tree_unroll_loops_completely_1): Ditto.
> (canonicalize_induction_variables): Pass cunrolli as false to
> canonicalize_loop_induction_variables.
> (tree_unroll_loops_completely): Set cunrolli to true at
> beginning and set it to false after CHANGED is true.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/pr112325.c: New test.
> ---
>  gcc/testsuite/gcc.dg/vect/pr112325.c | 59 
>  gcc/tree-ssa-loop-ivcanon.cc | 46 +++---
>  2 files changed, 83 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr112325.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr112325.c 
> b/gcc/testsuite/gcc.dg/vect/pr112325.c
> new file mode 100644
> index 000..71cf4099253
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr112325.c
> @@ -0,0 +1,59 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -funroll-loops -fdump-tree-vect-details" } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
> +
> +typedef unsigned short ggml_fp16_t;
> +static float table_f32_f16[1 << 16];
> +
> +inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
> +unsigned short s;
> +__builtin_memcpy(&s, &f, sizeof(unsigned short));
> +return table_f32_f16[s];
> +}
> +
> +typedef struct {
> +ggml_fp16_t d;
> +ggml_fp16_t m;
> +unsigned char qh[4];
> +unsigned char qs[32 / 2];
> +} block_q5_1;
> +
> +typedef struct {
> +float d;
> +float s;
> +char qs[32];
> +} block_q8_1;
> +
> +void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * 
> restrict vx, const void * restrict vy) {
> +const int qk = 32;
> +const int nb = n / qk;
> +
> +const block_q5_1 * restrict x = vx;
> +const block_q8_1 * restrict y = vy;
> +
> +float sumf = 0.0;
> +
> +for (int i = 0; i < nb; i++) {
> +unsigned qh;
> +__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
> +
> +int sumi = 0;
> +
> +for (int j = 0; j < qk/2; ++j) {
> +const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
> +const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
> +
> +const int x0 = (x[i].qs[j] & 0xF) | xh_0;
> +const int x1 = (x[i].qs[j] >> 4) | xh_1;
> +
> +sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
> +}
> +
> +sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
> ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
> +}
> +
> +*s = sumf;
> +}
> +
> +/* { dg-final { scan-tree-dump-ti

[PATCH v1] Vect: Support IFN SAT_SUB for unsigned vector int

2024-05-29 Thread pan2 . li
From: Pan Li 

This patch adds support for .SAT_SUB on unsigned integer vectors.
Given the example code below:

void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
out[i] = (x[i] - y[i]) & (-(uint64_t)(x[i] >= y[i]));
}

Before this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _77 = .SELECT_VL (ivtmp_75, POLY_INT_CST [2, 2]);
  ivtmp_56 = _77 * 8;
  vect__4.7_59 = .MASK_LEN_LOAD (vectp_x.5_57, 64B, { -1, ... }, _77, 0);
  vect__6.10_63 = .MASK_LEN_LOAD (vectp_y.8_61, 64B, { -1, ... }, _77, 0);

  mask__7.11_64 = vect__4.7_59 >= vect__6.10_63;
  _66 = .COND_SUB (mask__7.11_64, vect__4.7_59, vect__6.10_63, { 0, ... });

  .MASK_LEN_STORE (vectp_out.15_71, 64B, { -1, ... }, _77, 0, _66);
  vectp_x.5_58 = vectp_x.5_57 + ivtmp_56;
  vectp_y.8_62 = vectp_y.8_61 + ivtmp_56;
  vectp_out.15_72 = vectp_out.15_71 + ivtmp_56;
  ivtmp_76 = ivtmp_75 - _77;
  ...
}

After this patch:
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  ...
  _76 = .SELECT_VL (ivtmp_74, POLY_INT_CST [2, 2]);
  ivtmp_60 = _76 * 8;
  vect__4.7_63 = .MASK_LEN_LOAD (vectp_x.5_61, 64B, { -1, ... }, _76, 0);
  vect__6.10_67 = .MASK_LEN_LOAD (vectp_y.8_65, 64B, { -1, ... }, _76, 0);

  vect_patt_37.11_68 = .SAT_SUB (vect__4.7_63, vect__6.10_67);

  .MASK_LEN_STORE (vectp_out.12_70, 64B, { -1, ... }, _76, 0, vect_patt_37.11_68);
  vectp_x.5_62 = vectp_x.5_61 + ivtmp_60;
  vectp_y.8_66 = vectp_y.8_65 + ivtmp_60;
  vectp_out.12_71 = vectp_out.12_70 + ivtmp_60;
  ivtmp_75 = ivtmp_74 - _76;
  ...
}
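
As a quick cross-check of the forms involved, the sketch below puts the mask form from the example next to the two multiplication forms and a plain conditional reference.  The helper names are made up for this note; only vec_sat_sub_u64 comes from the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: unsigned saturating subtraction clamps at zero.  */
uint64_t
sat_sub_ref (uint64_t x, uint64_t y)
{
  return x >= y ? x - y : 0;
}

/* Branchless mask form, as used in the example loop.  */
uint64_t
sat_sub_mask (uint64_t x, uint64_t y)
{
  return (x - y) & (-(uint64_t) (x >= y));
}

/* Branchless multiplication forms with gt and ge.  */
uint64_t
sat_sub_mul_gt (uint64_t x, uint64_t y)
{
  return (x - y) * (x > y);
}

uint64_t
sat_sub_mul_ge (uint64_t x, uint64_t y)
{
  return (x - y) * (x >= y);
}

/* Returns nonzero when all forms agree for the given inputs.  */
int
sat_sub_forms_agree (uint64_t x, uint64_t y)
{
  uint64_t ref = sat_sub_ref (x, y);
  return sat_sub_mask (x, y) == ref
	 && sat_sub_mul_gt (x, y) == ref
	 && sat_sub_mul_ge (x, y) == ref;
}

/* The loop from the example, unchanged apart from the pointer check.  */
void
vec_sat_sub_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
{
  assert (n == 0 || (out && x && y));
  for (unsigned i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(uint64_t) (x[i] >= y[i]));
}
```

The gt and ge variants agree because x == y makes x - y zero either way.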

The following test suites pass with this patch:
* The x86 bootstrap test.
* The x86 full regression test.
* The riscv full regression tests.

gcc/ChangeLog:

* match.pd: Add new form for vector mode recog.
* tree-vect-patterns.cc (gimple_unsigned_integer_sat_sub): Add
new match func decl;
(vect_recog_build_binary_gimple_call): Extract helper func to
build gcall with given internal_fn.
(vect_recog_sat_sub_pattern): Add new func impl to recog .SAT_SUB.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 14 +++
 gcc/tree-vect-patterns.cc | 85 ---
 2 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 3e334533ff8..81f389855cd 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3100,6 +3100,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Unsigned saturation sub, case 3 (branchless with gt):
+   SAT_U_SUB = (X - Y) * (X > Y).  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (mult:c (minus @0 @1) (convert (gt @0 @1)))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
+/* Unsigned saturation sub, case 4 (branchless with ge):
+   SAT_U_SUB = (X - Y) * (X >= Y).  */
+(match (unsigned_integer_sat_sub @0 @1)
+ (mult:c (minus @0 @1) (convert (ge @0 @1)))
+ (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index a313dc64643..09a7c129493 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -4488,6 +4488,32 @@ vect_recog_mult_pattern (vec_info *vinfo,
 }
 
 extern bool gimple_unsigned_integer_sat_add (tree, tree*, tree (*)(tree));
+extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
+
+static gcall *
+vect_recog_build_binary_gimple_call (vec_info *vinfo, gimple *stmt,
+internal_fn fn, tree *type_out,
+tree op_0, tree op_1)
+{
+  tree itype = TREE_TYPE (op_0);
+  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
+
+  if (vtype != NULL_TREE
+&& direct_internal_fn_supported_p (fn, vtype, OPTIMIZE_FOR_BOTH))
+{
+  gcall *call = gimple_build_call_internal (fn, 2, op_0, op_1);
+
+  gimple_call_set_lhs (call, vect_recog_temp_ssa_var (itype, NULL));
+  gimple_call_set_nothrow (call, /* nothrow_p */ false);
+  gimple_set_location (call, gimple_location (stmt));
+
+  *type_out = vtype;
+
+  return call;
+}
+
+  return NULL;
+}
 
 /*
  * Try to detect saturation add pattern (SAT_ADD), aka below gimple:
@@ -4510,27 +4536,55 @@ vect_recog_sat_add_pattern (vec_info *vinfo, 
stmt_vec_info stmt_vinfo,
   if (!is_gimple_assign (last_stmt))
 return NULL;
 
-  tree res_ops[2];
+  tree ops[2];
   tree lhs = gimple_assign_lhs (last_stmt);
 
-  if (gimple_unsigned_integer_sat_add (lhs, res_ops, NULL))
+  if (gimple_unsigned_integer_sat_add (lhs, ops, NULL))
 {
-  tree itype = TREE_TYPE (res_ops[0]);
-  tree vtype = get_vectype_for_scalar_type (vinfo, itype);
-
-  if (vtype != NULL_TRE

Re: [PATCH 1/2] Match: Add maybe_bit_not instead of plain matching

2024-05-29 Thread Richard Biener
On Mon, May 27, 2024 at 2:47 AM Andrew Pinski  wrote:
>
> While working on adding matching of negative expressions of `a - b`,
> I noticed that we started to have "duplicated" patterns due to not having
> a way to match maybe negative expressions. So I went back to what I did for
> bit_not and decided to improve the situation there so for some patterns
> where we had 2 operands of an expression where one could have been a bit_not,
> add back maybe_bit_not.
> This does not add maybe_bit_not in every place where bitwise_inverted_equal_p
> is used, just the ones where 2 operands of an expression could be swapped.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

Richard.
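
For reference, the source-level identities behind the patterns touched here can be sketched in plain C.  The function names are hypothetical and only illustrate what `~X & X`, `~x | x` and `~x ^ x` fold to, plus the comparison case that maybe_cmp covers.

```c
#include <assert.h>
#include <stdint.h>

/* ~X & X is always zero.  */
uint32_t
and_of_inverses (uint32_t x)
{
  return (uint32_t) ~x & x;
}

/* ~x | x and ~x ^ x are always all-ones.  */
uint32_t
ior_of_inverses (uint32_t x)
{
  return (uint32_t) ~x | x;
}

uint32_t
xor_of_inverses (uint32_t x)
{
  return (uint32_t) ~x ^ x;
}

/* For 1-bit values, a comparison and its inverse behave the same way:
   (a < b) & (a >= b) is always zero.  */
int
and_of_inverse_cmps (int a, int b)
{
  int lt = a < b;
  int ge = a >= b;
  assert (lt == !ge);
  return lt & ge;
}
```

Since either operand may be the inverted one, the patterns need `:c` so that both operand orders match.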

> gcc/ChangeLog:
>
> * match.pd (bit_not_with_nop): Unconditionalize.
> (maybe_cmp): Likewise.
> (maybe_bit_not): New match pattern.
> (`~X & X`): Use maybe_bit_not and add `:c` back.
> (`~x ^ x`/`~x | x`): Likewise.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/match.pd | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 024e3350465..090ad4e08b0 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -167,7 +167,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
>&& tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
> (@0))
>
> -#if GIMPLE
>  /* These are used by gimple_bitwise_inverted_equal_p to simplify
> detection of BIT_NOT and comparisons. */
>  (match (bit_not_with_nop @0)
> @@ -188,7 +187,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (bit_xor@0 @1 @2)
>   (if (INTEGRAL_TYPE_P (type)
>&& TYPE_PRECISION (type) == 1)))
> -#endif
> +/* maybe_bit_not is used to match what
> +   is acceptable for bitwise_inverted_equal_p. */
> +(match (maybe_bit_not @0)
> + (bit_not_with_nop@0 @1))
> +(match (maybe_bit_not @0)
> + (INTEGER_CST@0))
> +(match (maybe_bit_not @0)
> + (maybe_cmp@0 @1))
>
>  /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
> ABSU_EXPR returns unsigned absolute value of the operand and the operand
> @@ -1332,7 +1338,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>
>  /* Simplify ~X & X as zero.  */
>  (simplify
> - (bit_and (convert? @0) (convert? @1))
> + (bit_and:c (convert? @0) (convert? (maybe_bit_not @1)))
>   (with { bool wascmp; }
>(if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
> && bitwise_inverted_equal_p (@0, @1, wascmp))
> @@ -1597,7 +1603,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* ~x ^ x -> -1 */
>  (for op (bit_ior bit_xor)
>   (simplify
> -  (op (convert? @0) (convert? @1))
> +  (op:c (convert? @0) (convert? (maybe_bit_not @1)))
>(with { bool wascmp; }
> (if (types_match (TREE_TYPE (@0), TREE_TYPE (@1))
>  && bitwise_inverted_equal_p (@0, @1, wascmp))
> --
> 2.43.0
>


Re: [PATCH 2/2] match: Add support for `a ^ CST` to bitwise_inverted_equal_p [PR115224]

2024-05-29 Thread Richard Biener
On Mon, May 27, 2024 at 2:48 AM Andrew Pinski  wrote:
>
> While looking into something else, I noticed that `a ^ CST` needed
> special casing in bitwise_inverted_equal_p as it would simplify to `a ^ ~CST`
> for the bitwise not.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.
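
The identity the new case relies on is that ~(a ^ CST) equals a ^ ~CST, so `a ^ CST` and `a ^ ~CST` are bitwise inverses of each other.  A minimal C sketch, with hypothetical function names chosen for this note:

```c
#include <assert.h>
#include <stdint.h>

/* The bitwise not applied to the whole xor.  */
uint32_t
not_of_xor (uint32_t a, uint32_t cst)
{
  return (uint32_t) ~(a ^ cst);
}

/* The same value with the not pushed onto the constant.  */
uint32_t
xor_with_not_cst (uint32_t a, uint32_t cst)
{
  return a ^ (uint32_t) ~cst;
}

/* (a ^ CST) & (a ^ ~CST): the operands are bitwise inverses of each
   other, so the result is always zero.  */
uint32_t
and_of_xor_pair (uint32_t a, uint32_t cst)
{
  uint32_t lhs = a ^ cst;
  uint32_t rhs = a ^ (uint32_t) ~cst;
  assert (lhs == (uint32_t) ~rhs);
  return lhs & rhs;
}
```

This is why a pair of xors with the same first operand and inverted constants can be treated like `x` and `~x` in the `~X & X` style patterns.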

> PR tree-optimization/115224
>
> gcc/ChangeLog:
>
> * generic-match-head.cc (bitwise_inverted_equal_p): Add `a ^ CST`
> case.
> * gimple-match-head.cc (gimple_bit_xor_cst): New declaration.
> (gimple_bitwise_inverted_equal_p): Add `a ^ CST` case.
> * match.pd (bit_xor_cst): New match.
> (maybe_bit_not): Add bit_xor_cst case.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/bitops-8.c: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/generic-match-head.cc| 10 ++
>  gcc/gimple-match-head.cc | 13 +
>  gcc/match.pd |  4 
>  gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c | 15 +++
>  4 files changed, 42 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
>
> diff --git a/gcc/generic-match-head.cc b/gcc/generic-match-head.cc
> index e2e1e4b2d64..3709fe5456d 100644
> --- a/gcc/generic-match-head.cc
> +++ b/gcc/generic-match-head.cc
> @@ -156,6 +156,16 @@ bitwise_inverted_equal_p (tree expr1, tree expr2, bool 
> &wascmp)
>if (TREE_CODE (expr2) == BIT_NOT_EXPR
>&& bitwise_equal_p (expr1, TREE_OPERAND (expr2, 0)))
>  return true;
> +
> +  /* `X ^ CST` and `X ^ ~CST` match for ~. */
> +  if (TREE_CODE (expr1) == BIT_XOR_EXPR && TREE_CODE (expr2) == BIT_XOR_EXPR
> +  && bitwise_equal_p (TREE_OPERAND (expr1, 0), TREE_OPERAND (expr2, 0)))
> +{
> +  tree cst1 = uniform_integer_cst_p (TREE_OPERAND (expr1, 1));
> +  tree cst2 = uniform_integer_cst_p (TREE_OPERAND (expr2, 1));
> +  if (cst1 && cst2 && wi::to_wide (cst1) == ~wi::to_wide (cst2))
> +   return true;
> +}
>if (COMPARISON_CLASS_P (expr1)
>&& COMPARISON_CLASS_P (expr2))
>  {
> diff --git a/gcc/gimple-match-head.cc b/gcc/gimple-match-head.cc
> index 49b1dde6ae4..d5908f4e9a6 100644
> --- a/gcc/gimple-match-head.cc
> +++ b/gcc/gimple-match-head.cc
> @@ -283,6 +283,7 @@ gimple_bitwise_equal_p (tree expr1, tree expr2, tree 
> (*valueize) (tree))
>
>  bool gimple_bit_not_with_nop (tree, tree *, tree (*) (tree));
>  bool gimple_maybe_cmp (tree, tree *, tree (*) (tree));
> +bool gimple_bit_xor_cst (tree, tree *, tree (*) (tree));
>
>  /* Helper function for bitwise_inverted_equal_p macro.  */
>
> @@ -299,6 +300,18 @@ gimple_bitwise_inverted_equal_p (tree expr1, tree expr2, 
> bool &wascmp, tree (*va
>if (operand_equal_p (expr1, expr2, 0))
>  return false;
>
> +  tree xor1[2];
> +  tree xor2[2];
> +  /* `X ^ CST` and `X ^ ~CST` match for ~. */
> +  if (gimple_bit_xor_cst (expr1, xor1, valueize)
> +  && gimple_bit_xor_cst (expr2, xor2, valueize))
> +{
> +  if (operand_equal_p (xor1[0], xor2[0], 0)
> + && (wi::to_wide (uniform_integer_cst_p (xor1[1]))
> + == ~wi::to_wide (uniform_integer_cst_p (xor2[1]))))
> +   return true;
> +}
> +
>tree other;
>/* Try if EXPR1 was defined as ~EXPR2. */
>if (gimple_bit_not_with_nop (expr1, &other, valueize))
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 090ad4e08b0..480e36bbbaf 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -174,6 +174,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (bit_not_with_nop @0)
>   (convert (bit_not @0))
>   (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))))
> +(match (bit_xor_cst @0 @1)
> + (bit_xor @0 uniform_integer_cst_p@1))
>  (for cmp (tcc_comparison)
>   (match (maybe_cmp @0)
>(cmp@0 @1 @2))
> @@ -195,6 +197,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (INTEGER_CST@0))
>  (match (maybe_bit_not @0)
>   (maybe_cmp@0 @1))
> +(match (maybe_bit_not @0)
> + (bit_xor_cst@0 @1 @2))
>
>  /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
> ABSU_EXPR returns unsigned absolute value of the operand and the operand
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
> new file mode 100644
> index 000..40f756e4455
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/bitops-8.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized-raw" } */
> +/* PR tree-optimization/115224 */
> +
> +int f1(int a, int b)
> +{
> +a = a ^ 1;
> +int c = ~a;
> +return c | (a ^ b);
> +// ~((a ^ 1) & b) or (a ^ -2) | ~b
> +}
> +/* { dg-final { scan-tree-dump-times   "bit_xor_expr, "  1  "optimized" } } 
> */
> +/* { dg-final { scan-tree-dump-times   "bit_ior_expr, "  1  "optimized" } } 
> */
> +/* { dg-final { scan-tree-dump-times   "bit_not_expr, "  1  "optimized" } } 
> */
> +
> --
> 2.43.0
>
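
For readers following the thread, here is a small self-contained sketch of
the identity the patch relies on, namely that `~(a ^ CST)` equals
`a ^ ~CST`, together with the expected simplification noted in the
bitops-8.c comment. The helper names xor_cst_inverse_holds and f1_expected
are made up for this illustration and are not part of the patch.

```c
#include <stdint.h>

/* Returns 1 when ~(a ^ cst) and a ^ ~cst agree, i.e. when `a ^ cst` and
   `a ^ ~cst` are bitwise inverses of each other.  */
int
xor_cst_inverse_holds (uint32_t a, uint32_t cst)
{
  return (uint32_t) ~(a ^ cst) == (a ^ (uint32_t) ~cst);
}

/* The function from the new test case, unchanged.  */
int
f1 (int a, int b)
{
  a = a ^ 1;
  int c = ~a;
  return c | (a ^ b);
}

/* One of the two simplified forms given in the test's comment
   (hypothetical helper, for comparison only).  */
int
f1_expected (int a, int b)
{
  return ~((a ^ 1) & b);
}
```

Both forms in the comment, `~((a ^ 1) & b)` and `(a ^ -2) | ~b`, follow
from De Morgan once `~(a ^ 1)` is rewritten as `a ^ -2`.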


[patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Tobias Burnus

This patch depends (for the libgomp/target.c parts) on the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html

AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK=1 set) can access host memory; the run-time library
returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.

Thus, it makes sense to enable USM support for those devices, which
this patch does. — A simple test with all unified_shared_memory tests
shipping with sollve_vv now works:*

  Test passed on the device.

as tested on an MI200 series device. In line with (some) other compilers,
it requires that HSA_XNACK=1 is set, otherwise the code will be executed
on the host.

(* Well, for C++, -O2 -fno-exceptions was used, but still only 5 test cases
PASS; there is 1 delete[] etc. link error, 1 ICE (segfault during IPA pass: cp
in gcn gcc), and 1 runtime fail for
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**], but
all 15 Fortran and 16 C tests PASS.)


Comments, remarks, suggestions?
Any reason not to commit it to mainline?

Tobias

PS: Richard confirmed that his gfx1036 APU also has
HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT == true; at least when
he disables the discrete gfx1030, which neither supports xnack nor
is an APU.

** rocgdb shows:

Thread 4 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to thread 4, lane 0 (AMDGPU Lane 1:1:1:1/0 (0,0,0)[0,0,0])]
0x77309c30 in main._omp_fn () at 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp:88
88if (Emp.name[i] != RefStr[i]) {

but I have not tried to debug this.

libgomp: Enable USM for AMD APUs and MI200 devices

If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.

XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disabled, USM code will
use host fallback.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.

include/ChangeLog:

	* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
	enum value.

libgomp/ChangeLog:

	* libgomp.texi (gcn): Update USM handling
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
	USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.

 gcc/config/gcn/gcn-hsa.h|  2 +-
 include/hsa.h   |  4 +++-
 libgomp/libgomp.texi|  9 +++--
 libgomp/plugin/plugin-gcn.c | 18 ++
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4611bc55392..03220555075 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
+   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 			 "!march=*|march=*:--amdhsa-code-object-version=4"
diff --git a/include/hsa.h b/include/hsa.h
index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@ typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22868635230..e79bd7a3392 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6360,8 +6360,13 @@ The implementation remark:
   such that the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any GCN device from the list of
-  available devices (``host fallback'').
+  @code{unified_shared_memory} is only supported if all AMD GPUs have the
+  @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
+  discrete GPUs, this may require setting the @code{HSA_XNACK} environment
+  variable to @samp{1}; for systems with both an APU and a discrete GPU that
+  does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
+  enable only the APU.  If not supported, all AMD GPU devices are removed
+  from the list of available devices (``host fallback'').
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
   environment variable; the default is 32 kiB per thread.
 @item Low-latency memory (@code{omp_low_lat_mem_s

Re: [patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 02:15:07PM +0200, Tobias Burnus wrote:
> +  bool b;
> +  hsa_status_t status;
> +  status = hsa_fns.hsa_system_get_info_fn (
> +  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT, &b);
> +  if (status != HSA_STATUS_SUCCESS)
> + GOMP_PLUGIN_error (
> +   "HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT failed");

Formatting, the (s at the end of lines look terrible.
In the first case, perhaps using a temporary would help,
  hsa_system_info_t arg = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
  status = hsa_fns.hsa_system_get_info_fn (arg, &b);
(or use something else instead of arg, as long as it's short), while in the
second
GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
   "failed");
will do.

Other than that LGTM.

Jakub
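
As an illustration of what the USM patch discussed above is meant to
enable, here is a minimal sketch of user code with a `requires
unified_shared_memory` directive. The function names are made up for this
example. It assumes nothing about the plugin internals: with USM the target
region reads the host array through the plain pointer without a map clause,
and when USM is not available (or when built without -fopenmp) the same
code simply runs on the host.

```c
#include <stdlib.h>

/* Must appear at file scope, before any target construct.  */
#pragma omp requires unified_shared_memory

/* Sums N ints through a plain host pointer; DATA is deliberately not
   mapped, which is only valid under unified shared memory (or host
   fallback).  */
long
sum_with_usm (const int *data, int n)
{
  long sum = 0;
  #pragma omp target map(tofrom: sum)
  for (int i = 0; i < n; i++)
    sum += data[i];
  return sum;
}

/* Hypothetical driver: fills 0..n-1 and returns the sum, or -1 if the
   allocation fails.  */
long
demo_usm_sum (int n)
{
  int *data = malloc (n * sizeof *data);
  if (!data)
    return -1;
  for (int i = 0; i < n; i++)
    data[i] = i;
  long sum = sum_with_usm (data, n);
  free (data);
  return sum;
}
```

On the devices covered by the patch this would be run with HSA_XNACK=1 set
in the environment, as described in the submission.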



Re: [PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

2024-05-29 Thread Richard Biener
On Mon, May 27, 2024 at 8:29 AM  wrote:
>
> From: Pan Li 
>
> Now that one gassign form of the unsigned .SAT_ADD is supported, we
> would like to support more forms, both branching and branchless.
> There are 5 other forms of .SAT_ADD, listed below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
> Take form 3 above as an example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto <bb 4>; [35.00%]
>   else
> goto <bb 3>; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The test suites below are still running; I will update the results later.
> * The x86 bootstrap test.
> * The x86 full regression test.
> * The riscv full regression test.
>
> gcc/ChangeLog:
>
> * genmatch.cc (dt_node::gen_kids): Add new arg of predicate id.
> (allow_phi_predicate_p): New func impl to check the phi
> predicate is allowed or not.
> (dt_node::gen_kids_1): Add COND_EXPR gen for phi node if allowed.
> (dt_operand::gen_phi_on_cond):
> (write_predicate): Init the predicate id before gen_kids.
> * match.pd: Add more forms of unsigned_integer_sat_add and
> comments.
> * tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
> (match_assign_saturation_arith): Rename to.
> (match_phi_saturation_arith): New func impl to match phi.
> (math_opts_dom_walker::after_dom_children): Add phi match for
> each bb.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/genmatch.cc   | 123 --
>  gcc/match.pd  |  43 -
>  gcc/tree-ssa-math-opts.cc |  51 +++-
>  3 files changed, 210 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f1e0e7abe0c..816d2dafd23 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -1767,6 +1767,7 @@ public:
>unsigned level;
>dt_node *parent;
>vec kids;
> +  const char *id;
>
>/* Statistics.  */
>unsigned num_leafs;
> @@ -1786,7 +1787,7 @@ public:
>virtual void gen (FILE *, int, bool, int) {}
>
>void gen_kids (FILE *, int, bool, int);
> -  void gen_kids_1 (FILE *, int, bool, int,
> +  void gen_kids_1 (FILE *, const char *, int, bool, int,
>const vec &, const vec &,
>const vec &, const vec &,
>const vec &, const vec &);
> @@ -1819,6 +1820,7 @@ public:
>
>char *get_name (char *);
>void gen_opname (char *, unsigned);
> +  void gen_phi_on_cond (FILE *, int, bool, int);
>  };
>
>  /* Leaf node of the decision tree, used for DT_SIMPLIFY.  */
> @@ -3173,7 +3175,7 @@ dt_node::gen_kids (FILE *f, int indent, bool gimple, 
> int depth)
>  for what we have collected sofar.  */
>   fns.qsort (fns_cmp);
>   generic_fns.qsort (fns_cmp);
> - gen_kids_1 (f, indent, gimple, depth, gimple_exprs, generic_exprs,
> + gen_kids_1 (f, id, indent, gimple, depth, gimple_exprs, 
> generic_exprs,
>   fns, generic_fns, preds, others);
>   /* And output the true operand itself.  */
>   kids[i]->gen (f, indent, gimple, depth);
> @@ -3191,14 +3193,21 @@ dt_node::gen_kids (FILE *f, int indent, bool gimple, 
> int depth)
>/* Generate code for the remains.  */
>fns.qs
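
To make the claim that these forms all compute the same saturating add
concrete, the following sketch instantiates the five macros from the
submission for uint8_t and compares them against a straightforward
reference over all input pairs. The checking function
sat_add_count_mismatches is a made-up name for this illustration; the
macros are copied from the quoted text.

```c
#include <stdint.h>

#define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T (T x, T y) \
  { \
    return (T)(x + y) >= x ? (x + y) : -1; \
  }

#define SAT_ADD_U_2(T) \
  T sat_add_u_2_##T (T x, T y) \
  { \
    T ret; \
    T overflow = __builtin_add_overflow (x, y, &ret); \
    return (T)(-overflow) | ret; \
  }

#define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
  }

#define SAT_ADD_U_4(T) \
  T sat_add_u_4_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
  }

#define SAT_ADD_U_5(T) \
  T sat_add_u_5_##T (T x, T y) \
  { \
    return (T)(x + y) < x ? -1 : (x + y); \
  }

SAT_ADD_U_1 (uint8_t)
SAT_ADD_U_2 (uint8_t)
SAT_ADD_U_3 (uint8_t)
SAT_ADD_U_4 (uint8_t)
SAT_ADD_U_5 (uint8_t)

/* Counts the (x, y) pairs for which any form disagrees with the
   reference result min (x + y, UINT8_MAX).  */
int
sat_add_count_mismatches (void)
{
  int mismatches = 0;
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t ref = x + y > UINT8_MAX ? UINT8_MAX : (uint8_t) (x + y);
        if (sat_add_u_1_uint8_t (x, y) != ref
            || sat_add_u_2_uint8_t (x, y) != ref
            || sat_add_u_3_uint8_t (x, y) != ref
            || sat_add_u_4_uint8_t (x, y) != ref
            || sat_add_u_5_uint8_t (x, y) != ref)
          mismatches++;
      }
  return mismatches;
}
```

This only checks that the source-level forms agree; whether each one is
actually recognized as .SAT_ADD depends on the match.pd patterns in the
patch.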

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-29 Thread David Malcolm
On Fri, 2024-05-24 at 12:42 +0400, Mariam Arutunian wrote:
> This patch adds a new compiler pass aimed at identifying naive CRC
> implementations,
> characterized by the presence of a loop calculating a CRC (polynomial
> long
> division).
> Upon detection of a potential CRC, the pass prints an informational
> message.
> 
> Performs CRC optimization if the optimization level is >= 2,
> except when optimizing for size or when -fno-gimple-crc-optimization
> is given.
> 
> This pass is added for the detection and optimization of naive CRC
> implementations,
> improving the efficiency of CRC-related computations.
> 
> This patch includes only initial fast checks for filtering out non-
> CRCs,
> detected possible CRCs verification and optimization parts will be
> provided
> in subsequent patches.
> 
>   gcc/
> 
>     * Makefile.in (OBJS): Add gimple-crc-optimization.o.
>     * common.opt (fgimple-crc-optimization): New option.
>     * doc/invoke.texi (-fgimple-crc-optimization): Add documentation.

A minor nitpick: patches that add new options (and their documentation)
ought to affect the corresponding .opt.urls, so that we can map from
the new option to the URL of its documentation.

Running "make regenerate-opt-urls" in the build/gcc subdirectory ought
to update common.opt.urls for you (provided you've done a "make html").

[...snip...]

Dave



Re: [patch] libgomp: Enable USM for some nvptx devices

2024-05-29 Thread Jakub Jelinek
On Wed, May 29, 2024 at 08:20:01AM +0200, Tobias Burnus wrote:
> +  if (num_devices > 0
> +  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
> +for (int dev = 0; dev < num_devices; dev++)
> +  {
> + int pi;
> + CUresult r;
> + r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
> +   CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS,
> +   dev);

Formatting nit: the CU_DEVICE_... should be below cuDeviceGetAttribute;
I think it fits like that (if it didn't, one could use a temporary
variable).

Otherwise LGTM.

Jakub



Re: [PATCH v2] C/C++: add hints for strerror

2024-05-29 Thread Jason Merrill

Pushed, thanks!

On 2/27/24 20:13, Oskari Pirhonen wrote:

Add proper hints for implicit declaration of strerror.

The results could be confusing depending on the other included headers.
These example messages are from compiling a trivial program to print the
string for an errno value. It only includes stdio.h (cstdio for C++).

Before:
$ /tmp/gcc-master/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’; did you mean 
‘perror’? [-Wimplicit-function-declaration]
 4 | printf("%s\n", strerror(0));
   |^~~~
   |perror

$ /tmp/gcc-master/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope; did you mean 
‘stderr’?
 4 | printf("%s\n", strerror(0));
   |^~~~
   |stderr

After:
$ /tmp/gcc-known-headers/bin/gcc test.c -o test_c
test.c: In function ‘main’:
test.c:4:20: warning: implicit declaration of function ‘strerror’ 
[-Wimplicit-function-declaration]
 4 | printf("%s\n", strerror(0));
   |^~~~
test.c:2:1: note: ‘strerror’ is defined in header ‘<string.h>’; this is probably 
fixable by adding ‘#include <string.h>’
 1 | #include <stdio.h>
   +++ |+#include <string.h>
 2 |

$ /tmp/gcc-known-headers/bin/g++ test.cpp -o test_cpp
test.cpp: In function ‘int main()’:
test.cpp:4:20: error: ‘strerror’ was not declared in this scope
 4 | printf("%s\n", strerror(0));
   |^~~~
test.cpp:2:1: note: ‘strerror’ is defined in header ‘<cstring>’; this is probably 
fixable by adding ‘#include <cstring>’
 1 | #include <cstdio>
   +++ |+#include <cstring>
 2 |

gcc/c-family/ChangeLog:

* known-headers.cc (get_stdlib_header_for_name): Add strerror.

gcc/testsuite/ChangeLog:

* g++.dg/spellcheck-stdlib.C: Add check for strerror.
* gcc.dg/spellcheck-stdlib-2.c: New test.

Signed-off-by: Oskari Pirhonen 
---
v2:
- check for error instead of warning in gcc.dg/spellcheck-stdlib-2.c
- from linaro ci notification email

  gcc/c-family/known-headers.cc  | 1 +
  gcc/testsuite/g++.dg/spellcheck-stdlib.C   | 2 ++
  gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c | 8 
  3 files changed, 11 insertions(+)
  create mode 100644 gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c

diff --git a/gcc/c-family/known-headers.cc b/gcc/c-family/known-headers.cc
index dbc42eacde1..871fd714eb5 100644
--- a/gcc/c-family/known-headers.cc
+++ b/gcc/c-family/known-headers.cc
@@ -182,6 +182,7 @@ get_stdlib_header_for_name (const char *name, enum stdlib 
lib)
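  {"strchr", {"<string.h>", "<cstring>"} },
  {"strcmp", {"<string.h>", "<cstring>"} },
  {"strcpy", {"<string.h>", "<cstring>"} },
+{"strerror", {"<string.h>", "<cstring>"} },
  {"strlen", {"<string.h>", "<cstring>"} },
  {"strncat", {"<string.h>", "<cstring>"} },
  {"strncmp", {"<string.h>", "<cstring>"} },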
diff --git a/gcc/testsuite/g++.dg/spellcheck-stdlib.C 
b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
index fd0f3a9b8c9..33718b8034e 100644
--- a/gcc/testsuite/g++.dg/spellcheck-stdlib.C
+++ b/gcc/testsuite/g++.dg/spellcheck-stdlib.C
@@ -104,6 +104,8 @@ void test_cstring (char *dest, char *src)
// { dg-message "'#include <cstring>'" "" { target *-*-* } .-1 }
strcpy(dest, "test"); // { dg-error "was not declared" }
// { dg-message "'#include <cstring>'" "" { target *-*-* } .-1 }
+  strerror(0); // { dg-error "was not declared" }
+  // { dg-message "'#include <cstring>'" "" { target *-*-* } .-1 }
strlen("test"); // { dg-error "was not declared" }
// { dg-message "'#include <cstring>'" "" { target *-*-* } .-1 }
strncat(dest, "test", 3); // { dg-error "was not declared" }
diff --git a/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c 
b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c
new file mode 100644
index 000..4762e2ddbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-stdlib-2.c
@@ -0,0 +1,8 @@
+/* { dg-options "-Wimplicit-function-declaration" } */
+
+/* Missing <string.h>.  */
+void test_string_h (void)
+{
+  strerror (0); /* { dg-error "implicit declaration of function 'strerror'" } 
*/
+  /* { dg-message "'strerror' is defined in header '<string.h>'" "" { target 
*-*-* } .-1 } */
+}
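
For completeness, a minimal sketch of the fix that the new hint points users
to: including <string.h> so that strerror is properly declared. The helper
name format_errno is made up for this example.

```c
#include <stdio.h>
#include <string.h>   /* Declares strerror; this is the include the hint suggests.  */

/* Writes "errno N: <message>" into BUF and returns the snprintf result,
   i.e. the length the full string would have.  */
int
format_errno (char *buf, size_t size, int errnum)
{
  return snprintf (buf, size, "errno %d: %s", errnum, strerror (errnum));
}
```

The message text itself is implementation-defined, so callers should not
rely on its exact wording.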




Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-29 Thread Qing Zhao


> On May 29, 2024, at 02:57, Richard Biener  wrote:
> 
> On Tue, May 28, 2024 at 11:09 PM Qing Zhao  wrote:
>> 
>> Thank you for the comments. See my answers below:
>> 
>> Joseph, please see the last question, I need your help on it. Thanks a lot 
>> for the help.
>> 
>> Qing
>> 
>>> On May 28, 2024, at 03:38, Richard Biener  
>>> wrote:
>>> 
>>> On Fri, Apr 12, 2024 at 3:54 PM Qing Zhao  wrote:
 
 Including the following changes:
 * The definition of the new internal function .ACCESS_WITH_SIZE
 in internal-fn.def.
 * C FE converts every reference to a FAM with a "counted_by" attribute
 to a call to the internal function .ACCESS_WITH_SIZE.
 (build_component_ref in c-typeck.cc)
 
 This includes the case when the object is statically allocated and
 initialized.
 In order to make this working, the routines initializer_constant_valid_p_1
 and output_constant in varasm.cc are updated to handle calls to
 .ACCESS_WITH_SIZE.
 (initializer_constant_valid_p_1 and output_constant in varasm.cc)
 
 However, for the reference inside "offsetof", the "counted_by" attribute is
 ignored since it's not useful at all.
 (c_parser_postfix_expression in c/c-parser.cc)
 
 In addition to "offsetof", for the reference inside operator "typeof" and
 "alignof", we ignore counted_by attribute too.
 
 When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE,
 replace the call with its first argument.
 
 * Convert every call to .ACCESS_WITH_SIZE to its first argument.
 (expand_ACCESS_WITH_SIZE in internal-fn.cc)
 * Adjust alias analysis to exclude the new internal from clobbering 
 anything.
 (ref_maybe_used_by_call_p_1 and call_may_clobber_ref_p_1 in 
 tree-ssa-alias.cc)
 * Adjust dead code elimination to eliminate the call to .ACCESS_WITH_SIZE 
 when
 its LHS is eliminated as dead code.
 (eliminate_unnecessary_stmts in tree-ssa-dce.cc)
 * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and
 get the reference from the call to .ACCESS_WITH_SIZE.
 (is_access_with_size_p and get_ref_from_access_with_size in tree.cc)
 
 gcc/c/ChangeLog:
 
   * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by
   attribute when build_component_ref inside offsetof operator.
   * c-tree.h (build_component_ref): Add one more parameter.
   * c-typeck.cc (build_counted_by_ref): New function.
   (build_access_with_size_for_counted_by): New function.
   (build_component_ref): Check the counted-by attribute and build
   call to .ACCESS_WITH_SIZE.
   (build_unary_op): When building ADDR_EXPR for
   .ACCESS_WITH_SIZE, use its first argument.
   (lvalue_p): Accept call to .ACCESS_WITH_SIZE.
 
 gcc/ChangeLog:
 
   * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function.
   * internal-fn.def (ACCESS_WITH_SIZE): New internal function.
   * tree-ssa-alias.cc (ref_maybe_used_by_call_p_1): Special case
   IFN_ACCESS_WITH_SIZE.
   (call_may_clobber_ref_p_1): Special case IFN_ACCESS_WITH_SIZE.
   * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Eliminate the call
   to .ACCESS_WITH_SIZE when its LHS is dead.
   * tree.cc (process_call_operands): Adjust side effect for function
   .ACCESS_WITH_SIZE.
   (is_access_with_size_p): New function.
   (get_ref_from_access_with_size): New function.
   * tree.h (is_access_with_size_p): New prototype.
   (get_ref_from_access_with_size): New prototype.
   * varasm.cc (initializer_constant_valid_p_1): Handle call to
   .ACCESS_WITH_SIZE.
   (output_constant): Handle call to .ACCESS_WITH_SIZE.
 
 gcc/testsuite/ChangeLog:
 
   * gcc.dg/flex-array-counted-by-2.c: New test.
 ---
 gcc/c/c-parser.cc |  10 +-
 gcc/c/c-tree.h|   2 +-
 gcc/c/c-typeck.cc | 128 +-
 gcc/internal-fn.cc|  35 +
 gcc/internal-fn.def   |   4 +
 .../gcc.dg/flex-array-counted-by-2.c  | 112 +++
 gcc/tree-ssa-alias.cc |   2 +
 gcc/tree-ssa-dce.cc   |   5 +-
 gcc/tree.cc   |  25 +++-
 gcc/tree.h|   8 ++
 gcc/varasm.cc |  10 ++
 11 files changed, 331 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/flex-array-counted-by-2.c
 
 diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
 index c31349dae2ff..a6ed5ac43bb1 100644
 --- a/gcc/c/c-parser.cc
 +++ b/gcc/c/c-parser.cc
 @@ -10850,9 +10850,12 @@ c_parser_postfix_expression (c_p
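
As background for the series, here is a minimal sketch of the kind of source
code it targets: a structure whose flexible array member carries a
"counted_by" attribute naming the field that holds the element count. The
struct, the helper names and the COUNTED_BY and HAVE_BDOS macros are made up
for this illustration. Since the attribute is only available in compilers
that include this work, the macro expands to nothing elsewhere, and the
object-size query is guarded in the same way.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the attribute only if the compiler knows it (assumption: it is
   reported through __has_attribute).  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

#if defined (__has_builtin)
# if __has_builtin (__builtin_dynamic_object_size)
#  define HAVE_BDOS 1
# endif
#endif

struct int_buf
{
  size_t count;                  /* Number of elements in DATA.  */
  int data[] COUNTED_BY (count); /* Flexible array member.  */
};

/* Allocates a buffer with room for N elements and records N in COUNT.  */
struct int_buf *
int_buf_alloc (size_t n)
{
  struct int_buf *p = malloc (sizeof *p + n * sizeof (int));
  if (p)
    p->count = n;
  return p;
}

/* Asks the compiler for the size of P->data; returns (size_t) -1 when the
   size is unknown or the builtin is unavailable.  */
size_t
int_buf_data_size (struct int_buf *p)
{
#ifdef HAVE_BDOS
  return __builtin_dynamic_object_size (p->data, 1);
#else
  return (size_t) -1;
#endif
}
```

With the attribute in effect, the intent of the series is that the reference
`p->data` carries the count along, so the object-size query can use it
instead of giving up on the flexible array.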

Re: [PATCH 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-29 Thread Robin Dapp
On 5/28/24 23:55, Patrick O'Neill wrote:
> From: Greg McGary 
> 
> Add option -m(no-)autovec-segment to enable/disable autovectorizer
> from emitting vector segment load/store instructions. This is useful for
> performance experiments.

I think the question was raised before but does a vector tune model
with high segment permute costs help for that already?  We didn't have
those when the patch was initially posted.
If so, we wouldn't need a specific option.

Regards
 Robin



[PATCH 2/3] Reduce single-lane SLP testresult noise

2024-05-29 Thread Richard Biener
The following avoids dumping 'vectorizing stmts using SLP' for
single-lane instances since that causes extra testsuite fallout.

* tree-vect-slp.cc (vect_schedule_slp): Gate dumping
'vectorizing stmts using SLP' on > 1 lanes.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 6ab00661382..bb943e5e6c7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -10103,7 +10103,8 @@ vect_schedule_slp (vec_info *vinfo, const 
vec &slp_instances)
   if (!SLP_INSTANCE_ROOT_STMTS (instance).is_empty ())
vectorize_slp_instance_root_stmt (node, instance);
 
-  if (dump_enabled_p ())
+  /* ???  Reduce some testsuite noise because of "more SLP".  */
+  if (SLP_TREE_LANES (node) > 1 && dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
  "vectorizing stmts using SLP.\n");
 }
-- 
2.35.3



[PATCH 1/3] Do single-lane SLP discovery for reductions

2024-05-29 Thread Richard Biener
The following performs single-lane SLP discovery for reductions.
It requires a fixup for outer loop vectorization where a check
for multiple types needs adjustments as otherwise bogus pointer
IV increments happen when there are multiple copies of vector stmts
in the inner loop.

For the reduction epilog handling this extends the optimized path
to cover the trivial single-lane SLP reduction case.

The fix for PR65518 implemented in vect_grouped_load_supported for
non-SLP needs a SLP counterpart that I put in get_group_load_store_type.

I've squashed parts of the previous series, no changes but the
added last patch (in this series) and the already pushed
r15-858-g65aa46ffc3b06b.

* tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
discoveries are reduction chains and need special backedge
treatment.
(vect_analyze_slp): Fall back to single-lane SLP discovery
for reductions.  Make sure to try single-lane SLP reduction
for all reductions as fallback.
(vectorizable_load): Avoid outer loop SLP vectorization with
multi-copy vector stmts in the inner loop.
(vectorizable_store): Likewise.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Allow
direct opcode and shift reduction also for SLP reductions
with a single lane.
* tree-vect-stmts.cc (get_group_load_store_type): For SLP also
check for the PR65518 single-element interleaving case as done in
vect_grouped_load_supported.
---
 gcc/tree-vect-loop.cc  |  4 +--
 gcc/tree-vect-slp.cc   | 71 --
 gcc/tree-vect-stmts.cc | 24 --
 3 files changed, 78 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 3b94bb13a8b..24a1239f016 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6507,7 +6507,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   /* 2.3 Create the reduction code, using one of the three schemes described
  above. In SLP we simply need to extract all the elements from the 
  vector (without reducing them), so we use scalar shifts.  */
-  else if (reduc_fn != IFN_LAST && !slp_reduc)
+  else if (reduc_fn != IFN_LAST && (!slp_reduc || group_size == 1))
 {
   tree tmp;
   tree vec_elem_type;
@@ -6677,7 +6677,7 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
   gsi_insert_seq_before (&exit_gsi, stmts, GSI_SAME_STMT);
   reduc_inputs[0] = new_temp;
 
-  if (reduce_with_shift && !slp_reduc)
+  if (reduce_with_shift && (!slp_reduc || group_size == 1))
{
  int element_bitsize = tree_to_uhwi (bitsize);
  /* Enforced by vectorizable_reduction, which disallows SLP reductions
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7a963e28063..6ab00661382 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1912,7 +1912,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
/* Reduction chain backedge defs are filled manually.
   ???  Need a better way to identify a SLP reduction chain PHI.
   Or a better overall way to SLP match those.  */
-   if (all_same && def_type == vect_reduction_def)
+   if (stmts.length () > 1
+   && all_same && def_type == vect_reduction_def)
  skip_args[loop_latch_edge (loop)->dest_idx] = true;
  }
else if (def_type != vect_internal_def)
@@ -3910,9 +3911,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
  }
 
   /* Find SLP sequences starting from groups of reductions.  */
-  if (loop_vinfo->reductions.length () > 1)
+  if (loop_vinfo->reductions.length () > 0)
{
- /* Collect reduction statements.  */
+ /* Collect reduction statements we can combine into
+a SLP reduction.  */
  vec scalar_stmts;
  scalar_stmts.create (loop_vinfo->reductions.length ());
  for (auto next_info : loop_vinfo->reductions)
@@ -3925,25 +3927,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned 
max_tree_size)
 reduction path.  In that case we'd have to reverse
 engineer that conversion stmt following the chain using
 reduc_idx and from the PHI using reduc_def.  */
- && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
- /* Do not discover SLP reductions for lane-reducing ops, that
-will fail later.  */
- && (!(g = dyn_cast  (STMT_VINFO_STMT (next_info)))
+ && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
+   {
+ /* Do not discover SLP reductions combining lane-reducing
+ops, that will fail later.  */
+ if (!(g = dyn_cast  (STMT_VINFO_STMT (next_info)))
  || (gimple_assign_rhs_code (g) != DOT

[PATCH 3/3] RISC-V: Avoid inserting after a GIMPLE_COND with SLP and early break

2024-05-29 Thread Richard Biener
When vectorizing an early break loop with LENs (do we miss some
check here to disallow this?) we can end up deciding to insert
stmts after a GIMPLE_COND when doing SLP scheduling and trying
to be conservative with placing of stmts only dependent on
the implicit loop mask/len.  The following avoids this, I guess
it's not perfect but it does the job fixing some observed
RISC-V regression.

This is a new fix to avoid some RISC-V regressions.  I'd like to
see how much remains there.

* tree-vect-slp.cc (vect_schedule_slp_node): For mask/len
loops make sure to not advance the insertion iterator
beyond a GIMPLE_COND.
---
 gcc/tree-vect-slp.cc | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index bb943e5e6c7..bbfde8849c1 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -9684,7 +9684,12 @@ vect_schedule_slp_node (vec_info *vinfo,
   else
{
  si = gsi_for_stmt (last_stmt);
- gsi_next (&si);
+ /* When we're getting gsi_after_labels from the starting
+condition of a fully masked/len loop avoid insertion
+after a GIMPLE_COND that can appear as the only header
+stmt with early break vectorization.  */
+ if (gimple_code (last_stmt) != GIMPLE_COND)
+   gsi_next (&si);
}
 }
 
-- 
2.35.3


[PATCH] Fix LTO type mismatch warning on transparent union

2024-05-29 Thread Eric Botcazou
Hi,

Ada doesn't have an equivalent to transparent union types in GNU C so, when it 
needs to interface a C function that takes a parameter of a transparent union 
type, GNAT uses the type of the first member of the union on the Ada side 
(which is the type used to determine the passing mechanism of the parameter).  
This works fine, except that LTO may warn about it; for the attached testcase:

.> gcc -c t.c -O2 -flto -D_GNU_SOURCE
.> gnatmake -q p -O2 -flto -largs t.o

q.ads:6:12: warning: type of 'q__c_getpeername' does not match original 
declaration [-Wlto-type-mismatch]
6 |   function C_Getpeername
  |^
/usr/include/sys/socket.h:130:12: note: type mismatch in parameter 2
  130 | extern int getpeername (int __fd, __SOCKADDR_ARG __addr,
  |^
/usr/include/sys/socket.h:130:12: note: 'getpeername' was previously declared 
here
/usr/include/sys/socket.h:130:12: note: code may be misoptimized unless '-fno-
strict-aliasing' is used


The attached patch recognizes the situation and checks the compatibility with 
the type of the first member of the union in this case.
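
For readers who have not met the GNU C feature, here is a minimal sketch of a
transparent union parameter.  The type and function names are made up for
illustration; only the transparent_union attribute is the real extension.  As
far as I know, glibc's __SOCKADDR_ARG is declared along the same lines when
_GNU_SOURCE is defined.

```c
/* Hypothetical type: a transparent union of two pointer types.  */
typedef union
{
  int *ip;
  unsigned *up;
} int_or_unsigned_ptr __attribute__ ((__transparent_union__));

/* The argument is passed as if it had the type of the first member
   (int *), which is the type GNAT uses on the Ada side.  */
int
read_via_union (int_or_unsigned_ptr p)
{
  return *p.ip;
}

/* Callers pass a plain pointer; no cast or union object is needed.  */
int
demo_transparent_union (void)
{
  int value = 42;
  return read_via_union (&value);
}
```

A caller written in another language that declares the parameter as a plain
address therefore matches the calling convention, even though the types differ
on paper.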

Tested on x86-64/Linux, OK for the mainline?


2024-05-29  Eric Botcazou  

* lto/lto-symtab.cc (warn_type_compatibility_p): Deal with
parameters whose type is a transparent union specially.

-- 
Eric Botcazou

diff --git a/gcc/lto/lto-symtab.cc b/gcc/lto/lto-symtab.cc
index a40218beac5..ca5a79610bb 100644
--- a/gcc/lto/lto-symtab.cc
+++ b/gcc/lto/lto-symtab.cc
@@ -233,8 +233,20 @@ warn_type_compatibility_p (tree prevailing_type, tree type,
 	   parm1 && parm2;
 	   parm1 = TREE_CHAIN (parm1),
 	   parm2 = TREE_CHAIN (parm2))
-	lev |= warn_type_compatibility_p (TREE_VALUE (parm1),
-	  TREE_VALUE (parm2), false);
+	/* If a function with a transparent union parameter is interfaced
+	   with another type, check that the latter is compatible with the
+	   type of the first field of the union, which is the type used to
+	   set the calling convention for the argument.  */
+	if (TREE_CODE (TREE_VALUE (parm1)) == UNION_TYPE
+		&& TYPE_TRANSPARENT_AGGR (TREE_VALUE (parm1))
+		&& TREE_CODE (TREE_VALUE (parm2)) != UNION_TYPE
+		&& common_or_extern)
+	  lev |= warn_type_compatibility_p
+		   (TREE_TYPE (TYPE_FIELDS (TREE_VALUE (parm1))),
+			TREE_VALUE (parm2), false);
+	else
+	  lev |= warn_type_compatibility_p (TREE_VALUE (parm1),
+		TREE_VALUE (parm2), false);
 	  if (parm1 || parm2)
 	lev |= odr_p ? 3 : 1;
 	}
with Interfaces.C; use Interfaces.C;
with System;

with Q; use Q;

procedure P is
  L : aliased unsigned;
  I : int := C_Getpeername (0, System.Null_Address, L'Access);

begin
  null;
end;
with Interfaces.C;
with System;

package Q is

  function C_Getpeername
  (S   : Interfaces.C.int;
   Name: System.Address;
   Namelen : not null access Interfaces.C.unsigned) return Interfaces.C.int;
  pragma Import (C, C_Getpeername, "getpeername");

  procedure Foo;
  pragma Import (C, Foo, "foo");

end Q;
#include 
#include 

void foo (void)
{
  int i = getpeername (0, NULL, NULL);
}


Re: [PATCH v6 1/8] Improve must tail in RTL backend

2024-05-29 Thread Michael Matz
On Tue, 21 May 2024, Andi Kleen wrote:

> - Give error messages for all causes of non sibling call generation
> - When giving error messages clear the musttail flag to avoid ICEs
> - Error out when tree-tailcall failed to mark a must-tail call
> sibcall. In this case it doesn't know the true reason and only gives
> a vague message.

Sorry for jumping in late, Richi triggered me :)  But some general 
remarks:

I think the ultimate knowledge if a call can or cannot be implemented as 
tail-call lies within calls.cc/expand_call: It is inherently 
target and ABI specific how arguments and returns are laid out, how the 
stack frame is generated, if arguments are or aren't removed by callers 
or callees and so on; all of that being knowledge that tree-tailcall 
doesn't have and doesn't want to have.  As such tree-tailcall should 
not be regarded as ultimate truth, and failures of tree-tailcall to 
recognize something as tail-callable shouldn't matter.

It then follows that tree-tailcall needn't be run at -O0 merely for 
setting the flag.  Instead calls.cc simply should try expanding a 
tail-call when it sees the must-tail flag (as it right now would do), i.e. 
trust the user.  If that fails for some reasons then that means that the 
checks within calls.cc aren't complete enough (and that tree-tailcall 
papered over that problem).  That would be (IMHO) an independent bug to be 
solved.  But _when_ those bugs are fixed then what you merely need to do 
for the musttail attribute is to set that flag on the gimple_call, 
possibly make sure that nothing (tree-tailcall!) removes the flag, and be 
done.

(For avoidance of doubt: with tree-tailcall I mean the tree sibcall call 
pass, "tailc", not the tail-recursion pass).

IOW: I don't see why the tree pass needs to be run at -O0 for musttail.  
If something doesn't work currently then that points to other 
deficiencies.
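
To make the discussion concrete, here is a small sketch of a call in tail
position that a user might mark as must-tail.  The MUSTTAIL macro and the
function name are hypothetical; the attribute is only used when the compiler
reports support for it through __has_attribute, and is otherwise omitted, so
this is an illustration rather than a test for the patch.

```c
/* Fall back gracefully on compilers without __has_attribute.  */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Hypothetical helper macro: request a guaranteed tail call if possible.  */
#if __has_attribute (musttail)
# define MUSTTAIL __attribute__ ((musttail))
#else
# define MUSTTAIL
#endif

/* Sum 1..n into ACC.  The recursive call is the last action and has the
   same signature as the caller, which is the easy case for a sibcall.  */
unsigned long
sum_to (unsigned long n, unsigned long acc)
{
  if (n == 0)
    return acc;
  MUSTTAIL return sum_to (n - 1, acc + n);
}
```

Whether the call really becomes a jump is decided at expansion time, which is
the point made above about calls.cc holding the final knowledge.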


Ciao,
Michael.

> 
>   PR83324
> 
> gcc/ChangeLog:
> 
>   * calls.cc (expand_call): Fix mustcall implementation.
>   (maybe_complain_about_tail_call): Clear must tail flag on error.
> ---
>  gcc/calls.cc | 30 --
>  1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 21d78f9779fe..161e36839654 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1249,6 +1249,7 @@ maybe_complain_about_tail_call (tree call_expr, const 
> char *reason)
>  return;
>  
>error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
> +  CALL_EXPR_MUST_TAIL_CALL (call_expr) = 0;
>  }
>  
>  /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
> @@ -2650,7 +2651,11 @@ expand_call (tree exp, rtx target, int ignore)
>/* The type of the function being called.  */
>tree fntype;
>bool try_tail_call = CALL_EXPR_TAILCALL (exp);
> -  bool must_tail_call = CALL_EXPR_MUST_TAIL_CALL (exp);
> +  /* tree-tailcall decided not to do tail calls. Error for the musttail case,
> + unfortunately we don't know the reason so it's fairly vague.
> + When tree-tailcall reported an error it already cleared the flag.  */
> +  if (!try_tail_call)
> +  maybe_complain_about_tail_call (exp, "other reasons");
>int pass;
>  
>/* Register in which non-BLKmode value will be returned,
> @@ -3022,10 +3027,21 @@ expand_call (tree exp, rtx target, int ignore)
>   pushed these optimizations into -O2.  Don't try if we're already
>   expanding a call, as that means we're an argument.  Don't try if
>   there's cleanups, as we know there's code to follow the call.  */
> -  if (currently_expanding_call++ != 0
> -  || (!flag_optimize_sibling_calls && !CALL_FROM_THUNK_P (exp))
> -  || args_size.var
> -  || dbg_cnt (tail_call) == false)
> +  if (currently_expanding_call++ != 0)
> +{
> +  maybe_complain_about_tail_call (exp, "inside another call");
> +  try_tail_call = 0;
> +}
> +  if (!flag_optimize_sibling_calls
> + && !CALL_FROM_THUNK_P (exp)
> + && !CALL_EXPR_MUST_TAIL_CALL (exp))
> +try_tail_call = 0;
> +  if (args_size.var)
> +{
> +  maybe_complain_about_tail_call (exp, "variable size arguments");
> +  try_tail_call = 0;
> +}
> +  if (dbg_cnt (tail_call) == false)
>  try_tail_call = 0;
>  
>/* Workaround buggy C/C++ wrappers around Fortran routines with
> @@ -3046,13 +3062,15 @@ expand_call (tree exp, rtx target, int ignore)
>   if (MEM_P (*iter))
> {
>   try_tail_call = 0;
> + maybe_complain_about_tail_call (exp,
> + "hidden string length argument passed on 
> stack");
>   break;
> }
>   }
>  
>/* If the user has marked the function as requiring tail-call
>   optimization, attempt it.  */
> -  if (must_tail_call)
> +  if (CALL_EXPR_MUST_TAIL_CALL (exp))
>  try_tail_call = 1;
>  
>/*  Rest of purposes for tail call optimizations to fail.  */
> 


[pushed] c++: add module extensions

2024-05-29 Thread Jason Merrill
Revised to change mkdeps and the docs.

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

There is a trend in the broader C++ community to use a different extension
for module interface units, even though (in GCC) they are compiled in the
same way as other source files.  Let's recognize these extensions as C++.

.ixx is the MSVC standard, while the .c*m are supported by Clang.  libc++
standard headers use .cppm, as their other source files use .cpp.
Perhaps libstdc++ might use .ccm for parallel consistency?

One issue with .c++m is that libcpp/mkdeps.cc has been using it for the
phony dependencies to express module dependencies, so I'm changing mkdeps to
something less likely to be an actual file, ".c++-module".
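
As a rough illustration of the suffix recognition (not how the driver does it,
which is the lang-specs.h table in the patch), a stand-alone check could look
like this.  The function name is made up; the suffix list is copied from the
lang-specs.h hunk.

```c
#include <stdbool.h>
#include <string.h>

/* Suffixes taken from the lang-specs.h hunk of this patch.  */
static const char *const module_suffixes[] = {
  ".ixx", ".cppm", ".cxxm", ".c++m", ".ccm"
};

/* Hypothetical helper: true if FILENAME ends in one of the module
   interface suffixes.  Only the text after the last '.' is compared.  */
bool
has_module_interface_suffix (const char *filename)
{
  const char *dot = strrchr (filename, '.');
  if (dot == NULL)
    return false;
  for (size_t i = 0;
       i < sizeof module_suffixes / sizeof module_suffixes[0]; i++)
    if (strcmp (dot, module_suffixes[i]) == 0)
      return true;
  return false;
}
```

Note that the new phony target name used by mkdeps, ".c++-module", is not in
the list, so a name such as "m:part.c++-module" is no longer mistaken for a
source file.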

gcc/cp/ChangeLog:

* lang-specs.h: Add module interface extensions.

gcc/ChangeLog:

* doc/invoke.texi: Update module extension docs.

libcpp/ChangeLog:

* mkdeps.cc (make_write): Change .c++m to .c++-module.

gcc/testsuite/ChangeLog:

* g++.dg/modules/dep-1_a.C
* g++.dg/modules/dep-1_b.C
* g++.dg/modules/dep-2.C: Change .c++m to .c++-module.
---
 gcc/doc/invoke.texi| 20 ++--
 gcc/cp/lang-specs.h|  6 ++
 gcc/testsuite/g++.dg/modules/dep-1_a.C |  4 ++--
 gcc/testsuite/g++.dg/modules/dep-1_b.C |  8 
 gcc/testsuite/g++.dg/modules/dep-2.C   |  4 ++--
 libcpp/mkdeps.cc   | 13 ++---
 6 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2cba380718b..517a782987d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2317,9 +2317,12 @@ other language.
 C++ source files conventionally use one of the suffixes @samp{.C},
 @samp{.cc}, @samp{.cpp}, @samp{.CPP}, @samp{.c++}, @samp{.cp}, or
 @samp{.cxx}; C++ header files often use @samp{.hh}, @samp{.hpp},
-@samp{.H}, or (for shared template code) @samp{.tcc}; and
-preprocessed C++ files use the suffix @samp{.ii}.  GCC recognizes
-files with these names and compiles them as C++ programs even if you
+@samp{.H}, or (for shared template code) @samp{.tcc};
+preprocessed C++ files use the suffix @samp{.ii}; and C++20 module interface
+units sometimes use @samp{.ixx}, @samp{.cppm}, @samp{.cxxm}, @samp{.c++m},
+or @samp{.ccm}.
+
+GCC recognizes files with these names and compiles them as C++ programs even 
if you
 call the compiler the same way as for compiling C programs (usually
 with the name @command{gcc}).
 
@@ -37705,13 +37708,10 @@ Modular compilation is @emph{not} enabled with just 
the
 version selected, although in pre-C++20 versions, it is of course an
 extension.
 
-No new source file suffixes are required or supported.  If you wish to
-use a non-standard suffix (@pxref{Overall Options}), you also need
-to provide a @option{-x c++} option too.@footnote{Some users like to
-distinguish module interface files with a new suffix, such as naming
-the source @code{module.cppm}, which involves
-teaching all tools about the new suffix.  A different scheme, such as
-naming @code{module-m.cpp} would be less invasive.}
+No new source file suffixes are required.  A few suffixes preferred
+for module interface units by other compilers (e.g. @samp{.ixx},
+@samp{.cppm}) are supported, but files with these suffixes are treated
+the same as any other C++ source file.
 
 Compiling a module interface unit produces an additional output (to
 the assembly or object file), called a Compiled Module Interface
diff --git a/gcc/cp/lang-specs.h b/gcc/cp/lang-specs.h
index 7a7f5ff0ab5..e5651567a2d 100644
--- a/gcc/cp/lang-specs.h
+++ b/gcc/cp/lang-specs.h
@@ -39,6 +39,12 @@ along with GCC; see the file COPYING3.  If not see
   {".HPP", "@c++-header", 0, 0, 0},
   {".tcc", "@c++-header", 0, 0, 0},
   {".hh",  "@c++-header", 0, 0, 0},
+  /* Module interface unit.  Should there also be a .C counterpart?  */
+  {".ixx", "@c++", 0, 0, 0}, /* MSVC */
+  {".cppm", "@c++", 0, 0, 0}, /* Clang/libc++ */
+  {".cxxm", "@c++", 0, 0, 0},
+  {".c++m", "@c++", 0, 0, 0},
+  {".ccm", "@c++", 0, 0, 0},
   {"@c++-header",
   "%{E|M|MM:cc1plus -E %{fmodules-ts:-fdirectives-only -fmodule-header}"
   "  %(cpp_options) %2 %(cpp_debug_options)}"
diff --git a/gcc/testsuite/g++.dg/modules/dep-1_a.C 
b/gcc/testsuite/g++.dg/modules/dep-1_a.C
index 5ec5dd30f6d..3e92eeaef9f 100644
--- a/gcc/testsuite/g++.dg/modules/dep-1_a.C
+++ b/gcc/testsuite/g++.dg/modules/dep-1_a.C
@@ -4,6 +4,6 @@ export module m:part;
 // { dg-module-cmi m:part }
 
 // All The Backslashes!
-// { dg-final { scan-file dep-1_a.d {\nm:part\.c\+\+m: gcm.cache/m-part\.gcm} 
} }
+// { dg-final { scan-file dep-1_a.d {\nm:part\.c\+\+-module: 
gcm.cache/m-part\.gcm} } }
 // { dg-final { scan-file dep-1_a.d {\ngcm.cache/m-part\.gcm:| dep-1_a\.o} } }
-// { dg-final { scan-file dep-1_a.d {\n\.PHONY: m:part\.c\+\+m} } }
+// { dg-final { scan-file dep-1_a.d {\n\.PHONY: m:part\.c\+\+-module} } }
diff --git a/gcc/testsuite/g++.dg/modules/de

[pushed] c++: pragma target and static init [PR109753]

2024-05-29 Thread Jason Merrill
Revised to drop the cgraph change so I can self-approve the remaining patch.

Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

 #pragma target and optimize should also apply to implicitly-generated
 functions like static initialization functions and defaulted special member
 functions.

The handle_optimize_attribute change is necessary to avoid regressing
g++.dg/opt/pr105306.C; maybe_clone_body creates a cgraph_node for the ~B
alias before handle_optimize_attribute, and the alias never goes through
finalize_function, so we need to adjust semantic_interposition somewhere
else.

PR c++/109753

gcc/c-family/ChangeLog:

* c-attribs.cc (handle_optimize_attribute): Set
cgraph_node::semantic_interposition.

gcc/cp/ChangeLog:

* decl.cc (start_preparsed_function): Call decl_attributes.

gcc/testsuite/ChangeLog:

* g++.dg/opt/always_inline1.C: New test.
---
 gcc/c-family/c-attribs.cc | 4 
 gcc/cp/decl.cc| 3 +++
 gcc/testsuite/g++.dg/opt/always_inline1.C | 8 
 3 files changed, 15 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/opt/always_inline1.C

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 04e39b41bdf..605469dd7dd 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -5971,6 +5971,10 @@ handle_optimize_attribute (tree *node, tree name, tree 
args,
   if (prev_target_node != target_node)
DECL_FUNCTION_SPECIFIC_TARGET (*node) = target_node;
 
+  /* Also update the cgraph_node, if it's already built.  */
+  if (cgraph_node *cn = cgraph_node::get (*node))
+   cn->semantic_interposition = flag_semantic_interposition;
+
   /* Restore current options.  */
   cl_optimization_restore (&global_options, &global_options_set,
   &cur_opts);
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index a992d54dc8f..d481e1ec074 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -17832,6 +17832,9 @@ start_preparsed_function (tree decl1, tree attrs, int 
flags)
doing_friend = true;
 }
 
+  /* Adjust for #pragma target/optimize.  */
+  decl_attributes (&decl1, NULL_TREE, 0);
+
   if (DECL_DECLARED_INLINE_P (decl1)
   && lookup_attribute ("noinline", attrs))
 warning_at (DECL_SOURCE_LOCATION (decl1), 0,
diff --git a/gcc/testsuite/g++.dg/opt/always_inline1.C 
b/gcc/testsuite/g++.dg/opt/always_inline1.C
new file mode 100644
index 000..a042a1cf0c6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/always_inline1.C
@@ -0,0 +1,8 @@
+// PR c++/109753
+// { dg-do compile { target x86_64-*-* } }
+
+#pragma GCC target("avx2")
+struct aa {
+__attribute__((__always_inline__)) aa() {}
+};
+aa _M_impl;

base-commit: ff41abdca0ab9993b6170b9b1f46b3a40921f1b0
-- 
2.44.0



[PATCH] c-family: Introduce the -Winvalid-noreturn flag from clang with extra tuneability

2024-05-29 Thread Julian Waters
Currently, gcc warns about functions marked noreturn that return, whether 
explicitly or implicitly, with no way to turn these warnings off. clang does 
have an option for this class of warnings, -Winvalid-noreturn. However, we 
can do better. Instead of having just one option that switches both warnings 
on and off, we can add an extra layer of granularity and have separate 
options for explicit and implicit returns: -Winvalid-noreturn=explicit and 
-Winvalid-noreturn=implicit. This patch adds both to gcc, for compatibility 
with clang. Do note that I am relatively new to gcc's codebase, and as such 
couldn't figure out how to cleanly define a general -Winvalid-noreturn 
warning that switches both on and off, for better compatibility with clang. 
If someone could point out how to do so, I'll happily rewrite my patch. I 
also do not have write access to gcc, and will need help pushing this patch 
once the green light is given.
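
To show the two cases side by side, here is a small sketch.  The function 
names are made up, and the first two functions are deliberately wrong: they 
exist only to trigger the two existing warnings and must not be called with a 
flag that lets them return.  The compiler is expected to warn when building 
this.

```c
#include <stdlib.h>

/* Explicit return: the case the patch puts under
   -Winvalid-noreturn=explicit.  */
__attribute__ ((noreturn)) void
returns_explicitly (int flag)
{
  if (flag)
    return;	/* "function declared 'noreturn' has a 'return' statement" */
  abort ();
}

/* Implicit return: control can fall off the end, the case the patch
   puts under -Winvalid-noreturn=implicit.  */
__attribute__ ((noreturn)) void
returns_implicitly (int flag)
{
  if (flag)
    abort ();
}		/* "'noreturn' function does return" */

/* Well-formed: every path ends in a call that does not return.  */
__attribute__ ((noreturn)) void
never_returns (void)
{
  abort ();
}

/* A caller that relies on the noreturn promise.  */
int
safe_divide (int a, int b)
{
  if (b == 0)
    never_returns ();
  return a / b;
}
```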

best regards,
Julian

gcc/c-family/ChangeLog:

* c.opt: Introduce -Winvalid-noreturn=explicit and 
-Winvalid-noreturn=implicit

gcc/ChangeLog:

* tree-cfg.cc (pass_warn_function_return::execute): Use it

gcc/c/ChangeLog:

* c-typeck.cc (c_finish_return): Use it
* gimple-parser.cc (c_finish_gimple_return): Use it

gcc/config/mingw/ChangeLog:

* mingw32.h (EXTRA_OS_CPP_BUILTINS): Fix semicolons

gcc/cp/ChangeLog:

* coroutines.cc (finish_co_return_stmt): Use it
* typeck.cc (check_return_expr): Use it

gcc/doc/ChangeLog:

* invoke.texi: Document new options

>From 4daf884f8bbc1e318ba93121a6fdf4139da80b64 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Wed, 29 May 2024 21:32:08 +0800
Subject: [PATCH] Introduce the -Winvalid-noreturn flag from clang with extra
 tuneability

Signed-off-by: TheShermanTanker 
---
 gcc/c-family/c.opt |  8 
 gcc/c/c-typeck.cc  |  2 +-
 gcc/c/gimple-parser.cc |  2 +-
 gcc/config/mingw/mingw32.h |  6 +++---
 gcc/cp/coroutines.cc   |  2 +-
 gcc/cp/typeck.cc   |  2 +-
 gcc/doc/invoke.texi| 13 +
 gcc/tree-cfg.cc|  2 +-
 8 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..32a2859fdcc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -886,6 +886,14 @@ Winvalid-constexpr
 C++ ObjC++ Var(warn_invalid_constexpr) Init(-1) Warning
 Warn when a function never produces a constant expression.
 
+Winvalid-noreturn=explicit
+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns explicitly.
+
+Winvalid-noreturn=implicit
+C ObjC C++ ObjC++ Warning
+Warn when a function marked noreturn returns implicitly.
+
 Winvalid-offsetof
 C++ ObjC++ Var(warn_invalid_offsetof) Init(1) Warning
 Warn about invalid uses of the \"offsetof\" macro.
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ad4c7add562..1941fbc44cb 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -11468,7 +11468,7 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
   location_t xloc = expansion_point_location_if_in_system_header (loc);
 
   if (TREE_THIS_VOLATILE (current_function_decl))
-warning_at (xloc, 0,
+warning_at (xloc, OPT_Winvalid_noreturn_explicit,
"function declared %<noreturn%> has a %<return%> statement");
 
   if (retval)
diff --git a/gcc/c/gimple-parser.cc b/gcc/c/gimple-parser.cc
index d156d83cd37..1acaf75f844 100644
--- a/gcc/c/gimple-parser.cc
+++ b/gcc/c/gimple-parser.cc
@@ -2593,7 +2593,7 @@ c_finish_gimple_return (location_t loc, tree retval)
   location_t xloc = expansion_point_location_if_in_system_header (loc);
 
   if (TREE_THIS_VOLATILE (current_function_decl))
-warning_at (xloc, 0,
+warning_at (xloc, OPT_Winvalid_noreturn_explicit,
"function declared %<noreturn%> has a %<return%> statement");
 
   if (! retval)
diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index fa6e307476c..a69926133b1 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -35,9 +35,9 @@ along with GCC; see the file COPYING3.  If not see
 | MASK_MS_BITFIELD_LAYOUT)
 
 #ifdef TARGET_USING_MCFGTHREAD
-#define DEFINE_THREAD_MODEL  builtin_define ("__USING_MCFGTHREAD__");
+#define DEFINE_THREAD_MODEL  builtin_define ("__USING_MCFGTHREAD__")
 #elif defined(TARGET_USE_PTHREAD_BY_DEFAULT)
-#define DEFINE_THREAD_MODEL  builtin_define ("__USING_POSIXTHREAD__");
+#define DEFINE_THREAD_MODEL  builtin_define ("__USING_POSIXTHREAD__")
 #else
 #define DEFINE_THREAD_MODEL
 #endif
@@ -60,7 +60,7 @@ along with GCC; see the file COPYING3.  If not see
  builtin_define_std ("WIN64"); \
  builtin_define ("_WIN64");\
}   \
-  DEFINE_THREAD_MODEL  \
+  DEFINE_THREAD_MODEL; \
 }

RE: [PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

2024-05-29 Thread Li, Pan2
Thanks Richard for suggestion and review.

I added some tricky/ugly restrictions in v3 for the phi gen because there 
are all sorts of (cond in match.pd; I will give your proposal a try in v4.
Thanks again for help.

Pan

-Original Message-
From: Richard Biener  
Sent: Wednesday, May 29, 2024 8:36 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
tamar.christ...@arm.com
Subject: Re: [PATCH v3] Match: Support more form for scalar unsigned SAT_ADD

On Mon, May 27, 2024 at 8:29 AM  wrote:
>
> From: Pan Li 
>
> After we support one gassign form of the unsigned .SAT_ADD,  we
> would like to support more forms including both the branch and
> branchless.  There are 5 other forms of .SAT_ADD,  list as below:
>
> Form 1:
>   #define SAT_ADD_U_1(T) \
>   T sat_add_u_1_##T(T x, T y) \
>   { \
> return (T)(x + y) >= x ? (x + y) : -1; \
>   }
>
> Form 2:
>   #define SAT_ADD_U_2(T) \
>   T sat_add_u_2_##T(T x, T y) \
>   { \
> T ret; \
> T overflow = __builtin_add_overflow (x, y, &ret); \
> return (T)(-overflow) | ret; \
>   }
>
> Form 3:
>   #define SAT_ADD_U_3(T) \
>   T sat_add_u_3_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) ? -1 : ret; \
>   }
>
> Form 4:
>   #define SAT_ADD_U_4(T) \
>   T sat_add_u_4_##T (T x, T y) \
>   { \
> T ret; \
> return __builtin_add_overflow (x, y, &ret) == 0 ? ret : -1; \
>   }
>
> Form 5:
>   #define SAT_ADD_U_5(T) \
>   T sat_add_u_5_##T(T x, T y) \
>   { \
> return (T)(x + y) < x ? -1 : (x + y); \
>   }
>
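
As a quick sanity check that the branchy and builtin-based forms agree, forms 
1 and 3 can be instantiated and compared as below.  This is only a sketch: the 
macros are copied from the list above, with explicit (T) casts added on the 
results so that narrow types such as uint8_t behave as intended.

```c
#include <stdint.h>

/* Form 1: branch on the wrapped sum being smaller than an operand.  */
#define SAT_ADD_U_1(T) \
  T sat_add_u_1_##T (T x, T y) \
  { \
    return (T)(x + y) >= x ? (T)(x + y) : (T)-1; \
  }

/* Form 3: branch on the overflow flag from the builtin.  */
#define SAT_ADD_U_3(T) \
  T sat_add_u_3_##T (T x, T y) \
  { \
    T ret; \
    return __builtin_add_overflow (x, y, &ret) ? (T)-1 : ret; \
  }

SAT_ADD_U_1 (uint8_t)
SAT_ADD_U_3 (uint8_t)
SAT_ADD_U_1 (uint64_t)
SAT_ADD_U_3 (uint64_t)
```

For uint8_t, 200 + 100 saturates to 255 in both forms, while 1 + 2 is returned 
unchanged.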
> Take the forms 3 of above as example:
>
> uint64_t
> sat_add (uint64_t x, uint64_t y)
> {
>   uint64_t ret;
>   return __builtin_add_overflow (x, y, &ret) ? -1 : ret;
> }
>
> Before this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _1;
>   long unsigned int _2;
>   uint64_t _3;
>   __complex__ long unsigned int _6;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
>   _2 = IMAGPART_EXPR <_6>;
>   if (_2 != 0)
> goto ; [35.00%]
>   else
> goto ; [65.00%]
> ;;succ:   4
> ;;3
>
> ;;   basic block 3, loop depth 0
> ;;pred:   2
>   _1 = REALPART_EXPR <_6>;
> ;;succ:   4
>
> ;;   basic block 4, loop depth 0
> ;;pred:   3
> ;;2
>   # _3 = PHI <_1(3), 18446744073709551615(2)>
>   return _3;
> ;;succ:   EXIT
> }
>
> After this patch:
> uint64_t sat_add (uint64_t x, uint64_t y)
> {
>   long unsigned int _12;
>
> ;;   basic block 2, loop depth 0
> ;;pred:   ENTRY
>   _12 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
>   return _12;
> ;;succ:   EXIT
> }
>
> The below test suites are still running, will update it later.
> * The x86 bootstrap test.
> * The x86 fully regression test.
> * The riscv fully regression test.
>
> gcc/ChangeLog:
>
> * genmatch.cc (dt_node::gen_kids): Add new arg of predicate id.
> (allow_phi_predicate_p): New func impl to check the phi
> predicate is allowed or not.
> (dt_node::gen_kids_1): Add COND_EXPR gen for phi node if allowed.
> (dt_operand::gen_phi_on_cond):
> (write_predicate): Init the predicate id before gen_kids.
> * match.pd: Add more forms of unsigned_integer_sat_add and
> comments.
> * tree-ssa-math-opts.cc (match_saturation_arith): Rename from.
> (match_assign_saturation_arith): Rename to.
> (match_phi_saturation_arith): New func impl to match phi.
> (math_opts_dom_walker::after_dom_children): Add phi match for
> echo bb.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/genmatch.cc   | 123 --
>  gcc/match.pd  |  43 -
>  gcc/tree-ssa-math-opts.cc |  51 +++-
>  3 files changed, 210 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f1e0e7abe0c..816d2dafd23 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -1767,6 +1767,7 @@ public:
>unsigned level;
>dt_node *parent;
>vec kids;
> +  const char *id;
>
>/* Statistics.  */
>unsigned num_leafs;
> @@ -1786,7 +1787,7 @@ public:
>virtual void gen (FILE *, int, bool, int) {}
>
>void gen_kids (FILE *, int, bool, int);
> -  void gen_kids_1 (FILE *, int, bool, int,
> +  void gen_kids_1 (FILE *, const char *, int, bool, int,
>const vec &, const vec &,
>const vec &, const vec &,
>const vec &, const vec &);
> @@ -1819,6 +1820,7 @@ public:
>
>char *get_name (char *);
>void gen_opname (char *, unsigned);
> +  void gen_phi_on_cond (FILE *, int, bool, int);
>  };
>
>  /* Leaf node of the decision tree, used for DT_SIMPLIFY.  */
> @@ -3173,7 +3175,7 @@ dt_node::gen_kids (FILE *f, int indent, bool gimple, 
> int depth)
>  for what we have collected sofar.  */
>   fns.qsort (fns_cmp);
>   generic_fns.qsort (f

Re: [COMMITTED] tree-optimization/115221 - Do not invoke SCEV if it will use a different range query.

2024-05-29 Thread Andrew MacLeod



On 5/29/24 03:19, Richard Biener wrote:

On Tue, May 28, 2024 at 8:57 PM Andrew MacLeod  wrote:

The original patch causing the PR made ranger's cache re-entrant to
enable SCEV to use the current range_query when called from within ranger.

SCEV uses the currently active range query (via get_range_query()) for
picking up values.  fold_using_range is the general purpose stmt folder
many  components use, and it takes a range_query to use for folding.
When propagating values in the cache, we need to ensure no new queries
are invoked, and when the cache is propagating and calculating outgoing
edges, it switches to a read only range_query which uses what it knows
about global values to come up with best result using current state.

SCEV is unaware of what the caller is using for a range_query, so when
attempting to fold a PHI node, it is re-invoking the current query
during propagation which is undesired behavior.   This patch tells
fold_using_range to not use SCEV if the range_query being used is not
the same as the one SCEV is going to use.

Bootstrapped on x86_64-pc-linux-gnu with no regressions. Pushed.

Can we dump a hint to an active dump-file if this happens?  I suppose it's
an unwanted situation, like the pass not setting the active ranger?  Sth
like

if (src.query () != get_range_query (cfun)
    && dump_file)
  fprintf (dump_file, "Using a range query different from the installed one\n");

(or better wording).


Sure, I can add that info to the dump.  It's unlikely to happen in 
places other than from ranger's cache, but you never know.





Btw, could we install src.query () as the global range query around the
relevant recursion or is the place around where we'd need to do this
not so clear-cut?


The problem is actually the other way around.  The relevant recursion 
location *is* using the global query to try to prevent problems.  SCEV 
is not honoring the use of an alternative to the active range_query.


So. An active ranger is the current range_query.  Cache propagation of an 
ssa-name happens when you ask for a range further down in the CFG than 
has been processed yet.  It walks the CFG from the last dominator 
which has a range calculated (if any), filling in the on-entry cache 
with values for the name based on outgoing edge calculations.  It 
specifically uses only a global query to fold and evaluate the value of 
that name on edges along the way, to ensure it doesn't trigger recursion 
with the active ranger.  When propagating the cache for an SSA_NAME, 
it should finish the propagation and return the final value before it 
goes off to do any other activity.  Without this patch, when the cache 
processes a PHI node via fold_using_range, the generic fold routine 
checks with SCEV, and SCEV always uses get_range_query instead of a 
specific operand source...  which invokes the active ranger again :-P


So this patch is mostly to make sure that if we are going to invoke 
SCEV, we only invoke it when we are calling it with the same query, 
which is 99.9% of the time.  If a pass is only using the global range 
query, it'll still work fine because get_range_query () returns the 
global range query, and that's what SCEV will be using.  If a pass uses 
enable_ranger () and itself uses get_range_query (), then SCEV will also 
work as expected.
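
A toy model of that guard, in plain C, may help.  Everything here is a made-up 
stand-in (the struct, the counters, the function names); it only mirrors the 
shape of the check "use SCEV only if the caller's query is the active one" and 
is not the ranger code.

```c
/* Hypothetical stand-in for a range_query; counts how often it is used.  */
struct query_model
{
  const char *name;
  int lookups;
};

struct query_model global_model = { "global", 0 };
struct query_model ranger_model = { "ranger", 0 };

/* Stand-in for get_range_query (): the query that is currently active.  */
static struct query_model *active_model = &ranger_model;

/* Stand-in for SCEV: it always consults the active query.  */
static int
scev_model (void)
{
  active_model->lookups++;
  return 1;
}

/* Stand-in for the PHI folder.  Returns 1 if SCEV was consulted, 0 if
   it was skipped because SRC is not the active query.  */
int
fold_phi_model (struct query_model *src)
{
  src->lookups++;
  if (src != active_model)
    return 0;
  return scev_model ();
}
```

Folding with the global (read-only) query while the ranger is active leaves 
the ranger untouched, which is the re-entrancy the patch wants to avoid.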


Andrew



Compare loop bounds in ipa-icf

2024-05-29 Thread Jan Hubicka
Hi,
this testcase shows another problem with missing comparators for metadata
in ICF. With value ranges available to loop optimizations during early
opts we can estimate the number of iterations based on a guarding condition
that can be split away by the fnsplit pass. This patch disables ICF when
the number of iterations does not match.
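
The comparison added by the patch boils down to the following, shown here as a
stand-alone sketch.  The struct is a simplified, hypothetical model with only
the two fields involved, and it uses unsigned long where GCC stores a wide
integer.

```c
#include <stdbool.h>

/* Simplified model of the loop metadata compared by the patch.  */
struct loop_model
{
  bool any_upper_bound;
  unsigned long nb_iterations_upper_bound;
};

/* True if the two loops carry the same upper-bound information: both
   lack a bound, or both have one and the bounds are equal.  */
bool
loop_bounds_match (const struct loop_model *l1, const struct loop_model *l2)
{
  if (l1->any_upper_bound != l2->any_upper_bound)
    return false;
  if (l1->any_upper_bound
      && l1->nb_iterations_upper_bound != l2->nb_iterations_upper_bound)
    return false;
  return true;
}
```

In the testcase, the guards a > 3 and a > 10 give the two loops different
bounds, so the functions must not be merged even though their bodies look the
same after splitting.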

Bootstrapped/regtested x86_64-linux, will commit it shortly.

gcc/ChangeLog:

PR ipa/115277
* ipa-icf-gimple.cc (func_checker::compare_loops):

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr115277.c: New test.

diff --git a/gcc/ipa-icf-gimple.cc b/gcc/ipa-icf-gimple.cc
index c25eb24710f..4c3174b68b6 100644
--- a/gcc/ipa-icf-gimple.cc
+++ b/gcc/ipa-icf-gimple.cc
@@ -543,6 +543,10 @@ func_checker::compare_loops (basic_block bb1, basic_block 
bb2)
 return return_false_with_msg ("unroll");
   if (!compare_variable_decl (l1->simduid, l2->simduid))
 return return_false_with_msg ("simduid");
+  if ((l1->any_upper_bound != l2->any_upper_bound)
+  || (l1->any_upper_bound
+ && (l1->nb_iterations_upper_bound != l2->nb_iterations_upper_bound)))
+return return_false_with_msg ("nb_iterations_upper_bound");
 
   return true;
 }
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr115277.c 
b/gcc/testsuite/gcc.c-torture/compile/pr115277.c
new file mode 100644
index 000..27449eb254f
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/pr115277.c
@@ -0,0 +1,28 @@
+int array[1000];
+void
+test (int a)
+{
+if (__builtin_expect (a > 3, 1))
+return;
+for (int i = 0; i < a; i++)
+array[i]=i;
+}
+void
+test2 (int a)
+{
+if (__builtin_expect (a > 10, 1))
+return;
+for (int i = 0; i < a; i++)
+array[i]=i;
+}
+int
+main()
+{
+test(1);
+test(2);
+test(3);
+test2(10);
+if (array[9] != 9)
+__builtin_abort ();
+return 0;
+}


[patch] libgomp.texi: Impl. update for USM and missing 5.2 item

2024-05-29 Thread Tobias Burnus
Now that unified-shared memory works (with some devices), mark it as 'Y' 
and link to the device-specific chapter. While there is always room for 
improvement (like having opt-in partial support for managed-memory 
semi-USM devices), it works sufficiently for a 'Y'.
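
For illustration, a minimal sketch of what user code with this requirement 
looks like.  The function name is made up.  Without -fopenmp the pragmas are 
ignored and the loop runs on the host; with it, devices that cannot provide 
unified shared memory should not be used, so the region falls back to the 
host.

```c
/* Must appear at file scope before the first target construct.  */
#pragma omp requires unified_shared_memory

/* Sum N ints.  With unified shared memory the device can dereference
   DATA directly, so no map clause for the pointed-to array is given.  */
int
sum_on_device (const int *data, int n)
{
  int sum = 0;
#pragma omp target teams distribute parallel for reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += data[i];
  return sum;
}
```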


Additionally, I saw that 5.2 has extended what is permitted inside 
'declare mapper'. Instead of listing the permitted clauses as in 5.1, 
it now refers to the 'map' clause such that 'delete'/'release', 
'present' and in particular 'iterator' and 'mapper' itself are permitted 
inside a declare-mapper 'map' clause. Thus, I added it as a to-do item 
to the 5.2 status.


Comments?

Tobias

PS: As this is also about USM, the declare-target USM issue I mentioned 
in several patch emails is now filed as https://gcc.gnu.org/PR115279

libgomp.texi: Impl. update for USM and missing 5.2 item

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
	link to 'Offload-Target Specifics'.
	(OpenMP 5.2 status): Add item about additional map-type modifiers
	in 'declare mapper'.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e79bd7a3392..03e6455219d 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -198,8 +198,8 @@ The OpenMP 4.5 specification is fully supported.
 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
-@item @code{requires} directive @tab P
-  @tab complete but no non-host device provides @code{unified_shared_memory}
+@item @code{requires} directive @tab Y
+  @tab See @ref{Offload-Target Specifics}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P
   @tab Full support for C/C++, partial for Fortran
@@ -443,6 +443,8 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
   of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
   on other devices @tab N @tab
+@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mapper}
+  @tab N @tab
 @end multitable
 
 


Re: [PATCH] Fix LTO type mismatch warning on transparent union

2024-05-29 Thread Richard Biener



> On 29.05.2024 at 15:30, Eric Botcazou wrote:
> 
> Hi,
> 
> Ada doesn't have an equivalent to transparent union types in GNU C so, when it
> needs to interface a C function that takes a parameter of a transparent union
> type, GNAT uses the type of the first member of the union on the Ada side
> (which is the type used to determine the passing mechanism of the parameter). 
>  
> This works fine, except that LTO may warn about it; for the attached testcase:
> 
> .> gcc -c t.c -O2 -flto -D_GNU_SOURCE
> .> gnatmake -q p -O2 -flto -largs t.o
> 
> q.ads:6:12: warning: type of 'q__c_getpeername' does not match original
> declaration [-Wlto-type-mismatch]
>6 |   function C_Getpeername
>  |^
> /usr/include/sys/socket.h:130:12: note: type mismatch in parameter 2
>  130 | extern int getpeername (int __fd, __SOCKADDR_ARG __addr,
>  |^
> /usr/include/sys/socket.h:130:12: note: 'getpeername' was previously declared
> here
> /usr/include/sys/socket.h:130:12: note: code may be misoptimized unless '-fno-
> strict-aliasing' is used
> 
> 
> The attached patch recognizes the situation and checks the compatibility with
> the type of the first member of the union in this case.
> 
> Tested on x86-64/Linux, OK for the mainline?

Do function pointers inter-operate TBAA-wise for this case, and would this
possibly be an issue?

Richard 

> 
> 2024-05-29  Eric Botcazou  
> 
>* lto/lto-symtab.cc (warn_type_compatibility_p): Deal with
>parameters whose type is a transparent union specially.
> 
> --
> Eric Botcazou
> 
> 
> 
> 


[PATCH] aarch64: Split aarch64_combinev16qi before RA [PR115258]

2024-05-29 Thread Richard Sandiford
Two-vector TBL instructions are fed by an aarch64_combinev16qi, whose
purpose is to put the two input data vectors into consecutive registers.
This aarch64_combinev16qi was then split after reload into individual
moves (from the first input to the first half of the output, and from
the second input to the second half of the output).

In the worst case, the RA might allocate things so that the destination
of the aarch64_combinev16qi is the second input followed by the first
input.  In that case, the split form of aarch64_combinev16qi uses three
eors to swap the registers around.
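
As an illustration of the worst case described above, here is a minimal
sketch in plain C of swapping two registers with three exclusive-ORs and no
scratch register.  The vreg128 type and the function names are hypothetical
stand-ins for 128-bit vector registers; this is not the code the split
emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 128-bit vector register.  */
typedef struct { uint64_t lo, hi; } vreg128;

/* dst ^= src, modelling a single EOR instruction.  */
static void
vreg_xor (vreg128 *dst, const vreg128 *src)
{
  dst->lo ^= src->lo;
  dst->hi ^= src->hi;
}

/* Swap two distinct registers in place using three EORs.  */
static void
swap_with_three_eors (vreg128 *a, vreg128 *b)
{
  vreg_xor (a, b);  /* a = a0 ^ b0 */
  vreg_xor (b, a);  /* b = b0 ^ (a0 ^ b0) = a0 */
  vreg_xor (a, b);  /* a = (a0 ^ b0) ^ a0 = b0 */
}
```

The three operations form a serial dependency chain, which is presumably
why it is preferable to let the RA see the underlying moves and avoid the
reversed allocation in the first place.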

This PR is about a test where this worst case occurred.  And given the
insn description, that allocation doesn't seem unreasonable.

early-ra should (hopefully) mean that we're now better at allocating
subregs of vector registers.  The upcoming RA subreg patches should
improve things further.  The best fix for the PR therefore seems
to be to split the combination before RA, so that the RA can see
the underlying moves.

Perhaps it even makes sense to do this at expand time, avoiding the need
for aarch64_combinev16qi entirely.  That deserves more experimentation
though.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
PR target/115258
* config/aarch64/aarch64-simd.md (aarch64_combinev16qi): Allow
the split before reload.
* config/aarch64/aarch64.cc (aarch64_split_combinev16qi): Generalize
into a form that handles pseudo registers.

gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md  |  2 +-
 gcc/config/aarch64/aarch64.cc   | 29 ++---
 gcc/testsuite/gcc.target/aarch64/pr115258.c | 19 ++
 3 files changed, 34 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr115258.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index c311888e4bd..868f4486218 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8474,7 +8474,7 @@ (define_insn_and_split "aarch64_combinev16qi"
UNSPEC_CONCAT))]
   "TARGET_SIMD"
   "#"
-  "&& reload_completed"
+  "&& 1"
   [(const_int 0)]
 {
   aarch64_split_combinev16qi (operands);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index ee12d8897a8..13191ec8e34 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25333,27 +25333,26 @@ aarch64_output_sve_ptrues (rtx const_unspec)
 void
 aarch64_split_combinev16qi (rtx operands[3])
 {
-  unsigned int dest = REGNO (operands[0]);
-  unsigned int src1 = REGNO (operands[1]);
-  unsigned int src2 = REGNO (operands[2]);
   machine_mode halfmode = GET_MODE (operands[1]);
-  unsigned int halfregs = REG_NREGS (operands[1]);
-  rtx destlo, desthi;
 
   gcc_assert (halfmode == V16QImode);
 
-  if (src1 == dest && src2 == dest + halfregs)
+  rtx destlo = simplify_gen_subreg (halfmode, operands[0],
+   GET_MODE (operands[0]), 0);
+  rtx desthi = simplify_gen_subreg (halfmode, operands[0],
+   GET_MODE (operands[0]),
+   GET_MODE_SIZE (halfmode));
+
+  bool skiplo = rtx_equal_p (destlo, operands[1]);
+  bool skiphi = rtx_equal_p (desthi, operands[2]);
+
+  if (skiplo && skiphi)
 {
   /* No-op move.  Can't split to nothing; emit something.  */
   emit_note (NOTE_INSN_DELETED);
   return;
 }
 
-  /* Preserve register attributes for variable tracking.  */
-  destlo = gen_rtx_REG_offset (operands[0], halfmode, dest, 0);
-  desthi = gen_rtx_REG_offset (operands[0], halfmode, dest + halfregs,
-  GET_MODE_SIZE (halfmode));
-
   /* Special case of reversed high/low parts.  */
   if (reg_overlap_mentioned_p (operands[2], destlo)
   && reg_overlap_mentioned_p (operands[1], desthi))
@@ -25366,16 +25365,16 @@ aarch64_split_combinev16qi (rtx operands[3])
 {
   /* Try to avoid unnecessary moves if part of the result
 is in the right place already.  */
-  if (src1 != dest)
+  if (!skiplo)
emit_move_insn (destlo, operands[1]);
-  if (src2 != dest + halfregs)
+  if (!skiphi)
emit_move_insn (desthi, operands[2]);
 }
   else
 {
-  if (src2 != dest + halfregs)
+  if (!skiphi)
emit_move_insn (desthi, operands[2]);
-  if (src1 != dest)
+  if (!skiplo)
emit_move_insn (destlo, operands[1]);
 }
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr115258.c 
b/gcc/testsuite/gcc.target/aarch64/pr115258.c
new file mode 100644
index 000..9a489d4604c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr115258.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fun:
+** (ldr|adrp)  [^\n]+
+** (ldr|adrp)  [^\n]+
+** (ldr|adrp)  [^\

[PATCH 0/13 ver 3] rs6000, built-in cleanup patch series

2024-05-29 Thread Carl Love


GCC maintainers:

The following is an updated patch series to remove duplicate built-ins.  

There are patches to extend an existing overloaded built-in to cover additional 
input types. 

A new patch, 0005-rs6000-Remove-redundant-float-double-type-conversion.patch, 
was added to remove built-ins that were inadvertently missing in the last 
version.  

Patch 12 in the previous series was dropped as the built-in 
__builtin_vsx_xvcmpeqsp is not a duplicate of the overloaded vec_cmpeq 
built-in.  Specifically, the return values are different.  The goal in this 
series is to remove built-ins that are functionally equivalent.  Patch 12 from 
the previous series will be reworked and submitted later.

Some of the patches in the previous series were approved, but everything is 
being reposted for completeness.  The following gives the mapping of the 
patches from the previous version to the current version of the series with 
notes on the patches.

Version 2   Version 3   Notes
patch 1     patch 1     Approved, no changes
patch 2     patch 2     Responded to comments, no changes to the patch
patch 3     patch 3     Updated changelog, no functional changes
patch 4     patch 4     Updated patch
            patch 5     New patch to remove built-ins missed in the series.
patch 5     patch 6     Updated patch
patch 6     patch 7     Updated patch
patch 7     patch 8     Updated patch
patch 8     patch 9     Approved, no changes to this patch
patch 9     patch 10    Approved, no changes to this patch
patch 10    patch 11    Updated, added test file.
patch 11    patch 12    Updated
patch 12                Patch from previous series removed
patch 13    patch 13    Comments said built-ins __builtin_vec_set_v1ti,
                        __builtin_vec_set_v2di, __builtin_vec_set_v2df
                        can also get removed with equivalent gimple codes.
                        This is somewhat more involved than a simple
                        removal of redundant built-ins.  The built-ins
                        will be removed in a separate future patch.

The patch series has been tested on Power 10 LE, Power 9 BE with no regression 
failures.


The patches have all been tested on Power 10 LE.  The last patch was also 
tested on Power 8 BE.

No regression failures were seen.

Please let me know if the patches are acceptable for mainline.  Thanks.

   Carl 




Re: [PATCH 1/13 ver 3] rs6000, Remove __builtin_vsx_cmple* builtins

2024-05-29 Thread Carl Love
This patch was approved in the previous series.  There are no changes to this 
patch.  Reposting for completeness. 

 Carl 
---

rs6000, Remove __builtin_vsx_cmple* builtins

The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
unsigned arguments and return an unsigned result.  The current definitions
take signed arguments and return signed results which is incorrect.

The signed and unsigned versions of __builtin_vsx_cmple* are not
documented in extend.texi.  Also there are no test cases for the
built-ins.

Users can use the existing vec_cmple as PVIPR defines instead of
__builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
__builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
__builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
__builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
__builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.

Hence these built-ins are redundant and are removed by this patch.
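
To show why the signedness of the arguments matters for these comparisons,
here is a minimal scalar sketch in plain C of a single byte lane.  The
function names are hypothetical, and the all-ones/all-zeros lane result is
an assumption of this model, not something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one byte lane of a signed "less than or equal" compare:
   all ones when a <= b, all zeros otherwise.  */
static uint8_t
cmple_lane_signed (int8_t a, int8_t b)
{
  return (a <= b) ? 0xFF : 0x00;
}

/* The same compare with the lane contents read as unsigned values.  */
static uint8_t
cmple_lane_unsigned (uint8_t a, uint8_t b)
{
  return (a <= b) ? 0xFF : 0x00;
}
```

The bit pattern 0xFF is -1 when read as signed and 255 when read as
unsigned, so comparing it against 1 gives opposite answers in the two
models.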

gcc/ChangeLog:
* config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
RS6000_BIF_CMPLE_U1TI): Remove case statements.
* config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
__builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
__builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
__builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
__builtin_vsx_cmple_u8hi): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 13 
 gcc/config/rs6000/rs6000-builtins.def | 30 ---
 2 files changed, 43 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 320affd79e3..ac9f16fe51a 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   fold_compare_helper (gsi, GT_EXPR, stmt);
   return true;
 
-case RS6000_BIF_CMPLE_16QI:
-case RS6000_BIF_CMPLE_U16QI:
-case RS6000_BIF_CMPLE_8HI:
-case RS6000_BIF_CMPLE_U8HI:
-case RS6000_BIF_CMPLE_4SI:
-case RS6000_BIF_CMPLE_U4SI:
-case RS6000_BIF_CMPLE_2DI:
-case RS6000_BIF_CMPLE_U2DI:
-case RS6000_BIF_CMPLE_1TI:
-case RS6000_BIF_CMPLE_U1TI:
-  fold_compare_helper (gsi, LE_EXPR, stmt);
-  return true;
-
 /* flavors of vec_splat_[us]{8,16,32}.  */
 case RS6000_BIF_VSPLTISB:
 case RS6000_BIF_VSPLTISH:
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..7c36976a089 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1337,30 +1337,6 @@
   const vss __builtin_vsx_cmpge_u8hi (vus, vus);
 CMPGE_U8HI vector_nltuv8hi {}
 
-  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
-CMPLE_16QI vector_ngtv16qi {}
-
-  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
-CMPLE_2DI vector_ngtv2di {}
-
-  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
-CMPLE_4SI vector_ngtv4si {}
-
-  const vss __builtin_vsx_cmple_8hi (vss, vss);
-CMPLE_8HI vector_ngtv8hi {}
-
-  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
-CMPLE_U16QI vector_ngtuv16qi {}
-
-  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
-CMPLE_U2DI vector_ngtuv2di {}
-
-  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
-CMPLE_U4SI vector_ngtuv4si {}
-
-  const vss __builtin_vsx_cmple_u8hi (vss, vss);
-CMPLE_U8HI vector_ngtuv8hi {}
-
   const vd __builtin_vsx_concat_2df (double, double);
 CONCAT_2DF vsx_concat_v2df {}
 
@@ -3117,12 +3093,6 @@
   const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
 CMPGE_U1TI vector_nltuv1ti {}
 
-  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
-CMPLE_1TI vector_ngtv1ti {}
-
-  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
-CMPLE_U1TI vector_ngtuv1ti {}
-
   const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
 VCNTMBB vec_cntmb_v16qi {}
 
-- 
2.45.0



Re: [PATCH 2/13 ver 3] rs6000, Remove __builtin_vsx_xvcvspsxws built-in

2024-05-29 Thread Carl Love
I responded to comments about the patch from the previous patch series.  No 
functional changes were made to this patch.

Carl 
-- 

rs6000, Remove __builtin_vsx_xvcvspsxws built-in.

The built-in __builtin_vsx_xvcvspsxws is a duplicate of the vec_signed
built-in that is documented in the PVIPR.  The __builtin_vsx_xvcvspsxws
built-in is not documented and there are no test cases for it.

This patch removes the redundant built-in.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxws):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 7c36976a089..c6d2ea1bc39 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1709,9 +1709,6 @@
   const vsll __builtin_vsx_xvcvspsxds (vf);
 XVCVSPSXDS vsx_xvcvspsxds {}
 
-  const vsi __builtin_vsx_xvcvspsxws (vf);
-XVCVSPSXWS vsx_fix_truncv4sfv4si2 {}
-
   const vsll __builtin_vsx_xvcvspuxds (vf);
 XVCVSPUXDS vsx_xvcvspuxds {}
 
-- 
2.45.0



Re: [PATCH 3/13 ver 3] rs6000, fix error in unsigned vector float to unsigned int built-in definition

2024-05-29 Thread Carl Love
This patch was updated per the feedback comment from the previous version in 
series 2.

 Carl 
---

rs6000, fix error in unsigned vector float to unsigned int built-in definitions

The built-in __builtin_vsx_vunsigned_v2df is supposed to take a vector of
doubles and return a vector of unsigned long long ints.  Similarly
__builtin_vsx_vunsigned_v4sf takes a vector of floats and is supposed to
return a vector of unsigned ints.  The definitions are using the signed
version of the instructions not the unsigned version of the instruction.
The results should also be unsigned.  The builtins are used by the
overloaded vec_unsigned builtin which has an unsigned result.

Similarly the built-ins __builtin_vsx_vunsignede_v2df and
__builtin_vsx_vunsignedo_v2df are supposed to return an unsigned result.
If the floating point argument is negative, the unsigned result is zero.
The built-ins are used in the overloaded built-in vec_unsignede and
vec_unsignedo respectively.

Add test cases for negative floating point arguments for each of the
above built-ins.
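
As a rough scalar model of the behaviour the new tests check, here is a
sketch in plain C.  The function name is hypothetical.  Only the "negative
input gives zero" rule comes from the description above; the NaN and
overflow handling are assumptions of this model.  Note that converting a
negative float to an unsigned type with a plain C cast is undefined, so the
model clamps explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a float to unsigned int conversion that clamps
   instead of wrapping.  */
static uint32_t
float_to_u32_saturating (float x)
{
  if (x != x || x <= 0.0f)   /* NaN (assumed) or non-positive: zero.  */
    return 0;
  if (x >= 4294967296.0f)    /* Too large (assumed): largest value.  */
    return UINT32_MAX;
  return (uint32_t) x;       /* In range: truncate toward zero.  */
}
```

Under this model the inputs {-14.930, -834.49, -3.3, -5.4} all map to 0,
matching the expected vector used in the added test.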

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vunsigned_v2df,
__builtin_vsx_vunsigned_v4sf, __builtin_vsx_vunsignede_v2df,
__builtin_vsx_vunsignedo_v2df): Change the result type to unsigned.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: Add tests for
vec_unsignede and vec_unsignedo with negative arguments.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 .../gcc.target/powerpc/builtins-3-runnable.c  | 30 +--
 2 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index c6d2ea1bc39..bf9a0ae22fc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1580,16 +1580,16 @@
   const vsi __builtin_vsx_vsignedo_v2df (vd);
 VEC_VSIGNEDO_V2DF vsignedo_v2df {}
 
-  const vsll __builtin_vsx_vunsigned_v2df (vd);
-VEC_VUNSIGNED_V2DF vsx_xvcvdpsxds {}
+  const vull __builtin_vsx_vunsigned_v2df (vd);
+VEC_VUNSIGNED_V2DF vsx_xvcvdpuxds {}
 
-  const vsi __builtin_vsx_vunsigned_v4sf (vf);
-VEC_VUNSIGNED_V4SF vsx_xvcvspsxws {}
+  const vui __builtin_vsx_vunsigned_v4sf (vf);
+VEC_VUNSIGNED_V4SF vsx_xvcvspuxws {}
 
-  const vsi __builtin_vsx_vunsignede_v2df (vd);
+  const vui __builtin_vsx_vunsignede_v2df (vd);
 VEC_VUNSIGNEDE_V2DF vunsignede_v2df {}
 
-  const vsi __builtin_vsx_vunsignedo_v2df (vd);
+  const vui __builtin_vsx_vunsignedo_v2df (vd);
 VEC_VUNSIGNEDO_V2DF vunsignedo_v2df {}
 
   const vf __builtin_vsx_xscvdpsp (double);
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
index 0231a1fd086..5dcdfbee791 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-runnable.c
@@ -313,6 +313,14 @@ int main()
test_unsigned_int_result (ALL, vec_uns_int_result,
  vec_uns_int_expected);
 
+   /* Convert single precision float to  unsigned int.  Negative
+  arguments.  */
+   vec_flt0 = (vector float){-14.930, -834.49, -3.3, -5.4};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
+   vec_uns_int_result = vec_unsigned (vec_flt0);
+   test_unsigned_int_result (ALL, vec_uns_int_result,
+ vec_uns_int_expected);
+
/* Convert double precision float to long long unsigned int */
vec_dble0 = (vector double){124.930, 8134.49};
vec_ll_uns_int_expected = (vector long long unsigned int){124, 8134};
@@ -320,10 +328,18 @@ int main()
test_ll_unsigned_int_result (vec_ll_uns_int_result,
 vec_ll_uns_int_expected);
 
+   /* Convert double precision float to long long unsigned int. Negative
+  arguments.  */
+   vec_dble0 = (vector double){-24.93, -134.9};
+   vec_ll_uns_int_expected = (vector long long unsigned int){0, 0};
+   vec_ll_uns_int_result = vec_unsigned (vec_dble0);
+   test_ll_unsigned_int_result (vec_ll_uns_int_result,
+vec_ll_uns_int_expected);
+
/* Convert double precision vector float to vector unsigned int,
-  even words */
-   vec_dble0 = (vector double){3124.930, 8234.49};
-   vec_uns_int_expected = (vector unsigned int){3124, 0, 8234, 0};
+  even words.  Negative arguments */
+   vec_dble0 = (vector double){-124.930, -234.49};
+   vec_uns_int_expected = (vector unsigned int){0, 0, 0, 0};
vec_uns_int_result = vec_unsignede (vec_dble0);
test_unsigned_int_result (EVEN, vec_uns_int_result,
  vec_uns_int_expected);
@@ -335,5 +351,13 @@ int main()
vec_uns_int_resul

Re: [PATCH 4/13 ver 3] rs6000, extend the current vec_{un,}signed{e,o} built-ins

2024-05-29 Thread Carl Love
Updated the patch per the feedback comments from the previous version.

 Carl 
---

rs6000, extend the current vec_{un,}signed{e,o} built-ins

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds
convert a vector of floats to signed/unsigned long long ints.  Extend the
existing vec_{un,}signed{e,o} built-ins to handle the argument
vector of floats to return the even/odd signed/unsigned integers.

The define expands vsignede_v4sf, vsignedo_v4sf, vunsignede_v4sf,
vunsignedo_v4sf are added to support the new vec_{un,}signed{e,o}
built-ins.

The built-ins __builtin_vsx_xvcvspsxds and __builtin_vsx_xvcvspuxds are
now for internal use only. They are not documented and they do not
have testcases.

The built-in __builtin_vsx_xvcvdpsxws is redundant as it is covered by
vec_signed{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxws is redundant as it is covered by
vec_unsigned{e,o}, remove.

The built-in __builtin_vsx_xvcvdpuxds_uns is redundant as it is covered by
vec_unsigned, remove.

The __builtin_vsx_xvcvspuxws is redundant as it is covered by
vec_unsigned, remove.

Add testcases and update documentation.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspsxds_low,
__builtin_vsx_xvcvspuxds_low): New built-in definitions.
(__builtin_vsx_xvcvspuxds): Fix return type.
(XVCVSPSXDS, XVCVSPUXDS): Renamed VEC_VSIGNEDE_V4SF,
VEC_VUNSIGNEDE_V4SF respectively.
(vsx_xvcvspsxds, vsx_xvcvspuxds): Renamed vsignede_v4sf,
vunsignede_v4sf respectively.
(__builtin_vsx_xvcvdpsxws, __builtin_vsx_xvcvdpuxws,
__builtin_vsx_xvcvdpuxds_uns, __builtin_vsx_xvcvspuxws): Removed.
* config/rs6000/rs6000-overload.def (vec_signede, vec_signedo,
vec_unsignede,vec_unsignedo):  Add new overloaded specifications.
* config/rs6000/vsx.md (vsignede_v4sf, vsignedo_v4sf,
vunsignede_v4sf, vunsignedo_v4sf): New define_expands.
* doc/extend.texi (vec_signedo, vec_signede): Add documentation.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/builtins-3-runnable.c: New tests for the added
overloaded built-ins.
---
 gcc/config/rs6000/rs6000-builtins.def | 25 ++
 gcc/config/rs6000/rs6000-overload.def |  8 ++
 gcc/config/rs6000/vsx.md  | 88 +++
 gcc/doc/extend.texi   | 10 +++
 .../gcc.target/powerpc/builtins-3-runnable.c  | 51 +--
 5 files changed, 157 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index bf9a0ae22fc..cea2649b86c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1688,32 +1688,23 @@
   const vsll __builtin_vsx_xvcvdpsxds_scale (vd, const int);
 XVCVDPSXDS_SCALE vsx_xvcvdpsxds_scale {}
 
-  const vsi __builtin_vsx_xvcvdpsxws (vd);
-XVCVDPSXWS vsx_xvcvdpsxws {}
-
-  const vsll __builtin_vsx_xvcvdpuxds (vd);
-XVCVDPUXDS vsx_fixuns_truncv2dfv2di2 {}
-
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vull __builtin_vsx_xvcvdpuxds_uns (vd);
-XVCVDPUXDS_UNS vsx_fixuns_truncv2dfv2di2 {}
-
-  const vsi __builtin_vsx_xvcvdpuxws (vd);
-XVCVDPUXWS vsx_xvcvdpuxws {}
-
   const vd __builtin_vsx_xvcvspdp (vf);
 XVCVSPDP vsx_xvcvspdp {}
 
   const vsll __builtin_vsx_xvcvspsxds (vf);
-XVCVSPSXDS vsx_xvcvspsxds {}
+VEC_VSIGNEDE_V4SF vsignede_v4sf {}
+
+  const vsll __builtin_vsx_xvcvspsxds_low (vf);
+VEC_VSIGNEDO_V4SF vsignedo_v4sf {}
 
-  const vsll __builtin_vsx_xvcvspuxds (vf);
-XVCVSPUXDS vsx_xvcvspuxds {}
+  const vull __builtin_vsx_xvcvspuxds (vf);
+VEC_VUNSIGNEDE_V4SF vunsignede_v4sf {}
 
-  const vsi __builtin_vsx_xvcvspuxws (vf);
-XVCVSPUXWS vsx_fixuns_truncv4sfv4si2 {}
+  const vull __builtin_vsx_xvcvspuxds_low (vf);
+VEC_VUNSIGNEDO_V4SF vunsignedo_v4sf {}
 
   const vd __builtin_vsx_xvcvsxddp (vsll);
 XVCVSXDDP vsx_floatv2div2df2 {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 84bd9ae6554..4d857bb1af3 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3307,10 +3307,14 @@
 [VEC_SIGNEDE, vec_signede, __builtin_vec_vsignede]
   vsi __builtin_vec_vsignede (vd);
 VEC_VSIGNEDE_V2DF
+  vsll __builtin_vec_vsignede (vf);
+VEC_VSIGNEDE_V4SF
 
 [VEC_SIGNEDO, vec_signedo, __builtin_vec_vsignedo]
   vsi __builtin_vec_vsignedo (vd);
 VEC_VSIGNEDO_V2DF
+  vsll __builtin_vec_vsignedo (vf);
+VEC_VSIGNEDO_V4SF
 
 [VEC_SIGNEXTI, vec_signexti, __builtin_vec_signexti]
   vsi __builtin_vec_signexti (vsc);
@@ -4433,10 +4437,14 @@
 [VEC_UNSIGNEDE, vec_unsignede, __builtin_vec_vunsignede]
   vui __builtin_vec_vunsignede (vd);
 VEC_VUNSIGNEDE_V2DF
+  vull __builtin_vec_vunsignede 

Re: [PATCH 5/13 ver 3] rs6000, Remove redundant float/double type conversions

2024-05-29 Thread Carl Love
This is a new patch to remove the built-ins that were inadvertently missing in 
the previous series.

  Carl 
--

rs6000, Remove redundant float/double type conversions

The following built-ins are redundant as they are covered by another
overloaded built-in.

  __builtin_vsx_xvcvspdp covered by vec_double{e,o}
  __builtin_vsx_xvcvdpsp covered by vec_float{e,o}
  __builtin_vsx_xvcvsxwdp covered by vec_double{e,o}
  __builtin_vsx_xvcvuxddp_uns covered by  vec_double

Remove the redundant built-ins. They are not documented nor do they have
test cases.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcvspdp,
__builtin_vsx_xvcvdpsp, __builtin_vsx_xvcvsxwdp,
__builtin_vsx_xvcvuxddp_uns): Remove.
---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 1 file changed, 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index cea2649b86c..6049f3a4599 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1679,9 +1679,6 @@
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}
 
-  const vf __builtin_vsx_xvcvdpsp (vd);
-XVCVDPSP vsx_xvcvdpsp {}
-
   const vsll __builtin_vsx_xvcvdpsxds (vd);
 XVCVDPSXDS vsx_fix_truncv2dfv2di2 {}
 
@@ -1691,9 +1688,6 @@
   const vsll __builtin_vsx_xvcvdpuxds_scale (vd, const int);
 XVCVDPUXDS_SCALE vsx_xvcvdpuxds_scale {}
 
-  const vd __builtin_vsx_xvcvspdp (vf);
-XVCVSPDP vsx_xvcvspdp {}
-
   const vsll __builtin_vsx_xvcvspsxds (vf);
 VEC_VSIGNEDE_V4SF vsignede_v4sf {}
 
@@ -1715,9 +1709,6 @@
   const vf __builtin_vsx_xvcvsxdsp (vsll);
 XVCVSXDSP vsx_xvcvsxdsp {}
 
-  const vd __builtin_vsx_xvcvsxwdp (vsi);
-XVCVSXWDP vsx_xvcvsxwdp {}
-
   const vf __builtin_vsx_xvcvsxwsp (vsi);
 XVCVSXWSP vsx_floatv4siv4sf2 {}
 
@@ -1727,9 +1718,6 @@
   const vd __builtin_vsx_xvcvuxddp_scale (vsll, const int<5>);
 XVCVUXDDP_SCALE vsx_xvcvuxddp_scale {}
 
-  const vd __builtin_vsx_xvcvuxddp_uns (vull);
-XVCVUXDDP_UNS vsx_floatunsv2div2df2 {}
-
   const vf __builtin_vsx_xvcvuxdsp (vull);
 XVCVUXDSP vsx_xvcvuxdsp {}
 
-- 
2.45.0



Re: [PATCH 7/13 ver 3] rs6000, add overloaded vec_sel with int128 arguments

2024-05-29 Thread Carl Love
This was patch 6 in the previous series.  Updated the documentation file per 
the comments.  No functional changes to the patch.

  Carl 


rs6000, add overloaded vec_sel with int128 arguments

Extend the vec_sel built-in to take three signed/unsigned int128 arguments
and return a signed/unsigned int128 result.

Extending the vec_sel built-in makes the existing built-ins
__builtin_vsx_xxsel_1ti and __builtin_vsx_xxsel_1ti_uns obsolete.  The
patch removes these built-ins.

The patch adds documentation and test cases for the new overloaded vec_sel
built-ins.
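
For readers unfamiliar with vec_sel, here is a minimal sketch in plain C of
the bitwise select it performs, applied to a 128-bit value modelled as two
64-bit halves.  The u128_model type and sel_128 name are hypothetical; the
real built-in operates on vector __int128 values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t hi, lo; } u128_model;

/* For each bit, take the bit from B where MASK is 1 and from A where
   MASK is 0.  */
static u128_model
sel_128 (u128_model a, u128_model b, u128_model mask)
{
  u128_model r;
  r.hi = (a.hi & ~mask.hi) | (b.hi & mask.hi);
  r.lo = (a.lo & ~mask.lo) | (b.lo & mask.lo);
  return r;
}
```

Because the select is purely bitwise, the signed and unsigned int128
instances differ only in their types, not in the operation performed.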

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_1ti,
__builtin_vsx_xxsel_1ti_uns): Remove built-in definitions.
* config/rs6000/rs6000-overload.def (vec_sel): Add new overloaded
definitions.
* doc/extend.texi: Add documentation for new vec_sel instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-sel-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-builtins.def |   6 -
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |  12 ++
 .../powerpc/vec-sel-runnable-i128.c   | 129 ++
 4 files changed, 145 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 13e36df008d..ea0da77f13e 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1904,12 +1904,6 @@
   const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
 XXSEL_16QI_UNS vector_select_v16qi_uns {}
 
-  const vsq __builtin_vsx_xxsel_1ti (vsq, vsq, vsq);
-XXSEL_1TI vector_select_v1ti {}
-
-  const vsq __builtin_vsx_xxsel_1ti_uns (vsq, vsq, vsq);
-XXSEL_1TI_UNS vector_select_v1ti_uns {}
-
   const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
 XXSEL_2DF vector_select_v2df {}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 4d857bb1af3..a210c5ad10d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3274,6 +3274,10 @@
 VSEL_2DF  VSEL_2DF_B
   vd __builtin_vec_sel (vd, vd, vull);
 VSEL_2DF  VSEL_2DF_U
+  vsq __builtin_vec_sel (vsq, vsq, vsq);
+VSEL_1TI  VSEL_1TI_S
+  vuq __builtin_vec_sel (vuq, vuq, vuq);
+VSEL_1TI_UNS  VSEL_1TI_U
 ; The following variants are deprecated.
   vsll __builtin_vec_sel (vsll, vsll, vsll);
 VSEL_2DI_B  VSEL_2DI_S
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b88e61641a2..0756230b19e 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -21372,6 +21372,18 @@ Additional built-in functions are available for the 
64-bit PowerPC
 family of processors, for efficient use of 128-bit floating point
 (@code{__float128}) values.
 
+Vector select
+
+@smallexample
+vector signed __int128 vec_sel (vector signed __int128,
+   vector signed __int128, vector signed __int128);
+vector unsigned __int128 vec_sel (vector unsigned __int128,
+   vector unsigned __int128, vector unsigned __int128);
+@end smallexample
+
+The instance is an extension of the exiting overloaded built-in @code{vec_sel}
+that is documented in the PVIPR.
+
 @node Basic PowerPC Built-in Functions Available on ISA 2.06
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 2.06
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
new file mode 100644
index 000..d82225cc847
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-sel-runnable-i128.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+/* { dg-final { scan-assembler-times "xxsel" 2 } } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will run
+ with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;

Re: [PATCH 6/13 ver 3] rs6000, remove duplicated built-ins of vec_mergel and vec_mergeh

2024-05-29 Thread Carl Love
This was patch 5 in the previous series.  It was previously approved.  No 
changes in this version.  Being posted for completeness.

 Carl 


rs6000, remove duplicated built-ins of vec_mergel and
 vec_mergeh

The following undocumented built-ins are the same as existing documented
overloaded builtins.

  const vf __builtin_vsx_xxmrghw (vf, vf);
same as  vf __builtin_vec_mergeh (vf, vf);  (overloaded vec_mergeh)

  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
same as vsi __builtin_vec_mergeh (vsi, vsi);   (overloaded vec_mergeh)

  const vf __builtin_vsx_xxmrglw (vf, vf);
same as vf __builtin_vec_mergel (vf, vf);  (overloaded vec_mergel)

  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
same as vsi __builtin_vec_mergel (vsi, vsi);   (overloaded vec_mergel)

This patch removes the duplicate built-in definitions so only the
documented built-ins will be available for use.  The case statements in
rs6000_gimple_fold_builtin are removed as they are no longer needed.  The
patch removes the now unused define_expands for vsx_xxmrghw_<mode> and
vsx_xxmrglw_<mode>.
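
As a reminder of what the merge operations compute, here is a minimal
scalar sketch in plain C for four-element vectors.  The function names are
hypothetical, and the element numbering follows the left-to-right indices
used in the vec_select patterns rather than any particular endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Interleave the first two elements of A and B: {a0, b0, a1, b1}.  */
static void
mergeh_4 (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  out[0] = a[0];
  out[1] = b[0];
  out[2] = a[1];
  out[3] = b[1];
}

/* Interleave the last two elements of A and B: {a2, b2, a3, b3}.  */
static void
mergel_4 (const int32_t a[4], const int32_t b[4], int32_t out[4])
{
  out[0] = a[2];
  out[1] = b[2];
  out[2] = a[3];
  out[3] = b[3];
}
```

The same model applies to the float and the int variants, since the
operation only moves elements around.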

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxmrghw,
__builtin_vsx_xxmrghw_4si, __builtin_vsx_xxmrglw,
__builtin_vsx_xxmrglw_4si, __builtin_vsx_xxsel_16qi): Remove
built-in definition.
* config/rs6000/rs6000-builtin.cc (rs6000_gimple_fold_builtin):
remove case entries RS6000_BIF_XXMRGLW_4SI,
RS6000_BIF_XXMRGLW_4SF, RS6000_BIF_XXMRGHW_4SI,
RS6000_BIF_XXMRGHW_4SF.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>, vsx_xxmrglw_<mode>):
Remove unused define_expands.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 ---
 gcc/config/rs6000/rs6000-builtins.def | 12 
 gcc/config/rs6000/vsx.md  | 41 ---
 3 files changed, 57 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index ac9f16fe51a..f83d65b06ef 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -2097,20 +2097,16 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 /* vec_mergel (integrals).  */
 case RS6000_BIF_VMRGLH:
 case RS6000_BIF_VMRGLW:
-case RS6000_BIF_XXMRGLW_4SI:
 case RS6000_BIF_VMRGLB:
 case RS6000_BIF_VEC_MERGEL_V2DI:
-case RS6000_BIF_XXMRGLW_4SF:
 case RS6000_BIF_VEC_MERGEL_V2DF:
   fold_mergehl_helper (gsi, stmt, 1);
   return true;
 /* vec_mergeh (integrals).  */
 case RS6000_BIF_VMRGHH:
 case RS6000_BIF_VMRGHW:
-case RS6000_BIF_XXMRGHW_4SI:
 case RS6000_BIF_VMRGHB:
 case RS6000_BIF_VEC_MERGEH_V2DI:
-case RS6000_BIF_XXMRGHW_4SF:
 case RS6000_BIF_VEC_MERGEH_V2DF:
   fold_mergehl_helper (gsi, stmt, 0);
   return true;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 6049f3a4599..13e36df008d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1877,18 +1877,6 @@
   const signed int __builtin_vsx_xvtsqrtsp_fg (vf);
 XVTSQRTSP_FG vsx_tsqrtv4sf2_fg {}
 
-  const vf __builtin_vsx_xxmrghw (vf, vf);
-XXMRGHW_4SF vsx_xxmrghw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrghw_4si (vsi, vsi);
-XXMRGHW_4SI vsx_xxmrghw_v4si {}
-
-  const vf __builtin_vsx_xxmrglw (vf, vf);
-XXMRGLW_4SF vsx_xxmrglw_v4sf {}
-
-  const vsi __builtin_vsx_xxmrglw_4si (vsi, vsi);
-XXMRGLW_4SI vsx_xxmrglw_v4si {}
-
   const vsc __builtin_vsx_xxpermdi_16qi (vsc, vsc, const int<2>);
 XXPERMDI_16QI vsx_xxpermdi_v16qi {}
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index a8f3d459232..4402b8b01d5 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4875,47 +4875,6 @@ (define_insn "vsx_xxspltd_"
 }
   [(set_attr "type" "vecperm")])
 
-;; V4SF/V4SI interleave
-(define_expand "vsx_xxmrghw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-(vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 0) (const_int 4)
-(const_int 1) (const_int 5)])))]
-  "VECTOR_MEM_VSX_P (mode)"
-{
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_
-: gen_altivec_vmrglw_direct_;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  [(set_attr "type" "vecperm")])
-
-(define_expand "vsx_xxmrglw_"
-  [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
-   (vec_select:VSX_W
- (vec_concat:
-   (match_operand:VSX_W 1 "vsx_register_operand" "wa")
-   (match_operand:VSX_W 2 "vsx_register_operand" "wa"))
- (parallel [(const_int 2) (const_int 6)
-(cons

Re: [PATCH 8/13 ver 3] rs6000, remove the vec_xxsel built-ins, they are, duplicates

2024-05-29 Thread Carl Love
This was patch 7 in the previous series.  Patch was updated to address the 
feedback comments.

Carl 


rs6000, remove the vec_xxsel built-ins, they are duplicates

The following undocumented built-ins are covered by the existing overloaded
vec_sel built-in definitions.

  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
same as vsc __builtin_vec_sel (vsc, vsc, vuc);  (overloaded vec_sel)

  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
same as vuc __builtin_vec_sel (vuc, vuc, vuc);  (overloaded vec_sel)

  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
same as  vd __builtin_vec_sel (vd, vd, vull);   (overloaded vec_sel)

  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
same as vsll __builtin_vec_sel (vsll, vsll, vsll);  (overloaded vec_sel)

  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
same as vull __builtin_vec_sel (vull, vull, vsll);  (overloaded vec_sel)

  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
same as vf __builtin_vec_sel (vf, vf, vsi)  (overloaded vec_sel)

  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
same as vsi __builtin_vec_sel (vsi, vsi, vbi);  (overloaded vec_sel)

  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
same as vui __builtin_vec_sel (vui, vui, vui);  (overloaded vec_sel)

  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
same as vss __builtin_vec_sel (vss, vss, vbs);  (overloaded vec_sel)

  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
same as vus __builtin_vec_sel (vus, vus, vus);  (overloaded vec_sel)

This patch removes the duplicate built-in definitions so users will only
use the documented vec_sel built-in.  The __builtin_vsx_xxsel_[4si, 8hi,
16qi, 4sf, 2df] tests are also removed.
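
For readers unfamiliar with the select operation, here is a minimal sketch
of the bitwise semantics that vec_sel provides.  It assumes the GNU C
vector extension instead of the AltiVec "vector" types so that it builds
on any GCC target; the type v4u and the helper select_bits are
hypothetical names used only for illustration and are not part of the
patch.

```c
/* GNU C vector extension: four 32-bit unsigned lanes.  This is an
   assumption made for portability; the patch itself deals with the
   AltiVec/VSX vector types.  */
typedef unsigned int v4u __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for vec_sel (a, b, mask): each result bit is
   taken from B where the corresponding MASK bit is 1, and from A where
   the MASK bit is 0.  */
static v4u
select_bits (v4u a, v4u b, v4u mask)
{
  return (a & ~mask) | (b & mask);
}
```

A mask lane of all zeros returns the lane from the first argument
unchanged, a lane of all ones returns the lane from the second argument,
and mixed masks blend the two bit by bit.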

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_16qi_uns, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel_2di, __builtin_vsx_xxsel_2di_uns,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_4si_uns, __builtin_vsx_xxsel_8hi,
__builtin_vsx_xxsel_8hi_uns): Remove built-in definitions.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_xxsel_4si,
__builtin_vsx_xxsel_8hi, __builtin_vsx_xxsel_16qi,
__builtin_vsx_xxsel_4sf, __builtin_vsx_xxsel_2df,
__builtin_vsx_xxsel): Change built-in call to overloaded built-in
call vec_sel.
---
 gcc/config/rs6000/rs6000-builtins.def | 30 
 .../gcc.target/powerpc/vsx-builtin-3.c| 36 ++-
 2 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index ea0da77f13e..a78c52183bc 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1898,36 +1898,6 @@
   const vss __builtin_vsx_xxpermdi_8hi (vss, vss, const int<2>);
 XXPERMDI_8HI vsx_xxpermdi_v8hi {}
 
-  const vsc __builtin_vsx_xxsel_16qi (vsc, vsc, vsc);
-XXSEL_16QI vector_select_v16qi {}
-
-  const vuc __builtin_vsx_xxsel_16qi_uns (vuc, vuc, vuc);
-XXSEL_16QI_UNS vector_select_v16qi_uns {}
-
-  const vd __builtin_vsx_xxsel_2df (vd, vd, vd);
-XXSEL_2DF vector_select_v2df {}
-
-  const vsll __builtin_vsx_xxsel_2di (vsll, vsll, vsll);
-XXSEL_2DI vector_select_v2di {}
-
-  const vull __builtin_vsx_xxsel_2di_uns (vull, vull, vull);
-XXSEL_2DI_UNS vector_select_v2di_uns {}
-
-  const vf __builtin_vsx_xxsel_4sf (vf, vf, vf);
-XXSEL_4SF vector_select_v4sf {}
-
-  const vsi __builtin_vsx_xxsel_4si (vsi, vsi, vsi);
-XXSEL_4SI vector_select_v4si {}
-
-  const vui __builtin_vsx_xxsel_4si_uns (vui, vui, vui);
-XXSEL_4SI_UNS vector_select_v4si_uns {}
-
-  const vss __builtin_vsx_xxsel_8hi (vss, vss, vss);
-XXSEL_8HI vector_select_v8hi {}
-
-  const vus __builtin_vsx_xxsel_8hi_uns (vus, vus, vus);
-XXSEL_8HI_UNS vector_select_v8hi_uns {}
-
   const vsc __builtin_vsx_xxsldwi_16qi (vsc, vsc, const int<2>);
 XXSLDWI_16QI vsx_xxsldwi_v16qi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index ff875c55304..e20d3f03c86 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -37,6 +37,8 @@
 /* { dg-final { scan-assembler "xvcvsxdsp" } } */
 /* { dg-final { scan-assembler "xvcvuxdsp" } } */
 
+#include 
+
 extern __vector int si[][4];
 extern __vector short ss[][4];
 extern __vector signed char sc[][4];
@@ -61,23 +63,23 @@ int do_sel(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_xxsel_4si (si[i][1], si[i][2], si[i][3]); i++;
-  ss[i][0] = __builtin_vsx_xxsel_8hi (ss[i][1], ss[i][2], ss[i][3]); i++;
-  sc[i][0] = __builtin_vsx_xxsel_16qi (sc[i][1], sc[i][2], sc[i][3]); i++;
-  f[i][0] = __built

Re: [PATCH 10/13 ver 3] rs6000, remove __builtin_vsx_xvnegdp and, __builtin_vsx_xvnegsp built-ins

2024-05-29 Thread Carl Love
 This was patch 9 in the previous series.  It was previously approved.  
Reposting for completeness.

 Carl
-

rs6000, remove __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp built-ins

The undocumented __builtin_vsx_xvnegdp and __builtin_vsx_xvnegsp are
redundant.  The overloaded vec_neg built-in provides the same
functionality.  The two built-ins are not documented nor are there any
test cases for them.

Remove the definitions so users will use the overloaded vec_neg built-in
which is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvnegdp,
__builtin_vsx_xvnegsp): Remove built-in definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 6 --
 1 file changed, 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index f02a8c4de45..64690b9b9b5 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1736,12 +1736,6 @@
   const vf __builtin_vsx_xvnabssp (vf);
 XVNABSSP vsx_nabsv4sf2 {}
 
-  const vd __builtin_vsx_xvnegdp (vd);
-XVNEGDP negv2df2 {}
-
-  const vf __builtin_vsx_xvnegsp (vf);
-XVNEGSP negv4sf2 {}
-
   const vd __builtin_vsx_xvnmadddp (vd, vd, vd);
 XVNMADDDP nfmav2df4 {}
 
-- 
2.45.0



Re: [PATCH 11/13 ver 3] rs6000, extend vec_xxpermdi built-in for __int128 args

2024-05-29 Thread Carl Love
 This was patch 10 from the previous series.  The patch was updated to address 
feedback comments.

Carl 
---

rs6000, extend vec_xxpermdi built-in for __int128 args

Add new signed and unsigned overloaded instances for vec_xxpermdi:

   __int128 vec_xxpermdi (__int128, __int128, const int);
   __uint128 vec_xxpermdi (__uint128, __uint128, const int);

Update the documentation to include a reference to the new built-in
instances.

Add test cases for the new overloaded instances.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (vec_xxpermdi): Add new
overloaded built-in instances.
* doc/extend.texi:  Add documentation for new overloaded built-in
instances.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec_perm-runnable-i128.c: New test file.
---
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vec_perm-runnable-i128.c  | 229 ++
 3 files changed, 235 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index a210c5ad10d..45000f161e4 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4932,6 +4932,10 @@
 XXPERMDI_4SF  XXPERMDI_VF
   vd __builtin_vsx_xxpermdi (vd, vd, const int);
 XXPERMDI_2DF  XXPERMDI_VD
+  vsq __builtin_vsx_xxpermdi (vsq, vsq, const int);
+XXPERMDI_1TI  XXPERMDI_1TI
+  vuq __builtin_vsx_xxpermdi (vuq, vuq, const int);
+XXPERMDI_1TI  XXPERMDI_1TUI
 
 [VEC_XXSLDWI, vec_xxsldwi, __builtin_vsx_xxsldwi]
   vsc __builtin_vsx_xxsldwi (vsc, vsc, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0756230b19e..edfef1bdab7 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -22555,6 +22555,8 @@ void vec_vsx_st (vector bool char, int, signed char *);
 vector double vec_xxpermdi (vector double, vector double, const int);
 vector float vec_xxpermdi (vector float, vector float, const int);
 vector long long vec_xxpermdi (vector long long, vector long long, const int);
+vector __int128 vec_xxpermdi (vector __int128, vector __int128, const int);
+vector __int128 vec_xxpermdi (vector __uint128, vector __uint128, const int);
 vector unsigned long long vec_xxpermdi (vector unsigned long long,
 vector unsigned long long, const int);
 vector int vec_xxpermdi (vector int, vector int, const int);
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
new file mode 100644
index 000..2d5dce09404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -0,0 +1,229 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-options "-save-temps" } */
+
+#include 
+
+#define DEBUG 0
+
+#if DEBUG
+#include 
+void print_i128 (unsigned __int128 val)
+{
+  printf(" 0x%016llx%016llx",
+ (unsigned long long)(val >> 64),
+ (unsigned long long)(val & 0x));
+}
+#endif
+
+extern void abort (void);
+
+union convert_union {
+  vector signed __int128s128;
+  vector unsigned __int128  u128;
+  char  val[16];
+} convert;
+
+int check_u128_result(vector unsigned __int128 vresult_u128,
+ vector unsigned __int128 expected_vresult_u128)
+{
+  /* Use a for loop to check each byte manually so the test case will
+ run with ISA 2.06.
+
+ Return 1 if they match, 0 otherwise.  */
+
+  int i;
+
+  union convert_union result;
+  union convert_union expected;
+
+  result.u128 = vresult_u128;
+  expected.u128 = expected_vresult_u128;
+
+  /* Check if each byte of the result and expected match. */
+  for (i = 0; i < 16; i++)
+{
+  if (result.val[i] != expected.val[i])
+   return 0;
+}
+  return 1;
+}
+
+int check_s128_result(vector signed __int128 vresult_s128,
+ vector signed __int128 expected_vresult_s128)
+{
+  /* Convert the arguments to unsigned, then check equality.  */
+  union convert_union result;
+  union convert_union expected;
+
+  result.s128 = vresult_s128;
+  expected.s128 = expected_vresult_s128;
+
+  return check_u128_result (result.u128, expected.u128);
+}
+
+
+int
+main (int argc, char *argv [])
+{
+  int i;
+  
+  vector signed __int128 src_va_s128;
+  vector signed __int128 src_vb_s128;
+  vector signed __int128 vresult_s128;
+  vector signed __int128 expected_vresult_s128;
+
+  vector unsigned __int128 src_va_u128;
+  vector unsigned __int128 src_vb_u128;
+  vector unsigned __int128 src_vc_u128;
+  vector unsigned __int128 vresult_u128;
+  vector unsigned __int128 expected_vresult_u128;
+
+  src_va_s128 = (vector signed __int128) {0x123456789ABCDEF0};
+  src_va_s128 = src_va_s128 << 64; 
+  src_va_s128

Re: [PATCH 9/13 ver 3] rs6000, remove __builtin_vsx_vperm_* built-ins

2024-05-29 Thread Carl Love
This was patch 8 in the previous series.  Updated patch per the feedback 
comments.

Carl 


rs6000, remove __builtin_vsx_vperm_* built-ins

The undocumented built-ins:
  __builtin_vsx_vperm_16qi_uns,
  __builtin_vsx_vperm_1ti,
  __builtin_vsx_vperm_1ti_uns,
  __builtin_vsx_vperm_2df,
  __builtin_vsx_vperm_2di,
  __builtin_vsx_vperm_2di_uns,
  __builtin_vsx_vperm_4sf,
  __builtin_vsx_vperm_4si,
  __builtin_vsx_vperm_4si_uns

are duplicates of the __builtin_altivec_* builtins that are used by
the overloaded vec_perm built-in that is documented in the PVIPR.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns): Remove
built-in definitions and comments.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c (__builtin_vsx_vperm_16qi_uns,
__builtin_vsx_vperm_1ti, __builtin_vsx_vperm_1ti_uns,
__builtin_vsx_vperm_2df, __builtin_vsx_vperm_2di,
__builtin_vsx_vperm_2di_uns, __builtin_vsx_vperm_4sf,
__builtin_vsx_vperm_4si, __builtin_vsx_vperm_4si_uns,
__builtin_vsx_vperm): Change call to built-in to the overloaded
built-in vec_perm.
---
 gcc/config/rs6000/rs6000-builtins.def | 33 ---
 .../gcc.target/powerpc/vsx-builtin-3.c| 22 ++---
 2 files changed, 11 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index a78c52183bc..f02a8c4de45 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1529,39 +1529,6 @@
   const vf __builtin_vsx_uns_floato_v2di (vsll);
 UNS_FLOATO_V2DI unsfloatov2di {}
 
-; These are duplicates of __builtin_altivec_* counterparts, and are being
-; kept for backwards compatibility.  The reason for their existence is
-; unclear.  TODO: Consider deprecation/removal at some point.
-  const vsc __builtin_vsx_vperm_16qi (vsc, vsc, vuc);
-VPERM_16QI_X altivec_vperm_v16qi {}
-
-  const vuc __builtin_vsx_vperm_16qi_uns (vuc, vuc, vuc);
-VPERM_16QI_UNS_X altivec_vperm_v16qi_uns {}
-
-  const vsq __builtin_vsx_vperm_1ti (vsq, vsq, vsc);
-VPERM_1TI_X altivec_vperm_v1ti {}
-
-  const vsq __builtin_vsx_vperm_1ti_uns (vsq, vsq, vsc);
-VPERM_1TI_UNS_X altivec_vperm_v1ti_uns {}
-
-  const vd __builtin_vsx_vperm_2df (vd, vd, vuc);
-VPERM_2DF_X altivec_vperm_v2df {}
-
-  const vsll __builtin_vsx_vperm_2di (vsll, vsll, vuc);
-VPERM_2DI_X altivec_vperm_v2di {}
-
-  const vull __builtin_vsx_vperm_2di_uns (vull, vull, vuc);
-VPERM_2DI_UNS_X altivec_vperm_v2di_uns {}
-
-  const vf __builtin_vsx_vperm_4sf (vf, vf, vuc);
-VPERM_4SF_X altivec_vperm_v4sf {}
-
-  const vsi __builtin_vsx_vperm_4si (vsi, vsi, vuc);
-VPERM_4SI_X altivec_vperm_v4si {}
-
-  const vui __builtin_vsx_vperm_4si_uns (vui, vui, vuc);
-VPERM_4SI_UNS_X altivec_vperm_v4si_uns {}
-
   const vss __builtin_vsx_vperm_8hi (vss, vss, vuc);
 VPERM_8HI_X altivec_vperm_v8hi {}
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index e20d3f03c86..f06d871b6b1 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -88,17 +88,17 @@ int do_perm(void)
 {
   int i = 0;
 
-  si[i][0] = __builtin_vsx_vperm_4si (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm_8hi (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm_16qi (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm_4sf (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm_2df (d[i][1], d[i][2], uc[i][3]); i++;
-
-  si[i][0] = __builtin_vsx_vperm (si[i][1], si[i][2], uc[i][3]); i++;
-  ss[i][0] = __builtin_vsx_vperm (ss[i][1], ss[i][2], uc[i][3]); i++;
-  sc[i][0] = __builtin_vsx_vperm (sc[i][1], sc[i][2], uc[i][3]); i++;
-  f[i][0] = __builtin_vsx_vperm (f[i][1], f[i][2], uc[i][3]); i++;
-  d[i][0] = __builtin_vsx_vperm (d[i][1], d[i][2], uc[i][3]); i++;
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
+
+  si[i][0] = vec_perm (si[i][1], si[i][2], uc[i][3]); i++;
+  ss[i][0] = vec_perm (ss[i][1], ss[i][2], uc[i][3]); i++;
+  sc[i][0] = vec_perm (sc[i][1], sc[i][2], uc[i][3]); i++;
+  f[i][0] = vec_perm (f[i][1], f[i][2], uc[i][3]); i++;
+  d[i][0] = vec_perm (d[i][1], d[i][2], uc[i][3]); i++;
 
   return i;
 }
-- 
2.45.0



Re: [PATCH 12/13 ver 3] rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

2024-05-29 Thread Carl Love
This was patch 11 from the previous series.  Patch was updated to address 
feedback comments.

   Carl 
--

rs6000, remove __builtin_vsx_xvcmpeqsp_p built-in

The built-in __builtin_vsx_xvcmpeqsp_p is a duplicate of the overloaded
__builtin_altivec_vcmpeqfp_p built-in.  The built-in is undocumented and
there are no test cases for it.  The patch removes built-in
__builtin_vsx_xvcmpeqsp_p.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vsx_xvcmpeqsp_p):
Remove built-in definition.
---
 gcc/config/rs6000/rs6000-builtins.def | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 64690b9b9b5..48ebc018a8d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1619,9 +1619,6 @@
   const vf __builtin_vsx_xvcmpeqsp (vf, vf);
 XVCMPEQSP vector_eqv4sf {}
 
-  const signed int __builtin_vsx_xvcmpeqsp_p (signed int, vf, vf);
-XVCMPEQSP_P vector_eq_v4sf_p {pred}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}
 
-- 
2.45.0



Re: [PATCH 13/13 ver 3] rs6000, remove vector set and vector init built-ins.

2024-05-29 Thread Carl Love
This was patch 13 from the previous series.  Note the previous series patch 12
was dropped.  This patch is the same as the previous version.  The additional
work to remove __builtin_vec_set_v1ti, __builtin_vec_set_v2di, and
__builtin_vec_set_v2df and replace them with equivalent gimple code, per the
feedback comments, is being deferred to a future patch.  The goal of this
series was simply to remove duplicated built-ins, extending overloaded
built-ins as needed.  Adding the needed gimple code to remove the additional
built-ins is beyond the goal of this patch series.

 Carl 
---

rs6000, remove vector set and vector init built-ins.

The vector init built-ins:

  __builtin_vec_init_v16qi, __builtin_vec_init_v8hi,
  __builtin_vec_init_v4si, __builtin_vec_init_v4sf,
  __builtin_vec_init_v2di, __builtin_vec_init_v2df,
  __builtin_vec_init_v1ti

perform the same operation as initializing the vector in C code.  For
example:

  result_v4si = __builtin_vec_init_v4si (1, 2, 3, 4);
  result_v4si = {1, 2, 3, 4};

These two constructs were tested and verified to generate identical
assembly instructions both with no optimization and with -O3 optimization.

The vector set built-ins:

  __builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
  __builtin_vec_set_v4si, __builtin_vec_set_v4sf

perform the same operation as setting a specific element in the vector in
C code.  For example:

  src_v4si = __builtin_vec_set_v4si (src_v4si, int_val, index);
  src_v4si[index] = int_val;

With no optimization the built-in actually generates more instructions
than the inline C code; with -O3 the generated code is identical.
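
As a rough illustration of the plain C replacements described above, the
sketch below wraps the two constructs in small helpers.  It assumes the
GNU C vector extension type instead of the PowerPC "vector signed int" so
that it builds on any GCC target; the names v4si_t, init_v4si and set_v4si
are hypothetical and do not appear in the patch.

```c
/* GNU C vector extension type with four 32-bit signed lanes.  This is
   an assumption for portability; on PowerPC the equivalent type is
   "vector signed int".  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Plain C replacement for __builtin_vec_init_v4si (a, b, c, d):
   a brace initializer builds the vector directly.  */
static v4si_t
init_v4si (int a, int b, int c, int d)
{
  v4si_t v = { a, b, c, d };
  return v;
}

/* Plain C replacement for __builtin_vec_set_v4si (v, val, index):
   subscripting the vector assigns a single element.  */
static v4si_t
set_v4si (v4si_t v, int val, int index)
{
  v[index] = val;
  return v;
}
```

In real code the helpers are unnecessary; the initializer and the
subscript assignment can be written inline exactly as in the examples
in the commit message.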

All of the above built-ins that are removed do not have test cases and
are not documented.

Built-ins __builtin_vec_set_v1ti, __builtin_vec_set_v2di, and
__builtin_vec_set_v2df are not removed as they are used in function
resolve_vec_insert() in file rs6000-c.cc.

The built-ins are removed as they don't provide any benefit over just
using C code.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_vec_init_v16qi,
__builtin_vec_init_v8hi, __builtin_vec_init_v4si,
__builtin_vec_init_v4sf, __builtin_vec_init_v2di,
__builtin_vec_init_v2df, __builtin_vec_init_v1ti,
__builtin_vec_set_v16qi, __builtin_vec_set_v8hi,
__builtin_vec_set_v4si, __builtin_vec_set_v4sf): Remove built-in
definitions.
---
 gcc/config/rs6000/rs6000-builtins.def | 42 ++-
 1 file changed, 2 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 48ebc018a8d..8349d45169f 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1118,37 +1118,6 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
 VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char, signed char, signed char, \
-signed char, signed char, signed char);
-VEC_INIT_V16QI nothing {init}
-
-  const vf __builtin_vec_init_v4sf (float, float, float, float);
-VEC_INIT_V4SF nothing {init}
-
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
- signed int);
-VEC_INIT_V4SI nothing {init}
-
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
- signed short, signed short, signed short, signed short, \
- signed short);
-VEC_INIT_V8HI nothing {init}
-
-  const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
-VEC_SET_V16QI nothing {set}
-
-  const vf __builtin_vec_set_v4sf (vf, float, const int<2>);
-VEC_SET_V4SF nothing {set}
-
-  const vsi __builtin_vec_set_v4si (vsi, signed int, const int<2>);
-VEC_SET_V4SI nothing {set}
-
-  const vss __builtin_vec_set_v8hi (vss, signed short, const int<3>);
-VEC_SET_V8HI nothing {set}
-
-
 ; Cell builtins.
 [cell]
   pure vsc __builtin_altivec_lvlx (signed long, const void *);
@@ -1295,15 +1264,8 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}
 
-  const vsq __builtin_vec_init_v1ti (signed __int128);
-VEC_INIT_V1TI nothing {init}
-
-  const vd __builtin_vec_init_v2df (double, double);
-VEC_INIT_V2DF nothing {init}
-
-  const vsll __builtin_vec_init_v2di (signed long long, signed long long);
-VEC_INIT_V2DI nothing {init}
-
+;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
+;; resolve_vec_insert(), rs6000-c.cc
   const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
 VEC_SET_V1TI nothing {set}
 
-- 
2.45.0



Re: [PATCH v2] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2024-05-29 Thread Alexandre Oliva
On May 27, 2024, "Kewen.Lin"  wrote:

> OK with these nits tweaked and re-tested well, thanks!

Thanks, here's what I've retested on ppc64le-linux-gnu, and will push
onto trunk eventually, after retesting also on ppc- and ppc64-vx7r2:


[testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

Codegen changes caused add instruction count mismatches on
ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
expected counts were adjusted for lp64, but ilp32 differences
remained, and published test results confirm it.


for  gcc/testsuite/ChangeLog

PR testsuite/101169
* gcc.target/powerpc/fold-vec-extract-double.p7.c: Adjust addi
counts for ilp32.
* gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
---
 .../powerpc/fold-vec-extract-double.p7.c   |5 ++---
 .../gcc.target/powerpc/fold-vec-extract-float.p7.c |5 ++---
 .../gcc.target/powerpc/fold-vec-extract-float.p8.c |3 +--
 .../gcc.target/powerpc/fold-vec-extract-int.p7.c   |3 +--
 .../gcc.target/powerpc/fold-vec-extract-int.p8.c   |3 +--
 .../gcc.target/powerpc/fold-vec-extract-short.p7.c |3 +--
 .../gcc.target/powerpc/fold-vec-extract-short.p8.c |3 +--
 7 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
index 3cae644b90b71..e69d9253e2d28 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
@@ -13,12 +13,11 @@
 /* { dg-final { scan-assembler-times {\mxxpermdi\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
 /* -m32 target has an 'add' in place of one of the 'addi'. */
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } } 
*/
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } } 
*/
+/* { dg-final { scan-assembler-times {\maddi?\M} 2 } } */
 /* -m32 target has a rlwinm in place of a rldic .  */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mlfdx\M|\mlfd\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlfdx?\M} 1 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
index f7c06e9610914..ab03cd8adb00e 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
@@ -12,13 +12,12 @@
 /* { dg-final { scan-assembler-times {\mxscvspdp\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
 /* -m32 as an add in place of an addi. */
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } } 
*/
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } } 
*/
+/* { dg-final { scan-assembler-times {\maddi?\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 1 } } */
 /* -m32 uses rlwinm in place of rldic */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
 /* -m32 has lfs in place of lfsx */
-/* { dg-final { scan-assembler-times {\mlfsx\M|\mlfs\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlfsx?\M} 1 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
index 6819d271c539d..ce435d82c1645 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
@@ -24,9 +24,8 @@
 /* { dg-final { scan-assembler-times {\mli\M} 1 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mrlwinm\M} 1 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\madd\M} 1 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlfs\M} 1 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\maddi?\M} 2 { target ilp32 } } } */
 
 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c
index 5163692695339..20e3d25348952 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c
@@ -10,8 +10,7 @@
 // P7 variables:  li, addi, stxvw4x, lwa/l

Re: [PATCH v2 2/2] Prevent divide-by-zero

2024-05-29 Thread Patrick O'Neill


On 5/29/24 00:20, Richard Biener wrote:

On Wed, May 29, 2024 at 1:39 AM Patrick O'Neill  wrote:

From: Greg McGary

gcc/ChangeLog:
 * gcc/tree-vect-stmts.cc (gcc/tree-vect-stmts.cc): Prevent 
divide-by-zero.
 * testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove xfail.
---
  gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
  gcc/tree-vect-stmts.cc  | 3 ++-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
index fd996a27501..79d03612a22 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
@@ -1,6 +1,5 @@
  /* { dg-do compile } */
  /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
-mno-autovec-segment" } */
-/* { xfail *-*-* } */

  enum e { c, d };
  enum g { f };
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..34f5736ba00 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
  - (vec_num * j + i) * nunits);
 /* remain should now be > 0 and < nunits.  */

^^^


 unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)

So this shouldn't happen.  Do you have a testcase where this triggers?
If < nunits doesn't hold things will also go wrong.


This ICE appears after patch 1 of the series is applied with the testcase:
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c

Executing on host: 
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
  
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
  -march=rv64gc_zba_zbb_zbc_zbs -mabi=lp64d -mcmodel=medlow   
-fdiagnostics-plain-output  -O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d 
-mrvv-vector-bits=scalable -O3 -mno-autovec-segment -S   -o no-segment.s    
(timeout = 600)
spawn -ignore SIGHUP 
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/xgcc
 
-B/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/build/build-gcc-linux-stage2/gcc/
 
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 -march=rv64gc_zba_zbb_zbc_zbs -mabi=lp64d -mcmodel=medlow 
-fdiagnostics-plain-output -O3 -ftree-vectorize -march=rv64gcv -mabi=lp64d 
-mrvv-vector-bits=scalable -O3 -mno-autovec-segment -S -o no-segment.s
during GIMPLE pass: vect
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c:
 In function 'ClutImageChannel':
/home/runner/work/gcc-precommit-ci/gcc-precommit-ci/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c:45:1:
 internal compiler error: Floating point exception
0x13e06b3 crash_signal
    ../../../gcc/gcc/toplev.cc:319
0x2840977 poly_int<2u, poly_result::result_kind>::type> operator-<2u, unsigned long, unsigned long>(poly_int<2u, unsigned long> 
const&, poly_int<2u, unsigned long> const&)
    ../../../gcc/gcc/poly-int.h:871
0x2840977 vectorizable_load
    ../../../gcc/gcc/tree-vect-stmts.cc:11558
0x284da2d vect_transform_stmt(vec_info*, _stmt_vec_info*, 
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
    ../../../gcc/gcc/tree-vect-stmts.cc:13416
0x16a00ce vect_transform_loop_stmt
    ../../../gcc/gcc/tree-vect-loop.cc:11618
0x16ca4f2 vect_transform_loop(_loop_vec_info*, gimple*)
    ../../../gcc/gcc/tree-vect-loop.cc:12144
0x1712a8d vect_transform_loops
    ../../../gcc/gcc/tree-vectorizer.cc:1006
0x17131c3 try_vectorize_loop_1
    ../../../gcc/gcc/tree-vectorizer.cc:1152
0x17131c3 try_vectorize_loop
    ../../../gcc/gcc/tree-vectorizer.cc:1182
0x17137fc execute
    ../../../gcc/gcc/tree-vectorizer.cc:1298
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
compiler exited with status 1

Greg created this patch as a fix for that ICE so I'm guessing it was 
root-caused to be a nunits == 0 divide-by-zero.
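
To illustrate the kind of guard under discussion, here is a minimal scalar sketch in C. It is only an analogy: GCC's constant_multiple_p and known_gt operate on poly_int values, whereas the helpers below (scalar_multiple_p and guarded_multiple_p, both hypothetical names) use plain 64-bit integers. It assumes the divide-by-zero guess above is right and shows why the divisor is tested before the multiple check runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a "is A a multiple of B" query.
   Stores A / B in *NUM and returns true when B divides A exactly.
   The modulo below is undefined behaviour when B is zero.  */
static bool
scalar_multiple_p (uint64_t a, uint64_t b, uint64_t *num)
{
  assert (num != NULL);
  if (a % b != 0)
    return false;
  *num = a / b;
  return true;
}

/* Guarded form with the same shape as the patched condition: the
   divisor is tested first, so the division is never reached for zero.  */
static bool
guarded_multiple_p (uint64_t nunits, uint64_t remain, uint64_t *num)
{
  return remain > 0 && scalar_multiple_p (nunits, remain, num);
}
```

With this shape, a call such as guarded_multiple_p (8, 0, &num) returns false instead of trapping, which is the behaviour the extra known_gt test is meant to provide.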


@Greg McGary is the author of this patch and would know best.

Patrick



Richard.



+   && constant_multiple_p (nunits, remain, &num))
   {
 tree ptype;
 new_vtype
--
2.43.2


[PATCH v2 00/12] OpenMP: Metadirective support + "declare variant" improvements

2024-05-29 Thread Sandra Loosemore
This is an updated version of the patch series I posted a few weeks
ago:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650725.html

I won't duplicate the full list of things implemented/fixed here from
the original patch mail.  The incremental changes since then include:

* I rebased the entire patch series against mainline head, as the
  previous set wouldn't apply cleanly any more.

* I have fixed the previously-noted test regression in
  declare-variant-1.f90.  (The fix is the new patch hunks to
  tree-nested.cc that I have folded into part 2.)

* While working on that, I also made some tweaks to the raw-format
  pretty print support, also incorporated into part 2.

* After I posted the previous patch set, the Linaro CI testbot
  reported a failure in c-c++-common/gomp/declare-variant-13.c on
  aarch64.  That should be fixed now too (incorporated into part 9).

So, other than rebasing, the only substantive changes from the last version
are in parts 2 and 9.

I'm still reserving the previously-noted problems related to PR113904
for some future follow-up work.  This is already a large patch set and
it would be helpful to get it reviewed before layering on more changes
to "declare variant" that touch the same code.

-Sandra

Sandra Loosemore (12):
  OpenMP: metadirective tree data structures and front-end interfaces
  OpenMP: middle-end support for metadirectives
  libgomp: runtime support for target_device selector
  OpenMP: C front end support for metadirectives
  OpenMP: C++ front-end support for metadirectives
  OpenMP: common c/c++ testcases for metadirectives
  OpenMP: Fortran front-end support for metadirectives.
  OpenMP: Reject other properties with kind(any)
  OpenMP: Extend dynamic selector support to declare variant
  OpenMP: Remove dead code from declare variant reimplementation
  OpenMP: Update "declare target"/OpenMP context interaction
  OpenMP: Update documentation of metadirective implementation status.

 gcc/Makefile.in   |2 +-
 gcc/builtin-types.def |2 +
 gcc/c-family/c-attribs.cc |2 -
 gcc/c-family/c-common.h   |4 +-
 gcc/c-family/c-gimplify.cc|   27 +
 gcc/c-family/c-omp.cc |   60 +-
 gcc/c-family/c-pragma.cc  |1 +
 gcc/c-family/c-pragma.h   |1 +
 gcc/c/c-decl.cc   |8 +-
 gcc/c/c-parser.cc |  473 +++-
 gcc/cgraph.cc |2 -
 gcc/cgraph.h  |   12 +-
 gcc/cgraphclones.cc   |2 +-
 gcc/cp/cp-tree.h  |2 +
 gcc/cp/decl.cc|2 +-
 gcc/cp/decl2.cc   |9 +-
 gcc/cp/parser.cc  |  526 -
 gcc/cp/parser.h   |7 +
 gcc/cp/pt.cc  |  120 +
 gcc/cp/semantics.cc   |3 +-
 gcc/doc/generic.texi  |   32 +
 gcc/doc/gimple.texi   |6 +
 gcc/fortran/decl.cc   |   29 +
 gcc/fortran/dump-parse-tree.cc|   21 +
 gcc/fortran/gfortran.h|   20 +-
 gcc/fortran/io.cc |2 +-
 gcc/fortran/match.h   |2 +
 gcc/fortran/openmp.cc |  294 ++-
 gcc/fortran/parse.cc  |  571 +++--
 gcc/fortran/parse.h   |8 +-
 gcc/fortran/resolve.cc|6 +
 gcc/fortran/st.cc |4 +
 gcc/fortran/symbol.cc |   25 +-
 gcc/fortran/trans-decl.cc |5 +-
 gcc/fortran/trans-openmp.cc   |  238 +-
 gcc/fortran/trans-stmt.h  |1 +
 gcc/fortran/trans.cc  |1 +
 gcc/fortran/types.def |2 +
 gcc/gimple-low.cc |   36 +
 gcc/gimple-pretty-print.cc|   78 +
 gcc/gimple-streamer-in.cc |   13 +
 gcc/gimple-streamer-out.cc|   10 +
 gcc/gimple-walk.cc|   28 +
 gcc/gimple.cc |   36 +
 gcc/gimple.def|8 +
 gcc/gimple.h  |  122 +-
 gcc/gimplify.cc   |  574 +++--
 gcc/gimplify.h|2 +-
 gcc/gsstruct.def  |2 +
 gcc/ipa-free-lang-data.cc |2 +-
 gcc/ipa.cc|3 -
 gcc/lto-cgraph.cc |   12 +-
 gcc/lto-streamer-out.cc   |3 +-
 gcc/lto-streamer.h 

[PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-05-29 Thread Sandra Loosemore
This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.

This patch also adds compile-time support for dynamic context
selectors (the target_device selector set and the condition selector
of the user selector set) for metadirectives only.  The "declare
variant" directive still supports only static selectors.
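
As a rough illustration of a dynamic selector, the sketch below uses a run-time condition in the user selector set of a metadirective. The function name sum_array and the threshold of 1000 are assumptions made for the example and are not taken from the patch or its testcases. When compiled without OpenMP support the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Sum an array.  The metadirective selects a parallel loop only when
   the run-time condition holds; otherwise no directive is applied.
   The threshold is an arbitrary value chosen for illustration.  */
long
sum_array (const int *a, size_t n)
{
  long total = 0;
#pragma omp metadirective \
    when (user={condition(n > 1000)}: parallel for reduction(+:total)) \
    default (nothing)
  for (size_t i = 0; i < n; i++)
    total += a[i];
  return total;
}
```

Because n is not a compile-time constant, the condition cannot be resolved while parsing, which is the case the dynamic selector support is there to handle.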

gcc/ChangeLog
* Makefile.in (GTFILES): Move omp-general.h earlier in the list.
* builtin-types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR):
New.
* doc/generic.texi (OpenMP): Document OMP_METADIRECTIVE and
context selector interfaces.
* omp-builtins.def (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE): New.
* omp-general.cc (omp_check_context_selector): Add metadirective_p
parameter, use it to conditionalize target_device support.
(make_omp_metadirective_variant): New.
(omp_context_selector_matches): Add metadirective_p and delay_p
parameters, use them to control things that can only be matched
late.  Handle OMP_TRAIT_SET_TARGET_DEVICE.
(score_wide_int): Move definition to omp-general.h.
(omp_encode_kind_arch_isa_props): New.
(omp_dynamic_cond): New.
(omp_context_compute_score): Handle OMP_TRAIT_SET_TARGET_DEVICE.
(omp_resolve_late_declare_variant, omp_resolve_declare_variant):
Adjust calls to omp_context_selector_matches.
(sort_variant): New.
(omp_get_dynamic_candidates): New.
(omp_early_resolve_metadirective): New.
* omp-general.h (score_wide_int): Moved here from omp-general.cc.
(struct omp_variant): New.
(OMP_METADIRECTIVE_VARIANT_SELECTOR): New.
(OMP_METADIRECTIVE_VARIANT_DIRECTIVE): New.
(OMP_METADIRECTIVE_VARIANT_BODY): New.
(make_omp_metadirective_variant): Declare.
(omp_check_context_selector): Adjust to match definition.
(omp_context_selector_matches): Likewise.
(omp_early_resolve_metadirective): New.
* tree-pretty-print.cc (dump_omp_context_selector): Remove
static qualifier.
(dump_generic_node): Handle OMP_METADIRECTIVE.
* tree-pretty-print.h (dump_omp_context_selector): Declare.
* tree.def (OMP_METADIRECTIVE): New.
* tree.h (OMP_METADIRECTIVE_VARIANTS): New.

gcc/c/ChangeLog
* c-parser.cc (c_finish_omp_declare_variant): Update calls to
omp_check_context_selector and omp_context_selector_matches.

gcc/cp/ChangeLog
* decl.cc (omp_declare_variant_finalize_one):  Update call to
omp_context_selector_matches to pass additional arguments.
* parser.cc (cp_finish_omp_declare_variant): Likewise for
omp_check_context_selector.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant):  Update calls to
omp_check_context_selector and omp_context_selector_matches.
* types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR): New.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
---
 gcc/Makefile.in |   2 +-
 gcc/builtin-types.def   |   2 +
 gcc/c/c-parser.cc   |   4 +-
 gcc/cp/decl.cc  |   2 +-
 gcc/cp/parser.cc|   2 +-
 gcc/doc/generic.texi|  32 
 gcc/fortran/trans-openmp.cc |   4 +-
 gcc/fortran/types.def   |   2 +
 gcc/omp-builtins.def|   3 +
 gcc/omp-general.cc  | 357 ++--
 gcc/omp-general.h   |  31 +++-
 gcc/tree-pretty-print.cc|  36 +++-
 gcc/tree-pretty-print.h |   2 +
 gcc/tree.def|   6 +
 gcc/tree.h  |   3 +
 15 files changed, 461 insertions(+), 27 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a7f15694c34..d08889a3cec 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2869,6 +2869,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/tree-ssa-operands.h \
   $(srcdir)/tree-profile.cc $(srcdir)/tree-nested.cc \
   $(srcdir)/omp-offload.h \
+  $(srcdir)/omp-general.h \
   $(srcdir)/omp-general.cc \
   $(srcdir)/omp-low.cc \
   $(srcdir)/targhooks.cc $(out_file) $(srcdir)/passes.cc \
@@ -2895,7 +2896,6 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/ipa-strub.cc \
   $(srcdir)/internal-fn.h \
   $(srcdir)/calls.cc \
-  $(srcdir)/omp-general.h \
   $(srcdir)/analyzer/analyzer-language.cc \
   @all_gtfiles@
 
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..605a38ab84d 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -878,6 +878,8 @@ DEF_FUNCTION_TYPE_4 (BT_FN_VOID_UINT_PTR_INT_PTR, BT_VOID, 
BT_INT, BT_PTR,
 BT_INT, BT_PTR)
 DEF_FUNCTION_TYPE_4 (BT_FN_BOOL_UINT_UINT_UINT_BOOL,
 BT_BOO

[PATCH v2 04/12] OpenMP: C front end support for metadirectives

2024-05-29 Thread Sandra Loosemore
This patch adds support to the C front end to parse OpenMP metadirective
constructs.  It includes support for early parse-time resolution
of metadirectives (when possible) that will also be used by the C++ front
end.

Additional common C/C++ testcases are in a later patch in the series.

gcc/c-family/ChangeLog
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_metadirective): Declare.
* c-gimplify.cc: Include omp-general.h.
(genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Call it.
* c-omp.cc (c_omp_directives): Add "metadirective" and fix
commented-out stubs for the begin/end form.
(c_omp_expand_metadirective_r): New.
(c_omp_expand_metadirective): New.
* c-pragma.cc (omp_pragmas): Add "metadirective".
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.

gcc/c/ChangeLog
* c-parser.cc (struct c_parser): Add new fields for metadirectives.
(c_parser_skip_to_end_of_block_or_statement):  Add metadirective_p
parameter; use it to control brace and parentheses behavior.
(mangle_metadirective_region_label): New.
(c_parser_label, c_parser_statement_after_labels): Use it.
(c_parser_pragma): Handle metadirective.
(c_parser_omp_context_selector): Add metadirective_p flag, use it
to gate support for non-constant user condition.
(c_parser_omp_context_selector_specification): Add metadirective_p
argument.
(c_parser_finish_omp_declare_variant): Adjust call to above.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.

gcc/testsuite/ChangeLog
* gcc.dg/gomp/metadirective-1.c: New.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
---
 gcc/c-family/c-common.h |   4 +-
 gcc/c-family/c-gimplify.cc  |  27 ++
 gcc/c-family/c-omp.cc   |  60 ++-
 gcc/c-family/c-pragma.cc|   1 +
 gcc/c-family/c-pragma.h |   1 +
 gcc/c/c-parser.cc   | 489 +++-
 gcc/testsuite/gcc.dg/gomp/metadirective-1.c |  15 +
 7 files changed, 577 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/metadirective-1.c

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 2d5f5399885..03f62571531 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1391,7 +1391,8 @@ enum c_omp_directive_kind {
   C_OMP_DIR_CONSTRUCT,
   C_OMP_DIR_DECLARATIVE,
   C_OMP_DIR_UTILITY,
-  C_OMP_DIR_INFORMATIONAL
+  C_OMP_DIR_INFORMATIONAL,
+  C_OMP_DIR_META
 };
 
 struct c_omp_directive {
@@ -1405,6 +1406,7 @@ extern const struct c_omp_directive c_omp_directives[];
 extern const struct c_omp_directive *c_omp_categorize_directive (const char *,
 const char *,
 const char *);
+extern tree c_omp_expand_metadirective (vec &);
 
 /* Return next tree in the chain for chain_next walking of tree nodes.  */
 inline tree
diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index 494da49791d..c53aca60bcf 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "context.h"
 #include "tree-pass.h"
 #include "internal-fn.h"
+#include "omp-general.h"
 
 /*  The gimplification pass converts the language-dependent trees
 (ld-trees) emitted by the parser into language-independent trees
@@ -485,6 +486,27 @@ genericize_omp_for_stmt (tree *stmt_p, int *walk_subtrees, 
void *data,
   finish_bc_block (&OMP_FOR_BODY (stmt), bc_continue, clab);
 }
 
+/* Genericize an OMP_METADIRECTIVE node *STMT_P.  */
+
+static void
+genericize_omp_metadirective_stmt (tree *stmt_p, int *walk_subtrees,
+  void *data, walk_tree_fn func,
+  walk_tree_lh lh)
+{
+  tree stmt = *stmt_p;
+
+  for (tree variant = OMP_METADIRECTIVE_VARIANTS (stmt);
+   variant != NULL_TREE;
+   variant = TREE_CHAIN (variant))
+{
+  walk_tree_1 (&OMP_METADIRECTIVE_VARIANT_DIRECTIVE (variant),
+  func, data, NULL, lh);
+  walk_tree_1 (&OMP_METADIRECTIVE_VARIANT_BODY (variant),
+  func, data, NULL, lh);
+}
+
+  *walk_subtrees = 0;
+}
 
 /* Lower structured control flow tree nodes, such as loops.  The
STMT_P, WALK_SUBTREES, and DATA arguments are as for the walk_tree_fn
@@ -533,6 +555,11 @@ c_genericize_control_stmt (tree *stmt_p, int 
*walk_subtrees, void *data,
   genericize_omp_for_stmt (stmt_p, walk_subtrees, data, func, lh);
   break;
 
+case OMP_METADIRECTIVE:
+  genericize_omp_metadirective_stmt (stmt_p, walk_subtrees, data, func,
+lh);
+  break;
+

[PATCH v2 05/12] OpenMP: C++ front-end support for metadirectives

2024-05-29 Thread Sandra Loosemore
This patch adds C++ support for metadirectives.  It uses the
c-family support committed with the corresponding C front end patch
to do early parse-time metadirective resolution when possible.

Additional C/C++ common testcases are provided in a subsequent
patch in the series.

gcc/cp/ChangeLog
* parser.cc (cp_parser_skip_to_end_of_block_or_statement): Add
metadirective_p parameter, use it to control brace/parentheses
behavior for metadirectives.
(mangle_metadirective_region_label): New.
(cp_parser_label_for_labeled_statement): Use it.
(cp_parser_jump_statement): Likewise.
(cp_parser_omp_context_selector): Add metadirective_p
parameter, use it to control error behavior for non-constant exprs
properties.
(cp_parser_omp_context_selector_specification): Add metadirective_p
parameter, use it for cp_parser_omp_context_selector call.
(cp_finish_omp_declare_variant): Adjust call to
cp_parser_omp_context_selector_specification.
(analyze_metadirective_body): New.
(cp_parser_omp_metadirective): New.
(cp_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
* parser.h (struct cp_parser): Add fields for metadirective parsing
state.
* pt.cc (tsubst_omp_context_selector): New.
(tsubst_stmt): Handle OMP_METADIRECTIVE.

gcc/testsuite/ChangeLog
* g++.dg/gomp/attrs-metadirective-1.C: New.
* g++.dg/gomp/attrs-metadirective-2.C: New.
* g++.dg/gomp/attrs-metadirective-3.C: New.
* g++.dg/gomp/attrs-metadirective-4.C: New.
* g++.dg/gomp/attrs-metadirective-5.C: New.
* g++.dg/gomp/attrs-metadirective-6.C: New.
* g++.dg/gomp/attrs-metadirective-7.C: New.
* g++.dg/gomp/attrs-metadirective-8.C: New.

libgomp/ChangeLog
* testsuite/libgomp.c++/metadirective-template-1.C: New.
* testsuite/libgomp.c++/metadirective-template-2.C: New.
* testsuite/libgomp.c++/metadirective-template-3.C: New.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
---
 gcc/cp/parser.cc  | 524 +-
 gcc/cp/parser.h   |   7 +
 gcc/cp/pt.cc  | 119 
 .../g++.dg/gomp/attrs-metadirective-1.C   |  40 ++
 .../g++.dg/gomp/attrs-metadirective-2.C   |  74 +++
 .../g++.dg/gomp/attrs-metadirective-3.C   |  31 ++
 .../g++.dg/gomp/attrs-metadirective-4.C   |  41 ++
 .../g++.dg/gomp/attrs-metadirective-5.C   |  24 +
 .../g++.dg/gomp/attrs-metadirective-6.C   |  31 ++
 .../g++.dg/gomp/attrs-metadirective-7.C   |  31 ++
 .../g++.dg/gomp/attrs-metadirective-8.C   |  16 +
 .../libgomp.c++/metadirective-template-1.C|  37 ++
 .../libgomp.c++/metadirective-template-2.C|  41 ++
 .../libgomp.c++/metadirective-template-3.C|  41 ++
 14 files changed, 1044 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-3.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-4.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-5.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-6.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-7.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/attrs-metadirective-8.C
 create mode 100644 libgomp/testsuite/libgomp.c++/metadirective-template-1.C
 create mode 100644 libgomp/testsuite/libgomp.c++/metadirective-template-2.C
 create mode 100644 libgomp/testsuite/libgomp.c++/metadirective-template-3.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 8f3d566aa25..30461d241a2 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -3003,7 +3003,7 @@ static void cp_parser_skip_to_end_of_statement
 static void cp_parser_consume_semicolon_at_end_of_statement
   (cp_parser *);
 static void cp_parser_skip_to_end_of_block_or_statement
-  (cp_parser *);
+  (cp_parser *, bool = false);
 static bool cp_parser_skip_to_closing_brace
   (cp_parser *);
 static bool cp_parser_skip_entire_template_parameter_list
@@ -4192,9 +4192,11 @@ cp_parser_consume_semicolon_at_end_of_statement 
(cp_parser *parser)
have consumed a non-nested `;'.  */
 
 static void
-cp_parser_skip_to_end_of_block_or_statement (cp_parser* parser)
+cp_parser_skip_to_end_of_block_or_statement (cp_parser* parser,
+bool metadirective_p)
 {
   int nesting_depth = 0;
+  int bracket_depth = 0;
 
   /* Unwind generic function template scope if necessary.  */
   if (parser->fully_implicit_function_template_p)
@@ -4216,7 +4218,7 @@ cp_parser_skip_to_end_of_block_or_statement (cp_parser* 
parser)
 
case CPP_SEMICOLON:
  /* Stop if this is an unnested ';'. */
- if (!nesting_depth)
+ if (!nesting_de

[PATCH v2 08/12] OpenMP: Reject other properties with kind(any)

2024-05-29 Thread Sandra Loosemore
The OpenMP spec says:

"If trait-property any is specified in the kind trait-selector of the
device selector set or the target_device selector sets, no other
trait-property may be specified in the same selector set."

GCC was not previously enforcing this restriction and several testcases
included such invalid constructs.  This patch fixes it.
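
A short sketch of what the restriction means in C follows. The function names are hypothetical; the two rejected selector forms quoted in the trailing comment are the ones removed from declare-variant-10.c by this patch.

```c
/* Variant and base functions; names and bodies are illustrative only.  */
int fpga_impl (void) { return 1; }
int any_impl (void) { return 2; }

/* Accepted: kind(any) is the only trait-property in the device set.  */
#pragma omp declare variant (any_impl) match (device={kind(any)})
int base_any (void) { return 0; }

/* Accepted: other properties are present, but kind(any) is not.  */
#pragma omp declare variant (fpga_impl) match (device={kind(fpga)})
int base_fpga (void) { return 0; }

/* Rejected once the check is in place, because kind(any) is combined
   with other trait-properties in the same selector set:

     match (device={kind(any,fpga)})
     match (device={kind("any"),arch(x86_64),isa(avx512f,avx512bw)})  */
```

Without OpenMP enabled the pragmas are ignored, so the sketch compiles either way; the diagnostic only applies when the selectors are actually parsed.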

gcc/ChangeLog
* omp-general.cc (omp_check_context_selector): Reject other
properties in the same selector set with kind(any).

gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-variant-10.c: Fix broken tests.
* c-c++-common/gomp/declare-variant-3.c: Likewise.
* c-c++-common/gomp/declare-variant-9.c: Likewise.
* c-c++-common/gomp/declare-variant-any.c: New.
* gfortran.dg/gomp/declare-variant-10.f90: Fix broken tests.
* gfortran.dg/gomp/declare-variant-3.f90: Likewise.
* gfortran.dg/gomp/declare-variant-9.f90: Likewise.
* gfortran.dg/gomp/declare-variant-any.f90: New.
---
 gcc/omp-general.cc| 31 +++
 .../c-c++-common/gomp/declare-variant-10.c|  4 +--
 .../c-c++-common/gomp/declare-variant-3.c | 10 ++
 .../c-c++-common/gomp/declare-variant-9.c |  4 +--
 .../c-c++-common/gomp/declare-variant-any.c   | 10 ++
 .../gfortran.dg/gomp/declare-variant-10.f90   |  4 +--
 .../gfortran.dg/gomp/declare-variant-3.f90| 12 ++-
 .../gfortran.dg/gomp/declare-variant-9.f90|  2 +-
 .../gfortran.dg/gomp/declare-variant-any.f90  | 28 +
 9 files changed, 82 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/declare-variant-any.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/declare-variant-any.f90

diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc
index 6f36b5d163f..23072b10d75 100644
--- a/gcc/omp-general.cc
+++ b/gcc/omp-general.cc
@@ -1277,6 +1277,8 @@ omp_check_context_selector (location_t loc, tree ctx, 
bool metadirective_p)
   for (tree tss = ctx; tss; tss = TREE_CHAIN (tss))
 {
   enum omp_tss_code tss_code = OMP_TSS_CODE (tss);
+  bool saw_any_prop = false;
+  bool saw_other_prop = false;
 
   /* FIXME: not implemented yet.  */
   if (!metadirective_p && tss_code == OMP_TRAIT_SET_TARGET_DEVICE)
@@ -1314,6 +1316,27 @@ omp_check_context_selector (location_t loc, tree ctx, 
bool metadirective_p)
  else
ts_seen[ts_code] = true;
 
+
+ /* If trait-property "any" is specified in the "kind"
+trait-selector of the "device" selector set or the
+"target_device" selector sets, no other trait-property
+may be specified in the same selector set.  */
+ if (ts_code == OMP_TRAIT_DEVICE_KIND)
+   for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
+ {
+   const char *prop = omp_context_name_list_prop (p);
+   if (!prop)
+ continue;
+   else if (strcmp (prop, "any") == 0)
+ saw_any_prop = true;
+   else
+ saw_other_prop = true;
+ }
+   else if (ts_code == OMP_TRAIT_DEVICE_ARCH
+  || ts_code == OMP_TRAIT_DEVICE_ISA
+  || ts_code == OMP_TRAIT_DEVICE_NUM)
+   saw_other_prop = true;
+
  if (omp_ts_map[ts_code].valid_properties == NULL)
continue;
 
@@ -1366,6 +1389,14 @@ omp_check_context_selector (location_t loc, tree ctx, 
bool metadirective_p)
  break;
  }
}
+
+  if (saw_any_prop && saw_other_prop)
+   {
+ error_at (loc,
+   "no other trait-property may be specified "
+   "in the same selector set with %");
+ return error_mark_node;
+   }
 }
   return ctx;
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c 
b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
index 2b8a39425b1..e77693430d1 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-10.c
@@ -7,7 +7,7 @@ void f01 (void);
 #pragma omp declare variant (f01) match (device={isa(avx512f,avx512bw)})
 void f02 (void);
 void f03 (void);
-#pragma omp declare variant (f03) match 
(device={kind("any"),arch(x86_64),isa(avx512f,avx512bw)})
+#pragma omp declare variant (f03) match 
(device={arch(x86_64),isa(avx512f,avx512bw)})
 void f04 (void);
 void f05 (void);
 #pragma omp declare variant (f05) match (device={kind(gpu)})
@@ -28,7 +28,7 @@ void f15 (void);
 #pragma omp declare variant (f15) match (device={isa(sse4,ssse3),arch(i386)})
 void f16 (void);
 void f17 (void);
-#pragma omp declare variant (f17) match (device={kind(any,fpga)})
+#pragma omp declare variant (f17) match (device={kind(fpga)})
 void f18 (void);
 
 #pragma omp declare target
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-3.c 
b/gcc/testsuite/c-c++-common/gomp/declare-variant-3.c
index f5

[PATCH v2 02/12] OpenMP: middle-end support for metadirectives

2024-05-29 Thread Sandra Loosemore
This patch adds middle-end support for OpenMP metadirectives.  Some
context selectors can be resolved during gimplification, but others need to
be deferred until the omp_device_lower pass, which requires that cgraph,
LTO streaming, inlining, etc all know about this construct as well.

gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add has_metadirectives flag.
* cgraphclones.cc (cgraph_node::create_clone): Copy has_metadirectives
flag.
* doc/gimple.texi (Class hierarchy of GIMPLE statements): Document
gomp_metadirective and gomp_variant.
* gimple-low.cc (lower_omp_metadirective): New.
(lower_stmt): Call it.
* gimple-pretty-print.cc (dump_gimple_omp_metadirective): New.
(pp_gimple_stmt_1): Call it.
* gimple-streamer-in.cc (input_gimple_stmt): Handle
GIMPLE_OMP_METADIRECTIVE.
* gimple-streamer-out.cc (output_gimple_stmt): Likewise.
* gimple-walk.cc (walk_gimple_op): Likewise.
(walk_gimple_stmt): Likewise.
* gimple.cc (gimple_alloc_omp_metadirective): New.
(gimple_build_omp_metadirective): New.
(gimple_build_omp_variant): New.
* gimple.def (GIMPLE_OMP_METADIRECTIVE): New.
(GIMPLE_OMP_METADIRECTIVE_VARIANT): New.
* gimple.h (gomp_variant, gomp_metadirective): New.
(is_a_helper ::test): New.
(is_a_helper ::test): New.
(is_a_helper ::test): New.
(is_a_helper ::test): New.
(gimple_alloc_omp_metadirective): New.
(gimple_build_omp_metadirective): New.
(gimple_build_omp_variant): New.
(gimple_has_substatements): Handle GIMPLE_OMP_METADIRECTIVE.
(gimple_has_ops): Likewise.
(gimple_omp_metadirective_label): New.
(gimple_omp_metadirective_set_label): New.
(gimple_omp_variants): New.
(gimple_omp_metadirective_set_variants): New.
(gimple_return_set_retval): Handle GIMPLE_OMP_METADIRECTIVE.
* gimplify.cc (is_gimple_stmt): Handle OMP_METADIRECTIVE.
(expand_omp_metadirective): New.
(gimplify_omp_metadirective): New.
(gimplify_expr): Call it.
* gsstruct.def (GSS_OMP_METADIRECTIVE): New.
(GSS_OMP_METADIRECTIVE_VARIANT): New.
* lto-cgraph.cc (lto_output_node): Handle has_metadirectives flag.
(input_overwrite_node): Likewise.
* omp-expand.cc (expand_omp_target): Propagate has_metadirectives
flag.
(build_omp_regions_1): Handle GIMPLE_OMP_METADIRECTIVE.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (omp_late_resolve_metadirective): New.
* omp-general.h (omp_late_resolve_metadirective): Declare.
* omp-low.cc (struct omp_context): Add next_clone field.
(new_omp_context): Handle next_clone field.
(clone_omp_context): New.
(delete_omp_context): Delete clones.
(create_omp_child_function): Propagate has_metadirectives bit.
(scan_omp_metadirective): New.
(scan_omp_1_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(lower_omp_metadirective): New.
(lower_omp_1): Handle GIMPLE_OMP_METADIRECTIVE.  Warn about
direct calls to offloadable functions containing metadirectives.
* omp-offload.cc: Include cfganal.h and cfghooks.h.
(omp_expand_metadirective): New.
(execute_omp_device_lower): Handle metadirectives.
(pass_omp_device_lower::gate):  Check has_metadirectives bit.
* omp-simd-clone.cc (simd_clone_create): Propagate has_metadirectives
flag.
* tree-cfg.cc (cleanup_dead_labels): Handle GIMPLE_OMP_METADIRECTIVE.
(gimple_redirect_edge_and_branch): Likewise.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(estimate_num_instructions): Likewise.
(expand_call_inline): Propagate has_metadirectives flag.
(tree_function_versioning): Likewise.
* tree-nested.cc (convert_nonlocal_reference_stmt): Handle
GIMPLE_OMP_METADIRECTIVE specially.
(convert_local_reference_stmt): Likewise.
(convert_tramp_reference_stmt): Likewise.
(convert_gimple_call): Likewise.
* tree-ssa-operands.cc: Include omp-general.h.
(operands_scanner::parse_ssa_operands): Handle
GIMPLE_OMP_METADIRECTIVE.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
Co-Authored-By: Marcel Vollweiler 
---
 gcc/cgraph.h   |   3 +
 gcc/cgraphclones.cc|   1 +
 gcc/doc/gimple.texi|   6 ++
 gcc/gimple-low.cc  |  36 
 gcc/gimple-pretty-print.cc |  78 
 gcc/gimple-streamer-in.cc  |  10 ++
 gcc/gimple-streamer-out.cc |   6 ++
 gcc/gimple-walk.cc |  28 ++
 gcc/gimple.cc  |  35 +++
 gcc/gimple.def |   7 ++
 gcc/gimple.h   | 100 +++-
 gcc/gimplify.cc| 184 +
 gcc/gsstruct.def   |   

[PATCH v2 03/12] libgomp: runtime support for target_device selector

2024-05-29 Thread Sandra Loosemore
This patch implements the libgomp runtime support for the dynamic
target_device selector via the GOMP_evaluate_target_device function.

include/ChangeLog
* cuda/cuda.h (CUdevice_attribute): Add definitions for
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.

libgomp/ChangeLog
* Makefile.am (libgomp_la_SOURCES): Add selector.c.
* Makefile.in: Regenerate.
* config/gcn/selector.c: New.
* config/linux/selector.c: New.
* config/linux/x86/selector.c: New.
* config/nvptx/selector.c: New.
* libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
* libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
* libgomp.map (GOMP_5.1.3): New, add GOMP_evaluate_target_device.
* libgomp.texi (OpenMP Context Selectors): Document dynamic selector
matching of kind/arch/isa.
* libgomp_g.h (GOMP_evaluate_current_device): New.
(GOMP_evaluate_target_device): New.
* oacc-host.c (host_evaluate_device): New.
(host_openacc_exec): Initialize evaluate_device_func field to
host_evaluate_device.
* plugin/plugin-gcn.c (gomp_match_selectors): New.
(gomp_match_isa): New.
(GOMP_OFFLOAD_evaluate_device): New.
* plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
compute_minor fields.
(nvptx_open_device): Read compute capability information from device.
(gomp_match_selectors): New.
(gomp_match_selector): New.
(CHECK_ISA): New macro.
(GOMP_OFFLOAD_evaluate_device): New.
* selector.c: New.
* target.c (GOMP_evaluate_target_device): New.
(gomp_load_plugin_for_device): Load evaluate_device plugin function.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
---
 include/cuda/cuda.h |   2 +
 libgomp/Makefile.am |   2 +-
 libgomp/Makefile.in |   5 +-
 libgomp/config/gcn/selector.c   | 102 +++
 libgomp/config/linux/selector.c |  65 +
 libgomp/config/linux/x86/selector.c | 406 
 libgomp/config/nvptx/selector.c |  77 ++
 libgomp/libgomp-plugin.h|   2 +
 libgomp/libgomp.h   |   1 +
 libgomp/libgomp.map |   5 +
 libgomp/libgomp.texi|  18 +-
 libgomp/libgomp_g.h |   8 +
 libgomp/oacc-host.c |  11 +
 libgomp/plugin/plugin-gcn.c |  52 
 libgomp/plugin/plugin-nvptx.c   |  82 ++
 libgomp/selector.c  |  64 +
 libgomp/target.c|  40 +++
 17 files changed, 936 insertions(+), 6 deletions(-)
 create mode 100644 libgomp/config/gcn/selector.c
 create mode 100644 libgomp/config/linux/selector.c
 create mode 100644 libgomp/config/linux/x86/selector.c
 create mode 100644 libgomp/config/nvptx/selector.c
 create mode 100644 libgomp/selector.c

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..a775450df03 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,6 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75,
+  CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76,
   CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
 } CUdevice_attribute;
 
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index 855f0affddf..ba2dd0bb3c2 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -70,7 +70,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c error.c \
target.c splay-tree.c libgomp-plugin.c oacc-parallel.c oacc-host.c \
oacc-init.c oacc-mem.c oacc-async.c oacc-plugin.c oacc-cuda.c \
priority_queue.c affinity-fmt.c teams.c allocator.c oacc-profiling.c \
-   oacc-target.c target-indirect.c
+   oacc-target.c target-indirect.c selector.c
 
 include $(top_srcdir)/plugin/Makefrag.am
 
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index da902f3daca..b5d704992fc 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -219,7 +219,7 @@ am_libgomp_la_OBJECTS = alloc.lo atomic.lo barrier.lo 
critical.lo \
oacc-parallel.lo oacc-host.lo oacc-init.lo oacc-mem.lo \
oacc-async.lo oacc-plugin.lo oacc-cuda.lo priority_queue.lo \
affinity-fmt.lo teams.lo allocator.lo oacc-profiling.lo \
-   oacc-target.lo target-indirect.lo $(am__objects_1)
+   oacc-target.lo target-indirect.lo selector.lo $(am__objects_1)
 libgomp_la_OBJECTS = $(am_libgomp_la_OBJECTS)
 AM_V_P = $(am__v_P_@AM_V@)
 am__v_P_ = $(am__v_P_@AM_DEFAULT_V@)
@@ -552,7 +552,7 @@ libgomp_la_SOURCES = alloc.c atomic.c barrier.c critical.c 
env.c \
oacc-parallel.c oacc-host.c oacc-init.c oacc-mem.c \
oacc-async.c oa

[PATCH v2 12/12] OpenMP: Update documentation of metadirective implementation status.

2024-05-29 Thread Sandra Loosemore
libgomp/ChangeLog
* libgomp.texi (OpenMP 5.0): Mark metadirective and declare variant
as implemented.
(OpenMP 5.1): Mark target_device as supported.
Add changed interaction between declare target and OpenMP context
and dynamic selector support.
(OpenMP 5.2): Mark otherwise clause as supported, note that
default is also still accepted.
---
 libgomp/libgomp.texi | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 43048da4d6e..af7af63c504 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -192,9 +192,8 @@ The OpenMP 4.5 specification is fully supported.
 @item Array shaping @tab N @tab
 @item Array sections with non-unit strides in C and C++ @tab N @tab
 @item Iterators @tab Y @tab
-@item @code{metadirective} directive @tab N @tab
-@item @code{declare variant} directive
-  @tab P @tab @emph{simd} traits not handled correctly
+@item @code{metadirective} directive @tab Y @tab
+@item @code{declare variant} directive @tab Y @tab
 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
@@ -289,8 +288,8 @@ The OpenMP 4.5 specification is fully supported.
 @headitem Description @tab Status @tab Comments
 @item OpenMP directive as C++ attribute specifiers @tab Y @tab
 @item @code{omp_all_memory} reserved locator @tab Y @tab
-@item @emph{target_device trait} in OpenMP Context @tab N @tab
-@item @code{target_device} selector set in context selectors @tab N @tab
+@item @emph{target_device trait} in OpenMP Context @tab Y
+@item @code{target_device} selector set in context selectors @tab Y @tab
 @item C/C++'s @code{declare variant} directive: elision support of
   preprocessed code @tab N @tab
 @item @code{declare variant}: new clauses @code{adjust_args} and
@@ -366,6 +365,12 @@ to address of matching mapped list item per 5.1, Sect. 
2.21.7.2 @tab N @tab
 @item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N 
@tab
 @item @code{present} modifier to the @code{map}, @code{to} and @code{from}
   clauses @tab Y @tab
+@item Changed interaction between @code{declare target} and OpenMP context
+  @tab Y @tab
+@item Dynamic selector support in @code{metadirective} @tab Y @tab
+@item Dynamic selector support in @code{declare variant} @tab P
+  @tab Fortran rejects non-constant expressions in dynamic selectors;
+  C/C++ reject expressions using argument variables.
 @end multitable
 
 
@@ -413,8 +418,10 @@ to address of matching mapped list item per 5.1, Sect. 
2.21.7.2 @tab N @tab
 @item Deprecation of traits array following the allocator_handle expression in
   @code{uses_allocators} @tab N @tab
 @item New @code{otherwise} clause as alias for @code{default} on metadirectives
-  @tab N @tab
-@item Deprecation of @code{default} clause on metadirectives @tab N @tab
+  @tab Y @tab
+@item Deprecation of @code{default} clause on metadirectives @tab N
+  @tab Both @code{otherwise} and @code{default} are accepted
+  without diagnostics.
 @item Deprecation of delimited form of @code{declare target} @tab N @tab
 @item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
 @item @code{allocate} and @code{firstprivate} clauses on @code{scope}
-- 
2.25.1
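
For readers who want to try the features whose status this documentation patch
updates, here is a minimal sketch (not taken from the patch) of a metadirective
that combines a dynamic user selector with the OpenMP 5.2 otherwise clause.
The function name multiply and the threshold of 1000 are hypothetical. Without
-fopenmp the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element-wise product, with the parallelization
   strategy picked by a metadirective.  */
void
multiply (size_t n, const int *a, const int *b, int *c)
{
  assert (a && b && c);

  /* Dynamic selector: the condition is evaluated at run time.  The
     'otherwise' clause names the fallback; 'default' is the older
     spelling, which the patch notes is still accepted.  */
  #pragma omp metadirective \
      when (user={condition(n >= 1000)}: parallel for) \
      otherwise (simd)
  for (size_t i = 0; i < n; i++)
    c[i] = a[i] * b[i];
}
```

Both alternatives compute the same result, so the choice only affects how the
loop is executed.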



[PATCH v2 07/12] OpenMP: Fortran front-end support for metadirectives.

2024-05-29 Thread Sandra Loosemore
This patch adds support for metadirectives to the Fortran front end.

gcc/fortran/ChangeLog
* decl.cc (gfc_match_end): Handle metadirectives.
* dump-parse-tree.cc (show_omp_node): Likewise.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_METADIRECTIVE.
(struct gfc_omp_clauses): Rename target_first_st_is_teams field to
target_first_st_is_teams_or_meta.
(struct gfc_omp_variant): New.
(struct gfc_st_label): Add omp_region field.
(gfc_exec_op): Add EXEC_OMP_METADIRECTIVE.
(struct gfc_code): Add omp_variants field.
(gfc_free_omp_variants): Declare.
(match_omp_directive): Declare.
(is_omp_declarative_stmt): Declare.
* io.cc (format_asterisk): Add initializer for new omp_region field.
* match.h (gfc_match_omp_begin_metadirective): Declare.
(gfc_match_omp_metadirective): Declare.
* openmp.cc (gfc_match_omp_eos): Special case for matching an
OpenMP context selector.
(gfc_free_omp_variants): New.
(gfc_match_omp_clauses): Remove context_selector parameter.
(match_omp): Adjust call to gfc_match_omp_clauses.
(gfc_match_omp_context_selector): Add metadirective_p parameter.
Adjust error-checking logic and calls to gfc_match_omp_clauses.
Set gfc_matching_omp_context_selector.
(gfc_match_omp_context_selector_specification): Generalize to take
a set selector list pointer as parameter, instead of a
declare variant pointer.
(gfc_match_omp_declare_variant): Adjust call to match above change.
(match_omp_metadirective): New.
(gfc_match_omp_begin_metadirective): New.
(gfc_match_omp_metadirective): New.
(resolve_omp_metadirective): New.
(resolve_omp_target): Handle metadirectives.
(gfc_resolve_omp_directive): Handle metadirectives.
* parse.cc (gfc_matching_omp_context_selector): New.
(gfc_in_metadirective_body): New.
(gfc_omp_region_count): New.
(decode_omp_directive): Handle "begin metadirective", "end
metadirective", and "metadirective".
(match_omp_directive): New.
(case_omp_structured_block): New define.
(case_omp_do): New define.
(gfc_ascii_statement): Handle ST_OMP_BEGIN_METADIRECTIVE,
ST_OMP_END_METADIRECTIVE, and ST_OMP_METADIRECTIVE.
(accept_statement): Handle ST_OMP_BEGIN_METADIRECTIVE and
ST_OMP_METADIRECTIVE.
(gfc_omp_end_stmt): New.
(parse_omp_do): Use gfc_omp_end_stmt.  Special-case
"omp end metadirective" to end the current construct.
(parse_omp_structured_block): Likewise.  Adjust setting of
target_first_st_is_teams_or_meta flag.
(parse_omp_metadirective_body): New.
(parse_executable): Handle metadirectives.  Use
case_omp_structured_block and case_omp_do here.
(gfc_parse_file): Initialize gfc_omp_region_count,
gfc_in_metadirective_body, and gfc_matching_omp_context_selector.
(is_omp_declarative_stmt): New.
* parse.h (enum gfc_compile_state): Add metadirective constructs.
(gfc_omp_end_stmt): Declare.
(gfc_matching_omp_context_selector): Declare.
(gfc_in_metadirective_body): Declare.
(gfc_omp_region_count): Declare.
* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_METADIRECTIVE.
* st.cc (gfc_free_statement): Handle EXEC_OMP_METADIRECTIVE.
* symbol.cc (compare_st_labels): Compare omp_region, not just the
value.
(gfc_get_st_label): Likewise.  Initialize the omp_region field when
creating a new label.
* trans-decl.cc (gfc_get_label_decl): Encode the omp_region in the
label name.
* trans-openmp.cc (gfc_trans_omp_directive): Handle
EXEC_OMP_METADIRECTIVE.
(gfc_trans_omp_set_selector): New, split from...
(gfc_trans_omp_declare_variant): ...here.
(gfc_trans_omp_metadirective): New.
* trans-stmt.h (gfc_trans_omp_metadirective): Declare.
* trans.cc (trans_code): Handle EXEC_OMP_METADIRECTIVE.

gcc/testsuite/ChangeLog

* gfortran.dg/gomp/metadirective-1.f90: New.
* gfortran.dg/gomp/metadirective-10.f90: New.
* gfortran.dg/gomp/metadirective-11.f90: New.
* gfortran.dg/gomp/metadirective-2.f90: New.
* gfortran.dg/gomp/metadirective-3.f90: New.
* gfortran.dg/gomp/metadirective-4.f90: New.
* gfortran.dg/gomp/metadirective-5.f90: New.
* gfortran.dg/gomp/metadirective-6.f90: New.
* gfortran.dg/gomp/metadirective-7.f90: New.
* gfortran.dg/gomp/metadirective-8.f90: New.
* gfortran.dg/gomp/metadirective-9.f90: New.
* gfortran.dg/gomp/metadirective-construct.f90: New.
* gfortran.dg/gomp/metadirective-no-score.f90: New.
* gfortran.dg/gomp/pure-1.f90: Add metadirective test.
* gfortran.dg/gom

[PATCH v2 09/12] OpenMP: Extend dynamic selector support to declare variant

2024-05-29 Thread Sandra Loosemore
This patch extends the mechanisms previously added to support dynamic
selectors in metadirective constructs to also apply to "declare
variant".  The front-end mechanisms used to handle "declare variant"
via attributes attached to the function decls remain the same, but the
gimplifier now uses the same internal data structures and helper
functions as metadirective to score and sort "declare variant"
alternatives, and constructs a gomp_metadirective node for variant
calls that cannot be resolved at gimplification time.  During late
resolution, this gomp_metadirective is processed in exactly the same
way as for real metadirectives.

During implementation of this functionality, a number of bugs were
discovered in the previous selector scoring and matching code:

* Metadirective resolution was failing to account for scoring in
  "declare simd" clones, and was also relying on a function to match
  construct selectors that is only useful during gimplification, even
  though late resolution runs long after that pass.

* The construct selector scoring was previously implemented backwards
  from the specification (PR114596); a number of testcases were also broken
  in the same way as the implementation.

* The special rules for matching simdlen and aligned properties on simd
  selectors were not implemented (nor were these properties on metadirectives
  being rejected per the OpenMP spec).

This patch includes a new implementation of this functionality that
has cleaner interfaces and is hopefully(!) easier to correlate to
requirements of the OpenMP specification.  Instead of relying on the
gimplifier to score construct selectors, the scoring code has been
consolidated in omp-general.cc with the gimplifier only providing
the OpenMP construct context surrounding the metadirective or variant
call.  This is cached on the gomp_metadirective if necessary for late
resolution.

An additional improvement added in this patch is that for both
metadirective and "declare variant", if late resolution is required the
gimplifier now discards all alternatives that are known not to match.

Note that this patch leaves a substantial amount of dead code that
was used to support the former late "declare variant" resolution strategy,
notably the declare_variant_alt and calls_declare_variant_alt flags on
cgraph_node and all the code that touches those fields.  The next
patch in this series removes that unused code.

Another issue not addressed in this patch is the special scoping rules
for expressions in "declare variant" dynamic selectors, which is still
under discussion in PR113904.  We expect this to be fixed separately.
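
As a rough illustration of what a dynamic selector on "declare variant" looks
like in user code, here is a sketch; it is not one of the testcases in this
series. All names (use_fast_path, sum, sum_fast, total) are hypothetical. The
condition uses a file-scope variable rather than a function argument, since
the documentation patch in this series notes that C/C++ currently reject
expressions using argument variables. Without -fopenmp the pragma is ignored
and the base function is always called.

```c
#include <assert.h>

/* Hypothetical run-time switch consulted by the dynamic selector.  */
int use_fast_path = 0;

/* The variant must be declared before the directive that names it.  */
int sum_fast (const int *v, int n);

/* Calls to sum may be redirected to sum_fast when the condition holds
   at run time; since it is not a constant, resolution is deferred.  */
#pragma omp declare variant (sum_fast) match (user={condition(use_fast_path)})
int sum (const int *v, int n);

int
sum_fast (const int *v, int n)
{
  int total = 0;
  /* Walk backwards only to make the body visibly different.  */
  for (int i = n - 1; i >= 0; i--)
    total += v[i];
  return total;
}

int
sum (const int *v, int n)
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* Call site whose target is chosen by the selector.  */
int
total (const int *v, int n)
{
  assert (v && n >= 0);
  return sum (v, n);
}
```

Both functions return the same value on purpose, so the observable result does
not depend on which alternative is selected.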

gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_context_selector): Remove metadirective_p
parameter and conditionalization.
(c_parser_omp_context_selector_specification): Remove metadirective_p
parameter and adjust call not to pass it on.
(c_finish_omp_declare_variant): Adjust arguments on calls to
c_parser_omp_context_selector_specification and
omp_context_selector_matches.
(c_parser_omp_metadirective): Likewise.

gcc/cp/ChangeLog
* cp-tree.h (struct saved_scope): Add new field
x_processing_omp_trait_property_expr.
(processing_omp_trait_property_expr): Define.
* decl.cc (omp_declare_variant_finalize_one): Adjust arguments
to omp_context_selector_matches.
* parser.cc (cp_parser_omp_context_selector): Remove metadirective_p
argument and conditionalization.
(cp_parser_omp_context_selector_specification): Remove metadirective_p
argument and adjust call not to pass it on.
(cp_finish_omp_declare_variant): Adjust arguments on call to above.
(cp_parser_omp_metadirective): Likewise.
* pt.cc (tsubst_omp_context_selector): Adjust error behavior.
(tsubst_stmt): Adjust call to omp_context_selector_matches.
* semantics.cc (finish_id_expression_1): Do not diagnose error
for use of parameter in declare variant selector here.

gcc/fortran/ChangeLog
* trans-openmp.cc (gfc_trans_omp_declare_variant): Adjust arguments
to omp_context_selector_matches.
(gfc_trans_omp_metadirective): Likewise.

gcc/ChangeLog
* gimple-streamer-in.cc (input_gimple_stmt): Restore
gomp_metadirective context.
* gimple-streamer-out.cc (output_gimple_stmt): Save
gomp_metadirective context.
* gimple.cc (gimple_build_omp_metadirective): Initialize
gomp_metadirective context.
* gimple.def (GIMPLE_OMP_METADIRECTIVE): Update comments.
* gimple.h (gomp_metadirective): Add context field and update comments.
(gimple_omp_metadirective_context): New.
(gimple_omp_metadirective_set_context): New.
* gimplify.cc (omp_resolved_variant_calls): New.
(gimplify_variant_call_expr): New.
(gimplify_call_expr): Adjust parameters.  Call
gimplify_variant_call_expr to handle declar

[PATCH v2 06/12] OpenMP: common c/c++ testcases for metadirectives

2024-05-29 Thread Sandra Loosemore
gcc/testsuite/ChangeLog
* c-c++-common/gomp/metadirective-1.c: New.
* c-c++-common/gomp/metadirective-2.c: New.
* c-c++-common/gomp/metadirective-3.c: New.
* c-c++-common/gomp/metadirective-4.c: New.
* c-c++-common/gomp/metadirective-5.c: New.
* c-c++-common/gomp/metadirective-6.c: New.
* c-c++-common/gomp/metadirective-7.c: New.
* c-c++-common/gomp/metadirective-8.c: New.
* c-c++-common/gomp/metadirective-construct.c: New.
* c-c++-common/gomp/metadirective-device.c: New.
* c-c++-common/gomp/metadirective-no-score.c: New.
* c-c++-common/gomp/metadirective-target-device.c: New.

libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/metadirective-1.c: New.
* testsuite/libgomp.c-c++-common/metadirective-2.c: New.
* testsuite/libgomp.c-c++-common/metadirective-3.c: New.
* testsuite/libgomp.c-c++-common/metadirective-4.c: New.
* testsuite/libgomp.c-c++-common/metadirective-5.c: New.

Co-Authored-By: Kwok Cheung Yeung 
Co-Authored-By: Sandra Loosemore 
---
 .../c-c++-common/gomp/metadirective-1.c   |  52 +
 .../c-c++-common/gomp/metadirective-2.c   |  74 
 .../c-c++-common/gomp/metadirective-3.c   |  31 +++
 .../c-c++-common/gomp/metadirective-4.c   |  40 
 .../c-c++-common/gomp/metadirective-5.c   |  24 +++
 .../c-c++-common/gomp/metadirective-6.c   |  31 +++
 .../c-c++-common/gomp/metadirective-7.c   |  31 +++
 .../c-c++-common/gomp/metadirective-8.c   |  16 ++
 .../gomp/metadirective-construct.c| 177 ++
 .../c-c++-common/gomp/metadirective-device.c  | 147 +++
 .../gomp/metadirective-no-score.c |  95 ++
 .../gomp/metadirective-target-device.c| 147 +++
 .../libgomp.c-c++-common/metadirective-1.c|  35 
 .../libgomp.c-c++-common/metadirective-2.c|  41 
 .../libgomp.c-c++-common/metadirective-3.c|  34 
 .../libgomp.c-c++-common/metadirective-4.c|  52 +
 .../libgomp.c-c++-common/metadirective-5.c|  46 +
 17 files changed, 1073 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-7.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-8.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-construct.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-device.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-no-score.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/metadirective-target-device.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/metadirective-1.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/metadirective-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/metadirective-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/metadirective-4.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/metadirective-5.c

diff --git a/gcc/testsuite/c-c++-common/gomp/metadirective-1.c 
b/gcc/testsuite/c-c++-common/gomp/metadirective-1.c
new file mode 100644
index 000..37b56237531
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/metadirective-1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+
+#define N 100
+
+void f (int a[], int b[], int c[])
+{
+  int i;
+
+  #pragma omp metadirective \
+  default (teams loop) \
+  default (parallel loop) /* { dg-error "too many 'otherwise' or 'default' 
clauses in 'metadirective'" } */
+for (i = 0; i < N; i++) c[i] = a[i] * b[i];
+
+  #pragma omp metadirective \
+  otherwise (teams loop) \
+  default (parallel loop) /* { dg-error "too many 'otherwise' or 'default' 
clauses in 'metadirective'" } */
+for (i = 0; i < N; i++) c[i] = a[i] * b[i];
+
+  #pragma omp metadirective \
+  otherwise (teams loop) \
+  otherwise (parallel loop) /* { dg-error "too many 'otherwise' or 
'default' clauses in 'metadirective'" } */
+for (i = 0; i < N; i++) c[i] = a[i] * b[i];
+
+  #pragma omp metadirective \
+  default (bad_directive) /* { dg-error "unknown directive name before 
'\\)' token" } */
+for (i = 0; i < N; i++) c[i] = a[i] * b[i];
+
+  #pragma omp metadirective \
+  default (teams loop) \
+  where (device={arch("nvptx")}: parallel loop) /* { dg-error "'where' is 
not valid for 'metadirective'" } */
+for (i = 0; i < N; i++) c[i] = a[i] * b[i];
+
+  #pragma omp metadirective \
+  default (teams loop) \
+  when (device={arch("nvptx")} parallel loop) /* { dg-error "expected 

[PATCH v2 11/12] OpenMP: Update "declare target"/OpenMP context interaction

2024-05-29 Thread Sandra Loosemore
The code and test case previously implemented the OpenMP 5.0 spec,
which said in section 2.3.1:

"For functions within a declare target block, the target trait is added
to the beginning of the set..."

In OpenMP 5.1, this was changed to
"For device routines, the target trait is added to the beginning of
the set..."

In OpenMP 5.2 and TR12, it says:
"For procedures that are determined to be target function variants
by a declare target directive..."

The definition of "device routine" in OpenMP 5.1 is confusing, but
certainly the intent of the later versions of the spec is clear that
it doesn't just apply to functions within a begin declare target/end
declare target block.

The only use of the "omp declare target block" function attribute was
to support the 5.0 language, so it can be removed.  This patch changes
the context augmentation to use the "omp declare target" attribute
instead.
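
To make the distinction concrete, here is a sketch (loosely modelled on the
shape of declare-variant-8.c, but not copied from it) of a function that is
made a device routine by the clause form of "declare target" rather than by
sitting inside a begin/end block. The names scale, scale_on_target and caller
are hypothetical. Under the 5.1 and later wording quoted above, the call in
caller would be expected to see the target trait in its context; without
-fopenmp the pragmas are ignored and the base function is called.

```c
#include <assert.h>

/* Variant intended for calls made from a target context.  */
int scale_on_target (int x);

#pragma omp declare variant (scale_on_target) match (construct={target})
int scale (int x);

int caller (int x);

/* Clause form: caller is not inside a "begin declare target" block.  */
#pragma omp declare target to (caller)

int
scale_on_target (int x)
{
  return x + x;
}

int
scale (int x)
{
  return 2 * x;
}

int
caller (int x)
{
  int result = scale (x);
  /* Either alternative doubles its argument.  */
  assert (result == 2 * x);
  return result;
}
```

The two bodies are equivalent by design, so the sketch behaves the same
whichever way the variant call is resolved.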

gcc/c-family/ChangeLog
* c-attribs.cc (c_common_gnu_attributes): Delete "omp declare
target block".

gcc/c/ChangeLog
* c-decl.cc (c_decl_attributes): Don't add "omp declare target
block".

gcc/cp/ChangeLog
* decl2.cc (cplus_decl_attributes): Don't add "omp declare target
block".

gcc/ChangeLog
* omp-general.cc (omp_complete_construct_context): Check
"omp declare target" attribute, not "omp declare target block".

gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-target-indirect-2.c: Adjust
expected output for removal of "omp declare target block".
* c-c++-common/gomp/declare-variant-8.c: Likewise, the variant
call to f20 is now resolved differently.
* c-c++-common/gomp/reverse-offload-1.c: Adjust expected output.
* gfortran.dg/gomp/declare-variant-8.f90: Likewise, both f18
and f20 now resolve to the variant.  Delete obsolete comments.
---
 gcc/c-family/c-attribs.cc|  2 --
 gcc/c/c-decl.cc  |  8 ++--
 gcc/cp/decl2.cc  |  9 ++---
 gcc/omp-general.cc   |  2 +-
 .../c-c++-common/gomp/declare-target-indirect-2.c| 10 +-
 gcc/testsuite/c-c++-common/gomp/declare-variant-8.c  |  4 ++--
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c  |  2 +-
 gcc/testsuite/gfortran.dg/gomp/declare-variant-8.f90 | 12 ++--
 8 files changed, 15 insertions(+), 34 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 04e39b41bdf..582d99ada1b 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -570,8 +570,6 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_omp_declare_target_attribute, NULL },
   { "omp declare target nohost", 0, 0, true, false, false, false,
  handle_omp_declare_target_attribute, NULL },
-  { "omp declare target block", 0, 0, true, false, false, false,
- handle_omp_declare_target_attribute, NULL },
   { "non overlapping",   0, 0, true, false, false, false,
  handle_non_overlapping_attribute, NULL },
   { "alloc_align",   1, 1, false, true, true, false,
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index b691b91b3db..20cdb647f57 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -5431,12 +5431,8 @@ c_decl_attributes (tree *node, tree attributes, int 
flags)
attributes = tree_cons (get_identifier ("omp declare target implicit"),
NULL_TREE, attributes);
   else
-   {
- attributes = tree_cons (get_identifier ("omp declare target"),
- NULL_TREE, attributes);
- attributes = tree_cons (get_identifier ("omp declare target block"),
- NULL_TREE, attributes);
-   }
+   attributes = tree_cons (get_identifier ("omp declare target"),
+   NULL_TREE, attributes);
   if (TREE_CODE (*node) == FUNCTION_DECL)
{
  int device_type
diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 7baff46a192..8b5f2006b3b 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -1781,13 +1781,8 @@ cplus_decl_attributes (tree *decl, tree attributes, int 
flags)
  = tree_cons (get_identifier ("omp declare target implicit"),
   NULL_TREE, attributes);
  else
-   {
- attributes = tree_cons (get_identifier ("omp declare target"),
- NULL_TREE, attributes);
- attributes
-   = tree_cons (get_identifier ("omp declare target block"),
-NULL_TREE, attributes);
-   }
+   attributes = tree_cons (get_identifier ("omp declare target"),
+   NULL_TREE, attributes);
  if (TREE_CODE (*decl) == FUNCTION_DECL)
{

[PATCH v2 10/12] OpenMP: Remove dead code from declare variant reimplementation

2024-05-29 Thread Sandra Loosemore
After reimplementing late resolution of "declare variant" to use the
same mechanisms as metadirective, the declare_variant_alt and
calls_declare_variant_alt flags on struct cgraph_node are no longer
used by anything.  For the purposes of marking functions that need
late resolution, the has_metadirectives flag has replaced
calls_declare_variant_alt.

Likewise struct omp_declare_variant_entry, struct
omp_declare_variant_base_entry, and the hash tables used to store
these structures are no longer needed, since the information needed for
late resolution is now stored in the gomp_metadirective nodes.

There are no functional changes in this patch, just removing dead code.

gcc/ChangeLog
* cgraph.cc (symbol_table::create_edge): Don't set
calls_declare_variant_alt in the caller.
* cgraph.h (struct cgraph_node): Remove declare_variant_alt
and calls_declare_variant_alt flags.
* cgraphclones.cc (cgraph_node::create_clone): Don't copy
calls_declare_variant_alt bit.
* ipa-free-lang-data.cc (free_lang_data_in_decl): Adjust code
referencing declare_variant_alt bit.
* ipa.cc (symbol_table::remove_unreachable_nodes): Likewise.
* lto-cgraph.cc (lto_output_node): Remove references to deleted
bits.
(output_refs): Adjust code referencing declare_variant_alt bit.
(input_overwrite_node): Remove references to deleted bits.
(input_refs): Adjust code referencing declare_variant_alt bit.
* lto-streamer-out.cc (lto_output): Likewise.
* lto-streamer.h (omp_lto_output_declare_variant_alt): Delete.
(omp_lto_input_declare_variant_alt): Delete.
* lto/lto-partition.cc (lto_balanced_map): Adjust code referencing
deleted declare_variant_alt bit.
* omp-expand.cc (expand_omp_target): Use has_metadirectives bit to
trigger pass_omp_device_lower instead of calls_declare_variant_alt.
* omp-general.cc (struct omp_declare_variant_entry): Delete.
(struct omp_declare_variant_base_entry): Delete.
(struct omp_declare_variant_hasher): Delete.
(omp_declare_variant_hasher::hash): Delete.
(omp_declare_variant_hasher::equal): Delete.
(omp_declare_variants): Delete.
(omp_declare_variant_alt_hasher): Delete.
(omp_declare_variant_alt_hasher::hash): Delete.
(omp_declare_variant_alt_hasher::equal): Delete.
(omp_declare_variant_alt): Delete.
(omp_lto_output_declare_variant_alt): Delete.
(omp_lto_input_declare_variant_alt): Delete.
(includes): Delete unnecessary include of gt-omp-general.h.
* omp-offload.cc (execute_omp_device_lower): Remove references
to deleted bit.
(pass_omp_device_lower::gate): Likewise.
* omp-simd-clone.cc (simd_clone_create): Likewise.
* passes.cc (ipa_write_summaries): Likewise.
* symtab.cc (symtab_node::get_partitioning_class): Likewise.
* tree-inline.cc (expand_call_inline): Likewise.
(tree_function_versioning): Likewise.
---
 gcc/cgraph.cc |   2 -
 gcc/cgraph.h  |  11 +-
 gcc/cgraphclones.cc   |   1 -
 gcc/ipa-free-lang-data.cc |   2 +-
 gcc/ipa.cc|   3 -
 gcc/lto-cgraph.cc |  10 --
 gcc/lto-streamer-out.cc   |   3 +-
 gcc/lto-streamer.h|   6 --
 gcc/lto/lto-partition.cc  |   5 +-
 gcc/omp-expand.cc |   2 +-
 gcc/omp-general.cc| 218 --
 gcc/omp-offload.cc|   8 +-
 gcc/omp-simd-clone.cc |   2 -
 gcc/passes.cc |   3 +-
 gcc/symtab.cc |   2 +-
 gcc/tree-inline.cc|   4 -
 16 files changed, 10 insertions(+), 272 deletions(-)

diff --git a/gcc/cgraph.cc b/gcc/cgraph.cc
index 473d8410bc9..103bc2c0332 100644
--- a/gcc/cgraph.cc
+++ b/gcc/cgraph.cc
@@ -931,8 +931,6 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node 
*callee,
  caller->decl);
   else
 edge->in_polymorphic_cdtor = caller->thunk;
-  if (callee)
-caller->calls_declare_variant_alt |= callee->declare_variant_alt;
 
   if (callee && symtab->state != LTO_STREAMING
   && edge->callee->comdat_local_p ())
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 6653ce19c3e..dd210842df7 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -897,10 +897,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public 
symtab_node
   split_part (false), indirect_call_target (false), local (false),
   versionable (false), can_change_signature (false),
   redefined_extern_inline (false), tm_may_enter_irr (false),
-  ipcp_clone (false), declare_variant_alt (false),
-  calls_declare_variant_alt (false), gc_candidate (false),
-  called_by_ifunc_resolver (false),
-  has_metadirectives (false),
+  ipcp_clone (false), gc_candidate (false),
+  called_by_ifunc_resolver (false), has_metadirectives (false),
   m_uid (uid), m_summary_id (-1)
 

Re: [PATCH v9 2/5] Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.

2024-05-29 Thread Qing Zhao
Richard and Joseph:


> On May 28, 2024, at 17:09, Qing Zhao  wrote:
> 
>>> 
>>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>>> index fa17eff551e8..d75b23668925 100644
>>> --- a/gcc/varasm.cc
>>> +++ b/gcc/varasm.cc
>>> @@ -5082,6 +5082,11 @@ initializer_constant_valid_p_1 (tree value, tree 
>>> endtype, tree *cache)
>>>   }
>>>  return ret;
>>> 
>>> +case CALL_EXPR:
>>> +  /* For a call to .ACCESS_WITH_SIZE, check the first argument.  */
>>> +  if (tree ref = get_ref_from_access_with_size (value))
>>> +   return initializer_constant_valid_p_1 (ref, endtype, cache);
>> 
>> I think we should fold/strip .ACCESS_WITH_SIZE from initializers
>> instead.  That would be
>> the frontends job I guess, most probably not even generate those in
>> the first place?
> 
> Sounds reasonable, I will see how to do this in C FE. 
> Joseph, do you have any suggestion where in C FE I should do this folding? 
> 
> thanks.
> 
> Qing

To address the above comment from Richard, I studied the C FE code a little
more and came up with the following patch:

diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index ac306749e8d7..efd111305b5a 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8650,6 +8650,20 @@ digest_init (location_t init_loc, tree type, tree init, 
tree origtype,
 STRIP_TYPE_NOPS (inside_init);
+  /* If require_constant is TRUE,  when the initializer is a call to
+ .ACCESS_WITH_SIZE, use the first argument as the initializer.
+ For example:
+ y = (char *) .ACCESS_WITH_SIZE ((char *) &static_annotated.c,...)
+ will be converted to
+ y = &static_annotated.c.  */
+
+  if (require_constant
+  && TREE_CODE (inside_init) == NOP_EXPR
+  && TREE_CODE (TREE_OPERAND (inside_init, 0)) == CALL_EXPR
+  && is_access_with_size_p (TREE_OPERAND (inside_init, 0)))
+inside_init
+  = get_ref_from_access_with_size (TREE_OPERAND (inside_init, 0));
+
   if (!c_in_omp_for)
 {
   if (TREE_CODE (inside_init) == EXCESS_PRECISION_EXPR)
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 2e8fa5e30a80..e1a8458f8749 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -5078,10 +5078,6 @@ initializer_constant_valid_p_1 (tree value, tree 
endtype, tree *cache)
  }
   return ret;
-case CALL_EXPR:
-  /* For a call to .ACCESS_WITH_SIZE, check the first argument.  */
-  if (tree ref = get_ref_from_access_with_size (value))
- return initializer_constant_valid_p_1 (ref, endtype, cache);
   /* FALLTHROUGH.  */
 default:
   break;
@@ -5277,11 +5273,6 @@ output_constant (tree exp, unsigned HOST_WIDE_INT size, 
unsigned int align,
  exp = TREE_OPERAND (exp, 0);
 }
-  /* For a call to .ACCESS_WITH_SIZE, check the first argument.  */
-  if (TREE_CODE (exp) == CALL_EXPR)
-if (tree ref = get_ref_from_access_with_size (exp))
-  exp = ref;
-
   code = TREE_CODE (TREE_TYPE (exp));

This resolved the issue well. 

Let me know if you have any comments or suggestions.

I have fixed all the issues Richard raised in my private workspace; testing is
ongoing. I will post the modified patch set soon.

Thanks a lot!

Qing
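
For context, the kind of source that reaches this code path looks roughly like
the sketch below: a static pointer initialized from a flexible array member
that carries the counted_by attribute. The COUNTED_BY macro and get_y are
hypothetical; the macro expands to nothing on compilers that do not know the
attribute, and the static initialization of the flexible array member relies
on a GNU extension.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where the compiler recognizes it.  */
#if defined (__has_attribute)
# if __has_attribute (counted_by)
#  define COUNTED_BY(member) __attribute__ ((counted_by (member)))
# endif
#endif
#ifndef COUNTED_BY
# define COUNTED_BY(member)
#endif

struct annotated
{
  size_t count;
  char c[] COUNTED_BY (count);
};

/* GNU extension: static initialization of a flexible array member.  */
static struct annotated static_annotated = { 4, "abc" };

/* The initializer must be a constant expression, so a reference wrapped
   in .ACCESS_WITH_SIZE has to be reduced to the plain address here.  */
static char *y = static_annotated.c;

const char *
get_y (void)
{
  assert (y == static_annotated.c);
  return y;
}
```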

Re: CFG edge visualization to path-printing bootstrap failure

2024-05-29 Thread David Edelsohn
On Mon, May 20, 2024 at 1:56 PM David Edelsohn  wrote:

> Hi, David
>
> Unfortunately r15-636-g770657d02c986c causes a bootstrap failure on AIX
> when building f951 in stage2.  cc1 and cc1plus link successfully. There
> doesn't seem to be a similar failure for powerpc64-linux BE or LE.
>
> The failure is
>
> ld: 0711-317 ERROR: Undefined symbol: _ZTV29range_label_for_type_mismatch
> ld: 0711-317 ERROR: Undefined symbol:
> ._ZNK29range_label_for_type_mismatch8get_textEj
>
> which corresponds to
>
> vtable for range_label_for_type_mismatch
> range_label_for_type_mismatch::get_text(unsigned int) const
>
> I suspect that something is not being explicitly instantiated, which is
> running afoul of the AIX linker.
>
> Somehow your patch is causing the f951 compiler to reference these
> additional, undefined symbols.  I suspect that they also are undefined for
> Linux targets, but the linker ignores the error and nothing is amiss if the
> symbols never are called.
>
> Thanks, David
>

Thanks for diagnosing and fixing the problem.

David


Re: [PATCH] [testsuite] conditionalize dg-additional-sources on target and type

2024-05-29 Thread Mike Stump
On May 23, 2024, at 6:28 AM, Alexandre Oliva  wrote:
> I came up with an entirely different approach:
> 
> 
> g++.dg/vect/pr95401.cc has dg-additional-sources, and that fails when
> check_vect_support_and_set_flags finds vector support lacking for
> execution tests: tests decay to compile tests, and additional sources
> are rejected by the compiler when compiling to a named output file.
> 
> At first I considered using some effective target to conditionalize
> the additional sources.  There was no support for target-specific
> additional sources, so I added that.
> 
> But then, I found that adding an effective target to check whether the
> test involves linking would just make for busy work in this case, and
> so I went ahead and adjusted the handling of additional sources to
> refrain from adding them on compile tests, reporting them as
> unsupported.
> 
> That solves the problem without using the newly-added machinery for
> per-target additional sources, but I figured since I'd implemented it
> I might as well contribute it, since there might be other uses for it.
> 
> Regstrapped on x86_64-linux-gnu.  Also tested on ppc64-vx7r2 with
> gcc-13.  Ok to install?

Ok.



Re: PING: Re: [PATCH] selftest: invoke "diff" when ASSERT_STREQ fails

2024-05-29 Thread Eric Gallager
On Tue, May 28, 2024 at 1:21 PM David Malcolm  wrote:
>
> Ping.
>
> This patch has actually been *very* helpful to me when debugging
> selftest failures involving ASSERT_STREQ.
>
> Thanks
> Dave
>

Currently `diff` is only listed under the "Tools/packages necessary
for modifying GCC" section of install/prerequisites.html:
https://gcc.gnu.org/install/prerequisites.html
If it's going to become a dependency for actually running GCC, too, it
should get moved to be documented elsewhere, IMO.

> On Fri, 2024-05-17 at 15:51 -0400, David Malcolm wrote:
> > Currently when ASSERT_STREQ or ASSERT_STREQ_AT fail we print
> > both strings to stderr.  However it can be hard to figure out
> > the problem (e.g. for 1-character differences in long strings).
> >
> > Extend the output by writing out the strings to tempfiles and
> > invoking "diff -up" on them when we have such a selftest failure,
> > to (I hope) simplify debugging.
> >
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> >
> > OK for trunk?
> >
> > gcc/ChangeLog:
> > * selftest.cc (selftest::print_diff): New function.
> > (selftest::assert_streq): Call it when we have non-equal
> > non-null strings.
> >
> > Signed-off-by: David Malcolm 
> > ---
> >  gcc/selftest.cc | 28 ++--
> >  1 file changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/gcc/selftest.cc b/gcc/selftest.cc
> > index 6438d86a6aa0..f58c0631908e 100644
> > --- a/gcc/selftest.cc
> > +++ b/gcc/selftest.cc
> > @@ -63,6 +63,26 @@ fail_formatted (const location &loc, const char
> > *fmt, ...)
> >abort ();
> >  }
> >
> > +/* Invoke "diff" to print the difference between VAL1 and VAL2
> > +   on stdout.  */
> > +
> > +static void
> > +print_diff (const location &loc, const char *val1, const char *val2)
> > +{
> > +  temp_source_file tmpfile1 (loc, ".txt", val1);
> > +  temp_source_file tmpfile2 (loc, ".txt", val2);
> > +  const char *args[] = {"diff",
> > +   "-up",
> > +   tmpfile1.get_filename (),
> > +   tmpfile2.get_filename (),
> > +   NULL};
> > +  int exit_status = 0;
> > +  int err = 0;
> > +  pex_one (PEX_SEARCH | PEX_LAST,
> > +  args[0], CONST_CAST (char **, args),
> > +  NULL, NULL, NULL, &exit_status, &err);
> > +}
> > +
> >  /* Implementation detail of ASSERT_STREQ.
> > Compare val1 and val2 with strcmp.  They ought
> > to be non-NULL; fail gracefully if either or both are NULL.  */
> > @@ -89,8 +109,12 @@ assert_streq (const location &loc,
> > if (strcmp (val1, val2) == 0)
> >   pass (loc, "ASSERT_STREQ");
> > else
> > - fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n
> > val2=\"%s\"\n",
> > - desc_val1, desc_val2, val1, val2);
> > + {
> > +   print_diff (loc, val1, val2);
> > +   fail_formatted
> > + (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n
> > val2=\"%s\"\n",
> > +  desc_val1, desc_val2, val1, val2);
> > + }
> >}
> >  }
> >
>


Re: [pushed] wwwdocs: news: Google+ is no more

2024-05-29 Thread Eric Gallager
Maybe also add a mention of the toolchain's Mastodon account while
you're there? https://fosstodon.org/@gnutools

On Sun, May 26, 2024 at 6:05 PM Gerald Pfeifer  wrote:
>
> Keep the reference as text; just not the link.
>
> Gerald
> ---
>  htdocs/news.html | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/htdocs/news.html b/htdocs/news.html
> index af30872b..09dd2424 100644
> --- a/htdocs/news.html
> +++ b/htdocs/news.html
> @@ -393,8 +393,7 @@
>  [2013-08-08] wwwdocs:
>  GCC and the GNU Toolchain Project now have accounts on
>  https://twitter.com/gnutools"; target="_blank">Twitter and
> -https://plus.google.com/108467477471815191158"; rel="publisher" 
> target="_blank">Google+
> - to help developers stay informed of progress.
> +Google+ to help developers stay informed of progress.
>
>  IBM POWER8 support
>  [2013-07-15] wwwdocs:
> --
> 2.45.0


Re: PING: Re: [PATCH] selftest: invoke "diff" when ASSERT_STREQ fails

2024-05-29 Thread David Malcolm
On Wed, 2024-05-29 at 16:35 -0400, Eric Gallager wrote:
> On Tue, May 28, 2024 at 1:21 PM David Malcolm 
> wrote:
> > 
> > Ping.
> > 
> > This patch has actually been *very* helpful to me when debugging
> > selftest failures involving ASSERT_STREQ.
> > 
> > Thanks
> > Dave
> > 
> 
> Currently `diff` is only listed under the "Tools/packages necessary
> for modifying GCC" section of install/prerequisites.html:
> https://gcc.gnu.org/install/prerequisites.html
> If it's going to become a dependency for actually running GCC, too,
> it
> should get moved to be documented elsewhere, IMO.

All this is selftest code, and is turned off in a release configuration
of GCC.  The code path that invokes "diff" is when a selftest is
failing, which is immediately before a hard failure of the *build* of
GCC.  So arguably this is just a build-time thing for people
packaging/hacking on GCC, and thus not a new dependency for end-usage.

BTW I'm a bit hazy on the details of how "pex" is meant to work, so
hopefully someone more knowledgeable than me can comment on that aspect
of the patch.  It seems to work though.
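
For readers unfamiliar with the approach, here is a minimal sketch of the
same idea (write both strings to temporary files, then run "diff -up" on
them) using plain POSIX calls rather than libiberty's pex_one.  The names
write_temp and run_diff and the /tmp templates are assumptions made for
illustration; none of this is the code in selftest.cc.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a temp file from the template PATH
   (ending in XXXXXX) and fill it with TEXT.  Return 0 on success,
   -1 on failure.  */
static int
write_temp (char *path, const char *text)
{
  int fd = mkstemp (path);
  if (fd < 0)
    return -1;
  size_t len = strlen (text);
  ssize_t written = write (fd, text, len);
  close (fd);
  if (written != (ssize_t) len)
    {
      unlink (path);
      return -1;
    }
  return 0;
}

/* Hypothetical helper: run "diff -up" on VAL1 and VAL2, letting diff
   print to stdout.  Return diff's exit status (0 = identical,
   1 = different, 2 = trouble), 127 if diff could not be executed,
   or -1 on any other failure.  */
static int
run_diff (const char *val1, const char *val2)
{
  char path1[] = "/tmp/streq-a-XXXXXX";
  char path2[] = "/tmp/streq-b-XXXXXX";
  int result = -1;

  if (write_temp (path1, val1) != 0)
    return -1;
  if (write_temp (path2, val2) != 0)
    {
      unlink (path1);
      return -1;
    }

  fflush (NULL);		/* Don't duplicate buffered output in the child.  */
  pid_t pid = fork ();
  if (pid == 0)
    {
      execlp ("diff", "diff", "-up", path1, path2, (char *) NULL);
      _exit (127);		/* exec failed.  */
    }
  if (pid > 0)
    {
      int status = 0;
      if (waitpid (pid, &status, 0) == pid && WIFEXITED (status))
	result = WEXITSTATUS (status);
    }

  unlink (path1);
  unlink (path2);
  return result;
}
```

As in the patch, the sketch only adds output on the failure path; the
caller would still report the failing assertion afterwards.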

Dave

> 
> > On Fri, 2024-05-17 at 15:51 -0400, David Malcolm wrote:
> > > Currently when ASSERT_STREQ or ASSERT_STREQ_AT fail we print
> > > both strings to stderr.  However it can be hard to figure out
> > > the problem (e.g. for 1-character differences in long strings).
> > > 
> > > Extend the output by writing out the strings to tempfiles and
> > > invoking "diff -up" on them when we have such a selftest failure,
> > > to (I hope) simplify debugging.
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> > > 
> > > gcc/ChangeLog:
> > >     * selftest.cc (selftest::print_diff): New function.
> > >     (selftest::assert_streq): Call it when we have non-equal
> > >     non-null strings.
> > > 
> > > Signed-off-by: David Malcolm 
> > > ---
> > >  gcc/selftest.cc | 28 ++--
> > >  1 file changed, 26 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/gcc/selftest.cc b/gcc/selftest.cc
> > > index 6438d86a6aa0..f58c0631908e 100644
> > > --- a/gcc/selftest.cc
> > > +++ b/gcc/selftest.cc
> > > @@ -63,6 +63,26 @@ fail_formatted (const location &loc, const
> > > char
> > > *fmt, ...)
> > >    abort ();
> > >  }
> > > 
> > > +/* Invoke "diff" to print the difference between VAL1 and VAL2
> > > +   on stdout.  */
> > > +
> > > +static void
> > > +print_diff (const location &loc, const char *val1, const char
> > > *val2)
> > > +{
> > > +  temp_source_file tmpfile1 (loc, ".txt", val1);
> > > +  temp_source_file tmpfile2 (loc, ".txt", val2);
> > > +  const char *args[] = {"diff",
> > > +   "-up",
> > > +   tmpfile1.get_filename (),
> > > +   tmpfile2.get_filename (),
> > > +   NULL};
> > > +  int exit_status = 0;
> > > +  int err = 0;
> > > +  pex_one (PEX_SEARCH | PEX_LAST,
> > > +  args[0], CONST_CAST (char **, args),
> > > +  NULL, NULL, NULL, &exit_status, &err);
> > > +}
> > > +
> > >  /* Implementation detail of ASSERT_STREQ.
> > >     Compare val1 and val2 with strcmp.  They ought
> > >     to be non-NULL; fail gracefully if either or both are NULL. 
> > > */
> > > @@ -89,8 +109,12 @@ assert_streq (const location &loc,
> > >     if (strcmp (val1, val2) == 0)
> > >   pass (loc, "ASSERT_STREQ");
> > >     else
> > > - fail_formatted (loc, "ASSERT_STREQ (%s, %s)\n
> > > val1=\"%s\"\n
> > > val2=\"%s\"\n",
> > > - desc_val1, desc_val2, val1, val2);
> > > + {
> > > +   print_diff (loc, val1, val2);
> > > +   fail_formatted
> > > + (loc, "ASSERT_STREQ (%s, %s)\n val1=\"%s\"\n
> > > val2=\"%s\"\n",
> > > +  desc_val1, desc_val2, val1, val2);
> > > + }
> > >    }
> > >  }
> > > 
> > 
> 



Re: CFG edge visualization to path-printing bootstrap failure

2024-05-29 Thread David Malcolm
On Wed, 2024-05-29 at 15:26 -0400, David Edelsohn wrote:
> On Mon, May 20, 2024 at 1:56 PM David Edelsohn 
> wrote:
> 
> > Hi, David
> > 
> > Unfortunately r15-636-g770657d02c986c causes a bootstrap failure on
> > AIX
> > when building f951 in stage2.  cc1 and cc1plus link successfully.
> > There
> > doesn't seem to be a similar failure for powerpc64-linux BE or LE.
> > 
> > The failure is
> > 
> > ld: 0711-317 ERROR: Undefined symbol:
> > _ZTV29range_label_for_type_mismatch
> > ld: 0711-317 ERROR: Undefined symbol:
> > ._ZNK29range_label_for_type_mismatch8get_textEj
> > 
> > which corresponds to
> > 
> > vtable for range_label_for_type_mismatch
> > range_label_for_type_mismatch::get_text(unsigned int) const
> > 
> > I suspect that something is not being explicitly instantiated,
> > which is
> > running afoul of the AIX linker.
> > 
> > Somehow your patch is causing the f951 compiler to reference these
> > additional, undefined symbols.  I suspect that they also are
> > undefined for
> > Linux targets, but the linker ignores the error and nothing is
> > amiss if the
> > symbols never are called.
> > 
> > Thanks, David
> > 
> 
> Thanks for diagnosing and fixing the problem.

Thanks for your help!

For reference, this was PR bootstrap/115167, fixed by
r15-865-gb544ff88560e10:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652918.html

Sorry for messing up the mailing list threading.

Dave



Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-05-29 Thread Jeff Law




On 5/28/24 1:01 AM, Richard Biener wrote:
> On Fri, May 24, 2024 at 10:46 AM Mariam Arutunian wrote:
> >
> > This patch adds a new compiler pass aimed at identifying naive CRC
> > implementations, characterized by the presence of a loop calculating
> > a CRC (polynomial long division).
> > Upon detection of a potential CRC, the pass prints an informational
> > message.
> >
> > It performs the CRC optimization if the optimization level is >= 2,
> > except when optimizing for size or if fno_gimple_crc_optimization
> > is given.
> >
> > This pass is added for the detection and optimization of naive CRC
> > implementations, improving the efficiency of CRC-related
> > computations.
> >
> > This patch includes only the initial fast checks for filtering out
> > non-CRCs; the verification of detected possible CRCs and the
> > optimization parts will be provided in subsequent patches.
>
> Just a few quick questions - I'm waiting for a revision with Jeff's
> comments cleared before having a closer look.  The patch does
> nothing but analyze right now, correct?  I assume a later patch will
> fill in stuff in ::execute and use the return value of
> loop_may_calculate_crc (it's a bit odd to review such a "split"
> thing).

We split it up into functional chunks.  I think if it gets approved it
probably should go in atomically since it makes no sense to commit the
first pass recognition filter without the validation step or the
validation step without the codegen step.

So consider the breakdown strictly for review convenience.

> I think what this does fits final value replacement which lives in
> tree-scalar-evolution.cc and works from the loop-closed PHIs, trying
> to replace those.  I'm not sure we want to have a separate pass for
> this.  Consider a loop calculating two or four CRCs in parallel,
> replacing LC PHIs one-by-one should be able to handle this.

I suspect that'll be quite hard for both the "does this generally look 
like a CRC loop" code as well as the "validate this is a CRC loop" code.


Mariam, your thoughts on whether or not those two phases could handle a 
loop with two CRC calculations inside, essentially creating two calls to 
our new builtins?
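
To make the scenario concrete, here is a small sketch of what a loop with
two CRC calculations inside could look like next to the ordinary
single-CRC form.  The CRC-8 width, the polynomial 0x07 and the function
names are assumptions chosen for illustration; whether the proposed pass
would recognize either form is exactly the open question and is not
claimed here.

```c
#include <stdint.h>

/* Naive bitwise CRC-8 update for one byte.  The polynomial 0x07 is an
   illustrative choice, not taken from the patch.  */
static uint8_t
crc8_single (uint8_t crc, uint8_t data)
{
  crc = (uint8_t) (crc ^ data);
  for (int i = 0; i < 8; i++)
    crc = (crc & 0x80) ? (uint8_t) ((crc << 1) ^ 0x07)
		       : (uint8_t) (crc << 1);
  return crc;
}

/* The same update, but two independent CRCs share one loop, so the
   loop produces two results that are live after it instead of one.  */
static void
crc8_pair (uint8_t *crc_a, uint8_t data_a, uint8_t *crc_b, uint8_t data_b)
{
  uint8_t a = (uint8_t) (*crc_a ^ data_a);
  uint8_t b = (uint8_t) (*crc_b ^ data_b);
  for (int i = 0; i < 8; i++)
    {
      a = (a & 0x80) ? (uint8_t) ((a << 1) ^ 0x07) : (uint8_t) (a << 1);
      b = (b & 0x80) ? (uint8_t) ((b << 1) ^ 0x07) : (uint8_t) (b << 1);
    }
  *crc_a = a;
  *crc_b = b;
}
```

In crc8_pair each of the two values would have to be matched and replaced
on its own, which is the one-by-one handling Richard describes.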


Jeff




[PATCH v3 0/2] RISC-V: add option -m(no-)autovec-segment

2024-05-29 Thread Patrick O'Neill
Sending v3 to fixup testsuite issues and whitespace linter issue.

v2 changelog:
Rebased to squash Edwin's fixup into Greg's patch. Split out the middle-end
change and xfailed the associated testcase so the second patch can land
separately.

Relying on pre-commit CI for full testing.

v3 changelog:
Use dg-ice to xfail the no-segment.c testcase properly.
Converted remaining testcases to use -mrvv-vector-bits=_.

Greg McGary (2):
  RISC-V: add option -m(no-)autovec-segment
  Prevent divide-by-zero

 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 61 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 gcc/tree-vect-stmts.cc|  3 +-
 69 files changed, 411 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rv

[PATCH v3 2/2] Prevent divide-by-zero

2024-05-29 Thread Patrick O'Neill
From: Greg McGary 

gcc/ChangeLog:
* gcc/tree-vect-stmts.cc (vectorizable_load): Prevent divide-by-zero.
* testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: Remove dg-ice.
---
No changes in v3. Depends on the risc-v backend option added in patch 1 to
trigger the ICE.
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c | 1 -
 gcc/tree-vect-stmts.cc  | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
index dfbe09f01a1..79d03612a22 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-march=rv64gcv -mabi=lp64d -mrvv-vector-bits=scalable -O3 
-mno-autovec-segment" } */
-/* { dg-ice "Floating point exception" } */

 enum e { c, d };
 enum g { f };
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 4219ad832db..34f5736ba00 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11558,7 +11558,8 @@ vectorizable_load (vec_info *vinfo,
 - (vec_num * j + i) * nunits);
/* remain should now be > 0 and < nunits.  */
unsigned num;
-   if (constant_multiple_p (nunits, remain, &num))
+   if (known_gt (remain, 0)
+   && constant_multiple_p (nunits, remain, &num))
  {
tree ptype;
new_vtype
--
2.43.2
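
The shape of the fix can be sketched on plain integers.  This is a
simplified model under stated assumptions: multiple_p is a hypothetical
stand-in for constant_multiple_p, and unsigned values replace the poly
types that the vectorizer actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for constant_multiple_p on plain integers:
   if A is an exact multiple of B, store the quotient in *MULT and
   return true.  B must be nonzero; dividing by zero is undefined
   behavior and commonly shows up as "Floating point exception".  */
static bool
multiple_p (unsigned a, unsigned b, unsigned *mult)
{
  assert (b != 0);
  if (a % b != 0)
    return false;
  *mult = a / b;
  return true;
}

/* Guarded form mirroring the patched condition: the remain > 0 test
   short-circuits, so no division is attempted when REMAIN is zero.  */
static bool
split_count (unsigned nunits, unsigned remain, unsigned *num)
{
  return remain > 0 && multiple_p (nunits, remain, num);
}
```

With the guard in place, a zero remainder simply takes the fallback path
instead of reaching the division.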



[PATCH v3 1/2] RISC-V: add option -m(no-)autovec-segment

2024-05-29 Thread Patrick O'Neill
From: Greg McGary 

Add option -m(no-)autovec-segment to allow or prevent the autovectorizer
emitting vector segment load/store instructions. This is useful for
performance experiments.

gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes, 
vec_mask_len_store_lanes):
  Predicate with TARGET_VECTOR_AUTOVEC_SEGMENT
* gcc/config/riscv/riscv-opts.h (TARGET_VECTOR_AUTOVEC_SEGMENT): New 
macro.
* gcc/config/riscv/riscv.opt (-m(no-)autovec-segment): New option.
* testsuite/gcc.target/riscv/rvv/autovec/struct/*_noseg*.c,
testsuite/gcc.target/riscv/rvv/autovec/no-segment.c: New tests.

Tested-by: Edwin Lu 
---
Added tested-by on Vineet's recommendation. Please wait for riscv precommit to
finish before committing.

v3 changelog:
Use dg-ice to expect the no-segment.c ICE.
Converted remaining testcases to use -mrvv-vector-bits=_.
---
 gcc/config/riscv/autovec.md   |  4 +-
 gcc/config/riscv/riscv-opts.h |  5 ++
 gcc/config/riscv/riscv.opt|  4 ++
 .../gcc.target/riscv/rvv/autovec/no-segment.c | 62 +++
 .../autovec/struct/mask_struct_load_noseg-1.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-2.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-3.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-4.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-5.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-6.c |  6 ++
 .../autovec/struct/mask_struct_load_noseg-7.c |  6 ++
 .../struct/mask_struct_load_noseg_run-1.c |  4 ++
 .../struct/mask_struct_load_noseg_run-2.c |  4 ++
 .../struct/mask_struct_load_noseg_run-3.c |  4 ++
 .../struct/mask_struct_load_noseg_run-4.c |  4 ++
 .../struct/mask_struct_load_noseg_run-5.c |  4 ++
 .../struct/mask_struct_load_noseg_run-6.c |  4 ++
 .../struct/mask_struct_load_noseg_run-7.c |  4 ++
 .../struct/mask_struct_store_noseg-1.c|  6 ++
 .../struct/mask_struct_store_noseg-2.c|  6 ++
 .../struct/mask_struct_store_noseg-3.c|  6 ++
 .../struct/mask_struct_store_noseg-4.c|  6 ++
 .../struct/mask_struct_store_noseg-5.c|  6 ++
 .../struct/mask_struct_store_noseg-6.c|  6 ++
 .../struct/mask_struct_store_noseg-7.c|  6 ++
 .../struct/mask_struct_store_noseg_run-1.c|  4 ++
 .../struct/mask_struct_store_noseg_run-2.c|  4 ++
 .../struct/mask_struct_store_noseg_run-3.c|  4 ++
 .../struct/mask_struct_store_noseg_run-4.c|  4 ++
 .../struct/mask_struct_store_noseg_run-5.c|  4 ++
 .../struct/mask_struct_store_noseg_run-6.c|  4 ++
 .../struct/mask_struct_store_noseg_run-7.c|  4 ++
 .../rvv/autovec/struct/struct_vect_noseg-1.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-10.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-11.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-12.c |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-13.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-14.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-15.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-16.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-17.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-18.c |  6 ++
 .../rvv/autovec/struct/struct_vect_noseg-2.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-3.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-4.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-5.c  |  8 +++
 .../rvv/autovec/struct/struct_vect_noseg-6.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-7.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-8.c  |  7 +++
 .../rvv/autovec/struct/struct_vect_noseg-9.c  |  7 +++
 .../autovec/struct/struct_vect_noseg_run-1.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-10.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-11.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-12.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-13.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-14.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-15.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-16.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-17.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-18.c |  4 ++
 .../autovec/struct/struct_vect_noseg_run-2.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-3.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-4.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-5.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-6.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-7.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-8.c  |  4 ++
 .../autovec/struct/struct_vect_noseg_run-9.c  |  4 ++
 68 files changed, 410 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/no-segment.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/struct/mask_struct_load_noseg-2.c
 create mo

[PATCH] aarch64: Add vector floating point extend patterns [PR113880, PR113869]

2024-05-29 Thread Pengxuan Zheng
This patch improves vectorization of certain floating point widening operations
for the aarch64 target by adding vector floating point extend patterns for
V2SF->V2DF and V4HF->V4SF conversions.

PR target/113880
PR target/113869

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (extend2): New expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/extend-vec.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-simd.md|  7 +++
 gcc/testsuite/gcc.target/aarch64/extend-vec.c | 21 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/extend-vec.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 868f4486218..8febb411d06 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3141,6 +3141,13 @@ (define_insn "aarch64_float_extend_lo_"
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+(define_expand "extend2"
+  [(set (match_operand: 0 "register_operand" "=w")
+(float_extend:
+  (match_operand:VDF 1 "register_operand" "w")))]
+  "TARGET_SIMD"
+)
+
 ;; Float narrowing operations.
 
 (define_insn "aarch64_float_trunc_rodd_df"
diff --git a/gcc/testsuite/gcc.target/aarch64/extend-vec.c 
b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
new file mode 100644
index 000..f6241d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extend-vec.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.2d, v[0-9]+.2s} 1 } } */
+void
+f (float *__restrict a, double *__restrict b)
+{
+  b[0] = a[0];
+  b[1] = a[1];
+}
+
+/* { dg-final { scan-assembler-times {fcvtl\tv[0-9]+.4s, v[0-9]+.4h} 1 } } */
+void
+f1 (_Float16 *__restrict a, float *__restrict b)
+{
+
+  b[0] = a[0];
+  b[1] = a[1];
+  b[2] = a[2];
+  b[3] = a[3];
+}
-- 
2.17.1



Reverted recent patches to resource.cc

2024-05-29 Thread Hans-Peter Nilsson
> From: Hans-Peter Nilsson 
> Date: Mon, 27 May 2024 19:51:47 +0200

> 2: Does not depend on 1, but corrects an incidentally found wart:
> find_basic_block calls fail too often.  Replace them with "modern"
> insn-to-basic-block cross-referencing.
> 
> 3: Just an addendum to 2: removes an "if", where the condition is now
> always-true, dominated by a gcc_assert, and where the change in
> indentation was too ugly.
> 
> 4: Corrects another incidentally found wart: for the last 15 years the
> code in resource.cc has only been called from within reorg.cc (and
> reorg.c), specifically not possibly before calling init_resource_info
> or after free_resource_info, so we can discard the code that tests
> certain allocated arrays for NULL.  I didn't even bother with a
> gcc_assert; besides some gen*-generated files, only reorg.cc includes
> resource.h (not to be confused with the system sys/resource.h).
> A grep says the #include resource.h can be removed from those gen*
> files and presumably from RESOURCE_H(!) as well.  Some Other Time.
> Also, removed a redundant "if (tinfo != NULL)" and moved the then-code
> into the previous then-clause.
> 
>   resource.cc: Replace calls to find_basic_block with cfgrtl
> BLOCK_FOR_INSN
>   resource.cc (mark_target_live_regs): Remove check for bb not found
>   resource.cc: Remove redundant conditionals

I had to revert those last three patches due to PR
bootstrap/115284.  I hope to revisit once I have a means to
reproduce (and fix) the underlying bug.  It doesn't have to
be a bug with those changes per-se: IMHO the "improved"
lifetimes could just as well have uncovered a bug elsewhere
in reorg.  It's still on me to resolve that situation; done.
I'm just glad the cause was the incidental improvements and
not the original bug I wanted to fix.

There appears to be only a single supported SPARC machine in
cfarm: cfarm216, and I currently can't reach it due to what
appears to be issues at my end.  I guess I'll either fix
that or breathe life into sparc-elf+sim.

brgds, H-P


Re: Reverted recent patches to resource.cc

2024-05-29 Thread Jeff Law




On 5/29/24 7:28 PM, Hans-Peter Nilsson wrote:
> > From: Hans-Peter Nilsson
> > Date: Mon, 27 May 2024 19:51:47 +0200
>
> > 2: Does not depend on 1, but corrects an incidentally found wart:
> > find_basic_block calls fail too often.  Replace them with "modern"
> > insn-to-basic-block cross-referencing.
> >
> > 3: Just an addendum to 2: removes an "if", where the condition is now
> > always-true, dominated by a gcc_assert, and where the change in
> > indentation was too ugly.
> >
> > 4: Corrects another incidentally found wart: for the last 15 years the
> > code in resource.cc has only been called from within reorg.cc (and
> > reorg.c), specifically not possibly before calling init_resource_info
> > or after free_resource_info, so we can discard the code that tests
> > certain allocated arrays for NULL.  I didn't even bother with a
> > gcc_assert; besides some gen*-generated files, only reorg.cc includes
> > resource.h (not to be confused with the system sys/resource.h).
> > A grep says the #include resource.h can be removed from those gen*
> > files and presumably from RESOURCE_H(!) as well.  Some Other Time.
> > Also, removed a redundant "if (tinfo != NULL)" and moved the then-code
> > into the previous then-clause.
> >
> >   resource.cc: Replace calls to find_basic_block with cfgrtl
> > BLOCK_FOR_INSN
> >   resource.cc (mark_target_live_regs): Remove check for bb not found
> >   resource.cc: Remove redundant conditionals
>
> I had to revert those last three patches due to PR
> bootstrap/115284.  I hope to revisit once I have a means to
> reproduce (and fix) the underlying bug.  It doesn't have to
> be a bug with those changes per-se: IMHO the "improved"
> lifetimes could just as well have uncovered a bug elsewhere
> in reorg.  It's still on me to resolve that situation; done.
> I'm just glad the cause was the incidental improvements and
> not the original bug I wanted to fix.
>
> There appears to be only a single supported SPARC machine in
> cfarm: cfarm216, and I currently can't reach it due to what
> appears to be issues at my end.  I guess I'll either fix
> that or breathe life into sparc-elf+sim.

Or if you've got a reasonable server to use, QEMU might save you :-)

I do bootstraps and regression testsuite runs on a variety of systems 
via qemu (alpha, m68k, aarch64, s390, ppc64, etc).  It ain't fast, but 
it does work if QEMU is in pretty good shape and you can find a root 
filesystem to use.


jeff


Re: Reverted recent patches to resource.cc

2024-05-29 Thread Hans-Peter Nilsson
> Date: Wed, 29 May 2024 20:07:22 -0600
> From: Jeff Law 

> > There appears to be only a single supported SPARC machine in
> > cfarm: cfarm216, and I currently can't reach it due to what
> > appears to be issues at my end.  I guess I'll either fix
> > that or breathe life into sparc-elf+sim.
> Or if you've got a reasonable server to use, QEMU might save you :-)

I believe so. :)

> I do bootstraps and regression testsuite runs on a variety of systems 
> via qemu (alpha, m68k, aarch64, s390, ppc64, etc).  It ain't fast, but 
> it does work if QEMU is in pretty good shape and you can find a root 
> filesystem to use.

That might certainly fit the bill.  I guess you mean with a
filesystem image for e.g. sparc-linux?

I keep postponing looking into getting a working setup
(mostly the baseboard file) for qemu-anything + newlib.
Last I looked, qemu.exp had a serious typo...but I see that
was just for arm-eabi and arm-pi4, so yes, that might be a
viable path, thanks for the reminder.

You (or anyone) don't happen to know if sparc-elf + qemu.exp
is in good shape, or some other specific sparc+qemu
configuration?

That "sparc=*" in qemu.exp entry (at dejagnu
ca371cf9c48186716d) looks suspicious though, so I guess it'd
be a tuple matching "sparc32plus-*" or "sparc64-*".

brgds, H-P


[PATCH-2v4] Value Range: Add range op for builtin isfinite

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isfinite.

  Compared to the previous version, the main change is to set the range to
1 if the operand is a finite number, and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652220.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isfinite

The former patch adds an optab for builtin isfinite, so builtin isfinite
might not be folded in the front end.  A range op for isfinite is thus needed
for value range analysis.  This patch adds range op for builtin isfinite.
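
The folding rule can be sketched on a much-simplified range model.  The
struct fp_range, its fields and fold_isfinite below are assumptions made
for illustration and are not GCC's frange API; the sketch only mirrors
the three outcomes described above (known 1, known 0, or varying).

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical, simplified floating-point range: values in [lo, hi],
   possibly NaN as well.  When only_nan is set the range holds nothing
   but NaN and lo/hi are ignored.  */
struct fp_range
{
  double lo, hi;
  bool maybe_nan;
  bool only_nan;
};

enum fold_result { FOLD_ZERO, FOLD_ONE, FOLD_VARYING };

/* Fold isfinite over R: 1 if every value is finite, 0 if no value is
   finite (only NaN, or exactly one infinity), otherwise varying.  */
static enum fold_result
fold_isfinite (struct fp_range r)
{
  if (r.only_nan)
    return FOLD_ZERO;
  if (!r.maybe_nan && r.lo > -INFINITY && r.hi < INFINITY)
    return FOLD_ONE;
  if (!r.maybe_nan && r.lo == r.hi
      && (r.lo == INFINITY || r.lo == -INFINITY))
    return FOLD_ZERO;
  return FOLD_VARYING;
}
```

A range such as [0, +INF] contains both finite and infinite values, so
the sketch leaves it as varying rather than guessing.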

gcc/
* gimple-range-op.cc (class cfn_isfinite): New.
(op_cfn_isfinite): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISFINITE.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isfinite.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 4e60a42eaac..5ec5c828fa4 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1233,6 +1233,62 @@ public:
   }
 } op_cfn_isinf;

+//Implement range operator for CFN_BUILT_IN_ISFINITE
+class cfn_isfinite : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isfinite ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   // The range is [-INF,-INF][+INF,+INF] NAN, but it can't be represented.
+   // Set range to varying
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isfinite;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1330,6 +1386,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isinf;
   break;

+case CFN_BUILT_IN_ISFINITE:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isfinite;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
new file mode 100644
index 000..f5dce0a0486
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isfinite.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > -__DBL_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > -__FLT_MAX__ && !__builtin_isfinite (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isfinite (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */


[PATCH-3v2] Value Range: Add range op for builtin isnormal

2024-05-29 Thread HAO CHEN GUI
Hi,
  This patch adds the range op for builtin isnormal. It also adds two
helper functions to frange to detect a range of normal floating-point
values and a range of subnormal or zero values.

  Compared to the previous version, the main change is to set the range to
1 if the operand is a normal number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652221.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
Value Range: Add range op for builtin isnormal

A former patch added an optab for builtin isnormal, so builtin isnormal
might not be folded at the front end.  A range op for isnormal is therefore
needed for value range analysis.  This patch adds that range op.
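
To make the two interval predicates concrete, here is a minimal sketch of how "known normal" and "known subnormal or zero" could be decided for a closed interval of doubles. It is not the patch's frange code: FloatRange, interval_is_normal and interval_is_subnormal_or_zero are hypothetical names, and the sketch assumes lo <= hi and uses the host double type.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for a floating-point range: [lo, hi] with lo <= hi,
// plus a flag saying whether NaN may also occur.
struct FloatRange {
  double lo;
  double hi;
  bool maybe_nan;
};

// True if every value in R is a normal number: finite, not NaN, and
// at least the smallest normal magnitude away from zero.
bool interval_is_normal(const FloatRange &r) {
  if (r.maybe_nan || !std::isfinite(r.lo) || !std::isfinite(r.hi))
    return false;
  const double min_normal = std::numeric_limits<double>::min();
  // The whole interval must sit on one side of zero.
  return r.lo >= min_normal || r.hi <= -min_normal;
}

// True if every value in R is zero or subnormal.
bool interval_is_subnormal_or_zero(const FloatRange &r) {
  if (r.maybe_nan)
    return false;
  const double min_normal = std::numeric_limits<double>::min();
  return r.lo > -min_normal && r.hi < min_normal;
}
```

An interval that straddles zero, such as [0, 1], satisfies neither predicate, so a fold based on these helpers would have to fall back to a varying result.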

gcc/
* gimple-range-op.cc (class cfn_isnormal): New.
(op_cfn_isnormal): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CFN_BUILT_IN_ISNORMAL.
* value-range.h (class frange): Declare known_isnormal and
known_isdenormal_or_zero.
(frange::known_isnormal): Define.
(frange::known_isdenormal_or_zero): Define.

gcc/testsuite/
* gcc.dg/tree-ssa/range-isnormal.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 5ec5c828fa4..6787f532f11 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1289,6 +1289,61 @@ public:
   }
 } op_cfn_isfinite;

+//Implement range operator for CFN_BUILT_IN_ISNORMAL
+class cfn_isnormal :  public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isnormal ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || op1.known_isinf ()
+   || op1.known_isdenormal_or_zero ())
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   r.set_varying (type);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   nan_state nan (false);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isnormal;
+
 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
 {
@@ -1391,6 +1446,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_isfinite;
   break;

+case CFN_BUILT_IN_ISNORMAL:
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isnormal;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
new file mode 100644
index 000..c4df4d839b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isnormal.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void test1 (double x)
+{
+  if (x < __DBL_MAX__ && x > __DBL_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__DBL_MIN__ && x > -__DBL_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test2 (float x)
+{
+  if (x < __FLT_MAX__ && x > __FLT_MIN__ && !__builtin_isnormal (x))
+link_error ();
+
+  if (x < -__FLT_MIN__ && x > - __FLT_MAX__ && !__builtin_isnormal (x))
+link_error ();
+}
+
+void test3 (double x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+void test4 (float x)
+{
+  if (__builtin_isnormal (x) && __builtin_isinf (x))
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
diff --git a/gcc/value-range.h b/gcc/value-range.h
index 37ce91dc52d..1443d1906e5 100644
--- a/gcc/value-range.h
+++ b/gcc/value-range.h
@@ -588,6 +588,8 @@ public:
   bool maybe_isinf () const;
   bool signbit_p (bool &signbit) const;
   bool nan_signbit_p (bool &signbit) const;
+  bool known_isnormal () const;
+  bool known_isdenormal_or_zero () const;

 protected:
   virtual bool contains_p (tree cst) const override;
@@ -1650,6 +1652,33 @@ frange::known_isfinite () const
   return (!maybe_isnan () && !real_isinf (&m_min) && !real_isinf (&m_max));
 }

+// Return TRUE if range is known to be normal.
+
+inline bool
+frange::known_isnormal () const
+{
+  if (!known_isfinite ())
+re

[PATCH-1v3] Value Range: Add range op for builtin isinf

2024-05-29 Thread HAO CHEN GUI
Hi,
  The builtin isinf is not folded at the front end if the corresponding optab
exists. This causes range evaluation to fail on targets that have
optab_isinf. For instance, range-sincos.c will fail on targets that
have optab_isinf, as it calls builtin_isinf.

  This patch fixes the problem by adding a range op for builtin isinf.

  Compared with the previous version, the main change is to set the range to
1 if the operand is an infinite number and to 0 otherwise.
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652219.html

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen


ChangeLog
Value Range: Add range op for builtin isinf

The builtin isinf is not folded at front end if the corresponding optab
exists.  So the range op for isinf is needed for value range analysis.
This patch adds range op for builtin isinf.
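
The reverse direction (deriving the operand's range from a known result) can be sketched in isolation as follows. This is only a model of the idea in the patch's op1_range, not GCC code: FloatRange, Lhs and operand_range_for_isinf are hypothetical names, and the host double type stands in for the operand type.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for a floating-point range: [lo, hi] plus a NaN flag.
struct FloatRange {
  double lo;
  double hi;
  bool maybe_nan;
};

// What is known about the integer result of isinf (x).
enum class Lhs { Zero, NonZero, Unknown };

// Given the result of isinf (x), return the tightest representable range for x.
FloatRange operand_range_for_isinf(Lhs lhs) {
  const double inf = std::numeric_limits<double>::infinity();
  const double max = std::numeric_limits<double>::max();
  switch (lhs) {
    case Lhs::Zero:
      // Not infinite: any finite value, or NaN.
      return {-max, max, true};
    case Lhs::NonZero:
      // Exactly -Inf or +Inf.  Two isolated points cannot be expressed as
      // one interval, so widen to [-Inf, +Inf] but rule out NaN.
      return {-inf, inf, false};
    default:
      // Nothing known: everything, including NaN.
      return {-inf, inf, true};
  }
}
```

The NonZero case loses precision on purpose: a single interval cannot hold two isolated points, so the only information that survives is that the operand is not NaN.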

gcc/
* gimple-range-op.cc (class cfn_isinf): New.
(op_cfn_isinf): New variable.
(gimple_range_op_handler::maybe_builtin_call): Handle
CASE_FLT_FN (BUILT_IN_ISINF).

gcc/testsuite/
* gcc.dg/tree-ssa/range-isinf.c: New test.

patch.diff
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 55dfbb23ce2..4e60a42eaac 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -1175,6 +1175,63 @@ private:
   bool m_is_pos;
 } op_cfn_goacc_dim_size (false), op_cfn_goacc_dim_pos (true);

+// Implement range operator for CFN_BUILT_IN_ISINF
+class cfn_isinf : public range_operator
+{
+public:
+  using range_operator::fold_range;
+  using range_operator::op1_range;
+  virtual bool fold_range (irange &r, tree type, const frange &op1,
+  const irange &, relation_trio) const override
+  {
+if (op1.undefined_p ())
+  return false;
+
+if (op1.known_isinf ())
+  {
+   wide_int one = wi::one (TYPE_PRECISION (type));
+   r.set (type, one, one);
+   return true;
+  }
+
+if (op1.known_isnan ()
+   || (!real_isinf (&op1.lower_bound ())
+   && !real_isinf (&op1.upper_bound ())))
+  {
+   r.set_zero (type);
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+  virtual bool op1_range (frange &r, tree type, const irange &lhs,
+ const frange &, relation_trio) const override
+  {
+if (lhs.undefined_p ())
+  return false;
+
+if (lhs.zero_p ())
+  {
+   nan_state nan (true);
+   r.set (type, real_min_representable (type),
+  real_max_representable (type), nan);
+   return true;
+  }
+
+if (!range_includes_zero_p (lhs))
+  {
+   // The range is [-INF,-INF][+INF,+INF], but it can't be represented.
+   // Set range to [-INF,+INF]
+   r.set_varying (type);
+   r.clear_nan ();
+   return true;
+  }
+
+r.set_varying (type);
+return true;
+  }
+} op_cfn_isinf;

 // Implement range operator for CFN_BUILT_IN_
 class cfn_parity : public range_operator
@@ -1268,6 +1325,11 @@ gimple_range_op_handler::maybe_builtin_call ()
   m_operator = &op_cfn_signbit;
   break;

+CASE_FLT_FN (BUILT_IN_ISINF):
+  m_op1 = gimple_call_arg (call, 0);
+  m_operator = &op_cfn_isinf;
+  break;
+
 CASE_CFN_COPYSIGN_ALL:
   m_op1 = gimple_call_arg (call, 0);
   m_op2 = gimple_call_arg (call, 1);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
new file mode 100644
index 000..468f1bcf5c7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/range-isinf.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+#include 
+void link_error();
+
+void
+test1 (double x)
+{
+  if (x > __DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__DBL_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test2 (float x)
+{
+  if (x > __FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+  if (x < -__FLT_MAX__ && !__builtin_isinf (x))
+link_error ();
+}
+
+void
+test3 (double x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __DBL_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__DBL_MAX__)
+link_error ();
+}
+
+void
+test4 (float x)
+{
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x > __FLT_MAX__)
+link_error ();
+  if (!__builtin_isinf (x) && !__builtin_isnan (x) && x < -__FLT_MAX__)
+link_error ();
+}
+
+/* { dg-final { scan-tree-dump-not "link_error" "evrp" } } */
+


Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-29 Thread Hongyu Wang
Gentle ping :)
Hi Richard, is it OK to adopt the ccmp change? Or do you know who can
help review this part?
Thanks.


On Thu, May 23, 2024 at 16:27, Hongyu Wang wrote:

>
> Gently ping for this :)
> Hi Richard, Is it OK to adopt the ccmp change? Or did you know who can
> help to review this part?
> Thanks.
>
> On Wed, May 15, 2024 at 16:25, Hongyu Wang wrote:
> >
> > CC'd Richard for the ccmp part, as previously it was added only for aarch64.
> > The original logic will not be disrupted: if
> > aarch64_gen_ccmp_first succeeds, aarch64_gen_ccmp_next will also
> > succeed, since cmp/fcmp and ccmp/fccmp support all GPI/GPF and
> > prepare_operand will fix up inputs that cmp supports but ccmp does not,
> > so ret/ret2 will both be valid when comparing cost.
> > Thanks in advance.
> >
> > On Wed, May 15, 2024 at 16:22, Hongyu Wang wrote:
> > >
> > > For general ccmp scenario, the tree sequence is like
> > >
> > > _1 = (a < b)
> > > _2 = (c < d)
> > > _3 = _1 & _2
> > >
> > > > The current ccmp expansion tries swapping the compare order of _1 and _2,
> > > > compares the costs (cost1/cost2) of comparing _1 or _2 first, then returns the
> > > > sequence with the lower cost.
> > > >
> > > > For x86 ccmp, we don't support an FP compare as a ccmp operand, but we
> > > > do support an fp comi + int ccmp sequence. With the current cost comparison
> > > > model, the fp comi + int ccmp can never be generated, since it doesn't
> > > > check whether expand_ccmp_next returns an available result, and the rtl
> > > > cost of the empty ccmp sequence is always smaller.
> > > >
> > > > Check the expand_ccmp_next results ret and ret2, and return the valid one
> > > > before the cost comparison.
> > >
> > > gcc/ChangeLog:
> > >
> > > * ccmp.cc (expand_ccmp_expr_1): Check ret and ret2 of
> > > expand_ccmp_next, returns the valid one first before
> > > comparing cost.
> > > ---
> > >  gcc/ccmp.cc | 12 +++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/gcc/ccmp.cc b/gcc/ccmp.cc
> > > index 7cb525addf4..4b424220068 100644
> > > --- a/gcc/ccmp.cc
> > > +++ b/gcc/ccmp.cc
> > > @@ -247,7 +247,17 @@ expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, 
> > > rtx_insn **gen_seq)
> > >   cost2 = seq_cost (prep_seq_2, speed_p);
> > >   cost2 += seq_cost (gen_seq_2, speed_p);
> > > }
> > > - if (cost2 < cost1)
> > > +
> > > + /* For x86 target the ccmp does not support fp operands, but
> > > +have fcomi insn that can produce eflags and then do int
> > > +ccmp. So if one of the op is fp compare, ret1 or ret2 can
> > > +fail, and the cost of the corresponding empty seq will
> > > +always be smaller, then the NULL sequence will be returned.
> > > +Add check for ret and ret2, returns the available one if
> > > +the other is NULL.  */
> > > + if ((!ret && ret2)
> > > + || (!(ret && !ret2)
> > > + && cost2 < cost1))
> > > {
> > >   *prep_seq = prep_seq_2;
> > >   *gen_seq = gen_seq_2;
> > > --
> > > 2.31.1
> > >


Re: [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-29 Thread HAO CHEN GUI
Hi Kewen,

On 2024/5/29 13:26, Kewen.Lin wrote:
> I can understand re-using "unordered" and "eq" will save some efforts than
> doing with unspecs, but they are actually RTL codes instead of bits on the
> specific hardware CR, a downside is that people who isn't aware of this
> design point can have some misunderstanding when reading/checking the code
> or dumping, from this perspective unspecs (with reasonable name) can be
> more meaningful.  Normally adopting RTL code is better since they have the
> chance to be considered (optimized) in generic pass/code, but it isn't the
> case here as we just use the code itself but not be with the same semantic
> (meaning).  Looking forward to others' opinions on this, if we want to adopt
> "unordered" and "eq" like what this patch does, I think we should at least
> emphasize such points in rs6000-modes.def.

Thanks so much for your comments. IMHO, the core question is whether we can
redefine "unordered" or "eq" for a certain CC mode on a specific target. If we
can't, or it's unsafe, we have to use the unspecs. In this case, I just want to
define the code "unordered" on CCBCD as testing whether bit 3 is set in this CR
field. Actually, rs6000 already uses the "lt" code to test whether bit 0 is set
for vector compare instructions. The following expand is an example.

(define_expand "vector_ae__p"
  [(parallel
[(set (reg:CC CR6_REGNO)
  (unspec:CC [(ne:CC (match_operand:VI 1 "vlogical_operand")
 (match_operand:VI 2 "vlogical_operand"))]
   UNSPEC_PREDICATE))
 (set (match_dup 3)
  (ne:VI (match_dup 1)
 (match_dup 2)))])
   (set (match_operand:SI 0 "register_operand" "=r")
(lt:SI (reg:CC CR6_REGNO)
   (const_int 0)))
   (set (match_dup 0)
(xor:SI (match_dup 0)
(const_int 1)))]

I think "lt" on CC doesn't mean comparing whether the CC value is less than an
integer. It just tests whether the "lt" bit (bit 0) is set on this CC.
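
The "test one bit of a CR field" reading can be sketched as follows. This models a 4-bit condition register field using IBM bit numbering, where bit 0 is the most significant bit; the enum names and cr_bit_set are hypothetical and only illustrate the bit positions mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions within one 4-bit CR field, IBM numbering (bit 0 = MSB).
// The names are illustrative; bit 3 is the one "unordered" would test.
enum CrBit : unsigned { CR_LT = 0, CR_GT = 1, CR_EQ = 2, CR_UN = 3 };

// Return true if BIT is set in the 4-bit FIELD value.
constexpr bool cr_bit_set(std::uint8_t field, unsigned bit) {
  return ((field >> (3u - bit)) & 1u) != 0;
}
```

Under this model "lt" on a CC register is simply cr_bit_set (field, CR_LT), with no arithmetic comparison involved.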

  Looking forward to your and Segher's further invaluable comments.

Thanks
Gui Haochen


Re: Reverted recent patches to resource.cc

2024-05-29 Thread Jeff Law
On 5/29/24 8:41 PM, Hans-Peter Nilsson wrote:

>> I do bootstraps and regression testsuite runs on a variety of systems
>> via qemu (alpha, m68k, aarch64, s390, ppc64, etc).  It ain't fast, but
>> it does work if QEMU is in pretty good shape and you can find a root
>> filesystem to use.
>
> That might certainly fit the bill.  I guess you mean with a
> filesystem image for e.g. sparc-linux?
>
> I keep postponing looking into getting a working setup
> (mostly the baseboard file) for qemu-anything + newlib.
> Last I looked, qemu.exp had a serious typo...but I see that
> was just for arm-eabi and arm-pi4, so yes, that might be a
> viable path, thanks for the reminder.

I don't bother with qemu.exp at all.  I've set up binfmt handlers so
that I can execute foreign binaries.

So given a root filesystem, I can chroot into it and do whatever I need.
As far as dejagnu is concerned it looks like the native system.

Jeff


[PATCH] Fix some opindex for some options [PR115022]

2024-05-29 Thread Andrew Pinski
While looking at the index I noticed that some options had
`-` at the front of their index entry, which is wrong. I then
noticed that some targets had no index entry for `mcmodel=` or had
used `-mcmodel` incorrectly.

This fixes both of those and regenerates the .urls files so that the
`-mcmodel=` option now has a URL associated with it.

OK?

gcc/ChangeLog:

PR target/115022
* doc/invoke.texi (fstrub=disable): Fix opindex.
(minline-memops-threshold): Fix opindex.
(mcmodel=): Add opindex and fix them.
* common.opt.urls: Regenerate.
* config/aarch64/aarch64.opt.urls: Regenerate.
* config/bpf/bpf.opt.urls: Regenerate.
* config/i386/i386.opt.urls: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/nds32/nds32-elf.opt.urls: Regenerate.
* config/nds32/nds32-linux.opt.urls: Regenerate.
* config/or1k/or1k.opt.urls: Regenerate.
* config/riscv/riscv.opt.urls: Regenerate.
* config/rs6000/aix64.opt.urls: Regenerate.
* config/rs6000/linux64.opt.urls: Regenerate.
* config/sparc/sparc.opt.urls: Regenerate.

Signed-off-by: Andrew Pinski 
---
 gcc/common.opt.urls |  3 +++
 gcc/config/aarch64/aarch64.opt.urls |  3 ++-
 gcc/config/bpf/bpf.opt.urls |  3 +++
 gcc/config/i386/i386.opt.urls   |  3 ++-
 gcc/config/loongarch/loongarch.opt.urls |  2 +-
 gcc/config/nds32/nds32-elf.opt.urls |  2 +-
 gcc/config/nds32/nds32-linux.opt.urls   |  2 +-
 gcc/config/or1k/or1k.opt.urls   |  3 ++-
 gcc/config/riscv/riscv.opt.urls |  3 ++-
 gcc/config/rs6000/aix64.opt.urls|  3 ++-
 gcc/config/rs6000/linux64.opt.urls  |  3 ++-
 gcc/config/sparc/sparc.opt.urls |  2 +-
 gcc/doc/invoke.texi | 17 +++--
 13 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index 10462e40874..1f2eb67c8e0 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -1339,6 +1339,9 @@ UrlSuffix(gcc/Optimize-Options.html#index-fstrict-aliasing)
 fstrict-overflow
 UrlSuffix(gcc/Code-Gen-Options.html#index-fstrict-overflow)
 
+fstrub=disable
+UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003ddisable)
+
 fstrub=strict
 UrlSuffix(gcc/Instrumentation-Options.html#index-fstrub_003dstrict)
 
diff --git a/gcc/config/aarch64/aarch64.opt.urls b/gcc/config/aarch64/aarch64.opt.urls
index 993634c52f8..4fa90384378 100644
--- a/gcc/config/aarch64/aarch64.opt.urls
+++ b/gcc/config/aarch64/aarch64.opt.urls
@@ -18,7 +18,8 @@ UrlSuffix(gcc/AArch64-Options.html#index-mfix-cortex-a53-843419)
 mlittle-endian
 UrlSuffix(gcc/AArch64-Options.html#index-mlittle-endian)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/AArch64-Options.html#index-mcmodel_003d)
 
 mtp=
 UrlSuffix(gcc/AArch64-Options.html#index-mtp)
diff --git a/gcc/config/bpf/bpf.opt.urls b/gcc/config/bpf/bpf.opt.urls
index 8c1e5f86d5c..1e8873a899f 100644
--- a/gcc/config/bpf/bpf.opt.urls
+++ b/gcc/config/bpf/bpf.opt.urls
@@ -33,3 +33,6 @@ UrlSuffix(gcc/eBPF-Options.html#index-msmov)
 mcpu=
 UrlSuffix(gcc/eBPF-Options.html#index-mcpu-5)
 
+minline-memops-threshold=
+UrlSuffix(gcc/eBPF-Options.html#index-minline-memops-threshold)
+
diff --git a/gcc/config/i386/i386.opt.urls b/gcc/config/i386/i386.opt.urls
index 40e8a844936..9384b0b3187 100644
--- a/gcc/config/i386/i386.opt.urls
+++ b/gcc/config/i386/i386.opt.urls
@@ -40,7 +40,8 @@ UrlSuffix(gcc/x86-Options.html#index-march-16)
 mlarge-data-threshold=
 UrlSuffix(gcc/x86-Options.html#index-mlarge-data-threshold)
 
-; skipping UrlSuffix for 'mcmodel=' due to finding no URLs
+mcmodel=
+UrlSuffix(gcc/x86-Options.html#index-mcmodel_003d-7)
 
 mcpu=
 UrlSuffix(gcc/x86-Options.html#index-mcpu-14)
diff --git a/gcc/config/loongarch/loongarch.opt.urls b/gcc/config/loongarch/loongarch.opt.urls
index 9ed5d7b5596..f7545f65103 100644
--- a/gcc/config/loongarch/loongarch.opt.urls
+++ b/gcc/config/loongarch/loongarch.opt.urls
@@ -58,7 +58,7 @@ mrecip
 UrlSuffix(gcc/LoongArch-Options.html#index-mrecip)
 
 mcmodel=
-UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel)
+UrlSuffix(gcc/LoongArch-Options.html#index-mcmodel_003d-1)
 
 mdirect-extern-access
 UrlSuffix(gcc/LoongArch-Options.html#index-mdirect-extern-access)
diff --git a/gcc/config/nds32/nds32-elf.opt.urls b/gcc/config/nds32/nds32-elf.opt.urls
index 3ae1efe7312..e5432b62863 100644
--- a/gcc/config/nds32/nds32-elf.opt.urls
+++ b/gcc/config/nds32/nds32-elf.opt.urls
@@ -1,5 +1,5 @@
 ; Autogenerated by regenerate-opt-urls.py from gcc/config/nds32/nds32-elf.opt and generated HTML
 
 mcmodel=
-UrlSuffix(gcc/NDS32-Options.html#index-mcmodel-1)
+UrlSuffix(gcc/NDS32-Options.html#index-mcmodel_003d-2)
 
diff --git a/gcc/config/nds32/nds32-linux.opt.urls b/gcc/config/nds32/nds32-linux.opt.urls
index ac589ccd472..3986cf225ef 100644
--- a/gcc/config/nds32/nds32-linux.opt.urls
+++ b/gcc/config/nds32/nds

[PATCH] [libstdc++-v3] [rtems] enable filesystem support

2024-05-29 Thread Alexandre Oliva


mkdir, chdir and chmod functions are defined in librtemscpu, that
doesn't get linked in during libstdc++-v3 configure, but applications
use -qrtems for linking, which brings those symbols in, so it makes
sense to mark them as available so that the C++ filesystem APIs are
enabled.

Regstrapped on x86_64-linux-gnu, also tested on aarch64-rtems6 with
gcc-13.  Ok to install?


for  libstdc++-v3/ChangeLog

* configure.ac [*-*-rtems*]: Set chdir, chmod and mkdir as
available.
* configure: Rebuilt.
---
 libstdc++-v3/configure|7 +++
 libstdc++-v3/configure.ac |7 +++
 2 files changed, 14 insertions(+)

diff --git a/libstdc++-v3/configure b/libstdc++-v3/configure
index 5179cc507f129..a7d1c015906c2 100755
--- a/libstdc++-v3/configure
+++ b/libstdc++-v3/configure
@@ -28610,6 +28610,13 @@ _ACEOF
 
 $as_echo "#define HAVE_USLEEP 1" >>confdefs.h
 
+
+   # These functions are defined in librtempscpu.  We don't use
+   # -qrtems during configure, so we don't link that in, and fail
+   # to find them.
+   glibcxx_cv_chdir=yes
+   glibcxx_cv_chmod=yes
+   glibcxx_cv_mkdir=yes
 ;;
 esac
   elif test "x$with_headers" != "xno"; then
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 37396bd6ebbe6..0725c81bc9fa4 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -400,6 +400,13 @@ dnl # rather than hardcoding that information.
 AC_DEFINE(HAVE_SYMLINK)
 AC_DEFINE(HAVE_TRUNCATE)
 AC_DEFINE(HAVE_USLEEP)
+
+   # These functions are defined in librtempscpu.  We don't use
+   # -qrtems during configure, so we don't link that in, and fail
+   # to find them.
+   glibcxx_cv_chdir=yes
+   glibcxx_cv_chmod=yes
+   glibcxx_cv_mkdir=yes
 ;;
 esac
   elif test "x$with_headers" != "xno"; then


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive

