[PATCH] Speed up LC SSA rewrite
The following avoids collecting all loops exit blocks into bitmaps and computing the union of those up the loop tree possibly repeatedly. Instead we make sure to do this only once for each loop with a definition possibly requiring a LC phi node plus make sure to leverage recorded exits to avoid the intermediate bitmap allocation. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-ssa-loop-manip.cc (compute_live_loop_exits): Take the def loop exit block bitmap as argument instead of re-computing it here. (add_exit_phis_var): Adjust. (loop_name_cmp): New function. (add_exit_phis): Sort variables to insert LC PHI nodes after definition loop, for each definition loop compute the exit block bitmap once. (get_loops_exit): Remove. (rewrite_into_loop_closed_ssa_1): Do not pre-record all loop exit blocks into bitmaps. Record loop exits if required. --- gcc/tree-ssa-loop-manip.cc | 95 ++ 1 file changed, 56 insertions(+), 39 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index 9f3b62652ea..0324ff60a0f 100644 --- a/gcc/tree-ssa-loop-manip.cc +++ b/gcc/tree-ssa-loop-manip.cc @@ -183,12 +183,14 @@ find_sibling_superloop (class loop *use_loop, class loop *def_loop) /* DEF_BB is a basic block containing a DEF that needs rewriting into loop-closed SSA form. USE_BLOCKS is the set of basic blocks containing uses of DEF that "escape" from the loop containing DEF_BB (i.e. blocks in - USE_BLOCKS are dominated by DEF_BB but not in the loop father of DEF_B). + USE_BLOCKS are dominated by DEF_BB but not in the loop father of DEF_BB). ALL_EXITS[I] is the set of all basic blocks that exit loop I. + DEF_LOOP_EXITS is a bitmap of loop exit blocks that exit the loop + containing DEF_BB or its outer loops. - Compute the subset of LOOP_EXITS that exit the loop containing DEF_BB - or one of its loop fathers, in which DEF is live. This set is returned - in the bitmap LIVE_EXITS. 
+ Compute the subset of loop exit destinations that exit the loop + containing DEF_BB or one of its loop fathers, in which DEF is live. + This set is returned in the bitmap LIVE_EXITS. Instead of computing the complete livein set of the def, we use the loop nesting tree as a form of poor man's structure analysis. This greatly @@ -197,18 +199,17 @@ find_sibling_superloop (class loop *use_loop, class loop *def_loop) static void compute_live_loop_exits (bitmap live_exits, bitmap use_blocks, -bitmap *loop_exits, basic_block def_bb) +basic_block def_bb, bitmap def_loop_exits) { unsigned i; bitmap_iterator bi; class loop *def_loop = def_bb->loop_father; unsigned def_loop_depth = loop_depth (def_loop); - bitmap def_loop_exits; /* Normally the work list size is bounded by the number of basic blocks in the largest loop. We don't know this number, but we can be fairly sure that it will be relatively small. */ - auto_vec worklist (MAX (8, n_basic_blocks_for_fn (cfun) / 128)); + auto_vec worklist (MAX (8, n_basic_blocks_for_fn (cfun) / 128)); EXECUTE_IF_SET_IN_BITMAP (use_blocks, 0, i, bi) { @@ -272,13 +273,7 @@ compute_live_loop_exits (bitmap live_exits, bitmap use_blocks, } } - def_loop_exits = BITMAP_ALLOC (&loop_renamer_obstack); - for (class loop *loop = def_loop; - loop != current_loops->tree_root; - loop = loop_outer (loop)) -bitmap_ior_into (def_loop_exits, loop_exits[loop->num]); bitmap_and_into (live_exits, def_loop_exits); - BITMAP_FREE (def_loop_exits); } /* Add a loop-closing PHI for VAR in basic block EXIT. */ @@ -322,23 +317,33 @@ add_exit_phi (basic_block exit, tree var) Exits of the loops are stored in LOOP_EXITS. */ static void -add_exit_phis_var (tree var, bitmap use_blocks, bitmap *loop_exits) +add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) { unsigned index; bitmap_iterator bi; basic_block def_bb = gimple_bb (SSA_NAME_DEF_STMT (var)); - bitmap live_exits = BITMAP_ALLOC (&loop_renamer_obstack); gcc_checking_assert (! 
bitmap_bit_p (use_blocks, def_bb->index)); - compute_live_loop_exits (live_exits, use_blocks, loop_exits, def_bb); + auto_bitmap live_exits (&loop_renamer_obstack); + compute_live_loop_exits (live_exits, use_blocks, def_bb, def_loop_exits); EXECUTE_IF_SET_IN_BITMAP (live_exits, 0, index, bi) { add_exit_phi (BASIC_BLOCK_FOR_FN (cfun, index), var); } +} - BITMAP_FREE (live_exits); +static int +loop_name_cmp (const void *p1, const void *p2) +{ + auto l1 = (const std::pair *)p1; + auto l2 = (const std::pair *)p2; + if (l1->first < l2->first) +return -1; + else if (l1->first > l2->first) +return 1; + return 0; } /* Add exit phis for the names marked in NAMES_TO_RENAME. @@ -346,31 +35
Re: [PATCH]middle-end: don't lower past veclower [PR106063]
On Tue, 5 Jul 2022, Tamar Christina wrote:
> Hi All,
>
> My previous patch can cause a problem if the pattern matches after veclower
> as it may replace the construct with a vector sequence which the target may
> not directly support.
>
> As such don't perform the rewriting if after veclower.

Note that doing the rewriting before veclower to a variant not supported
by the target can cause veclower to generate abysmal code.  In some cases
we are very careful and try to at least preserve code supported by the
target over transforming that into a variant not supported.

That said, a better fix would be to check whether the target can perform
the new comparison.  Before veclower it would be OK to do the transform
nevertheless in case it cannot do the original transform.

Richard.

> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
>
> Ok for master? and backport to GCC 12?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/106063
> 	* match.pd: Do not apply pattern after veclower.
>
> gcc/testsuite/ChangeLog:
>
> 	PR tree-optimization/106063
> 	* gcc.dg/pr106063.c: New test.
> > --- inline copy of patch -- > diff --git a/gcc/match.pd b/gcc/match.pd > index > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc377d7d0deaacbcf > 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) >(simplify > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > + && optimize_vectors_before_lowering_p ()) >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > (icmp @0 { csts; }) > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > diff --git a/gcc/testsuite/gcc.dg/pr106063.c b/gcc/testsuite/gcc.dg/pr106063.c > new file mode 100644 > index > ..b23596724f6bb98c53af2dce77d31509bab10378 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/pr106063.c > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > +typedef __int128 __attribute__((__vector_size__ (16))) V; > + > +V > +foo (V v) > +{ > + return (v & (V){15}) == v; > +} > > > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end: don't lower past veclower [PR106063]
> -Original Message- > From: Richard Biener > Sent: Thursday, July 7, 2022 8:19 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd > Subject: Re: [PATCH]middle-end: don't lower past veclower [PR106063] > > On Tue, 5 Jul 2022, Tamar Christina wrote: > > > Hi All, > > > > My previous patch can cause a problem if the pattern matches after > > veclower as it may replace the construct with a vector sequence which > > the target may not directly support. > > > > As such don't perform the rewriting if after veclower. > > Note that when doing the rewriting before veclower to a variant not > supported by the target can cause veclower to generate absymal code. In > some cases we are very careful and try to at least preserve code supported > by the target over transforming that into a variant not supported. > > That said, a better fix would be to check whether the target can perform the > new comparison. Before veclower it would be OK to do the transform > nevertheless in case it cannot do the original transform. This last statement is somewhat confusing. Did you want me to change it such that before veclower the rewrite is always done and after veclowering only if the target supports it? Or did you want me to never do the rewrite if the target doesn't support it? Thanks, Tamar > > Richard. > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > and no issues. > > > > Ok for master? and backport to GCC 12? > > > > Thanks, > > Tamar > > > > > > gcc/ChangeLog: > > > > PR tree-optimization/106063 > > * match.pd: Do not apply pattern after veclower. > > > > gcc/testsuite/ChangeLog: > > > > PR tree-optimization/106063 > > * gcc.dg/pr106063.c: New test. 
> > > > --- inline copy of patch -- > > diff --git a/gcc/match.pd b/gcc/match.pd index > > > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc > 377 > > d7d0deaacbcf 100644 > > --- a/gcc/match.pd > > +++ b/gcc/match.pd > > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > >(simplify > > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > > + && optimize_vectors_before_lowering_p ()) > >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > > (icmp @0 { csts; }) > > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > > diff --git a/gcc/testsuite/gcc.dg/pr106063.c > > b/gcc/testsuite/gcc.dg/pr106063.c new file mode 100644 index > > > ..b23596724f6bb98c53af2dce77 > d3 > > 1509bab10378 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/pr106063.c > > @@ -0,0 +1,9 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > > +typedef __int128 __attribute__((__vector_size__ (16))) V; > > + > > +V > > +foo (V v) > > +{ > > + return (v & (V){15}) == v; > > +} > > > > > > > > > > > > -- > Richard Biener > SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end: don't lower past veclower [PR106063]
On Thu, 7 Jul 2022, Tamar Christina wrote: > > -Original Message- > > From: Richard Biener > > Sent: Thursday, July 7, 2022 8:19 AM > > To: Tamar Christina > > Cc: gcc-patches@gcc.gnu.org; nd > > Subject: Re: [PATCH]middle-end: don't lower past veclower [PR106063] > > > > On Tue, 5 Jul 2022, Tamar Christina wrote: > > > > > Hi All, > > > > > > My previous patch can cause a problem if the pattern matches after > > > veclower as it may replace the construct with a vector sequence which > > > the target may not directly support. > > > > > > As such don't perform the rewriting if after veclower. > > > > Note that when doing the rewriting before veclower to a variant not > > supported by the target can cause veclower to generate absymal code. In > > some cases we are very careful and try to at least preserve code supported > > by the target over transforming that into a variant not supported. > > > > That said, a better fix would be to check whether the target can perform the > > new comparison. Before veclower it would be OK to do the transform > > nevertheless in case it cannot do the original transform. > > This last statement is somewhat confusing. Did you want me to change it such > that > before veclower the rewrite is always done and after veclowering only if the > target > supports it? > > Or did you want me to never do the rewrite if the target doesn't support it? I meant before veclower you can do the rewrite if either the rewriting result is supported by the target OR if the original expression is _not_ supported by the target. The latter case might be not too important to worry doing (it would still canonicalize for those targets then). After veclower you can only rewrite under the former condition. Richard. > Thanks, > Tamar > > > > > Richard. > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > > > and no issues. > > > > > > Ok for master? and backport to GCC 12? 
> > > > > > Thanks, > > > Tamar > > > > > > > > > gcc/ChangeLog: > > > > > > PR tree-optimization/106063 > > > * match.pd: Do not apply pattern after veclower. > > > > > > gcc/testsuite/ChangeLog: > > > > > > PR tree-optimization/106063 > > > * gcc.dg/pr106063.c: New test. > > > > > > --- inline copy of patch -- > > > diff --git a/gcc/match.pd b/gcc/match.pd index > > > > > 40c09bedadb89dabb6622559a8f69df5384e61fd..ba161892a98756c0278dc40fc > > 377 > > > d7d0deaacbcf 100644 > > > --- a/gcc/match.pd > > > +++ b/gcc/match.pd > > > @@ -6040,7 +6040,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > > >(simplify > > > (cmp (bit_and:c@2 @0 cst@1) integer_zerop) > > > (with { tree csts = bitmask_inv_cst_vector_p (@1); } > > > - (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) > > > + (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2)) > > > + && optimize_vectors_before_lowering_p ()) > > >(if (TYPE_UNSIGNED (TREE_TYPE (@1))) > > > (icmp @0 { csts; }) > > > (with { tree utype = unsigned_type_for (TREE_TYPE (@1)); } > > > diff --git a/gcc/testsuite/gcc.dg/pr106063.c > > > b/gcc/testsuite/gcc.dg/pr106063.c new file mode 100644 index > > > > > ..b23596724f6bb98c53af2dce77 > > d3 > > > 1509bab10378 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/pr106063.c > > > @@ -0,0 +1,9 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-options "-O2 -fno-tree-forwprop --disable-tree-evrp" } */ > > > +typedef __int128 __attribute__((__vector_size__ (16))) V; > > > + > > > +V > > > +foo (V v) > > > +{ > > > + return (v & (V){15}) == v; > > > +} > > > > > > > > > > > > > > > > > > > -- > > Richard Biener > > SUSE Software Solutions Germany GmbH, Frankenstra > -- Richard Biener SUSE Software Solutions Germany GmbH, Frankenstra
RE: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.
> -Original Message- > From: Andrew Pinski > Sent: Wednesday, July 6, 2022 8:37 PM > To: Tamar Christina > Cc: Richard Biener ; nd ; gcc- > patc...@gcc.gnu.org > Subject: Re: [PATCH]middle-end simplify complex if expressions where > comparisons are inverse of one another. > > On Wed, Jul 6, 2022 at 9:06 AM Tamar Christina > wrote: > > > > > -Original Message- > > > From: Andrew Pinski > > > Sent: Wednesday, July 6, 2022 3:10 AM > > > To: Tamar Christina > > > Cc: Richard Biener ; nd ; gcc- > > > patc...@gcc.gnu.org > > > Subject: Re: [PATCH]middle-end simplify complex if expressions where > > > comparisons are inverse of one another. > > > > > > On Tue, Jul 5, 2022 at 8:16 AM Tamar Christina via Gcc-patches > > patc...@gcc.gnu.org> wrote: > > > > > > > > > > > > > > > > > -Original Message- > > > > > From: Richard Biener > > > > > Sent: Monday, June 20, 2022 9:57 AM > > > > > To: Tamar Christina > > > > > Cc: gcc-patches@gcc.gnu.org; nd > > > > > Subject: Re: [PATCH]middle-end simplify complex if expressions > > > > > where comparisons are inverse of one another. > > > > > > > > > > On Thu, 16 Jun 2022, Tamar Christina wrote: > > > > > > > > > > > Hi All, > > > > > > > > > > > > This optimizes the following sequence > > > > > > > > > > > > ((a < b) & c) | ((a >= b) & d) > > > > > > > > > > > > into > > > > > > > > > > > > (a < b ? c : d) & 1 > > > > > > > > > > > > for scalar. On vector we can omit the & 1. > > > > > > > > > > > > This changes the code generation from > > > > > > > > > > > > zoo2: > > > > > > cmp w0, w1 > > > > > > csetw0, lt > > > > > > csetw1, ge > > > > > > and w0, w0, w2 > > > > > > and w1, w1, w3 > > > > > > orr w0, w0, w1 > > > > > > ret > > > > > > > > > > > > into > > > > > > > > > > > > cmp w0, w1 > > > > > > cselw0, w2, w3, lt > > > > > > and w0, w0, 1 > > > > > > ret > > > > > > > > > > > > and significantly reduces the number of selects we have to do > > > > > > in the vector code. 
> > > > > > > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > > > > x86_64-pc-linux-gnu and no issues. > > > > > > > > > > > > Ok for master? > > > > > > > > > > > > Thanks, > > > > > > Tamar > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * fold-const.cc (inverse_conditions_p): Traverse if SSA_NAME. > > > > > > * match.pd: Add new rule. > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > * gcc.target/aarch64/if-compare_1.c: New test. > > > > > > * gcc.target/aarch64/if-compare_2.c: New test. > > > > > > > > > > > > --- inline copy of patch -- > > > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc index > > > > > > > > > > > > > > > 39a5a52958d87497f301826e706886b290771a2d..f180599b90150acd3ed895a64 > > > > > 280 > > > > > > aa3255061256 100644 > > > > > > --- a/gcc/fold-const.cc > > > > > > +++ b/gcc/fold-const.cc > > > > > > @@ -2833,15 +2833,38 @@ compcode_to_comparison (enum > > > > > comparison_code > > > > > > code) bool inverse_conditions_p (const_tree cond1, > > > > > > const_tree > > > > > > cond2) { > > > > > > - return (COMPARISON_CLASS_P (cond1) > > > > > > - && COMPARISON_CLASS_P (cond2) > > > > > > - && (invert_tree_comparison > > > > > > - (TREE_CODE (cond1), > > > > > > - HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE > > > > > (cond2)) > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 0), > > > > > > - TREE_OPERAND (cond2, 0), 0) > > > > > > - && operand_equal_p (TREE_OPERAND (cond1, 1), > > > > > > - TREE_OPERAND (cond2, 1), 0)); > > > > > > + if (COMPARISON_CLASS_P (cond1) > > > > > > + && COMPARISON_CLASS_P (cond2) > > > > > > + && (invert_tree_comparison > > > > > > + (TREE_CODE (cond1), > > > > > > + HONOR_NANS (TREE_OPERAND (cond1, 0))) == TREE_CODE > > > > > (cond2)) > > > > > > + && operand_equal_p (TREE_OPERAND (cond1, 0), > > > > > > + TREE_OPERAND (cond2, 0), 0) > > > > > > + && operand_equal_p (TREE_OPERAND (cond1, 1), > > > > > > + TREE_OPERAND (cond2, 1), 0)) > > > 
> > > +return true; > > > > > > + > > > > > > + if (TREE_CODE (cond1) == SSA_NAME > > > > > > + && TREE_CODE (cond2) == SSA_NAME) > > > > > > +{ > > > > > > + gimple *gcond1 = SSA_NAME_DEF_STMT (cond1); > > > > > > + gimple *gcond2 = SSA_NAME_DEF_STMT (cond2); > > > > > > + if (!is_gimple_assign (gcond1) || !is_gimple_assign (gcond2)) > > > > > > + return false; > > > > > > + > > > > > > + tree_code code1 = gimple_assign_rhs_code (gcond1); > > > > > > + tree_code code2 = gimple_assign_rhs_code (gcond2); > > > > > > + return TREE_CODE_CLASS (code1) == tcc_comparison > > > > > > +&& TREE_CODE_CLASS (code2) == tcc_comparison > > > > > > +&& invert_tree_comparison (code1, > > > > > > +
Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size
On Wed, Jul 6, 2022 at 4:20 PM Qing Zhao wrote: > > (Sorry for the late reply, just came back from a short vacation.) > > > On Jul 4, 2022, at 2:49 AM, Richard Biener > > wrote: > > > > On Fri, Jul 1, 2022 at 5:32 PM Martin Sebor wrote: > >> > >> On 7/1/22 08:01, Qing Zhao wrote: > >>> > >>> > On Jul 1, 2022, at 8:59 AM, Jakub Jelinek wrote: > > On Fri, Jul 01, 2022 at 12:55:08PM +, Qing Zhao wrote: > > If so, comparing to the current implemenation to have all the checking > > in middle-end, what’s the > > major benefit of moving part of the checking into FE, and leaving the > > other part in middle-end? > > The point is recording early what FIELD_DECLs could be vs. can't > possibly be > treated like flexible array members and just use that flag in the > decisions > in the current routines in addition to what it is doing. > >>> > >>> Okay. > >>> > >>> Based on the discussion so far, I will do the following: > >>> > >>> 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > >>> 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL > >>> based on [0], [1], > >>> [] and the option -fstrict-flex-array, and whether it’s the last > >>> field of the DECL_CONTEXT. > >>> 3. In Middle end, Add a new utility routine is_flexible_array_member_p, > >>> which bases on > >>> DECL_NOT_FLEXARRAY + array_at_struct_end_p to decide whether the array > >>> reference is a real flexible array member reference. > > > > I would just update all existing users, not introduce another wrapper > > that takes DECL_NOT_FLEXARRAY > > into account additionally. > > Okay. > > > >>> > >>> > >>> Middle end currently is quite mess, array_at_struct_end_p, > >>> component_ref_size, and all the phases that > >>> use these routines need to be updated, + new testing cases for each of > >>> the phases. 
> >>> > >>> > >>> So, I still plan to separate the patch set into 2 parts: > >>> > >>> Part A:the above 1 + 2 + 3, and use these new utilities in > >>> tree-object-size.cc to resolve PR101836 first. > >>> Then kernel can use __FORTIFY_SOURCE correctly; > >>> > >>> Part B:update all other phases with the new utilities + new testing > >>> cases + resolving regressions. > >>> > >>> Let me know if you have any comment and suggestion. > >> > >> It might be worth considering whether it should be possible to control > >> the "flexible array" property separately for each trailing array member > >> via either a #pragma or an attribute in headers that can't change > >> the struct layout but that need to be usable in programs compiled with > >> stricter -fstrict-flex-array=N settings. > > > > Or an decl attribute. > > Yes, it might be necessary to add a corresponding decl attribute > > strict_flex_array (N) > > Which is attached to a trailing structure array member to provide the user a > finer control when -fstrict-flex-array=N is specified. > > So, I will do the following: > > > *User interface: > > 1. command line option: > -fstrict-flex-array=N (N=0, 1, 2, 3) > 2. decl attribute: > strict_flex_array (N) (N=0, 1, 2, 3) > > > *Implementation: > > 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL based > on [0], [1], > [], the option -fstrict-flex-array, the attribute strict_flex_array, > and whether it’s the last field > of the DECL_CONTEXT. > 3. In Middle end, update all users of “array_at_struct_end_p" or > “component_ref_size”, or any place that treats > Trailing array as flexible array member with the new flag > DECL_NOT_FLEXARRAY. > (Still think we need a new consistent utility routine here). > > > I still plan to separate the patch set into 2 parts: > > Part A:the above 1 + 2 + 3, and use these new utilities in > tree-object-size.cc to resolve PR101836 first. 
>Then kernel can use __FORTIFY_SOURCE correctly. > Part B:update all other phases with the new utilities + new testing cases > + resolving regressions. > > > Let me know any more comment or suggestion. Sounds good. Part 3. is "optimization" and reasonable to do separately, I'm not sure you need 'B' (since we're not supposed to have new utilities), but instead I'd do '3.' as part of 'B', just changing the pieces th resolve PR101836 for part 'A'. Richard. > Thanks a lot. > > Qing > >
[PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify the calculation of fp_sp_offset and gp_sp_offset: when frame->mask or frame->fmask is zero, do not subtract UNITS_PER_WORD or UNITS_PER_FP_REG. --- gcc/config/loongarch/loongarch.cc | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ -- 2.31.1
[PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
Hi, This patch modifies the combine pattern after recog fails. With a helper, change_pseudo_and_mask, it converts a single pseudo into the pseudo ANDed with a mask when the outer operator is IOR/XOR/PLUS and the inner operator is ASHIFT or AND. The conversion helps the pattern match rotate-and-mask insns on some targets. For test case rlwimi-2.c, current trunk fails on "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch reduces the total number of insns from 21305 to 21279. Bootstrapped and tested on powerpc64-linux BE and LE with no regressions. Is this okay for trunk? Any recommendations? Thanks a lot. ChangeLog 2022-07-07 Haochen Gui gcc/ PR target/93453 * combine.cc (change_pseudo_and_mask): New. (recog_for_combine): If recog fails, try again with the pattern modified by change_pseudo_and_mask. * config/rs6000/rs6000.md (plus_ior_xor): Removed. (anonymous split pattern for plus_ior_xor): Removed. gcc/testsuite/ PR target/93453 * gcc.target/powerpc/20050603-3.c: Modify dump check conditions. * gcc.target/powerpc/rlwimi-2.c: Likewise. * gcc.target/powerpc/pr93453-2.c: New. patch.diff diff --git a/gcc/combine.cc b/gcc/combine.cc index a5fabf397f7..3cd7b2b652b 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -11599,6 +11599,47 @@ change_zero_ext (rtx pat) return changed; } +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is + ASHIFT/AND, convert a pseudo to pseudo AND with a mask if its nonzero_bits + is less than its mode mask. nonzero_bits in other passes doesn't return + the same value as it does in the combine pass. 
*/ +static bool +change_pseudo_and_mask (rtx pat) +{ + rtx src = SET_SRC (pat); + if ((GET_CODE (src) == IOR + || GET_CODE (src) == XOR + || GET_CODE (src) == PLUS) + && (((GET_CODE (XEXP (src, 0)) == ASHIFT + || GET_CODE (XEXP (src, 0)) == AND) + && REG_P (XEXP (src, 1) +{ + rtx *reg = &XEXP (SET_SRC (pat), 1); + machine_mode mode = GET_MODE (*reg); + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); + if (nonzero < GET_MODE_MASK (mode)) + { + int shift; + + if (GET_CODE (XEXP (src, 0)) == ASHIFT) + shift = INTVAL (XEXP (XEXP (src, 0), 1)); + else + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); + + if (shift > 0 + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) + { + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); + SUBST (*reg, x); + maybe_swap_commutative_operands (SET_SRC (pat)); + return true; + } + } + } + return false; +} + /* Like recog, but we receive the address of a pointer to a new pattern. We try to match the rtx that the pointer points to. 
If that fails, we may try to modify or replace the pattern, @@ -11646,7 +11687,10 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) } } else - changed = change_zero_ext (pat); + { + changed = change_pseudo_and_mask (pat); + changed |= change_zero_ext (pat); + } } else if (GET_CODE (pat) == PARALLEL) { diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 1367a2cb779..2bd6bd5f908 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -4207,24 +4207,6 @@ (define_insn_and_split "*rotl3_insert_3_" (ior:GPR (and:GPR (match_dup 3) (match_dup 4)) (ashift:GPR (match_dup 1) (match_dup 2]) -(define_code_iterator plus_ior_xor [plus ior xor]) - -(define_split - [(set (match_operand:GPR 0 "gpc_reg_operand") - (plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand") - (match_operand:SI 2 "const_int_operand")) - (match_operand:GPR 3 "gpc_reg_operand")))] - "nonzero_bits (operands[3], mode) - < HOST_WIDE_INT_1U << INTVAL (operands[2])" - [(set (match_dup 0) - (ior:GPR (and:GPR (match_dup 3) - (match_dup 4)) -(ashift:GPR (match_dup 1) -(match_dup 2] -{ - operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1); -}) - (define_insn "*rotlsi3_insert_4" [(set (match_operand:SI 0 "gpc_reg_operand" "=r") (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0") diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c index 4017d34f429..e628be11532 100644 --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c @@ -12,7 +12,7 @@ void rotins (unsigned int x) b.y = (x<<12) | (x>>20); } -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ +/* { dg-final { scan-ass
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero.
Hi, On 2022/7/7 16:04, Lulu Cheng wrote: gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify fp_sp_offset and gp_sp_offset's calculation method, when frame->mask or frame->fmask is zero, don't minus UNITS_PER_WORD or UNITS_PER_FP_REG. IMO it's better to also state which problem this change is meant to solve (i.e. your intent), better yet, with an appropriate bugzilla link. --- gcc/config/loongarch/loongarch.cc | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */
Adjust 'libgomp.c-c++-common/requires-3.c' (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi! In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3-aux.c > @@ -0,0 +1,11 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires unified_address > + > +int x; > + > +void foo (void) > +{ > + #pragma omp target > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c > @@ -0,0 +1,24 @@ > +/* { dg-do link { target offloading_enabled } } */ Not expected to see 'offloading_enabled' here... > +/* { dg-additional-sources requires-3-aux.c } */ > + > +/* Check diagnostic by device-compiler's lto1. ..., because of this note ^. > + Other file uses: 'requires unified_address'. */ > + > +#pragma omp requires unified_address,unified_shared_memory > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > + > +/* { dg-error "OpenMP 'requires' directive with non-identical clauses in > multiple compilation units: 'unified_address, unified_shared_memory' vs. > 'unified_address'" "" { target *-*-* } 0 } */ > +/* { dg-excess-errors "Ignore messages like: errors during merging of > translation units|mkoffload returned 1 exit status" } */ OK to push the attached "Adjust 'libgomp.c-c++-common/requires-3.c'"? Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From 6a4031b351680bdbfe3cdb9ac4e4a3aa59e4ca84 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 7 Jul 2022 09:59:45 +0200 Subject: [PATCH] Adjust 'libgomp.c-c++-common/requires-3.c' As documented, this one does "Check diagnostic by device-compiler's lto1". 
Indeed there are none when compiling with '-foffload=disable' with an offloading-enabled compiler, so we should use 'offload_target_[...]', as used in other similar test cases. Follow-up to recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". libgomp/ * testsuite/libgomp.c-c++-common/requires-3.c: Adjust. --- libgomp/testsuite/libgomp.c-c++-common/requires-3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c index 4b07ffdd09b..7091f400ef0 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -1,4 +1,4 @@ -/* { dg-do link { target offloading_enabled } } */ +/* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */ /* { dg-additional-sources requires-3-aux.c } */ /* Check diagnostic by device-compiler's lto1. -- 2.35.1
Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi! In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4-aux.c > @@ -0,0 +1,13 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires reverse_offload > + > +/* Note: The file does not have neither of: > + declare target directives, device constructs or device routines. */ > + > +int x; > + > +void foo (void) > +{ > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c > @@ -0,0 +1,23 @@ > +/* { dg-do link { target offloading_enabled } } */ > +/* { dg-additional-options "-flto" } */ > +/* { dg-additional-sources requires-4-aux.c } */ > + > +/* Check diagnostic by device-compiler's or host compiler's lto1. > + Other file uses: 'requires reverse_offload', but that's inactive as > + there are no declare target directives, device constructs nor device > routines */ > + > +#pragma omp requires unified_address,unified_shared_memory > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5-aux.c > @@ -0,0 +1,11 @@ > +/* { dg-skip-if "" { *-*-* } } */ > + > +#pragma omp requires unified_shared_memory, unified_address, reverse_offload > + > +int x; > + > +void foo (void) > +{ > + #pragma omp target > + x = 1; > +} > --- /dev/null > +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c > @@ -0,0 +1,20 @@ > +/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } > } */ > +/* { dg-additional-sources requires-5-aux.c } */ > + > +#pragma omp requires unified_shared_memory, unified_address, reverse_offload > + > +int a[10]; > +extern void foo (void); > + > +int > +main (void) > +{ > + #pragma omp target > + for (int i = 0; i < 10; i++) > +a[i] = 0; > + > + foo (); > + return 0; > +} > + > +/* { dg-output "devices 
present but 'omp requires unified_address, > unified_shared_memory, reverse_offload' cannot be fulfilled" } */ (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.) OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing"? Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From ae14ccbd050d0b49073d5ea09de3e2af63f8c674 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Thu, 7 Jul 2022 09:45:42 +0200 Subject: [PATCH] Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing These should compile and link and execute in all configurations; host-fallback execution, which we may actually verify. Follow-up to recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". libgomp/ * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing. * testsuite/libgomp.c-c++-common/requires-5.c: Likewise. --- .../libgomp.c-c++-common/requires-4.c | 17 - .../libgomp.c-c++-common/requires-5.c | 18 +++--- 2 files changed, 23 insertions(+), 12 deletions(-) diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c index 128fdbb8463..deb04368108 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c @@ -1,22 +1,29 @@ -/* { dg-do link { target offloading_enabled } } */ /* { dg-additional-options "-flto" } */ /* { dg-additional-sources requires-4-aux.c } */ -/* Check diagnostic by device-compiler's or host compiler's lto1. +/* Check no diagnostic by device-compiler's or host compiler's lto1. 
Other file uses: 'requires reverse_offload', but that's inactive as there are no declare target directives, device constructs nor device routines */ +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1: + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" + and does host-fallback execution. */ + #pragma omp requires unified_address,unified_shared_memory -int a[10]; +int a[10] = { 0 }; extern void foo (void); int main (void) { - #pragma omp target + #pragma omp target map(to: a) + for (int i = 0; i < 10; i++) +a[i] = i; + for (int i = 0; i < 10; i++) -a[i] = 0; +if (a[i] != i) + __builtin_abort (); foo (); return 0; diff --git a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c index c1e5540cfc5..68816314b94 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c +++ b/libgomp/testsu
[PATCH] Speed up LC SSA rewrite more
In many cases loops have only one exit or a variable is only live across one of the exits. In this case we know that all uses outside of the loop will be dominated by the single LC PHI node we insert. If that holds for all variables requiring LC SSA PHIs then we can simplify the update_ssa process, avoiding the (iterated) dominance frontier computations. Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed. * tree-ssa-loop-manip.cc (add_exit_phis_var): Return the number of LC PHIs inserted. (add_exit_phis): Return whether any variable required multiple LC PHI nodes. (rewrite_into_loop_closed_ssa_1): Use TODO_update_ssa_no_phi when possible. --- gcc/tree-ssa-loop-manip.cc | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/gcc/tree-ssa-loop-manip.cc b/gcc/tree-ssa-loop-manip.cc index 0324ff60a0f..c531f1f12fd 100644 --- a/gcc/tree-ssa-loop-manip.cc +++ b/gcc/tree-ssa-loop-manip.cc @@ -314,9 +314,10 @@ add_exit_phi (basic_block exit, tree var) } /* Add exit phis for VAR that is used in LIVEIN. - Exits of the loops are stored in LOOP_EXITS. */ + Exits of the loops are stored in LOOP_EXITS. Returns the number + of PHIs added for VAR. */ -static void +static unsigned add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) { unsigned index; @@ -328,10 +329,13 @@ add_exit_phis_var (tree var, bitmap use_blocks, bitmap def_loop_exits) auto_bitmap live_exits (&loop_renamer_obstack); compute_live_loop_exits (live_exits, use_blocks, def_bb, def_loop_exits); + unsigned cnt = 0; EXECUTE_IF_SET_IN_BITMAP (live_exits, 0, index, bi) { add_exit_phi (BASIC_BLOCK_FOR_FN (cfun, index), var); + cnt++; } + return cnt; } static int @@ -348,13 +352,15 @@ loop_name_cmp (const void *p1, const void *p2) /* Add exit phis for the names marked in NAMES_TO_RENAME. Exits of the loops are stored in EXITS. Sets of blocks where the ssa - names are used are stored in USE_BLOCKS. */ + names are used are stored in USE_BLOCKS. 
Returns whether any name + required multiple LC PHI nodes. */ -static void +static bool add_exit_phis (bitmap names_to_rename, bitmap *use_blocks) { unsigned i; bitmap_iterator bi; + bool multiple_p = false; /* Sort names_to_rename after definition loop so we can avoid re-computing def_loop_exits. */ @@ -381,9 +387,12 @@ add_exit_phis (bitmap names_to_rename, bitmap *use_blocks) for (auto exit = loop->exits->next; exit->e; exit = exit->next) bitmap_set_bit (def_loop_exits, exit->e->dest->index); } - add_exit_phis_var (ssa_name (p.second), use_blocks[p.second], -def_loop_exits); + if (add_exit_phis_var (ssa_name (p.second), use_blocks[p.second], +def_loop_exits) > 1) + multiple_p = true; } + + return multiple_p; } /* For USE in BB, if it is used outside of the loop it is defined in, @@ -588,15 +597,18 @@ rewrite_into_loop_closed_ssa_1 (bitmap changed_bbs, unsigned update_flag, } /* Add the PHI nodes on exits of the loops for the names we need to -rewrite. */ - add_exit_phis (names_to_rename, use_blocks); +rewrite. When no variable required multiple LC PHI nodes to be +inserted then we know that all uses outside of the loop are +dominated by the single LC SSA definition and no further PHI +node insertions are required. */ + bool need_phis_p = add_exit_phis (names_to_rename, use_blocks); if (release_recorded_exits_p) release_recorded_exits (cfun); /* Fix up all the names found to be used outside their original loops. */ - update_ssa (TODO_update_ssa); + update_ssa (need_phis_p ? TODO_update_ssa : TODO_update_ssa_no_phi); } bitmap_obstack_release (&loop_renamer_obstack); -- 2.35.3
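The key observation of the patch — a single LC PHI dominates every use outside the loop — can be illustrated with a small C function (illustration only, not GCC-internal code; the GIMPLE names in the comments are hypothetical):

```c
/* SUM is defined inside the loop and used after it.  With a single
   loop exit, loop-closed SSA inserts exactly one LC PHI node on the
   exit edge (conceptually: sum_lcssa = PHI <sum>), and every use
   below the loop is dominated by that single definition -- which is
   what lets rewrite_into_loop_closed_ssa_1 pass
   TODO_update_ssa_no_phi and skip the iterated dominance-frontier
   computation.  */
int
sum_then_double (int n)
{
  int sum = 0;
  for (int i = 1; i <= n; i++)   /* single exit: i > n */
    sum += i;
  /* In LC SSA, all uses here refer to the one sum_lcssa PHI.  */
  return sum * 2;
}
```

When a variable is live across several exits, more than one LC PHI is needed and the full `TODO_update_ssa` path is taken, as the patch's `multiple_p` flag tracks.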
[PATCH] target/106219 - properly mark builtins pure via ix86_add_new_builtins
The target optimize pragma path to initialize extra target specific builtins missed handling of the pure_p flag which in turn causes extra clobber side-effects of gather builtins leading to unexpected issues downhill. Bootstrap and regtest running on x86_64-unknown-linux-gnu, will push as obvious if that succeeds. * config/i386/i386-builtins.cc (ix86_add_new_builtins): Properly set DECL_PURE_P. * g++.dg/pr106219.C: New testcase. --- gcc/config/i386/i386-builtins.cc | 2 ++ gcc/testsuite/g++.dg/pr106219.C | 31 +++ 2 files changed, 33 insertions(+) create mode 100644 gcc/testsuite/g++.dg/pr106219.C diff --git a/gcc/config/i386/i386-builtins.cc b/gcc/config/i386/i386-builtins.cc index 96743e6122d..fe7243c3837 100644 --- a/gcc/config/i386/i386-builtins.cc +++ b/gcc/config/i386/i386-builtins.cc @@ -385,6 +385,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2) ix86_builtins[i] = decl; if (ix86_builtins_isa[i].const_p) TREE_READONLY (decl) = 1; + if (ix86_builtins_isa[i].pure_p) + DECL_PURE_P (decl) = 1; } } diff --git a/gcc/testsuite/g++.dg/pr106219.C b/gcc/testsuite/g++.dg/pr106219.C new file mode 100644 index 000..3cad1507d5f --- /dev/null +++ b/gcc/testsuite/g++.dg/pr106219.C @@ -0,0 +1,31 @@ +// { dg-do compile } +// { dg-options "-O3" } +// { dg-additional-options "-march=bdver2" { target x86_64-*-* } } + +int max(int __b) { + if (0 < __b) +return __b; + return 0; +} +struct Plane { + Plane(int, int); + int *Row(); +}; +#ifdef __x86_64__ +#pragma GCC target "sse2,ssse3,avx,avx2" +#endif +float *ConvolveXSampleAndTranspose_rowp; +int ConvolveXSampleAndTranspose_res, ConvolveXSampleAndTranspose_r; +void ConvolveXSampleAndTranspose() { + Plane out(0, ConvolveXSampleAndTranspose_res); + for (int y;;) { +float sum; +for (int i = ConvolveXSampleAndTranspose_r; i; ++i) + sum += i; +for (; ConvolveXSampleAndTranspose_r; ++ConvolveXSampleAndTranspose_r) + sum += + ConvolveXSampleAndTranspose_rowp[max(ConvolveXSampleAndTranspose_r)] * + 
ConvolveXSampleAndTranspose_r; +out.Row()[y] = sum; + } +} -- 2.35.3
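The distinction the fix restores can be sketched outside GCC's internals: `TREE_READONLY` corresponds to the source-level `__attribute__((const))`, and `DECL_PURE_P` to `__attribute__((pure))`. A minimal illustration (not GCC code; `gather_like` is a hypothetical stand-in for a gather builtin):

```c
/* - const: result depends only on the arguments, no memory access.
   - pure:  may READ global memory, but writes nothing -- exactly the
     contract of the gather builtins.  If neither flag is set, the
     optimizers must assume the call clobbers memory, which is the
     unwanted side effect the patch removes.  */
static int table[4] = { 1, 2, 3, 4 };

__attribute__((const)) int
square (int x)
{
  return x * x;
}

__attribute__((pure)) int
gather_like (int i)
{
  return table[i & 3];   /* reads memory, writes nothing */
}
```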
[PATCH 0/2] RISC-V: Support IEEE half precision operation
This patch set implements _Float16 both for soft-float and hard-float (zfh/zfhmin). _Float16 has been part of the RISC-V psABI[1] since Jul 2021, and the zfh/zfhmin extensions were ratified in 2022[2].

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/172
[2] https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768
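As a quick illustration of the IEEE binary16 format the series targets (illustration only, not code from the patches): 1 sign bit, 5 exponent bits with bias 15, 10 fraction bits; normal values decode as (-1)^s * 2^(e-15) * (1 + f/1024).

```c
#include <stdint.h>

/* Decode a binary16 bit pattern to float.  Infinity/NaN (exp == 31)
   are omitted for brevity; the scale loops compute 2^(exp-15) without
   needing libm.  */
float
half_to_float (uint16_t h)
{
  int sign = (h >> 15) & 1;
  int exp  = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;

  float mant, scale = 1.0f;
  if (exp == 0)                         /* subnormal: 2^-14 * f/1024 */
    {
      mant = frac / 1024.0f;
      exp = 1;
    }
  else                                  /* normal: implicit leading 1 */
    mant = 1.0f + frac / 1024.0f;
  for (int e = exp; e < 15; e++)
    scale /= 2.0f;
  for (int e = exp; e > 15; e--)
    scale *= 2.0f;
  return (sign ? -1.0f : 1.0f) * mant * scale;
}
```

For example, 0x3C00 is 1.0 and 0x4000 is 2.0; the soft-float routines the patches wire up (`__extendhfsf2` etc.) implement conversions of exactly this representation.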
[PATCH 1/2] RISC-V: Support _Float16 type.
RISC-V decided to use _Float16 as its primary IEEE half-precision type, and this has already become part of the psABI. This patch adds the following support for _Float16:
- Soft-float support for _Float16.
- Make sure _Float16 is available in C++ mode.
- Name mangling for _Float16 in C++ mode.

gcc/ChangeLog

	* config/riscv/riscv-builtins.cc: include stringpool.h
	(riscv_float16_type_node): New.
	(riscv_init_builtin_types): Ditto.
	(riscv_init_builtins): Call riscv_init_builtin_types.
	* config/riscv/riscv-modes.def (HF): New.
	* gcc/config/riscv/riscv.cc (riscv_output_move): Handle HFmode.
	(riscv_mangle_type): New.
	(riscv_scalar_mode_supported_p): Ditto.
	(riscv_libgcc_floating_mode_supported_p): Ditto.
	(riscv_excess_precision): Ditto.
	(riscv_floatn_mode): Ditto.
	(riscv_init_libfuncs): Ditto.
	(TARGET_MANGLE_TYPE): Ditto.
	(TARGET_SCALAR_MODE_SUPPORTED_P): Ditto.
	(TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Ditto.
	(TARGET_INIT_LIBFUNCS): Ditto.
	(TARGET_C_EXCESS_PRECISION): Ditto.
	(TARGET_FLOATN_MODE): Ditto.
	* gcc/config/riscv/riscv.md (mode): Add HF.
	(softload): Add HF.
	(softstore): Ditto.
	(fmt): Ditto.
	(UNITMODE): Ditto.
	(movhf): New.
	(*movhf_softfloat): New.

libgcc/ChangeLog:

	* config/riscv/sfp-machine.h (_FP_NANFRAC_H): New.
	(_FP_NANFRAC_H): Ditto.
	(_FP_NANSIGN_H): Ditto.
	* config/riscv/t-softfp32 (softfp_extensions): Add HF related routines.
	(softfp_truncations): Ditto.
	(softfp_extras): Ditto.
	* config/riscv/t-softfp64 (softfp_extras): Add HF related routines.

gcc/testsuite/ChangeLog:

	* gcc/testsuite/g++.target/riscv/_Float16.C: New.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c: Ditto.
	* gcc/testsuite/gcc.target/riscv/_Float16.c: Ditto.
--- gcc/config/riscv/riscv-builtins.cc| 24 +++ gcc/config/riscv/riscv-modes.def | 1 + gcc/config/riscv/riscv.cc | 171 -- gcc/config/riscv/riscv.md | 30 ++- gcc/testsuite/g++.target/riscv/_Float16.C | 18 ++ .../gcc.target/riscv/_Float16-soft-1.c| 9 + .../gcc.target/riscv/_Float16-soft-2.c| 13 ++ .../gcc.target/riscv/_Float16-soft-3.c| 12 ++ .../gcc.target/riscv/_Float16-soft-4.c| 12 ++ gcc/testsuite/gcc.target/riscv/_Float16.c | 19 ++ libgcc/config/riscv/sfp-machine.h | 3 + libgcc/config/riscv/t-softfp32| 5 + libgcc/config/riscv/t-softfp64| 1 + 13 files changed, 300 insertions(+), 18 deletions(-) create mode 100644 gcc/testsuite/g++.target/riscv/_Float16.C create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-soft-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16.c diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc index 1218fdfc67d..3009311604d 100644 --- a/gcc/config/riscv/riscv-builtins.cc +++ b/gcc/config/riscv/riscv-builtins.cc @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "recog.h" #include "diagnostic-core.h" #include "stor-layout.h" +#include "stringpool.h" #include "expr.h" #include "langhooks.h" @@ -160,6 +161,8 @@ static GTY(()) int riscv_builtin_decl_index[NUM_INSN_CODES]; #define GET_BUILTIN_DECL(CODE) \ riscv_builtin_decls[riscv_builtin_decl_index[(CODE)]] +tree riscv_float16_type_node = NULL_TREE; + /* Return the function type associated with function prototype TYPE. */ static tree @@ -185,11 +188,32 @@ riscv_build_function_type (enum riscv_function_type type) return types[(int) type]; } +static void +riscv_init_builtin_types (void) +{ + /* Provide the _Float16 type and float16_type_node if needed. 
*/ + if (!float16_type_node) +{ + riscv_float16_type_node = make_node (REAL_TYPE); + TYPE_PRECISION (riscv_float16_type_node) = 16; + SET_TYPE_MODE (riscv_float16_type_node, HFmode); + layout_type (riscv_float16_type_node); +} + else +riscv_float16_type_node = float16_type_node; + + if (!maybe_get_identifier ("_Float16")) +lang_hooks.types.register_builtin_type (riscv_float16_type_node, + "_Float16"); +} + /* Implement TARGET_INIT_BUILTINS. */ void riscv_init_builtins (void) { + riscv_init_builtin_types (); + for (size_t i = 0; i < ARRAY_SIZE (riscv_builtins); i++) { const struct riscv_bu
[PATCH 2/2] RISC-V: Support zfh and zfhmin extension
Zfh and Zfhmin are extensions for IEEE half precision; both were ratified in Jan. 2022[1]:
- Zfh provides a full set of operations, like F or D for single or double precision.
- Zfhmin provides only minimal support for half-precision operations: conversion, load, store and move instructions.

[1] https://github.com/riscv/riscv-isa-manual/commit/b35a54079e0da11740ce5b1e6db999d1d5172768

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc (riscv_implied_info): Add zfh
	and zfhmin.
	(riscv_ext_version_table): Ditto.
	(riscv_ext_flag_table): Ditto.
	* config/riscv/riscv-opts.h (MASK_ZFHMIN): New.
	(MASK_ZFH): Ditto.
	(TARGET_ZFHMIN): Ditto.
	(TARGET_ZFH): Ditto.
	* config/riscv/riscv.cc (riscv_output_move): Handle HFmode move
	for zfh and zfhmin.
	(riscv_emit_float_compare): Handle HFmode.
	* config/riscv/riscv.md (ANYF): Add HF.
	(SOFTF): Add HF.
	(load): Ditto.
	(store): Ditto.
	(truncsfhf2): New.
	(truncdfhf2): Ditto.
	(extendhfsf2): Ditto.
	(extendhfdf2): Ditto.
	(*movhf_hardfloat): Ditto.
	(*movhf_softfloat): Make sure not ZFHMIN.
	* config/riscv/riscv.opt (riscv_zf_subext): New.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/_Float16-zfh-1.c: New.
	* gcc.target/riscv/_Float16-zfh-2.c: Ditto.
	* gcc.target/riscv/_Float16-zfh-3.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-1.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-2.c: Ditto.
	* gcc.target/riscv/_Float16-zfhmin-3.c: Ditto.
	* gcc.target/riscv/arch-16.c: Ditto.
	* gcc.target/riscv/arch-17.c: Ditto.
	* gcc.target/riscv/predef-21.c: Ditto.
	* gcc.target/riscv/predef-22.c: Ditto.
--- gcc/common/config/riscv/riscv-common.cc | 8 +++ gcc/config/riscv/riscv-opts.h | 6 ++ gcc/config/riscv/riscv.cc | 34 ++- gcc/config/riscv/riscv.md | 59 +-- gcc/config/riscv/riscv.opt| 3 + .../gcc.target/riscv/_Float16-zfh-1.c | 8 +++ .../gcc.target/riscv/_Float16-zfh-2.c | 8 +++ .../gcc.target/riscv/_Float16-zfh-3.c | 8 +++ .../gcc.target/riscv/_Float16-zfhmin-1.c | 9 +++ .../gcc.target/riscv/_Float16-zfhmin-2.c | 9 +++ .../gcc.target/riscv/_Float16-zfhmin-3.c | 9 +++ gcc/testsuite/gcc.target/riscv/arch-16.c | 5 ++ gcc/testsuite/gcc.target/riscv/arch-17.c | 5 ++ gcc/testsuite/gcc.target/riscv/predef-21.c| 59 +++ gcc/testsuite/gcc.target/riscv/predef-22.c| 59 +++ 15 files changed, 280 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfh-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/_Float16-zfhmin-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-16.c create mode 100644 gcc/testsuite/gcc.target/riscv/arch-17.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-21.c create mode 100644 gcc/testsuite/gcc.target/riscv/predef-22.c diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc index 0e5be2ce105..4ee1b3198c5 100644 --- a/gcc/common/config/riscv/riscv-common.cc +++ b/gcc/common/config/riscv/riscv-common.cc @@ -96,6 +96,9 @@ static const riscv_implied_info_t riscv_implied_info[] = {"zvl32768b", "zvl16384b"}, {"zvl65536b", "zvl32768b"}, + {"zfh", "zfhmin"}, + {"zfhmin", "f"}, + {NULL, NULL} }; @@ -193,6 +196,9 @@ static const struct riscv_ext_version riscv_ext_version_table[] = {"zvl32768b", ISA_SPEC_CLASS_NONE, 1, 0}, {"zvl65536b", ISA_SPEC_CLASS_NONE, 1, 0}, + {"zfh", ISA_SPEC_CLASS_NONE, 1, 
0}, + {"zfhmin",ISA_SPEC_CLASS_NONE, 1, 0}, + /* Terminate the list. */ {NULL, ISA_SPEC_CLASS_NONE, 0, 0} }; @@ -1148,6 +1154,8 @@ static const riscv_ext_flag_table_t riscv_ext_flag_table[] = {"zvl32768b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL32768B}, {"zvl65536b", &gcc_options::x_riscv_zvl_flags, MASK_ZVL65536B}, + {"zfhmin",&gcc_options::x_riscv_zf_subext, MASK_ZFHMIN}, + {"zfh", &gcc_options::x_riscv_zf_subext, MASK_ZFH}, {NULL, NULL, 0} }; diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h index 1e153b3a6e7..85e869e62e3 100644 --- a/gcc/config/riscv/riscv-opts.h +++ b/gcc/config/riscv/riscv-opts.h @@ -153,6 +153,12 @@ enum stack_protector_guard { #define TARGET_ZICBOM ((riscv_zicmo_subext & MASK_ZICBOM) != 0) #define TARGET_ZICBOP ((riscv_zicmo_subext & MASK_ZICBOP) != 0) +#define MASK_ZFHMIN (1 << 0) +#define MASK_ZFH (1 << 1) + +#define TARGET_ZFHMIN
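The implied-extension entries in the hunk above form a small dependency table: enabling zfh must transitively pull in zfhmin, which in turn pulls in f. A toy model of that lookup (not GCC code; `implies_p` is a hypothetical helper for illustration):

```c
#include <string.h>
#include <stddef.h>

/* Miniature version of the riscv_implied_info table from the patch.  */
struct implied { const char *ext; const char *implies; };

static const struct implied implied_table[] = {
  { "zfh", "zfhmin" },
  { "zfhmin", "f" },
  { NULL, NULL },
};

/* Return 1 if enabling EXT (transitively) implies DEP.  The table is
   acyclic, so the recursion terminates.  */
int
implies_p (const char *ext, const char *dep)
{
  if (strcmp (ext, dep) == 0)
    return 1;
  for (const struct implied *p = implied_table; p->ext; p++)
    if (strcmp (p->ext, ext) == 0 && implies_p (p->implies, dep))
      return 1;
  return 0;
}
```

GCC walks the same kind of table when expanding a `-march` string, which is why `-march=rv32if_zfh` ends up with zfhmin enabled as well.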
[PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
Hi,

As the test case in PR106091 shows, the rs6000-specific pass "swaps" doesn't preserve the REG_EH_REGION reg_note when replacing some load insns at the end of a basic block, which causes the flow info verification to fail unexpectedly. Since a memory reference rtx may trap, this patch ensures we copy the REG_EH_REGION reg_note while replacing a swapped aligned load or store.

Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, and powerpc64le-linux-gnu P9 & P10.

Richi, could you help to review this patch from the point of view of a non-call-exceptions expert? I'm going to install it if it looks good to you. Thanks!

-
	PR target/106091

gcc/ChangeLog:

	* config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): Copy
	REG_EH_REGION when replacing one store insn having it.
	(replace_swapped_aligned_load): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr106091.c: New test.

--- gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ 2 files changed, 33 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc index 275702fee1b..19fbbfb67dc 100644 --- a/gcc/config/rs6000/rs6000-p8swap.cc +++ b/gcc/config/rs6000/rs6000-p8swap.cc @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry *insn_entry, gcc_assert ((GET_CODE (new_body) == SET) && MEM_P (SET_DEST (new_body))); - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); + basic_block bb = BLOCK_FOR_INSN (store_insn); + set_block_for_insn (new_insn, bb); + /* Handle REG_EH_REGION note.
*/ + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) +{ + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); + if (note) + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); +} df_insn_rescan (new_insn); df_insn_delete (store_insn); @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry *insn_entry, rtx swap_insn) gcc_assert ((GET_CODE (new_body) == SET) && MEM_P (SET_SRC (new_body))); - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); + basic_block bb = BLOCK_FOR_INSN (def_insn); + set_block_for_insn (new_insn, bb); + /* Handle REG_EH_REGION note. */ + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) +{ + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); + if (note) + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); +} df_insn_rescan (new_insn); df_insn_delete (def_insn); diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c b/gcc/testsuite/gcc.target/powerpc/pr106091.c new file mode 100644 index 000..61ce8cf4733 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c @@ -0,0 +1,15 @@ +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop -w" } */ + +/* Verify there is no ICE. */ + +typedef short __attribute__ ((__vector_size__ (64))) V; +V v, w; + +inline V foo (V a, V b); + +V +foo (V a, V b) +{ + b &= v < b; + return (V){foo (b, w)[3], (V){}[3]}; +} -- 2.25.1
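The `find_reg_note` / `add_reg_note` pattern the patch relies on can be modeled with a simple linked list (toy model only — GCC's real reg notes are rtx expressions hanging off the insn, not this struct):

```c
#include <stddef.h>

/* Notes hang off an insn as a singly linked list; lookup scans for a
   kind, and copying a note onto a replacement insn just prepends a
   node carrying the same payload -- here, the EH region number.  */
enum note_kind { NOTE_EH_REGION, NOTE_DEAD };

struct note { enum note_kind kind; int value; struct note *next; };
struct insn { struct note *notes; };

/* find_reg_note-style lookup.  */
struct note *
find_note (struct insn *i, enum note_kind kind)
{
  for (struct note *n = i->notes; n; n = n->next)
    if (n->kind == kind)
      return n;
  return NULL;
}

/* add_reg_note-style insertion; SLOT is caller-provided storage.  */
void
add_note (struct insn *i, struct note *slot, enum note_kind kind, int value)
{
  slot->kind = kind;
  slot->value = value;
  slot->next = i->notes;
  i->notes = slot;
}
```

The patch does exactly this transfer: look up REG_EH_REGION on the insn being deleted and, if present, attach an equal note to the freshly emitted insn, so the EH region association survives the replacement.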
Re: [PATCH] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 16:31 +0800, WANG Xuerui wrote: > IMO it's better to also state which problem this change is meant to > solve (i.e. your intent), better yet, with an appropriate bugzilla > link. And/or add a testcase (which FAILs without this change) into gcc/testsuite/gcc.target/loongarch. > > --- > > gcc/config/loongarch/loongarch.cc | 12 +--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/config/loongarch/loongarch.cc > > b/gcc/config/loongarch/loongarch.cc > > index d72b256df51..5c9a33c14f7 100644 > > --- a/gcc/config/loongarch/loongarch.cc > > +++ b/gcc/config/loongarch/loongarch.cc > > @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) > > frame->frame_pointer_offset = offset; > > /* Next are the callee-saved FPRs. */ > > if (frame->fmask) > > - offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + { > > + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); > > + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; > > + } > > + else > > + frame->fp_sp_offset = offset; > > /* Next are the callee-saved GPRs. */ > > if (frame->mask) > > { > > @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) > > frame->save_libcall_adjustment = x_save_size; > > > > offset += x_save_size; > > + frame->gp_sp_offset = offset - UNITS_PER_WORD; > > } > > - frame->gp_sp_offset = offset - UNITS_PER_WORD; > > + else > > + frame->gp_sp_offset = offset; > > /* The hard frame pointer points above the callee-saved GPRs. > > */ > > frame->hard_frame_pointer_offset = offset; > > /* Above the hard frame pointer is the callee-allocated varags > > save area. */ -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
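The effect of the quoted hunk is easy to show numerically. A toy model of the fp_sp_offset computation before and after the change, assuming a 16-byte `LARCH_STACK_ALIGN` and 8-byte `UNITS_PER_FP_REG` (both assumptions for illustration, not the real GCC code):

```c
#define UNITS_PER_FP_REG 8
#define STACK_ALIGN(x) (((x) + 15) & ~15L)   /* assumed 16-byte alignment */

/* Before the patch: UNITS_PER_FP_REG is subtracted unconditionally,
   so with fmask == 0 the offset points below the running offset even
   though no FPR save slot exists.  */
long
fp_sp_offset_before (long offset, unsigned fmask, int num_f_saved)
{
  if (fmask)
    offset += STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
  return offset - UNITS_PER_FP_REG;
}

/* After the patch: when nothing is saved, fp_sp_offset stays equal to
   the running offset.  */
long
fp_sp_offset_after (long offset, unsigned fmask, int num_f_saved)
{
  if (fmask)
    {
      offset += STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
      return offset - UNITS_PER_FP_REG;
    }
  return offset;
}
```

With offset 32 and no callee-saved FPRs, the old code yields 24 while the new code yields 32; when FPRs are actually saved, both variants agree, which matches the reviewer's point that only the fmask == 0 / mask == 0 cases change behavior.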
Re: Adjust 'libgomp.c-c++-common/requires-3.c' (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi Thomas, hello all, On 07.07.22 10:37, Thomas Schwinge wrote: In preparation for other changes: On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -0,0 +1,24 @@ +/* { dg-do link { target offloading_enabled } } */ Not expected to see 'offloading_enabled' here... +/* { dg-additional-sources requires-3-aux.c } */ + +/* Check diagnostic by device-compiler's lto1. ..., because of this note ^. ... Subject: [PATCH] Adjust 'libgomp.c-c++-common/requires-3.c' ... libgomp/ * testsuite/libgomp.c-c++-common/requires-3.c: Adjust. ... index 4b07ffdd09b..7091f400ef0 100644 --- a/libgomp/testsuite/libgomp.c-c++-common/requires-3.c +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-3.c @@ -1,4 +1,4 @@ -/* { dg-do link { target offloading_enabled } } */ +/* { dg-do link { target { offload_target_nvptx || offload_target_amdgcn } } } */ LGTM. Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Re: [PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
On Thu, Jul 7, 2022 at 10:55 AM Kewen.Lin wrote: > > Hi, > > As test case in PR106091 shows, rs6000 specific pass swaps > doesn't preserve the reg_note REG_EH_REGION when replacing > some load insn at the end of basic block, it causes the > flow info verification to fail unexpectedly. Since memory > reference rtx may trap, this patch is to ensure we copy > REG_EH_REGION reg_note while replacing swapped aligned load > or store. > > Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, > and powerpc64le-linux-gnu P9 & P10. > > Richi, could you help to review this patch from a point view > of non-call-exceptions expert? I think it looks OK but I do wonder if in RTL there's a better way to transfer EH info from one stmt to another when you are replacing it? On gimple gsi_replace would do, but I can't immediately find a proper RTL replacement for your emit_insn_before (..., X); remove_insn (X); (plus DF assorted things). Eric? > > I'm going to install it if it looks good to you. Thanks! > > - > PR target/106091 > > gcc/ChangeLog: > > * config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): Copy > REG_EH_REGION when replacing one store insn having it. > (replace_swapped_aligned_load): Likewise. > > gcc/testsuite/ChangeLog: > > * gcc.target/powerpc/pr106091.c: New test. 
> --- > gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- > gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ > 2 files changed, 33 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c > > diff --git a/gcc/config/rs6000/rs6000-p8swap.cc > b/gcc/config/rs6000/rs6000-p8swap.cc > index 275702fee1b..19fbbfb67dc 100644 > --- a/gcc/config/rs6000/rs6000-p8swap.cc > +++ b/gcc/config/rs6000/rs6000-p8swap.cc > @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry > *insn_entry, >gcc_assert ((GET_CODE (new_body) == SET) > && MEM_P (SET_DEST (new_body))); > > - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); > + basic_block bb = BLOCK_FOR_INSN (store_insn); > + set_block_for_insn (new_insn, bb); > + /* Handle REG_EH_REGION note. */ > + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) > +{ > + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); > + if (note) > + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); > +} >df_insn_rescan (new_insn); > >df_insn_delete (store_insn); > @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry > *insn_entry, rtx swap_insn) >gcc_assert ((GET_CODE (new_body) == SET) > && MEM_P (SET_SRC (new_body))); > > - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); > + basic_block bb = BLOCK_FOR_INSN (def_insn); > + set_block_for_insn (new_insn, bb); > + /* Handle REG_EH_REGION note. 
*/ > + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) > +{ > + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); > + if (note) > + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); > +} >df_insn_rescan (new_insn); > >df_insn_delete (def_insn); > diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c > b/gcc/testsuite/gcc.target/powerpc/pr106091.c > new file mode 100644 > index 000..61ce8cf4733 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c > @@ -0,0 +1,15 @@ > +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop > -w" } */ > + > +/* Verify there is no ICE. */ > + > +typedef short __attribute__ ((__vector_size__ (64))) V; > +V v, w; > + > +inline V foo (V a, V b); > + > +V > +foo (V a, V b) > +{ > + b &= v < b; > + return (V){foo (b, w)[3], (V){}[3]}; > +} > -- > 2.25.1
Re: [PATCH] rs6000: Preserve REG_EH_REGION when replacing load/store [PR106091]
on 2022/7/7 17:03, Richard Biener wrote: > On Thu, Jul 7, 2022 at 10:55 AM Kewen.Lin wrote: >> >> Hi, >> >> As test case in PR106091 shows, rs6000 specific pass swaps >> doesn't preserve the reg_note REG_EH_REGION when replacing >> some load insn at the end of basic block, it causes the >> flow info verification to fail unexpectedly. Since memory >> reference rtx may trap, this patch is to ensure we copy >> REG_EH_REGION reg_note while replacing swapped aligned load >> or store. >> >> Bootstrapped and regtested on powerpc64-linux-gnu P7 & P8, >> and powerpc64le-linux-gnu P9 & P10. >> >> Richi, could you help to review this patch from a point view >> of non-call-exceptions expert? > > I think it looks OK but I do wonder if in RTL there's a better > way to transfer EH info from one stmt to another when you > are replacing it? On gimple gsi_replace would do, but I > can't immediately find a proper RTL replacement for your > emit_insn_before (..., X); remove_insn (X); (plus DF assorted > things). > Thanks for so prompt review! For the question, I'm not sure :(, when I was drafting this patch, I wondered if there is one function passing/copying reg_note REG_EH_REGION for this kind of need, so I went through almost all the places related to REG_EH_REGION, but nothing desired was found (though I may miss sth.). BR, Kewen > Eric? > >> >> I'm going to install it if it looks good to you. Thanks! >> >> - >> PR target/106091 >> >> gcc/ChangeLog: >> >> * config/rs6000/rs6000-p8swap.cc (replace_swapped_aligned_store): >> Copy >> REG_EH_REGION when replacing one store insn having it. >> (replace_swapped_aligned_load): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/powerpc/pr106091.c: New test. 
>> --- >> gcc/config/rs6000/rs6000-p8swap.cc | 20 ++-- >> gcc/testsuite/gcc.target/powerpc/pr106091.c | 15 +++ >> 2 files changed, 33 insertions(+), 2 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106091.c >> >> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc >> b/gcc/config/rs6000/rs6000-p8swap.cc >> index 275702fee1b..19fbbfb67dc 100644 >> --- a/gcc/config/rs6000/rs6000-p8swap.cc >> +++ b/gcc/config/rs6000/rs6000-p8swap.cc >> @@ -1690,7 +1690,15 @@ replace_swapped_aligned_store (swap_web_entry >> *insn_entry, >>gcc_assert ((GET_CODE (new_body) == SET) >> && MEM_P (SET_DEST (new_body))); >> >> - set_block_for_insn (new_insn, BLOCK_FOR_INSN (store_insn)); >> + basic_block bb = BLOCK_FOR_INSN (store_insn); >> + set_block_for_insn (new_insn, bb); >> + /* Handle REG_EH_REGION note. */ >> + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == store_insn) >> +{ >> + rtx note = find_reg_note (store_insn, REG_EH_REGION, NULL_RTX); >> + if (note) >> + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); >> +} >>df_insn_rescan (new_insn); >> >>df_insn_delete (store_insn); >> @@ -1784,7 +1792,15 @@ replace_swapped_aligned_load (swap_web_entry >> *insn_entry, rtx swap_insn) >>gcc_assert ((GET_CODE (new_body) == SET) >> && MEM_P (SET_SRC (new_body))); >> >> - set_block_for_insn (new_insn, BLOCK_FOR_INSN (def_insn)); >> + basic_block bb = BLOCK_FOR_INSN (def_insn); >> + set_block_for_insn (new_insn, bb); >> + /* Handle REG_EH_REGION note. 
*/ >> + if (cfun->can_throw_non_call_exceptions && BB_END (bb) == def_insn) >> +{ >> + rtx note = find_reg_note (def_insn, REG_EH_REGION, NULL_RTX); >> + if (note) >> + add_reg_note (new_insn, REG_EH_REGION, XEXP (note, 0)); >> +} >>df_insn_rescan (new_insn); >> >>df_insn_delete (def_insn); >> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106091.c >> b/gcc/testsuite/gcc.target/powerpc/pr106091.c >> new file mode 100644 >> index 000..61ce8cf4733 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/powerpc/pr106091.c >> @@ -0,0 +1,15 @@ >> +/* { dg-options "-O -fnon-call-exceptions -fno-tree-dce -fno-tree-forwprop >> -w" } */ >> + >> +/* Verify there is no ICE. */ >> + >> +typedef short __attribute__ ((__vector_size__ (64))) V; >> +V v, w; >> + >> +inline V foo (V a, V b); >> + >> +V >> +foo (V a, V b) >> +{ >> + b &= v < b; >> + return (V){foo (b, w)[3], (V){}[3]}; >> +} >> -- >> 2.25.1
Re: libstdc++: Minor codegen improvement for atomic wait spinloop
On Wed, 6 Jul 2022 at 22:42, Thomas Rodgers wrote: > > Ok for trunk? backport? Yes, for all branches that have the atomic wait code. > > On Wed, Jul 6, 2022 at 1:56 PM Jonathan Wakely wrote: >> >> On Wed, 6 Jul 2022 at 02:05, Thomas Rodgers via Libstdc++ >> wrote: >> > >> > This patch merges the spin loops in the atomic wait implementation which is >> > a >> > minor codegen improvement. >> > >> > libstdc++-v3/ChangeLog: >> > * include/bits/atomic_wait.h (__atomic_spin): Merge spin loops. >> >> OK, thanks. >>
Re: Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
On 07.07.22 10:42, Thomas Schwinge wrote:

> In preparation for other changes: ...
>
> On 2022-06-29T16:33:02+0200, Tobias Burnus wrote:
>> +/* { dg-output "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" } */
>
> (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.)
>
> OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c',
> 'libgomp.c-c++-common/requires-5.c' testing"?
> ...
> libgomp/
> * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing.
> * testsuite/libgomp.c-c++-common/requires-5.c: Likewise.
> ...
> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c
> @@ -1,22 +1,29 @@
> -/* { dg-do link { target offloading_enabled } } */
> /* { dg-additional-options "-flto" } */
> /* { dg-additional-sources requires-4-aux.c } */
> -/* Check diagnostic by device-compiler's or host compiler's lto1.
> +/* Check no diagnostic by device-compiler's or host compiler's lto1.

I note that without ENABLE_OFFLOADING there is never any lto1 diagnostic. However, given that no diagnostic is expected, it also works for "! offloading_enabled". Thus, the change is fine.

> Other file uses: 'requires reverse_offload', but that's inactive as
> there are no declare target directives, device constructs nor device
> routines */
> +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1:
> + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled"
> + and does host-fallback execution. */

The latter is only true when device code is also produced – and a device is available for that/those device types. I think that's what you imply by "For actual offload execution", but it is a bit hidden. Maybe s/For actual offload execution, prints/It may print/ is clearer?

In principle, it would be nice if we could test for the output, but currently setting an env var for remote execution does not work yet. Cf.
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597773.html - When set, we could use offload_target_nvptx etc. (..._amdgcn, ..._any) to test – as this guarantees that it is compiled for that device + the device is available.

> + #pragma omp requires unified_address,unified_shared_memory
>
> -int a[10];
> +int a[10] = { 0 };
> extern void foo (void);
> int main (void)
> {
> - #pragma omp target
> + #pragma omp target map(to: a)

Hmm, I wonder whether I like it or not. Without it, there is an implicit "map(tofrom:a)". On the other hand, OpenMP permits that – even with unified-shared memory – the implementation may copy the data to the device. (For instance, to permit faster access to "a".) Thus, ...

> + for (int i = 0; i < 10; i++)
> +a[i] = i;
> +
> for (int i = 0; i < 10; i++)
> -a[i] = 0;
> +if (a[i] != i)
> + __builtin_abort ();

... this condition (back on the host) could also fail with USM. However, given that to my knowledge no USM implementation actually copies the data, I believe it is fine. (Disclaimer: I have not checked what OG12 does, but I guess it also does not copy it.)

> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c
> @@ -1,21 +1,25 @@
> -/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } } */
> /* { dg-additional-sources requires-5-aux.c } */
> +/* For actual offload execution, prints the following (only) if GOMP_DEBUG=1:
> + "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled"
> + and does host-fallback execution. */
> +

This wording is correct with the now-removed check – but if you remove the offload_target..., it only "might" print it, depending, well, on the conditions set by offload_target...
#pragma omp requires unified_shared_memory, unified_address, reverse_offload -int a[10]; +int a[10] = { 0 }; extern void foo (void); int main (void) { - #pragma omp target + #pragma omp target map(to: a) + for (int i = 0; i < 10; i++) +a[i] = i; + for (int i = 0; i < 10; i++) -a[i] = 0; +if (a[i] != i) + __builtin_abort (); foo (); return 0; } Thus: LGTM, if you update the GOMP_DEBUG=... wording, either using "might" (etc.) or by being more explicit. Once we have remote setenv, we probably want to add another testcase to check for the GOMP_DEBUG=1, copying an existing one, adding dg-output and restricting it to target offload_target_... Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH v2] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Under the LA architecture, when the stack drop is too large, the process of dropping the stack is divided into two steps.

step1: Drop part of the stack and save the callee-saved registers on it.
step2: Drop the rest of it.

The stack drop operation is optimized when frame->total_size minus frame->sp_fp_offset is an integer multiple of 4096, which reduces the number of instructions required to drop the stack. However, this optimization was not effective because of the original calculation method. Consider the following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more instructions than the new GCC (lines 14 and 24 of the old column):

   new                              │    old
10 main:                           │ 11 main:
11   addi.d   $r3,$r3,-16          │ 12   lu12i.w  $r13,-12288>>12
12   lu12i.w  $r13,-12288>>12      │ 13   addi.d   $r3,$r3,-2032
13   lu12i.w  $r5,-12288>>12       │ 14   ori      $r13,$r13,2016
14   lu12i.w  $r12,12288>>12       │ 15   lu12i.w  $r5,-12288>>12
15   st.d     $r1,$r3,8            │ 16   lu12i.w  $r12,12288>>12
16   add.d    $r12,$r12,$r5        │ 17   st.d     $r1,$r3,2024
17   add.d    $r3,$r3,$r13         │ 18   add.d    $r12,$r12,$r5
18   add.d    $r5,$r12,$r3         │ 19   add.d    $r3,$r3,$r13
19   la.local $r4,.LC0             │ 20   add.d    $r5,$r12,$r3
20   bl       %plt(printf)         │ 21   la.local $r4,.LC0
21   lu12i.w  $r13,12288>>12       │ 22   bl       %plt(printf)
22   add.d    $r3,$r3,$r13         │ 23   lu12i.w  $r13,8192>>12
23   ld.d     $r1,$r3,8            │ 24   ori      $r13,$r13,2080
24   or       $r4,$r0,$r0          │ 25   add.d    $r3,$r3,$r13
25   addi.d   $r3,$r3,16           │ 26   ld.d     $r1,$r3,2024
26   jr       $r1                  │ 27   or       $r4,$r0,$r0
                                   │ 28   addi.d   $r3,$r3,2032
                                   │ 29   jr       $r1

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify
fp_sp_offset and gp_sp_offset's calculation method: when frame->mask
or frame->fmask is zero, don't subtract UNITS_PER_WORD or
UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/prolog-opt.c: New test.
--- gcc/config/loongarch/loongarch.cc | 12 ++-- .../gcc.target/loongarch/prolog-opt.c | 29 +++ 2 files changed, 38 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..7f611370aa4 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,29 @@ +/* Test that LoongArch backend stack drop operation optimized. 
*/ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +struct test +{ + int empty1[0]; + double empty2[0]; + int : 0; + float x; + long empty3[0]; + long : 0; + float y; + unsigned : 0; + char empty4[0]; +}; + +extern void callee (struct test); + +void +caller (void) +{ + struct test test; + test.x = 114; + test.y = 514; + callee (test); +} -- 2.31.1
[PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Update testsuite.

--

Under the LA architecture, when the stack drop is too large, the process of dropping the stack is divided into two steps.

step1: Drop part of the stack and save the callee-saved registers on it.
step2: Drop the rest of it.

The stack drop operation is optimized when frame->total_size minus frame->sp_fp_offset is an integer multiple of 4096, which reduces the number of instructions required to drop the stack. However, this optimization was not effective because of the original calculation method. Consider the following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more instructions than the new GCC (lines 14 and 24 of the old column):

   new                              │    old
10 main:                           │ 11 main:
11   addi.d   $r3,$r3,-16          │ 12   lu12i.w  $r13,-12288>>12
12   lu12i.w  $r13,-12288>>12      │ 13   addi.d   $r3,$r3,-2032
13   lu12i.w  $r5,-12288>>12       │ 14   ori      $r13,$r13,2016
14   lu12i.w  $r12,12288>>12       │ 15   lu12i.w  $r5,-12288>>12
15   st.d     $r1,$r3,8            │ 16   lu12i.w  $r12,12288>>12
16   add.d    $r12,$r12,$r5        │ 17   st.d     $r1,$r3,2024
17   add.d    $r3,$r3,$r13         │ 18   add.d    $r12,$r12,$r5
18   add.d    $r5,$r12,$r3         │ 19   add.d    $r3,$r3,$r13
19   la.local $r4,.LC0             │ 20   add.d    $r5,$r12,$r3
20   bl       %plt(printf)         │ 21   la.local $r4,.LC0
21   lu12i.w  $r13,12288>>12       │ 22   bl       %plt(printf)
22   add.d    $r3,$r3,$r13         │ 23   lu12i.w  $r13,8192>>12
23   ld.d     $r1,$r3,8            │ 24   ori      $r13,$r13,2080
24   or       $r4,$r0,$r0          │ 25   add.d    $r3,$r3,$r13
25   addi.d   $r3,$r3,16           │ 26   ld.d     $r1,$r3,2024
26   jr       $r1                  │ 27   or       $r4,$r0,$r0
                                   │ 28   addi.d   $r3,$r3,2032
                                   │ 29   jr       $r1

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_compute_frame_info): Modify
fp_sp_offset and gp_sp_offset's calculation method: when frame->mask
or frame->fmask is zero, don't subtract UNITS_PER_WORD or
UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/prolog-opt.c: New test.
--- gcc/config/loongarch/loongarch.cc | 12 +--- gcc/testsuite/gcc.target/loongarch/prolog-opt.c | 15 +++ 2 files changed, 24 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc index d72b256df51..5c9a33c14f7 100644 --- a/gcc/config/loongarch/loongarch.cc +++ b/gcc/config/loongarch/loongarch.cc @@ -917,8 +917,12 @@ loongarch_compute_frame_info (void) frame->frame_pointer_offset = offset; /* Next are the callee-saved FPRs. */ if (frame->fmask) -offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); - frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +{ + offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG); + frame->fp_sp_offset = offset - UNITS_PER_FP_REG; +} + else +frame->fp_sp_offset = offset; /* Next are the callee-saved GPRs. */ if (frame->mask) { @@ -931,8 +935,10 @@ loongarch_compute_frame_info (void) frame->save_libcall_adjustment = x_save_size; offset += x_save_size; + frame->gp_sp_offset = offset - UNITS_PER_WORD; } - frame->gp_sp_offset = offset - UNITS_PER_WORD; + else +frame->gp_sp_offset = offset; /* The hard frame pointer points above the callee-saved GPRs. */ frame->hard_frame_pointer_offset = offset; /* Above the hard frame pointer is the callee-allocated varags save area. */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..c7bd71dde93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,15 @@ +/* Test that LoongArch backend stack drop operation optimized. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +#include + +int main() +{ + char buf[1024 * 12]; + printf ("%p\n", buf); + return 0; +} + -- 2.31.1
[PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
This patch series implements OpenMP allocators for low-latency memory on nvptx, unified shared memory on both nvptx and amdgcn, and generic pinned memory support for all Linux hosts (an nvptx-specific implementation using Cuda pinned memory is planned for the future, as is low-latency memory on amdgcn). Patches 01 to 14 are reposts of patches previously submitted, now forward ported to the current master branch and with the various follow-up patches folded in. Where it conflicts with the new memkind implementation the memkind takes precedence (but there's currently no way to implement memory that's both high-bandwidth and pinned anyway). Patches 15 to 17 are new work. I can probably approve these myself, but they can't be committed until the rest of the series is approved. Andrew Andrew Stubbs (11): libgomp, nvptx: low-latency memory allocator libgomp: pinned memory libgomp, openmp: Add ompx_pinned_mem_alloc openmp, nvptx: low-lat memory access traits openmp, nvptx: ompx_unified_shared_mem_alloc openmp: Add -foffload-memory openmp: allow requires unified_shared_memory openmp: -foffload-memory=pinned amdgcn: Support XNACK mode amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK amdgcn: libgomp plugin USM implementation Hafiz Abid Qadeer (6): openmp: Use libgomp memory allocation functions with unified shared memory. Add parsing support for allocate directive (OpenMP 5.0) Translate allocate directive (OpenMP 5.0). Handle cleanup of omp allocated variables (OpenMP 5.0). Gimplify allocate directive (OpenMP 5.0). Lower allocate directive (OpenMP 5.0). 
gcc/c/c-parser.cc | 22 +- gcc/common.opt| 16 + gcc/config/gcn/gcn-hsa.h | 3 +- gcc/config/gcn/gcn-opts.h | 10 +- gcc/config/gcn/gcn-valu.md| 29 +- gcc/config/gcn/gcn.cc | 62 ++- gcc/config/gcn/gcn.md | 113 +++-- gcc/config/gcn/gcn.opt| 18 +- gcc/config/gcn/mkoffload.cc | 56 ++- gcc/coretypes.h | 7 + gcc/cp/parser.cc | 22 +- gcc/doc/gimple.texi | 38 +- gcc/doc/invoke.texi | 16 +- gcc/fortran/dump-parse-tree.cc| 3 + gcc/fortran/gfortran.h| 5 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 242 ++- gcc/fortran/parse.cc | 10 +- gcc/fortran/resolve.cc| 1 + gcc/fortran/st.cc | 1 + gcc/fortran/trans-decl.cc | 20 + gcc/fortran/trans-openmp.cc | 50 +++ gcc/fortran/trans.cc | 1 + gcc/gimple-pretty-print.cc| 37 ++ gcc/gimple.cc | 12 + gcc/gimple.def| 6 + gcc/gimple.h | 60 ++- gcc/gimplify.cc | 19 + gcc/gsstruct.def | 1 + gcc/omp-builtins.def | 3 + gcc/omp-low.cc| 383 + gcc/passes.def| 1 + .../c-c++-common/gomp/alloc-pinned-1.c| 28 ++ gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 +++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/c-c++-common/gomp/usm-4.c | 4 + gcc/testsuite/g++.dg/gomp/usm-1.C | 32 ++ gcc/testsuite/g++.dg/gomp/usm-2.C | 30 ++ gcc/testsuite/g++.dg/gomp/usm-3.C | 38 ++ gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 + gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 | 73 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 84 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 | 13 + gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 | 15 + gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 + gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 + gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 + gcc/testsuite/gfortran.dg/gomp/usm-4.f90 | 6 + gcc/tree-core.h | 9 + gcc/tree-pass.h | 1 + gcc/tree-pretty-print.cc | 23 ++ gcc/tree.cc | 1 + gcc/tree.def | 4 + gcc/tree.h| 15 + include/cuda/cuda.h | 12 + libgomp/allocator.c | 304 ++ libgomp/config/linux/allocator.c | 137 +++ libgomp/config/nvptx/allocator.c | 387 ++ 
libgomp/config/nvptx/team.c
[PATCH 02/17] libgomp: pinned memory
Implement the OpenMP pinned memory trait on Linux hosts using the mlock syscall. Pinned allocations are performed using mmap, not malloc, to ensure that they can be unpinned safely when freed. libgomp/ChangeLog: * allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. (xmlock): New function. (omp_init_allocator): Don't disallow the pinned trait. (omp_aligned_alloc): Add pinning to all MEMSPACE_* calls. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. (omp_free): Likewise. * config/linux/allocator.c: New file. * config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN. (MEMSPACE_CALLOC): Add PIN. (MEMSPACE_REALLOC): Add PIN. (MEMSPACE_FREE): Add PIN. * testsuite/libgomp.c/alloc-pinned-1.c: New test. * testsuite/libgomp.c/alloc-pinned-2.c: New test. * testsuite/libgomp.c/alloc-pinned-3.c: New test. * testsuite/libgomp.c/alloc-pinned-4.c: New test. --- libgomp/allocator.c | 67 ++ libgomp/config/linux/allocator.c | 99 ++ libgomp/config/nvptx/allocator.c | 8 +- libgomp/testsuite/libgomp.c/alloc-pinned-1.c | 95 + libgomp/testsuite/libgomp.c/alloc-pinned-2.c | 101 ++ libgomp/testsuite/libgomp.c/alloc-pinned-3.c | 130 ++ libgomp/testsuite/libgomp.c/alloc-pinned-4.c | 132 +++ 7 files changed, 602 insertions(+), 30 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-2.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-3.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-4.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 9b33bcf529b..54310ab93ca 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -39,16 +39,20 @@ /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC -#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? 
NULL : malloc (SIZE)) #endif #ifndef MEMSPACE_CALLOC -#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ + (PIN ? NULL : calloc (1, SIZE)) #endif #ifndef MEMSPACE_REALLOC -#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \ + ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE)) #endif #ifndef MEMSPACE_FREE -#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ + (PIN ? NULL : free (ADDR)) #endif /* Map the predefined allocators to the correct memory space. @@ -351,10 +355,6 @@ omp_init_allocator (omp_memspace_handle_t memspace, int ntraits, break; } - /* No support for this so far. */ - if (data.pinned) -return omp_null_allocator; - ret = gomp_malloc (sizeof (struct omp_allocator_data)); *ret = data; #ifndef HAVE_SYNC_BUILTINS @@ -481,7 +481,8 @@ retry: } else #endif - ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, + allocator_data->pinned); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -511,7 +512,8 @@ retry: = (allocator_data ? 
allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size); + ptr = MEMSPACE_ALLOC (memspace, new_size, +allocator_data && allocator_data->pinned); } if (ptr == NULL) goto fail; @@ -542,9 +544,9 @@ fail: #ifdef LIBGOMP_USE_MEMKIND || memkind #endif - || (allocator_data - && allocator_data->pool_size < ~(uintptr_t) 0) - || !allocator_data) + || !allocator_data + || allocator_data->pool_size < ~(uintptr_t) 0 + || allocator_data->pinned) { allocator = omp_default_mem_alloc; goto retry; @@ -596,6 +598,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) struct omp_mem_header *data; omp_memspace_handle_t memspace __attribute__((unused)) = omp_default_mem_space; + int pinned __attribute__((unused)) = false; if (ptr == NULL) return; @@ -627,6 +630,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = allocator_data->memspace; + pinned = allocator_data->pinned; } else { @@ -651,7 +655,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) memspace = predefined_alloc_mapping[data->allocator]; } - MEMSPACE_FREE (memspace, data->ptr, data->size); + MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); } ialias (omp_free) @@ -767,7 +771,8 @@ retry: } else #endif - ptr = MEMSPACE_CALLOC (allocator_data->memspace, new_size); + ptr =
[PATCH 01/17] libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on the NVPTX GPU device, via omp_low_lat_mem_space and omp_alloc. The memory can be allocated, reallocated, and freed using a basic but fast algorithm, is thread safe, and the size of the low-latency heap can be configured using the GOMP_NVPTX_LOWLAT_POOL environment variable. The use of the PTX dynamic_smem_size feature means that the low-latency allocator will not work with the PTX 3.1 multilib.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.  Implement fall-backs for
predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.  Implement fall-backs for
predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.  Implement
fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.
co-authored-by: Kwok Cheung Yeung --- libgomp/allocator.c| 235 - libgomp/config/nvptx/allocator.c | 370 + libgomp/config/nvptx/team.c| 28 ++ libgomp/plugin/plugin-nvptx.c | 23 +- libgomp/testsuite/libgomp.c/allocators-1.c | 56 libgomp/testsuite/libgomp.c/allocators-2.c | 64 libgomp/testsuite/libgomp.c/allocators-3.c | 42 +++ libgomp/testsuite/libgomp.c/allocators-4.c | 196 +++ libgomp/testsuite/libgomp.c/allocators-5.c | 63 libgomp/testsuite/libgomp.c/allocators-6.c | 117 +++ 10 files changed, 1110 insertions(+), 84 deletions(-) create mode 100644 libgomp/config/nvptx/allocator.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-1.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-2.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-3.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-4.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-5.c create mode 100644 libgomp/testsuite/libgomp.c/allocators-6.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index b04820b8cf9..9b33bcf529b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -37,6 +37,34 @@ #define omp_max_predefined_alloc omp_thread_mem_alloc +/* These macros may be overridden in config//allocator.c. */ +#ifndef MEMSPACE_ALLOC +#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE) +#endif +#ifndef MEMSPACE_CALLOC +#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE) +#endif +#ifndef MEMSPACE_REALLOC +#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE) +#endif +#ifndef MEMSPACE_FREE +#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR) +#endif + +/* Map the predefined allocators to the correct memory space. + The index to this table is the omp_allocator_handle_t enum value. */ +static const omp_memspace_handle_t predefined_alloc_mapping[] = { + omp_default_mem_space, /* omp_null_allocator. */ + omp_default_mem_space, /* omp_default_mem_alloc. */ + omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. 
*/ + omp_default_mem_space, /* omp_const_mem_alloc. */ + omp_high_bw_mem_space, /* omp_high_bw_mem_alloc. */ + omp_low_lat_mem_space, /* omp_low_lat_mem_alloc. */ + omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ + omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ + omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ +}; + enum gomp_memkind_kind { GOMP_MEMKIND_NONE = 0, @@ -453,7 +481,7 @@ retry: } else #endif - ptr = malloc (new_size); + ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size); if (ptr == NULL) { #ifdef HAVE_SYNC_BUILTINS @@ -478,7 +506,13 @@ retry: } else #endif - ptr = malloc (new_size); + { + omp_memspace_handle_t memspace __attribute__((unused)) + = (allocator_data + ? allocator_data->memspace + : predefined_alloc_mapping[allocator]); + ptr = MEMSPACE_ALLOC (memspace, new_size); + } if (ptr == NULL) goto fail; } @@ -496,35 +530,38 @@ retry: return ret; fail: - if (allocator_data) + int fallback = (allocator_da
[PATCH 04/17] openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates it, and therefore should be unavailable for allocators with the access trait "all". This change means that the omp_low_lat_mem_alloc predefined allocator now implicitly implies the "pteam" trait. libgomp/ChangeLog: * allocator.c (MEMSPACE_VALIDATE): New macro. (omp_aligned_alloc): Use MEMSPACE_VALIDATE. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/nvptx/allocator.c (nvptx_memspace_validate): New function. (MEMSPACE_VALIDATE): New macro. * testsuite/libgomp.c/allocators-4.c (main): Add access trait. * testsuite/libgomp.c/allocators-6.c (main): Add access trait. * testsuite/libgomp.c/allocators-7.c: New test. --- libgomp/allocator.c| 15 + libgomp/config/nvptx/allocator.c | 11 libgomp/testsuite/libgomp.c/allocators-4.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-6.c | 7 ++- libgomp/testsuite/libgomp.c/allocators-7.c | 68 ++ 5 files changed, 102 insertions(+), 6 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/allocators-7.c diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 029d0d40a36..48ab0782e6b 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -54,6 +54,9 @@ #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ (PIN ? NULL : free (ADDR)) #endif +#ifndef MEMSPACE_VALIDATE +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) 1 +#endif /* Map the predefined allocators to the correct memory space. The index to this table is the omp_allocator_handle_t enum value. 
*/ @@ -438,6 +441,10 @@ retry: if (__builtin_add_overflow (size, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -733,6 +740,10 @@ retry: if (__builtin_add_overflow (size_temp, new_size, &new_size)) goto fail; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { @@ -964,6 +975,10 @@ retry: goto fail; old_size = data->size; + if (allocator_data + && !MEMSPACE_VALIDATE (allocator_data->memspace, allocator_data->access)) +goto fail; + if (__builtin_expect (allocator_data && allocator_data->pool_size < ~(uintptr_t) 0, 0)) { diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c index f740b97f6ac..0102680b717 100644 --- a/libgomp/config/nvptx/allocator.c +++ b/libgomp/config/nvptx/allocator.c @@ -358,6 +358,15 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, return realloc (addr, size); } +static inline int +nvptx_memspace_validate (omp_memspace_handle_t memspace, unsigned access) +{ + /* Disallow use of low-latency memory when it must be accessible by + all threads. 
*/ + return (memspace != omp_low_lat_mem_space + || access != omp_atv_all); +} + #define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \ nvptx_memspace_alloc (MEMSPACE, SIZE) #define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \ @@ -366,5 +375,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr, nvptx_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE) #define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \ nvptx_memspace_free (MEMSPACE, ADDR, SIZE) +#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \ + nvptx_memspace_validate (MEMSPACE, ACCESS) #include "../../allocator.c" diff --git a/libgomp/testsuite/libgomp.c/allocators-4.c b/libgomp/testsuite/libgomp.c/allocators-4.c index 9fa6aa1624f..cae27ea33c1 100644 --- a/libgomp/testsuite/libgomp.c/allocators-4.c +++ b/libgomp/testsuite/libgomp.c/allocators-4.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. */ -omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; +omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb }, + { omp_atk_access, omp_atv_pteam } }; omp_allocator_handle_t lowlat = omp_init_allocator (omp_low_lat_mem_space, - 1, traits); + 2, traits); int size = 4; diff --git a/libgomp/testsuite/libgomp.c/allocators-6.c b/libgomp/testsuite/libgomp.c/allocators-6.c index 90bf73095ef..c03233df582 100644 --- a/libgomp/testsuite/libgomp.c/allocators-6.c +++ b/libgomp/testsuite/libgomp.c/allocators-6.c @@ -23,10 +23,11 @@ main () #pragma omp target { /* Ensure that the memory we get *is* low-latency with a null-fallback. */ -omp_alloctrait_t traits[1] - = { { omp_atk_fallback, omp_atv_null_fb } }; +omp_alloctrait_t traits[2] + = { { omp_atk_fallback, omp_atv_null_fb },
[PATCH 03/17] libgomp, openmp: Add ompx_pinned_mem_alloc
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. The name uses the OpenMP extension space and is intended to be consistent with other OpenMP implementations currently in development. The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait. libgomp/ChangeLog: * allocator.c (omp_max_predefined_alloc): Update. (omp_aligned_alloc): Support ompx_pinned_mem_alloc. (omp_free): Likewise. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc. * omp_lib.f90.in: Add ompx_pinned_mem_alloc. * testsuite/libgomp.c/alloc-pinned-5.c: New test. * testsuite/libgomp.c/alloc-pinned-6.c: New test. * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test. --- libgomp/allocator.c | 60 +++ libgomp/omp.h.in | 1 + libgomp/omp_lib.f90.in| 2 + libgomp/testsuite/libgomp.c/alloc-pinned-5.c | 90 libgomp/testsuite/libgomp.c/alloc-pinned-6.c | 101 ++ .../libgomp.fortran/alloc-pinned-1.f90| 16 +++ 6 files changed, 252 insertions(+), 18 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90 diff --git a/libgomp/allocator.c b/libgomp/allocator.c index 54310ab93ca..029d0d40a36 100644 --- a/libgomp/allocator.c +++ b/libgomp/allocator.c @@ -35,7 +35,7 @@ #include #endif -#define omp_max_predefined_alloc omp_thread_mem_alloc +#define omp_max_predefined_alloc ompx_pinned_mem_alloc /* These macros may be overridden in config//allocator.c. */ #ifndef MEMSPACE_ALLOC @@ -67,6 +67,7 @@ static const omp_memspace_handle_t predefined_alloc_mapping[] = { omp_low_lat_mem_space, /* omp_cgroup_mem_alloc. */ omp_low_lat_mem_space, /* omp_pteam_mem_alloc. */ omp_low_lat_mem_space, /* omp_thread_mem_alloc. */ + omp_default_mem_space, /* ompx_pinned_mem_alloc. 
*/ }; enum gomp_memkind_kind @@ -512,8 +513,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_ALLOC (memspace, new_size, -allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_ALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -534,7 +538,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -653,6 +658,7 @@ omp_free (void *ptr, omp_allocator_handle_t allocator) #endif memspace = predefined_alloc_mapping[data->allocator]; + pinned = (data->allocator == ompx_pinned_mem_alloc); } MEMSPACE_FREE (memspace, data->ptr, data->size, pinned); @@ -802,8 +808,11 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); - ptr = MEMSPACE_CALLOC (memspace, new_size, - allocator_data && allocator_data->pinned); + int pinned __attribute__((unused)) + = (allocator_data + ? allocator_data->pinned + : allocator == ompx_pinned_mem_alloc); + ptr = MEMSPACE_CALLOC (memspace, new_size, pinned); } if (ptr == NULL) goto fail; @@ -824,7 +833,8 @@ retry: fail: int fallback = (allocator_data ? allocator_data->fallback - : allocator == omp_default_mem_alloc + : (allocator == omp_default_mem_alloc + || allocator == ompx_pinned_mem_alloc) ? omp_atv_null_fb : omp_atv_default_mem_fb); switch (fallback) @@ -1026,11 +1036,15 @@ retry: else #endif if (prev_size) - new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, -data->size, new_size, -(free_allocator_data - && free_allocator_data->pinned), -allocator_data->pinned); + { + int was_pinned __attribute__((unused)) + = (free_allocator_data + ? 
free_allocator_data->pinned + : free_allocator == ompx_pinned_mem_alloc); + new_ptr = MEMSPACE_REALLOC (allocator_data->memspace, data->ptr, + data->size, new_size, was_pinned, + allocator_data->pinned); + } else new_ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size, allocator_data->pinned); @@ -1079,10 +1093,16 @@ retry: = (allocator_data ? allocator_data->memspace : predefined_alloc_mapping[allocator]); + int was_pinned __attribute__((unused)) + = (free_allocator_data
[PATCH 09/17] openmp: Use libgomp memory allocation functions with unified shared memory.
This patch changes calls to malloc/free/calloc/realloc and operator new into calls to the libgomp memory allocation functions with allocator=ompx_unified_shared_mem_alloc. This helps existing code benefit from unified shared memory: libgomp does the correct thing with all the mapping constructs, and there are no memory copies if the pointer points to unified shared memory. We only replace the replaceable operator new, not class-member or placement new. gcc/ChangeLog: * omp-low.cc (usm_transform): New function. (make_pass_usm_transform): Likewise. (class pass_usm_transform): New. * passes.def: Add pass_usm_transform. * tree-pass.h (make_pass_usm_transform): New declaration. gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-2.c: New test. * c-c++-common/gomp/usm-3.c: New test. * g++.dg/gomp/usm-1.C: New test. * g++.dg/gomp/usm-2.C: New test. * g++.dg/gomp/usm-3.C: New test. * gfortran.dg/gomp/usm-2.f90: New test. * gfortran.dg/gomp/usm-3.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.c/usm-6.c: New test. * testsuite/libgomp.c++/usm-1.C: Likewise.
co-authored-by: Andrew Stubbs --- gcc/omp-low.cc | 174 +++ gcc/passes.def | 1 + gcc/testsuite/c-c++-common/gomp/usm-2.c | 46 ++ gcc/testsuite/c-c++-common/gomp/usm-3.c | 44 ++ gcc/testsuite/g++.dg/gomp/usm-1.C| 32 + gcc/testsuite/g++.dg/gomp/usm-2.C| 30 gcc/testsuite/g++.dg/gomp/usm-3.C| 38 + gcc/testsuite/gfortran.dg/gomp/usm-2.f90 | 16 +++ gcc/testsuite/gfortran.dg/gomp/usm-3.f90 | 13 ++ gcc/tree-pass.h | 1 + libgomp/testsuite/libgomp.c++/usm-1.C| 54 +++ libgomp/testsuite/libgomp.c/usm-6.c | 92 12 files changed, 541 insertions(+) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-2.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-3.c create mode 100644 gcc/testsuite/g++.dg/gomp/usm-1.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-2.C create mode 100644 gcc/testsuite/g++.dg/gomp/usm-3.C create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-3.f90 create mode 100644 libgomp/testsuite/libgomp.c++/usm-1.C create mode 100644 libgomp/testsuite/libgomp.c/usm-6.c diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index ba612e5c67d..cdadd6f0c96 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -15097,6 +15097,180 @@ make_pass_diagnose_omp_blocks (gcc::context *ctxt) { return new pass_diagnose_omp_blocks (ctxt); } + +/* Provide transformation required for using unified shared memory + by replacing calls to standard memory allocation functions with + function provided by the libgomp. */ + +static tree +usm_transform (gimple_stmt_iterator *gsi_p, bool *, + struct walk_stmt_info *wi) +{ + gimple *stmt = gsi_stmt (*gsi_p); + /* ompx_unified_shared_mem_alloc is 10. 
*/ + const unsigned int unified_shared_mem_alloc = 10; + + switch (gimple_code (stmt)) +{ +case GIMPLE_CALL: + { + gcall *gs = as_a (stmt); + tree fndecl = gimple_call_fndecl (gs); + if (fndecl) + { + tree allocator = build_int_cst (pointer_sized_int_node, + unified_shared_mem_alloc); + const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl)); + if ((strcmp (name, "malloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MALLOC) + || DECL_IS_REPLACEABLE_OPERATOR_NEW_P (fndecl) + || strcmp (name, "omp_target_alloc") == 0) + { + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_alloc", omp_alloc_type); + tree size = gimple_call_arg (gs, 0); + gimple *g = gimple_build_call (repl, 2, size, allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if (strcmp (name, "aligned_alloc") == 0) + { + /* May be we can also use this for new operator with + std::align_val_t parameter. */ + tree omp_alloc_type + = build_function_type_list (ptr_type_node, size_type_node, + size_type_node, + pointer_sized_int_node, + NULL_TREE); + tree repl = build_fn_decl ("omp_aligned_alloc", + omp_alloc_type); + tree align = gimple_call_arg (gs, 0); + tree size = gimple_call_arg (gs, 1); + gimple *g = gimple_build_call (repl, 3, align, size, + allocator); + gimple_call_set_lhs (g, gimple_call_lhs (gs)); + gimple_set_location (g, gimple_location (stmt)); + gsi_replace (gsi_p, g, true); + } + else if ((strcmp (name, "calloc") == 0) + || (fndecl_built_in_p (fndecl, BUILT_IN_NORMAL) + && DECL_F
[PATCH 06/17] openmp: Add -foffload-memory
Add a new option. It's inactive until I add some follow-up patches. gcc/ChangeLog: * common.opt: Add -foffload-memory and its enum values. * coretypes.h (enum offload_memory): New. * doc/invoke.texi: Document -foffload-memory. --- gcc/common.opt | 16 gcc/coretypes.h | 7 +++ gcc/doc/invoke.texi | 16 +++- 3 files changed, 38 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index e7a51e882ba..8d76980fbbb 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2213,6 +2213,22 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32) EnumValue Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) +foffload-memory= +Common Joined RejectNegative Enum(offload_memory) Var(flag_offload_memory) Init(OFFLOAD_MEMORY_NONE) +-foffload-memory=[none|unified|pinned] Use an offload memory optimization. + +Enum +Name(offload_memory) Type(enum offload_memory) UnknownError(Unknown offload memory option %qs) + +EnumValue +Enum(offload_memory) String(none) Value(OFFLOAD_MEMORY_NONE) + +EnumValue +Enum(offload_memory) String(unified) Value(OFFLOAD_MEMORY_UNIFIED) + +EnumValue +Enum(offload_memory) String(pinned) Value(OFFLOAD_MEMORY_PINNED) + fomit-frame-pointer Common Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames. diff --git a/gcc/coretypes.h b/gcc/coretypes.h index 08b9ac9094c..dd52d5bb113 100644 --- a/gcc/coretypes.h +++ b/gcc/coretypes.h @@ -206,6 +206,13 @@ enum offload_abi { OFFLOAD_ABI_ILP32 }; +/* Types of memory optimization for an offload device. */ +enum offload_memory { + OFFLOAD_MEMORY_NONE, + OFFLOAD_MEMORY_UNIFIED, + OFFLOAD_MEMORY_PINNED +}; + /* Types of profile update methods. */ enum profile_update { PROFILE_UPDATE_SINGLE, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d5ff1018372..3df39bb06e3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -202,7 +202,7 @@ in the following sections. 
-fno-builtin -fno-builtin-@var{function} -fcond-mismatch @gol -ffreestanding -fgimple -fgnu-tm -fgnu89-inline -fhosted @gol -flax-vector-conversions -fms-extensions @gol --foffload=@var{arg} -foffload-options=@var{arg} @gol +-foffload=@var{arg} -foffload-options=@var{arg} -foffload-memory=@var{arg} @gol -fopenacc -fopenacc-dim=@var{geom} @gol -fopenmp -fopenmp-simd @gol -fpermitted-flt-eval-methods=@var{standard} @gol @@ -2708,6 +2708,20 @@ Typical command lines are -foffload-options=amdgcn-amdhsa=-march=gfx906 -foffload-options=-lm @end smallexample +@item -foffload-memory=none +@itemx -foffload-memory=unified +@itemx -foffload-memory=pinned +@opindex foffload-memory +@cindex OpenMP offloading memory modes +Enable a memory optimization mode to use with OpenMP. The default behavior, +@option{-foffload-memory=none}, is to do nothing special (unless enabled via +a requires directive in the code). @option{-foffload-memory=unified} is +equivalent to @code{#pragma omp requires unified_shared_memory}. +@option{-foffload-memory=pinned} forces all host memory to be pinned (this +mode may require the user to increase the ulimit setting for locked memory). +All translation units must select the same setting to avoid undefined +behavior. + @item -fopenacc @opindex fopenacc @cindex OpenACC accelerator programming
[PATCH 05/17] openmp, nvptx: ompx_unified_shared_mem_alloc
This adds support for using Cuda Managed Memory with omp_alloc. It will be used as the underpinnings for "requires unified_shared_memory" in a later patch. There are two new predefined allocators, ompx_unified_shared_mem_alloc and ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to allocate memory in the "managed" space and explicitly on the host (it is intended that "malloc" will be intercepted by the compiler). The nvptx plugin is modified to make the necessary Cuda calls, and libgomp is modified to switch to shared-memory mode for USM allocated mappings. include/ChangeLog: * cuda/cuda.h (CUdevice_attribute): Add definitions for CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR. (CUmemAttach_flags): New. (CUpointer_attribute): New. (cuMemAllocManaged): New prototype. (cuPointerGetAttribute): New prototype. libgomp/ChangeLog: * allocator.c (omp_max_predefined_alloc): Update. (omp_aligned_alloc): Don't fallback ompx_host_mem_alloc. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * config/linux/allocator.c (linux_memspace_alloc): Handle USM. (linux_memspace_calloc): Handle USM. (linux_memspace_free): Handle USM. (linux_memspace_realloc): Handle USM. * config/nvptx/allocator.c (nvptx_memspace_alloc): Reject ompx_host_mem_alloc. (nvptx_memspace_calloc): Likewise. (nvptx_memspace_realloc): Likewise. * libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype. (GOMP_OFFLOAD_usm_free): New prototype. (GOMP_OFFLOAD_is_usm_ptr): New prototype. * libgomp.h (gomp_usm_alloc): New prototype. (gomp_usm_free): New prototype. (gomp_is_usm_ptr): New prototype. (struct gomp_device_descr): Add USM functions. * omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space and ompx_host_mem_space. (omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and ompx_host_mem_alloc. * omp_lib.f90.in: Likewise. * plugin/cuda-lib.def (cuMemAllocManaged): Add new call. (cuPointerGetAttribute): Likewise. 
* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter. Call cuMemAllocManaged as appropriate. (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (GOMP_OFFLOAD_alloc): Move internals to ... (GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter. (GOMP_OFFLOAD_usm_alloc): New function. (GOMP_OFFLOAD_usm_free): New function. (GOMP_OFFLOAD_is_usm_ptr): New function. * target.c (gomp_map_vars_internal): Add USM support. (gomp_usm_alloc): New function. (gomp_usm_free): New function. (gomp_load_plugin_for_device): New function. * testsuite/libgomp.c/usm-1.c: New test. * testsuite/libgomp.c/usm-2.c: New test. * testsuite/libgomp.c/usm-3.c: New test. * testsuite/libgomp.c/usm-4.c: New test. * testsuite/libgomp.c/usm-5.c: New test. co-authored-by: Kwok Cheung Yeung squash! openmp, nvptx: ompx_unified_shared_mem_alloc --- include/cuda/cuda.h | 12 ++ libgomp/allocator.c | 13 -- libgomp/config/linux/allocator.c| 48 ++ libgomp/config/nvptx/allocator.c| 6 +++ libgomp/libgomp-plugin.h| 3 ++ libgomp/libgomp.h | 6 +++ libgomp/omp.h.in| 4 ++ libgomp/omp_lib.f90.in | 8 libgomp/plugin/cuda-lib.def | 2 + libgomp/plugin/plugin-nvptx.c | 47 ++--- libgomp/target.c| 64 + libgomp/testsuite/libgomp.c/usm-1.c | 24 +++ libgomp/testsuite/libgomp.c/usm-2.c | 32 +++ libgomp/testsuite/libgomp.c/usm-3.c | 35 libgomp/testsuite/libgomp.c/usm-4.c | 36 libgomp/testsuite/libgomp.c/usm-5.c | 28 + 16 files changed, 340 insertions(+), 28 deletions(-) create mode 100644 libgomp/testsuite/libgomp.c/usm-1.c create mode 100644 libgomp/testsuite/libgomp.c/usm-2.c create mode 100644 libgomp/testsuite/libgomp.c/usm-3.c create mode 100644 libgomp/testsuite/libgomp.c/usm-4.c create mode 100644 libgomp/testsuite/libgomp.c/usm-5.c diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h index 3938d05d150..8135e7c9247 100644 --- a/include/cuda/cuda.h +++ b/include/cuda/cuda.h @@ -77,9 +77,19 @@ typedef enum { CU_DEVICE_ATTRIBUTE_CONCURRENT_KERNELS = 31, 
CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39, CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75, + CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76, CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
[PATCH 12/17] Handle cleanup of omp allocated variables (OpenMP 5.0).
Currently we only handle the omp allocate directive that is associated with an allocate statement. This statement results in malloc and free calls. The malloc calls are easy to get to, as they are in the same block as the allocate directive, but the free calls come in a separate cleanup block. To help later passes find them, an allocate directive is generated in the cleanup block with kind=free; the normal allocate directive is given kind=allocate. gcc/fortran/ChangeLog: * gfortran.h (struct gfc_symbol): Declare new members omp_allocated and omp_allocated_end. * openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to NULL. (prepare_omp_allocated_var_list_for_cleanup): New function. (gfc_resolve_omp_allocate): Call it. * trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated. * trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt generated for allocate directive. gcc/ChangeLog: * tree-core.h (struct tree_base): Add comments. * tree-pretty-print.cc (dump_generic_node): Handle allocate directive kind. * tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define. (OMP_ALLOCATE_KIND_FREE): Likewise. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive. --- gcc/fortran/gfortran.h| 1 + gcc/fortran/openmp.cc | 30 +++ gcc/fortran/trans-decl.cc | 20 + gcc/fortran/trans-openmp.cc | 6 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 3 +- gcc/tree-core.h | 6 gcc/tree-pretty-print.cc | 4 +++ gcc/tree.h| 4 +++ 8 files changed, 73 insertions(+), 1 deletion(-) diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 755469185a6..c6f58341cf3 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -1829,6 +1829,7 @@ typedef struct gfc_symbol gfc_array_spec *as; struct gfc_symbol *result; /* function result symbol */ gfc_component *components; /* Derived type components */ + gfc_omp_namelist *omp_allocated, *omp_allocated_end; /* Defined only for Cray pointees; points to their pointer.
*/ struct gfc_symbol *cp_pointer; diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index 38003890bb0..4c94bc763b5 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -6057,6 +6057,7 @@ gfc_match_omp_allocate (void) new_st.op = EXEC_OMP_ALLOCATE; new_st.ext.omp_clauses = c; + new_st.resolved_sym = NULL; gfc_free_expr (allocator); return MATCH_YES; } @@ -9548,6 +9549,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns) } } +static void +prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc) +{ + gfc_symbol *proc = cn->sym->ns->proc_name; + gfc_omp_namelist *p, *n; + + for (n = cn; n; n = n->next) +{ + if (n->sym->attr.allocatable && !n->sym->attr.save + && !n->sym->attr.result && !proc->attr.is_main_program) + { + p = gfc_get_omp_namelist (); + p->sym = n->sym; + p->expr = gfc_copy_expr (n->expr); + p->where = loc; + p->next = NULL; + if (proc->omp_allocated == NULL) + proc->omp_allocated_end = proc->omp_allocated = p; + else + { + proc->omp_allocated_end->next = p; + proc->omp_allocated_end = p; + } + + } +} +} + static void check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al, gfc_namespace *ns, locus loc) @@ -9678,6 +9707,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns) code->loc); } } + prepare_omp_allocated_var_list_for_cleanup (cn, code->loc); } diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc index 6493cc2f6b1..326365f22fc 100644 --- a/gcc/fortran/trans-decl.cc +++ b/gcc/fortran/trans-decl.cc @@ -4588,6 +4588,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block) } } + /* Generate a dummy allocate pragma with free kind so that cleanup + of those variables which were allocated using the allocate statement + associated with an allocate clause happens correctly. 
*/ + + if (proc_sym->omp_allocated) +{ + gfc_clear_new_st (); + new_st.op = EXEC_OMP_ALLOCATE; + gfc_omp_clauses *c = gfc_get_omp_clauses (); + c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated; + new_st.ext.omp_clauses = c; + /* This is just a hacky way to convey to handler that we are + dealing with cleanup here. Saves us from using another field + for it. */ + new_st.resolved_sym = proc_sym->omp_allocated->sym; + gfc_add_init_cleanup (block, NULL, + gfc_trans_omp_directive (&new_st)); + gfc_free_omp_clauses (c); + proc_sym->omp_allocated = NULL; +} /* Initialize the INTENT(OUT) derived type dummy argu
[PATCH 07/17] openmp: allow requires unified_shared_memory
This is the front-end portion of the Unified Shared Memory implementation. It removes the "sorry, unimplemented" message in C, C++, and Fortran, and sets flag_offload_memory, but is otherwise inactive, for now. It also checks that -foffload-memory isn't set to an incompatible mode. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/fortran/ChangeLog: * openmp.cc (gfc_match_omp_requires): Allow "requires unified_shared_memory" and "unified_address". gcc/testsuite/ChangeLog: * c-c++-common/gomp/usm-1.c: New test. * c-c++-common/gomp/usm-4.c: New test. * gfortran.dg/gomp/usm-1.f90: New test. * gfortran.dg/gomp/usm-4.f90: New test. --- gcc/c/c-parser.cc| 22 -- gcc/cp/parser.cc | 22 -- gcc/fortran/openmp.cc| 13 + gcc/testsuite/c-c++-common/gomp/usm-1.c | 4 gcc/testsuite/c-c++-common/gomp/usm-4.c | 4 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 | 6 ++ gcc/testsuite/gfortran.dg/gomp/usm-4.f90 | 6 ++ 7 files changed, 73 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-1.c create mode 100644 gcc/testsuite/c-c++-common/gomp/usm-4.c create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/usm-4.f90 diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc index 9c02141e2c6..c30f67cd2da 100644 --- a/gcc/c/c-parser.cc +++ b/gcc/c/c-parser.cc @@ -22726,9 +22726,27 @@ c_parser_omp_requires (c_parser *parser) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = 
OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc index df657a3fb2b..3deafc7c928 100644 --- a/gcc/cp/parser.cc +++ b/gcc/cp/parser.cc @@ -46860,9 +46860,27 @@ cp_parser_omp_requires (cp_parser *parser, cp_token *pragma_tok) enum omp_requires this_req = (enum omp_requires) 0; if (!strcmp (p, "unified_address")) - this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + { + this_req = OMP_REQUIRES_UNIFIED_ADDRESS; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "unified_shared_memory")) - this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + { + this_req = OMP_REQUIRES_UNIFIED_SHARED_MEMORY; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + error_at (cloc, + "% is incompatible with the " + "selected %<-foffload-memory%> option"); + flag_offload_memory = OFFLOAD_MEMORY_UNIFIED; + } else if (!strcmp (p, "dynamic_allocators")) this_req = OMP_REQUIRES_DYNAMIC_ALLOCATORS; else if (!strcmp (p, "reverse_offload")) diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc index bd4ff259fe0..91bf8a3c50d 100644 --- a/gcc/fortran/openmp.cc +++ b/gcc/fortran/openmp.cc @@ -29,6 +29,7 @@ along with GCC; see the file COPYING3. 
If not see #include "diagnostic.h" #include "gomp-constants.h" #include "target-memory.h" /* For gfc_encode_character. */ +#include "options.h" /* Match an end of OpenMP directive. End of OpenMP directive is optional whitespace, followed by '\n' or comment '!'. */ @@ -5556,6 +5557,12 @@ gfc_match_omp_requires (void) requires_clause = OMP_REQ_UNIFIED_ADDRESS; if (requires_clauses & OMP_REQ_UNIFIED_ADDRESS) goto duplicate_clause; + + if (flag_offload_memory != OFFLOAD_MEMORY_UNIFIED + && flag_offload_memory != OFFLOAD_MEMORY_NONE) + gfc_error_now ("unified_address
[PATCH 11/17] Translate allocate directive (OpenMP 5.0).
gcc/fortran/ChangeLog: * trans-openmp.cc (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR. (gfc_trans_omp_allocate): New function. (gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE. gcc/ChangeLog: * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR. (dump_generic_node): Handle OMP_ALLOCATE. * tree.def (OMP_ALLOCATE): New. * tree.h (OMP_ALLOCATE_CLAUSES): Likewise. (OMP_ALLOCATE_DECL): Likewise. (OMP_ALLOCATE_ALLOCATOR): Likewise. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: New test. --- gcc/fortran/trans-openmp.cc | 44 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++ gcc/tree-core.h | 3 + gcc/tree-pretty-print.cc | 19 + gcc/tree.cc | 1 + gcc/tree.def | 4 ++ gcc/tree.h| 11 +++ 7 files changed, 154 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index de27ed52c02..3ee63e416ed 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -2728,6 +2728,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses, } } break; + case OMP_LIST_ALLOCATOR: + for (; n != NULL; n = n->next) + if (n->sym->attr.referenced) + { + tree t = gfc_trans_omp_variable (n->sym, false); + if (t != error_mark_node) + { + tree node = build_omp_clause (input_location, + OMP_CLAUSE_ALLOCATOR); + OMP_ALLOCATE_DECL (node) = t; + if (n->expr) + { + tree allocator_; + gfc_init_se (&se, NULL); + gfc_conv_expr (&se, n->expr); + allocator_ = gfc_evaluate_now (se.expr, block); + OMP_ALLOCATE_ALLOCATOR (node) = allocator_; + } + omp_clauses = gfc_trans_add_clause (node, omp_clauses); + } + } + break; case OMP_LIST_LINEAR: { gfc_expr *last_step_expr = NULL; @@ -4982,6 +5004,26 @@ gfc_trans_omp_atomic (gfc_code *code) return gfc_finish_block (&block); } +static tree +gfc_trans_omp_allocate (gfc_code *code) +{ + stmtblock_t block; + tree stmt; + + gfc_omp_clauses 
*clauses = code->ext.omp_clauses; + gcc_assert (clauses); + + gfc_start_block (&block); + stmt = make_node (OMP_ALLOCATE); + TREE_TYPE (stmt) = void_type_node; + OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses, + code->loc, false, + true); + gfc_add_expr_to_block (&block, stmt); + gfc_merge_block_scope (&block); + return gfc_finish_block (&block); +} + static tree gfc_trans_omp_barrier (void) { @@ -7488,6 +7530,8 @@ gfc_trans_omp_directive (gfc_code *code) { switch (code->op) { +case EXEC_OMP_ALLOCATE: + return gfc_trans_omp_allocate (code); case EXEC_OMP_ATOMIC: return gfc_trans_omp_atomic (code); case EXEC_OMP_BARRIER: diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 new file mode 100644 index 000..2de2b52ee44 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 @@ -0,0 +1,72 @@ +! { dg-do compile } +! { dg-additional-options "-fdump-tree-original" } + +module omp_lib_kinds + use iso_c_binding, only: c_int, c_intptr_t + implicit none + private :: c_int, c_intptr_t + integer, parameter :: omp_allocator_handle_kind = c_intptr_t + + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_null_allocator = 0 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_default_mem_alloc = 1 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_large_cap_mem_alloc = 2 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_const_mem_alloc = 3 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_high_bw_mem_alloc = 4 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_low_lat_mem_alloc = 5 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_cgroup_mem_alloc = 6 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_pteam_mem_alloc = 7 + integer (kind=omp_allocator_handle_kind), & + parameter :: omp_thread_mem_alloc = 8 +end module + + +subroutine foo(x, y, al) + use omp_lib_kinds + implicit none + +type 
:: my_type + integer :: i + integer :: j + real :: x +end type + + integer :: x + integer :: y + integer (kind=omp_allocator_handle_kind) :: al + + integer, allocatable :: var1 + integer, allocatable :: var2 + real, allocatable :: var3(:,:) + type (my_type), allocatable :: var4 + integer, pointer :: pii, parr(:) + + character, allocatable :: str1a, str1aarr(:) + character(len=5), allocatable :: str5a, str5aarr(:) + + !$
[PATCH 14/17] Lower allocate directive (OpenMP 5.0).
This patch looks for malloc/free calls that were generated by an allocate statement that is associated with an allocate directive and replaces them with GOMP_alloc and GOMP_free. gcc/ChangeLog: * omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR. (scan_omp_allocate): New. (scan_omp_1_stmt): Call it. (lower_omp_allocate): New function. (lower_omp_1): Call it. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. * gfortran.dg/gomp/allocate-7.f90: New test. * gfortran.dg/gomp/allocate-8.f90: New test. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-2.f90: New test. --- gcc/omp-low.cc| 139 ++ gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 9 ++ gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 | 13 ++ gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 | 15 ++ .../testsuite/libgomp.fortran/allocate-2.f90 | 48 ++ 5 files changed, 224 insertions(+) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-7.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-8.f90 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90 diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index cdadd6f0c96..7d1a2a0d795 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -1746,6 +1746,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_ALLOCATE: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE_ALIGNED: @@ -1963,6 +1964,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_FILTER: case OMP_CLAUSE__CONDTEMP_: + case OMP_CLAUSE_ALLOCATOR: break; case OMP_CLAUSE__CACHE_: @@ -3033,6 +3035,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for *stmt, maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true; } +/* Scan an OpenMP allocate directive.
*/ + +static void +scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx) +{ + omp_context *ctx; + ctx = new_omp_context (stmt, outer_ctx); + scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx); +} + /* Scan an OpenMP sections directive. */ static void @@ -4332,6 +4344,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p, insert_decl_map (&ctx->cb, var, var); } break; +case GIMPLE_OMP_ALLOCATE: + scan_omp_allocate (as_a (stmt), ctx); + break; default: *handled_ops_p = false; break; @@ -8768,6 +8783,125 @@ lower_omp_single_simple (gomp_single *single_stmt, gimple_seq *pre_p) gimple_seq_add_stmt (pre_p, gimple_build_label (flabel)); } +static void +lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *ctx) +{ + gomp_allocate *st = as_a (gsi_stmt (*gsi_p)); + tree clauses = gimple_omp_allocate_clauses (st); + int kind = gimple_omp_allocate_kind (st); + gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE + || kind == GF_OMP_ALLOCATE_KIND_FREE); + + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) +{ + if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR) + continue; + + bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE); + /* The allocate directives that appear in a target region must specify + an allocator clause unless a requires directive with the + dynamic_allocators clause is present in the same compilation unit. 
*/ + if (OMP_ALLOCATE_ALLOCATOR (c) == NULL_TREE + && ((omp_requires_mask & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0) + && omp_maybe_offloaded_ctx (ctx)) + error_at (OMP_CLAUSE_LOCATION (c), "% directive must" + " specify an allocator here"); + + tree var = OMP_ALLOCATE_DECL (c); + + gimple_stmt_iterator gsi = *gsi_p; + for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *stmt = gsi_stmt (gsi); + + if (gimple_code (stmt) != GIMPLE_CALL + || (allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_MALLOC)) + || (!allocate && gimple_call_fndecl (stmt) + != builtin_decl_explicit (BUILT_IN_FREE))) + continue; + const gcall *gs = as_a (stmt); + tree allocator = OMP_ALLOCATE_ALLOCATOR (c) + ? OMP_ALLOCATE_ALLOCATOR (c) + : integer_zero_node; + if (allocate) + { + tree lhs = gimple_call_lhs (gs); + if (lhs && TREE_CODE (lhs) == SSA_NAME) + { + gimple_stmt_iterator gsi2 = gsi; + gsi_next (&gsi2); + gimple *assign = gsi_stmt (gsi2); + if (gimple_code (assign) == GIMPLE_ASSIGN) + { + lhs = gimple_assign_lhs (as_a (assign)); + if (lhs == NULL_TREE + || TREE_CODE (lhs) != COMPONENT_REF) + continue; + lhs = TREE_OPERAND (lhs, 0); + } + } + + if (lhs == var) + { + unsigned HOST_WIDE_INT ialign = 0; + tree align; + if (TYPE_P (var)) + ialign = TYPE_ALIGN_UNIT (var); + else + ialign = DECL_ALIGN_UNIT (var);
[PATCH 08/17] openmp: -foffload-memory=pinned
Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. This feature only works on Linux, at present, and simply calls mlockall to enable always-on memory pinning. It requires that the ulimit feature is set high enough to accommodate all the program's memory usage. In this mode the ompx_pinned_memory_alloc feature is disabled as it is not needed and may conflict. gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/alloc-pinned-1.c: New test. 
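Since the constructor is synthesized directly in GIMPLE it can be hard to picture; the C-level shape of what `omp_enable_pinned_mode` emits is roughly the following sketch. The libgomp call is replaced by a local stub here so the example is self-contained (on Linux the real `GOMP_enable_pinned_mode` would call `mlockall (MCL_CURRENT | MCL_FUTURE)`):

```c
#include <assert.h>

static int pinned_mode_enabled;

/* Stand-in stub for libgomp's GOMP_enable_pinned_mode; here it only
   records that it was called.  */
static void
enable_pinned_mode_stub (void)
{
  pinned_mode_enabled = 1;
}

/* The shape of the compiler-emitted function: a static constructor
   that runs before main and triggers the runtime mode switch.  */
static void __attribute__((constructor))
__set_pinned_mode (void)
{
  enable_pinned_mode_stub ();
}
```

By the time `main` (or any test assertion) runs, the constructor has already fired, which is exactly why the HSA/OS setup happens "before the program proper runs".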
--- gcc/omp-builtins.def | 3 + gcc/omp-low.cc| 66 +++ .../c-c++-common/gomp/alloc-pinned-1.c| 28 libgomp/config/linux/allocator.c | 26 libgomp/libgomp.map | 1 + libgomp/target.c | 4 +- libgomp/testsuite/libgomp.c/alloc-pinned-7.c | 63 ++ 7 files changed, 190 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-7.c diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index ee5213eedcf..276dd7812f2 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -470,3 +470,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_WARNING, "GOMP_warning", BT_FN_VOID_CONST_PTR_SIZE, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ERROR, "GOMP_error", BT_FN_VOID_CONST_PTR_SIZE, ATTR_COLD_NORETURN_NOTHROW_LEAF_LIST) +DEF_GOMP_BUILTIN (BUILT_IN_GOMP_ENABLE_PINNED_MODE, + "GOMP_enable_pinned_mode", + BT_FN_VOID, ATTR_NOTHROW_LIST) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d73c165f029..ba612e5c67d 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. 
*/ + +static void +omp_enable_pinned_mode () +{ + static bool visited = false; + if (visited) +return; + visited = true; + + /* Create a new function like this: + + static void __attribute__((constructor)) + __set_pinned_mode () + { + GOMP_enable_pinned_mode (); + } + */ + + tree name = get_identifier ("__set_pinned_mode"); + tree voidfntype = build_function_type_list (void_type_node, NULL_TREE); + tree decl = build_decl (UNKNOWN_LOCATION, FUNCTION_DECL, name, voidfntype); + + TREE_STATIC (decl) = 1; + TREE_USED (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 0; + TREE_PUBLIC (decl) = 0; + DECL_UNINLINABLE (decl) = 1; + DECL_EXTERNAL (decl) = 0; + DECL_CONTEXT (decl) = NULL_TREE; + DECL_INITIAL (decl) = make_node (BLOCK); + BLOCK_SUPERCONTEXT (DECL_INITIAL (decl)) = decl; + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("constructor"), + NULL_TREE, NULL_TREE); + + tree t = build_decl (UNKNOWN_LOCATION, RESULT_DECL, NULL_TREE, + void_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_CONTEXT (t) = decl; + DECL_RESULT (decl) = t; + + push_struct_function (decl); + init_tree_ssa (cfun); + + tree calldecl = builtin_decl_explicit (BUILT_IN_GOMP_ENABLE_PINNED_MODE); + gcall *call = gimple_build_call (calldecl, 0); + + gimple_seq seq = NULL; + gimple_seq_add_stmt (&seq, call); + gimple_set_body (decl, gimple_build_bind (NULL_TREE, seq, NULL)); + + cfun->function_end_locus = UNKNOWN_LOCATION; + cfun->curr_properties |= PROP_gimple_any; + pop_cfun (); + cgraph_node::add_new_function (decl, true); +} + /* Main entry point. */ static unsigned int @@ -14676,6 +14738,10 @@ execute_lower_omp (void) for (auto task_stmt : task_cpyfns) finalize_task_copyfn (task_stmt); task_cpyfns.release (); + + if (flag_offload_memory == OFFLOAD_MEMORY_PINNED) +omp_enable_pinned_mode (); + return 0; } diff --git a/gcc/testsuite/c-c++-common/gomp/alloc-pinned-1.c b/gcc/testsuite/c-c++-common/gomp/alloc-pi
[PATCH 13/17] Gimplify allocate directive (OpenMP 5.0).
gcc/ChangeLog: * doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE. * gimple-pretty-print.cc (dump_gimple_omp_allocate): New function. (pp_gimple_stmt_1): Call it. * gimple.cc (gimple_build_omp_allocate): New function. * gimple.def (GIMPLE_OMP_ALLOCATE): New node. * gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK, GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE. (struct gomp_allocate): New. (is_a_helper ::test): New. (is_a_helper ::test): New. (gimple_build_omp_allocate): Declare. (gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with GIMPLE_OMP_ALLOCATE. (gimple_omp_allocate_set_clauses): New. (gimple_omp_allocate_set_kind): Likewise. (gimple_omp_allocate_clauses): Likewise. (gimple_omp_allocate_kind): Likewise. (CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE. * gimplify.cc (gimplify_omp_allocate): New. (gimplify_expr): Call it. * gsstruct.def (GSS_OMP_ALLOCATE): Define. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-6.f90: Add tests. --- gcc/doc/gimple.texi | 38 +++- gcc/gimple-pretty-print.cc| 37 gcc/gimple.cc | 12 gcc/gimple.def| 6 ++ gcc/gimple.h | 60 ++- gcc/gimplify.cc | 19 ++ gcc/gsstruct.def | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 4 +- 8 files changed, 173 insertions(+), 4 deletions(-) diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi index dd9149377f3..67b9061f3a7 100644 --- a/gcc/doc/gimple.texi +++ b/gcc/doc/gimple.texi @@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values (layouts) and + gomp_continue |layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE | + + gomp_allocate + |layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE + | + gomp_atomic_load |layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD | @@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE instruction set. 
@item @code{GIMPLE_GOTO} @tab x @tab x @item @code{GIMPLE_LABEL} @tab x @tab x @item @code{GIMPLE_NOP} @tab x @tab x +@item @code{GIMPLE_OMP_ALLOCATE} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_LOAD} @tab x @tab x @item @code{GIMPLE_OMP_ATOMIC_STORE} @tab x @tab x @item @code{GIMPLE_OMP_CONTINUE} @tab x @tab x @@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}. * @code{GIMPLE_LABEL}:: * @code{GIMPLE_GOTO}:: * @code{GIMPLE_NOP}:: +* @code{GIMPLE_OMP_ALLOCATE}:: * @code{GIMPLE_OMP_ATOMIC_LOAD}:: * @code{GIMPLE_OMP_ATOMIC_STORE}:: * @code{GIMPLE_OMP_CONTINUE}:: @@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement. Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}. @end deftypefn +@node @code{GIMPLE_OMP_ALLOCATE} +@subsection @code{GIMPLE_OMP_ALLOCATE} +@cindex @code{GIMPLE_OMP_ALLOCATE} + +@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @ +tree clauses, int kind) +Build a @code{GIMPLE_OMP_ALLOCATE} statement. @code{CLAUSES} is the clauses +associated with this node. @code{KIND} is the enumeration value +@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory +or @code{GF_OMP_ALLOCATE_KIND_FREE} if it de-allocates. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @ +gomp_allocate *g, tree clauses) +Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} tree gimple_omp_aallocate_clauses ( @ +const gomp_allocate *g) +Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @ +gomp_allocate *g, int kind) +Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}. +@end deftypefn + +@deftypefn {GIMPLE function} tree gimple_omp_allocate_kind ( @ +const gomp_atomic_load *g) +Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}. 
+@end deftypefn + @node @code{GIMPLE_OMP_ATOMIC_LOAD} @subsection @code{GIMPLE_OMP_ATOMIC_LOAD} @cindex @code{GIMPLE_OMP_ATOMIC_LOAD} @@ -1760,7 +1797,6 @@ const gomp_atomic_load *g) Get the @code{RHS} of an atomic set. @end deftypefn - @node @code{GIMPLE_OMP_ATOMIC_STORE} @subsection @code{GIMPLE_OMP_ATOMIC_STORE} @cindex @code{GIMPLE_OMP_ATOMIC_STORE} diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc index ebd87b20a0a..bb961a900df 100644 --- a/gcc/gimple-pretty-print.cc +++ b/gcc/gimple-pretty-print.cc @@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *buffer, const gomp_critical *gs, } } +static void +dump_gimple_omp_allocate (pretty_printer *buffer, const gomp_allocate *gs, + int spc, dump_flags_t fl
[PATCH 10/17] Add parsing support for allocate directive (OpenMP 5.0)
Currently we only make use of this directive when it is associated with an allocate statement. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_ALLOCATE. (show_code_node): Likewise. * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE. (OMP_LIST_ALLOCATOR): New enum value. (enum gfc_exec_op): Add EXEC_OMP_ALLOCATE. * match.h (gfc_match_omp_allocate): New function. * openmp.cc (enum omp_mask1): Add OMP_CLAUSE_ALLOCATOR. (OMP_ALLOCATE_CLAUSES): New define. (gfc_match_omp_allocate): New function. (resolve_omp_clauses): Add ALLOCATOR in clause_names. (omp_code_to_statement): Handle EXEC_OMP_ALLOCATE. (EMPTY_VAR_LIST): New define. (check_allocate_directive_restrictions): New function. (gfc_resolve_omp_allocate): Likewise. (gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE. * parse.cc (decode_omp_directive): Handle ST_OMP_ALLOCATE. (next_statement): Likewise. (gfc_ascii_statement): Likewise. * resolve.cc (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE. * st.cc (gfc_free_statement): Likewise. 
* trans.cc (trans_code): Likewise --- gcc/fortran/dump-parse-tree.cc| 3 + gcc/fortran/gfortran.h| 4 +- gcc/fortran/match.h | 1 + gcc/fortran/openmp.cc | 199 +- gcc/fortran/parse.cc | 10 +- gcc/fortran/resolve.cc| 1 + gcc/fortran/st.cc | 1 + gcc/fortran/trans.cc | 1 + gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++ gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 | 73 +++ 10 files changed, 400 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 diff --git a/gcc/fortran/dump-parse-tree.cc b/gcc/fortran/dump-parse-tree.cc index 5352008a63d..e0c6c0d9d96 100644 --- a/gcc/fortran/dump-parse-tree.cc +++ b/gcc/fortran/dump-parse-tree.cc @@ -2003,6 +2003,7 @@ show_omp_node (int level, gfc_code *c) case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break; case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break; case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break; +case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break; case EXEC_OMP_ATOMIC: name = "ATOMIC"; break; case EXEC_OMP_BARRIER: name = "BARRIER"; break; case EXEC_OMP_CANCEL: name = "CANCEL"; break; @@ -2204,6 +2205,7 @@ show_omp_node (int level, gfc_code *c) || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR + || c->op == EXEC_OMP_ALLOCATE || (c->op == EXEC_OMP_ORDERED && c->block == NULL)) return; if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS) @@ -3329,6 +3331,7 @@ show_code_node (int level, gfc_code *c) case EXEC_OACC_CACHE: case EXEC_OACC_ENTER_DATA: case EXEC_OACC_EXIT_DATA: +case EXEC_OMP_ALLOCATE: case EXEC_OMP_ATOMIC: case EXEC_OMP_CANCEL: case EXEC_OMP_CANCELLATION_POINT: diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 696aadd7db6..755469185a6 100644 --- a/gcc/fortran/gfortran.h +++ 
b/gcc/fortran/gfortran.h @@ -259,7 +259,7 @@ enum gfc_statement ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP, ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL, ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE, - ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, + ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE, ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC, ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED, ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS, @@ -1398,6 +1398,7 @@ enum OMP_LIST_USE_DEVICE_ADDR, OMP_LIST_NONTEMPORAL, OMP_LIST_ALLOCATE, + OMP_LIST_ALLOCATOR, OMP_LIST_HAS_DEVICE_ADDR, OMP_LIST_ENTER, OMP_LIST_NUM /* Must be the last. */ @@ -2908,6 +2909,7 @@ enum gfc_exec_op EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA, EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE, + EXEC_OMP_ALLOCATE, EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER, EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO, EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE, diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 495c93e0b5c..fe43d4b3fd3 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -149,6 +149,7 @@ match gfc_match_oacc_routine (
[PATCH 17/17] amdgcn: libgomp plugin USM implementation
Implement the Unified Shared Memory API calls in the GCN plugin. The allocate and free are pretty straight-forward because all "target" memory allocations are compatible with USM, on the right hardware. However, there's no known way to check what memory region was used, after the fact, so we use a splay tree to record allocations so we can answer "is_usm_ptr" later. libgomp/ChangeLog: * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Allow GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (struct usm_splay_tree_key_s): New. (usm_splay_compare): New. (splay_tree_prefix): New. (GOMP_OFFLOAD_usm_alloc): New. (GOMP_OFFLOAD_usm_free): New. (GOMP_OFFLOAD_is_usm_ptr): New. (GOMP_OFFLOAD_supported_features): Move into the OpenMP API fold. Add GOMP_REQUIRES_UNIFIED_ADDRESS and GOMP_REQUIRES_UNIFIED_SHARED_MEMORY. (gomp_fatal): New. (splay_tree_c): New. * testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New. * testsuite/libgomp.c++/usm-1.C: Use dg-require-effective-target. * testsuite/libgomp.c-c++-common/requires-1.c: Likewise. * testsuite/libgomp.c/usm-1.c: Likewise. * testsuite/libgomp.c/usm-2.c: Likewise. * testsuite/libgomp.c/usm-3.c: Likewise. * testsuite/libgomp.c/usm-4.c: Likewise. * testsuite/libgomp.c/usm-5.c: Likewise. * testsuite/libgomp.c/usm-6.c: Likewise. 
--- libgomp/plugin/plugin-gcn.c | 104 +- libgomp/testsuite/lib/libgomp.exp | 22 libgomp/testsuite/libgomp.c++/usm-1.C | 2 +- .../libgomp.c-c++-common/requires-1.c | 1 + libgomp/testsuite/libgomp.c/usm-1.c | 1 + libgomp/testsuite/libgomp.c/usm-2.c | 1 + libgomp/testsuite/libgomp.c/usm-3.c | 1 + libgomp/testsuite/libgomp.c/usm-4.c | 1 + libgomp/testsuite/libgomp.c/usm-5.c | 2 +- libgomp/testsuite/libgomp.c/usm-6.c | 2 +- 10 files changed, 133 insertions(+), 4 deletions(-) diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c index ea327bf2ca0..6a9ff5cd93e 100644 --- a/libgomp/plugin/plugin-gcn.c +++ b/libgomp/plugin/plugin-gcn.c @@ -3226,7 +3226,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask) if (!init_hsa_context ()) return 0; /* Return -1 if no omp_requires_mask cannot be fulfilled but - devices were present. */ + devices were present. + Note: not all devices support USM, but the compiler refuses to create + binaries for those that don't anyway. */ + omp_requires_mask &= ~(GOMP_REQUIRES_UNIFIED_ADDRESS + | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY); if (hsa_context.agent_count > 0 && omp_requires_mask != 0) return -1; return hsa_context.agent_count; @@ -3810,6 +3814,89 @@ GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars, GOMP_PLUGIN_target_task_completion, async_data); } +/* Use a splay tree to track USM allocations. */ + +typedef struct usm_splay_tree_node_s *usm_splay_tree_node; +typedef struct usm_splay_tree_s *usm_splay_tree; +typedef struct usm_splay_tree_key_s *usm_splay_tree_key; + +struct usm_splay_tree_key_s { + void *addr; + size_t size; +}; + +static inline int +usm_splay_compare (usm_splay_tree_key x, usm_splay_tree_key y) +{ + if ((x->addr <= y->addr && x->addr + x->size > y->addr) + || (y->addr <= x->addr && y->addr + y->size > x->addr)) +return 0; + + return (x->addr > y->addr ? 
1 : -1); +} + +#define splay_tree_prefix usm +#include "../splay-tree.h" + +static struct usm_splay_tree_s usm_map = { NULL }; + +/* Allocate memory suitable for Unified Shared Memory. + + In fact, AMD memory need only be "coarse grained", which target + allocations already are. We do need to track allocations so that + GOMP_OFFLOAD_is_usm_ptr can look them up. */ + +void * +GOMP_OFFLOAD_usm_alloc (int device, size_t size) +{ + void *ptr = GOMP_OFFLOAD_alloc (device, size); + + usm_splay_tree_node node = malloc (sizeof (struct usm_splay_tree_node_s)); + node->key.addr = ptr; + node->key.size = size; + node->left = NULL; + node->right = NULL; + usm_splay_tree_insert (&usm_map, node); + + return ptr; +} + +/* Free memory allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_usm_free (int device, void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + usm_splay_tree_key node = usm_splay_tree_lookup (&usm_map, &key); + if (node) +{ + usm_splay_tree_remove (&usm_map, &key); + free (node); +} + + return GOMP_OFFLOAD_free (device, ptr); +} + +/* True if the memory was allocated via GOMP_OFFLOAD_usm_alloc. */ + +bool +GOMP_OFFLOAD_is_usm_ptr (void *ptr) +{ + struct usm_splay_tree_key_s key = { ptr, 1 }; + return usm_splay_tree_lookup (&usm_map, &key); +} + +/* Indicate which GOMP_REQUIRES_* features are supported. */ + +bool +GO
[PATCH 15/17] amdgcn: Support XNACK mode
The XNACK feature allows memory load instructions to restart safely following a page-miss interrupt. This is useful for shared-memory devices, like APUs, and to implement OpenMP Unified Shared Memory. To support the feature we must be able to set the appropriate meta-data and set the load instructions to early-clobber. When the port supports scheduling of s_waitcnt instructions there will be further requirements. gcc/ChangeLog: * config/gcn/gcn-hsa.h (XNACKOPT): New macro. (ASM_SPEC): Use XNACKOPT. * config/gcn/gcn-opts.h (enum sram_ecc_type): Rename to ... (enum hsaco_attr_type): ... this, and generalize the names. (TARGET_XNACK): New macro. * config/gcn/gcn-valu.md (gather_insn_1offset): Add xnack compatible alternatives. (gather_insn_2offsets): Likewise. * config/gcn/gcn.c (gcn_option_override): Permit -mxnack for devices other than Fiji. (gcn_expand_epilogue): Remove early-clobber problems. (output_file_start): Emit xnack attributes. (gcn_hsa_declare_function_name): Obey -mxnack setting. * config/gcn/gcn.md (xnack): New attribute. (enabled): Rework to include "xnack" attribute. (*movbi): Add xnack compatible alternatives. (*mov_insn): Likewise. (*mov_insn): Likewise. (*mov_insn): Likewise. (*movti_insn): Likewise. * config/gcn/gcn.opt (-mxnack): Add the "on/off/any" syntax. (sram_ecc_type): Rename to ... (hsaco_attr_type: ... this.) * config/gcn/mkoffload.c (SET_XNACK_ANY): New macro. (TEST_XNACK): Delete. (TEST_XNACK_ANY): New macro. (TEST_XNACK_ON): New macro. (main): Support the new -mxnack=on/off/any syntax. 
--- gcc/config/gcn/gcn-hsa.h| 3 +- gcc/config/gcn/gcn-opts.h | 10 ++-- gcc/config/gcn/gcn-valu.md | 29 - gcc/config/gcn/gcn.cc | 34 ++- gcc/config/gcn/gcn.md | 113 +++- gcc/config/gcn/gcn.opt | 18 +++--- gcc/config/gcn/mkoffload.cc | 19 -- 7 files changed, 140 insertions(+), 86 deletions(-) diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h index b3079cebb43..fd08947574f 100644 --- a/gcc/config/gcn/gcn-hsa.h +++ b/gcc/config/gcn/gcn-hsa.h @@ -81,12 +81,13 @@ extern unsigned int gcn_local_sym_hash (const char *name); /* In HSACOv4 no attribute setting means the binary supports "any" hardware configuration. The name of the attribute also changed. */ #define SRAMOPT "msram-ecc=on:-mattr=+sramecc;msram-ecc=off:-mattr=-sramecc" +#define XNACKOPT "mxnack=on:-mattr=+xnack;mxnack=off:-mattr=-xnack" /* Use LLVM assembler and linker options. */ #define ASM_SPEC "-triple=amdgcn--amdhsa " \ "%:last_arg(%{march=*:-mcpu=%*}) " \ "%{!march=*|march=fiji:--amdhsa-code-object-version=3} " \ - "%{" NO_XNACK "mxnack:-mattr=+xnack;:-mattr=-xnack} " \ + "%{" NO_XNACK XNACKOPT "}" \ "%{" NO_SRAM_ECC SRAMOPT "} " \ "-filetype=obj" #define LINK_SPEC "--pie --export-dynamic" diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h index b62dfb45f59..07ddc79cda3 100644 --- a/gcc/config/gcn/gcn-opts.h +++ b/gcc/config/gcn/gcn-opts.h @@ -48,11 +48,13 @@ extern enum gcn_isa { #define TARGET_M0_LDS_LIMIT (TARGET_GCN3) #define TARGET_PACKED_WORK_ITEMS (TARGET_CDNA2_PLUS) -enum sram_ecc_type +#define TARGET_XNACK (flag_xnack != HSACO_ATTR_OFF) + +enum hsaco_attr_type { - SRAM_ECC_OFF, - SRAM_ECC_ON, - SRAM_ECC_ANY + HSACO_ATTR_OFF, + HSACO_ATTR_ON, + HSACO_ATTR_ANY }; #endif diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index abe46201344..ec114db9dd1 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -741,13 +741,13 @@ (define_expand "gather_expr" {}) (define_insn "gather_insn_1offset" - [(set (match_operand:V_ALL 0 
"register_operand" "=v") + [(set (match_operand:V_ALL 0 "register_operand" "=v,&v") (unspec:V_ALL - [(plus: (match_operand: 1 "register_operand" " v") + [(plus: (match_operand: 1 "register_operand" " v, v") (vec_duplicate: - (match_operand 2 "immediate_operand" " n"))) - (match_operand 3 "immediate_operand" " n") - (match_operand 4 "immediate_operand" " n") + (match_operand 2 "immediate_operand" " n, n"))) + (match_operand 3 "immediate_operand" " n, n") + (match_operand 4 "immediate_operand" " n, n") (mem:BLK (scratch))] UNSPEC_GATHER))] "(AS_FLAT_P (INTVAL (operands[3])) @@ -777,7 +777,8 @@ (define_insn "gather_insn_1offset" return buf; } [(set_attr "type" "flat") - (set_attr "length" "12")]) + (set_attr "length" "12") + (set_attr "xnack" "off,on")]) (define_insn "gather_insn_1offset_ds" [(set (match_operand:V_ALL 0 "register_operand" "=v") @@ -802,17 +803,18 @@ (define_insn "gather_insn_1offset_ds" (set_attr "length" "12")]) (define_insn "gather_insn_2o
[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK
The AMD GCN runtime must be set to the correct mode for Unified Shared Memory to work, but this is not always clear at compile and link time due to the split nature of the offload compilation pipeline. This patch sets a new attribute on OpenMP offload functions to ensure that the information is passed all the way to the backend. The backend then places a marker in the assembler code for mkoffload to find. Finally mkoffload places a constructor function into the final program to ensure that the HSA_XNACK environment variable passes the correct mode to the GPU. The HSA_XNACK variable must be set before the HSA runtime is even loaded, so it makes more sense to have this set within the constructor than at some point later within libgomp or the GCN plugin. gcc/ChangeLog: * config/gcn/gcn.c (unified_shared_memory_enabled): New variable. (gcn_init_cumulative_args): Handle attribute "omp unified memory". (gcn_hsa_declare_function_name): Emit "MKOFFLOAD OPTIONS: USM+". * config/gcn/mkoffload.c (TEST_XNACK_OFF): New macro. (process_asm): Detect "MKOFFLOAD OPTIONS: USM+". Emit configure_xnack constructor, as required. * omp-low.c (create_omp_child_function): Add attribute "omp unified memory". --- gcc/config/gcn/gcn.cc | 28 +++- gcc/config/gcn/mkoffload.cc | 37 - gcc/omp-low.cc | 4 3 files changed, 67 insertions(+), 2 deletions(-) diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index 4df05453604..88cc505597e 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -68,6 +68,11 @@ static bool ext_gcn_constants_init = 0; enum gcn_isa gcn_isa = ISA_GCN3; /* Default to GCN3. */ +/* Record whether the host compiler added "omp unifed memory" attributes to + any functions. We can then pass this on to mkoffload to ensure xnack is + compatible there too. */ +static bool unified_shared_memory_enabled = false; + /* Reserve this much space for LDS (for propagating variables from worker-single mode to worker-partitioned mode), per workgroup. 
Global analysis could calculate an exact bound, but we don't do that yet. @@ -2542,6 +2547,25 @@ gcn_init_cumulative_args (CUMULATIVE_ARGS *cum /* Argument info to init */ , if (!caller && cfun->machine->normal_function) gcn_detect_incoming_pointer_arg (fndecl); + if (fndecl && lookup_attribute ("omp unified memory", + DECL_ATTRIBUTES (fndecl))) +{ + unified_shared_memory_enabled = true; + + switch (gcn_arch) + { + case PROCESSOR_FIJI: + case PROCESSOR_VEGA10: + case PROCESSOR_VEGA20: + error ("GPU architecture does not support Unified Shared Memory"); + default: + ; + } + + if (flag_xnack == HSACO_ATTR_OFF) + error ("Unified Shared Memory is enabled, but XNACK is disabled"); +} + reinit_regs (); } @@ -5458,12 +5482,14 @@ gcn_hsa_declare_function_name (FILE *file, const char *name, tree) assemble_name (file, name); fputs (":\n", file); - /* This comment is read by mkoffload. */ + /* These comments are read by mkoffload. */ if (flag_openacc) fprintf (file, "\t;; OPENACC-DIMS: %d, %d, %d : %s\n", oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_GANG), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_WORKER), oacc_get_fn_dim_size (cfun->decl, GOMP_DIM_VECTOR), name); + if (unified_shared_memory_enabled) +fprintf (asm_out_file, "\t;; MKOFFLOAD OPTIONS: USM+\n"); } /* Implement TARGET_ASM_SELECT_SECTION. 
diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc index cb8903c27cb..5741d0a917b 100644 --- a/gcc/config/gcn/mkoffload.cc +++ b/gcc/config/gcn/mkoffload.cc @@ -80,6 +80,8 @@ == EF_AMDGPU_FEATURE_XNACK_ANY_V4) #define TEST_XNACK_ON(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ == EF_AMDGPU_FEATURE_XNACK_ON_V4) +#define TEST_XNACK_OFF(VAR) ((VAR & EF_AMDGPU_FEATURE_XNACK_V4) \ + == EF_AMDGPU_FEATURE_XNACK_OFF_V4) #define SET_SRAM_ECC_ON(VAR) VAR = ((VAR & ~EF_AMDGPU_FEATURE_SRAMECC_V4) \ | EF_AMDGPU_FEATURE_SRAMECC_ON_V4) @@ -474,6 +476,7 @@ static void process_asm (FILE *in, FILE *out, FILE *cfile) { int fn_count = 0, var_count = 0, dims_count = 0, regcount_count = 0; + bool unified_shared_memory_enabled = false; struct obstack fns_os, dims_os, regcounts_os; obstack_init (&fns_os); obstack_init (&dims_os); @@ -498,6 +501,7 @@ process_asm (FILE *in, FILE *out, FILE *cfile) fn_count += 2; char buf[1000]; + char dummy; enum { IN_CODE, IN_METADATA, @@ -517,6 +521,9 @@ process_asm (FILE *in, FILE *out, FILE *cfile) dims_count++; } + if (sscanf (buf, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0) + unified_shared_memory_enabled = true; + break; } case IN_METADATA: @@ -565,7 +572,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile) } } - char dummy; if (sscanf (buf, " .section
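The mkoffload side recognizes the backend's marker with a plain `sscanf` over each assembler line; the `%c` conversion only succeeds if something (normally the newline) follows `USM+`. The matching can be exercised in isolation (`has_usm_marker` is a hypothetical wrapper around the same format string):

```c
#include <stdio.h>

/* Returns 1 if LINE carries the marker that
   gcn_hsa_declare_function_name emits for mkoffload, mirroring the
   sscanf test added to process_asm.  */
static int
has_usm_marker (const char *line)
{
  char dummy;
  return sscanf (line, " ;; MKOFFLOAD OPTIONS: USM+%c", &dummy) > 0;
}
```

The leading space in the format skips the tab the backend emits, so the check is indentation-tolerant, while unrelated comment lines (such as the OPENACC-DIMS marker) fail on the first literal mismatch.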
Re: Enhance 'libgomp.c-c++-common/requires-4.c', 'libgomp.c-c++-common/requires-5.c' testing (was: [Patch][v4] OpenMP: Move omp requires checks to libgomp)
Hi Tobias! On 2022-07-07T11:36:34+0200, Tobias Burnus wrote: > On 07.07.22 10:42, Thomas Schwinge wrote: >> In preparation for other changes: > ... >> On 2022-06-29T16:33:02+0200, Tobias Burnus wrote: >>> +/* { dg-output "devices present but 'omp requires unified_address, >>> unified_shared_memory, reverse_offload' cannot be fulfilled" } */ >> (The latter diagnostic later got conditionalized by 'GOMP_DEBUG=1'.) >> OK to push the attached "Enhance 'libgomp.c-c++-common/requires-4.c', >> 'libgomp.c-c++-common/requires-5.c' testing"? > ... >> libgomp/ >> * testsuite/libgomp.c-c++-common/requires-4.c: Enhance testing. >> * testsuite/libgomp.c-c++-common/requires-5.c: Likewise. > ... >> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-4.c >> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-4.c >> @@ -1,22 +1,29 @@ >> -/* { dg-do link { target offloading_enabled } } */ >> /* { dg-additional-options "-flto" } */ >> /* { dg-additional-sources requires-4-aux.c } */ >> >> -/* Check diagnostic by device-compiler's or host compiler's lto1. >> +/* Check no diagnostic by device-compiler's or host compiler's lto1. > > I note that without ENABLE_OFFLOADING that there is never any lto1 > diagnostic. > > However, given that no diagnostic is expected, it also works for "! > offloading_enabled". > > Thus, the change fine. ACK. >> Other file uses: 'requires reverse_offload', but that's inactive as >> there are no declare target directives, device constructs nor device >> routines */ >> >> +/* For actual offload execution, prints the following (only) if >> GOMP_DEBUG=1: >> + "devices present but 'omp requires unified_address, >> unified_shared_memory, reverse_offload' cannot be fulfilled" >> + and does host-fallback execution. */ > > The latter is only true when also device code is produced – and a device > is available for that/those device types. I think that's what you imply > by "For actual offload execution" ACK. > but it is a bit hidden. 
> > Maybe s/For actual offload execution, prints/It may print/ is clearer? I've settled on: /* Depending on offload device capabilities, it may print something like the following (only) if GOMP_DEBUG=1: "devices present but 'omp requires unified_address, unified_shared_memory, reverse_offload' cannot be fulfilled" and in that case does host-fallback execution. */ > In principle, it would be nice if we could test for the output, but > currently setting an env var for remote execution does not work, yet. > Cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597773.html Right, I'm aware of that issue with remote testing, and that's why I didn't propose such output verification. (In a few other test cases, we do have 'dg-set-target-env-var GOMP_DEBUG "1"', which then at present are UNSUPPORTED for remote testing.) > When set, we could use offload_target_nvptx etc. (..._amdgcn, ..._any) > to test – as this guarantees that it is compiled for that device + the > device is available. Use 'target offload_device_nvptx', not 'target offload_target_nvptx', etc. ;-) >> + >> #pragma omp requires unified_address,unified_shared_memory >> >> -int a[10]; >> +int a[10] = { 0 }; >> extern void foo (void); >> >> int >> main (void) >> { >> - #pragma omp target >> + #pragma omp target map(to: a) > > Hmm, I wonder whether I like it or not. Without, there is an implicit > "map(tofrom:a)". On the other hand, OpenMP permits that – even with > unified-shared memory – the implementation my copy the data to the > device. (For instance, to permit faster access to "a".) > > Thus, ... > >> + for (int i = 0; i < 10; i++) >> +a[i] = i; >> + >> for (int i = 0; i < 10; i++) >> -a[i] = 0; >> +if (a[i] != i) >> + __builtin_abort (); > ... this condition (back on the host) could also fail with USM. However, > given that to my knowledge no USM implementation actually copies the > data, I believe it is fine. 
Right, this is meant to describe/test the current GCC master branch behavior, where USM isn't supported, so I didn't consider that. But I agree, a source code comment should be added: As no offload devices support USM at present, we may verify host-fallback execution by absence of separate memory spaces. */ > (Disclaimer: I have not checked what OG12, > but I guess it also does not copy it.) >> --- a/libgomp/testsuite/libgomp.c-c++-common/requires-5.c >> +++ b/libgomp/testsuite/libgomp.c-c++-common/requires-5.c >> @@ -1,21 +1,25 @@ >> -/* { dg-do run { target { offload_target_nvptx || offload_target_amdgcn } } >> } */ >> /* { dg-additional-sources requires-5-aux.c } */ >> >> +/* For actual offload execution, prints the following (only) if >> GOMP_DEBUG=1: >> + "devices present but 'omp requires unified_address, >> unified_shared_memory, reverse_offload' cannot be fulfilled" >> + and does host-fallback execution. */ >> + > This wording is correct with the now-removed ch
[statistics.cc] ICE in get_function_name with fortran test-case
Hi, My recent commit to emit asm name with -fdump-statistics-asmname caused the following ICE for the attached fortran test case. during IPA pass: icf power.fppized.f90:6:26: 6 | END SUBROUTINE power_print | ^ internal compiler error: Segmentation fault 0xfddc13 crash_signal ../../gcc/gcc/toplev.cc:322 0x7f6f940de51f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0xfc909d get_function_name ../../gcc/gcc/statistics.cc:124 0xfc929f statistics_fini_pass_2(statistics_counter**, void*) ../../gcc/gcc/statistics.cc:175 0xfc94a4 void hash_table::traverse_noresize(void*) ../../gcc/gcc/hash-table.h:1084 0xfc94a4 statistics_fini_pass() ../../gcc/gcc/statistics.cc:219 0xef12bc execute_todo ../../gcc/gcc/passes.cc:2142 This happens because fn was passed as NULL to get_function_name. The patch adds a check to see if fn is NULL before checking DECL_ASSEMBLER_NAME_SET_P, which fixes the issue. In case fn is NULL, it calls function_name (NULL) as per the old behavior, which returns "(nofn)". Bootstrap+tested on x86_64-linux-gnu. OK to commit? Thanks, Prathamesh diff --git a/gcc/statistics.cc b/gcc/statistics.cc index 6c21415bf65..01ad353e3a9 100644 --- a/gcc/statistics.cc +++ b/gcc/statistics.cc @@ -121,7 +121,7 @@ static const char * get_function_name (struct function *fn) { if ((statistics_dump_flags & TDF_ASMNAME) - && DECL_ASSEMBLER_NAME_SET_P (fn->decl)) + && fn && DECL_ASSEMBLER_NAME_SET_P (fn->decl)) { tree asmname = decl_assembler_name (fn->decl); if (asmname)
Re: Fix Intel MIC 'mkoffload' for OpenMP 'requires' (was: [Patch] OpenMP: Move omp requires checks to libgomp)
Hi Tobias! On 2022-07-06T15:30:57+0200, Tobias Burnus wrote: > On 06.07.22 14:38, Thomas Schwinge wrote: >> :-) Haha, that's actually *exactly* what I had implemented first! But >> then I realized that 'target offloading_enabled' is doing exactly that: >> check that offloading compilation is configured -- not that "there is an >> offloading device available or not" as you seem to understand? Or am I >> confused there? > > I think as you mentioned below – there is a difference. Eh, thanks for un-confusing me on that aspect! There's a reason after all that 'offloading_enabled' lives in 'gcc/testsuite/lib/'... > And that difference, > I explicitly maked use of: [...] > Granted, as the other files do not use -foffload=..., it should not > make a difference - but, still, replacing it unconditionally > with 'target offloading_enabled' feels wrong. ACK! I've pushed to master branch commit 9ef714539cb7cc1cd746312fd5dcc987bf167471 "Fix Intel MIC 'mkoffload' for OpenMP 'requires'", see attached. Grüße Thomas - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 >From 9ef714539cb7cc1cd746312fd5dcc987bf167471 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Tue, 5 Jul 2022 12:21:33 +0200 Subject: [PATCH] Fix Intel MIC 'mkoffload' for OpenMP 'requires' Similar to how the other 'mkoffload's got changed in recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". This also means finally switching Intel MIC 'mkoffload' to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver', making 'GOMP_offload_register', 'GOMP_offload_unregister' legacy entry points. gcc/ * config/i386/intelmic-mkoffload.cc (generate_host_descr_file) (prepare_target_image, main): Handle OpenMP 'requires'. 
(generate_host_descr_file): Switch to 'GOMP_offload_register_ver', 'GOMP_offload_unregister_ver'. libgomp/ * target.c (GOMP_offload_register, GOMP_offload_unregister): Denote as legacy entry points. * testsuite/lib/libgomp.exp (check_effective_target_offload_target_any): New proc. * testsuite/libgomp.c-c++-common/requires-1.c: Enable for 'offload_target_any'. * testsuite/libgomp.c-c++-common/requires-3.c: Likewise. * testsuite/libgomp.c-c++-common/requires-7.c: Likewise. * testsuite/libgomp.fortran/requires-1.f90: Likewise. --- gcc/config/i386/intelmic-mkoffload.cc | 56 +++ libgomp/target.c | 4 ++ libgomp/testsuite/lib/libgomp.exp | 5 ++ .../libgomp.c-c++-common/requires-1.c | 2 +- .../libgomp.c-c++-common/requires-3.c | 2 +- .../libgomp.c-c++-common/requires-7.c | 2 +- .../testsuite/libgomp.fortran/requires-1.f90 | 2 +- 7 files changed, 57 insertions(+), 16 deletions(-) diff --git a/gcc/config/i386/intelmic-mkoffload.cc b/gcc/config/i386/intelmic-mkoffload.cc index c683d6f473e..596f6f107b8 100644 --- a/gcc/config/i386/intelmic-mkoffload.cc +++ b/gcc/config/i386/intelmic-mkoffload.cc @@ -370,7 +370,7 @@ generate_target_offloadend_file (const char *target_compiler) /* Generates object file with the host side descriptor. 
*/ static const char * -generate_host_descr_file (const char *host_compiler) +generate_host_descr_file (const char *host_compiler, uint32_t omp_requires) { char *dump_filename = concat (dumppfx, "_host_descr.c", NULL); const char *src_filename = save_temps @@ -386,39 +386,50 @@ generate_host_descr_file (const char *host_compiler) if (!src_file) fatal_error (input_location, "cannot open '%s'", src_filename); + fprintf (src_file, "#include \n\n"); + fprintf (src_file, "extern const void *const __OFFLOAD_TABLE__;\n" "extern const void *const __offload_image_intelmic_start;\n" "extern const void *const __offload_image_intelmic_end;\n\n" - "static const void *const __offload_target_data[] = {\n" + "static const struct intelmic_data {\n" + " uintptr_t omp_requires_mask;\n" + " const void *const image_start;\n" + " const void *const image_end;\n" + "} intelmic_data = {\n" + " %d,\n" " &__offload_image_intelmic_start, &__offload_image_intelmic_end\n" - "};\n\n"); + "};\n\n", omp_requires); fprintf (src_file, "#ifdef __cplusplus\n" "extern \"C\"\n" "#endif\n" - "void GOMP_offload_register (const void *, int, const void *);\n" + "void GOMP_offload_register_ver (unsigned, const void *, int, const void *);\n" "#ifdef __cplusplus\n" "extern \"C\"\n" "#endif\n" - "void GOMP_offload_unregister (const void *, int, const void *);\n\n" + "void GOMP_offload_unregister_ver (unsigned, const void *, int, const void *);\n\n" "__attribute__((constructor))\n" "static void\n" "init (void)\n" "{\n" - " GOMP_offload_regi
[PATCH] Speedup update-ssa some more
The following avoids copying an sbitmap and one traversal by avoiding to re-allocate old_ssa_names when not necessary. In addition this actually checks what the comment before PHI insert iterating promises, that the old_ssa_names set does not grow. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. * tree-into-ssa.cc (iterating_old_ssa_names): New. (add_new_name_mapping): Grow {new,old}_ssa_names separately and only when actually needed. Assert we are not growing the old_ssa_names set when iterating over it. (update_ssa): Remove old_ssa_names copying and empty_p query, note we are iterating over it and expect no set changes. --- gcc/tree-into-ssa.cc | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc index c90651c3a89..9f45e62c6d0 100644 --- a/gcc/tree-into-ssa.cc +++ b/gcc/tree-into-ssa.cc @@ -587,6 +587,8 @@ add_to_repl_tbl (tree new_tree, tree old) bitmap_set_bit (*set, SSA_NAME_VERSION (old)); } +/* Debugging aid to fence old_ssa_names changes when iterating over it. */ +static bool iterating_old_ssa_names; /* Add a new mapping NEW_TREE -> OLD REPL_TBL. Every entry N_i in REPL_TBL represents the set of names O_1 ... O_j replaced by N_i. This is @@ -602,10 +604,15 @@ add_new_name_mapping (tree new_tree, tree old) /* We may need to grow NEW_SSA_NAMES and OLD_SSA_NAMES because our caller may have created new names since the set was created. 
*/ - if (SBITMAP_SIZE (new_ssa_names) <= num_ssa_names - 1) + if (SBITMAP_SIZE (new_ssa_names) <= SSA_NAME_VERSION (new_tree)) { unsigned int new_sz = num_ssa_names + NAME_SETS_GROWTH_FACTOR; new_ssa_names = sbitmap_resize (new_ssa_names, new_sz, 0); +} + if (SBITMAP_SIZE (old_ssa_names) <= SSA_NAME_VERSION (old)) +{ + gcc_assert (!iterating_old_ssa_names); + unsigned int new_sz = num_ssa_names + NAME_SETS_GROWTH_FACTOR; old_ssa_names = sbitmap_resize (old_ssa_names, new_sz, 0); } @@ -619,8 +626,11 @@ add_new_name_mapping (tree new_tree, tree old) /* Register NEW_TREE and OLD in NEW_SSA_NAMES and OLD_SSA_NAMES, respectively. */ + if (iterating_old_ssa_names) +gcc_assert (bitmap_bit_p (old_ssa_names, SSA_NAME_VERSION (old))); + else +bitmap_set_bit (old_ssa_names, SSA_NAME_VERSION (old)); bitmap_set_bit (new_ssa_names, SSA_NAME_VERSION (new_tree)); - bitmap_set_bit (old_ssa_names, SSA_NAME_VERSION (old)); } @@ -3460,20 +3470,14 @@ update_ssa (unsigned update_flags) bitmap_initialize (&dfs[bb->index], &bitmap_default_obstack); compute_dominance_frontiers (dfs); - if (bitmap_first_set_bit (old_ssa_names) >= 0) - { - sbitmap_iterator sbi; - - /* insert_update_phi_nodes_for will call add_new_name_mapping -when inserting new PHI nodes, so the set OLD_SSA_NAMES -will grow while we are traversing it (but it will not -gain any new members). Copy OLD_SSA_NAMES to a temporary -for traversal. */ - auto_sbitmap tmp (SBITMAP_SIZE (old_ssa_names)); - bitmap_copy (tmp, old_ssa_names); - EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, sbi) - insert_updated_phi_nodes_for (ssa_name (i), dfs, update_flags); - } + /* insert_update_phi_nodes_for will call add_new_name_mapping +when inserting new PHI nodes, but it will not add any +new members to OLD_SSA_NAMES. 
*/ + iterating_old_ssa_names = true; + sbitmap_iterator sbi; + EXECUTE_IF_SET_IN_BITMAP (old_ssa_names, 0, i, sbi) + insert_updated_phi_nodes_for (ssa_name (i), dfs, update_flags); + iterating_old_ssa_names = false; symbols_to_rename.qsort (insert_updated_phi_nodes_compare_uids); FOR_EACH_VEC_ELT (symbols_to_rename, i, sym) -- 2.35.3
[PATCH] lto-plugin: use locking only for selected targets
For now, support locking only for linux targets that are different from riscv* where the target depends on libatomic (and fails during bootstrap). Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR lto/106170 lto-plugin/ChangeLog: * configure.ac: Configure HAVE_PTHREAD_LOCKING. * lto-plugin.c (LOCK_SECTION): New. (UNLOCK_SECTION): New. (claim_file_handler): Use the newly added macros. (onload): Likewise. * config.h.in: Regenerate. * configure: Regenerate. --- lto-plugin/config.h.in | 4 ++-- lto-plugin/configure| 20 ++-- lto-plugin/configure.ac | 17 +++-- lto-plugin/lto-plugin.c | 30 -- 4 files changed, 51 insertions(+), 20 deletions(-) diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in index 029e782f1ee..bf269f000d2 100644 --- a/lto-plugin/config.h.in +++ b/lto-plugin/config.h.in @@ -9,8 +9,8 @@ /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H -/* Define to 1 if pthread.h is present. */ -#undef HAVE_PTHREAD_H +/* Define if the system-provided pthread locking mechanism. */ +#undef HAVE_PTHREAD_LOCKING /* Define to 1 if you have the header file. */ #undef HAVE_STDINT_H diff --git a/lto-plugin/configure b/lto-plugin/configure index aaa91a63623..7ea54e6008f 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -6011,14 +6011,22 @@ fi # Check for thread headers. 
-ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" -if test "x$ac_cv_header_pthread_h" = xyes; then : +use_locking=no -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac -fi +if test x$use_locking = xyes; then +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h +fi case `pwd` in *\ * | *\*) @@ -12091,7 +12099,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12094 "configure" +#line 12102 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12197,7 +12205,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12200 "configure" +#line 12208 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac index c2ec512880f..69bc5139193 100644 --- a/lto-plugin/configure.ac +++ b/lto-plugin/configure.ac @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test "x$lto_plugin_use_symver" = xgnu AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = xsun]) # Check for thread headers. -AC_CHECK_HEADER(pthread.h, - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, + [Define if the system-provided pthread locking mechanism.]) +fi AM_PROG_LIBTOOL ACX_LT_HOST_FLAGS diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index 635e126946b..cba58f5999b 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. 
If not see #ifdef HAVE_CONFIG_H #include "config.h" -#if !HAVE_PTHREAD_H -#error POSIX threads are mandatory dependency #endif -#endif - #if HAVE_STDINT_H #include #endif @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see #include #include #include +#if HAVE_PTHREAD_LOCKING #include +#endif #ifdef HAVE_SYS_WAIT_H #include #endif @@ -162,9 +160,18 @@ enum symbol_style ss_uscore, /* Underscore prefix all symbols. */ }; +#if HAVE_PTHREAD_LOCKING /* Plug-in mutex. */ static pthread_mutex_t plugin_lock; +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) +#else + +#define LOCK_SECTION +#define UNLOCK_SECTION +#endif + static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; @@ -1270,18 +1277,18 @@ claim_file_handler (const struct ld_plugin_input_file *file, int *claimed) lto_file.symtab.syms); check (status == LDPS_OK, LDPL_FATAL, "could not add symbols"); - pthread_mutex_lock (&plugin_lock); + LOCK_SECTION; num_claimed_files++; claimed_files = xrealloc (claimed_files, num_claimed_files * sizeof (struct plugin_file_info)); claimed_files[num_claimed_files - 1] = lto_file; - pt
Re: [statistics.cc] ICE in get_function_name with fortran test-case
On Thu, Jul 7, 2022 at 12:44 PM Prathamesh Kulkarni wrote: > > Hi, > My recent commit to emit asm name with -fdump-statistics-asmname > caused following ICE > for attached fortran test case. > > during IPA pass: icf > power.fppized.f90:6:26: > > 6 | END SUBROUTINE power_print > | ^ > internal compiler error: Segmentation fault > 0xfddc13 crash_signal > ../../gcc/gcc/toplev.cc:322 > 0x7f6f940de51f ??? > ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 > 0xfc909d get_function_name > ../../gcc/gcc/statistics.cc:124 > 0xfc929f statistics_fini_pass_2(statistics_counter**, void*) > ../../gcc/gcc/statistics.cc:175 > 0xfc94a4 void hash_table xcallocator>::traverse_noresize &(statistics_fini_pass_2(statistics_counter**, void*))>(void*) > ../../gcc/gcc/hash-table.h:1084 > 0xfc94a4 statistics_fini_pass() > ../../gcc/gcc/statistics.cc:219 > 0xef12bc execute_todo > ../../gcc/gcc/passes.cc:2142 > > This happens because fn was passed NULL in get_function_name. > The patch adds a check to see if fn is NULL before checking for > DECL_ASSEMBLER_NAME_SET_P, which fixes the issue. > In case the fn is NULL, it calls function_name(NULL) as per old behavior, > which returns "(nofn)". > > Bootstrap+tested on x86_64-linux-gnu. > OK to commit ? OK > > Thanks, > Prathamesh
Re: [PATCH] lto-plugin: use locking only for selected targets
On Thu, Jul 7, 2022 at 1:43 PM Martin Liška wrote: > > For now, support locking only for linux targets that are different from > riscv* where the target depends on libatomic (and fails during > bootstrap). > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? OK - that also resolves the mingw issue, correct? I suppose we need to be careful to not advertise v1 API (which includes threadsafeness) when not HAVE_PTHREAD_LOCKING. Thanks, Richard. > Thanks, > Martin > > PR lto/106170 > > lto-plugin/ChangeLog: > > * configure.ac: Configure HAVE_PTHREAD_LOCKING. > * lto-plugin.c (LOCK_SECTION): New. > (UNLOCK_SECTION): New. > (claim_file_handler): Use the newly added macros. > (onload): Likewise. > * config.h.in: Regenerate. > * configure: Regenerate. > --- > lto-plugin/config.h.in | 4 ++-- > lto-plugin/configure| 20 ++-- > lto-plugin/configure.ac | 17 +++-- > lto-plugin/lto-plugin.c | 30 -- > 4 files changed, 51 insertions(+), 20 deletions(-) > > diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in > index 029e782f1ee..bf269f000d2 100644 > --- a/lto-plugin/config.h.in > +++ b/lto-plugin/config.h.in > @@ -9,8 +9,8 @@ > /* Define to 1 if you have the header file. */ > #undef HAVE_MEMORY_H > > -/* Define to 1 if pthread.h is present. */ > -#undef HAVE_PTHREAD_H > +/* Define if the system-provided pthread locking mechanism. */ > +#undef HAVE_PTHREAD_LOCKING > > /* Define to 1 if you have the header file. */ > #undef HAVE_STDINT_H > diff --git a/lto-plugin/configure b/lto-plugin/configure > index aaa91a63623..7ea54e6008f 100755 > --- a/lto-plugin/configure > +++ b/lto-plugin/configure > @@ -6011,14 +6011,22 @@ fi > > > # Check for thread headers. 
> -ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" > "$ac_includes_default" > -if test "x$ac_cv_header_pthread_h" = xyes; then : > +use_locking=no > > -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h > +case $target in > + riscv*) > +# do not use locking as pthread depends on libatomic > +;; > + *-linux*) > +use_locking=yes > +;; > +esac > > -fi > +if test x$use_locking = xyes; then > > +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h > > +fi > > case `pwd` in >*\ * | *\*) > @@ -12091,7 +12099,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 12094 "configure" > +#line 12102 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > @@ -12197,7 +12205,7 @@ else >lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 >lt_status=$lt_dlunknown >cat > conftest.$ac_ext <<_LT_EOF > -#line 12200 "configure" > +#line 12208 "configure" > #include "confdefs.h" > > #if HAVE_DLFCN_H > diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac > index c2ec512880f..69bc5139193 100644 > --- a/lto-plugin/configure.ac > +++ b/lto-plugin/configure.ac > @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test > "x$lto_plugin_use_symver" = xgnu > AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = > xsun]) > > # Check for thread headers. 
> -AC_CHECK_HEADER(pthread.h, > - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) > +use_locking=no > + > +case $target in > + riscv*) > +# do not use locking as pthread depends on libatomic > +;; > + *-linux*) > +use_locking=yes > +;; > +esac > + > +if test x$use_locking = xyes; then > + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, > + [Define if the system-provided pthread locking mechanism.]) > +fi > > AM_PROG_LIBTOOL > ACX_LT_HOST_FLAGS > diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c > index 635e126946b..cba58f5999b 100644 > --- a/lto-plugin/lto-plugin.c > +++ b/lto-plugin/lto-plugin.c > @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. If not see > > #ifdef HAVE_CONFIG_H > #include "config.h" > -#if !HAVE_PTHREAD_H > -#error POSIX threads are mandatory dependency > #endif > -#endif > - > #if HAVE_STDINT_H > #include > #endif > @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see > #include > #include > #include > +#if HAVE_PTHREAD_LOCKING > #include > +#endif > #ifdef HAVE_SYS_WAIT_H > #include > #endif > @@ -162,9 +160,18 @@ enum symbol_style >ss_uscore, /* Underscore prefix all symbols. */ > }; > > +#if HAVE_PTHREAD_LOCKING > /* Plug-in mutex. */ > static pthread_mutex_t plugin_lock; > > +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) > +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) > +#else > + > +#define LOCK_SECTION > +#define UNLOCK_SECTION > +#endif > + > static char *arguments_file_name; > static ld_plugin_register_claim_file register_claim_file; > static ld_plugin_register_all_symbols_r
Re: [PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On Thu, 2022-07-07 at 18:30 +0800, Lulu Cheng wrote: /* snip */ > diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > new file mode 100644 > index 000..c7bd71dde93 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c > @@ -0,0 +1,15 @@ > +/* Test that the LoongArch backend stack drop operation is optimized. */ > + > +/* { dg-do compile } */ > +/* { dg-options "-O2 -mabi=lp64d" } */ > +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ > + > +#include <stdio.h> It's better to hard code "extern int printf(char *, ...);" here, so the test case won't unnecessarily depend on a libc header. LGTM otherwise. > + > +int main() > +{ > + char buf[1024 * 12]; > + printf ("%p\n", buf); > + return 0; > +} > +
Re: [PATCH] lto-plugin: use locking only for selected targets
Richard Biener via Gcc-patches writes: >> +if test x$use_locking = xyes; then >> + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, >> + [Define if the system-provided pthread locking mechanism.]) This isn't even a sentence. At least I cannot parse it. Besides, it seems to be misnamed since the test doesn't check if pthread_mutex_lock and friends are present on the target, but if they should be used. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 08/17] openmp: -foffload-memory=pinned
Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. ... gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. ... ... --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () Is there a reason not to use the mechanism of OpenMP's 'requires' directive for this? (Okay, I have to admit that the final patch was only committed on Monday. But still ...) It looks very similar in spirit. I don't know whether there are issues of having -foffload-memory=pinned in some TU and not, but that could be handled in a similar way to GOMP_REQUIRES_TARGET_USED. For requires, omp_requires_mask is streamed out if OMP_REQUIRES_TARGET_USED and g->have_offload. (For completeness, it also requires ENABLE_OFFLOADING.) This data is read in by all lto1 (in lto-cgraph.cc) and checked for consistency. This data is then also passed on to *mkoffload.cc. And in libgomp, it is processed by GOMP_register_ver. 
Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. Tobias
Re: [PATCH v3] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
On 2022-07-07 at 7:51 PM, Xi Ruoyao wrote: On Thu, 2022-07-07 at 18:30 +0800, Lulu Cheng wrote: /* snip */ diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c new file mode 100644 index 000..c7bd71dde93 --- /dev/null +++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c @@ -0,0 +1,15 @@ +/* Test that LoongArch backend stack drop operation optimized. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -mabi=lp64d" } */ +/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */ + +#include <stdio.h> It's better to hard code "extern int printf(char *, ...);" here, so the test case won't unnecessarily depend on a libc header. LGTM otherwise. OK! Thanks!:-)
[PATCH] lto-dump: Do not print output file
Right now the following is printed: lto-dump .file "" .ident "GCC: (GNU) 13.0.0 20220707 (experimental)" .section.note.GNU-stack,"",@progbits After the patch we print -help and do not emit any assembly output: lto-dump Usage: lto-dump [OPTION]... SUB_COMMAND [OPTION]... LTO dump tool command line options. -list [options] Dump the symbol list. -demangle Dump the demangled output. -defined-only Dump only the defined symbols. ... Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin gcc/lto/ChangeLog: * lto-dump.cc (lto_main): Exit in the function as we don't want any LTO bytecode processing. gcc/ChangeLog: * toplev.cc (init_asm_output): Do not init asm_out_file. --- gcc/lto/lto-dump.cc | 16 ++-- gcc/toplev.cc | 2 +- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/gcc/lto/lto-dump.cc b/gcc/lto/lto-dump.cc index f88486b5143..f3d852df51f 100644 --- a/gcc/lto/lto-dump.cc +++ b/gcc/lto/lto-dump.cc @@ -316,7 +316,10 @@ lto_main (void) { quiet_flag = true; if (flag_lto_dump_tool_help) -dump_tool_help (); +{ + dump_tool_help (); + exit (SUCCESS_EXIT_CODE); +} /* LTO is called as a front end, even though it is not a front end. Because it is called as a front end, TV_PHASE_PARSING and @@ -369,11 +372,12 @@ lto_main (void) { /* Dump specific gimple body of specified function. */ dump_body (); - return; } else if (flag_dump_callgraph) -{ - dump_symtab_graphviz (); - return; -} +dump_symtab_graphviz (); + else +dump_tool_help (); + + /* Exit right now. */ + exit (SUCCESS_EXIT_CODE); } diff --git a/gcc/toplev.cc b/gcc/toplev.cc index a24ad5db438..61d234a9ef4 100644 --- a/gcc/toplev.cc +++ b/gcc/toplev.cc @@ -721,7 +721,7 @@ init_asm_output (const char *name) "cannot open %qs for writing: %m", asm_file_name); } - if (!flag_syntax_only) + if (!flag_syntax_only && !(global_dc->lang_mask & CL_LTODump)) { targetm.asm_out.file_start (); -- 2.36.1
Re: [PATCH] lto-plugin: use locking only for selected targets
On 7/7/22 13:52, Rainer Orth wrote: > Richard Biener via Gcc-patches writes: > >>> +if test x$use_locking = xyes; then >>> + AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, >>> + [Define if the system-provided pthread locking mechanism.]) > > This isn't even a sentence. At least I cannot parse it. Besides, it > seems to be misnamed since the test doesn't check if pthread_mutex_lock > and friends are present on the target, but if they should be used. You are right, fixed in v2. Martin > > Rainer > From 0c3380e64dc769b4bc00568395468bdf02e74f6f Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 7 Jul 2022 12:15:28 +0200 Subject: [PATCH] lto-plugin: use locking only for selected targets For now, support locking only for linux targets that are different from riscv* where the target depends on libatomic (and fails during bootstrap). PR lto/106170 lto-plugin/ChangeLog: * configure.ac: Configure HAVE_PTHREAD_LOCKING. * lto-plugin.c (LOCK_SECTION): New. (UNLOCK_SECTION): New. (claim_file_handler): Use the newly added macros. (onload): Likewise. * config.h.in: Regenerate. * configure: Regenerate. --- lto-plugin/config.h.in | 4 ++-- lto-plugin/configure| 21 + lto-plugin/configure.ac | 17 +++-- lto-plugin/lto-plugin.c | 29 +++-- 4 files changed, 53 insertions(+), 18 deletions(-) diff --git a/lto-plugin/config.h.in b/lto-plugin/config.h.in index 029e782f1ee..8eb9c8aa47d 100644 --- a/lto-plugin/config.h.in +++ b/lto-plugin/config.h.in @@ -9,8 +9,8 @@ /* Define to 1 if you have the header file. */ #undef HAVE_MEMORY_H -/* Define to 1 if pthread.h is present. */ -#undef HAVE_PTHREAD_H +/* Define if the system provides pthread locking mechanism. */ +#undef HAVE_PTHREAD_LOCKING /* Define to 1 if you have the header file. */ #undef HAVE_STDINT_H diff --git a/lto-plugin/configure b/lto-plugin/configure index aaa91a63623..870e49b2e62 100755 --- a/lto-plugin/configure +++ b/lto-plugin/configure @@ -6011,14 +6011,27 @@ fi # Check for thread headers. 
-ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + ac_fn_c_check_header_mongrel "$LINENO" "pthread.h" "ac_cv_header_pthread_h" "$ac_includes_default" if test "x$ac_cv_header_pthread_h" = xyes; then : -$as_echo "#define HAVE_PTHREAD_H 1" >>confdefs.h +$as_echo "#define HAVE_PTHREAD_LOCKING 1" >>confdefs.h fi +fi case `pwd` in *\ * | *\ *) @@ -12091,7 +12104,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12094 "configure" +#line 12107 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -12197,7 +12210,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 12200 "configure" +#line 12213 "configure" #include "confdefs.h" #if HAVE_DLFCN_H diff --git a/lto-plugin/configure.ac b/lto-plugin/configure.ac index c2ec512880f..18eb4f60b0a 100644 --- a/lto-plugin/configure.ac +++ b/lto-plugin/configure.ac @@ -88,8 +88,21 @@ AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_GNU, [test "x$lto_plugin_use_symver" = xgnu AM_CONDITIONAL(LTO_PLUGIN_USE_SYMVER_SUN, [test "x$lto_plugin_use_symver" = xsun]) # Check for thread headers. 
-AC_CHECK_HEADER(pthread.h, - [AC_DEFINE(HAVE_PTHREAD_H, 1, [Define to 1 if pthread.h is present.])]) +use_locking=no + +case $target in + riscv*) +# do not use locking as pthread depends on libatomic +;; + *-linux*) +use_locking=yes +;; +esac + +if test x$use_locking = xyes; then + AC_CHECK_HEADER(pthread.h, +[AC_DEFINE(HAVE_PTHREAD_LOCKING, 1, [Define if the system provides pthread locking mechanism.])]) +fi AM_PROG_LIBTOOL ACX_LT_HOST_FLAGS diff --git a/lto-plugin/lto-plugin.c b/lto-plugin/lto-plugin.c index 635e126946b..7927dca60a4 100644 --- a/lto-plugin/lto-plugin.c +++ b/lto-plugin/lto-plugin.c @@ -40,11 +40,7 @@ along with this program; see the file COPYING3. If not see #ifdef HAVE_CONFIG_H #include "config.h" -#if !HAVE_PTHREAD_H -#error POSIX threads are mandatory dependency #endif -#endif - #if HAVE_STDINT_H #include #endif @@ -59,7 +55,9 @@ along with this program; see the file COPYING3. If not see #include #include #include +#if HAVE_PTHREAD_LOCKING #include +#endif #ifdef HAVE_SYS_WAIT_H #include #endif @@ -162,9 +160,17 @@ enum symbol_style ss_uscore, /* Underscore prefix all symbols. */ }; +#if HAVE_PTHREAD_LOCKING /* Plug-in mutex. */ static pthread_mutex_t plugin_lock; +#define LOCK_SECTION pthread_mutex_lock (&plugin_lock) +#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock) +#else +#define LOCK_SECTION +#define UNLOCK_SECTION +#endif + static char
Re: [PATCH] lto-plugin: use locking only for selected targets
On 7/7/22 13:46, Richard Biener wrote:
> OK - that also resolves the mingw issue, correct?

Yes.

> I suppose we need to be careful to not advertise v1 API (which includes
> threadsafeness) when not HAVE_PTHREAD_LOCKING.

Will reflect that in the patch. I'm going to push it now.

Martin
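The heart of the patch above is a pair of section macros that compile away when pthread locking is unavailable. A minimal, self-contained sketch of that pattern follows; the static mutex initializer and the bump_counter function are illustrative additions, not code from the patch (the plugin itself initializes its mutex explicitly at load time rather than relying on a static initializer):

```c
#if HAVE_PTHREAD_LOCKING
#include <pthread.h>
static pthread_mutex_t plugin_lock = PTHREAD_MUTEX_INITIALIZER;
#define LOCK_SECTION pthread_mutex_lock (&plugin_lock)
#define UNLOCK_SECTION pthread_mutex_unlock (&plugin_lock)
#else
/* No pthread locking: the section macros expand to nothing.  */
#define LOCK_SECTION
#define UNLOCK_SECTION
#endif

static int counter;

/* Every shared-state access is bracketed by the two macros; in the
   single-threaded configuration this costs nothing.  */
int
bump_counter (void)
{
  LOCK_SECTION;
  int v = ++counter;
  UNLOCK_SECTION;
  return v;
}
```

Compiling with -DHAVE_PTHREAD_LOCKING=1 (and linking -lpthread) turns the macros into real mutex operations; without it the same source builds on targets where pthread would drag in libatomic.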
Fix one issue in OpenMP 'requires' directive diagnostics (was: [Patch][v5] OpenMP: Move omp requires checks to libgomp)
Hi! On 2022-07-01T23:08:16+0200, Tobias Burnus wrote: > Updated version attached – I hope I got everything right, but I start to > get tired, I am not 100% sure. ..., and so the obligatory copy'n'past-o ;-) crept in: > --- a/gcc/lto-cgraph.cc > +++ b/gcc/lto-cgraph.cc > @@ -1773,6 +1804,10 @@ input_offload_tables (bool do_force_output) >struct lto_file_decl_data **file_data_vec = lto_get_file_decl_data (); >struct lto_file_decl_data *file_data; >unsigned int j = 0; > + const char *requires_fn = NULL; > + tree requires_decl = NULL_TREE; > + > + omp_requires_mask = (omp_requires) 0; > >while ((file_data = file_data_vec[j++])) > { > @@ -1784,6 +1819,7 @@ input_offload_tables (bool do_force_output) >if (!ib) > continue; > > + tree tmp_decl = NULL_TREE; >enum LTO_symtab_tags tag > = streamer_read_enum (ib, LTO_symtab_tags, LTO_symtab_last_tag); >while (tag) > @@ -1799,6 +1835,7 @@ input_offload_tables (bool do_force_output) >LTO mode. */ > if (do_force_output) > cgraph_node::get (fn_decl)->mark_force_output (); > + tmp_decl = fn_decl; > } > else if (tag == LTO_symtab_variable) > { > @@ -1810,6 +1847,72 @@ input_offload_tables (bool do_force_output) >may be no refs to var_decl in offload LTO mode. 
*/ > if (do_force_output) > varpool_node::get (var_decl)->force_output = 1; > + tmp_decl = var_decl; > + } > + else if (tag == LTO_symtab_edge) > + { > + static bool error_emitted = false; > + HOST_WIDE_INT val = streamer_read_hwi (ib); > + > + if (omp_requires_mask == 0) > + { > + omp_requires_mask = (omp_requires) val; > + requires_decl = tmp_decl; > + requires_fn = file_data->file_name; > + } > + else if (omp_requires_mask != val && !error_emitted) > + { > + const char *fn1 = requires_fn; > + if (requires_decl != NULL_TREE) > + { > + while (DECL_CONTEXT (requires_decl) != NULL_TREE > + && TREE_CODE (requires_decl) != > TRANSLATION_UNIT_DECL) > + requires_decl = DECL_CONTEXT (requires_decl); > + if (requires_decl != NULL_TREE) > + fn1 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); > + } > + > + const char *fn2 = file_data->file_name; > + if (tmp_decl != NULL_TREE) > + { > + while (DECL_CONTEXT (tmp_decl) != NULL_TREE > + && TREE_CODE (tmp_decl) != TRANSLATION_UNIT_DECL) > + tmp_decl = DECL_CONTEXT (tmp_decl); > + if (tmp_decl != NULL_TREE) > + fn2 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); > + } ... here: tmp_decl' not 'requires_decl'. OK to push the attached "Fix one issue in OpenMP 'requires' directive diagnostics"? I'd even push that one "as obvious", but thought I'd ask whether you maybe have a quick idea about the XFAILs that I'm adding? (I'm otherwise not planning on resolving that issue at this time.) > + > + char buf1[sizeof ("unified_address, unified_shared_memory, " > + "reverse_offload")]; > + char buf2[sizeof ("unified_address, unified_shared_memory, " > + "reverse_offload")]; > + omp_requires_to_name (buf2, sizeof (buf2), > + val != OMP_REQUIRES_TARGET_USED > + ? 
val > + : (HOST_WIDE_INT) omp_requires_mask); > + if (val != OMP_REQUIRES_TARGET_USED > + && omp_requires_mask != OMP_REQUIRES_TARGET_USED) > + { > + omp_requires_to_name (buf1, sizeof (buf1), > + omp_requires_mask); > + error ("OpenMP % directive with non-identical " > + "clauses in multiple compilation units: %qs vs. " > + "%qs", buf1, buf2); > + inform (UNKNOWN_LOCATION, "%qs has %qs", fn1, buf1); > + inform (UNKNOWN_LOCATION, "%qs has %qs", fn2, buf2); > + } > + else > + { > + error ("OpenMP % directive with %qs specified " > + "only in some compilation units", buf2); > + inform (UNKNOWN_LOCATION, "%qs has %qs", > + val != OMP_REQUIRES_TARGET_USED ? fn2 : fn1, > + buf2); > + inform (UNKNOWN_LOCATION, "but %qs has not", > + val != OMP_REQUIRES_TARGET_USED ? f
[PATCH][RFC] More update-ssa speedup
When we do TODO_update_ssa_no_phi we already avoid computing dominance
frontiers for all blocks - it is worth also avoiding walking all
dominated blocks in the update domwalk and restricting the walk to the
SEME region containing the affected blocks.  We can do that by walking
the CFG in reverse from blocks_to_update to the common immediate
dominator, marking blocks in the region and telling the domwalk to STOP
when leaving it.

For an artificial testcase with N adjacent loops, each with one
unswitching opportunity, that takes the incremental SSA updating off
the -ftime-report radar:

 tree loop unswitching      :  11.25 (  3%)   0.09 (  5%)  11.53 (  3%)    36M (  9%)
 `- tree SSA incremental    :  35.74 (  9%)   0.07 (  4%)  36.65 (  9%)  2734k (  1%)

improves to

 tree loop unswitching      :  10.21 (  3%)   0.05 (  3%)  11.50 (  3%)    36M (  9%)
 `- tree SSA incremental    :   0.66 (  0%)   0.02 (  1%)   0.49 (  0%)  2734k (  1%)

For less localized updates the SEME region isn't likely constrained
enough, so I've restricted the extra work to TODO_update_ssa_no_phi
callers.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

It's probably the last change that makes a visible difference for
general update-ssa; any specialized manual updating that would be
possible as well would not show up in better numbers than above
(unless I manage to complicate the testcase more).

Comments?

Thanks,
Richard.

	* tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New.
	(rewrite_update_dom_walker::rewrite_update_dom_walker): Update.
	(rewrite_update_dom_walker::m_in_region_flag): New.
	(rewrite_update_dom_walker::before_dom_children): If the region
	to update is marked, STOP at exits.
	(rewrite_blocks): For REWRITE_UPDATE_REGION mark the region
	to be updated.
	(dump_update_ssa): Use bitmap_empty_p.
	(update_ssa): Likewise.  Use REWRITE_UPDATE_REGION when
	TODO_update_ssa_no_phi.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account pending
	update_ssa to the caller.
--- gcc/tree-cfgcleanup.cc | 6 ++- gcc/tree-into-ssa.cc | 97 -- 2 files changed, 90 insertions(+), 13 deletions(-) diff --git a/gcc/tree-cfgcleanup.cc b/gcc/tree-cfgcleanup.cc index b9ff6896ce6..3535a7e28a4 100644 --- a/gcc/tree-cfgcleanup.cc +++ b/gcc/tree-cfgcleanup.cc @@ -1095,7 +1095,11 @@ cleanup_tree_cfg_noloop (unsigned ssa_update_flags) /* After doing the above SSA form should be valid (or an update SSA should be required). */ if (ssa_update_flags) -update_ssa (ssa_update_flags); +{ + timevar_pop (TV_TREE_CLEANUP_CFG); + update_ssa (ssa_update_flags); + timevar_push (TV_TREE_CLEANUP_CFG); +} /* Compute dominator info which we need for the iterative process below. */ if (!dom_info_available_p (CDI_DOMINATORS)) diff --git a/gcc/tree-into-ssa.cc b/gcc/tree-into-ssa.cc index 9f45e62c6d0..be71b629f97 100644 --- a/gcc/tree-into-ssa.cc +++ b/gcc/tree-into-ssa.cc @@ -240,7 +240,8 @@ enum rewrite_mode { /* Incrementally update the SSA web by replacing existing SSA names with new ones. See update_ssa for details. */ -REWRITE_UPDATE +REWRITE_UPDATE, +REWRITE_UPDATE_REGION }; /* The set of symbols we ought to re-write into SSA form in update_ssa. */ @@ -2155,11 +2156,14 @@ rewrite_update_phi_arguments (basic_block bb) class rewrite_update_dom_walker : public dom_walker { public: - rewrite_update_dom_walker (cdi_direction direction) -: dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1) {} + rewrite_update_dom_walker (cdi_direction direction, int in_region_flag = -1) +: dom_walker (direction, ALL_BLOCKS, (int *)(uintptr_t)-1), + m_in_region_flag (in_region_flag) {} edge before_dom_children (basic_block) final override; void after_dom_children (basic_block) final override; + + int m_in_region_flag; }; /* Initialization of block data structures for the incremental SSA @@ -2179,6 +2183,10 @@ rewrite_update_dom_walker::before_dom_children (basic_block bb) /* Mark the unwind point for this block. 
*/ block_defs_stack.safe_push (NULL_TREE); + if (m_in_region_flag != -1 + && !(bb->flags & m_in_region_flag)) +return STOP; + if (!bitmap_bit_p (blocks_to_update, bb->index)) return NULL; @@ -2270,8 +2278,8 @@ rewrite_update_dom_walker::after_dom_children (basic_block bb ATTRIBUTE_UNUSED) WHAT indicates what actions will be taken by the renamer (see enum rewrite_mode). - BLOCKS are the set of interesting blocks for the dominator walker - to process. If this set is NULL, then all the nodes dominated + REGION is a SEME region of interesting blocks for the dominator walker + to process. If this set is invalid, then all the nodes dominated by ENTRY are walked. Otherwise, blocks dominated by ENTRY that are not present in
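The region-marking idea described in this patch can be sketched standalone: walk the CFG backwards from the blocks that need updating until the nearest common dominator is reached, marking everything visited. The array-based CFG (idom[]/preds[] tables) and the example graph below are invented for illustration; the real code operates on GCC's basic_block structures and records the mark in bb->flags so before_dom_children can return STOP outside the region.

```c
#include <string.h>

#define NBLOCKS 8
#define MAXPRED 4

static int idom[NBLOCKS];            /* immediate dominator, entry is 0 */
static int preds[NBLOCKS][MAXPRED];  /* predecessor lists, -1 terminated */
static int in_region[NBLOCKS];       /* stands in for the bb flag */

static int
dom_depth (int b)
{
  int d = 0;
  while (b != 0)
    {
      b = idom[b];
      d++;
    }
  return d;
}

static int
nearest_common_dominator (int a, int b)
{
  while (a != b)
    {
      if (dom_depth (a) >= dom_depth (b))
	a = idom[a];
      else
	b = idom[b];
    }
  return a;
}

/* Mark every block on a reverse path from a block in TO_UPDATE[] to
   their nearest common dominator HEAD; this is the SEME region the
   domwalk is then restricted to.  */
static void
mark_update_region (const int *to_update, int n)
{
  int head = to_update[0];
  for (int i = 1; i < n; i++)
    head = nearest_common_dominator (head, to_update[i]);

  memset (in_region, 0, sizeof in_region);
  in_region[head] = 1;		/* the single entry of the region */

  int worklist[NBLOCKS], wp = 0;
  for (int i = 0; i < n; i++)
    if (!in_region[to_update[i]])
      {
	in_region[to_update[i]] = 1;
	worklist[wp++] = to_update[i];
      }
  while (wp > 0)
    {
      int b = worklist[--wp];
      for (int j = 0; preds[b][j] != -1; j++)
	if (!in_region[preds[b][j]])
	  {
	    in_region[preds[b][j]] = 1;
	    worklist[wp++] = preds[b][j];
	  }
    }
}

/* Example CFG: 0 -> 1 -> {2,3} -> 4 -> 5.  */
static void
build_example_cfg (void)
{
  static const int id[NBLOCKS] = { 0, 0, 1, 1, 1, 4 };
  memcpy (idom, id, sizeof idom);
  for (int i = 0; i < NBLOCKS; i++)
    preds[i][0] = -1;
  preds[1][0] = 0; preds[1][1] = -1;
  preds[2][0] = 1; preds[2][1] = -1;
  preds[3][0] = 1; preds[3][1] = -1;
  preds[4][0] = 2; preds[4][1] = 3; preds[4][2] = -1;
  preds[5][0] = 4; preds[5][1] = -1;
}
```

Because the common dominator dominates every block to update, the backward walk can never escape past it, which is what makes the marked set a single-entry region the dominator walk can STOP at.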
Re: [GCC 13][PATCH] PR101836: Add a new option -fstrict-flex-array[=n] and use it in __builtin_object_size
> On Jul 7, 2022, at 4:02 AM, Richard Biener wrote: > > On Wed, Jul 6, 2022 at 4:20 PM Qing Zhao wrote: >> >> (Sorry for the late reply, just came back from a short vacation.) >> >>> On Jul 4, 2022, at 2:49 AM, Richard Biener >>> wrote: >>> >>> On Fri, Jul 1, 2022 at 5:32 PM Martin Sebor wrote: On 7/1/22 08:01, Qing Zhao wrote: > > >> On Jul 1, 2022, at 8:59 AM, Jakub Jelinek wrote: >> >> On Fri, Jul 01, 2022 at 12:55:08PM +, Qing Zhao wrote: >>> If so, comparing to the current implemenation to have all the checking >>> in middle-end, what’s the >>> major benefit of moving part of the checking into FE, and leaving the >>> other part in middle-end? >> >> The point is recording early what FIELD_DECLs could be vs. can't >> possibly be >> treated like flexible array members and just use that flag in the >> decisions >> in the current routines in addition to what it is doing. > > Okay. > > Based on the discussion so far, I will do the following: > > 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; > 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL > based on [0], [1], >[] and the option -fstrict-flex-array, and whether it’s the last field > of the DECL_CONTEXT. > 3. In Middle end, Add a new utility routine is_flexible_array_member_p, > which bases on >DECL_NOT_FLEXARRAY + array_at_struct_end_p to decide whether the array >reference is a real flexible array member reference. >>> >>> I would just update all existing users, not introduce another wrapper >>> that takes DECL_NOT_FLEXARRAY >>> into account additionally. >> >> Okay. >>> > > > Middle end currently is quite mess, array_at_struct_end_p, > component_ref_size, and all the phases that > use these routines need to be updated, + new testing cases for each of > the phases. > > > So, I still plan to separate the patch set into 2 parts: > > Part A:the above 1 + 2 + 3, and use these new utilities in > tree-object-size.cc to resolve PR101836 first. 
> Then kernel can use __FORTIFY_SOURCE correctly; > > Part B:update all other phases with the new utilities + new testing > cases + resolving regressions. > > Let me know if you have any comment and suggestion. It might be worth considering whether it should be possible to control the "flexible array" property separately for each trailing array member via either a #pragma or an attribute in headers that can't change the struct layout but that need to be usable in programs compiled with stricter -fstrict-flex-array=N settings. >>> >>> Or an decl attribute. >> >> Yes, it might be necessary to add a corresponding decl attribute >> >> strict_flex_array (N) >> >> Which is attached to a trailing structure array member to provide the user a >> finer control when -fstrict-flex-array=N is specified. >> >> So, I will do the following: >> >> >> *User interface: >> >> 1. command line option: >> -fstrict-flex-array=N (N=0, 1, 2, 3) >> 2. decl attribute: >> strict_flex_array (N) (N=0, 1, 2, 3) >> >> >> *Implementation: >> >> 1. Add a new flag “DECL_NOT_FLEXARRAY” to FIELD_DECL; >> 2. In C/C++ FE, set the new flag “DECL_NOT_FLEXARRAY” for a FIELD_DECL based >> on [0], [1], >> [], the option -fstrict-flex-array, the attribute strict_flex_array, >> and whether it’s the last field >> of the DECL_CONTEXT. >> 3. In Middle end, update all users of “array_at_struct_end_p" or >> “component_ref_size”, or any place that treats >>Trailing array as flexible array member with the new flag >> DECL_NOT_FLEXARRAY. >>(Still think we need a new consistent utility routine here). >> >> >> I still plan to separate the patch set into 2 parts: >> >> Part A:the above 1 + 2 + 3, and use these new utilities in >> tree-object-size.cc to resolve PR101836 first. >> Then kernel can use __FORTIFY_SOURCE correctly. >> Part B:update all other phases with the new utilities + new testing >> cases + resolving regressions. >> >> >> Let me know any more comment or suggestion. > > Sounds good. Part 3. 
is "optimization" and reasonable to do separately, I'm not sure you need
> 'B' (since we're not supposed to have new utilities), but instead I'd
> do '3.' as part of 'B', just changing the pieces that resolve PR101836
> for part 'A'.

Okay, I see. Then I will separate the patches to:

Part A: 1 + 2

Part B: In Middle end, use the new flag in tree-object-size.cc to
resolve PR101836, then kernel can use __FORTIFY_SOURCE correctly after
this;

Part C: in Middle end, use the new flag in all other places that use
"array_at_struct_end_p" or "component_ref_size" to make GCC consistently
behave for trailing arrays.

The reason I separat
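For readers following the thread, the trailing-array forms under discussion can be made concrete with a small example. The structs below are illustrative, not from the patch, and the level-to-form mapping sketched in the comment reflects the proposal as discussed here; the exact spelling of the attribute and the semantics of each level were still being settled:

```c
#include <stddef.h>

/* The trailing-array idioms whose "flexible array member" treatment
   -fstrict-flex-array=N is meant to control.  */
struct s_fam  { int n; char data[]; };   /* C99 flexible array member */
struct s_zero { int n; char data[0]; };  /* GNU zero-length extension */
struct s_one  { int n; char data[1]; };  /* pre-C99 "struct hack" */
struct s_four { int n; char data[4]; };  /* plain fixed-size tail */

/* Proposed semantics (higher N = stricter):
     N=0: any trailing array may be treated as flexible (historic GCC);
     N=1: only [], [0] and [1];
     N=2: only [] and [0];
     N=3: only the standard [] form.
   The per-field attribute would override the command-line level for
   headers that cannot change layout, e.g. (hypothetical spelling):

     struct s { int n; char tail[1] __attribute__ ((strict_flex_array (3))); };

   which would tell GCC that tail[] is never a flexible array.  */

/* Only the [] and [0] forms contribute no storage of their own.  */
size_t
tail_padding_demo (void)
{
  return sizeof (struct s_one) - sizeof (struct s_zero);
}
```

The distinction matters for __builtin_object_size: under strict settings, accesses past data[1] in s_one are out of bounds rather than a flexible tail, which is exactly what PR101836 and the kernel's __FORTIFY_SOURCE use depend on.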
Re: Fix one issue in OpenMP 'requires' directive diagnostics (was: [Patch][v5] OpenMP: Move omp requires checks to libgomp)
Hi Thomas, On 07.07.22 15:26, Thomas Schwinge wrote: On 2022-07-01T23:08:16+0200, Tobias Burnus wrote: Updated version attached – I hope I got everything right, but I start to get tired, I am not 100% sure. ..., and so the obligatory copy'n'past-o;-) crept in: ... + if (tmp_decl != NULL_TREE) +fn2 = IDENTIFIER_POINTER (DECL_NAME (requires_decl)); +} ... here: tmp_decl' not 'requires_decl'. OK to push the attached "Fix one issue in OpenMP 'requires' directive diagnostics"? Good that you spotted it and thanks for testing + fixing it! I'd even push that one "as obvious", but thought I'd ask whether you maybe have a quick idea about the XFAILs that I'm adding? (I'm otherwise not planning on resolving that issue at this time.) (This question relates to what's printed if there is no TRANSLATION_UNIT_DECL.) * * * Pre-remark - the code does: * If there is any offload_func or offload_var DECL in the TU, it uses that one for diagnostic. This is always the case if there is a 'declare target' or 'omp target' but not when there is only 'omp target data'. → In real-world code it likely has a proper name. * Otherwise, it takes the file name of file_data->file_name. With -save-temps, that's based on the input files, which gives a useful output. When using gcc -c *.c gcc *.o the file name is .o - which is also quite useful. However, when doing gcc *.c which combines compiling and linking in one step, the filename is /tmp/cc*.o which is not that helpful. There is no real way to avoid this, unless we explicitly store the filename or some location_t for the 'requires'. But the used LTO writer does not support it directly. It can be fixed, but requires some re-organization and increased intermittent .o file size. (Cf. https://gcc.gnu.org/pipermail/gcc-patches/2022-June/597496.html for what it needed and why it does not work.) However, in the real world, there should be usually a proper message as (a) It is unlikely to have code which only does 'omp target ... 
data' transfer - and no 'omp target' and (b) for larger code, separating compilation and linking is common while for smaller code, 'requires' mismatches is less likely and also easier to find the file causing the issue. Still, like always, having a nice diagnostic would not harm :-) * * * Subject: [PATCH] Fix one issue in OpenMP 'requires' directive diagnostics Fix-up for recent commit 683f11843974f0bdf42f79cdcbb0c2b43c7b81b0 "OpenMP: Move omp requires checks to libgomp". gcc/ * lto-cgraph.cc (input_offload_tables) : Correct 'fn2' computation. libgomp/ * testsuite/libgomp.c-c++-common/requires-1.c: Add 'dg-note's. * testsuite/libgomp.c-c++-common/requires-2.c: Likewise. * testsuite/libgomp.c-c++-common/requires-3.c: Likewise. * testsuite/libgomp.c-c++-common/requires-7.c: Likewise. * testsuite/libgomp.fortran/requires-1.f90: Likewise. Regarding the patch, it adds 'dg-note' for the location data - and fixes the wrong-decl-use bug. Those LGTM - Thanks! Regarding the xfail part: --- a/libgomp/testsuite/libgomp.c-c++-common/requires-7.c ... -/* { dg-error "OpenMP 'requires' directive with non-identical clauses in multiple compilation units: 'unified_shared_memory' vs. 'unified_address'" "" { target *-*-* } 0 } */ +/* { dg-error "OpenMP 'requires' directive with non-identical clauses in multiple compilation units: 'unified_shared_memory' vs. 'unified_address'" "" { target *-*-* } 0 } + { dg-note {requires-7\.c' has 'unified_shared_memory'} {} { target *-*-* } 0 } + TODO There is some issue that we're not seeing the source file name here (but a temporary '*.o' instead): + { dg-note {requires-7-aux\.c' has 'unified_address'} {} { xfail *-*-* } 0 } + ..., so verify that at least the rest of the diagnostic is correct: + { dg-note {' has 'unified_address'} {} { target *-*-* } 0 } */ The requires-7-aux.c file uses, on purpose, only 'omp target enter data' to trigger the .o name in 'inform' as no decl is written to offload_func/offload_vars for that TU. 
As the testsuite compiles+links the two requires-7*.c files in one step and is invoked without -save-temps, the used object file names will be /tmp/cc*.o. * * * Regarding the xfail: I think it is fine to have this xfail, but as it is clear why inform points to /tmp/cc*.o, you could reword the TODO to state why it goes wrong. Thanks, Tobias - Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
[PATCH] match.pd: Add new bitwise arithmetic pattern [PR98304]
Hi!

This patch is meant to solve a missed optimization in match.pd.  It
optimizes the following expression:

  n - (((n > 63) ? n : 63) & -64)

where the masked constant (in this case -64) is a negated power of 2
and the sum of the two constants is -1.  For the signed case, this gets
optimized to (n <= 63) ? n : (n & 63).  For the unsigned case, it gets
optimized to (n & 63).  In both scenarios, the number of instructions
produced decreases.

There are also tests for this optimization making sure the optimization
happens when it is supposed to, and does not happen when it isn't.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

	PR tree-optimization/98304

gcc/ChangeLog:

	* match.pd (n - (((n > C1) ? n : C1) & -C2)): New simplification.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/execute/pr98304-2.c: New test.
	* gcc.dg/pr98304-1.c: New test.
---
 gcc/match.pd                                    | 12
 .../gcc.c-torture/execute/pr98304-2.c           | 37
 gcc/testsuite/gcc.dg/pr98304-1.c                | 57 +++
 3 files changed, 106 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr98304-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 88c6c414881..45aefd96688 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7836,3 +7836,15 @@ and,
 (match (bitwise_induction_p @0 @2 @3)
  (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift integer_onep@1 @2)) @3
+
+/* n - (((n > C1) ? n : C1) & -C2) -> n & C1 for unsigned case.
+   n - (((n > C1) ? n : C1) & -C2) -> (n <= C1) ? n : (n & C1) for signed case.  */
+(simplify
+ (minus @0 (bit_and (max @0 INTEGER_CST@1) INTEGER_CST@2))
+ (with { auto i = wi::neg (wi::to_wide (@2)); }
+  /* Check if -C2 is a power of 2 and C1 = -C2 - 1.
     */
+  (if (wi::popcount (i) == 1
+       && (wi::to_wide (@1)) == (i - 1))
+   (if (TYPE_UNSIGNED (TREE_TYPE (@0)))
+    (bit_and @0 @1)
+    (cond (le @0 @1) @0 (bit_and @0 @1))
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c b/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
new file mode 100644
index 000..114c612db3b
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr98304-2.c
@@ -0,0 +1,37 @@
+/* PR tree-optimization/98304 */
+
+#include "../../gcc.dg/pr98304-1.c"
+
+/* Runtime tests. */
+int main() {
+
+    /* Signed tests. */
+    if (foo(-42) != -42
+        || foo(0) != 0
+        || foo(63) != 63
+        || foo(64) != 0
+        || foo(65) != 1
+        || foo(99) != 35) {
+        __builtin_abort();
+    }
+
+    /* Unsigned tests. */
+    if (bar(42) != 42
+        || bar(0) != 0
+        || bar(63) != 63
+        || bar(64) != 0
+        || bar(65) != 1
+        || bar(99) != 35) {
+        __builtin_abort();
+    }
+
+    /* Should not simplify. */
+    if (corge(13) != 13
+        || thud(13) != 13
+        || qux(13) != 13
+        || quux(13) != 13) {
+        __builtin_abort();
+    }
+
+    return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/pr98304-1.c b/gcc/testsuite/gcc.dg/pr98304-1.c
new file mode 100644
index 000..dce54ddffe8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98304-1.c
@@ -0,0 +1,57 @@
+/* PR tree-optimization/98304 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+/* Signed test function. */
+__attribute__((noipa)) int foo(int n) {
+    return n - (((n > 63) ? n : 63) & -64);
+}
+
+/* Unsigned test function. */
+__attribute__((noipa)) unsigned int bar(unsigned int n) {
+    return n - (((n > 63) ? n : 63) & -64);
+}
+
+/* Different power of 2. */
+__attribute__((noipa)) int goo(int n) {
+    return n - (((n > 31) ? n : 31) & -32);
+}
+
+/* Commutative property (should be identical to foo) */
+__attribute__((noipa)) int baz(int n) {
+    return n - (((64 > n) ? 63 : n) & -64);
+}
+
+/* < instead of >. */
+__attribute__((noipa)) int fred(int n) {
+    return n - (((63 < n) ? n : 63) & -64);
+}
+
+/* Constant is not a power of 2 so should not simplify.
 */
+__attribute__((noipa)) int qux(int n) {
+    return n - (((n > 62) ? n : 62) & -63);
+}
+
+/* Constant is not a power of 2 so should not simplify. */
+__attribute__((noipa)) unsigned int quux(unsigned int n) {
+    return n - (((n > 62) ? n : 62) & -63);
+}
+
+/* Constant is a variable so should not simplify. */
+__attribute__((noipa)) int waldo(int n, int x) {
+    return n - (((n > 63) ? n : 63) & x);
+}
+
+/* Difference between constants is not -1. */
+__attribute__((noipa)) int corge(int n) {
+    return n - (((n > 1) ? n : 1) & -64);
+}
+
+/* Difference between constants is not -1. */
+__attribute__((noipa)) unsigned int thud(unsigned int n)
+{
+    return n - (((n > 1) ? n : 1) & -64);
+}
+
+/* { dg-final { scan-tree-dump-times " - " 5 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " <= " 4 "optimized" } } */

base-commit: a8b5d63503b8cf49de32d241218057409f8731ac
--
2.31.1
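The transformation can be sanity-checked outside the compiler by comparing the original and simplified expressions over a range of values. The helpers below mirror the C1 = 63 / C2 = 64 instance of the pattern (like foo/bar in the testcase) but are otherwise illustrative:

```c
/* Original expression, signed.  */
static int
orig_signed (int n)
{
  return n - (((n > 63) ? n : 63) & -64);
}

/* Simplified form for the signed case.  */
static int
simp_signed (int n)
{
  return n <= 63 ? n : (n & 63);
}

/* Original expression, unsigned.  */
static unsigned
orig_unsigned (unsigned n)
{
  return n - (((n > 63u) ? n : 63u) & -64u);
}

/* Simplified form for the unsigned case.  */
static unsigned
simp_unsigned (unsigned n)
{
  return n & 63u;
}

/* Returns 1 iff original and simplified forms agree on the range.  */
int
check_range (void)
{
  for (int n = -1024; n <= 1024; n++)
    if (orig_signed (n) != simp_signed (n))
      return 0;
  for (unsigned n = 0; n <= 4096; n++)
    if (orig_unsigned (n) != simp_unsigned (n))
      return 0;
  return 1;
}
```

The unsigned case works for all n because n - (n & ~63) is exactly n & 63; the signed case needs the n <= 63 guard since 63 & -64 is 0 and the subtraction then leaves n unchanged, including for negative n.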
Re: [PATCH][RFC] More update-ssa speedup
On 7/7/2022 7:33 AM, Richard Biener via Gcc-patches wrote: When we do TODO_update_ssa_no_phi we already avoid computing dominance frontiers for all blocks - it is worth to also avoid walking all dominated blocks in the update domwalk and restrict the walk to the SEME region with the affected blocks. We can do that by walking the CFG in reverse from blocks_to_update to the common immediate dominator, marking blocks in the region and telling the domwalk to STOP when leaving it. For an artificial testcase with N adjacent loops with one unswitching opportunity that takes the incremental SSA updating off the -ftime-report radar: tree loop unswitching : 11.25 ( 3%) 0.09 ( 5%) 11.53 ( 3%)36M ( 9%) `- tree SSA incremental: 35.74 ( 9%) 0.07 ( 4%) 36.65 ( 9%) 2734k ( 1%) improves to tree loop unswitching : 10.21 ( 3%) 0.05 ( 3%) 11.50 ( 3%)36M ( 9%) `- tree SSA incremental: 0.66 ( 0%) 0.02 ( 1%) 0.49 ( 0%) 2734k ( 1%) for less localized updates the SEME region isn't likely constrained enough so I've restricted the extra work to TODO_update_ssa_no_phi callers. Bootstrap & regtest running on x86_64-unknown-linux-gnu. It's probably the last change that makes a visible difference for general update-ssa and any specialized manual updating that would be possible as well would not show up in better numbers than above (unless I manage to complicate the testcase more). Comments? Thanks, Richard. * tree-into-ssa.cc (rewrite_mode::REWRITE_UPDATE_REGION): New. (rewrite_update_dom_walker::rewrite_update_dom_walker): Update. (rewrite_update_dom_walker::m_in_region_flag): New. (rewrite_update_dom_walker::before_dom_children): If the region to update is marked, STOP at exits. (rewrite_blocks): For REWRITE_UPDATE_REGION mark the region to be updated. (dump_update_ssa): Use bitmap_empty_p. (update_ssa): Likewise. Use REWRITE_UPDATE_REGION when TODO_update_ssa_no_phi. * tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Account pending update_ssa to the caller. 
Overall concept seems quite reasonable to me -- it also largely mirrors ideas I was exploring for incremental DOM many years ago. Jeff
[RFA] Improve initialization of objects when the initializer has trailing zeros.
This is an update to a patch originally posted by Takayuki Suwa a few
months ago.

When we initialize an array from a STRING_CST we perform the
initialization in two steps.  The first step copies the STRING_CST to
the destination.  The second step uses clear_storage to initialize
storage in the array beyond TREE_STRING_LENGTH of the initializer.

Takayuki's patch added a special case when the STRING_CST itself was
all zeros which would avoid the copy from the STRING_CST and instead do
all the initialization via clear_storage which is clearly more runtime
efficient.

Richie had the suggestion that instead of special casing when the
entire STRING_CST was NULs to instead identify when the tail of the
STRING_CST was NULs.  That's more general and handles Takayuki's case
as well.

Bootstrapped and regression tested on x86_64-linux-gnu.  Given I
rewrote Takayuki's patch I think it needs someone else to review rather
than self-approving.

OK for the trunk?

Jeff

	* expr.cc (store_expr): Identify trailing NULs in a STRING_CST
	initializer and use clear_storage rather than copying the NULs
	to the destination array.

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 62297379ec9..f94d46b969c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -6087,6 +6087,17 @@ store_expr (tree exp, rtx target, int call_param_p,
 	}
 
       str_copy_len = TREE_STRING_LENGTH (str);
+
+      /* Trailing NUL bytes in EXP will be handled by the call to
+	 clear_storage, which is more efficient than copying them from
+	 the STRING_CST, so trim those from STR_COPY_LEN.  */
+      while (str_copy_len)
+	{
+	  if (TREE_STRING_POINTER (str)[str_copy_len - 1])
+	    break;
+	  str_copy_len--;
+	}
+
       if ((STORE_MAX_PIECES & (STORE_MAX_PIECES - 1)) == 0)
 	{
 	  str_copy_len += STORE_MAX_PIECES - 1;
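At the level of library calls, the effect of the change is analogous to the following sketch; the function and its interface are invented for illustration (store_expr of course emits RTL rather than memcpy/memset calls):

```c
#include <assert.h>
#include <string.h>

/* Initialize DST[0..DST_LEN) from a string constant SRC of SRC_LEN
   bytes, clearing everything beyond the constant.  Trimming the
   trailing NULs of SRC first means the single clearing step covers
   them, instead of copying zeros and then clearing more zeros.  */
static void
init_from_string (char *dst, size_t dst_len,
		  const char *src, size_t src_len)
{
  /* Mirror of the new loop in store_expr: drop trailing NULs from the
     copy length; the clearing step will produce them anyway.  */
  while (src_len && src[src_len - 1] == '\0')
    src_len--;

  assert (src_len <= dst_len);
  memcpy (dst, src, src_len);                    /* non-NUL prefix */
  memset (dst + src_len, 0, dst_len - src_len);  /* everything else */
}
```

For an initializer like `char buf[100] = "ab";` this reduces the copied portion to two bytes and lets one clear cover the remaining 98, which is the runtime win the patch is after.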
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On Thu, 7 Jul 2022, Jason Merrill wrote: > On 7/6/22 15:26, Patrick Palka wrote: > > On Tue, 5 Jul 2022, Jason Merrill wrote: > > > > > On 7/5/22 10:06, Patrick Palka wrote: > > > > On Fri, 1 Jul 2022, Jason Merrill wrote: > > > > > > > > > On 6/29/22 13:42, Patrick Palka wrote: > > > > > > In r13-1045-gcb7fd1ea85feea I assumed that substitution into generic > > > > > > DECL_TI_ARGS corresponds to an identity mapping of the given > > > > > > arguments, > > > > > > and hence its safe to always elide such substitution. But this PR > > > > > > demonstrates that such a substitution isn't always the identity > > > > > > mapping, > > > > > > in particular when there's an ARGUMENT_PACK_SELECT argument, which > > > > > > gets > > > > > > handled specially during substitution: > > > > > > > > > > > > * when substituting an APS into a template parameter, we strip > > > > > > the > > > > > >APS to its underlying argument; > > > > > > * and when substituting an APS into a pack expansion, we strip > > > > > > the > > > > > >APS to its underlying argument pack. > > > > > > > > > > Ah, right. For instance, in variadic96.C we have > > > > > > > > > > 10 template < typename... T > > > > > > 11 struct derived > > > > > 12: public base< T, derived< T... > >... > > > > > > > > > > so when substituting into the base-specifier, we're approaching it > > > > > from > > > > > the > > > > > outside in, so when we get to the inner T... we need some way to find > > > > > the > > > > > T > > > > > pack again. It might be possible to remove the need for APS by > > > > > substituting > > > > > inner pack expansions before outer ones, which could improve > > > > > worst-case > > > > > complexity, but I don't know how relevant that is in real code; I > > > > > imagine > > > > > most > > > > > inner pack expansions are as simple as this one. > > > > > > > > Aha, that makes sense. > > > > > > > > > > > > > > > In this testcase, when expanding the pack expansion pattern (idx + > > > > > > Ns)... 
> > > > > > with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> > > > > > > and > > > > > > then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic > > > > > > template > > > > > > arguments of the enclosing class template impl, so before r13-1045, > > > > > > we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as > > > > > > desired. > > > > > > But after r13-1045, we elide this substitution and end up attempting > > > > > > to > > > > > > hash the original Ns argument, an APS, which ICEs. > > > > > > > > > > > > So this patch partially reverts this part of r13-1045. I considered > > > > > > using preserve_args in this case instead, but that'd break the > > > > > > static_assert in the testcase because preserve_args always strips > > > > > > APS to > > > > > > its underlying argument, but here we want to strip it to its > > > > > > underlying > > > > > > argument pack, so we'd incorrectly end up forming the > > > > > > specializations > > > > > > impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. > > > > > > > > > > > > Although we can't elide the substitution into DECL_TI_ARGS in light > > > > > > of > > > > > > ARGUMENT_PACK_SELECT, it should still be safe to elide template > > > > > > argument > > > > > > coercion in the case of a non-template decl, which this patch > > > > > > preserves. > > > > > > > > > > > > It's unfortunate that we need to remove this optimization just > > > > > > because > > > > > > it doesn't hold for one special tree code. So this patch implements > > > > > > a > > > > > > heuristic in tsubst_template_args to avoid allocating a new TREE_VEC > > > > > > if > > > > > > the substituted elements are identical to those of a level from > > > > > > ARGS. > > > > > > It turns out that about 30% of all calls to tsubst_template_args > > > > > > benefit > > > > > > from this optimization, and it reduces memory usage by about 1.5% > > > > > > for > > > > > > e.g. stdc++.h (relative to r13-1045). 
(This is the maybe_reuse > > > > > > stuff, > > > > > > the rest of the changes to tsubst_template_args are just drive-by > > > > > > cleanups.) > > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK > > > > > > for > > > > > > trunk? Patch generated with -w to ignore noisy whitespace changes. > > > > > > > > > > > > PR c++/105956 > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > * pt.cc (tsubst_template_args): Move variable declarations > > > > > > closer to their first use. Replace 'orig_t' with 'r'. Rename > > > > > > 'need_new' to 'const_subst_p'. Heuristically detect if the > > > > > > substituted elements are identical to that of a level from > > > > > > 'args' and avoid allocating a new TREE_VEC if so. > > > > > > (tsubst_decl) : Revert > > > > > > r13-1045-gcb7fd1ea85feea change for avoiding substitution into > > > > > > DECL
Re: [PATCH] Add a heuristic for eliminate redundant load and store in inline pass.
Hello,

> From: Lili
>
> Hi Hubicka,
>
> This patch is to add a heuristic inline hint to eliminate redundant
> load and store.
>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.
> OK for trunk?
>
> Thanks,
> Lili.
>
> Add an INLINE_HINT_eliminate_load_and_store hint into the inline pass.
> We accumulate the insn number of redundant loads and stores that can
> be reduced by these three cases; when the count is greater than the
> threshold, we will enable the hint.  With the hint, inlining_insns_auto
> will enlarge the bound.
>
> 1. Caller's store is same with callee's load
> 2. Caller's load is same with callee's load
> 3. Callee's load is same with caller's local memory access
>
> With the patch applied:
>
> Icelake server: 538.imagick_r gets 14.10% improvement for multicopy
> and 38.90% improvement for single copy with no measurable changes for
> other benchmarks.
>
> Cascadelake: 538.imagick_r gets 12.5% improvement for multicopy with
> code size increased by 0.2%.  With no measurable changes for other
> benchmarks.
>
> Znver3 server: 538.imagick_r gets 14.20% improvement for multicopy
> with code size increased by 0.3%.  With no measurable changes for
> other benchmarks.
> CPU2017 single copy performance data for Icelake server
>
> BenchMarks        Score    Build time  Code size
> 500.perlbench_r   1.50%    -0.20%       0.00%
> 502.gcc_r         0.10%    -0.10%       0.00%
> 505.mcf_r         0.00%     1.70%       0.00%
> 520.omnetpp_r    -0.60%    -0.30%       0.00%
> 523.xalancbmk_r   0.60%     0.00%       0.00%
> 525.x264_r        0.00%    -0.20%       0.00%
> 531.deepsjeng_r   0.40%    -1.10%      -0.10%
> 541.leela_r       0.00%     0.00%       0.00%
> 548.exchange2_r   0.00%    -0.90%       0.00%
> 557.xz_r          0.00%     0.00%       0.00%
> 503.bwaves_r      0.00%     1.40%       0.00%
> 507.cactuBSSN_r   0.00%     1.00%       0.00%
> 508.namd_r        0.00%     0.30%       0.00%
> 510.parest_r      0.00%    -0.40%       0.00%
> 511.povray_r      0.70%    -0.60%       0.00%
> 519.lbm_r         0.00%     0.00%       0.00%
> 521.wrf_r         0.00%     0.60%       0.00%
> 526.blender_r     0.00%     0.00%       0.00%
> 527.cam4_r       -0.30%    -0.50%       0.00%
> 538.imagick_r    38.90%     0.50%       0.20%
> 544.nab_r         0.00%     1.10%       0.00%
> 549.fotonik3d_r   0.00%     0.90%       0.00%
> 554.roms_r        2.30%    -0.10%       0.00%
> Geomean-int       0.00%    -0.30%       0.00%
> Geomean-fp        3.80%     0.30%       0.00%

This is an interesting idea. Basically we want to guess if inlining will make SRA and/or store->load propagation possible. I think the solution using INLINE_HINT may be a bit too trigger-happy, since it is very common that this happens, and with -O3 the hints are taken quite seriously. We already have a mechanism to predict this situation by simply expecting that stores to addresses pointed to by function parameters will be eliminated by 50%; see eliminated_by_inlining_prob. I was thinking that we may combine it with the knowledge that the parameter points to caller-local memory (which is what LLVM's heuristic does), which can be added to IPA predicates. The idea of checking that the actual store in question is paired with a load on the caller side is a bit harder: one needs to invent a representation for such conditions. So I wonder how much extra help we need for the critical inlining to happen in ImageMagick?
Honza > > gcc/ChangeLog: > > * ipa-fnsummary.cc (ipa_dump_hints): Add print for hint > "eliminate_load_and_store" > * ipa-fnsummary.h (enum ipa_hints_vals): Add > INLINE_HINT_eliminate_load_and_store. > * ipa-inline-analysis.cc (do_estimate_edge_time): Add judgment for > INLINE_HINT_eliminate_load_and_store. > * ipa-inline.cc (want_inline_small_function_p): Add > "INLINE_HINT_eliminate_load_and_store" for hints flag. > * ipa-modref-tree.h (struct modref_access_node): Move function contains > to public.. > (struct modref_tree): Add new function "same" and > "local_vector_memory_accesse" > * ipa-modref.cc (eliminate_load_and_store): New. > (ipa_merge_modref_summary_after_inlining): Change the input value of > useful_p. > * ipa-modref.h (eliminate_load_and_store): New. > * opts.cc: Add param "min_inline_hint_eliminate_loads_num" > * params.opt: Ditto. > > gcc/testsuite/ChangeLog: > > * gcc.dg/ipa/inlinehint-6.c: New test. > --- > gcc/ipa-fnsummary.cc| 5 ++ > gcc/ipa-fnsummary.h | 4 +- > gcc/ipa-inline-analysis.cc | 7 ++ > gcc/ipa-inline.cc | 3 +- > gcc/ipa-modref-tree.h | 109 +++- > gcc/ipa-modref.cc | 46 +- > gcc/ipa-modref.h| 1 + > gcc/opts.cc | 1 + > gcc/params.opt | 4 + > gcc/testsuite/gcc.
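For reference, the kind of redundancy the three cases in the patch description target can be sketched in plain code (function names are made up for illustration; this is not code from the patch):

```cpp
#include <cassert>

struct vec { int x, y; };

/* Callee: its loads of p->x and p->y match the caller's stores
   (case 1) once the call is inlined.  */
static int sum(const struct vec *p)
{
  return p->x + p->y;
}

int caller(void)
{
  struct vec v;   /* caller-local memory (case 3) */
  v.x = 3;        /* caller stores ...            */
  v.y = 4;
  /* After inlining, store->load forwarding / SRA can remove both the
     stores and the loads entirely, so the inlined body ends up cheaper
     than the size estimate for the out-of-line callee suggests.  */
  return sum(&v);
}
```

This is also why a plain size-based estimate undervalues inlining here: the benefit only materializes once the caller's stores and the callee's loads meet in one body.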
[PATCH v2] ipa-visibility: Optimize TLS access [PR99619]
From: Artem Klimov Fix PR99619, which asks to optimize the TLS model based on visibility. The fix is implemented as an IPA optimization: this allows taking the optimized visibility status into account (as well as avoiding modifying all language frontends). 2022-04-17 Artem Klimov gcc/ChangeLog: * ipa-visibility.cc (function_and_variable_visibility): Promote TLS access model after visibility optimizations. * varasm.cc (have_optimized_refs): New helper. (optimize_dyn_tls_for_decl_p): New helper. Use it ... (decl_default_tls_model): ... here in place of 'optimize' check. gcc/testsuite/ChangeLog: * gcc.dg/tls/vis-attr-gd.c: New test. * gcc.dg/tls/vis-attr-hidden-gd.c: New test. * gcc.dg/tls/vis-attr-hidden.c: New test. * gcc.dg/tls/vis-flag-hidden-gd.c: New test. * gcc.dg/tls/vis-flag-hidden.c: New test. * gcc.dg/tls/vis-pragma-hidden-gd.c: New test. * gcc.dg/tls/vis-pragma-hidden.c: New test. Co-Authored-By: Alexander Monakov Signed-off-by: Artem Klimov --- v2: run the new loop in ipa-visibility only in the whole-program IPA pass; in decl_default_tls_model, check if any referring function is optimized when 'optimize == 0' (when running in LTO mode) Note for reviewers: I noticed there's a place which tries to avoid TLS promotion, but the comment seems wrong and I could not find a testcase. I'd suggest we remove it. The compiler can only promote general-dynamic to local-dynamic and initial-exec to local-exec.
The comment refers to promoting x-dynamic to y-exec, but that cannot happen AFAICT: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=8e1ba78f1b8eedd6c65c6f0e6d6d09a801de5d3d gcc/ipa-visibility.cc | 19 +++ gcc/testsuite/gcc.dg/tls/vis-attr-gd.c| 12 +++ gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c | 13 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c| 12 +++ gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c | 13 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c| 12 +++ .../gcc.dg/tls/vis-pragma-hidden-gd.c | 17 ++ gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c | 16 ++ gcc/varasm.cc | 32 ++- 9 files changed, 145 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-attr-hidden.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-flag-hidden.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden-gd.c create mode 100644 gcc/testsuite/gcc.dg/tls/vis-pragma-hidden.c diff --git a/gcc/ipa-visibility.cc b/gcc/ipa-visibility.cc index 8a27e7bcd..3ed2b7cf6 100644 --- a/gcc/ipa-visibility.cc +++ b/gcc/ipa-visibility.cc @@ -873,6 +873,25 @@ function_and_variable_visibility (bool whole_program) } } + if (symtab->state >= IPA_SSA) +{ + FOR_EACH_VARIABLE (vnode) + { + tree decl = vnode->decl; + + /* Upgrade TLS access model based on optimized visibility status, +unless it was specified explicitly or no references remain. 
*/ + if (DECL_THREAD_LOCAL_P (decl) + && !lookup_attribute ("tls_model", DECL_ATTRIBUTES (decl)) + && vnode->ref_list.referring.length ()) + { + enum tls_model new_model = decl_default_tls_model (decl); + gcc_checking_assert (new_model >= decl_tls_model (decl)); + set_decl_tls_model (decl, new_model); + } + } +} + if (dump_file) { fprintf (dump_file, "\nMarking local functions:"); diff --git a/gcc/varasm.cc b/gcc/varasm.cc index 4db8506b1..de149e82c 100644 --- a/gcc/varasm.cc +++ b/gcc/varasm.cc @@ -6679,6 +6679,36 @@ init_varasm_once (void) #endif } +/* Determine whether SYMBOL is used in any optimized function. */ + +static bool +have_optimized_refs (struct symtab_node *symbol) +{ + struct ipa_ref *ref; + + for (int i = 0; symbol->iterate_referring (i, ref); i++) +{ + cgraph_node *cnode = dyn_cast (ref->referring); + + if (cnode && opt_for_fn (cnode->decl, optimize)) + return true; +} + + return false; +} + +/* Check if promoting general-dynamic TLS access model to local-dynamic is + desirable for DECL. */ + +static bool +optimize_dyn_tls_for_decl_p (const_tree decl) +{ + if (optimize) +return true; + return symtab->state >= IPA && have_optimized_refs (symtab_node::get (decl)); +} + + enum tls_model decl_default_tls_model (const_tree decl) { @@ -6696,7 +6726,7 @@ decl_default_tls_model (const_tree decl) /* Local dynamic is inefficient when we're not combining the parts of the address. */ - else if (optimize && is_local) + else if (is_local && optimize_dyn_tls_for_decl_p (decl)) kind = TLS_MODEL_LOCAL
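To illustrate the kind of code the promotion affects (a hypothetical example, not taken from the patch or its testsuite):

```cpp
#include <cassert>

/* With hidden visibility the variable cannot be interposed from another
   module, so the compiler may promote its TLS access model: e.g.
   general-dynamic -> local-dynamic under -fPIC, or initial-exec ->
   local-exec, the cheapest model, which needs no __tls_get_addr call.
   The promotion only changes code generation, not observable behavior. */
__attribute__((visibility("hidden"))) __thread int tls_counter = 5;

int bump(void)
{
  return ++tls_counter;  /* access model picked by decl_default_tls_model */
}
```

With the patch, the promotion happens in the whole-program IPA pass, after visibility has been optimized, rather than relying on the front end's view at parse time.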
[pushed] c++: -Woverloaded-virtual and dtors [PR87729]
My earlier patch broke out of the loop over base members when we found a match, but that caused trouble for dtors, which can have multiple base-class destructors for which same_signature_p is true. But as the function comment says, we know this doesn't apply to [cd]tors, so skip them. Tested x86_64-pc-linux-gnu, applying to trunk. PR c++/87729 gcc/cp/ChangeLog: * class.cc (warn_hidden): Ignore [cd]tors. gcc/testsuite/ChangeLog: * g++.dg/warn/Woverloaded-virt3.C: New test. --- gcc/cp/class.cc | 3 +++ gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C | 7 +++ 2 files changed, 10 insertions(+) create mode 100644 gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc index 17683f421a7..eb69e7f985c 100644 --- a/gcc/cp/class.cc +++ b/gcc/cp/class.cc @@ -3020,6 +3020,9 @@ warn_hidden (tree t) tree binfo; unsigned j; + if (IDENTIFIER_CDTOR_P (name)) + continue; + /* Iterate through all of the base classes looking for possibly hidden functions. */ for (binfo = TYPE_BINFO (t), j = 0; diff --git a/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C b/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C new file mode 100644 index 000..34214ba2557 --- /dev/null +++ b/gcc/testsuite/g++.dg/warn/Woverloaded-virt3.C @@ -0,0 +1,7 @@ +// PR c++/87729 +// { dg-additional-options -Woverloaded-virtual } + +struct S1 {}; +struct S2: S1 { virtual ~S2(); }; +struct S3 { virtual ~S3(); }; +struct S4: S2, S3 { virtual ~S4(); }; base-commit: c1b1c4e58bda152ae932b45396ab67b07dd8c3fe -- 2.31.1
Re: [PATCH]middle-end simplify complex if expressions where comparisons are inverse of one another.
On 7/5/2022 8:09 PM, Andrew Pinski via Gcc-patches wrote: Not your fault but there are now like two different predicates for a boolean like operand. zero_one_valued_p and truth_valued_p and a third way to describe it is to use SSA_NAME and check ssa_name_has_boolean_range. The latter is meant to catch cases where analysis indicates that a given SSA_NAME only takes on the values 0 or 1, regardless of the actual size of the SSA_NAME. It pre-dates having reasonable range information available in DOM and from reviewing the existing uses in DOM, I would expect Ranger to make most, if not all, of this code useless. jeff
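A one-line illustration of the third notion described above: ssa_name_has_boolean_range is about the values an SSA name can take, not about its declared type (the example is illustrative, not from any patch in this thread):

```cpp
#include <cassert>

int as_zero_one(int x)
{
  int b = (x > 3);  /* a full-width int, but analysis shows its range
                       is [0, 1] -- the case ssa_name_has_boolean_range
                       (and now Ranger) is meant to catch */
  return b;
}
```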
Re: [PATCH v2] tree-optimization/95821 - Convert strlen + strchr to memchr
On Tue, Jun 21, 2022 at 11:13 AM Noah Goldstein wrote: > > On Tue, Jun 21, 2022 at 5:01 AM Jakub Jelinek wrote: > > > > On Mon, Jun 20, 2022 at 02:42:20PM -0700, Noah Goldstein wrote: > > > This patch allows for strchr(x, c) to the replace with memchr(x, c, > > > strlen(x) + 1) if strlen(x) has already been computed earlier in the > > > tree. > > > > > > Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821 > > > > > > Since memchr doesn't need to re-find the null terminator it is faster > > > than strchr. > > > > > > bootstrapped and tested on x86_64-linux. > > > > > > PR tree-optimization/95821 > > > > This should be indented by a single tab, not two. > > Fixed in V3 > > > > > > gcc/ > > > > > > * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit > > > memchr instead of strchr if strlen already computed. > > > > > > gcc/testsuite/ > > > > > > * c-c++-common/pr95821-1.c: New test. > > > * c-c++-common/pr95821-2.c: New test. > > > * c-c++-common/pr95821-3.c: New test. > > > * c-c++-common/pr95821-4.c: New test. > > > * c-c++-common/pr95821-5.c: New test. > > > * c-c++-common/pr95821-6.c: New test. > > > * c-c++-common/pr95821-7.c: New test. > > > * c-c++-common/pr95821-8.c: New test. > > > --- a/gcc/tree-ssa-strlen.cc > > > +++ b/gcc/tree-ssa-strlen.cc > > > @@ -2405,9 +2405,12 @@ strlen_pass::handle_builtin_strlen () > > > } > > > } > > > > > > -/* Handle a strchr call. If strlen of the first argument is known, > > > replace > > > - the strchr (x, 0) call with the endptr or x + strlen, otherwise > > > remember > > > - that lhs of the call is endptr and strlen of the argument is endptr - > > > x. */ > > > +/* Handle a strchr call. If strlen of the first argument is known, > > > + replace the strchr (x, 0) call with the endptr or x + strlen, > > > + otherwise remember that lhs of the call is endptr and strlen of the > > > + argument is endptr - x. 
If strlen of x is not know but has been > > > + computed earlier in the tree then replace strchr(x, c) to > > > > Still missing space before ( above. > > Sorry, fixed that in V3. > > > > > + memchr (x, c, strlen + 1). */ > > > > > > void > > > strlen_pass::handle_builtin_strchr () > > > @@ -2418,8 +2421,12 @@ strlen_pass::handle_builtin_strchr () > > >if (lhs == NULL_TREE) > > > return; > > > > > > - if (!integer_zerop (gimple_call_arg (stmt, 1))) > > > -return; > > > + tree chr = gimple_call_arg (stmt, 1); > > > + /* strchr only uses the lower char of input so to check if its > > > + strchr (s, zerop) only take into account the lower char. */ > > > + bool is_strchr_zerop > > > + = (TREE_CODE (chr) == INTEGER_CST > > > + && integer_zerop (fold_convert (char_type_node, chr))); > > > > The indentation rule is that = should be 2 columns to the right from bool, > > so > > > > Fixed in V3. > > bool is_strchr_zerop > > = (TREE_CODE (chr) == INTEGER_CST > >&& integer_zerop (fold_convert (char_type_node, chr))); > > > > > + /* If its not strchr (s, zerop) then try and convert to > > > + memchr since strlen has already been computed. */ > > > > This comment still has the second line weirdly indented. > > Sorry, have emacs with 4-space tabs so things that look right arent > as they seem :/ > > Fixed in V3 I believe. > > > > > + tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR); > > > + > > > + /* Only need to check length strlen (s) + 1 if chr may be > > > zero. > > > + Otherwise the last chr (which is known to be zero) can never > > > + be a match. NB: We don't need to test if chr is a non-zero > > > + integer const with zero char bits because that is taken into > > > + account with is_strchr_zerop. */ > > > + if (!tree_expr_nonzero_p (chr)) > > > > The above is unsafe though. tree_expr_nonzero_p (chr) will return true > > if say VRP can prove it is not zero, but because of the implicit > > (char) chr cast done by the function we need something different. 
> > Say if VRP determines that chr is in [1, INT_MAX] or even just [255, 257] > > it doesn't mean (char) chr won't be 0. > > So, as I've tried to explain in the previous mail, it can be done e.g. with > > Added your code in V3. Thanks for the help. > > bool chr_nonzero = false; > > if (TREE_CODE (chr) == INTEGER_CST > > && integer_nonzerop (fold_convert (char_type_node, chr))) > > chr_nonzero = true; > > else if (TREE_CODE (chr) == SSA_NAME > >&& CHAR_TYPE_SIZE < INT_TYPE_SIZE) > > { > > value_range r; > > /* Try to determine using ranges if (char) chr must > > be always 0. That is true e.g. if all the subranges > > have the INT_TYPE_SIZE - CHAR_TY
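The hazard described above is easy to reproduce: a value proven nonzero as an int can still truncate to the NUL character, because strchr/memchr only inspect the low byte (a minimal demonstration, not code from the patch):

```cpp
#include <cassert>

int low_byte_is_nul(int chr)
{
  /* strchr (s, chr) compares against (char) chr, so a range such as
     [255, 257] that excludes zero still contains 256, whose low byte
     is 0.  tree_expr_nonzero_p on the int value is therefore not
     enough; the range check must be done after the implicit
     narrowing to char.  */
  return (char) chr == 0;
}
```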
[x86 PATCH] Support *testdi_not_doubleword during STV pass.
This patch fixes the current two FAILs of pr65105-5.c on x86 when compiled with -m32. These (temporary) breakages were fallout from my patches to improve/upgrade (scalar) double word comparisons. On mainline, the i386 backend currently represents a critical comparison using (compare (and (not reg1) reg2) (const_int 0)) which isn't/wasn't recognized by the STV pass' convertible_comparison_p. This simple STV patch adds support for this pattern (*testdi_not_doubleword) and generates the vector pandn and ptest instructions expected in the existing (failing) test case. This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, where with --target_board=unix{-m32} there are two fewer failures, and without, there are no new failures. Ok for mainline? 2022-07-07 Roger Sayle gcc/ChangeLog * config/i386/i386-features.cc (convert_compare): Add support for *testdi_not_doubleword pattern (i.e. "(compare (and (not ...") by generating a pandn followed by ptest. (convertible_comparison_p): Recognize both *cmpdi_doubleword and recent *testdi_not_doubleword comparison patterns. Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index be38586..a7bd172 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -938,10 +938,10 @@ general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn) { rtx tmp = gen_reg_rtx (vmode); rtx src; - convert_op (&op1, insn); /* Comparison against anything other than zero, requires an XOR. */ if (op2 != const0_rtx) { + convert_op (&op1, insn); convert_op (&op2, insn); /* If both operands are MEMs, explicitly load the OP1 into TMP. 
*/ if (MEM_P (op1) && MEM_P (op2)) @@ -953,8 +953,25 @@ general_scalar_chain::convert_compare (rtx op1, rtx op2, rtx_insn *insn) src = op1; src = gen_rtx_XOR (vmode, src, op2); } + else if (GET_CODE (op1) == AND + && GET_CODE (XEXP (op1, 0)) == NOT) +{ + rtx op11 = XEXP (XEXP (op1, 0), 0); + rtx op12 = XEXP (op1, 1); + convert_op (&op11, insn); + convert_op (&op12, insn); + if (MEM_P (op11)) + { + emit_insn_before (gen_rtx_SET (tmp, op11), insn); + op11 = tmp; + } + src = gen_rtx_AND (vmode, gen_rtx_NOT (vmode, op11), op12); +} else -src = op1; +{ + convert_op (&op1, insn); + src = op1; +} emit_insn_before (gen_rtx_SET (tmp, src), insn); if (vmode == V2DImode) @@ -1399,17 +1416,29 @@ convertible_comparison_p (rtx_insn *insn, enum machine_mode mode) rtx op1 = XEXP (src, 0); rtx op2 = XEXP (src, 1); - if (!CONST_INT_P (op1) - && ((!REG_P (op1) && !MEM_P (op1)) - || GET_MODE (op1) != mode)) -return false; - - if (!CONST_INT_P (op2) - && ((!REG_P (op2) && !MEM_P (op2)) - || GET_MODE (op2) != mode)) -return false; + /* *cmp_doubleword. */ + if ((CONST_INT_P (op1) + || ((REG_P (op1) || MEM_P (op1)) + && GET_MODE (op1) == mode)) + && (CONST_INT_P (op2) + || ((REG_P (op2) || MEM_P (op2)) + && GET_MODE (op2) == mode))) +return true; + + /* *test_not_doubleword. */ + if (op2 == const0_rtx + && GET_CODE (op1) == AND + && GET_CODE (XEXP (op1, 0)) == NOT) +{ + rtx op11 = XEXP (XEXP (op1, 0), 0); + rtx op12 = XEXP (op1, 1); + return (REG_P (op11) || MEM_P (op11)) +&& (REG_P (op12) || MEM_P (op12)) +&& GET_MODE (op11) == mode +&& GET_MODE (op12) == mode; +} - return true; + return false; } /* The general version of scalar_to_vector_candidate_p. */
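The source-level shape that yields the *testdi_not_doubleword pattern is a double-word "every bit of y is also set in x" test (a sketch for illustration; the actual failing test case may differ):

```cpp
#include <cassert>

/* At the RTL level this is (compare (and (not x) y) (const_int 0)).
   With -m32, a 64-bit operand is a double word, and the STV pass can
   now convert the test into vector pandn + ptest instead of scalar
   not/and pairs on both halves.  */
int bits_subset(unsigned long long x, unsigned long long y)
{
  return (~x & y) == 0;
}
```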
[PATCH v2] Simplify memchr with small constant strings
When memchr is applied on a constant string of no more than the bytes of a word, simplify memchr by checking each byte in the constant string. int f (int a) { return __builtin_memchr ("AE", a, 2) != 0; } is simplified to int f (int a) { return ((char) a == 'A' || (char) a == 'E') != 0; } gcc/ PR tree-optimization/103798 * tree-ssa-forwprop.cc: Include "tree-ssa-strlen.h". (simplify_builtin_call): Inline memchr with constant strings of no more than the bytes of a word. * tree-ssa-strlen.cc (use_in_zero_equality): Make it global. * tree-ssa-strlen.h (use_in_zero_equality): New. gcc/testsuite/ PR tree-optimization/103798 * c-c++-common/pr103798-1.c: New test. * c-c++-common/pr103798-2.c: Likewise. * c-c++-common/pr103798-3.c: Likewise. * c-c++-common/pr103798-4.c: Likewise. * c-c++-common/pr103798-5.c: Likewise. * c-c++-common/pr103798-6.c: Likewise. * c-c++-common/pr103798-7.c: Likewise. * c-c++-common/pr103798-8.c: Likewise. --- gcc/testsuite/c-c++-common/pr103798-1.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-2.c | 30 gcc/testsuite/c-c++-common/pr103798-3.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-4.c | 28 +++ gcc/testsuite/c-c++-common/pr103798-5.c | 26 ++ gcc/testsuite/c-c++-common/pr103798-6.c | 27 +++ gcc/testsuite/c-c++-common/pr103798-7.c | 27 +++ gcc/testsuite/c-c++-common/pr103798-8.c | 27 +++ gcc/tree-ssa-forwprop.cc| 64 + gcc/tree-ssa-strlen.cc | 4 +- gcc/tree-ssa-strlen.h | 2 + 11 files changed, 289 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/pr103798-1.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-2.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-3.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-4.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-5.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-6.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-7.c create mode 100644 gcc/testsuite/c-c++-common/pr103798-8.c diff --git a/gcc/testsuite/c-c++-common/pr103798-1.c 
b/gcc/testsuite/c-c++-common/pr103798-1.c new file mode 100644 index 000..cd3edf569fc --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-1.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("a", a, 1) == 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a != 'a'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-2.c b/gcc/testsuite/c-c++-common/pr103798-2.c new file mode 100644 index 000..e7e99c3679e --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-2.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +#include + +__attribute__ ((weak)) +int +f (int a) +{ + return memchr ("aE", a, 2) != NULL; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a == 'a' || a == 'E'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i + 256) != g (i + 256)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-3.c b/gcc/testsuite/c-c++-common/pr103798-3.c new file mode 100644 index 000..ddcedc7e238 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-3.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("aEgZ", a, 3) == 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a != 'a' && a != 'E' && a != 'g'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-4.c b/gcc/testsuite/c-c++-common/pr103798-4.c new file mode 100644 
index 000..00e8302a833 --- /dev/null +++ b/gcc/testsuite/c-c++-common/pr103798-4.c @@ -0,0 +1,28 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fdump-tree-optimized -save-temps" } */ + +__attribute__ ((weak)) +int +f (char a) +{ + return __builtin_memchr ("aEgi", a, 4) != 0; +} + +__attribute__ ((weak)) +int +g (char a) +{ + return a == 'a' || a == 'E' || a == 'g' || a == 'i'; +} + +int +main () +{ + for (int i = 0; i < 255; i++) + if (f (i) != g (i)) + __builtin_abort (); + + return 0; +} + +/* { dg-final { scan-assembler-not "memchr" } } */ diff --git a/gcc/testsuite/c-c++-common/pr103798-5.c b/gcc/testsuite/c-c++-common
Re: [PATCH] Inline memchr with a small constant string
On Thu, Jun 23, 2022 at 9:26 AM H.J. Lu wrote: > > On Wed, Jun 22, 2022 at 11:03 PM Richard Biener > wrote: > > > > On Wed, Jun 22, 2022 at 7:13 PM H.J. Lu wrote: > > > > > > On Wed, Jun 22, 2022 at 4:39 AM Richard Biener > > > wrote: > > > > > > > > On Tue, Jun 21, 2022 at 11:03 PM H.J. Lu via Gcc-patches > > > > wrote: > > > > > > > > > > When memchr is applied on a constant string of no more than the bytes > > > > > of > > > > > a word, inline memchr by checking each byte in the constant string. > > > > > > > > > > int f (int a) > > > > > { > > > > >return __builtin_memchr ("eE", a, 2) != 0; > > > > > } > > > > > > > > > > is simplified to > > > > > > > > > > int f (int a) > > > > > { > > > > > return (char) a == 'e' || (char) a == 'E'; > > > > > } > > > > > > > > > > gcc/ > > > > > > > > > > PR tree-optimization/103798 > > > > > * match.pd (__builtin_memchr (const_str, a, N)): Inline memchr > > > > > with constant strings of no more than the bytes of a word. > > > > > > > > Please do this in strlenopt or so, with match.pd you will end up moving > > > > the memchr loads across possible aliasing stores to the point of the > > > > comparison. > > > > > > strlenopt is run after many other passes. The code won't be well > > > optimized. > > > > What followup optimizations do you expect? That is, other builtins are only > > reassociation and dce turn > > _5 = a_2(D) == 101; > _6 = a_2(D) == 69; > _1 = _5 | _6; > _4 = (int) _1; > > into > > _7 = a_2(D) & -33; > _8 = _7 == 69; > _1 = _8; > _4 = (int) _1; > > > expanded inline at RTL expansion time? > > Some high level optimizations will be missed and > TARGET_GIMPLE_FOLD_BUILTIN improves builtins > codegen. > > > > Since we are only optimizing > > > > > > __builtin_memchr ("eE", a, 2) != 0; > > > > > > I don't see any aliasing store issues here. > > > > Ah, I failed to see the STRING_CST restriction. Note that when optimizing > > for > > size this doesn't look very good. > > True. 
> > > I would expect a target might produce some vector code for > > memchr ("aAbBcCdDeE...", c, 9) != 0 by splatting 'c', doing > > a v16qimode compare, masking off excess elements beyond length > > and then comparing against zero or for == 0 against all-ones. > > > > The repetitive pattern result also suggests an implementation elsewhere, > > if you think strlenopt is too late there would be forwprop as well. > > forwprop seems a good place. The v2 patch is at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598022.html Thanks. -- H.J.
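The reassociated form in the dump quoted above (`_7 = a_2(D) & -33; _8 = _7 == 69;`) is the classic case-folding trick: 'e' (0x65) and 'E' (0x45) differ only in bit 5, so one masked compare replaces the two equality tests. A stand-alone restatement:

```cpp
#include <cassert>

int has_e_masked(int a)
{
  /* -33 is ~32: clearing bit 5 maps both 'e' (101) and 'E' (69)
     onto 69, so a single compare covers both letters.  */
  return (char) (a & -33) == 'E';
}

int has_e_reference(int a)
{
  /* What memchr ("eE", a, 2) != 0 means: compare the low byte.  */
  return (char) a == 'e' || (char) a == 'E';
}
```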
[committed] libstdc++: Remove workaround in __gnu_cxx::char_traits::move [PR89074]
Tested aarch64-linux, pushed to trunk. -- >8 -- The front-end bug that prevented this constexpr loop from working has been fixed since GCC 12.1 so we can remove the workaround. libstdc++-v3/ChangeLog: PR c++/89074 * include/bits/char_traits.h (__gnu_cxx::char_traits::move): Remove workaround for front-end bug. --- libstdc++-v3/include/bits/char_traits.h | 9 - 1 file changed, 9 deletions(-) diff --git a/libstdc++-v3/include/bits/char_traits.h b/libstdc++-v3/include/bits/char_traits.h index b856b1da320..965ff29b75c 100644 --- a/libstdc++-v3/include/bits/char_traits.h +++ b/libstdc++-v3/include/bits/char_traits.h @@ -215,14 +215,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { if (__s1 == __s2) // unlikely, but saves a lot of work return __s1; -#if __cpp_constexpr_dynamic_alloc - // The overlap detection below fails due to PR c++/89074, - // so use a temporary buffer instead. - char_type* __tmp = new char_type[__n]; - copy(__tmp, __s2, __n); - copy(__s1, __tmp, __n); - delete[] __tmp; -#else const auto __end = __s2 + __n - 1; bool __overlap = false; for (std::size_t __i = 0; __i < __n - 1; ++__i) @@ -244,7 +236,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } else copy(__s1, __s2, __n); -#endif return __s1; } #endif -- 2.36.1
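The overlap handling that remains after removing the workaround can be read as a self-contained sketch (identifiers are illustrative, not the library's):

```cpp
#include <cstddef>

/* If the destination starts inside the source range, a forward copy
   would clobber source bytes before reading them, so copy backwards;
   otherwise a forward copy is safe.  This stays constexpr-friendly,
   unlike memmove.  */
constexpr char *move_chars(char *s1, const char *s2, std::size_t n)
{
  if (s1 == s2 || n == 0)
    return s1;
  bool overlap = false;
  for (std::size_t i = 0; i < n; ++i)
    if (s2 + i == s1)
      { overlap = true; break; }
  if (overlap)
    for (std::size_t i = n; i > 0; --i)
      s1[i - 1] = s2[i - 1];       /* backward copy */
  else
    for (std::size_t i = 0; i < n; ++i)
      s1[i] = s2[i];               /* forward copy */
  return s1;
}

bool overlap_move_ok()
{
  char buf[] = {'a', 'b', 'c', 'd', 'e', 'f'};
  move_chars(buf + 2, buf, 4);     /* overlapping, dest after source */
  return buf[2] == 'a' && buf[3] == 'b' && buf[4] == 'c' && buf[5] == 'd';
}
```

The removed temporary-buffer workaround existed only because PR c++/89074 made the pointer comparisons in the overlap loop ill-formed in constant evaluation.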
[PATCH] libstdc++: Prefer const T to std::add_const_t
Does anybody see a problem with this change? The point is to avoid unnecessary class template instantiations. Tested aarch64-linux. -- >8 -- For any typedef-name or template parameter, T, add_const_t is equivalent to T const, so we can avoid instantiating the std::add_const class template and just say T const (or const T). This isn't true for a non-typedef like int&, where int& const would be ill-formed, but we shouldn't be using add_const_t anyway, because we know what that type is. The only place we need to continue using std::add_const is in the std::bind implementation where it's used as a template template parameter to be applied as a metafunction elsewhere. libstdc++-v3/ChangeLog: * include/bits/stl_iterator.h (__iter_to_alloc_t): Replace add_const_t with const-qualifier. * include/bits/utility.h (tuple_element): Likewise for all cv-qualifiers. * include/std/type_traits (add_const, add_volatile): Replace typedef-declaration with using-declaration. (add_cv): Replace add_const and add_volatile with cv-qualifiers. * include/std/variant (variant_alternative): Replace add_const_t, add_volatile_t and add_cv_t etc. with cv-qualifiers. --- libstdc++-v3/include/bits/stl_iterator.h | 11 +-- libstdc++-v3/include/bits/utility.h | 6 +++--- libstdc++-v3/include/std/type_traits | 9 +++-- libstdc++-v3/include/std/variant | 6 +++--- 4 files changed, 14 insertions(+), 18 deletions(-) diff --git a/libstdc++-v3/include/bits/stl_iterator.h b/libstdc++-v3/include/bits/stl_iterator.h index 12a89ab229f..049cb02a4c4 100644 --- a/libstdc++-v3/include/bits/stl_iterator.h +++ b/libstdc++-v3/include/bits/stl_iterator.h @@ -2536,19 +2536,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // of associative containers. 
template using __iter_key_t = remove_const_t< -typename iterator_traits<_InputIterator>::value_type::first_type>; + typename iterator_traits<_InputIterator>::value_type::first_type>; template -using __iter_val_t = -typename iterator_traits<_InputIterator>::value_type::second_type; +using __iter_val_t + = typename iterator_traits<_InputIterator>::value_type::second_type; template struct pair; template -using __iter_to_alloc_t = -pair>, -__iter_val_t<_InputIterator>>; +using __iter_to_alloc_t + = pair, __iter_val_t<_InputIterator>>; #endif // __cpp_deduction_guides _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/bits/utility.h b/libstdc++-v3/include/bits/utility.h index e0e40309a6d..6a192e27836 100644 --- a/libstdc++-v3/include/bits/utility.h +++ b/libstdc++-v3/include/bits/utility.h @@ -86,19 +86,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template struct tuple_element<__i, const _Tp> { - typedef typename add_const<__tuple_element_t<__i, _Tp>>::type type; + using type = const __tuple_element_t<__i, _Tp>; }; template struct tuple_element<__i, volatile _Tp> { - typedef typename add_volatile<__tuple_element_t<__i, _Tp>>::type type; + using type = volatile __tuple_element_t<__i, _Tp>; }; template struct tuple_element<__i, const volatile _Tp> { - typedef typename add_cv<__tuple_element_t<__i, _Tp>>::type type; + using type = const volatile __tuple_element_t<__i, _Tp>; }; #if __cplusplus >= 201402L diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index 2572d8edd69..e5f58bc2e3f 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -1577,20 +1577,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION /// add_const template struct add_const -{ typedef _Tp const type; }; +{ using type = _Tp const; }; /// add_volatile template struct add_volatile -{ typedef _Tp volatile type; }; +{ using type = _Tp volatile; }; /// add_cv template struct add_cv -{ - typedef typename - add_const::type>::type type; -}; 
+{ using type = _Tp const volatile; }; #if __cplusplus > 201103L diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant index 5ff1e3edcdf..f8f15665433 100644 --- a/libstdc++-v3/include/std/variant +++ b/libstdc++-v3/include/std/variant @@ -107,15 +107,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template struct variant_alternative<_Np, const _Variant> -{ using type = add_const_t>; }; +{ using type = const variant_alternative_t<_Np, _Variant>; }; template struct variant_alternative<_Np, volatile _Variant> -{ using type = add_volatile_t>; }; +{ using type = volatile variant_alternative_t<_Np, _Variant>; }; template struct variant_alternative<_Np, const volatile _Variant> -{ using type = add_cv_t>; }; +{ using type = const volatile variant_alternative_t<_Np, _Variant>; }; inline constexpr size_t variant_npos = -
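The equivalence the patch relies on can be checked directly (a stand-alone illustration; compile as C++17):

```cpp
#include <type_traits>

// "T const" names the same type as add_const_t<T> for any template
// parameter T -- including references, where the added const is
// harmlessly dropped -- but without instantiating the add_const
// class template.
template<typename T>
using directly_const = T const;

static_assert(std::is_same_v<directly_const<int>, std::add_const_t<int>>);
static_assert(std::is_same_v<directly_const<int&>, std::add_const_t<int&>>);
static_assert(std::is_same_v<directly_const<int&>, int&>);  // const dropped
```

The ill-formed case the patch description mentions only arises for a literal non-typedef like `int& const` written in source; through a typedef-name or template parameter the const is silently ignored.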
[PATCH] c++: Define built-in for std::tuple_element [PR100157]
This adds a new built-in to replace the recursive class template instantiations done by traits such as std::tuple_element and std::variant_alternative. The purpose is to select the Nth type from a list of types, e.g. __builtin_type_pack_element(1, char, int, float) is int. For a pathological example tuple_element_t<1000, tuple<2000 types...>> the compilation time is reduced by more than 90% and the memory used by the compiler is reduced by 97%. In realistic examples the gains will be much smaller, but still relevant. Clang has a similar built-in, __type_pack_element, but that's a "magic template" built-in using <> syntax, which GCC doesn't support. So this provides an equivalent feature, but as a built-in function using parens instead of <>. I don't really like the name "type pack element" (it gives you an element from a pack of types) but the semi-consistency with Clang seems like a reasonable argument in favour of keeping the name. I'd be open to alternative names though, e.g. __builtin_nth_type or __builtin_type_at_index. The patch has some problems though ... FIXME 1: Marek pointed out that this ICEs: template using type = __builtin_type_pack_element(sizeof(T), T...); type c; The sizeof(T) expression is invalid, because T is an unexpanded pack, but it's not rejected and instead crashes: ice.C: In substitution of 'template using type = __builtin_type_pack_element (sizeof (T), T ...)
[with T = {int, char}]': ice.C:2:15: required from here ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 1 | template using type = __builtin_type_pack_element(sizeof(T), T...); | ^ 0xe13eea dependent_type_p(tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 0xdd6d45 tsubst_decl /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 0xdfc1b6 instantiate_alias_template /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 0xd334c8 cp_parser_template_id /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 0xd429b0 cp_parser_class_name /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 0xd1ade9 cp_parser_qualifying_entity /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 0xd1a2c8 cp_parser_nested_name_specifier_opt /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 0xd4eefd cp_parser_template_introduction /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 
0xd4f416 cp_parser_template_declaration_after_export /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 0xd2d60e cp_parser_declaration /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as typename std::_Nth_type::type but I don't know how. Instead of trying to fake the mangled string, it's probably better to build a decl for that nested type, right? Any suggestions where to find something similar I can learn from? The reason to mangle it that way is that it preserves the same symbol names as the library produced in GCC 12, and that it will still produce with non-GCC compilers (see the definitions of std::_Nth_type in the library parts of the patch). If we don't do that then either we need to ensure it never appears in a mangled name, or define some other GCC-specific mangling for this built-in (e.g. we could just mangle it as though it's a function, "19__builtin_type_pack_elementELm1EJDpT_E" or something like that!). If we ensure it doesn't appear in mangled names that means we still need to instantiate the _Nth_type class template, rather than defining the alias template _Nth_type_t to use the built-in directly. That loses a little of the compilation performance gain that comes from defining the built-in in the first place (although avoid th
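For context, the recursive technique the built-in is meant to replace can be sketched as follows. This is a simplified stand-in modeled on the library's _Nth_type helper, not the patch's actual code: each step peels one type off the pack, so selecting element N instantiates N intermediate class templates, which is exactly the cost __builtin_type_pack_element avoids.

```cpp
#include <cstddef>
#include <type_traits>

// Recursive selection of the Nth type from a pack (sketch).
template<std::size_t N, typename... Ts>
struct nth_type { };

// Base case: the first type of the pack is element 0.
template<typename T, typename... Rest>
struct nth_type<0, T, Rest...> { using type = T; };

// Recursive case: drop the head, decrement the index.
template<std::size_t N, typename T, typename... Rest>
struct nth_type<N, T, Rest...> : nth_type<N - 1, Rest...> { };

template<std::size_t N, typename... Ts>
using nth_type_t = typename nth_type<N, Ts...>::type;

// With the proposed built-in, this whole recursion collapses to a single
// __builtin_type_pack_element(N, Ts...) with no template instantiations.
static_assert(std::is_same_v<nth_type_t<1, char, int, float>, int>);
```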
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On Thu, 7 Jul 2022, Patrick Palka wrote: > On Thu, 7 Jul 2022, Jason Merrill wrote: > > > On 7/6/22 15:26, Patrick Palka wrote: > > > On Tue, 5 Jul 2022, Jason Merrill wrote: > > > > > > > On 7/5/22 10:06, Patrick Palka wrote: > > > > > On Fri, 1 Jul 2022, Jason Merrill wrote: > > > > > > > > > > > On 6/29/22 13:42, Patrick Palka wrote: > > > > > > > In r13-1045-gcb7fd1ea85feea I assumed that substitution into > > > > > > > generic > > > > > > > DECL_TI_ARGS corresponds to an identity mapping of the given > > > > > > > arguments, > > > > > > > and hence its safe to always elide such substitution. But this PR > > > > > > > demonstrates that such a substitution isn't always the identity > > > > > > > mapping, > > > > > > > in particular when there's an ARGUMENT_PACK_SELECT argument, which > > > > > > > gets > > > > > > > handled specially during substitution: > > > > > > > > > > > > > > * when substituting an APS into a template parameter, we > > > > > > > strip > > > > > > > the > > > > > > >APS to its underlying argument; > > > > > > > * and when substituting an APS into a pack expansion, we > > > > > > > strip > > > > > > > the > > > > > > >APS to its underlying argument pack. > > > > > > > > > > > > Ah, right. For instance, in variadic96.C we have > > > > > > > > > > > > 10template < typename... T > > > > > > > 11struct derived > > > > > > 12 : public base< T, derived< T... > >... > > > > > > > > > > > > so when substituting into the base-specifier, we're approaching it > > > > > > from > > > > > > the > > > > > > outside in, so when we get to the inner T... we need some way to > > > > > > find > > > > > > the > > > > > > T > > > > > > pack again. 
It might be possible to remove the need for APS by > > > > > > substituting > > > > > > inner pack expansions before outer ones, which could improve > > > > > > worst-case > > > > > > complexity, but I don't know how relevant that is in real code; I > > > > > > imagine > > > > > > most > > > > > > inner pack expansions are as simple as this one. > > > > > > > > > > Aha, that makes sense. > > > > > > > > > > > > > > > > > > In this testcase, when expanding the pack expansion pattern (idx + > > > > > > > Ns)... > > > > > > > with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> > > > > > > > and > > > > > > > then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic > > > > > > > template > > > > > > > arguments of the enclosing class template impl, so before > > > > > > > r13-1045, > > > > > > > we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as > > > > > > > desired. > > > > > > > But after r13-1045, we elide this substitution and end up > > > > > > > attempting > > > > > > > to > > > > > > > hash the original Ns argument, an APS, which ICEs. > > > > > > > > > > > > > > So this patch partially reverts this part of r13-1045. I > > > > > > > considered > > > > > > > using preserve_args in this case instead, but that'd break the > > > > > > > static_assert in the testcase because preserve_args always strips > > > > > > > APS to > > > > > > > its underlying argument, but here we want to strip it to its > > > > > > > underlying > > > > > > > argument pack, so we'd incorrectly end up forming the > > > > > > > specializations > > > > > > > impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. > > > > > > > > > > > > > > Although we can't elide the substitution into DECL_TI_ARGS in > > > > > > > light > > > > > > > of > > > > > > > ARGUMENT_PACK_SELECT, it should still be safe to elide template > > > > > > > argument > > > > > > > coercion in the case of a non-template decl, which this patch > > > > > > > preserves. 
> > > > > > > > > > > > > > It's unfortunate that we need to remove this optimization just > > > > > > > because > > > > > > > it doesn't hold for one special tree code. So this patch > > > > > > > implements > > > > > > > a > > > > > > > heuristic in tsubst_template_args to avoid allocating a new > > > > > > > TREE_VEC > > > > > > > if > > > > > > > the substituted elements are identical to those of a level from > > > > > > > ARGS. > > > > > > > It turns out that about 30% of all calls to tsubst_template_args > > > > > > > benefit > > > > > > > from this optimization, and it reduces memory usage by about 1.5% > > > > > > > for > > > > > > > e.g. stdc++.h (relative to r13-1045). (This is the maybe_reuse > > > > > > > stuff, > > > > > > > the rest of the changes to tsubst_template_args are just drive-by > > > > > > > cleanups.) > > > > > > > > > > > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look > > > > > > > OK > > > > > > > for > > > > > > > trunk? Patch generated with -w to ignore noisy whitespace > > > > > > > changes. > > > > > > > > > > > > > > PR c++/105956 > > > > > > > > > > > > > > gcc/cp/ChangeLog: > > > > > > > > > > > > > > * pt.cc (tsubst_template_args): Move variable declarations > > > > > >
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On Thu, Jul 07, 2022 at 06:14:36PM +0100, Jonathan Wakely wrote: > This adds a new built-in to replace the recursive class template > instantiations done by traits such as std::tuple_element and > std::variant_alternative. The purpose is to select the Nth type from a > list of types, e.g. __builtin_type_pack_element(1, char, int, float) is > int. > > For a pathological example tuple_element_t<1000, tuple<2000 types...>> > the compilation time is reduced by more than 90% and the memory used by > the compiler is reduced by 97%. In realistic examples the gains will be > much smaller, but still relevant. > > Clang has a similar built-in, __type_pack_element, but that's a > "magic template" built-in using <> syntax, which GCC doesn't support. So > this provides an equivalent feature, but as a built-in function using > parens instead of <>. I don't really like the name "type pack element" > (it gives you an element from a pack of types) but the semi-consistency > with Clang seems like a reasonable argument in favour of keeping the > name. I'd be open to alternative names though, e.g. __builtin_nth_type > or __builtin_type_at_index. > > > The patch has some problems though ... > > FIXME 1: Marek pointed out that this this ICEs: > template using type = __builtin_type_pack_element(sizeof(T), > T...); > type c; > > The sizeof(T) expression is invalid, because T is an unexpanded pack, > but it's not rejected and instead crashes: I think this could be fixed by if (check_for_bare_parameter_packs (n)) return error_mark_node; in finish_type_pack_element. (I haven't looked at the rest of the patch yet.) Marek
Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
Hi! On Thu, Jul 07, 2022 at 04:30:50PM +0800, HAO CHEN GUI wrote: > This patch modifies the combine pattern after recog fails. With a helper It modifies combine itself, not just a pattern in the machine description. > - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with > a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT > or AND. The conversion helps pattern to match rotate and mask insn on some > targets. > For test case rlwimi-2.c, current trunk fails on > "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch > reduces the total number of insns from 21305 to 21279. That is incorrect. You need to figure out what actually changed, and if that is wanted or not, and then write some explanation about that. > * config/rs6000/rs6000.md (plus_ior_xor): Removed. > (anonymous split pattern for plus_ior_xor): Removed. "Remove.", in both cases. Always use imperative in changelogs and commit messages and the like, not some passive tense. > +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is > + ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits s/psuedo/pseudo/ > + is less than its mode mask. The nonzero_bits in other pass doesn't return > + the same value as it does in combine pass. */ That isn't quite the problem. Later passes can return a mask for nonzero_bits (which means: bits that are not known to be zero) that is not a superset of what was known during combine. If you use nonzero_bits in the insn condition of a define_insn (or define_insn_and_split, same thing under the covers) you then can end up with an insns that is fine during combine, but no longer recog()ed later. 
> +static bool > +change_pseudo_and_mask (rtx pat) > +{ > + rtx src = SET_SRC (pat); > + if ((GET_CODE (src) == IOR > + || GET_CODE (src) == XOR > + || GET_CODE (src) == PLUS) > + && (((GET_CODE (XEXP (src, 0)) == ASHIFT > + || GET_CODE (XEXP (src, 0)) == AND) > +&& REG_P (XEXP (src, 1) > +{ > + rtx *reg = &XEXP (SET_SRC (pat), 1); Why the extra indirection? SUBST is a macro, it can take lvalues just fine :-) > + machine_mode mode = GET_MODE (*reg); > + unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode); > + if (nonzero < GET_MODE_MASK (mode)) > + { > + int shift; > + > + if (GET_CODE (XEXP (src, 0)) == ASHIFT) > + shift = INTVAL (XEXP (XEXP (src, 0), 1)); > + else > + shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1))); > + > + if (shift > 0 > + && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero) Too many parens. > + { > + unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1; > + rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask)); > + SUBST (*reg, x); > + maybe_swap_commutative_operands (SET_SRC (pat)); > + return true; > + } > + } > + } > + return false; Broken indentation. > --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c > @@ -12,7 +12,7 @@ void rotins (unsigned int x) >b.y = (x<<12) | (x>>20); > } > > -/* { dg-final { scan-assembler-not {\mrlwinm} } } */ > +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */ Why? > +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */ > +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */ Can this just be /* { dg-final { scan-assembler-times {\mrl[wd]imi\M} 2 } } */ or is it necessary to not want rlwimi on 64-bit? 
> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c > @@ -2,14 +2,14 @@ > /* { dg-options "-O2" } */ > > /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } > } */ You are saying there should be 21279 instructions generated by this test case. What makes you say that? Yes, we regressed some time ago, we generate too many insns in many cases, but that is *bad*. > /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } > } } */ > -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } > } */ > +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } > } */ This needs an explanation (and then the 32-bit and 64-bit checks can be merged). This probably needs changes after 4306339798b6 (if it is still wanted?) Segher
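For context, the rotate idiom that 20050603-3.c's scan-assembler directives are probing can be reduced to plain C (a sketch, not the testcase itself); on 32-bit Power the shift pair is expected to collapse into a single rotate-class instruction, which is why the test counts rlwinm/rlwimi occurrences:

```c
#include <assert.h>
#include <stdint.h>

/* A left-rotate by 12 written as a shift pair; combine recognizes the
   (x << 12) | (x >> 20) form as a 32-bit rotate.  */
static uint32_t rotl12(uint32_t x)
{
    return (x << 12) | (x >> 20);
}
```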
Re: [PATCH] c++: generic targs and identity substitution [PR105956]
On 7/7/22 11:16, Patrick Palka wrote: On Thu, 7 Jul 2022, Jason Merrill wrote: On 7/6/22 15:26, Patrick Palka wrote: On Tue, 5 Jul 2022, Jason Merrill wrote: On 7/5/22 10:06, Patrick Palka wrote: On Fri, 1 Jul 2022, Jason Merrill wrote: On 6/29/22 13:42, Patrick Palka wrote: In r13-1045-gcb7fd1ea85feea I assumed that substitution into generic DECL_TI_ARGS corresponds to an identity mapping of the given arguments, and hence its safe to always elide such substitution. But this PR demonstrates that such a substitution isn't always the identity mapping, in particular when there's an ARGUMENT_PACK_SELECT argument, which gets handled specially during substitution: * when substituting an APS into a template parameter, we strip the APS to its underlying argument; * and when substituting an APS into a pack expansion, we strip the APS to its underlying argument pack. Ah, right. For instance, in variadic96.C we have 10 template < typename... T > 11 struct derived 12 : public base< T, derived< T... > >... so when substituting into the base-specifier, we're approaching it from the outside in, so when we get to the inner T... we need some way to find the T pack again. It might be possible to remove the need for APS by substituting inner pack expansions before outer ones, which could improve worst-case complexity, but I don't know how relevant that is in real code; I imagine most inner pack expansions are as simple as this one. Aha, that makes sense. In this testcase, when expanding the pack expansion pattern (idx + Ns)... with Ns={0,1}, we specialize idx twice, first with Ns=APS<0,{0,1}> and then Ns=APS<1,{0,1}>. The DECL_TI_ARGS of idx are the generic template arguments of the enclosing class template impl, so before r13-1045, we'd substitute into its DECL_TI_ARGS which gave Ns={0,1} as desired. But after r13-1045, we elide this substitution and end up attempting to hash the original Ns argument, an APS, which ICEs. So this patch partially reverts this part of r13-1045. 
I considered using preserve_args in this case instead, but that'd break the static_assert in the testcase because preserve_args always strips APS to its underlying argument, but here we want to strip it to its underlying argument pack, so we'd incorrectly end up forming the specializations impl<0>::idx and impl<1>::idx instead of impl<0,1>::idx. Although we can't elide the substitution into DECL_TI_ARGS in light of ARGUMENT_PACK_SELECT, it should still be safe to elide template argument coercion in the case of a non-template decl, which this patch preserves. It's unfortunate that we need to remove this optimization just because it doesn't hold for one special tree code. So this patch implements a heuristic in tsubst_template_args to avoid allocating a new TREE_VEC if the substituted elements are identical to those of a level from ARGS. It turns out that about 30% of all calls to tsubst_template_args benefit from this optimization, and it reduces memory usage by about 1.5% for e.g. stdc++.h (relative to r13-1045). (This is the maybe_reuse stuff, the rest of the changes to tsubst_template_args are just drive-by cleanups.) Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? Patch generated with -w to ignore noisy whitespace changes. PR c++/105956 gcc/cp/ChangeLog: * pt.cc (tsubst_template_args): Move variable declarations closer to their first use. Replace 'orig_t' with 'r'. Rename 'need_new' to 'const_subst_p'. Heuristically detect if the substituted elements are identical to that of a level from 'args' and avoid allocating a new TREE_VEC if so. (tsubst_decl) : Revert r13-1045-gcb7fd1ea85feea change for avoiding substitution into DECL_TI_ARGS, but still avoid coercion in this case. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/variadic183.C: New test. 
--- gcc/cp/pt.cc | 113 ++- gcc/testsuite/g++.dg/cpp0x/variadic183.C | 14 +++ 2 files changed, 85 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic183.C diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc index 8672da123f4..7898834faa6 100644 --- a/gcc/cp/pt.cc +++ b/gcc/cp/pt.cc @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see Fixed by: C++20 modules. */ #include "config.h" +#define INCLUDE_ALGORITHM // for std::equal #include "system.h" #include "coretypes.h" #include "cp-tree.h" @@ -13544,17 +13545,22 @@ tsubst_argument_pack (tree orig_arg, tree args, tsubst_flags_t complain, tree tsubst_template_args (tree t, tree args, tsubst_flags_t complain, tree in_decl) { - tree orig_t = t; - int len, need_new = 0, i, expanded_len_adjust = 0, out; - tree *elts; - if (t == error_mark_node) return error_mark_node; - len = TRE
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On 7/7/22 13:14, Jonathan Wakely wrote: This adds a new built-in to replace the recursive class template instantiations done by traits such as std::tuple_element and std::variant_alternative. The purpose is to select the Nth type from a list of types, e.g. __builtin_type_pack_element(1, char, int, float) is int. For a pathological example tuple_element_t<1000, tuple<2000 types...>> the compilation time is reduced by more than 90% and the memory used by the compiler is reduced by 97%. In realistic examples the gains will be much smaller, but still relevant. Clang has a similar built-in, __type_pack_element, but that's a "magic template" built-in using <> syntax, which GCC doesn't support. So this provides an equivalent feature, but as a built-in function using parens instead of <>. I don't really like the name "type pack element" (it gives you an element from a pack of types) but the semi-consistency with Clang seems like a reasonable argument in favour of keeping the name. I'd be open to alternative names though, e.g. __builtin_nth_type or __builtin_type_at_index. The patch has some problems though ... FIXME 1: Marek pointed out that this this ICEs: template using type = __builtin_type_pack_element(sizeof(T), T...); type c; The sizeof(T) expression is invalid, because T is an unexpanded pack, but it's not rejected and instead crashes: ice.C: In substitution of 'template using type = __builtin_type_pack_element (sizeof (T), T ...) 
[with T = {int, char}]': ice.C:2:15: required from here ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 1 | template using type = __builtin_type_pack_element(sizeof(T), T...); | ^ 0xe13eea dependent_type_p(tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, bool, bool) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 0xdd6d45 tsubst_decl /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 0xdfc1b6 instantiate_alias_template /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*, int, int) /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 0xd334c8 cp_parser_template_id /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 0xd429b0 cp_parser_class_name /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 0xd1ade9 cp_parser_qualifying_entity /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 0xd1a2c8 cp_parser_nested_name_specifier_opt /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 0xd4eefd cp_parser_template_introduction /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 
0xd4f416 cp_parser_template_declaration_after_export /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 0xd2d60e cp_parser_declaration /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as typename std::_Nth_type::type but I don't know how. Instead of trying to fake the mangled string, it's probably better to build a decl for that nested type, right? Any suggestions where to find something similar I can learn from? The tricky thing is dealing with mangling compression, where we use a substitution instead of repeating a type; that's definitely easier if we actually have the type. So you'd probably want to have a declaration of std::_Nth_type to work with, and lookup_template_class to get the type of that specialization. And then if it's complete, look up ...::type; if not, we could probably stuff a ...::type in its TYPE_FIELDS that would get clobbered if we actually instantiated the type... The reason to mangle it that way is that it preserves the same symbol names as the library produced in GCC 12, and that it will still produce with non-GCC compilers (see the definitions of std::_Nth_type in the library parts of the patch). If we don't do that then either we need to e
[PATCH/RFC] combine_completed global variable.
Hi Kewen (and Segher),

Many thanks for stress testing my patch to improve multiplication by integer constants on rs6000 by using the rldimi instruction. Although I've not been able to reproduce your ICE (using gcc135 on the compile farm), I completely agree with Segher's analysis that the Achilles heel with my approach/patch is that there's currently no way for the backend/recog to know that we're in a pass after combine.

Rather than give up on this optimization (and a similar one for i386.md where test;sete can be replaced by xor $1 when combine knows that nonzero_bits is 1, but loses that information afterwards), I thought I'd post this "strawman" proposal to add a combine_completed global variable, matching the reload_completed and regstack_completed global variables already used (to track progress) by the middle-end.

I was wondering if I could ask you to test the attached patch in combination with my previous rs6000.md patch (with the obvious change of reload_completed to combine_completed) to confirm that it fixes the problems you were seeing. Segher/Richard, would this sort of patch be considered acceptable? Or is there a better approach/solution?

2022-07-07  Roger Sayle

gcc/ChangeLog
	* combine.cc (combine_completed): New global variable.
	(rest_of_handle_combine): Set combine_completed after pass.
	* final.cc (rest_of_clean_state): Reset combine_completed.
	* rtl.h (combine_completed): Prototype here.

Many thanks in advance,
Roger
--

> -Original Message-
> From: Kewen.Lin
> Sent: 27 June 2022 10:04
> To: Roger Sayle
> Cc: gcc-patches@gcc.gnu.org; Segher Boessenkool
> ; David Edelsohn
> Subject: Re: [rs6000 PATCH] Improve constant integer multiply using rldimi.
>
> Hi Roger,
>
> on 2022/6/27 04:56, Roger Sayle wrote:
> >
> > This patch tweaks the code generated on POWER for integer
> > multiplications
> >
> > by a constant, by making use of rldimi instructions. 
Much like x86's > > > > lea instruction, rldimi can be used to implement a shift and add pair > > > > in some circumstances. For rldimi this is when the shifted operand > > > > is known to have no bits in common with the added operand. > > > > > > > > Hence for the new testcase below: > > > > > > > > int foo(int x) > > > > { > > > > int t = x & 42; > > > > return t * 0x2001; > > > > } > > > > > > > > when compiled with -O2, GCC currently generates: > > > > > > > > andi. 3,3,0x2a > > > > slwi 9,3,13 > > > > add 3,9,3 > > > > extsw 3,3 > > > > blr > > > > > > > > with this patch, we now generate: > > > > > > > > andi. 3,3,0x2a > > > > rlwimi 3,3,13,0,31-13 > > > > extsw 3,3 > > > > blr > > > > > > > > It turns out this optimization already exists in the form of a combine > > > > splitter in rs6000.md, but the constraints on combine splitters, > > > > requiring three of four input instructions (and generating one or two > > > > output instructions) mean it doesn't get applied as often as it could. > > > > This patch converts the define_split into a define_insn_and_split to > > > > catch more cases (such as the one above). > > > > > > > > The one bit that's tricky/controversial is the use of RTL's > > > > nonzero_bits which is accurate during the combine pass when this > > > > pattern is first recognized, but not as advanced (not kept up to > > > > date) when this pattern is eventually split. To support this, > > > > I've used a "|| reload_completed" idiom. Does this approach seem > > > > reasonable? [I've another patch of x86 that uses the same idiom]. > > > > > > I tested this patch on powerpc64-linux-gnu, it caused the below ICE against > test > case gcc/testsuite/gcc.c-torture/compile/pr93098.c. 
> > gcc/testsuite/gcc.c-torture/compile/pr93098.c: In function ‘foo’: > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: error: unrecognizable > insn: > (insn 104 32 34 2 (set (reg:SI 185 [+4 ]) > (ior:SI (and:SI (reg:SI 200 [+4 ]) > (const_int 4294967295 [0x])) > (ashift:SI (reg:SI 140) > (const_int 32 [0x20] "gcc/testsuite/gcc.c- > torture/compile/pr93098.c":6:11 -1 > (nil)) > during RTL pass: subreg3 > dump file: pr93098.c.291r.subreg3 > gcc/testsuite/gcc.c-torture/compile/pr93098.c:10:1: internal compiler error: > in > extract_insn, at recog.cc:2791 0x101f664b _fatal_insn(char const*, rtx_def > const*, char const*, int, char const*) > gcc/rtl-error.cc:108 > 0x101f6697 _fatal_insn_not_found(rtx_def const*, char const*, int, char > const*) > gcc/rtl-error.cc:116 > 0x10ae427f extract_insn(rtx_insn*) > gcc/recog.cc:2791 > 0x11b239bb decompose_multiword_subregs > gcc/lower-subreg.cc:1555 > 0x11b25013 execute > gcc/lower-subreg.cc:1818 > > The above trace shows we fails to recog the pattern again due to the > inaccurate > nonzero_bits information as you pointed out above. > > There was another patch
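To see why the rotate-and-insert transformation in the quoted patch is sound for the example: with t = x & 42, every nonzero bit of t sits below bit 13, so (t << 13) and t cannot overlap, and the add in t * 0x2001 degenerates to a bitwise OR — precisely what rldimi/rlwimi performs. A plain C check of that identity (illustration only, not part of the patch):

```c
#include <assert.h>

/* foo() from the quoted patch description: the multiply by 0x2001
   decomposes as (t << 13) + t, and because t's nonzero bits are all
   below bit 13 the add is equivalent to a bitwise OR.  */
static int foo(int x)
{
    int t = x & 42;
    return t * 0x2001;   /* == (t << 13) + t == (t << 13) | t */
}
```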
[pushed] analyzer: fix false positives from -Wanalyzer-tainted-divisor [PR106225]
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r13-1562-g897b3b31f0a94b. gcc/analyzer/ChangeLog: PR analyzer/106225 * sm-taint.cc (taint_state_machine::on_stmt): Move handling of assignments from division to... (taint_state_machine::check_for_tainted_divisor): ...this new function. Reject warning when the divisor is known to be non-zero. * sm.cc: Include "analyzer/program-state.h". (sm_context::get_old_region_model): New. * sm.h (sm_context::get_old_region_model): New decl. gcc/testsuite/ChangeLog: PR analyzer/106225 * gcc.dg/analyzer/taint-divisor-1.c: Add test coverage for various correct and incorrect checks against zero. Signed-off-by: David Malcolm --- gcc/analyzer/sm-taint.cc | 51 ++ gcc/analyzer/sm.cc| 12 gcc/analyzer/sm.h | 2 + .../gcc.dg/analyzer/taint-divisor-1.c | 66 +++ 4 files changed, 119 insertions(+), 12 deletions(-) diff --git a/gcc/analyzer/sm-taint.cc b/gcc/analyzer/sm-taint.cc index d2d03c3d602..4075cf6d868 100644 --- a/gcc/analyzer/sm-taint.cc +++ b/gcc/analyzer/sm-taint.cc @@ -109,6 +109,9 @@ private: const supernode *node, const gcall *call, tree callee_fndecl) const; + void check_for_tainted_divisor (sm_context *sm_ctxt, + const supernode *node, + const gassign *assign) const; public: /* State for a "tainted" value: unsanitized data potentially under an @@ -803,18 +806,7 @@ taint_state_machine::on_stmt (sm_context *sm_ctxt, case ROUND_MOD_EXPR: case RDIV_EXPR: case EXACT_DIV_EXPR: - { - tree divisor = gimple_assign_rhs2 (assign);; - state_t state = sm_ctxt->get_state (stmt, divisor); - enum bounds b; - if (get_taint (state, TREE_TYPE (divisor), &b)) - { - tree diag_divisor = sm_ctxt->get_diagnostic_tree (divisor); - sm_ctxt->warn (node, stmt, divisor, - new tainted_divisor (*this, diag_divisor, b)); - sm_ctxt->set_next_state (stmt, divisor, m_stop); - } - } + check_for_tainted_divisor (sm_ctxt, node, assign); break; } } @@ -989,6 +981,41 @@ taint_state_machine::check_for_tainted_size_arg (sm_context 
*sm_ctxt, } } +/* Complain if ASSIGN (a division operation) has a tainted divisor + that could be zero. */ + +void +taint_state_machine::check_for_tainted_divisor (sm_context *sm_ctxt, + const supernode *node, + const gassign *assign) const +{ + const region_model *old_model = sm_ctxt->get_old_region_model (); + if (!old_model) +return; + + tree divisor_expr = gimple_assign_rhs2 (assign);; + const svalue *divisor_sval = old_model->get_rvalue (divisor_expr, NULL); + + state_t state = sm_ctxt->get_state (assign, divisor_sval); + enum bounds b; + if (get_taint (state, TREE_TYPE (divisor_expr), &b)) +{ + const svalue *zero_sval + = old_model->get_manager ()->get_or_create_int_cst + (TREE_TYPE (divisor_expr), 0); + tristate ts + = old_model->eval_condition (divisor_sval, NE_EXPR, zero_sval); + if (ts.is_true ()) + /* The divisor is known to not equal 0: don't warn. */ + return; + + tree diag_divisor = sm_ctxt->get_diagnostic_tree (divisor_expr); + sm_ctxt->warn (node, assign, divisor_expr, +new tainted_divisor (*this, diag_divisor, b)); + sm_ctxt->set_next_state (assign, divisor_sval, m_stop); +} +} + } // anonymous namespace /* Internal interface to this file. */ diff --git a/gcc/analyzer/sm.cc b/gcc/analyzer/sm.cc index 24c20b894cd..d17d5c765b4 100644 --- a/gcc/analyzer/sm.cc +++ b/gcc/analyzer/sm.cc @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see #include "analyzer/program-point.h" #include "analyzer/store.h" #include "analyzer/svalue.h" +#include "analyzer/program-state.h" #if ENABLE_ANALYZER @@ -159,6 +160,17 @@ state_machine::to_json () const return sm_obj; } +/* class sm_context. */ + +const region_model * +sm_context::get_old_region_model () const +{ + if (const program_state *old_state = get_old_program_state ()) +return old_state->m_region_model; + else +return NULL; +} + /* Create instances of the various state machines, each using LOGGER, and populate OUT with them. 
*/ diff --git a/gcc/analyzer/sm.h b/gcc/analyzer/sm.h index e80ef1fac37..353a6db53b0 100644 --- a/gcc/analyzer/sm.h +++ b/gcc/analyzer/sm.h @@ -279,6 +279,8 @@ public: virtual const program_state *get_old_program_state () const =
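The policy the patch implements — report a tainted divisor only when the analyzer cannot prove it non-zero — can be sketched outside the analyzer in a few lines. The names below (`taint`, `divisor_info`, `should_warn`) are illustrative only, not analyzer API:

```cpp
#include <cassert>

/* Sketch of the divisor policy: a tainted divisor is only reported
   when it is not known to be non-zero.  known_nonzero stands in for
   what eval_condition (divisor, NE_EXPR, zero) would answer.  */

enum class taint { none, tainted };

struct divisor_info
{
  taint state;
  bool known_nonzero;
};

/* Return true iff a "tainted divisor" diagnostic should be issued.  */

static bool
should_warn (const divisor_info &d)
{
  if (d.state != taint::tainted)
    return false;
  if (d.known_nonzero)
    return false;  /* an "if (d == 0) return;" guard already ran  */
  return true;
}
```

This mirrors the new `check_for_tainted_divisor`: the warning fires for `y = x / d` with attacker-controlled `d`, but is suppressed once the path condition establishes `d != 0`.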
[PATCH] Introduce hardbool attribute for C
This patch introduces hardened booleans in C. The hardbool attribute, when attached to an integral type, turns it into an enumerated type with boolean semantics, using the named or implied constants as representations for false and true. Expressions of such types decay to _Bool, trapping if the value is neither true nor false, and _Bool can convert implicitly back to them. Other conversions go through _Bool first. Regstrapped on x86_64-linux-gnu. Ok to install? for gcc/c-family/ChangeLog * c-attribs.cc (c_common_attribute_table): Add hardbool. (handle_hardbool_attribute): New. (type_valid_for_vector_size): Reject hardbool. * c-common.cc (convert_and_check): Skip warnings for convert and check for hardbool. (c_hardbool_type_attr_1): New. * c-common.h (c_hardbool_type_attr): New. for gcc/c/ChangeLog * c-typeck.cc (convert_lvalue_to_rvalue): Decay hardbools. * c-convert.cc (convert): Convert to hardbool through truthvalue. * c-decl.cc (check_bitfield_type_and_width): Skip enumeral truncation warnings for hardbool. (finish_struct): Propagate hardbool attribute to bitfield types. (digest_init): Convert to hardbool. for gcc/ChangeLog * doc/extend.texi (hardbool): New type attribute. for gcc/testsuite/ChangeLog * gcc.dg/hardbool-err.c: New. * gcc.dg/hardbool-trap.c: New. * gcc.dg/hardbool.c: New. * gcc.dg/hardbool-s.c: New. * gcc.dg/hardbool-us.c: New. * gcc.dg/hardbool-i.c: New. * gcc.dg/hardbool-ul.c: New. * gcc.dg/hardbool-ll.c: New. * gcc.dg/hardbool-5a.c: New. * gcc.dg/hardbool-s-5a.c: New. * gcc.dg/hardbool-us-5a.c: New. * gcc.dg/hardbool-i-5a.c: New. * gcc.dg/hardbool-ul-5a.c: New. * gcc.dg/hardbool-ll-5a.c: New.
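The "decay to _Bool, trapping otherwise" semantics can be modelled in ordinary code. A minimal sketch — the 0x5a/0xa5 representation constants are invented for illustration; the real attribute takes its representations from the named or implied enum constants:

```cpp
#include <cassert>
#include <cstdlib>

/* Model of hardbool semantics: two magic byte patterns stand for false
   and true; decaying any other pattern to _Bool traps.  */

constexpr unsigned char HB_FALSE = 0x5a;
constexpr unsigned char HB_TRUE  = 0xa5;

static bool
hardbool_decay (unsigned char v)
{
  if (v == HB_TRUE)
    return true;
  if (v == HB_FALSE)
    return false;
  abort ();  /* e.g. a stray memset or an uninitialized byte  */
}
```

The point of the non-trivial representations is exactly this trap: an all-zeroes or attacker-scribbled byte is neither HB_FALSE nor HB_TRUE, so corrupted flags are caught at the first use rather than silently read as false.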
--- gcc/c-family/c-attribs.cc | 97 - gcc/c-family/c-common.cc | 21 gcc/c-family/c-common.h | 18 gcc/c/c-convert.cc| 14 +++ gcc/c/c-decl.cc | 10 ++ gcc/c/c-typeck.cc | 31 ++- gcc/doc/extend.texi | 37 gcc/testsuite/gcc.dg/hardbool-err.c | 28 ++ gcc/testsuite/gcc.dg/hardbool-trap.c | 13 +++ gcc/testsuite/gcc.dg/torture/hardbool-5a.c|6 + gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-i.c |5 + gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-ll.c|5 + gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-s.c |5 + gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-ul.c|5 + gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c |6 + gcc/testsuite/gcc.dg/torture/hardbool-us.c|5 + gcc/testsuite/gcc.dg/torture/hardbool.c | 118 + 21 files changed, 444 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/hardbool-err.c create mode 100644 gcc/testsuite/gcc.dg/hardbool-trap.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-i.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ll.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-s.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-ul.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us-5a.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool-us.c create mode 100644 gcc/testsuite/gcc.dg/torture/hardbool.c diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc index c8d96723f4c30..e385d780c49ce 100644 --- a/gcc/c-family/c-attribs.cc +++ b/gcc/c-family/c-attribs.cc @@ -172,6 +172,7 @@ static tree 
handle_objc_root_class_attribute (tree *, tree, tree, int, bool *); static tree handle_objc_nullability_attribute (tree *, tree, tree, int, bool *); static tree handle_signed_bool_precision_attribute (tree *, tree, tree, int, bool *); +static tree handle_hardbool_attribute (tree *, tree, tree, int, bool *); static tree handle_retain_attribute (tree *, tree, tree, int, bool *); /* Helper to define attribute exclusions. */ @@ -288,6 +289,8 @@ const struct attribute_spec c_common_attribute_table[] = affects_type_identity, handler, exclude } */ { "signed_bool_precision", 1, 1, false, true, false, true, ha
[pushed 2/2] analyzer: use label_text for superedge::get_description
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Lightly tested with valgrind. Pushed to trunk as r13-1564-g52f538fa4a13d5. gcc/analyzer/ChangeLog: * checker-path.cc (start_cfg_edge_event::get_desc): Update for superedge::get_description returning a label_text. * engine.cc (feasibility_state::maybe_update_for_edge): Likewise. * supergraph.cc (superedge::dump): Likewise. (superedge::get_description): Convert return type from char * to label_text. * supergraph.h (superedge::get_description): Likewise. Signed-off-by: David Malcolm --- gcc/analyzer/checker-path.cc | 3 +-- gcc/analyzer/engine.cc | 5 ++--- gcc/analyzer/supergraph.cc | 13 + gcc/analyzer/supergraph.h| 2 +- 4 files changed, 9 insertions(+), 14 deletions(-) diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc index 959ffdd853c..211cf3e0333 100644 --- a/gcc/analyzer/checker-path.cc +++ b/gcc/analyzer/checker-path.cc @@ -594,8 +594,7 @@ label_text start_cfg_edge_event::get_desc (bool can_colorize) const { bool user_facing = !flag_analyzer_verbose_edges; - label_text edge_desc -= label_text::take (m_sedge->get_description (user_facing)); + label_text edge_desc (m_sedge->get_description (user_facing)); if (user_facing) { if (edge_desc.m_buffer && strlen (edge_desc.m_buffer) > 0) diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc index 0674c8ba3b6..888123f2b95 100644 --- a/gcc/analyzer/engine.cc +++ b/gcc/analyzer/engine.cc @@ -4586,12 +4586,11 @@ feasibility_state::maybe_update_for_edge (logger *logger, { if (logger) { - char *desc = sedge->get_description (false); + label_text desc (sedge->get_description (false)); logger->log (" sedge: SN:%i -> SN:%i %s", sedge->m_src->m_index, sedge->m_dest->m_index, - desc); - free (desc); + desc.m_buffer); } const gimple *last_stmt = src_point.get_supernode ()->get_last_stmt (); diff --git a/gcc/analyzer/supergraph.cc b/gcc/analyzer/supergraph.cc index f023c533a09..52b4852404d 100644 --- 
a/gcc/analyzer/supergraph.cc +++ b/gcc/analyzer/supergraph.cc @@ -854,13 +854,12 @@ void superedge::dump (pretty_printer *pp) const { pp_printf (pp, "edge: SN: %i -> SN: %i", m_src->m_index, m_dest->m_index); - char *desc = get_description (false); - if (strlen (desc) > 0) + label_text desc (get_description (false)); + if (strlen (desc.m_buffer) > 0) { pp_space (pp); - pp_string (pp, desc); + pp_string (pp, desc.m_buffer); } - free (desc); } /* Dump this superedge to stderr. */ @@ -998,17 +997,15 @@ superedge::get_any_callgraph_edge () const /* Build a description of this superedge (e.g. "true" for the true edge of a conditional, or "case 42:" for a switch case). - The caller is responsible for freeing the result. - If USER_FACING is false, the result also contains any underlying CFG edge flags. e.g. " (flags FALLTHRU | DFS_BACK)". */ -char * +label_text superedge::get_description (bool user_facing) const { pretty_printer pp; dump_label_to_pp (&pp, user_facing); - return xstrdup (pp_formatted_text (&pp)); + return label_text::take (xstrdup (pp_formatted_text (&pp))); } /* Implementation of superedge::dump_label_to_pp for non-switch CFG diff --git a/gcc/analyzer/supergraph.h b/gcc/analyzer/supergraph.h index 42c6df57435..e9a5be27d88 100644 --- a/gcc/analyzer/supergraph.h +++ b/gcc/analyzer/supergraph.h @@ -331,7 +331,7 @@ class superedge : public dedge ::edge get_any_cfg_edge () const; cgraph_edge *get_any_callgraph_edge () const; - char *get_description (bool user_facing) const; + label_text get_description (bool user_facing) const; protected: superedge (supernode *src, supernode *dest, enum edge_kind kind) -- 2.26.3
[pushed 1/2] Convert label_text to C++11 move semantics
libcpp's class label_text stores a char * for a string and a flag saying whether it owns the buffer. I added this class before we could use C++11, and so to avoid lots of copying it required an explicit call to label_text::maybe_free to potentially free the buffer. Now that we can use C++11, this patch removes label_text::maybe_free in favor of doing the cleanup in the destructor, and using C++ move semantics to avoid any copying. This allows lots of messy cleanup code to be eliminated in favor of implicit destruction (mostly in the analyzer). No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Lightly tested with valgrind. Pushed to trunk as r13-1563-ga8dce13c076019. gcc/analyzer/ChangeLog: * call-info.cc (call_info::print): Update for removal of label_text::maybe_free in favor of automatic memory management. * checker-path.cc (checker_event::dump): Likewise. (checker_event::prepare_for_emission): Likewise. (state_change_event::get_desc): Likewise. (superedge_event::should_filter_p): Likewise. (start_cfg_edge_event::get_desc): Likewise. (warning_event::get_desc): Likewise. (checker_path::dump): Likewise. (checker_path::debug): Likewise. * diagnostic-manager.cc (diagnostic_manager::prune_for_sm_diagnostic): Likewise. (diagnostic_manager::prune_interproc_events): Likewise. * program-state.cc (sm_state_map::to_json): Likewise. * region.cc (region::to_json): Likewise. * sm-malloc.cc (inform_nonnull_attribute): Likewise. * store.cc (binding_map::to_json): Likewise. (store::to_json): Likewise. * svalue.cc (svalue::to_json): Likewise. gcc/c-family/ChangeLog: * c-format.cc (range_label_for_format_type_mismatch::get_text): Update for removal of label_text::maybe_free in favor of automatic memory management. gcc/ChangeLog: * diagnostic-format-json.cc (json_from_location_range): Update for removal of label_text::maybe_free in favor of automatic memory management. 
* diagnostic-format-sarif.cc (sarif_builder::make_location_object): Likewise. * diagnostic-show-locus.cc (struct pod_label_text): New. (class line_label): Convert m_text from label_text to pod_label_text. (layout::print_any_labels): Move "text" to the line_label. * tree-diagnostic-path.cc (path_label::get_text): Update for removal of label_text::maybe_free in favor of automatic memory management. (event_range::print): Likewise. (default_tree_diagnostic_path_printer): Likewise. (default_tree_make_json_for_path): Likewise. libcpp/ChangeLog: * include/line-map.h: Include . (class label_text): Delete maybe_free method in favor of a destructor. Add move ctor and assignment operator. Add deletion of the copy ctor and copy-assignment operator. Rename field m_caller_owned to m_owned. Add std::move where necessary; add moved_from member function. Signed-off-by: David Malcolm --- gcc/analyzer/call-info.cc | 1 - gcc/analyzer/checker-path.cc | 97 ++ gcc/analyzer/diagnostic-manager.cc | 8 --- gcc/analyzer/program-state.cc | 1 - gcc/analyzer/region.cc | 1 - gcc/analyzer/sm-malloc.cc | 3 - gcc/analyzer/store.cc | 3 - gcc/analyzer/svalue.cc | 1 - gcc/c-family/c-format.cc | 1 - gcc/diagnostic-format-json.cc | 4 +- gcc/diagnostic-format-sarif.cc | 1 - gcc/diagnostic-show-locus.cc | 35 +-- gcc/tree-diagnostic-path.cc| 4 -- libcpp/include/line-map.h | 46 +++--- 14 files changed, 101 insertions(+), 105 deletions(-) diff --git a/gcc/analyzer/call-info.cc b/gcc/analyzer/call-info.cc index b3ff51e7460..e1142d743a3 100644 --- a/gcc/analyzer/call-info.cc +++ b/gcc/analyzer/call-info.cc @@ -76,7 +76,6 @@ call_info::print (pretty_printer *pp) const { label_text desc (get_desc (pp_show_color (pp))); pp_string (pp, desc.m_buffer); - desc.maybe_free (); } /* Implementation of custom_edge_info::add_events_to_path vfunc for diff --git a/gcc/analyzer/checker-path.cc b/gcc/analyzer/checker-path.cc index 953e192cd55..959ffdd853c 100644 --- a/gcc/analyzer/checker-path.cc +++ 
b/gcc/analyzer/checker-path.cc @@ -196,7 +196,6 @@ checker_event::dump (pretty_printer *pp) const label_text event_desc (get_desc (false)); pp_printf (pp, "\"%s\" (depth %i", event_desc.m_buffer, m_effective_depth); - event_desc.maybe_free (); if (m_effective_depth != m_original_depth) pp_printf (pp, " corrected from %i", @@ -235,7 +234,6 @@ checker_event::prepare_for_emission (checker_path *, m_emission_id = emission_id; label_text desc = get_desc (false); - desc.maybe_free (); } /* class debug_event : pub
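The shape of the converted class — owning pointer freed in the destructor, copies deleted, moves transferring ownership — can be sketched in miniature (names here are illustrative, not the actual libcpp declaration):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <utility>

/* Local strdup stand-in, to keep the sketch self-contained.  */
static char *
dup_cstr (const char *s)
{
  char *p = static_cast<char *> (std::malloc (std::strlen (s) + 1));
  std::strcpy (p, s);
  return p;
}

/* Miniature version of the converted label_text: the buffer is freed
   in the destructor, copies are deleted, and moves steal the buffer.  */
class owned_text
{
  char *m_buffer = nullptr;

public:
  owned_text () = default;
  explicit owned_text (const char *s) : m_buffer (dup_cstr (s)) {}
  ~owned_text () { std::free (m_buffer); }

  owned_text (const owned_text &) = delete;
  owned_text &operator= (const owned_text &) = delete;

  owned_text (owned_text &&other) noexcept : m_buffer (other.m_buffer)
  {
    other.m_buffer = nullptr;  /* moved-from object owns nothing  */
  }
  owned_text &operator= (owned_text &&other) noexcept
  {
    std::swap (m_buffer, other.m_buffer);
    return *this;
  }

  const char *get () const { return m_buffer; }
};

static bool
demo_move_semantics ()
{
  owned_text a ("edge: true");
  owned_text b (std::move (a));
  return a.get () == nullptr && std::strcmp (b.get (), "edge: true") == 0;
}
```

With this shape, the explicit `maybe_free`/`free` calls scattered through callers become unnecessary: ownership ends when the object goes out of scope, which is why the patch above is mostly deletions.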
[PATCH] Control flow redundancy hardening
This patch introduces an optional hardening pass to catch unexpected execution flows. Functions are transformed so that basic blocks set a bit in an automatic array, and (non-exceptional) function exit edges check that the bits in the array represent an expected execution path in the CFG. Functions with multiple exit edges, or with too many blocks, call an out-of-line checker builtin implemented in libgcc. For simpler functions, the verification is performed in-line. -fharden-control-flow-redundancy enables the pass for eligible functions, --param hardcfr-max-blocks sets a block count limit for functions to be eligible, and --param hardcfr-max-inline-blocks tunes the "too many blocks" limit for in-line verification. Regstrapped on x86_64-linux-gnu. Also bootstrapped with a patchlet that enables it by default, with --param hardcfr-max-blocks=32. Ok to install? for gcc/ChangeLog * Makefile.in (OBJS): Add gimple-harden-control-flow.o. * builtins.def (BUILT_IN___HARDCFR_CHECK): New. * common.opt (fharden-control-flow-redundancy): New. * doc/invoke.texi (fharden-control-flow-redundancy): New. (hardcfr-max-blocks, hardcfr-max-inline-blocks): New params. * gimple-harden-control-flow.cc: New. * params.opt (-param=hardcfr-max-blocks=): New. (-param=hardcfr-max-inline-blocks=): New. * passes.def (pass_harden_control_flow_redundancy): Add. * tree-pass.h (make_pass_harden_control_flow_redundancy): Declare. for gcc/testsuite/ChangeLog * c-c++-common/torture/harden-cfr.c: New. * c-c++-common/torture/harden-cfr-abrt.c: New. * c-c++-common/torture/harden-cfr-bret.c: New. * c-c++-common/torture/harden-cfr-tail.c: New. * gnat.dg/hardcfr.adb: New. for libgcc/ChangeLog * Makefile.in (LIB2ADD): Add hardcfr.c. * hardcfr.c: New.
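The visited-block bookkeeping described above can be pictured with an explicit bitmap. A sketch, not the libgcc implementation — the block numbering and the valid-path table are invented for a toy two-way CFG:

```cpp
#include <cassert>
#include <cstdint>

/* Control-flow-redundancy idea in miniature: each visited basic block
   sets its bit; at function exit the accumulated mask must match a path
   that actually exists in the CFG.  Toy CFG: 0 -> 1 -> 3 or 0 -> 2 -> 3.  */

static bool
check_visited (uint64_t visited)
{
  const uint64_t valid_paths[] = {
    (1u << 0) | (1u << 1) | (1u << 3),
    (1u << 0) | (1u << 2) | (1u << 3),
  };
  for (uint64_t p : valid_paths)
    if (visited == p)
      return true;
  return false;  /* unexpected execution flow: the pass would trap here  */
}
```

A control-flow hijack that jumps into block 3 without passing through block 0, or that somehow runs both arms of the branch, leaves a bit pattern matching no real path, which is exactly what the exit-edge check (or the out-of-line `__builtin___hardcfr_check`) rejects.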
--- gcc/Makefile.in|1 gcc/builtins.def |3 gcc/common.opt |4 gcc/doc/invoke.texi| 19 + gcc/gimple-harden-control-flow.cc | 713 gcc/params.opt |8 gcc/passes.def |1 .../c-c++-common/torture/harden-cfr-abrt.c | 11 .../c-c++-common/torture/harden-cfr-bret.c | 11 .../c-c++-common/torture/harden-cfr-tail.c | 17 gcc/testsuite/c-c++-common/torture/harden-cfr.c| 81 ++ gcc/testsuite/gnat.dg/hardcfr.adb | 76 ++ gcc/tree-pass.h|2 libgcc/Makefile.in |3 libgcc/hardcfr.c | 176 + 15 files changed, 1126 insertions(+) create mode 100644 gcc/gimple-harden-control-flow.cc create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-abrt.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-bret.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr-tail.c create mode 100644 gcc/testsuite/c-c++-common/torture/harden-cfr.c create mode 100644 gcc/testsuite/gnat.dg/hardcfr.adb create mode 100644 libgcc/hardcfr.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 3ae237024265c..2a15e6ecf0802 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1403,6 +1403,7 @@ OBJS = \ gimple-iterator.o \ gimple-fold.o \ gimple-harden-conditionals.o \ + gimple-harden-control-flow.o \ gimple-laddress.o \ gimple-loop-interchange.o \ gimple-loop-jam.o \ diff --git a/gcc/builtins.def b/gcc/builtins.def index 005976f34e913..b987f9af425fd 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -1055,6 +1055,9 @@ DEF_GCC_BUILTIN (BUILT_IN_FILE, "FILE", BT_FN_CONST_STRING, ATTR_NOTHROW_LEAF_LI DEF_GCC_BUILTIN (BUILT_IN_FUNCTION, "FUNCTION", BT_FN_CONST_STRING, ATTR_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN (BUILT_IN_LINE, "LINE", BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) +/* Control Flow Redundancy hardening out-of-line checker. */ +DEF_BUILTIN_STUB (BUILT_IN___HARDCFR_CHECK, "__builtin___hardcfr_check") + /* Synchronization Primitives. 
*/ #include "sync-builtins.def" diff --git a/gcc/common.opt b/gcc/common.opt index e7a51e882bade..54eb30fba642f 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1797,6 +1797,10 @@ fharden-conditional-branches Common Var(flag_harden_conditional_branches) Optimization Harden conditional branches by checking reversed conditions. +fharden-control-flow-redundancy +Common Var(flag_harden_control_flow_redundancy) Optimization +Harden control flow by recording and checking execution paths. + ; Nonzero means ignore `#ident' directives. 0 means handle them. ; Generate position-independent code for executables if possible ; On SVR4 targets, it also controls whether or not to emit a diff --git a/gcc/doc/invoke.texi b/gcc/do
Re: kernel sparse annotations vs. compiler attributes and debug_annotate_{type,decl} WAS: Re: [PATCH 0/9] Add debug_annotate attributes
Hi Yonghong. > On 6/21/22 9:12 AM, Jose E. Marchesi wrote: >> >>> On 6/17/22 10:18 AM, Jose E. Marchesi wrote: Hi Yonghong. > On 6/15/22 1:57 PM, David Faust wrote: >> >> On 6/14/22 22:53, Yonghong Song wrote: >>> >>> >>> On 6/7/22 2:43 PM, David Faust wrote: Hello, This patch series adds support for: - Two new C-language-level attributes that allow to associate (to "annotate" or to "tag") particular declarations and types with arbitrary strings. As explained below, this is intended to be used to, for example, characterize certain pointer types. - The conveyance of that information in the DWARF output in the form of a new DIE: DW_TAG_GNU_annotation. - The conveyance of that information in the BTF output in the form of two new kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG. All of these facilities are being added to the eBPF ecosystem, and support for them exists in some form in LLVM. Purpose === 1) Addition of C-family language constructs (attributes) to specify free-text tags on certain language elements, such as struct fields. The purpose of these annotations is to provide additional information about types, variables, and function parameters of interest to the kernel. A driving use case is to tag pointer types within the linux kernel and eBPF programs with additional semantic information, such as '__user' or '__rcu'. For example, consider the linux kernel function do_execve with the following declaration: static int do_execve(struct filename *filename, const char __user *const __user *__argv, const char __user *const __user *__envp); Here, __user could be defined with these annotations to record semantic information about the pointer parameters (e.g., they are user-provided) in DWARF and BTF information. Other kernel facilites such as the eBPF verifier can read the tags and make use of the information. 2) Conveying the tags in the generated DWARF debug info. 
The main motivation for emitting the tags in DWARF is that the Linux kernel generates its BTF information via pahole, using DWARF as a source:

    +--------+  BTF                 BTF   +----------+
    | pahole | ---> vmlinux.btf --------> | verifier |
    +--------+                            +----------+
        ^                                      ^
        | DWARF                                | BTF
        |                                      |
     vmlinux                            +-------------+
     module1.ko                         | BPF program |
     module2.ko                         +-------------+

This is because: a) Unlike GCC, LLVM will only generate BTF for BPF programs. b) GCC can generate BTF for whatever target with -gbtf, but there is no support for linking/deduplicating BTF in the linker. In the scenario above, the verifier needs access to the pointer tags of both the kernel types/declarations (conveyed in the DWARF and translated to BTF by pahole) and those of the BPF program (available directly in BTF). Another motivation for having the tag information in DWARF, unrelated to BPF and BTF, is that the drgn project (another DWARF consumer) also wants to benefit from these tags in order to differentiate between different kinds of pointers in the kernel. 3) Conveying the tags in the generated BTF debug info. This is easy: the main purpose of having this info in BTF is for the compiled eBPF programs. The kernel verifier can then access the tags of pointers used by the eBPF programs. For more information about these tags and the motivation behind
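A kernel-style `__user` tag on top of the proposed attribute would be spelled roughly as below. This is a sketch under assumptions: the attribute name `debug_annotate_type` is the one proposed in this series, and the macro deliberately degrades to nothing on compilers without it, so the code compiles either way:

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical spelling of the kernel's __user tag using the proposed
   attribute; expands to nothing where the compiler lacks it.  */
#if defined (__has_attribute)
# if __has_attribute (debug_annotate_type)
#  define __user __attribute__ ((debug_annotate_type ("user")))
# endif
#endif
#ifndef __user
# define __user
#endif

/* The tag changes only the emitted debug info, not codegen: a consumer
   such as the BPF verifier or drgn reads it from DWARF/BTF.  */
static std::size_t
copy_len (const char __user *p)
{
  return std::strlen (p);
}
```

With such a definition, declarations like `do_execve`'s `const char __user *const __user *__argv` carry their semantic annotation through to DWARF and BTF without affecting the generated code.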
Re: [PATCH] c++: Define built-in for std::tuple_element [PR100157]
On Thu, 7 Jul 2022 at 20:29, Jason Merrill wrote: > > On 7/7/22 13:14, Jonathan Wakely wrote: > > This adds a new built-in to replace the recursive class template > > instantiations done by traits such as std::tuple_element and > > std::variant_alternative. The purpose is to select the Nth type from a > > list of types, e.g. __builtin_type_pack_element(1, char, int, float) is > > int. > > > > For a pathological example tuple_element_t<1000, tuple<2000 types...>> > > the compilation time is reduced by more than 90% and the memory used by > > the compiler is reduced by 97%. In realistic examples the gains will be > > much smaller, but still relevant. > > > > Clang has a similar built-in, __type_pack_element, but that's a > > "magic template" built-in using <> syntax, which GCC doesn't support. So > > this provides an equivalent feature, but as a built-in function using > > parens instead of <>. I don't really like the name "type pack element" > > (it gives you an element from a pack of types) but the semi-consistency > > with Clang seems like a reasonable argument in favour of keeping the > > name. I'd be open to alternative names though, e.g. __builtin_nth_type > > or __builtin_type_at_index. > > > > > > The patch has some problems though ... > > > > FIXME 1: Marek pointed out that this ICEs: > > template<typename... T> using type = __builtin_type_pack_element(sizeof(T), > > T...); > > type<int, char> c; > > > > The sizeof(T) expression is invalid, because T is an unexpanded pack, > > but it's not rejected and instead crashes: > > > > ice.C: In substitution of 'template<typename... T> using type = > > __builtin_type_pack_element (sizeof (T), T ...)
[with T = {int, char}]': > > ice.C:2:15: required from here > > ice.C:1:63: internal compiler error: in dependent_type_p, at cp/pt.cc:27490 > > 1 | template using type = > > __builtin_type_pack_element(sizeof(T), T...); > >| > > ^ > > 0xe13eea dependent_type_p(tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:27490 > > 0xeb1286 cxx_sizeof_or_alignof_type(unsigned int, tree_node*, tree_code, > > bool, bool) > > /home/jwakely/src/gcc/gcc/gcc/cp/typeck.cc:1912 > > 0xdf4fcc tsubst_copy_and_build(tree_node*, tree_node*, int, tree_node*, > > bool, bool) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:20582 > > 0xdd9121 tsubst_tree_list(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15587 > > 0xddb583 tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16056 > > 0xddcc9d tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16436 > > 0xdd6d45 tsubst_decl > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15038 > > 0xdd952a tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15668 > > 0xdfb9a1 instantiate_template(tree_node*, tree_node*, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21811 > > 0xdfc1b6 instantiate_alias_template > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21896 > > 0xdd9796 tsubst(tree_node*, tree_node*, int, tree_node*) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15696 > > 0xdbaba5 lookup_template_class(tree_node*, tree_node*, tree_node*, > > tree_node*, int, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10131 > > 0xe4bac0 finish_template_type(tree_node*, tree_node*, int) > > /home/jwakely/src/gcc/gcc/gcc/cp/semantics.cc:3727 > > 0xd334c8 cp_parser_template_id > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:18458 > > 0xd429b0 cp_parser_class_name > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:25923 > > 0xd1ade9 cp_parser_qualifying_entity > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:7193 > > 0xd1a2c8 
cp_parser_nested_name_specifier_opt > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:6875 > > 0xd4eefd cp_parser_template_introduction > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31668 > > 0xd4f416 cp_parser_template_declaration_after_export > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:31840 > > 0xd2d60e cp_parser_declaration > > /home/jwakely/src/gcc/gcc/gcc/cp/parser.cc:15083 > > > > > > FIXME 2: I want to mangle __builtin_type_pack_element(N, T...) the same as > > typename std::_Nth_type<N, T...>::type but I don't know how. Instead of > > trying to fake the mangled string, it's probably better to build a decl > > for that nested type, right? Any suggestions where to find something > > similar I can learn from? > > The tricky thing is dealing with mangling compression, where we use a > substitution instead of repeating a type; that's definitely easier if we > actually have the type. Yeah, that's what I discovered when trying to fudge it as "19__builtin_type_pack_elementE" etc. > So you'd probably want to have a declaration of std::_Nth_type to work > with, and lookup_template_class
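What the built-in replaces — the linearly recursive trait — fits in a dozen lines; the cost the built-in removes is one class template instantiation per peeled type:

```cpp
#include <cassert>
#include <type_traits>

/* The recursive selection that __builtin_type_pack_element would
   short-circuit: peel one type off the pack per instantiation until
   the index reaches zero.  */
template<unsigned N, typename T, typename... Rest>
struct nth_type : nth_type<N - 1, Rest...> {};

template<typename T, typename... Rest>
struct nth_type<0, T, Rest...>
{
  typedef T type;
};

static_assert (std::is_same<nth_type<1, char, int, float>::type, int>::value,
	       "index 1 of (char, int, float) is int");
static_assert (std::is_same<nth_type<0, char, int, float>::type, char>::value,
	       "index 0 is char");
```

For `tuple_element_t<1000, tuple<2000 types...>>` this chain is 1000 instantiations deep, each one kept alive by the compiler, which is where the reported 97% memory saving comes from.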
[PATCH] Be careful with MODE_CC in simplify_const_relational_operation.
I think it's fair to describe RTL's representation of condition flags using MODE_CC as a little counter-intuitive. For example, the i386 backend represents the carry flag (in adc instructions) using RTL of the form "(ltu:SI (reg:CCC) (const_int 0))", where great care needs to be taken not to treat this like a normal RTX expression, after all LTU (less-than-unsigned) against const0_rtx would normally always be false. Hence, MODE_CC comparisons need to be treated with caution, and simplify_const_relational_operation returns early (to avoid problems) when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC. However, consider the (currently) hypothetical situation, where the RTL optimizers determine that a previous instruction unconditionally sets or clears the carry flag, and this gets propagated by combine into the above expression, we'd end up with something that looks like (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says. Fortunately, simplify_const_relational_operation is passed the original mode of the comparison (cmp_mode, the original mode of op0) which can be checked for MODE_CC, even when op0 is now VOIDmode (const_int) after the substitution. Defending against this is clearly the right thing to do. More controversially, rather than just abort simplification/optimization in this case, we can use the comparison operator to infer/select the semantics of the CC_MODE flag. Hopefully, whenever a backend uses LTU, it represents the (set) carry flag (and behaves like i386.md), in which case the result of the simplified expression is the first operand. [If there's no standardization of semantics across backends, then we should always just return 0; but then miss potential optimizations]. 
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures, and in combination with a i386 backend patch (that introduces support for x86's stc and clc instructions) where it avoids failures. However, I'm submitting this middle-end piece independently, to confirm that maintainers/reviewers are happy with the approach, and also to check there are no issues on other platforms, before building upon this infrastructure. Thoughts? Ok for mainline? 2022-07-07 Roger Sayle gcc/ChangeLog * simplify-rtx.cc (simplify_const_relational_operation): Handle case where both operands of a MODE_CC comparison have been simplified to constant integers. Thanks in advance, Roger -- diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index fa20665..73ec5c7 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -6026,6 +6026,18 @@ simplify_const_relational_operation (enum rtx_code code, return 0; } + /* Handle MODE_CC comparisons that have been simplified to + constants. */ + if (GET_MODE_CLASS (mode) == MODE_CC + && op1 == const0_rtx + && CONST_INT_P (op0)) +{ + /* LTU represents the carry flag. */ + if (code == LTU) + return op0 == const0_rtx ? const0_rtx : const_true_rtx; + return 0; +} + /* We can't simplify MODE_CC values since we don't know what the actual comparison is. */ if (GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC)
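The carry-flag reading of LTU that the patch relies on corresponds to a simple property of unsigned addition, restated here in plain C++ (not RTL):

```cpp
#include <cassert>
#include <cstdint>

/* "(ltu:SI (reg:CCC) (const_int 0))" models the carry out of an
   unsigned add on x86: the flag is set exactly when the addition
   wrapped around.  */
static unsigned
add_carry_out (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum < a;  /* the LTU test: wraparound iff sum < either operand  */
}
```

So when combine substitutes a known flags value into `(ltu (reg:CCC) 0)`, treating the expression as a literal "unsigned less than zero" comparison would fold it to constant false, which is precisely the miscompilation the early-return guards against.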
Re: [PATCH 08/17] openmp: -foffload-memory=pinned
On 07/07/2022 12:54, Tobias Burnus wrote: Hi Andrew, On 07.07.22 12:34, Andrew Stubbs wrote: Implement the -foffload-memory=pinned option such that libgomp is instructed to enable fully-pinned memory at start-up. The option is intended to provide a performance boost to certain offload programs without modifying the code. ... gcc/ChangeLog: * omp-builtins.def (BUILT_IN_GOMP_ENABLE_PINNED_MODE): New. * omp-low.cc (omp_enable_pinned_mode): New function. (execute_lower_omp): Call omp_enable_pinned_mode. libgomp/ChangeLog: * config/linux/allocator.c (always_pinned_mode): New variable. (GOMP_enable_pinned_mode): New function. (linux_memspace_alloc): Disable pinning when always_pinned_mode set. (linux_memspace_calloc): Likewise. (linux_memspace_free): Likewise. (linux_memspace_realloc): Likewise. * libgomp.map: Add GOMP_enable_pinned_mode. * testsuite/libgomp.c/alloc-pinned-7.c: New test. ... ... --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -14620,6 +14620,68 @@ lower_omp (gimple_seq *body, omp_context *ctx) input_location = saved_location; } +/* Emit a constructor function to enable -foffload-memory=pinned + at runtime. Libgomp handles the OS mode setting, but we need to trigger + it by calling GOMP_enable_pinned mode before the program proper runs. */ + +static void +omp_enable_pinned_mode () Is there a reason not to use the mechanism of OpenMP's 'requires' directive for this? (Okay, I have to admit that the final patch was only committed on Monday. But still ...) Possibly, I had most of this done before then. I'll have a look next time I visit this patch. The Cuda-specific solution can't work this way anyway, because there's no mlockall equivalent, so I will make conditional adjustments anyway. Likewise, the 'requires' mechanism could then also be used in '[PATCH 16/17] amdgcn, openmp: Auto-detect USM mode and set HSA_XNACK'. No, I don't think so; that environment variable needs to be set before the libraries are loaded or it's too late. 
There are other ways to achieve the same thing, by leaving messages for the libgomp plugin to pick up, perhaps, but it's all extra complexity for no real gain. Andrew
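On Linux, the "fully-pinned memory at start-up" mode amounts to one syscall. A tolerant sketch of what a constructor-triggered hook like GOMP_enable_pinned_mode does underneath — `try_pin_all` is a made-up name, and failure is expected (not fatal) when RLIMIT_MEMLOCK is small:

```cpp
#include <cassert>
#include <sys/mman.h>

/* Lock all current and future mappings into RAM.  Returns 1 on
   success, 0 if the OS refuses (e.g. RLIMIT_MEMLOCK too small),
   in which case a runtime could fall back to per-page pinning.  */
static int
try_pin_all ()
{
  if (mlockall (MCL_CURRENT | MCL_FUTURE) == 0)
    return 1;
  return 0;
}
```

Because MCL_FUTURE covers mappings created later, running this once from a constructor pins every subsequent allocation, which is why libgomp can then skip its per-allocation pinning logic in this mode.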
Re: [PATCH] Be careful with MODE_CC in simplify_const_relational_operation.
Hi!

On Thu, Jul 07, 2022 at 10:08:04PM +0100, Roger Sayle wrote:
> I think it's fair to describe RTL's representation of condition flags
> using MODE_CC as a little counter-intuitive.

"A little challenging", and you should see that as a good thing, as a
puzzle to crack :-)

> For example, the i386 backend represents the carry flag (in adc
> instructions) using RTL of the form "(ltu:SI (reg:CCC) (const_int 0))",
> where great care needs to be taken not to treat this like a normal RTX
> expression, after all LTU (less-than-unsigned) against const0_rtx would
> normally always be false.

A comparison of a MODE_CC thing against 0 means the result of a
*previous* comparison (or other cc setter) is looked at.  Usually it
simply looks at some condition bits in a flags register.  It does not do
any actual comparison: that has been done before (if at all even).

> Hence, MODE_CC comparisons need to be treated with caution, and
> simplify_const_relational_operation returns early (to avoid problems)
> when GET_MODE_CLASS (GET_MODE (op0)) == MODE_CC.

Not just to avoid problems: there simply isn't enough information to do
a correct job.

> However, consider the (currently) hypothetical situation, where the
> RTL optimizers determine that a previous instruction unconditionally
> sets or clears the carry flag, and this gets propagated by combine into
> the above expression, we'd end up with something that looks like
> (ltu:SI (const_int 1) (const_int 0)), which doesn't mean what it says.
> Fortunately, simplify_const_relational_operation is passed the
> original mode of the comparison (cmp_mode, the original mode of op0)
> which can be checked for MODE_CC, even when op0 is now VOIDmode
> (const_int) after the substitution.

Defending against this is clearly the right thing to do.

> More controversially, rather than just abort simplification/optimization
> in this case, we can use the comparison operator to infer/select the
> semantics of the CC_MODE flag.  Hopefully, whenever a backend uses LTU,
> it represents the (set) carry flag (and behaves like i386.md), in which
> case the result of the simplified expression is the first operand.
> [If there's no standardization of semantics across backends, then
> we should always just return 0; but then miss potential optimizations].

On PowerPC, ltu means the result of an unsigned comparison (we have
instructions for that, cmpl[wd][i] mainly) was "smaller than".  It does
not mean anything is unsigned smaller than zero.  It also has nothing to
do with carries, which are done via a different register (the XER).

> +  /* Handle MODE_CC comparisons that have been simplified to
> +     constants.  */
> +  if (GET_MODE_CLASS (mode) == MODE_CC
> +      && op1 == const0_rtx
> +      && CONST_INT_P (op0))
> +    {
> +      /* LTU represents the carry flag.  */
> +      if (code == LTU)
> +	return op0 == const0_rtx ? const0_rtx : const_true_rtx;
> +      return 0;
> +    }
> +
>    /* We can't simplify MODE_CC values since we don't know what the
>       actual comparison is.  */

^^^ This comment is 100% true.  We cannot simplify any MODE_CC
comparison without having more context.  The combiner does have that
context when it tries to combine the CC setter with the CC consumer,
for example.

Do you have some piece of motivating example code?


Segher
libbacktrace patch committed: Don't let "make clean" remove allocfail.sh
The script allocfail.sh was being incorrectly removed by "make clean".
This patch fixes the problem.  This fixes
https://github.com/ianlancetaylor/libbacktrace/issues/81.  Ran
libbacktrace "make check" and "make clean" on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

For https://github.com/ianlancetaylor/libbacktrace/issues/81
	* Makefile.am (MAKETESTS): New variable split out of TESTS.
	(CLEANFILES): Replace TESTS with BUILDTESTS and MAKETESTS.
	* Makefile.in: Regenerate.

9ed57796235abcd24e06b1ce10fe72c3d0d07cc5
diff --git a/libbacktrace/Makefile.am b/libbacktrace/Makefile.am
index bf507b73918..9f8516d00e2 100644
--- a/libbacktrace/Makefile.am
+++ b/libbacktrace/Makefile.am
@@ -85,13 +85,19 @@ libbacktrace_la_DEPENDENCIES = $(libbacktrace_la_LIBADD)
 
 # Testsuite.
 
-# Add a test to this variable if you want it to be built.
+# Add a test to this variable if you want it to be built as a program,
+# with SOURCES, etc.
 check_PROGRAMS =
 
 # Add a test to this variable if you want it to be run.
 TESTS =
 
-# Add a test to this variable if you want it to be built and run.
+# Add a test to this variable if you want it to be built as a Makefile
+# target and run.
+MAKETESTS =
+
+# Add a test to this variable if you want it to be built as a program,
+# with SOURCES, etc., and run.
 BUILDTESTS =
 
 # Add a file to this variable if you want it to be built for testing.
@@ -250,7 +256,7 @@ b2test_LDFLAGS = -Wl,--build-id
 b2test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b2test
-TESTS += b2test_buildid
+MAKETESTS += b2test_buildid
 
 if HAVE_DWZ
@@ -260,7 +266,7 @@ b3test_LDFLAGS = -Wl,--build-id
 b3test_LDADD = libbacktrace_elf_for_test.la
 
 check_PROGRAMS += b3test
-TESTS += b3test_dwz_buildid
+MAKETESTS += b3test_dwz_buildid
 
 endif HAVE_DWZ
@@ -311,11 +317,11 @@ if HAVE_DWZ
	  cp $< $@; \
	fi
 
-TESTS += btest_dwz
+MAKETESTS += btest_dwz
 
 if HAVE_OBJCOPY_DEBUGLINK
 
-TESTS += btest_dwz_gnudebuglink
+MAKETESTS += btest_dwz_gnudebuglink
 
 endif HAVE_OBJCOPY_DEBUGLINK
@@ -416,7 +422,7 @@ endif HAVE_PTHREAD
 
 if HAVE_OBJCOPY_DEBUGLINK
 
-TESTS += btest_gnudebuglink
+MAKETESTS += btest_gnudebuglink
 
 %_gnudebuglink: %
	$(OBJCOPY) --only-keep-debug $< $@.debug
@@ -494,7 +500,7 @@ endif USE_DSYMUTIL
 
 if HAVE_MINIDEBUG
 
-TESTS += mtest_minidebug
+MAKETESTS += mtest_minidebug
 
 %_minidebug: %
	$(NM) -D $< -P --defined-only | $(AWK) '{ print $$1 }' | sort > $<.dsyms
@@ -536,10 +542,11 @@ endif HAVE_ELF
 
 check_PROGRAMS += $(BUILDTESTS)
 
-TESTS += $(BUILDTESTS)
+TESTS += $(MAKETESTS) $(BUILDTESTS)
 
 CLEANFILES = \
-	$(TESTS) *.debug elf_for_test.c edtest2_build.c gen_edtest2_build \
+	$(MAKETESTS) $(BUILDTESTS) *.debug elf_for_test.c edtest2_build.c \
+	gen_edtest2_build \
	*.dsyms *.fsyms *.keepsyms *.dbg *.mdbg *.mdbg.xz *.strip
 
 clean-local:
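A minimal sketch of how the variables divide the work after this patch (the test names below are hypothetical, not taken from the real Makefile.am):

```makefile
# Built as a program from SOURCES, and run:
BUILDTESTS += foo_test                 # hypothetical name

# Produced by a Makefile rule (e.g. an objcopy post-processing step), and run:
MAKETESTS += foo_test_gnudebuglink     # hypothetical name

# Both feed into the automake TESTS and CLEANFILES variables:
TESTS += $(MAKETESTS) $(BUILDTESTS)
CLEANFILES = $(MAKETESTS) $(BUILDTESTS)
```

The point of the split is that CLEANFILES no longer contains all of TESTS, so a checked-in script like allocfail.sh that is listed directly in TESTS survives "make clean".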
libbacktrace patch committed: Don't exit Mach-O dyld loop on failure
This libbacktrace patch changes the loop over dynamic libraries on
Mach-O to keep going if we fail to find the debug info for a particular
library.  We can still pick up debug info for other libraries even if
one fails.  Tested on x86_64-pc-linux-gnu, which admittedly does little,
but others have tested it on Mach-O.  Committed to mainline.

Ian

	* macho.c (backtrace_initialize) [HAVE_MACH_O_DYLD_H]: Don't exit
	loop if we can't find debug info for one shared library.

d8ddf1fa098fa50929ea0a1569a8e38d80fadbaf
diff --git a/libbacktrace/macho.c b/libbacktrace/macho.c
index 3f40811719e..16f406507d2 100644
--- a/libbacktrace/macho.c
+++ b/libbacktrace/macho.c
@@ -1268,7 +1268,7 @@ backtrace_initialize (struct backtrace_state *state, const char *filename,
	  mff = macho_nodebug;
	  if (!macho_add (state, name, d, 0, NULL, base_address, 0,
			  error_callback, data, &mff, &mfs))
-	    return 0;
+	    continue;
 
	  if (mff != macho_nodebug)
	    macho_fileline_fn = mff;
Re: [RFA] Improve initialization of objects when the initializer has trailing zeros.
On 2022/07/07 23:46, Jeff Law wrote:
> This is an update to a patch originally posted by Takayuki Suwa a few
> months ago.
>
> When we initialize an array from a STRING_CST we perform the
> initialization in two steps.  The first step copies the STRING_CST to
> the destination.  The second step uses clear_storage to initialize
> storage in the array beyond TREE_STRING_LENGTH of the initializer.
>
> Takayuki's patch added a special case when the STRING_CST itself was
> all zeros which would avoid the copy from the STRING_CST and instead
> do all the initialization via clear_storage which is clearly more
> runtime efficient.

Thank you for understanding what I mean...

> Richie had the suggestion that instead of special casing when the
> entire STRING_CST was NULs to instead identify when the tail of the
> STRING_CST was NULs.  That's more general and handles Takayuki's case
> as well.

and offering a good explanation.

> Bootstrapped and regression tested on x86_64-linux-gnu.  Given I
> rewrote Takayuki's patch I think it needs someone else to review
> rather than self-approving.

LGTM, and of course it resolves the original report that started this
in the first place
(https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595685.html).

> OK for the trunk?
>
> Jeff
>
Re: [PATCH 0/2] loongarch: improve code generation for integer division
On 2022/7/7 10:23 AM, Xi Ruoyao wrote:
> We were generating some unnecessary instructions for integer division.
> These two patches improve the code generation to compile
>
>     template <class T>
>     T div (T a, T b)
>     {
>       return a / b;
>     }
>
> into a single division instruction (along with a return instruction of
> course) as we expected, for T in {int32_t, uint32_t, int64_t}.
>
> Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
>
> Xi Ruoyao (2):
>   loongarch: add alternatives for idiv insns to improve code generation
>   loongarch: avoid unnecessary sign-extend after 32-bit division
>
>  gcc/config/loongarch/loongarch-protos.h    | 1 +
>  gcc/config/loongarch/loongarch.cc          | 2 +-
>  gcc/config/loongarch/loongarch.md          | 34 --
>  gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++
>  gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++
>  7 files changed, 63 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c
>  create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c

I am testing the spec and it can be done today or tomorrow.
[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]
Hello-

The PR (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91733) points out
that, while libcpp recognizes a lone '\r' as a valid line-ending
character, the infrastructure that obtains source lines to be printed
in diagnostics does not, and hence diagnostics do not output the
intended portion of a source file that uses such line endings.  The
PR's author suggests that libcpp should stop accepting '\r' line
endings, but that seems rather controversial and not likely to change.
Fixing the diagnostics is easy enough though, and that's done by the
attached patch.  Please let me know if it looks OK, thanks!

bootstrap + regtest all languages looks good, with just new PASSes for
the new testcase:

               before    after
  FAIL            103      103
  PASS         543592   543627
  UNSUPPORTED   15298    15298
  UNTESTED        136      136
  XFAIL          4130     4130
  XPASS            20       20

-Lewis

[PATCH] diagnostics: Make line-ending logic consistent with libcpp [PR91733]

libcpp recognizes a lone \r as a valid line ending, so the
infrastructure for retrieving source lines to be output in diagnostics
needs to do the same.  This patch fixes
file_cache_slot::get_next_line () accordingly so that diagnostics
display the correct part of the source when \r line endings are in use.

gcc/ChangeLog:

	PR preprocessor/91733
	* input.cc (find_end_of_line): New helper function.
	(file_cache_slot::get_next_line): Recognize \r as a line ending.
	* diagnostic-show-locus.cc (test_escaping_bytes_1): Adapt selftest
	since \r will now be interpreted as a line-ending.

gcc/testsuite/ChangeLog:

	PR preprocessor/91733
	* c-c++-common/pr91733.c: New test.
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 6eafe19785f..d267d2c258d 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -5508,7 +5508,7 @@ test_tab_expansion (const line_table_case &case_)
 static void
 test_escaping_bytes_1 (const line_table_case &case_)
 {
-  const char content[] = "before\0\1\2\3\r\x80\xff""after\n";
+  const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
   temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
@@ -5523,18 +5523,18 @@ test_escaping_bytes_1 (const line_table_case &case_)
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  /* Locations of the NUL and \r bytes.  */
+  /* Locations of the NUL and \v bytes.  */
   location_t nul_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 7);
-  location_t r_loc
+  location_t v_loc
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 11);
   gcc_rich_location richloc (nul_loc);
-  richloc.add_range (r_loc);
+  richloc.add_range (v_loc);
 
   {
     test_diagnostic_context dc;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before \1\2\3 \x80\xff""after\n"
+    ASSERT_STREQ (" before \1\2\3\v\x80\xff""after\n"
		  " ^ ~\n",
		  pp_formatted_text (dc.printer));
   }
@@ -5544,7 +5544,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_UNICODE;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
     ASSERT_STREQ
-      (" before<80>after\n"
+      (" before<80>after\n"
       " ^~~~\n",
       pp_formatted_text (dc.printer));
   }
@@ -5552,7 +5552,7 @@ test_escaping_bytes_1 (const line_table_case &case_)
     test_diagnostic_context dc;
     dc.escape_format = DIAGNOSTICS_ESCAPE_FORMAT_BYTES;
     diagnostic_show_locus (&dc, &richloc, DK_ERROR);
-    ASSERT_STREQ (" before<00><01><02><03><0d><80>after\n"
+    ASSERT_STREQ (" before<00><01><02><03><0b><80>after\n"
		  " ^~~~\n",
		  pp_formatted_text (dc.printer));
   }
diff --git a/gcc/input.cc b/gcc/input.cc
index 2acbfdea4f8..060ca160126 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -646,6 +646,37 @@ file_cache_slot::maybe_read_data ()
   return read_data ();
 }
 
+/* Helper function for file_cache_slot::get_next_line (), to find the end of
+   the next line.  Returns with the memchr convention, i.e. nullptr if a line
+   terminator was not found.  We need to determine line endings in the same
+   manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
+
+static char *
+find_end_of_line (char *s, size_t len)
+{
+  for (const auto end = s + len; s != end; ++s)
+    {
+      if (*s == '\n')
+	return s;
+      if (*s == '\r')
+	{
+	  const auto next = s + 1;
+	  if (next == end)
+	    {
+	      /* Don't find the line ending if \r is the very last character
+		 in the buffer; we do not know if it's the end of the file or
+		 just the end of what has been read so far, and we wouldn't
+		 want to break in the middle of what's actually a \r\n
+		 sequence.  Instead, we will handle the case of a file end
[pushed][PATCH v4] LoongArch: Modify fp_sp_offset and gp_sp_offset's calculation method when frame->mask or frame->fmask is zero.
Pushed for trunk and gcc-12: r13-1569-gaa8fd7f65683ef,
r12-8558-ge623829c18ec29.

Under the LA architecture, when the stack is dropped too far, the
process of dropping the stack is divided into two steps:

step1: After dropping the stack, save callee-saved registers on the stack.
step2: The rest of it.

The stack drop operation is optimized when frame->total_size minus
frame->sp_fp_offset is an integer multiple of 4096; this can reduce the
number of instructions required to drop the stack.  However, this
optimization was not taking effect because of the original calculation
method.

The following case:

int main()
{
  char buf[1024 * 12];
  printf ("%p\n", buf);
  return 0;
}

As you can see from the generated assembler, the old GCC emits two more
instructions than the new GCC (lines 14 and 24 of the old column):

           new                       |            old
10 main:                             | 11 main:
11  addi.d   $r3,$r3,-16             | 12  lu12i.w  $r13,-12288>>12
12  lu12i.w  $r13,-12288>>12         | 13  addi.d   $r3,$r3,-2032
13  lu12i.w  $r5,-12288>>12          | 14  ori      $r13,$r13,2016
14  lu12i.w  $r12,12288>>12          | 15  lu12i.w  $r5,-12288>>12
15  st.d     $r1,$r3,8               | 16  lu12i.w  $r12,12288>>12
16  add.d    $r12,$r12,$r5           | 17  st.d     $r1,$r3,2024
17  add.d    $r3,$r3,$r13            | 18  add.d    $r12,$r12,$r5
18  add.d    $r5,$r12,$r3            | 19  add.d    $r3,$r3,$r13
19  la.local $r4,.LC0                | 20  add.d    $r5,$r12,$r3
20  bl       %plt(printf)            | 21  la.local $r4,.LC0
21  lu12i.w  $r13,12288>>12          | 22  bl       %plt(printf)
22  add.d    $r3,$r3,$r13            | 23  lu12i.w  $r13,8192>>12
23  ld.d     $r1,$r3,8               | 24  ori      $r13,$r13,2080
24  or       $r4,$r0,$r0             | 25  add.d    $r3,$r3,$r13
25  addi.d   $r3,$r3,16              | 26  ld.d     $r1,$r3,2024
26  jr       $r1                     | 27  or       $r4,$r0,$r0
                                     | 28  addi.d   $r3,$r3,2032
                                     | 29  jr       $r1

gcc/ChangeLog:

	* config/loongarch/loongarch.cc (loongarch_compute_frame_info):
	Modify fp_sp_offset and gp_sp_offset's calculation method,
	when frame->mask or frame->fmask is zero, don't minus
	UNITS_PER_WORD or UNITS_PER_FP_REG.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/prolog-opt.c: New test.
---
 gcc/config/loongarch/loongarch.cc               | 12 +---
 gcc/testsuite/gcc.target/loongarch/prolog-opt.c | 15 +++
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/prolog-opt.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index d72b256df51..5c9a33c14f7 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -917,8 +917,12 @@ loongarch_compute_frame_info (void)
   frame->frame_pointer_offset = offset;
   /* Next are the callee-saved FPRs.  */
   if (frame->fmask)
-    offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
-  frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+    {
+      offset += LARCH_STACK_ALIGN (num_f_saved * UNITS_PER_FP_REG);
+      frame->fp_sp_offset = offset - UNITS_PER_FP_REG;
+    }
+  else
+    frame->fp_sp_offset = offset;
   /* Next are the callee-saved GPRs.  */
   if (frame->mask)
     {
@@ -931,8 +935,10 @@ loongarch_compute_frame_info (void)
	frame->save_libcall_adjustment = x_save_size;
 
       offset += x_save_size;
+      frame->gp_sp_offset = offset - UNITS_PER_WORD;
     }
-  frame->gp_sp_offset = offset - UNITS_PER_WORD;
+  else
+    frame->gp_sp_offset = offset;
   /* The hard frame pointer points above the callee-saved GPRs.  */
   frame->hard_frame_pointer_offset = offset;
   /* Above the hard frame pointer is the callee-allocated varags save area.  */
diff --git a/gcc/testsuite/gcc.target/loongarch/prolog-opt.c b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
new file mode 100644
index 000..0470a1f1eee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/prolog-opt.c
@@ -0,0 +1,15 @@
+/* Test that LoongArch backend stack drop operation optimized.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "addi.d\t\\\$r3,\\\$r3,-16" } } */
+
+extern int printf (char *, ...);
+
+int main()
+{
+  char buf[1024 * 12];
+  printf ("%p\n", buf);
+  return 0;
+}
+
-- 
2.31.1
Re: [PATCH v2 1/7] config: use $EGREP instead of egrep
On Mon, Jun 27, 2022 at 2:07 AM Xi Ruoyao via Gcc-patches wrote:
>
> egrep has been deprecated in favor of grep -E for a long time, and the
> next GNU grep release (3.8 or 4.0) will print a warning if egrep is
> used.  Unfortunately, old hosts with non-GNU grep may lack the support
> for the -E option.  Use AC_PROG_EGREP and the $EGREP variable so we'll
> work fine on both old and new platforms.
>
> ChangeLog:
>
> 	* configure.ac: Call AC_PROG_EGREP, and use $EGREP instead of
> 	egrep.
> 	* config.rpath: Use $EGREP instead of egrep.

config.rpath is imported from gnulib where this problem is already
fixed apparently; wouldn't it make more sense to re-import a fresh
config.rpath from upstream gnulib instead of patching GCC's local copy?

> 	* configure: Regenerate.
>
> config/ChangeLog:
>
> 	* lib-ld.m4 (AC_LIB_PROG_LD_GNU): Call AC_PROG_EGREP, and use
> 	$EGREP instead of egrep.
> 	(acl_cv_path_LD): Likewise.
> 	* lib-link.m4 (AC_LIB_RPATH): Call AC_PROG_EGREP, and pass
> 	$EGREP to config.rpath.
>
> gcc/ChangeLog:
>
> 	* configure: Regenerate.
>
> intl/ChangeLog:
>
> 	* configure: Regenerate.
>
> libcpp/ChangeLog:
>
> 	* configure: Regenerate.
>
> libgcc/ChangeLog:
>
> 	* configure: Regenerate.
>
> libstdc++-v3/ChangeLog:
>
> 	* configure: Regenerate.
> ---
>  config.rpath           |  10 +--
>  config/lib-ld.m4       |   6 +-
>  config/lib-link.m4     |   4 +-
>  configure              | 136 -
>  configure.ac           |   5 +-
>  gcc/configure          |  13 ++--
>  intl/configure         |   9 +--
>  libcpp/configure       |   9 +--
>  libgcc/configure       |   2 +-
>  libstdc++-v3/configure |   9 +--
>  10 files changed, 172 insertions(+), 31 deletions(-)
>
> diff --git a/config.rpath b/config.rpath
> index 4dea75957c2..4ada7468c22 100755
> --- a/config.rpath
> +++ b/config.rpath
> @@ -29,7 +29,7 @@
>  #    CPU_TYPE-MANUFACTURER-OPERATING_SYSTEM
>  # or
>  #    CPU_TYPE-MANUFACTURER-KERNEL-OPERATING_SYSTEM
> -# The environment variables CC, GCC, LDFLAGS, LD, with_gnu_ld
> +# The environment variables CC, GCC, EGREP, LDFLAGS, LD, with_gnu_ld
>  # should be set by the caller.
>  #
>  # The set of defined variables is at the end of this script.
> @@ -143,7 +143,7 @@ if test "$with_gnu_ld" = yes; then
>        ld_shlibs=no
>        ;;
>      beos*)
> -      if $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> @@ -162,9 +162,9 @@ if test "$with_gnu_ld" = yes; then
>      netbsd*)
>        ;;
>      solaris* | sysv5*)
> -      if $LD -v 2>&1 | egrep 'BFD 2\.8' > /dev/null; then
> +      if $LD -v 2>&1 | $EGREP 'BFD 2\.8' > /dev/null; then
>        ld_shlibs=no
> -      elif $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      elif $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> @@ -174,7 +174,7 @@ if test "$with_gnu_ld" = yes; then
>        hardcode_direct=yes
>        ;;
>      *)
> -      if $LD --help 2>&1 | egrep ': supported targets:.* elf' > /dev/null; then
> +      if $LD --help 2>&1 | $EGREP ': supported targets:.* elf' > /dev/null; then
>        :
>      else
>        ld_shlibs=no
> diff --git a/config/lib-ld.m4 b/config/lib-ld.m4
> index 11d0ce77342..88a014b7a74 100644
> --- a/config/lib-ld.m4
> +++ b/config/lib-ld.m4
> @@ -14,7 +14,8 @@ dnl From libtool-1.4. Sets the variable with_gnu_ld to yes or no.
>  AC_DEFUN([AC_LIB_PROG_LD_GNU],
>  [AC_CACHE_CHECK([if the linker ($LD) is GNU ld], acl_cv_prog_gnu_ld,
>  [# I'd rather use --version here, but apparently some GNU ld's only accept -v.
> -if $LD -v 2>&1 &5; then
> +AC_REQUIRE([AC_PROG_EGREP])dnl
> +if $LD -v 2>&1 &5; then
>    acl_cv_prog_gnu_ld=yes
>  else
>    acl_cv_prog_gnu_ld=no
> @@ -28,6 +29,7 @@ AC_DEFUN([AC_LIB_PROG_LD],
>  [  --with-gnu-ld           assume the C compiler uses GNU ld [default=no]],
>  test "$withval" = no || with_gnu_ld=yes, with_gnu_ld=no)
>  AC_REQUIRE([AC_PROG_CC])dnl
> +AC_REQUIRE([AC_PROG_EGREP])dnl
>  AC_REQUIRE([AC_CANONICAL_HOST])dnl
>  # Prepare PATH_SEPARATOR.
>  # The user is always right.
> @@ -88,7 +90,7 @@ AC_CACHE_VAL(acl_cv_path_LD,
>        # Check to see if the program is GNU ld.  I'd rather use --version,
>        # but apparently some GNU ld's only accept -v.
>        # Break only if it was the GNU/non-GNU ld that we prefer.
> -      if "$acl_cv_path_LD" -v 2>&1 < /dev/null | egrep '(GNU|with BFD)' > /dev/null; then
> +      if "$acl_cv_path_LD" -v 2>&1 < /dev/null | $EGREP '(GNU|with BFD)' > /dev/null; then
>        test "$with_gnu_ld" != no && break
>      else
>        test "$with_gnu_ld" != yes && break
> diff --git a/config/lib-link.m4 b/config/lib-link.m4
> index 20e2
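The pattern the patch installs, as a standalone shell sketch (assuming a GNU host where the AC_PROG_EGREP probe would select `grep -E`):

```shell
# Probe once (as AC_PROG_EGREP does) and reuse the result everywhere,
# instead of spelling the deprecated `egrep` at each use site.
EGREP="grep -E"   # assumed result of the configure-time probe

# The same test config.rpath/lib-ld.m4 run on the linker's version text:
if printf 'GNU ld (GNU Binutils) 2.38\n' | $EGREP '(GNU|with BFD)' > /dev/null; then
  echo "gnu-ld"
fi
```

On an old host whose grep lacks -E, the probe would instead set EGREP to a working egrep, which is why the substitution keeps both old and new platforms happy.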
[PATCH] Fix tree-opt/PR106087: ICE with inline-asm with multiple output and assigned only static vars
From: Andrew Pinski

The problem here is that when we mark the ssa name that was referenced
in the now removed dead store (to a write-only static variable), the
inline-asm would also be removed even though it was defining another
ssa name.  This fixes the problem by checking to make sure that the
statement was only defining one ssa name.

OK? Bootstrapped and tested on x86_64 with no regressions.

	PR tree-optimization/106087

gcc/ChangeLog:

	* tree-ssa-dce.cc (simple_dce_from_worklist): Check to make sure
	the statement is only defining one operand.

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/inline-asm-1.c: New test.
---
 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c | 14 ++
 gcc/tree-ssa-dce.cc                                |  5 +
 2 files changed, 19 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c

diff --git a/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
new file mode 100644
index 000..0044cb761b6
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/compile/inline-asm-1.c
@@ -0,0 +1,14 @@
+/* PR tree-opt/106087,
+   simple_dce_from_worklist would delete the
+   inline-asm when it was still being referenced
+   by the other ssa name.  */
+
+static int t;
+
+int f(void)
+{
+  int tt, tt1;
+  asm("":"=r"(tt), "=r"(tt1));
+  t = tt1;
+  return tt;
+}
diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index bc533582673..602cdb30ceb 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2061,6 +2061,11 @@ simple_dce_from_worklist (bitmap worklist)
       if (gimple_has_side_effects (t))
	continue;
 
+      /* The defining statement needs to be defining only this name.  */
+      if (!is_a<gphi *> (t)
+	  && !single_ssa_def_operand (t, SSA_OP_DEF))
+	continue;
+
       /* Don't remove statements that are needed for non-call
	 eh to work.  */
       if (stmt_unremovable_because_of_non_call_eh_p (cfun, t))
-- 
2.17.1