Re: [committed][libgomp, testsuite] Reduce recursion depth in declare_target-*.f90

2022-02-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 01, 2022 at 08:20:39AM +0100, Tom de Vries wrote:
> [libgomp, testsuite] Reduce recursion depth in declare_target-*.f90
> 
> libgomp/ChangeLog:
> 
> 2022-01-27  Tom de Vries  
> 
>   * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Reduce
>   recursion depth.
>   * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.

Ok.

Jakub



[PATCH] PR tree-optimization/102950: Improved EVRP for signed BIT_XOR_EXPR.

2022-02-01 Thread Roger Sayle

This patch fixes PR tree-optimization/102950, which is a P2 regression,
by providing better range bounds for BIT_XOR_EXPR, BIT_AND_EXPR and
BIT_IOR_EXPR on signed integer types.  In general terms, any binary
bitwise operation on sign-extended or zero-extended integer types will
produce results that are themselves sign-extended or zero-extended.
More precisely, we can derive signed bounds from the number of leading
redundant sign bit copies, from the equation:
clrsb(X op Y) >= min (clrsb (X), clrsb(Y))
and from the property that, for any (signed or unsigned) range [lb, ub],
clrsb([lb, ub]) >= min (clrsb(lb), clrsb(ub)).

These can be used to show that [-1, 0] op [-1, 0] is [-1, 0] or that
[-128, 127] op [-128, 127] is [-128, 127], even when tracking nonzero
bits would result in VARYING (as every bit can be 0 or 1).  This is
equivalent to determining the minimum type precision in which the
operation can be performed then sign extending the result.
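
As a rough illustration of the bound described above (not the range-op.cc
code itself), here is a standalone C++ sketch that models a 32-bit signed
type.  The names clrsb32, signed_bounds and signed_bitwise_bounds are
hypothetical; only the arithmetic (rprec = precision - new_clrsb - 1) is
meant to mirror the patch.

```cpp
#include <algorithm>
#include <cstdint>

// Count the leading redundant sign bit copies of a 32-bit value.
// Hand-rolled stand-in for wi::clrsb at precision 32 (hypothetical name).
int clrsb32 (int32_t x)
{
  // Inverting a negative value turns its sign copies into zeros, so in
  // both cases we count zero bits below the sign bit.
  uint32_t v = static_cast<uint32_t> (x) ^ (x < 0 ? 0xFFFFFFFFu : 0u);
  int n = 0;
  for (int bit = 30; bit >= 0 && ((v >> bit) & 1u) == 0; --bit)
    ++n;
  return n;
}

struct signed_bounds
{
  bool valid;      // false when no redundant sign bits are known
  int64_t lb, ub;  // inclusive bounds, meaningful only when valid
};

// Bounds for X op Y (op being &, | or ^) with X in [lh_lb, lh_ub] and
// Y in [rh_lb, rh_ub], from clrsb (X op Y) >= min (clrsb (X), clrsb (Y)).
signed_bounds
signed_bitwise_bounds (int32_t lh_lb, int32_t lh_ub,
                       int32_t rh_lb, int32_t rh_ub)
{
  int lh = std::min (clrsb32 (lh_lb), clrsb32 (lh_ub));
  int rh = std::min (clrsb32 (rh_lb), clrsb32 (rh_ub));
  int new_clrsb = std::min (lh, rh);
  if (new_clrsb == 0)
    return { false, 0, 0 };
  int rprec = 32 - new_clrsb - 1;
  int64_t one = 1;
  return { true, -(one << rprec), (one << rprec) - 1 };
}
```

With this model, signed_bitwise_bounds (-1, 0, -1, 0) gives [-1, 0] and
signed_bitwise_bounds (-128, 127, -128, 127) gives [-128, 127], matching
the two examples in the text.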

One additional refinement is to observe that X ^ Y can never be
zero if the ranges of X and Y don't overlap, i.e. X can't be equal
to Y.

Previously, the expression "(int)(char)a ^ 233" in the PR was considered
VARYING, but with the above changes now has the range [-256, -1][1, 255],
which is sufficient to optimize away the call to foo.
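
For what it's worth, the claimed range can be sanity-checked by brute force
outside the compiler.  The sketch below is only an illustration with
hypothetical names; it uses signed char explicitly (plain char signedness is
implementation-defined) and simply enumerates all 256 inputs.

```cpp
#include <climits>

// Summary of all values taken by (int)(signed char)a ^ 233.
struct xor_summary
{
  int min, max;
  bool saw_zero;
};

// Exhaustively evaluate the expression from the PR over every signed char.
xor_summary summarize_xor_233 ()
{
  xor_summary s = { INT_MAX, INT_MIN, false };
  for (int a = -128; a <= 127; ++a)
    {
      int v = static_cast<int> (static_cast<signed char> (a)) ^ 233;
      if (v < s.min)
        s.min = v;
      if (v > s.max)
        s.max = v;
      if (v == 0)
        s.saw_zero = true;
    }
  return s;
}
```

The observed values stay within [-256, 255] and never hit zero, since 233
lies outside [-128, 127] and so the two operands can never be equal.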


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check with no new failures.  Ok for mainline?


2022-02-01  Roger Sayle  

gcc/ChangeLog
PR tree-optimization/102950
* range-op.cc (wi_optimize_signed_bitwise_op): New function to
determine bounds of bitwise operations on signed types.
(operator_bitwise_and::wi_fold): Call the above function.
(operator_bitwise_or::wi_fold): Likewise.
(operator_bitwise_xor::wi_fold): Likewise.  Additionally, the
result can't be zero if the operands can't be equal.

gcc/testsuite/ChangeLog
PR tree-optimization/102950
* gcc.dg/pr102950.c: New test case.
* gcc.dg/tree-ssa/evrp10.c: New test case.


Thanks in advance (and Happy Chinese New Year),
Roger
--

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 19bdf30..71264ba 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -2659,6 +2659,29 @@ operator_bitwise_and::fold_range (irange &r, tree type,
 }
 
 
+// Optimize BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR of signed types
+// by considering the number of leading redundant sign bit copies.
+// clrsb (X op Y) = min (clrsb (X), clrsb (Y)), so for example
+// [-1, 0] op [-1, 0] is [-1, 0] (where nonzero_bits doesn't help).
+static bool
+wi_optimize_signed_bitwise_op (irange &r, tree type,
+  const wide_int &lh_lb, const wide_int &lh_ub,
+  const wide_int &rh_lb, const wide_int &rh_ub)
+{
+  int lh_clrsb = MIN (wi::clrsb (lh_lb), wi::clrsb (lh_ub));
+  int rh_clrsb = MIN (wi::clrsb (rh_lb), wi::clrsb (rh_ub));
+  int new_clrsb = MIN (lh_clrsb, rh_clrsb);
+  if (new_clrsb == 0)
+return false;
+  int type_prec = TYPE_PRECISION (type);
+  int rprec = (type_prec - new_clrsb) - 1;
+  value_range_with_overflow (r, type,
+wi::mask (rprec, true, type_prec),
+wi::mask (rprec, false, type_prec));
+  return true;
+}
+
+
 // Optimize BIT_AND_EXPR and BIT_IOR_EXPR in terms of a mask if
 // possible.  Basically, see if we can optimize:
 //
@@ -2839,7 +2862,14 @@ operator_bitwise_and::wi_fold (irange &r, tree type,
 }
   // If the limits got swapped around, return varying.
   if (wi::gt_p (new_lb, new_ub,sign))
-r.set_varying (type);
+{
+  if (sign == SIGNED
+ && wi_optimize_signed_bitwise_op (r, type,
+   lh_lb, lh_ub,
+   rh_lb, rh_ub))
+   return;
+  r.set_varying (type);
+}
   else
 value_range_with_overflow (r, type, new_lb, new_ub);
 }
@@ -3093,6 +3123,11 @@ operator_bitwise_or::wi_fold (irange &r, tree type,
  || wi::lt_p (lh_ub, 0, sign)
  || wi::lt_p (rh_ub, 0, sign))
r.set_nonzero (type);
+  else if (sign == SIGNED
+  && wi_optimize_signed_bitwise_op (r, type,
+lh_lb, lh_ub,
+rh_lb, rh_ub))
+   return;
   else
r.set_varying (type);
   return;
@@ -3180,8 +3215,23 @@ operator_bitwise_xor::wi_fold (irange &r, tree type,
   // is better than VARYING.
   if (wi::lt_p (new_lb, 0, sign) || wi::ge_p (new_ub, 0, sign))
 value_range_with_overflow (r, type, new_lb, new_ub);
+  else if (sign == SIGNED
+  && wi_optimize_signed_bitwise_op (r, type,
+lh_lb, lh_ub,
+rh_lb, rh_ub))
+;  /* Do nothing.  */
   else
 r.set_varying (type);
+
+  /* Furthermore, XOR is non-zero if its arguments can't be equal.  */
+  i

Re: [PATCH] libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

2022-02-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 01, 2022 at 12:28:57AM +0100, Jakub Jelinek via Gcc-patches wrote:
> I haven't, but will do so now.

So, I've done another bootstrap/regtest with incremental
--- gcc/c-family/c-lex.cc.jj        2022-01-18 00:18:02.558747051 +0100
+++ gcc/c-family/c-lex.cc   2022-02-01 00:39:47.314183266 +0100
@@ -297,6 +297,7 @@ get_token_no_padding (cpp_reader *pfile)
   ret = cpp_get_token (pfile);
   if (ret->type != CPP_PADDING)
return ret;
+  gcc_assert ((ret->flags & PREV_WHITE) == 0);
 }
 }
 
@@ -487,6 +488,7 @@ c_lex_with_flags (tree *value, location_
   switch (type)
 {
 case CPP_PADDING:
+  gcc_assert ((tok->flags & PREV_WHITE) == 0);
   goto retry;
 
 case CPP_NAME:
@@ -1267,6 +1269,7 @@ lex_string (const cpp_token *tok, tree *
   switch (tok->type)
 {
 case CPP_PADDING:
+  gcc_assert ((tok->flags & PREV_WHITE) == 0);
   goto retry;
 case CPP_ATSIGN:
   if (objc_string)
--- libcpp/macro.cc.jj  2022-01-31 22:11:34.0 +0100
+++ libcpp/macro.cc 2022-02-01 00:28:30.717339868 +0100
@@ -1373,6 +1373,7 @@ funlike_invocation_p (cpp_reader *pfile,
   token = cpp_get_token (pfile);
   if (token->type != CPP_PADDING)
break;
+  gcc_assert ((token->flags & PREV_WHITE) == 0);
   if (padding == NULL
  || padding->val.source == NULL
  || (!(padding->val.source->flags & PREV_WHITE)

patch.  The assert in funlike_invocation_p never triggered, but the other
asserts did on some tests; see below for a full list.
This seems to be caused by #pragma/_Pragma handling.
do_pragma does:
  pfile->directive_result.src_loc = pragma_token_virt_loc;
  pfile->directive_result.type = CPP_PRAGMA;
  pfile->directive_result.flags = pragma_token->flags;
  pfile->directive_result.val.pragma = p->u.ident;
when it sees a pragma, while start_directive does:
  pfile->directive_result.type = CPP_PADDING;
and so does _cpp_do__Pragma.
Now, for #pragma lex.cc will just ignore directive_result if
it has CPP_PADDING type:
  if (_cpp_handle_directive (pfile, result->flags & PREV_WHITE))
{
  if (pfile->directive_result.type == CPP_PADDING)
continue;
  result = &pfile->directive_result;
}
but destringize_and_run does not:
  if (pfile->directive_result.type == CPP_PRAGMA)
{
...
}
  else
{
  count = 1;
  toks = XNEW (cpp_token);
  toks[0] = pfile->directive_result;
and from there it will copy type member of CPP_PADDING, but all the
other members from the last CPP_PRAGMA before it.
Small testcase for it with no option (at least no -fopenmp or -fopenmp-simd).
#pragma GCC push_options
#pragma GCC ignored "-Wformat"
#pragma GCC pop_options
void
foo ()
{
  _Pragma ("omp simd")
  for (int i = 0; i < 64; i++)
;
}
I wonder if we shouldn't replace that
  toks[0] = pfile->directive_result;
line with
  toks[0] = pfile->avoid_paste;
or even replace those
  toks = XNEW (cpp_token);
  toks[0] = pfile->directive_result;
lines with
  toks = &pfile->avoid_paste;
(dunno how about the memory management whether something frees
the tokens or not, but e.g. funlike_invocation_p certainly uses the same
_cpp_push_token_context and pushes through that quite often
&pfile->avoid_paste).
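
To make the stale-members problem described above concrete, here is a toy
C++ model.  Everything in it (the toy_ types and functions) is hypothetical
and only mimics the pattern; it is not libcpp's actual data structures.

```cpp
// Toy model only: these definitions are made up for illustration.
enum toy_type { TOY_PADDING, TOY_PRAGMA };
const unsigned TOY_PREV_WHITE = 1u << 0;

struct toy_token
{
  toy_type type;
  unsigned flags;
};

struct toy_reader
{
  toy_token directive_result;
};

// Like do_pragma: fills in every member of the result.
void toy_do_pragma (toy_reader &r, unsigned pragma_flags)
{
  r.directive_result.type = TOY_PRAGMA;
  r.directive_result.flags = pragma_flags;
}

// Like start_directive: resets only the type, so flags stay stale.
void toy_start_directive (toy_reader &r)
{
  r.directive_result.type = TOY_PADDING;
}

// Like the else branch quoted above: copies the whole token, stale
// members included.
toy_token toy_copy_result (const toy_reader &r)
{
  return r.directive_result;
}
```

After toy_do_pragma (r, TOY_PREV_WHITE) followed by toy_start_directive (r),
the copy has type TOY_PADDING but still carries TOY_PREV_WHITE, which is the
combination the added asserts reject.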

+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_construct.cc (test for excess errors)
+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_copy_move.cc (test for excess errors)
+FAIL: 20_util/specialized_algorithms/pstl/uninitialized_fill_destroy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_merge/inplace_merge.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_merge/merge.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/copy_if.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/copy_move.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/fill.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/generate.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/is_partitioned.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/partition.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/partition_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/remove.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/remove_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/replace.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/replace_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/rotate.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modifying_operations/rotate_copy.cc (test for excess errors)
+FAIL: 25_algorithms/pstl/alg_modi

[PATCH] veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]

2022-02-01 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase fails -fcompare-debug, because expand_vector_comparison
since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets
vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses
the lhs in its condition.
Obviously we should ignore debug stmts when doing so, e.g. by not pushing
them to uses.
That would be a 2 liner change, but while looking at it, I'm also worried
about VEC_COND_EXPRs that would use the lhs in more than one operand,
like VEC_COND_EXPR  or VEC_COND_EXPR 
(sure, they ought to be folded, but what if they weren't).  Because if
something like that happens, then FOR_EACH_IMM_USE_FAST would push the same
stmt multiple times and expand_vector_condition can return true even when
it modifies it (for vector bool masking).
And lastly, it seems quite wasteful to safe_push statements that will just
cause vec_cond_expr_only = false; and break; in the second loop, both for
cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like
999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR
use.  So this patch only pushes VEC_COND_EXPRs there.  As
expand_vector_condition modifies the IL, it checks the condition again as
before.
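
To illustrate the duplicate-push concern, here is a small toy model in C++.
None of the names below come from GCC; a "statement" is just three operand
slots holding value ids, and walking the uses of a value visits a statement
once per operand that mentions it.

```cpp
#include <vector>

// Toy model only: a statement with three operand slots (rhs1, rhs2, rhs3).
struct toy_stmt
{
  int id;
  bool is_vec_cond;
  int op[3];
};

// Naive collection: one entry per operand that uses LHS, so a statement
// using LHS in two operands is pushed twice.
std::vector<int> collect_naive (const std::vector<toy_stmt> &stmts, int lhs)
{
  std::vector<int> uses;
  for (const toy_stmt &s : stmts)
    for (int k = 0; k < 3; ++k)
      if (s.op[k] == lhs)
        uses.push_back (s.id);
  return uses;
}

// Filtered collection in the spirit of the patch: keep only VEC_COND-like
// statements that use LHS in rhs1 and nowhere else; any other user of LHS
// clears the flag instead of being pushed.
std::vector<int> collect_filtered (const std::vector<toy_stmt> &stmts,
                                   int lhs, bool &vec_cond_only)
{
  std::vector<int> uses;
  vec_cond_only = true;
  for (const toy_stmt &s : stmts)
    {
      bool uses_lhs = s.op[0] == lhs || s.op[1] == lhs || s.op[2] == lhs;
      if (!uses_lhs)
        continue;
      if (s.is_vec_cond && s.op[0] == lhs && s.op[1] != lhs && s.op[2] != lhs)
        uses.push_back (s.id);
      else
        vec_cond_only = false;
    }
  return uses;
}
```

In this model a statement with operands {lhs, lhs, x} shows up twice in the
naive vector, while the filtered variant never pushes it and just clears
vec_cond_only.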

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-02-01  Jakub Jelinek  

PR middle-end/104307
* tree-vect-generic.cc (expand_vector_comparison): Don't push debug
stmts to uses vector, just set vec_cond_expr_only to false for
non-VEC_COND_EXPRs instead of pushing them into uses.  Treat
VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too
like non-VEC_COND_EXPRs.

* gcc.target/i386/pr104307.c: New test.

--- gcc/tree-vect-generic.cc.jj 2022-01-20 11:30:45.641577244 +0100
+++ gcc/tree-vect-generic.cc    2022-01-31 18:01:29.062568721 +0100
@@ -436,29 +436,43 @@ expand_vector_comparison (gimple_stmt_it
  feeding a VEC_COND_EXPR statement.  */
   auto_vec<gimple *> uses;
   FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
-uses.safe_push (USE_STMT (use_p));
-
-  for (unsigned i = 0; i < uses.length (); i ++)
 {
-  gassign *use = dyn_cast<gassign *> (uses[i]);
-  if (use != NULL
+  gimple *use = USE_STMT (use_p);
+  if (is_gimple_debug (use))
+   continue;
+  if (is_gimple_assign (use)
  && gimple_assign_rhs_code (use) == VEC_COND_EXPR
- && gimple_assign_rhs1 (use) == lhs)
-   {
- gimple_stmt_iterator it = gsi_for_stmt (use);
- if (!expand_vector_condition (&it, dce_ssa_names))
-   {
- vec_cond_expr_only = false;
- break;
-   }
-   }
+ && gimple_assign_rhs1 (use) == lhs
+ && gimple_assign_rhs2 (use) != lhs
+ && gimple_assign_rhs3 (use) != lhs)
+   uses.safe_push (use);
   else
-   {
- vec_cond_expr_only = false;
- break;
-   }
+   vec_cond_expr_only = false;
 }
 
+  if (vec_cond_expr_only)
+for (gimple *use : uses)
+  {
+   if (is_gimple_assign (use)
+   && gimple_assign_rhs_code (use) == VEC_COND_EXPR
+   && gimple_assign_rhs1 (use) == lhs
+   && gimple_assign_rhs2 (use) != lhs
+   && gimple_assign_rhs3 (use) != lhs)
+ {
+   gimple_stmt_iterator it = gsi_for_stmt (use);
+   if (!expand_vector_condition (&it, dce_ssa_names))
+ {
+   vec_cond_expr_only = false;
+   break;
+ }
+ }
+   else
+ {
+   vec_cond_expr_only = false;
+   break;
+ }
+  }
+
   if (!uses.is_empty () && vec_cond_expr_only)
 return NULL_TREE;
 
--- gcc/testsuite/gcc.target/i386/pr104307.c.jj 2022-01-31 17:34:42.163145798 +0100
+++ gcc/testsuite/gcc.target/i386/pr104307.c    2022-01-31 17:35:14.111696698 +0100
@@ -0,0 +1,6 @@
+/* PR middle-end/104307 */
+/* { dg-do compile } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-O2 -mavx512f -fcompare-debug " } */
+
+#include "pr78669.c"

Jakub



Re: [PATCH] veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]

2022-02-01 Thread Richard Biener via Gcc-patches
On Tue, 1 Feb 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase fails -fcompare-debug, because expand_vector_comparison
> since r11-1786-g1ac9258cca8030745d3c0b8f63186f0adf0ebc27 sets
> vec_cond_expr_only when it sees some use other than VEC_COND_EXPR that uses
> the lhs in its condition.
> Obviously we should ignore debug stmts when doing so, e.g. by not pushing
> them to uses.
> That would be a 2 liner change, but while looking at it, I'm also worried
> about VEC_COND_EXPRs that would use the lhs in more than one operand,
> like VEC_COND_EXPR  or VEC_COND_EXPR  lhs>
> (sure, they ought to be folded, but what if they weren't).  Because if
> something like that happens, then FOR_EACH_IMM_USE_FAST would push the same
> stmt multiple times and expand_vector_condition can return true even when
> it modifies it (for vector bool masking).
> And lastly, it seems quite wasteful to safe_push statements that will just
> cause vec_cond_expr_only = false; and break; in the second loop, both for
> cases like 1000 immediate non-VEC_COND_EXPR uses and for cases like
> 999 VEC_COND_EXPRs with lhs in cond followed by a single non-VEC_COND_EXPR
> use.  So this patch only pushes VEC_COND_EXPRs there.  As
> expand_vector_condition modifies the IL, it checks the condition again as
> before.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

So I think it's all fine besides the handling of VEC_COND_EXPRs where
the use is in rhs1 and rhs2 and/or rhs3 - I don't really understand
your worry here but shouldn't the stmt end up on the vector at least
once?  You can use gimple_assign_rhs1_ptr to see whether the
use is the rhs1 use comparing that with USE_PTR IIRC.  Btw, if you
never push VEC_COND_EXPRs with such double-use it's not necessary
to check again in the second loop?

That said, the other changes look reasonable.

Thanks,
Richard.

> 2022-02-01  Jakub Jelinek  
> 
>   PR middle-end/104307
>   * tree-vect-generic.cc (expand_vector_comparison): Don't push debug
>   stmts to uses vector, just set vec_cond_expr_only to false for
>   non-VEC_COND_EXPRs instead of pushing them into uses.  Treat
>   VEC_COND_EXPRs that use lhs not just in rhs1, but rhs2 or rhs3 too
>   like non-VEC_COND_EXPRs.
> 
>   * gcc.target/i386/pr104307.c: New test.
> 
> --- gcc/tree-vect-generic.cc.jj   2022-01-20 11:30:45.641577244 +0100
> +++ gcc/tree-vect-generic.cc  2022-01-31 18:01:29.062568721 +0100
> @@ -436,29 +436,43 @@ expand_vector_comparison (gimple_stmt_it
>   feeding a VEC_COND_EXPR statement.  */
>auto_vec<gimple *> uses;
>FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
> -uses.safe_push (USE_STMT (use_p));
> -
> -  for (unsigned i = 0; i < uses.length (); i ++)
>  {
> -  gassign *use = dyn_cast<gassign *> (uses[i]);
> -  if (use != NULL
> +  gimple *use = USE_STMT (use_p);
> +  if (is_gimple_debug (use))
> + continue;
> +  if (is_gimple_assign (use)
> && gimple_assign_rhs_code (use) == VEC_COND_EXPR
> -   && gimple_assign_rhs1 (use) == lhs)
> - {
> -   gimple_stmt_iterator it = gsi_for_stmt (use);
> -   if (!expand_vector_condition (&it, dce_ssa_names))
> - {
> -   vec_cond_expr_only = false;
> -   break;
> - }
> - }
> +   && gimple_assign_rhs1 (use) == lhs
> +   && gimple_assign_rhs2 (use) != lhs
> +   && gimple_assign_rhs3 (use) != lhs)
> + uses.safe_push (use);
>else
> - {
> -   vec_cond_expr_only = false;
> -   break;
> - }
> + vec_cond_expr_only = false;
>  }
>  
> +  if (vec_cond_expr_only)
> +for (gimple *use : uses)
> +  {
> + if (is_gimple_assign (use)
> + && gimple_assign_rhs_code (use) == VEC_COND_EXPR
> + && gimple_assign_rhs1 (use) == lhs
> + && gimple_assign_rhs2 (use) != lhs
> + && gimple_assign_rhs3 (use) != lhs)
> +   {
> + gimple_stmt_iterator it = gsi_for_stmt (use);
> + if (!expand_vector_condition (&it, dce_ssa_names))
> +   {
> + vec_cond_expr_only = false;
> + break;
> +   }
> +   }
> + else
> +   {
> + vec_cond_expr_only = false;
> + break;
> +   }
> +  }
> +
>if (!uses.is_empty () && vec_cond_expr_only)
>  return NULL_TREE;
>  
> --- gcc/testsuite/gcc.target/i386/pr104307.c.jj   2022-01-31 17:34:42.163145798 +0100
> +++ gcc/testsuite/gcc.target/i386/pr104307.c  2022-01-31 17:35:14.111696698 +0100
> @@ -0,0 +1,6 @@
> +/* PR middle-end/104307 */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2 -mavx512f -fcompare-debug " } */
> +
> +#include "pr78669.c"
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]

2022-02-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 01, 2022 at 10:29:03AM +0100, Richard Biener wrote:
> So I think it's all fine besides the handling of VEC_COND_EXPRs where
> the use is in rhs1 and rhs2 and/or rhs3 - I don't really understand
> your worry here but shouldn't the stmt end up on the vector at least
> once?  You can use gimple_assign_rhs1_ptr to see whether the

My worry is that
  FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
uses.safe_push (USE_STMT (use_p));
for a stmt with multiple uses of lhs pushes the same
stmt multiple times.
And then
  if (a_is_comparison)
a = gimplify_build2 (gsi, code, type, a1, a2);
  a1 = gimplify_build2 (gsi, BIT_AND_EXPR, type, a, b);
  a2 = gimplify_build1 (gsi, BIT_NOT_EXPR, type, a);
  a2 = gimplify_build2 (gsi, BIT_AND_EXPR, type, a2, c);
  a = gimplify_build2 (gsi, BIT_IOR_EXPR, type, a1, a2);
  gimple_assign_set_rhs_from_tree (gsi, a);
  update_stmt (gsi_stmt (*gsi));
will modify it (though the above at least will not remove the
stmt and update it in place I think) and then it won't be
a VEC_COND_EXPR anymore.
To me the non-cond uses in VEC_COND_EXPR conceptually look like
any other unhandled uses that the second loop clears
vec_cond_expr_only on.  But I don't have a testcase, dunno if it is even
possible.

> use is the rhs1 use comparing that with USE_PTR IIRC.  Btw, if you
> never push VEC_COND_EXPRs with such double-use it's not necessary
> to check again in the second loop?

I was just trying to be extra cautious in case expand_vector_comparison
modifies some other stmts, but maybe it is just expand_vector_comparison
in veclower and no other function that modifies anything but the
current stmt (+ pushes some new preparation statements and follow-up
statements).
So perhaps indeed:
+  if (vec_cond_expr_only)
+for (gimple *use : uses)
+  {
+   gimple_stmt_iterator it = gsi_for_stmt (use);
+   if (!expand_vector_condition (&it, dce_ssa_names))
+ {
+   vec_cond_expr_only = false;
+   break;
+ }
+  }
for the second loop is enough.

But sure, if you prefer all I can do:
   FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
-uses.safe_push (USE_STMT (use_p));
+if (!is_gimple_debug (USE_STMT (use_p)))
+  uses.safe_push (USE_STMT (use_p));

and keep the rest for GCC 13.

Jakub



RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms

2022-02-01 Thread Tamar Christina via Gcc-patches
Ping x3

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, January 11, 2022 7:11 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; 'ni...@redhat.com' ;
> Kyrylo Tkachov 
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul,
> fma and fms
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Monday, December 20, 2021 4:22 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Ramana Radhakrishnan
> > ; Richard Earnshaw
> > ; ni...@redhat.com; Kyrylo Tkachov
> > 
> > Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex
> > mul, fma and fms
> >
> > Updated version of patch following AArch64 review.
> >
> > Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> >
> > Ok for master? and backport along with the first patch?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/102819
> > PR tree-optimization/103169
> > * config/arm/vec-common.md (cml4):
> > Use
> > canonical order.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> > common.md index
> >
> e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f
> > 4439e6ed363e6e 100644
> > --- a/gcc/config/arm/vec-common.md
> > +++ b/gcc/config/arm/vec-common.md
> > @@ -265,18 +265,18 @@ (define_expand "arm_vcmla"
> >  ;; remainder.  Because of this, expand early.
> >  (define_expand "cml4"
> >[(set (match_operand:VF 0 "register_operand")
> > -   (plus:VF (match_operand:VF 1 "register_operand")
> > -(unspec:VF [(match_operand:VF 2 "register_operand")
> > -(match_operand:VF 3 "register_operand")]
> > -   VCMLA_OP)))]
> > +   (plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> > +(match_operand:VF 2 "register_operand")]
> > +   VCMLA_OP)
> > +(match_operand:VF 3 "register_operand")))]
> >"(TARGET_COMPLEX || (TARGET_HAVE_MVE &&
> TARGET_HAVE_MVE_FLOAT
> >   && ARM_HAVE__ARITH))
> > && !BYTES_BIG_ENDIAN"
> >  {
> >rtx tmp = gen_reg_rtx (mode);
> > -  emit_insn (gen_arm_vcmla (tmp, operands[1],
> > -operands[3], operands[2]));
> > +  emit_insn (gen_arm_vcmla (tmp, operands[3],
> > +operands[2], operands[1]));
> >emit_insn (gen_arm_vcmla (operands[0], tmp,
> > -operands[3], operands[2]));
> > +operands[2], operands[1]));
> >DONE;
> >  })



RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms

2022-02-01 Thread Tamar Christina via Gcc-patches
Ping x3.

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, January 11, 2022 7:11 AM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul,
> fma and fms
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Monday, December 20, 2021 4:21 PM
> > To: Richard Sandiford 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > ; Marcus Shawcroft
> > ; Kyrylo Tkachov
> 
> > Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex
> > mul, fma and fms
> >
> >
> >
> > > -Original Message-
> > > From: Richard Sandiford 
> > > Sent: Friday, December 17, 2021 4:49 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > > ; Marcus Shawcroft
> > > ; Kyrylo Tkachov
> > 
> > > Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex
> > > mul, fma and fms
> > >
> > > Richard Sandiford  writes:
> > > > Tamar Christina  writes:
> > > >> Hi All,
> > > >>
> > > >> After the first patch in the series this updates the optabs to
> > > >> expect the canonical sequence.
> > > >>
> > > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >>
> > > >> Ok for master? and backport along with the first patch?
> > > >>
> > > >> Thanks,
> > > >> Tamar
> > > >>
> > > >> gcc/ChangeLog:
> > > >>
> > > >>PR tree-optimization/102819
> > > >>PR tree-optimization/103169
> > > >>* config/aarch64/aarch64-simd.md
> > > (cml4,
> > > >>cmul3): Use canonical order.
> > > >>* config/aarch64/aarch64-sve.md (cml4,
> > > >>cmul3): Likewise.
> > > >>
> > > >> --- inline copy of patch --
> > > >> diff --git a/gcc/config/aarch64/aarch64-simd.md
> > > >> b/gcc/config/aarch64/aarch64-simd.md
> > > >> index
> > > >>
> > >
> >
> f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9
> > > c
> > > >> fb5649f9b0e73 100644
> > > >> --- a/gcc/config/aarch64/aarch64-simd.md
> > > >> +++ b/gcc/config/aarch64/aarch64-simd.md
> > > >> @@ -556,17 +556,17 @@ (define_insn
> > > "aarch64_fcmlaq_lane"
> > > >>  ;; remainder.  Because of this, expand early.
> > > >>  (define_expand "cml4"
> > > >>[(set (match_operand:VHSDF 0 "register_operand")
> > > >> -  (plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > > >> -  (unspec:VHSDF [(match_operand:VHSDF 2
> > > "register_operand")
> > > >> - (match_operand:VHSDF 3
> > > "register_operand")]
> > > >> - FCMLA_OP)))]
> > > >> +  (plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1
> > > "register_operand")
> > > >> + (match_operand:VHSDF 2
> > > "register_operand")]
> > > >> + FCMLA_OP)
> > > >> +  (match_operand:VHSDF 3 "register_operand")))]
> > > >>"TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> > > >>  {
> > > >>rtx tmp = gen_reg_rtx (mode);
> > > >> -  emit_insn (gen_aarch64_fcmla (tmp,
> operands[1],
> > > >> -   operands[3], 
> > > >> operands[2]));
> > > >> +  emit_insn (gen_aarch64_fcmla (tmp,
> operands[3],
> > > >> +   operands[1],
> operands[2]));
> > > >>emit_insn (gen_aarch64_fcmla (operands[0],
> tmp,
> > > >> -   operands[3], 
> > > >> operands[2]));
> > > >> +   operands[1],
> operands[2]));
> > > >>DONE;
> > > >>  })
> > > >>
> > > >> @@ -583,9 +583,9 @@ (define_expand "cmul3"
> > > >>rtx tmp = force_reg (mode, CONST0_RTX (mode));
> > > >>rtx res1 = gen_reg_rtx (mode);
> > > >>emit_insn (gen_aarch64_fcmla (res1, tmp,
> > > >> -   operands[2], 
> > > >> operands[1]));
> > > >> +   operands[1],
> operands[2]));
> > > >>emit_insn (gen_aarch64_fcmla (operands[0],
> res1,
> > > >> -   operands[2], 
> > > >> operands[1]));
> > > >> +   operands[1],
> operands[2]));
> > > >
> > > > This doesn't look right.  Going from the documentation, patch 1
> > > > isn't changing the operand order for CMUL: the conjugated operand
> > > > (if there is one) is still operand 2.  The FCMLA sequences use the
> > > > opposite order, where the conjugated operand (if there is one) is
> > operand 1.
> > > > So I think
> > >
> > > I meant “the first multiplication operand” rather than “operand 1” here.
> > >
> > > > the reversal here is still needed.
> > > >
> > > > Same for the multiplication operands in CML* above.
> >
> > I did actually change the order in patch 1, but didn't update the docs..
> > That was done because I followed the SLP order again, but now I've
> > upda

Re: [PATCH] veclower: Fix up -fcompare-debug issue in expand_vector_comparison [PR104307]

2022-02-01 Thread Richard Biener via Gcc-patches
On Tue, 1 Feb 2022, Jakub Jelinek wrote:

> On Tue, Feb 01, 2022 at 10:29:03AM +0100, Richard Biener wrote:
> > So I think it's all fine besides the handling of VEC_COND_EXPRs where
> > the use is in rhs1 and rhs2 and/or rhs3 - I don't really understand
> > your worry here but shouldn't the stmt end up on the vector at least
> > once?  You can use gimple_assign_rhs1_ptr to see whether the
> 
> My worry is that
>   FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
> uses.safe_push (USE_STMT (use_p));
> for a stmt with multiple uses of lhs pushes the same
> stmt multiple times.
> And then
>   if (a_is_comparison)
> a = gimplify_build2 (gsi, code, type, a1, a2);
>   a1 = gimplify_build2 (gsi, BIT_AND_EXPR, type, a, b);
>   a2 = gimplify_build1 (gsi, BIT_NOT_EXPR, type, a);
>   a2 = gimplify_build2 (gsi, BIT_AND_EXPR, type, a2, c);
>   a = gimplify_build2 (gsi, BIT_IOR_EXPR, type, a1, a2);
>   gimple_assign_set_rhs_from_tree (gsi, a);
>   update_stmt (gsi_stmt (*gsi));
> will modify it (though the above at least will not remove the
> stmt and update it in place I think) and then it won't be
> a VEC_COND_EXPR anymore.

Ah, OK.  Sure, pushing the stmt multiple times looks bogus and indeed
if we see we'll visit it a second time for a rhs{2,3} use there's
no point in pushing it in the first place.

> To me the non-cond uses in VEC_COND_EXPR conceptually look like
> any other unhandled uses that the second loop clears
> vec_cond_expr_only on.  But I don't have a testcase, dunno if it is even
> possible.
> 
> > use is the rhs1 use comparing that with USE_PTR IIRC.  Btw, if you
> > never push VEC_COND_EXPRs with such double-use it's not necessary
> > to check again in the second loop?
> 
> I was just trying to be extra cautious in case expand_vector_comparison
> modifies some other stmts, but maybe it is just expand_vector_comparison
> in veclower and no other function that modifies anything but the
> current stmt (+ pushes some new preparation statements and follow-up
> statements).
> So perhaps indeed:
> +  if (vec_cond_expr_only)
> +for (gimple *use : uses)
> +  {
> + gimple_stmt_iterator it = gsi_for_stmt (use);
> + if (!expand_vector_condition (&it, dce_ssa_names))
> +   {
> + vec_cond_expr_only = false;
> + break;
> +   }
> +  }
> for the second loop is enough.

Yes, I think so.

> But sure, if you prefer all I can do:
>FOR_EACH_IMM_USE_FAST (use_p, iterator, lhs)
> -uses.safe_push (USE_STMT (use_p));
> +if (!is_gimple_debug (USE_STMT (use_p)))
> +  uses.safe_push (USE_STMT (use_p));
> 
> and keep the rest for GCC 13.

No, I think the change is fine with the second loop adjusted.

Thanks,
Richard.


RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms

2022-02-01 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Monday, December 20, 2021 4:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul,
> fma and fms
> 
> Updated version of patch following AArch64 review.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master? and backport along with the first patch?

Ok, sorry I missed it.
Thanks,
Kyrill

> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * config/arm/vec-common.md (cml4):
> Use
>   canonical order.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md
> index
> e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594
> f4439e6ed363e6e 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -265,18 +265,18 @@ (define_expand "arm_vcmla"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml4"
>[(set (match_operand:VF 0 "register_operand")
> - (plus:VF (match_operand:VF 1 "register_operand")
> -  (unspec:VF [(match_operand:VF 2 "register_operand")
> -  (match_operand:VF 3 "register_operand")]
> - VCMLA_OP)))]
> + (plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> +  (match_operand:VF 2 "register_operand")]
> + VCMLA_OP)
> +  (match_operand:VF 3 "register_operand")))]
>"(TARGET_COMPLEX || (TARGET_HAVE_MVE &&
> TARGET_HAVE_MVE_FLOAT
> && ARM_HAVE__ARITH)) &&
> !BYTES_BIG_ENDIAN"
>  {
>rtx tmp = gen_reg_rtx (mode);
> -  emit_insn (gen_arm_vcmla (tmp, operands[1],
> -  operands[3], operands[2]));
> +  emit_insn (gen_arm_vcmla (tmp, operands[3],
> +  operands[2], operands[1]));
>emit_insn (gen_arm_vcmla (operands[0], tmp,
> -  operands[3], operands[2]));
> +  operands[2], operands[1]));
>DONE;
>  })
> 



RE: [PATCH][AArch32]: correct usdot-product RTL patterns.

2022-02-01 Thread Tamar Christina via Gcc-patches
Ping x3

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, January 11, 2022 7:10 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: RE: [PATCH][AArch32]: correct usdot-product RTL patterns.
> 
> ping
> 
> > -Original Message-
> > From: Tamar Christina
> > Sent: Tuesday, December 21, 2021 12:32 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: nd ; Ramana Radhakrishnan
> > ; Richard Earnshaw
> > ; ni...@redhat.com; Kyrylo Tkachov
> > 
> > Subject: [PATCH][AArch32]: correct usdot-product RTL patterns.
> >
> > Hi All,
> >
> > > There was a bug in the ACLE specification for dot product which has now
> > been fixed[1].  This means some intrinsics were missing and are added
> > by this patch.
> >
> > Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
> >
> > Ok for master?
> >
> > [1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/arm/arm_neon.h (vusdotq_s32, vusdot_laneq_s32,
> > vusdotq_laneq_s32, vsudot_laneq_s32, vsudotq_laneq_s32): New
> > * config/arm/arm_neon_builtins.def (usdot): Add V16QI.
> > (usdot_laneq, sudot_laneq): New.
> > * config/arm/neon.md (neon_dot_laneq): New.
> > > (neon_dot_lane): Remove unneeded code.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/simd/vdot-2-1.c: Add new tests.
> > * gcc.target/arm/simd/vdot-2-2.c: Likewise and fix output.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> > index
> >
> af6ac63dc3b47830d92f199d93153ff510f658e9..2255d600549a2a1e5dbcebc03f
> > 7d6a63bab9f5aa 100644
> > --- a/gcc/config/arm/arm_neon.h
> > +++ b/gcc/config/arm/arm_neon.h
> > @@ -18930,6 +18930,13 @@ vusdot_s32 (int32x2_t __r, uint8x8_t __a,
> > int8x8_t __b)
> >return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);  }
> >
> > +__extension__ extern __inline int32x4_t __attribute__
> > +((__always_inline__, __gnu_inline__, __artificial__))
> > +vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b) {
> > +  return __builtin_neon_usdotv16qi_ssus (__r, __a, __b); }
> > +
> >  __extension__ extern __inline int32x2_t  __attribute__
> > ((__always_inline__, __gnu_inline__, __artificial__))
> >  vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, @@ -18962,6 +18969,38
> > @@
> > vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
> >return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b,
> > __index);  }
> >
> > +__extension__ extern __inline int32x2_t __attribute__
> > +((__always_inline__, __gnu_inline__, __artificial__))
> > +vusdot_laneq_s32 (int32x2_t __r, uint8x8_t __a,
> > + int8x16_t __b, const int __index) {
> > +  return __builtin_neon_usdot_laneqv8qi_ssuss (__r, __a, __b,
> > +__index); }
> > +
> > +__extension__ extern __inline int32x4_t __attribute__
> > +((__always_inline__, __gnu_inline__, __artificial__))
> > +vusdotq_laneq_s32 (int32x4_t __r, uint8x16_t __a,
> > +  int8x16_t __b, const int __index) {
> > +  return __builtin_neon_usdot_laneqv16qi_ssuss (__r, __a, __b,
> > +__index); }
> > +
> > +__extension__ extern __inline int32x2_t __attribute__
> > +((__always_inline__, __gnu_inline__, __artificial__))
> > +vsudot_laneq_s32 (int32x2_t __r, int8x8_t __a,
> > + uint8x16_t __b, const int __index) {
> > +  return __builtin_neon_sudot_laneqv8qi_sssus (__r, __a, __b,
> > +__index); }
> > +
> > +__extension__ extern __inline int32x4_t __attribute__
> > +((__always_inline__, __gnu_inline__, __artificial__))
> > +vsudotq_laneq_s32 (int32x4_t __r, int8x16_t __a,
> > +  uint8x16_t __b, const int __index) {
> > +  return __builtin_neon_sudot_laneqv16qi_sssus (__r, __a, __b,
> > +__index); }
> > +
> >  #pragma GCC pop_options
> >
> >  #pragma GCC pop_options
> > diff --git a/gcc/config/arm/arm_neon_builtins.def
> > b/gcc/config/arm/arm_neon_builtins.def
> > index
> >
> f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5..1c150ed3b650a003b44901b4d
> > 160a7d6f595f057 100644
> > --- a/gcc/config/arm/arm_neon_builtins.def
> > +++ b/gcc/config/arm/arm_neon_builtins.def
> > @@ -345,9 +345,11 @@ VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
> >  VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
> >  VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
> >
> > -VAR1 (USTERNOP, usdot, v8qi)
> > +VAR2 (USTERNOP, usdot, v8qi, v16qi)
> >  VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
> >  VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
> > +VAR2 (USMAC_LANE_QUADTUP, usdot_laneq, v8qi, v16qi)
> > +VAR2 (SUMAC_LANE_QUADTUP, sudot_laneq, v8qi, v16qi)
> >
> >  VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
> >  VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf) diff --git
> > a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index
> >
> 848166311b5f82c5facb66e97c2260a5aba5d302..1707d8e625079b83497a3db44
> > db5e33405bb5fa1 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.m

RE: [PATCH][AArch32]: correct usdot-product RTL patterns.

2022-02-01 Thread Kyrylo Tkachov via Gcc-patches


> -Original Message-
> From: Tamar Christina 
> Sent: Tuesday, December 21, 2021 12:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH][AArch32]: correct usdot-product RTL patterns.
> 
> Hi All,
> 
> There was a bug in the ACLE specification for dot product which has now
> been fixed[1].  This means some intrinsics were missing and are added by
> this
> patch.
> 
> Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master?

Ok.
Thanks,
Kyrill

> 
> [1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_neon.h (vusdotq_s32, vusdot_laneq_s32,
>   vusdotq_laneq_s32, vsudot_laneq_s32, vsudotq_laneq_s32): New
>   * config/arm/arm_neon_builtins.def (usdot): Add V16QI.
>   (usdot_laneq, sudot_laneq): New.
>   * config/arm/neon.md (neon_dot_laneq): New.
>   (neon_dot_lane): Remove unneeded code.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/simd/vdot-2-1.c: Add new tests.
>   * gcc.target/arm/simd/vdot-2-2.c: Likewise and fix output.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
> index
> af6ac63dc3b47830d92f199d93153ff510f658e9..2255d600549a2a1e5dbcebc0
> 3f7d6a63bab9f5aa 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18930,6 +18930,13 @@ vusdot_s32 (int32x2_t __r, uint8x8_t __a,
> int8x8_t __b)
>return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);
>  }
> 
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
> +{
> +  return __builtin_neon_usdotv16qi_ssus (__r, __a, __b);
> +}
> +
>  __extension__ extern __inline int32x2_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a,
> @@ -18962,6 +18969,38 @@ vsudotq_lane_s32 (int32x4_t __r, int8x16_t
> __a,
>return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);
>  }
> 
> +__extension__ extern __inline int32x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vusdot_laneq_s32 (int32x2_t __r, uint8x8_t __a,
> +   int8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_usdot_laneqv8qi_ssuss (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vusdotq_laneq_s32 (int32x4_t __r, uint8x16_t __a,
> +int8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_usdot_laneqv16qi_ssuss (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vsudot_laneq_s32 (int32x2_t __r, int8x8_t __a,
> +   uint8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_sudot_laneqv8qi_sssus (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x4_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vsudotq_laneq_s32 (int32x4_t __r, int8x16_t __a,
> +uint8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_sudot_laneqv16qi_sssus (__r, __a, __b, __index);
> +}
> +
>  #pragma GCC pop_options
> 
>  #pragma GCC pop_options
> diff --git a/gcc/config/arm/arm_neon_builtins.def
> b/gcc/config/arm/arm_neon_builtins.def
> index
> f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5..1c150ed3b650a003b44901b4
> d160a7d6f595f057 100644
> --- a/gcc/config/arm/arm_neon_builtins.def
> +++ b/gcc/config/arm/arm_neon_builtins.def
> @@ -345,9 +345,11 @@ VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
>  VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
>  VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
> 
> -VAR1 (USTERNOP, usdot, v8qi)
> +VAR2 (USTERNOP, usdot, v8qi, v16qi)
>  VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
>  VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
> +VAR2 (USMAC_LANE_QUADTUP, usdot_laneq, v8qi, v16qi)
> +VAR2 (SUMAC_LANE_QUADTUP, sudot_laneq, v8qi, v16qi)
> 
>  VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
>  VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf)
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index
> 848166311b5f82c5facb66e97c2260a5aba5d302..1707d8e625079b83497a3db
> 44db5e33405bb5fa1 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -2977,9 +2977,33 @@ (define_insn "neon_dot_lane"
>   DOTPROD_I8MM)
> (match_operand:VCVTI 1 "register_operand" "0")))]
>"TARGET_I8MM"
> +  "vdot.\\t%0, %2, %P3[%c4]"
> +  [(set_attr "type" "neon_dot")]
> +)
> +
> +;; These instructions map to the __builtins for the Dot Product
> +;; indexed operations in the v8.6 I8MM extension.
> +(define_insn "neon_dot_laneq"
> +  [(set (match_operand:VCVTI 0 "register_operand" "=w")
> + (plus:V
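
For readers unfamiliar with the operation behind the vusdotq_s32 intrinsic added in the quoted patch, the following is a minimal scalar sketch of a mixed-sign dot product: each 32-bit lane accumulates four products of an unsigned 8-bit element with a signed 8-bit element. The function name usdot_q_model and the use of std::array are assumptions made for illustration only; this is not the GCC implementation and not the intrinsic itself.

```cpp
#include <array>
#include <cassert>   // assert, for callers that want to check results
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference model of a 128-bit unsigned-by-signed
// dot product.  Lane i of the result is
//   r[i] + sum over j in 0..3 of a[4*i + j] * b[4*i + j]
// where a holds unsigned bytes and b holds signed bytes.
std::array<std::int32_t, 4>
usdot_q_model (std::array<std::int32_t, 4> r,
               const std::array<std::uint8_t, 16> &a,
               const std::array<std::int8_t, 16> &b)
{
  for (std::size_t lane = 0; lane < 4; ++lane)
    for (std::size_t j = 0; j < 4; ++j)
      {
        // Widen both operands to 32 bits before multiplying so that the
        // unsigned and signed bytes keep their values.
        const std::int32_t ua = a[4 * lane + j];
        const std::int32_t sb = b[4 * lane + j];
        r[lane] += ua * sb;
      }
  return r;
}
```

With every element of a set to 255 and every element of b set to -1, each lane of a zero accumulator ends up at 4 * (255 * -1) = -1020, which shows why the two inputs must be widened with different signedness.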

Re: [PATCH][1/4][committed] aarch64: Add support for Armv8.8-a memory operations and memcpy expansion

2022-02-01 Thread Richard Sandiford via Gcc-patches
Kyrylo Tkachov  writes:
> Hi Richard,
>
> Sorry for the delay in getting back to this. I'm now working on a patch to 
> adjust this.
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, December 14, 2021 10:48 AM
>> To: Kyrylo Tkachov via Gcc-patches 
>> Cc: Kyrylo Tkachov 
>> Subject: Re: [PATCH][1/4][committed] aarch64: Add support for Armv8.8-a
>> memory operations and memcpy expansion
>> 
>> Kyrylo Tkachov via Gcc-patches  writes:
>> > @@ -23568,6 +23568,28 @@
>> aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
>> >*dst = aarch64_progress_pointer (*dst);
>> >  }
>> >
>> > +/* Expand a cpymem using the MOPS extension.  OPERANDS are taken
>> > +   from the cpymem pattern.  Return true iff we succeeded.  */
>> > +static bool
>> > +aarch64_expand_cpymem_mops (rtx *operands)
>> > +{
>> > +  if (!TARGET_MOPS)
>> > +return false;
>> > +  rtx addr_dst = XEXP (operands[0], 0);
>> > +  rtx addr_src = XEXP (operands[1], 0);
>> > +  rtx sz_reg = operands[2];
>> > +
>> > +  if (!REG_P (sz_reg))
>> > +sz_reg = force_reg (DImode, sz_reg);
>> > +  if (!REG_P (addr_dst))
>> > +addr_dst = force_reg (DImode, addr_dst);
>> > +  if (!REG_P (addr_src))
>> > +addr_src = force_reg (DImode, addr_src);
>> > +  emit_insn (gen_aarch64_cpymemdi (addr_dst, addr_src, sz_reg));
>> > +
>> > +  return true;
>> > +}
>> 
>> On this, I think it would be better to adjust the original src and dst
>> MEMs if possible, since they contain metadata about the size of the
>> access and alias set information.  It looks like the code above
>> generates an instruction with a wild read and a wild write instead.
>> 
>
> Hmm, do you mean adjust the address of the MEMs in operands with something 
> like replace_equiv_address_nv?

Yeah.

>> It should be possible to do that with a define_expand/define_insn
>> pair, where the define_expand takes two extra operands for the MEMs,
>> but the define_insn contains the same operands as now.
>
> Could you please expand on this a bit? This path is reached from the cpymemdi 
> expander that already takes the two MEMs as operands and generates the 
> aarch64_cpymemdi define_insn that uses just the address registers as 
> operands. Should we carry the MEMs around in the define_insn as well after 
> expand?

It could be a second expander.  E.g.:

(define_expand "aarch64_cpymemdi"
  [(parallel
 [(set (match_operand 2) (const_int 0))
  (clobber (match_operand 0))
  (clobber (match_operand 1))
  (set (match_operand 3)
   (unspec:BLK [(match_operand 4) (match_dup 2)] UNSPEC_CPYMEM))])]
  "TARGET_MOPS"
)

with the existing define_insn maybe becoming *aarch64_cpymemdi.
(The define_insn doesn't need the outer parallel.)

Thanks,
Richard


Re: [PATCH v3] [AARCH64] Fix PR target/103100 -mstrict-align and memset on not aligned buffers

2022-02-01 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> The problem here is that aarch64_expand_setmem does not change the alignment
> for strict alignment case. This is version 3 of this patch, it is based on
> version 2 and moves the check for the number of instructions from the
> optimizing for size case to be always and change the cost of libcalls for
> the !size case to be max_size/16 + 1 (or 17) which was the same as before
> when handling just the max_size.

In this case, if we want the number to be 17 for strict alignment targets,
I think we should just pick 17.  It feels odd to be setting the libcall
cost based on AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS.

I notice that you didn't go for the suggestion in the previous review
round:

| I think in practice the code has only been tuned on targets that
| support LDP/STP Q, so how about moving the copy_limit calculation
| further up and doing:
|
|   unsigned max_set_size = (copy_limit * 8) / BITS_PER_UNIT;
|
| ?

Does that not work?  One advantage of it is that the:

  if (len > max_set_size && !TARGET_MOPS)
return false;

bail-out will kick in sooner, stopping us from generating (say)
255 garbage insns for a byte-aligned 255-byte copy.
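
To make the arithmetic concrete, here is a small sketch of how the suggested max_set_size formula interacts with the bail-out. It is only a model: the constants stand in for BITS_PER_UNIT and GET_MODE_BITSIZE (TImode), all function names are hypothetical, and capping copy_limit at the known alignment under strict alignment is an assumption taken from the ChangeLog wording rather than from the patch body.

```cpp
#include <cassert>   // assert, for callers that want to check results

// Illustrative stand-ins for the target macros (assumed values).
constexpr unsigned kBitsPerUnit = 8;    // BITS_PER_UNIT
constexpr unsigned kTImodeBits = 128;   // GET_MODE_BITSIZE (TImode)

// Widest chunk, in bits, allowed by the tuning flag alone.
constexpr unsigned
tuned_limit_bits (bool no_ldp_stp_qregs)
{
  return no_ldp_stp_qregs ? kTImodeBits : 256u;
}

// Assumption: under strict alignment the chunk is additionally capped at
// the statically known alignment of the destination.
constexpr unsigned
copy_limit_bits (bool no_ldp_stp_qregs, bool strict_align, unsigned align_bits)
{
  return (strict_align && align_bits < tuned_limit_bits (no_ldp_stp_qregs))
         ? align_bits
         : tuned_limit_bits (no_ldp_stp_qregs);
}

// The formula from the review: (copy_limit * 8) / BITS_PER_UNIT, in bytes.
constexpr unsigned
max_set_size (unsigned copy_limit)
{
  return (copy_limit * 8) / kBitsPerUnit;
}

// Mirrors the quoted check: refuse inline expansion for oversized sets
// unless the MOPS sequence is available.
constexpr bool
bails_out (unsigned len, unsigned max_size, bool target_mops)
{
  return len > max_size && !target_mops;
}

// Libcall cost as described in the patch: 4 when optimizing for size,
// otherwise max_set_size / 16 + 1.
constexpr unsigned
libcall_cost (bool size_p, unsigned max_size)
{
  return size_p ? 4 : max_size / 16 + 1;
}
```

Under these assumptions a 256-bit copy_limit gives a 256-byte max_set_size and a libcall cost of 17, while a byte-aligned destination on a strict-alignment target gives an 8-byte max_set_size, so a 255-byte set is rejected before any instructions are generated.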

Thanks,
Richard

> The main change is dealing with strict
> alignment case where we only inline a max of 17 instructions as at that
> point the call to the memset will be faster and could handle the dynamic
> alignment instead of just the static alignment.
>
> Note the reason why it is +1 is to count for the setting of the simd
> duplicate.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
>   PR target/103100
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64.cc (aarch64_expand_setmem): Constrain
>   copy_limit to the alignment of the mem if STRICT_ALIGNMENT is
>   true. Also constrain the number of instructions for the !size
>   case to max_size/16 + 1.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/memset-strict-align-1.c: Update test.
>   Reduce the size down to 207 and make s1 global and aligned
>   to 16 bytes.
>   * gcc.target/aarch64/memset-strict-align-2.c: New test.
> ---
>  gcc/config/aarch64/aarch64.cc | 55 ++-
>  .../aarch64/memset-strict-align-1.c   | 20 +++
>  .../aarch64/memset-strict-align-2.c   | 14 +
>  3 files changed, 53 insertions(+), 36 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/memset-strict-align-2.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 296145e6008..02ecb2154ea 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -23831,8 +23831,11 @@ aarch64_expand_setmem (rtx *operands)
>  (zero constants can use XZR directly).  */
>unsigned mops_cost = 3 + 1 + cst_val;
>/* A libcall to memset in the worst case takes 3 instructions to prepare
> - the arguments + 1 for the call.  */
> -  unsigned libcall_cost = 4;
> + the arguments + 1 for the call.
> + In the case of not optimizing for size the cost of doing a libcall
> + is the max_set_size / 16 + 1 or 17 instructions. The one instruction
> + is for the vector dup which may or may not be used.  */
> +  unsigned libcall_cost = size_p ? 4 : (max_set_size / 16 + 1);
>  
>/* Upper bound check.  For large constant-sized setmem use the MOPS 
> sequence
>   when available.  */
> @@ -23842,12 +23845,12 @@ aarch64_expand_setmem (rtx *operands)
>  
>/* Attempt a sequence with a vector broadcast followed by stores.
>   Count the number of operations involved to see if it's worth it
> - against the alternatives.  A simple counter simd_ops on the
> + against the alternatives.  A simple counter inlined_ops on the
>   algorithmically-relevant operations is used rather than an rtx_insn 
> count
>   as all the pointer adjusmtents and mode reinterprets will be optimized
>   away later.  */
>start_sequence ();
> -  unsigned simd_ops = 0;
> +  unsigned inlined_ops = 0;
>  
>base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
>dst = adjust_automodify_address (dst, VOIDmode, base, 0);
> @@ -23855,15 +23858,22 @@ aarch64_expand_setmem (rtx *operands)
>/* Prepare the val using a DUP/MOVI v0.16B, val.  */
>src = expand_vector_broadcast (V16QImode, val);
>src = force_reg (V16QImode, src);
> -  simd_ops++;
> +  inlined_ops++;
>/* Convert len to bits to make the rest of the code simpler.  */
>n = len * BITS_PER_UNIT;
>  
>/* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
>   AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
> -  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
> -   & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
> -   ? GET_MODE_BITSIZE (TImode) : 256;
> +  int copy_limit;
> +
> +  if (aarch64_tune_params.extra_tuning_flags
> +  & AARCH64_EXTRA_TUN

[PATCH v3][GCC13] RISC-V: Provide `fmin'/`fmax' RTL patterns

2022-02-01 Thread Maciej W. Rozycki
As at r2.2 of the RISC-V ISA specification[1] (equivalent to version 2.0 
of the "F" and "D" standard architecture extensions for single-precision 
and double-precision floating-point respectively) the FMIN and FMAX 
machine instructions fully match our requirement for the `fminM3' and 
`fmaxM3' standard RTL patterns:

"For FMIN and FMAX, if at least one input is a signaling NaN, or if both 
inputs are quiet NaNs, the result is the canonical NaN.  If one operand 
is a quiet NaN and the other is not a NaN, the result is the non-NaN 
operand."

suitably for the IEEE 754-2008 `minNum' and `maxNum' operations.

However we only define `sminM3' and `smaxM3' standard RTL patterns to 
produce the FMIN and FMAX machine instructions, which in turn causes the 
`__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the 
corresponding libcalls rather than the relevant machine instructions.  
This is according to earlier revisions of the RISC-V ISA specification, 
which we however do not support anymore, as from commit 4b81528241ca 
("RISC-V: Support version controling for ISA standard extensions").

As from r20190608 of the RISC-V ISA specification (equivalent to version 
2.2 of the "F" and "D" standard ISA extensions for single-precision and 
double-precision floating-point respectively) the definition of the FMIN 
and FMAX machine instructions has been updated[2]:

"Defined the signed-zero behavior of FMIN.fmt and FMAX.fmt, and changed 
their behavior on signaling-NaN inputs to conform to the minimumNumber 
and maximumNumber operations in the proposed IEEE 754-201x 
specification."

and specifically[3]:

"Floating-point minimum-number and maximum-number instructions FMIN.S 
and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to 
rd.  For the purposes of these instructions only, the value -0.0 is 
considered to be less than the value +0.0.  If both inputs are NaNs, the 
result is the canonical NaN.  If only one operand is a NaN, the result 
is the non-NaN operand.  Signaling NaN inputs set the invalid operation 
exception flag, even when the result is not NaN."

Consequently for forwards compatibility with r20190608+ hardware we 
cannot use the FMIN and FMAX machine instructions unconditionally even 
where the ISA level of r2.2 has been specified with the `-misa-spec=2.2' 
option where operation would be different between ISA revisions, that 
is the handling of signaling NaN inputs.

Therefore provide new `fmin3' and `fmax3' patterns removing 
the need to emit libcalls with the `__builtin_fmin' and `__builtin_fmax' 
family of intrinsics, however limit them to where `-fno-signaling-nans' 
is in effect, deferring to other code generation strategies otherwise as 
applicable.  Use newly-defined UNSPECs as the operation codes so that 
the patterns are only ever used if referred to by their names, as there
is no RTL expression defined for the IEEE 754-2008 `minNum' and `maxNum' 
operations.
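
As an illustration of the quiet-NaN behaviour discussed above, the sketch below contrasts a minNum-style selection with a plain comparison. The helper names min_num_model and naive_min are hypothetical and exist only for this example; the code models the semantics for quiet NaN inputs in portable C++ and says nothing about which instructions a given compiler emits.

```cpp
#include <cassert>   // assert, for callers that want to check results
#include <cmath>
#include <limits>

// Hypothetical model of the minNum rule for quiet NaNs: if exactly one
// operand is a NaN, the result is the other operand.
double
min_num_model (double a, double b)
{
  if (std::isnan (a))
    return b;
  if (std::isnan (b))
    return a;
  return a < b ? a : b;
}

// A plain comparison, for contrast.  Every comparison with a NaN is
// false, so the result depends on which side the NaN is on.
double
naive_min (double a, double b)
{
  return a < b ? a : b;
}

// Convenience accessor for a quiet NaN.
double
quiet_nan ()
{
  return std::numeric_limits<double>::quiet_NaN ();
}
```

Here min_num_model returns 1.0 for both (NaN, 1.0) and (1.0, NaN), matching what std::fmin is specified to do, whereas naive_min returns 1.0 for the first pair and NaN for the second. That asymmetry is the reason a comparison-based minimum cannot stand in for the `fmin' family when NaNs have to be honoured.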

References:

[1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and 
Propagation", p. 48

[2] "The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA",
Document Version 20190608-Base-Ratified, June 8, 2019, "Preface",
p. ii

[3] same, Section 11.6 "Single-Precision Floating-Point Computational
Instructions", p. 66

gcc/
* config/riscv/riscv.md (UNSPEC_FMIN, UNSPEC_FMAX): New
constants.
(fmin3, fmax3): New insns.
* testsuite/gcc.target/riscv/fmax-snan.c: New test.
* testsuite/gcc.target/riscv/fmax.c: New test.
* testsuite/gcc.target/riscv/fmaxf-snan.c: New test.
* testsuite/gcc.target/riscv/fmaxf.c: New test.
* testsuite/gcc.target/riscv/fmin-snan.c: New test.
* testsuite/gcc.target/riscv/fmin.c: New test.
* testsuite/gcc.target/riscv/fminf-snan.c: New test.
* testsuite/gcc.target/riscv/fminf.c: New test.
* testsuite/gcc.target/riscv/smax-ieee.c: New test.
* testsuite/gcc.target/riscv/smax.c: New test.
* testsuite/gcc.target/riscv/smaxf-ieee.c: New test.
* testsuite/gcc.target/riscv/smaxf.c: New test.
* testsuite/gcc.target/riscv/smin-ieee.c: New test.
* testsuite/gcc.target/riscv/smin.c: New test.
* testsuite/gcc.target/riscv/sminf-ieee.c: New test.
* testsuite/gcc.target/riscv/sminf.c: New test.
---
Hi,

 I thought some test coverage would be good to have in this area, so I have 
added test cases for the new `fmin'/`fmax' and the existing `smin'/`smax' 
patterns and in the course of making this improvement I have realised the 
latter ones do actually have to be always available, and that using UNSPEC 
operations will be more appropriate for the former ones, especially as the 
respective RTL expressions previously used have non-IEEE semantics:

'(smin:M X Y)'
'(smax:M X Y)'
 Represents the smaller (for 'smin') or larger (for 'smax') of X and
  

Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms

2022-02-01 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Friday, December 17, 2021 4:49 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex mul,
>> fma and fms
>> 
>> Richard Sandiford  writes:
>> > Tamar Christina  writes:
>> >> Hi All,
>> >>
>> >> After the first patch in the series this updates the optabs to expect
>> >> the canonical sequence.
>> >>
>> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >>
>> >> Ok for master? and backport along with the first patch?
>> >>
>> >> Thanks,
>> >> Tamar
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >>   PR tree-optimization/102819
>> >>   PR tree-optimization/103169
>> >>   * config/aarch64/aarch64-simd.md
>> (cml4,
>> >>   cmul3): Use canonical order.
>> >>   * config/aarch64/aarch64-sve.md (cml4,
>> >>   cmul3): Likewise.
>> >>
>> >> --- inline copy of patch --
>> >> diff --git a/gcc/config/aarch64/aarch64-simd.md
>> >> b/gcc/config/aarch64/aarch64-simd.md
>> >> index
>> >>
>> f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9
>> c
>> >> fb5649f9b0e73 100644
>> >> --- a/gcc/config/aarch64/aarch64-simd.md
>> >> +++ b/gcc/config/aarch64/aarch64-simd.md
>> >> @@ -556,17 +556,17 @@ (define_insn
>> "aarch64_fcmlaq_lane"
>> >>  ;; remainder.  Because of this, expand early.
>> >>  (define_expand "cml4"
>> >>[(set (match_operand:VHSDF 0 "register_operand")
>> >> - (plus:VHSDF (match_operand:VHSDF 1 "register_operand")
>> >> - (unspec:VHSDF [(match_operand:VHSDF 2
>> "register_operand")
>> >> -(match_operand:VHSDF 3
>> "register_operand")]
>> >> -FCMLA_OP)))]
>> >> + (plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1
>> "register_operand")
>> >> +(match_operand:VHSDF 2
>> "register_operand")]
>> >> +FCMLA_OP)
>> >> + (match_operand:VHSDF 3 "register_operand")))]
>> >>"TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
>> >>  {
>> >>rtx tmp = gen_reg_rtx (mode);
>> >> -  emit_insn (gen_aarch64_fcmla (tmp, operands[1],
>> >> -  operands[3], operands[2]));
>> >> +  emit_insn (gen_aarch64_fcmla (tmp, operands[3],
>> >> +  operands[1], operands[2]));
>> >>emit_insn (gen_aarch64_fcmla (operands[0], tmp,
>> >> -  operands[3], operands[2]));
>> >> +  operands[1], operands[2]));
>> >>DONE;
>> >>  })
>> >>
>> >> @@ -583,9 +583,9 @@ (define_expand "cmul3"
>> >>rtx tmp = force_reg (mode, CONST0_RTX (mode));
>> >>rtx res1 = gen_reg_rtx (mode);
>> >>emit_insn (gen_aarch64_fcmla (res1, tmp,
>> >> -  operands[2], operands[1]));
>> >> +  operands[1], operands[2]));
>> >>emit_insn (gen_aarch64_fcmla (operands[0], res1,
>> >> -  operands[2], operands[1]));
>> >> +  operands[1], operands[2]));
>> >
>> > This doesn't look right.  Going from the documentation, patch 1 isn't
>> > changing the operand order for CMUL: the conjugated operand (if there
>> > is one) is still operand 2.  The FCMLA sequences use the opposite
>> > order, where the conjugated operand (if there is one) is operand 1.
>> > So I think
>> 
>> I meant “the first multiplication operand” rather than “operand 1” here.
>> 
>> > the reversal here is still needed.
>> >
>> > Same for the multiplication operands in CML* above.
>
> I did actually change the order in patch 1, but didn't update the docs..
> That was done because I followed the SLP order again, but now I've updated
> them to do what the docs say.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master? and backport along with the first patch?

OK, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * config/aarch64/aarch64-simd.md (cml4): Use
>   canonical order.
>   * config/aarch64/aarch64-sve.md (cml4): Likewise.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..9e41610fba85862ef7675bea1e5731b14cab59ce
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -556,17 +556,17 @@ (define_insn "aarch64_fcmlaq_lane"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml4"
>[(set (match_operand:VHSDF 0 "register_operand")
> - (plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> - (unspec:VHSDF [(match_operand:VHSDF 2 "register_operand")
> -  

Re: [PATCH], PR 104253, Fix __ibm128 conversions on IEEE 128-bit system

2022-02-01 Thread Jakub Jelinek via Gcc-patches
On Fri, Jan 28, 2022 at 10:47:06PM -0500, Michael Meissner via Gcc-patches 
wrote:
> 2022-01-28  Michael Meissner  
> 
> gcc/
>   PR target/104253
>   * config/rs6000/rs6000.cc (init_float128_ibm): Use the TF names
>   for builtin conversions between __ibm128 and DImode when long
>   double uses the IEEE 128-bit format.
> 
> gcc/testsuite/
>   PR target/104253
>   * gcc.target/powerpc/pr104253.c: New test.

FYI, this looks good to me, but I think Segher or David should give the
final ack.

Jakub



[PATCH] Fix grammar in C++ header dependency notes

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.

commit d334ee4964c7e186336396b44129197f618100f7
Author: Jonathan Wakely 
Date:   Tue Feb 1 12:37:38 2022 +

Fix grammar in C++ header dependency notes

diff --git a/htdocs/gcc-11/porting_to.html b/htdocs/gcc-11/porting_to.html
index 4d27c163..1d15570e 100644
--- a/htdocs/gcc-11/porting_to.html
+++ b/htdocs/gcc-11/porting_to.html
@@ -93,7 +93,7 @@ GCC 11 now enforces that comparison objects be invocable as 
const.
 
 Header dependency changes
 Some C++ Standard Library headers have been changed to no longer include
-other headers that they do need to depend on.
+other headers that were being used internally by the library.
 As such, C++ programs that used standard library components without
 including the right headers will no longer compile.
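
A short sketch of what the note above means in practice: each standard library component used is paired with the header that declares it, instead of relying on another header pulling it in. The function first_and_last is a hypothetical example written for this purpose, not code taken from the porting guide.

```cpp
#include <algorithm>  // std::copy
#include <array>      // std::array
#include <cassert>    // assert, for callers that want to check results
#include <memory>     // std::unique_ptr
#include <utility>    // std::pair, std::make_pair

// Hypothetical helper: copies the input into a heap-allocated array and
// returns its first and last elements.  Every std:: name used here has
// its declaring header included explicitly above.
std::pair<int, int>
first_and_last (const std::array<int, 3> &in)
{
  std::unique_ptr<std::array<int, 3>> copy (new std::array<int, 3>{});
  std::copy (in.begin (), in.end (), copy->begin ());
  return std::make_pair (copy->front (), copy->back ());
}
```

Code written this way does not depend on which internal includes a particular libstdc++ release happens to use, so it should be unaffected by header cleanups of this kind.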
 


[wwwdocs] Add C++ "porting to" notes

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.

commit 7193a0fb5d1b164fa676beb35df13cc603a6cf67
Author: Jonathan Wakely 
Date:   Tue Feb 1 12:38:59 2022 +

Add C++ "porting to" notes

diff --git a/htdocs/gcc-12/porting_to.html b/htdocs/gcc-12/porting_to.html
index 42179c11..86d99723 100644
--- a/htdocs/gcc-12/porting_to.html
+++ b/htdocs/gcc-12/porting_to.html
@@ -50,9 +50,54 @@ is no longer accepted and you need to add a cast to it like:
   }
 
 
-
+
+Header dependency changes
+Some C++ Standard Library headers have been changed to no longer include
+other headers that were being used internally by the library.
+As such, C++ programs that used standard library components without
+including the right headers will no longer compile.
+
+
+The following headers are used less widely in libstdc++ and may need to
+be included explicitly when compiled with GCC 12:
+
+
+ <memory>
+  (for std::shared_ptr, std::unique_ptr etc.)
+
+ <iterator>
+  (for std::istream_iterator, 
std::istreambuf_iterator)
+
+ <algorithm>
+  (for std::for_each, std::copy etc.)
+
+ <utility>
+  (for std::pair)
+
+ <array>
+  (for std::array)
+
+ <atomic>
+  (for std::atomic)
+
+
+
+C++ Standard Library deprecations
+
+Warnings have been added for use of C++ standard library features that
+are deprecated (or no longer present at all) in recent C++ standards.
+Where possible, the warning suggests a modern replacement for the
+deprecated feature.
+
+
+The std::iterator base class can usually be replaced by defining
+the same necessary typedefs directly in your iterator class.
+The std::unary_function and std::binary_function
+base classes can often be completely removed, or the typedefs for
+result_type and argument types can be defined directly in your
+class.
+
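
The replacement for the std::iterator base class described above might look like the following sketch, in which the five member typedefs are declared directly in the class. CountingIterator is a hypothetical iterator invented for this example; the pattern, not the class, is the point.

```cpp
#include <cassert>      // assert, for callers that want to check results
#include <cstddef>      // std::ptrdiff_t
#include <iterator>     // std::forward_iterator_tag, std::iterator_traits
#include <type_traits>  // std::is_same

// Hypothetical iterator that counts upwards.  The typedefs that used to
// come from the std::iterator base class are spelled out directly.
class CountingIterator
{
public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = int;
  using difference_type = std::ptrdiff_t;
  using pointer = const int *;
  using reference = const int &;

  CountingIterator () : value_ (0) {}
  explicit CountingIterator (int start) : value_ (start) {}

  reference operator* () const { return value_; }
  CountingIterator &operator++ () { ++value_; return *this; }
  CountingIterator operator++ (int)
  {
    CountingIterator old = *this;
    ++value_;
    return old;
  }
  bool operator== (const CountingIterator &other) const
  { return value_ == other.value_; }
  bool operator!= (const CountingIterator &other) const
  { return value_ != other.value_; }

private:
  int value_;
};

// std::iterator_traits picks the typedefs up without any base class.
static_assert (std::is_same<std::iterator_traits<CountingIterator>::value_type,
                            int>::value,
               "value_type is visible through iterator_traits");
```

Generic code such as std::distance (CountingIterator (0), CountingIterator (5)) keeps working, because it only looks at the nested typedefs and never at the base class.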
 
 

[wwwdocs] Document C++20 and C++23 library additions

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.

commit 9f6040a13e7c552a8c1c65352b702dc5d71b369e
Author: Jonathan Wakely 
Date:   Tue Feb 1 12:40:46 2022 +

Document C++20 and C++23 library additions

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index c6baee75..2719b9d5 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -305,6 +305,8 @@ a work-in-progress.
   std::vector, std::basic_string,
   std::optional, and std::variant
   can be used in constexpr functions.
+  std::make_shared for arrays with default initialization,
+  and std::atomic<std::shared_ptr<T>>.
   Layout-compatibility and pointer-interconvertibility traits.
   
 
@@ -315,6 +317,10 @@ a work-in-progress.
   
   std::invoke_r
   std::basic_string::resize_and_overwrite
+  
+  (not built by default, requires linking to an extra library).
+  
+  constexpr std::type_info::operator==
   
 
 


[wwwdocs] Fix grammar for computed goto notes

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.
commit 5f199e1197b67b4304a3be54579c2141c39bff30
Author: Jonathan Wakely 
Date:   Tue Feb 1 12:54:42 2022 +

Fix grammar for computed goto notes

diff --git a/htdocs/gcc-12/porting_to.html b/htdocs/gcc-12/porting_to.html
index 86d99723..7f4a2936 100644
--- a/htdocs/gcc-12/porting_to.html
+++ b/htdocs/gcc-12/porting_to.html
@@ -30,10 +30,10 @@ and provide solutions. Let us know if you have suggestions 
for improvements!
 -->
 C language issues
 
-Computed goto now require a pointer type
+Computed goto now requires a pointer type
 
 
-In GCC 12, computed gotos require a pointer type.
+In GCC 12, a computed goto requires a pointer type.
 An example which was accepted before:
 
   void f(void)
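
The example from the patch is cut off in this archive. As a separate,
hedged illustration (the dispatch function below is hypothetical and
not taken from porting_to.html), this is a computed goto whose operand
already has pointer type. It relies on the GNU "labels as values"
extension, so it is not ISO C++.

```cpp
#include <cassert>

// GNU extension: &&label yields the label's address as a void *.
int dispatch (int op, int x)
{
  // The table holds label addresses, so each entry is a pointer.
  static void *const table[] = { &&do_inc, &&do_double };

  if (op < 0 || op > 1)
    return x;          // unknown opcode: leave x unchanged

  goto *table[op];     // the operand of the computed goto is a pointer

do_inc:
  return x + 1;
do_double:
  return x * 2;
}

// Exercise both jump targets and the fallback path.
void self_test ()
{
  assert (dispatch (0, 5) == 6);
  assert (dispatch (1, 5) == 10);
  assert (dispatch (7, 5) == 5);
}
```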


[wwwdocs] Uncomment header for Fortran issues

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs.

commit 996afe17497b8c06f00c763bcf7fe2b971bb0f05
Author: Jonathan Wakely 
Date:   Tue Feb 1 12:56:07 2022 +

Uncomment header for Fortran issues

diff --git a/htdocs/gcc-12/porting_to.html b/htdocs/gcc-12/porting_to.html
index 7f4a2936..470703c7 100644
--- a/htdocs/gcc-12/porting_to.html
+++ b/htdocs/gcc-12/porting_to.html
@@ -99,9 +99,8 @@ base classes can often be completely removed, or the typedefs 
for
 class.
 
 
-
+
 Argument name for CO_REDUCE
 
 


Re: [PATCH][pushed] d: Fix -Werror=format-diag error.

2022-02-01 Thread Iain Buclaw via Gcc-patches
Excerpts from Martin Liška's message of Januar 31, 2022 9:50 am:
> Pushed as obvious.
> 

Thanks,
Iain.



ifcvt: Fix PR104153 and PR104198

2022-02-01 Thread Robin Dapp via Gcc-patches
Hi,

this is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).

cond_exec_get_condition () returns the jump condition directly and we
now pass it to the backend.  The or1k backend modified the condition in-place
but this modification is not reverted when the sequence in question is
discarded.  Therefore this patch copies the RTX instead of using it
directly.
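
A toy model of the or1k problem, as a sketch only: Condition, try_emit,
attempt_shared and attempt_copied are hypothetical names, and none of
this is GCC's RTL API. It only shows why handing over a copy protects
the caller when the callee rewrites its argument in place and the
attempt is then thrown away.

```cpp
#include <cassert>

// Toy stand-in for a comparison RTX; not GCC's real data structure.
struct Condition
{
  char code;
  int lhs;
  int rhs;
};

// Hypothetical backend hook: canonicalizes COND in place, then reports
// failure, so the caller discards whatever was emitted.
bool try_emit (Condition *cond)
{
  assert (cond != nullptr);
  cond->code = '>';
  int tmp = cond->lhs;
  cond->lhs = cond->rhs;
  cond->rhs = tmp;
  return false;
}

// Passing the original leaks the rewrite to later users of JUMP_COND.
bool attempt_shared (Condition *jump_cond)
{
  return try_emit (jump_cond);
}

// Passing a copy keeps JUMP_COND intact even though the attempt fails.
bool attempt_copied (const Condition *jump_cond)
{
  Condition copy = *jump_cond;
  return try_emit (&copy);
}
```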

The SPARC problem is due to the backend recreating the initial condition
when being passed a CC comparison.  This causes the sequence
to read from an already overwritten condition operand.  Generally, this
could also happen on other targets.  The workaround is to always first
emit to a temporary.  In a second run of noce_convert_multiple_sets_1 we
know which sequences actually require the comparison and use no
temporaries if all sequences after the current one do not require it.


Before, I used reg_overlap_mentioned_p () to check the generated
instructions against the condition.  The problem with this is that
reg_overlap... only handles a set of rtx_codes while a backend can
theoretically emit everything in an expander.  Is reg_mentioned_p () the
"right thing" to do?  Maybe it is overly conservative but as soon as we
have more than let's say three insns, we are unlikely to succeed anyway.

Bootstrapped and reg-tested on s390x, Power 9, x86 and SPARC.

Regards
 Robin

--

PR 104198
PR 104153

gcc/ChangeLog:

* ifcvt.cc (noce_convert_multiple_sets_1): Copy rtx instead of
using it directly.  Rework comparison handling and always perform
a second pass.

gcc/testsuite/ChangeLog:

* gcc.dg/pr104198.c: New test.

commit 68489d5729b4879bf2df540753fc7ea8ba1565a5
Author: Robin Dapp 
Date:   Mon Jan 24 10:28:05 2022 +0100

ifcvt: Fix PR104153 and PR104198.

This is a bugfix for aa8cfe785953a0e87d2472311e1260cd98c605c0 which
broke an or1k test case (PR104153) as well as SPARC bootstrap (PR104198).

cond_exec_get_condition () returns the jump condition directly and we now
pass it to the backend.  The or1k backend modified the condition in-place
(other backends do that as well) but this modification is not reverted
when the sequence in question is discarded.  Therefore we copy the RTX
instead of using it directly.

The SPARC problem is due to the SPARC backend recreating the initial
condition when being passed a CC comparison.  This causes the sequence
to read from an already overwritten condition operand.  Generally, this
could also happen on other targets.  The workaround is to always first
emit to a temporary.  In a second run of noce_convert_multiple_sets_1 we know
which sequences actually require the comparison and will use no
temporaries if all sequences after the current one do not require it.

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index fe250d508e1..92c2b40a45a 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -3391,7 +3391,11 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
   rtx cond = noce_get_condition (jump, &cond_earliest, false);
 
   rtx cc_cmp = cond_exec_get_condition (jump);
+  if (cc_cmp)
+cc_cmp = copy_rtx (cc_cmp);
   rtx rev_cc_cmp = cond_exec_get_condition (jump, /* get_reversed */ true);
+  if (rev_cc_cmp)
+rev_cc_cmp = copy_rtx (rev_cc_cmp);
 
   rtx_insn *insn;
   int count = 0;
@@ -3515,6 +3519,7 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
   unsigned cost1 = 0, cost2 = 0;
   rtx_insn *seq, *seq1, *seq2;
   rtx temp_dest = NULL_RTX, temp_dest1 = NULL_RTX, temp_dest2 = NULL_RTX;
+  bool read_comparison = false;
 
   seq1 = try_emit_cmove_seq (if_info, temp, cond,
  new_val, old_val, need_cmov,
@@ -3524,10 +3529,38 @@ noce_convert_multiple_sets_1 (struct noce_if_info *if_info,
 	 as well.  This allows the backend to emit a cmov directly without
 	 creating an additional compare for each.  If successful, costing
 	 is easier and this sequence is usually preferred.  */
-  seq2 = try_emit_cmove_seq (if_info, target, cond,
+  seq2 = try_emit_cmove_seq (if_info, temp, cond,
  new_val, old_val, need_cmov,
  &cost2, &temp_dest2, cc_cmp, rev_cc_cmp);
 
+  /* The backend might have created a sequence that uses the
+	 condition.  Check this.  */
+  rtx_insn *walk = seq2;
+  while (walk)
+	{
+	  rtx set = single_set (walk);
+
+	  if (!set || !SET_SRC (set)) {
+	  walk = NEXT_INSN (walk);
+	  continue;
+	  }
+
+	  rtx src = SET_SRC (set);
+
+	  if (XEXP (set, 1) && GET_CODE (XEXP (set, 1)) == IF_THEN_ELSE)
+	;
+	  else
+	{
+	  if (reg_mentioned_p (XEXP (cond, 0), src)
+		  || reg_mentioned_p (XEXP (cond, 1), src))
+		{
+		  read_comparison = true;
+		  break;
+		}
+	}
+	  walk = NEXT_INSN (walk);
+	}
+
   /* Check which version is less expensive.  */
   if (seq1 != NULL_RTX && (cost1 <= cost2 || seq2 == NULL_RTX))
 	{
@@ -3540,6 +3573,8 @@ noce_conve

[PATCH v2 1/8] rs6000: More factoring of overload processing

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi,

I've modified the previous patch to add more explanatory commentary about
the number-of-arguments test that was previously confusing, and to convert
the switch into an if-then-else chain.  The rest of the patch is unchanged.
Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Remainder of commit message follows:

This patch continues the refactoring started with r12-6014.  I had previously
noted that the resolve_vec* routines can be further simplified by processing
the argument list earlier, so that all routines can use the arrays of arguments
and types.  I found that this was useful for some of the routines, but not for
all of them.

For several of the special-cased overloads, we don't specify all of the
possible type combinations in rs6000-overload.def, because the types don't
matter for the expansion we do.  For these, we can't use generic error message
handling when the number of arguments is incorrect, because the result is
misleading error messages that indicate argument types are wrong.

So this patch goes halfway and improves the factoring on the remaining special
cases, but leaves vec_splats, vec_promote, vec_extract, vec_insert, and
vec_step alone.

Thanks!
Bill


2022-01-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
parameters instead of arglist and nargs.  Simplify accordingly.  Remove
unnecessary test for argument count mismatch.
(resolve_vec_cmpne): Likewise.
(resolve_vec_adde_sube): Likewise.
(resolve_vec_addec_subec): Likewise.
(altivec_resolve_overloaded_builtin): Move overload special handling
after the gathering of arguments into args[] and types[] and the test
for correct number of arguments.  Don't perform the test for correct
number of arguments for certain special cases.  Call the other special
cases with args and types instead of arglist and nargs.
---
 gcc/config/rs6000/rs6000-c.cc | 297 ++
 1 file changed, 120 insertions(+), 177 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 145421ab8f2..4911e5f509c 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -939,37 +939,25 @@ altivec_build_resolved_builtin (tree *args, int n, tree 
fntype, tree ret_type,
 enum resolution { unresolved, resolved, resolved_bad };
 
 /* Resolve an overloaded vec_mul call and return a tree expression for the
-   resolved call if successful.  NARGS is the number of arguments to the call.
-   ARGLIST contains the arguments.  RES must be set to indicate the status of
+   resolved call if successful.  ARGS contains the arguments to the call.
+   TYPES contains their types.  RES must be set to indicate the status of
the resolution attempt.  LOC contains statement location information.  */
 
 static tree
-resolve_vec_mul (resolution *res, vec *arglist, unsigned nargs,
-location_t loc)
+resolve_vec_mul (resolution *res, tree *args, tree *types, location_t loc)
 {
   /* vec_mul needs to be special cased because there are no instructions for it
  for the {un}signed char, {un}signed short, and {un}signed int types.  */
-  if (nargs != 2)
-{
-  error ("builtin %qs only accepts 2 arguments", "vec_mul");
-  *res = resolved;
-  return error_mark_node;
-}
-
-  tree arg0 = (*arglist)[0];
-  tree arg0_type = TREE_TYPE (arg0);
-  tree arg1 = (*arglist)[1];
-  tree arg1_type = TREE_TYPE (arg1);
 
   /* Both arguments must be vectors and the types must be compatible.  */
-  if (TREE_CODE (arg0_type) != VECTOR_TYPE
-  || !lang_hooks.types_compatible_p (arg0_type, arg1_type))
+  if (TREE_CODE (types[0]) != VECTOR_TYPE
+  || !lang_hooks.types_compatible_p (types[0], types[1]))
 {
   *res = resolved_bad;
   return error_mark_node;
 }
 
-  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+  switch (TYPE_MODE (TREE_TYPE (types[0])))
 {
 case E_QImode:
 case E_HImode:
@@ -978,21 +966,21 @@ resolve_vec_mul (resolution *res, vec 
*arglist, unsigned nargs,
 case E_TImode:
   /* For scalar types just use a multiply expression.  */
   *res = resolved;
-  return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
- fold_convert (TREE_TYPE (arg0), arg1));
+  return fold_build2_loc (loc, MULT_EXPR, types[0], args[0],
+ fold_convert (types[0], args[1]));
 case E_SFmode:
   {
/* For floats use the xvmulsp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVMULSP];
-   return build_call_expr (call, 2, arg0, arg1);
+   return build_call_expr (call, 2, args[0], args[1]);
   }
 case E_DFmode:
   {
/* For doubles use the xvmuldp instruction directly.  */
*res = resolved;
tree call = rs6000_builtin_decls[RS6000_BIF_XVM

[PATCH v2 3/8] rs6000: Unify error messages for built-in constant restrictions

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi!

As discussed, I simplified this patch by just changing how the error
message is produced:

We currently give different error messages for built-in functions that
violate range restrictions on their arguments, depending on whether we
record them as requiring an n-bit literal or a literal between two values.
It's better to be consistent.  Change the error message for the n-bit
literal to look like the other one.

Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

Thanks!
Bill


2022-01-31  Bill Schmidt  

gcc/
* config/rs6000/rs6000-call.cc (rs6000_expand_builtin): Revise
error message for RES_BITS case.

gcc/testsuite/
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-10.c:
Adjust error messages.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-2.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-3.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-4.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-5.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-data-class-9.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-4.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-5.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-6.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/bfp/vec-test-data-class-7.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-12.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-14.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-17.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-19.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-22.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-24.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-27.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-29.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-32.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-34.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-37.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-39.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-4.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-42.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-44.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-47.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-49.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-52.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-54.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-57.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-59.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-62.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-64.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-67.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-69.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-7.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-72.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-74.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-77.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-79.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/dfp/dtstsfi-9.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-1.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-2.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-3.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr80315-4.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr82015.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/pr91903.c: Likewise.
* gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c:
Likewise.
* gcc/testsuite/gcc.target/powerpc/vec-ternarylogic-10.c: Likewise.
---
 gcc/config/rs6000/rs6000-call.cc  |  6 +-
 .../powerpc/bfp/scalar-test-data-class-10.c   |  2 +-
 .../powerpc/bfp/scalar-test-data-class-2.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-3.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-4.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-5.c|  2 +-
 .../powerpc/bfp/scalar-test-data-class-9.c|  2 +-
 .../powerpc/bfp/vec-test-data-class-4.c   |  2 +-
 .../powerpc/bfp/vec-test-data-class-5.c   |  2 +-
 .../powerpc/bfp/vec-test-data-class-6.c   |  2 +-
 .../powerpc/

Re: [PATCH] tree-optimization/94899: Remove "+ 0x80000000" in int comparisons

2022-02-01 Thread Arjun Shankar via Gcc-patches
> +/* As a special case, X + C < Y + C is the same as X < Y even with wrapping
> +   overflow if X and Y are signed integers of the same size, and C is an
> +   unsigned constant with all bits except MSB set to 0 and size >= that of
> +   X/Y.  */
> +(for op (lt le ge gt)
> + (simplify
> +  (op (plus:c (convert@0 @1) @4) (plus:c (convert@2 @3) @4))
> +  (if (CONSTANT_CLASS_P (@4)
> +   && TYPE_UNSIGNED (TREE_TYPE (@4))
>
> why include (convert ..) here?  It looks like you could do without,
> merging the special case with the preceding pattern and let a followup
> pattern simplify (lt (convert @1) (convert @2)) instead?

Thanks for taking a look at this patch.

It looks like the convert and plus need to be taken into account
together when applying this simplification.

1. 0x80000000 is *just* large enough to be interpreted as an unsigned.

2. So, an expression like...

x + 0x80000000 < y + 0x80000000;

...where x and y are signed actually gets interpreted as:

(unsigned) x + 0x80000000 < (unsigned) y + 0x80000000

3. Now, adding 0x80000000 to (unsigned) INT_MIN gives us 0,
and adding it to (unsigned) INT_MAX gives us UINT_MAX.

4. So, if x < y is true when they are compared as signed integers, then...
(unsigned) x + 0x80000000 < (unsigned) y + 0x80000000
...will also be true.

5. i.e. the unsigned comparison must be replaced by a signed
comparison when we remove the constant, and so the constant and
convert need to be matched and removed together.
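
The argument above can be spot-checked with a small program. This is a
sketch that assumes a 32-bit int; less_signed, less_biased and
check_samples are illustrative names, not part of the patch or of
match.pd.

```cpp
#include <cassert>
#include <climits>

static_assert (sizeof (int) == 4, "the sketch assumes a 32-bit int");

// The signed comparison that the simplification wants to produce.
bool less_signed (int x, int y)
{
  return x < y;
}

// The biased form.  Both operands are converted to unsigned, so the
// additions wrap modulo 2^32 instead of overflowing.
bool less_biased (int x, int y)
{
  return (unsigned) x + 0x80000000u < (unsigned) y + 0x80000000u;
}

// Compare both forms on every pair of a few boundary values.
void check_samples ()
{
  const int samples[] = { INT_MIN, INT_MIN + 1, -1, 0, 1,
                          INT_MAX - 1, INT_MAX };
  for (int x : samples)
    for (int y : samples)
      assert (less_signed (x, y) == less_biased (x, y));
}
```

The bias maps INT_MIN to 0 and INT_MAX to UINT_MAX, so the unsigned
order of the biased values matches the signed order of the originals.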



[PATCH] rs6000: Fix up PCH on powerpc* [PR104323]

2022-02-01 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR and as can be seen on:
--- gcc/testsuite/gcc.dg/pch/pr104323-1.c.jj2022-02-01 13:06:00.163192414 
+0100
+++ gcc/testsuite/gcc.dg/pch/pr104323-1.c   2022-02-01 13:13:41.226712735 
+0100
@@ -0,0 +1,16 @@
+/* PR target/104323 */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include "pr104323-1.h"
+
+__vector int a1 = { 100, 200, 300, 400 };
+__vector int a2 = { 500, 600, 700, 800 };
+__vector int r;
+
+int
+main ()
+{
+  r = vec_add (a1, a2);
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pch/pr104323-1.hs.jj   2022-02-01 13:06:03.180149978 
+0100
+++ gcc/testsuite/gcc.dg/pch/pr104323-1.hs  2022-02-01 13:12:30.175706620 
+0100
@@ -0,0 +1,5 @@
+/* PR target/104323 */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec" } */
+
+#include <altivec.h>
testcase which I'm not including into testsuite because for some reason
the test fails on non-powerpc* targets (is done even on those and fails
because of missing altivec.h etc.), PCH is broken on powerpc*-*-* since the
new builtin generator has been introduced.
The generator contains or emits comments like:
  /*  Cannot mark this as a GC root because only pointer types can
 be marked as GTY((user)) and be GC roots.  All trees in here are
 kept alive by other globals, so not a big deal.  Alternatively,
 we could change the enum fields to ints and cast them in and out
 to avoid requiring a GTY((user)) designation, but that seems
 unnecessarily gross.  */
Having the fntypes stored in other GC roots can work fine for GC,
ggc_collect will then always mark them and so they won't disappear from
the tables, but it definitely doesn't work for PCH, which when the
arrays with fntype members aren't GTY marked means on PCH write we create
copies of those FUNCTION_TYPEs and store in *.gch that the GC roots should
be updated, but don't store that rs6000_builtin_info[?].fntype etc. should
be updated.  When PCH is read again, the blob is read at some other address,
GC roots are updated, rs6000_builtin_info[?].fntype contains garbage
pointers (GC freed pointers with random data, or random unrelated types or
other trees).
The following patch fixes that.  It stops any user markings because that
is totally unnecessary, just skips fields we don't need to mark and adds
GTY(()) to the 2 array variables.  We can get rid of all those global
vars for the fn types, they can be now automatic vars.
With the patch we get
  {
&rs6000_instance_info[0].fntype,
1 * (RS6000_INST_MAX),
sizeof (rs6000_instance_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
  {
&rs6000_builtin_info[0].fntype,
1 * (RS6000_BIF_MAX),
sizeof (rs6000_builtin_info[0]),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
as the new roots which is exactly what we want and significantly more
compact than countless
  {
&uv2di_ftype_pudi_usi,
1,
sizeof (uv2di_ftype_pudi_usi),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
  {
&uv2di_ftype_lg_puv2di,
1,
sizeof (uv2di_ftype_lg_puv2di),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
  {
&uv2di_ftype_lg_pudi,
1,
sizeof (uv2di_ftype_lg_pudi),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
  {
&uv2di_ftype_di_puv2di,
1,
sizeof (uv2di_ftype_di_puv2di),
&gt_ggc_mx_tree_node,
&gt_pch_nx_tree_node
  },
cases (822 of these instead of just those 4 shown).

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2022-02-01  Jakub Jelinek  

PR target/104323
* config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h
rather than $(srcdir)/config/rs6000/rs6000-builtins.def.
* config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use
GTY((user)) for struct bifdata and struct ovlddata.  Instead add
GTY((skip(""))) to members with pointer and enum types that don't need
to be tracked.  Add GTY(()) to rs6000_builtin_info and 
rs6000_instance_info
declarations.  Don't emit gt_ggc_mx and gt_pch_nx declarations.
(write_extern_fntype, write_fntype): Remove.
(write_fntype_init): Emit the fntype vars as automatic vars instead
of file scope ones.
(write_header_file): Don't iterate with write_extern_fntype.
(write_init_file): Don't iterate with write_fntype.  Don't emit
gt_ggc_mx and gt_pch_nx definitions.

--- gcc/config/rs6000/t-rs6000.jj   2022-01-18 11:58:59.245986871 +0100
+++ gcc/config/rs6000/t-rs6000  2022-02-01 12:25:32.945144092 +0100
@@ -21,7 +21,7 @@
 TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
 TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
 PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
-EXTRA_GTYPE_DEPS += $(srcdir)/config/rs6000/rs6000-builtins.def
+EXTRA_GTYPE_DEPS += rs6000-builtins.h
 
 rs6000-pcrel-opt.o: $(srcdir)/config/rs6000/rs6000-pcrel-opt.cc
$(COMPILE) $<
--- gcc/config/rs6000/rs6000-gen-builtins

[PATCH] libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens

2022-02-01 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 01, 2022 at 10:03:57AM +0100, Jakub Jelinek via Gcc-patches wrote:
> I wonder if we shouldn't replace that
>   toks[0] = pfile->directive_result;
> line with
>   toks[0] = pfile->avoid_paste;
> or even replace those
>   toks = XNEW (cpp_token);
>   toks[0] = pfile->directive_result;
> lines with
>   toks = &pfile->avoid_paste;

Here is a patch that does that, bootstrapped/regtested on powerpc64le-linux,
ok for trunk?

2022-02-01  Jakub Jelinek  

* directives.cc (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.

--- libcpp/directives.cc.jj 2022-01-18 11:59:00.257972414 +0100
+++ libcpp/directives.cc 2022-02-01 13:39:27.240114485 +0100
@@ -1954,8 +1954,7 @@ destringize_and_run (cpp_reader *pfile,
   else
 {
   count = 1;
-  toks = XNEW (cpp_token);
-  toks[0] = pfile->directive_result;
+  toks = &pfile->avoid_paste;
 
   /* If we handled the entire pragma internally, make sure we get the
 line number correct for the next token.  */


Jakub



[PATCH] docs: remove --disable-stage1-checking from requirements

2022-02-01 Thread Martin Liška

As the minimal GCC version that can build the current master
is 4.8, it does not make sense mentioning something for older
versions.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

* doc/install.texi: Remove option for GCC < 4.8.
---
 gcc/doc/install.texi | 5 -
 1 file changed, 5 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 96b4dfc871a..f4e7dc42198 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -239,11 +239,6 @@ To build all languages in a cross-compiler or other 
configuration where
 GCC binary (version 4.8 or later) because source code for language
 frontends other than C might use GCC extensions.
 
-Note that to bootstrap GCC with versions of GCC earlier than 4.8, you
-may need to use @option{--disable-stage1-checking}, though
-bootstrapping the compiler with such earlier compilers is strongly
-discouraged.
-
 @item C standard library and headers
 
 In order to build GCC, the C standard library and headers must be present

--
2.34.1



Re: [PATCH] docs: remove --disable-stage1-checking from requirements

2022-02-01 Thread Jeff Law via Gcc-patches




On 2/1/2022 8:37 AM, Martin Liška wrote:
> As the minimal GCC version that can build the current master
> is 4.8, it does not make sense mentioning something for older
> versions.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * doc/install.texi: Remove option for GCC < 4.8.

OK
jeff



Re: [PATCH][V2] Add mold detection for libs.

2022-02-01 Thread Jeff Law via Gcc-patches




On 1/31/2022 1:46 AM, Martin Liška wrote:
> On 1/28/22 23:01, Jonathan Wakely wrote:
>> On Fri, 28 Jan 2022 at 18:17, Jeff Law wrote:
>>> On 1/24/2022 4:11 AM, Martin Liška wrote:
>>>> On 1/21/22 17:54, Jonathan Wakely wrote:
>>>>> Yes, OK (but please CC the libstdc++ list, not just me).
>>>>
>>>> Hello.
>>>>
>>>> Sorry for that. Anyway, I would like to install the extended version
>>>> of the patch that touches all libraries.
>>>>
>>>> Ready to be installed?
>>>
>>> It looks to me like Jon ack'd in his original reply.  "Yes, OK ..."
>>
>> Yes the libstdc++ part is still OK. I can't approve the equivalent
>> changes for the other libs.
>
> Hi.
>
> I understand Jeff's OK as an approval also for other libs.

Just to be explicit.  It is ;-)

jeff


[PATCH] gcc-12: Mention -mharden-sls= and -mindirect-branch-cs-prefix

2022-02-01 Thread H.J. Lu via Gcc-patches
---
 htdocs/gcc-12/changes.html | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 2719b9d5..479bd6c5 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -387,6 +387,12 @@ a work-in-progress.
   x86 systems with SSE2 enabled. Without {-mavx512fp16},
   all operations will be emulated in software and float
   instructions.
+  Mitigation against straight line speculation (SLS) for function
+  return and indirect jump is supported via
+  -mharden-sls=[none|all|return|indirect-jmp].
+  
+  Add CS prefix to call and jmp to indirect thunk with branch target
+  in r8-r15 registers via -mindirect-branch-cs-prefix.
   
 
 
-- 
2.34.1



Re: [GCC 11 PATCH 0/5] x86: Backport straight-line-speculation mitigation

2022-02-01 Thread H.J. Lu via Gcc-patches
On Mon, Jan 31, 2022 at 11:21 PM Richard Biener
 wrote:
>
> On Mon, Jan 31, 2022 at 7:56 PM H.J. Lu via Gcc-patches
>  wrote:
> >
> > Backport -mindirect-branch-cs-prefix:
>
> LGTM in case a x86 maintainer also acks this.  Can you amend
> the 10.3 release gcc-11/changes.html notes accordingly?

Did you mean 11.3?

Here is the patch for gcc-12/changes.html:

https://gcc.gnu.org/pipermail/gcc-patches/2022-February/589600.html

> Thanks,
> Richard.
>
> > commit 48a4ae26c225eb018ecb59f131e2c4fd4f3cf89a
> > Author: H.J. Lu 
> > Date:   Wed Oct 27 06:27:15 2021 -0700
> >
> > x86: Add -mindirect-branch-cs-prefix
> >
> > Add -mindirect-branch-cs-prefix to add CS prefix to call and jmp to
> > indirect thunk with branch target in r8-r15 registers so that the call
> > and jmp instruction length is 6 bytes to allow them to be replaced with
> > "lfence; call *%r8-r15" or "lfence; jmp *%r8-r15" at run-time.
> >
> > commit 63738e176726d31953deb03f7e32cf8b760735ac
> > Author: H.J. Lu 
> > Date:   Wed Oct 27 07:48:54 2021 -0700
> >
> > x86: Add -mharden-sls=[none|all|return|indirect-branch]
> >
> > Add -mharden-sls= to mitigate against straight line speculation (SLS)
> > for function return and indirect branch by adding an INT3 instruction
> > after function return and indirect branch.
> >
> > and followup commits to support Linux kernel commits:
> >
> > commit e463a09af2f0677b9485a7e8e4e70b396b2ffb6f
> > Author: Peter Zijlstra 
> > Date:   Sat Dec 4 14:43:44 2021 +0100
> >
> > x86: Add straight-line-speculation mitigation
> >
> > commit 68cf4f2a72ef8786e6b7af6fd9a89f27ac0f520d
> > Author: Peter Zijlstra 
> > Date:   Fri Nov 19 17:50:25 2021 +0100
> >
> > x86: Use -mindirect-branch-cs-prefix for RETPOLINE builds
> >
> > H.J. Lu (5):
> >   x86: Remove "%!" before ret
> >   x86: Add -mharden-sls=[none|all|return|indirect-branch]
> >   x86: Add -mindirect-branch-cs-prefix
> >   x86: Rename -harden-sls=indirect-branch to -harden-sls=indirect-jmp
> >   x86: Generate INT3 for __builtin_eh_return
> >
> >  gcc/config/i386/i386-opts.h   |  7 
> >  gcc/config/i386/i386.c| 38 +--
> >  gcc/config/i386/i386.md   |  2 +-
> >  gcc/config/i386/i386.opt  | 24 
> >  gcc/doc/invoke.texi   | 18 -
> >  gcc/testsuite/gcc.target/i386/harden-sls-1.c  | 14 +++
> >  gcc/testsuite/gcc.target/i386/harden-sls-2.c  | 14 +++
> >  gcc/testsuite/gcc.target/i386/harden-sls-3.c  | 14 +++
> >  gcc/testsuite/gcc.target/i386/harden-sls-4.c  | 16 
> >  gcc/testsuite/gcc.target/i386/harden-sls-5.c  | 17 +
> >  gcc/testsuite/gcc.target/i386/harden-sls-6.c  | 18 +
> >  .../i386/indirect-thunk-cs-prefix-1.c | 14 +++
> >  .../i386/indirect-thunk-cs-prefix-2.c | 15 
> >  13 files changed, 198 insertions(+), 13 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-4.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-5.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/harden-sls-6.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-1.c
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/indirect-thunk-cs-prefix-2.c
> >
> > --
> > 2.34.1
> >



-- 
H.J.


Re: [PATCH] Strengthen memory memory order for atomic::wait/notify

2022-02-01 Thread Thomas Rodgers via Gcc-patches
Tested x86_64-pc-linux-gnu, committed to master.
Backported to GCC-11.

On Sun, Jan 16, 2022 at 12:31 PM Jonathan Wakely  wrote:

>
>
> On Sun, 16 Jan 2022 at 01:48, Thomas Rodgers via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
>
>> This patch updates the memory order of atomic accesses to the waiter's
>> count to match libc++'s usage. It should be backported to GCC11.
>>
>
> The commit subject line says "memory memory order", OK for trunk and
> gcc-11 with that fixed.
>
>
>> Tested x86_64-pc-linux-gnu.
>>
>


[committed][nvptx] Fix reduction lock

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  \
  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  \
  execution test
...

The problem is in this code generated for a gang reduction:
...
$L39:
atom.global.cas.b32 %r59, [__reduction_lock], 0, 1;
setp.ne.u32 %r116, %r59, 0;
@%r116  bra $L39;
ld.f64  %r60, [%r44];
ld.f64  %r61, [%r44+8];
ld.f64  %r64, [%r44];
ld.f64  %r65, [%r44+8];
add.f64 %r117, %r64, %r22;
add.f64 %r118, %r65, %r41;
st.f64  [%r44], %r117;
st.f64  [%r44+8], %r118;
atom.global.cas.b32 %r119, [__reduction_lock], 1, 0;
...
which is taking and releasing a lock, but missing the appropriate barriers to
protect the loads and stores inside the lock.

Fix this by adding membar.gl barriers.

Likewise, add membar.cta barriers if we protect shared memory loads and
stores (even though the worker-partitioning part of the test-case is not
failing).
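
As a host-side analogy only (this is C++ with std::atomic, not PTX and
not the code the patch generates), the shape of the fix looks roughly
like the sketch below. The names reduction_lock and locked_add are
hypothetical, and placing one fence after taking the lock and one
before releasing it is an assumption about where the barriers belong.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for __reduction_lock: 0 = free, 1 = taken.
static std::atomic<unsigned> reduction_lock{0};

// Add VALUE into *ACCUM while holding the lock.
void locked_add (double *accum, double value)
{
  assert (accum != nullptr);

  // Spin until the compare-and-swap changes 0 to 1.  The CAS itself is
  // relaxed, mirroring a bare atom.cas with no ordering of its own.
  unsigned expected = 0;
  while (!reduction_lock.compare_exchange_weak (expected, 1u,
                                                std::memory_order_relaxed))
    expected = 0;

  // Barrier after taking the lock, so the load below cannot be
  // satisfied before the lock is held.
  std::atomic_thread_fence (std::memory_order_seq_cst);

  *accum += value;

  // Barrier before releasing the lock, so the store above is visible
  // before another thread can take the lock.
  std::atomic_thread_fence (std::memory_order_seq_cst);
  reduction_lock.store (0u, std::memory_order_relaxed);
}
```

Without the two fences, the relaxed operations on the lock word give no
ordering guarantee for the plain load and store of *accum.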

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Fix reduction lock

gcc/ChangeLog:

2022-01-27  Tom de Vries  

* config/nvptx/nvptx.cc (enum nvptx_builtins): Add
NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA.
(VOID): New macro.
(nvptx_init_builtins): Add MEMBAR_GL and MEMBAR_CTA.
(nvptx_expand_builtin): Handle NVPTX_BUILTIN_MEMBAR_GL and
NVPTX_BUILTIN_MEMBAR_CTA.
(nvptx_lockfull_update): Add level parameter.  Emit barriers.
(nvptx_reduction_update, nvptx_goacc_reduction_fini): Update call to
nvptx_lockfull_update.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_MEMBAR_GL.
(define_expand "nvptx_membar_gl"): New expand.
(define_insn "*nvptx_membar_gl"): New insn.

---
 gcc/config/nvptx/nvptx.cc | 37 -
 gcc/config/nvptx/nvptx.md | 17 +
 2 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index db6a405d623..ceea4d3a093 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5622,6 +5622,8 @@ enum nvptx_builtins
   NVPTX_BUILTIN_VECTOR_ADDR,
   NVPTX_BUILTIN_CMP_SWAP,
   NVPTX_BUILTIN_CMP_SWAPLL,
+  NVPTX_BUILTIN_MEMBAR_GL,
+  NVPTX_BUILTIN_MEMBAR_CTA,
   NVPTX_BUILTIN_MAX
 };
 
@@ -5652,6 +5654,7 @@ nvptx_init_builtins (void)
 #define UINT unsigned_type_node
 #define LLUINT long_long_unsigned_type_node
 #define PTRVOID ptr_type_node
+#define VOID void_type_node
 
   DEF (SHUFFLE, "shuffle", (UINT, UINT, UINT, UINT, NULL_TREE));
   DEF (SHUFFLELL, "shufflell", (LLUINT, LLUINT, UINT, UINT, NULL_TREE));
@@ -5661,6 +5664,8 @@ nvptx_init_builtins (void)
(PTRVOID, ST, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
+  DEF (MEMBAR_GL, "membar_gl", (VOID, VOID, NULL_TREE));
+  DEF (MEMBAR_CTA, "membar_cta", (VOID, VOID, NULL_TREE));
 
 #undef DEF
 #undef ST
@@ -5696,6 +5701,14 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 case NVPTX_BUILTIN_CMP_SWAPLL:
   return nvptx_expand_cmp_swap (exp, target, mode, ignore);
 
+case NVPTX_BUILTIN_MEMBAR_GL:
+  emit_insn (gen_nvptx_membar_gl ());
+  return NULL_RTX;
+
+case NVPTX_BUILTIN_MEMBAR_CTA:
+  emit_insn (gen_nvptx_membar_cta ());
+  return NULL_RTX;
+
 default: gcc_unreachable ();
 }
 }
@@ -6243,7 +6256,7 @@ nvptx_lockless_update (location_t loc, gimple_stmt_iterator *gsi,
 
 static tree
 nvptx_lockfull_update (location_t loc, gimple_stmt_iterator *gsi,
-  tree ptr, tree var, tree_code op)
+  tree ptr, tree var, tree_code op, int level)
 {
   tree var_type = TREE_TYPE (var);
   tree swap_fn = nvptx_builtin_decl (NVPTX_BUILTIN_CMP_SWAP, true);
@@ -6295,8 +6308,17 @@ nvptx_lockfull_update (location_t loc, gimple_stmt_iterator *gsi,
   lock_loop->any_estimate = true;
   add_loop (lock_loop, entry_bb->loop_father);
 
-  /* Build and insert the reduction calculation.  */
+  /* Build the pre-barrier.  */
   gimple_seq red_seq = NULL;
+  enum nvptx_builtins barrier_builtin
+= (level == GOMP_DIM_GANG
+   ? NVPTX_BUILTIN_MEMBAR_GL
+   : NVPTX_BUILTIN_MEMBAR_CTA);
+  tree barrier_fn = nvptx_builtin_decl (barrier_builtin, true);
+  tree barrier_expr = build_call_expr_loc (loc, barrier_fn, 0);
+  gimplify_stmt (&barrier_expr, &red_seq);
+
+  /* Build 

[committed][nvptx] Add some support for .local atomics

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

The ptx insn atom doesn't support local memory.  In case of doing an atomic
operation on local memory, we run into:
...
operation not supported on global/shared address space
...
This is the cuGetErrorString message for CUDA_ERROR_INVALID_ADDRESS_SPACE.

The message is somewhat confusing given that actually the operation is not
supported on local address space.

Fix this by falling back on a non-atomic version when detecting
a frame-related memory operand.

This only solves some cases that are detected at compile-time.  It does
however fix the openacc private-atomic-* test-cases.
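
For reference, the semantics of the non-atomic fallbacks can be sketched in
plain C++ as below.  This is only an illustration: the function names are
hypothetical, and the sketch relies on the operand living in storage that
only one thread can access (as is the case for .local), so a plain
load/modify/store sequence is enough.

```cpp
#include <cstdint>

// Non-atomic compare-and-swap: ld, setp.eq, predicated st, mov.
std::uint32_t local_compare_and_swap(std::uint32_t *addr,
                                     std::uint32_t expected,
                                     std::uint32_t desired)
{
  std::uint32_t val = *addr;        // ld   %val, [addr]
  bool eq_p = (val == expected);    // setp.eq %eq_p, %val, expected
  if (eq_p)
    *addr = desired;                // @%eq_p st [addr], desired
  return val;                       // mov  result, %val
}

// Non-atomic exchange: ld, st, mov.
std::uint32_t local_exchange(std::uint32_t *addr, std::uint32_t value)
{
  std::uint32_t val = *addr;
  *addr = value;
  return val;
}

// Non-atomic fetch-and-add: ld, add, st, mov.
std::uint32_t local_fetch_add(std::uint32_t *addr, std::uint32_t value)
{
  std::uint32_t val = *addr;
  std::uint32_t update = val + value;
  *addr = update;
  return val;
}
```

Each helper returns the old value, matching what the corresponding atom insn
would have produced.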

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add some support for .local atomics

gcc/ChangeLog:

2022-01-27  Tom de Vries  

* config/nvptx/nvptx.md (define_insn "atomic_compare_and_swap_1")
(define_insn "atomic_exchange")
(define_insn "atomic_fetch_add")
(define_insn "atomic_fetch_addsf")
(define_insn "atomic_fetch_"): Output non-atomic version
if memory operands is frame-relative.

gcc/testsuite/ChangeLog:

2022-01-31  Tom de Vries  

* gcc.target/nvptx/stack-atomics-run.c: New test.

libgomp/ChangeLog:

2022-01-27  Tom de Vries  

* testsuite/libgomp.oacc-c-c++-common/private-atomic-1.c: Remove
PR83812 workaround.
* testsuite/libgomp.oacc-fortran/private-atomic-1-vector.f90: Same.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90: Same.

---
 gcc/config/nvptx/nvptx.md  | 82 +-
 gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c | 44 
 .../libgomp.oacc-c-c++-common/private-atomic-1.c   |  7 --
 .../private-atomic-1-vector.f90|  7 --
 .../private-atomic-1-worker.f90|  7 --
 5 files changed, 124 insertions(+), 23 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 773ae8fdc6f..9cbbd956f9d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1790,11 +1790,28 @@ (define_insn "atomic_compare_and_swap_1"
(unspec_volatile:SDIM [(const_int 0)] UNSPECV_CAS))]
   ""
   {
+struct address_info info;
+decompose_mem_address (&info, operands[1]);
+if (info.base != NULL && REG_P (*info.base)
+   && REGNO_PTR_FRAME_P (REGNO (*info.base)))
+  {
+   output_asm_insn ("{", NULL);
+   output_asm_insn ("\\t"".reg.pred"  "\\t" "%%eq_p;", NULL);
+   output_asm_insn ("\\t"".reg%t0""\\t" "%%val;", operands);
+   output_asm_insn ("\\t""ld%A1%t0"   "\\t" "%%val,%1;", operands);
+   output_asm_insn ("\\t""setp.eq%t0" "\\t" "%%eq_p, %%val, %2;",
+operands);
+   output_asm_insn ("@%%eq_p\\t" "st%A1%t0"   "\\t" "%1,%3;", operands);
+   output_asm_insn ("\\t""mov%t0" "\\t" "%0,%%val;", operands);
+   output_asm_insn ("}", NULL);
+   return "";
+  }
 const char *t
-  = "%.\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
+  = "\\tatom%A1.cas.b%T0\\t%0, %1, %2, %3;";
 return nvptx_output_atomic_insn (t, operands, 1, 4);
   }
-  [(set_attr "atomic" "true")])
+  [(set_attr "atomic" "true")
+   (set_attr "predicable" "false")])
 
 (define_insn "atomic_exchange"
   [(set (match_operand:SDIM 0 "nvptx_register_operand" "=R")   ;; output
@@ -1806,6 +1823,19 @@ (define_insn "atomic_exchange"
(match_operand:SDIM 2 "nvptx_nonmemory_operand" "Ri"))] ;; input
   ""
   {
+struct address_info info;
+decompose_mem_address (&info, operands[1]);
+if (info.base != NULL && REG_P (*info.base)
+   && REGNO_PTR_FRAME_P (REGNO (*info.base)))
+  {
+   output_asm_insn ("{", NULL);
+   output_asm_insn ("\\t"   ".reg%t0"  "\\t" "%%val;", operands);
+   output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands);
+   output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%2;", operands);
+   output_asm_insn ("%.\\t" "mov%t0"   "\\t" "%0,%%val;", operands);
+   output_asm_insn ("}", NULL);
+   return "";
+  }
 const char *t
   = "%.\tatom%A1.exch.b%T0\t%0, %1, %2;";
 return nvptx_output_atomic_insn (t, operands, 1, 3);
@@ -1823,6 +1853,22 @@ (define_insn "atomic_fetch_add"
(match_dup 1))]
   ""
   {
+struct address_info info;
+decompose_mem_address (&info, operands[1]);
+if (info.base != NULL && REG_P (*info.base)
+   && REGNO_PTR_FRAME_P (REGNO (*info.base)))
+  {
+   output_asm_insn ("{", NULL);
+   output_asm_insn ("\\t"   ".reg%t0"  "\\t" "%%val;", operands);
+   output_asm_insn ("\\t"   ".reg%t0"  "\\t" "%%update;", operands);
+   output_asm_insn ("%.\\t" "ld%A1%t0" "\\t" "%%val,%1;", operands);
+   output_asm_insn ("%.\\t" "add%t0"   "\\t" "%%update,%%val,%2;",
+operands);
+   output_asm_insn ("%.\\t" "st%A1%t0" "\\t" "%1,%%update;", operands);
+   output_asm_insn ("%.\\t" "mov%t0"   "\\t" "%0,%%val;", o

[committed][nvptx] Handle nop in prevent_branch_around_nothing

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

When running libgomp test-case reduction-7.c on an nvptx accelerator
(T400, driver version 470.86) and GOMP_NVPTX_JIT=-O0, I run into:
...
reduction-7.exe:reduction-7.c:312: v_p_2: \
  Assertion `out[j * 32 + i] == (i + j) * 2' failed.
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-7.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O0  execution test
...

During investigation I found ptx code like this:
...
@ %r163 bra $L262;
$L262:
...

There's a known problem with executing this type of code, and a workaround is
in place to address this: prevent_branch_around_nothing.  The workaround does
not trigger though because it doesn't handle the nop insn.

Fix this by handling the nop insn in prevent_branch_around_nothing.

Tested libgomp on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Handle nop in prevent_branch_around_nothing

gcc/ChangeLog:

2022-01-27  Tom de Vries  

PR target/100428
* config/nvptx/nvptx.cc (prevent_branch_around_nothing): Handle nop
insn.

---
 gcc/config/nvptx/nvptx.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index ceea4d3a093..262e8f9cc1b 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -5103,6 +5103,7 @@ prevent_branch_around_nothing (void)
case CODE_FOR_nvptx_forked:
case CODE_FOR_nvptx_joining:
case CODE_FOR_nvptx_join:
+   case CODE_FOR_nop:
  continue;
default:
  seen_label = NULL;


[committed][nvptx] Update bar.sync for ptx isa 6.0

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

In ptx isa 6.0, a new barrier instruction was added, and bar.sync was
redefined as barrier.sync.aligned.

The aligned modifier indicates that all threads in a CTA will execute the same
barrier instruction.

This seems fine for the form "bar.sync 0".

But a "bar.sync %rx,64" (as used for vector length > 32) may execute a
different barrier depending on the value of %rx, so we can't assume it's
aligned.

Fix this by using "barrier.sync %rx,64" instead.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Update bar.sync for ptx isa 6.0

gcc/ChangeLog:

2022-01-27  Tom de Vries  

* config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_6_0.
* config/nvptx/nvptx.h (TARGET_PTX_6_0): New macro.
* config/nvptx/nvptx.md (define_insn "nvptx_barsync"): Use barrier
insn for TARGET_PTX_6_0.

---
 gcc/config/nvptx/nvptx-opts.h | 1 +
 gcc/config/nvptx/nvptx.h  | 1 +
 gcc/config/nvptx/nvptx.md | 8 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index daae72ff70e..c754a5193ce 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -32,6 +32,7 @@ enum ptx_isa
 enum ptx_version
 {
   PTX_VERSION_3_1,
+  PTX_VERSION_6_0,
   PTX_VERSION_6_3,
   PTX_VERSION_7_0
 };
diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 9fda2f0d86c..065d7aa210c 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -91,6 +91,7 @@
 #define TARGET_SM75 (ptx_isa_option >= PTX_ISA_SM75)
 #define TARGET_SM80 (ptx_isa_option >= PTX_ISA_SM80)
 
+#define TARGET_PTX_6_0 (ptx_version_option >= PTX_VERSION_6_0)
 #define TARGET_PTX_6_3 (ptx_version_option >= PTX_VERSION_6_3)
 #define TARGET_PTX_7_0 (ptx_version_option >= PTX_VERSION_7_0)
 
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 9cbbd956f9d..b39116520ba 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1968,9 +1968,13 @@ (define_insn "nvptx_barsync"
   ""
   {
 if (INTVAL (operands[1]) == 0)
-  return "\\tbar.sync\\t%0;";
+  return (TARGET_PTX_6_0
+ ? "\\tbarrier.sync.aligned\\t%0;"
+ : "\\tbar.sync\\t%0;");
 else
-  return "\\tbar.sync\\t%0, %1;";
+  return (TARGET_PTX_6_0
+ ? "\\tbarrier.sync\\t%0, %1;"
+ : "\\tbar.sync\\t%0, %1;");
   }
   [(set_attr "predicable" "false")])
 


[committed][nvptx] Update default ptx isa to 6.3

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

With the following example, minimized from parallel-dims.c:
...
int
main (void)
{
  int vectors_max = -1;
  #pragma acc parallel num_gangs (1) num_workers (1) copy (vectors_max)
  {
for (int i = 0; i < 2; i++)
  for (int j = 0; j < 2; j++)
#pragma acc loop vector reduction (max: vectors_max)
for (int k = 0; k < 32; k++)
  vectors_max = k;
  }

  if (vectors_max != 31)
__builtin_abort ();

  return 0;
}
...
I run into (T400, driver version 470.94):
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
  execution test
...
The FAIL does not happen with GOMP_NVPTX_JIT=-O0.

The problem seems to be that the shfl insns for the vector reduction are not
executed uniformly by the warp.  Enforcing this by using shfl.sync fixes the
problem.

Fix this by setting the ptx isa to 6.3 by default, which allows the use of
shfl.sync.

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Update default ptx isa to 6.3

gcc/ChangeLog:

2022-01-27  Tom de Vries  

* config/nvptx/nvptx.opt (mptx): Set to PTX_VERSION_6_3 by default.

---
 gcc/config/nvptx/nvptx.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 6514dd326fb..6e12b1f7296 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -89,5 +89,5 @@ EnumValue
 Enum(ptx_version) String(7.0) Value(PTX_VERSION_7_0)
 
 mptx=
-Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_3_1)
+Target RejectNegative ToLower Joined Enum(ptx_version) Var(ptx_version_option) Init(PTX_VERSION_6_3)
 Specify the version of the ptx version to use.


[committed][nvptx] Add bar.warp.sync

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

On a GT 1030 (sm_61), with driver version 470.94 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx] Update
default ptx isa to 6.3".

The first divergent branch looks like:
...
  {
.reg .u32 %x;
mov.u32 %x,%tid.x;
setp.ne.u32 %r59,%x,0;
  }
  @ %r59 bra $L15;
  mov.u64 %r48,%ar0;
  mov.u32 %r22,2;
  ld.u64 %r53,[%r48];
  mov.u32 %r55,%r22;
  mov.u32 %r54,1;
 $L15:
...
and when inspecting the generated SASS, the branch is not set up as a divergent
branch, but instead as a regular branch.

This causes us to execute a shfl.sync insn in divergent mode, which is likely
to cause trouble given a remark in the ptx isa version 6.3, which mentions
that for .target sm_6x or below, all threads must execute the same
shfl.sync instruction in convergence.

Fix this by placing a "bar.warp.sync 0x" at the desired convergence
point (in the example above, after $L15).

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add bar.warp.sync

gcc/ChangeLog:

2022-01-31  Tom de Vries  

* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_warpsync.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_WARPSYNC.
(define_insn "nvptx_warpsync"): New define_insn.

---
 gcc/config/nvptx/nvptx.cc | 7 +++
 gcc/config/nvptx/nvptx.md | 7 +++
 2 files changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 262e8f9cc1b..1b91990ca1f 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -4598,6 +4598,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
   rtx_insn *neuter_start = NULL;
   rtx_insn *worker_label = NULL, *vector_label = NULL;
   rtx_insn *worker_jump = NULL, *vector_jump = NULL;
+  rtx_insn *warp_sync = NULL;
   for (mode = GOMP_DIM_WORKER; mode <= GOMP_DIM_VECTOR; mode++)
 if (GOMP_DIM_MASK (mode) & skip_mask)
   {
@@ -4630,11 +4631,15 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
if (tail_branch)
  {
label_insn = emit_label_before (label, before);
+   if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
+ warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
before = label_insn;
  }
else
  {
label_insn = emit_label_after (label, tail);
+   if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
+ warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
if ((mode == GOMP_DIM_VECTOR || mode == GOMP_DIM_WORKER)
&& CALL_P (tail) && find_reg_note (tail, REG_NORETURN, NULL))
  emit_insn_after (gen_exit (), label_insn);
@@ -4702,6 +4707,8 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 setp.ne.u32 %rcond,%rcondu32,0;
  */
  rtx_insn *label = PREV_INSN (tail);
+ if (label == warp_sync)
+   label = PREV_INSN (label);
  gcc_assert (label && LABEL_P (label));
  rtx tmp = gen_reg_rtx (BImode);
  emit_insn_before (gen_movbi (tmp, const0_rtx),
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index b39116520ba..b4c7cd6e56d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -56,6 +56,7 @@ (define_c_enum "unspecv" [
UNSPECV_CAS
UNSPECV_XCHG
UNSPECV_BARSYNC
+   UNSPECV_WARPSYNC
UNSPECV_MEMBAR
UNSPECV_MEMBAR_CTA
UNSPECV_MEMBAR_GL
@@ -1978,6 +1979,12 @@ (define_insn "nvptx_barsync"
   }
   [(set_attr "predicable" "false")])
 
+(define_insn "nvptx_warpsync"
+  [(unspec_volatile [(const_int 0)] UNSPECV_WARPSYNC)]
+  "TARGET_PTX_6_0"
+  "\\tbar.warp.sync\\t0x;"
+  [(set_attr "predicable" "false")])
+
 (define_expand "memory_barrier"
   [(set (match_dup 0)
(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]


[committed][nvptx] Add uniform_warp_check insn

2022-02-01 Thread Tom de Vries via Gcc-patches
Hi,

On a GT 1030, with driver version 470.94 and -mptx=3.1 I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none \
  -O2 execution test
...
which minimizes to the same test-case as listed in commit "[nvptx]
Update default ptx isa to 6.3".

The problem is again that the first diverging branch is not handled as such in
SASS, which causes problems with a subsequent shfl insn, but given that we
have -mptx=3.1 we can't use the bar.warp.sync insn.

Given that the default is now -mptx=6.3, and consequently -mptx=3.1 is of a
lesser importance, implement the next best thing: abort when detecting
non-convergence using this insn:
...
  { .reg.b32 act;
vote.ballot.b32 act,1;
.reg.pred uni;
setp.eq.b32 uni,act,0x;
@ !uni trap;
@ !uni exit;
  }
...

Interestingly, the effect of this is that rather than aborting, the test-case
now passes.
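
The idea behind the check can be simulated on the host as below.  This is a
sketch under stated assumptions, not the insn itself: the warp is modelled as
an array of 32 "lane is active" flags, ballot_active and warp_is_uniform are
hypothetical helper names, and the full-warp mask 0xffffffff is an assumption
of the sketch.

```cpp
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;

// Simulated vote.ballot.b32 with predicate 1: bit i is set iff lane i
// takes part in the vote, i.e. is currently active.
std::uint32_t ballot_active(const std::array<bool, kWarpSize> &active)
{
  std::uint32_t mask = 0;
  for (int lane = 0; lane < kWarpSize; ++lane)
    if (active[lane])
      mask |= std::uint32_t{1} << lane;
  return mask;
}

// The warp counts as converged only if every lane voted.
bool warp_is_uniform(const std::array<bool, kWarpSize> &active)
{
  return ballot_active(active) == 0xffffffffu;
}
```

In the insn, a false result corresponds to the "@ !uni trap" path.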

Tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Add uniform_warp_check insn

gcc/ChangeLog:

2022-01-31  Tom de Vries  

* config/nvptx/nvptx.cc (nvptx_single): Use nvptx_uniform_warp_check.
* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
UNSPECV_UNIFORM_WARP_CHECK.
(define_insn "nvptx_uniform_warp_check"): New define_insn.

---
 gcc/config/nvptx/nvptx.cc | 22 ++
 gcc/config/nvptx/nvptx.md | 18 ++
 2 files changed, 36 insertions(+), 4 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 1b91990ca1f..b3bb97c3c14 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -4631,15 +4631,29 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
if (tail_branch)
  {
label_insn = emit_label_before (label, before);
-   if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
- warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
+   if (mode == GOMP_DIM_VECTOR)
+ {
+   if (TARGET_PTX_6_0)
+ warp_sync = emit_insn_after (gen_nvptx_warpsync (),
+  label_insn);
+   else
+ warp_sync = emit_insn_after (gen_nvptx_uniform_warp_check (),
+  label_insn);
+ }
before = label_insn;
  }
else
  {
label_insn = emit_label_after (label, tail);
-   if (TARGET_PTX_6_0 && mode == GOMP_DIM_VECTOR)
- warp_sync = emit_insn_after (gen_nvptx_warpsync (), label_insn);
+   if (mode == GOMP_DIM_VECTOR)
+ {
+   if (TARGET_PTX_6_0)
+ warp_sync = emit_insn_after (gen_nvptx_warpsync (),
+  label_insn);
+   else
+ warp_sync = emit_insn_after (gen_nvptx_uniform_warp_check (),
+  label_insn);
+ }
if ((mode == GOMP_DIM_VECTOR || mode == GOMP_DIM_WORKER)
&& CALL_P (tail) && find_reg_note (tail, REG_NORETURN, NULL))
  emit_insn_after (gen_exit (), label_insn);
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index b4c7cd6e56d..92768dd9e95 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -57,6 +57,7 @@ (define_c_enum "unspecv" [
UNSPECV_XCHG
UNSPECV_BARSYNC
UNSPECV_WARPSYNC
+   UNSPECV_UNIFORM_WARP_CHECK
UNSPECV_MEMBAR
UNSPECV_MEMBAR_CTA
UNSPECV_MEMBAR_GL
@@ -1985,6 +1986,23 @@ (define_insn "nvptx_warpsync"
   "\\tbar.warp.sync\\t0x;"
   [(set_attr "predicable" "false")])
 
+(define_insn "nvptx_uniform_warp_check"
+  [(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
+  ""
+  {
+output_asm_insn ("{", NULL);
+output_asm_insn ("\\t"  ".reg.b32""\\t" "act;", NULL);
+output_asm_insn ("\\t"  "vote.ballot.b32" "\\t" "act,1;", NULL);
+output_asm_insn ("\\t"  ".reg.pred"   "\\t" "uni;", NULL);
+output_asm_insn ("\\t"  "setp.eq.b32" "\\t" "uni,act,0x;",
+NULL);
+output_asm_insn ("@ !uni\\t" "trap;", NULL);
+output_asm_insn ("@ !uni\\t" "exit;", NULL);
+output_asm_insn ("}", NULL);
+return "";
+  }
+  [(set_attr "predicable" "false")])
+
 (define_expand "memory_barrier"
   [(set (match_dup 0)
(unspec_volatile:BLK [(match_dup 0)] UNSPECV_MEMBAR))]


Re: [PATCH] rs6000: Fix up PCH on powerpc* [PR104323]

2022-02-01 Thread Segher Boessenkool
Hi!

On Tue, Feb 01, 2022 at 04:27:40PM +0100, Jakub Jelinek wrote:
> +/* PR target/104323 */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-maltivec" } */
> +
> +#include 
> testcase which I'm not including into testsuite because for some reason
> the test fails on non-powerpc* targets (is done even on those and fails
> because of missing altivec.h etc.),

powerpc_altivec_ok returns false if the target isn't Power, you can use
this in the testsuite fine?  Why does it still fail on other targets,
the test should be SKIPPED there?

Or wait, proc check_effective_target_powerpc_altivec_ok is broken, and
does not implement its intention or documentation.  Will fix.

> PCH is broken on powerpc*-*-* since the
> new builtin generator has been introduced.
> The generator contains or emits comments like:
>   /*  Cannot mark this as a GC root because only pointer types can
>  be marked as GTY((user)) and be GC roots.  All trees in here are
>  kept alive by other globals, so not a big deal.  Alternatively,
>  we could change the enum fields to ints and cast them in and out
>  to avoid requiring a GTY((user)) designation, but that seems
>  unnecessarily gross.  */
> Having the fntypes stored in other GC roots can work fine for GC,
> ggc_collect will then always mark them and so they won't disappear from
> the tables, but it definitely doesn't work for PCH, which when the
> arrays with fntype members aren't GTY marked means on PCH write we create
> copies of those FUNCTION_TYPEs and store in *.gch that the GC roots should
> be updated, but don't store that rs6000_builtin_info[?].fntype etc. should
> be updated.  When PCH is read again, the blob is read at some other address,
> GC roots are updated, rs6000_builtin_info[?].fntype contains garbage
> pointers (GC freed pointers with random data, or random unrelated types or
> other trees).
> The following patch fixes that.  It stops any user markings because that
> is totally unnecessary, just skips fields we don't need to mark and adds
> GTY(()) to the 2 array variables.  We can get rid of all those global
> vars for the fn types, they can be now automatic vars.
> With the patch we get
>   {
> &rs6000_instance_info[0].fntype,
> 1 * (RS6000_INST_MAX),
> sizeof (rs6000_instance_info[0]),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
>   {
> &rs6000_builtin_info[0].fntype,
> 1 * (RS6000_BIF_MAX),
> sizeof (rs6000_builtin_info[0]),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
> as the new roots which is exactly what we want and significantly more
> compact than countless
>   {
> &uv2di_ftype_pudi_usi,
> 1,
> sizeof (uv2di_ftype_pudi_usi),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
>   {
> &uv2di_ftype_lg_puv2di,
> 1,
> sizeof (uv2di_ftype_lg_puv2di),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
>   {
> &uv2di_ftype_lg_pudi,
> 1,
> sizeof (uv2di_ftype_lg_pudi),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
>   {
> &uv2di_ftype_di_puv2di,
> 1,
> sizeof (uv2di_ftype_di_puv2di),
> &gt_ggc_mx_tree_node,
> &gt_pch_nx_tree_node
>   },
> cases (822 of these instead of just those 4 shown).

Bill, can you review the builtin side of this?

>   PR target/104323
>   * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h
>   rather than $(srcdir)/config/rs6000/rs6000-builtins.def.
>   * config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use
>   GTY((user)) for struct bifdata and struct ovlddata.  Instead add
>   GTY((skip(""))) to members with pointer and enum types that don't need
>   to be tracked.  Add GTY(()) to rs6000_builtin_info and rs6000_instance_info
>   declarations.  Don't emit gt_ggc_mx and gt_pch_nx declarations.

Nice :-)

>   (write_extern_fntype, write_fntype): Remove.
>   (write_fntype_init): Emit the fntype vars as automatic vars instead
>   of file scope ones.
>   (write_header_file): Don't iterate with write_extern_fntype.
>   (write_init_file): Don't iterate with write_fntype.  Don't emit
>   gt_ggc_mx and gt_pch_nx definitions.

>if (tf_found)
> -fprintf (init_file, "  if (float128_type_node)\n  ");
> +fprintf (init_file,
> +  "  tree %s = NULL_TREE;\n  if (float128_type_node)\n",
> +  buf);
>else if (dfp_found)
> -fprintf (init_file, "  if (dfloat64_type_node)\n  ");
> +fprintf (init_file,
> +  "  tree %s = NULL_TREE;\n  if (dfloat64_type_node)\n",
> +  buf);

Things are much more readable if you break this into multiple print
statements.  I realise the existing code is like that, but that could
be improved.

So, okay for trunk (modulo what Bill finds), and you can include the
testcase as well after I have fixed the selector.  Thanks Jakub!


Segher


Re: [PATCH] Reset relations when crossing backedges.

2022-02-01 Thread Aldy Hernandez via Gcc-patches
Ping

On Mon, Jan 24, 2022, 20:20 Aldy Hernandez  wrote:

> On Mon, Jan 24, 2022 at 9:51 AM Richard Biener
>  wrote:
> >
> > On Fri, Jan 21, 2022 at 1:12 PM Aldy Hernandez  wrote:
> > >
> > > On Fri, Jan 21, 2022 at 11:56 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Fri, Jan 21, 2022 at 11:30 AM Aldy Hernandez 
> wrote:
> > > > >
> > > > > On Fri, Jan 21, 2022 at 10:43 AM Richard Biener
> > > > >  wrote:
> > > > > >
> > > > > > On Fri, Jan 21, 2022 at 9:30 AM Aldy Hernandez via Gcc-patches
> > > > > >  wrote:
> > > > > > >
> > > > > > > > As discussed in PR103721, the problem here is that we are crossing a
> > > > > > > > backedge and causing us to use relations from a previous iteration of a
> > > > > > > > loop.
> > > > > > > >
> > > > > > > > This handles the testcases in both PR103721 and PR104067 which are variants
> > > > > > > > of the same thing.
> > > > > > > >
> > > > > > > > Tested on x86-64 Linux with the usual regstrap as well as verifying the
> > > > > > > > thread count before and after the patch.  The number of threads is
> > > > > > > > reduced by a minuscule amount.
> > > > > > > >
> > > > > > > > I assume we need release manager approval at this point?  OK for trunk?
> > > > > >
> > > > > > Not for regression fixes.
> > > > >
> > > > > OK, I've pushed it to fix the P1s.  We can continue refining the
> > > > > solution in a follow-up patch.
> > > > >
> > > > > >
> > > > > > > Btw, I wonder whether you have to treat irreducible regions in the same
> > > > > > > way more generally - which edges are marked as backedge there depends
> > > > > > > on which edge into the region was visited first.  I also wonder how we
> > > > >
> > > > > Jeff, Andrew??
> > > > >
> > > > > > > I also wonder how we guarantee that all users of the resolve mode
> > > > > > > have backedges marked properly?  Also note that CFG manipulation
> > > > > > > routines in general do not
> > > > > > > keep backedge markings up-to-date so incremental modification and
> > > > > > > resolving queries might not mix.
> > > > > > >
> > > > > > > It's a bit unfortunate that we can't query the CFG on whether backedges
> > > > > > > are marked.
> > > > >
> > > > > Ughh.  The call to mark_dfs_back_edges is currently in the backward
> > > > > threader.  Perhaps we could put it in the path_range_query
> > > > > constructor?  That way other users of path_range_query can benefit
> > > > > > (loop_ch for instance, though it doesn't use it in a way that crosses
> > > > > > backedges so perhaps it's unaffected).  Does that sound reasonable?
> > > >
> > > > Hmm, I'd rather keep the burden on the callers because many already
> > > > should have backedges marked.  What you could instead do is
> > > > add sth like
> > > >
> > > >   if (flag_checking)
> > > > {
> > > >auto_edge_flag saved_dfs_back;
> > > >for-each-edge-in-cfg () set saved_dfs_back flag if dfs_back is
> > > > set, clear dfs_back
> > > >mark_dfs_back_edges ();
> > > >for-each-edge-in-cfg () verify the flags are set on the same
> > > > edges and clear saved_dfs_back
> > > > }
> > > >
> > > > > to the path_range_query constructor.  That way we at least notice when passes
> > > > > do _not_ have the backedges marked properly.
> > >
> > > Sounds good.  Thanks.
> > >
> > > I've put the verifier by mark_dfs_back_edges(), since it really has
> > > nothing to do with the path solver.  Perhaps it's useful for someone
> > > else.
> > >
> > > The patch triggered with the loop-ch use, so I've added a
> > > corresponding mark_dfs_back_edges there.
> > >
> > > OK pending tests?
> >
> > Please rename dfs_back_edges_available_p to sth not suggesting
> > it's a simple predicate check, like maybe verify_marked_backedges.
> >
> > +
> > +  gcc_checking_assert (!m_resolve || dfs_back_edges_available_p ());
> >
> > I'd prefer the following which allows even release checking builds
> > to verify this with -fchecking.
> >
> >   if (!m_resolve)
> > if (flag_checking)
> >   verify_marked_backedges ();
> >
> > and then ideally verify_marked_backedges () should raise an
> > internal_error () for the edge not marked properly, best by
> > also specifying it.  Just like other CFG verifiers do.
> >
> > The loop ch and backwards threader changes are OK.  You
> > can post the verification independently again.
>
> How about this?
>
> Tested on x86-64 Linux.
>
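
For readers following the thread, the save / re-mark / compare scheme
suggested above can be sketched on a toy CFG as below.  This is not GCC code:
Edge, Cfg and mark_back_edges are simplified stand-ins invented for the
illustration, and this verify_marked_backedges only returns a bool rather
than raising an internal_error as suggested in the review.

```cpp
#include <cstddef>
#include <vector>

// Toy CFG: blocks are numbered 0..num_blocks-1, block 0 is the entry.
struct Edge { int src; int dest; bool dfs_back; };
struct Cfg { int num_blocks; std::vector<Edge> edges; };

// Depth-first walk; an edge whose destination is still on the DFS stack
// is a back edge.  state: 0 = unvisited, 1 = on stack, 2 = finished.
static void dfs(Cfg &cfg, int block, std::vector<int> &state)
{
  state[block] = 1;
  for (Edge &e : cfg.edges)
    {
      if (e.src != block)
        continue;
      if (state[e.dest] == 1)
        e.dfs_back = true;
      else if (state[e.dest] == 0)
        dfs(cfg, e.dest, state);
    }
  state[block] = 2;
}

// Toy stand-in for mark_dfs_back_edges.
void mark_back_edges(Cfg &cfg)
{
  for (Edge &e : cfg.edges)
    e.dfs_back = false;
  std::vector<int> state(cfg.num_blocks, 0);
  dfs(cfg, 0, state);
}

// Save the current flags, recompute them on a copy, and report whether
// the caller's marking matches the freshly computed one.
bool verify_marked_backedges(Cfg cfg)
{
  std::vector<bool> saved;
  for (const Edge &e : cfg.edges)
    saved.push_back(e.dfs_back);
  mark_back_edges(cfg);
  for (std::size_t i = 0; i < cfg.edges.size(); ++i)
    if (cfg.edges[i].dfs_back != saved[i])
      return false;
  return true;
}
```

Because the check works on a copy, it leaves the caller's flags untouched,
which keeps it suitable for a checking-only code path.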


[PATCH] declare std::array members attribute const [PR101831]

2022-02-01 Thread Martin Sebor via Gcc-patches

Passing an uninitialized object to a function that takes its argument
by const reference is diagnosed by -Wmaybe-uninitialized because most
such functions read the argument.  The exceptions are functions that
don't access the object but instead use its address to compute
a result.  This includes a number of std::array member functions such
as std::array::size() which returns the template argument N.  Such
functions may be candidates for attribute const which also avoids
the warning.  The attribute typically only benefits extern functions
that IPA cannot infer the property from, but in this case it helps
avoid the warning which runs very early on, even without optimization
or inlining.  The attached patch adds the attribute to a subset of
those member functions of std::array.  (It doesn't add it to const
member functions like cbegin() or front() that return a const_iterator
or const reference to the internal data.)

It might be possible to infer this property from inline functions
earlier on than during IPA and avoid having to annotate them explicitly.
That seems like an enhancement worth considering in the future.
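
As a small illustration of the distinction (a sketch, not the libstdc++
code), consider the hypothetical FixedArray below.  Its size() depends only
on the template argument and so is a candidate for attribute const, whereas
front() hands out a reference into the object and is left alone.  The sketch
uses the [[gnu::const]] spelling; the patch itself uses the reserved
spelling [[__gnu__::__const__]].

```cpp
#include <cstddef>

// Hypothetical array-like wrapper, for illustration only.
template <typename T, std::size_t N>
struct FixedArray
{
  T elems[N];

  // The result depends only on N and never on the contents of *this.
  [[gnu::const]] constexpr std::size_t
  size() const noexcept { return N; }

  // Also independent of the contents of *this.
  [[gnu::const]] constexpr bool
  empty() const noexcept { return N == 0; }

  // Returns a reference to the internal data: not annotated.
  constexpr const T &
  front() const noexcept { return elems[0]; }
};
```

With the annotation, a call like a.size() on a not yet initialized
FixedArray is not treated as a read of the object.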

Tested on x86_64-linux.

Martin

Declare std::array members with attribute const [PR101831].

Resolves:
PR libstdc++/101831 - Spurious maybe-uninitialized warning on std::array::size

libstdc++-v3/ChangeLog:

	* include/std/array (begin): Declare const member function attribute
	const.
	(end, rbegin, rend, size, max_size, empty, data): Same.
	* testsuite/23_containers/array/capacity/empty.cc: Add test cases.
	* testsuite/23_containers/array/capacity/max_size.cc: Same.
	* testsuite/23_containers/array/capacity/size.cc: Same.
	* testsuite/23_containers/array/iterators/begin_end.cc: New test.

diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index b4d8fc81a52..e45143fb329 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -127,7 +127,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { std::swap_ranges(begin(), end(), __other.begin()); }
 
   // Iterators.
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   _GLIBCXX17_CONSTEXPR iterator
   begin() noexcept
   { return iterator(data()); }
@@ -137,7 +137,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   begin() const noexcept
   { return const_iterator(data()); }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   _GLIBCXX17_CONSTEXPR iterator
   end() noexcept
   { return iterator(data() + _Nm); }
@@ -147,7 +147,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   end() const noexcept
   { return const_iterator(data() + _Nm); }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   _GLIBCXX17_CONSTEXPR reverse_iterator
   rbegin() noexcept
   { return reverse_iterator(end()); }
@@ -157,7 +157,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   rbegin() const noexcept
   { return const_reverse_iterator(end()); }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   _GLIBCXX17_CONSTEXPR reverse_iterator
   rend() noexcept
   { return reverse_iterator(begin()); }
@@ -188,15 +188,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return const_reverse_iterator(begin()); }
 
   // Capacity.
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   constexpr size_type
   size() const noexcept { return _Nm; }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   constexpr size_type
   max_size() const noexcept { return _Nm; }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   constexpr bool
   empty() const noexcept { return size() == 0; }
 
@@ -278,7 +278,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  	   : _AT_Type::_S_ref(_M_elems, 0);
   }
 
-  [[__nodiscard__]]
+  [[__gnu__::__const__, __nodiscard__]]
   _GLIBCXX17_CONSTEXPR pointer
   data() noexcept
   { return _AT_Type::_S_ptr(_M_elems); }
diff --git a/libstdc++-v3/testsuite/23_containers/array/capacity/empty.cc b/libstdc++-v3/testsuite/23_containers/array/capacity/empty.cc
index 3f3f282ad9d..cecbae39e45 100644
--- a/libstdc++-v3/testsuite/23_containers/array/capacity/empty.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/capacity/empty.cc
@@ -40,8 +40,26 @@ test01()
   }
 }
 
+#pragma GCC push_options
+#pragma GCC optimize "0"
+
+void
+test02()
+{
+  {
+const size_t len = 3;
+typedef std::array array_type;
+array_type a;
+
+VERIFY( a.empty() == false );// { dg-bogus "-Wmaybe-uninitialized"
+  }
+}
+
+#pragma GCC pop_options
+
 int main()
 {
   test01();
+  test02();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/array/capacity/max_size.cc b/libstdc++-v3/testsuite/23_containers/array/capacity/max_size.cc
index 0e000258530..4629316161d 100644
--- a/libstdc++-v3/testsuite/23_containers/array/capacity/max_size.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/capacity/max

Re: [PATCH] rs6000: Fix up PCH on powerpc* [PR104323]

2022-02-01 Thread Bill Schmidt via Gcc-patches
Hi!

Jakub, thanks for fixing this.  I didn't realize the PCH implications here, 
clearly...

On 2/1/22 12:33 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Feb 01, 2022 at 04:27:40PM +0100, Jakub Jelinek wrote:
>> +/* PR target/104323 */
>> +/* { dg-require-effective-target powerpc_altivec_ok } */
>> +/* { dg-options "-maltivec" } */
>> +
>> +#include 
>> testcase which I'm not including into testsuite because for some reason
>> the test fails on non-powerpc* targets (is done even on those and fails
>> because of missing altivec.h etc.),
> powerpc_altivec_ok returns false if the target isn't Power, you can use
> this in the testsuite fine?  Why does it still fail on other targets,
> the test should be SKIPPED there?
>
> Or wait, proc check_effective_target_powerpc_altivec_ok is broken, and
> does not implement its intention or documentation.  Will fix.
>
>> PCH is broken on powerpc*-*-* since the
>> new builtin generator has been introduced.
>> The generator contains or emits comments like:
>>   /*  Cannot mark this as a GC root because only pointer types can
>>  be marked as GTY((user)) and be GC roots.  All trees in here are
>>  kept alive by other globals, so not a big deal.  Alternatively,
>>  we could change the enum fields to ints and cast them in and out
>>  to avoid requiring a GTY((user)) designation, but that seems
>>  unnecessarily gross.  */
>> Having the fntypes stored in other GC roots can work fine for GC,
>> ggc_collect will then always mark them and so they won't disappear from
>> the tables, but it definitely doesn't work for PCH, which when the
>> arrays with fntype members aren't GTY marked means on PCH write we create
>> copies of those FUNCTION_TYPEs and store in *.gch that the GC roots should
>> be updated, but don't store that rs6000_builtin_info[?].fntype etc. should
>> be updated.  When PCH is read again, the blob is read at some other address,
>> GC roots are updated, rs6000_builtin_info[?].fntype contains garbage
>> pointers (GC freed pointers with random data, or random unrelated types or
>> other trees).
>> The following patch fixes that.  It stops any user markings because that
>> is totally unnecessary, just skips fields we don't need to mark and adds
>> GTY(()) to the 2 array variables.  We can get rid of all those global
>> vars for the fn types, they can be now automatic vars.
>> With the patch we get
>>   {
>> &rs6000_instance_info[0].fntype,
>> 1 * (RS6000_INST_MAX),
>> sizeof (rs6000_instance_info[0]),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &rs6000_builtin_info[0].fntype,
>> 1 * (RS6000_BIF_MAX),
>> sizeof (rs6000_builtin_info[0]),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>> as the new roots which is exactly what we want and significantly more
>> compact than countless
>>   {
>> &uv2di_ftype_pudi_usi,
>> 1,
>> sizeof (uv2di_ftype_pudi_usi),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_lg_puv2di,
>> 1,
>> sizeof (uv2di_ftype_lg_puv2di),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_lg_pudi,
>> 1,
>> sizeof (uv2di_ftype_lg_pudi),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>>   {
>> &uv2di_ftype_di_puv2di,
>> 1,
>> sizeof (uv2di_ftype_di_puv2di),
>> &gt_ggc_mx_tree_node,
>> &gt_pch_nx_tree_node
>>   },
>> cases (822 of these instead of just those 4 shown).
> Bill, can you review the builtin side of this?

Yes, I've just read through it and it looks just fine to me.
It's a big improvement over what I had there, even ignoring
the PCH issues.

Thanks again, Jakub!

Bill

>
>>  PR target/104323
>>  * config/rs6000/t-rs6000 (EXTRA_GTYPE_DEPS): Append rs6000-builtins.h
>>  rather than $(srcdir)/config/rs6000/rs6000-builtins.def.
>>  * config/rs6000/rs6000-gen-builtins.cc (write_decls): Don't use
>>  GTY((user)) for struct bifdata and struct ovlddata.  Instead add
>>  GTY((skip(""))) to members with pointer and enum types that don't need
>>  to be tracked.  Add GTY(()) to rs6000_builtin_info and rs6000_instance_info
>>  declarations.  Don't emit gt_ggc_mx and gt_pch_nx declarations.
> Nice :-)
>
>>  (write_extern_fntype, write_fntype): Remove.
>>  (write_fntype_init): Emit the fntype vars as automatic vars instead
>>  of file scope ones.
>>  (write_header_file): Don't iterate with write_extern_fntype.
>>  (write_init_file): Don't iterate with write_fntype.  Don't emit
>>  gt_ggc_mx and gt_pch_nx definitions.
>>if (tf_found)
>> -fprintf (init_file, "  if (float128_type_node)\n  ");
>> +fprintf (init_file,
>> + "  tree %s = NULL_TREE;\n  if (float128_type_node)\n",
>> + buf);
>>else if (dfp_found)
>> -fprintf (init_file, "  if (dfloat64_type_node)\n  ");
>> +fprintf (init_file,
>> + "  tree %s = NULL_TREE;\n  if (dfloat64_type

Re: [PATCH] libcpp: Avoid PREV_WHITE and other random content on CPP_PADDING tokens

2022-02-01 Thread Jason Merrill via Gcc-patches

On 2/1/22 10:31, Jakub Jelinek wrote:

On Tue, Feb 01, 2022 at 10:03:57AM +0100, Jakub Jelinek via Gcc-patches wrote:

I wonder if we shouldn't replace that
   toks[0] = pfile->directive_result;
line with
   toks[0] = pfile->avoid_paste;
or even replace those
   toks = XNEW (cpp_token);
   toks[0] = pfile->directive_result;
lines with
   toks = &pfile->avoid_paste;


Here is a patch that does that, bootstrapped/regtested on powerpc64le-linux,
ok for trunk?


OK along with the previous patch and the (checking) assert in 
funlike_invocation_p.



2022-02-01  Jakub Jelinek  

* directives.cc (destringize_and_run): Push &pfile->avoid_paste
instead of a copy of pfile->directive_result for the CPP_PADDING
case.

--- libcpp/directives.cc.jj 2022-01-18 11:59:00.257972414 +0100
+++ libcpp/directives.cc 2022-02-01 13:39:27.240114485 +0100
@@ -1954,8 +1954,7 @@ destringize_and_run (cpp_reader *pfile,
else
  {
count = 1;
-  toks = XNEW (cpp_token);
-  toks[0] = pfile->directive_result;
+  toks = &pfile->avoid_paste;
  
/* If we handled the entire pragma internally, make sure we get the
 line number correct for the next token.  */


Jakub





Re: [PATCH] declare std::array members attribute const [PR101831]

2022-02-01 Thread Jonathan Wakely via Gcc-patches
On Tue, 1 Feb 2022 at 18:54, Martin Sebor via Libstdc++
 wrote:
>
> Passing an uninitialized object to a function that takes its argument
> by const reference is diagnosed by -Wmaybe-uninitialized because most
> such functions read the argument.  The exceptions are functions that
> don't access the object but instead use its address to compute
> a result.  This includes a number of std::array member functions such
> as std::array::size() which returns the template argument N.  Such
> functions may be candidates for attribute const which also avoids
> the warning.  The attribute typically only benefits extern functions
> that IPA cannot infer the property from, but in this case it helps
> avoid the warning which runs very early on, even without optimization
> or inlining.  The attached patch adds the attribute to a subset of
> those member functions of std::array.  (It doesn't add it to const
> member functions like cbegin() or front() that return a const_iterator
> or const reference to the internal data.)
>
> It might be possible to infer this property from inline functions
> earlier on than during IPA and avoid having to annotate them explicitly.
> That seems like an enhancement worth considering in the future.
>
> Tested on x86_64-linux.
>
> Martin

new file mode 100644
index 000..b7743adf3c9
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/array/iterators/begin_end.cc
@@ -0,0 +1,56 @@
+// { dg-do compile { target c++11 } }
+//
+// Copyright (C) 2011-2022 Free Software Foundation, Inc.

Those dates look wrong. I no longer bother putting a license text and
copyright notice on simple tests like this. It's meaningless to assert
copyright on something so trivial that doesn't do anything.



[PATCH] reload: Adjust comment in find_reloads about subset, not intersection

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
> From: Richard Sandiford 
> Hans-Peter Nilsson via Gcc-patches  writes:
> > The mystery isn't so much that there's code mismatching comments or
> > intent, but that this code has been there "forever".  There has been a
> > function reg_classes_intersect_p, in gcc since r0-54, even *before*
> > there was reload.c; r0-361, so why the "we don't have a way of forming
> > the intersection" in the actual patch and why wasn't this fixed
> > (perhaps not even noticed) when reload was state-of-the-art?
> 
> But doesn't the comment

(the second, patched comment; removed in the patch)

> mean that we have/had no way of getting
> a *class* that is the intersection of preferred_class[i] and
> this_alternative[i], to store as the new
> this_alternative[i]?

Yes, that's likely what's going on in the (second) comment
and code; needing a s/intersection/a class for the
intersection/, but the *first* comment is pretty clear that
the intent is exactly to "override" this_alternative[i]: "If
not (a subclass), but it intersects that class, use the
preferred class instead".  But of course, that doesn't
actually have to make sense as code!  And indeed it doesn't,
as you say.

> If the alternatives were register sets rather than classes,
> I think the intended effect would be:
> 
>   this_alternative[i] &= preferred_class[i];
> 
> (i.e. AND_HARD_REG_SET in old money).
> 
> It looks like the patch would instead make this_alternative[i] include
> registers that the alternative doesn't actually accept, which feels odd.
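
To make the distinction concrete, here is a rough sketch that models
register classes as plain bitmasks.  The names reg_set, subset_p,
action and adjust_alternative are invented for illustration and are
not reload's API; the sketch only mirrors the shape of the two subset
tests visible in the find_reloads hunk further down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a register class is a bitmask of hard registers.
using reg_set = std::uint32_t;

// True if every register in A is also in B.
inline bool
subset_p(reg_set a, reg_set b)
{
  return (a & ~b) == 0;
}

enum class action { keep, use_preferred, discourage };

// Decide what to do with the class an alternative accepts, given the
// class preferred for the operand's register.
inline action
adjust_alternative(reg_set alternative, reg_set preferred)
{
  // The alternative already lies within the preferred class.
  if (subset_p(alternative, preferred))
    return action::keep;

  // The preferred class lies within the alternative, so narrowing to
  // it never adds a register the alternative does not accept.
  if (subset_p(preferred, alternative))
    return action::use_preferred;

  // Partial overlap or none: with sets one could take
  // alternative & preferred, but that need not be a named class.
  return action::discourage;
}
```

With sets, the partial-overlap case would simply be the bitwise AND;
with named classes there may be no class equal to that AND, which is
what the reworded comment tries to say.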

Perhaps I put too much trust in the sanity of old comments.

How about I actually commit this one instead?  Better get it
right before reload is removed.

8< --- >8
"reload: Adjust comment in find_reloads about subset, not intersection"
gcc:

* reload.cc (find_reloads): Align comment with code where
considering the intersection of register classes then tweaking the
regclass for the current alternative or rejecting it.
---
 gcc/reload.cc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/reload.cc b/gcc/reload.cc
index 664082a533d9..3ed901e39447 100644
--- a/gcc/reload.cc
+++ b/gcc/reload.cc
@@ -3635,9 +3635,11 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
 a hard reg and this alternative accepts some
 register, see if the class that we want is a subset
 of the preferred class for this register.  If not,
-but it intersects that class, use the preferred class
-instead.  If it does not intersect the preferred
-class, show that usage of this alternative should be
+but it intersects that class, we'd like to use the
+intersection, but the best we can do is to use the
+preferred class, if it is instead a subset of the
+class we want in this alternative.  If we can't use
+it, show that usage of this alternative should be
 discouraged; it will be discouraged more still if the
 register is `preferred or nothing'.  We do this
 because it increases the chance of reusing our spill
@@ -3664,9 +3666,10 @@ find_reloads (rtx_insn *insn, int replace, int ind_levels, int live_known,
  if (! reg_class_subset_p (this_alternative[i],
preferred_class[i]))
{
- /* Since we don't have a way of forming the intersection,
-we just do something special if the preferred class
-is a subset of the class we have; that's the most
+ /* Since we don't have a way of forming a register
+class for the intersection, we just do
+something special if the preferred class is a
+subset of the class we have; that's the most
 common case anyway.  */
  if (reg_class_subset_p (preferred_class[i],
  this_alternative[i]))
-- 
2.30.2



[pushed] PR fortran/104331 - [10/11/12 Regression] ICE in gfc_simplify_eoshift, at fortran/simplify.cc:2590

2022-02-01 Thread Harald Anlauf via Gcc-patches
Dear all,

another trivial and obvious one, discovered by Gerhard.

We can have a NULL pointer dereference simplifying EOSHIFT on an array
that was not properly declared.

Pushed to mainline as r12-6977 after regtesting on x86_64-pc-linux-gnu.
Will backport to affected branches [11/10] with some delay.

Thanks,
Harald

From 447047a8f95c6bf4b1873f390c833e91aa8af18c Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 1 Feb 2022 21:36:42 +0100
Subject: [PATCH] Fortran: error recovery when simplifying EOSHIFT

gcc/fortran/ChangeLog:

	PR fortran/104331
	* simplify.cc (gfc_simplify_eoshift): Avoid NULL pointer
	dereference when shape is not set.

gcc/testsuite/ChangeLog:

	PR fortran/104331
	* gfortran.dg/eoshift_9.f90: New test.
---
 gcc/fortran/simplify.cc | 3 +++
 gcc/testsuite/gfortran.dg/eoshift_9.f90 | 8 
 2 files changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/eoshift_9.f90

diff --git a/gcc/fortran/simplify.cc b/gcc/fortran/simplify.cc
index 8604162cfd5..6483f9c31e7 100644
--- a/gcc/fortran/simplify.cc
+++ b/gcc/fortran/simplify.cc
@@ -2572,6 +2572,9 @@ gfc_simplify_eoshift (gfc_expr *array, gfc_expr *shift, gfc_expr *boundary,
   if (arraysize == 0)
 goto final;

+  if (array->shape == NULL)
+goto final;
+
   arrayvec = XCNEWVEC (gfc_expr *, arraysize);
   array_ctor = gfc_constructor_first (array->value.constructor);
   for (i = 0; i < arraysize; i++)
diff --git a/gcc/testsuite/gfortran.dg/eoshift_9.f90 b/gcc/testsuite/gfortran.dg/eoshift_9.f90
new file mode 100644
index 000..f711b04a7da
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/eoshift_9.f90
@@ -0,0 +1,8 @@
+! { dg-do compile }
+! PR fortran/104331 - ICE in gfc_simplify_eoshift
+! Contributed by G.Steinmetz
+
+program p
+  character(3), parameter :: a(:) = ['123'] ! { dg-error "deferred shape" }
+  character(3), parameter :: b(*) = eoshift(a, 1)
+end
--
2.34.1



[PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-01 Thread Ilya Leoshkevich via Gcc-patches
Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?


s390_code_end () puts indirect branch tables into separate sections and
tries to switch back to wherever it was in the beginning by calling
switch_to_section (current_function_section ()).

First of all, this is unnecessary - the other backends don't do it.

Furthermore, at this time there is no current function, but if the
last processed function was cold, in_cold_section_p remains set.  This
causes targetm.asm_out.function_section () to call
targetm.section_type_flags (), which in absence of current function
decl classifies the section as SECTION_WRITE.  This causes a section
type conflict with the existing SECTION_CODE.

gcc/ChangeLog:

* config/s390/s390.cc (s390_code_end): Do not switch back to
code section.

gcc/testsuite/ChangeLog:

* gcc.target/s390/nobp-section-type-conflict.c: New test.
---
 gcc/config/s390/s390.cc   |  1 -
 .../s390/nobp-section-type-conflict.c | 22 +++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 43c5c72554a..2db12d4ba4b 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -16809,7 +16809,6 @@ s390_code_end (void)
  assemble_name_raw (asm_out_file, label_start);
  fputs ("-.\n", asm_out_file);
}
- switch_to_section (current_function_section ());
}
 }
 }
diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
new file mode 100644
index 000..5d78bc99bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
@@ -0,0 +1,22 @@
+/* Checks that we don't get error: section type conflict with ‘put_page’.  */
+
+/* { dg-do compile } */
+/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
+
+int a;
+int b (void);
+void c (int);
+
+static void
+put_page (void)
+{
+  if (b ())
+c (a);
+}
+
+__attribute__ ((__section__ (".init.text"), __cold__)) void
+d (void)
+{
+  put_page ();
+  put_page ();
+}
-- 
2.34.1



Re: [PATCH] IBM Z: fix `section type conflict` with -mindirect-branch-table

2022-02-01 Thread Andreas Krebbel via Gcc-patches
On 2/1/22 21:49, Ilya Leoshkevich wrote:
> Bootstrapped and regtested on s390x-redhat-linux.  Ok for master?
> 
> 
> s390_code_end () puts indirect branch tables into separate sections and
> tries to switch back to wherever it was in the beginning by calling
> switch_to_section (current_function_section ()).
> 
> First of all, this is unnecessary - the other backends don't do it.
> 
> Furthermore, at this time there is no current function, but if the
> last processed function was cold, in_cold_section_p remains set.  This
> causes targetm.asm_out.function_section () to call
> targetm.section_type_flags (), which in absence of current function
> decl classifies the section as SECTION_WRITE.  This causes a section
> type conflict with the existing SECTION_CODE.
> 
> gcc/ChangeLog:
> 
>   * config/s390/s390.cc (s390_code_end): Do not switch back to
>   code section.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/s390/nobp-section-type-conflict.c: New test.

Ok. Thanks!

Andreas


> ---
>  gcc/config/s390/s390.cc   |  1 -
>  .../s390/nobp-section-type-conflict.c | 22 +++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
> 
> diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
> index 43c5c72554a..2db12d4ba4b 100644
> --- a/gcc/config/s390/s390.cc
> +++ b/gcc/config/s390/s390.cc
> @@ -16809,7 +16809,6 @@ s390_code_end (void)
> assemble_name_raw (asm_out_file, label_start);
> fputs ("-.\n", asm_out_file);
>   }
> -   switch_to_section (current_function_section ());
>   }
>  }
>  }
> diff --git a/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
> new file mode 100644
> index 000..5d78bc99bb5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/s390/nobp-section-type-conflict.c
> @@ -0,0 +1,22 @@
> +/* Checks that we don't get error: section type conflict with ‘put_page’.  */
> +
> +/* { dg-do compile } */
> +/* { dg-options "-mindirect-branch=thunk-extern -mfunction-return=thunk-extern -mindirect-branch-table -O2" } */
> +
> +int a;
> +int b (void);
> +void c (int);
> +
> +static void
> +put_page (void)
> +{
> +  if (b ())
> +c (a);
> +}
> +
> +__attribute__ ((__section__ (".init.text"), __cold__)) void
> +d (void)
> +{
> +  put_page ();
> +  put_page ();
> +}



Re: [PATCH v2 1/8] rs6000: More factoring of overload processing

2022-02-01 Thread Segher Boessenkool
On Tue, Feb 01, 2022 at 08:49:34AM -0600, Bill Schmidt wrote:
> I've modified the previous patch to add more explanatory commentary about
> the number-of-arguments test that was previously confusing, and to convert
> the switch into an if-then-else chain.  The rest of the patch is unchanged.
> Bootstrapped and tested on powerpc64le-linux-gnu.  Is this okay for trunk?

> gcc/
>   * config/rs6000/rs6000-c.cc (resolve_vec_mul): Accept args and types
>   parameters instead of arglist and nargs.  Simplify accordingly.  Remove
>   unnecessary test for argument count mismatch.
>   (resolve_vec_cmpne): Likewise.
>   (resolve_vec_adde_sube): Likewise.
>   (resolve_vec_addec_subec): Likewise.
>   (altivec_resolve_overloaded_builtin): Move overload special handling
>   after the gathering of arguments into args[] and types[] and the test
>   for correct number of arguments.  Don't perform the test for correct
>   number of arguments for certain special cases.  Call the other special
>   cases with args and types instead of arglist and nargs.

> +  if (fcode != RS6000_OVLD_VEC_PROMOTE
> +  && fcode != RS6000_OVLD_VEC_SPLATS
> +  && fcode != RS6000_OVLD_VEC_EXTRACT
> +  && fcode != RS6000_OVLD_VEC_INSERT
> +  && fcode != RS6000_OVLD_VEC_STEP
> +  && (!VOID_TYPE_P (TREE_VALUE (fnargs)) || n < nargs))
>  return NULL;

Please don't do De Morgan manually, let the compiler deal with it?
Although even with that the logic is as clear as mud.  This matters if
someone (maybe even you) will have to debug this later, or modify this.
Maybe adding some suitably named variables can clarify things here?
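
One possible shape for that, as a sketch only: the enum, both helper
names and the params_left flag are hypothetical stand-ins (params_left
plays the role of !VOID_TYPE_P (TREE_VALUE (fnargs)) in the patch), and
reading the two sub-conditions as "too few" and "too many" arguments is
an assumption about the intent.

```cpp
#include <cassert>

// Hypothetical stand-ins for the RS6000_OVLD_* codes in the patch.
enum overload_code
{
  OVLD_VEC_PROMOTE,
  OVLD_VEC_SPLATS,
  OVLD_VEC_EXTRACT,
  OVLD_VEC_INSERT,
  OVLD_VEC_STEP,
  OVLD_VEC_MUL
};

// The special cases that validate their own argument lists.
inline bool
checks_own_arg_count(overload_code fcode)
{
  return (fcode == OVLD_VEC_PROMOTE
          || fcode == OVLD_VEC_SPLATS
          || fcode == OVLD_VEC_EXTRACT
          || fcode == OVLD_VEC_INSERT
          || fcode == OVLD_VEC_STEP);
}

// Same truth table as the quoted condition, with each part named.
inline bool
reject_for_arg_count(overload_code fcode, bool params_left,
                     unsigned n, unsigned nargs)
{
  const bool too_few_args = params_left;   // parameters left unmatched
  const bool too_many_args = n < nargs;    // arguments left unmatched
  const bool arg_count_mismatch = too_few_args || too_many_args;
  return !checks_own_arg_count(fcode) && arg_count_mismatch;
}
```

The negated helper replaces the hand-expanded chain of != tests, and
each half of the mismatch test gets a name that says what it detects.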

> +  if (fcode == RS6000_OVLD_VEC_MUL)
> +returned_expr = resolve_vec_mul (&res, args, types, loc);
> +  else if (fcode == RS6000_OVLD_VEC_CMPNE)
> +returned_expr = resolve_vec_cmpne (&res, args, types, loc);
> +  else if (fcode == RS6000_OVLD_VEC_ADDE || fcode == RS6000_OVLD_VEC_SUBE)
> +returned_expr = resolve_vec_adde_sube (&res, fcode, args, types, loc);
> +  else if (fcode == RS6000_OVLD_VEC_ADDEC || fcode == RS6000_OVLD_VEC_SUBEC)
> +returned_expr = resolve_vec_addec_subec (&res, fcode, args, types, loc);
> +  else if (fcode == RS6000_OVLD_VEC_SPLATS || fcode == RS6000_OVLD_VEC_PROMOTE)
> +returned_expr = resolve_vec_splats (&res, fcode, arglist, nargs);
> +  else if (fcode == RS6000_OVLD_VEC_EXTRACT)
> +returned_expr = resolve_vec_extract (&res, arglist, nargs, loc);
> +  else if (fcode == RS6000_OVLD_VEC_INSERT)
> +returned_expr = resolve_vec_insert (&res, arglist, nargs, loc);
> +  else if (fcode == RS6000_OVLD_VEC_STEP)
> +returned_expr = resolve_vec_step (&res, arglist, nargs);
> +
> +  if (res == resolved)
> +return returned_expr;

This is so convoluted because the functions do two things, and have two
return values (res and returned_expr).
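
For what it is worth, one way to fold the two results into a single
return value could look like the sketch below.  Everything in it is
hypothetical: resolution, resolve_result and resolve_mul are invented
names, and expr merely stands in for GCC's tree type.

```cpp
#include <cassert>

// Hypothetical status values; the patch itself only shows `resolved`.
enum class resolution { unresolved, resolved };

// Hypothetical stand-in for GCC's `tree`.
using expr = const char *;

// Status and expression travel together instead of one being returned
// and the other written through an out-parameter.
struct resolve_result
{
  resolution status;
  expr value;   // meaningful only when status == resolution::resolved
};

// Toy resolver: "resolves" only when the operand types match.
inline resolve_result
resolve_mul(bool types_match)
{
  if (types_match)
    return { resolution::resolved, "mul-expr" };
  return { resolution::unresolved, nullptr };
}
```

A caller would then test result.status once and use result.value,
rather than comparing res after every call in the if-else chain.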


Segher


[committed] libstdc++: Improve config output for --enable-cstdio [PR104301]

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


Currently we just print "checking for underlying I/O to use... stdio"
unconditionally, whether configured to use stdio_pure or stdio_posix. We
should make it clear that the user's configure option chose the right
thing.

libstdc++-v3/ChangeLog:

PR libstdc++/104301
* acinclude.m4 (GLIBCXX_ENABLE_CSTDIO): Print different messages
for stdio_pure and stdio_posix options.
* configure: Regenerate.
---
 libstdc++-v3/acinclude.m4 | 4 +++-
 libstdc++-v3/configure| 7 +--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 32638e6bfc5..066453e2148 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2830,11 +2830,13 @@ AC_DEFUN([GLIBCXX_ENABLE_CSTDIO], [
   CSTDIO_H=config/io/c_io_stdio.h
   BASIC_FILE_H=config/io/basic_file_stdio.h
   BASIC_FILE_CC=config/io/basic_file_stdio.cc
-  AC_MSG_RESULT(stdio)
 
   if test "x$enable_cstdio" = "xstdio_pure" ; then
+   AC_MSG_RESULT([stdio (without POSIX read/write)])
AC_DEFINE(_GLIBCXX_USE_STDIO_PURE, 1,
  [Define to restrict std::__basic_file<> to stdio APIs.])
+  else
+   AC_MSG_RESULT([stdio (with POSIX read/write)])
   fi
   ;;
   esac



[committed] libstdc++: Fix doxygen comment for filesystem::perms operators

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


libstdc++-v3/ChangeLog:

* include/bits/fs_fwd.h (filesystem::perms): Fix comment.
---
 libstdc++-v3/include/bits/fs_fwd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/fs_fwd.h b/libstdc++-v3/include/bits/fs_fwd.h
index 75aaf554a05..bc063761080 100644
--- a/libstdc++-v3/include/bits/fs_fwd.h
+++ b/libstdc++-v3/include/bits/fs_fwd.h
@@ -160,7 +160,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   };
 
   /// @{
-  /// @relates perm_options
+  /// @relates perms
   constexpr perms
   operator&(perms __x, perms __y) noexcept
   {
-- 
2.34.1



[committed] libstdc++: Reset filesystem::recursive_directory_iterator on error

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


The standard requires directory iterators to become equal to the end
iterator value if they report an error. Some member functions of
filesystem::recursive_directory_iterator fail to do that.
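
For illustration, this is the kind of caller that relies on the
requirement.  It is a sketch assuming a C++17 std::filesystem
implementation; count_entries is a made-up helper, not part of the
patch or the testsuite.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Count the entries below ROOT, stopping at the first error.  On error
// EC is set and the iterator is expected to compare equal to the end
// iterator, so testing either condition ends the loop.
inline std::size_t
count_entries(const fs::path& root, std::error_code& ec)
{
  std::size_t n = 0;
  fs::recursive_directory_iterator it(root, ec);
  const fs::recursive_directory_iterator end;
  while (!ec && it != end)
    {
      ++n;
      it.increment(ec);
    }
  return n;
}
```

Code that only tests it != end, without also looking at ec, is what
breaks when an iterator reports an error but stays dereferenceable.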

libstdc++-v3/ChangeLog:

* src/c++17/fs_dir.cc (recursive_directory_iterator::increment):
Reset state to past-the-end iterator on error.
(fs::recursive_directory_iterator::pop(error_code&)): Likewise.
(fs::recursive_directory_iterator::pop()): Check _M_dirs before
it might get reset.
* src/filesystem/dir.cc (recursive_directory_iterator): Likewise,
for the TS implementation.
* testsuite/27_io/filesystem/iterators/error_reporting.cc: New test.
* testsuite/experimental/filesystem/iterators/error_reporting.cc: New test.
---
 libstdc++-v3/src/c++17/fs_dir.cc  |  12 +-
 libstdc++-v3/src/filesystem/dir.cc|  12 +-
 .../filesystem/iterators/error_reporting.cc   | 135 +
 .../filesystem/iterators/error_reporting.cc   | 136 ++
 4 files changed, 291 insertions(+), 4 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
 create mode 100644 libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc

diff --git a/libstdc++-v3/src/c++17/fs_dir.cc b/libstdc++-v3/src/c++17/fs_dir.cc
index e050304c19a..149a8b0740c 100644
--- a/libstdc++-v3/src/c++17/fs_dir.cc
+++ b/libstdc++-v3/src/c++17/fs_dir.cc
@@ -311,6 +311,10 @@ fs::recursive_directory_iterator::increment(error_code& ec)
  return *this;
}
 }
+
+  if (ec)
+_M_dirs.reset();
+
   return *this;
 }
 
@@ -334,16 +338,20 @@ fs::recursive_directory_iterator::pop(error_code& ec)
ec.clear();
return;
   }
-  } while (!_M_dirs->top().advance(skip_permission_denied, ec));
+  } while (!_M_dirs->top().advance(skip_permission_denied, ec) && !ec);
+
+  if (ec)
+_M_dirs.reset();
 }
 
 void
 fs::recursive_directory_iterator::pop()
 {
+  const bool dereferenceable = _M_dirs != nullptr;
   error_code ec;
   pop(ec);
   if (ec)
-_GLIBCXX_THROW_OR_ABORT(filesystem_error(_M_dirs
+_GLIBCXX_THROW_OR_ABORT(filesystem_error(dereferenceable
  ? "recursive directory iterator cannot pop"
  : "non-dereferenceable recursive directory iterator cannot pop",
  ec));
diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc
index d5b11f25297..ac9e70da516 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -298,6 +298,10 @@ fs::recursive_directory_iterator::increment(error_code& ec) noexcept
  return *this;
}
 }
+
+  if (ec)
+_M_dirs.reset();
+
   return *this;
 }
 
@@ -321,16 +325,20 @@ fs::recursive_directory_iterator::pop(error_code& ec)
ec.clear();
return;
   }
-  } while (!_M_dirs->top().advance(skip_permission_denied, ec));
+  } while (!_M_dirs->top().advance(skip_permission_denied, ec) && !ec);
+
+  if (ec)
+_M_dirs.reset();
 }
 
 void
 fs::recursive_directory_iterator::pop()
 {
+  const bool dereferenceable = _M_dirs != nullptr;
   error_code ec;
   pop(ec);
   if (ec)
-_GLIBCXX_THROW_OR_ABORT(filesystem_error(_M_dirs
+_GLIBCXX_THROW_OR_ABORT(filesystem_error(dereferenceable
  ? "recursive directory iterator cannot pop"
  : "non-dereferenceable recursive directory iterator cannot pop",
  ec));
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc b/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
new file mode 100644
index 000..81ef1069367
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
@@ -0,0 +1,135 @@
+// Copyright (C) 2020-2022 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target { c++17 } } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int choice;
+
+struct dirent global_dirent;
+
+extern "C" struct dirent* readdir(DIR*)
+{
+  switch (choice)
+  {
+  case 1:
+global_dirent.d_ino = 999;
+global_dirent.d_type = DT_REG;
+global_dirent.d_reclen = 0;
+   

[committed] libstdc++: Add more tests for filesystem directory iterators

2022-02-01 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, pushed to trunk.


The PR 97731 test was added to verify a fix to the Filesystem TS code,
but we should also have the same test to avoid similar regressions in
the C++17 std::filesystem code.

Also add tests for directory_options::follow_directory_symlink
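
For readers unfamiliar with the option, here is a rough sketch of what follow_directory_symlink changes, mirroring the layout used by the new test06 below.  The directory name under the temporary path and the helper names are made up for illustration, and the sketch assumes a POSIX-like system where directory symlinks can be created.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// Count every entry reachable from root with the given options.
std::size_t count_entries(const fs::path& root, fs::directory_options opts)
{
  fs::recursive_directory_iterator it(root, opts), endit;
  return static_cast<std::size_t>(std::distance(it, endit));
}

// Build root/d1/d2 plus a relative symlink root/link -> d1, then count
// the entries with and without following directory symlinks.
bool symlink_counts_match_expectation()
{
  // Hypothetical scratch directory name, chosen only for this sketch.
  const fs::path root = fs::temp_directory_path() / "rdi_follow_symlink_demo";
  std::error_code ec;
  fs::remove_all(root, ec);                 // start from a clean slate
  fs::create_directories(root / "d1" / "d2");
  fs::create_directory_symlink("d1", root / "link");
  assert(fs::is_symlink(root / "link"));

  const std::size_t plain = count_entries(root, fs::directory_options::none);
  const std::size_t followed =
    count_entries(root, fs::directory_options::follow_directory_symlink);

  fs::remove_all(root, ec);
  // d1, d1/d2, link   versus   d1, d1/d2, link, link/d2
  return plain == 3 && followed == 4;
}
```

The symlink target is relative ("d1"), so it resolves against the directory containing the link, as in the test.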

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/iterators/97731.cc: New test.
* testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc:
Check follow_directory_symlink option.
* 
testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc:
Likewise.
---
 .../27_io/filesystem/iterators/97731.cc   | 48 +++
 .../iterators/recursive_directory_iterator.cc | 19 
 .../iterators/recursive_directory_iterator.cc | 21 +++-
 3 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/filesystem/iterators/97731.cc

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/97731.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/97731.cc
new file mode 100644
index 000..9021e6edf41
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/97731.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2020-2022 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target c++17 } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+#include 
+#include 
+
+bool used_custom_readdir = false;
+
+extern "C" void* readdir(void*)
+{
+  used_custom_readdir = true;
+  errno = EIO;
+  return nullptr;
+}
+
+void
+test01()
+{
+  using std::filesystem::recursive_directory_iterator;
+  std::error_code ec;
+  recursive_directory_iterator it(".", ec);
+  if (used_custom_readdir)
+VERIFY( ec.value() == EIO );
+}
+
+int
+main()
+{
+  test01();
+}
diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
index b0ccdfe3d73..e67e2cf38f7 100644
--- 
a/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
+++ 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/recursive_directory_iterator.cc
@@ -184,6 +184,24 @@ test05()
   remove_all(p, ec);
 }
 
+void
+test06()
+{
+#if !(defined __MINGW32__ || defined __MINGW64__)
+  auto p = __gnu_test::nonexistent_path();
+  create_directories(p/"d1/d2");
+  create_directory_symlink("d1", p/"link");
+  fs::recursive_directory_iterator it(p), endit;
+  VERIFY( std::distance(it, endit) == 3 ); // d1 and d2 and link
+
+  it = fs::recursive_directory_iterator(p, fs::directory_options::follow_directory_symlink);
+  VERIFY( std::distance(it, endit) == 4 ); // d1 and d1/d2 and link and link/d2
+
+  std::error_code ec;
+  remove_all(p, ec);
+#endif
+}
+
 int
 main()
 {
@@ -192,4 +210,5 @@ main()
   test03();
   test04();
   test05();
+  test06();
 }
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
index 455dbf28c8c..a201415921c 100644
--- 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/recursive_directory_iterator.cc
@@ -174,7 +174,7 @@ test05()
 {
   auto p = __gnu_test::nonexistent_path();
   create_directory(p);
-  create_directory_symlink(p, p / "l");
+  create_directory(p / "x");
   fs::recursive_directory_iterator it(p), endit;
   VERIFY( begin(it) == it );
   static_assert( noexcept(begin(it)), "begin is noexcept" );
@@ -185,6 +185,24 @@ test05()
   remove_all(p, ec);
 }
 
+void
+test06()
+{
+#if !(defined __MINGW32__ || defined __MINGW64__)
+  auto p = __gnu_test::nonexistent_path();
+  create_directories(p/"d1/d2");
+  create_directory_symlink("d1", p/"link");
+  fs::recursive_directory_iterator it(p), endit;
+  VERIFY( std::distance(it, endit) == 3 ); // d1 and d2 and link
+
+  it = fs::recursive_directory_iterator(p, fs::directory_options::follow_directory_symlink);
+  VERIFY( std::distance(it, endit) == 4 ); // d1 and d1/d2 and link and link/d2
+
+  std::error_code ec;
+  remove_all(p, ec);
+#endif
+}
+
 int
 main()
 {
@@ -193,4 +211,5 @@ main()
   test03();
   test

Re: [PATCH v2 3/8] rs6000: Unify error messages for built-in constant restrictions

2022-02-01 Thread Segher Boessenkool
On Tue, Feb 01, 2022 at 08:53:52AM -0600, Bill Schmidt wrote:
> As discussed, I simplified this patch by just changing how the error
> message is produced:
> 
> We currently give different error messages for built-in functions that
> violate range restrictions on their arguments, depending on whether we
> record them as requiring an n-bit literal or a literal between two values.
> It's better to be consistent.  Change the error message for the n-bit
> literal to look like the other one.

> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -5729,8 +5729,10 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
> subtarget */,
>   if (!(TREE_CODE (restr_arg) == INTEGER_CST
> && (TREE_INT_CST_LOW (restr_arg) & ~mask) == 0))
> {
> - error ("argument %d must be a %d-bit unsigned literal",
> -bifaddr->restr_opnd[i], bifaddr->restr_val1[i]);
> + unsigned p = ((unsigned) 1 << bifaddr->restr_val1[i]) - 1;

Don't write
  (unsigned) 1
but instead just
  1U
please.  Does it need to be 1ULL btw?  (You need this if restr_val1[i]
can be greater than or equal to 32).  (32 itself would work, but is UB
nevertheless).

Okay with that.  Thanks!


Segher
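
To illustrate the point about the shift width: with a 32-bit unsigned, a shift count of 32 or more is undefined, so the upper bound is safer computed in a 64-bit type.  The helper below is hypothetical and not part of rs6000-call.cc; it is only a sketch of the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the largest value that fits in an n-bit unsigned
// literal, computed in a 64-bit type.
constexpr std::uint64_t max_nbit_value(unsigned n_bits)
{
  // A shift by the full width of the type is undefined, so 64 (and above)
  // is handled separately instead of being shifted.
  return n_bits >= 64 ? ~std::uint64_t{0}
                      : (std::uint64_t{1} << n_bits) - 1;
}

static_assert(max_nbit_value(5) == 31, "5-bit literals span 0..31");

// Runtime spot checks, including the width where a 32-bit 1U would break.
inline void check_examples()
{
  assert(max_nbit_value(1) == 1);
  assert(max_nbit_value(32) == 0xffffffffu);
}
```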


[PATCH] PR fortran/104311 - [9/10/11/12 Regression] ICE out of memory since r9-6321-g4716603bf875ce

2022-02-01 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

a check in gfc_calculate_transfer_sizes had a bug in the logic:
it did not trigger for MOLD having a storage size zero when
argument SIZE was present.  The attached obvious patch fixes this.

Regtested on x86_64-pc-linux-gnu.  OK for mainline/11/10/9?

Thanks,
Harald

From b8f124adcf258eccd29ffcec5cc3f8915cc2ca47 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Tue, 1 Feb 2022 23:33:24 +0100
Subject: [PATCH] Fortran: reject simplifying TRANSFER for MOLD with storage
 size 0

gcc/fortran/ChangeLog:

	PR fortran/104311
	* check.cc (gfc_calculate_transfer_sizes): Checks for case when
	storage size of SOURCE is greater than zero while the storage size
	of MOLD is zero and MOLD is an array shall not depend on SIZE.

gcc/testsuite/ChangeLog:

	PR fortran/104311
	* gfortran.dg/transfer_simplify_15.f90: New test.
---
 gcc/fortran/check.cc   |  2 +-
 gcc/testsuite/gfortran.dg/transfer_simplify_15.f90 | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/transfer_simplify_15.f90

diff --git a/gcc/fortran/check.cc b/gcc/fortran/check.cc
index d6c6767ae9e..fc97bb1371e 100644
--- a/gcc/fortran/check.cc
+++ b/gcc/fortran/check.cc
@@ -6150,7 +6150,7 @@ gfc_calculate_transfer_sizes (gfc_expr *source, gfc_expr *mold, gfc_expr *size,
* representation is not shorter than that of SOURCE.
* If SIZE is present, the result is an array of rank one and size SIZE.
*/
-  if (result_elt_size == 0 && *source_size > 0 && !size
+  if (result_elt_size == 0 && *source_size > 0
   && (mold->expr_type == EXPR_ARRAY || mold->rank))
 {
   gfc_error ("% argument of % intrinsic at %L is an "
diff --git a/gcc/testsuite/gfortran.dg/transfer_simplify_15.f90 b/gcc/testsuite/gfortran.dg/transfer_simplify_15.f90
new file mode 100644
index 000..cdbec97ae71
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/transfer_simplify_15.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! PR fortran/104311 - ICE out of memory
+! Contributed by G.Steinmetz
+
+program p
+  type t
+  end type
+  type(t) :: x(2)
+  print *, transfer(1,x,2)   ! { dg-error "shall not have storage size 0" }
+  print *, transfer(1,x,huge(1)) ! { dg-error "shall not have storage size 0" }
+end
--
2.34.1
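
The logic change can be modelled outside the compiler as a pair of predicates.  This is only a sketch: the struct and function names are invented, and the SOURCE size of 4 assumes a 4-byte default integer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the values gfc_calculate_transfer_sizes inspects.
struct TransferQuery {
  std::size_t result_elt_size;  // storage size of one MOLD element
  std::size_t source_size;      // storage size of SOURCE
  bool mold_is_array;           // MOLD is an array constructor or has rank
  bool size_present;            // the optional SIZE argument was given
};

// Condition before the patch: skipped whenever SIZE was present.
constexpr bool rejected_before(const TransferQuery& q)
{
  return q.result_elt_size == 0 && q.source_size > 0 && !q.size_present
         && q.mold_is_array;
}

// Condition after the patch: the presence of SIZE no longer matters.
constexpr bool rejected_after(const TransferQuery& q)
{
  return q.result_elt_size == 0 && q.source_size > 0 && q.mold_is_array;
}

// Models transfer(1,x,2) from the new test: x is an array of an empty type.
constexpr TransferQuery pr104311{0, 4, true, true};
static_assert(!rejected_before(pr104311), "old check let this through");
static_assert(rejected_after(pr104311), "new check diagnoses it");

// Without SIZE both versions agree, so existing diagnostics are unchanged.
inline void check_no_size_case()
{
  const TransferQuery no_size{0, 4, true, false};
  assert(rejected_before(no_size) && rejected_after(no_size));
}
```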



Re: [PATCH 5/8] rs6000: Fix LE code gen for vec_cnt[lt]z_lsbb [PR95082]

2022-02-01 Thread Segher Boessenkool
On Fri, Jan 28, 2022 at 11:50:23AM -0600, Bill Schmidt wrote:
> These built-ins were misimplemented as always having big-endian semantics.

>   PR target/95082
>   * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Handle
>   endianness for vclzlsbb and vctzlsbb.
>   * config/rs6000/rs6000-builtins.def (VCLZLSBB_V16QI): Change
>   default pattern and indicate a different pattern will be used for
>   big endian.
>   (VCLZLSBB_V4SI): Likewise.
>   (VCLZLSBB_V8HI): Likewise.
>   (VCTZLSBB_V16QI): Likewise.
>   (VCTZLSBB_V4SI): Likewise.
>   (VCTZLSBB_V8HI): Likewise.

Please don't break lines early?  Changelog lines are one tab followed
by 72 chars.

The patch is fine though.  Thanks!


Segher


[PATCH] [COMMITTED] Change multiprecision.org to use https

2022-02-01 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

As reported at
https://gcc.gnu.org/pipermail/gcc/2022-February/238216.html,
multiprecision.org now uses https so this updates the documentation
to use https instead of http.

Committed as obvious.

gcc/ChangeLog:

* doc/install.texi:
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index dae7c0acc36..f8898af027d 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -406,7 +406,7 @@ download_prerequisites installs.
 @item MPC Library version 1.0.1 (or later)
 
 Necessary to build GCC@.  It can be downloaded from
-@uref{http://www.multiprecision.org/mpc/}.  If an MPC source distribution
+@uref{https://www.multiprecision.org/mpc/}.  If an MPC source distribution
 is found in a subdirectory of your GCC sources named @file{mpc}, it
 will be built together with GCC.  Alternatively, if MPC is already
 installed but it is not in your default library search path, the
-- 
2.17.1



[PATCH] adjust warn-access pass placement [PR104260]

2022-02-01 Thread Martin Sebor via Gcc-patches

The attached patch adjusts the placement of the warn-access pass
as the two of you suggested in the bug.  Please let me know if
this is good to commit or if you want me to make some other tweaks.

The patch passes tests in an x86_64-linux bootstrap.

Martin

PR middle-end/104260 - Misplaced waccess3 pass

gcc/ChangeLog:

	PR 104260
	* passes.def (pass_warn_access): Adjust pass placement.

diff --git a/gcc/passes.def b/gcc/passes.def
index 3e75de46c23..f7718181038 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -60,10 +60,10 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_printf);
   NEXT_PASS (pass_warn_nonnull_compare);
   NEXT_PASS (pass_early_warn_uninitialized);
+  NEXT_PASS (pass_warn_access, /*early=*/true);
   NEXT_PASS (pass_ubsan);
   NEXT_PASS (pass_nothrow);
   NEXT_PASS (pass_rebuild_cgraph_edges);
-  NEXT_PASS (pass_warn_access, /*early=*/true);
   POP_INSERT_PASSES ()
 
   NEXT_PASS (pass_local_optimization_passes);
@@ -428,9 +428,9 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_harden_conditional_branches);
   NEXT_PASS (pass_harden_compares);
+  NEXT_PASS (pass_warn_access, /*early=*/false);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
-  NEXT_PASS (pass_warn_access, /*early=*/false);
 
   NEXT_PASS (pass_expand);
 


Re: [committed] libstdc++: Reset filesystem::recursive_directory_iterator on error

2022-02-01 Thread Jonathan Wakely via Gcc-patches
On Tue, 1 Feb 2022 at 22:00, Jonathan Wakely via Libstdc++
 wrote:
>
> The standard requires directory iterators to become equal to the end
> iterator value if they report an error. Some member functions of
> filesystem::recursive_directory_iterator fail to do that.
>
> libstdc++-v3/ChangeLog:
>
> * src/c++17/fs_dir.cc (recursive_directory_iterator::increment):
> Reset state to past-the-end iterator on error.
> (fs::recursive_directory_iterator::pop(error_code&)): Likewise.
> (fs::recursive_directory_iterator::pop()): Check _M_dirs before
> it might get reset.
> * src/filesystem/dir.cc (recursive_directory_iterator): Likewise,
> for the TS implementation.
> * testsuite/27_io/filesystem/iterators/error_reporting.cc: New test.
> * testsuite/experimental/filesystem/iterators/error_reporting.cc: New 
> test.
> ---
>  libstdc++-v3/src/c++17/fs_dir.cc  |  12 +-
>  libstdc++-v3/src/filesystem/dir.cc|  12 +-
>  .../filesystem/iterators/error_reporting.cc   | 135 +
>  .../filesystem/iterators/error_reporting.cc   | 136 ++
>  4 files changed, 291 insertions(+), 4 deletions(-)
>  create mode 100644 
> libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
>  create mode 100644 
> libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc
>
[...]
> +
> +extern "C" struct dirent* readdir(DIR*)
> +{
> +  switch (choice)
> +  {
> +  case 1:
> +global_dirent.d_ino = 999;
> +global_dirent.d_type = DT_REG;

These new tests should not use the d_type member unless it's actually
present on the OS. Fixed by the attached patch.

Tested x86_64-linux and mingw-w64, pushed to trunk.
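
As a small illustration of the requirement quoted above (an iterator that reports an error compares equal to the end iterator), the sketch below exercises only the constructor taking an error_code.  It does not reproduce the increment() or pop() failures that the original commit fixes, which need an interposed readdir as in the tests.  The path name is made up.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Try to iterate a directory that does not exist and report whether the
// iterator ended up equal to the end iterator with an error recorded.
bool failed_open_yields_end_iterator()
{
  // Hypothetical path, assumed not to be in use by anything else.
  const fs::path missing =
    fs::temp_directory_path() / "rdi_error_demo_should_not_exist";
  std::error_code ec;
  fs::remove_all(missing, ec);   // make sure the path really is absent
  assert(!fs::exists(missing));
  ec.clear();

  fs::recursive_directory_iterator it(missing, ec);
  return bool(ec) && it == fs::recursive_directory_iterator();
}
```
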
commit d98668eb06f532b2dbe0c721fa1b9ed6e643df27
Author: Jonathan Wakely 
Date:   Tue Feb 1 23:58:08 2022

libstdc++: Do not use dirent::d_type unconditionally

These new tests should not use the d_type member unless it's actually
present on the OS.

libstdc++-v3/ChangeLog:

* testsuite/27_io/filesystem/iterators/error_reporting.cc: Use
autoconf macro to check whether d_type is present.
* testsuite/experimental/filesystem/iterators/error_reporting.cc:
Likewise.

diff --git 
a/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
index 81ef1069367..1f297a731a3 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/error_reporting.cc
@@ -36,14 +36,18 @@ extern "C" struct dirent* readdir(DIR*)
   {
   case 1:
 global_dirent.d_ino = 999;
+#if defined _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE && defined DT_REG
 global_dirent.d_type = DT_REG;
+#endif
 global_dirent.d_reclen = 0;
 std::char_traits<char>::copy(global_dirent.d_name, "file", 5);
 choice = 0;
 return &global_dirent;
   case 2:
 global_dirent.d_ino = 111;
+#if defined _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE && defined DT_DIR
 global_dirent.d_type = DT_DIR;
+#endif
 global_dirent.d_reclen = 60;
 std::char_traits<char>::copy(global_dirent.d_name, "subdir", 7);
 choice = 1;
diff --git 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc
index ade62732028..806c511ebef 100644
--- 
a/libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc
+++ 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/error_reporting.cc
@@ -37,14 +37,18 @@ extern "C" struct dirent* readdir(DIR*)
   {
   case 1:
 global_dirent.d_ino = 999;
+#if defined _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE && defined DT_REG
 global_dirent.d_type = DT_REG;
+#endif
 global_dirent.d_reclen = 0;
 std::char_traits<char>::copy(global_dirent.d_name, "file", 5);
 choice = 0;
 return &global_dirent;
   case 2:
 global_dirent.d_ino = 111;
+#if defined _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE && defined DT_DIR
 global_dirent.d_type = DT_DIR;
+#endif
 global_dirent.d_reclen = 60;
 std::char_traits<char>::copy(global_dirent.d_name, "subdir", 7);
 choice = 1;
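
The guard pattern in the patch above can be shown in isolation.  In this sketch the struct, the macro DEMO_HAVE_DIRENT_D_TYPE and the constant are all invented stand-ins (the real test uses struct dirent and the library's configure-time macro); it only shows the shape of the conditional assignment.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical configuration macro standing in for the autoconf result;
// define it to 0 to model an OS whose dirent has no d_type member.
#ifndef DEMO_HAVE_DIRENT_D_TYPE
# define DEMO_HAVE_DIRENT_D_TYPE 1
#endif

// Hypothetical, cut-down directory entry.
struct demo_dirent {
  unsigned long d_ino;
#if DEMO_HAVE_DIRENT_D_TYPE
  unsigned char d_type;
#endif
  char d_name[16];
};

// Fill in a fake regular-file entry, touching d_type only when it exists.
demo_dirent make_file_entry()
{
  demo_dirent e{};
  e.d_ino = 999;
#if DEMO_HAVE_DIRENT_D_TYPE
  e.d_type = 8;                      // illustrative value only
#endif
  std::memcpy(e.d_name, "file", 5);  // 4 characters plus the terminator
  assert(std::strlen(e.d_name) == 4);
  return e;
}
```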


Re: [PATCH] declare std::array members attribute const [PR101831]

2022-02-01 Thread Martin Sebor via Gcc-patches

On 2/1/22 12:48, Jonathan Wakely wrote:
> On Tue, 1 Feb 2022 at 18:54, Martin Sebor via Libstdc++
>  wrote:
>>
>> Passing an uninitialized object to a function that takes its argument
>> by const reference is diagnosed by -Wmaybe-uninitialized because most
>> such functions read the argument.  The exceptions are functions that
>> don't access the object but instead use its address to compute
>> a result.  This includes a number of std::array member functions such
>> as std::array::size() which returns the template argument N.  Such
>> functions may be candidates for attribute const which also avoids
>> the warning.  The attribute typically only benefits extern functions
>> that IPA cannot infer the property from, but in this case it helps
>> avoid the warning which runs very early on, even without optimization
>> or inlining.  The attached patch adds the attribute to a subset of
>> those member functions of std::array.  (It doesn't add it to const
>> member functions like cbegin() or front() that return a const_iterator
>> or const reference to the internal data.)
>>
>> It might be possible to infer this property from inline functions
>> earlier on than during IPA and avoid having to annotate them explicitly.
>> That seems like an enhancement worth considering in the future.
>>
>> Tested on x86_64-linux.
>>
>> Martin
>
> new file mode 100644
> index 000..b7743adf3c9
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/23_containers/array/iterators/begin_end.cc
> @@ -0,0 +1,56 @@
> +// { dg-do compile { target c++11 } }
> +//
> +// Copyright (C) 2011-2022 Free Software Foundation, Inc.
>
> Those dates look wrong. I no longer bother putting a license text and
> copyright notice on simple tests like this. It's meaningless to assert
> copyright on something so trivial that doesn't do anything.
>

Should I take that to mean that you're okay with the rest of the change
(i.e., with the notice removed)?

Martin


Re: [PATCH] declare std::array members attribute const [PR101831]

2022-02-01 Thread Jonathan Wakely via Gcc-patches
On Wed, 2 Feb 2022 at 00:13, Martin Sebor  wrote:
>
> On 2/1/22 12:48, Jonathan Wakely wrote:
> > On Tue, 1 Feb 2022 at 18:54, Martin Sebor via Libstdc++
> >  wrote:
> >>
> >> Passing an uninitialized object to a function that takes its argument
> >> by const reference is diagnosed by -Wmaybe-uninitialized because most
> >> such functions read the argument.  The exceptions are functions that
> >> don't access the object but instead use its address to compute
> >> a result.  This includes a number of std::array member functions such
> >> as std::array::size() which returns the template argument N.  Such
> >> functions may be candidates for attribute const which also avoids
> >> the warning.  The attribute typically only benefits extern functions
> >> that IPA cannot infer the property from, but in this case it helps
> >> avoid the warning which runs very early on, even without optimization
> >> or inlining.  The attached patch adds the attribute to a subset of
> >> those member functions of std::array.  (It doesn't add it to const
> >> member functions like cbegin() or front() that return a const_iterator
> >> or const reference to the internal data.)
> >>
> >> It might be possible to infer this property from inline functions
> >> earlier on than during IPA and avoid having to annotate them explicitly.
> >> That seems like an enhancement worth considering in the future.
> >>
> >> Tested on x86_64-linux.
> >>
> >> Martin
> >
> > new file mode 100644
> > index 000..b7743adf3c9
> > --- /dev/null
> > +++ b/libstdc++-v3/testsuite/23_containers/array/iterators/begin_end.cc
> > @@ -0,0 +1,56 @@
> > +// { dg-do compile { target c++11 } }
> > +//
> > +// Copyright (C) 2011-2022 Free Software Foundation, Inc.
> >
> > Those dates look wrong. I no longer bother putting a license text and
> > copyright notice on simple tests like this. It's meaningless to assert
> > copyright on something so trivial that doesn't do anything.
> >
>
> Should I take that to mean that you're okay with the rest of the change
> (i.e., with the notice removed)?

Yes, OK for trunk either with the notice entirely removed, or just fix
the dates (I don't think it is copied from an existing test dating
from 2011, right?)

Whichever you prefer.



[PATCH 0/5] A few CRIS port improvements

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
I'm taking advantage of CRIS being a less important target and as
such not subject to the constraints of GCC being in stage 4.
I'm applying this set of CRIS-specific changes that don't have much
expected effect on generated code.

1: cris: Don't default to -mmul-bug-workaround
Avoid the workaround having a sort-of pseudorandom effect
on observability of code-quality changes.

2: cris: For expanded movsi, don't match operands we know will be
reloaded
A random improvement noticed while looking at the performance impact
of the other changes, helping reload to behave better.

3:  cris: Remove CRIS v32 ACR artefacts
The bigger cleanup; tightening up register classes.  A left-over from
r11-220 / d0780379c1b6.  Unfortunately on its own has a negative
effect on performance, when applied in this order.  Don't apply
without the other patches in this set unless you're actually
interested in seeing the fallout.

4:  cris: Don't discriminate against ALL_REGS in TARGET_REGISTER_MOVE_COST
A workaround for a problem from before IRA, one that didn't fit well
with the register class cleanup.

5: cris: Reload using special-regs before general-regs
Fixing a flaw exposed by the register class cleanup.

 gcc/config/cris/constraints.md |  7 +-
 gcc/config/cris/cris.cc| 36 +-
 gcc/config/cris/cris.h | 46 --
 gcc/config/cris/cris.md| 33 +---
 gcc/doc/invoke.texi|  2 +-
 5 files changed, 70 insertions(+), 54 deletions(-)

-- 
2.30.2

brgds, H-P


[PATCH 1/5] cris: Don't default to -mmul-bug-workaround

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
This flips the default for the errata handling for an old version
(TL;DR: workaround: no multiply instruction last on a cache-line).
Newer versions of the CRIS cpu don't have that bug.  While the impact
of the workaround is very marginal (coremark: less than .05% larger,
less than .0005% slower) it's an irritating pseudorandom factor when
assessing the impact of other changes.

Also, fix a wart requiring changes to more than TARGET_DEFAULT to flip
the default.

People building old kernels or operating systems to run on
ETRAX 100 LX are advised to pass "-mmul-bug-workaround".

gcc:
* config/cris/cris.h (TARGET_DEFAULT): Don't include MASK_MUL_BUG.
(MUL_BUG_ASM_DEFAULT): New macro.
(MAYBE_AS_NO_MUL_BUG_ABORT): Define in terms of MUL_BUG_ASM_DEFAULT.
* doc/invoke.texi (CRIS Options, -mmul-bug-workaround): Adjust
accordingly.
---
 gcc/config/cris/cris.h | 19 ---
 gcc/doc/invoke.texi|  2 +-
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/config/cris/cris.h b/gcc/config/cris/cris.h
index b274e1166203..9245d7886929 100644
--- a/gcc/config/cris/cris.h
+++ b/gcc/config/cris/cris.h
@@ -153,7 +153,9 @@ extern int cris_cpu_version;
 
 #ifdef HAVE_AS_NO_MUL_BUG_ABORT_OPTION
 #define MAYBE_AS_NO_MUL_BUG_ABORT \
- "%{mno-mul-bug-workaround:-no-mul-bug-abort} "
+ "%{mno-mul-bug-workaround:-no-mul-bug-abort} " \
+ "%{mmul-bug-workaround:-mul-bug-abort} " \
+ "%{!mmul-bug-workaround:%{!mno-mul-bug-workaround:" MUL_BUG_ASM_DEFAULT "}} "
 #else
 #define MAYBE_AS_NO_MUL_BUG_ABORT
 #endif
@@ -255,15 +257,26 @@ extern int cris_cpu_version;
  (MASK_SIDE_EFFECT_PREFIXES + MASK_STACK_ALIGN \
   + MASK_CONST_ALIGN + MASK_DATA_ALIGN \
   + MASK_ALIGN_BY_32 \
-  + MASK_PROLOGUE_EPILOGUE + MASK_MUL_BUG)
+  + MASK_PROLOGUE_EPILOGUE)
 # else  /* 0 */
 #  define TARGET_DEFAULT \
  (MASK_SIDE_EFFECT_PREFIXES + MASK_STACK_ALIGN \
   + MASK_CONST_ALIGN + MASK_DATA_ALIGN \
-  + MASK_PROLOGUE_EPILOGUE + MASK_MUL_BUG)
+  + MASK_PROLOGUE_EPILOGUE)
 # endif
 #endif
 
+/* Don't depend on the assembler default setting for the errata machinery;
+   always pass the option to turn it on or off explicitly.  But, we have to
+   decide on which is the *GCC* default, and for that we should only need to
+   consider what's in TARGET_DEFAULT; no other changes should be necessary.  */
+
+#if (TARGET_DEFAULT & MASK_MUL_BUG)
+#define MUL_BUG_ASM_DEFAULT "-mul-bug-abort"
+#else
+#define MUL_BUG_ASM_DEFAULT "-no-mul-bug-abort"
+#endif
+
 /* Local, providing a default for cris_cpu_version.  */
 #define CRIS_DEFAULT_CPU_VERSION TARGET_CPU_DEFAULT
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cfd415110cdf..7af5c51cc3c7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -22268,7 +22268,7 @@ The options @option{-metrax4} and @option{-metrax100} 
are synonyms for
 @opindex mmul-bug-workaround
 @opindex mno-mul-bug-workaround
 Work around a bug in the @code{muls} and @code{mulu} instructions for CPU
-models where it applies.  This option is active by default.
+models where it applies.  This option is disabled by default.
 
 @item -mpdebug
 @opindex mpdebug
-- 
2.30.2
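
The way the assembler default now follows from TARGET_DEFAULT alone can be sketched with plain preprocessor logic.  The DEMO_ names and mask values are invented for illustration; only the two assembler option strings come from the patch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical mask values; the real ones come from the option machinery.
#define DEMO_MASK_PROLOGUE_EPILOGUE 0x1
#define DEMO_MASK_MUL_BUG           0x2

// After the patch the default no longer includes the mul-bug mask.
#define DEMO_TARGET_DEFAULT (DEMO_MASK_PROLOGUE_EPILOGUE)

// The assembler default is derived from the mask, so flipping the GCC
// default means editing DEMO_TARGET_DEFAULT and nothing else.
#if (DEMO_TARGET_DEFAULT & DEMO_MASK_MUL_BUG)
# define DEMO_MUL_BUG_ASM_DEFAULT "-mul-bug-abort"
#else
# define DEMO_MUL_BUG_ASM_DEFAULT "-no-mul-bug-abort"
#endif

enum class Workaround { unspecified, enabled, disabled };

// Mirrors the three spec-string cases: an explicit -mno-mul-bug-workaround,
// an explicit -mmul-bug-workaround, or neither (fall back to the default).
const char* assembler_option(Workaround w)
{
  switch (w)
    {
    case Workaround::disabled: return "-no-mul-bug-abort";
    case Workaround::enabled:  return "-mul-bug-abort";
    case Workaround::unspecified: break;
    }
  return DEMO_MUL_BUG_ASM_DEFAULT;
}

// With the mask absent from the default, "unspecified" means "off".
inline void check_default_is_off()
{
  assert(std::strcmp(assembler_option(Workaround::unspecified),
                     "-no-mul-bug-abort") == 0);
}
```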



Re: [PATCH] declare std::array members attribute const [PR101831]

2022-02-01 Thread Martin Sebor via Gcc-patches

On 2/1/22 17:15, Jonathan Wakely wrote:
> On Wed, 2 Feb 2022 at 00:13, Martin Sebor  wrote:
>>
>> On 2/1/22 12:48, Jonathan Wakely wrote:
>>> On Tue, 1 Feb 2022 at 18:54, Martin Sebor via Libstdc++
>>>  wrote:
>>>>
>>>> Passing an uninitialized object to a function that takes its argument
>>>> by const reference is diagnosed by -Wmaybe-uninitialized because most
>>>> such functions read the argument.  The exceptions are functions that
>>>> don't access the object but instead use its address to compute
>>>> a result.  This includes a number of std::array member functions such
>>>> as std::array::size() which returns the template argument N.  Such
>>>> functions may be candidates for attribute const which also avoids
>>>> the warning.  The attribute typically only benefits extern functions
>>>> that IPA cannot infer the property from, but in this case it helps
>>>> avoid the warning which runs very early on, even without optimization
>>>> or inlining.  The attached patch adds the attribute to a subset of
>>>> those member functions of std::array.  (It doesn't add it to const
>>>> member functions like cbegin() or front() that return a const_iterator
>>>> or const reference to the internal data.)
>>>>
>>>> It might be possible to infer this property from inline functions
>>>> earlier on than during IPA and avoid having to annotate them explicitly.
>>>> That seems like an enhancement worth considering in the future.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> Martin
>>>
>>> new file mode 100644
>>> index 000..b7743adf3c9
>>> --- /dev/null
>>> +++ b/libstdc++-v3/testsuite/23_containers/array/iterators/begin_end.cc
>>> @@ -0,0 +1,56 @@
>>> +// { dg-do compile { target c++11 } }
>>> +//
>>> +// Copyright (C) 2011-2022 Free Software Foundation, Inc.
>>>
>>> Those dates look wrong. I no longer bother putting a license text and
>>> copyright notice on simple tests like this. It's meaningless to assert
>>> copyright on something so trivial that doesn't do anything.
>>>
>>
>> Should I take that to mean that you're okay with the rest of the change
>> (i.e., with the notice removed)?
>
> Yes, OK for trunk either with the notice entirely removed, or just fix
> the dates (I don't think it is copied from an existing test dating
> from 2011, right?)

I copied it from 23_containers/array/iterators/end_is_one_past.cc
without even looking at the dates.

>
> Whichever you prefer.
>


Okay, pushed in r12-6992.

Martin
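
For anyone skimming the thread, a minimal sketch of the idea being discussed: a member function whose result depends only on a template argument can be marked with attribute const, and calling it on an uninitialized object reads nothing.  The fixed_buffer type and the DEMO_ATTR_CONST macro are hypothetical and are not how libstdc++ spells the annotation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical portability wrapper around the GNU attribute.
#if defined(__GNUC__)
# define DEMO_ATTR_CONST __attribute__((__const__))
#else
# define DEMO_ATTR_CONST
#endif

// Hypothetical container, loosely modelled on std::array.
template <typename T, std::size_t N>
struct fixed_buffer {
  T elems[N];

  // The result depends only on N, never on the contents of *this.
  DEMO_ATTR_CONST constexpr std::size_t size() const noexcept { return N; }
};

// Calls size() on an object whose elements were never written.
std::size_t size_of_uninitialized()
{
  fixed_buffer<int, 4> buf;   // elements deliberately left uninitialized
  const std::size_t n = buf.size();   // no element is read here
  assert(n == 4);
  return n;
}
```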


[PATCH 2/5] cris: For expanded movsi, don't match operands we know will be reloaded

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
In a session investigating unexpected fallout from a change, I
noticed reload needs one operand being a register to make an
informed decision.  It can happen that there's just a constant
and a memory operand, as in:

(insn 668 667 42 104 (parallel [
(set (mem:SI (plus:SI (reg/v/f:SI 347 [ fs ])
(const_int 168 [0xa8])) \
 [1 fs_126(D)->regs.cfa_how+0 S4 A8])
(const_int 2 [0x2]))
(clobber (reg:CC 19 dccr))
]) "<...>/gcc/libgcc/unwind-dw2.c":1121:21 22 {*movsi_internal}
 (expr_list:REG_UNUSED (reg:CC 19 dccr)
(nil)))

This was helpfully created by combine.  When this happens,
reload can't check for costs and preferred register classes,
(both operands will start with NO_REGS as the preferred class)
and will default to the constraints order in the insn in reload.
(Which also does its own temporary merge in find_reloads, but
that's a different story.)  Better not to match the simple cases.
Beware that subregs have to be matched.

I'm doing this just for word_mode (SI) for now, but may repeat
this for the other valid modes as well.  In particular, that
goes for DImode as I see the expanded movdi does *almost* this,
but uses register_operand instead of REG_S_P (from cris.h).
Using REG_S_P is the right choice here because register_operand
also matches (subreg (mem ...)  ...) *until* reload is done.
By itself it's just a sub-0.1% performance win (coremark).

Also removing a stale comment.

gcc:
* config/cris/cris.md ("*movsi_internal"):
Conditionalize on (sub-)register operands or operand 1 being 0.
---
 gcc/config/cris/cris.md | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index bc8d7584f6d9..9d1c179d5211 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -583,9 +583,10 @@ (define_insn "*movsi_internal"
 (match_operand:SI 1 "general_operand"
   "r,Q>,M,M, I,r, M,n,g,r,x,  rQ>,x,gi"))
(clobber (reg:CC CRIS_CC0_REGNUM))]
-;; Note that we prefer not to use the S alternative (if for some reason
-;; it competes with others) above, but g matches S.
-  ""
+  ;; Avoid matching insns we know must be reloaded.  Without one
+  ;; operand being a (pseudo-)register, reload chooses
+  ;; reload-registers suboptimally.
+  "REG_S_P (operands[0]) || REG_S_P (operands[1]) || operands[1] == const0_rtx"
 {
   /* Better to have c-switch here; it is worth it to optimize the size of
  move insns.  The alternative would be to try to find more constraint
-- 
2.30.2
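
The new insn condition can be read as a small predicate over the two operands.  The model below is a heavy simplification with invented types; in particular it assumes REG_S_P means "a register, or a subreg wrapping a register", which is my reading of the description above rather than something shown in this patch.

```cpp
#include <cassert>

// Hypothetical, heavily simplified operand model; real RTL is far richer.
enum class Kind { reg, subreg_of_reg, subreg_of_mem, mem, constant };

struct Operand {
  Kind kind;
  long value;   // only meaningful for Kind::constant
};

// Assumed meaning of REG_S_P: a register, or a subreg wrapping a register.
constexpr bool reg_s_p(const Operand& op)
{
  return op.kind == Kind::reg || op.kind == Kind::subreg_of_reg;
}

// The new condition: at least one (sub-)register, or a zero source.
constexpr bool movsi_internal_matches(const Operand& dest, const Operand& src)
{
  return reg_s_p(dest) || reg_s_p(src)
         || (src.kind == Kind::constant && src.value == 0);
}

// The combine-created insn from the description: memory = (const_int 2).
inline void check_example_insn_is_rejected()
{
  const Operand dest{Kind::mem, 0};
  const Operand src{Kind::constant, 2};
  assert(!movsi_internal_matches(dest, src));
}
```

A subreg of memory does not count as a register here, which is the reason given above for preferring REG_S_P over register_operand before reload.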



[PATCH 3/5] cris: Remove CRIS v32 ACR artefacts

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
This is the change I alluded to in r11-220 / d0780379c1b6 as
"causes extra register moves in libgcc".  It has
unfortunate side-effects due to the change in register-class topology.
There's a slight improvement in coremark numbers (< 0.07%) though also
an increase in total code size (< 0.7%), but looking at the individual
changes in functions, it's all over the place (-7..+7%).  Looking specifically
at functions that improved in speed, it's also both plus and minus in
code sizes.  It's unworkable to separate improvements from regressions
for this case.  I'll follow up with patches to restore the previous
code quality, in both size and speed.

gcc:
* config/cris/constraints.md (define_register_constraint "b"): Now
GENERAL_REGS.
* config/cris/cris.md (CRIS_ACR_REGNUM): Remove.
* config/cris/cris.h: (reg_class, REG_CLASS_NAMES)
(REG_CLASS_CONTENTS): Remove ACR_REGS, SPEC_ACR_REGS, GENNONACR_REGS,
and SPEC_GENNONACR_REGS.
* config/cris/cris.cc (cris_preferred_reload_class): Don't mention
ACR_REGS and return GENERAL_REGS instead of GENNONACR_REGS.
---
 gcc/config/cris/constraints.md |  7 ++-
 gcc/config/cris/cris.cc|  5 ++---
 gcc/config/cris/cris.h | 27 +--
 gcc/config/cris/cris.md|  1 -
 4 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/gcc/config/cris/constraints.md b/gcc/config/cris/constraints.md
index 01ec12c4cf2a..83fab622717d 100644
--- a/gcc/config/cris/constraints.md
+++ b/gcc/config/cris/constraints.md
@@ -18,7 +18,12 @@
 ;; .
 
 ;; Register constraints.
-(define_register_constraint "b" "GENNONACR_REGS"
+
+;; Kept for compatibility.  It used to exclude the CRIS v32
+;; register "ACR", which was like GENERAL_REGS except it
+;; couldn't be used for autoincrement, and intended mainly
+;; for use in user asm statements.
+(define_register_constraint "b" "GENERAL_REGS"
   "@internal")
 
 (define_register_constraint "h" "MOF_REGS"
diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index a7807b3cc25c..264439c7654a 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -1663,13 +1663,12 @@ cris_reload_address_legitimized (rtx x,
 static reg_class_t
 cris_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t rclass)
 {
-  if (rclass != ACR_REGS
-  && rclass != MOF_REGS
+  if (rclass != MOF_REGS
   && rclass != MOF_SRP_REGS
   && rclass != SRP_REGS
   && rclass != CC0_REGS
   && rclass != SPECIAL_REGS)
-return GENNONACR_REGS;
+return GENERAL_REGS;
 
   return rclass;
 }
diff --git a/gcc/config/cris/cris.h b/gcc/config/cris/cris.h
index 9245d7886929..6edfe13d92cc 100644
--- a/gcc/config/cris/cris.h
+++ b/gcc/config/cris/cris.h
@@ -436,19 +436,15 @@ extern int cris_cpu_version;
 
 /* Node: Register Classes */
 
-/* We need a separate register class to handle register allocation for
-   ACR, since it can't be used for post-increment.
-
-   It's not obvious, but having subunions of all movable-between
+/* It's not obvious, but having subunions of all movable-between
register classes does really help register allocation (pre-IRA
comment).  */
 enum reg_class
   {
 NO_REGS,
-ACR_REGS, MOF_REGS, SRP_REGS, CC0_REGS,
+MOF_REGS, SRP_REGS, CC0_REGS,
 MOF_SRP_REGS, SPECIAL_REGS,
-SPEC_ACR_REGS, GENNONACR_REGS,
-SPEC_GENNONACR_REGS, GENERAL_REGS,
+GENERAL_REGS,
 ALL_REGS,
 LIM_REG_CLASSES
   };
@@ -457,9 +453,8 @@ enum reg_class
 
 #define REG_CLASS_NAMES\
   {"NO_REGS",  \
-   "ACR_REGS", "MOF_REGS", "SRP_REGS", "CC0_REGS", \
+   "MOF_REGS", "SRP_REGS", "CC0_REGS", \
"MOF_SRP_REGS", "SPECIAL_REGS", \
-   "SPEC_ACR_REGS", "GENNONACR_REGS", "SPEC_GENNONACR_REGS",   \
"GENERAL_REGS", "ALL_REGS"}
 
 #define CRIS_SPECIAL_REGS_CONTENTS \
@@ -472,37 +467,25 @@ enum reg_class
 #define REG_CLASS_CONTENTS \
   {\
{0},\
-   {1 << CRIS_ACR_REGNUM}, \
{1 << CRIS_MOF_REGNUM}, \
{1 << CRIS_SRP_REGNUM}, \
{1 << CRIS_CC0_REGNUM}, \
{(1 << CRIS_MOF_REGNUM) \
 | (1 << CRIS_SRP_REGNUM)}, \
{CRIS_SPECIAL_REGS_CONTENTS},   \
-   {CRIS_SPECIAL_REGS_CONTENTS \
-| (1 << CRIS_ACR_REGNUM)}, \
-   {(0x | CRIS_FAKED_REGS_CONTENTS)\
-& ~(1 << CRIS_ACR_REGNUM)},\
-   {(0x | CRIS_FAKED_REGS_CONTENTS \
-| CRIS_SPECIAL_REGS_CONTENTS)  \
-& ~(1 << CRIS_ACR_REGNUM)},\
{0x | CRIS_FAKED_REGS_

[PATCH 4/5] cris: Don't discriminate against ALL_REGS in TARGET_REGISTER_MOVE_COST

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
When the tightest class including both SPECIAL_REGS and GENERAL_REGS
is ALL_REGS, artificially raising the cost whenever *either* the source
or the destination is ALL_REGS hits too hard.  This gets the port back
to the code quality before the previous patch ("cris: Remove CRIS v32
ACR artefacts") - except for _vfprintf_r and _vfiprintf_r in newlib
(still .8 and .4% larger).
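
To illustrate the condition that changes, here is a minimal stand-alone
sketch.  It is not the port's code: register classes are modelled as plain
bitmasks, the bit assignments are made up for the example, and the helper
names (classes_intersect, expensive_before, expensive_after) are
hypothetical.  Only the condition guarding the cost-7 case is modelled.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only; the real
   register numbers and class contents are defined by the port.  */
#define GENERAL_REGS_MASK 0x0ffffu  /* assumed: 16 general registers */
#define SPECIAL_REGS_MASK 0x30000u  /* assumed: two special registers */
#define ALL_REGS_MASK (GENERAL_REGS_MASK | SPECIAL_REGS_MASK)

/* Stand-in for reg_classes_intersect_p: true if the two classes
   share at least one register.  */
static bool
classes_intersect (unsigned a, unsigned b)
{
  return (a & b) != 0;
}

/* Condition guarding the cost-7 case before this patch.  */
static bool
expensive_before (unsigned from, unsigned to)
{
  return (classes_intersect (from, SPECIAL_REGS_MASK)
          && classes_intersect (to, SPECIAL_REGS_MASK))
         || from == ALL_REGS_MASK
         || to == ALL_REGS_MASK;
}

/* Condition guarding the cost-7 case after this patch.  */
static bool
expensive_after (unsigned from, unsigned to)
{
  return classes_intersect (from, SPECIAL_REGS_MASK)
         && classes_intersect (to, SPECIAL_REGS_MASK);
}
```

In this model a move from ALL_REGS to GENERAL_REGS meets the old condition
but not the new one, while a move between two classes that both contain
special registers meets both.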

gcc:
* config/cris/cris.cc (cris_register_move_cost): Remove special pre-ira
extra cost for ALL_REGS.
---
 gcc/config/cris/cris.cc | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 264439c7654a..4f977221f459 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -1683,20 +1683,10 @@ cris_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
  their move cost within that class is higher.  How about 7?  That's 3
  for a move to a GENERAL_REGS register, 3 for the move from the
  GENERAL_REGS register, and 1 for the increased register pressure.
- Also, it's higher than the memory move cost, as it should.
- We also do this for ALL_REGS, since we don't want that class to be
- preferred (even to memory) at all where GENERAL_REGS doesn't fit.
- Whenever it's about to be used, it's for SPECIAL_REGS.  If we don't
- present a higher cost for ALL_REGS than memory, a SPECIAL_REGS may be
- used when a GENERAL_REGS should be used, even if there are call-saved
- GENERAL_REGS left to allocate.  This is because the fall-back when
- the most preferred register class isn't available, isn't the next
- (or next good) wider register class, but the *most widest* register
- class.  FIXME: pre-IRA comment, perhaps obsolete now.  */
-
-  if ((reg_classes_intersect_p (from, SPECIAL_REGS)
-   && reg_classes_intersect_p (to, SPECIAL_REGS))
-  || from == ALL_REGS || to == ALL_REGS)
+ Also, it's higher than the memory move cost, as it should be.  */
+
+  if (reg_classes_intersect_p (from, SPECIAL_REGS)
+  && reg_classes_intersect_p (to, SPECIAL_REGS))
 return 7;
 
   /* Make moves to/from SPECIAL_REGS slightly more expensive, as we
-- 
2.30.2



[PATCH 5/5] cris: Reload using special-regs before general-regs

2022-02-01 Thread Hans-Peter Nilsson via Gcc-patches
On code where reload has an effect (i.e. quite rarely, just enough to be
noticeable), this change gets code quality back to the situation prior
to "Remove CRIS v32 ACR artefacts".  We had from IRA a pseudoregister
marked to be reloaded from a union of all allocatable registers (here:
SPEC_GENNONACR_REGS) but where the register-class corresponding to the
constraint for the register-type alternative (here: GENERAL_REGS) was
*not* a subset of that class: SPEC_GENNONACR_REGS (and GENNONACR_REGS)
had a one-register "hole" for the ACR register, a register present in
GENERAL_REGS.

Code in reload.cc:find_reloads adds 4 to the cost of a register-type
alternative whose class is neither a subset of the preferred register
class nor vice versa, and which reload thus thinks it can't use.  It
would be preferable
to look for a non-empty intersection of the two, and use that
intersection for that alternative, something that can't be expressed
because a register class can't be formed from a random register set.
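
The "hole" and its effect on the subset check can be sketched as below.
This is a simplified model of the rule as described above, not the actual
reload.cc code: classes are plain bitmasks, the bit positions (including
which bit stands for ACR) are assumptions made for the example, faked
registers are ignored, and subset_p and reject_penalty are hypothetical
names.

```c
#include <stdbool.h>

/* Hypothetical bit layout, for illustration only.  */
#define ACR_BIT             (1u << 15)  /* assumed position of ACR */
#define GENERAL_REGS        0x0ffffu    /* includes ACR */
#define SPECIAL_REGS        0x30000u    /* assumed: two special registers */
#define GENNONACR_REGS      (GENERAL_REGS & ~ACR_BIT)
#define SPEC_GENNONACR_REGS (SPECIAL_REGS | GENNONACR_REGS)
#define ALL_REGS            (GENERAL_REGS | SPECIAL_REGS)

/* True if every register in A is also in B.  */
static bool
subset_p (unsigned a, unsigned b)
{
  return (a & ~b) == 0;
}

/* Model of the rule described above: an alternative whose class is
   neither a subset nor a superset of the preferred class gets 4
   added to its cost.  */
static int
reject_penalty (unsigned alternative, unsigned preferred)
{
  if (subset_p (alternative, preferred)
      || subset_p (preferred, alternative))
    return 0;
  return 4;
}
```

With these masks, reject_penalty (GENERAL_REGS, SPEC_GENNONACR_REGS) is 4,
because ACR is in GENERAL_REGS but not in the preferred class, while
reject_penalty (SPECIAL_REGS, SPEC_GENNONACR_REGS) is 0.  Assuming the
preferred class becomes ALL_REGS once the ACR classes are gone,
reject_penalty (GENERAL_REGS, ALL_REGS) is 0 as well.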

The effect here was that the GENERAL_REGS to/from memory alternatives
("r") had their cost raised such that the SPECIAL_REGS alternatives
("x") looked better.  This happened to improve code quality just a
little bit compared to GENERAL_REGS being chosen.

Anyway, with the improved CRIS register-class topology, the
subset-checking code no longer has the GENERAL_REGS-demoting effect.
To get the same quality, we have to adjust the port such that
SPECIAL_REGS are specifically preferred when possible and advisable,
i.e. when there are at least two of those registers as for the CPU variant
with multiplication (which happens to be the variant maintained for
performance).

For the move-pattern, the obvious method may seem to simply "curse" the
constraints of some alternatives (by prepending one of the "?!^$"
characters) but that method can't be used, because we want the effect to
be conditional on the CPU variant.  It'd also be a shame to split the
"*movsi_internal" into two CPU-variants (with
different cursing).  Iterators would help, but it still seems unwieldy.
Instead, add copies of the GENERAL_REGS variants (to the SPECIAL_REGS
alternatives) on the "other" side, and make use of the "enabled"
attribute to activate just the desired order of alternatives.
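
A rough model of that idea, in plain C rather than machine-description
syntax: each alternative carries a CPU-variant tag, and only the enabled
ones are considered, in order.  Everything here is a sketch under
assumptions: the struct, the table and the function names are hypothetical,
the "r", "x", "r" ordering is inferred from the description above, the
cc_enabled part of the condition is left out, and the rule for "v0"
(enabled when there are no multiplication insns) is an assumption, not
something taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum cpu_variant { VARIANT_DEFAULT, VARIANT_V0, VARIANT_V10 };

struct alternative
{
  const char *constraint;    /* "r" = GENERAL_REGS, "x" = SPECIAL_REGS */
  enum cpu_variant variant;  /* which CPU variant enables it */
};

/* Assumed layout: a copy of the "r" alternative on each side of the
   "x" alternative, each copy tied to one CPU variant.  */
static const struct alternative alternatives[] = {
  { "r", VARIANT_V0 },
  { "x", VARIANT_DEFAULT },
  { "r", VARIANT_V10 },
};

/* Model of the "enabled" attribute for one alternative.  */
static bool
alternative_enabled (const struct alternative *alt, bool has_mul_insns)
{
  switch (alt->variant)
    {
    case VARIANT_DEFAULT:
      return true;
    case VARIANT_V10:
      return has_mul_insns;
    case VARIANT_V0:
      return !has_mul_insns;  /* assumption, see above */
    }
  return false;
}

/* Constraint of the first enabled alternative, or NULL if none.  */
static const char *
first_enabled (bool has_mul_insns)
{
  size_t n = sizeof alternatives / sizeof alternatives[0];
  for (size_t i = 0; i < n; i++)
    if (alternative_enabled (&alternatives[i], has_mul_insns))
      return alternatives[i].constraint;
  return NULL;
}
```

In this model first_enabled (true) yields "x" and first_enabled (false)
yields "r".  Order is only one input to the real choice, which also weighs
costs; the sketch shows just how one pattern can present a different order
per CPU variant without being split in two.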

gcc:

* config/cris/cris.cc (cris_preferred_reload_class): Reject
"eliminated" registers and small-enough constants unless
reloaded into a class that is a subset of GENERAL_REGS.
* config/cris/cris.md (attribute "cpu_variant"): New.
(attribute "enabled"): Conditionalize on a matching attribute
cpu_variant, if specified.
("*movsi_internal"): For moves to and from
memory, add cpu-variant-enabled variants for "r" alternatives on
the far side of the "x" alternatives, preferring the "x" ones
only for variants where MOF is present (in addition to SRP).
---
 gcc/config/cris/cris.cc | 13 -
 gcc/config/cris/cris.md | 25 -
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 4f977221f459..f0017d630229 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -1661,7 +1661,7 @@ cris_reload_address_legitimized (rtx x,
a bug.  */
 
 static reg_class_t
-cris_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t rclass)
+cris_preferred_reload_class (rtx x, reg_class_t rclass)
 {
   if (rclass != MOF_REGS
   && rclass != MOF_SRP_REGS
@@ -1670,6 +1670,17 @@ cris_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t rclass)
   && rclass != SPECIAL_REGS)
 return GENERAL_REGS;
 
+  /* We can't make use of something that's not a general register when
+ reloading an "eliminated" register (i.e. something that has turned into
+ e.g. sp + const_int).  */
+  if (GET_CODE (x) == PLUS && !reg_class_subset_p (rclass, GENERAL_REGS))
+return NO_REGS;
+
+  /* Avoid putting constants into a special register, where the instruction is
+ shorter if loaded into a general register.  */
+  if (satisfies_constraint_P (x) && !reg_class_subset_p (rclass, GENERAL_REGS))
+return NO_REGS;
+
   return rclass;
 }
 
diff --git a/gcc/config/cris/cris.md b/gcc/config/cris/cris.md
index 9d9eb8b7dbbf..dd7094163784 100644
--- a/gcc/config/cris/cris.md
+++ b/gcc/config/cris/cris.md
@@ -153,9 +153,20 @@ (define_delay (eq_attr "slottable" "has_return_slot")
(not (match_test "dead_or_set_regno_p (insn, CRIS_SRP_REGNUM)")))
(nil) (nil)])
 
+;; Enable choosing particular instructions.  The discriminator choice
+;; "v0" stands for "pre-v10", for brevity.
+(define_attr "cpu_variant" "default,v0,v10" (const_string "default"))
+
 (define_attr "enabled" "no,yes"
   (if_then_else
-   (eq_attr "cc_enabled" "normal")
+   (and
+(eq_attr "cc_enabled" "normal")
+(ior
+ (eq_attr "cpu_variant" "default")
+ (and (eq_attr "cpu_variant" "v10")
+ (match_test "TARGET_HAS_MUL_INSNS"))

Re: [PATCH] gcc-12: Mention -mharden-sls= and -mindirect-branch-cs-prefix

2022-02-01 Thread Richard Biener via Gcc-patches
On Tue, 1 Feb 2022, H.J. Lu wrote:


LGTM.

> ---
>  htdocs/gcc-12/changes.html | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
> index 2719b9d5..479bd6c5 100644
> --- a/htdocs/gcc-12/changes.html
> +++ b/htdocs/gcc-12/changes.html
> @@ -387,6 +387,12 @@ a work-in-progress.
>x86 systems with SSE2 enabled. Without {-mavx512fp16},
>all operations will be emulated in software and float
>instructions.
> +  Mitigation against straight line speculation (SLS) for function
> +  return and indirect jump is supported via
> +  -mharden-sls=[none|all|return|indirect-jmp].
> +  
> +  Add CS prefix to call and jmp to indirect thunk with branch target
> +  in r8-r15 registers via -mindirect-branch-cs-prefix.
>
>  
>