[PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

When writing the PR98737 fix, I've handled just the case where people
use __atomic_op_fetch (p, x, y) etc.
But some people actually use the other builtins, like
__atomic_fetch_op (p, x, y) op x.
The following patch canonicalizes the latter to the former and vice versa
when possible: the result of the builtin must have a single use, and if
that use is a cast with the same precision, that cast's lhs must also have
a single use.
For all ops of +, -, &, | and ^ we can do those
__atomic_fetch_op (p, x, y) op x -> __atomic_op_fetch (p, x, y)
(and __sync too) opts, but cases of INTEGER_CST and SSA_NAME x
behave differently.  For INTEGER_CST, typically - x is
canonicalized to + (-x), while for SSA_NAME we need to handle various
casts, which sometimes happen on the second argument of the builtin
(there can be even two subsequent casts for char/short due to the
promotions we do) and there can be a cast on the argument of op too.
And all ops but - are commutative.
For the other direction, i.e.
__atomic_op_fetch (p, x, y) rop x -> __atomic_fetch_op (p, x, y)
we can't handle op of & and |, those aren't reversible, for
op + rop is -, for - rop is + and for ^ rop is ^, otherwise the same
stuff as above applies.
And, there is another case, we canonicalize
x - y == 0 (or != 0) and x ^ y == 0 (or != 0) to x == y (or x != y)
and for constant y x + y == 0 (or != 0) to x == -y (or != -y),
so the patch also virtually undoes those canonicalizations, because
e.g. for the earlier PR98737 patch but even generally, it is better
if a result of atomic op fetch is compared against 0 than doing
atomic fetch op and compare it to some variable or non-zero constant.
As for debug info, for non-reversible operations (& and |) the patch
resets debug stmts if there are any, for -fnon-call-exceptions too
(didn't want to include debug temps right before all uses), but
otherwise it emits the reverse operation from the result as a debug
temp and uses that in debug stmts.
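
As a source-level illustration of the patterns described above, here is a minimal sketch using the GCC __atomic builtins. The function names and the choice of __ATOMIC_SEQ_CST are assumptions made for this example only; whether forwprop actually rewrites a given function depends on the single-use and cast conditions mentioned earlier.

```cpp
#include <cassert>

// fetch_op followed by the same op on the same operand: this has the value
// that __atomic_add_fetch (p, x, ...) would return.
unsigned fetch_add_then_add (unsigned *p, unsigned x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

// op_fetch followed by the reverse op: for + the reverse op is -, so this
// has the value that __atomic_fetch_add (p, x, ...) would return.
unsigned add_fetch_then_sub (unsigned *p, unsigned x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST) - x;
}

// ^ is its own reverse operation.
unsigned xor_fetch_then_xor (unsigned *p, unsigned x)
{
  return __atomic_xor_fetch (p, x, __ATOMIC_SEQ_CST) ^ x;
}

// Written as "old value == x"; equivalent to "sub_fetch result == 0".
bool fetch_sub_equals_x (unsigned *p, unsigned x)
{
  return __atomic_fetch_sub (p, x, __ATOMIC_SEQ_CST) == x;
}

// & is not reversible: different old values can give the same result,
// so the old value cannot be recomputed from what and_fetch returns.
bool and_is_lossy ()
{
  unsigned a = 6, b = 7;
  unsigned ra = __atomic_and_fetch (&a, 4u, __ATOMIC_SEQ_CST);
  unsigned rb = __atomic_and_fetch (&b, 4u, __ATOMIC_SEQ_CST);
  return ra == rb;
}

// Small self-check exercising the functions above on a single thread.
void demo ()
{
  unsigned v = 5;
  assert (fetch_add_then_add (&v, 3) == 8 && v == 8);
  assert (add_fetch_then_sub (&v, 2) == 8 && v == 10);
  assert (xor_fetch_then_xor (&v, 6) == 10 && v == 12);
  assert (fetch_sub_equals_x (&v, 12) && v == 0);
  assert (and_is_lossy ());
}
```

Unsigned operands are used so that the arithmetic outside the builtin wraps the same way the atomic operation does.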

On the emitted assembly for the testcases which are fairly large,
I see substantial decreases of the *.s size:
-rw-rw-r--. 1 jakub jakub 116897 Jan 13 09:58 pr98737-1.svanilla
-rw-rw-r--. 1 jakub jakub  93861 Jan 13 09:57 pr98737-1.spatched
-rw-rw-r--. 1 jakub jakub  70257 Jan 13 09:57 pr98737-2.svanilla
-rw-rw-r--. 1 jakub jakub  67537 Jan 13 09:57 pr98737-2.spatched
There are some functions where due to RA we get one more instruction
than previously, but most of them are smaller even when not hitting
the PR98737 previous patch's optimizations.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-13  Jakub Jelinek  

PR target/98737
* tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
__atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
and __atomic_op_fetch (p, x, y) iop x into
__atomic_fetch_op (p, x, y).

* gcc.dg/tree-ssa/pr98737-1.c: New test.
* gcc.dg/tree-ssa/pr98737-2.c: New test.

--- gcc/tree-ssa-forwprop.c.jj  2022-01-11 23:11:23.467275019 +0100
+++ gcc/tree-ssa-forwprop.c 2022-01-12 22:12:24.666522743 +0100
@@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
memset (p + 4, ' ', 3);
into
memcpy (p, "abcd   ", 7);
-   call if the latter can be stored by pieces during expansion.  */
+   call if the latter can be stored by pieces during expansion.
+
+   Also canonicalize __atomic_fetch_op (p, x, y) op x
+   to __atomic_op_fetch (p, x, y) or
+   __atomic_op_fetch (p, x, y) iop x
+   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
 
 static bool
 simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
 {
   gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
+  enum built_in_function other_atomic = END_BUILTINS;
+  enum tree_code atomic_op = ERROR_MARK;
   tree vuse = gimple_vuse (stmt2);
   if (vuse == NULL)
 return false;
@@ -1448,6 +1455,300 @@ simplify_builtin_call (gimple_stmt_itera
}
}
   break;
+
+ #define CASE_ATOMIC(NAME, OTHER, OP) \
+case BUILT_IN_##NAME##_1:  \
+case BUILT_IN_##NAME##_2:  \
+case BUILT_IN_##NAME##_4:  \
+case BUILT_IN_##NAME##_8:  \
+case BUILT_IN_##NAME##_16: \
+  atomic_op = OP;  \
+  other_atomic \
+   = (enum built_in_function) (BUILT_IN_##OTHER##_1\
+   + (DECL_FUNCTION_CODE (callee2) \
+  - BUILT_IN_##NAME##_1)); \
+  goto handle_atomic_fetch_op;
+
+CASE_ATOMIC (ATOMIC_FETCH_ADD, ATOMIC_ADD_FETCH, PLUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_SUB, ATOMIC_SUB_FETCH, MINUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_AND, ATOMIC_AND_FETCH, BIT_AND_EXPR)
+CASE

[PATCH] inliner: Don't emit copy stmts for empty type parameters [PR103989]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The following patch avoids emitting a parameter copy statement when inlining
if the parameter has empty type.  E.g. the gimplifier does something similar
(except that it needs to evaluate side-effects if any, which isn't the case
here):
  /* For empty types only gimplify the left hand side and right hand
 side as statements and throw away the assignment.  Do this after
 gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
 types properly.  */
  if (is_empty_type (TREE_TYPE (*from_p))
  && !want_value
  /* Don't do this for calls that return addressable types, expand_call
 relies on those having a lhs.  */
  && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
   && TREE_CODE (*from_p) == CALL_EXPR))
{
  gimplify_stmt (from_p, pre_p);
  gimplify_stmt (to_p, pre_p);
  *expr_p = NULL_TREE;
  return GS_ALL_DONE;
}
Unfortunately, this patch doesn't cure the uninit warnings in that PR,
but I think is desirable anyway.
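
For illustration only, here is a minimal sketch of the kind of source that has a parameter of empty type. The names Tag, add_one and caller are made up, and std::is_empty is only an approximation of GCC's internal is_empty_type predicate, which may differ in detail.

```cpp
#include <cassert>
#include <type_traits>

// A class with no non-static data members; copying it moves no data.
struct Tag {};
static_assert (std::is_empty<Tag>::value, "Tag carries no data");

// When this is inlined, the copy of the argument into `t` is the kind of
// parameter setup statement the patch avoids emitting.
static inline int add_one (Tag t, int x)
{
  (void) t;
  return x + 1;
}

int caller (int x)
{
  Tag t{};
  return add_one (t, x);
}

// Small self-check.
void demo ()
{
  assert (caller (3) == 4);
}
```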

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-13  Jakub Jelinek  

PR tree-optimization/103989
* tree-inline.c (setup_one_parameter): Don't copy parms with
empty type.

--- gcc/tree-inline.c.jj 2022-01-11 23:11:23.422275652 +0100
+++ gcc/tree-inline.c   2022-01-12 18:37:44.119950128 +0100
@@ -3608,7 +3608,7 @@ setup_one_parameter (copy_body_data *id,
  init_stmt = gimple_build_assign (def, rhs);
}
}
-  else
+  else if (!is_empty_type (TREE_TYPE (var)))
 init_stmt = gimple_build_assign (var, rhs);
 
   if (bb && init_stmt)

Jakub



[PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

The changes done to genericize_if_stmt in order to improve
-Wunreachable-code* warning (which Richi didn't actually commit
for GCC 12) are I think fine for normal ifs, but for constexpr if
and consteval if we have two competing warnings.
The problem is that we replace the non-taken clause (then or else)
with void_node and keep the if (cond) { something } else {}
or if (cond) {} else { something }; in the IL.
This helps -Wunreachable-code*, if something can't fallthru but the
non-taken clause can, we don't warn about code after it because it
is still (in theory) reachable.
But if the non-taken branch can't fallthru, we can get false positive
-Wreturn-type warnings (which are enabled by default) if there is
nothing after the if and the taken branch can't fallthru either.

One possibility to fix this is to revert, at least temporarily,
to the previous behavior for constexpr and consteval if; yes, we
can get false positive -Wunreachable-code* warnings, but the warning
isn't present in GCC 12.
The patch below implements that for constexpr if which throws its
clauses very early (either during parsing or during instantiation),
and for consteval if it decides based on block_may_fallthru on the
non-taken (for constant evaluation only) clause - if the non-taken
branch may fallthru, it does what you did in genericize_if_stmt
for consteval if, if it can't fallthru, it uses the older way
of pretending there wasn't an if and just replacing it with the
taken clause.  There are some false positive risks with this though,
block_may_fallthru is optimistic and doesn't handle some statements
at all (like FOR_STMT, WHILE_STMT, DO_STMT - of course handling those
is quite hard).
For constexpr if (but perhaps for GCC 13?) we could try to
block_may_fallthru before we throw it away and remember it in some
flag on the IF_STMT, but am not sure how dangerous would it be to call
it on the discarded stmts.  Or if it is too dangerous e.g. just
remember whether the discarded block of consteval if wasn't present
or was empty, in that case assume fallthru, and otherwise assume
it can't fallthru (-Wunreachable-code possible false positives).

Bootstrapped/regtested on x86_64-linux and i686-linux, if needed,
I can also test the safer variant with just
  if (IF_STMT_CONSTEVAL_P (stmt))
stmt = else_;
for consteval if.

2022-01-13  Jakub Jelinek  

PR c++/103991
* cp-objcp-common.c (cxx_block_may_fallthru) <case IF_STMT>: For
IF_STMT_CONSTEXPR_P with constant false or true condition only
check if the taken clause may fall through.
* cp-gimplify.c (genericize_if_stmt): For consteval if, revert
to r12-5638^ behavior if then_ block can't fall through.  For
constexpr if, revert to r12-5638^ behavior.

* g++.dg/warn/Wreturn-type-13.C: New test.

--- gcc/cp/cp-objcp-common.c.jj 2022-01-11 23:11:22.091294356 +0100
+++ gcc/cp/cp-objcp-common.c 2022-01-12 17:57:18.232202275 +0100
@@ -313,6 +313,13 @@ cxx_block_may_fallthru (const_tree stmt)
   return false;
 
 case IF_STMT:
+  if (IF_STMT_CONSTEXPR_P (stmt))
+   {
+ if (integer_nonzerop (IF_COND (stmt)))
+   return block_may_fallthru (THEN_CLAUSE (stmt));
+ if (integer_zerop (IF_COND (stmt)))
+   return block_may_fallthru (ELSE_CLAUSE (stmt));
+   }
   if (block_may_fallthru (THEN_CLAUSE (stmt)))
return true;
   return block_may_fallthru (ELSE_CLAUSE (stmt));
--- gcc/cp/cp-gimplify.c.jj 2022-01-11 23:11:22.090294370 +0100
+++ gcc/cp/cp-gimplify.c 2022-01-12 21:22:17.585212804 +0100
@@ -166,8 +166,15 @@ genericize_if_stmt (tree *stmt_p)
  can contain unfolded immediate function calls, we have to discard
  the then_ block regardless of whether else_ has side-effects or not.  */
   if (IF_STMT_CONSTEVAL_P (stmt))
-stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
-  void_node, else_);
+{
+  if (block_may_fallthru (then_))
+   stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
+  void_node, else_);
+  else
+   stmt = else_;
+}
+  else if (IF_STMT_CONSTEXPR_P (stmt))
+stmt = integer_nonzerop (cond) ? then_ : else_;
   else
 stmt = build3 (COND_EXPR, void_type_node, cond, then_, else_);
   protected_set_expr_location_if_unset (stmt, locus);
--- gcc/testsuite/g++.dg/warn/Wreturn-type-13.C.jj  2022-01-12 21:21:36.567794238 +0100
+++ gcc/testsuite/g++.dg/warn/Wreturn-type-13.C 2022-01-12 21:20:48.487475787 +0100
@@ -0,0 +1,35 @@
+// PR c++/103991
+// { dg-do compile { target c++17 } }
+
+struct S { ~S(); };
+int
+foo ()
+{
+  S s;
+  if constexpr (true)
+return 0;
+  else
+return 1;
+}  // { dg-bogus "control reaches end of non-void function" }
+
+#if __cpp_if_consteval >= 202106L
+constexpr int
+bar ()
+{
+  S s;
+  if consteval
+{
+  return 0;
+}
+  else
+{
+  return 1;
+}
+}  // { dg-bogus "control 

Re: [PATCH] [12/11/10] Fix invalid format warnings on Windows

2022-01-13 Thread Martin Liška

On 1/12/22 14:34, Tomas Kalibera wrote:


On 1/11/22 2:37 PM, Martin Liška wrote:

Hello.

I do support the patch, but I would ...


Thanks, Martin,  that makes the patch simpler and easier to maintain. Would the 
attached version do?

Thanks
Tomas



On 1/7/22 19:33, Tomas Kalibera wrote:

+  if (is_attribute_p ("format", get_attribute_name (aa)) &&
+  fndecl && fndecl_built_in_p (fndecl, BUILT_IN_NORMAL))
+{
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+case BUILT_IN_FSCANF:
+case BUILT_IN_PRINTF:
+case BUILT_IN_SCANF:
+case BUILT_IN_SNPRINTF:
+case BUILT_IN_SSCANF:
+case BUILT_IN_VFSCANF:
+case BUILT_IN_VPRINTF:
+case BUILT_IN_VSCANF:
+case BUILT_IN_VSNPRINTF:
+case BUILT_IN_VSSCANF:
+case BUILT_IN_DCGETTEXT:
+case BUILT_IN_DGETTEXT:
+case BUILT_IN_GETTEXT:
+case BUILT_IN_STRFMON:
+case BUILT_IN_STRFTIME:
+case BUILT_IN_SNPRINTF_CHK:
+case BUILT_IN_VSNPRINTF_CHK:
+case BUILT_IN_PRINTF_CHK:
+case BUILT_IN_VPRINTF_CHK:
+  skipped_default_format = 1;
+  break;
+default:
+  break;
+}
+}


... skip this as the listed functions are only these that have defined 
ATTR_FORMAT_*:

$ grep ATTR_FORMAT gcc/builtins.def
DEF_LIB_BUILTIN    (BUILT_IN_FSCANF, "fscanf", BT_FN_INT_FILEPTR_CONST_STRING_VAR, ATTR_FORMAT_SCANF_2_3)
DEF_LIB_BUILTIN    (BUILT_IN_PRINTF, "printf", BT_FN_INT_CONST_STRING_VAR, ATTR_FORMAT_PRINTF_1_2)
DEF_LIB_BUILTIN    (BUILT_IN_SCANF, "scanf", BT_FN_INT_CONST_STRING_VAR, ATTR_FORMAT_SCANF_1_2)
DEF_C99_BUILTIN    (BUILT_IN_SNPRINTF, "snprintf", BT_FN_INT_STRING_SIZE_CONST_STRING_VAR, ATTR_FORMAT_PRINTF_NOTHROW_3_4)
DEF_LIB_BUILTIN    (BUILT_IN_SSCANF, "sscanf", BT_FN_INT_CONST_STRING_CONST_STRING_VAR, ATTR_FORMAT_SCANF_NOTHROW_2_3)
DEF_C99_BUILTIN    (BUILT_IN_VFSCANF, "vfscanf", BT_FN_INT_FILEPTR_CONST_STRING_VALIST_ARG, ATTR_FORMAT_SCANF_2_0)
DEF_LIB_BUILTIN    (BUILT_IN_VPRINTF, "vprintf", BT_FN_INT_CONST_STRING_VALIST_ARG, ATTR_FORMAT_PRINTF_1_0)
DEF_C99_BUILTIN    (BUILT_IN_VSCANF, "vscanf", BT_FN_INT_CONST_STRING_VALIST_ARG, ATTR_FORMAT_SCANF_1_0)
DEF_C99_BUILTIN    (BUILT_IN_VSNPRINTF, "vsnprintf", BT_FN_INT_STRING_SIZE_CONST_STRING_VALIST_ARG, ATTR_FORMAT_PRINTF_NOTHROW_3_0)
DEF_C99_BUILTIN    (BUILT_IN_VSSCANF, "vsscanf", BT_FN_INT_CONST_STRING_CONST_STRING_VALIST_ARG, ATTR_FORMAT_SCANF_NOTHROW_2_0)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_DCGETTEXT, "dcgettext", BT_FN_STRING_CONST_STRING_CONST_STRING_INT, ATTR_FORMAT_ARG_2)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_DGETTEXT, "dgettext", BT_FN_STRING_CONST_STRING_CONST_STRING, ATTR_FORMAT_ARG_2)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_STRFMON, "strfmon", BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR, ATTR_FORMAT_STRFMON_NOTHROW_3_4)
DEF_LIB_BUILTIN    (BUILT_IN_STRFTIME, "strftime", BT_FN_SIZE_STRING_SIZE_CONST_STRING_CONST_TM_PTR, ATTR_FORMAT_STRFTIME_NOTHROW_3_0)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_SNPRINTF_CHK, "__snprintf_chk", BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VAR, ATTR_FORMAT_PRINTF_NOTHROW_5_6)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_VSNPRINTF_CHK, "__vsnprintf_chk", BT_FN_INT_STRING_SIZE_INT_SIZE_CONST_STRING_VALIST_ARG, ATTR_FORMAT_PRINTF_NOTHROW_5_0)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_PRINTF_CHK, "__printf_chk", BT_FN_INT_INT_CONST_STRING_VAR, ATTR_FORMAT_PRINTF_2_3)
DEF_EXT_LIB_BUILTIN    (BUILT_IN_VPRINTF_CHK, "__vprintf_chk", BT_FN_INT_INT_CONST_STRING_VALIST_ARG, ATTR_FORMAT_PRINTF_2_0)

Martin


A few inline comments:


From 82a659c7e5b24bbd39ac567dff3f79cc4c1e083f Mon Sep 17 00:00:00 2001
From: Tomas Kalibera 
Date: Wed, 12 Jan 2022 08:17:21 -0500
Subject: [PATCH] Mingw32 targets use ms_printf format for printf, but
 mingw-w64 when configured for UCRT uses gnu_format (via stdio.h). GCC then
 checks both formats, which means that one cannot print a 64-bit integer
 without a warning. All these lines issue a warning:


Please shorten the commit message's first line and put the rest on the following lines.



  printf("Hello %"PRIu64"\n", x);
  printf("Hello %I64u\n", x);
  printf("Hello %llu\n", x);

because each of them violates one of the formats.  Also, one gets a warning
twice if the format string violates both formats.

Fixed by disabling the built in format in case there are additional ones.
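
For reference, a minimal sketch of the portable spelling (the PRIu64 variant of the three lines above). The helper name format_u64 is made up for this example, and the sketch only shows the formatting itself; it does not reproduce the warning, which depends on the target's format checking configuration.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 64-bit unsigned value using the PRIu64 macro, which expands to
// the length modifier and conversion matching std::uint64_t on this target.
std::string format_u64 (std::uint64_t x)
{
  char buf[64];
  int n = std::snprintf (buf, sizeof buf, "Hello %" PRIu64 "\n", x);
  assert (n > 0 && static_cast<std::size_t> (n) < sizeof buf);
  return std::string (buf, static_cast<std::size_t> (n));
}

// Small self-check.
void demo ()
{
  assert (format_u64 (42) == "Hello 42\n");
}
```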

gcc/c-family/ChangeLog:

PR c/95130
PR c/92292

* c-common.c (check_function_arguments): Pass also function
  declaration to check_function_format.

* c-common.h (check_function_format): Extra argument - function
  declaration.

* c-format.c (check_function_format): For builtin functions with a
  built

Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs)

2022-01-13 Thread Thomas Schwinge
Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> I've pushed to master branch [...] commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
> On 2019-02-01T00:59:30+0100, I wrote:
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

... and still is, but we're getting closer.

In preparation for a forthcoming ICE fix, I've pushed to master branch
commit 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b
"Enhance OpenACC 'kernels' decomposition testing", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 20 Dec 2021 16:14:46 +0100
Subject: [PATCH] Enhance OpenACC 'kernels' decomposition testing

	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
 .../c-c++-common/goacc/kernels-decompose-1.c  |  29 ++--
 .../c-c++-common/goacc/kernels-decompose-2.c  |  82 +++
 .../goacc/kernels-decompose-ice-1.c   |   7 +-
 .../goacc/kernels-decompose-ice-2.c   |   6 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  29 ++--
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  68 ++---
 .../declare-vla-kernels-decompose-ice-1.c |  14 ++
 .../declare-vla-kernels-decompose.c   |  23 
 .../libgomp.oacc-c-c++-common/declare-vla.c   |  16 +++
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c | 129 +-
 .../libgomp.oacc-c-c++-common/f-asyncwait-2.c |  70 --
 .../libgomp.oacc-c-c++-common/f-asyncwait-3.c |  59 ++--
 .../kernels-decompose-1.c |  14 +-
 .../libgomp.oacc-fortran/asyncwait-1.f90  |  86 ++--
 .../libgomp.oacc-fortran/asyncwait-2.f90  |  47 ++-
 .../libgomp.oacc-fortran/asyncwait-3.f90  |  47 ++-
 .../libgomp.oacc-fortran/pr94358-1.f90|  20 ++-
 17 files changed, 593 insertions(+), 153 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index f549cbadfa7..e58bc179f30 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,10 +1,16 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "-fdump-tree-gimple" } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
{ dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
aspects of that functionality.  */
 
@@ -14,7 +20,7 @@
passed to 'incr' may be unset, and in that case, it will be set to [...]",
so to maintain compatibility with earlier Tcl releases, we manually
initialize counter variables:
-   { dg-line l_dummy[variable c_loop_i 0] }
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
{ dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
"WARNING: dg-line var l_dummy defined, but not used".  */
 
@@ -28,36 +34,43 @@ main (void)
   int i;
   unsigned int sum = 1;
 
-#pragma acc kernels copyin(a[0:N]) copy(sum)
-  /* { dg-bogus "optimized: assigned OpenACC seq

[vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches

This time to the list too (sorry for double email)

Hi,

The original patch '[vect] Re-analyze all modes for epilogues' skipped
modes that should not be skipped, since it used the vector mode provided
by autovectorize_vector_modes to derive the minimum VF required for it.
However, those modes should only really be used to dictate vector size,
so instead this patch looks for the mode in 'used_vector_modes' with the
largest element size, and constructs a vector mode with the same size as
the current vector_modes[mode_i]. Since we are using the largest element
size, the NUNITS for this mode is the smallest possible VF required for
an epilogue with this mode, so we should skip only the modes we are
certain cannot be used.
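
To make the difference between the two checks concrete, here is a toy model, assuming fixed-length vectors with sizes given in bytes. GCC itself works with machine modes and poly_int values; every name below is hypothetical and not part of the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Check before the fix: compare the NUNITS of the candidate mode itself
// against the main loop's VF.
bool old_skip (unsigned vector_bytes, unsigned candidate_elem_bytes,
               unsigned main_vf)
{
  return vector_bytes / candidate_elem_bytes >= main_vf;
}

// Check after the fix: take the largest element size among the candidate's
// own element and the elements used in the main loop, and compare the NUNITS
// of a vector of the same total size with that element type.
bool new_skip (unsigned vector_bytes, unsigned candidate_elem_bytes,
               const std::vector<unsigned> &used_elem_bytes, unsigned main_vf)
{
  unsigned max_elem = candidate_elem_bytes;
  for (unsigned e : used_elem_bytes)
    max_elem = std::max (max_elem, e);
  // Models "no related vector mode exists": be conservative, do not skip.
  if (max_elem == 0 || max_elem > vector_bytes)
    return false;
  return vector_bytes / max_elem >= main_vf;
}

// Small self-check: a 16-byte mode with 1-byte elements, main loop VF of 8,
// main loop also using 4-byte elements.
void demo ()
{
  assert (old_skip (16, 1, 8));            // 16 units >= 8: skipped before
  assert (!new_skip (16, 1, {1, 4}, 8));   // 4 units < 8: kept now
}
```

In this model the old check rejects the 16-byte mode because it holds 16 one-byte elements, while the new check sees that with 4-byte elements it only covers 4 iterations and so could still serve as an epilogue.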


Passes bootstrap and regression on x86_64 and aarch64.

gcc/ChangeLog:

    PR 103997
    * tree-vect-loop.c (vect_analyze_loop): Fix mode skipping for epilogue
    vectorization.
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index ba67de490bbd033b6db6217c8f9f9ca04cec323b..87b5ec5b4c6cb40e922b1e04bbce74233af8 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -3038,12 +3038,37 @@ vect_analyze_loop (class loop *loop, vec_info_shared *shared)
 would be at least as high as the main loop's and we would be
 vectorizing for more scalar iterations than there would be left.  */
   if (!supports_partial_vectors
- && maybe_ge (GET_MODE_NUNITS (vector_modes[mode_i]), first_vinfo_vf))
-   {
- mode_i++;
- if (mode_i == vector_modes.length ())
-   break;
- continue;
+ && VECTOR_MODE_P (vector_modes[mode_i]))
+   {
+ /* To make sure we are conservative as to what modes we skip, we
+should use check the smallest possible NUNITS which would be
+derived from the mode in USED_VECTOR_MODES with the largest
+element size.  */
+ scalar_mode max_elsize_mode = GET_MODE_INNER (vector_modes[mode_i]);
+ for (vec_info::mode_set::iterator i =
+   first_loop_vinfo->used_vector_modes.begin ();
+ i != first_loop_vinfo->used_vector_modes.end (); ++i)
+   {
+ if (VECTOR_MODE_P (*i)
+ && GET_MODE_SIZE (GET_MODE_INNER (*i))
+ > GET_MODE_SIZE (max_elsize_mode))
+   max_elsize_mode = GET_MODE_INNER (*i);
+   }
+ /* After finding the largest element size used in the main loop, find
+the related vector mode with the same size as the mode
+corresponding to the current MODE_I.  */
+ machine_mode max_elsize_vector_mode =
+   related_vector_mode (vector_modes[mode_i], max_elsize_mode,
+0).else_void ();
+ if (VECTOR_MODE_P (max_elsize_vector_mode)
+ && maybe_ge (GET_MODE_NUNITS (max_elsize_vector_mode),
+  first_vinfo_vf))
+   {
+ mode_i++;
+ if (mode_i == vector_modes.length ())
+ break;
+ continue;
+   }
}
 
   if (dump_enabled_p ())


Re: [PATCH] inliner: Don't emit copy stmts for empty type parameters [PR103989]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> The following patch avoids emitting a parameter copy statement when inlining
> if the parameter has empty type.  E.g. the gimplifier does something similar
> (except that it needs to evaluate side-effects if any, which isn't the case
> here):
>   /* For empty types only gimplify the left hand side and right hand
>  side as statements and throw away the assignment.  Do this after
>  gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
>  types properly.  */
>   if (is_empty_type (TREE_TYPE (*from_p))
>   && !want_value
>   /* Don't do this for calls that return addressable types, expand_call
>  relies on those having a lhs.  */
>   && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
>&& TREE_CODE (*from_p) == CALL_EXPR))
> {
>   gimplify_stmt (from_p, pre_p);
>   gimplify_stmt (to_p, pre_p);
>   *expr_p = NULL_TREE;
>   return GS_ALL_DONE;
> }
> Unfortunately, this patch doesn't cure the uninit warnings in that PR,
> but I think is desirable anyway.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Hmm, but not emitting the initialization might cause even more such
warnings for the case where the passed in argument _is_ initialized
(or not visible as not, like when being a function parameter itself)?

Otherwise sure, it's the same what the gimplifier does.

I wonder if instead uninit warning should simply ignore uses of
"empty" typed variables?

OK.

Thanks,
Richard.

> 2022-01-13  Jakub Jelinek  
> 
>   PR tree-optimization/103989
>   * tree-inline.c (setup_one_parameter): Don't copy parms with
>   empty type.
> 
> --- gcc/tree-inline.c.jj  2022-01-11 23:11:23.422275652 +0100
> +++ gcc/tree-inline.c 2022-01-12 18:37:44.119950128 +0100
> @@ -3608,7 +3608,7 @@ setup_one_parameter (copy_body_data *id,
> init_stmt = gimple_build_assign (def, rhs);
>   }
>   }
> -  else
> +  else if (!is_empty_type (TREE_TYPE (var)))
>  init_stmt = gimple_build_assign (var, rhs);
>  
>if (bb && init_stmt)
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-01-13 Thread Thomas Schwinge
Hi!

On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>  - The "addressable" bit is set during the kernels conversion pass for
>variables that have "create" (alloc) clauses created for them in the
>synthesised outer data region (instead of in the front-end, etc.,
>where it can't be done accurately). Such variables actually have
>their address taken during transformations made in a later pass
>(omp-low, I think), but there's a phase-ordering problem that means
>the flag should be set earlier.

The actual issue is a bit different, but yes, there is a problem.
The related ICE has also been reported as 
"ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
didn't run into that with the OpenACC 'kernels' decomposition
originally.)  I've pushed to master branch
commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized data
clauses as addressable [PR100280]", see attached.


Grüße
 Thomas


From 9b32c1669aad5459dd053424f9967011348add83 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 16 Dec 2021 22:02:37 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
 synthesized data clauses as addressable [PR100280]

... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

13073			else if (is_gimple_reg (var))
13074			  {
13075			gcc_assert (offloaded);
13076			tree avar = create_tmp_var (TREE_TYPE (var));
13077			mark_addressable (avar);

..., which (a) is only implemented for actually *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown 
---
 gcc/omp-oacc-kernels-decompose.cc |   6 +-
 .../goacc/classify-kernels-parloops.c |  41 +++
 ...classify-kernels-unparallelized-parloops.c |  45 +++
 .../goacc/classify-kernels-unparallelized.c   |   5 +-
 .../c-c++-common/goacc/classify-kernels.c |   5 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  16 ++-
 .../goacc/kernels-decompose-ice-1.c   | 114 --
 .../goacc/kernels-decompose-ice-2.c   |  22 
 .../goacc/kernels-decompose-pr100280-1.c  |  19 +++
 .../goacc/classify-kernels-parloops.f95   |  43 +++
 ...assify-kernels-unparallelized-parloops.f95 |  47 
 .../goacc/classify-kernels-unparallelized.f95 |   5 +-
 .../gfortran.dg/goacc/classify-kernels.f95|   5 +-
 .../declare-vla-kernels-decompose-ice-1.c |   2 +-
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  53 
 .../kernels-decompose-1.c |   6 +-
 16 files changed, 264 insertions(+), 170 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100280-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95
 create mode 100644 gc

Wait at end of OpenACC asynchronous kernels regions

2022-01-13 Thread Thomas Schwinge
Hi!

On 2019-08-13T14:37:13-0700, Julian Brown  wrote:
> This patch provides a workaround for unreliable operation of asynchronous
> kernels regions on AMD GCN. At present, kernels regions are decomposed
> into a series of parallel regions surrounded by a data region capturing
> the data-movement clauses needed by the region as a whole:
>
>   #pragma acc kernels async(n)
>   { ... }
>
> is translated to:

... simplified...

>   #pragma acc data copyin(...) copyout(...)
>   {
> #pragma acc parallel async(n) present(...)
> { ... }
> #pragma acc parallel async(n) present(...)
> { ... }
>   }
>
> This is however problematic for two reasons:
>
>  - Variables mapped by the data clause will be unmapped immediately at the end
>of the data region, regardless of whether the inner asynchronous
>parallels have completed. (This causes crashes for GCN.)
>
>  - Even if the "present" clause caused the reference count to stay above zero
>at the end of the data region -- which it doesn't -- the "present"
>clauses on the inner parallel regions would not cause "copyout"
>variables to be transferred back to the host at the appropriate time,
>i.e. when the async parallel region had completed.

> There is no "async" data construct in OpenACC

(Actually, as of OpenACC 3.2 there now is:
 "[OpenACC] 'async' clause on 'data' construct"
-- but that's not yet implemented, so doesn't help us here.)

> so the correct solution
> (which I am deferring on for now) is probably to use asynchronous
> "enter data" and "exit data" directives when translating asynchronous
> kernels regions instead.

(Or rather, use structured 'data' (as we're now doing), but with
appropriate 'async' clauses.)

> The attached patch just adds a "wait" operation before the end of
> the enclosing data region. This works, but introduces undesirable
> synchronisation with the host.

ACK, thanks.  Pushed to master branch in
commit e52253bcc0916d9a7c7ba4bbe7501ae1ded3b8a8
"Wait at end of OpenACC asynchronous kernels regions", see attached.
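As a rough illustration of why the added wait helps, here is an analogy in Python using a thread pool. It is not OpenACC or GCC code, and the names `run_async_region` and `slow_increment` are hypothetical. The "data region" copies data in, queues asynchronous work on it, and must block on that work before copying out and unmapping.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def run_async_region(queue, parts, host_data):
    """Model a decomposed 'kernels async' region that waits before unmapping."""
    device_data = dict(host_data)           # "copyin": map host data
    pending = [queue.submit(part, device_data) for part in parts]
    wait(pending)                           # the added synchronization point
    host_data.update(device_data)           # "copyout" at end of data region
    device_data.clear()                     # unmap the device copy


def slow_increment(data):
    """Stand-in for one asynchronous 'parallel' part of the region."""
    time.sleep(0.01)
    data["x"] += 1
```

Without the `wait(pending)` call, the copy-out and the `clear()` could run while the queued parts are still executing, which is the same kind of race the commit message describes. The cost is the one noted above: the host blocks at the end of the region.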


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From e52253bcc0916d9a7c7ba4bbe7501ae1ded3b8a8 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Fri, 9 Aug 2019 13:01:33 -0700
Subject: [PATCH] Wait at end of OpenACC asynchronous kernels regions

In OpenACC 'kernels' decomposition, we're improperly nesting synchronous and
asynchronous data and compute regions, giving rise to data races when the
asynchronicity is actually executed, as is visible in at least one test case
with GCN offloading.

The proper fix is to correctly use the asynchronous interfaces, making the
currently synchronous data regions fully asynchronous (see also
 "[OpenACC] 'async' clause on 'data' construct",
which is to share the same implementation), but that's for later; for now add
some more synchronization.

	gcc/
	* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
	of...
	(add_async_clauses_and_wait): ...here. Call new outlined function.
	(decompose_kernels_region_body): Add wait at the end of
	explicitly-asynchronous kernels regions.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Remove GCN
	offloading execution XFAIL.

Co-Authored-By: Thomas Schwinge 
---
 gcc/omp-oacc-kernels-decompose.cc | 31 ++-
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  1 -
 2 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 4ca899d5ece..21872db3ed3 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -878,6 +878,18 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
   return body;
 }
 
+static void
+add_wait (location_t loc, gimple_seq *region_body)
+{
+  /* A "#pragma acc wait" is just a call GOACC_wait (acc_async_sync, 0).  */
+  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
+  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
+  gimple *wait_call = gimple_build_call (wait_fn, 2,
+	 sync_arg, integer_zero_node);
+  gimple_set_location (wait_call, loc);
+  gimple_seq_add_stmt (region_body, wait_call);
+}
+
 /* Helper function of decompose_kernels_region_body.  The statements in
REGION_BODY are expected to be decomposed parts; add an 'async' clause to
each.  Also add a 'wait' directive at the end of the sequence.  */
@@ -900,13 +912,7 @@ add_async_clauses_and_wait (location_t loc, gimple_seq *region_body)
   gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt),
  target_clauses);
 }
-  /* A '#pragma acc wait' is just a call 'GOACC_wait (acc_async_sync, 0)'.  */
-  tree wait_fn = builtin_decl_explicit (BUILT_I

Re: [PATCH] inliner: Don't emit copy stmts for empty type parameters [PR103989]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 10:54:15AM +0100, Richard Biener wrote:
> > The following patch avoids emitting a parameter copy statement when inlining
> > if the parameter has empty type.  E.g. the gimplifier does something similar
> > (except that it needs to evaluate side-effects if any, which isn't the case
> > here):
> >   /* For empty types only gimplify the left hand side and right hand
> >  side as statements and throw away the assignment.  Do this after
> >  gimplify_modify_expr_rhs so we handle TARGET_EXPRs of addressable
> >  types properly.  */
> >   if (is_empty_type (TREE_TYPE (*from_p))
> >   && !want_value
> >   /* Don't do this for calls that return addressable types, expand_call
> >  relies on those having a lhs.  */
> >   && !(TREE_ADDRESSABLE (TREE_TYPE (*from_p))
> >&& TREE_CODE (*from_p) == CALL_EXPR))
> > {
> >   gimplify_stmt (from_p, pre_p);
> >   gimplify_stmt (to_p, pre_p);
> >   *expr_p = NULL_TREE;
> >   return GS_ALL_DONE;
> > }
> > Unfortunately, this patch doesn't cure the uninit warnings in that PR,
> > but I think is desirable anyway.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Hmm, but not emitting the initialization might cause even more such
> warnings for the case where the passed in argument _is_ initialized
> (or not visible as not, like when being a function parameter itself)?

Most of the time it won't be initialized either, but sure, there
can be some cases like when a larger struct is initialized with memset
and then we pass a field from that as an argument.

> Otherwise sure, it's the same what the gimplifier does.
> 
> I wonder if instead uninit warning should simply ignore uses of
> "empty" typed variables?

Apparently it does already:
  /* Avoid warning about empty types such as structs with no members.
 The first_field() test is important for C++ where the predicate
 alone isn't always sufficient.  */
  tree rhstype = TREE_TYPE (rhs);
  if (POINTER_TYPE_P (rhstype))
rhstype = TREE_TYPE (rhstype);
  if (is_empty_type (rhstype))
return NULL_TREE;
Though, the above
  if (POINTER_TYPE_P (rhstype))
rhstype = TREE_TYPE (rhstype);
is just extremely suspicious, either we care about what type rhs has,
or it is dereferenced and it must be a pointer type and we care about
what it points to, but the simple fact whether rhs has a pointer type
or some other type shouldn't change what we test is_empty_type on.

When I was briefly looking at the assignment on which it actually warned,
it actually looked not empty type related.

Jakub



Host and offload targets have no common meaning of address spaces

2022-01-13 Thread Thomas Schwinge
Hi!

Jakub, I'd still like your comment on the two "should we" questions cited
below.

On 2021-08-24T13:43:38+0200, Richard Biener via Gcc-patches wrote:
> On Tue, Aug 24, 2021 at 12:23 PM Thomas Schwinge wrote:
>> On 2021-08-19T22:13:56+0200, I wrote:
>> > On 2021-08-16T10:21:04+0200, Jakub Jelinek  wrote:
>> >> On Mon, Aug 16, 2021 at 10:08:42AM +0200, Thomas Schwinge wrote:
>> > |> Concerning the current 'gcc/omp-low.c:omp_build_component_ref', for the
>> > |> current set of offloading testcases, we never see a
>> > |> '!ADDR_SPACE_GENERIC_P' there, so the address space handling doesn't seem
>> > |> to be necessary there (but also won't do any harm: no-op).
>> >>
>> >> Are you sure this can't trigger?
>> >> Say
>> >> extern int __seg_fs a;
>> >>
>> >> void
>> >> foo (void)
>> >> {
>> >>   #pragma omp parallel private (a)
>> >>   a = 2;
>> >> }
>> >
>> > That test case doesn't run into 'omp_build_component_ref' at all,
>> > but [I've pushed an altered and extended variant that does],
>> > "Add 'libgomp.c/address-space-1.c'".
>> >
>> > In this case, 'omp_build_component_ref' called via host compilation
>> > 'pass_lower_omp', it's the 'field_type' that has 'address-space-1'
>> > [...]:
>> >
>> > (gdb) call debug_tree(field_type)
>> >  > > type >
>> >> I think keeping the qual addr space here is the wrong thing to do,
>> >> it should keep the other quals and clear the address space instead,
>> >> the whole struct is going to be in generic address space, isn't it?
>> >
>> > Correct for 'omp_build_component_ref' called via host compilation
>> > 'pass_lower_omp'
>>
>> > However, regarding the former comment -- shouldn't we force generic
>> > address space for all 'tree' types read in via LTO streaming for
>> > offloading compilation?  I assume that (in the general case) address
>> > spaces are never compatible between host and offloading compilation?
>> > For [...] "Add 'libgomp.c/address-space-1.c'", propagating the
>> > '__seg_fs' address space across the offloading boundary (assuming I did
>> > interpret the dumps correctly) doesn't seem to cause any problems
>>
>> As I found later, actually the 'address-space-1' per host '__seg_fs' does
>> cause the "Intel MIC (emulated) offloading execution failure"
>> mentioned/XFAILed for 'libgomp.c/address-space-1.c': SIGSEGV, like
>> (expected) for host execution.  For GCN offloading target, it maps to
>> GCN 'ADDR_SPACE_FLAT' which apparently doesn't cause any ill effects (for
>> that simple test case).  The nvptx offloading target doesn't consider
>> address spaces at all.
>>
>> Is the attached "Host and offload targets have no common meaning of
>> address spaces" OK to push?

> I'd
> say I agree that any host address-space should go away when the corresponding
> data is offloaded

Pushed to master branch commit 9fcc3a1dd2372deea8856c55d25337b06e201203
"Host and offload targets have no common meaning of address spaces", see
attached.


>> Then, is that the way to do this, or should we add in
>> 'gcc/tree-streamer-out.c:pack_ts_base_value_fields':
>>
>> if (lto_stream_offload_p)
>>   gcc_assert (ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (expr)));
>>
>> ..., and elsewhere sanitize this for offloading compilation?  Jakub's
>> suggestion above, regarding 'gcc/omp-low.c:omp_build_component_ref':
>>
>> | I think keeping the qual addr space here is the wrong thing to do,
>> | it should keep the other quals and clear the address space instead
>>
>> But it's not obvious to me that indeed this is the one place where this
>> would need to be done?  (It ought to work for
>> 'libgomp.c/address-space-1.c', and any other occurrences would run into
>> the 'assert', so that ought to be "fine", though?)
>>
>>
>> And, should we have a new hook
>> 'void targetm.addr_space.validate (addr_space_t as)' (better name?),
>> called via 'gcc/emit-rtl.c:set_mem_attrs' (only? -- assuming this is the
>> appropriate canonic function where address space use is observed?), to
>> make sure that the requested 'as' is valid for the target?
>> 'default_addr_space_validate' would refuse everything but
>> 'ADDR_SPACE_GENERIC_P (as)'; this hook would need implementing for all
>> handful of targets making use of address spaces (supposedly matching the
>> logic how they call 'c_register_addr_space'?).  (The closest existing
>> hook seems to be 'targetm.addr_space.diagnose_usage', only defined for
>> AVR, and called from "the front ends" (C only).)


Grüße
 Thomas


>From 9fcc3a1dd2372deea8856c55d25337b06e201203 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 24 Aug 2021 11:14:10 +0200
Subject: [PATCH] Host and offload targets have no common meaning of address
 spaces

	gcc/
	* tree-streamer-out.c (pack_ts_bas

Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Jambor
Hi,

On Tue, Jan 11 2022, Martin Liška wrote:
> Hello.
>
> I've got a patch series that does the renaming. It consists of 2 automatic
> scripts ([1] and [2]) that were run as:
>
> $ gcc-renaming-candidates.py gcc --rename && git commit -a -m 'Rename files.' && rename-gcc.py . -vv && git commit -a -m 'Automatic renaming'
>
> The first script does the renaming (with a couple of exceptions that are
> really C files) and saves the list of renamed files to a file. That file is
> then loaded, and references to the renamed files are replaced in most of the
> GCC files ([2]). It basically replaces \b${old_filename}\b with
> ${old_filename}c (with some exceptions). That corresponds to patches #1 and
> #2, and the patches are quite huge.
>
> The last piece is the manual changes needed for Makefile.in, configure.ac and
> so on.
>
> The git branch can be seen here:
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/cc-renaming
>
> and pulled with:
> $ git fetch refs/users/marxin/heads/cc-renaming
> $ git co FETCH_HEAD
>

Thanks for the effort!  I looked at the branch and liked what I saw.  

Perhaps only a small nit about the commit message of the 2nd commit
("Automatic renaming of .c files to .cc.") which confused me.  It does
not actually rename any files so I would change it to "change references
to .c files to .cc files" or something like that.

But I assume the branch will need to be committed squashed anyway, so
commit message worries might be a bit premature.

I am looking forward to seeing it in trunk.

Martin
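
The reference-rewriting step described in the quoted message (replace `\b${old_filename}\b` with `${old_filename}c`) can be sketched in Python roughly as follows. This is only an illustration of the word-boundary idea, not the actual rename-gcc.py; the function name `update_references` and its arguments are hypothetical, and the "exceptions" mentioned in the message are not modelled.

```python
import re


def update_references(text, renamed_files):
    """Append 'c' to every whole-word mention of a renamed .c file."""
    for old_name in renamed_files:
        # \b on both sides keeps 'tree.cc' and 'mytree.c' untouched.
        pattern = r'\b' + re.escape(old_name) + r'\b'
        text = re.sub(pattern, lambda m: m.group(0) + 'c', text)
    return text
```

Because `\b` requires a word boundary after the final `c`, running the replacement twice over the same text does not turn `tree.cc` into `tree.ccc`, which matters when a mechanical pass is re-run on a partially converted tree.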


[PATCH][gcc-changelog] Simplify git-backport.py script.

2022-01-13 Thread Martin Liška

Pushed to master.

It's very unlikely that somebody is going to backport a revision
that is > 14 months old to a release branch.

contrib/ChangeLog:

* git-backport.py: Simplify the script as pre-auto-ChangeLog era
is 14 months old.
---
 contrib/git-backport.py | 39 ++-
 1 file changed, 2 insertions(+), 37 deletions(-)

diff --git a/contrib/git-backport.py b/contrib/git-backport.py
index 2b8e4686719..bc2907a14ed 100755
--- a/contrib/git-backport.py
+++ b/contrib/git-backport.py
@@ -23,43 +23,8 @@ import argparse
 import subprocess
 
 if __name__ == '__main__':
-    parser = argparse.ArgumentParser(description='Backport a git revision and '
-                                     'stash all ChangeLog files.')
+    parser = argparse.ArgumentParser(description='Backport a git revision.')
     parser.add_argument('revision', help='Revision')
     args = parser.parse_args()
 
-    r = subprocess.run('git cherry-pick -x %s' % args.revision, shell=True)
-    if r.returncode == 0:
-        cmd = 'git show --name-only --pretty="" -- "*ChangeLog"'
-        changelogs = subprocess.check_output(cmd, shell=True, encoding='utf8')
-        changelogs = changelogs.strip()
-        if changelogs:
-            for changelog in changelogs.split('\n'):
-                subprocess.check_output('git checkout HEAD~ %s' % changelog,
-                                        shell=True)
-            subprocess.check_output('git commit --amend --no-edit', shell=True)
-    else:
-        # 1) remove all ChangeLog files from conflicts
-        out = subprocess.check_output('git diff --name-only --diff-filter=U',
-                                      shell=True,
-                                      encoding='utf8')
-        conflicts = out.strip().split('\n')
-        changelogs = [c for c in conflicts if c.endswith('ChangeLog')]
-        if changelogs:
-            cmd = 'git checkout --theirs %s' % ' '.join(changelogs)
-            subprocess.check_output(cmd, shell=True)
-        # 2) remove all ChangeLog files from index
-        cmd = 'git diff --name-only --diff-filter=M HEAD'
-        out = subprocess.check_output(cmd, shell=True, encoding='utf8')
-        out = out.strip().split('\n')
-        modified = [c for c in out if c.endswith('ChangeLog')]
-        for m in modified:
-            subprocess.check_output('git reset %s' % m, shell=True)
-            subprocess.check_output('git checkout %s' % m, shell=True)
-
-        # try to continue
-        if len(conflicts) == len(changelogs):
-            cmd = 'git -c core.editor=true cherry-pick --continue'
-            subprocess.check_output(cmd, shell=True)
-        else:
-            print('Please resolve all remaining file conflicts.')
+    subprocess.run('git cherry-pick -x %s' % args.revision, shell=True)
--
2.34.1
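
For reference, what remains of the script after this patch amounts to a thin wrapper around `git cherry-pick -x`. A minimal sketch of that behaviour follows. It deliberately differs from the patch in one respect, which is my assumption and not part of the commit: the command is passed as an argument list instead of a `shell=True` string. The helper names `cherry_pick_command` and `main` are hypothetical.

```python
import argparse
import subprocess


def cherry_pick_command(revision):
    """Build the cherry-pick command; -x records the original commit hash."""
    return ['git', 'cherry-pick', '-x', revision]


def main(argv=None):
    """Parse the revision argument and run the cherry-pick."""
    parser = argparse.ArgumentParser(description='Backport a git revision.')
    parser.add_argument('revision', help='Revision')
    args = parser.parse_args(argv)
    # Return git's exit status so callers can detect conflicts.
    return subprocess.run(cherry_pick_command(args.revision)).returncode
```

Passing a list avoids shell interpretation of the revision string; whether that matters for this script is a judgment call, since the revision comes from the person running it.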



Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Thomas Schwinge
Hi!

This has fallen out of (unfinished...) work earlier in the year: pushed
to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
"Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
for OpenACC test cases".


Grüße
 Thomas


>From 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 26 Aug 2021 16:55:21 +0200
Subject: [PATCH] Document current '-Wuninitialized'/'-Wmaybe-uninitialized'
 diagnostics for OpenACC test cases

... including "note: '[...]' was declared here" emitted since recent
commit 9695e1c23be5b5c55d572ced152897313ddb96ae
"Improve -Wuninitialized note location".

For those that seemed incorrect to me, I've placed XFAILed 'dg-bogus'es,
including one more instance of PR77504 etc., and several instances where
for "local variables" of reference-data-type reductions (etc.?) we emit
bogus (?) diagnostics.

For implicit data clauses (including 'firstprivate'), we seem to be missing
diagnostics, so I've placed XFAILed 'dg-warning's.

	gcc/testsuite/
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: Document
	current '-Wuninitialized' diagnostics.
	* c-c++-common/goacc/mdc-1.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-1-routine.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-kernels.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-parallel.c: Likewise.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* c-c++-common/goacc/uninit-firstprivate-clause.c: Likewise.
	* c-c++-common/goacc/uninit-if-clause.c: Likewise.
	* gfortran.dg/goacc/array-with-dt-1.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-2.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
	* gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-1.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-3.f90: Likewise.
	* gfortran.dg/goacc/derived-chartypes-4.f90: Likewise.
	* gfortran.dg/goacc/derived-classtypes-1.f95: Likewise.
	* gfortran.dg/goacc/derived-types-2.f90: Likewise.
	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/modules.f95: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-kernels.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
	* gfortran.dg/goacc/pr93464.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute-loop.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-compute.f90: Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90:
	Likewise.
	* gfortran.dg/goacc/privatization-1-routine_gang.f90: Likewise.
	* gfortran.dg/goacc/uninit-dim-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-firstprivate-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-if-clause.f95: Likewise.
	* gfortran.dg/goacc/uninit-use-device-clause.f95: Likewise.
	* gfortran.dg/goacc/wait.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Document
	current '-Wuninitialized' diagnostics.
	* testsuite/libgomp.oacc-fortran/data-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gemm.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/optional-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr70643.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr96628-part1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reduction-7.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/reference-reductions.f90:
	Likewise.
---
 .../goacc/builtin-goacc-parlevel-id-size.c|  8 +
 gcc/testsuite/c-c++-common/goacc/mdc-1.c  |  4 +++
 .../goacc/nested-reductions-1-kernels.c   | 11 ++
 .../goacc/nested-reductions-1-parallel.c  | 14 
 .../goacc/nested-reductions-1-routine.c   |  4 +++
 .../goacc/nested-reductions-2-kernels.c   | 11 ++
 .../goacc/nested-reductions-2-parallel.c  | 14 
 .../goacc/nested-reductions-2-routine

Document current '-Wuninitialized' diagnostics for 'libgomp.oacc-fortran/routine-10.f90' [PR102192]

2022-01-13 Thread Thomas Schwinge
Hi!

On 2022-01-13T11:55:03+0100, I wrote:
> This has fallen out of (unfinished...) work earlier in the year: pushed
> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
> for OpenACC test cases".

..., and commit 2edbcaed95b8d8cbb05a6af486179db0da6e3245
"Document current '-Wuninitialized' diagnostics for
'libgomp.oacc-fortran/routine-10.f90' [PR102192]".


Grüße
 Thomas


>From 2edbcaed95b8d8cbb05a6af486179db0da6e3245 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 26 Aug 2021 16:55:21 +0200
Subject: [PATCH] Document current '-Wuninitialized' diagnostics for
 'libgomp.oacc-fortran/routine-10.f90' [PR102192]

	libgomp/
	PR tree-optimization/102192
	* testsuite/libgomp.oacc-fortran/routine-10.f90: Document current
	'-Wuninitialized' diagnostics.
---
 .../testsuite/libgomp.oacc-fortran/routine-10.f90  | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
index 90cca7c1024..9290e90f970 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-10.f90
@@ -1,5 +1,7 @@
 ! { dg-do run }
-!
+
+! { dg-additional-options -Wuninitialized }
+
 module m
   implicit none
 contains
@@ -26,6 +28,13 @@ contains
 
 call add_ps_routine(a, b, c)
   end function add_ef
+  ! This '-Wmaybe-uninitialized' diagnostic appears for '-O2' only; PR102192.
+  ! { dg-xfail-if PR102192 { *-*-* } { -O2 } }
+  ! There's another instance (again '-O2' only) further down, but as any number
+  ! of 'dg-xfail-if' only apply to the first 'dg-bogus' etc., we have no way to
+  ! XFAIL that other one, so we instead match all of them here (via line '0'):
+  ! { dg-bogus {'c' may be used uninitialized} {} { target *-*-* } 0 }
+  ! { TODO_dg-bogus {'c' may be used uninitialized} {} { target *-*-* } .-7 }
 end module m
 
 program main
@@ -44,6 +53,9 @@ program main
   do i = 1, n
  if (i .eq. 4) then
 c_a = add_ef(a_a, b_a)
+! See above.
+! { TODO_dg-xfail-if PR102192 { *-*-* } { -O2 } }
+! { TODO_dg-bogus {'c' may be used uninitialized} {} { target *-*-* } .-3 }
  end if
   end do
   !$acc end parallel
-- 
2.34.1



Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Liška

On 1/13/22 11:47, Martin Jambor wrote:
> Hi,
>
> On Tue, Jan 11 2022, Martin Liška wrote:
>> Hello.
>>
>> I've got a patch series that does the renaming. It consists of 2 automatic
>> scripts ([1] and [2]) that were run as:
>>
>> $ gcc-renaming-candidates.py gcc --rename && git commit -a -m 'Rename files.' && rename-gcc.py . -vv && git commit -a -m 'Automatic renaming'
>>
>> The first script does the renaming (with a couple of exceptions that are
>> really C files) and saves the list of renamed files to a file. That file is
>> then loaded, and references to the renamed files are replaced in most of the
>> GCC files ([2]). It basically replaces \b${old_filename}\b with
>> ${old_filename}c (with some exceptions). That corresponds to patches #1 and
>> #2, and the patches are quite huge.
>>
>> The last piece is the manual changes needed for Makefile.in, configure.ac and
>> so on.
>>
>> The git branch can be seen here:
>> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/cc-renaming
>>
>> and pulled with:
>> $ git fetch refs/users/marxin/heads/cc-renaming
>> $ git co FETCH_HEAD
>
> Thanks for the effort!  I looked at the branch and liked what I saw.

Thanks.

> Perhaps only a small nit about the commit message of the 2nd commit
> ("Automatic renaming of .c files to .cc.") which confused me.  It does
> not actually rename any files so I would change it to "change references
> to .c files to .cc files" or something like that.

Sure, I'm going to update the commit message.

> But I assume the branch will need to be committed squashed anyway, so
> commit message worries might be a bit premature.

No, I would like to commit it as 3 separate commits for these reasons:
- git renaming with 100% match should guarantee that git fully works with
  merging and the like
- I would like to distinguish manual changes from those that are only a
  mechanical replacement.

Cheers,
Martin

> I am looking forward to seeing it in trunk.
>
> Martin




Re: [PATCH] [12/11/10] Fix invalid format warnings on Windows

2022-01-13 Thread Tomas Kalibera via Gcc-patches

On 1/13/22 10:40 AM, Martin Liška wrote:
> [...]
> Apart from that, I support the patch (I cannot approve it). Note we're now
> approaching stage4 and this is definitely stage1 material (opens after GCC
> 12.1.0 gets released).

Thanks, Martin, I've updated the patch following your suggestions.

Cheers
Tomas

> Cheers,
> Martin

>From 4db4e6b35be5793902d8820d2c8e4d1f1cbba80d Mon Sep 17 00:00:00 2001
From: Tomas Kalibera 
Date: Thu, 13 Jan 2022 05:25:32 -0500
Subject: [PATCH] c-family: Let stdio.h override built in printf format
 [PR95130,PR92292]

Mingw32 targets use ms_printf format for printf, but mingw-w64 when
configured for UCRT uses gnu_format (via stdio.h).  GCC then checks both
formats, which means that one cannot print a 64-bit integer without a
warning.  All these lines issue a warning:

  printf("Hello %"PRIu64"\n", x);
  printf("Hello %I64u\n", x);
  printf("Hello %llu\n", x);

because each of them violates one of the formats.  Also, one gets a warning
twice if the format string violates both formats.

Fixed by disabling the built in format in case there are additional ones.

gcc/c-family/ChangeLog:

	PR c/95130
	PR c/92292

	* c-common.c (check_function_arguments): Pass also function
	  declaration to check_function_format.

	* c-common.h (check_function_format): Extra argument - function
	  declaration.

	* c-format.c (check_function_format): For builtin functions with a
	  built in format and at least one more, do not check the first one.
---
 gcc/c-family/c-common.c |  2 +-
 gcc/c-family/c-common.h |  2 +-
 gcc/c-family/c-format.c | 32 ++--
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4a6a4edb763..00fc734d28e 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6064,7 +6064,7 @@ check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
   /* Check for errors in format strings.  */
 
   if (warn_format || warn_suggest_attribute_format)
-check_function_format (fntype, TYPE_ATTRIBUTES (fntype), nargs, argarray,
+check_function_format (fndecl, fntype, TYPE_ATTRIBUTES (fntype), nargs, argarray,
 			   arglocs);
 
   if (warn_format)
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 8b7bf35e888..ee370eafbbc 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -856,7 +856,7 @@ extern void check_function_arguments_recurse (void (*)
 	  unsigned HOST_WIDE_INT);
 extern bool check_builtin_function_arguments (location_t, vec<location_t>,
 	  tree, tree, int, tree *);
-extern void check_function_format (const_tree, tree, int, tree *,
+extern void check_function_format (const_tree, const_tree, tree, int, tree *,
    vec<location_t> *);
 extern bool attribute_fallthrough_p (tree);
 extern tree handle_format_attribute (tree *, tree, tree, int, bool *);
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index afa77810a5c..bc2abee5146 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -1160,12 +1160,13 @@ decode_format_type (const char *s, bool *is_raw /* = NULL */)
attribute themselves.  */
 
 void
-check_function_format (const_tree fntype, tree attrs, int nargs,
+check_function_format (const_tree fndecl, const_tree fntype, tree attrs, int nargs,
 		   tree *argarray, vec<location_t> *arglocs)
 {
-  tree a;
+  tree a, aa;
 
   tree atname = get_identifier ("format");
+  bool skipped_default_format = false;
 
   /* See if this function has any format attributes.  */
   for (a = attrs; a; a = TREE_CHAIN (a))
@@ -1176,6 +1177,33 @@ check_function_format (const_tree fntype, tree attrs, int nargs,
 	  function_format_info info;
 	  decode_format_attr (fntype, atname, TREE_VALUE (a), &info,
 			  /*validated=*/true);
+
+	  /* Mingw32 targets have traditionally used ms_printf format for the
+	 printf function, and this format is built in GCC. But nowadays,
+	 if mingw-w64 is configured to target UCRT, the printf function
+	 uses the gnu_printf format (specified in the stdio.h header). This
+	 causes GCC to check both formats, which means that there is no way
+	 to e.g. print a long long unsigned without a warning (ms_printf
+	 warns for %llu and gnu_printf warns for %I64u). Also, GCC would warn
+	 twice about the same issue when both formats are violated, e.g.
+	 for %lu used to print long long unsigned.
+
+	 Hence, if there are multiple format specifiers, we skip the first
+	 one. See PR 95130, PR 92292.  */
+
+	  if (!skipped_default_format && fndecl)
+	{
+	  for(aa = TREE_CHAIN (a); aa; aa = TREE_CHAIN(aa))
+		if (is_attribute_p ("format", get_attribute_name (aa)) &&
+		fndecl && fndecl_built_in_p (fndecl, BUILT_IN_NORMAL))
+		  {
+			skipped_default_format = true;
+			break;
+		  }
+	  if (skipped_default_format)
+		continue;
+	}
+
 	  if (warn_format)
 	{
 	  /* FIXME: Rewrite all the internal functions in this file
-- 
2.25.1
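
The selection rule the patch adds to check_function_format can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not GCC code: the attribute chain is represented as a plain list of format archetype names, the built-in format is assumed to come first in that list (as the patch comment implies), and the function name `formats_to_check` is hypothetical.

```python
def formats_to_check(format_attrs, is_builtin):
    """Return the format archetypes that would still be checked.

    format_attrs: archetype names in attribute-chain order, with the
                  compiler's built-in default assumed to be first.
    is_builtin:   True when the callee is a normal built-in function.
    """
    # Skip the built-in default only if a header supplied another format.
    if is_builtin and len(format_attrs) > 1:
        return list(format_attrs[1:])
    return list(format_attrs)
```

With `['ms_printf', 'gnu_printf']` on a built-in such as printf, only `gnu_printf` is checked, so `%llu` no longer triggers the ms_printf warning. A lone built-in format, or a non-built-in function with several formats, is checked as before.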



[ANNOUNCEMENT] Mass rename of C++ .c files to .cc suffix is going to happen on Jan 17 evening UTC TZ

2022-01-13 Thread Martin Liška

Hello.

Based on the discussion with release managers, the change is going to happen
after stage4 begins.

Martin


Re: [PATCH] libgomp, OpenMP, nvptx: Low-latency memory allocator

2022-01-13 Thread Andrew Stubbs
Updated patch: this version fixes some missed cases of malloc in the
realloc implementation. It also reworks the unused variable workarounds
so that they work better with my reworked pinned memory patches, which I've
not posted yet.

Andrew

libgomp, nvptx: low-latency memory allocator

This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU devices, via omp_low_lat_mem_space and omp_alloc.  The memory
can be allocated, reallocated, and freed using a basic but fast algorithm.
The allocator is thread safe, and the size of the low-latency heap can be
configured using the GOMP_NVPTX_LOWLAT_POOL environment variable.

The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC and MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm variable.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 07a5645f4cc..1cc7486fc4c 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -34,6 +34,34 @@
 
 #define omp_max_predefined_alloc omp_thread_mem_alloc
 
+/* These macros may be overridden in config//allocator.c.  */
+#ifndef MEMSPACE_ALLOC
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#endif
+#ifndef MEMSPACE_CALLOC
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#endif
+#ifndef MEMSPACE_REALLOC
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#endif
+#ifndef MEMSPACE_FREE
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#endif
+
+/* Map the predefined allocators to the correct memory space.
+   The index to this table is the omp_allocator_handle_t enum value.  */
+static const omp_memspace_handle_t predefined_alloc_mapping[] = {
+  omp_default_mem_space,   /* omp_null_allocator. */
+  omp_default_mem_space,   /* omp_default_mem_alloc. */
+  omp_large_cap_mem_space, /* omp_large_cap_mem_alloc. */
+  omp_default_mem_space,   /* omp_const_mem_alloc. */
+  omp_high_bw_mem_space,   /* omp_high_bw_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_low_lat_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_cgroup_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_pteam_mem_alloc. */
+  omp_low_lat_mem_space,   /* omp_thread_mem_alloc. */
+};
+
 struct omp_allocator_data
 {
   omp_memspace_handle_t memspace;
@@ -281,7 +309,7 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = malloc (new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -297,7 +325,11 @@ retry:
 }
   else
 {
-  ptr = malloc (new_size);
+  omp_memspace_handle_t memspace __attribute__((unused))
+   = (allocator_data
+  ? allocator_data->memspace
+  : predefined_alloc_mapping[allocator]);
+  ptr = MEMSPACE_ALLOC (memspace, new_size);
   if (ptr == NULL)
goto fail;
 }
@@ -315,32 +347,35 @@ retry:
   return ret;
 
 fail:
-  if (allocator_data)
+  int fallback = (allocator_data
+ ? allocator_data->fallback
+ : allocator == omp_default_mem_alloc
+ ? omp_atv_null_fb
+ : omp_atv_default_mem_fb);
+  switch (fallback)
 {
-  switch (allocator_data->fallback)
+case omp_atv_default_mem_fb:
+  if ((new_alignment > sizeof (void *) && new_alignment > alignment)
+ || (allocator_data
+ && allocator_data->pool_size < ~(uintptr_t) 0)
+ || !allocator_data)
{
-   case omp_atv_default_mem_fb:
- if ((new_alignment > sizeof (void *) && new_alignment > alignment)
- || (allocator_data
- && allocator_data->pool

Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 11:59 AM Martin Liška  wrote:
>
> On 1/13/22 11:47, Martin Jambor wrote:
> > Hi,
> >
> > On Tue, Jan 11 2022, Martin Liška wrote:
> >> Hello.
> >>
> >> I've got a patch series that does the renaming. It consists of 2 automatic
> >> scripts ([1] and [2]) that were run as:
> >>
> >> $ gcc-renaming-candidates.py gcc --rename && git commit -a -m 'Rename 
> >> files.' && rename-gcc.py . -vv && git commit -a -m 'Automatic renaming'
> >>
> >> The first script does the renaming (with a couple of exceptions that are 
> >> really C files) and saves
> >> the renamed files to a file. The file is then loaded and replacement 
> >> of all the renamed files happens
> >> for most of the GCC files ([2]). It basically replaces 
> >> \b${old_filename}\b with ${old_filename}c
> >> (with some exceptions). That corresponds to patch #1 and #2 and the 
> >> patches are quite huge.
> >>
> >> The last piece is the manual changes needed for Makefile.in, configure.ac and 
> >> so on.
> >>
> >> The git branch can be seen here:
> >> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=log;h=refs/users/marxin/heads/cc-renaming
> >>
> >> and pulled with:
> >> $ git fetch refs/users/marxin/heads/cc-renaming
> >> $ git co FETCH_HEAD
> >>
> >
> > Thanks for the effort!  I looked at the branch and liked what I saw.
>
> Thanks.
>
> > Perhaps only a small nit about the commit message of the 2nd commit
> > ("Automatic renaming of .c files to .cc.") which confused me.  It does
> > not actually rename any files so I would change it to "change references
> > to .c files to .cc files" or something like that.
>
> Sure, I'm going to update the commit message.
>
> >
> > But I assume the branch will need to be committed squashed anyway, so
> > commit message worries might be a bit premature.
>
> No, I would like to commit it as 3 separate commits for these reasons:
> - git renaming with 100% match should guarantee git would fully work with 
> merging and stuff like that
> - I would like to distinguish manual changes from those that are only a 
> mechanical replacement.

But please make sure all intermediate revs will still build.

Richard.

> Cheers,
> Martin
>
> >
> > I am looking forward to seeing it in trunk.
> >
> > Martin
>


Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Liška

On 1/13/22 12:14, Richard Biener wrote:
> But please make sure all intermediate revs will still build.


That's not possible :) I don't think it's a good idea to mix .cc renaming
and changes in those files.

Martin


Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread guojiufu via Gcc-patches

On 2022-01-03 22:30, Richard Biener wrote:

On Wed, 22 Dec 2021, Jiufu Guo wrote:


Hi,

Normally, estimate_numbers_of_iterations gets/calculates niter first,
and then invokes infer_loop_bounds_from_undefined. While in some cases,
after a few call stacks, estimate_numbers_of_iterations is invoked before
niter is ready (e.g. before number_of_latch_executions returns).

e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
  --> estimate_numbers_of_iterations --> infer_loop_bounds_from_undefined.

Since niter is still not computed, the call to
infer_loop_bounds_from_undefined may not get the final result.
To avoid infer_loop_bounds_from_undefined being called with interim state
and generating interim data while niter is being computed, we could
disable flag_aggressive_loop_optimizations.


Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?


So this is an optimality fix, not a correctness one?  I suppose the
estimates are computed/used from scev_probably_wraps_p via
loop_exits_before_overflow and ultimately chrec_convert.

We have a call cycle here,

estimate_numbers_of_iterations -> number_of_latch_executions ->
... -> estimate_numbers_of_iterations

where the first estimate_numbers_of_iterations will make sure
the later call will immediately return.


Hi Richard,
Thanks for your comments! And sorry for the late reply.

In estimate_numbers_of_iterations, there is a guard to make sure
the second call to estimate_numbers_of_iterations returns
immediately.

Exactly as you said, it relates to scev_probably_wraps_p calling
loop_exits_before_overflow.

The issue is: the first call to estimate_numbers_of_iterations
may be inside number_of_latch_executions.



I'm not sure what your patch tries to do - it seems to tackle
the case where we enter the cycle via number_of_latch_executions?
Why do we get "non-final" values?  idx_infer_loop_bounds resorts


Right, when the call cycle starts from number_of_latch_executions,
the issue may occur:

number_of_latch_executions(*1st call)->..->
analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
loop_exits_before_overflow->
estimate_numbers_of_iterations (*1st call)->
number_of_latch_executions(*2nd call)->..->
analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow-> 
estimate_numbers_of_iterations(*2nd call)


The second call to estimate_numbers_of_iterations returns quickly.
And then, in the first call to estimate_numbers_of_iterations,
infer_loop_bounds_from_undefined is invoked.

And, function "infer_loop_bounds_from_undefined" instantiates/analyzes
the SCEV for each SSA name in the loop.
*Here the issue occurs*: these SCEVs are based on the interim IVs'
SCEVs, which come from "analyze_scalar_evolution(IVs 2nd)",
and those IVs' SCEVs will be overridden by the upper-level
"analyze_scalar_evolution(IVs 1st)".

To handle this issue, disabling flag_aggressive_loop_optimizations
inside number_of_latch_executions is one method.
To avoid the issue in other cases, e.g. when the call cycle starts from
number_of_iterations_exit or number_of_iterations_exit_assumptions,
this patch disables flag_aggressive_loop_optimizations inside
number_of_iterations_exit_assumptions.

Thanks again.

BR,
Jiufu


to SCEV and thus may recurse again - to me it would be more
logical to try to avoid recursing in number_of_latch_executions by
setting ->nb_iterations to something early, maybe chrec_dont_know,
to signal we're using something we're just trying to compute.

Richard.


BR,
Jiufu

gcc/ChangeLog:

* tree-ssa-loop-niter.c (number_of_iterations_exit_assumptions):
Disable/restore flag_aggressive_loop_optimizations.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/scev-16.c: New test.

---
 gcc/tree-ssa-loop-niter.c   | 23 +++
 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c | 20 
 2 files changed, 39 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/scev-16.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 06954e437f5..51bb501019e 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2534,18 +2534,31 @@ number_of_iterations_exit_assumptions (class 
loop *loop, edge exit,

   && !POINTER_TYPE_P (type))
 return false;

+  /* Before niter is calculated, avoid to analyze interim state. */
+  int old_aggressive_loop_optimizations = 
flag_aggressive_loop_optimizations;

+  flag_aggressive_loop_optimizations = 0;
+
   tree iv0_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop, loop_containing_stmt (stmt),
  op0, &iv0, safe ? &iv0_niters : NULL, false))
-return number_of_iterations_popcount (loop, exit, code, niter);
+{
+  bool res = number_of_iterations_popcount (loop, exit, code, 
niter);
+  flag_aggressive_loop_optimizations = 
old_aggressive_loop_optimizations;

+  return res;
+}
   tree iv1_niters = NULL_TREE;
   if (!simple_iv_with_niters (loop,

Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 12:20:57PM +0100, Martin Liška wrote:
> On 1/13/22 12:14, Richard Biener wrote:
> > But please make sure all intermediate revs will still build.
> 
> That's not possible :) I don't think it's a good idea to mix .cc renaming
> and changes in those files.

I think it is possible, but would require more work.
Comments in the files don't matter for sure, and in the Makefiles we
could do (just one random file can be checked):
ifeq (,$(wildcard $(srcdir)/expr.cc))
what we used to do
else
what we want to do newly
endif
A commit that changes the Makefiles that way comes first, then
the renaming commit, then a commit that removes those ifeq ... else
and endif lines.

Jakub



Re: [PATCH] PR fortran/67804 - ICE on data initialization of type(character) with wrong data

2022-01-13 Thread Mikael Morin

Le 12/01/2022 à 21:29, Harald Anlauf via Fortran a écrit :

Dear Fortranners,

the attached patch improves error recovery after an invalid
structure constructor has been detected in a DATA statement.

Testcase by Gerhard.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

This should be a rather safe patch which I would like to
backport to 11-branch after a suitable waiting period.


OK; thanks.


Merge 'c-c++-common/goacc/routine-6.c' into 'c-c++-common/goacc/routine-5.c', and document current C/C++ difference (was: [PATCH] openacc: Fix up C++ #pragma acc routine handling [PR101731])

2022-01-13 Thread Thomas Schwinge
Hi!

On 2021-11-22T16:02:31+0100, Jakub Jelinek via Gcc-patches 
 wrote:
> On Mon, Nov 22, 2021 at 03:49:42PM +0100, Thomas Schwinge wrote:
>> Then, regarding the user-visible behavior:
>>
>> > +#pragma acc routine  /* { dg-error "not immediately followed by a single 
>> > function declaration or definition" "" { target c++ } } */
>> > +int foo (int bar ());
>>
>> So in C++ we now refuse, but in C we do accept this.  I suppose I shall
>> look into making C behave the same way -- unless there is a reason for
>> the different behavior?  And/or, is it actually useful to allow such
>> nested usage?  Per its associated clauses, an OpenACC 'routine' directive
>> really is meant to apply to one function only, in contrast to OpenMP
>> 'target declare'.  But the question is whether we should raise an error
>> for the example above, or whether the 'routine' shall just apply to 'foo'
>> but not 'bar', but without an error diagnostic?
>
> All I've verified is that our OpenMP code handles it the same way,

Thanks for the explanation.

Pushed to master branch commit 67fdcc8835665b5bc13652205e815e498d65c5a1
"Merge 'c-c++-common/goacc/routine-6.c' into
'c-c++-common/goacc/routine-5.c', and document current C/C++ difference",
see attached.


Grüße
 Thomas


> i.e.
> #pragma omp declare simd
> int foo (int bar ());
> is accepted in C and rejected in C++.
> I guess one question is to check if it is in both languages actually
> the same thing.  If we want to accept it in C++ and let the pragma
> apply only to the outer declaration, I guess we'd need to temporarily
> set to NULL parser->omp_declare_simd and parser->oacc_routine while
> parsing the parameters of a function declaration or definition.
> At least OpenMP is fairly fuzzy here, the reason we error on
> #pragma omp declare simd
> int foo (), i;
> has been mainly some discussions in the lang committee and the fact
> that it talks about a single declaration, not all affected declarations.
> Whether int foo (int bar ()); should be in that light treated as two
> function declarations or one with another one nested in it and irrelevant
> for it is unclear.
>
>   Jakub


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
From 67fdcc8835665b5bc13652205e815e498d65c5a1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Nov 2021 16:09:09 +0100
Subject: [PATCH] Merge 'c-c++-common/goacc/routine-6.c' into
 'c-c++-common/goacc/routine-5.c', and document current C/C++ difference

	gcc/testsuite/
	* c-c++-common/goacc/routine-6.c: Merge into...
	* c-c++-common/goacc/routine-5.c: ... this, and document current
	C/C++ difference.
---
 gcc/testsuite/c-c++-common/goacc/routine-5.c | 8 
 gcc/testsuite/c-c++-common/goacc/routine-6.c | 4 
 2 files changed, 8 insertions(+), 4 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/routine-6.c

diff --git a/gcc/testsuite/c-c++-common/goacc/routine-5.c b/gcc/testsuite/c-c++-common/goacc/routine-5.c
index e3fbd6573b8..94678f2bf5b 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -94,6 +94,14 @@ typedef struct c_2 c_2;
 #pragma acc routine /* { dg-error ".#pragma acc routine. not immediately followed by function declaration or definition" } */
 struct d_2 {} d_2;
 
+/* PR c++/101731 */
+/* Regarding the current C/C++ difference, see
+   .  */
+#pragma acc routine /* { dg-error "not immediately followed by a single function declaration or definition" "" { target c++ } } */
+int pr101731_foo (int pr101731_bar ());
+#pragma acc routine (pr101731_foo) vector /* { dg-error "has already been marked with an OpenACC 'routine' directive" "" { target c } } */
+#pragma acc routine (pr101731_bar) vector /* { dg-error "'pr101731_bar' has not been declared" } */
+
 #pragma acc routine /* { dg-error ".#pragma acc routine. not immediately followed by function declaration or definition" } */
 #pragma acc routine
 int fn4 (void);
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c
deleted file mode 100644
index 0a231a015a7..000
--- a/gcc/testsuite/c-c++-common/goacc/routine-6.c
+++ /dev/null
@@ -1,4 +0,0 @@
-/* PR c++/101731 */
-
-#pragma acc routine	/* { dg-error "not immediately followed by a single function declaration or definition" "" { target c++ } } */
-int foo (int bar ());
-- 
2.34.1



Re: [PATCH] Mass rename of C++ .c files to .cc suffix

2022-01-13 Thread Martin Jambor
On Thu, Jan 13 2022, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jan 13, 2022 at 12:20:57PM +0100, Martin Liška wrote:
>> On 1/13/22 12:14, Richard Biener wrote:
>> > But please make sure all intermediate revs will still build.
>> 
>> That's not possible :) I don't think it's a good idea to mix .cc renaming
>> and changes in those files.
>
> I think it is possible, but would require more work.
> Comments in the files don't matter for sure, and in the Makefiles we
> could do (just one random file can be checked):
> ifeq (,$(wildcard $(srcdir)/expr.cc))
> what we used to do
> else
> what we want to do newly
> endif
> A commit that changes the Makefiles that way comes first, then
> the renaming commit, then a commit that removes those ifeq ... else
> and endif lines.
>

I would expect that the problematic case is only when you modify a file
that you also rename.  Is there any such file where we do more than
adjust comments, where the contents modifications are essential for
bootstrap too?

I would expect that modifications in Makefiles, configure-scripts etc
could go in the same commit as the renames and these could be then
followed up with comments adjustments and similar.

But it would be more work, so I guess just using git bisect skip if
bisection ever lands in the middle of this is acceptable in this special
case too.

Martin



Re: [committed] libgomp/testsuite: Improve omp_get_device_num() tests

2022-01-13 Thread Thomas Schwinge
Hi!

On 2022-01-04T15:12:58+0100, Tobias Burnus  wrote:
> This commit r12-6209 now makes the testcases iterate over all devices
> (including the initial/host device).
>
> Hence, with multiple non-host devices and this test, the error had been
> found before ... ;-)

Yay for test cases!  :-)

... but we now run into issues if Intel MIC (emulated) offloading is
(additionally) enabled, because that one still doesn't properly implement
device-side 'omp_get_device_num'.  ;-)

Thus pushed to master branch
commit d97364aab1af361275b87713154c366ce2b9029a
"Improve Intel MIC offloading XFAILing for 'omp_get_device_num'", see
attached.

(It wasn't obvious to me how to implement that; very incomplete
"[WIP] Intel MIC 'omp_get_device_num'" attached, not planning on working
on this any further.)


Grüße
 Thomas


From d97364aab1af361275b87713154c366ce2b9029a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jan 2022 19:52:25 +0100
Subject: [PATCH] Improve Intel MIC offloading XFAILing for
 'omp_get_device_num'

After recent commit be661959a6b6d8f9c3c8608a746789e7b2ec3ca4
"libgomp/testsuite: Improve omp_get_device_num() tests", we're now iterating
over all OpenMP target devices.  Intel MIC (emulated) offloading still doesn't
properly implement device-side 'omp_get_device_num', and we thus regress:

PASS: libgomp.c/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/../libgomp.c-c++-common/target-45.c execution test

PASS: libgomp.c++/../libgomp.c-c++-common/target-45.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c++/../libgomp.c-c++-common/target-45.c execution test

PASS: libgomp.fortran/target10.f90   -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O0  execution test
PASS: libgomp.fortran/target10.f90   -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O1  execution test
PASS: libgomp.fortran/target10.f90   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O2  execution test
PASS: libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: libgomp.fortran/target10.f90   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -O3 -g  execution test
PASS: libgomp.fortran/target10.f90   -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target10.f90   -Os  execution test

Improve the XFAILing added in commit bb75b22aba254e8ff144db27b1c8b4804bad73bb
"Allow matching Intel MIC in OpenMP 'declare variant'" for the case that *any*
Intel MIC offload device is available.

	libgomp/
	* testsuite/libgomp.c-c++-common/on_device_arch.h
	(any_device_arch, any_device_arch_intel_mic): New.
	* testsuite/lib/libgomp.exp
	(check_effective_target_offload_device_any_intel_mic): New.
	* testsuite/libgomp.c-c++-common/target-45.c: Use it.
	* testsuite/libgomp.fortran/target10.f90: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp | 12 +-
 .../libgomp.c-c++-common/on_device_arch.h | 23 +++
 .../libgomp.c-c++-common/target-45.c  |  2 +-
 .../testsuite/libgomp.fortran/target10.f90|  2 +-
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 57fb6b068f3..8c5ecfff0ac 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -451,7 +451,6 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 # Return 1 if using Intel MIC offload device.
 proc check_effective_target_offload_device_intel_mic { } {
 return [check_runtime_nocache offload_device_intel_mic {
-  #include 
   #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
   int main ()
 	{
@@ -460,6 +459,17 @@ proc check_effective_target_offload_device_intel_mic { } {
 } ]
 }
 
+# Return 1 if any Intel MIC offload device is available.
+proc check_effective_target_offload_device_any_intel_mic { } {
+return [check_runtime_nocache offload_device_any_intel_mic {
+  #include "testsuite/libgomp.c-c++-common/on_device_arch.h"
+  int main ()
+	{
+	  return !any_device_arch_intel_mic ();
+	}
+} ]
+}
+
 # Return 1 if the OpenACC 'host' device type is selected.
 
 proc check_effective_target_openacc_host_selected { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h b/libgomp/testsuite/libgomp.c-c++-common/on_device_arch.h
index e

Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread Kewen.Lin via Gcc-patches
on 2022/1/13 上午11:56, Kewen.Lin via Gcc-patches wrote:
> on 2022/1/13 上午11:44, David Edelsohn wrote:
>> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
>>>
>>> Hi David,
>>>
>>> on 2022/1/13 上午11:07, David Edelsohn wrote:
>>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This patch is to fix register constraint v with
>>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
>>>>> just like some other existing register constraints with
>>>>> RS6000_CONSTRAINT_*.
>>>>>
>>>>> I happened to see this and hope it's not intentional and just
>>>>> got neglected.
>>>>>
>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>>>>> powerpc64-linux-gnu P8.
>>>>>
>>>>> Is it ok for trunk?
>>>>
>>>> Why do you want to make this change?
>>>>
>>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>>>>
>>>> but all of the patterns that use a "v" constraint are (or should be)
>>>> protected by TARGET_ALTIVEC, or some final condition that only is
>>>> active for TARGET_ALTIVEC.  The other constraints are conditionally
>>>> set because they can be used in a pattern with multiple alternatives
>>>> where the pattern itself is active but some of the constraints
>>>> correspond to NO_REGS when some instruction variants for VSX are not
>>>> enabled.
>>>>
>>>
>>> Good point!  Thanks for the explanation.
>>>
>>>> The change isn't wrong, but it doesn't correct a bug and provides no
>>>> additional benefit nor clarity that I can see.

>>>
>>> The original intention is to make it consistent with the other existing
>>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
>>> weird (like was neglected).  After you clarified above, RS6000_CONSTRAINT_v
>>> seems useless at all in the current framework.  Do you prefer to remove
>>> it to avoid any confusions instead?
>>
>> It's used in the reg_class, so there may be some heuristic in the GCC
>> register allocator that cares about the number of registers available
>> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
>> conditionally, so it seems best to leave it as is.
>>
> 
> I may be missing something, but I didn't find it being used for the above
> purposes.  If it's best to leave it as is, the proposed patch seems to offer
> better readability.

Two more inputs for maintainers' decision:

1) the original proposed patch fixed one "bug" that is:

In function rs6000_debug_reg_global, it tries to print the register class
for the register constraint:

  fprintf (stderr,
   "\n"
   "d  reg_class = %s\n"
   "f  reg_class = %s\n"
   "v  reg_class = %s\n"
   "wa reg_class = %s\n"
   ...
   "\n",
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
   ...

It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
set here:

  /* Add conditional constraints based on various options, to allow us to
 collapse multiple insn patterns.  */
  if (TARGET_ALTIVEC)
rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;

But the actual register class for register constraint is hardcoded as
ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].

2) Bootstrapped and tested the patch below, which removes all the code
using RS6000_CONSTRAINT_v, on powerpc64le-linux-gnu P10 and P9 and
powerpc64-linux-gnu P8 and P7, with no regressions.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 37f07fe5358..3652629c5d0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
   "\n"
   "d  reg_class = %s\n"
   "f  reg_class = %s\n"
-  "v  reg_class = %s\n"
   "wa reg_class = %s\n"
   "we reg_class = %s\n"
   "wr reg_class = %s\n"
@@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
   "\n",
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
-  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
@@ -2984,11 +2982,6 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
   if (TARGET_VSX)
 rs6000_constraints[RS6000_CONSTRAINT_wa] = VSX_REGS;

-  /* Add conditional constraints based on various options, to allow us to
- collapse multiple insn patterns.  */
-  if (TARGET_ALTIVEC)
-rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
-
   if (TARGET_POWERPC64)
 {
   rs6000_constraints[RS6000_CONSTRAINT_wr] = GENERAL_REGS;
diff --git a/gcc/config/rs6

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:

> This time to the list too (sorry for double email)
> 
> Hi,
> 
> The original patch '[vect] Re-analyze all modes for epilogues', skipped modes
> that should not be skipped since it used the vector mode provided by
> autovectorize_vector_modes to derive the minimum VF required for it. However,
> those modes should only really be used to dictate vector size, so instead this
> patch looks for the mode in 'used_vector_modes' with the largest element size,
> and constructs a vector mode with the same size as the current
> vector_modes[mode_i]. Since we are using the largest element size, the NUNITS
> for this mode is the smallest possible VF required for an epilogue with this
> mode, and we should thus skip only the modes we are certain cannot be used.
> 
> Passes bootstrap and regression on x86_64 and aarch64.

Clearly

+ /* To make sure we are conservative as to what modes we skip, we
+should use check the smallest possible NUNITS which would be
+derived from the mode in USED_VECTOR_MODES with the largest
+element size.  */
+ scalar_mode max_elsize_mode = GET_MODE_INNER
(vector_modes[mode_i]);
+ for (vec_info::mode_set::iterator i =
+   first_loop_vinfo->used_vector_modes.begin ();
+ i != first_loop_vinfo->used_vector_modes.end (); ++i)
+   {
+ if (VECTOR_MODE_P (*i)
+ && GET_MODE_SIZE (GET_MODE_INNER (*i))
+ > GET_MODE_SIZE (max_elsize_mode))
+   max_elsize_mode = GET_MODE_INNER (*i);
+   }

can be done once before iterating over the modes for the epilogue.

Richard maybe knows whether we should take care to look at the
size of the vector mode as well since related_vector_mode when
passed 0 as nunits produces a vector mode with the same size
as vector_modes[mode_i] but not all used_vector_modes may be
of the same size (and you probably also want to exclude
VECTOR_BOOLEAN_TYPE_P from the search?)

Thanks,
Richard.

> gcc/ChangeLog:
> 
>     PR 103997
>     * tree-vect-loop.c (vect_analyze_loop): Fix mode skipping for 
> epilogue
>     vectorization.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread Kewen.Lin via Gcc-patches
Hi David,

on 2022/1/13 上午11:12, David Edelsohn wrote:
> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> This patch is to clean up some code using GET_MODE_UNIT_SIZE or
>> GET_MODE_NUNITS, which can use known constants instead.
> 
> I'll let Segher decide, but often the additional code is useful
> self-documentation instead of magic constants.  Or at least the change
> requires comments documenting the derivation of the constants
> currently described by the code itself.
> 

Thanks for the comments.  I added some comments as suggested and also
removed the whole "altivec_vreveti2" expander, since I noticed it's useless:
it's not used by any built-in functions and was unused even in commit
db042e1603db50573.

The updated version has been tested as before.

BR,
Kewen
-
gcc/ChangeLog:

* config/rs6000/altivec.md (altivec_vreveti2): Remove.
* config/rs6000/vsx.md (*vsx_extract_si, *vsx_extract_si_float_df,
*vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
known constant values to simplify code.
---
 gcc/config/rs6000/altivec.md | 25 -
 gcc/config/rs6000/vsx.md | 12 
 2 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c2312cc1e0f..b7f056f8c60 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
   DONE;
 })

-;; Vector reverse elements
-(define_expand "altivec_vreveti2"
-  [(set (match_operand:TI 0 "register_operand" "=v")
-   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
- UNSPEC_VREVEV))]
-  "TARGET_ALTIVEC"
-{
-  int i, j, size, num_elements;
-  rtvec v = rtvec_alloc (16);
-  rtx mask = gen_reg_rtx (V16QImode);
-
-  size = GET_MODE_UNIT_SIZE (TImode);
-  num_elements = GET_MODE_NUNITS (TImode);
-
-  for (j = 0; j < num_elements; j++)
-for (i = 0; i < size; i++)
-  RTVEC_ELT (v, i + j * size)
-   = GEN_INT (i + (num_elements - 1 - j) * size);
-
-  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
-  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
-operands[1], mask));
-  DONE;
-})
-
 ;; Vector reverse elements for V16QI V8HI V4SI V4SF
 (define_expand "altivec_vreve2"
   [(set (match_operand:VEC_K 0 "register_operand" "=v")
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 802db0d112b..d246410880d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
   rtx vec_tmp = operands[3];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
   rtx v4si_tmp = operands[3];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4273,8 +4275,9 @@ (define_insn_and_split "*vsx_extract_si_float_"
   rtx df_tmp = operands[4];
   int value;

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
+element = GEN_INT (3 - INTVAL (element));

   /* If the value is in the correct position, we can avoid doing the VSPLT
  instruction.  */
@@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
 {
   int ele = INTVAL (operands[4]);

+  /* Adjust index for LE element ordering.  */
   if (!BYTES_BIG_ENDIAN)
-ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
+ele = 3 - ele;

   operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
   return "xxinsertw %x0,%x2,%4";
--
2.27.0



Re: [PATCH] [gfortran] Add support for allocate clause (OpenMP 5.0).

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Tue, Jan 11, 2022 at 10:31:54PM +, Hafiz Abid Qadeer wrote:
> +   gfc_omp_namelist *n;
> +   for (n = *head; n; n = n->next)

Better
  for (gfc_omp_namelist *n = *head; n; n = n->next)
as we are in C++ and n isn't used after the loop.

> +  /* non-composite constructs.  */

Capital N

Ok for trunk with these nits fixed, no need to repost.

Jakub



Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Richard Sandiford via Gcc-patches
Martin Sebor via Gcc-patches  writes:
> On 1/12/22 02:02, Martin Liška wrote:
>> Hello.
>> 
>> We've got -Wformat-diag for some time and I think we should start using it
>> in -Werror for GCC bootstrap. The following patch removes last pieces of 
>> the warning
>> for rs6000 target.
>> 
>> Ready to be installed?
>> Thanks,
>> Martin
>> 
>> 
>> gcc/ChangeLog:
>> 
>>  * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Wrap
>>  keywords and use %qs instead of %<%s%>.
>>  (rs6000_expand_builtin): Likewise.
>> 
>> gcc/testsuite/ChangeLog:
>> 
>>  * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Adjust scans in
>>  testcases.
>>  * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Likewise.
>>  * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Likewise.
>> ---
>>   gcc/config/rs6000/rs6000-call.c   | 8 
>>   .../gcc.target/powerpc/bfp/scalar-extract-exp-5.c | 2 +-
>>   .../gcc.target/powerpc/bfp/scalar-extract-sig-5.c | 2 +-
>>   .../gcc.target/powerpc/bfp/scalar-insert-exp-11.c | 2 +-
>>   4 files changed, 7 insertions(+), 7 deletions(-)
>> 
>> diff --git a/gcc/config/rs6000/rs6000-call.c 
>> b/gcc/config/rs6000/rs6000-call.c
>> index c78b8b08c40..becdad73812 100644
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -3307,7 +3307,7 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins 
>> fncode)
>>    "-mvsx");
>>     break;
>>   case ENB_IEEE128_HW:
>> -  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
>> +  error ("%qs requires ISA 3.0 IEEE 128-bit floating-point", name);
>
> The instances of the warning where floating point is at the end
> of a message aren't correct.  The warning should be relaxed to
> allow unhyphenated floating point as a noun (as discussed briefly
> last March:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566881.html)

Wouldn't it be fair to say that “floating point” in the message above is
really an adjective modifying an implicit noun?  The floating (decimal)
point doesn't itself have 128 bits.

Like you say in the linked message, we could add an explicit noun too.
But the change seems OK as-is to me.

Thanks,
Richard


Re: [PATCH] disable aggressive_loop_optimizations until niter ready

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, guojiufu wrote:

> On 2022-01-03 22:30, Richard Biener wrote:
> > On Wed, 22 Dec 2021, Jiufu Guo wrote:
> > 
> >> Hi,
> >> 
> >> Normally, estimate_numbers_of_iterations gets/calculates niter first,
> >> and then invokes infer_loop_bounds_from_undefined. While in some case,
> >> after a few call stacks, estimate_numbers_of_iterations is invoked before
> >> niter is ready (e.g. before number_of_latch_executions returns).
> >> 
> >> e.g. number_of_latch_executions->...follow_ssa_edge_expr-->
> >>   --> estimate_numbers_of_iterations --> 
> >> infer_loop_bounds_from_undefined.
> >> 
> >> Since niter is still not computed, call to infer_loop_bounds_from_undefined
> >> may not get final result.
> >> To avoid infer_loop_bounds_from_undefined to be called with interim state
> >> and avoid infer_loop_bounds_from_undefined generates interim data, during
> >> niter's computing, we could disable flag_aggressive_loop_optimizations.
> >> 
> >> Bootstrap and regtest pass on ppc64* and x86_64.  Is this ok for trunk?
> > 
> > So this is an optimality fix, not a correctness one?  I suppose the
> > estimates are computed/used from scev_probably_wraps_p via
> > loop_exits_before_overflow and ultimately chrec_convert.
> > 
> > We have a call cycle here,
> > 
> > estimate_numbers_of_iterations -> number_of_latch_executions ->
> > ... -> estimate_numbers_of_iterations
> > 
> > where the first estimate_numbers_of_iterations will make sure
> > the later call will immediately return.
> 
> Hi Richard,
> Thanks for your comments! And sorry for the late reply.
> 
> In estimate_numbers_of_iterations, there is a guard to make sure
> the second call to estimate_numbers_of_iterations returns
> immediately.
> 
> Exactly as you said, it relates to scev_probably_wraps_p calls
> loop_exits_before_overflow.
> 
> The issue is: the first call to estimate_numbers_of_iterations
> may be inside number_of_latch_executions.
> 
> > 
> > I'm not sure what your patch tries to do - it seems to tackle
> > the case where we enter the cycle via number_of_latch_executions?
> > Why do we get "non-final" values?  idx_infer_loop_bounds resorts
> 
> Right, when the call cycle starts from number_of_latch_execution,
> the issue may occur:
> 
> number_of_latch_executions(*1st call)->..->
> analyze_scalar_evolution(IVs 1st) ->..follow_ssa_edge_expr..->
> loop_exits_before_overflow->
> estimate_numbers_of_iterations (*1st call)->
> number_of_latch_executions(*2nd call)->..->
> analyze_scalar_evolution(IVs 2nd)->..loop_exits_before_overflow->
> estimate_numbers_of_iterations(*2nd call)
> 
> The second call to estimate_numbers_of_iterations returns quickly.
> Then, in the first call to estimate_numbers_of_iterations,
> infer_loop_bounds_from_undefined is invoked.
> 
> The function "infer_loop_bounds_from_undefined" instantiates/analyzes
> the SCEV for each SSA name in the loop.
> *Here the issue occurs*: these SCEVs are based on the interim IV
> SCEVs that come from "analyze_scalar_evolution(IVs 2nd)",
> and those IV SCEVs will be overridden by the upper-level
> "analyze_scalar_evolution(IVs 1st)".

OK, so indeed analyze_scalar_evolution is not protected against
recursive invocation on the same SSA name (though it definitely
doesn't expect to do that).  We could fix that by pre-seeding
the cache conservatively in analyze_scalar_evolution or by
not overwriting the cached result of the recursive invocation.
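
To make the first option concrete, here is a toy sketch in C of
pre-seeding a cache before recursing.  It is not GCC code: the names
toy_analyze, toy_setup, cache and depends_on are made up, and the
"analysis" is a stand-in.  The only point is that a re-entrant call on the
same name sees the conservative seed instead of recursing again:

```c
#include <assert.h>

/* Toy model only: all names here are hypothetical, not GCC's.  */
enum { N_NAMES = 4, NOT_COMPUTED = -2, DONT_KNOW = -1 };

static int cache[N_NAMES];
static int depends_on[N_NAMES];   /* -1 means "no dependency".  */

static void
toy_setup (void)
{
  for (int i = 0; i < N_NAMES; i++)
    {
      cache[i] = NOT_COMPUTED;
      depends_on[i] = -1;
    }
  depends_on[0] = 1;   /* Names 0 and 1 form a cycle.  */
  depends_on[1] = 0;
  depends_on[2] = 3;   /* Name 2 depends on leaf 3.  */
}

static int
toy_analyze (int name)
{
  assert (name >= 0 && name < N_NAMES);
  if (cache[name] != NOT_COMPUTED)
    return cache[name];          /* Includes the seed on re-entry.  */

  cache[name] = DONT_KNOW;       /* Pre-seed conservatively.  */

  int res;
  if (depends_on[name] < 0)
    res = name * 10;             /* Stand-in for a real leaf result.  */
  else
    {
      int inner = toy_analyze (depends_on[name]);
      res = inner == DONT_KNOW ? DONT_KNOW : inner + 1;
    }

  cache[name] = res;
  return res;
}
```

In this sketch a cycle degrades to "don't know" for every name on it,
while acyclic queries are unaffected.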

But to re-iterate an unanswered question, is this a correctness issue
or an optimization issue?

> To handle this issue, disabling flag_aggressive_loop_optimizations
> inside number_of_latch_executions is one method.
> To avoid the issue in other cases, e.g. the call cycle starts from
> number_of_iterations_exit or number_of_iterations_exit_assumptions,
> this patch disables flag_aggressive_loop_optimizations inside
> number_of_iterations_exit_assumptions.

But disabling flag_aggressive_loop_optimizations is a very
non-intuitive way of avoiding recursive calls.  I'd rather
avoid those in a similar way estimate_numbers_of_iterations does,
for example with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..cc1e510b6c2 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2807,7 +2807,7 @@ number_of_latch_executions (class loop *loop)
   if (dump_file && (dump_flags & TDF_SCEV))
 fprintf (dump_file, "(number_of_iterations_in_loop = \n");
 
-  res = chrec_dont_know;
+  loop->nb_iterations = res = chrec_dont_know;
   exit = single_exit (loop);
 
   if (exit && number_of_iterations_exit (loop, exit, &niter_desc, false))

though this doesn't seem to improve the SCEV analysis with your
testcase.  Alternatively one could more consciously compute an
"estimated" estimate like with

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 61d72c278a1..8529c44d574 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -2802,6 +2802,19 @@ number_of_latch_ex

Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> Hi!
> 
> When writing the PR98737 fix, I've handled just the case where people
> use __atomic_op_fetch (p, x, y) etc.
> But some people actually use the other builtins, like
> __atomic_fetch_op (p, x, y) op x.
> The following patch canonicalizes the latter to the former and vice versa
> when possible if the result of the builtin is a single use and if
> that use is a cast with same precision, also that cast's lhs has a single
> use.
> For all ops of +, -, &, | and ^ we can do those
> __atomic_fetch_op (p, x, y) op x -> __atomic_op_fetch (p, x, y)
> (and __sync too) opts, but cases of INTEGER_CST and SSA_NAME x
> behave differently.  For INTEGER_CST, typically - x is
> canonicalized to + (-x), while for SSA_NAME we need to handle various
> casts, which sometimes happen on the second argument of the builtin
> (there can be even two subsequent casts for char/short due to the
> promotions we do) and there can be a cast on the argument of op too.
> And all ops but - are commutative.
> For the other direction, i.e.
> __atomic_op_fetch (p, x, y) rop x -> __atomic_fetch_op (p, x, y)
> we can't handle op of & and |, those aren't reversible, for
> op + rop is -, for - rop is + and for ^ rop is ^, otherwise the same
> stuff as above applies.
> And, there is another case, we canonicalize
> x - y == 0 (or != 0) and x ^ y == 0 (or != 0) to x == y (or x != y)
> and for constant y x + y == 0 (or != 0) to x == -y (or != -y),
> so the patch also virtually undoes those canonicalizations, because
> e.g. for the earlier PR98737 patch but even generally, it is better
> if a result of atomic op fetch is compared against 0 than doing
> atomic fetch op and compare it to some variable or non-zero constant.
> As for debug info, for non-reversible operations (& and |) the patch
> resets debug stmts if there are any, for -fnon-call-exceptions too
> (didn't want to include debug temps right before all uses), but
> otherwise it emits the reverse operation from the result as a debug
> temp and uses that in debug stmts.
> 
> On the emitted assembly for the testcases which are fairly large,
> I see substantial decreases of the *.s size:
> -rw-rw-r--. 1 jakub jakub 116897 Jan 13 09:58 pr98737-1.svanilla
> -rw-rw-r--. 1 jakub jakub  93861 Jan 13 09:57 pr98737-1.spatched
> -rw-rw-r--. 1 jakub jakub  70257 Jan 13 09:57 pr98737-2.svanilla
> -rw-rw-r--. 1 jakub jakub  67537 Jan 13 09:57 pr98737-2.spatched
> There are some functions where due to RA we get one more instruction
> than previously, but most of them are smaller even when not hitting
> the PR98737 previous patch's optimizations.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2022-01-13  Jakub Jelinek  
> 
>   PR target/98737
>   * tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
>   __atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
>   and __atomic_op_fetch (p, x, y) iop x into
>   __atomic_fetch_op (p, x, y).
> 
>   * gcc.dg/tree-ssa/pr98737-1.c: New test.
>   * gcc.dg/tree-ssa/pr98737-2.c: New test.
> 
> --- gcc/tree-ssa-forwprop.c.jj	2022-01-11 23:11:23.467275019 +0100
> +++ gcc/tree-ssa-forwprop.c   2022-01-12 22:12:24.666522743 +0100
> @@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
> memset (p + 4, ' ', 3);
> into
> memcpy (p, "abcd   ", 7);
> -   call if the latter can be stored by pieces during expansion.  */
> +   call if the latter can be stored by pieces during expansion.
> +
> +   Also canonicalize __atomic_fetch_op (p, x, y) op x
> +   to __atomic_op_fetch (p, x, y) or
> +   __atomic_op_fetch (p, x, y) iop x
> +   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
>  
>  static bool
>  simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
>  {
>gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
> +  enum built_in_function other_atomic = END_BUILTINS;
> +  enum tree_code atomic_op = ERROR_MARK;
>tree vuse = gimple_vuse (stmt2);
>if (vuse == NULL)
>  return false;
> @@ -1448,6 +1455,300 @@ simplify_builtin_call (gimple_stmt_itera
>   }
>   }
>break;
> +
> + #define CASE_ATOMIC(NAME, OTHER, OP) \
> +case BUILT_IN_##NAME##_1: \
> +case BUILT_IN_##NAME##_2: \
> +case BUILT_IN_##NAME##_4: \
> +case BUILT_IN_##NAME##_8: \
> +case BUILT_IN_##NAME##_16: \
> +  atomic_op = OP; \
> +  other_atomic \
> + = (enum built_in_function) (BUILT_IN_##OTHER##_1 \
> + + (DECL_FUNCTION_CODE (callee2) \
> +- BUILT_

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/01/2022 12:36, Richard Biener wrote:

> On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
>
>> This time to the list too (sorry for double email)
>>
>> Hi,
>>
>> The original patch '[vect] Re-analyze all modes for epilogues', skipped modes
>> that should not be skipped since it used the vector mode provided by
>> autovectorize_vector_modes to derive the minimum VF required for it. However,
>> those modes should only really be used to dictate vector size, so instead this
>> patch looks for the mode in 'used_vector_modes' with the largest element size,
>> and constructs a vector mode with the same size as the current
>> vector_modes[mode_i]. Since we are using the largest element size the NUNITs
>> for this mode is the smallest possible VF required for an epilogue with this
>> mode and should thus skip only the modes we are certain can not be used.
>>
>> Passes bootstrap and regression on x86_64 and aarch64.
>
> Clearly
>
> + /* To make sure we are conservative as to what modes we skip, we
> +should use check the smallest possible NUNITS which would be
> +derived from the mode in USED_VECTOR_MODES with the largest
> +element size.  */
> + scalar_mode max_elsize_mode = GET_MODE_INNER
> (vector_modes[mode_i]);
> + for (vec_info::mode_set::iterator i =
> +   first_loop_vinfo->used_vector_modes.begin ();
> + i != first_loop_vinfo->used_vector_modes.end (); ++i)
> +   {
> + if (VECTOR_MODE_P (*i)
> + && GET_MODE_SIZE (GET_MODE_INNER (*i))
> + > GET_MODE_SIZE (max_elsize_mode))
> +   max_elsize_mode = GET_MODE_INNER (*i);
> +   }
>
> can be done once before iterating over the modes for the epilogue.

True, I'll start with QImode instead of the inner of
vector_modes[mode_i] too, since we can't guarantee the mode is a
VECTOR_MODE_P. That is actually better anyway, since we can't possibly
guarantee the element size of the USED_VECTOR_MODES is smaller than that
of the first vector mode...



> Richard maybe knows whether we should take care to look at the
> size of the vector mode as well since related_vector_mode when
> passed 0 as nunits produces a vector mode with the same size
> as vector_modes[mode_i] but not all used_vector_modes may be
> of the same size

I suspect that should be fine though, since if we use the largest 
element size of all used_vector_modes then that should give us the 
least possible number of NUNITS and thus only conservatively skip. That 
said, that does assume that no vector mode used may be larger than the 
size of the loop's vector_mode. Can I assume that?
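
To spell out the arithmetic behind that argument, here is a small
standalone sketch in C.  It is not vectorizer code: the helper names are
made up, element and vector sizes are plain byte counts, and starting the
maximum at one byte is my assumption for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of lanes in a vector of VECTOR_BYTES bytes whose elements are
   ELEM_BYTES wide (hypothetical helper).  */
static unsigned
nunits_for (unsigned vector_bytes, unsigned elem_bytes)
{
  assert (elem_bytes != 0 && vector_bytes % elem_bytes == 0);
  return vector_bytes / elem_bytes;
}

/* Largest element size among the used ones, starting from 1 byte.  */
static unsigned
max_elem_bytes (const unsigned *sizes, size_t n)
{
  unsigned max = 1;
  for (size_t i = 0; i < n; i++)
    if (sizes[i] > max)
      max = sizes[i];
  return max;
}

/* The largest element size yields the fewest lanes, so this is the
   smallest NUNITS any of the used element sizes could require.  */
static unsigned
min_epilogue_nunits (unsigned vector_bytes, const unsigned *sizes, size_t n)
{
  return nunits_for (vector_bytes, max_elem_bytes (sizes, n));
}
```

For a 16-byte vector with used element sizes of 1, 2 and 4 bytes this
gives 4 lanes, compared with 16 lanes if only the 1-byte elements were
considered.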


> (and you probably also want to exclude
> VECTOR_BOOLEAN_TYPE_P from the search?)

Yeah I think so too, thanks!

I keep going back to thinking (as I brought up in the bugzilla ticket), 
maybe we ought to only skip if the NUNITS of the vector mode with the 
same vector size as vector_modes[mode_i] is larger than first_info_vf, 
or just don't skip at all...




Re: [PATCH] libgomp, openmp: pinned memory

2022-01-13 Thread Andrew Stubbs

On 05/01/2022 17:07, Andrew Stubbs wrote:
> I don't believe 64KB will be anything like enough for any real HPC
> application. Is it really worth optimizing for this case?
>
> Anyway, I'm working on an implementation using mmap instead of malloc
> for pinned allocations. I figure that will simplify the unpin algorithm
> (because it'll be munmap) and optimize for large allocations such as I
> imagine HPC applications will use. It won't fix the ulimit issue.


Here's my new patch.

This version is intended to apply on top of the latest version of my 
low-latency allocator patch, although the dependency is mostly textual.


Pinned memory is allocated via mmap + mlock, and allocation fails 
(returns NULL) if the lock fails and there's no fallback configured.


This means that large allocations will now be page aligned and therefore 
pin the smallest number of pages for the size requested, and that that 
memory will be unpinned automatically when freed via munmap, or moved 
via mremap.
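
For readers who want the shape of the idea without reading the patch, a
minimal sketch in C of an mmap + mlock allocation follows.  It is not the
code from config/linux/allocator.c: the names pinned_alloc, pinned_free
and is_page_aligned are made up, and there is no fallback handling.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map SIZE bytes and lock them in RAM.  Returns
   NULL if either the mapping or the lock fails.  */
static void *
pinned_alloc (size_t size)
{
  void *p = mmap (NULL, size, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return NULL;
  if (mlock (p, size) != 0)
    {
      /* For example, RLIMIT_MEMLOCK exceeded.  */
      munmap (p, size);
      return NULL;
    }
  return p;
}

/* Unmapping also drops the lock, so no separate munlock is needed.  */
static void
pinned_free (void *p, size_t size)
{
  if (p != NULL)
    {
      int rc = munmap (p, size);
      assert (rc == 0);
    }
}

/* mmap returns page-aligned addresses; this just checks that.  */
static int
is_page_aligned (const void *p)
{
  long page = sysconf (_SC_PAGESIZE);
  return page > 0 && ((uintptr_t) p % (uintptr_t) page) == 0;
}
```

Whether the lock succeeds depends on the process's locked-memory limit,
so callers of such a helper have to expect NULL.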


Obviously this is not ideal for allocations much smaller than one page. 
If that turns out to be a problem in the real world then we can add a 
special case fairly straight-forwardly, and incur the extra page 
tracking expense in those cases only, or maybe implement our own 
pinned-memory heap (something like already proposed for low-latency 
memory, perhaps).


Also new is a realloc implementation that works better when reallocation 
fails. This is confirmed by the new testcases.


OK for stage 1?

Thanks

Andrew

libgomp: pinned memory

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall.  Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(xmlock): New function.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index 1cc7486fc4c..5ab161b6314 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -36,16 +36,20 @@
 
 /* These macros may be overridden in config//allocator.c.  */
 #ifndef MEMSPACE_ALLOC
-#define MEMSPACE_ALLOC(MEMSPACE, SIZE) malloc (SIZE)
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : malloc (SIZE))
 #endif
 #ifndef MEMSPACE_CALLOC
-#define MEMSPACE_CALLOC(MEMSPACE, SIZE) calloc (1, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+  (PIN ? NULL : calloc (1, SIZE))
 #endif
 #ifndef MEMSPACE_REALLOC
-#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE) realloc (ADDR, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+  ((PIN) || (OLDPIN) ? NULL : realloc (ADDR, SIZE))
 #endif
 #ifndef MEMSPACE_FREE
-#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE) free (ADDR)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+  (PIN ? NULL : free (ADDR))
 #endif
 
 /* Map the predefined allocators to the correct memory space.
@@ -208,7 +212,7 @@ omp_init_allocator (omp_memspace_handle_t memspace, int 
ntraits,
 data.alignment = sizeof (void *);
 
   /* No support for these so far (for hbw will use memkind).  */
-  if (data.pinned || data.memspace == omp_high_bw_mem_space)
+  if (data.memspace == omp_high_bw_mem_space)
 return omp_null_allocator;
 
   ret = gomp_malloc (sizeof (struct omp_allocator_data));
@@ -309,7 +313,8 @@ retry:
   allocator_data->used_pool_size = used_pool_size;
   gomp_mutex_unlock (&allocator_data->lock);
 #endif
-  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size);
+  ptr = MEMSPACE_ALLOC (allocator_data->memspace, new_size,
+   allocator_data->pinned);
   if (ptr == NULL)
{
 #ifdef HAVE_SYNC_BUILTINS
@@ -329,7 +334,8 @@ retry:
= (allocator_data
   ? allocator_data->memspace
   : predefined_alloc_mapping[allocator]);
-  ptr = MEMSPACE_ALLOC (memspace, new_size);
+  ptr = MEMSPACE_ALLOC (memspace, new_size,
+   allocator_data && allocator_data->pinned);
   if (ptr == NULL)
goto fail;
 }
@@ -356,9 +362,9 @@ fail:
 {
 case omp_atv_default_mem_fb:
   if ((new_alignment > sizeof (void *) && new_alignment > alignment)
- || (allocator_data
- && allocator_da

[PATCH] tree-optimization/96707 - Add relation to unsigned right shift.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

A quick addition to range ops for

LHS = OP1 >> OP2

if OP1 and OP2 are both >= 0,   then we can register the relation  LHS 
<= OP1   and all the expected good things happen.
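
As a quick sanity check of the relation itself (independent of the
range-op code), here is a small C sketch.  The function names and the
sample values are made up; shift counts are kept below the type width
because larger counts are undefined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns nonzero if (x >> y) <= x for the given unsigned operands.  */
static int
rshift_le_op1 (unsigned x, unsigned y)
{
  assert (y < sizeof (unsigned) * CHAR_BIT);
  return (x >> y) <= x;
}

/* Check the relation over a few sample values and every valid shift
   count.  Returns 1 if it held everywhere, 0 otherwise.  */
static int
check_rshift_relation (void)
{
  const unsigned width = sizeof (unsigned) * CHAR_BIT;
  const unsigned samples[] = { 0u, 1u, 2u, 3u, 0x80u, 0xffffu,
			       UINT_MAX / 2, UINT_MAX };
  for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (unsigned y = 0; y < width; y++)
      if (!rshift_le_op1 (samples[i], y))
	return 0;
  return 1;
}
```

The sketch only exercises unsigned operands, which are trivially >= 0;
the patch applies the same reasoning to signed ranges whose lower bounds
are known to be non-negative.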


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

OK for trunk?

Andrew
From c34dab537d6f54b66b430f5980cde278fa033904 Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 12 Jan 2022 13:28:55 -0500
Subject: [PATCH 1/2] Add relation to unsigned right shift.

If the first operand and the shift value of a right shift operation are both
>= 0, then we know the LHS of the operation is <= the first operand.

	PR tree-optimization/96707
	gcc/
	* range-op.cc (operator_rshift::lhs_op1_relation): New.
	gcc/testsuite/
	* g++.dg/pr96707.C: New.
---
 gcc/range-op.cc| 16 
 gcc/testsuite/g++.dg/pr96707.C | 10 ++
 2 files changed, 26 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/pr96707.C

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index a4f6e9eba29..19bdf30911a 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1941,9 +1941,25 @@ public:
 			  const irange &lhs,
 			  const irange &op2,
 			  relation_kind rel = VREL_NONE) const;
+  virtual enum tree_code lhs_op1_relation (const irange &lhs,
+	   const irange &op1,
+	   const irange &op2) const;
 } op_rshift;
 
 
+enum tree_code
+operator_rshift::lhs_op1_relation (const irange &lhs ATTRIBUTE_UNUSED,
+   const irange &op1,
+   const irange &op2) const
+{
+  // If both operands range are >= 0, then the LHS <= op1.
+  if (!op1.undefined_p () && !op2.undefined_p ()
+  && wi::ge_p (op1.lower_bound (), 0, TYPE_SIGN (op1.type ()))
+  && wi::ge_p (op2.lower_bound (), 0, TYPE_SIGN (op2.type ())))
+return LE_EXPR;
+  return VREL_NONE;
+}
+
 bool
 operator_lshift::fold_range (irange &r, tree type,
 			 const irange &op1,
diff --git a/gcc/testsuite/g++.dg/pr96707.C b/gcc/testsuite/g++.dg/pr96707.C
new file mode 100644
index 000..2653fe3d043
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr96707.C
@@ -0,0 +1,10 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+bool f(unsigned x, unsigned y)
+{
+return (x >> y) <= x;
+}
+
+/* { dg-final { scan-tree-dump "return 1" "evrp" } }  */
+
-- 
2.17.2



[PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

This patch actually addresses a few PRs.

The root PR was 97909.   Ranger context functionality was added to 
fold_const back in early November 
(https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)


The other 2 PRs mentioned (83072 and 83073) partially worked after this, 
but the original patch did not change the result of the query in 
expr_not_equal_to () to a multi-range object.


This patch simply changes the value_range variable in that routine to an 
int_range<5> so we can pick up more precision. This in turn allows us to 
capture all the tests as expected.


Bootstrapped on x86_64-pc-linux-gnu with no regressions.

OK for trunk?

Andrew
From 329626a426d21dfe484053f7b6ac4f2d0c14fa0e Mon Sep 17 00:00:00 2001
From: Andrew MacLeod 
Date: Wed, 12 Jan 2022 13:31:08 -0500
Subject: [PATCH 2/2] Allow more precision when querying from fold_const.

fold_const::expr_not_equal_to queries for a current range, but still uses
the old value_range class.  This is causing it to miss opportunities when
ranger can provide something better.

	PR tree-optimization/83072
	PR tree-optimization/83073
	PR tree-optimization/97909
	gcc/
	* fold-const.c (expr_not_equal_to): Use a multi-range class.

	gcc/testsuite/
	* gcc.dg/pr83072-2.c: New.
	* gcc.dg/pr83073.c: New.
---
 gcc/fold-const.c |  2 +-
 gcc/testsuite/gcc.dg/pr83072-2.c | 18 ++
 gcc/testsuite/gcc.dg/pr83073.c   | 10 ++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr83072-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr83073.c

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 397fa9a03a1..7945b8d9eda 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10734,7 +10734,7 @@ tree_expr_nonzero_p (tree t)
 bool
 expr_not_equal_to (tree t, const wide_int &w)
 {
-  value_range vr;
+  int_range<5> vr;
   switch (TREE_CODE (t))
 {
 case INTEGER_CST:
diff --git a/gcc/testsuite/gcc.dg/pr83072-2.c b/gcc/testsuite/gcc.dg/pr83072-2.c
new file mode 100644
index 000..f495f2582c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83072-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp-details" } */
+
+int f1(int a, int b, int c){
+  if(c==0)__builtin_unreachable();
+  a *= c;
+  b *= c;
+  return a == b;
+}
+
+int f2(int a, int b, int c){
+  c |= 1;
+  a *= c;
+  b *= c;
+  return a == b;
+}
+
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } }  */
diff --git a/gcc/testsuite/gcc.dg/pr83073.c b/gcc/testsuite/gcc.dg/pr83073.c
new file mode 100644
index 000..1168ae822a4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr83073.c
@@ -0,0 +1,10 @@
+/* { dg-do compile} */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop" } */
+
+int f(int x)
+{
+x = x|1;
+return x & 1;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to.* = 1" "evrp" } }  */
-- 
2.17.2



[PATCH] c/104002 - shufflevector variable indexing

2022-01-13 Thread Richard Biener via Gcc-patches
Variable indexing of a __builtin_shufflevector result is broken because
we fail to properly mark the TARGET_EXPR decl as addressable.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

2022-01-13  Richard Biener  

PR c/104002
gcc/c-family/
* c-common.c (c_common_mark_addressable_vec): Handle TARGET_EXPR.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: Move ...
* c-c++-common/torture/builtin-shufflevector-3.c: ... here.
---
 gcc/c-family/c-common.c  | 5 -
 .../c-c++-common/{ => torture}/builtin-shufflevector-3.c | 0
 2 files changed, 4 insertions(+), 1 deletion(-)
 rename gcc/testsuite/c-c++-common/{ => torture}/builtin-shufflevector-3.c 
(100%)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4a6a4edb763..a34f32f51a4 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6989,12 +6989,15 @@ c_common_mark_addressable_vec (tree t)
 }
   if (!VAR_P (t)
   && TREE_CODE (t) != PARM_DECL
-  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR)
+  && TREE_CODE (t) != COMPOUND_LITERAL_EXPR
+  && TREE_CODE (t) != TARGET_EXPR)
 return;
   if (!VAR_P (t) || !DECL_HARD_REGISTER (t))
 TREE_ADDRESSABLE (t) = 1;
   if (TREE_CODE (t) == COMPOUND_LITERAL_EXPR)
 TREE_ADDRESSABLE (COMPOUND_LITERAL_EXPR_DECL (t)) = 1;
+  else if (TREE_CODE (t) == TARGET_EXPR)
+TREE_ADDRESSABLE (TARGET_EXPR_SLOT (t)) = 1;
 }
 
 
diff --git a/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c 
b/gcc/testsuite/c-c++-common/torture/builtin-shufflevector-3.c
similarity index 100%
rename from gcc/testsuite/c-c++-common/builtin-shufflevector-3.c
rename to gcc/testsuite/c-c++-common/torture/builtin-shufflevector-3.c
-- 
2.31.1


Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:

> 
> On 13/01/2022 12:36, Richard Biener wrote:
> > On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:
> >
> >> This time to the list too (sorry for double email)
> >>
> >> Hi,
> >>
> >> The original patch '[vect] Re-analyze all modes for epilogues', skipped
> >> modes
> >> that should not be skipped since it used the vector mode provided by
> >> autovectorize_vector_modes to derive the minimum VF required for it.
> >> However,
> >> those modes should only really be used to dictate vector size, so instead
> >> this
> >> patch looks for the mode in 'used_vector_modes' with the largest element
> >> size,
> >> and constructs a vector mode with the same size as the current
> >> vector_modes[mode_i]. Since we are using the largest element size the
> >> NUNITs
> >> for this mode is the smallest possible VF required for an epilogue with
> >> this
> >> mode and should thus skip only the modes we are certain can not be used.
> >>
> >> Passes bootstrap and regression on x86_64 and aarch64.
> > Clearly
> >
> > + /* To make sure we are conservative as to what modes we skip, we
> > +should use check the smallest possible NUNITS which would be
> > +derived from the mode in USED_VECTOR_MODES with the largest
> > +element size.  */
> > + scalar_mode max_elsize_mode = GET_MODE_INNER
> > (vector_modes[mode_i]);
> > + for (vec_info::mode_set::iterator i =
> > +   first_loop_vinfo->used_vector_modes.begin ();
> > + i != first_loop_vinfo->used_vector_modes.end (); ++i)
> > +   {
> > + if (VECTOR_MODE_P (*i)
> > + && GET_MODE_SIZE (GET_MODE_INNER (*i))
> > + > GET_MODE_SIZE (max_elsize_mode))
> > +   max_elsize_mode = GET_MODE_INNER (*i);
> > +   }
> >
> > can be done once before iterating over the modes for the epilogue.
> True, I'll start with QImode instead of the inner of vector_modes[mode_i] too
> since we can't guarantee the mode is a VECTOR_MODE_P and it is actually better
> too since we can't possible guarantee the element size of the
> USED_VECTOR_MODES is smaller than that of the first vector mode...
> 
> > Richard maybe knows whether we should take care to look at the
> > size of the vector mode as well since related_vector_mode when
> > passed 0 as nunits produces a vector mode with the same size
> > as vector_modes[mode_i] but not all used_vector_modes may be
> > of the same size
> I suspect that should be fine though, since if we use the largest element size
> of all used_vector_modes then that should gives us the least possible number
> of NUNITS and thus only conservatively skip. That said, that does assume that
> no vector mode used may be larger than the size of the loop's vector_mode. Can
> I assume that?

No idea, but I would lean towards a no ;)  I think the loop's vector_mode
doesn't have to match vector_modes[mode_i] either, does it?  At least
autodetected_vector_mode will not be QImode based.

> >
> > (and you probably also want to exclude
> > VECTOR_BOOLEAN_TYPE_P from the search?)
> Yeah I think so too, thanks!
> 
> I keep going back to thinking (as I brought up in the bugzilla ticket), maybe
> we ought to only skip if the NUNITS of the vector mode with the same vector
> size as vector_modes[mode_i] is larger than first_info_vf, or just don't skip
> at all...

The question is how much work we do before realizing the chosen mode
cannot be used because there's not enough iterations?  Maybe we can
improve there easily?

Also for targets that for the main loop do not perform cost
comparison (like x86) but have lots of vector modes the previous
mode of operation really made sense (start at next_mode_i or
mode_i when unrolling).
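
The conservative skip test discussed above can be sketched outside of GCC as
follows.  This is only an illustration under stated assumptions: struct vmode,
sample_used and all function names are hypothetical stand-ins, sizes are fixed
byte counts (real GCC code would use poly_int values), and the "larger than
the main loop VF" comparison follows the wording in the quoted mail rather
than any committed code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a machine mode: total size and element size
   in bytes, plus flags for "is a vector mode" and "is a boolean vector".  */
struct vmode
{
  unsigned size_bytes;
  unsigned elem_bytes;
  int is_vector;
  int is_boolean;
};

/* Sample set of modes used by the main loop: a 16 x 1-byte vector,
   a 4 x 4-byte vector and an 8-byte scalar that must be ignored.  */
static const struct vmode sample_used[] = {
  { 16, 1, 1, 0 },
  { 16, 4, 1, 0 },
  { 8, 8, 0, 0 },
};

/* Largest element size among the used vector modes.  The search starts
   from a 1-byte element (the QImode idea from the thread) and skips
   non-vector and boolean vector modes.  */
static unsigned
largest_used_element_size (const struct vmode *used, size_t n)
{
  unsigned max_elsize = 1;
  for (size_t i = 0; i < n; i++)
    if (used[i].is_vector
        && !used[i].is_boolean
        && used[i].elem_bytes > max_elsize)
      max_elsize = used[i].elem_bytes;
  return max_elsize;
}

/* Smallest number of lanes a vector of CANDIDATE_SIZE_BYTES can have,
   obtained by filling it with the largest element.  */
static unsigned
smallest_possible_nunits (unsigned candidate_size_bytes, unsigned max_elsize)
{
  return candidate_size_bytes / max_elsize;
}

/* Return nonzero if the candidate epilogue vector size can be skipped,
   i.e. even its smallest lane count exceeds the main loop VF.  */
static int
can_skip_epilogue_mode (unsigned candidate_size_bytes,
                        const struct vmode *used, size_t n,
                        unsigned first_vf)
{
  unsigned max_elsize = largest_used_element_size (used, n);
  return smallest_possible_nunits (candidate_size_bytes, max_elsize)
         > first_vf;
}
```

With the sample set, a 32-byte candidate has at least 8 lanes and would be
skipped for a main loop VF of 4, while a 16-byte candidate (4 lanes) would
still be tried.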


Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > + tree d = build_debug_expr_decl (type);
> > + gdebug *g
> > +   = gimple_build_debug_bind (d, build2 (rcode, type,
> > + new_lhs, arg),
> > +  stmt2);
> > + gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> > + replace_uses_by (lhs2, d);
> 
> I wonder if you can leave a lhs2 = d; in the IL instead of using
> replace_uses_by which will process imm uses and fold stmts while
> we're going to do that anyway in the caller?  That would IMHO
> be better here.

I'd need to emit them always for reversible ops and when the
atomic call can't be last, regardless of whether they are needed or not,
just so that the next DCE would remove them and emit those debug stmts,
because otherwise that could result in -fcompare-debug failures
(at least with -fno-tree-dce -fno-tree-whatever ...).
And
+ tree narg = build_debug_expr_decl (type);
+ gdebug *g
+   = gimple_build_debug_bind (narg,
+  fold_convert (type, arg),
+  stmt2);
isn't that much more code compared to
  gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
Or would you like it to be emitted always, i.e.
  if (atomic_op != BIT_AND_EXPR
      && atomic_op != BIT_IOR_EXPR
      /* With -fnon-call-exceptions if we can't
         add stmts after the call easily.  */
      && !stmt_ends_bb_p (stmt2))
    {
      tree type = TREE_TYPE (lhs2);
      if (TREE_CODE (arg) == INTEGER_CST)
        arg = fold_convert (type, arg);
      else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
        {
          tree narg = make_ssa_name (type);
          gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
          gsi_insert_after (&gsi, g, GSI_NEW_STMT);
          arg = narg;
        }
      enum tree_code rcode;
      switch (atomic_op)
        {
        case PLUS_EXPR: rcode = MINUS_EXPR; break;
        case MINUS_EXPR: rcode = PLUS_EXPR; break;
        case BIT_XOR_EXPR: rcode = atomic_op; break;
        default: gcc_unreachable ();
        }
      tree d = build_debug_expr_decl (type);
      gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
      gsi_insert_after (&gsi, g, GSI_NEW_STMT);
      lhs2 = NULL_TREE;
    }
in between
  update_stmt (use_stmt);
and
  imm_use_iterator iter;
and then do the
 FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
   if (use_stmt != cast_stmt)
with resetting only if (lhs2)
and similarly release_ssa_name (lhs2) only if (lhs2)?
I think the usual case is that we emit debug exprs right away,
not emit something that we want to DCE.

+   if (atomic_op == BIT_AND_EXPR
+   || atomic_op == BIT_IOR_EXPR
+   /* Or with -fnon-call-exceptions if we can't
+  add debug stmts after the call.  */
+   || stmt_ends_bb_p (stmt2))


But now that you mention it, I think I don't correctly handle the
case where lhs2 has no debug uses but there is a cast_stmt that has debug
uses for its lhs.  We'd need to add_debug_temp in that case too and
add a debug temp.

Jakub
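
For reference, the arithmetic behind the canonicalization discussed in this
thread can be checked with a small standalone program.  This is a sketch, not
code from the patch: enum op and the helper names are hypothetical, and only
the __atomic_* builtins themselves are real GCC builtins.  It shows that
fetch_op followed by the same op equals op_fetch for all five operations, and
that the reverse direction only exists for +, - and ^.

```c
#include <assert.h>
#include <stdint.h>

enum op { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_XOR };

/* Plain (non-atomic) application of OP to two values.  */
static uint32_t
apply (enum op op, uint32_t a, uint32_t b)
{
  switch (op)
    {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_AND: return a & b;
    case OP_OR:  return a | b;
    default:     return a ^ b;	/* OP_XOR */
    }
}

/* __atomic_fetch_OP: returns the value before the operation.  */
static uint32_t
fetch_op (enum op op, uint32_t *p, uint32_t x)
{
  switch (op)
    {
    case OP_ADD: return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST);
    case OP_SUB: return __atomic_fetch_sub (p, x, __ATOMIC_SEQ_CST);
    case OP_AND: return __atomic_fetch_and (p, x, __ATOMIC_SEQ_CST);
    case OP_OR:  return __atomic_fetch_or (p, x, __ATOMIC_SEQ_CST);
    default:     return __atomic_fetch_xor (p, x, __ATOMIC_SEQ_CST);
    }
}

/* __atomic_OP_fetch: returns the value after the operation.  */
static uint32_t
op_fetch (enum op op, uint32_t *p, uint32_t x)
{
  switch (op)
    {
    case OP_ADD: return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST);
    case OP_SUB: return __atomic_sub_fetch (p, x, __ATOMIC_SEQ_CST);
    case OP_AND: return __atomic_and_fetch (p, x, __ATOMIC_SEQ_CST);
    case OP_OR:  return __atomic_or_fetch (p, x, __ATOMIC_SEQ_CST);
    default:     return __atomic_xor_fetch (p, x, __ATOMIC_SEQ_CST);
    }
}

/* Reverse operation: - for +, + for -, ^ for ^.  Returns 0 for & and |,
   which lose information and so cannot be undone.  */
static int
reverse_op (enum op op, enum op *rop)
{
  switch (op)
    {
    case OP_ADD: *rop = OP_SUB; return 1;
    case OP_SUB: *rop = OP_ADD; return 1;
    case OP_XOR: *rop = OP_XOR; return 1;
    default:     return 0;
    }
}

/* fetch_op (p, x) op x == op_fetch (p, x).  */
static int
forward_holds (enum op op, uint32_t init, uint32_t x)
{
  uint32_t a = init, b = init;
  return apply (op, fetch_op (op, &a, x), x) == op_fetch (op, &b, x);
}

/* op_fetch (p, x) rop x == fetch_op (p, x), when a reverse op exists.  */
static int
reverse_holds (enum op op, uint32_t init, uint32_t x)
{
  enum op rop;
  uint32_t a = init, b = init;
  if (!reverse_op (op, &rop))
    return 0;
  return apply (rop, op_fetch (op, &a, x), x) == fetch_op (op, &b, x);
}
```

The & case shows why no reverse exists: both 0xC & 0x4 and 0x4 & 0x4 give
0x4, so the old value cannot be recovered from the result.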



Re: [PATCH] c/104002 - shufflevector variable indexing

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 03:12:03PM +0100, Richard Biener wrote:
> Variable indexing of a __builtin_shufflevector result is broken because
> we fail to properly mark the TARGET_EXPR decl as addressable.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2022-01-13  Richard Biener  
> 
>   PR c/104002
> gcc/c-family/
>   * c-common.c (c_common_mark_addressable_vec): Handle TARGET_EXPR.
> 
> gcc/testsuite/
>   * c-c++-common/builtin-shufflevector-3.c: Move ...
>   * c-c++-common/torture/builtin-shufflevector-3.c: ... here.

LGTM.

Jakub



[PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)

2022-01-13 Thread Hafiz Abid Qadeer
This patch series adds initial support for the allocate directive in
gfortran.  Although every allocate directive is parsed, only those
which are associated with an allocate statement are translated. The
lowering consists of replacing the implicitly generated malloc/free calls
from the allocate statement with GOMP_alloc and GOMP_free calls.
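
As a rough picture of what this replacement means for the generated code, the
sketch below contrasts the two call shapes in plain C.  It is an illustration
only: sketch_gomp_alloc and sketch_gomp_free are hypothetical stand-ins
(they simply forward to malloc/free and record the allocator handle), and
their argument order (alignment, size, allocator) is an assumption based on
the alignment and allocator values built in the lowering patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Records the allocator handle passed to the most recent stand-in call.  */
static uintptr_t last_allocator_seen;

/* Hypothetical stand-in for the runtime allocation entry point.  */
static void *
sketch_gomp_alloc (size_t alignment, size_t size, uintptr_t allocator)
{
  (void) alignment;		/* Ignored by this sketch.  */
  last_allocator_seen = allocator;
  return malloc (size);
}

/* Hypothetical stand-in for the runtime deallocation entry point.  */
static void
sketch_gomp_free (void *ptr, uintptr_t allocator)
{
  last_allocator_seen = allocator;
  free (ptr);
}

/* Shape of the code before lowering: a plain malloc call.  */
static int *
alloc_before (size_t n)
{
  return malloc (n * sizeof (int));
}

/* Shape of the code after lowering: the same request routed through the
   allocator-aware entry point.  */
static int *
alloc_after (size_t n, uintptr_t allocator)
{
  return sketch_gomp_alloc (_Alignof (int), n * sizeof (int), allocator);
}

/* Allocate, use and release N ints with ALLOCATOR; returns 1 when the
   handle reached both stand-ins.  */
static int
roundtrip (size_t n, uintptr_t allocator)
{
  int *p = alloc_after (n, allocator);
  if (!p)
    return 0;
  p[0] = 42;
  int ok = (p[0] == 42 && last_allocator_seen == allocator);
  last_allocator_seen = 0;
  sketch_gomp_free (p, allocator);
  return ok && last_allocator_seen == allocator;
}
```

The handle value 1 used in a call such as roundtrip (4, 1) corresponds to
omp_default_mem_alloc in the test module included with patch 2/5.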

Hafiz Abid Qadeer (5):
  [gfortran] Add parsing support for allocate directive (OpenMP 5.0).
  [gfortran] Translate allocate directive (OpenMP 5.0).
  [gfortran] Handle cleanup of omp allocated variables (OpenMP 5.0).
  Gimplify allocate directive (OpenMP 5.0).
  Lower allocate directive  (OpenMP 5.0).

 gcc/doc/gimple.texi   |  38 ++-
 gcc/fortran/dump-parse-tree.c |   3 +
 gcc/fortran/gfortran.h|   5 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.c  | 229 +-
 gcc/fortran/parse.c   |  10 +-
 gcc/fortran/resolve.c |   1 +
 gcc/fortran/st.c  |   1 +
 gcc/fortran/trans-decl.c  |  20 ++
 gcc/fortran/trans-openmp.c|  50 
 gcc/fortran/trans.c   |   1 +
 gcc/gimple-pretty-print.c |  37 +++
 gcc/gimple.c  |  10 +
 gcc/gimple.def|   6 +
 gcc/gimple.h  |  60 -
 gcc/gimplify.c|  19 ++
 gcc/gsstruct.def  |   1 +
 gcc/omp-low.c | 125 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 +
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  84 +++
 gcc/tree-core.h   |   9 +
 gcc/tree-pretty-print.c   |  23 ++
 gcc/tree.c|   1 +
 gcc/tree.def  |   4 +
 gcc/tree.h|  15 ++
 .../testsuite/libgomp.fortran/allocate-1.c|   7 +
 .../testsuite/libgomp.fortran/allocate-2.f90  |  49 
 28 files changed, 986 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

-- 
2.25.1



[PATCH 1/5] [gfortran] Add parsing support for allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
Currently we only make use of this directive when it is associated
with an allocate statement.

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_omp_node): Handle EXEC_OMP_ALLOCATE.
(show_code_node): Likewise.
* gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE.
(OMP_LIST_ALLOCATOR): New enum value.
(enum gfc_exec_op): Add EXEC_OMP_ALLOCATE.
* match.h (gfc_match_omp_allocate): New function.
* openmp.c (enum omp_mask1): Add OMP_CLAUSE_ALLOCATOR.
(OMP_ALLOCATE_CLAUSES): New define.
(gfc_match_omp_allocate): New function.
(resolve_omp_clauses): Add ALLOCATOR in clause_names.
(omp_code_to_statement): Handle EXEC_OMP_ALLOCATE.
(EMPTY_VAR_LIST): New define.
(check_allocate_directive_restrictions): New function.
(gfc_resolve_omp_allocate): Likewise.
(gfc_resolve_omp_directive): Handle EXEC_OMP_ALLOCATE.
* parse.c (decode_omp_directive): Handle ST_OMP_ALLOCATE.
(next_statement): Likewise.
(gfc_ascii_statement): Likewise.
* resolve.c (gfc_resolve_code): Handle EXEC_OMP_ALLOCATE.
* st.c (gfc_free_statement): Likewise.
* trans.c (trans_code): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-4.f90: New test.
* gfortran.dg/gomp/allocate-5.f90: New test.
---
 gcc/fortran/dump-parse-tree.c |   3 +
 gcc/fortran/gfortran.h|   4 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.c  | 199 +-
 gcc/fortran/parse.c   |  10 +-
 gcc/fortran/resolve.c |   1 +
 gcc/fortran/st.c  |   1 +
 gcc/fortran/trans.c   |   1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90 | 112 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90 |  73 +++
 10 files changed, 400 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-5.f90

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 7459f4b89a9..38fef42150a 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1993,6 +1993,7 @@ show_omp_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE: name = "CACHE"; is_oacc = true; break;
 case EXEC_OACC_ENTER_DATA: name = "ENTER DATA"; is_oacc = true; break;
 case EXEC_OACC_EXIT_DATA: name = "EXIT DATA"; is_oacc = true; break;
+case EXEC_OMP_ALLOCATE: name = "ALLOCATE"; break;
 case EXEC_OMP_ATOMIC: name = "ATOMIC"; break;
 case EXEC_OMP_BARRIER: name = "BARRIER"; break;
 case EXEC_OMP_CANCEL: name = "CANCEL"; break;
@@ -2194,6 +2195,7 @@ show_omp_node (int level, gfc_code *c)
   || c->op == EXEC_OMP_TARGET_UPDATE || c->op == EXEC_OMP_TARGET_ENTER_DATA
   || c->op == EXEC_OMP_TARGET_EXIT_DATA || c->op == EXEC_OMP_SCAN
   || c->op == EXEC_OMP_DEPOBJ || c->op == EXEC_OMP_ERROR
+  || c->op == EXEC_OMP_ALLOCATE
   || (c->op == EXEC_OMP_ORDERED && c->block == NULL))
 return;
   if (c->op == EXEC_OMP_SECTIONS || c->op == EXEC_OMP_PARALLEL_SECTIONS)
@@ -3314,6 +3316,7 @@ show_code_node (int level, gfc_code *c)
 case EXEC_OACC_CACHE:
 case EXEC_OACC_ENTER_DATA:
 case EXEC_OACC_EXIT_DATA:
+case EXEC_OMP_ALLOCATE:
 case EXEC_OMP_ATOMIC:
 case EXEC_OMP_CANCEL:
 case EXEC_OMP_CANCELLATION_POINT:
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 3b791a4f6be..79a43a2fdf0 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -259,7 +259,7 @@ enum gfc_statement
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
   ST_OACC_SERIAL_LOOP, ST_OACC_END_SERIAL_LOOP, ST_OACC_SERIAL,
   ST_OACC_END_SERIAL, ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
-  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
+  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC, ST_OMP_ALLOCATE,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
   ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS,
@@ -1392,6 +1392,7 @@ enum
   OMP_LIST_USE_DEVICE_PTR,
   OMP_LIST_USE_DEVICE_ADDR,
   OMP_LIST_NONTEMPORAL,
+  OMP_LIST_ALLOCATOR,
   OMP_LIST_NUM
 };
 
@@ -2893,6 +2894,7 @@ enum gfc_exec_op
   EXEC_OACC_DATA, EXEC_OACC_HOST_DATA, EXEC_OACC_LOOP, EXEC_OACC_UPDATE,
   EXEC_OACC_WAIT, EXEC_OACC_CACHE, EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
   EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE,
+  EXEC_OMP_ALLOCATE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h
index 65ee3b6cb41..9f0449eda0e 100644
--- a/gcc/fortran/match.h
+++ b/gcc/fortr

[PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
gcc/fortran/ChangeLog:

* trans-openmp.c (gfc_trans_omp_clauses): Handle OMP_LIST_ALLOCATOR.
(gfc_trans_omp_allocate): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_ALLOCATE.

gcc/ChangeLog:

* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_ALLOCATOR.
(dump_generic_node): Handle OMP_ALLOCATE.
* tree.def (OMP_ALLOCATE): New.
* tree.h (OMP_ALLOCATE_CLAUSES): Likewise.
(OMP_ALLOCATE_DECL): Likewise.
(OMP_ALLOCATE_ALLOCATOR): Likewise.
* tree.c (omp_clause_num_ops): Add entry for OMP_CLAUSE_ALLOCATOR.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: New test.
---
 gcc/fortran/trans-openmp.c| 44 
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 | 72 +++
 gcc/tree-core.h   |  3 +
 gcc/tree-pretty-print.c   | 19 +
 gcc/tree.c|  1 +
 gcc/tree.def  |  4 ++
 gcc/tree.h| 11 +++
 7 files changed, 154 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 9661c77f905..cb389f40370 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2649,6 +2649,28 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  }
  }
  break;
+   case OMP_LIST_ALLOCATOR:
+ for (; n != NULL; n = n->next)
+   if (n->sym->attr.referenced)
+ {
+   tree t = gfc_trans_omp_variable (n->sym, false);
+   if (t != error_mark_node)
+ {
+   tree node = build_omp_clause (input_location,
+ OMP_CLAUSE_ALLOCATOR);
+   OMP_ALLOCATE_DECL (node) = t;
+   if (n->expr)
+ {
+   tree allocator_;
+   gfc_init_se (&se, NULL);
+   gfc_conv_expr (&se, n->expr);
+   allocator_ = gfc_evaluate_now (se.expr, block);
+   OMP_ALLOCATE_ALLOCATOR (node) = allocator_;
+ }
+   omp_clauses = gfc_trans_add_clause (node, omp_clauses);
+ }
+ }
+ break;
case OMP_LIST_LINEAR:
  {
gfc_expr *last_step_expr = NULL;
@@ -4888,6 +4910,26 @@ gfc_trans_omp_atomic (gfc_code *code)
   return gfc_finish_block (&block);
 }
 
+static tree
+gfc_trans_omp_allocate (gfc_code *code)
+{
+  stmtblock_t block;
+  tree stmt;
+
+  gfc_omp_clauses *clauses = code->ext.omp_clauses;
+  gcc_assert (clauses);
+
+  gfc_start_block (&block);
+  stmt = make_node (OMP_ALLOCATE);
+  TREE_TYPE (stmt) = void_type_node;
+  OMP_ALLOCATE_CLAUSES (stmt) = gfc_trans_omp_clauses (&block, clauses,
+  code->loc, false,
+  true);
+  gfc_add_expr_to_block (&block, stmt);
+  gfc_merge_block_scope (&block);
+  return gfc_finish_block (&block);
+}
+
 static tree
 gfc_trans_omp_barrier (void)
 {
@@ -7280,6 +7322,8 @@ gfc_trans_omp_directive (gfc_code *code)
 {
   switch (code->op)
 {
+case EXEC_OMP_ALLOCATE:
+  return gfc_trans_omp_allocate (code);
 case EXEC_OMP_ATOMIC:
   return gfc_trans_omp_atomic (code);
 case EXEC_OMP_BARRIER:
diff --git a/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 
b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
new file mode 100644
index 000..2de2b52ee44
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/allocate-6.f90
@@ -0,0 +1,72 @@
+! { dg-do compile }
+! { dg-additional-options "-fdump-tree-original" }
+
+module omp_lib_kinds
+  use iso_c_binding, only: c_int, c_intptr_t
+  implicit none
+  private :: c_int, c_intptr_t
+  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
+
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_null_allocator = 0
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_default_mem_alloc = 1
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_large_cap_mem_alloc = 2
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_const_mem_alloc = 3
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_high_bw_mem_alloc = 4
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_low_lat_mem_alloc = 5
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_cgroup_mem_alloc = 6
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_pteam_mem_alloc = 7
+  integer (kind=omp_allocator_handle_kind), &
+ parameter :: omp_thread_mem_alloc = 8
+end module
+
+
+subroutine foo(x, y, al)
+  use omp_lib_kinds
+  implicit none
+  
+type :: my_type
+  in

[PATCH 3/5] [gfortran] Handle cleanup of omp allocated variables (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
Currently we only handle an omp allocate directive that is associated
with an allocate statement.  This statement results in malloc and free calls.
The malloc calls are easy to get to as they are in the same block as the
allocate directive.  But the free calls come in a separate cleanup block.  To
help any later passes find them, an allocate directive is generated in the
cleanup block with kind=free. The normal allocate directive is given
kind=allocate.

gcc/fortran/ChangeLog:

* gfortran.h (struct access_ref): Declare new members
omp_allocated and omp_allocated_end.
* openmp.c (gfc_match_omp_allocate): Set new_st.resolved_sym to
NULL.
(prepare_omp_allocated_var_list_for_cleanup): New function.
(gfc_resolve_omp_allocate): Call it.
* trans-decl.c (gfc_trans_deferred_vars): Process omp_allocated.
* trans-openmp.c (gfc_trans_omp_allocate): Set kind for the stmt
generated for allocate directive.

gcc/ChangeLog:

* tree-core.h (struct tree_base): Add comments.
* tree-pretty-print.c (dump_generic_node): Handle allocate directive
kind.
* tree.h (OMP_ALLOCATE_KIND_ALLOCATE): New define.
(OMP_ALLOCATE_KIND_FREE): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Test kind of allocate directive.
---
 gcc/fortran/gfortran.h|  1 +
 gcc/fortran/openmp.c  | 30 +++
 gcc/fortran/trans-decl.c  | 20 +
 gcc/fortran/trans-openmp.c|  6 
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  3 +-
 gcc/tree-core.h   |  6 
 gcc/tree-pretty-print.c   |  4 +++
 gcc/tree.h|  4 +++
 8 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 79a43a2fdf0..6a43847d31f 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1820,6 +1820,7 @@ typedef struct gfc_symbol
   gfc_array_spec *as;
   struct gfc_symbol *result;   /* function result symbol */
   gfc_component *components;   /* Derived type components */
+  gfc_omp_namelist *omp_allocated, *omp_allocated_end;
 
   /* Defined only for Cray pointees; points to their pointer.  */
   struct gfc_symbol *cp_pointer;
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index ee7c39980bb..f11812b0b12 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5818,6 +5818,7 @@ gfc_match_omp_allocate (void)
 
   new_st.op = EXEC_OMP_ALLOCATE;
   new_st.ext.omp_clauses = c;
+  new_st.resolved_sym = NULL;
   gfc_free_expr (allocator);
   return MATCH_YES;
 }
@@ -9049,6 +9050,34 @@ gfc_resolve_oacc_routines (gfc_namespace *ns)
 }
 }
 
+static void
+prepare_omp_allocated_var_list_for_cleanup (gfc_omp_namelist *cn, locus loc)
+{
+  gfc_symbol *proc = cn->sym->ns->proc_name;
+  gfc_omp_namelist *p, *n;
+
+  for (n = cn; n; n = n->next)
+{
+  if (n->sym->attr.allocatable && !n->sym->attr.save
+ && !n->sym->attr.result && !proc->attr.is_main_program)
+   {
+ p = gfc_get_omp_namelist ();
+ p->sym = n->sym;
+ p->expr = gfc_copy_expr (n->expr);
+ p->where = loc;
+ p->next = NULL;
+ if (proc->omp_allocated == NULL)
+   proc->omp_allocated_end = proc->omp_allocated = p;
+ else
+   {
+ proc->omp_allocated_end->next = p;
+ proc->omp_allocated_end = p;
+   }
+
+   }
+}
+}
+
 static void
 check_allocate_directive_restrictions (gfc_symbol *sym, gfc_expr *omp_al,
   gfc_namespace *ns, locus loc)
@@ -9179,6 +9208,7 @@ gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace 
*ns)
 code->loc);
}
 }
+  prepare_omp_allocated_var_list_for_cleanup (cn, code->loc);
 }
 
 
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 066fb3a5f61..e5c9bf413e7 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4583,6 +4583,26 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, 
gfc_wrapped_block * block)
  }
 }
 
+  /* Generate a dummy allocate pragma with free kind so that cleanup
+ of those variables which were allocated using the allocate statement
+ associated with an allocate clause happens correctly.  */
+
+  if (proc_sym->omp_allocated)
+{
+  gfc_clear_new_st ();
+  new_st.op = EXEC_OMP_ALLOCATE;
+  gfc_omp_clauses *c = gfc_get_omp_clauses ();
+  c->lists[OMP_LIST_ALLOCATOR] = proc_sym->omp_allocated;
+  new_st.ext.omp_clauses = c;
+  /* This is just a hacky way to convey to handler that we are
+dealing with cleanup here.  Saves us from using another field
+for it.  */
+  new_st.resolved_sym = proc_sym->omp_allocated->sym;
+  gfc_add_init_cleanup (block, NULL,
+ 

[PATCH 4/5] [gfortran] Gimplify allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
gcc/ChangeLog:

* doc/gimple.texi: Describe GIMPLE_OMP_ALLOCATE.
* gimple-pretty-print.c (dump_gimple_omp_allocate): New function.
(pp_gimple_stmt_1): Call it.
* gimple.c (gimple_build_omp_allocate): New function.
* gimple.def (GIMPLE_OMP_ALLOCATE): New node.
* gimple.h (enum gf_mask): Add GF_OMP_ALLOCATE_KIND_MASK,
GF_OMP_ALLOCATE_KIND_ALLOCATE and GF_OMP_ALLOCATE_KIND_FREE.
(struct gomp_allocate): New.
(is_a_helper <gomp_allocate *>::test): New.
(is_a_helper <const gomp_allocate *>::test): New.
(gimple_build_omp_allocate): Declare.
(gimple_omp_subcode): Replace GIMPLE_OMP_TEAMS with
GIMPLE_OMP_ALLOCATE.
(gimple_omp_allocate_set_clauses): New.
(gimple_omp_allocate_set_kind): Likewise.
(gimple_omp_allocate_clauses): Likewise.
(gimple_omp_allocate_kind): Likewise.
(CASE_GIMPLE_OMP): Add GIMPLE_OMP_ALLOCATE.
* gimplify.c (gimplify_omp_allocate): New.
(gimplify_expr): Call it.
* gsstruct.def (GSS_OMP_ALLOCATE): Define.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Add tests.
---
 gcc/doc/gimple.texi   | 38 +++-
 gcc/gimple-pretty-print.c | 37 
 gcc/gimple.c  | 10 
 gcc/gimple.def|  6 ++
 gcc/gimple.h  | 60 ++-
 gcc/gimplify.c| 19 ++
 gcc/gsstruct.def  |  1 +
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |  4 +-
 8 files changed, 171 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/gimple.texi b/gcc/doc/gimple.texi
index 65ef63d6ee9..60a4d2c17ca 100644
--- a/gcc/doc/gimple.texi
+++ b/gcc/doc/gimple.texi
@@ -420,6 +420,9 @@ kinds, along with their relationships to @code{GSS_} values 
(layouts) and
  + gomp_continue
  |layout: GSS_OMP_CONTINUE, code: GIMPLE_OMP_CONTINUE
  |
+ + gomp_allocate
+ |layout: GSS_OMP_ALLOCATE, code: GIMPLE_OMP_ALLOCATE
+ |
  + gomp_atomic_load
  |layout: GSS_OMP_ATOMIC_LOAD, code: GIMPLE_OMP_ATOMIC_LOAD
  |
@@ -454,6 +457,7 @@ The following table briefly describes the GIMPLE 
instruction set.
 @item @code{GIMPLE_GOTO}   @tab x  @tab x
 @item @code{GIMPLE_LABEL}  @tab x  @tab x
 @item @code{GIMPLE_NOP}@tab x  @tab x
+@item @code{GIMPLE_OMP_ALLOCATE}   @tab x  @tab x
 @item @code{GIMPLE_OMP_ATOMIC_LOAD}@tab x  @tab x
 @item @code{GIMPLE_OMP_ATOMIC_STORE}   @tab x  @tab x
 @item @code{GIMPLE_OMP_CONTINUE}   @tab x  @tab x
@@ -1029,6 +1033,7 @@ Return a deep copy of statement @code{STMT}.
 * @code{GIMPLE_LABEL}::
 * @code{GIMPLE_GOTO}::
 * @code{GIMPLE_NOP}::
+* @code{GIMPLE_OMP_ALLOCATE}::
 * @code{GIMPLE_OMP_ATOMIC_LOAD}::
 * @code{GIMPLE_OMP_ATOMIC_STORE}::
 * @code{GIMPLE_OMP_CONTINUE}::
@@ -1729,6 +1734,38 @@ Build a @code{GIMPLE_NOP} statement.
 Returns @code{TRUE} if statement @code{G} is a @code{GIMPLE_NOP}.
 @end deftypefn
 
+@node @code{GIMPLE_OMP_ALLOCATE}
+@subsection @code{GIMPLE_OMP_ALLOCATE}
+@cindex @code{GIMPLE_OMP_ALLOCATE}
+
+@deftypefn {GIMPLE function} gomp_allocate *gimple_build_omp_allocate ( @
+tree clauses, int kind)
+Build a @code{GIMPLE_OMP_ALLOCATE} statement.  @code{CLAUSES} is the clauses
+associated with this node.  @code{KIND} is the enumeration value
+@code{GF_OMP_ALLOCATE_KIND_ALLOCATE} if this directive allocates memory
+or @code{GF_OMP_ALLOCATE_KIND_FREE} if it de-allocates.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_clauses ( @
+gomp_allocate *g, tree clauses)
+Set the @code{CLAUSES} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} tree gimple_omp_allocate_clauses ( @
+const gomp_allocate *g)
+Get the @code{CLAUSES} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} void gimple_omp_allocate_set_kind ( @
+gomp_allocate *g, int kind)
+Set the @code{KIND} for a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
+@deftypefn {GIMPLE function} tree gimple_omp_allocate_kind ( @
+const gomp_allocate *g)
+Get the @code{KIND} of a @code{GIMPLE_OMP_ALLOCATE}.
+@end deftypefn
+
 @node @code{GIMPLE_OMP_ATOMIC_LOAD}
 @subsection @code{GIMPLE_OMP_ATOMIC_LOAD}
 @cindex @code{GIMPLE_OMP_ATOMIC_LOAD}
@@ -1760,7 +1797,6 @@ const gomp_atomic_load *g)
 Get the @code{RHS} of an atomic set.
 @end deftypefn
 
-
 @node @code{GIMPLE_OMP_ATOMIC_STORE}
 @subsection @code{GIMPLE_OMP_ATOMIC_STORE}
 @cindex @code{GIMPLE_OMP_ATOMIC_STORE}
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index ebd87b20a0a..bb961a900df 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1967,6 +1967,38 @@ dump_gimple_omp_critical (pretty_printer *bu

[PATCH 5/5] [gfortran] Lower allocate directive (OpenMP 5.0).

2022-01-13 Thread Hafiz Abid Qadeer
This patch looks for malloc/free calls that were generated by an allocate
statement that is associated with an allocate directive and replaces them with
GOMP_alloc and GOMP_free.
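
The search described above can be pictured with a toy statement list.  This
is a simplified sketch and not the pass itself: struct stmt, the enums and
lower_allocate_sketch are hypothetical, variables are matched by name instead
of by tree, and stopping at the first matching call is an assumption of the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum call_kind
{
  CALL_OTHER, CALL_MALLOC, CALL_FREE, CALL_GOMP_ALLOC, CALL_GOMP_FREE
};

enum directive_kind { KIND_ALLOCATE, KIND_FREE };

/* Toy statement: which function is called and for which variable.  */
struct stmt
{
  enum call_kind call;
  const char *var;
};

/* Walk the statements after DIRECTIVE_POS and rewrite the first
   malloc (for KIND_ALLOCATE) or free (for KIND_FREE) that belongs to VAR.
   Returns 1 if a call was rewritten, 0 otherwise.  */
static int
lower_allocate_sketch (struct stmt *stmts, size_t n, size_t directive_pos,
                       enum directive_kind kind, const char *var)
{
  enum call_kind from = kind == KIND_ALLOCATE ? CALL_MALLOC : CALL_FREE;
  enum call_kind to
    = kind == KIND_ALLOCATE ? CALL_GOMP_ALLOC : CALL_GOMP_FREE;

  for (size_t i = directive_pos + 1; i < n; i++)
    if (stmts[i].call == from && strcmp (stmts[i].var, var) == 0)
      {
        stmts[i].call = to;
        return 1;
      }
  return 0;
}

/* Small driver: only the calls for "a" are rewritten, the malloc for
   "b" stays as it is.  Returns 1 when the result is as expected.  */
static int
demo_lowering (void)
{
  struct stmt body[] = {
    { CALL_OTHER, "" },		/* Position of the directive itself.  */
    { CALL_MALLOC, "b" },
    { CALL_MALLOC, "a" },
    { CALL_FREE, "a" },
  };
  int did_alloc = lower_allocate_sketch (body, 4, 0, KIND_ALLOCATE, "a");
  int did_free = lower_allocate_sketch (body, 4, 0, KIND_FREE, "a");
  return did_alloc && did_free
         && body[1].call == CALL_MALLOC
         && body[2].call == CALL_GOMP_ALLOC
         && body[3].call == CALL_GOMP_FREE;
}
```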

gcc/ChangeLog:

* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
(scan_omp_1_stmt): Call it.
(lower_omp_allocate): New function.
(lower_omp_1): Call it.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-6.f90: Add tests.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/allocate-1.c: New test.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
---
 gcc/omp-low.c | 125 ++
 gcc/testsuite/gfortran.dg/gomp/allocate-6.f90 |   9 ++
 .../testsuite/libgomp.fortran/allocate-1.c|   7 +
 .../testsuite/libgomp.fortran/allocate-2.f90  |  49 +++
 4 files changed, 190 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-1.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/allocate-2.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f2237428de1..8a0ae3932b9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1684,6 +1684,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_TASK_REDUCTION:
case OMP_CLAUSE_ALLOCATE:
+   case OMP_CLAUSE_ALLOCATOR:
  break;
 
case OMP_CLAUSE_ALIGNED:
@@ -1892,6 +1893,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
case OMP_CLAUSE_FINALIZE:
case OMP_CLAUSE_FILTER:
case OMP_CLAUSE__CONDTEMP_:
+   case OMP_CLAUSE_ALLOCATOR:
  break;
 
case OMP_CLAUSE__CACHE_:
@@ -2962,6 +2964,16 @@ scan_omp_simd_scan (gimple_stmt_iterator *gsi, gomp_for 
*stmt,
   maybe_lookup_ctx (new_stmt)->for_simd_scan_phase = true;
 }
 
+/* Scan an OpenMP allocate directive.  */
+
+static void
+scan_omp_allocate (gomp_allocate *stmt, omp_context *outer_ctx)
+{
+  omp_context *ctx;
+  ctx = new_omp_context (stmt, outer_ctx);
+  scan_sharing_clauses (gimple_omp_allocate_clauses (stmt), ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -4247,6 +4259,9 @@ scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool 
*handled_ops_p,
insert_decl_map (&ctx->cb, var, var);
   }
   break;
+case GIMPLE_OMP_ALLOCATE:
+  scan_omp_allocate (as_a <gomp_allocate *> (stmt), ctx);
+  break;
 default:
   *handled_ops_p = false;
   break;
@@ -8680,6 +8695,111 @@ lower_omp_single_simple (gomp_single *single_stmt, 
gimple_seq *pre_p)
   gimple_seq_add_stmt (pre_p, gimple_build_label (flabel));
 }
 
+static void
+lower_omp_allocate (gimple_stmt_iterator *gsi_p, omp_context *)
+{
+  gomp_allocate *st = as_a <gomp_allocate *> (gsi_stmt (*gsi_p));
+  tree clauses = gimple_omp_allocate_clauses (st);
+  int kind = gimple_omp_allocate_kind (st);
+  gcc_assert (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE
+ || kind == GF_OMP_ALLOCATE_KIND_FREE);
+  bool allocate = (kind == GF_OMP_ALLOCATE_KIND_ALLOCATE);
+
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+{
+  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_ALLOCATOR)
+   continue;
+  tree var = OMP_ALLOCATE_DECL (c);
+
+  gimple_stmt_iterator gsi = *gsi_p;
+  for (gsi_next (&gsi); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple *stmt = gsi_stmt (gsi);
+
+ if (gimple_code (stmt) != GIMPLE_CALL
+ || (allocate && gimple_call_fndecl (stmt)
+ != builtin_decl_explicit (BUILT_IN_MALLOC))
+ || (!allocate && gimple_call_fndecl (stmt)
+ != builtin_decl_explicit (BUILT_IN_FREE)))
+   continue;
+ const gcall *gs = as_a <const gcall *> (stmt);
+ tree allocator = OMP_ALLOCATE_ALLOCATOR (c)
+  ? OMP_ALLOCATE_ALLOCATOR (c)
+  : integer_zero_node;
+ if (allocate)
+   {
+ tree lhs = gimple_call_lhs (gs);
+ if (lhs && TREE_CODE (lhs) == SSA_NAME)
+   {
+ gimple_stmt_iterator gsi2 = gsi;
+ gsi_next (&gsi2);
+ gimple *assign = gsi_stmt (gsi2);
+ if (gimple_code (assign) == GIMPLE_ASSIGN)
+   {
+ lhs = gimple_assign_lhs (as_a  (assign));
+ if (lhs == NULL_TREE
+ || TREE_CODE (lhs) != COMPONENT_REF)
+   continue;
+ lhs = TREE_OPERAND (lhs, 0);
+   }
+   }
+
+ if (lhs == var)
+   {
+ unsigned HOST_WIDE_INT ialign = 0;
+ tree align;
+ if (TYPE_P (var))
+   ialign = TYPE_ALIGN_UNIT (var);
+ else
+   ialign = DECL_ALIGN_UNIT (var);
+ align = build_int_cst (size_type_node, ialign);
+ tree repl = builtin_decl_explicit (BUILT_IN_GOMP_

[PATCH v3 00/15] ARM/MVE use vectors of boolean for predicates

2022-01-13 Thread Christophe Lyon via Gcc-patches


This is v3 of this patch series, fixing issues I discovered before
committing v2 (which had been approved).

Thanks a lot to Richard Sandiford for his help.

The changes v2 -> v3 are:

Patch 4: Fix arm_hard_regno_nregs and CLASS_MAX_NREGS to support VPR.

Patch 7: Changes to the underlying representation of vectors of
booleans to account for the different expectations between AArch64/SVE
and Arm/MVE.

Patch 8: Re-use and extend existing thumb2_movhi* patterns instead of
duplicating them in mve_mov. This requires the introduction of a
new constraint to match a constant vector of booleans. Add a new RTL
test.

Patch 9: Introduce check_effective_target_arm_mve and skip
gcc.dg/signbit-2.c, because with MVE there is no fallback architecture
unlike SVE or AVX512.

Patch 12: Update fewer load/store MVE builtins
(mve_vldrdq_gather_base_z_v2di,
mve_vldrdq_gather_offset_z_v2di,
mve_vldrdq_gather_shifted_offset_z_v2di,
mve_vstrdq_scatter_base_p_v2di,
mve_vstrdq_scatter_offset_p_v2di,
mve_vstrdq_scatter_offset_p_v2di_insn,
mve_vstrdq_scatter_shifted_offset_p_v2di,
mve_vstrdq_scatter_shifted_offset_p_v2di_insn,
mve_vstrdq_scatter_base_wb_p_v2di,
mve_vldrdq_gather_base_wb_z_v2di,
mve_vldrdq_gather_base_nowb_z_v2di,
mve_vldrdq_gather_base_wb_z_v2di_insn) for which we keep HI mode
for vpr_register_operand.

Patch 13: No need to update
gcc.target/arm/acle/cde-mve-full-assembly.c anymore since we re-use
the mov pattern that emits '@ movhi' in the assembly.

Patch 15: This is a new patch to fix a problem I noticed during this
v2->v3 update.



I'll squash patch 2 with patch 9 and patch 3 with patch 8.

Original text:

This patch series addresses PR 100757 and 101325 by representing
vectors of predicates (MVE VPR.P0 register) as vectors of booleans
rather than using HImode.
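
To make the difference between the two views concrete, here is a small
sketch of how a 16-bit predicate value relates to per-lane booleans.  It is
not taken from the series: the helper names and sample_lanes are
hypothetical, it assumes one predicate bit per byte of a 128-bit vector, and
it treats a lane as active only when all of its bits are set, which is a
simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Sample lane values for a vector of four 4-byte elements.  */
static const int sample_lanes[4] = { 1, 0, 1, 1 };

/* Bits of the 16-bit predicate that belong to LANE when each element is
   ELEM_BYTES wide (assumption: one predicate bit per byte).  */
static uint16_t
lane_bits (unsigned lane, unsigned elem_bytes)
{
  return (uint16_t) (((1u << elem_bytes) - 1u) << (lane * elem_bytes));
}

/* Nonzero if every predicate bit of LANE is set in P0.  */
static int
lane_active (uint16_t p0, unsigned lane, unsigned elem_bytes)
{
  uint16_t bits = lane_bits (lane, elem_bytes);
  return (p0 & bits) == bits;
}

/* Build the 16-bit integer view from a vector of per-lane booleans.  */
static uint16_t
pack_predicate (const int *lanes, unsigned nlanes, unsigned elem_bytes)
{
  uint16_t p0 = 0;
  for (unsigned i = 0; i < nlanes; i++)
    if (lanes[i])
      p0 |= lane_bits (i, elem_bytes);
  return p0;
}
```

With the sample lanes and 4-byte elements, the packed value is 0xFF0F: lanes
0, 2 and 3 contribute four set bits each and lane 1 contributes none.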

As this implies a lot of mostly mechanical changes, I have tried to
split the patches in a way that should help reviewers, but the split
is a bit artificial.

Patches 1-3 add new tests.

Patches 4-6 are small independent improvements.

Patch 7 implements the predicate qualifier, but does not change any
builtin yet.

Patch 8 is the first of the two main patches, and uses the new
qualifier to describe the vcmp and vpsel builtins that are useful for
auto-vectorization of comparisons.

Patch 9 is the second main patch, which fixes the vcond_mask expander.

Patches 10-13 convert almost all the remaining builtins with HI
operands to use the predicate qualifier.  After these, there are still
a few builtins with HI operands left, about which I am not sure: vctp,
vpnot, load-gather and store-scatter with v2di operands.  In fact,
patches 11/12 update some STR/LDR qualifiers in a way that breaks
these v2di builtins although existing tests still pass.

Christophe Lyon (15):
  arm: Add new tests for comparison vectorization with Neon and MVE
  arm: Add tests for PR target/100757
  arm: Add tests for PR target/101325
  arm: Add GENERAL_AND_VPR_REGS regclass
  arm: Add support for VPR_REG in arm_class_likely_spilled_p
  arm: Fix mve_vmvnq_n_ argument mode
  arm: Implement MVE predicates as vectors of booleans
  arm: Implement auto-vectorized MVE comparisons with vectors of boolean
predicates
  arm: Fix vcond_mask expander for MVE (PR target/100757)
  arm: Convert remaining MVE vcmp builtins to predicate qualifiers
  arm: Convert more MVE builtins to predicate qualifiers
  arm: Convert more load/store MVE builtins to predicate qualifiers
  arm: Convert more MVE/CDE builtins to predicate qualifiers
  arm: Add VPR_REG to ALL_REGS
  arm: Fix constraint check for V8HI in mve_vector_mem_operand

 gcc/config/aarch64/aarch64-modes.def  |   8 +-
 gcc/config/arm/arm-builtins.c | 224 +++--
 gcc/config/arm/arm-builtins.h |   4 +-
 gcc/config/arm/arm-modes.def  |   8 +
 gcc/config/arm/arm-protos.h   |   4 +-
 gcc/config/arm/arm-simd-builtin-types.def |   4 +
 gcc/config/arm/arm.c  | 169 ++--
 gcc/config/arm/arm.h  |   9 +-
 gcc/config/arm/arm_mve_builtins.def   | 746 
 gcc/config/arm/constraints.md |   6 +
 gcc/config/arm/iterators.md   |   6 +
 gcc/config/arm/mve.md | 795 ++
 gcc/config/arm/neon.md|  39 +
 gcc/config/arm/vec-common.md  |  52 --
 gcc/config/arm/vfp.md |  34 +-
 gcc/doc/sourcebuild.texi  |   4 +
 gcc/emit-rtl.c|  20 +-
 gcc/genmodes.c|  81 +-
 gcc/machmode.def  |   2 +-
 gcc/rtx-vector-builder.c  |   4 +-
 gcc/simplify-rtx.c|  34 +-
 gcc/testsuite/gcc.dg/signbit-2.c  |   1 +
 .../gcc.target/arm/simd/mve-vcmp-f32-2.c  |  32 +
 .../gcc.target/arm/simd/neon-compare-1.c  |  78 ++
 .../gcc.target/arm/simd/neon-c

[PATCH v3 01/15] arm: Add new tests for comparison vectorization with Neon and MVE

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch mainly adds Neon tests similar to existing MVE ones,
to make sure we do not break Neon when fixing MVE.

mve-vcmp-f32-2.c is similar to mve-vcmp-f32.c but uses a conditional
with 2.0f and 3.0f constants to help scan-assembler-times.
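
For readers checking the scan-assembler-times patterns, the decimal .word values are simply the IEEE-754 binary32 bit patterns of the float constants. A minimal sketch, assuming a host where float is binary32; the helper name float_bits is hypothetical and not part of the patch:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 32-bit pattern of a float, i.e. the
   value an assembler listing would show in a .word directive.  Assumes
   float is IEEE-754 binary32.  */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);  /* Well-defined way to reinterpret the bits.  */
    return u;
}
```

With this helper, float_bits(2.0f) is 1073741824 (0x40000000) and float_bits(3.0f) is 1077936128 (0x40400000), matching the two constants the test scans for.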

2022-01-13  Christophe Lyon 

gcc/testsuite/
* gcc.target/arm/simd/mve-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-compare-1.c: New.
* gcc.target/arm/simd/neon-compare-2.c: New.
* gcc.target/arm/simd/neon-compare-3.c: New.
* gcc.target/arm/simd/neon-compare-scalar-1.c: New.
* gcc.target/arm/simd/neon-vcmp-f16.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-2.c: New.
* gcc.target/arm/simd/neon-vcmp-f32-3.c: New.
* gcc.target/arm/simd/neon-vcmp-f32.c: New.
* gcc.target/arm/simd/neon-vcmp.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c 
b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
new file mode 100644
index 000..917a95bf141
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vcmp-f32-2.c
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+
+#include 
+
+#define NB 4
+
+#define FUNC(OP, NAME) \
+  void test_ ## NAME ##_f (float * __restrict__ dest, float *a, float *b) { \
+int i; \
+for (i=0; i, vcmpgt)
+FUNC(>=, vcmpge)
+
+/* { dg-final { scan-assembler-times {\tvcmp.f32\teq, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tne, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tlt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tle, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tgt, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcmp.f32\tge, q[0-9]+, q[0-9]+\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 24 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1077936128\n} 24 } } */ /* 
Constant 3.0f.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c 
b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
new file mode 100644
index 000..2e0222a71f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/neon-compare-1.c
@@ -0,0 +1,78 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-additional-options "-O3" } */
+
+#include "mve-compare-1.c"
+
+/* 64-bit vectors.  */
+/* vmvn is used by 'ne' comparisons: 3 sizes * 2 (signed/unsigned) * 2
+   (register/zero) = 12.  */
+/* { dg-final { scan-assembler-times {\tvmvn\td[0-9]+, d[0-9]+\n} 12 } } */
+
+/* { 8 bits } x { eq, ne, lt, le, gt, ge }. */
+/* ne uses eq, lt/le only apply to comparison with zero, they use gt/ge
+   otherwise.  */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i8\td[0-9]+, d[0-9]+, #0\n} 4 } } 
*/
+/* { dg-final { scan-assembler-times {\tvclt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcle.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s8\td[0-9]+, d[0-9]+, #0\n} 1 } } 
*/
+
+/* { 16 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i16\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcgt.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { dg-final { scan-assembler-times {\tvcge.s16\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+
+/* { 32 bits } x { eq, ne, lt, le, gt, ge }. */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
4 } } */
+/* { dg-final { scan-assembler-times {\tvceq.i32\td[0-9]+, d[0-9]+, #0\n} 4 } 
} */
+/* { dg-final { scan-assembler-times {\tvclt.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcle.s32\td[0-9]+, d[0-9]+, #0\n} 1 } 
} */
+/* { dg-final { scan-assembler-times {\tvcgt.s32\td[0-9]+, d[0-9]+, d[0-9]+\n} 
2 } } */
+/* { d

[PATCH v3 02/15] arm: Add tests for PR target/100757

2022-01-13 Thread Christophe Lyon via Gcc-patches
These tests currently trigger an ICE which is fixed later in the patch
series.

The pr100757*.c testcases are derived from
gcc.c-torture/compile/20160205-1.c, forcing the use of MVE, and using
various types and return values different from 0 and 1 to avoid
commonalization with boolean masks.  In addition, since we should not
need these masks, the tests make sure they are not present.

2022-01-13  Christophe Lyon  

gcc/testsuite/
PR target/100757
* gcc.target/arm/simd/pr100757-2.c: New.
* gcc.target/arm/simd/pr100757-3.c: New.
* gcc.target/arm/simd/pr100757-4.c: New.
* gcc.target/arm/simd/pr100757.c: New.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
new file mode 100644
index 000..c2262b4d81e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+int fn1(int d) {
+  int c = 4;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t4\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t5\n} 4 } } */ /* Possible value 
for c.  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
new file mode 100644
index 000..e604555c04c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O3 -funsafe-math-optimizations" } */
+/* Copied from gcc.c-torture/compile/20160205-1.c.  */
+
+float a[32];
+float fn1(int d) {
+  float c = 4.0f;
+  for (int b = 0; b < 32; b++)
+if (a[b] != 2.0f)
+  c = 5.0f;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t1073741824\n} 4 } } */ /* 
Constant 2.0f.  */
+/* { dg-final { scan-assembler-times {\t.word\t1084227584\n} 4 } } */ /* 
Possible value for c (5.0).  */
+/* { dg-final { scan-assembler-times {\t.word\t1082130432\n} 4 } } */ /* 
Initial value for c (4.0).  */
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-not {\t.word\t0\n} } } */ /* 'false' mask.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
new file mode 100644
index 000..c12040c517f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+unsigned int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  
*/
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value 
for c.  */
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr100757.c 
b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
new file mode 100644
index 000..41d6e4e2d7a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr100757.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+/* Derived from gcc.c-torture/compile/20160205-1.c.  */
+
+int a[32];
+int fn1(int d) {
+  int c = 2;
+  for (int b = 0; b < 32; b++)
+if (a[b])
+  c = 3;
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\t.word\t0\n} 4 } } */ /* 'false' mask.  
*/
+/* { dg-final { scan-assembler-not {\t.word\t1\n} } } */ /* 'true' mask.  */
+/* { dg-final { scan-assembler-times {\t.word\t2\n} 4 } } */ /* Initial value 
for c.  */
+/* { dg-final { scan-assembler-times {\t.word\t3\n} 4 } } */ /* Possible value 
for c.  */
-- 
2.25.1



[PATCH v3 03/15] arm: Add tests for PR target/101325

2022-01-13 Thread Christophe Lyon via Gcc-patches
These tests are derived from the one provided in the PR: there is a
compile-only test because I did not have access to anything that could
execute MVE code until recently.
I have been able to add an executable test since QEMU supports MVE.

Instead of adding arm_v8_1m_mve_hw, I update arm_mve_hw so that it
uses add_options_for_arm_v8_1m_mve_fp, like arm_neon_hw does.  This
ensures arm_mve_hw passes even if the toolchain does not generate MVE
code by default.

2022-01-13  Christophe Lyon  

gcc/testsuite/
PR target/101325
* gcc.target/arm/simd/pr101325.c: New.
* gcc.target/arm/simd/pr101325-2.c: New.
* lib/target-supports.exp (check_effective_target_arm_mve_hw): Use
add_options_for_arm_v8_1m_mve_fp.

diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
new file mode 100644
index 000..355f6473a00
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_mve_hw } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_v8_1m_mve } */
+
+#include 
+
+
+__attribute((noipa))
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+
+int main(void)
+{
+  if (foo (vdupq_n_s8(0), vdupq_n_s8(0)) != 0xU)
+__builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/arm/simd/pr101325.c 
b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
new file mode 100644
index 000..4cb2513da87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/simd/pr101325.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_1m_mve_ok } */
+/* { dg-add-options arm_v8_1m_mve } */
+/* { dg-additional-options "-O3" } */
+
+#include 
+
+unsigned foo(int8x16_t v, int8x16_t w)
+{
+  return vcmpeqq (v, w);
+}
+/* { dg-final { scan-assembler {\tvcmp.i8  eq} } } */
+/* { dg-final { scan-assembler {\tvmrs\tr[0-9]+, P0} } } */
+/* { dg-final { scan-assembler {\tuxth} } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b4bf2e6b495..0fe1e1e077a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -5037,6 +5037,7 @@ proc check_effective_target_arm_cmse_hw { } {
}
 } "-mcmse"]
 }
+
 # Return 1 if the target supports executing MVE instructions, 0
 # otherwise.
 
@@ -5052,7 +5053,7 @@ proc check_effective_target_arm_mve_hw {} {
   : "0" (a), "r" (b));
  return (a != 2);
}
-} ""]
+} [add_options_for_arm_v8_1m_mve_fp ""]]
 }
 
 # Return 1 if this is an ARM target where ARMv8-M Security Extensions with
-- 
2.25.1



[PATCH v3 04/15] arm: Add GENERAL_AND_VPR_REGS regclass

2022-01-13 Thread Christophe Lyon via Gcc-patches
At some point during the development of this patch series, it appeared
that in some cases the register allocator wants “VPR or general”
rather than “VPR or general or FP” (which is the same thing as
ALL_REGS).  The series does not seem to require this anymore, but it
seems to be a good thing to do anyway, to give the register allocator
more freedom.

CLASS_MAX_NREGS and arm_hard_regno_nregs need adjustment to avoid a
regression in gcc.dg/stack-usage-1.c when compiled with -mthumb
-mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp.
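
As a rough illustration of the adjustment, the patch counts VPR registers in 2-byte units using a ceiling division. The sketch below is a standalone model: DEMO_CEIL mirrors the usual ceiling-division idiom, and vpr_nregs is a hypothetical name taking the mode size in bytes rather than a machine_mode.

```c
/* Ceiling division, written the usual way: (x + y - 1) / y.  */
#define DEMO_CEIL(x, y) (((x) + (y) - 1) / (y))

/* Hypothetical model of the VPR case: the number of 16-bit VPR registers
   needed to hold a value of MODE_SIZE_BYTES bytes.  */
static unsigned vpr_nregs(unsigned mode_size_bytes)
{
    return DEMO_CEIL(mode_size_bytes, 2u);
}
```

Under this model a 2-byte mode such as HImode needs one register, while a 4-byte mode would need two.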

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.h (reg_class): Add GENERAL_AND_VPR_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(CLASS_MAX_NREGS): Handle VPR.
* config/arm/arm.c (arm_hard_regno_nregs): Handle VPR.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index bb75921f32d..c3559ca8703 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -25287,6 +25287,9 @@ thumb2_asm_output_opcode (FILE * stream)
 static unsigned int
 arm_hard_regno_nregs (unsigned int regno, machine_mode mode)
 {
+  if (IS_VPR_REGNUM (regno))
+return CEIL (GET_MODE_SIZE (mode), 2);
+
   if (TARGET_32BIT
   && regno > PC_REGNUM
   && regno != FRAME_POINTER_REGNUM
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index dacce2b7f08..2416fb5ef64 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1287,6 +1287,7 @@ enum reg_class
   SFP_REG,
   AFP_REG,
   VPR_REG,
+  GENERAL_AND_VPR_REGS,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1316,6 +1317,7 @@ enum reg_class
   "SFP_REG",   \
   "AFP_REG",   \
   "VPR_REG",   \
+  "GENERAL_AND_VPR_REGS", \
   "ALL_REGS"   \
 }
 
@@ -1344,6 +1346,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0040 }, /* SFP_REG */\
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
+  { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
   { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
 }
 
@@ -1453,7 +1456,9 @@ extern const char *fp_sysreg_names[NB_FP_SYSREGS];
ARM regs are UNITS_PER_WORD bits.  
FIXME: Is this true for iWMMX?  */
 #define CLASS_MAX_NREGS(CLASS, MODE)  \
-  (ARM_NUM_REGS (MODE))
+  (CLASS == VPR_REG) \
+  ? CEIL (GET_MODE_SIZE (MODE), 2)\
+  : (ARM_NUM_REGS (MODE))
 
 /* If defined, gives a class of registers that cannot be used as the
operand of a SUBREG that changes the mode of the object illegally.  */
-- 
2.25.1



[PATCH v3 05/15] arm: Add support for VPR_REG in arm_class_likely_spilled_p

2022-01-13 Thread Christophe Lyon via Gcc-patches
VPR_REG is the only register in its class, so it should be handled by
TARGET_CLASS_LIKELY_SPILLED_P, which is achieved by calling
default_class_likely_spilled_p.  No test fails without this patch, but
it seems it should be implemented.
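
To illustrate why falling back to the default hook covers VPR_REG: the default treats a class as likely spilled when it contains exactly one hard register. The sketch below is a reduced standalone model, not GCC code; the enum, the size table and the function name are all hypothetical, and the register counts are illustrative.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced set of register classes.  */
enum demo_reg_class { DEMO_VPR_REG, DEMO_GENERAL_REGS, DEMO_NUM_CLASSES };

/* Illustrative register counts per class (assumption for this sketch).  */
static const unsigned demo_class_size[DEMO_NUM_CLASSES] = { 1, 14 };

/* Model of the default behaviour: a class with a single register is
   considered likely to be spilled.  */
static bool demo_class_likely_spilled_p(enum demo_reg_class rclass)
{
    return demo_class_size[rclass] == 1;
}
```

In this model DEMO_VPR_REG is reported as likely spilled and DEMO_GENERAL_REGS is not.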

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.c (arm_class_likely_spilled_p): Handle VPR_REG.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c3559ca8703..64a8f2dc7de 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29317,7 +29317,7 @@ arm_class_likely_spilled_p (reg_class_t rclass)
   || rclass  == CC_REG)
 return true;
 
-  return false;
+  return default_class_likely_spilled_p (rclass);
 }
 
 /* Implements target hook small_register_classes_for_mode_p.  */
-- 
2.25.1



[PATCH v3 06/15] arm: Fix mve_vmvnq_n_ argument mode

2022-01-13 Thread Christophe Lyon via Gcc-patches
The vmvnq_n* intrinsics have [u]int[16|32]_t arguments, so use the
V_elem iterator instead of HI in mve_vmvnq_n_.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/mve.md (mve_vmvnq_n_): Use V_elem mode
for operand 1.

diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index 171dd384133..5c3b34dce3a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -617,7 +617,7 @@ (define_insn "mve_vcvtaq_"
 (define_insn "mve_vmvnq_n_"
   [
(set (match_operand:MVE_5 0 "s_register_operand" "=w")
-   (unspec:MVE_5 [(match_operand:HI 1 "immediate_operand" "i")]
+   (unspec:MVE_5 [(match_operand:<V_elem> 1 "immediate_operand" "i")]
 VMVNQ_N))
   ]
   "TARGET_HAVE_MVE"
-- 
2.25.1



[PATCH v3 07/15] arm: Implement MVE predicates as vectors of booleans

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch implements support for vectors of booleans to support MVE
predicates, instead of HImode.  Since the ABI mandates pred16_t (aka
uint16_t) to represent predicates in intrinsics prototypes, we
introduce a new "predicate" type qualifier so that we can map relevant
builtins HImode arguments and return value to the appropriate vector
of booleans (VxBI).

We have to update test_vector_ops_duplicate, because it iterates using
an offset in bytes, where we would need to iterate in bits: we stop
iterating when we reach the end of the vector of booleans.

In addition, we have to fix the underlying definition of vectors of
booleans because ARM/MVE needs a different representation than
AArch64/SVE. With ARM/MVE the 'true' bit is duplicated over the
element size, so that a true element of V4BI is represented by
'0b1111'.  This patch updates the aarch64 definition of VNx*BI as
needed.
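
The MVE layout can be pictured with a small standalone sketch. It assumes a 16-bit predicate in which lane 0 occupies the least significant bits and each lane owns 16 / nlanes bits; pack_predicate is a hypothetical helper for illustration, not something the patch adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack NLANES boolean lanes (16, 8 or 4) into a
   16-bit predicate, repeating each lane's value over 16 / NLANES bits.
   Lane 0 is assumed to live in the least significant bits.  */
static uint16_t pack_predicate(const bool *lanes, unsigned nlanes)
{
    unsigned bits_per_lane = 16u / nlanes;
    uint16_t lane_mask = (uint16_t)((1u << bits_per_lane) - 1u);
    uint16_t pred = 0;

    for (unsigned i = 0; i < nlanes; i++)
        if (lanes[i])
            pred |= (uint16_t)(lane_mask << (i * bits_per_lane));
    return pred;
}
```

For a V4BI-like vector {true, false, false, true} this yields 0xF00F: each true lane contributes four set bits, and each false lane four clear bits.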

2022-01-13  Christophe Lyon  
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/aarch64/aarch64-modes.def (VNx16BI, VNx8BI, VNx4BI,
VNx2BI): Update definition.
* config/arm/arm-builtins.c (arm_init_simd_builtin_types): Add new
simd types.
(arm_init_builtin): Map predicate vectors arguments to HImode.
(arm_expand_builtin_args): Move HImode predicate arguments to VxBI
rtx. Move return value to HImode rtx.
* config/arm/arm-builtins.h (arm_type_qualifiers): Add 
qualifier_predicate.
* config/arm/arm-modes.def (B2I, B4I, V16BI, V8BI, V4BI): New modes.
* config/arm/arm-simd-builtin-types.def (Pred1x16_t,
Pred2x8_t, Pred4x4_t): New.
* emit-rtl.c (init_emit_once): Handle all boolean modes.
* genmodes.c (mode_data): Add boolean field.
(blank_mode): Initialize it.
(make_complex_modes): Fix handling of boolean modes.
(make_vector_modes): Likewise.
(VECTOR_BOOL_MODE): Use new COMPONENT parameter.
(make_vector_bool_mode): Likewise.
(BOOL_MODE): New.
(make_bool_mode): New.
(emit_insn_modes_h): Fix generation of boolean modes.
(emit_class_narrowest_mode): Likewise.
* machmode.def: Use new BOOL_MODE instead of FRACTIONAL_INT_MODE
to define BImode.
* rtx-vector-builder.c (rtx_vector_builder::find_cached_value):
Fix handling of constm1_rtx for VECTOR_BOOL.
* simplify-rtx.c (native_encode_rtx): Fix support for VECTOR_BOOL.
(native_decode_vector_rtx): Likewise.
(test_vector_ops_duplicate): Skip vec_merge test
with vectors of booleans.
* varasm.c (output_constant_pool_2): Likewise.

diff --git a/gcc/config/aarch64/aarch64-modes.def 
b/gcc/config/aarch64/aarch64-modes.def
index 976bf9b42be..8f399225a80 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -47,10 +47,10 @@ ADJUST_FLOAT_FORMAT (HF, &ieee_half_format);
 
 /* Vector modes.  */
 
-VECTOR_BOOL_MODE (VNx16BI, 16, 2);
-VECTOR_BOOL_MODE (VNx8BI, 8, 2);
-VECTOR_BOOL_MODE (VNx4BI, 4, 2);
-VECTOR_BOOL_MODE (VNx2BI, 2, 2);
+VECTOR_BOOL_MODE (VNx16BI, 16, BI, 2);
+VECTOR_BOOL_MODE (VNx8BI, 8, BI, 2);
+VECTOR_BOOL_MODE (VNx4BI, 4, BI, 2);
+VECTOR_BOOL_MODE (VNx2BI, 2, BI, 2);
 
 ADJUST_NUNITS (VNx16BI, aarch64_sve_vg * 8);
 ADJUST_NUNITS (VNx8BI, aarch64_sve_vg * 4);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..2ccfa37c302 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -1548,6 +1548,13 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
   arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
 
+  if (TARGET_HAVE_MVE)
+{
+  arm_simd_types[Pred1x16_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred2x8_t].eltype = unsigned_intHI_type_node;
+  arm_simd_types[Pred4x4_t].eltype = unsigned_intHI_type_node;
+}
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = arm_simd_types[i].eltype;
@@ -1695,6 +1702,11 @@ arm_init_builtin (unsigned int fcode, arm_builtin_datum 
*d,
   if (qualifiers & qualifier_map_mode)
op_mode = d->mode;
 
+  /* MVE Predicates use HImode as mandated by the ABI: pred16_t is unsigned
+short.  */
+  if (qualifiers & qualifier_predicate)
+   op_mode = HImode;
+
   /* For pointers, we want a pointer to the basic type
 of the vector.  */
   if (qualifiers & qualifier_pointer && VECTOR_MODE_P (op_mode))
@@ -2939,6 +2951,11 @@ arm_expand_builtin_args (rtx target, machine_mode 
map_mode, int fcode,
case ARG_BUILTIN_COPY_TO_REG:
  if (POINTER_TYPE_P (TREE_TYPE (arg[argc])))
op[argc] = convert_memory_address (Pmode, op[argc]);
+
+ /* MVE uses mve_pred16_t (aka HImode) for vectors of predicates.  
*/
+ if (GET_MODE_CLASS (mode[argc]) == MODE_VECTOR_BOOL)
+   op[argc] = 

[PATCH v3 08/15] arm: Implement auto-vectorized MVE comparisons with vectors of boolean predicates

2022-01-13 Thread Christophe Lyon via Gcc-patches
We make use of qualifier_predicate to describe MVE builtins
prototypes, restricting to auto-vectorizable vcmp* and vpsel builtins,
as they are exercised by the tests added earlier in the series.

Special handling is needed for mve_vpselq because it has a v2di
variant, which has no natural VPR.P0 representation: we keep HImode
for it.

The vector_compare expansion code is updated to use the right VxBI
mode instead of HI for the result.

We extend the existing thumb2_movhi_vfp and thumb2_movhi_fp16 patterns
to use the new MVE_7_HI iterator which covers HI and the new VxBI
modes, in conjunction with the new DB constraint for a constant vector
of booleans.

2022-01-13  Christophe Lyon 
Richard Sandiford  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_PRED_UNONE_UNONE_QUALIFIERS)
(BINOP_PRED_NONE_NONE_QUALIFIERS)
(TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS)
(TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm-protos.h (mve_const_bool_vec_to_hi): New.
* config/arm/arm.c (arm_hard_regno_mode_ok): Handle new VxBI
modes.
(arm_mode_to_pred_mode): New.
(arm_expand_vector_compare): Use the right VxBI mode instead of
HI.
(arm_expand_vcond): Likewise.
(simd_valid_immediate): Handle MODE_VECTOR_BOOL.
(mve_const_bool_vec_to_hi): New.
(neon_make_constant): Call mve_const_bool_vec_to_hi when needed.
* config/arm/arm_mve_builtins.def (vcmpneq_, vcmphiq_, vcmpcsq_)
(vcmpltq_, vcmpleq_, vcmpgtq_, vcmpgeq_, vcmpeqq_, vcmpneq_f)
(vcmpltq_f, vcmpleq_f, vcmpgtq_f, vcmpgeq_f, vcmpeqq_f, vpselq_u)
(vpselq_s, vpselq_f): Use new predicated qualifiers.
* config/arm/constraints.md (DB): New.
* config/arm/iterators.md (MVE_7, MVE_7_HI): New mode iterators.
(MVE_VPRED, MVE_vpred): New attribute iterators.
* config/arm/mve.md (@mve_vcmpq_)
(@mve_vcmpq_f, @mve_vpselq_)
(@mve_vpselq_f): Use MVE_VPRED instead of HI.
(@mve_vpselq_v2di): Define separately.
(mov): New expander for VxBI modes.
* config/arm/vfp.md (thumb2_movhi_vfp, thumb2_movhi_fp16): Use
MVE_7_HI iterator and add support for DB constraint.

gcc/testsuite/
PR target/100757
PR target/101325
* gcc.dg/rtl/arm/mve-vxbi.c: New test.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2ccfa37c302..36d71ab1a13 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -420,6 +420,12 @@ 
arm_binop_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_binop_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned };
+#define BINOP_PRED_UNONE_UNONE_QUALIFIERS \
+  (arm_binop_pred_unone_unone_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_none, qualifier_immediate };
@@ -438,6 +444,12 @@ arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_UNONE_NONE_NONE_QUALIFIERS \
   (arm_binop_unone_none_none_qualifiers)
 
+static enum arm_type_qualifiers
+arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none };
+#define BINOP_PRED_NONE_NONE_QUALIFIERS \
+  (arm_binop_pred_none_none_qualifiers)
+
 static enum arm_type_qualifiers
 arm_binop_unone_unone_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_none };
@@ -509,6 +521,12 @@ 
arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
   (arm_ternop_none_none_none_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
+#define TERNOP_NONE_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_none_none_none_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_immediate, qualifier_unsigned 
};
@@ -528,6 +546,13 @@ 
arm_ternop_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
   (arm_ternop_unone_unone_unone_unone_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_unone_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_A

[PATCH v3 09/15] arm: Fix vcond_mask expander for MVE (PR target/100757)

2022-01-13 Thread Christophe Lyon via Gcc-patches
The problem in this PR is that we call VPSEL with a mask of vector
type instead of HImode. This happens because operand 3 in vcond_mask
is the pre-computed vector comparison and has vector type.

This patch fixes it by implementing TARGET_VECTORIZE_GET_MASK_MODE,
returning the appropriate VxBI mode when targeting MVE.  In turn, this
implies implementing vec_cmp,
vec_cmpu and vcond_mask_, and we can
move vec_cmp, vec_cmpu and
vcond_mask_ back to neon.md since they are not
used by MVE anymore.  The new * patterns listed above are
implemented in mve.md since they are only valid for MVE. However this
may make maintenance/comparison more painful than having all of them
in vec-common.md.

In the process, we can get rid of the recently added vcond_mve
parameter of arm_expand_vector_compare.

Compared to neon.md's vcond_mask_ before my "arm:
Auto-vectorization for MVE: vcmp" patch (r12-834), it keeps the VDQWH
iterator added in r12-835 (to have V4HF/V8HF support), as well as the
(! || flag_unsafe_math_optimizations) condition which
was not present before r12-834 although SF modes were enabled by VDQW
(I think this was a bug).

Using TARGET_VECTORIZE_GET_MASK_MODE has the advantage that we no
longer need to generate vpsel with vectors of 0 and 1: the masks are
now merged via scalar 'ands' instructions operating on 16-bit masks
after converting the boolean vectors.

In addition, this patch fixes a problem in arm_expand_vcond() where
the result would be a vector of 0 or 1 instead of operand 1 or 2.

Since we want to skip gcc.dg/signbit-2.c for MVE, we also add a new
arm_mve effective target.

Reducing the number of iterations in pr100757-3.c from 32 to 8, we
generate the code below:

float a[32];
float fn1(int d) {
  float c = 4.0f;
  for (int b = 0; b < 8; b++)
    if (a[b] != 2.0f)
      c = 5.0f;
  return c;
}

fn1:
        ldr     r3, .L3+48
        vldr.64 d4, .L3              // q2=(2.0,2.0,2.0,2.0)
        vldr.64 d5, .L3+8
        vldrw.32        q0, [r3]     // q0=a(0..3)
        adds    r3, r3, #16
        vcmp.f32        eq, q0, q2   // cmp a(0..3) == (2.0,2.0,2.0,2.0)
        vldrw.32        q1, [r3]     // q1=a(4..7)
        vmrs    r3, P0
        vcmp.f32        eq, q1, q2   // cmp a(4..7) == (2.0,2.0,2.0,2.0)
        vmrs    r2, P0  @ movhi
        ands    r3, r3, r2           // r3=select(a(0..3)) & select(a(4..7))
        vldr.64 d4, .L3+16           // q2=(5.0,5.0,5.0,5.0)
        vldr.64 d5, .L3+24
        vmsr    P0, r3
        vldr.64 d6, .L3+32           // q3=(4.0,4.0,4.0,4.0)
        vldr.64 d7, .L3+40
        vpsel   q3, q3, q2           // q3=vcond_mask(4.0,5.0)
        vmov.32 r2, q3[1]            // keep the scalar max
        vmov.32 r0, q3[3]
        vmov.32 r3, q3[2]
        vmov.f32        s11, s12
        vmov    s15, r2
        vmov    s14, r3
        vmaxnm.f32      s15, s11, s15
        vmaxnm.f32      s15, s15, s14
        vmov    s14, r0
        vmaxnm.f32      s15, s15, s14
        vmov    r0, s15
        bx      lr
.L4:
        .align  3
.L3:
        .word   1073741824      // 2.0f
        .word   1073741824
        .word   1073741824
        .word   1073741824
        .word   1084227584      // 5.0f
        .word   1084227584
        .word   1084227584
        .word   1084227584
        .word   1082130432      // 4.0f
        .word   1082130432
        .word   1082130432
        .word   1082130432
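
The listing above can be read as the following scalar model. It is only a sketch under stated assumptions: predicates are modelled as 16-bit values with 4 bits per 32-bit lane and lane 0 in the low bits, and the names vcmp_eq_f32 and fn1_model are hypothetical stand-ins for the vcmp/ands/vpsel/vmaxnm sequence, not code from the patch.

```c
#include <stdint.h>

/* Model of vcmp.f32 eq on four lanes: each equal lane sets its 4 bits.  */
static uint16_t vcmp_eq_f32(const float v[4], float k)
{
    uint16_t p = 0;
    for (int lane = 0; lane < 4; lane++)
        if (v[lane] == k)
            p |= (uint16_t)(0xFu << (4 * lane));
    return p;
}

/* Model of the generated fn1 for 8 iterations: AND the two 16-bit masks,
   select 4.0f or 5.0f per lane, then reduce the lanes with a maximum.  */
static float fn1_model(const float a[8])
{
    uint16_t p = (uint16_t)(vcmp_eq_f32(a, 2.0f) & vcmp_eq_f32(a + 4, 2.0f));
    float result = 0.0f;

    for (int lane = 0; lane < 4; lane++) {
        /* vpsel: a set predicate picks 4.0f, a clear one picks 5.0f.  */
        float sel = ((p >> (4 * lane)) & 0xFu) == 0xFu ? 4.0f : 5.0f;
        if (lane == 0 || sel > result)
            result = sel;
    }
    return result;
}
```

If every element equals 2.0f the model returns 4.0f; as soon as one element differs, at least one lane selects 5.0f and the maximum becomes 5.0f, matching the scalar loop.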

2022-01-13  Christophe Lyon  

PR target/100757
gcc/
* config/arm/arm-protos.h (arm_get_mask_mode): New prototype.
(arm_expand_vector_compare): Update prototype.
* config/arm/arm.c (TARGET_VECTORIZE_GET_MASK_MODE): New.
(arm_vector_mode_supported_p): Add support for VxBI modes.
(arm_expand_vector_compare): Remove useless generation of vpsel.
(arm_expand_vcond): Fix select operands.
(arm_get_mask_mode): New.
* config/arm/mve.md (vec_cmp): New.
(vec_cmpu): New.
(vcond_mask_): New.
* config/arm/vec-common.md (vec_cmp)
(vec_cmpu): Move to ...
* config/arm/neon.md (vec_cmp)
(vec_cmpu): ... here
and disable for MVE.
* doc/sourcebuild.texi (arm_mve): Document new effective-target.

gcc/testsuite/
* gcc.dg/signbit-2.c: Skip when targeting ARM/MVE.
* lib/target-supports.exp (check_effective_target_arm_mve): New.

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index b978adf2038..a84613104b1 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern opt_machine_mode arm_get_mask_mode (machine_mode mode);
 
 #endif /* RTX_CODE */
 
@@ -378,7 +379,7 @@ extern void arm_emit_coreregs_64bit_shift (enum rtx_code, 
rtx, rtx, rtx, rtx,
 extern bool arm_fusion_enabled_p (

[PATCH v3 10/15] arm: Convert remaining MVE vcmp builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This is mostly a mechanical change, only tested by the intrinsics
expansion tests.

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (BINOP_UNONE_NONE_NONE_QUALIFIERS):
Delete.
(TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS): Change to ...
(TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS): ... this.
(TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS): New.
* config/arm/arm_mve_builtins.def (vcmp*q_n_, vcmp*q_m_f): Use new
predicated qualifiers.
* config/arm/mve.md (mve_vcmpq_n_)
(mve_vcmp*q_m_f): Use MVE_VPRED instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 36d71ab1a13..9cc192ddb9a 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -438,12 +438,6 @@ arm_binop_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define BINOP_NONE_NONE_UNONE_QUALIFIERS \
   (arm_binop_none_none_unone_qualifiers)
 
-static enum arm_type_qualifiers
-arm_binop_unone_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none };
-#define BINOP_UNONE_NONE_NONE_QUALIFIERS \
-  (arm_binop_unone_none_none_qualifiers)
-
 static enum arm_type_qualifiers
 arm_binop_pred_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_predicate, qualifier_none, qualifier_none };
@@ -504,10 +498,10 @@ 
arm_ternop_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_ternop_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
-arm_ternop_unone_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_UNONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_unone_none_none_unone_qualifiers)
+arm_ternop_pred_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_none, qualifier_none, qualifier_predicate 
};
+#define TERNOP_PRED_NONE_NONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_none_none_pred_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -553,6 +547,13 @@ 
arm_ternop_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_UNONE_UNONE_UNONE_PRED_QUALIFIERS \
   (arm_ternop_unone_unone_unone_pred_qualifiers)
 
+static enum arm_type_qualifiers
+arm_ternop_pred_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_predicate, qualifier_unsigned, qualifier_unsigned,
+qualifier_predicate };
+#define TERNOP_PRED_UNONE_UNONE_PRED_QUALIFIERS \
+  (arm_ternop_pred_unone_unone_pred_qualifiers)
+
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_none_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none };
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 44b41eab4c5..b7ebbcab87f 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -118,9 +118,9 @@ VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vhaddq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, veorq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmphiq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_UNONE_UNONE, vcmpcsq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vbicq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vandq_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_UNONE, vaddvq_p_u, v16qi, v8hi, v4si)
@@ -142,17 +142,17 @@ VAR3 (BINOP_UNONE_UNONE_NONE, vbrsrq_n_u, v16qi, v8hi, 
v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshlq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vrshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_IMM, vqshlq_n_u, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpltq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpleq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgtq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
+VAR3 (BINOP_PRED_NONE_NONE, vcmpgeq_n_, v16qi, v8hi, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpeqq_, v16qi, v8hi, v4si)
-VAR3 (BINOP_UNONE_NONE_NONE, vcmpe

[PATCH v3 12/15] arm: Convert more load/store MVE builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few builtins where we do not use the 
iterator and thus we cannot use .

For v2di instructions, we keep the HI mode for predicates.

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (STRSBS_P_QUALIFIERS): Use predicate
qualifier.
(STRSBU_P_QUALIFIERS): Likewise.
(LDRGBS_Z_QUALIFIERS): Likewise.
(LDRGBU_Z_QUALIFIERS): Likewise.
(LDRGBWBXU_Z_QUALIFIERS): Likewise.
(LDRGBWBS_Z_QUALIFIERS): Likewise.
(LDRGBWBU_Z_QUALIFIERS): Likewise.
(STRSBWBS_P_QUALIFIERS): Likewise.
(STRSBWBU_P_QUALIFIERS): Likewise.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 0b063b5f037..73678a00398 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -689,13 +689,13 @@ arm_strss_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBS_P_QUALIFIERS (arm_strsbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_void, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBU_P_QUALIFIERS (arm_strsbu_p_qualifiers)
 
 static enum arm_type_qualifiers
@@ -731,13 +731,13 @@ arm_ldrgbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBS_Z_QUALIFIERS (arm_ldrgbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBU_Z_QUALIFIERS (arm_ldrgbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -777,7 +777,7 @@ arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -793,13 +793,13 @@ arm_ldrgbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_ldrgbwbs_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBS_Z_QUALIFIERS (arm_ldrgbwbs_z_qualifiers)
 
 static enum arm_type_qualifiers
 arm_ldrgbwbu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
-  qualifier_unsigned};
+  qualifier_predicate};
 #define LDRGBWBU_Z_QUALIFIERS (arm_ldrgbwbu_z_qualifiers)
 
 static enum arm_type_qualifiers
@@ -815,13 +815,13 @@ arm_strsbwbu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 static enum arm_type_qualifiers
 arm_strsbwbs_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_none, qualifier_unsigned};
+  qualifier_none, qualifier_predicate};
 #define STRSBWBS_P_QUALIFIERS (arm_strsbwbs_p_qualifiers)
 
 static enum arm_type_qualifiers
 arm_strsbwbu_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_const,
-  qualifier_unsigned, qualifier_unsigned};
+  qualifier_unsigned, qualifier_predicate};
 #define STRSBWBU_P_QUALIFIERS (arm_strsbwbu_p_qualifiers)
 
 static enum arm_type_qualifiers
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index a8087815c22..9633b7187f6 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -7282,7 +7282,7 @@ (define_insn "mve_vstrwq_scatter_base_p_v4si"
[(match_operand:V4SI 0 "s_register_operand" "w")
 (match_operand:SI 1 "immediate_operand" "i")
 (match_operand:V4SI 2 "s_register_operand" "w")
-(match_operand:HI 3 "vpr_register_operand" "Up")]
+(match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VSTRWSBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -7371,7 +7371,7 @@ (define_insn "mve_vldrwq_gather_base_z_v4si"
   [(set (match_operand:V4SI 0 "s_register_operand" "=&w")
(unspec:V4SI [(match_operand:V4SI 1 "s_register_operand" "w")
  (match_operand:SI 2 "immediate_operand" "i")
- (match_operand:HI 3 "vpr_register_operand" "Up")]
+ (match_operand:V4BI 3 "vpr_register_operand" "Up")]
 VLDRWGBQ))
   ]
   "TARGET_HAVE_MVE"
@@ -76

[PATCH v3 13/15] arm: Convert more MVE/CDE builtins to predicate qualifiers

2022-01-13 Thread Christophe Lyon via Gcc-patches
This patch covers a few non-load/store builtins where we do not use
the  iterator and thus we cannot use .

2022-01-13  Christophe Lyon  

gcc/
PR target/100757
PR target/101325
* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): Use
predicate.
(CX_BINARY_UNONE_QUALIFIERS): Likewise.
(CX_TERNARY_UNONE_QUALIFIERS): Likewise.
(TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS): Delete.
(QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS): Delete.
* config/arm/arm_mve_builtins.def: Use predicated qualifiers.
* config/arm/mve.md: Use VxBI instead of HI.

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 73678a00398..f9437752a22 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -295,7 +295,7 @@ static enum arm_type_qualifiers
 arm_cx_unary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_UNARY_UNONE_QUALIFIERS (arm_cx_unary_unone_qualifiers)
 
 /* T (immediate, T, T, unsigned immediate).  */
@@ -304,7 +304,7 @@ arm_cx_binary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_BINARY_UNONE_QUALIFIERS (arm_cx_binary_unone_qualifiers)
 
 /* T (immediate, T, T, T, unsigned immediate).  */
@@ -313,7 +313,7 @@ arm_cx_ternary_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_immediate,
   qualifier_none, qualifier_none, qualifier_none,
   qualifier_unsigned_immediate,
-  qualifier_unsigned };
+  qualifier_predicate };
 #define CX_TERNARY_UNONE_QUALIFIERS (arm_cx_ternary_unone_qualifiers)
 
 /* The first argument (return type) of a store should be void type,
@@ -509,12 +509,6 @@ 
arm_ternop_none_none_none_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define TERNOP_NONE_NONE_NONE_IMM_QUALIFIERS \
   (arm_ternop_none_none_none_imm_qualifiers)
 
-static enum arm_type_qualifiers
-arm_ternop_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_unsigned };
-#define TERNOP_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_ternop_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_ternop_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_predicate };
@@ -567,13 +561,6 @@ 
arm_quadop_unone_unone_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_UNONE_UNONE_NONE_NONE_PRED_QUALIFIERS \
   (arm_quadop_unone_unone_none_none_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_none_none_none_none_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
-qualifier_unsigned };
-#define QUADOP_NONE_NONE_NONE_NONE_UNONE_QUALIFIERS \
-  (arm_quadop_none_none_none_none_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_none_none_none_none_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_none, qualifier_none,
@@ -588,13 +575,6 @@ 
arm_quadop_none_none_none_imm_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define QUADOP_NONE_NONE_NONE_IMM_PRED_QUALIFIERS \
   (arm_quadop_none_none_none_imm_pred_qualifiers)
 
-static enum arm_type_qualifiers
-arm_quadop_unone_unone_unone_unone_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
-qualifier_unsigned, qualifier_unsigned };
-#define QUADOP_UNONE_UNONE_UNONE_UNONE_UNONE_QUALIFIERS \
-  (arm_quadop_unone_unone_unone_unone_unone_qualifiers)
-
 static enum arm_type_qualifiers
 arm_quadop_unone_unone_unone_unone_pred_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
diff --git a/gcc/config/arm/arm_mve_builtins.def 
b/gcc/config/arm/arm_mve_builtins.def
index 7db6d47867e..1c8ee34f5cb 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -87,8 +87,8 @@ VAR4 (BINOP_UNONE_UNONE_UNONE, vcreateq_u, v16qi, v8hi, v4si, 
v2di)
 VAR4 (BINOP_NONE_UNONE_UNONE, vcreateq_s, v16qi, v8hi, v4si, v2di)
 VAR3 (BINOP_UNONE_UNONE_IMM, vshrq_n_u, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_IMM, vshrq_n_s, v16qi, v8hi, v4si)
-VAR1 (BINOP_NONE_NONE_UNONE, vaddlvq_p_s, v4si)
-VAR1 (BINOP_UNONE_UNONE_UNONE, vaddlvq_p_u, v4si)
+VAR1 (BINOP_NONE_NONE_PRED, vaddlvq_p_s, v4si)
+VAR1 (BINOP_UNONE_UNONE_PRED, vaddlvq_p_u, v4si)
 VAR3 (BINOP_PRED_NONE_NONE, vcmpneq_, v16qi, v8hi, v4si)
 VAR3 (BINOP_NONE_NONE_NONE, vshlq_s, v16qi, v8hi, v4si)
 VAR3 (BINOP_UNONE_UNONE_NONE, vshlq_u, v16qi, v8hi, v4si)
@@ -465,20 +465,20 @@ VAR2 (TERNOP_NONE_NONE_NONE

[PATCH v3 14/15] arm: Add VPR_REG to ALL_REGS

2022-01-13 Thread Christophe Lyon via Gcc-patches
VPR_REG should be part of ALL_REGS, this patch fixes this omission.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.h (REG_CLASS_CONTENTS): Add VPR_REG to ALL_REGS.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 2416fb5ef64..ea9fb16b9b1 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1347,7 +1347,7 @@ enum reg_class
   { 0x, 0x, 0x, 0x0080 }, /* AFP_REG */\
   { 0x, 0x, 0x, 0x0400 }, /* VPR_REG.  */  \
   { 0x5FFF, 0x, 0x, 0x0400 }, /* GENERAL_AND_VPR_REGS. 
 */ \
-  { 0x7FFF, 0x, 0x, 0x000F }  /* ALL_REGS.  */ \
+  { 0x7FFF, 0x, 0x, 0x040F }  /* ALL_REGS.  */ \
 }
 
 #define FP_SYSREGS \
-- 
2.25.1
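
The effect of the hunk above can be checked with plain bit arithmetic.  The
following is only a sketch: it looks at the last mask word of each class as
shown in the diff, and mask_subset_p and the macro names are made up for
illustration, they are not part of arm.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Last 32-bit word of each register-class mask, taken from the hunk.
   The other three words are left out of this sketch.  */
#define VPR_REG_WORD3       0x0400u
#define ALL_REGS_WORD3_OLD  0x000Fu
#define ALL_REGS_WORD3_NEW  0x040Fu

/* Hypothetical helper: true if every bit set in SUB is also set in SUPER.  */
bool
mask_subset_p (uint32_t sub, uint32_t super)
{
  return (sub & ~super) == 0;
}

/* Before the patch the VPR bit is missing from ALL_REGS; after it,
   ALL_REGS is the old mask with the VPR bit added.  */
void
all_regs_check (void)
{
  assert (!mask_subset_p (VPR_REG_WORD3, ALL_REGS_WORD3_OLD));
  assert (mask_subset_p (VPR_REG_WORD3, ALL_REGS_WORD3_NEW));
  assert ((ALL_REGS_WORD3_OLD | VPR_REG_WORD3) == ALL_REGS_WORD3_NEW);
}
```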



[PATCH v3 15/15] arm: Fix constraint check for V8HI in mve_vector_mem_operand

2022-01-13 Thread Christophe Lyon via Gcc-patches
When compiling gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c with
-mthumb -mfloat-abi=hard -march=armv8.1-m.main+mve.fp+fp.dp, the compiler
crashes because:
error: insn does not satisfy its constraints:
(insn 28 14 17 2 (set (reg:V8HI 16 s0 [orig:249 u16 ] [249])
(mem/c:V8HI (pre_modify:SI (reg/f:SI 12 ip [248])
(plus:SI (reg/f:SI 12 ip [248])
(const_int 32 [0x20]))) [1 u16+0 S16 A64])) 
"arm_mve.h":17113:10 3011 {*mve_movv8hi}
(expr_list:REG_INC (reg/f:SI 12 ip [248])
  (nil)))
during RTL pass: reload

We are trying to generate:
vldrh.16 q3, [ip], #14
but the constraint check fails because ip is not a low reg.

This patch replaces LAST_LO_REGNUM by LAST_ARM_REGNUM in
mve_vector_mem_operand and avoids the ICE.

2022-01-13  Christophe Lyon  

gcc/
* config/arm/arm.c (mve_vector_mem_operand): Fix handling of V8HI.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 7d56fa71806..5edca248fb7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13479,7 +13479,7 @@ mve_vector_mem_operand (machine_mode mode, rtx op, bool 
strict)
  case E_V4HImode:
  case E_V4HFmode:
if (val % 2 == 0 && abs (val) <= 254)
- return reg_no <= LAST_LO_REGNUM
+ return reg_no <= LAST_ARM_REGNUM
|| reg_no >= FIRST_PSEUDO_REGISTER;
return FALSE;
  case E_V4SImode:
-- 
2.25.1
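
To see why ip was rejected, the check in the hunk above can be modelled
outside the compiler.  This is a minimal sketch under stated assumptions:
the register numbers follow the usual ARM numbering (r0-r7 low registers,
r0-r15 core registers, ip = r12), SKETCH_FIRST_PSEUDO is a made-up
placeholder rather than GCC's FIRST_PSEUDO_REGISTER, and halfword_addr_ok is
a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum
{
  SKETCH_LAST_LO = 7,		/* r7, last low register.  */
  SKETCH_IP = 12,		/* ip is r12.  */
  SKETCH_LAST_ARM = 15,		/* r15, last core register.  */
  SKETCH_FIRST_PSEUDO = 1000	/* Placeholder value, assumption only.  */
};

/* Mirrors the shape of the halfword case: the offset must be even and
   within +/-254, and the base register must be at most LAST_ALLOWED or
   still be a pseudo.  */
bool
halfword_addr_ok (int reg_no, int val, int last_allowed)
{
  if (val % 2 == 0 && abs (val) <= 254)
    return reg_no <= last_allowed || reg_no >= SKETCH_FIRST_PSEUDO;
  return false;
}

void
halfword_addr_check (void)
{
  /* ip with offset 32, as in the failing insn.  */
  assert (!halfword_addr_ok (SKETCH_IP, 32, SKETCH_LAST_LO));
  assert (halfword_addr_ok (SKETCH_IP, 32, SKETCH_LAST_ARM));
  /* Odd or out-of-range offsets are rejected either way.  */
  assert (!halfword_addr_ok (SKETCH_IP, 33, SKETCH_LAST_ARM));
  assert (!halfword_addr_ok (SKETCH_IP, 256, SKETCH_LAST_ARM));
}
```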



Re: [PATCH] forwprop: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > > +   tree d = build_debug_expr_decl (type);
> > > +   gdebug *g
> > > + = gimple_build_debug_bind (d, build2 (rcode, type,
> > > +   new_lhs, arg),
> > > +stmt2);
> > > +   gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> > > +   replace_uses_by (lhs2, d);
> > 
> > I wonder if you can leave a lhs2 = d; in the IL instead of using
> > replace_uses_by which will process imm uses and fold stmts while
> > we're going to do that anyway in the caller?  That would IMHO
> > be better here.
> 
> I'd need to emit them always for reversible ops and when the
> atomic call can't be last, regardless of whether it is needed or not,
> just so that next DCE would remove those up and emit those debug stmts,
> because otherwise that could result in -fcompare-debug failures
> (at least with -fno-tree-dce -fno-tree-whatever ...).
> And
> + tree narg = build_debug_expr_decl (type);
> + gdebug *g
> +   = gimple_build_debug_bind (narg,
> +  fold_convert (type, arg),
> +  stmt2);
> isn't that much more code compared to
> gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
> Or would you like it to be emitted always, i.e.
> if (atomic_op != BIT_AND_EXPR
>&& atomic_op != BIT_IOR_EXPR
>/* With -fnon-call-exceptions if we can't
>   add stmts after the call easily.  */
>&& !stmt_ends_bb_p (stmt2))
>   {
> tree type = TREE_TYPE (lhs2);
> if (TREE_CODE (arg) == INTEGER_CST)
>   arg = fold_convert (type, arg);
> else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
>   {
> tree narg = make_ssa_name (type);
> gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
> gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> arg = narg;
>   }
> enum tree_code rcode;
> switch (atomic_op)
>   {
>   case PLUS_EXPR: rcode = MINUS_EXPR; break;
>   case MINUS_EXPR: rcode = PLUS_EXPR; break;
>   case BIT_XOR_EXPR: rcode = atomic_op; break;
>   default: gcc_unreachable ();
>   }
> tree d = build_debug_expr_decl (type);
> gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
> gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> lhs2 = NULL_TREE;
>   }
> in between
> update_stmt (use_stmt);
> and
> imm_use_iterator iter;
> and then do the
>  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
>if (use_stmt != cast_stmt)
> with resetting only if (lhs2)
> and similarly release_ssa_name (lhs2) only if (lhs2)?
> I think the usual case is that we emit debug exprs right away,
> not emit something that we want to DCE.
> 
> +   if (atomic_op == BIT_AND_EXPR
> +   || atomic_op == BIT_IOR_EXPR
> +   /* Or with -fnon-call-exceptions if we can't
> +  add debug stmts after the call.  */
> +   || stmt_ends_bb_p (stmt2))
> 
> 
> But now that you mention it, I think I don't handle right the
> case where lhs2 has no debug uses but there is a cast_stmt that has debug
> uses for its lhs.  We'd need to add_debug_temp in that case too and
> add a debug temp.

I'm mostly concerned about the replace_uses_by use.  forwprop
will go over newly emitted stmts and thus the hypothetical added

lhs2 = d;

record the copy and schedule the stmt for removal, substituting 'd'
in each use as it goes along the function and folding them.  It's
a bit iffy (and maybe has unintended side-effects in odd cases)
to trample around and fold stuff behind that flows back.

I'd always vote to simplify the folding code so it's easier to
maintain and not micro-optimize there since it's not going to be
a hot part of the compiler.

Richard.
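
For readers following the thread, the source-level identities that the
canonicalization relies on can be sketched with the GCC __atomic builtins.
This is an illustration only, not code from the patch; the function names
are made up, and unsigned arithmetic is used so that wraparound is well
defined.

```c
#include <assert.h>
#include <stdbool.h>

/* fetch_op followed by the same op gives what op_fetch returns.  */
unsigned
new_via_fetch_add (unsigned *p, unsigned x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

/* op_fetch followed by the reverse op gives what fetch_op returns:
   + is undone by -, and ^ is undone by ^.  */
unsigned
old_via_add_fetch (unsigned *p, unsigned x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST) - x;
}

unsigned
old_via_xor_fetch (unsigned *p, unsigned x)
{
  return __atomic_xor_fetch (p, x, __ATOMIC_SEQ_CST) ^ x;
}

/* & has no reverse op: the old values 12 and 8 both give 8 when ANDed
   with 10, so the old value cannot be recovered from the result.  */
bool
and_is_ambiguous (void)
{
  return (12u & 10u) == (8u & 10u);
}

void
atomic_identities_check (void)
{
  unsigned a = 10, b = 10, c = 12;
  assert (new_via_fetch_add (&a, 5) == 15 && a == 15);
  assert (old_via_add_fetch (&b, 5) == 10 && b == 15);
  assert (old_via_xor_fetch (&c, 10) == 12 && c == (12u ^ 10u));
  assert (and_is_ambiguous ());
}
```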


Re: [PATCH] tree-optimization/96707 - Add relation to unsigned right shift.

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 2:58 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> A quick addition to range ops for
>
> LHS = OP1 >> OP2
>
> if OP1 and OP2 are both >= 0,   then we can register the relation  LHS
> <= OP1   and all the expected good things happen.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
>
> OK for trunk?

OK.

>
> Andrew
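
The relation itself is easy to check by brute force for small values.  A
minimal sketch, assuming int is at least 32 bits wide; shift_never_grows is
a hypothetical name and has nothing to do with the range-ops code.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if (op1 >> op2) <= op1 holds for every 0 <= op1 <= LIMIT
   and every shift count 0 <= op2 < 31.  LIMIT should be small; this is an
   exhaustive check, not a proof.  */
bool
shift_never_grows (int limit)
{
  for (int op1 = 0; op1 <= limit; op1++)
    for (int op2 = 0; op2 < 31; op2++)
      if ((op1 >> op2) > op1)
	return false;
  return true;
}

void
shift_relation_check (void)
{
  assert (shift_never_grows (4096));
}
```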


Re: [PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Richard Biener via Gcc-patches
On Thu, Jan 13, 2022 at 2:59 PM Andrew MacLeod via Gcc-patches
 wrote:
>
> This patch actually addresses a few PRs.
>
> The root PR was 97909.   Ranger context functionality was added to
> fold_const back in early November
> (https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)
>
> The other 2 PRs mentioned (83072 and 83073) partially worked after this,
> but the original patch did not change the result of the query in
> expr_not_equal_to () to a multi-range object.
>
> This patch simply changes the value_range variable in that routine to an
> int_range<5> so we can pick up more precision. This in turn allows us to
> capture all the tests as expected.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.
>
> OK for trunk?

OK (though I wonder why not use int_range_max?)

Thanks,
Richard.

>
> Andrew
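
Why more sub-ranges can help a "not equal to" query is shown by the toy
model below.  It is only a sketch: toy_range is a made-up type holding up to
five inclusive intervals, it does not model GCC's value_range or int_range
classes (and ignores anti-ranges entirely), and collapsing to one interval
stands in for any loss of precision.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_PAIRS 5

struct toy_pair { int lo, hi; };	/* Inclusive bounds.  */
struct toy_range { int n; struct toy_pair p[TOY_MAX_PAIRS]; };

/* True if V lies in one of the sub-ranges of R.  */
bool
toy_contains (const struct toy_range *r, int v)
{
  for (int i = 0; i < r->n; i++)
    if (v >= r->p[i].lo && v <= r->p[i].hi)
      return true;
  return false;
}

/* The query can only answer "definitely not V" if V is outside R.  */
bool
toy_not_equal_to (const struct toy_range *r, int v)
{
  return !toy_contains (r, v);
}

/* Collapse sorted sub-ranges into one interval covering all of them.  */
struct toy_range
toy_collapse (const struct toy_range *r)
{
  struct toy_range one = { 0, { { 0, 0 } } };
  if (r->n > 0)
    {
      one.n = 1;
      one.p[0].lo = r->p[0].lo;
      one.p[0].hi = r->p[r->n - 1].hi;
    }
  return one;
}

void
toy_range_check (void)
{
  struct toy_range multi = { 3, { { 0, 4 }, { 6, 10 }, { 20, 30 } } };
  struct toy_range single = toy_collapse (&multi);
  assert (toy_not_equal_to (&multi, 5));
  assert (toy_not_equal_to (&multi, 15));
  /* The single interval [0, 30] can no longer exclude 5 or 15.  */
  assert (!toy_not_equal_to (&single, 5));
  assert (!toy_not_equal_to (&single, 15));
}
```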


Re: [PATCH] rs6000: Use known constant for GET_MODE_NUNITS and similar

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:40 AM Kewen.Lin  wrote:
>
> Hi David,
>
> on 2022/1/13 上午11:12, David Edelsohn wrote:
> > On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>
> >> Hi,
> >>
> >> This patch is to clean up some code with GET_MODE_UNIT_SIZE or
> >> GET_MODE_NUNITS, which can use known constant instead.
> >
> > I'll let Segher decide, but often the additional code is useful
> > self-documentation instead of magic constants.  Or at least the change
> > requires comments documenting the derivation of the constants
> > currently described by the code itself.
> >
>
> Thanks for the comments, I added some comments as suggested, also removed
> the whole "altivec_vreveti2" since I noticed it's useless, it's not used
> by any built-in functions and even unused in the commit db042e1603db50573.
>
> The updated version has been tested as before.

As we have discussed offline, the comments need to be clarified and expanded.

And the removal of altivec_vreveti2 should be confirmed with Carl
Love, who added the pattern less than a year ago. There may be another
patch planning to use it.

Thanks, David

>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/altivec.md (altivec_vreveti2): Remove.
> * config/rs6000/vsx.md (*vsx_extract_si, 
> *vsx_extract_si_float_df,
> *vsx_extract_si_float_, *vsx_insert_extract_v4sf_p9): Use
> known constant values to simplify code.
> ---
>  gcc/config/rs6000/altivec.md | 25 -
>  gcc/config/rs6000/vsx.md | 12 
>  2 files changed, 8 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index c2312cc1e0f..b7f056f8c60 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -3950,31 +3950,6 @@ (define_expand "altivec_negv4sf2"
>DONE;
>  })
>
> -;; Vector reverse elements
> -(define_expand "altivec_vreveti2"
> -  [(set (match_operand:TI 0 "register_operand" "=v")
> -   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
> - UNSPEC_VREVEV))]
> -  "TARGET_ALTIVEC"
> -{
> -  int i, j, size, num_elements;
> -  rtvec v = rtvec_alloc (16);
> -  rtx mask = gen_reg_rtx (V16QImode);
> -
> -  size = GET_MODE_UNIT_SIZE (TImode);
> -  num_elements = GET_MODE_NUNITS (TImode);
> -
> -  for (j = 0; j < num_elements; j++)
> -for (i = 0; i < size; i++)
> -  RTVEC_ELT (v, i + j * size)
> -   = GEN_INT (i + (num_elements - 1 - j) * size);
> -
> -  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
> -  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
> -operands[1], mask));
> -  DONE;
> -})
> -
>  ;; Vector reverse elements for V16QI V8HI V4SI V4SF
>  (define_expand "altivec_vreve2"
>[(set (match_operand:VEC_K 0 "register_operand" "=v")
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 802db0d112b..d246410880d 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3854,8 +3854,9 @@ (define_insn_and_split  "*vsx_extract_si"
>rtx vec_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4230,8 +4231,9 @@ (define_insn_and_split "*vsx_extract_si_float_df"
>rtx v4si_tmp = operands[3];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4273,8 +4275,9 @@ (define_insn_and_split 
> "*vsx_extract_si_float_"
>rtx df_tmp = operands[4];
>int value;
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -element = GEN_INT (GET_MODE_NUNITS (V4SImode) - 1 - INTVAL (element));
> +element = GEN_INT (3 - INTVAL (element));
>
>/* If the value is in the correct position, we can avoid doing the VSPLT
>   instruction.  */
> @@ -4466,8 +4469,9 @@ (define_insn "*vsx_insert_extract_v4sf_p9"
>  {
>int ele = INTVAL (operands[4]);
>
> +  /* Adjust index for LE element ordering.  */
>if (!BYTES_BIG_ENDIAN)
> -ele = GET_MODE_NUNITS (V4SFmode) - 1 - ele;
> +ele = 3 - ele;
>
>operands[4] = GEN_INT (GET_MODE_SIZE (SFmode) * ele);
>return "xxinsertw %x0,%x2,%4";
> --
> 2.27.0
>
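
The constant in the vsx.md hunks follows from V4SI and V4SF having four
elements, so NUNITS - 1 - element becomes 3 - element.  A small sketch of
that arithmetic, with made-up function names and the element and SFmode
sizes written as plain constants:

```c
#include <assert.h>
#include <stdbool.h>

enum { SKETCH_V4_NUNITS = 4, SKETCH_SF_SIZE = 4 };

/* Generic form, as the code read before the patch.  */
int
le_index_generic (int nunits, int element)
{
  return nunits - 1 - element;
}

/* Form with the known constant for four-element vectors.  */
int
le_index_v4 (int element)
{
  return 3 - element;
}

/* Byte offset as computed for xxinsertw in *vsx_insert_extract_v4sf_p9.  */
int
insert_byte_offset (int ele, bool big_endian)
{
  if (!big_endian)
    ele = 3 - ele;
  return SKETCH_SF_SIZE * ele;
}

void
le_index_check (void)
{
  for (int e = 0; e < SKETCH_V4_NUNITS; e++)
    assert (le_index_generic (SKETCH_V4_NUNITS, e) == le_index_v4 (e));
  assert (insert_byte_offset (0, false) == 12);
  assert (insert_byte_offset (0, true) == 0);
}
```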


Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Richard Earnshaw via Gcc-patches




On 12/01/2022 12:59, Martin Liška wrote:

Hello.

We've got -Wformat-diag for some time and I think we should start using it
in -Werror for GCC bootstrap. The following patch removes last pieces of 
the warning

for ARM target.





> diff --git a/gcc/config/arm/arm-builtins.c 
b/gcc/config/arm/arm-builtins.c

> index 9c645722230..ab5c469b1ba 100644
> --- a/gcc/config/arm/arm-builtins.c
> +++ b/gcc/config/arm/arm-builtins.c
> @@ -3013,7 +3013,7 @@ constant_arg:
> else
>   error_at (EXPR_LOCATION (exp),
> "coproc must be a constant immediate in "
> -  "range [0-%d] enabled with +cdecp<N>",
> +  "range [0-%d] enabled with +cdecp%<N%>",
> ARM_CDE_CONST_COPROC);
>   }
> else

I'm not sure about this hunk.  It changes a literal '<'...'>' into 
quotes.  The text is trying to say you substitute <N> with a digit in 
the range shown.  Closer would be:


 "range [0-%d] enabled with %<+cdecp<N>%>"

The other changes look OK.

R.
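
For context, %< and %> in GCC diagnostic strings mark the start and end of
quoted text, and %% is a literal percent sign.  The toy expander below only
illustrates that idea; it is not GCC's pretty-printer, it handles nothing
but these three directives, and it uses a plain ASCII apostrophe where GCC
may print locale-dependent quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies FMT to OUT (at most OUTSZ bytes including the terminating NUL),
   turning %< and %> into an apostrophe and %% into %.  Returns the number
   of characters written, not counting the NUL.  */
size_t
toy_expand_quotes (const char *fmt, char *out, size_t outsz)
{
  size_t n = 0;
  if (outsz == 0)
    return 0;
  while (*fmt && n + 1 < outsz)
    {
      if (fmt[0] == '%' && (fmt[1] == '<' || fmt[1] == '>'))
	{
	  out[n++] = '\'';
	  fmt += 2;
	}
      else if (fmt[0] == '%' && fmt[1] == '%')
	{
	  out[n++] = '%';
	  fmt += 2;
	}
      else
	out[n++] = *fmt++;
    }
  out[n] = '\0';
  return n;
}

/* Self-check using a message fragment from the patch.  */
void
toy_expand_check (void)
{
  char buf[96];
  toy_expand_quotes ("check the intrinsic %<_mm_rori_pi16%> in code",
		     buf, sizeof buf);
  assert (strcmp (buf, "check the intrinsic '_mm_rori_pi16' in code") == 0);
}
```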


Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

 * common/config/arm/arm-common.c (arm_target_mode): Wrap
 keywords with %<, %> and remove trailing punctuation char.
 (arm_canon_arch_option_1): Likewise.
 (arm_asm_auto_mfpu): Likewise.
 * config/arm/arm-builtins.c (arm_expand_builtin): Likewise.
 * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Likewise.
 (use_vfp_abi): Likewise.
 (aapcs_vfp_is_call_or_return_candidate): Likewise.
 (arm_handle_cmse_nonsecure_entry): Likewise.
 (arm_handle_cmse_nonsecure_call): Likewise.
 (thumb1_md_asm_adjust): Likewise.
---
  gcc/common/config/arm/arm-common.c | 12 +++
  gcc/config/arm/arm-builtins.c  | 50 +++---
  gcc/config/arm/arm.c   | 12 +++
  3 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/gcc/common/config/arm/arm-common.c 
b/gcc/common/config/arm/arm-common.c

index e7e19400263..6a898d8554b 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -286,7 +286,7 @@ arm_target_mode (int argc, const char **argv)

    if (argc % 2 != 0)
  fatal_error (input_location,
- "%%:target_mode_check takes an even number of parameters");
+ "%%:%<target_mode_check%> takes an even number of parameters");

    while (argc)
  {
@@ -295,8 +295,8 @@ arm_target_mode (int argc, const char **argv)
    else if (strcmp (argv[0], "cpu") == 0)
  cpu = argv[1];
    else
-    fatal_error (input_location,
- "unrecognized option passed to %%:target_mode_check");
+    fatal_error (input_location, "unrecognized option passed to %%:"
+ "%>");
    argc -= 2;
    argv += 2;
  }
@@ -662,7 +662,7 @@ arm_canon_arch_option_1 (int argc, const char 
**argv, bool arch_for_multilib)


    if (argc & 1)
  fatal_error (input_location,
- "%%:canon_for_mlib takes 1 or more pairs of parameters");
+ "%%:%<canon_for_mlib%> takes 1 or more pairs of parameters");

    while (argc)
  {
@@ -676,7 +676,7 @@ arm_canon_arch_option_1 (int argc, const char 
**argv, bool arch_for_multilib)

  abi = argv[1];
    else
  fatal_error (input_location,
- "unrecognized operand to %%:canon_for_mlib");
+ "unrecognized operand to %%:%<canon_for_mlib%>");

    argc -= 2;
    argv += 2;
@@ -1032,7 +1032,7 @@ arm_asm_auto_mfpu (int argc, const char **argv)
  arch = argv[1];
    else
  fatal_error (input_location,
- "unrecognized operand to %%:asm_auto_mfpu");
+ "unrecognized operand to %%:%<asm_auto_mfpu%>");
    argc -= 2;
    argv += 2;
  }
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 9c645722230..ab5c469b1ba 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -3013,7 +3013,7 @@ constant_arg:
    else
  error_at (EXPR_LOCATION (exp),
    "coproc must be a constant immediate in "
-  "range [0-%d] enabled with +cdecp<N>",
+  "range [0-%d] enabled with +cdecp%<N%>",
    ARM_CDE_CONST_COPROC);
  }
    else
@@ -3860,60 +3860,60 @@ arm_expand_builtin (tree exp,
    && (imm < 0 || imm > 32))
  {
    if (fcode == ARM_BUILTIN_WRORHI)
-    error ("the range of count should be in 0 to 32.  please check 
the intrinsic _mm_rori_pi16 in code.");
+    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi16%> in code");

    else if (fcode == ARM_BUILTIN_WRORWI)
-    error ("the range of count should be in 0 to 32.  please check 
the intrinsic _mm_rori_pi32 in code.");
+    error ("the range of count should be in 0 to 32; please check 
the intrinsic %<_mm_rori_pi32%> in code");

    else if (fcode == ARM_BUILTIN_WRORH)
-    error ("the r

Re: [vect] PR103997: Fix epilogue mode skipping

2022-01-13 Thread Andre Vieira (lists) via Gcc-patches



On 13/01/2022 14:25, Richard Biener wrote:

On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:


On 13/01/2022 12:36, Richard Biener wrote:

On Thu, 13 Jan 2022, Andre Vieira (lists) wrote:


This time to the list too (sorry for double email)

Hi,

The original patch '[vect] Re-analyze all modes for epilogues', skipped
modes
that should not be skipped since it used the vector mode provided by
autovectorize_vector_modes to derive the minimum VF required for it.
However,
those modes should only really be used to dictate vector size, so instead
this
patch looks for the mode in 'used_vector_modes' with the largest element
size,
and constructs a vector mode with the same size as the current
vector_modes[mode_i]. Since we are using the largest element size the
NUNITs
for this mode is the smallest possible VF required for an epilogue with
this
mode and should thus skip only the modes we are certain can not be used.

Passes bootstrap and regression on x86_64 and aarch64.

Clearly

+ /* To make sure we are conservative as to what modes we skip, we
+should use check the smallest possible NUNITS which would be
+derived from the mode in USED_VECTOR_MODES with the largest
+element size.  */
+ scalar_mode max_elsize_mode = GET_MODE_INNER
(vector_modes[mode_i]);
+ for (vec_info::mode_set::iterator i =
+   first_loop_vinfo->used_vector_modes.begin ();
+ i != first_loop_vinfo->used_vector_modes.end (); ++i)
+   {
+ if (VECTOR_MODE_P (*i)
+ && GET_MODE_SIZE (GET_MODE_INNER (*i))
+ > GET_MODE_SIZE (max_elsize_mode))
+   max_elsize_mode = GET_MODE_INNER (*i);
+   }

can be done once before iterating over the modes for the epilogue.

True, I'll start with QImode instead of the inner of vector_modes[mode_i] too
since we can't guarantee the mode is a VECTOR_MODE_P and it is actually better
too since we can't possibly guarantee the element size of the
USED_VECTOR_MODES is smaller than that of the first vector mode...


Richard maybe knows whether we should take care to look at the
size of the vector mode as well since related_vector_mode when
passed 0 as nunits produces a vector mode with the same size
as vector_modes[mode_i] but not all used_vector_modes may be
of the same size

I suspect that should be fine though, since if we use the largest element size
of all used_vector_modes then that should give us the least possible number
of NUNITS and thus only conservatively skip. That said, that does assume that
no vector mode used may be larger than the size of the loop's vector_mode. Can
I assume that?
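
To make the "largest element size gives the fewest lanes" argument concrete,
here is a small standalone sketch in plain C.  It does not use GCC internals:
max_element_size and min_epilogue_nunits are hypothetical helpers, and sizes
are plain byte counts rather than machine modes.  It also assumes, as asked
above, that every used mode has the same vector size.

```c
#include <stddef.h>

/* Hypothetical stand-in for the search over used_vector_modes: return the
   largest element size (in bytes) among the sizes in use.  Starts from
   1 byte, mirroring the idea of starting from QImode.  */
static unsigned
max_element_size (const unsigned *elt_sizes, size_t n)
{
  unsigned max = 1;
  for (size_t i = 0; i < n; ++i)
    if (elt_sizes[i] > max)
      max = elt_sizes[i];
  return max;
}

/* Fewest lanes a vector of VECTOR_BYTES can have given the element sizes
   in use.  Skipping a mode only when this count is too large is the
   conservative choice.  */
static unsigned
min_epilogue_nunits (unsigned vector_bytes, const unsigned *elt_sizes,
                     size_t n)
{
  return vector_bytes / max_element_size (elt_sizes, n);
}
```

With element sizes of 1, 4 and 8 bytes in use, a 16-byte vector yields 2 lanes
and a 32-byte vector yields 4, so any smaller element size would only raise
the lane count.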

No idea, but I would lean towards a no ;)  I think the loop's vector_mode
doesn't have to match vector_modes[mode_i] either, does it?  At least
autodetected_vector_mode will not be QImode based.

The mode doesn't but both vector modes have to be the same vector size 
surely, I'm not referring to the element size here.
What I was trying to ask was whether all vector modes in 
used_vector_modes had the same vector size as the loop's vector mode (and 
the vector_modes[mode_i] it originated from).



(and you probably also want to exclude
VECTOR_BOOLEAN_TYPE_P from the search?)

Yeah I think so too, thanks!

I keep going back to thinking (as I brought up in the bugzilla ticket), maybe
we ought to only skip if the NUNITS of the vector mode with the same vector
size as vector_modes[mode_i] is larger than first_info_vf, or just don't skip
at all...

The question is how much work we do before realizing the chosen mode
cannot be used because there's not enough iterations?  Maybe we can
improve there easily?
IIUC the VF can change depending on whether we decide to use SLP, so 
really we can only check it after we have determined whether or not to 
use SLP, so either:
* When SLP fully succeeds, so somewhere between the last 'goto again;' 
and return success, but there is very little left to do there

* When SLP fails: here we could save on some work.



Also for targets that for the main loop do not perform cost
comparison (like x86) but have lots of vector modes the previous
mode of operation really made sense (start at next_mode_i or
mode_i when unrolling).
Are you hinting at maybe creating different paths here based on some 
target configurable thing? Could be something we ask vector_costs?




Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 03:37:31PM +, Richard Earnshaw via Gcc-patches 
wrote:
> I'm not sure about this hunk.  It changes a literal '<'...'>' into quotes.
> The text is trying to say you substitute <N> with a digit in the range
> shown.  Closer would be:
> 
>  "range [0-%d] enabled with %<+cdecp<N>%>"

Then perhaps it should be %<+cdecp%>N ?  <N> in between quotes suggests
literal <N>.

Jakub



Re: [PATCH] Fix -Wformat-diag for ARM target.

2022-01-13 Thread Martin Liška

On 1/13/22 16:37, Richard Earnshaw wrote:

  "range [0-%d] enabled with %<+cdecp<N>%>"


Great, this works.

So I'm going to push the commit.

Martin



The other changes look OK.




Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Martin Sebor via Gcc-patches

On 1/13/22 05:55, Richard Sandiford wrote:

Martin Sebor via Gcc-patches  writes:

On 1/12/22 02:02, Martin Liška wrote:

Hello.

We've had -Wformat-diag for some time and I think we should start using it
in -Werror for GCC bootstrap. The following patch removes the last pieces of
the warning for the rs6000 target.

Ready to be installed?
Thanks,
Martin


gcc/ChangeLog:

  * config/rs6000/rs6000-call.c (rs6000_invalid_builtin): Wrap
  keywords and use %qs instead of %<%s%>.
  (rs6000_expand_builtin): Likewise.

gcc/testsuite/ChangeLog:

  * gcc.target/powerpc/bfp/scalar-extract-exp-5.c: Adjust scans in
  testcases.
  * gcc.target/powerpc/bfp/scalar-extract-sig-5.c: Likewise.
  * gcc.target/powerpc/bfp/scalar-insert-exp-11.c: Likewise.
---
   gcc/config/rs6000/rs6000-call.c   | 8 
   .../gcc.target/powerpc/bfp/scalar-extract-exp-5.c | 2 +-
   .../gcc.target/powerpc/bfp/scalar-extract-sig-5.c | 2 +-
   .../gcc.target/powerpc/bfp/scalar-insert-exp-11.c | 2 +-
   4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c
b/gcc/config/rs6000/rs6000-call.c
index c78b8b08c40..becdad73812 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -3307,7 +3307,7 @@ rs6000_invalid_builtin (enum rs6000_gen_builtins
fncode)
    "-mvsx");
     break;
   case ENB_IEEE128_HW:
-  error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
+  error ("%qs requires ISA 3.0 IEEE 128-bit floating-point", name);


The instances of the warning where floating point is at the end
of a message aren't correct.  The warning should be relaxed to
allow unhyphenated floating point as a noun (as discussed briefly
last March:
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/566881.html)


Wouldn't it be fair to say that “floating point” in the message above is
really an adjective modifying an implicit noun?  The floating (decimal)
point doesn't itself have 128 bits.

Like you say in the linked message, we could add an explicit noun too.
But the change seems OK as-is to me.


I agree you could say that too.  I didn't mean what I said as
an objection to the change but more as an observation that it
shouldn't be necessary (and an acknowledgment that I haven't
yet done what I said I'd do).

Martin



Thanks,
Richard




[PATCH] [i386] Fix ICE of unrecognizable insn. [PR target/104001]

2022-01-13 Thread liuhongt via Gcc-patches
For define_insn_and_split "*xor2andn":

1. Refine predicate of operands[0] from nonimmediate_operand to
register_operand.
2. Remove TARGET_AVX512BW from condition to avoid kmov when TARGET_BMI
is not available.
3. Force_reg operands[2].

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
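
For readers who want to check the identity behind this pattern, a minimal
sketch in plain C follows.  The function names are hypothetical and 32-bit
operands are an arbitrary choice; the two expressions are the ones quoted in
the PR target/94790 comment in i386.md.

```c
#include <stdint.h>

/* Form matched by the pattern: a ^ ((a ^ b) & mask).  */
static uint32_t
select_xor_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return a ^ ((a ^ b) & mask);
}

/* Form it is split into: (~mask & a) | (b & mask).  The first operand of
   the | is what an andn instruction computes.  */
static uint32_t
select_andn_form (uint32_t a, uint32_t b, uint32_t mask)
{
  return (~mask & a) | (b & mask);
}
```

Both functions take bits of b where mask is set and bits of a elsewhere, so
they agree for every input.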

gcc/ChangeLog:

PR target/104001
PR target/94790
* config/i386/i386.md (*xor2andn): Refine predicate of
operands[0] from nonimmediate_operand to
register_operand, remove TARGET_AVX512BW from condition,
force_reg operands[2].

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr104001.c: New test.
---
 gcc/config/i386/i386.md  |  6 +++---
 gcc/testsuite/gcc.target/i386/pr104001.c | 21 +
 2 files changed, 24 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104001.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9937643a273..7bd4f24aa07 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10455,7 +10455,7 @@ (define_insn_and_split "*xordi_1_btc"
 
 ;; PR target/94790: Optimize a ^ ((a ^ b) & mask) to (~mask & a) | (b & mask)
 (define_insn_and_split "*xor2andn"
-  [(set (match_operand:SWI248 0 "nonimmediate_operand")
+  [(set (match_operand:SWI248 0 "register_operand")
(xor:SWI248
  (and:SWI248
(xor:SWI248
@@ -10464,8 +10464,7 @@ (define_insn_and_split "*xor2andn"
(match_operand:SWI248 3 "nonimmediate_operand"))
  (match_dup 1)))
 (clobber (reg:CC FLAGS_REG))]
-  "(TARGET_BMI || TARGET_AVX512BW)
-   && ix86_pre_reload_split ()"
+  "TARGET_BMI && ix86_pre_reload_split ()"
   "#"
   "&& 1"
   [(parallel [(set (match_dup 4)
@@ -10486,6 +10485,7 @@ (define_insn_and_split "*xor2andn"
  (clobber (reg:CC FLAGS_REG))])]
 {
   operands[1] = force_reg (<MODE>mode, operands[1]);
+  operands[2] = force_reg (<MODE>mode, operands[2]);
   operands[3] = force_reg (<MODE>mode, operands[3]);
   operands[4] = gen_reg_rtx (<MODE>mode);
   operands[5] = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/testsuite/gcc.target/i386/pr104001.c 
b/gcc/testsuite/gcc.target/i386/pr104001.c
new file mode 100644
index 000..bd85aa7145e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104001.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "kandn" } } */
+/* { dg-final { scan-assembler-times "andn" 1 } } */
+
+int b, c, d;
+int r;
+
+void
+__attribute__((target("bmi")))
+foo ()
+{
+  r = ((b & ~d) | (c & d));
+}
+
+void
+__attribute__((target("avx512bw")))
+bar ()
+{
+  r = ((b & ~d) | (c & d));
+}
-- 
2.18.1



Re: [PATCH] Loop unswitching: support gswitch statements.

2022-01-13 Thread Martin Liška

On 1/6/22 17:30, Martin Liška wrote:

I really welcome that, I've pushed devel/loop-unswitch-support-switches
branch with first changes you pointed out. Feel free to play with the branch.


Hello.

I've just pushed a revision to the branch that introduced a top-level comment.
Feel free to play with the branch once you have spare cycles and we can
return to it next stage1.

Cheers,
Martin


Re: [PATCH] Fix -Wformat-diag for rs6000 target.

2022-01-13 Thread Martin Liška

On 1/13/22 13:55, Richard Sandiford wrote:

Like you say in the linked message, we could add an explicit noun too.
But the change seems OK as-is to me.


May I consider it as an approval of the suggested patch?

Thanks,
Martin


Re: Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Martin Sebor via Gcc-patches

On 1/13/22 03:55, Thomas Schwinge wrote:

Hi!

This has fallen out of (unfinished...) work earlier in the year: pushed
to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
"Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
for OpenACC test cases".


Thanks for the heads up.  If any of these are recent regressions
(either the false negatives or the false positives) it would be
helpful to isolate them to a few representative test cases.
The warning itself hasn't changed much in GCC 12 but regressions
in it could be due to the jump threading changes that it tends to
be sensitive to.

Martin




Regards
  Thomas






[PATCH] i386: Add 16-bit vector modes to xop_pcmov [PR104003]

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/104003
* config/i386/mmx.md (*xop_pcmov_<mode>): Use VI_16_32 mode iterator.

gcc/testsuite/ChangeLog:

PR target/104003
* g++.target/i386/pr103861-1-sse4.C: New test.
* g++.target/i386/pr103861-1-xop.C: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8a8142c8a09..295a132bc46 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2704,11 +2704,11 @@
   [(set_attr "type" "sse4arg")])
 
 (define_insn "*xop_pcmov_<mode>"
-  [(set (match_operand:VI_32 0 "register_operand" "=x")
-(if_then_else:VI_32
-  (match_operand:VI_32 3 "register_operand" "x")
-  (match_operand:VI_32 1 "register_operand" "x")
-  (match_operand:VI_32 2 "register_operand" "x")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x")
+(if_then_else:VI_16_32
+  (match_operand:VI_16_32 3 "register_operand" "x")
+  (match_operand:VI_16_32 1 "register_operand" "x")
+  (match_operand:VI_16_32 2 "register_operand" "x")))]
   "TARGET_XOP"
   "vpcmov\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "sse4arg")])
diff --git a/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C 
b/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C
new file mode 100644
index 000..a07b3ad111d
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr103861-1-sse4.C
@@ -0,0 +1,5 @@
+/* PR target/103861 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse4" } */
+
+#include "pr103861-1.C"
diff --git a/gcc/testsuite/g++.target/i386/pr103861-1-xop.C 
b/gcc/testsuite/g++.target/i386/pr103861-1-xop.C
new file mode 100644
index 000..d65542dc57f
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr103861-1-xop.C
@@ -0,0 +1,5 @@
+/* PR target/103861 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mxop" } */
+
+#include "pr103861-1.C"


Re: [PATCH] rs6000: Fix constraint v with rs6000_constraints[RS6000_CONSTRAINT_v]

2022-01-13 Thread David Edelsohn via Gcc-patches
On Thu, Jan 13, 2022 at 7:28 AM Kewen.Lin  wrote:
>
> on 2022/1/13 11:56 AM, Kewen.Lin via Gcc-patches wrote:
> > on 2022/1/13 11:44 AM, David Edelsohn wrote:
> >> On Wed, Jan 12, 2022 at 10:38 PM Kewen.Lin  wrote:
> >>>
> >>> Hi David,
> >>>
> >>> on 2022/1/13 11:07 AM, David Edelsohn wrote:
> >>>> On Wed, Jan 12, 2022 at 8:56 PM Kewen.Lin  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> This patch is to fix register constraint v with
> >>>>> rs6000_constraints[RS6000_CONSTRAINT_v] instead of ALTIVEC_REGS,
> >>>>> just like some other existing register constraints with
> >>>>> RS6000_CONSTRAINT_*.
> >>>>>
> >>>>> I happened to see this and hope it's not intentional and just
> >>>>> got neglected.
> >>>>>
> >>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> >>>>> powerpc64-linux-gnu P8.
> >>>>>
> >>>>> Is it ok for trunk?
> >>>>
> >>>> Why do you want to make this change?
> >>>>
> >>>> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
> >>>>
> >>>> but all of the patterns that use a "v" constraint are (or should be)
> >>>> protected by TARGET_ALTIVEC, or some final condition that only is
> >>>> active for TARGET_ALTIVEC.  The other constraints are conditionally
> >>>> set because they can be used in a pattern with multiple alternatives
> >>>> where the pattern itself is active but some of the constraints
> >>>> correspond to NO_REGS when some instruction variants for VSX is not
> >>>> enabled.
> >>>>
> >>>
> >>> Good point!  Thanks for the explanation.
> >>>
> >>>> The change isn't wrong, but it doesn't correct a bug and provides no
> >>>> additional benefit nor clarity that I can see.
> >>>>
> >>> The original intention is to make it consistent with the other existing
> >>> register constraints with RS6000_CONSTRAINT_*, otherwise it looks a bit
> >>> weird (like was neglected).  After you clarified above, 
> >>> RS6000_CONSTRAINT_v
> >>> seems useless at all in the current framework.  Do you prefer to remove
> >>> it to avoid any confusions instead?
> >>
> >> It's used in the reg_class, so there may be some heuristic in the GCC
> >> register allocator that cares about the number of registers available
> >> for the target.  rs6000_constraints[RS6000_CONSTRAINT_v] is defined
> >> conditionally, so it seems best to leave it as is.
> >>
> >
> > I may miss something, but I didn't find it's used for the above purposes.
> > If it's best to leave it as is, the proposed patch seems to offer better
> > readability.
>
> Two more inputs for maintainers' decision:
>
> 1) the original proposed patch fixed one "bug" that is:
>
> In function rs6000_debug_reg_global, it tries to print the register class
> for the register constraint:
>
>   fprintf (stderr,
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
>"v  reg_class = %s\n"
>"wa reg_class = %s\n"
>...
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>...
>
> It uses rs6000_constraints[RS6000_CONSTRAINT_v] which is conditionally
> set here:
>
>   /* Add conditional constraints based on various options, to allow us to
>  collapse multiple insn patterns.  */
>   if (TARGET_ALTIVEC)
> rs6000_constraints[RS6000_CONSTRAINT_v] = ALTIVEC_REGS;
>
> But the actual register class for register constraint is hardcoded as
> ALTIVEC_REGS rather than rs6000_constraints[RS6000_CONSTRAINT_v].

I agree that the information is inaccurate, but it is informal
debugging output.  And if Altivec is disabled, the value of the
constraint is irrelevant / garbage.
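
As a toy model of the mismatch being discussed (not the rs6000 code itself:
the enums, table_class_for_v and hardcoded_class_for_v are hypothetical
miniatures), the sketch below shows how a conditionally filled table can
report NO_REGS for "v" while the constraint itself always names
ALTIVEC_REGS.

```c
#include <stdbool.h>

/* Hypothetical miniature of the register classes and constraint table.  */
enum reg_class { NO_REGS, ALTIVEC_REGS };
enum constraint { CONSTRAINT_v, CONSTRAINT_MAX };

static const char *const reg_class_names[] = { "NO_REGS", "ALTIVEC_REGS" };

/* Name used by the debug dump for a given class.  */
static const char *
class_name (enum reg_class c)
{
  return reg_class_names[c];
}

/* Class recorded in the table: only set when the feature is enabled.  */
static enum reg_class
table_class_for_v (bool target_altivec)
{
  enum reg_class constraints[CONSTRAINT_MAX] = { NO_REGS };
  if (target_altivec)
    constraints[CONSTRAINT_v] = ALTIVEC_REGS;
  return constraints[CONSTRAINT_v];
}

/* Class the "v" constraint names: hardcoded, independent of the table.  */
static enum reg_class
hardcoded_class_for_v (void)
{
  return ALTIVEC_REGS;
}
```

In this model the two only disagree when the feature is off, which is the
case where no pattern using "v" should be active anyway.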

>
> 2) Bootstrapped and tested one below patch to remove all the code using
> RS6000_CONSTRAINT_v on powerpc64le-linux-gnu P10 and P9,
> powerpc64-linux-gnu P8 and P7 with no regressions.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 37f07fe5358..3652629c5d0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -2320,7 +2320,6 @@ rs6000_debug_reg_global (void)
>"\n"
>"d  reg_class = %s\n"
>"f  reg_class = %s\n"
> -  "v  reg_class = %s\n"
>"wa reg_class = %s\n"
>"we reg_class = %s\n"
>"wr reg_class = %s\n"
> @@ -2329,7 +2328,6 @@ rs6000_debug_reg_global (void)
>"\n",
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_d]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
> -  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
>reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wr]],
> @@ -2984,11 +2982,6 @@ rs6000_init

[committed] libgfortran: Fix Solaris version file creation [PR104006]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

I forgot to change the gfortran.map-sun goal to gfortran.ver-sun
when changing other spots for the preprocessed version file.

Fixed thusly, committed to trunk as obvious.

2022-01-13  Jakub Jelinek  

PR libfortran/104006
* Makefile.am (gfortran.map-sun): Rename target to ...
(gfortran.ver-sun): ... this.
* Makefile.in: Regenerated.

--- libgfortran/Makefile.am.jj  2022-01-13 17:43:58.685553296 +0100
+++ libgfortran/Makefile.am 2022-01-13 17:44:40.503962317 +0100
@@ -23,7 +23,7 @@ endif
 if LIBGFOR_USE_SYMVER_SUN
 version_arg = -Wl,-M,gfortran.ver-sun
 version_dep = gfortran.ver-sun gfortran.ver
-gfortran.map-sun : gfortran.ver \
+gfortran.ver-sun : gfortran.ver \
$(top_srcdir)/../contrib/make_sunver.pl \
$(libgfortran_la_OBJECTS) $(libgfortran_la_LIBADD)
perl $(top_srcdir)/../contrib/make_sunver.pl \
--- libgfortran/Makefile.in.jj  2022-01-13 17:43:58.687553267 +0100
+++ libgfortran/Makefile.in 2022-01-13 17:44:51.468807363 +0100
@@ -719,7 +719,6 @@ pdfdir = @pdfdir@
 prefix = @prefix@
 program_transform_name = @program_transform_name@
 psdir = @psdir@
-runstatedir = @runstatedir@
 sbindir = @sbindir@
 sharedstatedir = @sharedstatedir@
 srcdir = @srcdir@
@@ -7616,7 +7615,7 @@ uninstall-am: uninstall-cafexeclibLTLIBR
 @libgfor_use_symver_t...@gfortran.ver: $(srcdir)/gfortran.map kinds.inc
 @LIBGFOR_USE_SYMVER_TRUE@  $(EGREP) -v '#(#| |$$)' $< | \
 @LIBGFOR_USE_SYMVER_TRUE@$(PREPROCESS) -P -include config.h -include 
kinds.inc - > $@ || (rm -f $@ ; exit 1)
-@LIBGFOR_USE_SYMVER_SUN_TRUE@@libgfor_use_symver_t...@gfortran.map-sun : 
gfortran.ver \
+@LIBGFOR_USE_SYMVER_SUN_TRUE@@libgfor_use_symver_t...@gfortran.ver-sun : 
gfortran.ver \
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ 
$(top_srcdir)/../contrib/make_sunver.pl \
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ 
$(libgfortran_la_OBJECTS) $(libgfortran_la_LIBADD)
 @LIBGFOR_USE_SYMVER_SUN_TRUE@@LIBGFOR_USE_SYMVER_TRUE@ perl 
$(top_srcdir)/../contrib/make_sunver.pl \

Jakub



Re: [PATCH] tree-optimization/83072 - Allow more precision when querying from fold_const.

2022-01-13 Thread Andrew MacLeod via Gcc-patches

On 1/13/22 10:13, Richard Biener wrote:

On Thu, Jan 13, 2022 at 2:59 PM Andrew MacLeod via Gcc-patches
 wrote:

This patch actually addresses a few PRs.

The root PR was 97909.   Ranger context functionality was added to
fold_const back in early November
(https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583216.html)

The other 2 PRs mentioned (83072 and 83073) partially worked after this,
but the original patch did not change the result of the query in
expr_not_equal_to () to a multi-range object.

This patch simply changes the value_range variable in that routine to an
int_range<5> so we can pick up more precision. This in turn allows us to
capture all the tests as expected.
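
A rough illustration of why more sub-ranges help a "not equal to" query,
written as standalone C rather than with the ranger classes.  struct subrange,
not_equal_to and hull are hypothetical, and the model is simplified: it treats
the less precise representation as one closed interval.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical closed interval [lo, hi]; not GCC's irange API.  */
struct subrange { long lo, hi; };

/* True if VALUE lies outside every sub-range, i.e. the expression is
   provably not equal to VALUE.  */
static bool
not_equal_to (const struct subrange *r, size_t n, long value)
{
  for (size_t i = 0; i < n; ++i)
    if (value >= r[i].lo && value <= r[i].hi)
      return false;
  return true;
}

/* Collapse N sorted sub-ranges into their single-interval hull, which is
   all a one-pair range can represent.  Requires N >= 1.  */
static struct subrange
hull (const struct subrange *r, size_t n)
{
  struct subrange h = { r[0].lo, r[n - 1].hi };
  return h;
}
```

For the set [0, 3] U [10, 20], the multi-range form proves the value is not 5,
while the collapsed hull [0, 20] cannot.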

Bootstrapped on x86_64-pc-linux-gnu with no regressions.

OK for trunk?

OK (though I wonder why not use int_range_max?)

No good reason..  Initially it was just because I wasn't familiar with 
what call chains might end up here, but really, I guess it doesn't matter.


I can change it to int_range_max before committing it.

Andrew



[PATCH v9] rtl: builtins: (not just) rs6000: Add builtins for fegetround, feclearexcept and feraiseexcept [PR94193]

2022-01-13 Thread Raoni Fassina Firmino via Gcc-patches
Changes since v8[8]:
  - Refactored and expanded builtin-feclearexcept-feraiseexcept-2.c
testcase:
+ Use a macro to avoid extended repetition of the core test code.
+ Expanded the test code to check builtins return code.
+ Added more tests to test all valid (standard) exceptions input
  combinations.
+ Updated the header comment to explain why the input must be
  passed as constants.

Changes since v7[7]:
  - Fixed an array indexing bug on fegetround testcase.
  - Fixed typos and spelling mistakes spread throughout the added comments.
  - Reworded header comment/description for fegetround expander.
  - Fixed changelog in the commit message.

This is an update to v8, based on the same review from Segher to
expand the test coverage for feclearexcept and feraiseexcept (That is
also why I am keeping the v8 changelog here, since v8 had no reviews).
Two things to point out are: 1) The use of a macro there instead of a
function: unfortunately the builtins (for rs6000) only expand when the
input is a constant, so a macro is the way to go, and for the same
reason 2) I wanted to simplify the way to test all combinations of
input, but I could not think of a way without making some macro magic
that would be way less readable than listing all combinations by hand.

Tested on top of master (02a8a01bf396e009bfc31e1104c315fd403b4cca)
on the following platforms with no regression:
  - powerpc64le-linux-gnu (Power 9)
  - powerpc64le-linux-gnu (Power 8)
  - powerpc64-linux-gnu (Power 9, with 32 and 64 bits tests)

Documentation changes tested on x86_64-redhat-linux.

==

I'm repeating the "changelog" from past versions here for convenience:

Changes since v6[6] and v5[5]:
  - Based this version on the v5 one.
  - Reworked all builtins back to the way they are in v5 and added the
following changes:
+ Added a test to target libc, only expanding with glibc as the
  target libc.
+ Updated all three expanders header comment to reflect the added
  behavior (fegetround got a full header as it had none).
+ Added extra documentation for the builtins on doc/extend.texi,
  similar to v6 version, but only the introductory paragraph,
  without a dedicated entry for each, since now their behavior and
  signature match the C99 ones.
  - Changed the description for the return operand in the RTL template
of the fegetround expander.  Using "(set )", the same way as
rs6000_mffsl expander (this change was taken from v6).
  - Updated the commit message mentioning the target libc restriction
and updated changelog.

Changes since v5[5]:
  - Reworked all builtins to accept the FE_* macros as parameters and
so be agnostic to libc implementations.  Largely based on
fpclassify.  To that end, there are some new files changed:
+ Change the argument list for the builtins declarations in
  builtins.def
+ Added new types in builtin-types.def to use in the builtins
  declarations.
+ Added extra documentation for the builtins on doc/extend.texi,
  similar to fpclassify.
  - Updated doc/md.texi documentation with the new optab behaviors.
  - Updated comments to the expanders and expand handlers to try to
explain what is going on.
  - Changed the description for the return operand in the RTL template
of the fegetround expander.  Using "(set )", the same way as
rs6000_mffsl expander.
  - Updated testcases with helper macros with the new argument list.

Changes since v4[4]:
  - Fixed more spelling and code style.
  - Add more clarification on comments for feraiseexcept and
feclearexcept expands;

Changes since v3[3]:
  - Fixed fegetround bug on powerpc64 (big endian) that Segher
spotted;

Changes since v2[2]:
  - Added documentation for the new optabs;
  - Remove use of non portable __builtin_clz;
  - Changed feclearexcept and feraiseexcept to accept all 4 valid
flags at the same time and added more tests for that case;
  - Extended feclearexcept and feraiseexcept testcases to match
accepting multiple flags;
  - Fixed builtin-feclearexcept-feraiseexcept-2.c testcase comparison
after feclearexcept tests;
  - Updated commit message to reflect change in feclearexcept and
feraiseexcept from the glibc counterpart;
  - Fixed English spelling and typos;
  - Fixed code-style;
  - Changed subject line tag to make clear it is not just rs6000 code.

Changes since v1[1]:
  - Fixed English spelling;
  - Fixed code-style;
  - Changed match operand predicate in feclearexcept and feraiseexcept;
  - Changed testcase options;
  - Minor changes in test code to be C90 compatible;
  - Other minor changes suggested by Segher;
  - Changed subject line tag (not sure if I tagged correctly or should
include optabs: also)

[1] https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552024.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553297.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557109.html
[4] https://gcc

[PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-13 Thread Andreas Krebbel via Gcc-patches
The cprop_hardreg pass is built around the assumption that accessing a
register in a narrower mode is the same as accessing the lowpart of
the register.  This unfortunately is not true for vector registers on
IBM Z. This caused a miscompile of LLVM with GCC 8.5. The problem
could not be reproduced with upstream GCC unfortunately but we have to
assume that it is latent there. The right fix would require
substantial changes to the cprop pass and is certainly something we
would want for our platform. But since this would not be acceptable
for older GCCs I'll go with what Vladimir proposed in the RedHat BZ
and introduce a hopefully temporary and undocumented target hook to
disable that specific transformation in regcprop.c.

Here is the Red Hat BZ for reference:
https://bugzilla.redhat.com/show_bug.cgi?id=2028609

Bootstrapped and regression-tested on s390x.

Ok?

gcc/ChangeLog:

* target.def (narrow_mode_refers_low_part_p): Add new target hook.
* config/s390/s390.c (s390_narrow_mode_refers_low_part_p):
Implement new target hook for IBM Z.
(TARGET_NARROW_MODE_REFERS_LOW_PART_P): New macro.
* regcprop.c (maybe_mode_change): Disable transformation depending
on the new target hook.
---
 gcc/config/s390/s390.c | 14 ++
 gcc/regcprop.c |  3 ++-
 gcc/target.def | 12 +++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 056002e4a4a..aafc6d63be6 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10488,6 +10488,18 @@ s390_hard_regno_mode_ok (unsigned int regno, 
machine_mode mode)
   return false;
 }
 
+/* Implement TARGET_NARROW_MODE_REFERS_LOW_PART_P.  */
+
+static bool
+s390_narrow_mode_refers_low_part_p (unsigned int regno)
+{
+  if (reg_classes_intersect_p (VEC_REGS, REGNO_REG_CLASS (regno)))
+return false;
+
+  return true;
+}
+
+
 /* Implement TARGET_MODES_TIEABLE_P.  */
 
 static bool
@@ -17472,6 +17484,8 @@ s390_vectorize_vec_perm_const (machine_mode vmode, rtx 
target, rtx op0, rtx op1,
 #undef TARGET_VECTORIZE_VEC_PERM_CONST
 #define TARGET_VECTORIZE_VEC_PERM_CONST s390_vectorize_vec_perm_const
 
+#undef TARGET_NARROW_MODE_REFERS_LOW_PART_P
+#define TARGET_NARROW_MODE_REFERS_LOW_PART_P s390_narrow_mode_refers_low_part_p
 
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/regcprop.c b/gcc/regcprop.c
index 1a9bcf0a1ad..aaf94ad9b51 100644
--- a/gcc/regcprop.c
+++ b/gcc/regcprop.c
@@ -426,7 +426,8 @@ maybe_mode_change (machine_mode orig_mode, machine_mode 
copy_mode,
 
   if (orig_mode == new_mode)
 return gen_raw_REG (new_mode, regno);
-  else if (mode_change_ok (orig_mode, new_mode, regno))
+  else if (mode_change_ok (orig_mode, new_mode, regno)
+  && targetm.narrow_mode_refers_low_part_p (regno))
 {
   int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
   int use_nregs = hard_regno_nregs (copy_regno, new_mode);
diff --git a/gcc/target.def b/gcc/target.def
index 8fd2533e90a..598eea501ff 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5446,6 +5446,16 @@ value that the middle-end intended.",
  bool, (machine_mode from, machine_mode to, reg_class_t rclass),
  hook_bool_mode_mode_reg_class_t_true)
 
+/* This hook is used to work around a problem in regcprop. Hardcoded
+assumptions currently prevent it from working correctly for targets
+where the low part of a multi-word register doesn't align to accessing
+the register with a narrower mode.  */
+DEFHOOK_UNDOC
+(narrow_mode_refers_low_part_p,
+"",
+bool, (unsigned int regno),
+hook_bool_unit_true)
+
 /* Change pseudo allocno class calculated by IRA.  */
 DEFHOOK
 (ira_change_pseudo_allocno_class,
@@ -5949,7 +5959,7 @@ register if floating point arithmetic is not being done.  
As long as the\n\
 floating registers are not in class @code{GENERAL_REGS}, they will not\n\
 be used unless some pattern's constraint asks for one.",
  bool, (unsigned int regno, machine_mode mode),
- hook_bool_uint_mode_true)
+ hook_bool_uint_true)
 
 DEFHOOK
 (modes_tieable_p,
-- 
2.33.1



[PATCH] forwprop, v2: Canonicalize atomic fetch_op op x to op_fetch or vice versa [PR98737]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 04:07:20PM +0100, Richard Biener wrote:
> I'm mostly concerned about the replace_uses_by use.  forwprop
> will go over newly emitted stmts and thus the hypothetical added
> 
> lhs2 = d;
> 
> record the copy and schedule the stmt for removal, substituting 'd'
> in each use as it goes along the function and folding them.  It's
> a bit iffy (and maybe has unintended side-effects in odd cases)
> to trample around and fold stuff behind that flows back.
> 
> I'd always vote to simplify the folding code so it's easier to
> maintain and not micro-optimize there since it's not going to be
> a hot part of the compiler.

Ok.  So like this?

2022-01-13  Jakub Jelinek  

PR target/98737
* tree-ssa-forwprop.c (simplify_builtin_call): Canonicalize
__atomic_fetch_op (p, x, y) op x into __atomic_op_fetch (p, x, y)
and __atomic_op_fetch (p, x, y) iop x into
__atomic_fetch_op (p, x, y).

* gcc.dg/tree-ssa/pr98737-1.c: New test.
* gcc.dg/tree-ssa/pr98737-2.c: New test.
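
For reference, the source-level equivalences behind the canonicalization can
be sketched in plain C as below.  This is not the forwprop code; the wrapper
names are hypothetical, only the + / - pair is shown, and the __atomic
builtins are the documented GCC ones.

```c
/* New value, computed from the fetch-then-op builtin: fetch_op op x.  */
static int
add_via_fetch_add (int *p, int x)
{
  return __atomic_fetch_add (p, x, __ATOMIC_SEQ_CST) + x;
}

/* The same new value straight from the op-then-fetch builtin.  */
static int
add_via_add_fetch (int *p, int x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST);
}

/* Reverse direction: the old value recovered as op_fetch iop x, where
   the inverse of + is -.  */
static int
old_via_add_fetch (int *p, int x)
{
  return __atomic_add_fetch (p, x, __ATOMIC_SEQ_CST) - x;
}
```

The first two wrappers return the same value and leave the same value in
memory; the third returns what __atomic_fetch_add would have returned.  No
such inverse exists for & and |, which is why only +, - and ^ appear in the
reverse direction.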

--- gcc/tree-ssa-forwprop.c.jj  2022-01-11 23:11:23.467275019 +0100
+++ gcc/tree-ssa-forwprop.c 2022-01-13 18:09:50.318625915 +0100
@@ -1241,12 +1241,19 @@ constant_pointer_difference (tree p1, tr
memset (p + 4, ' ', 3);
into
memcpy (p, "abcd   ", 7);
-   call if the latter can be stored by pieces during expansion.  */
+   call if the latter can be stored by pieces during expansion.
+
+   Also canonicalize __atomic_fetch_op (p, x, y) op x
+   to __atomic_op_fetch (p, x, y) or
+   __atomic_op_fetch (p, x, y) iop x
+   to __atomic_fetch_op (p, x, y) when possible (also __sync).  */
 
 static bool
 simplify_builtin_call (gimple_stmt_iterator *gsi_p, tree callee2)
 {
   gimple *stmt1, *stmt2 = gsi_stmt (*gsi_p);
+  enum built_in_function other_atomic = END_BUILTINS;
+  enum tree_code atomic_op = ERROR_MARK;
   tree vuse = gimple_vuse (stmt2);
   if (vuse == NULL)
 return false;
@@ -1448,6 +1455,290 @@ simplify_builtin_call (gimple_stmt_itera
}
}
   break;
+
+ #define CASE_ATOMIC(NAME, OTHER, OP) \
+case BUILT_IN_##NAME##_1:  \
+case BUILT_IN_##NAME##_2:  \
+case BUILT_IN_##NAME##_4:  \
+case BUILT_IN_##NAME##_8:  \
+case BUILT_IN_##NAME##_16: \
+  atomic_op = OP;  \
+  other_atomic \
+   = (enum built_in_function) (BUILT_IN_##OTHER##_1\
+   + (DECL_FUNCTION_CODE (callee2) \
+  - BUILT_IN_##NAME##_1)); \
+  goto handle_atomic_fetch_op;
+
+CASE_ATOMIC (ATOMIC_FETCH_ADD, ATOMIC_ADD_FETCH, PLUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_SUB, ATOMIC_SUB_FETCH, MINUS_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_AND, ATOMIC_AND_FETCH, BIT_AND_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_XOR, ATOMIC_XOR_FETCH, BIT_XOR_EXPR)
+CASE_ATOMIC (ATOMIC_FETCH_OR, ATOMIC_OR_FETCH, BIT_IOR_EXPR)
+
+CASE_ATOMIC (SYNC_FETCH_AND_ADD, SYNC_ADD_AND_FETCH, PLUS_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_SUB, SYNC_SUB_AND_FETCH, MINUS_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_AND, SYNC_AND_AND_FETCH, BIT_AND_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_XOR, SYNC_XOR_AND_FETCH, BIT_XOR_EXPR)
+CASE_ATOMIC (SYNC_FETCH_AND_OR, SYNC_OR_AND_FETCH, BIT_IOR_EXPR)
+
+CASE_ATOMIC (ATOMIC_ADD_FETCH, ATOMIC_FETCH_ADD, MINUS_EXPR)
+CASE_ATOMIC (ATOMIC_SUB_FETCH, ATOMIC_FETCH_SUB, PLUS_EXPR)
+CASE_ATOMIC (ATOMIC_XOR_FETCH, ATOMIC_FETCH_XOR, BIT_XOR_EXPR)
+
+CASE_ATOMIC (SYNC_ADD_AND_FETCH, SYNC_FETCH_AND_ADD, MINUS_EXPR)
+CASE_ATOMIC (SYNC_SUB_AND_FETCH, SYNC_FETCH_AND_SUB, PLUS_EXPR)
+CASE_ATOMIC (SYNC_XOR_AND_FETCH, SYNC_FETCH_AND_XOR, BIT_XOR_EXPR)
+
+#undef CASE_ATOMIC
+
+handle_atomic_fetch_op:
+  if (gimple_call_num_args (stmt2) >= 2 && gimple_call_lhs (stmt2))
+   {
+ tree lhs2 = gimple_call_lhs (stmt2), lhsc = lhs2;
+ tree arg = gimple_call_arg (stmt2, 1);
+ gimple *use_stmt, *cast_stmt = NULL;
+ use_operand_p use_p;
+ tree ndecl = builtin_decl_explicit (other_atomic);
+
+ if (ndecl == NULL_TREE || !single_imm_use (lhs2, &use_p, &use_stmt))
+   break;
+
+ if (gimple_assign_cast_p (use_stmt))
+   {
+ cast_stmt = use_stmt;
+ lhsc = gimple_assign_lhs (cast_stmt);
+ if (lhsc == NULL_TREE
+ || !INTEGRAL_TYPE_P (TREE_TYPE (lhsc))
+ || (TYPE_PRECISION (TREE_TYPE (lhsc))
+ != TYPE_PRECISION (TREE_TYPE (lhs2)))
+ || !single_imm_use (lhsc, &use_p, &use_stmt))
+   {
+ use_stmt = cast_stmt;
+ cast_stmt = NULL;
+  

Patch ping (Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074])

2022-01-13 Thread Jakub Jelinek via Gcc-patches
Hi!

I'd like to ping this patch:

> 2022-01-06  Jakub Jelinek  
> 
>   PR c++/89074
>   * fold-const.c (address_compare): Punt on comparison of address of
>   one object with address of end of another object if
>   folding_initializer.
> 
>   * g++.dg/cpp1y/constexpr-89074-1.C: New test.

Thanks.

Jakub



[PATCH] i386: Cleanup V2QI arithmetic instructions

2022-01-13 Thread Uros Bizjak via Gcc-patches
2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi): Disparage GPR alternative a bit.
Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size.
(negv2qi splitters): Use lowpart_subreg instead of
gen_lowpart to create subreg.
(v2qi3): Disparage GPR alternative a bit.
Disable for TARGET_PARTIAL_REG_STALL unless optimizing for size.
(v2qi3 splitters): Use lowpart_subreg instead of
gen_lowpart to create subreg.
* config/i386/i386.md (*subqi_ext_2): Move.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 9937643a273..bcaaa4993b1 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6905,6 +6905,30 @@
   [(set_attr "type" "alu")
(set_attr "mode" "SI")])
 
+(define_insn "*subqi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (minus:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 2 "register_operand" "Q")
+   (const_int 8)
+   (const_int 8)) 0)) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+  "sub{b}\t{%h2, %h0|%h0, %h2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "QI")])
+
 ;; Subtract with jump on overflow.
 (define_expand "subv4"
   [(parallel [(set (reg:CCO FLAGS_REG)
@@ -6932,30 +6956,6 @@
 operands[4] = gen_rtx_SIGN_EXTEND (mode, operands[2]);
 })
 
-(define_insn "*subqi_ext_2"
-  [(set (zero_extract:SWI248
- (match_operand:SWI248 0 "register_operand" "+Q")
- (const_int 8)
- (const_int 8))
-   (subreg:SWI248
- (minus:QI
-   (subreg:QI
- (zero_extract:SWI248
-   (match_operand:SWI248 1 "register_operand" "0")
-   (const_int 8)
-   (const_int 8)) 0)
-   (subreg:QI
- (zero_extract:SWI248
-   (match_operand:SWI248 2 "register_operand" "Q")
-   (const_int 8)
-   (const_int 8)) 0)) 0))
-  (clobber (reg:CC FLAGS_REG))]
-  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
-   rtx_equal_p (operands[0], operands[1])"
-  "sub{b}\t{%h2, %h0|%h0, %h2}"
-  [(set_attr "type" "alu")
-   (set_attr "mode" "QI")])
-
 (define_insn "*subv4"
   [(set (reg:CCO FLAGS_REG)
(eq:CCO (minus:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 295a132bc46..3d99a5e851b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1633,12 +1633,20 @@
   "TARGET_MMX_WITH_SSE"
   "operands[2] = force_reg (mode, CONST0_RTX (mode));")
 
+(define_expand "neg2"
+  [(set (match_operand:VI_32 0 "register_operand")
+   (minus:VI_32
+ (match_dup 2)
+ (match_operand:VI_32 1 "register_operand")))]
+  "TARGET_SSE2"
+  "operands[2] = force_reg (mode, CONST0_RTX (mode));")
+
 (define_insn "negv2qi2"
   [(set (match_operand:V2QI 0 "register_operand" "=?Q,&Yw")
 (neg:V2QI
  (match_operand:V2QI 1 "register_operand" "0,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  ""
+  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
   "#"
   [(set_attr "isa" "*,sse2")
(set_attr "type" "multi")
@@ -1664,10 +1672,10 @@
  (const_int 8)) 0)) 0))
   (clobber (reg:CC FLAGS_REG))])]
 {
-  operands[3] = gen_lowpart (HImode, operands[1]);
-  operands[2] = gen_lowpart (HImode, operands[0]);
-  operands[1] = gen_lowpart (QImode, operands[1]);
-  operands[0] = gen_lowpart (QImode, operands[0]);
+  operands[3] = lowpart_subreg (HImode, operands[1], V2QImode);
+  operands[2] = lowpart_subreg (HImode, operands[0], V2QImode);
+  operands[1] = lowpart_subreg (QImode, operands[1], V2QImode);
+  operands[0] = lowpart_subreg (QImode, operands[0], V2QImode);
 })
 
 (define_split
@@ -1678,11 +1686,11 @@
   "reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
-   (minus:V4QI (match_dup 0) (match_dup 1)))]
+   (minus:V16QI (match_dup 0) (match_dup 1)))]
 {
-  operands[2] = CONST0_RTX (V4QImode);
-  operands[1] = gen_lowpart (V4QImode, operands[1]);
-  operands[0] = gen_lowpart (V4QImode, operands[0]);
+  operands[2] = CONST0_RTX (V16QImode);
+  operands[1] = lowpart_subreg (V16QImode, operands[1], V2QImode);
+  operands[0] = lowpart_subreg (V16QImode, operands[0], V2QImode);
 })
 
 (define_expand "mmx_3"
@@ -1718,14 +1726,6 @@
(set_attr "type" "mmxadd,sseadd,sseadd")
(set_attr "mode" "DI,TI,TI")])
 
-(define_expand "neg2"
-  [(set (match_operand:VI_32 0 "register_operand"

Ping^4: [PATCH, rs6000 V2] rotate and mask constants [PR94393]

2022-01-13 Thread Pat Haugen via Gcc-patches
Ping.

On 11/22/21 1:38 PM, Pat Haugen via Gcc-patches wrote:
> Updated version of the patch. Changes made from original are updated 
> commentary to hopefully aid readability, no functional changes.
> 
> 
> Implement more two insn constants.  rotate_and_mask_constant covers
> 64-bit constants that can be formed by rotating a 16-bit signed
> constant, rotating a 16-bit signed constant masked on left or right
> (rldicl and rldicr), rotating a 16-bit signed constant masked by
> rldic, and unusual "lis; rldicl" and "lis; rldicr" patterns.  All the
> values possible for DImode rs6000_is_valid_and_mask are covered.
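
To make the first case in the description concrete ("64-bit constants that can be formed by rotating a 16-bit signed constant"), here is a brute-force sketch.  It is not the patch's rotate_and_mask_constant, which is not shown in this message; the function names below are hypothetical and the search simply tries all 64 rotations.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by n bits (0 <= n < 64).
inline std::uint64_t rotl64 (std::uint64_t v, unsigned n)
{
  return n == 0 ? v : (v << n) | (v >> (64 - n));
}

// True if v is the sign-extension of a 16-bit signed value.
inline bool fits_s16 (std::uint64_t v)
{
  std::int64_t s = static_cast<std::int64_t> (v);
  return s >= -32768 && s <= 32767;
}

// Hypothetical helper: does some rotation of c give a 16-bit signed
// constant?  On success *imm is that constant and *rot is the left-rotate
// count that rebuilds c from its sign-extension.
inline bool is_rotated_s16 (std::uint64_t c, std::int16_t *imm, unsigned *rot)
{
  assert (imm != nullptr && rot != nullptr);
  for (unsigned n = 0; n < 64; ++n)
    {
      std::uint64_t r = rotl64 (c, n);
      if (fits_s16 (r))
        {
          *imm = static_cast<std::int16_t> (static_cast<std::int64_t> (r));
          *rot = (64 - n) % 64;
          return true;
        }
    }
  return false;
}
```

A constant that passes this test could in principle be built as a load-immediate followed by one rotate, which is the two-insn shape the description is after; the masked variants (rldicl, rldicr, rldic) are not modelled here.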
> 
> Bootstrapped and regression tested on powerpc64(32/64) and powerpc64le.
> Ok for master?
> 
> -Pat
> 
> 
> 2021-11-22  Alan Modra  
>   Pat Haugen  
> 
>   PR 94393
> gcc/
>   * config/rs6000/rs6000.c (rotate_di, is_rotate_positive_constant,
>   is_rotate_negative_constant, rotate_and_mask_constant): New functions.
>   (num_insns_constant_multi, rs6000_emit_set_long_const): Use it here.
>   * config/rs6000/rs6000.md (*movdi_internal64+1 splitter): Delete.
> gcc/testsuite/
>   * gcc.target/powerpc/rot_cst.h,
>   * gcc.target/powerpc/rot_cst1.c,
>   * gcc.target/powerpc/rot_cst2.c: New tests.



Re: PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/12/22 10:33, David Malcolm wrote:

On Tue, 2022-01-11 at 23:36 -0500, Jason Merrill wrote:

On 1/10/22 16:36, David Malcolm via Gcc-patches wrote:

On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:

On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:

This patch adds a new __attribute__ ((tainted)) to the C/C++
frontends.


Ping for GCC C/C++ maintainers for review of the C/C++ FE parts of this
patch (attribute registration, documentation, the name of the
attribute, etc).

(I believe it's independent of the rest of the patch kit, in that it
could go into trunk without needing the prior patches)

Thanks
Dave


Getting close to end of stage 3 for GCC 12, so pinging this patch
again...

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html


The c-family change is OK.


Thanks.

I'm retesting the patch now, but it now seems to me that
   __attribute__((tainted_args))
would lead to more readable code than:
   __attribute__((tainted))

in that the name "tainted_args" better conveys the idea that all
arguments are under attacker-control (as opposed to the body of the
function or the function pointer being under attacker-control).

Looking at
   https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
we already have some attributes with underscores in their names.

Does this sound good?


Makes sense to me.




Thanks
Dave






It can be used on function decls: the analyzer will treat as tainted
all parameters to the function and all buffers pointed to by parameters
to the function.  Adding this in one place to the Linux kernel's
__SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
having tainted inputs.  This gives additional testing beyond e.g. __user
pointers added by earlier patches - an example of the use of this can be
seen in CVE-2011-2210, where given:

   SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *, buffer,
                   unsigned long, nbytes, int __user *, start,
                   void __user *, arg)

the analyzer will treat the nbytes param as under attacker control, and
can complain accordingly:

taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled value
    ‘nbytes’ as size without upper-bounds checking [CWE-129] [-Wanalyzer-tainted-size]
     69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0)
    | ^~~

Additionally, the patch allows the attribute to be used on field decls:
specifically function pointers.  Any function used as an initializer
for such a field gets treated as tainted.  An example can be seen in
CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
callback of configfs_attribute:

    struct configfs_attribute {
       /* [...snip...] */
       ssize_t (*store)(struct config_item *, const char *, size_t)
         __attribute__((tainted));
       /* [...snip...] */
    };

allows the analyzer to see:

   CONFIGFS_ATTR(gadget_dev_desc_, UDC);

and treat gadget_dev_desc_UDC_store as tainted, so that it complains:

taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled value
    ‘len + 18446744073709551615’ as offset without upper-bounds checking [CWE-823] [-Wanalyzer-tainted-offset]
     33 | if (name[len - 1] == '\n')
    | ^

Similarly, the attribute could be used on the ioctl callback field,
USB device callbacks, network-handling callbacks etc.  This potentially
gives a lot of test coverage with relatively little code annotation, and
without necessarily needing link-time analysis (which -fanalyzer can
only do at present on trivial examples).

I believe this is the first time we've had an attribute on a field.
If that's an issue, I could prepare a version of the patch that
merely allowed it on functions themselves.

As before this currently still needs -fanalyzer-checker=taint (in
addition to -fanalyzer).

gcc/analyzer/ChangeLog:
  * engine.cc: Include "stringpool.h", "attribs.h", and
  "tree-dfa.h".
  (mark_params_as_tainted): New.
  (class tainted_function_custom_event): New.
  (class tainted_function_info): New.
  (exploded_graph::add_function_entry): Handle functions with
  "tainted" attribute.
  (class tainted_field_custom_event): New.
  (class tainted_callback_custom_event): New.
  (class tainted_call_info): New.
  (add_tainted_callback): New.
  (add_any_callbacks): New.
  (exploded_graph::build_initial_worklist): Find callbacks that are
  reachable from global initializers, calling add_any_callbacks on
  them.

gcc/c-family/ChangeLog:
  * c-attribs.c (c_common_attribute_table): Add "tainted".
  (handle_tainted_attribute): New.

gcc/ChangeLog:
  * doc/extend.texi (Function Attributes): Note that
"taint

[PATCH] i386: Introduce V2QImode vectorized shifts [PR103861]

2022-01-13 Thread Uros Bizjak via Gcc-patches
Add V2QImode shift operations and split them to synthesized
double HI/LO QImode operations with integer registers.

Also robustify arithmetic split patterns.
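
As a rough model of what the split produces, the following sketch shifts the two 8-bit lanes of a packed 16-bit value independently, the way the low (%al-style) and high (%ah-style) byte operations would.  The function names are invented for illustration and shift counts are assumed to be in 0..7; this says nothing about the actual RTL in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Two 8-bit lanes packed into 16 bits, standing in for a V2QI value.
inline std::uint16_t pack (std::uint8_t lo, std::uint8_t hi)
{
  return static_cast<std::uint16_t> (lo | (hi << 8));
}

// Left shift of each lane; bits shifted out of a lane are dropped.
inline std::uint16_t shl_v2qi (std::uint16_t v, unsigned n)
{
  assert (n < 8);
  std::uint8_t lo = static_cast<std::uint8_t> ((v & 0xff) << n);
  std::uint8_t hi = static_cast<std::uint8_t> ((v >> 8) << n);
  return pack (lo, hi);
}

// Logical right shift of each lane.
inline std::uint16_t lshr_v2qi (std::uint16_t v, unsigned n)
{
  assert (n < 8);
  std::uint8_t lo = static_cast<std::uint8_t> ((v & 0xff) >> n);
  std::uint8_t hi = static_cast<std::uint8_t> ((v >> 8) >> n);
  return pack (lo, hi);
}

// Arithmetic right shift of each lane.  Relies on >> of a negative value
// being an arithmetic shift, which GCC guarantees and C++20 requires.
inline std::uint16_t ashr_v2qi (std::uint16_t v, unsigned n)
{
  assert (n < 8);
  std::int8_t lo = static_cast<std::int8_t> (v & 0xff);
  std::int8_t hi = static_cast<std::int8_t> (v >> 8);
  return pack (static_cast<std::uint8_t> (lo >> n),
               static_cast<std::uint8_t> (hi >> n));
}
```

The point of doing it lane by lane is that a plain 16-bit shift would leak bits from one lane into the other.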

2022-01-13  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386.md (*ashlqi_ext_2): New insn pattern.
(*qi_ext_2): Ditto.
* config/i386/mmx.md (v2qi):
New insn_and_split pattern.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861.c (shl,ashr,lshr): New tests.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bcaaa4993b1..c2acb1dbd90 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12413,6 +12413,54 @@
(const_string "*")))
(set_attr "mode" "")])
 
+(define_insn "*ashlqi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (ashift:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  switch (get_attr_type (insn))
+{
+case TYPE_ALU:
+  gcc_assert (operands[2] == const1_rtx);
+  return "add{b}\t%h0, %h0";
+
+default:
+  if (operands[2] == const1_rtx
+ && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+   return "sal{b}\t%h0";
+  else
+   return "sal{b}\t{%2, %h0|%h0, %2}";
+}
+}
+  [(set (attr "type")
+ (cond [(and (match_test "TARGET_DOUBLE_WITH_ADD")
+(match_operand 2 "const1_operand"))
+ (const_string "alu")
+  ]
+  (const_string "ishift")))
+   (set (attr "length_immediate")
+ (if_then_else
+   (ior (eq_attr "type" "alu")
+   (and (eq_attr "type" "ishift")
+(and (match_operand 2 "const1_operand")
+ (ior (match_test "TARGET_SHIFT1")
+  (match_test "optimize_function_for_size_p 
(cfun)")
+   (const_string "0")
+   (const_string "*")))
+   (set_attr "mode" "QI")])
+
 ;; See comment above `ashl3' about how this works.
 
 (define_expand "3"
@@ -13143,6 +13191,39 @@
(const_string "0")
(const_string "*")))
(set_attr "mode" "")])
+
+(define_insn "*qi_ext_2"
+  [(set (zero_extract:SWI248
+ (match_operand:SWI248 0 "register_operand" "+Q")
+ (const_int 8)
+ (const_int 8))
+   (subreg:SWI248
+ (any_shiftrt:QI
+   (subreg:QI
+ (zero_extract:SWI248
+   (match_operand:SWI248 1 "register_operand" "0")
+   (const_int 8)
+   (const_int 8)) 0)
+   (match_operand:QI 2 "nonmemory_operand" "cI")) 0))
+  (clobber (reg:CC FLAGS_REG))]
+  "/* FIXME: without this LRA can't reload this pattern, see PR82524.  */
+   rtx_equal_p (operands[0], operands[1])"
+{
+  if (operands[2] == const1_rtx
+  && (TARGET_SHIFT1 || optimize_function_for_size_p (cfun)))
+return "{b}\t%h0";
+  else
+return "{b}\t{%2, %h0|%h0, %2}";
+}
+  [(set_attr "type" "ishift")
+   (set (attr "length_immediate")
+ (if_then_else
+   (and (match_operand 2 "const1_operand")
+   (ior (match_test "TARGET_SHIFT1")
+(match_test "optimize_function_for_size_p (cfun)")))
+   (const_string "0")
+   (const_string "*")))
+   (set_attr "mode" "QI")])
 
 ;; Rotate instructions
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3d99a5e851b..782da220f98 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1657,7 +1657,8 @@
 (neg:V2QI
  (match_operand:V2QI 1 "general_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (strict_low_part (match_dup 0))
   (neg:QI (match_dup 1)))
@@ -1683,7 +1684,8 @@
 (neg:V2QI
  (match_operand:V2QI 1 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && TARGET_SSE2 && reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
(minus:V16QI (match_dup 0) (match_dup 1)))]
@@ -1757,7 +1759,8 @@
  (match_operand:V2QI 1 "general_reg_operand")
  (match_operand:V2QI 2 "general_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (strict_low_part (match_dup 0))
   (plusmi

Re: [COMIITTED] Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

2022-01-13 Thread Jonathan Wakely via Gcc-patches

On 10/01/22 11:45 +, Jonathan Wakely wrote:

CC libstdc++ and Jakub.

On 08/01/22 23:22 -0700, Sandra Loosemore wrote:

I've checked in these tweaks for various testcases that fail on
nios2-elf without an explicit -fdelete-null-pointer-checks option.  This
target is configured to build with that optimization off by default.

-Sandra

commit 04c69d0e61c0f98a010d77a79ab749d5f0aa6b67
Author: Sandra Loosemore 
Date:   Sat Jan 8 22:02:13 2022 -0800

  Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

  nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking
  tests that implicitly depend on that optimization.  Add the option
  explicitly on these tests.

  2022-01-08  Sandra Loosemore  

gcc/testsuite/
* g++.dg/cpp0x/constexpr-compare1.C: Add explicit
-fdelete-null-pointer-checks option.
* g++.dg/cpp0x/constexpr-compare2.C: Likewise.
* g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
* g++.dg/cpp1y/constexpr-94716.C: Likewise.
* g++.dg/cpp1z/constexpr-compare1.C: Likewise.
* g++.dg/cpp1z/constexpr-if36.C: Likewise.
* gcc.dg/init-compare-1.c: Likewise.

libstdc++-v3/
* testsuite/18_support/type_info/constexpr.cc: Add explicit
-fdelete-null-pointer-checks option.


This test should not be doing anything with null pointers. Instead of
working around the error on nios2-elf, I think the front-end needs
fixing.

Maybe something is not being folded early enough for the constexpr
evaluation to work. Jakub?

$ g++ -std=gnu++23 ~/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc -c -fno-delete-null-pointer-checks
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22: error: non-constant condition for static assertion
  49 | static_assert( test01() );
 |~~^~
In file included from /home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:5:
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22:   in 'constexpr' expansion of 'test01()'
/home/jwakely/gcc/12/include/c++/12.0.0/typeinfo:196:19: error: '(((const std::type_info*)(& _ZTIi)) == ((const std::type_info*)(& _ZTIl)))' is not a constant expression
 196 |   return this == &__arg;
 |  ~^


This is now https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104016



[PATCH] PR fortran/103782 - [9/10/11/12 Regression] internal error occurs when overloading intrinsic

2022-01-13 Thread Harald Anlauf via Gcc-patches
Dear all,

there was a regression handling overloaded elemental intrinsics,
leading to an ICE on valid code.  Reported by Urban Jost.

The logic for when we need to scalarize a call to an intrinsic
seems to have been broken during the 9-release.  The attached
patch fixes the ICE and seems to work on the extended testcase
as well as regtests fine on x86_64-pc-linux-gnu.

OK for mainline?  Backport to affected branches?

Thanks,
Harald

From 5b914bef991528aebfe9734b4e7af7bae039e66a Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 13 Jan 2022 21:50:45 +0100
Subject: [PATCH] Fortran: fix ICE overloading elemental intrinsics

gcc/fortran/ChangeLog:

	PR fortran/103782
	* expr.c (gfc_simplify_expr): Adjust logic for when to scalarize a
	call of an intrinsic which may have been overloaded.

gcc/testsuite/ChangeLog:

	PR fortran/103782
	* gfortran.dg/overload_4.f90: New test.
---
 gcc/fortran/expr.c   |  5 ++---
 gcc/testsuite/gfortran.dg/overload_4.f90 | 27 
 2 files changed, 29 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/overload_4.f90

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index a87686d8217..20b88a8ef56 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -2219,10 +2219,9 @@ gfc_simplify_expr (gfc_expr *p, int type)
 	  && gfc_intrinsic_func_interface (p, 1) == MATCH_ERROR)
 	return false;

-  if (p->expr_type == EXPR_FUNCTION)
+  if (p->symtree && (p->value.function.isym || p->ts.type == BT_UNKNOWN))
 	{
-	  if (p->symtree)
-	isym = gfc_find_function (p->symtree->n.sym->name);
+	  isym = gfc_find_function (p->symtree->n.sym->name);
 	  if (isym && isym->elemental)
 	scalarize_intrinsic_call (p, false);
 	}
diff --git a/gcc/testsuite/gfortran.dg/overload_4.f90 b/gcc/testsuite/gfortran.dg/overload_4.f90
new file mode 100644
index 000..43207e358ba
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/overload_4.f90
@@ -0,0 +1,27 @@
+! { dg-do run }
+! { dg-additional-options "-Wno-intrinsic-shadow" }
+! PR fortran/103782 - ICE overloading an intrinsic like dble or real
+! Contributed by Urban Jost
+
+program runtest
+  implicit none
+  interface dble
+ procedure to_double
+  end interface dble
+  interface real
+ procedure floor ! not really FLOOR...
+  end interface real
+  if (any (dble ([10.0d0,20.0d0]) - [10.0d0,20.0d0] /= 0.d0)) stop 1
+  if (any (real ([1.5,2.5])   - [1.5,2.5]   /= 0.0 )) stop 2
+contains
+  elemental function to_double (valuein) result(d_out)
+doubleprecision,intent(in) :: valuein
+doubleprecision:: d_out
+d_out=valuein
+  end function to_double
+  elemental function floor (valuein) result(d_out) ! not really FLOOR...
+real, intent(in) :: valuein
+real :: d_out
+d_out=valuein
+  end function floor
+end program runtest
--
2.31.1



Re: [PATCH] c++: error message for dependent template members [PR70417]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 12/9/21 10:51, Jason Merrill wrote:

On 12/4/21 12:23, Anthony Sharp wrote:

Hi Jason,

Hope you are well. Apologies for not coming back sooner.

 >I'd put it just above the definition of saved_token_sentinel in 
parser.c.


Sounds good, done.

 >Maybe cp_parser_require_end_of_template_parameter_list?  Either way 
is fine.


Even better, have changed it.

 >Hmm, good point; operators that are member functions must be 
non-static,

 >so we couldn't be doing a comparison of the address of the function.

In that case I have set it to return early there.

 >So declarator_p should be true there.  I'll fix that.

Thank you.

 >> +  if (next_token->keyword == RID_TEMPLATE)
 >> +    {
 >> +      /* But at least make sure it's properly formed (e.g. see 
PR19397).  */
 >> +      if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == 
CPP_NAME)

 >> +       return 1;
 >> +
 >> +      return -1;
 >> +    }
 >> +
 >> +  /* Could be a ~ referencing the destructor of a class template. 
 */

 >> +  if (next_token->type == CPP_COMPL)
 >> +    {
 >> +      /* It could only be a template.  */
 >> +      if (cp_lexer_peek_nth_token (parser->lexer, 2)->type == 
CPP_NAME)

 >> +       return 1;
 >> +
 >> +      return -1;
 >> +    }
 >
 >Why don't these check for the < ?

I think perhaps I could have named the function better; instead of
next_token_begins_template_id, how's about next_token_begins_template_name?
That's all I intended to check for.


You're using it to control whether we try to parse a template-id, and 
it's used to initialize variables named looks_like_template_id, so I 
think better to keep the old name.



In the first case, something like "->template some_name" will always be
intended as a template, so no need to check for the <. If there were
default template arguments you could also validly omit the <> completely,
I think (could be wrong).


Or if the template arguments can be deduced, yes:

template <class T> struct A
{
   template <class U> void f(U u);
};

template <class T> void g(A<T> a)
{
   a->template f(42);
}

But 'f' is still not a template-id.

...

Actually, it occurs to me that you might be better off handling this in 
cp_parser_template_name, something like the below, to avoid the complex 
duplicate logic in the id-expression handling.


Note that in this patch I'm using "any_object_scope" as a proxy for "I'm 
parsing an expression", since !is_declaration doesn't work for that; as 
a result, this doesn't handle the static member function template case. 
For that you'd probably still need to pass down a flag so that 
cp_parser_template_name knows it's being called from 
cp_parser_id_expression.


Your patch has a false positive on

template  struct A { };
template  void f()
{
   A();
};

which my patch checks in_template_argument_list_p to avoid, though 
checking any_object_scope also currently avoids it.


What do you think?


I decided that it made more sense to keep the check in 
cp_parser_id_expression like you had it, but I moved it to the end to 
simplify the logic.  Here's what I'm applying, thanks!

From 1978f05716133b934de0fca7c3d64089b62e3e78 Mon Sep 17 00:00:00 2001
From: Anthony Sharp 
Date: Sat, 4 Dec 2021 17:23:22 +
Subject: [PATCH] c++: warning for dependent template members [PR70417]
To: gcc-patches@gcc.gnu.org

Add a helpful warning message for when the user forgets to
include the "template" keyword after ., -> or :: when
accessing a member in a dependent context, where the member is a
template.
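
For reference, a minimal sketch of the situation the new warning targets.  Widget, scaled and twice are made-up names; the code below is the correct form, with the "template" keyword present.

```cpp
#include <cassert>

struct Widget
{
  // A member template, called below with an explicit template argument.
  template <int N> int scaled (int v) const { return v * N; }
};

template <typename T>
int twice (const T &t, int v)
{
  // t is type-dependent, so the 'template' keyword is needed for
  // scaled<2> to be parsed as a template-id; without it the '<' is
  // taken as a less-than operator.
  return t.template scaled<2> (v);
}

// Small usage example.
inline void demo ()
{
  assert (twice (Widget{}, 21) == 42);
}
```

Writing t.scaled<2> (v) instead is the mistake the commit message describes: the parser cannot know that scaled names a template until T is known.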

	PR c++/70417

gcc/c-family/ChangeLog:

	* c.opt: Added -Wmissing-template-keyword.

gcc/cp/ChangeLog:

	* parser.c (cp_parser_id_expression): Handle
	-Wmissing-template-keyword.
	(struct saved_token_sentinel): Add modes to control what happens
	on destruction.
	(cp_parser_statement): Adjust.
	(cp_parser_skip_entire_template_parameter_list): New function that
	skips an entire template parameter list.
	(cp_parser_require_end_of_template_parameter_list): Rename old
	cp_parser_skip_to_end_of_template_parameter_list.
	(cp_parser_skip_to_end_of_template_parameter_list): Refactor to be
	called from one of the above two functions.
	(cp_parser_lambda_declarator_opt)
	(cp_parser_explicit_template_declaration)
	(cp_parser_enclosed_template_argument_list): Adjust.

gcc/ChangeLog:

	* doc/invoke.texi: Documentation for Wmissing-template-keyword.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/variadic-mem_fn2.C: Catch warning about missing
	template keyword.
	* g++.dg/template/dependent-name17.C: New test.
	* g++.dg/template/dependent-name18.C: New test.

Co-authored-by: Jason Merrill 
---
 gcc/doc/invoke.texi   |  33 
 gcc/c-family/c.opt|   4 +
 gcc/cp/parser.c   | 178 +-
 gcc/testsuite/g++.dg/cpp0x/variadic-mem_fn2.C |   1 +
 .../g++.dg/template/dependent-name17.C|  49 +
 .../g++.dg/template/dependent-name18.C|   5 +
 6 files changed, 223 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsui

Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/13/22 04:39, Jakub Jelinek wrote:

Hi!

The changes done to genericize_if_stmt in order to improve
-Wunreachable-code* warning (which Richi didn't actually commit
for GCC 12) are I think fine for normal ifs, but for constexpr if
and consteval if we have two competing warnings.
The problem is that we replace the non-taken clause (then or else)
with void_node and keep the if (cond) { something } else {}
or if (cond) {} else { something }; in the IL.
This helps -Wunreachable-code*, if something can't fallthru but the
non-taken clause can, we don't warn about code after it because it
is still (in theory) reachable.
But if the non-taken branch can't fallthru, we can get false positive
-Wreturn-type warnings (which are enabled by default) if there is
nothing after the if and the taken branch can't fallthru either.
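
A minimal sketch of the shape in question (C++17; the names are invented and this is not the testcase from the patch).  Both clauses return, so control cannot reach the closing brace, and a -Wreturn-type warning there would be a false positive.

```cpp
#include <cassert>

struct Guard { ~Guard () {} };  // non-trivial destructor, as in the PR

template <bool B>
int pick ()
{
  Guard g;
  if constexpr (B)
    return 0;
  else
    return 1;
}  // neither clause falls through, so no warning is wanted here

// Small usage example covering both instantiations.
inline void demo ()
{
  assert (pick<true> () == 0);
  assert (pick<false> () == 1);
}
```

Whichever clause is discarded, the remaining one still ends in a return, which is the property the fall-through analysis needs to see.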


Perhaps we should replace the non-taken clause with 
__builtin_unreachable() instead of void_node?


And/or block_may_fallthru could handle INTEGER_CST op0?


One possibility to fix this is to revert at least temporarily
to the previous behavior for constexpr and consteval if; yes, we
can get false positive -Wunreachable-code* warnings but the warning
isn't present in GCC 12.
The patch below implements that for constexpr if which throws its
clauses very early (either during parsing or during instantiation),
and for consteval if it decides based on block_may_fallthru on the
non-taken (for constant evaluation only) clause - if the non-taken
branch may fallthru, it does what you did in genericize_if_stmt
for consteval if, if it can't fallthru, it uses the older way
of pretending there wasn't an if and just replacing it with the
taken clause.  There are some false positive risks with this though,
block_may_fallthru is optimistic and doesn't handle some statements
at all (like FOR_STMT, WHILE_STMT, DO_STMT - of course handling those
is quite hard).
For constexpr if (but perhaps for GCC 13?) we could try to call
block_may_fallthru before we throw it away and remember the result in
some flag on the IF_STMT, but I am not sure how dangerous it would be to
call it on the discarded stmts.  Or if it is too dangerous e.g. just
remember whether the discarded block of consteval if wasn't present
or was empty, in that case assume fallthru, and otherwise assume
it can't fallthru (-Wunreachable-code possible false positives).

Bootstrapped/regtested on x86_64-linux and i686-linux, if needed,
I can also test the safer variant with just
   if (IF_STMT_CONSTEVAL_P (stmt))
 stmt = else_;
for consteval if.

2022-01-13  Jakub Jelinek  

PR c++/103991
* cp-objcp-common.c (cxx_block_may_fallthru) : For
IF_STMT_CONSTEXPR_P with constant false or true condition only
check if the taken clause may fall through.
* cp-gimplify.c (genericize_if_stmt): For consteval if, revert
to r12-5638^ behavior if then_ block can't fall through.  For
constexpr if, revert to r12-5638^ behavior.

* g++.dg/warn/Wreturn-type-13.C: New test.

--- gcc/cp/cp-objcp-common.c.jj 2022-01-11 23:11:22.091294356 +0100
+++ gcc/cp/cp-objcp-common.c    2022-01-12 17:57:18.232202275 +0100
@@ -313,6 +313,13 @@ cxx_block_may_fallthru (const_tree stmt)
return false;
  
  case IF_STMT:

+  if (IF_STMT_CONSTEXPR_P (stmt))
+   {
+ if (integer_nonzerop (IF_COND (stmt)))
+   return block_may_fallthru (THEN_CLAUSE (stmt));
+ if (integer_zerop (IF_COND (stmt)))
+   return block_may_fallthru (ELSE_CLAUSE (stmt));
+   }
if (block_may_fallthru (THEN_CLAUSE (stmt)))
return true;
return block_may_fallthru (ELSE_CLAUSE (stmt));
--- gcc/cp/cp-gimplify.c.jj 2022-01-11 23:11:22.090294370 +0100
+++ gcc/cp/cp-gimplify.c 2022-01-12 21:22:17.585212804 +0100
@@ -166,8 +166,15 @@ genericize_if_stmt (tree *stmt_p)
   can contain unfolded immediate function calls, we have to discard
   the then_ block regardless of whether else_ has side-effects or not.  */
if (IF_STMT_CONSTEVAL_P (stmt))
-stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
-  void_node, else_);
+{
+  if (block_may_fallthru (then_))
+   stmt = build3 (COND_EXPR, void_type_node, boolean_false_node,
+  void_node, else_);
+  else
+   stmt = else_;
+}
+  else if (IF_STMT_CONSTEXPR_P (stmt))
+stmt = integer_nonzerop (cond) ? then_ : else_;
else
  stmt = build3 (COND_EXPR, void_type_node, cond, then_, else_);
protected_set_expr_location_if_unset (stmt, locus);
--- gcc/testsuite/g++.dg/warn/Wreturn-type-13.C.jj  2022-01-12 21:21:36.567794238 +0100
+++ gcc/testsuite/g++.dg/warn/Wreturn-type-13.C 2022-01-12 21:20:48.487475787 +0100
@@ -0,0 +1,35 @@
+// PR c++/103991
+// { dg-do compile { target c++17 } }
+
+struct S { ~S(); };
+int
+foo ()
+{
+  S s;
+  if constexpr (true)
+return 0;
+  else
+return 1;
+}  // { dg-bogus "control reaches end of non-void 
f

Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/6/22 04:24, Jakub Jelinek wrote:


The following testcase used to be incorrectly accepted.  The match.pd
optimization that uses address_compare punts on folding comparison
of start of one object and end of another one only when those addresses
are cast to integral types, when the comparison is done on pointer types
it assumes undefined behavior and decides to fold the comparison such
that the addresses don't compare equal even when at runtime they
could be equal.
But C++ says it is undefined behavior and so during constant evaluation
we should reject those, so this patch adds !folding_initializer &&
check to that spot.
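
For contrast with the rejected case, here is a small sketch of pointer
comparisons that stay within a single array and are therefore usable in a
constant expression.  The function name within_one_array is made up for
this illustration; it only shows the well-defined side of the distinction
and makes no claim about how GCC diagnoses the cross-object case.

```cpp
#include <cassert>

// Equality comparisons between pointers into one array, including the
// one-past-the-end pointer, are fine in a constant expression.
constexpr bool within_one_array() {
  int a[] = {1, 2};
  const int *first = a;
  const int *last = a + 1;
  const int *end = a + 2;  // one past the end: may be formed and compared
  return first != last && last + 1 == end && first + 2 == end;
}

static_assert(within_one_array(), "same-array comparisons are constant");
```

The case the patch targets differs in that the two addresses belong to
different objects, one at the start and the other one past the end.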

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


Note, address_compare has some special cases, e.g. it assumes that
static vars are never adjacent to automatic vars, which is the case
for the usual layout where automatic vars are on the stack and after
.rodata/.data sections there is heap:
  /* Assume that automatic variables can't be adjacent to global
     variables.  */
  else if (is_global_var (base0) != is_global_var (base1))
    ;
Is it ok that during constant evaluation we don't treat those as undefined
behavior, or shall that be with !folding_initializer && too?


I guess that's undefined as well.


Another special case is:
  if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
      || (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
      || (TREE_CODE (base0) == STRING_CST
          && TREE_CODE (base1) == STRING_CST
          && ioff0 >= 0 && ioff1 >= 0
          && ioff0 < TREE_STRING_LENGTH (base0)
          && ioff1 < TREE_STRING_LENGTH (base1)
          /* This is a too conservative test that the STRING_CSTs
             will not end up being string-merged.  */
          && strncmp (TREE_STRING_POINTER (base0) + ioff0,
                      TREE_STRING_POINTER (base1) + ioff1,
                      MIN (TREE_STRING_LENGTH (base0) - ioff0,
                           TREE_STRING_LENGTH (base1) - ioff1)) != 0))
    ;
  else if (!DECL_P (base0) || !DECL_P (base1))
    return 2;
Here we similarly assume that vars aren't adjacent to string literals
or vice versa.  Do we need to stick !folding_initializer && to those
DECL_P vs. STRING_CST cases?


Seems so.


Though, because of the return 2; for
non-DECL_P that would mean rejecting comparisons like &var == &"foobar"[3]
etc. which ought to be fine, no?  So perhaps we need to watch for
decls. vs. STRING_CSTs like for DECLs whether the address is at the start
or at the end of the string literal or somewhere in between (at least
for folding_initializer)?


Agreed.


And yet another chapter but probably unsolvable is comparison of
string literal addresses.  I think pedantically in C++
&"foo"[0] == &"foo"[0] is undefined behavior, different occurrences of
the same string literals might still not be merged in some implementations.


I disagree; it's unspecified whether string literals are merged, but I 
think the comparison result is well specified depending on that 
implementation behavior.



But constexpr const char *s = "foo"; &s[0] == &s[0] should be well defined,
and we aren't tracking anywhere whether the string literal was the same one
or different (and I think other compilers don't track that either).
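
The well-defined case mentioned above, where every address is derived from
one pointer to a single string literal, can be written out as a small
sketch.  The helper name same_literal_object is an assumption made for this
example.

```cpp
#include <cassert>

// One pointer initialised from a single literal: every address below is
// derived from the same object, so no string merging is involved.
constexpr const char *s = "foo";

constexpr bool same_literal_object() {
  return &s[0] == &s[0] && &s[1] == s + 1 && s + 3 == &s[3];
}

static_assert(same_literal_object(), "addresses derived from one literal");
```

Whether two separate occurrences of "foo" share an address is the part that
depends on the implementation, so the sketch deliberately avoids it.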

2022-01-06  Jakub Jelinek  

PR c++/89074
* fold-const.c (address_compare): Punt on comparison of address of
one object with address of end of another object if
folding_initializer.

* g++.dg/cpp1y/constexpr-89074-1.C: New test.

--- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100
+++ gcc/fold-const.c 2022-01-05 20:34:52.277822349 +0100
@@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty
/* If this is a pointer comparison, ignore for now even
   valid equalities where one pointer is the offset zero
   of one object and the other to one past end of another one.  */
-  else if (!INTEGRAL_TYPE_P (type))
+  else if (!folding_initializer && !INTEGRAL_TYPE_P (type))
  ;
/* Assume that automatic variables can't be adjacent to global
   variables.  */
--- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj   2022-01-05 20:43:03.696917484 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C  2022-01-05 20:42:12.676634044 +0100
@@ -0,0 +1,28 @@
+// PR c++/89074
+// { dg-do compile { target c++14 } }
+
+constexpr bool
+foo ()
+{
+  int a[] = { 1, 2 };
+  int b[] = { 3, 4 };
+
+  if (&a[0] == &b[0])
+return false;
+
+  if (&a[1] == &b[0])
+return false;
+
+  if (&a[1] == &b[1])
+return false;
+
+  if (&a[2] == &b[1])
+return false;
+
+  if (&a[2] == &b[0])  // { dg-error "is not a constant expression" }
+return false;
+
+  return true;
+}
+
+constexpr bool a = foo ();




Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jakub Jelinek via Gcc-patches
On Thu, Jan 13, 2022 at 04:09:22PM -0500, Jason Merrill wrote:
> > The changes done to genericize_if_stmt in order to improve
> > -Wunreachable-code* warning (which Richi didn't actually commit
> > for GCC 12) are I think fine for normal ifs, but for constexpr if
> > and consteval if we have two competing warnings.
> > The problem is that we replace the non-taken clause (then or else)
> > with void_node and keep the if (cond) { something } else {}
> > or if (cond) {} else { something }; in the IL.
> > This helps -Wunreachable-code*, if something can't fallthru but the
> > non-taken clause can, we don't warn about code after it because it
> > is still (in theory) reachable.
> > But if the non-taken branch can't fallthru, we can get false positive
> > -Wreturn-type warnings (which are enabled by default) if there is
> > nothing after the if and the taken branch can't fallthru either.
> 
> Perhaps we should replace the non-taken clause with __builtin_unreachable()
> instead of void_node?

It depends.  If the non-taken clause doesn't exist, is empty or otherwise
can fallthru, then using void_node for it is what we want.
If it exists and can't fallthru, then __builtin_unreachable() is one
possibility, but for all purposes
  if (1)
    something
  else
    __builtin_unreachable();
is equivalent to genericization of it as
  something
and
  if (0)
    __builtin_unreachable();
  else
    something
too.
The main problem is what to do for the consteval if that throws away
the non-taken clause too early, whether we can do block_may_fallthru
already where we throw it away or not.  If we can do that, we could
as right now clear the non-taken clause if it can fallthru and otherwise
either set some flag on the IF_STMT or set the non-taken clause to
__builtin_unreachable or endless empty loop etc., ideally something
as cheap as possible.
 
> And/or block_may_fallthru could handle INTEGER_CST op0?

That is what I'm doing for consteval if in the patch because the info
whether the non-taken clause can fallthru is lost.
We can't do that for normal if, because the non-taken clause could
have labels in it to which something jumps.
But, block_may_fallthru isn't actually what is used for the -Wreturn-type
warning, I think we warn only at cfg creation.
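
To illustrate why a constant condition on a normal if does not make the
non-taken clause dead, here is a minimal sketch in which a goto enters the
body of `if (0)`.  The function name jump_into_dead_branch is invented for
the example.

```cpp
#include <cassert>

// Returns 1 when control entered the body of `if (0)` through the label.
int jump_into_dead_branch(bool take_jump) {
  int entered = 0;
  if (take_jump)
    goto inside;
  if (0) {
  inside:
    entered = 1;  // reachable via the goto despite the constant-false condition
  }
  return entered;
}
```

Calling it with true returns 1 and with false returns 0, so an analysis that
looked only at the taken clause of the plain if would miss the jump target.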

Jakub



Re: Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics for OpenACC test cases

2022-01-13 Thread Thomas Schwinge
Hi Martin!

On 2022-01-13T09:06:16-0700, Martin Sebor  wrote:
> On 1/13/22 03:55, Thomas Schwinge wrote:
>> This has fallen out of (unfinished...) work earlier in the year: pushed
>> to master branch commit 4bd8b1e881f0c26a5103cd1919809b3d63b60ef2
>> "Document current '-Wuninitialized'/'-Wmaybe-uninitialized' diagnostics
>> for OpenACC test cases".
>
> Thanks for the heads up.  If any of these are recent regressions
> (either the false negatives or the false positives) it would be
> helpful to isolate them to a few representative test cases.
> The warning itself hasn't changed much in GCC 12 but regressions
> in it could be due to the jump threading changes that it tends to
> be sensitive to.

Ah, sorry for the ambiguity -- I don't think any of these are recent
regressions.


Regards
 Thomas


[patch, libgfortran, power-ieee128] Add multiple defaults for GFORTRAN_CONVERT_UNIT

2022-01-13 Thread Thomas Koenig via Gcc-patches

Hello world,

with this patch, it is now possible to specify both the
endianness and the REAL(KIND=16) format using the
environment variable GFORTRAN_CONVERT_UNIT.  The following
now works:

koenig@gcc-fortran:~/Tst$ cat write_env.f90
program main
  real(kind=16) :: x
  character (len=30) :: conv
  x = 1/3._16
  open (10,file="out.dat",status="replace",access="stream",form="unformatted")

  inquire(10,convert=conv)
  print *,conv
  write (10) 1/3._16
end program main
tkoenig@gcc-fortran:~/Tst$ gfortran -g -static-libgfortran write_env.f90
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="little_endian;r16_ibm" && ./a.out

 LITTLE_ENDIAN,R16_IBM
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="little_endian;r16_ieee" && ./a.out

 LITTLE_ENDIAN,R16_IEEE
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="big_endian;r16_ieee" && ./a.out

 BIG_ENDIAN,R16_IEEE
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="big_endian;r16_ibm" && ./a.out

 BIG_ENDIAN,R16_IBM

Since the branch has been pushed to trunk, I don't think we need
it any more (or do we?), so OK for trunk?

Best regards

Thomas

Allow for multiple defaults in endianness and r16 in GFORTRAN_CONVERT_UNIT.

With this patch, it is possible to specify multiple defaults in the
GFORTRAN_CONVERT_UNIT environment variable so that, for example, R16_IEEE
and BIG_ENDIAN can be specified together.
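
The idea of accumulating several ';'-separated defaults can be sketched in
isolation as below.  This is not the libgfortran parser: the enum values,
the name parse_defaults and the lower-case-only token matching are
assumptions made for illustration, and the exception list handled by the
real code is left out.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical flag values for illustration; they are not libgfortran's.
enum : int {
  kNone = -1,
  kNative = 0x0,
  kSwap = 0x1,
  kBig = 0x2,
  kLittle = 0x3,
  kR16Ieee = 0x10,
  kR16Ibm = 0x20
};

// Accumulate every ';'-separated default instead of keeping only one.
// Returns kNone for an empty string or an unrecognised token.
int parse_defaults(const std::string &spec) {
  int def = kNone;
  std::istringstream in(spec);
  std::string tok;
  while (std::getline(in, tok, ';')) {
    int flag;
    if (tok == "native") flag = kNative;
    else if (tok == "swap") flag = kSwap;
    else if (tok == "big_endian") flag = kBig;
    else if (tok == "little_endian") flag = kLittle;
    else if (tok == "r16_ieee") flag = kR16Ieee;
    else if (tok == "r16_ibm") flag = kR16Ibm;
    else return kNone;
    // Same shape as the patch: the first value replaces "none",
    // later values are OR-ed in.
    def = (def == kNone) ? flag : (def | flag);
  }
  return def;
}
```

With these made-up values, "big_endian;r16_ieee" yields kBig | kR16Ieee,
which is the combination the single-default parser could not express.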


libgfortran/ChangeLog:

* runtime/environ.c: Allow for multiple default values so that
separate default specifications for IBM long double format and
endianness are possible.

diff --git a/libgfortran/runtime/environ.c b/libgfortran/runtime/environ.c
index 3d60950234d..a53c64965b6 100644
--- a/libgfortran/runtime/environ.c
+++ b/libgfortran/runtime/environ.c
@@ -499,78 +499,79 @@ do_parse (void)
 
   unit_count = 0;
 
-  start = p;
-
   /* Parse the string.  First, let's look for a default.  */
-  tok = next_token ();
   endian = 0;
-
-  switch (tok)
+  while (1)
 {
-case NATIVE:
-  endian = GFC_CONVERT_NATIVE;
-  break;
+  start = p;
+  tok = next_token ();
+  switch (tok)
+	{
+	case NATIVE:
+	  endian = GFC_CONVERT_NATIVE;
+	  break;
 
-case SWAP:
-  endian = GFC_CONVERT_SWAP;
-  break;
+	case SWAP:
+	  endian = GFC_CONVERT_SWAP;
+	  break;
 
-case BIG:
-  endian = GFC_CONVERT_BIG;
-  break;
+	case BIG:
+	  endian = GFC_CONVERT_BIG;
+	  break;
 
-case LITTLE:
-  endian = GFC_CONVERT_LITTLE;
-  break;
+	case LITTLE:
+	  endian = GFC_CONVERT_LITTLE;
+	  break;
 
 #ifdef HAVE_GFC_REAL_17
-case R16_IEEE:
-  endian = GFC_CONVERT_R16_IEEE;
-  break;
+	case R16_IEEE:
+	  endian = GFC_CONVERT_R16_IEEE;
+	  break;
 
-case R16_IBM:
-  endian = GFC_CONVERT_R16_IBM;
-  break;
+	case R16_IBM:
+	  endian = GFC_CONVERT_R16_IBM;
+	  break;
 #endif
-case INTEGER:
-  /* A leading digit means that we are looking at an exception.
-	 Reset the position to the beginning, and continue processing
-	 at the exception list.  */
-  p = start;
-  goto exceptions;
-  break;
+	case INTEGER:
+	  /* A leading digit means that we are looking at an exception.
+	 Reset the position to the beginning, and continue processing
+	 at the exception list.  */
+	  p = start;
+	  goto exceptions;
+	  break;
 
-case END:
-  goto end;
-  break;
+	case END:
+	  goto end;
+	  break;
 
-default:
-  goto error;
-  break;
+	default:
+	  goto error;
+	  break;
 }
 
-  tok = next_token ();
-  switch (tok)
-{
-case ';':
-  def = endian;
-  break;
+  tok = next_token ();
+  switch (tok)
+	{
+	case ';':
+	  def = def == GFC_CONVERT_NONE ? endian : def | endian;
+	  break;
 
-case ':':
-  /* This isn't a default after all.  Reset the position to the
-	 beginning, and continue processing at the exception list.  */
-  p = start;
-  goto exceptions;
-  break;
+	case ':':
+	  /* This isn't a default after all.  Reset the position to the
+	 beginning, and continue processing at the exception list.  */
+	  p = start;
+	  goto exceptions;
+	  break;
 
-case END:
-  def = endian;
-  goto end;
-  break;
+	case END:
+	  def = def == GFC_CONVERT_NONE ? endian : def | endian;
+	  goto end;
+	  break;
 
-default:
-  goto error;
-  break;
+	default:
+	  goto error;
+	  break;
+	}
 }
 
  exceptions:


Re: [PATCH] cprop_hardreg: Workaround for narrow mode != lowpart targets

2022-01-13 Thread Andreas Krebbel via Gcc-patches
On 1/13/22 18:11, Andreas Krebbel via Gcc-patches wrote:
...
> @@ -5949,7 +5959,7 @@ register if floating point arithmetic is not being 
> done.  As long as the\n\
>  floating registers are not in class @code{GENERAL_REGS}, they will not\n\
>  be used unless some pattern's constraint asks for one.",
>   bool, (unsigned int regno, machine_mode mode),
> - hook_bool_uint_mode_true)
> + hook_bool_uint_true)
>  
>  DEFHOOK
>  (modes_tieable_p,

That hunk was a copy and paste bug and does not belong to the patch.

Andreas


[PATCH RFA] diagnostic: avoid repeating include path

2022-01-13 Thread Jason Merrill via Gcc-patches
When a sequence of diagnostic messages bounces back and forth repeatedly
between two includes, as with

 #include 
 std::map m ("123", "456");

The output is quite a bit longer than necessary because we dump the include
path each time it changes.  I'd think we could print the include path once
for each header file, and then expect that the user can look earlier in the
output if they're wondering.
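
The "print once per header" idea amounts to a seen-set lookup.  A minimal
standalone sketch follows, using std::unordered_set rather than GCC's
hash_set; the function name include_already_reported and the choice of a
file-name string as the key are assumptions for this example only.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns true if `header` was already recorded, and records it otherwise.
// Keying on a file-name string is an assumption made for this sketch.
bool include_already_reported(const std::string &header) {
  static std::unordered_set<std::string> seen;
  // insert().second is true only when the element was newly added.
  return !seen.insert(header).second;
}
```

A caller would dump the include stack only when this returns false, which
mirrors the `!includes_seen (map)` test in the patch.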

Tested x86_64-pc-linux-gnu, OK for trunk?

gcc/ChangeLog:

* diagnostic.c (includes_seen): New.
(diagnostic_report_current_module): Use it.
---
 gcc/diagnostic.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 58139427d01..e56441a2dbf 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -700,6 +700,16 @@ set_last_module (diagnostic_context *context, const line_map_ordinary *map)
   context->last_module = map;
 }
 
+/* Only dump the "In file included from..." stack once for each file.  */
+
+static bool
+includes_seen (const line_map_ordinary *map)
+{
+  using hset = hash_set;
+  static hset *set = new hset;
+  return set->add (map);
+}
+
 void
 diagnostic_report_current_module (diagnostic_context *context, location_t where)
 {
@@ -721,7 +731,7 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
   if (map && last_module_changed_p (context, map))
 {
   set_last_module (context, map);
-  if (! MAIN_FILE_P (map))
+  if (! MAIN_FILE_P (map) && !includes_seen (map))
{
  bool first = true, need_inc = true, was_module = MAP_MODULE_P (map);
  expanded_location s = {};

base-commit: b8ffa71e4271ae562c2d315b9b24c4979bbf8227
prerequisite-patch-id: e45065ef320968d982923dd44da7bed07e3326ef
-- 
2.27.0



Re: [PATCH RFA] diagnostic: avoid repeating include path

2022-01-13 Thread David Malcolm via Gcc-patches
On Thu, 2022-01-13 at 17:08 -0500, Jason Merrill wrote:
> When a sequence of diagnostic messages bounces back and forth
> repeatedly
> between two includes, as with
> 
>  #include 
>  std::map m ("123", "456");
> 
> The output is quite a bit longer than necessary because we dump the
> include
> path each time it changes.  I'd think we could print the include path
> once
> for each header file, and then expect that the user can look earlier
> in the
> output if they're wondering.
> 
> Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> gcc/ChangeLog:
> 
> * diagnostic.c (includes_seen): New.
> (diagnostic_report_current_module): Use it.
> ---
>  gcc/diagnostic.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
> index 58139427d01..e56441a2dbf 100644
> --- a/gcc/diagnostic.c
> +++ b/gcc/diagnostic.c
> @@ -700,6 +700,16 @@ set_last_module (diagnostic_context *context,
> const line_map_ordinary *map)
>    context->last_module = map;
>  }
>  
> +/* Only dump the "In file included from..." stack once for each
> file.  */
> +
> +static bool
> +includes_seen (const line_map_ordinary *map)
> +{
> +  using hset = hash_set;
> +  static hset *set = new hset;
> +  return set->add (map);
> +}

Overall, I like the idea, but...

- the patch works at the level of line_map_ordinary instances, rather
than header files.  There are various ways in which a single header
file can have multiple line maps e.g. due to very long lines, or
including another file, etc.  I think it makes sense to do it at the
per-file level, assuming we aren't in a horrible situation where a
header is being included repeatedly, with different effects.  So maybe
this ought to look at what include directive led to this map, i.e.
looking at the ord_map->included_from field, and having a
hash_set ?

- there's no test coverage, but it's probably not feasible to write
DejaGnu tests for this, given the way prune.exp's prune_gcc_output
strips these strings.  Maybe a dg directive to selectively disable the
pertinent pruning operations in prune_gcc_output???  Gah...

- global state is a pet peeve of mine; can the above state be put
inside the diagnostic_context instead?   (perhaps via a pointer to a
wrapper class to avoid requiring all users of diagnostic.h to include
hash-set.h?).
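
One possible shape for the first and third suggestions, sketched outside of
GCC: the seen-set lives in the context object behind a pointer to a wrapper
class, and it is keyed on the include point rather than on the line map.
All names here (toy_context, toy_seen_includes, the unsigned key standing in
for an included_from location) are hypothetical, and std::unordered_set is
used in place of hash_set.

```cpp
#include <cassert>
#include <memory>
#include <unordered_set>

// In the "header": only a forward declaration, so users of the context
// type need not see the container header.
class toy_seen_includes;

struct toy_context {
  toy_context();
  ~toy_context();
  // Returns true if this include point was reported before on this context.
  bool include_already_reported(unsigned included_from);
  std::unique_ptr<toy_seen_includes> seen;
};

// In the "implementation file": the wrapper and the members that need it.
class toy_seen_includes {
public:
  bool add(unsigned key) { return !keys_.insert(key).second; }
private:
  std::unordered_set<unsigned> keys_;
};

toy_context::toy_context() : seen(new toy_seen_includes) {}
toy_context::~toy_context() = default;

bool toy_context::include_already_reported(unsigned included_from) {
  return seen->add(included_from);
}
```

Because the set belongs to the context, two contexts track their reported
includes independently, which a function-local static cannot do.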

Hope this is constructive
Dave

> +
>  void
>  diagnostic_report_current_module (diagnostic_context *context,
> location_t where)
>  {
> @@ -721,7 +731,7 @@ diagnostic_report_current_module
> (diagnostic_context *context, location_t where)
>    if (map && last_module_changed_p (context, map))
>  {
>    set_last_module (context, map);
> -  if (! MAIN_FILE_P (map))
> +  if (! MAIN_FILE_P (map) && !includes_seen (map))
> {
>   bool first = true, need_inc = true, was_module =
> MAP_MODULE_P (map);
>   expanded_location s = {};
> 
> base-commit: b8ffa71e4271ae562c2d315b9b24c4979bbf8227
> prerequisite-patch-id: e45065ef320968d982923dd44da7bed07e3326ef




Re: [PATCH] c++: Avoid some -Wreturn-type false positives with const{expr,eval} if [PR103991]

2022-01-13 Thread Jason Merrill via Gcc-patches

On 1/13/22 16:23, Jakub Jelinek wrote:

On Thu, Jan 13, 2022 at 04:09:22PM -0500, Jason Merrill wrote:

The changes done to genericize_if_stmt in order to improve
-Wunreachable-code* warning (which Richi didn't actually commit
for GCC 12) are I think fine for normal ifs, but for constexpr if
and consteval if we have two competing warnings.
The problem is that we replace the non-taken clause (then or else)
with void_node and keep the if (cond) { something } else {}
or if (cond) {} else { something }; in the IL.
This helps -Wunreachable-code*, if something can't fallthru but the
non-taken clause can, we don't warn about code after it because it
is still (in theory) reachable.
But if the non-taken branch can't fallthru, we can get false positive
-Wreturn-type warnings (which are enabled by default) if there is
nothing after the if and the taken branch can't fallthru either.


Perhaps we should replace the non-taken clause with __builtin_unreachable()
instead of void_node?


It depends.  If the non-taken clause doesn't exist, is empty or otherwise
can fallthru, then using void_node for it is what we want.
If it exists and can't fallthru, then __builtin_unreachable() is one
possibility, but for all purposes
   if (1)
 something
   else
 __builtin_unreachable();
is equivalent to genericization of it as
   something
and
   if (0)
 __builtin_unreachable();
   else
 something
too.
The main problem is what to do for the consteval if that throws away
the non-taken clause too early, whether we can do block_may_fallthru
already where we throw it away or not.  If we can do that, we could
as right now clear the non-taken clause if it can fallthru and otherwise
either set some flag on the IF_STMT or set the non-taken clause to
__builtin_unreachable or endless empty loop etc., ideally something
as cheap as possible.
  

And/or block_may_fallthru could handle INTEGER_CST op0?


That is what I'm doing for consteval if in the patch because the info
whether the non-taken clause can fallthru is lost.
We can't do that for normal if, because the non-taken clause could
have labels in it to which something jumps.
But, block_may_fallthru isn't actually what is used for the -Wreturn-type
warning, I think we warn only at cfg creation.


Fair enough.  The patch is OK.

Jason


