[PATCH] c++: improve diagnostic of 'return's in coroutines

2024-08-07 Thread Arsen Arsenović
Enlarging the function-specific data block is not great.  I've
considered changing the location of RETURN_STMT expressions to cover
everything from the return expression to input_location after parsing
the returned expr.  The result of that is:

test.cc:38:3: error: a ‘return’ statement is not allowed in coroutine; did you mean ‘co_return’?
   38 |   return {};
      |   ^
test.cc:37:3: note: function was made a coroutine here
   37 |   co_return;
      |   ^

... so, not bad, but I'm not sure how intrusive such a change would be
(haven't tried the testsuite).  The current patch produces:

test.cc:36:3: error: a ‘return’ statement is not allowed in coroutine; did you mean ‘co_return’?
   36 |   return {};
      |   ^~
test.cc:35:3: note: function was made a coroutine here
   35 |   co_return;
      |   ^


Is there a better location to use here or is the current (latter) one
OK?  I haven't managed to find a nicer existing one.  We also can't
stash it in coroutine_info because a function might not have that at
the time we parse a return.
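
To illustrate the ordering problem, here is a minimal standalone sketch of the
"first one wins" recording, not the GCC code itself.  The names loc_t,
unknown_loc, function_state, note_return and note_coro_keyword are
hypothetical stand-ins for location_t, UNKNOWN_LOCATION and the per-function
state; the point is only that a 'return' may be seen before any coroutine
keyword, so its location has to live somewhere that exists for every function.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's location_t and UNKNOWN_LOCATION.
using loc_t = unsigned;
constexpr loc_t unknown_loc = 0;

// Simplified per-function state, loosely modelled on the first_return_loc
// field that the patch adds to language_function.
struct function_state {
  loc_t first_return_loc = unknown_loc;
  loc_t first_coro_keyword = unknown_loc;
};

// Called for each 'return' token: only the first location is remembered.
inline void note_return(function_state &fn, loc_t where) {
  assert(where != unknown_loc);
  if (fn.first_return_loc == unknown_loc)
    fn.first_return_loc = where;
}

// Called for each co_await/co_yield/co_return: again, the first one wins.
inline void note_coro_keyword(function_state &fn, loc_t where) {
  assert(where != unknown_loc);
  if (fn.first_coro_keyword == unknown_loc)
    fn.first_coro_keyword = where;
}

// The diagnostic applies when both kinds of statement were seen,
// regardless of which one came first.
inline bool mixes_return_and_coroutine(const function_state &fn) {
  return fn.first_return_loc != unknown_loc
         && fn.first_coro_keyword != unknown_loc;
}
```

With this shape, the error would be reported at first_return_loc and the note
at first_coro_keyword, matching the two locations in the output above.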

Tested on x86_64-pc-linux-gnu.

Have a lovely evening.
-- >8 --
We now point out why a function is a coroutine.

gcc/cp/ChangeLog:

* coroutines.cc (coro_function_valid_p): Change how we diagnose
returning coroutines.
* cp-tree.h (struct language_function): Add first_return_loc
field.  Tracks the location of the first return encountered
during parsing.
(current_function_first_return_loc): New macro.  Expands to the
current function's first_return_loc.
* parser.cc (cp_parser_jump_statement): If parsing a RID_RETURN,
save its location to current_function_first_return_loc.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-return-syntax-08-bad-return.C: Update to
match new diagnostic.
---
 gcc/cp/coroutines.cc  | 14 +++--
 gcc/cp/cp-tree.h  |  6 +++
 gcc/cp/parser.cc  |  4 ++
 .../co-return-syntax-08-bad-return.C  | 52 +--
 4 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 0f4dc42ec1c8..f32c7a2eec8d 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -968,11 +968,15 @@ coro_function_valid_p (tree fndecl)
 
   if (current_function_returns_value || current_function_returns_null)
 {
-   /* TODO: record or extract positions of returns (and the first coro
- keyword) so that we can add notes to the diagnostic about where
- the bad keyword is and what made the function into a coro.  */
-  error_at (f_loc, "a %<return%> statement is not allowed in coroutine;"
-   " did you mean %<co_return%>?");
+  coroutine_info *coro_info = get_or_insert_coroutine_info (fndecl);
+  auto retloc = current_function_first_return_loc;
+  gcc_checking_assert (retloc && coro_info->first_coro_keyword);
+
+  auto_diagnostic_group diaggrp;
+  error_at (retloc, "a %<return%> statement is not allowed in coroutine;"
+   " did you mean %<co_return%>?");
+  inform (coro_info->first_coro_keyword,
+ "function was made a coroutine here");
   return false;
 }
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 911d1d7924cc..68c681150a1f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2123,6 +2123,8 @@ struct GTY(()) language_function {
   tree x_vtt_parm;
   tree x_return_value;
 
+  location_t first_return_loc;
+
   BOOL_BITFIELD returns_value : 1;
   BOOL_BITFIELD returns_null : 1;
   BOOL_BITFIELD returns_abnormally : 1;
@@ -2217,6 +2219,10 @@ struct GTY(()) language_function {
 #define current_function_return_value \
   (cp_function_chain->x_return_value)
 
+/* Location of the first 'return' stumbled upon during parsing.  */
+
+#define current_function_first_return_loc cp_function_chain->first_return_loc
+
 /* In parser.cc.  */
 extern tree cp_literal_operator_id (const char *);
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index eb102dea8299..6cfe42f3bdd6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -14957,6 +14957,10 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
  AGGR_INIT_EXPR_MUST_TAIL (ret_expr) = musttail_p;
else
  set_musttail_on_return (expr, token->location, musttail_p);
+
+   /* Save where we saw this keyword.  */
+   if (current_function_first_return_loc == UNKNOWN_LOCATION)
+ current_function_first_return_loc = token->location;
  }
 
/* Build the return-statement, check co-return first, since type
diff --git a/gcc/testsuite/g++.dg/coroutines/co-return-syntax-08-bad-return.C b/gcc/testsuite/g++.dg/coroutines/co-return-syntax-08-bad-return.C
index 148ee4543e87..1e5d9e7a65a1 100644
--- a/gcc/testsuite/g++.dg/coroutines/co-return-syntax-08-bad-return.C
+++ b/gcc/testsuite/g++.dg/coroutines/co-return-syntax-08

Re: [PATCH v2] c++/modules: Handle instantiating qualified template friend classes [PR115801]

2024-08-07 Thread Jason Merrill

On 8/7/24 7:45 PM, Nathaniel Shead wrote:

On Wed, Aug 07, 2024 at 01:44:31PM -0400, Jason Merrill wrote:

On 8/6/24 2:35 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Another potential approach would be to go searching for this unexported
type and load it, either with a new LOOK_want::ANY_REACHABLE flag or by
expanding on the lookup_imported_temploid_friend hack.  I'm still not
exactly sure how name lookup for template friends is supposed to behave
though, specifically in terms of when and where they should conflict
with other entities with the same name.


CWG2588 tried to clarify this in https://eel.is/c++draft/basic#link-4.8 --
if there's a matching reachable declaration, the friend refers to it even if
it isn't visible to name lookup.

It seems like an oversight that the new second bullet refers specifically to
functions, it seems natural for it to apply to classes as well.

So, they correspond but do not conflict because they declare the same
entity.



Right, makes sense.  OK, I'll work on filling out our testcases to make
sure that we cover everything under that interpretation and potentially
come back to making an ANY_REACHABLE flag for this.


The relevant paragraphs seem to be https://eel.is/c++draft/temp.friend#2
and/or https://eel.is/c++draft/dcl.meaning.general#2.2.2, in addition to
the usual rules in [basic.link] and [basic.scope.scope], but how these
all are supposed to interact isn't super clear to me right now.

Additionally I wonder if maybe the better approach long-term would be to
focus on getting textual redefinitions working first, and then reuse
whatever logic we build for that to handle template friends rather than
relying on finding these hidden 'mergeable' slots first.


I have a WIP patch to allow textual redefinitions by teaching
duplicate_decls that it's OK to redefine an imported GM entity, so
check_module_override works.

My current plan is to then just token-skip the bodies.  This won't diagnose
ODR problems, but our module merging doesn't do that consistently either.


@@ -11800,6 +11800,15 @@ tsubst_friend_class (tree friend_tmpl, tree args)
 input_location = saved_input_location;
}
   }
+  else if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
+  <= TMPL_ARGS_DEPTH (args))


This condition seems impossible normally; it's only satisfied in this
testcase because friend_tmpl doesn't actually represent the friend
declaration, it's already the named class template.  So the substitution in
the next else fails because it was done already.

If this condition is true, we could set tmpl = friend_tmpl earlier, and skip
doing name lookup at all.

It's interesting that the previous if does the same depth comparison, and
that dates back to 2002; I wonder why it was needed then?

Jason



Ah right, I see.  I think the depth comparison in the previous if
actually is for exactly the same reason, just for the normal case when
the template *is* found by name lookup, e.g.

   template <typename> struct A {};
   template <typename> struct B {
     template <typename> friend struct ::A;
   };

This is g++.dg/template/friend5.  Here's an updated patch which is so
far very lightly tested, OK for trunk if full bootstrap+regtest
succeeds?
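
For readers who want to try the shape of that example outside the testsuite,
here is a self-contained sketch of a qualified template friend class.  It is
not the contents of g++.dg/template/friend5.C; the members secret and peek and
the helper read_secret_via_friend are hypothetical additions that only show
that every specialization of ::A is befriended by every specialization of B.

```cpp
#include <cassert>

// The qualified name ::A in the friend declaration must already be declared.
template <typename T> struct A;

template <typename T> struct B {
  // Befriends every specialization of the existing template ::A.  Because
  // the name is qualified, this names ::A rather than declaring a new class.
  template <typename U> friend struct ::A;

  explicit B(int v) : secret(v) {}

private:
  int secret;
};

template <typename T> struct A {
  // Allowed only because A<T> is a friend of B<int>.
  static int peek(const B<int> &b) { return b.secret; }
};

// Small helper exercising the friendship from an unrelated specialization.
inline int read_secret_via_friend() {
  B<int> b(42);
  int seen = A<char>::peek(b);
  assert(seen == 42);
  return seen;
}
```

The friend declaration here has one level of template parameters of its own,
while B<int> supplies one level of arguments, which is the situation the depth
comparison is looking at.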

-- >8 --

With modules it may be the case that a template friend class provided
with a qualified name is not found by name lookup at instantiation time,
due to the class not being exported from its module.  This causes issues
in tsubst_friend_class which did not handle this case.

This is a more general issue, in fact, caused by the named friend class
not actually requiring tsubsting.  This was already worked around for
the "found by name lookup" case (g++.dg/template/friend5.C), but it
looks like there's no need to do name lookup at all for this to work.

PR c++/115801

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_class): Return the type directly when no
tsubsting is required.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-16_a.C: New test.
* g++.dg/modules/tpl-friend-16_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc  | 39 ++
  .../g++.dg/modules/tpl-friend-16_a.C  | 40 +++
  .../g++.dg/modules/tpl-friend-16_b.C  | 17 
  3 files changed, 79 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2db59213c54..ea00577fd37 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -11732,6 +11732,15 @@ tsubst_friend_class (tree friend_tmpl, tree args)
return TREE_TYPE (tmpl);
  }
  
+  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))

+  <= TMPL_ARGS_DEPTH (args))
+/* The template has already been subsituted, e.g. for


"substituted"

OK with that typo fix.

Jason



Re: [PATCH] c++/modules: Fix merging of GM entities in partitions [PR114950]

2024-08-07 Thread Jason Merrill

On 8/7/24 7:22 PM, Nathaniel Shead wrote:

On Wed, Aug 07, 2024 at 04:18:47PM -0400, Jason Merrill wrote:

On 8/5/24 9:16 AM, Nathaniel Shead wrote:

Bootstrapped and regtested (so far just modules.exp) on
x86_64-pc-linux-gnu, OK for trunk if full regtest passes?


OK.


@@ -11316,6 +11319,7 @@ trees_in::key_mergeable (int tag, merge_kind mk, tree decl, tree inner,
  case NAMESPACE_DECL:
if (is_attached
+   && !is_imported_temploid_friend


How can a namespace be an imported temploid friend?


Cut off by context, but this is

switch (TREE_CODE (container))
  {
  default:
gcc_unreachable ();

  case NAMESPACE_DECL:
if (is_attached
&& !is_imported_temploid_friend
&& !(state->is_module () || state->is_partition ()))
  kind = "unique";

i.e. the NAMESPACE_DECL is referring to the container that the decl is
attached to for merging purposes.


Oops, yes, I figured that out but forgot to delete that comment.  :)


&& !(state->is_module () || state->is_partition ()))
  kind = "unique";
else
@@ -11347,7 +11351,9 @@ trees_in::key_mergeable (int tag, merge_kind mk, tree decl, tree inner,
break;
  case TYPE_DECL:
-   if (is_attached && !(state->is_module () || state->is_partition ())
+   if (is_attached
+   && !is_imported_temploid_friend


This is the one that may perhaps be unnecessary (on thinking over this
again I would expect any class-scope friends to not be redeclared
outside of their named module, even for imported templates?), so I'll
actually re-test this patch without this hunk.


Sounds good.

Jason



Re: [PATCH] c++: improve diagnostic of 'return's in coroutines

2024-08-07 Thread Jason Merrill

On 8/7/24 7:31 PM, Arsen Arsenović wrote:

Enlarging the function-specific data block is not great.


Indeed, I think it would be better to search DECL_SAVED_TREE for a 
RETURN_STMT once we've decided to give an error.
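
A rough standalone sketch of that idea, assuming a toy statement tree rather
than GCC's real tree nodes: the types kind and node and the helpers make_node
and find_first_return are hypothetical, and in GCC the search would presumably
be a tree walk over DECL_SAVED_TREE instead.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical statement kinds for a toy function body.
enum class kind { block, expr_stmt, return_stmt, co_return_stmt };

struct node {
  kind k;
  unsigned loc;  // hypothetical source location; 0 means unknown
  std::vector<std::unique_ptr<node>> children;
};

// Build a leaf or empty block with a known location.
inline std::unique_ptr<node> make_node(kind k, unsigned loc) {
  assert(loc != 0);
  auto n = std::make_unique<node>();
  n->k = k;
  n->loc = loc;
  return n;
}

// Depth-first search in source order for the first plain 'return'.
// Returns nullptr when the body contains none.
inline const node *find_first_return(const node &n) {
  if (n.k == kind::return_stmt)
    return &n;
  for (const auto &child : n.children)
    if (const node *hit = find_first_return(*child))
      return hit;
  return nullptr;
}
```

The search only runs once the error has been decided on, so it costs nothing
for valid code; how useful the result is depends on what location the
statement node carries.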



I've
considered changing the location of RETURN_STMT expressions to cover
everything from the return expression to input_location after parsing
the returned expr.  The result of that is:

test.cc:38:3: error: a ‘return’ statement is not allowed in coroutine; did you 
mean ‘co_return’?
38 |   return {};
   |   ^
test.cc:37:3: note: function was made a coroutine here
37 |   co_return;
   |   ^

... so, not bad, but I'm not sure how intrusive such a change would be
(haven't tried the testsuite).  The current patch produces:

test.cc:36:3: error: a ‘return’ statement is not allowed in coroutine; did you 
mean ‘co_return’?
36 |   return {};
   |   ^~
test.cc:35:3: note: function was made a coroutine here
35 |   co_return;
   |   ^


Is there a better location to use here or is the current (latter) one
OK?


The latter seems fine.


I haven't managed to find a nicer existing one.  We also can't
stash it in coroutine_info because a function might not have that at
the time we parse a return.

Tested on x86_64-pc-linux-gnu.

Have a lovely evening.
-- >8 --
We now point out why a function is a coroutine.

gcc/cp/ChangeLog:

* coroutines.cc (coro_function_valid_p): Change how we diagnose
returning coroutines.
* cp-tree.h (struct language_function): Add first_return_loc
field.  Tracks the location of the first return encountered
during parsing.
(current_function_first_return_loc): New macro.  Expands to the
current function's first_return_loc.
* parser.cc (cp_parser_jump_statement): If parsing a RID_RETURN,
save its location to current_function_first_return_loc.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-return-syntax-08-bad-return.C: Update to
match new diagnostic.
---
  gcc/cp/coroutines.cc  | 14 +++--
  gcc/cp/cp-tree.h  |  6 +++
  gcc/cp/parser.cc  |  4 ++
  .../co-return-syntax-08-bad-return.C  | 52 +--
  4 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 0f4dc42ec1c8..f32c7a2eec8d 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -968,11 +968,15 @@ coro_function_valid_p (tree fndecl)
  
if (current_function_returns_value || current_function_returns_null)

  {
-   /* TODO: record or extract positions of returns (and the first coro
- keyword) so that we can add notes to the diagnostic about where
- the bad keyword is and what made the function into a coro.  */
-  error_at (f_loc, "a %<return%> statement is not allowed in coroutine;"
-   " did you mean %<co_return%>?");
+  coroutine_info *coro_info = get_or_insert_coroutine_info (fndecl);
+  auto retloc = current_function_first_return_loc;
+  gcc_checking_assert (retloc && coro_info->first_coro_keyword);
+
+  auto_diagnostic_group diaggrp;
+  error_at (retloc, "a %<return%> statement is not allowed in coroutine;"
+   " did you mean %<co_return%>?");
+  inform (coro_info->first_coro_keyword,
+ "function was made a coroutine here");
return false;
  }
  
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h

index 911d1d7924cc..68c681150a1f 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -2123,6 +2123,8 @@ struct GTY(()) language_function {
tree x_vtt_parm;
tree x_return_value;
  
+  location_t first_return_loc;

+
BOOL_BITFIELD returns_value : 1;
BOOL_BITFIELD returns_null : 1;
BOOL_BITFIELD returns_abnormally : 1;
@@ -2217,6 +2219,10 @@ struct GTY(()) language_function {
  #define current_function_return_value \
(cp_function_chain->x_return_value)
  
+/* Location of the first 'return' stumbled upon during parsing.  */

+
+#define current_function_first_return_loc cp_function_chain->first_return_loc
+
  /* In parser.cc.  */
  extern tree cp_literal_operator_id (const char *);
  
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc

index eb102dea8299..6cfe42f3bdd6 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -14957,6 +14957,10 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
  AGGR_INIT_EXPR_MUST_TAIL (ret_expr) = musttail_p;
else
  set_musttail_on_return (expr, token->location, musttail_p);
+
+   /* Save where we saw this keyword.  */
+   if (current_function_first_return_loc == UNKNOWN_LOCATION)
+ current_function_first_return_loc = token->location;
  }
  
  	/* Build the return-statement, check co-return first, since type

diff --git a/gcc/testsuite/g++.dg/coroutines/co-return-syntax-08-bad-return.C 
b/gcc/tests

Re: [PATCH] c++/modules: Clarify error message in read_enum_def

2024-08-07 Thread Jason Merrill

On 8/7/24 7:08 PM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?


OK.


-- >8 --

This error message reads to me the wrong way around, particularly in the
context of other errors.  Updated so that the ellipses connect.

gcc/cp/ChangeLog:

* module.cc (trees_in::read_enum_def): Clarify error.

gcc/testsuite/ChangeLog:

* g++.dg/modules/enum-bad-1_b.C: Update error message.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/module.cc| 4 ++--
  gcc/testsuite/g++.dg/modules/enum-bad-1_b.C | 6 +++---
  2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 723f0890d96..0f3e1d97c53 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12687,9 +12687,9 @@ trees_in::read_enum_def (tree defn, tree maybe_template)
  if (known_decl && new_decl)
{
  inform (DECL_SOURCE_LOCATION (new_decl),
- "... this enumerator %qD", new_decl);
+ "enumerator %qD does not match ...", new_decl);
  inform (DECL_SOURCE_LOCATION (known_decl),
- "enumerator %qD does not match ...", known_decl);
+ "... this enumerator %qD", known_decl);
}
  else if (known_decl || new_decl)
{
diff --git a/gcc/testsuite/g++.dg/modules/enum-bad-1_b.C b/gcc/testsuite/g++.dg/modules/enum-bad-1_b.C
index b01cd66a14d..23e17b088a2 100644
--- a/gcc/testsuite/g++.dg/modules/enum-bad-1_b.C
+++ b/gcc/testsuite/g++.dg/modules/enum-bad-1_b.C
@@ -13,13 +13,13 @@ import "enum-bad-1_a.H";
  
  
  ONE one;

-// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:5:6: error: definition of 'enum 
ONE' does not match\n[^\n]*enum-bad-1_b.C:3:6: note: existing definition 'enum 
ONE'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:5:11: note: ... this enumerator 
'A'\n[^\n]*enum-bad-1_b.C:3:11: note: enumerator 'Q' does not match 
...\n[^\n]*enum-bad-1_b.C:15:1: note: during load of binding '::ONE'\n} }
+// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:5:6: error: definition of 'enum 
ONE' does not match\n[^\n]*enum-bad-1_b.C:3:6: note: existing definition 'enum 
ONE'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:5:11: note: enumerator 'A' does 
not match ...\n[^\n]*enum-bad-1_b.C:3:11: note: ... this enumerator 
'Q'\n[^\n]*enum-bad-1_b.C:15:1: note: during load of binding '::ONE'\n} }
  
  int i = TWO;

-// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:6:6: error: definition of 
'enum' does not match\n[^\n]*enum-bad-1_b.C:4:6: note: existing definition 
'enum'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:6:12: note: ... this enumerator 
'THREE'\n[^\n]*enum-bad-1_b.C:4:12: note: enumerator 'DREI' does not match 
...\n[^\n]*enum-bad-1_b.C:18:9: note: during load of binding '::TWO'\n} }
+// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:6:6: error: definition of 
'enum' does not match\n[^\n]*enum-bad-1_b.C:4:6: note: existing definition 
'enum'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:6:12: note: enumerator 'THREE' does not match 
...\n[^\n]*enum-bad-1_b.C:4:12: note: ... this enumerator 
'DREI'\n[^\n]*enum-bad-1_b.C:18:9: note: during load of binding '::TWO'\n} }
  
  FOUR four;

-// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:7:6: error: definition of 'enum 
FOUR' does not match\n[^\n]*enum-bad-1_b.C:5:6: note: existing definition 'enum 
FOUR'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:7:12: note: ... this enumerator 
'B'\n[^\n]*enum-bad-1_b.C:5:12: note: enumerator 'B' does not match 
...\n[^\n]*enum-bad-1_b.C:21:1: note: during load of binding '::FOUR'\n} }
+// { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:7:6: error: definition of 'enum 
FOUR' does not match\n[^\n]*enum-bad-1_b.C:5:6: note: existing definition 'enum 
FOUR'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:7:12: note: enumerator 'B' does 
not match ...\n[^\n]*enum-bad-1_b.C:5:12: note: ... this enumerator 
'B'\n[^\n]*enum-bad-1_b.C:21:1: note: during load of binding '::FOUR'\n} }
  
  FIVE five;

  // { dg-regexp {In module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:8:6: error: definition of 'enum 
FIVE' does not match\n[^\n]*enum-bad-1_b.C:6:6: note: existing definition 'enum 
FIVE'\nIn module [^\n]*enum-bad-1_a.H, imported at 
[^\n]*enum-bad-1_b.C:8:\n[^\n]*enum-bad-1_a.H:8:

Re: [PATCH v2] c++/modules: Handle instantiating qualified template friend classes [PR115801]

2024-08-07 Thread Patrick Palka
On Thu, 8 Aug 2024, Nathaniel Shead wrote:

> On Wed, Aug 07, 2024 at 01:44:31PM -0400, Jason Merrill wrote:
> > On 8/6/24 2:35 AM, Nathaniel Shead wrote:
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > 
> > > Another potential approach would be to go searching for this unexported
> > > type and load it, either with a new LOOK_want::ANY_REACHABLE flag or by
> > > expanding on the lookup_imported_temploid_friend hack.  I'm still not
> > > exactly sure how name lookup for template friends is supposed to behave
> > > though, specifically in terms of when and where they should conflict
> > > with other entities with the same name.
> > 
> > CWG2588 tried to clarify this in https://eel.is/c++draft/basic#link-4.8 --
> > if there's a matching reachable declaration, the friend refers to it even if
> > it isn't visible to name lookup.
> > 
> > It seems like an oversight that the new second bullet refers specifically to
> > functions, it seems natural for it to apply to classes as well.
> > 
> > So, they correspond but do not conflict because they declare the same
> > entity.
> > 
> 
> Right, makes sense.  OK, I'll work on filling out our testcases to make
> sure that we cover everything under that interpretation and potentially
> come back to making an ANY_REACHABLE flag for this.
> 
> > > The relevant paragraphs seem to be https://eel.is/c++draft/temp.friend#2
> > > and/or https://eel.is/c++draft/dcl.meaning.general#2.2.2, in addition to
> > > the usual rules in [basic.link] and [basic.scope.scope], but how these
> > > all are supposed to interact isn't super clear to me right now.
> > > 
> > > Additionally I wonder if maybe the better approach long-term would be to
> > > focus on getting textual redefinitions working first, and then reuse
> > > whatever logic we build for that to handle template friends rather than
> > > relying on finding these hidden 'mergeable' slots first.
> > 
> > I have a WIP patch to allow textual redefinitions by teaching
> > duplicate_decls that it's OK to redefine an imported GM entity, so
> > check_module_override works.
> > 
> > My current plan is to then just token-skip the bodies.  This won't diagnose
> > ODR problems, but our module merging doesn't do that consistently either.
> > 
> > > @@ -11800,6 +11800,15 @@ tsubst_friend_class (tree friend_tmpl, tree args)
> > > input_location = saved_input_location;
> > >   }
> > >   }
> > > +  else if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
> > > +<= TMPL_ARGS_DEPTH (args))
> > 
> > This condition seems impossible normally; it's only satisfied in this
> > testcase because friend_tmpl doesn't actually represent the friend
> > declaration, it's already the named class template.  So the substitution in
> > the next else fails because it was done already.
> > 
> > If this condition is true, we could set tmpl = friend_tmpl earlier, and skip
> > doing name lookup at all.
> > 
> > It's interesting that the previous if does the same depth comparison, and
> > that dates back to 2002; I wonder why it was needed then?
> > 
> > Jason
> > 
> 
> Ah right, I see.  I think the depth comparison in the previous if
> actually is for exactly the same reason, just for the normal case when
> the template *is* found by name lookup, e.g. 
> 
>   template <typename> struct A {};
>   template <typename> struct B {
>     template <typename> friend struct ::A;
>   };
> 
> This is g++.dg/template/friend5.  Here's an updated patch which is so
> far very lightly tested, OK for trunk if full bootstrap+regtest
> succeeds?
> 
> -- >8 --
> 
> With modules it may be the case that a template friend class provided
> with a qualified name is not found by name lookup at instantiation time,
> due to the class not being exported from its module.  This causes issues
> in tsubst_friend_class which did not handle this case.
> 
> This is a more general issue, in fact, caused by the named friend class
> not actually requiring tsubsting.  This was already worked around for
> the "found by name lookup" case (g++.dg/template/friend5.C), but it
> looks like there's no need to do name lookup at all for this to work.
> 
>   PR c++/115801
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_friend_class): Return the type directly when no
>   tsubsting is required.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/tpl-friend-16_a.C: New test.
>   * g++.dg/modules/tpl-friend-16_b.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/pt.cc  | 39 ++
>  .../g++.dg/modules/tpl-friend-16_a.C  | 40 +++
>  .../g++.dg/modules/tpl-friend-16_b.C  | 17 
>  3 files changed, 79 insertions(+), 17 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_a.C
>  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_b.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 2db59213c54..ea00577fd37 100644
> --- a/gcc/cp/pt.cc
> +++ 

Re: [PATCH] c++: improve diagnostic of 'return's in coroutines

2024-08-07 Thread Arsen Arsenović
Jason Merrill writes:

> On 8/7/24 7:31 PM, Arsen Arsenović wrote:
>> Enlarging the function-specific data block is not great.
>
> Indeed, I think it would be better to search DECL_SAVED_TREE for a RETURN_STMT
> once we've decided to give an error.

The trouble with that is that finish_return_stmt currently uses
input_location as the location for the entire return expr, so the
location ends up being after the entire return value.

I've hacked in a way to provide a different location to
finish_return_stmt.  When it is applied as below, the result is
the first result in the original email:

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6cfe42f3bdd6..44b45f16b026 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -14965,12 +14965,15 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
 
/* Build the return-statement, check co-return first, since type
  deduction is not valid there.  */
+   auto l = make_location (token->location,
+   token->location,
+   input_location);
if (keyword == RID_CO_RETURN)
 statement = finish_co_return_stmt (token->location, expr);
else if (FNDECL_USED_AUTO (current_function_decl) && in_discarded_stmt)
 /* Don't deduce from a discarded return statement.  */;
else
-statement = finish_return_stmt (expr);
+statement = finish_return_stmt (expr, l);
/* Look for the final `;'.  */
cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
   }

... without this change (so, using input_location), the result is:

test.cc:38:11: error: a ‘return’ statement is not allowed in coroutine; did you mean ‘co_return’?
   38 |   return {};
      |           ^

... which is not the best.  That's the change I'm referring to in the
original post that I haven't run the testsuite on.  Changing that
location allows for simply searching DECL_SAVED_TREE (fndecl), though,
and getting a good location out of it.

>> I've
>> considered changing the location of RETURN_STMT expressions to cover
>> everything from the return expression to input_location after parsing
>> the returned expr.  The result of that is:
>> test.cc:38:3: error: a ‘return’ statement is not allowed in coroutine; did
>> you mean ‘co_return’?
>>    38 |   return {};
>>       |   ^
>> test.cc:37:3: note: function was made a coroutine here
>>    37 |   co_return;
>>       |   ^
>> ... so, not bad, but I'm not sure how intrusive such a change would be
>> (haven't tried the testsuite).  The current patch produces:
>> test.cc:36:3: error: a ‘return’ statement is not allowed in coroutine; did
>> you mean ‘co_return’?
>>    36 |   return {};
>>       |   ^~
>> test.cc:35:3: note: function was made a coroutine here
>>    35 |   co_return;
>>       |   ^
>> Is there a better location to use here or is the current (latter) one
>> OK?
>
> The latter seems fine.
>
>> I haven't managed to find a nicer existing one.  We also can't
>> stash it in coroutine_info because a function might not have that at
>> the time we parse a return.
>> Tested on x86_64-pc-linux-gnu.
>> Have a lovely evening.
>> -- >8 --
>> We now point out why a function is a coroutine.
>> gcc/cp/ChangeLog:
>>  * coroutines.cc (coro_function_valid_p): Change how we diagnose
>>  returning coroutines.
>>  * cp-tree.h (struct language_function): Add first_return_loc
>>  field.  Tracks the location of the first return encountered
>>  during parsing.
>>  (current_function_first_return_loc): New macro.  Expands to the
>>  current function's first_return_loc.
>>  * parser.cc (cp_parser_jump_statement): If parsing a RID_RETURN,
>>  save its location to current_function_first_return_loc.
>> gcc/testsuite/ChangeLog:
>>  * g++.dg/coroutines/co-return-syntax-08-bad-return.C: Update to
>>  match new diagnostic.
>> ---
>>   gcc/cp/coroutines.cc  | 14 +++--
>>   gcc/cp/cp-tree.h  |  6 +++
>>   gcc/cp/parser.cc  |  4 ++
>>   .../co-return-syntax-08-bad-return.C  | 52 +--
>>   4 files changed, 68 insertions(+), 8 deletions(-)
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index 0f4dc42ec1c8..f32c7a2eec8d 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -968,11 +968,15 @@ coro_function_valid_p (tree fndecl)
>>   if (current_function_returns_value || current_function_returns_null)
>>   {
>> -   /* TODO: record or extract positions of returns (and the first coro
>> -  keyword) so that we can add notes to the diagnostic about where
>> -  the bad keyword is and what made the function into a coro.  */
>> -  error_at (f_loc, "a %<return%> statement is not allowed in coroutine;"
>> -" did you mean %<co_return%>?");
>> +  coroutine_info *coro_info = get_or_insert_coroutine_

Re: [PATCH v2] c++/modules: Handle instantiating qualified template friend classes [PR115801]

2024-08-07 Thread Patrick Palka
On Wed, 7 Aug 2024, Patrick Palka wrote:

> On Thu, 8 Aug 2024, Nathaniel Shead wrote:
> 
> > On Wed, Aug 07, 2024 at 01:44:31PM -0400, Jason Merrill wrote:
> > > On 8/6/24 2:35 AM, Nathaniel Shead wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > 
> > > > Another potential approach would be to go searching for this unexported
> > > > type and load it, either with a new LOOK_want::ANY_REACHABLE flag or by
> > > > expanding on the lookup_imported_temploid_friend hack.  I'm still not
> > > > exactly sure how name lookup for template friends is supposed to behave
> > > > though, specifically in terms of when and where they should conflict
> > > > with other entities with the same name.
> > > 
> > > CWG2588 tried to clarify this in https://eel.is/c++draft/basic#link-4.8 --
> > > > if there's a matching reachable declaration, the friend refers to it even
> > > > if it isn't visible to name lookup.
> > > 
> > > > It seems like an oversight that the new second bullet refers specifically
> > > > to functions, it seems natural for it to apply to classes as well.
> > > 
> > > So, they correspond but do not conflict because they declare the same
> > > entity.
> > > 
> > 
> > Right, makes sense.  OK, I'll work on filling out our testcases to make
> > sure that we cover everything under that interpretation and potentially
> > come back to making an ANY_REACHABLE flag for this.
> > 
> > > > The relevant paragraphs seem to be https://eel.is/c++draft/temp.friend#2
> > > > and/or https://eel.is/c++draft/dcl.meaning.general#2.2.2, in addition to
> > > > the usual rules in [basic.link] and [basic.scope.scope], but how these
> > > > all are supposed to interact isn't super clear to me right now.
> > > > 
> > > > Additionally I wonder if maybe the better approach long-term would be to
> > > > focus on getting textual redefinitions working first, and then reuse
> > > > whatever logic we build for that to handle template friends rather than
> > > > relying on finding these hidden 'mergeable' slots first.
> > > 
> > > I have a WIP patch to allow textual redefinitions by teaching
> > > duplicate_decls that it's OK to redefine an imported GM entity, so
> > > check_module_override works.
> > > 
> > > My current plan is to then just token-skip the bodies.  This won't 
> > > diagnose
> > > ODR problems, but our module merging doesn't do that consistently either.
> > > 
> > > > @@ -11800,6 +11800,15 @@ tsubst_friend_class (tree friend_tmpl, tree 
> > > > args)
> > > > input_location = saved_input_location;
> > > > }
> > > >   }
> > > > +  else if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
> > > > +  <= TMPL_ARGS_DEPTH (args))
> > > 
> > > This condition seems impossible normally; it's only satisfied in this
> > > testcase because friend_tmpl doesn't actually represent the friend
> > > declaration, it's already the named class template.  So the substitution 
> > > in
> > > the next else fails because it was done already.
> > > 
> > > If this condition is true, we could set tmpl = friend_tmpl earlier, and 
> > > skip
> > > doing name lookup at all.
> > > 
> > > It's interesting that the previous if does the same depth comparison, and
> > > that dates back to 2002; I wonder why it was needed then?

I reckon the depth comparison in the previous if is equivalent to:

 if (DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (friend_tmpl))

But unfortunately we can't skip doing name lookup in that case due to
the mentioned example :/

> > > 
> > > Jason
> > > 
> > 
> > Ah right, I see.  I think the depth comparison in the previous if
> > actually is for exactly the same reason, just for the normal case when
> > the template *is* found by name lookup, e.g. 
> > 
> > >   template <typename> struct A {};
> > >   template <typename> struct B {
> > >     template <typename> friend struct ::A;
> >   };
> > 
> > This is g++.dg/template/friend5.  Here's an updated patch which is so
> > far very lightly tested, OK for trunk if full bootstrap+regtest
> > succeeds?
> > 
> > -- >8 --
> > 
> > With modules it may be the case that a template friend class provided
> > with a qualified name is not found by name lookup at instantiation time,
> > due to the class not being exported from its module.  This causes issues
> > in tsubst_friend_class which did not handle this case.
> > 
> > This is a more general issue, in fact, caused by the named friend class
> > not actually requiring tsubsting.  This was already worked around for
> > the "found by name lookup" case (g++.dg/template/friend5.C), but it
> > looks like there's no need to do name lookup at all for this to work.
> > 
> > PR c++/115801
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (tsubst_friend_class): Return the type directly when no
> > tsubsting is required.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/modules/tpl-friend-16_a.C: New test.
> > * g++.dg/modules/tpl-friend-16_b.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
>

RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

2024-08-07 Thread Li, Pan2
Kindly ping++.

Pan

-Original Message-
From: Li, Pan2 
Sent: Wednesday, July 31, 2024 9:12 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: RE: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

Kindly ping.

Pan

-Original Message-
From: Li, Pan2  
Sent: Tuesday, July 23, 2024 1:06 PM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; 
rdapp@gmail.com; Li, Pan2 
Subject: [PATCH v1] RISC-V: Implement the quad and oct .SAT_TRUNC for scalar

From: Pan Li 

This patch implements the quad and oct .SAT_TRUNC patterns in the
RISC-V backend, for example:

Form 1:
  #define DEF_SAT_U_TRUC_FMT_1(NT, WT)             \
  NT __attribute__((noinline))                     \
  sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x)         \
  {                                                \
    bool overflow = x > (WT)(NT)(-1);              \
    return ((NT)x) | (NT)-overflow;                \
  }

DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)
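
For readers unfamiliar with the idiom, here is a minimal C++ sketch of what Form 1 computes. It is not part of the patch; the function names are hypothetical. The branchless form mirrors the macro, and the clamp form is the saturating truncation it is equivalent to.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Branchless form, following DEF_SAT_U_TRUC_FMT_1: when x does not fit in
// NT, -overflow converts to all ones in NT, so the OR yields NT's maximum.
template <typename NT, typename WT>
NT sat_u_trunc_branchless (WT x)
{
  bool overflow = x > static_cast<WT> (static_cast<NT> (-1));
  return static_cast<NT> (static_cast<NT> (x) | static_cast<NT> (-overflow));
}

// Reference form (hypothetical helper): clamp x to the maximum of NT.
template <typename NT, typename WT>
NT sat_u_trunc_clamp (WT x)
{
  const WT max = std::numeric_limits<NT>::max ();
  return static_cast<NT> (x > max ? max : x);
}
```

Both forms return the input unchanged when it fits in the narrow type and the narrow type's maximum otherwise, which is the behavior .SAT_TRUNC stands for in the dumps below.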

Before this patch:
__attribute__((noinline))
uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
{
  _Bool overflow;
  short unsigned int _1;
  short unsigned int _2;
  short unsigned int _3;
  uint16_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  overflow_5 = x_4(D) > 65535;
  _1 = (short unsigned int) x_4(D);
  _2 = (short unsigned int) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
;;    succ:       EXIT

}

After this patch:
__attribute__((noinline))
uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
{
  uint16_t _6;

;;   basic block 2, loop depth 0
;;    pred:       ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
;;    succ:       EXIT

}

The following test suites pass with this patch:
1. The full rv64gcv regression test.
2. The rv64gcv build with glibc.

gcc/ChangeLog:

* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2):
Add new pattern for quad truncation.
(ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/iterators.md | 20 
 gcc/config/riscv/riscv.md | 20 
 .../rvv/autovec/unop/vec_sat_u_trunc-2.c  |  2 +-
 .../rvv/autovec/unop/vec_sat_u_trunc-3.c  |  2 +-
 .../gcc.target/riscv/sat_arith_data.h | 51 +++
 .../gcc.target/riscv/sat_u_trunc-4.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-5.c  | 17 +++
 .../gcc.target/riscv/sat_u_trunc-6.c  | 20 
 .../gcc.target/riscv/sat_u_trunc-run-4.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-5.c  | 16 ++
 .../gcc.target/riscv/sat_u_trunc-run-6.c  | 16 ++
 11 files changed, 195 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-6.c

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 734da041f0c..bdcdb8babc8 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -67,14 +67,34 @@ (define_mode_iterator ANYI [QI HI SI (DI "TARGET_64BIT")])
 
 (define_mode_iterator ANYI_DOUBLE_TRUNC [HI SI (DI "TARGET_64BIT")])
 
+(define_mode_iterator ANYI_QUAD_TRUNC [SI (DI "TARGET_64BIT")])
+
+(define_mode_iterator ANYI_OCT_TRUNC [(DI "TARGET_64BIT")])
+
 (define_mode_attr ANYI_DOUBLE_TRUNCATED [
 

[PATCH v3] c++/modules: Handle instantiating already tsubsted template friend classes [PR115801]

2024-08-07 Thread Nathaniel Shead
On Wed, Aug 07, 2024 at 09:12:13PM -0400, Patrick Palka wrote:
> On Wed, 7 Aug 2024, Patrick Palka wrote:
> 
> > On Thu, 8 Aug 2024, Nathaniel Shead wrote:
> > 
> > > On Wed, Aug 07, 2024 at 01:44:31PM -0400, Jason Merrill wrote:
> > > > On 8/6/24 2:35 AM, Nathaniel Shead wrote:
> > > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?
> > > > > 
> > > > > Another potential approach would be to go searching for this 
> > > > > unexported
> > > > > type and load it, either with a new LOOK_want::ANY_REACHABLE flag or 
> > > > > by
> > > > > expanding on the lookup_imported_temploid_friend hack.  I'm still not
> > > > > exactly sure how name lookup for template friends is supposed to 
> > > > > behave
> > > > > though, specifically in terms of when and where they should conflict
> > > > > with other entities with the same name.
> > > > 
> > > > CWG2588 tried to clarify this in https://eel.is/c++draft/basic#link-4.8 
> > > > --
> > > > if there's a matching reachable declaration, the friend refers to it 
> > > > even if
> > > > it isn't visible to name lookup.
> > > > 
> > > > It seems like an oversight that the new second bullet refers 
> > > > specifically to
> > > > functions, it seems natural for it to apply to classes as well.
> > > > 
> > > > So, they correspond but do not conflict because they declare the same
> > > > entity.
> > > > 
> > > 
> > > Right, makes sense.  OK, I'll work on filling out our testcases to make
> > > sure that we cover everything under that interpretation and potentially
> > > come back to making an ANY_REACHABLE flag for this.
> > > 
> > > > > The relevant paragraphs seem to be 
> > > > > https://eel.is/c++draft/temp.friend#2
> > > > > and/or https://eel.is/c++draft/dcl.meaning.general#2.2.2, in addition 
> > > > > to
> > > > > the usual rules in [basic.link] and [basic.scope.scope], but how these
> > > > > all are supposed to interact isn't super clear to me right now.
> > > > > 
> > > > > Additionally I wonder if maybe the better approach long-term would be 
> > > > > to
> > > > > focus on getting textual redefinitions working first, and then reuse
> > > > > whatever logic we build for that to handle template friends rather 
> > > > > than
> > > > > relying on finding these hidden 'mergeable' slots first.
> > > > 
> > > > I have a WIP patch to allow textual redefinitions by teaching
> > > > duplicate_decls that it's OK to redefine an imported GM entity, so
> > > > check_module_override works.
> > > > 
> > > > My current plan is to then just token-skip the bodies.  This won't 
> > > > diagnose
> > > > ODR problems, but our module merging doesn't do that consistently 
> > > > either.
> > > > 
> > > > > @@ -11800,6 +11800,15 @@ tsubst_friend_class (tree friend_tmpl, tree 
> > > > > args)
> > > > > input_location = saved_input_location;
> > > > >   }
> > > > >   }
> > > > > +  else if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
> > > > > +<= TMPL_ARGS_DEPTH (args))
> > > > 
> > > > This condition seems impossible normally; it's only satisfied in this
> > > > testcase because friend_tmpl doesn't actually represent the friend
> > > > declaration, it's already the named class template.  So the 
> > > > substitution in
> > > > the next else fails because it was done already.
> > > > 
> > > > If this condition is true, we could set tmpl = friend_tmpl earlier, and 
> > > > skip
> > > > doing name lookup at all.
> > > > 
> > > > It's interesting that the previous if does the same depth comparison, 
> > > > and
> > > > that dates back to 2002; I wonder why it was needed then?
> 
> I reckon the depth comparison in the previous if is equivalent to:
> 
>  if (DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (friend_tmpl))
> 
> But unfortunately we can't skip doing name lookup in that case due to
> the mentioned example :/
> 
> > > > 
> > > > Jason
> > > > 
> > > 
> > > Ah right, I see.  I think the depth comparison in the previous if
> > > actually is for exactly the same reason, just for the normal case when
> > > the template *is* found by name lookup, e.g. 
> > > 
> > > >   template <typename> struct A {};
> > > >   template <typename> struct B {
> > > >     template <typename> friend struct ::A;
> > >   };
> > > 
> > > This is g++.dg/template/friend5.  Here's an updated patch which is so
> > > far very lightly tested, OK for trunk if full bootstrap+regtest
> > > succeeds?
> > > 
> > > -- >8 --
> > > 
> > > With modules it may be the case that a template friend class provided
> > > with a qualified name is not found by name lookup at instantiation time,
> > > due to the class not being exported from its module.  This causes issues
> > > in tsubst_friend_class which did not handle this case.
> > > 
> > > This is a more general issue, in fact, caused by the named friend class
> > > not actually requiring tsubsting.  This was already worked around for
> > > the "found by name lookup" case (g++.dg/template/friend5.C), but it
> > > looks like there's no need to do name lookup at all fo

[PATCH] doc: move the cross reference for -fprofile-arcs to the right paragraph

2024-08-07 Thread Wentao Zhang
The referenced page contains more explanation of auxname.gcda produced
by gcov profiler, which is a continuation of -fprofile-arcs's
description.

gcc/ChangeLog:

* doc/invoke.texi (Instrumentation Options): Move the cross
reference of "Cross-profiling" under the description for flag
"-fprofile-arcs".
---
 gcc/doc/invoke.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27539a017..cd10d6cd5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17328,6 +17328,8 @@ Note that if a command line directly links source 
files, the corresponding
 E.g. @code{gcc a.c b.c -o binary} would generate @file{binary-a.gcda} and
 @file{binary-b.gcda} files.
 
+@xref{Cross-profiling}.
+
 @item -fcondition-coverage
 @opindex fcondition-coverage
 Add code so that program conditions are instrumented.  During execution the
@@ -17336,8 +17338,6 @@ can be used to verify that all terms in a Boolean 
function are tested and have
 an independent effect on the outcome of a decision.  The result can be read
 with @code{gcov --conditions}.
 
-@xref{Cross-profiling}.
-
 @cindex @command{gcov}
 @opindex coverage
 @item --coverage
-- 
2.34.1



Re: [PATCH v2] [libstdc++] [testsuite] avoid async.cc loss of precision [PR91486]

2024-08-07 Thread Alexandre Oliva
On Aug  1, 2024, Alexandre Oliva  wrote:

> Each iteration calls float_steady_clock::now() [...] an extra iteration
> will reach 5 and cause the test to fail.

> (Do we really want to use floats, that even with this tweak have
> borderline precision for sub-µs vs 1s deltas?  Do we want to make sure
> the wait time computation ensures we'll get past the deadline when the
> time is converted back to the given clock?)

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/658977.html
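
To put a number on the precision concern quoted above, here is a small sketch, assuming the clock counts seconds in a float (the helper name is hypothetical and not taken from the test). It measures the gap between a float value and the next representable one.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next representable float above it.
float ulp_above (float x)
{
  return std::nextafter (x, std::numeric_limits<float>::infinity ()) - x;
}
```

Around 1 s the gap is 2^-23 s, roughly 0.12 µs, and around 4 s it is four times that, so sub-µs deltas are close to the limit of what a float timestamp can represent.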

> for  libstdc++-v3/ChangeLog

>   PR libstdc++/91486
>   * testsuite/30_threads/async/async.cc
>   (test_pr91486_wait_for): Mark status as unused.
>   (test_pr91486_wait_until): Likewise.  Initialize epoch later.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] c++/modules: Fix merging of GM entities in partitions [PR114950]

2024-08-07 Thread Nathaniel Shead
On Wed, Aug 07, 2024 at 08:45:03PM -0400, Jason Merrill wrote:
> On 8/7/24 7:22 PM, Nathaniel Shead wrote:
> > On Wed, Aug 07, 2024 at 04:18:47PM -0400, Jason Merrill wrote:
> > > On 8/5/24 9:16 AM, Nathaniel Shead wrote:
> > > > Bootstrapped and regtested (so far just modules.exp) on
> > > > x86_64-pc-linux-gnu, OK for trunk if full regtest passes?
> > > 
> > > OK.
> > > 
> > > > @@ -11316,6 +11319,7 @@ trees_in::key_mergeable (int tag, merge_kind 
> > > > mk, tree decl, tree inner,
> > > >   case NAMESPACE_DECL:
> > > > if (is_attached
> > > > +   && !is_imported_temploid_friend
> > > 
> > > How can a namespace be an imported temploid friend?
> > 
> > Cut off by context, but this is
> > 
> > switch (TREE_CODE (container))
> >   {
> >   default:
> > gcc_unreachable ();
> > 
> >   case NAMESPACE_DECL:
> > if (is_attached
> > && !is_imported_temploid_friend
> > && !(state->is_module () || state->is_partition ()))
> >   kind = "unique";
> > 
> > i.e. the NAMESPACE_DECL is referring to the container that the decl is
> > attached to for merging purposes.
> 
> Oops, yes, I figured that out but forgot to delete that comment.  :)
> 
> > > > && !(state->is_module () || state->is_partition ()))
> > > >   kind = "unique";
> > > > else
> > > > @@ -11347,7 +11351,9 @@ trees_in::key_mergeable (int tag, merge_kind 
> > > > mk, tree decl, tree inner,
> > > > break;
> > > >   case TYPE_DECL:
> > > > -   if (is_attached && !(state->is_module () || 
> > > > state->is_partition ())
> > > > +   if (is_attached
> > > > +   && !is_imported_temploid_friend
> > 
> > This is the one that may perhaps be unnecessary (on thinking over this
> > again I would expect any class-scope friends to not be redeclared
> > outside of their named module, even for imported templates?), so I'll
> > actually re-test this patch without this hunk.
> 
> Sounds good.
> 
> Jason
> 

This is what I'll push if full bootstrap+regtest succeeds, replacing
that hunk with a 'gcc_checking_assert (!is_imported_temploid_friend)'
just to be extra clear.

-- >8 --

Currently name lookup generally seems to assume that all entities
declared within a named module (partition) are attached to said module,
which is not true for GM entities (e.g. via extern "C++"), and causes
issues with deduplication.

This patch fixes the issue by ensuring that module attachment of a
declaration is consistently used to handle merging.  Handling this
exposes some issues with deduplicating temploid friends; to resolve this
we always create the BINDING_SLOT_PARTITION slot so that we have
somewhere to place attached names (from any module).

This doesn't yet completely handle issues with allowing otherwise
conflicting temploid friends from different modules to co-exist in the
same module if neither are reachable from the other via name lookup.

PR c++/114950

gcc/cp/ChangeLog:

* module.cc (trees_out::decl_value): Stream bit indicating
imported temploid friends early.
(trees_in::decl_value): Use this bit with key_mergeable.
(trees_in::key_mergeable): Allow merging attached declarations
if they're imported temploid friends (which must be namespace
scope).
(module_state::read_cluster): Check for GM entities that may
require merging even when importing from partitions.
* name-lookup.cc (enum binding_slots): Adjust comment.
(get_fixed_binding_slot): Always create partition slot.
(name_lookup::search_namespace_only): Support binding vectors
with both partition and GM entities to dedup.
(walk_module_binding): Likewise.
(name_lookup::adl_namespace_fns): Likewise.
(set_module_binding): Likewise.
(check_module_override): Use attachment of the decl when
checking overrides rather than named_module_p.
(lookup_imported_hidden_friend): Use partition slot for finding
mergeable template bindings.
* name-lookup.h (set_module_binding): Split mod_glob_flag
parameter into separate global_p and partition_p params.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-13_e.C: Adjust error message.
* g++.dg/modules/ambig-2_a.C: New test.
* g++.dg/modules/ambig-2_b.C: New test.
* g++.dg/modules/part-9_a.C: New test.
* g++.dg/modules/part-9_b.C: New test.
* g++.dg/modules/part-9_c.C: New test.
* g++.dg/modules/tpl-friend-15.h: New test.
* g++.dg/modules/tpl-friend-15_a.C: New test.
* g++.dg/modules/tpl-friend-15_b.C: New test.
* g++.dg/modules/tpl-friend-15_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc  | 52 +--
 gcc/cp/name-lookup.cc | 65 ++-
 gcc/cp/name-lookup.h  

[PATCH v3] diagnostics: Follow DECL_ORIGIN in lhd_print_error_function [PR102061]

2024-08-07 Thread Peter Damianov
Currently, if a warning references a cloned function, the name of the cloned
function will be emitted in the "In function 'xyz'" part of the diagnostic,
which users aren't supposed to see. This patch follows the DECL_ORIGIN link
to get the name of the original function, so the internal compiler details
aren't exposed.
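
As a rough illustration of the idea (a toy model, not GCC's tree representation; ToyDecl and both functions are hypothetical names), following the origin link before printing looks like this:

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a function declaration; not GCC's tree type.
struct ToyDecl
{
  std::string name;
  const ToyDecl *abstract_origin; // null unless this decl is a clone
};

// Mirrors the idea of DECL_ORIGIN: use the origin if there is one,
// otherwise the declaration itself.
const ToyDecl &toy_decl_origin (const ToyDecl &decl)
{
  return decl.abstract_origin ? *decl.abstract_origin : decl;
}

// Builds the "In function" text from the origin's name.
std::string in_function_text (const ToyDecl &decl)
{
  return "In function '" + toy_decl_origin (decl).name + "'";
}
```

With a clone named, say, bar.constprop.0 whose origin is bar, the text comes out as In function 'bar'; a declaration without an origin prints its own name.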

gcc/ChangeLog:
PR diagnostics/102061
* langhooks.cc (lhd_print_error_function): Follow DECL_ORIGIN
links.
* gcc.dg/pr102061.c: New testcase.

Signed-off-by: Peter Damianov 
---
v3: also follow DECL_ORIGIN when emitting "inlined from" warnings, which I
missed before.  Add testcase.

 gcc/langhooks.cc|  3 +++
 gcc/testsuite/gcc.dg/pr102061.c | 35 +
 2 files changed, 38 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr102061.c

diff --git a/gcc/langhooks.cc b/gcc/langhooks.cc
index 61f2b676256..7a2a66b3c39 100644
--- a/gcc/langhooks.cc
+++ b/gcc/langhooks.cc
@@ -395,6 +395,8 @@ lhd_print_error_function (diagnostic_context *context, 
const char *file,
  else
fndecl = current_function_decl;
 
+ fndecl = DECL_ORIGIN(fndecl);
+
  if (TREE_CODE (TREE_TYPE (fndecl)) == METHOD_TYPE)
pp_printf
  (context->printer, _("In member function %qs"),
@@ -439,6 +441,7 @@ lhd_print_error_function (diagnostic_context *context, 
const char *file,
}
  if (fndecl)
{
+ fndecl = DECL_ORIGIN(fndecl);
  expanded_location s = expand_location (*locus);
  pp_comma (context->printer);
  pp_newline (context->printer);
diff --git a/gcc/testsuite/gcc.dg/pr102061.c b/gcc/testsuite/gcc.dg/pr102061.c
new file mode 100644
index 000..dbdd23965e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102061.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-Wall -O2" } */
+/* { dg-message "inlined from 'bar'" "" { target *-*-* } 0 } */
+/* { dg-excess-errors "" } */
+
+static inline void
+foo (char *p)
+{
+  __builtin___memcpy_chk (p, "abc", 3, __builtin_object_size (p, 0));
+}
+static void
+bar (char *p) __attribute__((noinline));
+static void
+bar (char *p)
+{
+  foo (p);
+}
+void f(char*) __attribute__((noipa));
+char buf[2];
+void
+baz (void) __attribute__((noinline));
+void
+baz (void)
+{
+  bar (buf);
+  f(buf);
+}
+
+void f(char*)
+{}
+
+int main(void)
+{
+baz();
+}
-- 
2.39.2



Re: [PATCH v3] c++/modules: Handle instantiating already tsubsted template friend classes [PR115801]

2024-08-07 Thread Jason Merrill

On 8/7/24 10:00 PM, Nathaniel Shead wrote:

On Wed, Aug 07, 2024 at 09:12:13PM -0400, Patrick Palka wrote:

On Wed, 7 Aug 2024, Patrick Palka wrote:


On Thu, 8 Aug 2024, Nathaniel Shead wrote:


On Wed, Aug 07, 2024 at 01:44:31PM -0400, Jason Merrill wrote:

On 8/6/24 2:35 AM, Nathaniel Shead wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

Another potential approach would be to go searching for this unexported
type and load it, either with a new LOOK_want::ANY_REACHABLE flag or by
expanding on the lookup_imported_temploid_friend hack.  I'm still not
exactly sure how name lookup for template friends is supposed to behave
though, specifically in terms of when and where they should conflict
with other entities with the same name.


CWG2588 tried to clarify this in https://eel.is/c++draft/basic#link-4.8 --
if there's a matching reachable declaration, the friend refers to it even if
it isn't visible to name lookup.

It seems like an oversight that the new second bullet refers specifically to
functions, it seems natural for it to apply to classes as well.

So, they correspond but do not conflict because they declare the same
entity.



Right, makes sense.  OK, I'll work on filling out our testcases to make
sure that we cover everything under that interpretation and potentially
come back to making an ANY_REACHABLE flag for this.


The relevant paragraphs seem to be https://eel.is/c++draft/temp.friend#2
and/or https://eel.is/c++draft/dcl.meaning.general#2.2.2, in addition to
the usual rules in [basic.link] and [basic.scope.scope], but how these
all are supposed to interact isn't super clear to me right now.

Additionally I wonder if maybe the better approach long-term would be to
focus on getting textual redefinitions working first, and then reuse
whatever logic we build for that to handle template friends rather than
relying on finding these hidden 'mergeable' slots first.


I have a WIP patch to allow textual redefinitions by teaching
duplicate_decls that it's OK to redefine an imported GM entity, so
check_module_override works.

My current plan is to then just token-skip the bodies.  This won't diagnose
ODR problems, but our module merging doesn't do that consistently either.


@@ -11800,6 +11800,15 @@ tsubst_friend_class (tree friend_tmpl, tree args)
 input_location = saved_input_location;
}
   }
+  else if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (friend_tmpl))
+  <= TMPL_ARGS_DEPTH (args))


This condition seems impossible normally; it's only satisfied in this
testcase because friend_tmpl doesn't actually represent the friend
declaration, it's already the named class template.  So the substitution in
the next else fails because it was done already.

If this condition is true, we could set tmpl = friend_tmpl earlier, and skip
doing name lookup at all.

It's interesting that the previous if does the same depth comparison, and
that dates back to 2002; I wonder why it was needed then?


I reckon the depth comparison in the previous if is equivalent to:

  if (DECL_UNINSTANTIATED_TEMPLATE_FRIEND_P (friend_tmpl))

But unfortunately we can't skip doing name lookup in that case due to
the mentioned example :/



Jason



Ah right, I see.  I think the depth comparison in the previous if
actually is for exactly the same reason, just for the normal case when
the template *is* found by name lookup, e.g.

   template <typename> struct A {};
   template <typename> struct B {
     template <typename> friend struct ::A;
   };

This is g++.dg/template/friend5.  Here's an updated patch which is so
far very lightly tested, OK for trunk if full bootstrap+regtest
succeeds?
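
For reference, a self-contained sketch of the kind of code under discussion (the private member and the peek function are illustrative additions, not from the testcase): a template friend class named with a qualified name, so the friend declaration refers to an already-declared template rather than introducing a new one.

```cpp
#include <cassert>

template <typename T> struct A;   // declared first so ::A can be named

template <typename T> struct B
{
  // Every specialization of ::A is a friend of every B<T>.
  template <typename U> friend struct ::A;

private:
  int secret = 42;
};

template <typename T> struct A
{
  // Allowed only because of the friend declaration in B.
  static int peek (const B<int> &b) { return b.secret; }
};
```

Instantiating B<int> and calling A<char>::peek on it exercises the friend declaration at instantiation time, which is the path through tsubst_friend_class that the thread is about.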

-- >8 --

With modules it may be the case that a template friend class provided
with a qualified name is not found by name lookup at instantiation time,
due to the class not being exported from its module.  This causes issues
in tsubst_friend_class which did not handle this case.

This is a more general issue, in fact, caused by the named friend class
not actually requiring tsubsting.  This was already worked around for
the "found by name lookup" case (g++.dg/template/friend5.C), but it
looks like there's no need to do name lookup at all for this to work.

PR c++/115801

gcc/cp/ChangeLog:

* pt.cc (tsubst_friend_class): Return the type directly when no
tsubsting is required.

gcc/testsuite/ChangeLog:

* g++.dg/modules/tpl-friend-16_a.C: New test.
* g++.dg/modules/tpl-friend-16_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/pt.cc  | 39 ++
  .../g++.dg/modules/tpl-friend-16_a.C  | 40 +++
  .../g++.dg/modules/tpl-friend-16_b.C  | 17 
  3 files changed, 79 insertions(+), 17 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_a.C
  create mode 100644 gcc/testsuite/g++.dg/modules/tpl-friend-16_b.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 2db59213c54..ea00577fd37 1006

[PATCH v2] RISC-V: Fix ICE for vector single-width integer multiply-add intrinsics

2024-08-07 Thread Jin Ma
When rs1 is the immediate 0, the following ICE occurs:

error: unrecognizable insn:
(insn 8 5 12 2 (set (reg:RVVM1DI 134 [  ])
(if_then_else:RVVM1DI (unspec:RVVMF64BI [
(const_vector:RVVMF64BI repeat [
(const_int 1 [0x1])
   ])
(reg/v:DI 137 [ vl ])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 
[0]))
(reg/v:RVVM1DI 136 [ vs2 ]))
(reg/v:RVVM1DI 135 [ vd ]))
(reg/v:RVVM1DI 135 [ vd ])))
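
As background, a scalar sketch of what the two instruction families compute, following the RVV definitions of vmadd.vx and vmacc.vx. It is a simplified model with hypothetical names: masking, tail policy and wraparound are ignored, and both vectors are assumed to have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::int64_t>;

// vmadd.vx vd, rs1, vs2: vd[i] = (rs1 * vd[i]) + vs2[i]
Vec model_vmadd_vx (Vec vd, std::int64_t rs1, const Vec &vs2)
{
  for (std::size_t i = 0; i < vd.size (); ++i)
    vd[i] = rs1 * vd[i] + vs2[i];
  return vd;
}

// vmacc.vx vd, rs1, vs2: vd[i] = (rs1 * vs2[i]) + vd[i]
Vec model_vmacc_vx (Vec vd, std::int64_t rs1, const Vec &vs2)
{
  for (std::size_t i = 0; i < vd.size (); ++i)
    vd[i] = rs1 * vs2[i] + vd[i];
  return vd;
}
```

With rs1 equal to 0, the case that triggered the ICE, vmadd reduces to a copy of vs2 and vmacc leaves vd unchanged, so a zero scalar is a meaningful operand that the patterns should accept.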

gcc/ChangeLog:

* config/riscv/vector.md: Allow scalar operand to be 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-7.c: New test.
* gcc.target/riscv/rvv/base/bug-8.c: New test.
---
 gcc/config/riscv/vector.md| 80 +--
 .../gcc.target/riscv/rvv/base/bug-7.c | 26 ++
 .../gcc.target/riscv/rvv/base/bug-8.c | 26 ++
 3 files changed, 92 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-8.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fb625f611d5..20420b74964 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5331,16 +5331,16 @@ (define_insn "*pred_madd_scalar"
  (plus:V_VLSI
(mult:V_VLSI
  (vec_duplicate:V_VLSI
-   (match_operand: 2 "register_operand" "  r,   r,  r,   r"))
+   (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  rJ"))
  (match_operand:V_VLSI 3 "register_operand"  "  0,  vr,  0,  
vr"))
(match_operand:V_VLSI 4 "register_operand"" vr,  vr, vr,  
vr"))
  (match_dup 3)))]
   "TARGET_VECTOR"
   "@
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%z2,%4%p1
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%z2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5363,16 +5363,16 @@ (define_insn "*pred_macc_scalar"
  (plus:V_VLSI
(mult:V_VLSI
  (vec_duplicate:V_VLSI
-   (match_operand: 2 "register_operand" "  r,   r,  r,   r"))
+   (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  rJ"))
  (match_operand:V_VLSI 3 "register_operand"  " vr,  vr, vr,  
vr"))
(match_operand:V_VLSI 4 "register_operand""  0,  vr,  0,  
vr"))
  (match_dup 4)))]
   "TARGET_VECTOR"
   "@
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmacc.vx\t%0,%z2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%z2,%3%p1
+   vmacc.vx\t%0,%z2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%z2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5431,16 +5431,16 @@ (define_insn "*pred_madd_extended_scalar"
(mult:V_VLSI_D
  (vec_duplicate:V_VLSI_D
(sign_extend:
- (match_operand: 2 "register_operand" "  r,   r,  r,   
r")))
+ (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  
rJ")))
  (match_operand:V_VLSI_D 3 "register_operand" "  0,  vr,  
0,  vr"))
(match_operand:V_VLSI_D 4 "register_operand"   " vr,  vr, 
vr,  vr"))
  (match_dup 3)))]
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m2r.v\t%0,%z2\;vmadd.vx\t%0,%z2,%4%p1
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m2r.v\t%0,%z2\;vmadd.vx\t%0,%z2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5464,16 +5464,16 @@ (define_insn "*pred_macc_extended_scalar"
(mult:V_VLSI_D
  (vec_duplicate:V_VLSI_D
(sign_extend:
- (match_operand: 2 "register_operand" "  r,   r,  r,   
r")))
+ (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  
rJ")))
  (match_operand:V_VLSI_D 3 "register_operand" " vr,  vr, 
vr,  vr"))
(match_operand:V_VLSI_D 4 "register_operand"   "  0,  vr,  
0,  vr"))
  (match_dup 4)))]
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmacc.vx\t%0,%z2,%3

Re: [PATCH] c++: improve diagnostic of 'return's in coroutines

2024-08-07 Thread Jason Merrill

On 8/7/24 9:00 PM, Arsen Arsenović wrote:

Jason Merrill  writes:


On 8/7/24 7:31 PM, Arsen Arsenović wrote:

Enlargening the function-specific data block is not great.


Indeed, I think it would be better to search DECL_SAVED_TREE for a RETURN_STMT
once we've decided to give an error.


The trouble with that is that finish_return_stmt currently uses
input_location as the location for the entire return expr, so the
location ends up being after the entire return value.

I've hacked in a way to provide a different location to
finish_return_stmt, when applying it like below, the produced result is
the first result in the original email:

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6cfe42f3bdd6..44b45f16b026 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -14965,12 +14965,15 @@ cp_parser_jump_statement (cp_parser* parser, tree &std_attrs)
  
  	/* Build the return-statement, check co-return first, since type
  	   deduction is not valid there.  */
+   auto l = make_location (token->location,
+   token->location,
+   input_location);
if (keyword == RID_CO_RETURN)
 statement = finish_co_return_stmt (token->location, expr);
else if (FNDECL_USED_AUTO (current_function_decl) && in_discarded_stmt)
 /* Don't deduce from a discarded return statement.  */;
else
-statement = finish_return_stmt (expr);
+statement = finish_return_stmt (expr, l);
/* Look for the final `;'.  */
cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON);
}

... without this change (so, using input_location), the result is:

test.cc:38:11: error: a ‘return’ statement is not allowed in coroutine; did you mean ‘co_return’?
38 |   return {};
   |   ^

... which is not the best.  That's the change I'm referring to in the
original post that I haven't run the testsuite on.  Changing that
location allows for simply searching DECL_SAVED_TREE (fndecl), though,
and getting a good location out of it.


Sounds good.

Jason



[PATCH v2] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread Feng Wang
This patch adds the vector rotate shift pattern for auto-vect.
With this patch, the scalar rotate shift can be automatically
vectorized into vector rotate shift.
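
For readers unfamiliar with the idiom, the kind of scalar loop such a pattern targets can be sketched as below. This is only an illustration, not code from the patch: the function name rotl32_array and the masking of the shift count are assumptions made here so that the sketch stays well defined for a count of zero.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the patch: rotate every element of ARR
// left by the matching entry of SHIFTS.  Both shift counts are masked to
// 0..31 so that neither shift is ever done by the full type width.
void
rotl32_array (std::uint32_t *arr, const std::uint32_t *shifts, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    {
      std::uint32_t s = shifts[i] & 31;
      arr[i] = (arr[i] << s) | (arr[i] >> ((32 - s) & 31));
    }
}
```

Whether a given compiler recognizes this loop as a vector rotate depends on the target and options, so that is not asserted here.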

gcc/ChangeLog:

* config/riscv/autovec.md (v3):
Add new define_expand pattern for vector rotate shift.
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.

Signed-off-by: Feng Wang 
---
 gcc/config/riscv/autovec.md   | 16 
 .../riscv/rvv/autovec/binop/vrolr-1.c |  9 ++
 .../riscv/rvv/autovec/binop/vrolr-run.c   | 88 +++
 .../riscv/rvv/autovec/binop/vrolr-template.h  | 29 ++
 4 files changed, 142 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-template.h

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0423d7bee13..e6649bf3f75 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2764,3 +2764,19 @@
 operands[2] = const0_rtx;
   }
 )
+
+;; -
+;; - vrol.vv vror.vv
+;; -
+(define_expand "v3"
+  [(set (match_operand:VI 0 "register_operand")
+(bitmanip_rotate:VI
+ (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "register_operand")))]
+  "TARGET_ZVBB || TARGET_ZVKB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (, mode),
+  riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
new file mode 100644
index 000..55dac27697c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vrolr-template.h"
+
+/* { dg-final { scan-assembler-times {\tvrol\.vv} 4 } } */
+/* { dg-final { scan-assembler-times {\tvror\.vv} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
new file mode 100644
index 000..b659a0804f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
@@ -0,0 +1,88 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define ARRAY_SIZE 512
+
+#define CIRCULAR_LEFT_SHIFT_ARRAY(arr, shifts, bit_size, size) \
+for (int i = 0; i < size; i++) { \
+(arr)[i] = (((arr)[i] << (shifts)[i]) | ((arr)[i] >> (bit_size - (shifts)[i]))); \
+}
+
+#define CIRCULAR_RIGHT_SHIFT_ARRAY(arr, shifts, bit_size, size) \
+for (int i = 0; i < size; i++) { \
+(arr)[i] = (((arr)[i] >> (shifts)[i]) | ((arr)[i] << (bit_size - (shifts)[i]))); \
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results8(
+uint8_t *result_left, uint8_t *result_right,
+int bit_size, uint8_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i]) | (i >> (bit_size - shift_values[i])));
+assert(result_right[i] == (i >> shift_values[i]) | (i << (bit_size - shift_values[i])));
+}
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results16(
+uint16_t *result_left, uint16_t *result_right,
+int bit_size, uint16_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i]) | (i >> (bit_size - shift_values[i])));
+assert(result_right[i] == (i >> shift_values[i]) | (i << (bit_size - shift_values[i])));
+}
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results32(
+uint32_t *result_left, uint32_t *result_right,
+int bit_size, uint32_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i]) | (i >> (bit_size - shift_values[i])));
+assert(result_right[i] == (i >> shift_values[i]) | (i << (bit_size - shift_values[i])));
+}
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results64(
+uint64_t *result_left, uint64_t *result_right,
+int bit_size, uint64_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] 

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Jₑₙₛ Gustedt
Hello Alejandro,

On Thu, 8 Aug 2024 00:44:02 +0200, Alejandro Colomar wrote:

> +Its syntax is similar to @code{sizeof}.

For my curiosity, do you also make the same distinction that with
expressions you may omit the parenthesis?

I wouldn't be sure that we should continue that distinction from
`sizeof`. Also that prefix variant would be difficult to wrap in a
`lengthof` macro (without underscores) as we would probably like to
have it in the end.

Thanks
Jₑₙₛ


-- 
:: ICube :: deputy director ::
:: Université de Strasbourg :: ICPS ::
:: INRIA antenne de Strasbourg :: Camus ::
::  ☎ +33 368854536 ::
:: https://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




[PATCH v1 1/2] LoongArch: Drop vcond{,u} expanders.

2024-08-07 Thread Lulu Cheng
Optabs vcond{,u} will be removed for GCC 15.  Since regtest shows no
fallout, drop the expanders now.

gcc/ChangeLog:

PR target/114189
* config/loongarch/lasx.md (vcondu): Delete.
(vcond): Likewise.
* config/loongarch/lsx.md (vcondu): Likewise.
(vcond): Likewise.
---
 gcc/config/loongarch/lasx.md | 37 
 gcc/config/loongarch/lsx.md  | 31 --
 2 files changed, 68 deletions(-)

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index 7bd61f8ed5b..4087c4b5349 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -165,9 +165,6 @@ (define_c_enum "unspec" [
 ;; All vector modes with 256 bits.
 (define_mode_iterator LASX [V4DF V8SF V4DI V8SI V16HI V32QI])
 
-;; Same as LASX.  Used by vcond to iterate two modes.
-(define_mode_iterator LASX_2 [V4DF V8SF V4DI V8SI V16HI V32QI])
-
 ;; Only used for splitting insert_d and copy_{u,s}.d.
 (define_mode_iterator LASX_D [V4DI V4DF])
 
@@ -762,40 +759,6 @@ (define_expand "vec_perm"
DONE;
 })
 
-;; FIXME: 256??
-(define_expand "vcondu"
-  [(match_operand:LASX 0 "register_operand")
-   (match_operand:LASX 1 "reg_or_m1_operand")
-   (match_operand:LASX 2 "reg_or_0_operand")
-   (match_operator 3 ""
-[(match_operand:ILASX 4 "register_operand")
- (match_operand:ILASX 5 "register_operand")])]
-  "ISA_HAS_LASX
-   && (GET_MODE_NUNITS (mode)
-   == GET_MODE_NUNITS (mode))"
-{
-  loongarch_expand_vec_cond_expr (mode, mode,
- operands);
-  DONE;
-})
-
-;; FIXME: 256??
-(define_expand "vcond"
-  [(match_operand:LASX 0 "register_operand")
-   (match_operand:LASX 1 "reg_or_m1_operand")
-   (match_operand:LASX 2 "reg_or_0_operand")
-   (match_operator 3 ""
- [(match_operand:LASX_2 4 "register_operand")
-  (match_operand:LASX_2 5 "register_operand")])]
-  "ISA_HAS_LASX
-   && (GET_MODE_NUNITS (mode)
-   == GET_MODE_NUNITS (mode))"
-{
-  loongarch_expand_vec_cond_expr (mode, mode,
- operands);
-  DONE;
-})
-
 ;; Same as vcond_
 (define_expand "vcond_mask_"
   [(match_operand:LASX 0 "register_operand")
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 454cda47876..222a5afe5b2 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -186,9 +186,6 @@ (define_mode_attr VD2MODE
 ;; All vector modes with 128 bits.
 (define_mode_iterator LSX  [V2DF V4SF V2DI V4SI V8HI V16QI])
 
-;; Same as LSX.  Used by vcond to iterate two modes.
-(define_mode_iterator LSX_2[V2DF V4SF V2DI V4SI V8HI V16QI])
-
 ;; Only used for vilvh and splitting insert_d and copy_{u,s}.d.
 (define_mode_iterator LSX_D[V2DI V2DF])
 
@@ -533,34 +530,6 @@ (define_expand "vec_cmpu"
   DONE;
 })
 
-(define_expand "vcondu"
-  [(match_operand:LSX 0 "register_operand")
-   (match_operand:LSX 1 "reg_or_m1_operand")
-   (match_operand:LSX 2 "reg_or_0_operand")
-   (match_operator 3 ""
- [(match_operand:ILSX 4 "register_operand")
-  (match_operand:ILSX 5 "register_operand")])]
-  "ISA_HAS_LSX
-   && (GET_MODE_NUNITS (mode) == GET_MODE_NUNITS (mode))"
-{
-  loongarch_expand_vec_cond_expr (mode, mode, operands);
-  DONE;
-})
-
-(define_expand "vcond"
-  [(match_operand:LSX 0 "register_operand")
-   (match_operand:LSX 1 "reg_or_m1_operand")
-   (match_operand:LSX 2 "reg_or_0_operand")
-   (match_operator 3 ""
- [(match_operand:LSX_2 4 "register_operand")
-  (match_operand:LSX_2 5 "register_operand")])]
-  "ISA_HAS_LSX
-   && (GET_MODE_NUNITS (mode) == GET_MODE_NUNITS (mode))"
-{
-  loongarch_expand_vec_cond_expr (mode, mode, operands);
-  DONE;
-})
-
 (define_expand "vcond_mask_"
   [(match_operand:LSX 0 "register_operand")
(match_operand:LSX 1 "reg_or_m1_operand")
-- 
2.39.3



[PATCH v1 2/2] LoongArch: Provide ashr lshr and ashl RTL pattern for vectors.

2024-08-07 Thread Lulu Cheng
We support vashr, vlshr and vashl.  However, r15-1638 added support for
optimizing x < 0 ? -1 : 0 into (signed) x >> 31 and x < 0 ? 1 : 0 into
(unsigned) x >> 31.  To support this optimization, the vector ashr, lshr
and ashl patterns need to be implemented.
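
As a scalar illustration of the two rewrites mentioned above (a sketch only, not code from the patch; the names sign_mask and sign_bit are made up here), the equivalence for a 32-bit element looks like this.  The signed case assumes an arithmetic right shift, which GCC implements and which C++20 guarantees.

```cpp
#include <cstdint>

// Hypothetical helper: same value as x < 0 ? -1 : 0, assuming the signed
// right shift is arithmetic (copies the sign bit into every position).
std::int32_t
sign_mask (std::int32_t x)
{
  return x >> 31;
}

// Hypothetical helper: same value as x < 0 ? 1 : 0, using a logical
// (unsigned) right shift that leaves only the sign bit.
std::uint32_t
sign_bit (std::int32_t x)
{
  return static_cast<std::uint32_t> (x) >> 31;
}
```

The vector patterns in the patch are the per-element counterparts of these shifts.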

gcc/ChangeLog:

* config/loongarch/loongarch.md (insn): Added rotatert rotr pairs.
* config/loongarch/simd.md (rotr3): Remove to ...
(3): This.

gcc/testsuite/ChangeLog:

* g++.target/loongarch/vect-ashr-lshr.C: New test.
---
 gcc/config/loongarch/loongarch.md |   1 +
 gcc/config/loongarch/simd.md  |  13 +-
 .../g++.target/loongarch/vect-ashr-lshr.C | 147 ++
 3 files changed, 155 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/vect-ashr-lshr.C

diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index ee0310f2bd6..1f105cbf891 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -559,6 +559,7 @@ (define_code_attr optab [(ashift "ashl")
 (define_code_attr insn [(ashift "sll")
(ashiftrt "sra")
(lshiftrt "srl")
+   (rotatert "rotr")
(ior "or")
(xor "xor")
(and "and")
diff --git a/gcc/config/loongarch/simd.md b/gcc/config/loongarch/simd.md
index 00ff2823a4e..45ea114220e 100644
--- a/gcc/config/loongarch/simd.md
+++ b/gcc/config/loongarch/simd.md
@@ -306,14 +306,15 @@ (define_expand "rotl3"
 operands[4] = gen_reg_rtx (mode);
   });
 
-;; vrotri.{b/h/w/d}
+;; v{rotr/sll/sra/srl}i.{b/h/w/d}
 
-(define_insn "rotr3"
+(define_insn "3"
   [(set (match_operand:IVEC 0 "register_operand" "=f")
-   (rotatert:IVEC (match_operand:IVEC 1 "register_operand" "f")
-  (match_operand:SI 2 "const__operand")))]
-  ""
-  "vrotri.\t%0,%1,%2";
+   (shift_w:IVEC
+ (match_operand:IVEC 1 "register_operand" "f")
+ (match_operand:SI 2 "const__operand")))]
+  "ISA_HAS_LSX"
+  "vi.\t%0,%1,%2"
   [(set_attr "type" "simd_int_arith")
(set_attr "mode" "")])
 
diff --git a/gcc/testsuite/g++.target/loongarch/vect-ashr-lshr.C b/gcc/testsuite/g++.target/loongarch/vect-ashr-lshr.C
new file mode 100644
index 000..bcef985fae2
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/vect-ashr-lshr.C
@@ -0,0 +1,147 @@
+/* { dg-do compile } */
+/* { dg-options "-mlasx -O2" } */
+/* { dg-final { scan-assembler-times "vsrli.b" 2 } } */
+/* { dg-final { scan-assembler-times "vsrli.h" 2 } } */
+/* { dg-final { scan-assembler-times "vsrli.w" 2 } } */
+/* { dg-final { scan-assembler-times "vsrli.d" 2 } } */
+/* { dg-final { scan-assembler-times "vsrai.b" 2 } } */
+/* { dg-final { scan-assembler-times "vsrai.h" 2 } } */
+/* { dg-final { scan-assembler-times "vsrai.w" 2 } } */
+/* { dg-final { scan-assembler-times "vsrai.d" 2 } } */
+
+typedef signed char v16qi __attribute__((vector_size(16)));
+typedef signed char v32qi __attribute__((vector_size(32)));
+typedef short v8hi __attribute__((vector_size(16)));
+typedef short v16hi __attribute__((vector_size(32)));
+typedef int v4si __attribute__((vector_size(16)));
+typedef int v8si __attribute__((vector_size(32)));
+typedef long long v2di __attribute__((vector_size(16)));
+typedef long long v4di __attribute__((vector_size(32)));
+
+v16qi
+foo (v16qi a)
+{
+  v16qi const1_op = __extension__(v16qi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  v16qi const0_op = __extension__(v16qi){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v32qi
+foo2 (v32qi a)
+{
+  v32qi const1_op = __extension__(v32qi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  v32qi const0_op = __extension__(v32qi){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v8hi
+foo3 (v8hi a)
+{
+  v8hi const1_op = __extension__(v8hi){1,1,1,1,1,1,1,1};
+  v8hi const0_op = __extension__(v8hi){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v16hi
+foo4 (v16hi a)
+{
+  v16hi const1_op = __extension__(v16hi){1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1};
+  v16hi const0_op = __extension__(v16hi){0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4si
+foo5 (v4si a)
+{
+  v4si const1_op = __extension__(v4si){1,1,1,1};
+  v4si const0_op = __extension__(v4si){0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v8si
+foo6 (v8si a)
+{
+  v8si const1_op = __extension__(v8si){1,1,1,1,1,1,1,1};
+  v8si const0_op = __extension__(v8si){0,0,0,0,0,0,0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v2di
+foo7 (v2di a)
+{
+  v2di const1_op = __extension__(v2di){1,1};
+  v2di const0_op = __extension__(v2di){0,0};
+  return a < const0_op ? const1_op : const0_op;
+}
+
+v4di
+foo8 (v4di a)
+{
+  v4di const1_op = __extension__(v4di){1,1,1,1};
+  v4di const0_op = __extension__(v4di

Re: [wwwdocs] gcc-15: Mention c++ header dependency changes () in porting_to.html

2024-08-07 Thread Filip Kastl
On Tue 2024-08-06 17:00:24, Gerald Pfeifer wrote:
> > +
> > +The following headers are used less widely in libstdc++ and may need to
> > +be included explicitly when compiling with GCC 15:
> > +
> > + 
> > +  (for std::int8_t, std::int32_t etc.)
> > +
> 
> The text reads "headers", alas there only appears to be one right now?
> So "header is" (singular)?

Nice catch.  I didn't notice this plural/singular mismatch.  Hmm, I could do
this:

Some C++ Standard Library headers have been changed to no longer include
other headers that were being used internally by the library.
As such, C++ programs that used standard library components without
including the right headers will no longer compile.


In particular, the following header is used less widely in libstdc++ and may
need to be included explicitly when compiling with GCC 15:


 
  (for std::int8_t, std::int32_t etc.)



Here I only change the second paragraph.  The logic behind this is that the
first paragraph introduces the problem, therefore speaks more generally and
uses plural.  Then the second paragraph speaks about the particular header in
question and uses singular.

Or I could do this:

Some C++ Standard Library headers have been changed to no longer include the
/ header that was
being used internally by the library. As such, C++ programs that used standard
library components without including  where needed
will no longer compile.




Any opinion on these two options?

Cheers,
Filip Kastl


Re: [PATCH] tree-optimization/116166 - forward jump-threading going wild

2024-08-07 Thread Richard Biener
On Tue, 6 Aug 2024, Andrew MacLeod wrote:

> 
> On 8/6/24 09:12, Richard Biener wrote:
> > Currently the forward threader isn't limited as to the search space
> > it explores and with it now using path-ranger for simplifying
> > conditions it runs into it became pretty slow for degenerate cases
> > like compiling insn-emit.cc for RISC-V esp. when compiling for
> > a host with LOGICAL_OP_NON_SHORT_CIRCUIT disabled.
> >
> > The following makes the forward threader honor the search space
> > limit I introduced for the backward threader.  This reduces
> > compile-time from minutes to seconds for the testcase in PR116166.
> >
> > Note this wasn't necessary before we had ranger but with ranger
> > the work we do is quadratic in the length of the threading path
> > we build up (the same is true for the backwards threader).
> 
> There's probably some work that can be done in the path processing space using
> the new gori_on_edge interface I introduced for the fast VRP pass.
> 
> // These APIs are used to query GORI if there are ranges generated on an edge.
> // GORI_ON_EDGE is used to get all the ranges at once (returned in an
> // ssa_cache structure).
> // Fill ssa-cache R with any outgoing ranges on edge E, using QUERY.
> bool gori_on_edge (class ssa_cache &r, edge e, range_query *query = NULL);
> 
> With this, the threader and path calculator can get and collect all the
> outgoing ranges for a block in linear time and just keep them.. and decide
> what it wants to use.   I suspect for really large CFGs, we'd want to
> substitute an alternative ssa_cache implementation to something like the
> sbr_sparse_bitmap class ranger's  cache uses which compresses the size of the
> vector so it isn't a vector over all the ssa-names, and at the same time limit
> it to  a max of 14 outgoing ranges.
> 
> no one has had any time to investigate that  yet.

The thing that path-ranger handles in addition to what range_on_edge
gives you is CFG merges where knowing the path into it improves
ranges downstream.  Basically path-ranger improves the range of PHIs
based on known path taken.  Whenever the decision which edge(s) of
a PHI are taken changes we'd have to wipe cached information.

In PR114855 the gori compute itself shows scalability issues
(I've identified the dominator walk so far but it might not be
the full story).

> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> > OK if that succeeds?
> OK with me.

Thanks, pushed and queued for backport.

Richard.

Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Martin Uecker
Am Mittwoch, dem 07.08.2024 um 01:12 +0200 schrieb Alejandro Colomar:


Hi Alex,

a couple of comments below.

> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -74,7 +74,17 @@ along with GCC; see the file COPYING3.  If not see
>  #include "bitmap.h"
>  #include "analyzer/analyzer-language.h"
>  #include "toplev.h"
> +
> +#define c_parser_sizeof_expression(parser)                                    \
> +(                                                                             \
> +  c_parser_sizeof_or_lengthof_expression (parser, RID_SIZEOF)                 \
> +)
>  
> +#define c_parser_lengthof_expression(parser)                                  \
> +(                                                                             \
> +  c_parser_sizeof_or_lengthof_expression (parser, RID_LENGTHOF)               \
> +)
> +

I suggest to avoid the macros.  I think the original function calls are
clear enough and this is then just another detour for somebody trying
to follow the code.  Or is there a reason I am missing?

...

> diff --git a/gcc/testsuite/gcc.dg/lengthof-compile.c b/gcc/testsuite/gcc.dg/lengthof-compile.c
> new file mode 100644
> index 000..6b44704ca7e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lengthof-compile.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wno-declaration-after-statement -Wno-pedantic -Wno-vla" } */
> +
> +extern int x[];
> +
> +void
> +incomplete (int p[])
> +{
> +  unsigned n;
> +
> +  n = __lengthof__ (x);  /* { dg-error "incomplete" } */
> +
> +  /* We want to support the following one in the future,
> + but for now it should fail.  */
> +  n = __lengthof__ (p);  /* { dg-error "invalid" } */
> +}
> +
> +void
> +fam (void)
> +{
> +  struct {
> +int x;
> +int fam[];
> +  } s;
> +  unsigned n;
> +
> +  n = __lengthof__ (s.fam); /* { dg-error "incomplete" } */
> +}
> +
> +void fix_fix (int i, char (*a)[3][5], int (*x)[__lengthof__ (*a)]);
> +void fix_var (int i, char (*a)[3][i], int (*x)[__lengthof__ (*a)]);
> +void fix_uns (int i, char (*a)[3][*], int (*x)[__lengthof__ (*a)]);


I would include a test that shows that when lengthof is applied to [*],
it remains formally non-constant.  For example, you could test with
-Wvla-parameter that the two declarations do not give a warning:

void foo(char (*a)[*], int x[*]);
void foo(char (*a)[*], int x[__lengthof__(*a)]);


(With  int (*x)[*]  we would run into the issue that we can not
distinguish zero arrays from unspecified ones, PR 98539)


> +
> +void
> +func (void)
> +{
> +  int  i3[3];
> +  int  i5[5];
> +  char c35[3][5];
> +
> +  fix_fix (5, &c35, &i3);
> +  fix_fix (5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
> +
> +  fix_var (5, &c35, &i3);
> +  fix_var (5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
> +
> +  fix_uns (5, &c35, &i3);
> +  fix_uns (5, &c35, &i5); /* { dg-error "incompatible-pointer-types" } */
> +}
> diff --git a/gcc/testsuite/gcc.dg/lengthof.c b/gcc/testsuite/gcc.dg/lengthof.c
> new file mode 100644
> index 000..38da5df52a5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/lengthof.c
> @@ -0,0 +1,127 @@
> +/* { dg-do run } */
> +/* { dg-options "-Wno-declaration-after-statement -Wno-pedantic -Wno-vla" } */
> +
> +#undef NDEBUG
> +#include 
> +
> +void
> +array (void)
> +{
> +  short a[7];
> +
> +  assert (__lengthof__ (a) == 7);
> +  assert (__lengthof__ (long [0]) == 0);
> +  assert (__lengthof__ (unsigned [99]) == 99);
> +}

Instead of using assert you can use

if (! ...) __builtin_abort();

to avoid the include in the testsuite.  
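
A minimal sketch of that style follows.  Since __lengthof__ only exists with the patch applied, the sketch uses a hypothetical ARRAY_LEN macro built on sizeof as a stand-in; the function name check_array_lengths is likewise made up for illustration.

```cpp
// Hypothetical stand-in for __lengthof__, which is only available with the
// patch applied: the classic sizeof-based element count.
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

// Returns 0 when every check passes and aborts otherwise.  No header is
// needed because __builtin_abort is a GCC built-in.
int
check_array_lengths ()
{
  short a[7];
  long b[99];

  if (!(ARRAY_LEN (a) == 7))
    __builtin_abort ();
  if (!(ARRAY_LEN (b) == 99))
    __builtin_abort ();
  return 0;
}
```

Unlike assert, such a check cannot be compiled out by NDEBUG, so the #undef at the top of the test would no longer be needed either.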

Otherwise it looks fine from my side.

Joseph needs to approve and may have more comments.

Martin






Re: [PATCH v2 1/1] RISC-V: Support BF16 interfaces in libgcc

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 11:13:51AM +0800, Xiao Zeng wrote:
> gcc/ChangeLog:
> 
>   * builtin-types.def (BT_COMPLEX_BFLOAT16): Support BF16 node.
>   (BT_BFLOAT16_PTR): Ditto.
>   (BT_FN_BFLOAT16): New.
>   (BT_FN_BFLOAT16_BFLOAT16): Ditto.
>   (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>   (BT_FN_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>   (BT_FN_INT_BFLOAT16): Ditto.
>   (BT_FN_LONG_BFLOAT16): Ditto.
>   (BT_FN_LONGLONG_BFLOAT16): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16PTR): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_INT): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_LONG): Ditto.
>   (BT_FN_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16_COMPLEX_BFLOAT16): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_BFLOAT16): Ditto.
>   (BT_FN_BFLOAT16_BFLOAT16_BFLOAT16_INTPTR): Ditto.
>   * builtins.cc (expand_builtin_classify_type): Support BF16.
>   (mathfn_built_in_2): Ditto.
>   (CASE_MATHFN_FLOATN): Ditto.
>   * builtins.def (DEF_GCC_FLOATN_NX_BUILTINS): Ditto.
>   (DEF_EXT_LIB_FLOATN_NX_BUILTINS): Ditto.
>   (BUILT_IN_NANSF16B): Added in general processing, redundant
>   is removed here.
>   (BUILT_IN_NEXTAFTERF16B): Ditto.
>   * fold-const-call.cc (fold_const_call): Ditto.
>   (fold_const_call_sss): Ditto.
>   * gencfn-macros.cc: Support BF16.
>   * match.pd: Like FP16, add optimization for BF16.
>   * tree.h (CASE_FLT_FN_FLOATN_NX): Support BF16.
> 
> gcc/c-family/ChangeLog:
> 
>   * c-cppbuiltin.cc (c_cpp_builtins): Modify suffix names to avoid
>   conflicts.
> 
> libgcc/ChangeLog:
> 
>   * Makefile.in: Add _mulbc3 and _divbc3.
>   * libgcc2.c (if): Ditto.
>   (defined): Ditto.
>   (MTYPE): Macros defined for BF16.
>   (CTYPE): Ditto.
>   (AMTYPE): Ditto.
>   (MODE): Ditto.
>   (CEXT): Ditto.
>   (NOTRUNC): Ditto.
>   * libgcc2.h (LIBGCC2_HAS_BF_MODE): Support BF16.
>   (__attribute__): Ditto.
>   (__divbc3): Add __divbc3 declaration.
>   (__mulbc3): Add __mulbc3 declaration.
> 
> Signed-off-by: Xiao Zeng 

This looks all wrong to me.

On all the other targets that already do support __bf16 type it is a storage
only type, so all arithmetic on it is expected to be done on float, not in
__bf16.
Therefore, those targets really don't want any of those other builtins,
there will be no libm support for it, and they don't want support in libgcc
either, that is just wasted code.
Intentionally the only builtins provided are the minimum required for proper
C++23 support, __builtin_nansf16b and __builtin_nextafterf16b, because
those need to be constexpr friendly and can't be dealt with by extending to
float and using float builtins.
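
To illustrate what "storage only" means in practice, here is a rough sketch of an addition that widens to float, computes there, and narrows again.  It does not use the real __bf16 type: bf16_bits and the three helpers are hypothetical names, the value is modelled as the upper 16 bits of an IEEE binary32, and the narrowing simply truncates, whereas a real conversion would normally round.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a storage-only bf16 value: the upper 16 bits
// of an IEEE binary32 bit pattern.
using bf16_bits = std::uint16_t;

// Widen to float by placing the 16 stored bits in the high half.
float
bf16_to_float (bf16_bits b)
{
  std::uint32_t u = static_cast<std::uint32_t> (b) << 16;
  float f;
  std::memcpy (&f, &u, sizeof f);
  return f;
}

// Narrow from float by dropping the low 16 bits.  This truncates; it is a
// simplification made for this sketch, not how a real conversion rounds.
bf16_bits
float_to_bf16_trunc (float f)
{
  std::uint32_t u;
  std::memcpy (&u, &f, sizeof u);
  return static_cast<bf16_bits> (u >> 16);
}

// All arithmetic happens in float; bf16 is only the storage format.
bf16_bits
bf16_add (bf16_bits a, bf16_bits b)
{
  return float_to_bf16_trunc (bf16_to_float (a) + bf16_to_float (b));
}
```

With this model no bf16-specific math routine is needed for ordinary arithmetic, which is the point made above about not wanting extra builtins or libgcc support on such targets.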

So, if riscv wants something different (will there be e.g. any libm
implementation with all the __bf16 APIs though?), it should ask for it some way
(target hook or whatever) and only in that case it should enable the other
builtins, libgcc APIs etc.

Jakub



Re: [PATCH] vect: Fix vect_reduction_def check for odd/even widen mult [PR116142]

2024-08-07 Thread Richard Biener
On Wed, 7 Aug 2024, Xi Ruoyao wrote:

> The check was implemented incorrectly, so vec_widen_smult_{even,odd}_M
> was never used.  This is not good for targets with native even/odd
> widening multiplication but not lo/hi multiplication.
> 
> The fix is actually developed by Richard Biener.

OK.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>   PR tree-optimization/116142
>   * tree-vect-stmts.cc (supportable_widening_operation): Remove an
>   redundant and incorrect vect_reduction_def check, and fix the
>   operand of another vect_reduction_def check.
> 
> gcc/testsuite/ChangeLog:
>   PR tree-optimization/116142
>   * gcc.target/i386/pr116142.c: New test.
> ---
> 
> Bootstrapped and regtested on x86_64-linux-gnu.  Ok for trunk?
> 
>  gcc/testsuite/gcc.target/i386/pr116142.c | 18 ++
>  gcc/tree-vect-stmts.cc   |  3 +--
>  2 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr116142.c
> 
> diff --git a/gcc/testsuite/gcc.target/i386/pr116142.c 
> b/gcc/testsuite/gcc.target/i386/pr116142.c
> new file mode 100644
> index 000..d288a50b237
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr116142.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512f -fdump-tree-optimized" } */
> +/* { dg-final { scan-tree-dump "WIDEN_MULT_EVEN_EXPR" "optimized" } } */
> +/* { dg-final { scan-tree-dump "WIDEN_MULT_ODD_EXPR" "optimized" } } */
> +
> +typedef __INT32_TYPE__ i32;
> +typedef __INT64_TYPE__ i64;
> +
> +i32 x[16], y[16];
> +
> +i64
> +test (void)
> +{
> +  i64 ret = 0;
> +  for (int i = 0; i < 16; i++)
> +ret ^= (i64) x[i] * y[i];
> +  return ret;
> +}
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index 20cae83e820..385e63163c2 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -14179,7 +14179,6 @@ supportable_widening_operation (vec_info *vinfo,
>are properly set up for the caller.  If we fail, we'll continue with
>a VEC_WIDEN_MULT_LO/HI_EXPR check.  */
>if (vect_loop
> -   && STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
> && !nested_in_vect_loop_p (vect_loop, stmt_info)
> && supportable_widening_operation (vinfo, VEC_WIDEN_MULT_EVEN_EXPR,
>stmt_info, vectype_out,
> @@ -14192,7 +14191,7 @@ supportable_widening_operation (vec_info *vinfo,
>   same operation.  One such an example is s += a * b, where 
> elements
>   in a and b cannot be reordered.  Here we check if the vector 
> defined
>   by STMT is only directly used in the reduction statement.  */
> -   tree lhs = gimple_assign_lhs (stmt_info->stmt);
> +   tree lhs = gimple_assign_lhs (vect_orig_stmt (stmt_info)->stmt);
> stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
> if (use_stmt_info
> && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [RESEND PATCH v5 1/3] ifcvt: handle sequences that clobber flags in noce_convert_multiple_sets

2024-08-07 Thread Manolis Tsamis
On Wed, Aug 7, 2024 at 8:57 AM Sam James  wrote:
>
> Philipp Tomsich  writes:
>
> Hi,
>
> > Sam, Jakub & Robin,
> >
> > We had an "OK for trunk" from Jeff for v4 (see
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656907.html) and
> > it has been two more weeks for this RESEND.
> > I'll push this by end of this week unless I hear otherwise.
>
> I'd ping Jeff for a quick re-review of v5 to make sure he's happy with
> the changes to address
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656908.html.
>
To answer Jeff's question, although that patch removes the checks for
particular SRC codes, it adds noce_operand_ok (src) which checks
side_effects_p (src) and may_trap_p (src). This handles the asm
related things and is in line with what the rest of ifcvt does, so
there was no change needed for that comment.

> I can't comment on ifcvt changes to say if they're OK or not otherwise,
> sorry.
>
> >
> > Thanks,
> > Philipp.
> >
> >
> > On Fri, 26 Jul 2024 at 12:50, Sam James  wrote:
> >>
> >> Manolis Tsamis  writes:
> >>
> >> > This is an extension of what was done in PR106590.
> >>
> >> FWIW, I think that if a bug is worth mentioning in the commit message,
> >> it's worth tagging so the hooks pick it up (as you get a nice
> >> reverse-mapping then if anyone is looking at it and wondering if a
> >> follow-up occurred).
> >>
> >> CC'd Jakub too given he wrote that commit and maybe he wants to review.
> >>
> >> Fixed Robin's email in CC list too.
> >>
> >> > [...]
> >>
> >> thanks,
> >> sam


Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Tue, Aug 6, 2024 at 8:50 PM Andi Kleen  wrote:
>
> > -  s += 16;
> > +  v16qi data, t;
> > +  /* Unaligned load.  Reading beyond the final newline is safe, since
> > +  files.cc:read_file_guts pads the allocation.  */
>
> You need to change that function to use 32 byte padding as Jakub
> pointed out (I forgot that too)
>
> > +  data = *(const v16qi_u *)s;
> > +  /* Prevent propagation into pshufb and pcmp as memory operand.  */
> > +  __asm__ ("" : "+x" (data));
>
> It would probably make sense to file a PR on this separately,
> to eventually fix the compiler to not need such workarounds.
> Not sure how much difference it makes however.

This is probably to work around bugs in older compiler versions?  If
not I agree.

Otherwise the patch is OK.

Thanks,
Richard.

> -Andi


Re: [PATCH 2/5] range: Make range_op_table a true singleton class [PR116209]

2024-08-07 Thread Richard Biener
On Tue, Aug 6, 2024 at 11:29 PM Andrew Pinski  wrote:
>
> This is a small cleanup with respect to the range_op_table class.
> There should only ever be one instance of range_op_table so
> this adds a static member function which returns the instance.
> A few variables that are defined in range-op.cc should be local
> to the file so wrap them with an anonymous namespace.
> Also change operator_table into a reference that is initialized to
> the singleton.
>
> This has a small extra overhead at initialization time of the operator_table;
> could be improved if we used C++20's consteval. Since this happens only once,
> it should be ok.

Can you make it so with appropriate #if __cplusplus or __has_feature (consteval)
(or how that's done)?

>
> Bootstrapped and tested on x86_64-linux-gnu.
>
> PR tree-optimization/116209
> gcc/ChangeLog:
>
> * range-op.cc (op_equal, op_not_equal, op_lt, op_le, op_gt, op_ge,
> op_ident, op_cst, op_cast, op_plus, op_abs, op_minus, op_negate,
> op_mult, op_addr, op_bitwise_not, op_bitwise_xor, op_bitwise_and,
> op_bitwise_or, op_min, op_max, default_operator): Wrap with anonymous 
> namespace.
> (operator_table): Change to reference and initialize with 
> range_op_table::singleton.
> (range_op_table::singleton): New function.
> * range-op.h (range_op_table): New method, singleton.
> Make most functions private (rather than protected).
> Make ctor private.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/range-op.cc | 19 ++-
>  gcc/range-op.h  |  5 +++--
>  2 files changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/range-op.cc b/gcc/range-op.cc
> index c576f688221..56a014e99bc 100644
> --- a/gcc/range-op.cc
> +++ b/gcc/range-op.cc
> @@ -49,8 +49,9 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-ssa-ccp.h"
>  #include "range-op-mixed.h"
>
> -// Instantiate the operators which apply to multiple types here.
> +namespace {
>
> +// Instantiate the operators which apply to multiple types here.
>  operator_equal op_equal;
>  operator_not_equal op_not_equal;
>  operator_lt op_lt;
> @@ -74,7 +75,12 @@ operator_min op_min;
>  operator_max op_max;
>
>  // Instantaite a range operator table.
> -range_op_table operator_table;
> +range_op_table &operator_table = range_op_table::singleton();
> +
> +// Instantiate a default range operator for opcodes with no entry.
> +range_operator default_operator;
> +
> +}
>
>  // Invoke the initialization routines for each class of range.
>
> @@ -111,9 +117,12 @@ range_op_table::range_op_table ()
>set (MAX_EXPR, op_max);
>  }
>
> -// Instantiate a default range operator for opcodes with no entry.
> -
> -range_operator default_operator;
> +// Returns the singleton instance of the table.
> +range_op_table &range_op_table::singleton()
> +{
> +  static range_op_table single;
> +  return single;
> +}
>
>  // Create a default range_op_handler.
>
> diff --git a/gcc/range-op.h b/gcc/range-op.h
> index 8edf967a445..e4e11f89624 100644
> --- a/gcc/range-op.h
> +++ b/gcc/range-op.h
> @@ -391,13 +391,13 @@ extern void wi_set_zero_nonzero_bits (tree type,
>  class range_op_table final
>  {
>  public:
> -  range_op_table ();
>inline range_operator *operator[] (unsigned code)
>  {
>gcc_checking_assert (code < RANGE_OP_TABLE_SIZE);
>return m_range_tree[code];
>  }
> -protected:
> +  static range_op_table &singleton();
> +private:
>inline void set (unsigned code, range_operator &op)
>  {
>gcc_checking_assert (code < RANGE_OP_TABLE_SIZE);
> @@ -408,6 +408,7 @@ protected:
>void initialize_integral_ops ();
>void initialize_pointer_ops ();
>void initialize_float_ops ();
> +  range_op_table ();
>  };
>
>  #endif // GCC_RANGE_OP_H
> --
> 2.43.0
>
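For readers who want the pattern in isolation, here is a minimal sketch of the
same idea outside GCC.  The names toy_table, constructions and table_ref are
made up for illustration and are not part of the patch; only the shape
(private constructor, function-local static, file-local reference) follows
the quoted diff.

```cpp
#include <cassert>

// Toy table with a private constructor, so the only way to obtain an
// instance is through singleton ().
class toy_table final
{
public:
  static toy_table &singleton ();
  int operator[] (unsigned code) const
  {
    assert (code < size);
    return m_entries[code];
  }
  static int constructions;  // Counts how often the constructor ran.
private:
  static constexpr unsigned size = 4;
  toy_table ();
  int m_entries[size];
};

int toy_table::constructions = 0;

toy_table::toy_table ()
{
  ++constructions;
  for (unsigned i = 0; i < size; ++i)
    m_entries[i] = static_cast<int> (i) * 10;
}

// The function-local static is constructed on the first call only.
toy_table &
toy_table::singleton ()
{
  static toy_table single;
  return single;
}

namespace {
// File-local reference bound to the singleton, mirroring the way the
// patch turns operator_table into a reference.
toy_table &table_ref = toy_table::singleton ();
}
```

Every call to singleton () hands back the same object, and the constructor
runs once no matter how many references are bound to it.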


Re: [PATCH 2/5] range: Make range_op_table a true singleton class [PR116209]

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 09:40:06AM +0200, Richard Biener wrote:
> On Tue, Aug 6, 2024 at 11:29 PM Andrew Pinski  
> wrote:
> >
> > This is a small cleanup with respect to the range_op_table class.
> > There should only ever be one instance of range_op_table so
> > this adds a static member function which returns the instance.
> > A few variables that are defined in range-op.cc should be local
> > to the file so wrap them with an anonymous namespace.
> > Also change operator_table into a reference that is initialized to
> > the singleton.
> >
> > This has a small extra overhead at initialization time of the
> > operator_table; could be improved if we used C++20's consteval.
> > Since this happens only once, it should be ok.
> 
> Can you make it so with appropriate #if __cplusplus or __has_feature 
> (consteval)
> (or how that's done)?

That would be
#if __cpp_consteval >= 201811L
unless you need the P2564R3 paper behavior (then it would be
#if __cpp_consteval >= 202211L
).
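
A minimal sketch of how such a guard might look, assuming a toy table rather
than the real range_op_table.  The macro TABLE_CONSTEVAL and the names
op_table, make_op_table and lookup are hypothetical; only the
__cpp_consteval test comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: use consteval where the compiler advertises it,
// otherwise fall back to constexpr (C++14 or later is assumed).
#if defined(__cpp_consteval) && __cpp_consteval >= 201811L
# define TABLE_CONSTEVAL consteval
#else
# define TABLE_CONSTEVAL constexpr
#endif

constexpr std::size_t table_size = 8;

// Toy stand-in for an operator table: entry i holds i * i.
struct op_table
{
  int entries[table_size] = {};
};

TABLE_CONSTEVAL op_table
make_op_table ()
{
  op_table t{};
  for (std::size_t i = 0; i < table_size; ++i)
    t.entries[i] = static_cast<int> (i * i);
  return t;
}

// The initializer is a constant expression in both configurations, so
// the table needs no run-time initialization.
constexpr op_table global_table = make_op_table ();

int
lookup (std::size_t code)
{
  assert (code < table_size);
  return global_table.entries[code];
}
```

With consteval the compiler is required to evaluate make_op_table at compile
time; with the constexpr fallback that still happens here because the result
initializes a constexpr variable.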

Jakub



Re: [PATCH 3/3] libcpp: add AVX2 helper

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 7:41 AM Alexander Monakov  wrote:
>
>
> On Tue, 6 Aug 2024, Alexander Monakov wrote:
>
> > --- a/libcpp/files.cc
> > +++ b/libcpp/files.cc
> [...]
> > +  pad = HAVE_AVX2 ? 32 : 16;
>
> This should have been
>
> #ifdef HAVE_AVX2
>   pad = 32;
> #else
>   pad = 16;
> #endif

OK with that change.
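
For context, a small sketch of why the #ifdef form is needed, assuming the
usual autoconf convention that HAVE_AVX2 is either defined to 1 or not
defined at all.  The helper names read_padding and padded_size are made up
for illustration and do not appear in libcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With HAVE_AVX2 undefined, `HAVE_AVX2 ? 32 : 16` would not compile,
// because HAVE_AVX2 would be an undeclared identifier.  The
// preprocessor conditional handles both cases.
std::size_t
read_padding ()
{
#ifdef HAVE_AVX2
  return 32;  // Room for one 32-byte vector load.
#else
  return 16;  // Room for one 16-byte vector load.
#endif
}

// Hypothetical use: size of a buffer that holds LEN bytes plus padding.
std::size_t
padded_size (std::size_t len)
{
  assert (len <= SIZE_MAX - read_padding ());
  return len + read_padding ();
}
```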

Did you think about an AVX512 version (possibly with 32 byte vectors)?
In case there's a more efficient variant of pshufb/pmovmskb available
there - possibly the load on the branch unit could be lessened by
using masking.

Thanks,
Richard.

> Alexander


Re: [PATCH] RISC-V: Minimal support for Zimop extension.

2024-08-07 Thread Nick Clifton

Hi Nelson,


> Sounds good to me, too.  Once I get the approval, I will backport them to
> binutils-2_43-branch :-)


Please could you ping me once you have done that.

I will make sure not to make the point release before receiving your message.

Cheers
  Nick




Pushed: [PATCH] vect: Fix vect_reduction_def check for odd/even widen mult [PR116142]

2024-08-07 Thread Xi Ruoyao
On Wed, 2024-08-07 at 07:01 +0100, Sam James wrote:
> Xi Ruoyao  writes:
> 
> > The check was implemented incorrectly, so
> > vec_widen_smult_{even,odd}_M
> > was never used.  This is not good for targets with native even/odd
> > widening multiplication but not lo/hi multiplication.
> > 
> > The fix is actually developed by Richard Biener.
> 
> Please use Co-authored-by from
> https://gcc.gnu.org/codingconventions.html#ChangeLogs.
> 
> > [...]
> 
> thanks,
> sam

Pushed https://gcc.gnu.org/r15-2791 with Co-authored-by added.

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [PATCH 3/3] libcpp: add AVX2 helper

2024-08-07 Thread Alexander Monakov


On Wed, 7 Aug 2024, Richard Biener wrote:

> OK with that change.
> 
> Did you think about an AVX512 version (possibly with 32 byte vectors)?
> In case there's a more efficient variant of pshufb/pmovmskb available
> there - possibly the load on the branch unit could be lessened by
> using masking.

Thanks for the idea; unfortunately I don't see any possible improvement.
It would trade pmovmskb-(test+jcc,fused) for ktest-jcc, so unless the
latencies are shorter it seems to be a wash. The only way to use fewer
branches seems to be employing longer vectors.

(in any case I don't have access to a capable CPU to see for myself)

Alexander


Re: [PATCH v5 0/3] c: Add __lengthof__ operator

2024-08-07 Thread david.brown
Hi,

The address david.br...@hesbynett.no is not bouncing.  It's my email address,
and I'm getting the emails in this discussion just fine.

However, I don't think I have anything to contribute here, so I don't object
to being removed from the discussion.  I am not a gcc developer, but as a
long-term user I occasionally post on the developer mailing list.  I don't
remember having posted about a potential __lengthof__ operator, but it is
certainly possible.

On behalf of all gcc users, thank you for your efforts in working to improve
gcc, no matter how this proposal turns out.

Mvh.,
David Brown
david.br...@hesbynett.no

Original message
From: Alejandro Colomar
Date: 07/08/2024 01:26 (GMT+01:00)
To: gcc-patches@gcc.gnu.org
Cc: Martin Uecker, Xavier Del Campo Romero, Joseph Myers, Gabriel Ravier,
    Jakub Jelinek, Kees Cook, Qing Zhao, Jens Gustedt, David Brown,
    Florian Weimer, Andreas Schwab
Subject: Re: [PATCH v5 0/3] c: Add __lengthof__ operator

 is bouncing.  FYI.  I've removed it.

On Wed, Aug 07, 2024 at 01:12:00AM GMT, Alejandro Colomar wrote:
> Hi!
>
> This is ready for review.
>
> v5:
>
> -  Add changelog entries.
> -  Wording fixes in commit messages.
> -  s/sizeof/lengthof/ in comment.
> -  CC += David, Florian, Andreas  [Qing]
> -  Docs: Remove some details about future directions.  [Qing]
> -  Docs: Add examples.  [Qing]
> -  Docs: Clarify when __lengthof__ evaluates as a constant expression
>    and when it evaluates as a run-time value.  [Joseph, Qing]
> -  Tests: Use several -Wno-* flags to turn off unwanted warnings.
>    [Martin, Joseph]
> -  Tests: Merge into the same commit that adds the feature.
> -  Tests: Fix style (whitespace).
>
> I won't paste the example program I used for development, since it's the
> same as in v4.  Check that cover letter if necessary.
>
> When reviewing, mind that some parts of the code have been blindly
> pasted from sizeof, and might not make much sense; I didn't fully
> understand some parts.  However, it seems to behave well, and more or
> less it makes sense.  Just be careful about it.
>
> At the bottom of this message is the range-diff against v4.
>
> Have a lovely night!
> Alex
>
>
> BTW, I've tested that there are no regressions:
>
>     alx@debian:~/src/gnu/gcc$ find len0 -type f
>     len0/host-x86_64-pc-linux-gnu/gcc/testsuite/gcc/gcc.sum
>     len0/host-x86_64-pc-linux-gnu/gcc/testsuite/gfortran/gfortran.sum
>     len0/host-x86_64-pc-linux-gnu/gcc/testsuite/objc/objc.sum
>     len0/host-x86_64-pc-linux-gnu/gcc/testsuite/g++/g++.sum
>     len0/x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
>     len0/x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
>     len0/x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
>     len0/x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
>     alx@debian:~/src/gnu/gcc$ find len1 -type f
>     len1/host-x86_64-pc-linux-gnu/gcc/testsuite/gcc/gcc.sum
>     len1/host-x86_64-pc-linux-gnu/gcc/testsuite/gfortran/gfortran.sum
>     len1/host-x86_64-pc-linux-gnu/gcc/testsuite/objc/objc.sum
>     len1/host-x86_64-pc-linux-gnu/gcc/testsuite/g++/g++.sum
>     len1/x86_64-pc-linux-gnu/libitm/testsuite/libitm.sum
>     len1/x86_64-pc-linux-gnu/libgomp/testsuite/libgomp.sum
>     len1/x86_64-pc-linux-gnu/libatomic/testsuite/libatomic.sum
>     len1/x86_64-pc-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum
>     alx@debian:~/src/gnu/gcc$ cat <(cd len0; find -type f) \
>       | while read f; do
>           diff -u "len0/$f" "len1/$f";
>       done;
>     --- len0/./host-x86_64-pc-linux-gnu/gcc/testsuite/gcc/gcc.sum   2024-08-06 22:22:44.514175252 +0200
>     +++ len1/./host-x86_64-pc-linux-gnu/gcc/testsuite/gcc/gcc.sum   2024-08-06 23:29:53.693730123 +0200
>     @@ -1,4 +1,4 @@
>     -Test run by alx on Tue Aug  6 19:28:53 2024
>     +Test run by alx on Tue Aug  6 22:49:12 2024
>      Native configuration is x86_64-pc-linux-gnu
>
>      === gcc tests ===
>     @@ -86504,6 +86504,15 @@
>      PASS: gcc.dg/large-size-array.c  (test for errors, line 19)
>      PASS: gcc.dg/large-size-array.c (test for excess errors)
>      UNSUPPORTED: gcc.dg/lazy-ptr-test.c
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 11)
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 15)
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 27)
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 42)
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 45)
>     +PASS: gcc.dg/lengthof-compile.c  (test for errors, line 48)
>     +PASS: gcc.dg/lengthof-compile.c (test for excess errors)
>     +PASS: gcc.dg/lengthof.c (test for excess errors)
>     +PASS: gcc.dg/lengthof.c execution test
>      PASS: gcc.dg/limits-width-1.c (test for excess errors)
>      PASS: gcc.dg/limits-width-2.c (test for excess errors)
>      PASS: gcc.dg/live-patching-1.c (test for excess errors)
>     @

[PATCH] Don't call clean_symbol_name in create_tmp_var_name [PR116219]

2024-08-07 Thread Jakub Jelinek
Hi!

SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
where the numbers in there are DECL_UIDs if there are unnamed
FIELD_DECLs etc.
Because -g0 vs. -g can cause differences between the exact DECL_UID
values (add bigger gaps in between them, corresponding decls should
still be ordered the same based on DECL_UID) we make sure such
decls have DECL_NAMELESS set and depending on exact options either don't
dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
there to D$.
Unfortunately in tons of places we then use get_name to grab either user
names or these SRA created names and pass that as argument to
create_tmp_var{,_name,_raw} to derive other artificial temporary names
from it.  Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
starting with
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
calls clean_symbol_name which replaces the $s in there with _s and thus
dump_fancy_name doesn't sanitize it anymore.
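
To illustrate the interaction, here is a toy model of the two steps.  It is
not GCC's implementation: sanitize_uids and clean_name are hypothetical
stand-ins that only follow the behaviour described above (D123456$ becomes
D$, and cleaning turns every $ into _).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Toy model of the dump-time sanitizing: drop the digits of every
// "D<digits>" component that starts the name or follows '$' and is
// followed by '$' or the end of the name.
std::string
sanitize_uids (const std::string &name)
{
  std::string out;
  std::size_t i = 0;
  while (i < name.size ())
    {
      bool at_component_start = (i == 0 || name[i - 1] == '$');
      if (name[i] == 'D' && at_component_start)
        {
          std::size_t j = i + 1;
          while (j < name.size ()
                 && std::isdigit (static_cast<unsigned char> (name[j])))
            j++;
          bool has_digits = j > i + 1;
          bool ends_component = (j == name.size () || name[j] == '$');
          if (has_digits && ends_component)
            {
              out += 'D';
              i = j;
              continue;
            }
        }
      out += name[i];
      i++;
    }
  // Sanitizing only ever removes characters.
  assert (out.size () <= name.size ());
  return out;
}

// Toy model of symbol cleaning: every character that is not a letter,
// digit or underscore becomes '_'.
std::string
clean_name (std::string name)
{
  for (char &c : name)
    if (!std::isalnum (static_cast<unsigned char> (c)) && c != '_')
      c = '_';
  return name;
}
```

Once the $ separators are gone, the sanitizer no longer recognizes the
D<digits> components, so the DECL_UID digits survive into the dump and can
differ between -g0 and -g.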

I don't see any discussion of that commit (originally to TM branch, later
merged) on the mailing list, but from
   DECL_NAME (new_decl)
 = create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
-  SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
+  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
snippet elsewhere in that commit it seems create_tmp_var_name was used at
that point also to determine function names of clones, so presumably the
clean_symbol_name at that point was to ensure the symbol could be emitted
into assembly, maybe in case DECL_NAME is something like C++ operators or
whatever could have there undesirable characters.

Anyway, we haven't done that for years; since GCC 4.5, clone_function_name
is used for such purposes.  It starts off with the DECL_ASSEMBLER_NAME of
the old function and appends a separator (chosen from the supported symbol
suffix separators) plus some suffix and/or number, so that part doesn't go
through create_tmp_var_name.

I don't see problems with having the $ and . etc. characters in the names
intended just to make dumps more readable, after all, we already are using
those in the SRA created names.  Those names shouldn't make it into the
assembly in any way, neither debug info nor assembly labels.

There is one theoretical case where the gimplifier promotes automatic
vars into TREE_STATIC ones, which can then appear in assembly.  Just in
case that happens to e.g. SRA created names that are regimplified later,
I've added code to ignore the name and force C.NNN if it is a DECL_NAMELESS
decl with problematic characters in the name.

Richi mentioned on IRC that the non-cleaned up names might make things
harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
the dumping for GIMPLE FE purposes that cleans those up (but at that point
it should also verify if some such cleaned up names don't collide with
others and somehow deal with those).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The -fcompare-debug failure on the testcase is gone, but the testcase
was huge and hard to reduce.

2024-08-06  Jakub Jelinek  

PR c++/116219
* gimple-expr.cc (remove_suffix): Formatting fixes.
(create_tmp_var_name): Don't call clean_symbol_name.
* gimplify.cc (gimplify_init_constructor): When promoting automatic
DECL_NAMELESS vars to static, only preserve their DECL_NAME if
it doesn't contain any characters clean_symbol_name replaces.

--- gcc/gimple-expr.cc.jj   2024-01-03 11:51:28.280776310 +0100
+++ gcc/gimple-expr.cc  2024-08-06 14:43:42.328673383 +0200
@@ -406,14 +406,12 @@ remove_suffix (char *name, int len)
 {
   int i;
 
-  for (i = 2;  i < 7 && len > i;  i++)
-{
-  if (name[len - i] == '.')
-   {
- name[len - i] = '\0';
- break;
-   }
-}
+  for (i = 2; i < 7 && len > i; i++)
+if (name[len - i] == '.')
+  {
+   name[len - i] = '\0';
+   break;
+  }
 }
 
 /* Create a new temporary name with PREFIX.  Return an identifier.  */
@@ -430,8 +428,6 @@ create_tmp_var_name (const char *prefix)
   char *preftmp = ASTRDUP (prefix);
 
   remove_suffix (preftmp, strlen (preftmp));
-  clean_symbol_name (preftmp);
-
   prefix = preftmp;
 }
 
--- gcc/gimplify.cc.jj  2024-08-05 13:04:53.903116091 +0200
+++ gcc/gimplify.cc 2024-08-06 15:27:40.404865291 +0200
@@ -5599,6 +5599,18 @@ gimplify_init_constructor (tree *expr_p,
 
DECL_INITIAL (object) = ctor;
TREE_STATIC (object) = 1;
+   if (DECL_NAME (object) && DECL_NAMELESS (object))
+ {
+   const char *name = get_name (object);
+   char *sname = ASTRDUP (name);
+   clean_symbol_name (sname);
+   /* If there are any undesirable characters in DECL_NAMELESS
+  name, just fall back to C.nnn name, we must ensure e.g.
+  SRA created names with DECL_UIDs don't 

Re: [PATCH] Support if conversion for switches

2024-08-07 Thread Richard Biener
On Tue, Aug 6, 2024 at 4:38 PM Andi Kleen  wrote:
>
> The gimple-if-to-switch pass converts if statements with
> multiple equal checks on the same value to a switch. This breaks
> vectorization which cannot handle switches.
>
> Teach the tree-if-conv pass used by the vectorizer to handle
> simple switch statements, like those created by if-to-switch earlier.
> These are switches that only have a single non default block,
> and no ranges. They are handled similar to if in if conversion.
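
As a source-level illustration of what handling such a switch "similar to
if" means, here is a minimal sketch.  It is not the GIMPLE transformation
itself; count_switch and count_predicated are made-up names, and the second
function only shows the branch-free form a vectorizer can work with.

```cpp
#include <cassert>
#include <cstddef>

// A switch with a single non-default block and no case ranges, in the
// style of f1 from the new test.
int
count_switch (const char *s, std::size_t n)
{
  assert (s != nullptr);
  int c = 0;
  for (std::size_t i = 0; i < n; i++)
    switch (s[i])
      {
      case ',':
      case '|':
        c++;
      }
  return c;
}

// The same loop with the switch replaced by a predicate: the case
// labels become an OR of equality tests and the block is executed
// under that predicate.
int
count_predicated (const char *s, std::size_t n)
{
  assert (s != nullptr);
  int c = 0;
  for (std::size_t i = 0; i < n; i++)
    {
      bool hit = (s[i] == ',') | (s[i] == '|');
      c += hit;
    }
  return c;
}
```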
>
> Some notes:
>
> In theory this handles switches with case ranges, but it seems
> for the simple "one target label" switch case that is supported
> here these are always optimized by the cfg passes to COND,
> so this case is latent.
>
> This makes the vect-bitfield-read-1-not test fail. The test
> checks for a bitfield analysis failing, but it actually
> relied on the ifcvt erroring out early because the test
> is using a switch. The if conversion still does not
> work because the switch is not in a form that this
> patch can handle, but it fails much later and the bitfield
> analysis succeeds, which makes the test fail. I marked
> it xfail because it doesn't seem to be testing what it wants
> to test.
>
> gcc/ChangeLog:
>
> PR tree-opt/115866
> * tree-if-conv.cc (if_convertible_switch_p): New function.
> (if_convertible_stmt_p): Check for switch.
> (get_loop_body_in_if_conv_order): Handle switch.
> (predicate_bbs): Likewise.
> (predicate_statements): Likewise.
> (remove_conditions_and_labels): Likewise.
> (ifcvt_split_critical_edges): Likewise.
> (ifcvt_local_dce): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
> * gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
> * gcc.dg/vect/vect-switch-search-line-fast.c: New test.
> * gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
> ---
>  .../gcc.dg/vect/vect-bitfield-read-1-not.c|   2 +-
>  .../gcc.dg/vect/vect-switch-ifcvt-1.c | 107 ++
>  .../gcc.dg/vect/vect-switch-ifcvt-2.c |  28 +
>  .../vect/vect-switch-search-line-fast.c   |  17 +++
>  gcc/tree-if-conv.cc   |  90 ++-
>  5 files changed, 238 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-switch-search-line-fast.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c 
> b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> index 0d91067ebb2..85f4de8464a 100644
> --- a/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> +++ b/gcc/testsuite/gcc.dg/vect/vect-bitfield-read-1-not.c
> @@ -55,6 +55,6 @@ int main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-not "Bitfield OK to lower." "ifcvt" } } */
> +/* { dg-final { scan-tree-dump-times "Bitfield OK to lower." 0 "ifcvt" { 
> xfail *-*-* } } } */
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c 
> b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
> new file mode 100644
> index 000..0b06d3c84a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-switch-ifcvt-1.c
> @@ -0,0 +1,107 @@
> +/* { dg-require-effective-target vect_int } */
> +
> +extern void abort (void);
> +
> +int
> +f1 (char *s)
> +{
> +  int c = 0;
> +  int i;
> +  for (i = 0; i < 64; i++)
> +{
> +  switch (*s)
> +   {
> +   case ',':
> +   case '|':
> + c++;
> +   }
> +  s++;
> +}
> +  return c;
> +}
> +
> +int
> +f2 (char *s)
> +{
> +  int c = 0;
> +  int i;
> +  for (i = 0; i < 64; i++)
> +{
> +  if (*s != '#')
> +   {
> + switch (*s)
> +   {
> +   case ',':
> +   case '|':
> + c++;
> +   }
> +   }
> +  s++;
> +}
> +  return c;
> +}
> +
> +int
> +f3 (char *s)
> +{
> +  int c = 0;
> +  int i;
> +  for (i = 0; i < 64; i++)
> +{
> +  if (*s != '#')
> +if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
> + c++;
> +  s++;
> +}
> +  return c;
> +}
> +
> +
> +int
> +f4 (char *s)
> +{
> +  int c = 0;
> +  int i;
> +  for (i = 0; i < 64; i++)
> +{
> +  if (*s == ',' || *s == '|' || *s == '@' || *s == '*')
> +   c++;
> +  s++;
> +}
> +  return c;
> +}
> +
> +#define CHECK(f, str, res) \
> +  __builtin_strcpy(buf, str); n = f(buf); if (n != res) abort();
> +
> +int
> +main ()
> +{
> +  int n;
> +  char buf[64];
> +
> +  CHECK (f1, ",,", 10);
> +  CHECK (f1, "||", 10);
> +  CHECK (f1, "aa", 0);
> +  CHECK (f1, "", 0);
> +  CHECK (f1, ",|,|xx", 4);
> +
> +  CHECK (f2, ",|,|xx", 4);
> +  CHECK (f2, ",|,|xx", 4);
> +  CHECK (f2, ",|,|xx", 4);
> +  CHECK (f2, ",|,|xx", 4);
> +
> +  CHECK (f3, ",|,|xx", 4);
> +  CHECK (f3, ",|,|xx", 4);
> +  CHECK 

Re: [PATCH 1/3] libcpp: configure: check for AVX2 instead of SSE4

2024-08-07 Thread Richard Biener
On Tue, Aug 6, 2024 at 6:19 PM Alexander Monakov  wrote:
>
> Upcoming patches first drop Binutils ISA support from SSE4.2 to SSSE3,
> then bump it to AVX2. Instead of fiddling with detection, just bump
> our configure check to AVX2 immediately: if by some accident somebody
> builds GCC without AVX2 support in the assembler, they will get the SSE2
> vectorized lexer, which is not too slow.

OK.

> libcpp/ChangeLog:
>
> * config.in: Regenerate.
> * configure: Regenerate.
> * configure.ac: Check for AVX2 instead of SSE4.2.
> * lex.cc: Adjust for changed config macro.
> ---
>  libcpp/config.in| 6 +++---
>  libcpp/configure| 4 ++--
>  libcpp/configure.ac | 6 +++---
>  libcpp/lex.cc   | 2 +-
>  4 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/libcpp/config.in b/libcpp/config.in
> index 253ef03a3d..a0ca9e4df4 100644
> --- a/libcpp/config.in
> +++ b/libcpp/config.in
> @@ -35,6 +35,9 @@
> */
>  #undef HAVE_ALLOCA_H
>
> +/* Define to 1 if you can assemble AVX2 insns. */
> +#undef HAVE_AVX2
> +
>  /* Define to 1 if you have the Mac OS X function
> CFLocaleCopyPreferredLanguages in the CoreFoundation framework. */
>  #undef HAVE_CFLOCALECOPYPREFERREDLANGUAGES
> @@ -210,9 +213,6 @@
>  /* Define to 1 if you have the `putc_unlocked' function. */
>  #undef HAVE_PUTC_UNLOCKED
>
> -/* Define to 1 if you can assemble SSE4 insns. */
> -#undef HAVE_SSE4
> -
>  /* Define to 1 if you have the  header file. */
>  #undef HAVE_STDDEF_H
>
> diff --git a/libcpp/configure b/libcpp/configure
> index 32d6aaa306..74af097620 100755
> --- a/libcpp/configure
> +++ b/libcpp/configure
> @@ -9140,14 +9140,14 @@ case $target in
>  int
>  main ()
>  {
> -asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))
> +asm ("vpshufb %ymm0, %ymm1, %ymm2")
>;
>return 0;
>  }
>  _ACEOF
>  if ac_fn_c_try_compile "$LINENO"; then :
>
> -$as_echo "#define HAVE_SSE4 1" >>confdefs.h
> +$as_echo "#define HAVE_AVX2 1" >>confdefs.h
>
>  fi
>  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
> diff --git a/libcpp/configure.ac b/libcpp/configure.ac
> index b883fec776..cfefb63552 100644
> --- a/libcpp/configure.ac
> +++ b/libcpp/configure.ac
> @@ -197,9 +197,9 @@ fi
>
>  case $target in
>i?86-* | x86_64-*)
> -AC_TRY_COMPILE([], [asm ("pcmpestri %0, %%xmm0, %%xmm1" : : "i"(0))],
> -  [AC_DEFINE([HAVE_SSE4], [1],
> -[Define to 1 if you can assemble SSE4 insns.])])
> +AC_TRY_COMPILE([], [asm ("vpshufb %ymm0, %ymm1, %ymm2")],
> +  [AC_DEFINE([HAVE_AVX2], [1],
> +[Define to 1 if you can assemble AVX2 insns.])])
>  esac
>
>  # Enable --enable-host-shared.
> diff --git a/libcpp/lex.cc b/libcpp/lex.cc
> index 1591dcdf15..fa9c03614c 100644
> --- a/libcpp/lex.cc
> +++ b/libcpp/lex.cc
> @@ -344,7 +344,7 @@ search_line_sse2 (const uchar *s, const uchar *end 
> ATTRIBUTE_UNUSED)
>return (const uchar *)p + found;
>  }
>
> -#ifdef HAVE_SSE4
> +#ifdef HAVE_AVX2
>  /* A version of the fast scanner using SSE 4.2 vectorized string insns.  */
>
>  static const uchar *
> --
> 2.44.0
>


Re: [PATCH 2/5] range: Make range_op_table a true singleton class [PR116209]

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 9:42 AM Jakub Jelinek  wrote:
>
> On Wed, Aug 07, 2024 at 09:40:06AM +0200, Richard Biener wrote:
> > On Tue, Aug 6, 2024 at 11:29 PM Andrew Pinski  
> > wrote:
> > >
> > > This is a small cleanup with respect to the range_op_table class.
> > > There should only ever be one instance of range_op_table so
> > > this adds a static member function which returns the instance.
> > > A few variables that are defined in range-op.cc should be local
> > > to the file so wrap them with an anonymous namespace.
> > > Also change operator_table into a reference that is initialized to
> > > the singleton.
> > >
> > > This has a small extra overhead at initialization time of the
> > > operator_table; could be improved if we used C++20's consteval.
> > > Since this happens only once, it should be ok.
> >
> > Can you make it so with appropriate #if __cplusplus or __has_feature 
> > (consteval)
> > (or how that's done)?
>
> That would be
> #if __cpp_consteval >= 201811L
> unless you need the P2564R3 paper behavior (then it would be
> #if __cpp_consteval >= 202211L
> ).

Thanks - IMO it's worth optimizing static initialization.

Richard.

>
> Jakub
>


Re: [PATCH] RISC-V: Minimal support for Zimop extension.

2024-08-07 Thread Nelson Chu
Hi Nick,

I have cherry-picked the two patches that add RISC-V Zimop and Zcmop
support into binutils-2_43-branch.
Commit bb566d7f, RISC-V: Add support for Zcmop extension
Commit 305fe5ed, RISC-V: Add support for Zimop extension

I also moved the gas/NEWS entry on trunk into the 2.43 section because of
the backport.
Commit 643f8ace, gas/NEWS: Moved RISC-V Zimop/Zcmop changes into 2.43
section due to backport.

Thanks
Nelson

On Wed, Aug 7, 2024 at 3:57 PM Nick Clifton  wrote:

> Hi Nelson,
>
> > Sounds good to me, too.  Once I get the approval, I will backport them to
> > binutils-2_43-branch :-)
>
> Please could you ping me once you have done that.
>
> I will make sure not to make the point release before receiving your
> message.
>
> Cheers
>Nick
>
>
>


Re: [PATCH v5 0/3] c: Add __lengthof__ operator

2024-08-07 Thread Alejandro Colomar
Hi David,

On Wed, Aug 07, 2024 at 10:11:12AM GMT, david.brown wrote:
> Hi, The address david.br...@hesbynett.no is not bouncing.  It's my email
> address, and I'm getting the emails in this discussion just fine.

Ahh, sorry, I didn't read the bounce notification properly; the address
is indeed reachable, but it rejected some message as spam (and notified
me, which is the first time I see that happen:).

(At the bottom I pasted part of the report I received.)

Thanks for replying!

>  However, I don't think I have anything to contribute here, so I don't
> object to being removed from the discussion. I am not a gcc developer,
> but as a long term user I  occasionally post on the developer mailing
> list. I don't remember having posted about a potential __lengthof__
> operator, but it is certainly possible.

You did post 4 years ago:



I will keep you on CC, unless you expressly want to be removed.  ;)

> On behalf of all gcc users, thank you for your efforts in working to
> improve gcc, no matter how this proposal turns out.

Thanks!

> Mvh., David Brown david.br...@hesbynett.no

Have a lovely day!
Alex

---

This is the mail system at host dfw.source.kernel.org.

I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.

For further assistance, please send mail to postmaster.

If you do so, please include this problem report. You can
delete your own text from the attached returned message.

   The mail system

: host spam01.hesbynett.no[81.29.32.152] said: 550
5.7.1 Rejected by spam filter (4cb5907b-5449-11ef-98be-506b8dfa0e58) (in
reply to end of DATA command)


-- 



signature.asc
Description: PGP signature


Re: [Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-08-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-07-26T20:05:43+0200, Tobias Burnus  wrote:
> The main idea of 'link' is to permit putting only a subset of a
> huge array on the device. Well, in order to make this work properly,
> it requires that one can map an array section, which does not
> start with the first element.
>
> This patch adjusts the pointers such that this actually works.
>
> (Tested on x86-64-gnu-linux with Nvptx offloading.)
> Comments, suggestions, remarks before I commit it?

> libgomp: Fix declare target link with offset array-section mapping [PR116107]
>
> Assume that 'int var[100]' is 'omp declare target link(var)'. When now
> mapping an array section with offset such as 'map(to:var[20:10])',
> the device-side link pointer has to store &[0] minus
> the offset such that var[20] will access [0]. But
> the offset calculation was missed such that the device-side 'var' pointed
> to the first element of the mapped data - and var[20] points beyond at
> some invalid memory.
>
>   PR middle-end/116107
>
> libgomp/ChangeLog:
>
>   * target.c (gomp_map_vars_internal): Honor array mapping offsets
>   with declare-target 'link' variables.
>   * testsuite/libgomp.c-c++-common/target-link-2.c: New test.
>
>  libgomp/target.c   |  7 ++-
>  .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 
> ++
>  2 files changed, 64 insertions(+), 2 deletions(-)
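
To make the offset arithmetic in the commit message concrete, here is a
minimal sketch that models addresses as plain integers.  The helper names
link_pointer_value and element_address and the example addresses are made up
for illustration; only the formula follows the target.c hunk quoted further
down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Value to store in the device-side link pointer: the device address
// of the mapped section, moved back by the section's byte offset from
// the start of the host variable.
std::uintptr_t
link_pointer_value (std::uintptr_t tgt_start, std::uintptr_t tgt_offset,
                    std::uintptr_t section_host_start,
                    std::uintptr_t var_host_start)
{
  assert (section_host_start >= var_host_start);
  return tgt_start + tgt_offset - (section_host_start - var_host_start);
}

// Address that base[index] refers to for elements of ELEM_SIZE bytes.
std::uintptr_t
element_address (std::uintptr_t base, std::size_t index,
                 std::size_t elem_size)
{
  return base + index * elem_size;
}
```

For 'int var[100]' with 'map(to:var[20:10])', the section starts 20 elements
into the host array, so the link pointer ends up 20 elements before the
device buffer and var[20] resolves to the buffer's first element.  Without
the subtraction, var[20] would land 20 elements past the start of a
10-element buffer.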

The new test case 'libgomp.c-c++-common/target-link-2.c' generally PASSes
on one-GPU systems, but on a multi-GPU system (tested nvidia5):

$ nvidia-smi -L
GPU 0: Tesla K80 (UUID: [...])
GPU 1: Tesla K80 (UUID: [...])

..., I see:

+PASS: libgomp.c/../libgomp.c-c++-common/target-link-2.c (test for excess errors)
+FAIL: libgomp.c/../libgomp.c-c++-common/target-link-2.c execution test

+PASS: libgomp.c++/../libgomp.c-c++-common/target-link-2.c (test for excess errors)
+FAIL: libgomp.c++/../libgomp.c-c++-common/target-link-2.c execution test

[...]
#2  0x77b548fc in __GI_abort () at abort.c:79
#3  0x1bd4 in main () at [...]/libgomp.c-c++-common/target-link-2.c:38
(gdb) frame 3
#3  0x1bd4 in main () at [...]/libgomp.c-c++-common/target-link-2.c:38
38          __builtin_abort ();
(gdb) list
33
34      #pragma omp target map(from: res2) device(dev)
35        res2 = arr[5];
36
37      if (res2 != 6)
38        __builtin_abort ();
[...]
(gdb) print res2
$1 = 60

I first thought that maybe just:

--- libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
+++ libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -54,6 +54,8 @@ int main()
   for (int i = 0; i < 10; i++)
if (res[i] != (4 + i)*10)
  __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[3:10]) device(dev)
 }
   return 0;
 }

... was missing, but that doesn't resolve the issue: same error state.
Could you please have a look what other state needs to be reset, in which
way?


Grüße
 Thomas


> diff --git a/libgomp/target.c b/libgomp/target.c
> index aa01c1367b9..e3e648f5443 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr 
> *devicep,
>   if (k->aux && k->aux->link_key)
> {
>   /* Set link pointer on target to the device address of the
> -mapped object.  */
> - void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
> +mapped object. Also deal with offsets due to
> +array-section mapping. */
> + void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
> +- (k->host_start
> +   - 
> k->aux->link_key->host_start));
>   /* We intentionally do not use coalescing here, as it's not
>  data allocated by the current call to this function.  */
>   gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c 
> b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
> new file mode 100644
> index 000..4ff4080da76
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
> @@ -0,0 +1,59 @@
> +/* PR middle-end/116107  */
> +
> +#include 
> +
> +int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +#pragma omp declare target link(arr)
> +
> +#pragma omp begin declare target
> +void f(int *res)
> +{
> +  __builtin_memcpy (res, &arr[5], sizeof(int)*10);
> +}
> +
> +void g(int *res)
> +{
> +  __builtin_memcpy (res, &arr[3], sizeof(int)*10);
> +}
> +#pragma omp end declare target
> +
> +int main()
> +{
> +  int res[10], res2;
> +  for (int dev = 0; dev < omp_get_num_devices(); dev++

Re: [PATCH] testsuite: fix pr115929-1.c with -Wformat-security

2024-08-07 Thread Sam James
Richard Sandiford  writes:

> Xi Ruoyao  writes:
>> On Sat, 2024-07-20 at 06:52 +0100, Sam James wrote:
>>> Some distributions like Gentoo make -Wformat and -Wformat-security
>>> enabled by default. Pass -Wno-format to the test to avoid a spurious
>>> fail in such environments.
>>> 
>>> gcc/testsuite/
>>> PR rtl-optimization/115929
>>> * gcc.dg/torture/pr115929-1.c: Pass -Wno-format.
>>> ---
>>
>> IMO if you are patching GCC downstream to enable some options, you can
>> patch the test case in the same .patch file anyway instead of pushing it
>> upstream.
>>
>> If we take the responsibility to make the test suite anticipate random
>> downstream changes, the test suite will end up filled with different
>> workarounds for 42 distros.
>
> Yeah, I'm worried about that too.
>
>> If we have to anticipate downstream changes we should make a policy
>> about which changes we must anticipate (hmm and if we'll anticipate -
>> Wformat by default why not add a configuration option for it by the
>> way?), or do it in a more generic way (using a .spec file to explicitly
>> give the "baseline" options for testing?)
>
> Two systematic ways of dealing with this under the current testsuite
> framework would be:
>
> (1) Make dg-torture.exp add -w by default.  This is what gcc.c-torture
> already does.  Then, tests that want to test for warnings can
> enable them explicitly.
>
> Some of the existing dg-warnings are already due to lack of -w,
> rather than something that the test was originally designed for.
> E.g. pr26565.c.
>
> (2) Make dg-torture.exp add -Wall -Wextra by default, so that tests
> have to suppress any warnings they don't want.
>
> Personally, I'd prefer one of those two rather than patching upstream
> tests for downstream changes.

Another question: so, for now, I'm testing with RUNTESTFLAGS=... to
disable -Wformat. This works OK, but some tests for format behaviour are doing:
/* { dg-options "-Wall" } */
instead of
/* { dg-options "-Wformat" } */
so they end up failing, because GCC is clever wrt options, so
gcc -Wno-format -Wall # doesn't enable -Wformat
but
gcc -Wno-format -Wall -Wformat # does enable -Wformat

How do you feel about those being changed to either -Wall -Wformat (and
so on), or just -Wformat, as appropriate?

thanks,
sam




Re: [PATCH] PR116080: Fix test suite checks for musttail

2024-08-07 Thread Thomas Schwinge
Hi Andi!

On 2024-08-02T14:12:59-0700, Andi Kleen  wrote:
> Andi Kleen  writes:
>> This is a new attempt to fix PR116080. The previous try was reverted
>> because it just broke a bunch of tests, hiding the problem.
>
> The previous version still had one failure on powerpc because
> of a template call that needs a dg-error check for external_tail_call.
> I fixed that now in the below version.
>
> Okay for trunk? I would like to check that one in to avoid the noise
> in the regression reports.

I've tested this version in a few trees.

('-Wc++-compat' are the C test cases, '-std=c++YY' the C++ ones.)


For x86_64 GNU/Linux, '-m32' testing, this does resolve the previous
FAILs:

[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++11[-(test for excess errors)-]
[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++17[-(test for excess errors)-]
[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++26[-(test for excess errors)-]

[-FAIL:-]{+UNSUPPORTED:+} g++.dg/musttail6.C[-(test for excess errors)-]  

..., but also "regresses" (PASS -> UNSUPPORTED):

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -Wc++-compat  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -std=c++11  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++11[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++17  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++17[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++26  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++26[-(test for excess errors)-]

That's because of effective-target 'struct_musttail' for '-m32'
reporting:

struct_musttail1494739.cc: In function 'foo bar()':
struct_musttail1494739.cc:5:88: error: cannot tail-call: return value used after call

(I'm just mentioning the latter "regressions" in case those are
unexpected.)


For powerpc64le GNU/Linux, this does resolve the previous FAIL:

PASS: g++.dg/musttail10.C(test for errors, line 11)
{+PASS: g++.dg/musttail10.C(test for errors, line 15)+}
PASS: g++.dg/musttail10.C(test for errors, line 20)
PASS: g++.dg/musttail10.C(test for errors, line 24)
PASS: g++.dg/musttail10.C(test for errors, line 7)
[-FAIL:-]{+PASS:+} g++.dg/musttail10.C   (test for excess errors)

..., but similarly "regresses" (PASS -> UNSUPPORTED):

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -Wc++-compat[-(test for excess errors)-]

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++11[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++17[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++26[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -Wc++-compat  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -std=c++11  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++11[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++17  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++17[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++26  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++26[-(test for excess errors)-]

Here, that's because of effective-target 'struct_musttail' reporting:

struct_musttail485321.cc: In function 'foo bar()':
struct_musttail485321.cc:5:88: error: cannot tail-call: target is not able to optimize the call into a sibling call

(Again, I'm just mentioning the latter "regressions" in case those are
unexpected.)


So: looks good, all FAILs resolved (in these GCC configurations).


Grüße
 Thomas


> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bunch of tests, hiding the problem.
>
> - musttail behaves differently than tailcall at -O0. Some of the tests
> run at -O0, so add separate effective target tests for musttail.
> - New effective target tests need to use unique file names
> to make dejagnu caching work
> - Change the tests to use new targets
> - Add an external_musttail test to check for target's ability
> to do tail calls between translation units. This covers some powerpc
> ABIs.
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/116080
>   * c-c++-common/musttail1.c: Use musttail target.
>   * c-c++-common/musttail12.c: Use struct_musttail target.
>   * c-c++-common/musttail2.c: Use m

[PATCH] ada: Fix s-taprop__solaris.adb compilation

2024-08-07 Thread Rainer Orth
Solaris Ada bootstrap is broken as of 2024-08-06 with

s-taprop.adb:1971:23: error: "int" is not visible
s-taprop.adb:1971:23: error: multiple use clauses cause hiding
s-taprop.adb:1971:23: error: hidden declaration at s-osinte.ads:51
s-taprop.adb:1971:23: error: hidden declaration at i-c.ads:62

because one instance of int isn't qualified.  This patch fixes this.

Bootstrapped without regressions on i386-pc-solaris2.11 and
sparc-sun-solaris2.11.

Ok for trunk?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2024-08-07  Rainer Orth  

gcc/ada:
* libgnarl/s-taprop__solaris.adb (Set_Task_Affinity): Fully
qualify int.

# HG changeset patch
# Parent  e1cd3c7e6ffeab6f837a999725875174ac45a9ce
ada: Fix s-taprop__solaris.adb compilation

diff --git a/gcc/ada/libgnarl/s-taprop__solaris.adb b/gcc/ada/libgnarl/s-taprop__solaris.adb
--- a/gcc/ada/libgnarl/s-taprop__solaris.adb
+++ b/gcc/ada/libgnarl/s-taprop__solaris.adb
@@ -1968,7 +1968,7 @@ package body System.Task_Primitives.Oper
   then
  declare
 CPU_Set : aliased psetid_t;
-Result  : int;
+Result  : Interfaces.C.int;
 
  begin
 Result := pset_create (CPU_Set'Access);


Re: [PATCH][ Don't call clean_symbol_name in create_tmp_var_name [PR116219]

2024-08-07 Thread Richard Biener
On Wed, 7 Aug 2024, Jakub Jelinek wrote:

> Hi!
> 
> SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
> where the numbers in there are DECL_UIDs if there are unnamed
> FIELD_DECLs etc.
> Because -g0 vs. -g can cause differences between the exact DECL_UID
> values (add bigger gaps in between them, corresponding decls should
> still be ordered the same based on DECL_UID) we make sure such
> decls have DECL_NAMELESS set and depending on exact options either don't
> dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
> there to D$.
> Unfortunately in tons of places we then use get_name to grab either user
> names or these SRA created names and use that as argument to
> create_tmp_var{,_name,_raw} to base other artificial temporary names based
> on that.  Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
> starting with
> https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
> calls clean_symbol_name which replaces the $s in there with _s and thus
> dump_fancy_name doesn't sanitize it anymore.
> 
> I don't see any discussion of that commit (originally to TM branch, later
> merged) on the mailing list, but from
>DECL_NAME (new_decl)
>  = create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
> -  SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
> +  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
> snippet elsewhere in that commit it seems create_tmp_var_name was used at
> that point also to determine function names of clones, so presumably the
> clean_symbol_name at that point was to ensure the symbol could be emitted
> into assembly, maybe in case DECL_NAME is something like C++ operators or
> whatever could have there undesirable characters.
> 
> Anyway, we haven't done that for years; already GCC 4.5 uses
> clone_function_name for such purposes, which starts from the
> DECL_ASSEMBLER_NAME of the old function and appends a separator (chosen
> from the supportable symbol suffix separators) plus some suffix and/or
> number, so that part doesn't go through create_tmp_var_name.
> 
> I don't see problems with having the $ and . etc. characters in the names
> intended just to make dumps more readable, after all, we already are using
> those in the SRA created names.  Those names shouldn't make it into the
> assembly in any way, neither debug info nor assembly labels.
> 
> There is one theoretical case: the gimplifier promotes automatic vars
> into TREE_STATIC ones, and those can then appear in assembly.  In case
> that happens with e.g. SRA created names that are regimplified later,
> I've added code to ignore the name and force C.NNN if it is a
> DECL_NAMELESS with problematic characters in the name.
> 
> Richi mentioned on IRC that the non-cleaned up names might make things
> harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
> the dumping for GIMPLE FE purposes that cleans those up (but at that point
> it should also verify if some such cleaned up names don't collide with
> others and somehow deal with those).

My plan was to accept an additional character in identifiers as
extension with -fgimple and use that.  I think we already accept dollars
but dots are of course problematic and cannot be used.  Replacing
unwanted chars with $ and pre-existing $ with $$ might work.  There's
not many (ASCII) characters available, @ might be.
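
A minimal sketch of that escaping idea, assuming letters, digits and
underscore are the only characters kept as-is; escape_for_gimple_fe is a
made-up name for illustration, not an existing GCC function:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy NAME to OUT, doubling every pre-existing '$'
   and replacing every other unwanted character with a single '$'.
   OUT must have room for 2 * strlen (NAME) + 1 bytes.
   Returns the length of the escaped string.  */
size_t
escape_for_gimple_fe (const char *name, char *out)
{
  size_t n = 0;

  for (; *name; name++)
    {
      unsigned char c = (unsigned char) *name;

      if (c == '$')
        {
          /* Pre-existing dollar: emit it twice.  */
          out[n++] = '$';
          out[n++] = '$';
        }
      else if (isalnum (c) || c == '_')
        out[n++] = (char) c;
      else
        /* Any other character (e.g. '.') becomes a single dollar.  */
        out[n++] = '$';
    }
  out[n] = '\0';
  return n;
}
```

Note that this mapping is not collision-free: "a.." and "a$" both become
"a$$", so a dump using it would still need the collision check mentioned
in the quoted text.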

> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> The -fcompare-debug failure on the testcase is gone, but the testcase
> was huge and hard to reduce.
> 
> 2024-08-06  Jakub Jelinek  
> 
>   PR c++/116219
>   * gimple-expr.cc (remove_suffix): Formatting fixes.
>   (create_tmp_var_name): Don't call clean_symbol_name.
>   * gimplify.cc (gimplify_init_constructor): When promoting automatic
>   DECL_NAMELESS vars to static, only preserve their DECL_NAME if
>   it doesn't contain any characters clean_symbol_name replaces.
> 
> --- gcc/gimple-expr.cc.jj 2024-01-03 11:51:28.280776310 +0100
> +++ gcc/gimple-expr.cc2024-08-06 14:43:42.328673383 +0200
> @@ -406,14 +406,12 @@ remove_suffix (char *name, int len)
>  {
>int i;
>  
> -  for (i = 2;  i < 7 && len > i;  i++)
> -{
> -  if (name[len - i] == '.')
> - {
> -   name[len - i] = '\0';
> -   break;
> - }
> -}
> +  for (i = 2; i < 7 && len > i; i++)
> +if (name[len - i] == '.')
> +  {
> + name[len - i] = '\0';
> + break;
> +  }
>  }
>  
>  /* Create a new temporary name with PREFIX.  Return an identifier.  */
> @@ -430,8 +428,6 @@ create_tmp_var_name (const char *prefix)
>char *preftmp = ASTRDUP (prefix);
>  
>remove_suffix (preftmp, strlen (preftmp));
> -  clean_symbol_name (preftmp);
> -
>prefix = preftmp;
>  }
>  
> --- gcc/gimplify.cc.jj2024-08-05 13:04:53.903116091 +0200
> +++ gcc/gimplify.cc   2024-08-06 15:27:40.404865291 +0200
> @@ -5599,6 +5599

Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Alexander Monakov


On Wed, 7 Aug 2024, Richard Biener wrote:

> > > +  data = *(const v16qi_u *)s;
> > > +  /* Prevent propagation into pshufb and pcmp as memory operand.  */
> > > +  __asm__ ("" : "+x" (data));
> >
> > It would probably make sense to file a PR on this separately,
> > to eventually fix the compiler to not need such workarounds.
> > Not sure how much difference it makes however.
> 
> This is probably to work around bugs in older compiler versions?  If
> not I agree.

This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
macro-fused on Intel, so with propagation it is two uops early in the
CPU front-end.

The "propagation" actually falls out of IRA/LRA decisions, and stopped
happening in gcc-14. I'm not sure if there were relevant RA changes.
In any case, this can potentially flip-flop in the future again.
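
The idiom itself can be shown on a scalar.  A minimal sketch, assuming GNU C
inline asm; it uses the general-register constraint "+r" on an unsigned
value instead of the "+x" SSE constraint from the patch so that it builds on
any GCC target, and opaque_value is a made-up name:

```c
/* Empty asm with a read-write register operand: it emits no instruction,
   but the compiler has to materialize V in a register at this point and
   can no longer assume anything about how the value was computed.  */
unsigned
opaque_value (unsigned v)
{
  __asm__ ("" : "+r" (v));
  return v;
}
```

The value is unchanged at run time; only the optimizer's view of it changes,
which is what keeps a preceding load from being folded into a later
instruction as a memory operand.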

Considering the trunk gets this right, I think the next move is to
add a testcase for this, not a PR, correct?

> Otherwise the patch is OK.

Still OK with the asms, or would you prefer them be taken out?

Thanks.

Alexander


Re: [PATCH][ Don't call clean_symbol_name in create_tmp_var_name [PR116219]

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 11:03:18AM +0200, Richard Biener wrote:
> > Richi mentioned on IRC that the non-cleaned up names might make things
> > harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
> > the dumping for GIMPLE FE purposes that cleans those up (but at that point
> > it should also verify if some such cleaned up names don't collide with
> > others and somehow deal with those).
> 
> My plan was to accept an additional character in identifiers as
> extension with -fgimple and use that.  I think we already accept dollars
> but dots are of course problematic and cannot be used.  Replacing
> unwanted chars with $ and pre-existing $ with $$ might work.  There's
> not many (ASCII) characters available, @ might be.

Some kind of mangling... ;)

> > --- gcc/gimplify.cc.jj  2024-08-05 13:04:53.903116091 +0200
> > +++ gcc/gimplify.cc 2024-08-06 15:27:40.404865291 +0200
> > @@ -5599,6 +5599,18 @@ gimplify_init_constructor (tree *expr_p,
> >  
> > DECL_INITIAL (object) = ctor;
> > TREE_STATIC (object) = 1;
> > +   if (DECL_NAME (object) && DECL_NAMELESS (object))
> > + {
> > +   const char *name = get_name (object);
> > +   char *sname = ASTRDUP (name);
> > +   clean_symbol_name (sname);
> > +   /* If there are any undesirable characters in DECL_NAMELESS
> > +  name, just fall back to C.nnn name, we must ensure e.g.
> > +  SRA created names with DECL_UIDs don't make it into
> > +  assembly.  */
> > +   if (strcmp (name, sname))
> > + DECL_NAME (object) = NULL_TREE;
> > + }
> 
> Did you actually run into such a case?

No, but I haven't collected statistics about that (so didn't add
some fprintf there to log if it happened).
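
The effect of the quoted check (clean a copy of the name, then compare it
with the original) can be sketched outside GCC.  This is a simplified
stand-in under a stated assumption: it treats every character other than
letters, digits and underscore as problematic, whereas the character set
handled by the real clean_symbol_name may differ; name_survives_cleaning is
a hypothetical name.

```c
#include <ctype.h>
#include <stdbool.h>

/* Return true if cleaning would leave NAME unchanged, i.e. the name
   contains none of the characters this sketch considers problematic.  */
bool
name_survives_cleaning (const char *name)
{
  for (; *name; name++)
    {
      unsigned char c = (unsigned char) *name;

      if (!isalnum (c) && c != '_')
        return false;
    }
  return true;
}
```

Under that assumption an SRA-style name such as offset$D94316$_M_impl is
rejected, which corresponds to the case where the hunk resets DECL_NAME so
that a C.nnn name is used instead.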

> I'd expect
> gimplify_init_constructor only happening on the original GENERIC IL.

In theory one can construct INIT_EXPR and gimplify it in later passes as
well.  But it is unlikely, sure.

> In any case how about the simpler
> 
> if (!DECL_NAME (object) || DECL_NAMELESS (object))
>   DECL_NAME (object) = create_tmp_var_name ("C");
> 
> ?

This is what I had in my first version, but I was worried too many things
would be DECL_NAMELESS during gimplification.  I think I'll do another
bootstrap/regtest with statistics gathering to see whether it is worth it.

> Otherwise looks OK to me.  I guess this should get quite some
> soaking time before backporting (if that was intended).

Definitely.  At least two months I'd say.

Jakub



Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-07 Thread Mikael Morin

Hello,

Le 06/08/2024 à 22:57, Thomas Koenig a écrit :

Hi Mikael and Harald,


- inline expansion is inhibited at -Os.  But wouldn't it be good if
   we make this expansion also dependent on -ffrontend-optimize?
   (This was the case for rank-1 before your patch).


By the way, I disabled the minmaxloc frontend optimization without too 
much thought, because it was preventing me from seeing the effects of my 
patches in the dumps.  Now that both of you have put some focus on it, I 
think the optimization should be completely removed instead, because the 
patches make it unreachable.



The original idea was to have -ffrontend-optimize as a check if anything
went wrong with front-end optimization in particular - if the bug went
away with -fno-frontend-optimize, we knew where to look (and I knew
I had to look).

It also provides a way for users to work around bugs in frontend 
optimizations.  If inline expansion were dependent on the flag, it would 
also provide the same benefit, but it would be using the flag outside of 
its intended scope, so I would rather not do it.



So, probably better to not do this at -Os.  One thought: Should we
also do the inlining without optimization?


At -Os: no inline expansion.  Don't we all agree on that?
I'm fine with also disabling expansion at -O0.

Mikael


Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Alejandro Colomar
Hi Martin,

On Wed, Aug 07, 2024 at 09:13:07AM GMT, Martin Uecker wrote:
> Am Mittwoch, dem 07.08.2024 um 01:12 +0200 schrieb Alejandro Colomar:
> > +#define c_parser_lengthof_expression(parser)   
> >\
> > +(  
> >\
> > +  c_parser_sizeof_or_lengthof_expression (parser, RID_LENGTHOF)
> >\
> > +)
> > +
> 
> I suggest to avoid the macros.  I think the original function calls are
> clear enough and this is then just another detour for somebody trying
> to follow the code.  Or is there a reason I am missing?

I imitated the following ones that already exist:

c-family/c-common.h:923:
#define c_sizeof(LOC, T)  c_sizeof_or_alignof_type (LOC, T, true, 
false, 1)

cp/cp-tree.h:8318:
#define cxx_sizeof(T)  cxx_sizeof_or_alignof_type (input_location, T, 
SIZEOF_EXPR, false, true)

c-family/c-common.h:924:
#define c_alignof(LOC, T) c_sizeof_or_alignof_type (LOC, T, false, 
false, 1)

But I'm fine using it raw.

> > +void fix_fix (int i, char (*a)[3][5], int (*x)[__lengthof__ (*a)]);
> > +void fix_var (int i, char (*a)[3][i], int (*x)[__lengthof__ (*a)]);
> > +void fix_uns (int i, char (*a)[3][*], int (*x)[__lengthof__ (*a)]);
> 
> 
> I would include a test that shows that when lengthof is applied
> to [*], it remains formally non-constant.  For example,
> you could test with -Wvla-parameter that the two declarations do not give a
> warning:
> 
> void foo(char (*a)[*], int x[*]);
> void foo(char (*a)[*], int x[__lengthof__(*a)]);

But [*] is a VLA.  Do we want to return a constexpr for it?

> (With  int (*x)[*]  we would run into the issue that we can not
> distinguish zero arrays from unspecified ones, PR 98539)

As Martin Sebor said, I need to choose between supporting [0] well or
supporting [*] well, but not both.

I would personally prefer supporting [0], and consider that not
supporting [*] is a bug in the implementation of [*] (and thus not my
problem).

However, since GCC doesn't support 0-length arrays, I'm not sure that
would be correct.

What do you think?

Does anyone oppose treating [0] as a constexpr 0 length?  That means not
supporting [*] well, but that can be fixed separately, which Martin Uecker
is working on.  :)

> > diff --git a/gcc/testsuite/gcc.dg/lengthof.c 
> > b/gcc/testsuite/gcc.dg/lengthof.c
> > new file mode 100644
> > index 000..38da5df52a5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/lengthof.c
> > @@ -0,0 +1,127 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-Wno-declaration-after-statement -Wno-pedantic -Wno-vla" 
> > } */
> > +
> > +#undef NDEBUG
> > +#include 
> > +
> > +void
> > +array (void)
> > +{
> > +  short a[7];
> > +
> > +  assert (__lengthof__ (a) == 7);
> > +  assert (__lengthof__ (long [0]) == 0);
> > +  assert (__lengthof__ (unsigned [99]) == 99);
> > +}
> 
> Instead of using assert you can use
> 
> if (! ...) __builtin_abort();
> 
> to avoid the include in the testsuite.  

Is it frowned upon to include something?  I prefer assert(3).
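
For readers without the patch applied, checks like the ones in the quoted
test can be approximated with the traditional sizeof idiom.  This is only a
stand-in: LENGTHOF, fixed_length and inner_length are illustrative names,
the macro is not the proposed operator, and the second function assumes a
compiler with VLA support.

```c
#include <stddef.h>

/* Traditional idiom; only a stand-in for the proposed __lengthof__.  */
#define LENGTHOF(a) (sizeof (a) / sizeof ((a)[0]))

/* Fixed-size array: the length is a compile-time constant.  */
size_t
fixed_length (void)
{
  short a[7];
  return LENGTHOF (a);
}

/* Variably modified array: the inner length is computed at run time.
   N must be positive.  */
size_t
inner_length (int n)
{
  char a[3][n];
  return LENGTHOF (a[0]);
}
```

Unlike the proposed operator, the macro silently accepts pointers, which is
one of the usual arguments for a dedicated operator.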

> Otherwise it looks fine from my side.
> 
> Joseph needs to approve and may have more comments.

Thanks!

> 
> Martin

Have a lovely day!
Alex



[PATCH v2] Match: Support form 1 for scalar signed integer .SAT_ADD

2024-08-07 Thread pan2 . li
From: Pan Li 

This patch adds support for form 1 of the scalar signed
integer .SAT_ADD, shown in the example below:

Form 1:
  #define DEF_SAT_S_ADD_FMT_1(T, MIN, MAX) \
  T __attribute__((noinline))  \
  sat_s_add_##T##_fmt_1 (T x, T y) \
  {\
T sum = x + y; \
return (x ^ y) < 0 \
  ? sum\
  : (sum ^ x) >= 0 \
? sum  \
: x < 0 ? MIN : MAX;   \
  }

DEF_SAT_S_ADD_FMT_1(int64_t, INT64_MIN, INT64_MAX)
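
To check the form 1 logic by hand without relying on signed 64-bit
overflow, here is a narrowed sketch for int8_t.  It is an illustration, not
part of the patch: sat_s_add_int8 is a made-up name, and the conversion of
an out-of-range int to int8_t is implementation-defined in C (GCC wraps
modulo 256, which the sketch assumes).

```c
#include <stdint.h>

/* Form 1 written out for int8_t.  X + Y is computed in int, so the
   addition itself cannot overflow; the cast then wraps it to 8 bits.  */
int8_t
sat_s_add_int8 (int8_t x, int8_t y)
{
  int8_t sum = (int8_t) (x + y);

  return (x ^ y) < 0           /* different signs: cannot overflow */
    ? sum
    : (sum ^ x) >= 0           /* same sign as X: no overflow */
      ? sum
      : x < 0 ? INT8_MIN : INT8_MAX;
}
```

For example, 100 + 100 wraps to -56, whose sign differs from x, so the
result saturates to INT8_MAX; -100 + -100 saturates to INT8_MIN.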

The difference before and after this patch is visible if the backend
implements the ssadd3 pattern, as shown below.

Before this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t sum;
   8   │   long int _1;
   9   │   long int _2;
  10   │   int64_t _3;
  11   │   _Bool _8;
  12   │   long int _9;
  13   │   long int _10;
  14   │   long int _11;
  15   │   long int _12;
  16   │   long int _13;
  17   │
  18   │[local count: 1073741824]:
  19   │   sum_6 = x_4(D) + y_5(D);
  20   │   _1 = x_4(D) ^ y_5(D);
  21   │   _2 = x_4(D) ^ sum_6;
  22   │   _12 = ~_1;
  23   │   _13 = _2 & _12;
  24   │   if (_13 < 0)
  25   │ goto ; [41.00%]
  26   │   else
  27   │ goto ; [59.00%]
  28   │
  29   │[local count: 259738147]:
  30   │   _8 = x_4(D) < 0;
  31   │   _9 = (long int) _8;
  32   │   _10 = -_9;
  33   │   _11 = _10 ^ 9223372036854775807;
  34   │
  35   │[local count: 1073741824]:
  36   │   # _3 = PHI 
  37   │   return _3;
  38   │
  39   │ }

After this patch:
   4   │ __attribute__((noinline))
   5   │ int64_t sat_s_add_int64_t_fmt_1 (int64_t x, int64_t y)
   6   │ {
   7   │   int64_t _4;
   8   │
   9   │ ;;   basic block 2, loop depth 0
  10   │ ;;pred:   ENTRY
  11   │   _4 = .SAT_ADD (x_5(D), y_6(D)); [tail call]
  12   │   return _4;
  13   │ ;;succ:   EXIT
  14   │
  15   │ }

The following test suites passed with this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add the matching for signed .SAT_ADD.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_add): Add new
matching func decl.
(match_unsigned_saturation_add): Try signed .SAT_ADD and rename
to ...
(match_saturation_add): ... here.
(math_opts_dom_walker::after_dom_children): Update the above renamed
func from caller.

Signed-off-by: Pan Li 
---
 gcc/match.pd  | 17 
 gcc/tree-ssa-math-opts.cc | 42 ++-
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index c9c8478d286..8b8a5dbcfe3 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3311,6 +3311,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   }
   (if (otype_precision < itype_precision && wi::eq_p (trunc_max, int_cst))
 
+/* Signed saturation add, case 1:
+   T sum = X + Y;
+   SAT_S_ADD = (X ^ Y) < 0
+ ? sum
+ : (sum ^ x) >= 0
+   ? sum
+   : x < 0 ? MIN : MAX;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (lt (bit_and:c (bit_xor:c @0 (convert?@2 (plus:c (convert? @0)
+(convert? @1
+  (bit_not (bit_xor:c @0 @1)))
+   integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   @2)
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* x >  y  &&  x != XXX_MIN  -->  x > y
x >  y  &&  x == XXX_MIN  -->  false . */
 (for eqne (eq ne)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 8d96a4c964b..f39c88741a4 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -4023,6 +4023,8 @@ extern bool gimple_unsigned_integer_sat_add (tree, tree*, 
tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_sub (tree, tree*, tree (*)(tree));
 extern bool gimple_unsigned_integer_sat_trunc (tree, tree*, tree (*)(tree));
 
+extern bool gimple_signed_integer_sat_add (tree, tree*, tree (*)(tree));
+
 static void
 build_saturation_binary_arith_call (gimple_stmt_iterator *gsi, internal_fn fn,
tree lhs, tree op_0, tree op_1)
@@ -4072,7 +4074,8 @@ match_unsigned_saturation_add (gimple_stmt_iterator *gsi, 
gassign *stmt)
 }
 
 /*
- * Try to match saturation unsigned add with PHI.
+ * Try to match saturation add with PHI.
+ * For unsigned integer:
  *:
  *   _1 = x_3(D) + y_4(D);
  *   if (_1 >= x_3(D))
@@ -4086,10 +4089,38 @@ match_unsigned_saturation_add (gimple_stmt_iterator 
*gsi, gassign *stmt)
  *   # _2 = PHI <255(2), _1(3)>
  *   =>
  *[local count: 1073741824]:
- *   _2 = .SAT_ADD (x_4(D), y_5(D));  */
+ *   _2 = .SAT_ADD (x_4(D), y_5(D));
+ 

Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-07 Thread Mikael Morin

Hello,

Le 06/08/2024 à 22:05, Harald Anlauf a écrit :

Hi Mikael,

thanks for this nice set of patches!

I've played around a bit, and it seems to look good.

I have only minor comments left (besides the nan issue raised):

- inline expansion is inhibited at -Os.  But wouldn't it be good if
   we make this expansion also dependent on -ffrontend-optimize?
   (This was the case for rank-1 before your patch).

See my answer to Thomas' message.


- in the case where two sets of loop(nest)s are generated, i.e. for
   non-integral argument x, I wonder if the predictors for conditionals
   (-> _likely/_unlikely) are chosen optimally.  E.g. for this code:

subroutine minloc_real (x, m, back)
   implicit none
   real, intent(in), contiguous :: x(:)
   integer  :: m(*)
   logical, optional    :: back
   m(1:rank(x)) = minloc (x,back=back)
end

the first loop becomes:

   S.10 = second_loop_entry.9 ? idx0.7 : 1;
   while (1)
     {
   if (S.10 > D.4310) goto L.3;
   if (__builtin_expect ((integer(kind=8)) ((*x.0)[S.10 * D.4314
+ D.4309] <= limit.8), 0, 8))
     {
   limit.8 = (*x.0)[S.10 * D.4314 + D.4309];
   pos0.5 = S.10 + offset0.6;
   idx0.7 = S.10;
   second_loop_entry.9 = 1;
   goto L.1;
     }
   S.10 = S.10 + 1;
     }

This results from this code in trans-intrinsic.cc:

   if (!lab1 || HONOR_NANS (DECL_MODE (limit)))
     {
   ...
   cond = gfc_unlikely (cond, PRED_BUILTIN_EXPECT);
   ifbody = build3_v (COND_EXPR, cond, ifbody,
  build_empty_stmt (input_location));
     }

As the reason for this separate loop is finding a first non-nan value,
I would expect gfc_likely to be more reasonable for the common case.

I think it is unlikely in the sense that it will be true at most once, 
because of the goto breaking out of the loop in the if body.


Your expectation is reasonable; I can't tell which is the better choice.
It also seems reasonable to generate code that is fast when there are
many NaNs and slow, but without visible effect, when there are none,
because there is a single slow iteration in that case.
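
To make the two-loop shape concrete, here is a rough C analogue of the
generated code, written by hand rather than taken from a dump.  The name
minloc_real_sketch and the 1-based return convention are assumptions of the
sketch, as is returning 1 for a non-empty all-NaN array; __builtin_expect
is the GCC builtin that gfc_unlikely ends up using.

```c
#include <math.h>
#include <stddef.h>

/* Return the 1-based position of the first minimum of X[0..N-1],
   or 0 for an empty array.  */
size_t
minloc_real_sketch (const float *x, size_t n)
{
  float limit = INFINITY;
  size_t pos = n ? 1 : 0;
  size_t i = 0;

  /* First loop: skip leading NaNs.  The comparison is false for NaN and
     becomes true at most once, since the body leaves the loop.  */
  for (; i < n; i++)
    if (__builtin_expect (x[i] <= limit, 0))
      {
        limit = x[i];
        pos = i + 1;
        break;
      }

  /* Second loop: ordinary minimum search, restarting at the same index.  */
  for (; i < n; i++)
    if (__builtin_expect (x[i] < limit, 0))
      {
        limit = x[i];
        pos = i + 1;
      }

  return pos;
}
```

With no NaNs in the input, the first loop exits on its very first
iteration, which is the "single slow iteration" case described above.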



(There is also the oddity S.10 = second_loop_entry.9 ? ..., where
idx0.7 seems to be not initialized, but luckily it seems to be
handled by the optimizer and seen that this is no problem.)

Yeah, I considered adding initializations, but finally decided to not 
pollute the frontend code (already complicated enough) and the generated 
code with useless stuff, that the optimizers would have to remove.



Having gfc_unlikely in the second set of loops is fine, as this passes
over all array elements.

Note that this is pre-existing without/before your patch, but since you
are at it, you may want to check.


Not sure how to check that.  I need a (realistic) benchmark.


Otherwise this is fine for mainline with the said issues considered.


Thanks for the review.
Will send a second version once we have settled on the topic of the 
frontend optimization flag.


Mikael


[PATCH] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread Feng Wang
This patch adds the vector rotate shift pattern for auto-vectorization.
With this patch, a scalar rotate shift can be automatically
vectorized into a vector rotate shift.
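
As an illustration of the kind of scalar loop meant here (not one of the
tests in this patch; rotl32_array is a made-up name), a per-element rotate
left can be written so that a zero shift count stays well defined.  Whether
a given compiler turns it into vrol.vv depends on the target options and
GCC version and has not been checked for this sketch.

```c
#include <stdint.h>

/* Rotate each SRC[i] left by SHIFTS[i] modulo 32 and store it in DST[i].
   Masking both shift counts avoids shifting by 32, which is undefined.  */
void
rotl32_array (uint32_t *restrict dst, const uint32_t *restrict src,
              const uint32_t *restrict shifts, int n)
{
  for (int i = 0; i < n; i++)
    {
      uint32_t k = shifts[i] & 31;
      dst[i] = (src[i] << k) | (src[i] >> ((32 - k) & 31));
    }
}
```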

signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/autovec-opt.md (v3):
Add define_expand for vector rotate shift.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.

---
 gcc/config/riscv/autovec-opt.md   | 16 
 .../riscv/rvv/autovec/binop/vrolr-1.c |  9 ++
 .../riscv/rvv/autovec/binop/vrolr-run.c   | 88 +++
 .../riscv/rvv/autovec/binop/vrolr-template.h  | 29 ++
 4 files changed, 142 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-template.h

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index d7a3cfd4602..923122510ac 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1607,3 +1607,19 @@
 DONE;
   }
   [(set_attr "type" "vandn")])
+
+;; -
+;; - vrol.vv vror.vv
+;; -
+(define_expand "v3"
+  [(set (match_operand:VI 0 "register_operand")
+(bitmanip_rotate:VI
+ (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "register_operand")))]
+  "TARGET_ZVBB || TARGET_ZVKB"
+  {
+riscv_vector::emit_vlmax_insn (code_for_pred_v (, mode),
+  riscv_vector::BINARY_OP, operands);
+DONE;
+  }
+)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
new file mode 100644
index 000..55dac27697c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include "vrolr-template.h"
+
+/* { dg-final { scan-assembler-times {\tvrol\.vv} 4 } } */
+/* { dg-final { scan-assembler-times {\tvror\.vv} 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
new file mode 100644
index 000..221795ba871
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vrolr-run.c
@@ -0,0 +1,88 @@
+/* { dg-do run } */
+/* { dg-require-effective-target "riscv_zvbb_ok" } */
+/* { dg-add-options "riscv_v" } */
+/* { dg-add-options "riscv_zvbb" } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model" } */
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define ARRAY_SIZE 512
+
+#define CIRCULAR_LEFT_SHIFT_ARRAY(arr, shifts, bit_size, size) \
+for (int i = 0; i < size; i++) { \
+(arr)[i] = (((arr)[i] << (shifts)[i % bit_size]) | ((arr)[i] >> (bit_size - (shifts)[i % bit_size]))); \
+}
+
+#define CIRCULAR_RIGHT_SHIFT_ARRAY(arr, shifts, bit_size, size) \
+for (int i = 0; i < size; i++) { \
+(arr)[i] = (((arr)[i] >> (shifts)[i % bit_size]) | ((arr)[i] << (bit_size - (shifts)[i % bit_size]))); \
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results8(
+uint8_t *result_left, uint8_t *result_right,
+int bit_size, uint8_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i % bit_size]) | (i >> (bit_size - shift_values[i % bit_size])));
+assert(result_right[i] == (i >> shift_values[i % bit_size]) | (i << (bit_size - shift_values[i % bit_size])));
+}
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results16(
+uint16_t *result_left, uint16_t *result_right,
+int bit_size, uint16_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i % bit_size]) | (i >> (bit_size - shift_values[i % bit_size])));
+assert(result_right[i] == (i >> shift_values[i % bit_size]) | (i << (bit_size - shift_values[i % bit_size])));
+}
+}
+
+void __attribute__((optimize("no-tree-vectorize"))) compare_results32(
+uint32_t *result_left, uint32_t *result_right,
+int bit_size, uint32_t *shift_values)
+{
+for (int i = 0; i < ARRAY_SIZE; i++) {
+assert(result_left[i] == (i << shift_values[i % bit_size]) | (i >> (bit_size - shift_values[i % bit_size])));
+assert(result_right[i] == (i >> shift_values[i % bit_size]) | (i << (bit_size - shift_values[i % bit_size])));
+}
+}
+
+void __attribute__((optimize("no-tree-vect

Re: [PATCH 0/8] fortran: Inline MINLOC/MAXLOC without DIM argument [PR90608]

2024-08-07 Thread Harald Anlauf

Hi Mikael, Thomas!

Am 07.08.24 um 11:11 schrieb Mikael Morin:

Hello,

Le 06/08/2024 à 22:57, Thomas Koenig a écrit :

Hi Mikael and Harald,


- inline expansion is inhibited at -Os.  But wouldn't it be good if
   we make this expansion also dependent on -ffrontend-optimize?
   (This was the case for rank-1 before your patch).


By the way, I disabled the minmaxloc frontend optimization without too 
much thought, because it was preventing me from seeing the effects of my 
patches in the dumps.  Now that both of you have put some focus on it, I 
think the optimization should be completely removed instead, because the 
patches make it unreachable.



The original idea was to have -ffrontend-optimize as a check if anything
went wrong with front-end optimization in particular - if the bug went
away with -fno-frontend-optimize, we knew where to look (and I knew
I had to look).

It also provides a way for users to work around bugs in frontend 
optimizations.  If inline expansion were dependent on the flag, it would 
also provide the same benefit, but it would be using the flag outside of 
its intended scope, so I would rather not do it.



So, probably better to not do this at -Os.  One thought: Should we
also do the inlining without optimization?


At -Os: no inline expansion.  Don't we all agree on that?
I'm fine with also disabling expansion at -O0.


The following change to patch 2/8 does what I had in mind:

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 9f3c3ce47bc..cc0d00f4e39 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -11650,6 +11650,29 @@ gfc_inline_intrinsic_function_p (gfc_expr *expr)
 case GFC_ISYM_TRANSPOSE:
   return true;

+case GFC_ISYM_MINLOC:
+case GFC_ISYM_MAXLOC:
+  {
+   /* Disable inline expansion if code size matters.  */
+   if (optimize_size)
+ return false;

/* Disable inline expansion if frontend optimization is disabled.  */
if (!flag_frontend_optimize)
  return false;


As a result, the following happens:

- at -Os, inlining will never happen (as you had it)
- at -O0, the default is -fno-frontend-optimize, and we get the
  library implementation.  Inlining is forced with -ffrontend-optimize.
- at higher -Ox, the default is -ffrontend-optimize.
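
The resulting behaviour can be summarised by a small decision function. This is only a sketch: `minmaxloc_inline_p` and its two parameters are hypothetical stand-ins for the checks on optimize_size and flag_frontend_optimize, not the actual gfortran code.

```c
#include <stdbool.h>

/* Return true if MINLOC/MAXLOC would be expanded inline under the
   proposed rule.  Both parameters are assumed stand-ins for the
   compiler's internal flags.  */
bool
minmaxloc_inline_p (bool optimize_size, bool frontend_optimize)
{
  /* -Os: code size matters, so always use the library implementation.  */
  if (optimize_size)
    return false;

  /* -fno-frontend-optimize (the default at -O0): library implementation.  */
  if (!frontend_optimize)
    return false;

  /* Otherwise (e.g. -O2, or -O0 -ffrontend-optimize): expand inline.  */
  return true;
}
```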

I believe this is also what Thomas' original motivation was.

(This flag actually helps to see that the inlining code in gcc-14
is currently broken for minloc/maxloc with an optional back argument).

As we are not planning to remove the library implementation (-Os!),
this is also the best way to compare library to inline code.

Cheers,
Harald


Mikael






Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Martin Uecker
Am Mittwoch, dem 07.08.2024 um 11:14 +0200 schrieb Alejandro Colomar:
> Hi Martin,
> 

> > > +void fix_fix (int i, char (*a)[3][5], int (*x)[__lengthof__ (*a)]);
> > > +void fix_var (int i, char (*a)[3][i], int (*x)[__lengthof__ (*a)]);
> > > +void fix_uns (int i, char (*a)[3][*], int (*x)[__lengthof__ (*a)]);
> > 
> > 
> > I would include a test that shows that when lengthof
> > is applied to [*] it remains formally non-constant.  For example,
> > you could test with -Wvla-parameter that the two declarations do not give a
> > warning:
> > 
> > void foo(char (*a)[*], int x[*]);
> > void foo(char (*a)[*], int x[__lengthof__(*a)]);
> 
> But [*] is a VLA.  Do we want to return a constexpr for it?

No,  my point is only that we could have a test for not
returning a constant. 

If __lengthof__ incorrectly returned an integer constant
expression, then you would get a warning with -Wvla-parameter.  So
adding these two declarations to the tests and activating
the warning would ensure that the int[__lengthof__(*a)]
is a VLA:  https://godbolt.org/z/7P7qW15ah
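
To make the constant versus non-constant distinction concrete without relying on the proposed operator, here is a sketch using the classic sizeof idiom. `ARRAY_LEN` and both function names are assumptions for illustration only; __lengthof__ itself is not used because it exists only in this patch series.

```c
#include <stddef.h>

/* Element count of an array object (not a pointer).  */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

/* Fixed-size array: the count is an integer constant expression, so it
   may be used as the bound of an object with static storage duration.  */
size_t
fixed_len (void)
{
  short a[7];
  static char proof[ARRAY_LEN (a)];
  return sizeof proof;
}

/* VLA: the count depends on n and is computed at run time, so it is not
   an integer constant expression.  n must be positive.  */
size_t
vla_len (int n)
{
  int a[n];
  return ARRAY_LEN (a);
}
```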

> 
> > (With  int (*x)[*]  we would run into the issue that we can not
> > distinguish zero arrays from unspecified ones, PR 98539)
> 
> As Martin Sebor said, I need to choose between supporting well [0] or
> supporting well [*], but not both.

If you have only one array index, this works (and should
already work correctly with your patch).

> 
> I would personally prefer supporting [0], and consider that not
> supporting [*] is a bug in the implementation of [*] (and thus not my
> problem).
> 
> However, since GCC doesn't support 0-length arrays, I'm not sure that
> would be correct.
> 
> What do you think?

I think the logic in your patch is OK as is.  It does not do exactly
what you want, as it now treats some [0] as [*], but I would not
make the logic more complex here when we will fix it properly
anyway.

> 
> Does anyone oppose treating [0] as a constexpr 0 length?  That means not
> supporting well [*], but please fix it separately, which Martin Uecker
> is working on.  :)
> 
> > > diff --git a/gcc/testsuite/gcc.dg/lengthof.c 
> > > b/gcc/testsuite/gcc.dg/lengthof.c
> > > new file mode 100644
> > > index 000..38da5df52a5
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/lengthof.c
> > > @@ -0,0 +1,127 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-Wno-declaration-after-statement -Wno-pedantic 
> > > -Wno-vla" } */
> > > +
> > > +#undef NDEBUG
> > > +#include 
> > > +
> > > +void
> > > +array (void)
> > > +{
> > > +  short a[7];
> > > +
> > > +  assert (__lengthof__ (a) == 7);
> > > +  assert (__lengthof__ (long [0]) == 0);
> > > +  assert (__lengthof__ (unsigned [99]) == 99);
> > > +}
> > 
> > Instead of using assert you can use
> > 
> > if (! ...) __builtin_abort();
> > 
> > to avoid the include in the testsuite.  
> 
> Is it frowned upon to include something?  I prefer assert(3).

It makes the tests run faster.  At least people told me before
to avoid includes in tests for this reason.   But from my side
assert is ok too.
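
A minimal sketch of the include-free check style mentioned above, assuming GCC or a compatible compiler that provides `__builtin_abort`. The `CHECK` macro and the function name are hypothetical.

```c
/* Abort the test on failure without pulling in <assert.h>.  */
#define CHECK(cond) do { if (!(cond)) __builtin_abort (); } while (0)

/* Returns 0 if every check passes; aborts otherwise.  */
int
array_checks (void)
{
  short a[7];

  CHECK (sizeof a / sizeof a[0] == 7);
  CHECK (sizeof (unsigned [99]) / sizeof (unsigned) == 99);
  return 0;
}
```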

Martin


> 
> > Otherwise it looks fine from my side.
> > 
> > Joseph needs to approve and may have more comments.
> 
> Thanks!
> 
> > 
> > Martin
> 
> Have a lovely day!
> Alex
> 

-- 
Univ.-Prof. Dr. rer. nat. Martin Uecker
Graz University of Technology
Institute of Biomedical Imaging




[PATCH] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread 钟居哲
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index d7a3cfd4602..923122510ac 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -1607,3 +1607,19 @@

Such a pattern, which is a natural auto-vectorization pattern, should be in 
autovec.md instead of autovec-opt.md,
which is supposed to contain the combine optimization patterns.


juzhe.zh...@rivai.ai


Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 11:08 AM Alexander Monakov  wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > +  data = *(const v16qi_u *)s;
> > > > +  /* Prevent propagation into pshufb and pcmp as memory operand.  
> > > > */
> > > > +  __asm__ ("" : "+x" (data));
> > >
> > > > It would probably make sense to file a PR on this separately,
> > > to eventually fix the compiler to not need such workarounds.
> > > Not sure how much difference it makes however.
> >
> > This is probably to work around bugs in older compiler versions?  If
> > not I agree.
>
> This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> macro-fused on Intel, so with propagation it is two uops early in the
> CPU front-end.
>
> The "propagation" actually falls out of IRA/LRA decisions, and stopped
> happening in gcc-14. I'm not sure if there were relevant RA changes.
> In any case, this can potentially flip-flop in the future again.
>
> Considering the trunk gets this right, I think the next move is to
> add a testcase for this, not a PR, correct?

Well, merging the memory operand into the pshufb would be wrong - embedded
memory ops are always considered aligned, no?

> > Otherwise the patch is OK.
>
> Still OK with the asms, or would you prefer them be taken out?

I think it's OK with the asms.

Richard.

> Thanks.
>
> Alexander


Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 01:16:20PM +0200, Richard Biener wrote:
> Well, merging the memory operand into the pshufb would be wrong - embedded
> memory ops are always considered aligned, no?

Depends.  VEX/EVEX-encoded memory operands can be unaligned; with the pre-AVX
encoding they must be aligned, except in explicitly unaligned instructions.

Jakub



Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Alexander Monakov


On Wed, 7 Aug 2024, Richard Biener wrote:

> > > This is probably to work around bugs in older compiler versions?  If
> > > not I agree.
> >
> > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > macro-fused on Intel, so with propagation it is two uops early in the
> > CPU front-end.
> >
> > The "propagation" actually falls out of IRA/LRA decisions, and stopped
> > happening in gcc-14. I'm not sure if there were relevant RA changes.
> > In any case, this can potentially flip-flop in the future again.
> >
> > Considering the trunk gets this right, I think the next move is to
> > add a testcase for this, not a PR, correct?
> 
> Well, merging the memory operand into the pshufb would be wrong - embedded
> memory ops are always considered aligned, no?

In SSE yes, in AVX no. For search_line_ssse3 the asms help if it is compiled
with e.g. -march=sandybridge (i.e. for a CPU that has AVX but lacks AVX2):
then VEX-encoded SSE instructions accept misaligned memory, and we want to
prevent that here.
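
The barrier technique discussed in this thread can be sketched as follows. This is not the libcpp code: `count_matches16` is a hypothetical helper, it uses only SSE2 intrinsics, and it falls back to a plain loop on targets where `__SSE2__` is not defined.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Count how many of the 16 bytes at S (possibly unaligned) equal C.  */
int
count_matches16 (const unsigned char *s, unsigned char c)
{
#if defined(__SSE2__)
  /* Explicit unaligned load into a vector value.  */
  __m128i data = _mm_loadu_si128 ((const __m128i *) s);

  /* Empty asm with DATA as a read-write operand in an SSE register:
     the compiler must keep the loaded value in a register here, so the
     load cannot be folded into a later instruction as a memory operand.  */
  __asm__ ("" : "+x" (data));

  __m128i eq = _mm_cmpeq_epi8 (data, _mm_set1_epi8 ((char) c));
  return __builtin_popcount ((unsigned) _mm_movemask_epi8 (eq));
#else
  int n = 0;
  for (int i = 0; i < 16; i++)
    n += (s[i] == c);
  return n;
#endif
}
```

The asm emits no instructions; it only constrains where the value lives at that point in the function.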

Alexander


[PATCH v3 1/7] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-08-07 Thread Paul-Antoine Arras
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.
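
As a rough illustration of the source-level constructs these nodes are meant to represent, consider the sketch below. The function names are hypothetical. When compiled without OpenMP support the pragmas are ignored and the base function is called directly.

```c
/* Variant that an OpenMP implementation may substitute for the base
   function when the call appears in a dispatch construct.  */
int scale_variant (int x);

#pragma omp declare variant (scale_variant) match (construct={dispatch})
int scale_base (int x);

int scale_variant (int x) { return x * 100; }
int scale_base (int x) { return x * 10; }

int
call_scale (int x, int skip_variants)
{
  int r;

  /* novariants (expr): when expr evaluates to true, no variant is
     selected and the base function is called.  */
  #pragma omp dispatch novariants (skip_variants)
  r = scale_base (x);

  return r;
}
```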

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.

gcc/fortran/ChangeLog:

* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.
---
 gcc/builtin-types.def|  1 +
 gcc/fortran/types.def|  1 +
 gcc/omp-selectors.h  |  1 +
 gcc/tree-core.h  |  7 +++
 gcc/tree-pretty-print.cc | 21 +
 gcc/tree.cc  |  4 
 gcc/tree.def |  5 +
 gcc/tree.h   |  7 +++
 8 files changed, 47 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, BT_FEXCEPT_T_PTR,
 DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
 DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
diff --git a/gcc/fortran/types.def b/gcc/fortran/types.def
index 390cc9542f7..5047c8f816a 100644
--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def
@@ -120,6 +120,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTR_PTRMODE,
 BT_VOID, BT_PTR, BT_PTRMODE)
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index c61808ec0ad..ef3ce9a449a 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -55,6 +55,7 @@ enum omp_ts_code {
   OMP_TRAIT_CONSTRUCT_PARALLEL,
   OMP_TRAIT_CONSTRUCT_FOR,
   OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
   OMP_TRAIT_LAST,
   OMP_TRAIT_INVALID = -1
 };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
 case OMP_CLAUSE_EXCLUSIVE:
   name = "exclusive";
   goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int spc, dump_flags_t flags,
   dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags);
   goto dump_omp_body;
 
+case OMP_DISPATCH:
+  pp_string (pp, "#pragma omp dispatch");
+  dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags);
+  goto dump_omp_body;
+
 case OMP_SECTION:
   pp_string (pp, "#pragma omp section");
   goto dump_omp_body;
diff --git a/gcc/tree.cc b/gcc/tree.cc

[PATCH v3 0/7] OpenMP: dispatch + adjust_args support

2024-08-07 Thread Paul-Antoine Arras
This is a respin of my patchset implementing both the `dispatch` construct and 
the `adjust_args` clause of the `declare variant` directive. The previous
submission can be found here: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657151.html.

Besides being rebased, this new iteration has the following changes:
 * Remove `need_device_ptr` as pseudo-trait set selector and pass `adjust_args`
 list as attribute to the base function. This also ensures that the list will 
 survive multiple declarations;
 * OpenMP 5.1 mandated that each dispatch construct should generate an explicit 
 task. This requirement has been lifted in 5.2, so we removed it;
 * As a result, some clauses are now handled differently: nowait has no effect, 
 depend is moved to a taskwait construct and the default-device ICV has to be 
 restored at the end of the dispatch region;
 * Update test cases.


Paul-Antoine Arras (7):
  OpenMP: dispatch + adjust_args tree data structures and front-end
interfaces
  OpenMP: middle-end support for dispatch + adjust_args
  OpenMP: C front-end support for dispatch + adjust_args
  OpenMP: C++ front-end support for dispatch + adjust_args
  OpenMP: common C/C++ testcases for dispatch + adjust_args
  OpenMP: Fortran front-end support for dispatch + adjust_args
  OpenMP: update documentation for dispatch and adjust_args

 gcc/builtin-types.def |   1 +
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 522 +--
 gcc/c/c-typeck.cc |   2 +
 gcc/cp/decl.cc|   7 +
 gcc/cp/parser.cc  | 631 --
 gcc/cp/semantics.cc   |  20 +
 gcc/fortran/dump-parse-tree.cc|  17 +
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  11 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 192 +-
 gcc/fortran/parse.cc  |  39 +-
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 135 
 gcc/fortran/trans.cc  |   1 +
 gcc/fortran/types.def |   1 +
 gcc/gimple-low.cc |   1 +
 gcc/gimple-pretty-print.cc|  33 +
 gcc/gimple-walk.cc|   1 +
 gcc/gimple.cc |  20 +
 gcc/gimple.def|   5 +
 gcc/gimple.h  |  33 +-
 gcc/gimplify.cc   | 421 +++-
 gcc/gimplify.h|   2 +
 gcc/omp-builtins.def  |   6 +
 gcc/omp-expand.cc |  18 +
 gcc/omp-general.cc|  14 +-
 gcc/omp-low.cc|  35 +
 gcc/omp-selectors.h   |   1 +
 .../c-c++-common/gomp/adjust-args-1.c |  30 +
 .../c-c++-common/gomp/adjust-args-2.c |  31 +
 .../c-c++-common/gomp/declare-variant-2.c |   4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  |  65 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  |  28 +
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  |  12 +
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  |  18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  |  27 +
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  |  18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  |  21 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 ++
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 +
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 ++
 .../gfortran.dg/gomp/adjust-args-1.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 +
 .../gfortran.dg/gomp/adjust-args-3.f90|  27 +
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 ++
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  25 +
 .../gfortran.dg/gomp/declare-variant-21.f90   |  22 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-4.f90 |  19 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-5.f90 |  25 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-6.f90 |  39 ++
 

[PATCH v3 2/7] OpenMP: middle-end support for dispatch + adjust_args

2024-08-07 Thread Paul-Antoine Arras
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists of
emitting a call to `gomp_get_mapped_ptr` for the appropriate device.
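
Conceptually, the `adjust_args` handling rewrites a pointer argument before the call. The sketch below only models that idea: `fake_get_mapped_ptr`, the two buffers and the other function names are hypothetical stand-ins, not the runtime API or the code generated by the patch.

```c
#include <stddef.h>

char host_buf[4];    /* pretend host allocation */
char device_buf[4];  /* pretend device copy of host_buf */

/* Stand-in for the runtime lookup that maps a host pointer to the
   corresponding device pointer; returns NULL if there is no mapping.  */
void *
fake_get_mapped_ptr (const void *host_ptr, int device_num)
{
  (void) device_num;
  return host_ptr == host_buf ? device_buf : NULL;
}

/* A variant that expects its second argument to be a device pointer.  */
int
variant_needs_device_ptr (int a, void *b)
{
  return b == device_buf ? a : -1;
}

/* What a call with adjust_args (need_device_ptr: b) conceptually becomes
   inside a dispatch region: translate b first, then call the variant.  */
int
dispatched_call (int a, void *b, int device_num)
{
  void *b_dev = fake_get_mapped_ptr (b, device_num);
  return variant_needs_device_ptr (a, b_dev);
}
```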

For dispatch, the following steps are performed:

* Handle the device clause, if any: set the default-device ICV at the top of the
dispatch region and restore its previous value at the end.

* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.

* If depend clauses are present, add a taskwait construct before the dispatch
region and move them there.
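
The device clause handling described in the first step can be modelled as a save/set/restore sequence around the call. In this sketch the ICV and its accessors are hypothetical stand-ins defined locally, so the example does not depend on an OpenMP runtime.

```c
/* Stand-ins for the default-device ICV and its accessors.  */
static int default_device_icv = 0;

int get_default_device (void) { return default_device_icv; }
void set_default_device (int d) { default_device_icv = d; }

/* A callee that observes the current default device.  */
int
callee_reports_device (int x)
{
  return x + get_default_device ();
}

/* Conceptual shape of a dispatch region with a device clause:
   save the ICV, set it for the duration of the call, then restore it.  */
int
dispatch_with_device (int device, int x)
{
  int saved = get_default_device ();
  set_default_device (device);
  int r = callee_reports_device (x);
  set_default_device (saved);
  return r;
}
```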

gcc/ChangeLog:

* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH.
* gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_dispatch): New function.
(gimple_copy): Handle GIMPLE_OMP_DISPATCH.
* gimple.def (GIMPLE_OMP_DISPATCH): Define.
* gimple.h (gimple_build_omp_dispatch): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH.
(gimple_omp_dispatch_clauses): New function.
(gimple_omp_dispatch_clauses_ptr): Likewise.
(gimple_omp_dispatch_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH.
* gimplify.cc (enum omp_region_type): Add ORT_DISPATCH.
(gimplify_call_expr): Handle need_device_ptr arguments.
(is_gimple_stmt): Handle OMP_DISPATCH.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch
construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext
clause.
(omp_has_novariants): New function.
(omp_has_nocontext): Likewise.
(gimplify_omp_dispatch): Likewise.
(gimplify_expr): Handle OMP_DISPATCH.
* gimplify.h (omp_has_novariants): Declare.
(omp_has_nocontext): Declare.
* omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define.
(BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define.
(BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define.
* omp-expand.cc (expand_omp_dispatch): New function.
(expand_omp): Handle GIMPLE_OMP_DISPATCH.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH.
(struct omp_ts_info): Add dispatch.
(omp_context_selector_matches): Handle OMP_TRAIT_SET_NEED_DEVICE_PTR.
(omp_resolve_declare_variant): Handle novariants. Adjust
DECL_ASSEMBLER_NAME.
---
 gcc/gimple-low.cc  |   1 +
 gcc/gimple-pretty-print.cc |  33 +++
 gcc/gimple-walk.cc |   1 +
 gcc/gimple.cc  |  20 ++
 gcc/gimple.def |   5 +
 gcc/gimple.h   |  33 ++-
 gcc/gimplify.cc| 421 -
 gcc/gimplify.h |   2 +
 gcc/omp-builtins.def   |   6 +
 gcc/omp-expand.cc  |  18 ++
 gcc/omp-general.cc |  14 +-
 gcc/omp-low.cc |  35 +++
 gcc/tree-inline.cc |   7 +
 13 files changed, 585 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-low.cc b/gcc/gimple-low.cc
index e0371988705..712a1ebf776 100644
--- a/gcc/gimple-low.cc
+++ b/gcc/gimple-low.cc
@@ -746,6 +746,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data *data)
 case GIMPLE_EH_MUST_NOT_THROW:
 case GIMPLE_OMP_FOR:
 case GIMPLE_OMP_SCOPE:
+case GIMPLE_OMP_DISPATCH:
 case GIMPLE_OMP_SECTIONS:
 case GIMPLE_OMP_SECTIONS_SWITCH:
 case GIMPLE_OMP_SECTION:
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 08b823c84ef..e7b2df9a0ef 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1726,6 +1726,35 @@ dump_gimple_omp_scope (pretty_printer *pp, const gimple *gs,
 }
 }
 
+/* Dump a GIMPLE_OMP_DISPATCH tuple on the pretty_printer BUFFER.  */
+
+static void
+dump_gimple_omp_dispatch (pretty_printer *buffer, const gimple *gs, int spc,
+ dump_flags_t flags)
+{
+  if (flags & TDF_RAW)
+{
+  dump_gimple_fmt (buffer, spc, flags, "%G <%+BODY <%S>%nCLAUSES <", gs,
+  gimple_omp_body (gs));
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  dump_gimple_fmt (buffer, spc, flags, " >");
+}
+  else
+{
+  pp_string (buffer, "#pragma omp dispatch");
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  if (!gimple_seq_empty_p (gimple_omp_body (gs)))
+   {
+ newline_and_indent (buffer, spc + 2);

[PATCH v3 5/7] OpenMP: common C/C++ testcases for dispatch + adjust_args

2024-08-07 Thread Paul-Antoine Arras
gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
---
 .../c-c++-common/gomp/adjust-args-1.c | 30 +
 .../c-c++-common/gomp/adjust-args-2.c | 31 +
 .../c-c++-common/gomp/declare-variant-2.c |  4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  | 65 +++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  | 28 
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  | 12 
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  | 18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  | 27 
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  | 18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  | 21 ++
 .../dispatch-1.c  |  0
 .../dispatch-2.c  |  0
 12 files changed, 252 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-7.c
 rename libgomp/testsuite/{libgomp.c => libgomp.c-c++-common}/dispatch-1.c (100%)
 rename libgomp/testsuite/{libgomp.c => libgomp.c-c++-common}/dispatch-2.c (100%)

diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
new file mode 100644
index 000..728abe62092
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b) adjust_args (need_device_ptr: c)
+int f1 (int a, void *b, float c[2]);
+
+int test () {
+  int a;
+  void *b;
+  float c[2];
+  struct {int a;} s;
+
+  s.a = f0 (a, b, c);
+  #pragma omp dispatch
+  s.a = f0 (a, b, c);
+
+  f1 (a, b, c);
+  #pragma omp dispatch
+  s.a = f1 (a, b, c);
+
+  return s.a;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device \\(\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
new file mode 100644
index 000..d2a4a5f4ec4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args (nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) adjust_args (need_device_ptr: b, c) match (construct={dispatch}) adjust_args (nothing: a)
+int f1 (int a, void *b, float c[2]);
+
+void test () {
+  int a;
+  void *b;
+  float c[2];
+
+  #pragma omp dispatch
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (-4852)
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (a + a)
+  f0 (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device \\(\\);" 3 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(&c, -4852\\);" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = __builtin_omp_get_mapped_ptr \\(b, -4852\\);" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-not "#pragma omp dispatch device" "gimple" } } */
diff --git a/gcc/t

[PATCH v3 4/7] OpenMP: C++ front-end support for dispatch + adjust_args

2024-08-07 Thread Paul-Antoine Arras
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits included in the corresponding C
front-end patch for pragmas and attributes.

Additional C/C++ common testcases are provided in a subsequent patch in the
series.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add parameter. Update call to
cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired from
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
---
 gcc/cp/decl.cc|   7 +
 gcc/cp/parser.cc  | 631 --
 gcc/cp/semantics.cc   |  20 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 +++
 7 files changed, 818 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 279af21eed0..00056d37ecf 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8383,6 +8383,13 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
  if (!omp_context_selector_matches (ctx))
return true;
  TREE_PURPOSE (TREE_VALUE (attr)) = variant;
+
+ // Prepend adjust_args list to variant attributes
+ tree adjust_args_list = TREE_CHAIN (TREE_CHAIN (chain));
+ if (adjust_args_list != NULL_TREE)
+   DECL_ATTRIBUTES (variant) = tree_cons (
+ get_identifier ("omp declare variant variant adjust_args"),
+ TREE_VALUE (adjust_args_list), DECL_ATTRIBUTES (variant));
}
 }
   else if (!processing_template_decl)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index eb102dea829..b43eabb0cff 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#include "omp-selectors.h"
 #define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
@@ -2587,7 +2588,7 @@ static cp_ref_qualifier cp_parser_ref_qualifier_opt
 static tree cp_parser_tx_qualifier_opt
   (cp_parser *);
 static tree cp_parser_late_return_type_opt
-  (cp_parser *, cp_declarator *, tree &);
+  (cp_parser *, cp_declarator *, tree &, tree);
 static tree cp_parser_declarator_id
   (cp_parser *, bool);
 static tree cp_parser_type_id
@@ -2622,7 +2623,7 @@ static void cp_parser_ctor_initializer_opt_and_function_body
   (cp_parser *, bool);
 
 static tree cp_parser_late_parsing_omp_declare_simd
-  (cp_parser *, tree);
+  (cp_parser *, tree, tree);
 
 static tree cp_parser_late_parsing_oacc_routine
   (cp_parser *, tree);
@@ -24193,7 +24194,7 @@ cp_parser_direct_declarator (cp_parser* parser,
  tree requires_clause = NULL_TREE;
  late_return
= cp_parser_late_return_type_opt (parser, declarator,
- requires_clause);
+ requires_clause, params);
 
  cp_finalize_omp_declare_simd (parser, &odsd);
 
@@ -25058,8 +25059,8 @@ parsing_function_declarator ()
function.  */
 
 static tree
-cp_parser_late_return_type_opt (cp_parser* parser, cp_declarator *declarator,
-   tree& requires_clause)
+cp_parser_late_return_type_opt (c

[PATCH v3 7/7] OpenMP: update documentation for dispatch and adjust_args

2024-08-07 Thread Paul-Antoine Arras
libgomp/ChangeLog:

* libgomp.texi:
---
 libgomp/libgomp.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 07cd75124b0..b35424c047a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -294,8 +294,8 @@ The OpenMP 4.5 specification is fully supported.
 @item C/C++'s @code{declare variant} directive: elision support of
   preprocessed code @tab N @tab
 @item @code{declare variant}: new clauses @code{adjust_args} and
-  @code{append_args} @tab N @tab
-@item @code{dispatch} construct @tab N @tab
+  @code{append_args} @tab P @tab Only @code{adjust_args}
+@item @code{dispatch} construct @tab Y @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
 @item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
-- 
2.45.2



[PATCH v3 3/7] OpenMP: C front-end support for dispatch + adjust_args

2024-08-07 Thread Paul-Antoine Arras
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.

Additional common C/C++ testcases are in a later patch in the series.

gcc/c-family/ChangeLog:

* c-attribs.cc (c_common_gnu_attributes): Add attribute for adjust_args
need_device_ptr.
* c-omp.cc (c_omp_directives): Uncomment dispatch.
* c-pragma.cc (omp_pragmas): Add dispatch.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_DISPATCH.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_NOCONTEXT and
PRAGMA_OMP_CLAUSE_NOVARIANTS.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
---
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 522 +++---
 gcc/c/c-typeck.cc |   2 +
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 ++
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 +++
 libgomp/testsuite/libgomp.c/dispatch-1.c  |  76 
 libgomp/testsuite/libgomp.c/dispatch-2.c  |  84 
 10 files changed, 719 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/dispatch-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/dispatch-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/dispatch-2.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 685f212683f..91a5356796d 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -562,6 +562,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_omp_declare_variant_attribute, NULL },
   { "omp declare variant variant", 0, -1, true,  false, false, false,
  handle_omp_declare_variant_attribute, NULL },
+  { "omp declare variant adjust_args need_device_ptr", 0, -1, true,  false, false, false,
+ handle_omp_declare_variant_attribute, NULL },
   { "simd",  0, 1, true,  false, false, false,
  handle_simd_attribute, NULL },
   { "omp declare target", 0, -1, true, false, false, false,
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index b5ce1466e5d..c74a9fb2691 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4299,8 +4299,8 @@ const struct c_omp_directive c_omp_directives[] = {
 C_OMP_DIR_DECLARATIVE, false },
   { "depobj", nullptr, nullptr, PRAGMA_OMP_DEPOBJ,
 C_OMP_DIR_STANDALONE, false },
-  /* { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
-C_OMP_DIR_CONSTRUCT, false },  */
+  { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
+C_OMP_DIR_DECLARATIVE, false },
   { "distribute", nullptr, nullptr, PRAGMA_OMP_DISTRIBUTE,
 C_OMP_DIR_CONSTRUCT, true },
   { "end", "assumes", nullptr, PRAGMA_OMP_END,
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 25251c2b69f..b956819c0a5 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1526,6 +1526,7 @@ static const struct omp_pragma_def omp_pragmas[] = {
   { "cancellation", PRAGMA_OMP_CANCELLATION_POINT },
   { "critical", PRAGMA_OMP_CRITICAL },
   { "depobj", PRAGMA_OMP_DEPOBJ },
+  { "dispatch", PRAGMA_OMP_DISPATCH },
   { "error", PRAGMA_OMP_ERROR },
   { "end", PRAGMA_OMP_END },
   { "flush", PRAGMA_OMP_FLUSH },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 2ebde06c471..6b6826b2426 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,7 @@ enum pragma_kind {
   PRAGMA_OMP_CRITICAL,
   PRAGMA_OMP_DECLARE,
   PRAGMA_OMP_DEPOBJ,
+  PRAGMA_OMP_DISPATCH,
   PRAGMA_OMP_DISTRIBUTE,
   PRAGMA_OMP_ERROR,
   PRAGMA_OMP_END,
@@ -135,9 +136,11 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_LINK,
   PRAGMA_OMP_CLAUSE_MAP,
   PRAGMA_OMP_CLAUSE_MERGEABLE,
+  PRAGMA_OMP_CLAUSE_NOCONTEXT,
   PRAGMA_OMP_CLAUSE_NOGROUP,
   PRAGMA_OMP_CLAUSE_NONTEMPORAL,
   PRAGMA_O

[PATCH v3 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-08-07 Thread Paul-Antoine Arras
This patch adds support for the `dispatch` construct and the `adjust_args`
clause to the Fortran front-end.

Handling of `adjust_args` across translation units is missing due to PR115271.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext
clauses.
(show_omp_node): Handle EXEC_OMP_DISPATCH.
(show_code_node): Likewise.
* frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext.
* gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH.
(symbol_attribute): Add omp_declare_variant_need_device_ptr.
(gfc_omp_clauses): Add novariants and nocontext.
(gfc_omp_declare_variant): Add need_device_ptr_arg_list.
(enum gfc_exec_op): Add EXEC_OMP_DISPATCH.
* match.h (gfc_match_omp_dispatch): Declare.
* openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext
clauses.
(gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list
namelist.
(enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(OMP_DISPATCH_CLAUSES): Define.
(gfc_match_omp_dispatch): New function.
(gfc_match_omp_declare_variant): Parse adjust_args.
(resolve_omp_clauses): Handle adjust_args, novariants and nocontext.
Adjust handling of OMP_LIST_IS_DEVICE_PTR.
(icode_code_error_callback): Handle EXEC_OMP_DISPATCH.
(omp_code_to_statement): Likewise.
(resolve_omp_dispatch): New function.
(gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH.
* parse.cc (decode_omp_directive): Match dispatch.
(next_statement): Handle ST_OMP_DISPATCH.
(gfc_ascii_statement): Likewise.
(parse_omp_dispatch): New function.
(parse_executable): Handle ST_OMP_DISPATCH.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH.
* st.cc (gfc_free_statement): Likewise.
* trans-decl.cc (create_function_arglist): Declare.
(gfc_get_extern_function_decl): Call it.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and
nocontext.
(gfc_trans_omp_dispatch): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH.
(gfc_trans_omp_declare_variant): Handle adjust_args.
* trans.cc (trans_code): Handle EXEC_OMP_DISPATCH.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-2.f90: Update dg-error.
* gfortran.dg/gomp/declare-variant-21.f90: New test (xfail).
* gfortran.dg/gomp/declare-variant-21-aux.f90: New test.
* gfortran.dg/gomp/adjust-args-1.f90: New test.
* gfortran.dg/gomp/adjust-args-2.f90: New test.
* gfortran.dg/gomp/adjust-args-3.f90: New test.
* gfortran.dg/gomp/adjust-args-4.f90: New test.
* gfortran.dg/gomp/adjust-args-5.f90: New test.
* gfortran.dg/gomp/dispatch-1.f90: New test.
* gfortran.dg/gomp/dispatch-2.f90: New test.
* gfortran.dg/gomp/dispatch-3.f90: New test.
* gfortran.dg/gomp/dispatch-4.f90: New test.
* gfortran.dg/gomp/dispatch-5.f90: New test.
* gfortran.dg/gomp/dispatch-6.f90: New test.
* gfortran.dg/gomp/dispatch-7.f90: New test.
* gfortran.dg/gomp/dispatch-8.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 ++
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  11 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 192 --
 gcc/fortran/parse.cc  |  39 +++-
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 135 
 gcc/fortran/trans.cc  |   1 +
 .../gfortran.dg/gomp/adjust-args-1.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 ++
 .../gfortran.dg/gomp/adjust-args-3.f90|  27 +++
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 ++
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  25 +++
 .../gfortran.dg/gomp/declare-variant-21.f90   |  22 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 
 gcc/testsuite/gfortran.dg/gomp/dispatch-4.f90 |  19 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-5.f90 |  25 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-6.f90 |  39 
 gcc/testsuite/gfortran.dg/gomp/dispatch-7.f90 |  26 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-8.f90 |  33 +++
 27 files change

RE: [x86_64 PATCH] Refactor V2DI arithmetic right shift expansion for STV.

2024-08-07 Thread Roger Sayle


My sincere apologies for not noticing that g++.dg/other/sse2-pr85572-1.C
was FAILing with my recent ashrv2di patch.  I'm not sure how that happened.
Many thanks to Andrew Pinski for alerting me, and confirming that the
changes are harmless/beneficial.  The following tweak to the testsuite
has been committed as obvious.  Sorry again for the inconvenience.

Tested on x86_64-pc-linux-gnu with RUNTESTFLAGS="dg.exp=sse2-pr85572-1.C".


2024-08-07  Roger Sayle  

gcc/testsuite/ChangeLog
* g++.dg/other/sse2-pr85572-1.C: Update expected output after
my recent patch for ashrv2di3.  Now with one less instruction.


> -Original Message-
> From: Andrew Pinski 
> Sent: 06 August 2024 22:17
> On Mon, Aug 5, 2024 at 3:23 AM Roger Sayle 
> wrote:
> >
> >
> > This patch refactors ashrv2di RTL expansion into a function so that it
> > may be reused by a pre-reload splitter, such that DImode right shifts
> > may be considered candidates during the Scalar-To-Vector (STV) pass.
> > Currently DImode arithmetic right shifts are not considered potential
> > candidates during STV, so for the following testcase:
> >
> > long long m;
> > typedef long long v2di __attribute__((vector_size (16))); void
> > foo(v2di x) { m = x[0]>>63; }
> >
> > We currently see the following warning/error during STV2
> > >  r101 use in insn 7 isn't convertible
> >
> > And end up generating scalar code with an interunit move:
> >
> > foo:    movq    %xmm0, %rax
> >         sarq    $63, %rax
> >         movq    %rax, m(%rip)
> >         ret
> >
> > With this patch, we can reuse the RTL expansion logic and produce:
> >
> > foo:    psrad   $31, %xmm0
> >         pshufd  $245, %xmm0, %xmm0
> >         movq    %xmm0, m(%rip)
> >         ret
> >
> > Or with the addition of -mavx2, the equivalent:
> >
> > foo:    vpxor   %xmm1, %xmm1, %xmm1
> >         vpcmpgtq        %xmm0, %xmm1, %xmm0
> >         vmovq   %xmm0, m(%rip)
> >         ret
> >
> >
> > The only design decision of note is the choice to continue lowering
> > V2DI into vector sequences during RTL expansion, to enable combine to
> > optimize things if possible.  Using just define_insn_and_split
> > potentially misses optimizations, such as reusing the zero vector produced
> > by vpxor above.
> > It may be necessary to tweak STV's compute gain at some point, but
> > this patch controls what's possible (rather than what's beneficial).
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> Looks like you didn't update the testcase g++.dg/other/sse2-pr85572-1.C .
> Before the change GCC produced:
> ```
> foo(long long __vector(2)):
>         movdqa  xmm2, xmm0
>         pxor    xmm1, xmm1
>         psrlq   xmm2, 63
>         psubq   xmm1, xmm2
>         pxor    xmm0, xmm1
>         psubq   xmm0, xmm1
>         ret
> ```
> 
> But afterwards GCC produces (which is better and is now similar to what llvm
> produces):
> ```
> _Z3fooDv2_x:
>         movdqa  %xmm0, %xmm1
>         psrad   $31, %xmm1
>         pshufd  $245, %xmm1, %xmm1
>         pxor    %xmm1, %xmm0
>         psubq   %xmm1, %xmm0
>         ret
> ret
> ```
> 
> Thanks,
> Andrew
> 
> >
> > 2024-08-05  Roger Sayle  
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.cc (ix86_expand_v2di_ashiftrt): New
> > function refactored from define_expand ashrv2di3.
> > * config/i386/i386-features.cc
> > (general_scalar_to_vector_candidate_p)
> > : Handle like other shifts and rotates.
> > * config/i386/i386-protos.h (ix86_expand_v2di_ashiftrt): Prototype.
> > * config/i386/sse.md (ashrv2di3): Call ix86_expand_v2di_ashiftrt.
> > (*ashrv2di3): New define_insn_and_split to enable creation by stv2
> > pass, and splitting during split1 reusing ix86_expand_v2di_ashiftrt.
> >
> > gcc/testsuite/ChangeLog
> > * gcc.target/i386/sse2-stv-2.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
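
For readers following the quoted discussion, the `psrad $31` / `pshufd $245`
pair computes a 64-bit arithmetic right shift by 63 using only 32-bit lane
operations.  The scalar model below is a sketch of that idea for one 64-bit
lane.  The function names are hypothetical, and this is not the code in
ix86_expand_v2di_ashiftrt.

```cpp
#include <cstdint>

// Scalar model of one 64-bit lane, viewed as two 32-bit elements (low, high).
std::uint64_t sign_mask_via_32bit_ops (std::int64_t x)
{
  // The high 32-bit element carries the sign bit of the 64-bit value.
  std::uint32_t hi
    = static_cast<std::uint32_t> (static_cast<std::uint64_t> (x) >> 32);

  // psrad $31: an arithmetic shift by 31 turns the element into all-zeros or
  // all-ones, depending on its top bit.
  std::uint32_t hi_sign = (hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;

  // pshufd $245 selects elements 1,1,3,3, i.e. it copies the shifted high
  // element of each 64-bit lane into both halves of that lane.
  return (static_cast<std::uint64_t> (hi_sign) << 32) | hi_sign;
}

// Reference result of x >> 63 with arithmetic-shift semantics.
std::uint64_t sign_mask_reference (std::int64_t x)
{
  return x < 0 ? ~std::uint64_t {0} : std::uint64_t {0};
}
```

Both functions return all-ones for negative inputs and zero otherwise, which is
why the two-instruction vector sequence can replace the scalar `sarq $63`.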



Re: [PATCH 2/3] libcpp: replace SSE4.2 helper with an SSSE3 one

2024-08-07 Thread Richard Biener
On Wed, Aug 7, 2024 at 1:37 PM Alexander Monakov  wrote:
>
>
> On Wed, 7 Aug 2024, Richard Biener wrote:
>
> > > > This is probably to work around bugs in older compiler versions?  If
> > > > not I agree.
> > >
> > > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > > macro-fused on Intel, so with propagation it is two uops early in the
> > > CPU front-end.
> > >
> > > The "propagation" actually falls out of IRA/LRA decisions, and stopped
> > > happening in gcc-14. I'm not sure if there were relevant RA changes.
> > > In any case, this can potentially flip-flop in the future again.
> > >
> > > Considering the trunk gets this right, I think the next move is to
> > > add a testcase for this, not a PR, correct?
> >
> > Well, merging the memory operand into the pshufb would be wrong - embedded
> > memory ops are always considered aligned, no?
>
> In SSE yes, in AVX no. For search_line_ssse3 the asms help if it is compiled
> with e.g. -march=sandybridge (i.e. for a CPU that has AVX but lacks AVX2):
> then VEX-encoded SSE instructions accept misaligned memory, and we want to
> prevent that here.

Ah, yeah - I think there are even existing bug reports that we're too happy to
duplicate a memory operand even into multiple insns.

Richard.

>
> Alexander


[PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates

2024-08-07 Thread Tobias Burnus

CCed Fortran because of the first item:

This patch now uses (again like in v1) a builtin for
'omp_is_initial_device'; like in v2, it is compile-time evaluated, but
this time (new!) it also handles the case that a user wrote that routine.


Note: The omp_… namespace is owned by OpenMP, i.e. if it breaks for a 
user-defined function (when compiled with -fopenmp), it's the fault of 
the user.


Otherwise, it is unchanged except for the following first suggestion. 
And while 'nohost' should be optimized (away on the host), that's 
deferred to a to-be-written follow-up patch.


On Aug 1, 2024, Jakub Jelinek wrote:

On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:

-  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
+  tree name;
...

I'd just use a single buffer here,
   char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER))
  + 1 /* \0 */ + 32];

Done as proposed.

Given that the Xeon PHI offloading is gone and fork offloading doesn't seem
to be worked on, my preference would be
__builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because
that will actually help user code too.

Done.

And of course, it would be much better to figure out real nohost fix,
because if we need to register a constructor which will just do nothing, it
still wastes runtime.


To be done in a follow-up patch.

Comments, suggestions, concerns?

Tobias

PS: In principle, 'omp_get_num_devices()' would be a candidate for 
'-foffload=disable' (or not configured), but I am not sure how useful it 
is, especially as the decision whether offloading should be done is 
deferred to the link time.


PPS: For OpenACC, there is already an optimization for the similar but 
more complex acc_on_device. But that one doesn't handle Fortran due to 
the different ABI. See https://gcc.gnu.org/PR116269 for details.
OpenMP: Constructors and destructors for "declare target" static aggregates

This commit also compile-time expands (__builtin_)omp_is_initial_device for
both Fortran and C/C++. But the main change is:

This commit adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.

Before this commit, space is allocated on the target for such aggregates,
but nothing ever constructs them properly, so they end up zero-initialised.

(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)
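
A minimal sketch of the situation described above, assuming hypothetical names
(`Counter`, `global_counter`, `compute_initial_value`,
`read_counter_in_target_region`) that are not taken from the new tests:

```cpp
#pragma omp declare target
// Called from the constructor, so it has to be available on the target too.
int compute_initial_value ()
{
  return 6 * 7;
}

// A class with a non-trivial constructor.
struct Counter
{
  int value;
  Counter () : value (compute_initial_value ()) {}
};

// File-scope object marked "declare target".
Counter global_counter;
#pragma omp end declare target

// Reads the object inside a target region and copies the value back.
int read_counter_in_target_region ()
{
  int v = -1;
  #pragma omp target map(from: v)
  {
    v = global_counter.value;
  }
  return v;
}
```

On the host (or without -fopenmp, where the pragmas are ignored) this returns
42.  Per the description above, before this commit an offload target would see
a zero-initialised `global_counter`, because nothing ran the constructor there.
With this commit the intent is that the target copy is constructed as well.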

2024-08-07  Julian Brown  
	Tobias Burnus  

gcc/ChangeLog:

	* builtins.def (DEF_GOMP_BUILTIN_COMPILER): Define
	DEF_GOMP_BUILTIN_COMPILER to handle the non-prefix version.
	* gimple-fold.cc (gimple_fold_builtin_omp_is_initial_device): New.
	(gimple_fold_builtin): Call it.
	* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): Define.
	* tree.cc (get_file_function_name): Support names for on-target
	constructor/destructor functions.

gcc/cp/
	* decl2.cc (tree-inline.h): Include.
	(static_init_fini_fns): Bump to four entries. Update comment.
	(start_objects, start_partial_init_fini_fn): Add 'omp_target'
	parameter. Support "declare target" decls. Update forward declaration.
	(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
	the created function. Support "declare target".
	(OMP_SSDF_IDENTIFIER): New macro.
	(partition_vars_for_init_fini): Support partitioning "declare target"
	variables also.
	(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
	"declare target" decls.
	(c_parse_final_cleanups): Support constructors/destructors on OpenMP
	offload targets.

gcc/fortran/ChangeLog:

	* f95-lang.cc (gfc_init_builtin_functions): Handle
	DEF_GOMP_BUILTIN_COMPILER)
	* trans-decl.cc (gfc_get_extern_function_decl): Add code to use
	DEF_GOMP_BUILTIN_COMPILER for 'omp_is_initial_device'.

libgomp/ChangeLog:

	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New test.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New test.
	* testsuite/libgomp.c-c++-common/target-is-initial-host.c: New test.
	* testsuite/libgomp.fortran/target-is-initial-host.f: New test.
	* testsuite/libgomp.fortran/target-is-initial-host.f90: New test.

Co-authored-by: Tobias Burnus 

 gcc/builtins.def   |   4 +
 gcc/cp/decl2.cc| 229 +
 gcc/fortran/f95-lang.cc|   9 +
 gcc/fortran/trans-decl.cc  |   8 +
 gcc/gimple-fold.cc |  20 ++
 gcc/omp-builtins.def   |   4 +
 gcc/tree.cc|   6 +-
 .../static-aggr-constructor-destructor-1.C |  72 +++
 .../static-aggr-constructor-destructor-2.C |  50 +

Re: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" static aggregates

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 02:08:42PM +0200, Tobias Burnus wrote:
> On Aug 1, 2024, Jakub Jelinek wrote:
> > On Tue, Jul 30, 2024 at 10:51:56PM +0200, Tobias Burnus wrote:
> > > -  char id[sizeof (SSDF_IDENTIFIER) + 1 /* '\0' */ + 32];
> > > +  tree name;
> > > ...
> > I'd just use a single buffer here,
> >char id[MAX (sizeof (SSDF_IDENTIFIER), sizeof (OMP_SSDF_IDENTIFIER))
> >   + 1 /* \0 */ + 32];
> Done as proposed.
> > Given that the Xeon PHI offloading is gone and fork offloading doesn't seem
> > to be worked on, my preference would be
> > __builtin_omp_is_initial_device () and fold that to 0/1 after IPA, because
> > that will actually help user code too.
> Done.
> > And of course, it would be much better to figure out real nohost fix,
> > because if we need to register a constructor which will just do nothing, it
> > still wastes runtime.
> 
> To be done in a follow-up patch.
> 
> Comments, suggestions, concerns?

As I wrote, I think there should be some option to override the
omp_is_initial_device folding, e.g. for the case where one is compiling some
library code which could be linked either way and so need to avoid folding
omp_is_initial_device because we'll only know at runtime.
But it can certainly wait for incremental change.

> gcc/fortran/ChangeLog:
> 
>   * f95-lang.cc (gfc_init_builtin_functions): Handle
>   DEF_GOMP_BUILTIN_COMPILER)

s/)/./

> @@ -5220,6 +5237,9 @@ gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  case BUILT_IN_ACC_ON_DEVICE:
>return gimple_fold_builtin_acc_on_device (gsi,
>   gimple_call_arg (stmt, 0));
> +case BUILT_IN_OMP_IS_INITIAL_DEVICE:
> + return gimple_fold_builtin_omp_is_initial_device (gsi);

The indentation here looks wrong, case is 4 spaces indented and next line
uses tab, should use 6 spaces.

It may be worth testing that omp_is_initial_device is not treated like
a builtin in C++ in a custom namespace, or as a static or non-static member
function, or for C or Fortran as a nested function.

Otherwise LGTM.

Jakub



C++ Patch ping

2024-08-07 Thread Jakub Jelinek
Hi!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/thread.html#656299
patch.

Jonathan has acked the libstdc++ side thereof (I've added the
requested #undef on my side), is the c-cppbuiltin.cc side ok for trunk?

And, shall we (incrementally or right away) add some new tree to represent
the new expressions so that constant evaluation can do the required
diagnostics?

Thanks.

On Wed, Jul 03, 2024 at 04:37:00PM +0200, Jakub Jelinek wrote:
> 2024-07-03  Jakub Jelinek  
> 
>   PR c++/115744
> gcc/c-family/
>   * c-cppbuiltin.cc (c_cpp_builtins): Change __cpp_constexpr
>   from 202306L to 202406L for C++26.
> gcc/testsuite/
>   * g++.dg/cpp2a/construct_at.h (operator new, operator new[]):
>   Use constexpr instead of inline if __cpp_constexpr >= 202406L.
>   * g++.dg/cpp26/constexpr-new1.C: New test.
>   * g++.dg/cpp26/constexpr-new2.C: New test.
>   * g++.dg/cpp26/constexpr-new3.C: New test.
>   * g++.dg/cpp26/feat-cxx26.C (__cpp_constexpr): Adjust expected
>   value.
> libstdc++-v3/
>   * libsupc++/new (__glibcxx_want_constexpr_new): Define before
>   including bits/version.h.
>   (_GLIBCXX_PLACEMENT_CONSTEXPR): Define.
>   (operator new, operator new[]): Use it for placement new instead
>   of inline.
>   * include/bits/version.def (constexpr_new): New FTM.
>   * include/bits/version.h: Regenerate.

Jakub



[PATCH] tree-optimization/116258 - do not lower PAREN_EXPR of vectors

2024-08-07 Thread Richard Biener
The following avoids lowering of PAREN_EXPR of vectors as unsupported
to scalars.  Instead PAREN_EXPR is like a plain move or a VIEW_CONVERT.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.  I plan
to push to branches as well; __builtin_assoc_barrier is new in GCC 12.
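
For context, `__builtin_assoc_barrier` is what produces a PAREN_EXPR: it marks
a subexpression whose grouping must be kept even when reassociation is allowed.
The sketch below uses scalars rather than vectors to keep it short.  The
`ASSOC_BARRIER` macro and the function names are hypothetical, and the macro
falls back to plain parentheses on compilers without the builtin.

```cpp
// Use the builtin when the compiler reports it, otherwise plain parentheses.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assoc_barrier)
#    define ASSOC_BARRIER(x) __builtin_assoc_barrier ((x))
#  endif
#endif
#ifndef ASSOC_BARRIER
#  define ASSOC_BARRIER(x) (x)
#endif

// (a + b) is evaluated as a unit, then c is added.
float sum_grouped_left (float a, float b, float c)
{
  return ASSOC_BARRIER (a + b) + c;
}

// (b + c) is evaluated as a unit, then a is added.
float sum_grouped_right (float a, float b, float c)
{
  return a + ASSOC_BARRIER (b + c);
}
```

With a = 1e30f, b = -1e30f and c = 1.0f the two groupings give different
results (1.0f and 0.0f), which is the kind of difference the barrier is meant
to preserve.  Value-wise the barrier is just a move, which matches the patch
treating PAREN_EXPR like SSA_NAME and VIEW_CONVERT_EXPR.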

PR tree-optimization/116258
* tree-vect-generic.cc (expand_vector_operations_1): Do not
lower PAREN_EXPR.

* gcc.target/i386/pr116258.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr116258.c | 14 ++
 gcc/tree-vect-generic.cc |  9 +++--
 2 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr116258.c

diff --git a/gcc/testsuite/gcc.target/i386/pr116258.c b/gcc/testsuite/gcc.target/i386/pr116258.c
new file mode 100644
index 000..bd7d3a97b2c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr116258.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#define vect16 __attribute__((vector_size(16)))
+#define h(a) __builtin_assoc_barrier((a))
+
+ vect16 float  f( vect16 float  x, vect16 float vconstants0)
+{
+  vect16 float  t = (x * (vconstants0[0]));
+  return (x + h(t));
+}
+
+/* { dg-final { scan-assembler-times "shufps" 1 } } */
+/* { dg-final { scan-assembler-not "unpck" } } */
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 8336cbb8c73..4bcab71c168 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -2206,10 +2206,15 @@ expand_vector_operations_1 (gimple_stmt_iterator *gsi,
}
 }
 
+  /* Plain moves do not need lowering.  */
+  if (code == SSA_NAME
+  || code == VIEW_CONVERT_EXPR
+  || code == PAREN_EXPR)
+return;
+
   if (CONVERT_EXPR_CODE_P (code)
   || code == FLOAT_EXPR
-  || code == FIX_TRUNC_EXPR
-  || code == VIEW_CONVERT_EXPR)
+  || code == FIX_TRUNC_EXPR)
 return;
 
   /* The signedness is determined from input argument.  */
-- 
2.43.0


[PATCH]AArch64: Fix signbit mask creation after late combine [PR116229]

2024-08-07 Thread Tamar Christina
Hi All,

The optimization to generate a DI signbit constant by using fneg was relying
on nothing being able to push the constant into the negate.  It's run quite
late for this reason.

However, late combine now runs after it and triggers RTL simplification based on
the neg.  With -fno-signed-zeros this ends up dropping the - from the -0.0,
thus producing incorrect code.

This change adds a new unspec FNEG on DI mode which prevents this simplification.
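
To see why the trick works and why it is fragile, the sketch below shows that
negating a floating-point zero yields exactly the 64-bit sign-bit mask, while
the result still compares equal to zero.  The helper names are hypothetical,
and the code assumes IEEE 754 doubles and default floating-point flags (no
-fno-signed-zeros or -ffast-math).

```cpp
#include <cstdint>
#include <cstring>

// Returns the raw bit pattern of a double.
std::uint64_t bits_of (double d)
{
  std::uint64_t u;
  std::memcpy (&u, &d, sizeof u);
  return u;
}

// Negating +0.0 flips only the sign bit, giving 0x8000000000000000.
std::uint64_t signbit_mask_via_fneg ()
{
  volatile double zero = 0.0;  // volatile keeps the negation from being folded
  return bits_of (-zero);
}

// As floating-point values, -0.0 and +0.0 compare equal.
bool negative_zero_compares_equal ()
{
  volatile double zero = 0.0;
  return -zero == 0.0;
}
```

Because the two zeros compare equal, a simplifier that is told signed zeros do
not matter may treat the negation of zero as a no-op.  That loses the bit
pattern the integer constant depends on, and hiding the operation behind an
unspec avoids this.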

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/116229
* config/aarch64/aarch64-simd.md (aarch64_fnegv2di2): New.
* config/aarch64/aarch64.cc (aarch64_maybe_generate_simd_constant):
Update call to gen_aarch64_fnegv2di2.
* config/aarch64/iterators.md: New UNSPEC_FNEG.

gcc/testsuite/ChangeLog:

PR target/116229
* gcc.target/aarch64/pr116229.c: New test.

---
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 459e11b09a19cdc97a5153cfd8c4e0e07a7ffb0c..75b2d6cf3ea0902cfe89c2f54a7e60e041fba536 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2629,6 +2629,15 @@ (define_insn "neg2"
   [(set_attr "type" "neon_fp_neg_")]
 )
 
+(define_insn "aarch64_fnegv2di2"
+ [(set (match_operand:V2DI 0 "register_operand" "=w")
+   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "w")]
+ UNSPEC_FNEG))]
+ "TARGET_SIMD"
+ "fneg\\t%0.2d, %1.2d"
+  [(set_attr "type" "neon_fp_neg_d")]
+)
+
 (define_insn "abs2"
  [(set (match_operand:VHSDF 0 "register_operand" "=w")
(abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9810f2c039004cae4df37d07b5dcac948745011a..04fa4e71ae1ed2047f304a7a0e9607c7dc790652 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -11804,8 +11804,8 @@ aarch64_maybe_generate_simd_constant (rtx target, rtx val, machine_mode mode)
   /* Use the same base type as aarch64_gen_shareable_zero.  */
   rtx zero = CONST0_RTX (V4SImode);
   emit_move_insn (lowpart_subreg (V4SImode, target, mode), zero);
-  rtx neg = lowpart_subreg (V2DFmode, target, mode);
-  emit_insn (gen_negv2df2 (neg, copy_rtx (neg)));
+  rtx neg = lowpart_subreg (V2DImode, target, mode);
+  emit_insn (gen_aarch64_fnegv2di2 (neg, copy_rtx (neg)));
   return true;
 }
 
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 95fe8f070f4c3f5770e4424162bf13b712adedf3..92bebcf48b1e4462537e8ca2b97df46de5e73cb5 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -686,6 +686,7 @@ (define_c_enum "unspec"
 UNSPEC_FMINNMV ; Used in aarch64-simd.md.
 UNSPEC_FMINV   ; Used in aarch64-simd.md.
 UNSPEC_FADDV   ; Used in aarch64-simd.md.
+UNSPEC_FNEG; Used in aarch64-simd.md.
 UNSPEC_ADDV; Used in aarch64-simd.md.
 UNSPEC_SMAXV   ; Used in aarch64-simd.md.
 UNSPEC_SMINV   ; Used in aarch64-simd.md.
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116229.c 
b/gcc/testsuite/gcc.target/aarch64/pr116229.c
new file mode 100644
index 
..cc42078478f77b3ee96de3e7fe853088d0c57c1c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116229.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-signed-zeros" } */
+
+typedef __attribute__((__vector_size__ (8))) unsigned long V;
+
+V __attribute__((__noipa__))
+foo (void)
+{
+  return (V){ 0x8000 };
+}
+
+V ref = (V){ 0x8000 };
+
+int
+main ()
+{
+  V v = foo ();
+  if (v[0] != ref[0])
+__builtin_abort();
+}





[PATCH] RISC-V: Fix ICE for vector single-width integer multiply-add intrinsics

2024-08-07 Thread Jin Ma
When rs1 is the immediate 0, the following ICE occurs:

error: unrecognizable insn:
(insn 8 5 12 2 (set (reg:RVVM1DI 134 [  ])
(if_then_else:RVVM1DI (unspec:RVVMF64BI [
(const_vector:RVVMF64BI repeat [
(const_int 1 [0x1])
])
(reg/v:DI 137 [ vl ])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 
[0]))
(reg/v:RVVM1DI 136 [ vs2 ]))
(reg/v:RVVM1DI 135 [ vd ]))
(reg/v:RVVM1DI 135 [ vd ])))

gcc/ChangeLog:

* config/riscv/vector.md: Allow scalar operand to be 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-7.c: New test.
* gcc.target/riscv/rvv/base/bug-8.c: New test.
---
 gcc/config/riscv/vector.md| 80 +--
 .../gcc.target/riscv/rvv/base/bug-7.c | 26 ++
 .../gcc.target/riscv/rvv/base/bug-8.c | 26 ++
 3 files changed, 92 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/bug-8.c

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index fb625f611d5..ab60a5bce32 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -5331,16 +5331,16 @@ (define_insn "*pred_madd_scalar"
  (plus:V_VLSI
(mult:V_VLSI
  (vec_duplicate:V_VLSI
-   (match_operand: 2 "register_operand" "  r,   r,  r,   r"))
+   (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  rJ"))
  (match_operand:V_VLSI 3 "register_operand"  "  0,  vr,  0,  
vr"))
(match_operand:V_VLSI 4 "register_operand"" vr,  vr, vr,  
vr"))
  (match_dup 3)))]
   "TARGET_VECTOR"
   "@
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%2,%4%p1"
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%z2,%4%p1
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m3r.v\t%0,%3\;vmadd.vx\t%0,%z2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5363,16 +5363,16 @@ (define_insn "*pred_macc_scalar"
  (plus:V_VLSI
(mult:V_VLSI
  (vec_duplicate:V_VLSI
-   (match_operand: 2 "register_operand" "  r,   r,  r,   r"))
+   (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  rJ"))
  (match_operand:V_VLSI 3 "register_operand"  " vr,  vr, vr,  
vr"))
(match_operand:V_VLSI 4 "register_operand""  0,  vr,  0,  
vr"))
  (match_dup 4)))]
   "TARGET_VECTOR"
   "@
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmacc.vx\t%0,%z2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%z2,%3%p1
+   vmacc.vx\t%0,%z2,%3%p1
+   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%z2,%3%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "4")
@@ -5431,16 +5431,16 @@ (define_insn "*pred_madd_extended_scalar"
(mult:V_VLSI_D
  (vec_duplicate:V_VLSI_D
(sign_extend:
- (match_operand: 2 "register_operand" "  r,   r,  r,   
r")))
+ (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  
rJ")))
  (match_operand:V_VLSI_D 3 "register_operand" "  0,  vr,  
0,  vr"))
(match_operand:V_VLSI_D 4 "register_operand"   " vr,  vr, 
vr,  vr"))
  (match_dup 3)))]
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1
-   vmadd.vx\t%0,%2,%4%p1
-   vmv%m2r.v\t%0,%2\;vmadd.vx\t%0,%2,%4%p1"
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m2r.v\t%0,%z2\;vmadd.vx\t%0,%z2,%4%p1
+   vmadd.vx\t%0,%z2,%4%p1
+   vmv%m2r.v\t%0,%z2\;vmadd.vx\t%0,%z2,%4%p1"
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")
(set_attr "merge_op_idx" "3")
@@ -5464,16 +5464,16 @@ (define_insn "*pred_macc_extended_scalar"
(mult:V_VLSI_D
  (vec_duplicate:V_VLSI_D
(sign_extend:
- (match_operand: 2 "register_operand" "  r,   r,  r,   
r")))
+ (match_operand: 2 "reg_or_0_operand" " rJ,  rJ, rJ,  
rJ")))
  (match_operand:V_VLSI_D 3 "register_operand" " vr,  vr, 
vr,  vr"))
(match_operand:V_VLSI_D 4 "register_operand"   "  0,  vr,  
0,  vr"))
  (match_dup 4)))]
   "TARGET_VECTOR && !TARGET_64BIT"
   "@
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1
-   vmacc.vx\t%0,%2,%3%p1
-   vmv%m4r.v\t%0,%4\;vmacc.vx\t%0,%2,%3%p1"
+   vmacc.vx\t%0,%z2,%

Re: [RFC][PATCH] SVE intrinsics: Fold svdiv (svptrue, x, x) to ones

2024-08-07 Thread Richard Sandiford
This has been an active recent discussion on irc.  I'll try to summarise
my position there here:

Ramana Radhakrishnan  writes:
>> On 6 Aug 2024, at 4:14 PM, Richard Sandiford wrote:
>>> Kyrylo Tkachov writes:
>>>> On 5 Aug 2024, at 18:00, Richard Sandiford wrote:
>>>>> Kyrylo Tkachov writes:
>>>>>> On 5 Aug 2024, at 12:01, Richard Sandiford wrote:
>> 
>> Jennifer Schmitz  writes:
>>> This patch folds the SVE intrinsic svdiv into a vector of 1's in case
>>> 1) the predicate is svptrue and
>>> 2) dividend and divisor are equal.
>>> This is implemented in the gimple_folder for signed and unsigned
>>> integers. Corresponding test cases were added to the existing test
>>> suites.
>>> 
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no 
>>> regression.
>>> OK for mainline?
>>> 
>>> Please also advise whether it makes sense to implement the same 
>>> optimization
>>> for float types and if so, under which conditions?
>> 
>> I think we should instead use const_binop to try to fold the division
>> whenever the predicate is all-true, or if the function uses _x 
>> predication.
>> (As a follow-on, we could handle _z and _m too, using VEC_COND_EXPR.)
>> 
> 
> From what I can see const_binop only works on constant arguments.
 
 Yeah, it only produces a result for constant arguments.  I see now
 that that isn't the case that the patch is interested in, sorry.
 
> Is fold_binary a better interface to use ? I think it’d hook into the 
> match.pd machinery for divisions at some point.
 
 We shouldn't use that from gimple folders AIUI, but perhaps I misremember.
 (I realise we'd be using it only to test whether the result is constant,
 but even so.)
 
 Have you (plural) come across a case where svdiv is used with equal
 non-constant arguments?  If it's just being done on first principles
 then how about starting with const_binop instead?  If possible, it'd be
 good to structure it so that we can reuse the code for svadd, svmul,
 svsub, etc.
>>> 
>>> We’ve had a bit of internal discussion on this to get our ducks in a row.
>>> We are interested in having more powerful folding of SVE intrinsics 
>>> generally and we’d like some advice on how best to approach this.
>>> Prathamesh suggested adding code to fold intrinsics to standard GIMPLE 
>>> codes where possible when they are _x-predicated or have a ptrue predicate. 
>>> Hopefully that would allow us to get all the match.pd and fold-const.cc 
>>>  optimizations “for free”.
>>> Would that be a reasonable direction rather than adding custom folding code 
>>> to individual intrinsics such as svdiv?
>>> We’d need to ensure that the midend knows how to expand such GIMPLE codes 
>>> with VLA types and that the required folding rules exist in match.pd 
>>> (though maybe they work already for VLA types?)
>> 
>> Expansion shouldn't be a problem, since we already rely on that for
>> autovectorisation.
>> 
>> But I think this comes back to what we discussed earlier, in the context
>> of whether we should replace divisions by constants with multi-instruction
>> alternatives.  My comment there was:
>
>
>> 
>> 
>>  If people want to write out a calculation in natural arithmetic, it
>>  would be better to write the algorithm in scalar code and let the
>>  vectoriser handle it.  That gives the opportunity for many more
>>  optimisations than just this one.
>> 
>
>
>
> It’s been a while and apologies if I’m coming in a bit late in this and 
> possibly that thinking has moved on. I’ve always viewed ACLE as an extension 
> to the language and thus fair game for compilers to optimise . For folks who 
> really really need that instruction there’s also inline asm :) 

But the language already provides division via /.  GCC doesn't support
that yet for VLA SVE vectors, but clang does, and Tejas is looking at
adding the corresponding support to GCC.

If people just want to add vectors, divide vectors, etc., without any
preference about implementation, IMO it's better to let them express that
directly with generic features, rather than force them to use target-specific
instruction-derived intrinsics like svdiv_x.  That would also make the code
more portable across targets.

So, on the "there's also inline asm" point: I'd argue that (with Tejas's
work), there's also generic C/C++ for people who just want to express
dataflow and let the compiler do the instruction selection.
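
As a rough illustration of the "generic features" point, here is a minimal
sketch using the GNU vector extension on a fixed-length vector.  This is not
the VLA SVE case discussed above (that support is described as still pending
in GCC); the type name v4si and the function name divide are only
illustrative.

```cpp
// Fixed-length GNU vector type: four 32-bit ints in 16 bytes.
// The names v4si and divide are illustrative, not taken from the thread.
typedef int v4si __attribute__ ((vector_size (16)));

// Element-wise division written with the generic '/' operator.
// Which instructions implement it is left entirely to the compiler.
inline v4si
divide (v4si a, v4si b)
{
  return a / b;
}
```

The same source compiles for any target that supports the vector extension,
which is the portability argument made above.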

One of the advantages of intrinsics (at least as currently implemented
for SVE, but I think in practice more generally) is that they let
programmers do vector instruction selection while leaving the compiler
to do things like register allocation, loop control, ivopts, addressing
mode selection, etc.  It can act as a form of high-level assem

Re: [PATCH]AArch64: Fix signbit mask creation after late combine [PR116229]

2024-08-07 Thread Richard Sandiford
Tamar Christina  writes:
> Hi All,
>
> The optimization to generate a DI signbit constant by using fneg was relying
> on nothing being able to push the constant into the negate.  It's run quite
> late for this reason.
>
> However late combine now runs after it and triggers RTL simplification based
> on the neg.  With -fno-signed-zeros this ends up dropping the - from the -0.0,
> thus producing incorrect code.
>
> This change adds a new unspec FNEG on DI mode which prevents this
> simplification.

Yeah, agreed that that's the most robust fix.
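
For readers following along, a small sketch of the bit-level trick being
protected here, assuming IEEE 754 binary64 doubles.  The helper names bits_of
and fneg_bits are made up for illustration.  Flipping the sign bit of an
all-zero lane yields the DI sign-bit mask; if the operation is instead
modelled as an arithmetic negation of 0.0, -fno-signed-zeros allows the
result to be treated as plain 0.0, which is why an opaque unspec is the safer
representation.

```cpp
#include <cstdint>
#include <cstring>

// Return the raw bit pattern of a double (assumes a 64-bit IEEE double).
inline std::uint64_t
bits_of (double d)
{
  std::uint64_t u;
  std::memcpy (&u, &d, sizeof u);
  return u;
}

// Bit-level effect of FNEG on one 64-bit lane: only the sign bit flips.
inline std::uint64_t
fneg_bits (std::uint64_t lane)
{
  return lane ^ (std::uint64_t {1} << 63);
}
```

Starting from a zeroed lane, fneg_bits gives 0x8000000000000000, the constant
the optimization is trying to materialize.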

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/116229
>   * config/aarch64/aarch64-simd.md (aarch64_fnegv2di2): New.
>   * config/aarch64/aarch64.cc (aarch64_maybe_generate_simd_constant):
>   Update call to gen_aarch64_fnegv2di2.
>   * config/aarch64/iterators.md: New UNSPEC_FNEG.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/116229
>   * gcc.target/aarch64/pr116229.c: New test.

LGTM.  OK if there are no objections in 24 hours.

Thanks,
Richard

>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 459e11b09a19cdc97a5153cfd8c4e0e07a7ffb0c..75b2d6cf3ea0902cfe89c2f54a7e60e041fba536
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2629,6 +2629,15 @@ (define_insn "neg2"
>[(set_attr "type" "neon_fp_neg_")]
>  )
>  
> +(define_insn "aarch64_fnegv2di2"
> + [(set (match_operand:V2DI 0 "register_operand" "=w")
> +   (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "w")]
> +   UNSPEC_FNEG))]
> + "TARGET_SIMD"
> + "fneg\\t%0.2d, %1.2d"
> +  [(set_attr "type" "neon_fp_neg_d")]
> +)
> +
>  (define_insn "abs2"
>   [(set (match_operand:VHSDF 0 "register_operand" "=w")
> (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w")))]
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 9810f2c039004cae4df37d07b5dcac948745011a..04fa4e71ae1ed2047f304a7a0e9607c7dc790652
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -11804,8 +11804,8 @@ aarch64_maybe_generate_simd_constant (rtx target, rtx 
> val, machine_mode mode)
>/* Use the same base type as aarch64_gen_shareable_zero.  */
>rtx zero = CONST0_RTX (V4SImode);
>emit_move_insn (lowpart_subreg (V4SImode, target, mode), zero);
> -  rtx neg = lowpart_subreg (V2DFmode, target, mode);
> -  emit_insn (gen_negv2df2 (neg, copy_rtx (neg)));
> +  rtx neg = lowpart_subreg (V2DImode, target, mode);
> +  emit_insn (gen_aarch64_fnegv2di2 (neg, copy_rtx (neg)));
>return true;
>  }
>  
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> 95fe8f070f4c3f5770e4424162bf13b712adedf3..92bebcf48b1e4462537e8ca2b97df46de5e73cb5
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -686,6 +686,7 @@ (define_c_enum "unspec"
>  UNSPEC_FMINNMV   ; Used in aarch64-simd.md.
>  UNSPEC_FMINV ; Used in aarch64-simd.md.
>  UNSPEC_FADDV ; Used in aarch64-simd.md.
> +UNSPEC_FNEG  ; Used in aarch64-simd.md.
>  UNSPEC_ADDV  ; Used in aarch64-simd.md.
>  UNSPEC_SMAXV ; Used in aarch64-simd.md.
>  UNSPEC_SMINV ; Used in aarch64-simd.md.
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr116229.c 
> b/gcc/testsuite/gcc.target/aarch64/pr116229.c
> new file mode 100644
> index 
> ..cc42078478f77b3ee96de3e7fe853088d0c57c1c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr116229.c
> @@ -0,0 +1,20 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-signed-zeros" } */
> +
> +typedef __attribute__((__vector_size__ (8))) unsigned long V;
> +
> +V __attribute__((__noipa__))
> +foo (void)
> +{
> +  return (V){ 0x8000 };
> +}
> +
> +V ref = (V){ 0x8000 };
> +
> +int
> +main ()
> +{
> +  V v = foo ();
> +  if (v[0] != ref[0])
> +__builtin_abort();
> +}


[PATCH] coroutines: diagnose usage of alloca in coroutines

2024-08-07 Thread Arsen Arsenović
Tested on x86_64-pc-linux-gnu.  OK for trunk?
-- >8 --
We do not support it currently, and the resulting memory can only be
used inside a single resumption, so best not confuse the user with it.

PR c++/115858 - Incompatibility of coroutines and alloca()

gcc/ChangeLog:

PR c++/115858
* coroutine-passes.cc (execute_early_expand_coro_ifns): Emit a
sorry if a statement is an alloca call.

gcc/testsuite/ChangeLog:

PR c++/115858
* g++.dg/coroutines/pr115858.C: New test.
---
 gcc/coroutine-passes.cc| 10 ++
 gcc/testsuite/g++.dg/coroutines/pr115858.C | 23 ++
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/coroutines/pr115858.C

diff --git a/gcc/coroutine-passes.cc b/gcc/coroutine-passes.cc
index c0d6eca7c070..9124ecae5916 100644
--- a/gcc/coroutine-passes.cc
+++ b/gcc/coroutine-passes.cc
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "tree-pass.h"
 #include "ssa.h"
+#include "calls.h"
 #include "cgraph.h"
 #include "pretty-print.h"
 #include "diagnostic-core.h"
@@ -306,6 +307,15 @@ execute_early_expand_coro_ifns (void)
   {
gimple *stmt = gsi_stmt (gsi);
 
+   /* Tell the user about 'alloca', we don't support it yet.  */
+   if (gimple_alloca_call_p (stmt))
+ {
+   sorry_at (gimple_location (stmt),
+ "%<alloca%> is not yet supported in coroutines");
+   gsi_next (&gsi);
+   continue;
+ }
+
if (!is_gimple_call (stmt) || !gimple_call_internal_p (stmt))
  {
gsi_next (&gsi);
diff --git a/gcc/testsuite/g++.dg/coroutines/pr115858.C 
b/gcc/testsuite/g++.dg/coroutines/pr115858.C
new file mode 100644
index ..3dfe820dbdfd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/coroutines/pr115858.C
@@ -0,0 +1,23 @@
+#include <coroutine>
+
+struct task
+{
+  struct promise_type
+  {
+void return_void () {}
+task get_return_object () { return {}; }
+void unhandled_exception () {}
+std::suspend_never initial_suspend () { return {}; }
+std::suspend_never final_suspend () noexcept { return {}; }
+  };
+};
+
+task
+f ()
+{
+  void* a = __builtin_alloca (10);
+  // { dg-message "sorry, unimplemented: 'alloca' is not yet supported in coroutines" "" { target *-*-* } {.-1} }
+  void* b = __builtin_alloca_with_align (10, 16);
+  // { dg-message "sorry, unimplemented: 'alloca' is not yet supported in coroutines" "" { target *-*-* } {.-1} }
+  co_return;
+}
-- 
2.45.2



[to-be-committed][RISC-V][PR target/116240] Ensure object is a comparison before extracting arguments

2024-08-07 Thread Jeff Law
This was supposed to go out the door yesterday, but I kept getting 
interrupted.


The target bits for rtx costing can't assume the rtl they're given 
actually matches a target pattern.   It's just kind of inherent in how 
the costing routines get called in various places.


In this particular case we're trying to cost a conditional move:

(set (dest) (if_then_else (cond) (true) (false)))


On the RISC-V port the backend only allows actual conditionals for COND. 
 So something like (eq (reg) (const_int 0)).  In the costing code for 
if-then-else we did something like


(XEXP (XEXP (cond, 0), 0))

Which fails miserably if COND is a terminal node like (reg) rather than 
(ne (reg) (const_int 0)).


So this patch tightens up the RTL scanning to ensure that we have a 
comparison before we start looking at the comparison's arguments.
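
To make the shape of the fix concrete, here is a toy sketch of the same guard
on a hypothetical expression tree.  Node, Code, is_comparison and
true_arm_is_compared_reg are stand-ins invented for this example; they are
not GCC's rtx, COMPARISON_P or anything in riscv.cc.

```cpp
#include <vector>

// Hypothetical, much-reduced stand-in for an RTL expression node.
enum class Code { Reg, ConstInt, Eq, Ne, IfThenElse };

struct Node
{
  Code code;
  int value;                      // register number or constant value
  std::vector<const Node *> ops;  // operands; empty for terminal nodes
};

// Toy analogue of a "is this a comparison?" predicate.
inline bool
is_comparison (const Node &n)
{
  return n.code == Code::Eq || n.code == Code::Ne;
}

// True when the "true" arm of an if_then_else is the register that the
// condition compares.  The comparison check comes first, so a bare (reg)
// condition is rejected before any of its operands are examined.
inline bool
true_arm_is_compared_reg (const Node &ite)
{
  if (ite.code != Code::IfThenElse || ite.ops.size () != 3)
    return false;
  const Node &cond = *ite.ops[0];
  if (!is_comparison (cond) || cond.ops.empty ())
    return false;
  const Node &arm = *ite.ops[1];
  const Node &lhs = *cond.ops[0];
  return arm.code == Code::Reg
	 && lhs.code == Code::Reg
	 && arm.value == lhs.value;
}
```

Without the is_comparison test, the bare-register case would index into an
empty operand list, which is the toy equivalent of the failure described
above.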


Run through my tester without incident, but I'll wait for the pre-commit 
tester to run through a cycle before pushing to the trunk.


Jeff

ps.   We probably could support a naked REG for the condition and 
internally convert it to (ne (reg) (const_int 0)), but I don't think it 
likely happens with any regularity.



PR target/116240
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Ensure object is a
comparison before looking at its arguments.

gcc/testsuite
* gcc.target/riscv/pr116240.c: New test.


diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 5fe4273beb7..3d0a1d12b14 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3646,9 +3646,11 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
&& XEXP (x, 2) == CONST0_RTX (GET_MODE (XEXP (x, 1
   || (GET_CODE (XEXP (x, 2)) == REG
   && XEXP (x, 1) == CONST0_RTX (GET_MODE (XEXP (x, 2
-  || (GET_CODE (XEXP (x, 1)) == REG
+  || (COMPARISON_P (XEXP (x, 0))
+  && GET_CODE (XEXP (x, 1)) == REG
   && rtx_equal_p (XEXP (x, 1), XEXP (XEXP (x, 0), 0)))
-  || (GET_CODE (XEXP (x, 1)) == REG
+  || (COMPARISON_P (XEXP (x, 0))
+  && GET_CODE (XEXP (x, 1)) == REG
   && rtx_equal_p (XEXP (x, 2), XEXP (XEXP (x, 0), 0)
{
  *total = COSTS_N_INSNS (1);
diff --git a/gcc/testsuite/gcc.target/riscv/pr116240.c 
b/gcc/testsuite/gcc.target/riscv/pr116240.c
new file mode 100644
index 000..7e3eaa2f544
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr116240.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fwrapv -march=rv64imvxtheadcondmov_xventanacondops -mabi=lp64d" } */
+
+int a, b;
+void c() {
+  int e = a >= 2 ? b : a;
+  short d = e * 2;
+  if (d)
+for (;;)
+  ;
+}
+


Re: [PATCH] genoutput: Accelerate the place_operands function.

2024-08-07 Thread Richard Sandiford
Xianmiao Qu  writes:
> With the increase in the number of modes and patterns for some
> backend architectures, the place_operands function becomes a
> bottleneck in the speed of genoutput, and may even become a
> bottleneck in the overall speed of building the GCC project.
> This patch aims to accelerate the place_operands function,
> the optimizations it includes are:
> 1. Use a hash table to store operand information,
>improving the lookup time for the first operand.
> 2. Move mode comparison to the beginning to avoid the scenarios of most 
> strcmp.
>
> I tested the speed improvements for the following backends,
>           Improvement Ratio
> x86_64    197.9%
> aarch64   954.5%
> riscv     2578.6%
> If the build machine is slow, then this improvement can save a lot of time.
>
> I tested the genoutput output for x86_64/aarch64/riscv backends,
> and there was no difference compared to before the optimization,
> so this shouldn't introduce any functional issues.

Looks like a nice speed-up thanks.
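
For anyone skimming the thread, the two optimizations quoted above can be
sketched roughly as follows.  This is a simplified model, not the patch: it
interns single operands rather than matching operand sequences as
place_operands does, it uses std::unordered_multimap instead of GCC's own
hash tables, and the names Operand, same_operand and OperandTable are
hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced version of an operand record.
struct Operand
{
  int mode;
  const char *predicate;
  const char *constraint;
};

// Compare the cheap integer field first so that most mismatches are
// rejected before any strcmp is performed.
inline bool
same_operand (const Operand &a, const Operand &b)
{
  if (a.mode != b.mode)
    return false;
  return std::strcmp (a.predicate, b.predicate) == 0
	 && std::strcmp (a.constraint, b.constraint) == 0;
}

class OperandTable
{
public:
  // Return the index of an equal operand, inserting OP if none exists.
  // Only entries with the same hash are compared, instead of every
  // operand seen so far.
  int
  intern (const Operand &op)
  {
    const std::size_t h = hash_operand (op);
    auto range = buckets_.equal_range (h);
    for (auto it = range.first; it != range.second; ++it)
      if (same_operand (operands_[it->second], op))
	return it->second;
    operands_.push_back (op);
    const int index = static_cast<int> (operands_.size ()) - 1;
    buckets_.emplace (h, index);
    return index;
  }

private:
  static std::size_t
  hash_operand (const Operand &op)
  {
    std::size_t h = std::hash<int> () (op.mode);
    h = h * 31 + std::hash<std::string> () (op.predicate);
    h = h * 31 + std::hash<std::string> () (op.constraint);
    return h;
  }

  std::vector<Operand> operands_;
  std::unordered_multimap<std::size_t, int> buckets_;
};
```

The linear scan over all previously seen operands becomes a scan over one
hash bucket, which is presumably where most of the reported speed-up comes
from.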

A couple of general points:

* Could you try using the more type-safe hash-table.h, instead of hashtab.h?
  Similarly inchash.h for the hashing.

* Although this wasn't always the style in older code, the preference now
  is to put new functions before their first use where possible, to avoid
  forward declarations.

A couple of very minor comments below.

> ---
>  gcc/genoutput.cc | 101 ---
>  1 file changed, 95 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/genoutput.cc b/gcc/genoutput.cc
> index efd81766bb5b..456d96112cfb 100644
> --- a/gcc/genoutput.cc
> +++ b/gcc/genoutput.cc
> @@ -112,6 +112,8 @@ static int next_operand_number = 1;
>  struct operand_data
>  {
>struct operand_data *next;
> +  /* Point to the next member with the same hash value in the hash table.  */
> +  struct operand_data *eq_next;
>int index;
>const char *predicate;
>const char *constraint;
> @@ -127,11 +129,12 @@ struct operand_data
>  
>  static struct operand_data null_operand =
>  {
> -  0, 0, "", "", E_VOIDmode, 0, 0, 0, 0, 0
> +  0, 0, 0, "", "", E_VOIDmode, 0, 0, 0, 0, 0
>  };
>  
>  static struct operand_data *odata = &null_operand;
>  static struct operand_data **odata_end = &null_operand.next;
> +static htab_t operand_data_table;
>  
>  /* Must match the constants in recog.h.  */
>  
> @@ -180,6 +183,11 @@ static void place_operands (class data *);
>  static void process_template (class data *, const char *);
>  static void validate_insn_alternatives (class data *);
>  static void validate_insn_operands (class data *);
> +static hashval_t hash_struct_operand_data (const void *);
> +static int eq_struct_operand_data (const void *, const void *);
> +static void insert_operand_data (struct operand_data *);
> +static struct operand_data *lookup_operand_data (struct operand_data *);
> +static void init_operand_data_table (void);
>  
>  class constraint_data
>  {
> @@ -532,6 +540,13 @@ compare_operands (struct operand_data *d0, struct 
> operand_data *d1)
>  {
>const char *p0, *p1;
>  
> +  /* On one hand, comparing strings for predicate and constraint
> + is time-consuming, and on the other hand, the probability of
> + different modes is relatively high. Therefore, checking the mode
> + first can speed up the execution of the program.  */
> +  if (d0->mode != d1->mode)
> +return 0;
> +
>p0 = d0->predicate;
>if (!p0)
>  p0 = "";
> @@ -550,9 +565,6 @@ compare_operands (struct operand_data *d0, struct 
> operand_data *d1)
>if (strcmp (p0, p1) != 0)
>  return 0;
>  
> -  if (d0->mode != d1->mode)
> -return 0;
> -
>if (d0->strict_low != d1->strict_low)
>  return 0;
>  
> @@ -577,9 +589,9 @@ place_operands (class data *d)
>return;
>  }
>  
> +  od = lookup_operand_data (&d->operand[0]);
>/* Brute force substring search.  */
> -  for (od = odata, i = 0; od; od = od->next, i = 0)
> -if (compare_operands (od, &d->operand[0]))
> +  for (i = 0; od; od = od->eq_next, i = 0)

I think we should move the i = 0 to after the loop, for the "no match" case.
As it stands, each iteration immediately sets i to 1.

The loop body should be moved 2 columns to the left, to account for the
removed if condition.

Richard

>{
>   od2 = od->next;
>   i = 1;
> @@ -605,6 +617,7 @@ place_operands (class data *d)
>*odata_end = od2;
>odata_end = &od2->next;
>od2->index = next_operand_number++;
> +  insert_operand_data (od2);
>  }
>*odata_end = NULL;
>return;
> @@ -1049,6 +1062,7 @@ main (int argc, const char **argv)
>progname = "genoutput";
>  
>init_insn_for_nothing ();
> +  init_operand_data_table ();
>  
>if (!init_rtx_reader_args (argc, argv))
>  return (FATAL_EXIT_CODE);
> @@ -1224,3 +1238,78 @@ mdep_constraint_len (const char *s, file_location loc, 
> int opno)
>message_at (loc, "note:  in operand %d", opno);
>return 1; /* safe */
>  }
> +
> +/*

Re: [PATCH] RISC-V: Add auto-vect pattern for vector rotate shift

2024-08-07 Thread Jeff Law




On 8/7/24 3:42 AM, Feng Wang wrote:

This patch add the vector rotate shift pattern for auto-vect.
With this patch, the scalar rotate shift can be automatically
vectorized into vector rotate shift.

signed-off-by: Feng Wang 
gcc/ChangeLog:

* config/riscv/autovec-opt.md (v3):
Add define_expand for vector rotate shift.

I suspect pre-commit hooks will reject this ChangeLog entry as it has an 
extraneous tab on the "Add define_expand" line.  Obviously the filename 
will need adjustment per Juzhe's request to move this code into vector.md.


The lint phase of the pre-commit testing flagged a case where you've got 
8 spaces instead of a tab in your new pattern.  So you should fix that 
as well.


OK for the trunk with those fixes.  No need to wait for another 
pre-commit testrun or approval.


Jeff



Re: [PATCH] RISC-V: Fix ICE for vector single-width integer multiply-add intrinsics

2024-08-07 Thread Jeff Law




On 8/7/24 7:01 AM, Jin Ma wrote:

When rs1 is the immediate 0, the following ICE occurs:

error: unrecognizable insn:
(insn 8 5 12 2 (set (reg:RVVM1DI 134 [  ])
 (if_then_else:RVVM1DI (unspec:RVVMF64BI [
 (const_vector:RVVMF64BI repeat [
 (const_int 1 [0x1])
 ])
 (reg/v:DI 137 [ vl ])
 (const_int 2 [0x2]) repeated x2
 (const_int 0 [0])
 (reg:SI 66 vl)
 (reg:SI 67 vtype)
 ] UNSPEC_VPREDICATE)
 (plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 
[0]))
 (reg/v:RVVM1DI 136 [ vs2 ]))
 (reg/v:RVVM1DI 135 [ vd ]))
 (reg/v:RVVM1DI 135 [ vd ])))

gcc/ChangeLog:

* config/riscv/vector.md: Allow scalar operand to be 0.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/bug-7.c: New test.
* gcc.target/riscv/rvv/base/bug-8.c: New test.

So I think we had a patch from Pan last year which tried to optimize 
this case.  While I don't think it makes sense to try and optimize this 
case, we absolutely do need to avoid the ICE.


The lint phase of the pre-commit tester has several spaces vs tabs 
errors flagged.  Fix those and this is fine for the trunk.  No need to 
retest or wait for additional approvals.




https://github.com/ewlu/gcc-precommit-ci/issues/2021#issuecomment-2273493378




Jeff


[PATCH] c++: Implement CWG2387 - Linkage of const-qualified variable template [PR109126]

2024-08-07 Thread Jakub Jelinek
Hi!

The following patch attempts to implement DR2387 by making variable
templates, including their specializations, TREE_PUBLIC when they are at
file scope and don't have static storage class.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-08-07  Jakub Jelinek  

PR c++/109126
* decl.cc (grokvardecl): Implement CWG 2387 - Linkage of
const-qualified variable template.  Set TREE_PUBLIC on variable
templates with const qualified types unless static is present.

* g++.dg/DRs/dr2387.C: New test.
* g++.dg/DRs/dr2387-aux.cc: New file.

--- gcc/cp/decl.cc.jj   2024-08-06 11:05:29.147469440 +0200
+++ gcc/cp/decl.cc  2024-08-07 11:58:11.275368835 +0200
@@ -11225,6 +11225,8 @@ grokvardecl (tree type,
|| ! constp
|| volatilep
|| inlinep
+   || in_template_context
+   || processing_specialization
|| module_attach_p ()));
   TREE_STATIC (decl) = ! DECL_EXTERNAL (decl);
 }
--- gcc/testsuite/g++.dg/DRs/dr2387.C.jj2024-08-07 12:11:14.372115971 
+0200
+++ gcc/testsuite/g++.dg/DRs/dr2387.C   2024-08-07 12:13:35.089273604 +0200
@@ -0,0 +1,22 @@
+// DR 2387
+// { dg-do run { target c++14 } }
+// { dg-additional-sources "dr2387-aux.cc" }
+
+template <int N>
+const int a = N;
+template <int N>
+static const int b = N;
+template <int N>
+extern const int c = N;
+template <int N>
+const volatile int d = N;
+template <int N>
+const int e = N;
+template <>
+const int e <43> = 44;
+
+const int *pa = &a <42>;
+const int *pb = &b <42>;
+const int *pc = &c <42>;
+const volatile int *pd = &d <42>;
+const int *pe = &e <43>;
--- gcc/testsuite/g++.dg/DRs/dr2387-aux.cc.jj   2024-08-07 12:11:09.388181223 
+0200
+++ gcc/testsuite/g++.dg/DRs/dr2387-aux.cc  2024-08-07 12:13:25.321401491 
+0200
@@ -0,0 +1,25 @@
+// DR 2387
+
+template <int N>
+extern const int a;
+template <int N>
+static const int b = N;
+template <int N>
+extern const int c;
+template <int N>
+extern const volatile int d;
+template <int N>
+extern const int e;
+extern const int *pa, *pb, *pc, *pe;
+extern const volatile int *pd;
+
+int
+main ()
+{
+  if (pa != &a <42>
+  || pb == &b <42>
+  || pc != &c <42>
+  || pd != &d <42>
+  || pe != &e <43>)
+__builtin_abort ();
+}

Jakub



Re: [PATCH v2 1/1] RISC-V: Support BF16 interfaces in libgcc

2024-08-07 Thread Jeff Law




On 8/7/24 1:16 AM, Jakub Jelinek wrote:



This looks all wrong to me.

On all the other targets that already do support __bf16 type it is a storage
only type, so all arithmetics on it is expected to be done on float, not in
__bf16.

RISC-V has (via extensions) degrees of arithmetic/conversion support, so 
for example it can do a multiply-add of bf16 operands widening to float.




So, if riscv wants something different (will there be e.g. any libm
implementation with all the __bf16 APIs though?), it should ask for it some way
(target hook or whatever) and only in that case it should enable the other
builtins, libgcc APIs etc.

ISTM for the limited cases where we want native bf16 support we could 
just have target specific builtins.


I'm not sure what the motivation is behind trying to support the richer 
set of operations really is.  So perhaps Xiao could start with 
explaining why this is important.


jeff



Re: [PATCH 1/3] RISC-V: testsuite: xtheadfmemidx: Rename test and add similar Zfa test

2024-08-07 Thread Jeff Law




On 8/7/24 12:27 AM, Christoph Müllner wrote:

Test file xtheadfmemidx-medany.c has been added in b79cd204c780 as a
test case that provoked an ICE when loading DFmode registers via two
SImode register loads followed by a SI->DF[63:32] move from XTheadFmv.
Since Zfa is affected in the same way as XTheadFmv, even if both
have slightly different instructions, let's add a test for Zfa as well
and give the tests proper names.

Let's also add a test into the test files that counts the SI->DF moves
from XTheadFmv/Zfa.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-medany.c: Move to...
* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: ...here.
* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: New test.

OK
jeff



Re: [PATCH v2 1/1] RISC-V: Support BF16 interfaces in libgcc

2024-08-07 Thread Jakub Jelinek
On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:
> 
> 
> On 8/7/24 1:16 AM, Jakub Jelinek wrote:
> 
> > 
> > This looks all wrong to me.
> > 
> > On all the other targets that already do support __bf16 type it is a storage
> > only type, so all arithmetics on it is expected to be done on float, not in
> > __bf16.
> RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
> example it can do a multiply-add of bf16 operands widening to float.

Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
other targets), but there those functions are at least part of C23, even
when they are really not implemented yet in libm (at least talking about
glibc, but I doubt other C libraries are any further than that).
For __bf16, the only standard required stuff is in C++23 and the provided
builtins are whatever was necessary for that.

I understand RISC-V has via extensions more full _Float16 and __bf16
support, but if it needs further builtins, the questions are:
1) should they be enabled on all arches or just on those that need them?
2) is there plan to add libm support for __bf16, even when it is
non-standard in C (especially if we don't know if C2y or newer will or won't
add support for it and if it will use the chosen suffixes or some others)?
3) is there plan to add variants for C++23  and  etc.
to handle _Float16 and __bf16 differently?  Currently those types are just
handled by doing as much as possible on float, using its builtins

Jakub



Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Joseph Myers
On Wed, 7 Aug 2024, Alejandro Colomar wrote:

> +@node Length
> +@section Determining the Length of Arrays
> +@cindex lengthof
> +@cindex length
> +@cindex array length
> +
> +The keyword @code{__lengthof__} determines the length of an array operand,
> +that is, the number of elements in the array.
> +Its syntax is just like @code{sizeof}.
> +The operand must be a complete array type.

I think you mean the operand must be *an expression whose type is a 
complete array type* or *a type name for a complete array type*.  The 
wording you have suggests only type names, you need to be clear about both 
kinds of operands being possible (and include examples for them).
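
To make the two operand kinds concrete, here is a minimal sketch in standard C++ (not the proposed `__lengthof__` operator, which is not assumed to be available). The helper names `length_of_type` and `length_of_expr` are hypothetical and exist only for this illustration: one takes a type name for a complete array type, the other an expression whose type is a complete array type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Operand kind 1: a type name for a complete array type.
template <class T>
constexpr std::size_t length_of_type() {
    static_assert(std::is_array_v<T> && std::extent_v<T> != 0,
                  "operand must be a complete array type");
    return std::extent_v<T>;  // element count of the outermost dimension
}

// Operand kind 2: an expression whose type is a complete array type.
template <class T, std::size_t N>
constexpr std::size_t length_of_expr(T (&)[N]) {
    return N;
}

// Small self-check covering both operand kinds.
inline void check_both_operand_kinds() {
    int grid[7][3] = {};
    assert(length_of_type<int[7][3]>() == 7);  // type-name operand
    assert(length_of_expr(grid) == 7);         // expression operand
    assert(length_of_expr(grid[0]) == 3);      // inner dimension
}
```

Neither helper accepts a pointer or an incomplete array type, which is the class of operand the documentation wording needs to exclude.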

> +@smallexample
> +__lengthof__ (int [7][n++]);  // constexpr
> +__lengthof__ (int [n++][7]);  // run-time value
> +@end smallexample

I don't think using "constexpr" to mean "constant expression" is a good 
idea, they're different things.

> +void
> +incomplete (int p[])
> +{
> +  unsigned n;
> +
> +  n = __lengthof__ (x);  /* { dg-error "incomplete" } */
> +
> +  /* We want to support the following one in the future,
> + but for now it should fail.  */
> +  n = __lengthof__ (p);  /* { dg-error "invalid" } */

This seems to be the only test you have for a non-array operand.  I'd 
expect such tests (both for type name operands and for expression 
operands) covering cases that we *don't* want to support in future, not 
just this one that we would like to be supportable in future.

I don't see any tests for the constraints on external definitions from 
6.9.1 that we discussed - that references to undefined internal linkage 
identifiers are OK inside __lengthof__ returning a constant (both 
constant-length arrays of non-VLA and constant-length arrays of VLA) but 
not in the cases where __lengthof__ is evaluated.

-- 
Joseph S. Myers
josmy...@redhat.com



[PATCH] c++: erroneous partial spec vs primary tmpl [PR116064]

2024-08-07 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

When a partial specialization is deemed erroneous at parse time, we
currently flag the primary template as erroneous instead.  Later
at instantiation time we check if the primary template is erroneous
rather than the selected partial specialization, so at least we're
consistent.

But it's better not to conflate a partial specialization with the
primary template since they're instantiated independently.  This avoids
rejecting the instantiation of A in the below testcase.
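
As a reminder of why the two should not be conflated, the following sketch shows that a primary template and a partial specialization are separate definitions, and that each instantiation uses exactly one of them. The `Box` template is hypothetical and is not the testcase from this patch.

```cpp
// Hypothetical templates for illustration; not the testcase from the patch.
template <class T>
struct Box {                    // primary template
    static constexpr int kind = 0;
};

template <class T>
struct Box<T*> {                // partial specialization for pointer types
    static constexpr int kind = 1;
};

// Each instantiation is generated from exactly one of the two definitions,
// so a problem in one definition says nothing about the other.
static_assert(Box<int>::kind == 0, "Box<int> uses the primary template");
static_assert(Box<int*>::kind == 1, "Box<int*> uses the partial specialization");
```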

PR c++/116064

gcc/cp/ChangeLog:

* error.cc (get_current_template): If the current scope is
a partial specialization, return it instead of the primary
template.
* pt.cc (instantiate_class_template): Pass the partial
specialization if any to maybe_diagnose_erroneous_template
instead of the primary template.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error2.C: New test.
---
 gcc/cp/error.cc   |  6 +-
 gcc/cp/pt.cc  |  2 +-
 gcc/testsuite/g++.dg/template/permissive-error2.C | 15 +++
 3 files changed, 21 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/permissive-error2.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 6c22ff55b46..879e5a115cf 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -173,7 +173,11 @@ get_current_template ()
 {
   if (scope_chain && in_template_context && !current_instantiation ())
 if (tree ti = get_template_info (current_scope ()))
-  return TI_TEMPLATE (ti);
+  {
+   if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti)) && TI_PARTIAL_INFO (ti))
+ ti = TI_PARTIAL_INFO (ti);
+   return TI_TEMPLATE (ti);
+  }
 
   return NULL_TREE;
 }
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 542962b6387..3e55d5c0fea 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -12381,7 +12381,7 @@ instantiate_class_template (tree type)
   if (! push_tinst_level (type))
 return type;
 
-  maybe_diagnose_erroneous_template (templ);
+  maybe_diagnose_erroneous_template (t ? TI_TEMPLATE (t) : templ);
 
   int saved_unevaluated_operand = cp_unevaluated_operand;
   int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
diff --git a/gcc/testsuite/g++.dg/template/permissive-error2.C b/gcc/testsuite/g++.dg/template/permissive-error2.C
new file mode 100644
index 000..692e7c7ac82
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/permissive-error2.C
@@ -0,0 +1,15 @@
+// PR c++/116064
+// { dg-additional-options -fpermissive }
+// Verify we correctly mark a partial specialization as erroneous
+// instead its primary template.
+
+template
+struct A { };
+
+template
+struct A { // { dg-error "instantiating erroneous template" }
+  void f(typename A::type); // { dg-warning "does not name a type" }
+};
+
+A a;  // { dg-bogus "" }
+A b; // { dg-message "required from here" }
-- 
2.46.0.39.g891ee3b9db



Re: [PATCH 2/3] RISC-V: xthead(f)memidx: Eliminate optimization patterns

2024-08-07 Thread Jeff Law




On 8/7/24 12:27 AM, Christoph Müllner wrote:

We have a large number of optimization patterns (insn_and_split) for
XTheadMemIdx and XTheadFMemIdx that attempt to do something that generic
GCC passes can do more efficiently if we have proper support code.

A key function in eliminating the optimization patterns is
th_memidx_classify_address_index(), which needs to identify each possible
memory expression that can be lowered into a XTheadMemIdx/XTheadFMemIdx
instruction.  This patch adds all memory expressions that were
previously only recognized by the optimization patterns.

Now that the address classification is complete, we can finally remove
all optimization patterns, with the side effect of getting rid of the
non-canonical memory expression they produced:
(plus (reg) (ashift (reg) (imm))).

A positive side effect of this change is that we address an RV32 ICE
that was caused by the th_memidx_I_c pattern, which did not properly
handle SUBREGs (more details are in PR116131).

A temporary negative side effect of this change is that we cause a
regression of the xtheadfmemidx + xtheadfmv/zfa tests (initially
introduced as part of b79cd204c780 to address an ICE).
As this issue cannot be addressed in the code parts that are
adjusted in this patch, we just accept the regression for now.

PR target/116131

gcc/ChangeLog:

* config/riscv/thead.cc (th_memidx_classify_address_index):
Recognize all possible XTheadMemIdx memory operand structures.
(th_fmemidx_output_index): Do strict classification.
* config/riscv/thead.md (*th_memidx_operand): Remove.
(TARGET_XTHEADMEMIDX): Likewise.
(TARGET_HARD_FLOAT && TARGET_XTHEADFMEMIDX): Likewise.
(!TARGET_64BIT && TARGET_XTHEADMEMIDX): Likewise.
(*th_memidx_I_a): Likewise.
(*th_memidx_I_b): Likewise.
(*th_memidx_I_c): Likewise.
(*th_memidx_US_a): Likewise.
(*th_memidx_US_b): Likewise.
(*th_memidx_US_c): Likewise.
(*th_memidx_UZ_a): Likewise.
(*th_memidx_UZ_b): Likewise.
(*th_memidx_UZ_c): Likewise.
(*th_fmemidx_movsf_hardfloat): Likewise.
(*th_fmemidx_movdf_hardfloat_rv64): Likewise.
(*th_fmemidx_I_a): Likewise.
(*th_fmemidx_I_c): Likewise.
(*th_fmemidx_US_a): Likewise.
(*th_fmemidx_US_c): Likewise.
(*th_fmemidx_UZ_a): Likewise.
(*th_fmemidx_UZ_c): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr116131.c: New test.

Nice cleanup.  I did wander through the old PA code a bit; most of what I 
found was actually addressing limitations due to its weird implicit segment 
selection based on the base register rather than the full effective 
address.  Bad memories.


But that was enough to get a few synapses to fire. When I last looked at 
the canonicalization issues (circa 2015 IIRC) in this space I adjusted 
combine to handle things better.  So probably not nearly as much to do 
in the target files anymore.


I'll note the thead code is riddled with explicit mode testing which 
looks less than ideal, but that's a pre-existing issue and it may in 
fact be reasonable, I don't know the thead extensions well enough to be 
100% sure either way.  So I won't object, but there's a bit of "ewww" 
when I look at the thead address classification code.



It looks like the pre-commit tester didn't test patches #2 or #3.  So 
while I'll ack, please be on the lookout for any failures.


Jeff


[PATCH v3 0/2] Add support for AdvSIMD faminmax

2024-08-07 Thread saurabh.jha
From: Saurabh Jha 

This patch series is a respin of a previous patch here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/658984.html

The AArch64 FEAT_FAMINMAX is optional from Armv9.2-a and mandatory from
Armv9.5-a. It introduces instructions for computing the floating point
absolute maximum and minimum of the two vectors element-wise.

This new version addresses all review comments from the previous version.
Additionally, we realised that the NaN/Inf behaviour of famax/famin and
fmax/fmin is not the same, contrary to what we previously thought. The
behaviour of famax/famin and fmaxnm/fminnm is not the same either.

The new codegen strategy is to combine the rtl operators smax and abs
into famax and smin and abs into famin.

We are using two instruction patterns: one for intrinsics and one for
codegen.

Apart from codegen changes and their test cases, this new version also
changes intrinsic tests to use the -O3 flag. This removes the need for
testing loads and stores.

The old code for the intrinsics and the refactoring of report_missing_extension
and reported_missing_extension_p are the same as in the previous version.

Regression tested for aarch64-none-linux-gnu and found no regressions.

Ok for master? I don't have commit access so can someone please commit
on my behalf?

Saurabh Jha (2):
  aarch64: Add AdvSIMD faminmax intrinsics
  aarch64: Add codegen support for AdvSIMD faminmax

 gcc/config/aarch64/aarch64-builtins.cc| 173 +-
 gcc/config/aarch64/aarch64-builtins.h |   5 +-
 .../aarch64/aarch64-option-extensions.def |   2 +
 gcc/config/aarch64/aarch64-simd.md|  21 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc|  22 --
 gcc/config/aarch64/aarch64.h  |   4 +
 gcc/config/aarch64/iterators.md   |  12 +
 gcc/config/arm/types.md   |   6 +
 gcc/doc/invoke.texi   |   2 +
 .../aarch64/simd/faminmax-builtins-no-flag.c  |  10 +
 .../aarch64/simd/faminmax-builtins.c  | 115 ++
 .../aarch64/simd/faminmax-codegen-no-flag.c   | 217 ++
 .../aarch64/simd/faminmax-codegen.c   | 197 
 13 files changed, 754 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c

-- 
2.43.2



[PATCH v3 2/2] aarch64: Add codegen support for AdvSIMD faminmax

2024-08-07 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors
element-wise.

This patch adds code generation support for famax and famin in terms of
existing RTL operators.

famax/famin is equivalent to first taking abs of the operands and then
taking smax/smin on the results of abs.

famax/famin (a, b) = smax/smin (abs (a), abs (b))

This fusion of operators is only possible when the -march=armv9-a+faminmax
flag is passed. We also need to pass the -ffast-math flag; if we don't,
then a statement like

c[i] = __builtin_fmaxf16 (a[i], b[i]);

is RTL expanded to UNSPEC_FMAXNM instead of smax (likewise for smin).

This code generation is only available at -O2 or -O3, as that is when
auto-vectorization is enabled.
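
The equivalence above can be sketched as a scalar reference model. This is only an illustration, not code from the patch: the names `famax_ref`, `famin_ref` and `famax_lanes` are hypothetical, and NaN handling is deliberately ignored, in line with the -ffast-math assumption.

```cpp
#include <cassert>
#include <cmath>

// One lane of famax/famin, following
//   famax/famin (a, b) = smax/smin (abs (a), abs (b)).
inline float famax_ref(float a, float b) {
    return std::fmax(std::fabs(a), std::fabs(b));
}

inline float famin_ref(float a, float b) {
    return std::fmin(std::fabs(a), std::fabs(b));
}

// Element-wise form over n lanes; this is the loop shape that the
// auto-vectorizer is expected to see in the codegen tests.
inline void famax_lanes(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = famax_ref(a[i], b[i]);
}

// Small self-check with values that are exact in single precision.
inline void check_famax_examples() {
    const float a[4] = {-3.0f, 1.0f, -0.5f, 4.0f};
    const float b[4] = {2.0f, -5.0f, 0.25f, -4.0f};
    float c[4];
    famax_lanes(a, b, c, 4);
    assert(c[0] == 3.0f && c[1] == 5.0f && c[2] == 0.5f && c[3] == 4.0f);
    assert(famin_ref(-3.0f, 2.0f) == 2.0f);
}
```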

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md
(*aarch64_faminmax_fused): Instruction pattern for faminmax
codegen.
* config/aarch64/iterators.md: Attribute for faminmax codegen.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test.
* gcc.target/aarch64/simd/faminmax-codegen.c: New test.
---
 gcc/config/aarch64/aarch64-simd.md|  10 +
 gcc/config/aarch64/iterators.md   |   3 +
 .../aarch64/simd/faminmax-codegen-no-flag.c   | 217 ++
 .../aarch64/simd/faminmax-codegen.c   | 197 
 4 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 0e1dd48dddb..37923037055 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -9901,3 +9901,13 @@
   "\t%0., %1., %2."
   [(set_attr "type" "neon_fp_aminmax")]
 )
+
+(define_insn "*aarch64_faminmax_fused"
+  [(set (match_operand:VHSDF 0 "register_operand" "=w")
+	(FMAXMIN:VHSDF
+	  (abs:VHSDF (match_operand:VHSDF 1 "register_operand" "w"))
+	  (abs:VHSDF (match_operand:VHSDF 2 "register_operand" "w"))))]
+  "TARGET_FAMINMAX"
+  "\t%0., %1., %2."
+  [(set_attr "type" "neon_fp_aminmax")]
+)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index ce1c63e63cc..28b35a7da5c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -4471,3 +4471,6 @@
 
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")])
+
+(define_code_attr faminmax_op
+  [(smax "famax") (smin "famin")])
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
new file mode 100644
index 000..d77f5a5d19f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c
@@ -0,0 +1,217 @@
+/* { dg-do assemble} */
+/* { dg-additional-options "-O3 -ffast-math -march=armv9-a" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "arm_neon.h"
+
+#pragma GCC target "+nosve"
+
+/*
+** test_vamax_f16:
+**	fabs	v1.4h, v1.4h
+**	fabs	v0.4h, v0.4h
+**	fmaxnm	v0.4h, v0.4h, v1.4h
+**	ret
+*/
+float16x4_t
+test_vamax_f16 (float16x4_t a, float16x4_t b)
+{
+  int i;
+  float16x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+a[i] = __builtin_fabsf16 (a[i]);
+b[i] = __builtin_fabsf16 (b[i]);
+c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f16:
+**	fabs	v1.8h, v1.8h
+**	fabs	v0.8h, v0.8h
+**	fmaxnm	v0.8h, v0.8h, v1.8h
+**	ret
+*/
+float16x8_t
+test_vamaxq_f16 (float16x8_t a, float16x8_t b)
+{
+  int i;
+  float16x8_t c;
+
+  for (i = 0; i < 8; ++i) {
+a[i] = __builtin_fabsf16 (a[i]);
+b[i] = __builtin_fabsf16 (b[i]);
+c[i] = __builtin_fmaxf16 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamax_f32:
+**	fabs	v1.2s, v1.2s
+**	fabs	v0.2s, v0.2s
+**	fmaxnm	v0.2s, v0.2s, v1.2s
+**	ret
+*/
+float32x2_t
+test_vamax_f32 (float32x2_t a, float32x2_t b)
+{
+  int i;
+  float32x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+a[i] = __builtin_fabsf32 (a[i]);
+b[i] = __builtin_fabsf32 (b[i]);
+c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f32:
+**	fabs	v1.4s, v1.4s
+**	fabs	v0.4s, v0.4s
+**	fmaxnm	v0.4s, v0.4s, v1.4s
+**	ret
+*/
+float32x4_t
+test_vamaxq_f32 (float32x4_t a, float32x4_t b)
+{
+  int i;
+  float32x4_t c;
+
+  for (i = 0; i < 4; ++i) {
+a[i] = __builtin_fabsf32 (a[i]);
+b[i] = __builtin_fabsf32 (b[i]);
+c[i] = __builtin_fmaxf32 (a[i], b[i]);
+  }
+  return c;
+}
+
+/*
+** test_vamaxq_f64:
+**	fabs	v1.2d, v1.2d
+**	fabs	v0.2d, v0.2d
+**	fmaxnm	v0.2d, v0.2d, v1.2d
+**	ret
+*/
+float64x2_t
+test_vamaxq_f64 (float64x2_t a, float64x2_t b)
+{
+  int i;
+  float64x2_t c;
+
+  for (i = 0; i < 2; ++i) {
+a[i] = __builtin_fabsf64 (a[i]);
+b[i] = __builtin_fabs

[PATCH v3 1/2] aarch64: Add AdvSIMD faminmax intrinsics

2024-08-07 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and
mandatory from Armv9.5-a. It introduces instructions for computing the
floating point absolute maximum and minimum of the two vectors element-wise.

This patch does two things:
1. Introduces AdvSIMD faminmax intrinsics.
2. Move report_missing_extension and reported_missing_extension_p to
   make it more usable.

The intrinsics of this extension are implemented as the following
builtin functions:
* vamax_f16
* vamaxq_f16
* vamax_f32
* vamaxq_f32
* vamaxq_f64
* vamin_f16
* vaminq_f16
* vamin_f32
* vaminq_f32
* vaminq_f64

We moved the definition of `report_missing_extension` from
gcc/config/aarch64/aarch64-sve-builtins.cc to
gcc/config/aarch64/aarch64-builtins.cc and its declaration to
gcc/config/aarch64/aarch64-builtins.h. We also moved the declaration
of `reported_missing_extension_p` from
gcc/config/aarch64/aarch64-sve-builtins.cc
to gcc/config/aarch64/aarch64-builtins.cc, closer to the definition of
`report_missing_extension`. In the existing code structure, this leads
to `report_missing_extension` being usable from both normal builtins
and sve builtins.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc
(enum aarch64_builtins): New enum values for faminmax builtins.
(aarch64_init_faminmax_builtins): New function to declare new
builtins.
(handle_arm_neon_h): Modify to call
aarch64_init_faminmax_builtins.
(aarch64_general_check_builtin_call): Modify to check whether
+faminmax flag is being used and printing error message if not being
used.
(aarch64_expand_builtin_faminmax): New function to emit
instructions of this extension.
(aarch64_general_expand_builtin): Modify to call
aarch64_expand_builtin_faminmax.
(report_missing_extension): Move from
config/aarch64/aarch64-sve-builtins.cc.
* config/aarch64/aarch64-builtins.h
(report_missing_extension): Declaration for this function so
that it can be used wherever this header is included.
(reported_missing_extension_p): Move from
config/aarch64/aarch64-sve-builtins.cc
* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): Introduce new flag for this
extension.
* config/aarch64/aarch64-simd.md
(aarch64_): Instruction pattern for
faminmax intrinsics.
* config/aarch64/aarch64-sve-builtins.cc
(reported_missing_extension_p): Move to
config/aarch64/aarch64-builtins.c
(report_missing_extension): Move to
config/aarch64/aarch64-builtins.cc
* config/aarch64/aarch64.h
(TARGET_FAMINMAX): Introduce new flag for this extension.
* config/aarch64/iterators.md: Introduce new iterators for
  faminmax intrinsics.
* config/arm/types.md: Introduce neon_fp_aminmax attributes.
* doc/invoke.texi: Document extension in AArch64 Options.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test.
* gcc.target/aarch64/simd/faminmax-builtins.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.cc| 173 +-
 gcc/config/aarch64/aarch64-builtins.h |   5 +-
 .../aarch64/aarch64-option-extensions.def |   2 +
 gcc/config/aarch64/aarch64-simd.md|  11 ++
 gcc/config/aarch64/aarch64-sve-builtins.cc|  22 ---
 gcc/config/aarch64/aarch64.h  |   4 +
 gcc/config/aarch64/iterators.md   |   9 +
 gcc/config/arm/types.md   |   6 +
 gcc/doc/invoke.texi   |   2 +
 .../aarch64/simd/faminmax-builtins-no-flag.c  |  10 +
 .../aarch64/simd/faminmax-builtins.c  | 115 
 11 files changed, 327 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 30669f8aa18..cd590186f22 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -829,6 +829,17 @@ enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  /* FAMINMAX builtins.  */
+  AARCH64_FAMINMAX_BUILTIN_FAMAX4H,
+  AARCH64_FAMINMAX_BUILTIN_FAMAX8H,
+  AARCH64_FAMINMAX_BUILTIN_FAMAX2S,
+  AARCH64_FAMINMAX_BUILTIN_FAMAX4S,
+  AARCH64_FAMINMAX_BUILTIN_FAMAX2D,
+  AARCH64_FAMINMAX_BUILTIN_FAMIN4H,
+  AARCH64_FAMINMAX_BUILTIN_FAMIN8H,
+  AARCH64_FAMINMAX_BUILTIN_FAMIN2S,
+  AARCH64_FAMINMAX_BUILTIN_FAMIN4S,
+  AARCH64_FAMINMAX_BUILTIN_FAMIN2D,
   /* System register builtins.  */
   AARCH64_RSR,
   AARCH64_RSRP,
@@ -1547,6 +1558,66 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma)
 }
 }
 
+/* Initialize the absolute maximum/minimum (FAMINMAX) builtins.  */
+
+typedef struct
+{
+  const char *name;
+  unsigned int code;
+  tree eltype;
+  machine_mode mode;
+} faminmax_builtins_data;
+
+static v

Re: [PATCH 3/3] RISC-V: rv32/DF: Prevent 2 SImode loads using XTheadMemIdx

2024-08-07 Thread Jeff Law




On 8/7/24 12:27 AM, Christoph Müllner wrote:

When enabling XTheadFmv/Zfa and XThead(F)MemIdx, we might end up
with the following insn (registers are examples, but of correct class):

(set (reg:DF a4)
  (mem:DF (plus:SI (mult:SI (reg:SI a0)
   (const_int 8))
  (reg:SI a5))))

This is a result of an attempt to load the DF register via two SI
register loads followed by XTheadFmv/Zfa instructions to move the
contents of the two SI registers into the DF register.

The two loads are generated in riscv_split_doubleword_move(),
where the second load adds an offset of 4 to load address.
While this works fine for RVI loads, this can't be handled
for XTheadMemIdx addresses.  Coming back to the example above,
we would end up with the following insn, which can't be simplified
or matched:

(set (reg:SI a4)
  (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI a0)
(const_int 8))
   (reg:SI a5))
  (const_int 4))))

This triggered an ICE in the past, which was resolved in b79cd204c780,
which also added the test xtheadfmemidx-medany.c, where the examples
are from.  The patch postponed the optimization insn_and_split pattern
for XThead(F)MemIdx, so that the situation could effectively be avoided.

Since we don't want to rely on these optimization patterns in the future,
we need a different solution.  Therefore, this patch restricts the
movdf_hardfloat_rv32 insn to not match for split-double-word-moves
with XThead(F)MemIdx operands.  This ensures we don't need to split
them up later.
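
For readers less familiar with the split, the double-word move described above can be modelled in plain C++ as follows. This is an illustrative sketch only: the function name and parameters are hypothetical, and it says nothing about how GCC implements riscv_split_doubleword_move(). The point is the second address, which is the indexed address plus 4, the form that the message says cannot be handled for XTheadMemIdx addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == 8, "model assumes a 64-bit double");

// Loads the double at base + index * 8 as two 32-bit words.
inline double load_double_as_two_words(const unsigned char *base,
                                       std::uint32_t index) {
    const unsigned char *addr = base + index * 8u;  // (plus (mult idx 8) base)
    std::uint32_t first, second;
    std::memcpy(&first, addr, 4);       // first SImode load
    std::memcpy(&second, addr + 4, 4);  // second SImode load: extra +4 offset

    // Reassemble the two words in memory order, so the model does not
    // depend on the byte order of the host.
    unsigned char bytes[8];
    std::memcpy(bytes, &first, 4);
    std::memcpy(bytes + 4, &second, 4);
    double value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Small self-check: the two-word load must reproduce the stored double.
inline void check_split_load() {
    const double table[3] = {1.5, -2.25, 1e10};
    const unsigned char *base = reinterpret_cast<const unsigned char *>(table);
    assert(load_double_as_two_words(base, 1) == -2.25);
}
```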

When looking at the code generation of the test file, we can see that
we have fewer GP<->FP conversions, but cannot use the indexed loads.
The new sequence is identical to rv32gc_xtheadfmv (similar to rv32gc_zfa).

Old:
[...]
lla a5,.LANCHOR0
th.flrd fa5,a5,a0,3
fmv.x.w a4,fa5
th.fmv.x.hw a5,fa5
.L1:
fmv.w.x fa0,a4
th.fmv.hw.x fa0,a5
ret
[...]

New:
[...]
lla a5,.LANCHOR0
slli    a4,a0,3
add a4,a4,a5
lw  a5,4(a4)
lw  a4,0(a4)
.L1:
fmv.w.x fa0,a4
th.fmv.hw.x fa0,a5
ret
[...]

This was tested (together with the patch that eliminates the
XTheadMemIdx optimization patterns) with SPEC CPU 2017 intrate
on QEMU (RV64/lp64d).

gcc/ChangeLog:

* config/riscv/constraints.md (th_m_noi): New constraint.
* config/riscv/riscv.md: Adjust movdf_hardfloat_rv32 for
XTheadMemIdx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/xtheadfmemidx-xtheadfmv-medany.c: Adjust.
* gcc.target/riscv/xtheadfmemidx-zfa-medany.c: Likewise.

OK

Note I think there's offsettable_address or something like that which 
can be used to help with these scenarios if you wanted to try and 
improve things.  I don't think the problem you've run into is inherently 
specific to risc-v.  Other targets split double sized moves and 
sometimes need to restrict the kinds of memory addresses matched.


jeff


Re: [PATCH v2 1/1] RISC-V: Support BF16 interfaces in libgcc

2024-08-07 Thread Jeff Law




On 8/7/24 8:55 AM, Jakub Jelinek wrote:

On Wed, Aug 07, 2024 at 08:46:11AM -0600, Jeff Law wrote:



On 8/7/24 1:16 AM, Jakub Jelinek wrote:



This looks all wrong to me.

On all the other targets that already do support __bf16 type it is a storage
only type, so all arithmetics on it is expected to be done on float, not in
__bf16.

RISC-V has (via extensions) degrees of arithmetic/conversion support, so for
example it can do a multiply-add of bf16 operands widening to float.


Even the __builtin_*f16 _Float16 builtins are mostly unused (at least on
other targets), but there those functions are at least part of C23, even
when they are really not implemented yet in libm (at least talking about
glibc, but I doubt other C libraries are any further than that).
For __bf16, the only standard required stuff is in C++23 and the provided
builtins are whatever was necessary for that.

I understand RISC-V has via extensions more full _Float16 and __bf16
support, but if it needs further builtins, the questions are:
1) should they be enabled on all arches or just on those that need them?

I'd tend to take a wait and see approach, meaning start with them as 
target builtins and promote them to generic builtins if we see other 
targets implementing a richer set of bf16 operations.



2) is there plan to add libm support for __bf16, even when it is
non-standard in C (especially if we don't know if C2y or newer will or won't
add support for it and if it will use the chosen suffixes or some others)?
3) is there plan to add variants for C++23  and  etc.
to handle _Float16 and __bf16 differently?  Currently those types are just
handled by doing as much as possible on float, using its builtins

I have no idea on either of these questions.

jeff


Re: [PATCH] PR116080: Fix test suite checks for musttail

2024-08-07 Thread Andi Kleen
> > Okay for trunk? I would like to check that one in to avoid the noise
> > in the regression reports.
> 
> I've tested this version in a few trees.

Thanks Thomas.

> That's because of effective-target 'struct_musttail' for '-m32'
> reporting:
> 
> struct_musttail1494739.cc: In function 'foo bar()':
> struct_musttail1494739.cc:5:88: error: cannot tail-call: return value 
> used after call
> 
> (I'm just mentioning the latter "regressions" in case those are
> unexpected.)

I believe that's because these test cases are handled by the GIMPLE level
tail call handling in tree-tailcall (which avoids any target
restrictions), while the TCL test checks for the generic case using 
an extern (so hits target restrictions).

While this could probably be distinguished in the test case probing
I don't think it's worth it. Some of this is just for the frontend,
which is architecture independent enough.

-Andi


Re: [PATCH] RISC-V: Clarify that Vector Crypto Extensions require Vector Extensions[PR116150]

2024-08-07 Thread Jeff Law




On 8/6/24 12:36 AM, Liao Shihua wrote:


On 2024/8/6 12:34, Jeff Law wrote:



On 8/5/24 10:23 AM, Patrick O'Neill wrote:


On 8/5/24 01:23, Liao Shihua wrote:
 PR 116150: Zvk* and Zvb* extensions require the v or zve* extension,
 but in GCC v is implied.


gcc/ChangeLog:

 * common/config/riscv/riscv-common.cc: Removed the zvk extension's
 implicit expansion of v extension.
 * config/riscv/arch-canonicalize: Ditto.
 * config/riscv/riscv.cc (riscv_override_options_internal): Throw error
 when zvb or zvk extension without v extension.


gcc/testsuite/ChangeLog:

 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-47.c: add v or zve* to -march.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-48.c: Ditto.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-49.c: Ditto.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-50.c: Ditto.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-51.c: Ditto.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-52.c: Ditto.
 * gcc.target/riscv/rvv/base/target_attribute_v_with_intrinsic-53.c: Ditto.

 * gcc.target/riscv/rvv/base/zvbc-intrinsic.c: Ditto.
 * gcc.target/riscv/rvv/base/zvbc_vx_constraint-1.c: Ditto.
 * gcc.target/riscv/rvv/base/zvbc_vx_constraint-2.c: Ditto.
 * gcc.target/riscv/rvv/base/zvknhb-intrinsic.c: Ditto.
 * gcc.target/riscv/zvbb.c: Ditto.
 * gcc.target/riscv/zvbc.c: Ditto.
 * gcc.target/riscv/zvkb.c: Ditto.
 * gcc.target/riscv/zvkg.c: Ditto.
 * gcc.target/riscv/zvkn-1.c: Ditto.
 * gcc.target/riscv/zvkn.c: Ditto.
 * gcc.target/riscv/zvknc-1.c: Ditto.
 * gcc.target/riscv/zvknc-2.c: Ditto.
 * gcc.target/riscv/zvknc.c: Ditto.
 * gcc.target/riscv/zvkned.c: Ditto.
 * gcc.target/riscv/zvkng-1.c: Ditto.
 * gcc.target/riscv/zvkng-2.c: Ditto.
 * gcc.target/riscv/zvkng.c: Ditto.
 * gcc.target/riscv/zvknha.c: Ditto.
 * gcc.target/riscv/zvknhb.c: Ditto.
 * gcc.target/riscv/zvks-1.c: Ditto.
 * gcc.target/riscv/zvks.c: Ditto.
 * gcc.target/riscv/zvksc-1.c: Ditto.
 * gcc.target/riscv/zvksc-2.c: Ditto.
 * gcc.target/riscv/zvksc.c: Ditto.
 * gcc.target/riscv/zvksed.c: Ditto.
 * gcc.target/riscv/zvksg-1.c: Ditto.
 * gcc.target/riscv/zvksg-2.c: Ditto.
 * gcc.target/riscv/zvksg.c: Ditto.
 * gcc.target/riscv/zvksh.c: Ditto.
 * gcc.target/riscv/pr116150-1.c: New test.
 * gcc.target/riscv/pr116150-2.c: New test.
 * gcc.target/riscv/pr116150-3.c: New test.
 * gcc.target/riscv/pr116150-4.c: New test.

---


Thanks for the patch! It's not clear to me if we want to match LLVM's 
behavior here.


Here's where GCC's current behavior is documented:
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/doc/invoke.texi;h=0fe99ca8ef6e8868f60369f6329fe29599d89159;hb=HEAD#l31150


Maybe Jeff or Kito can provide some guidance for what we want to do 
here.

Our behavior is documented as implying V if we were to enable Zv*, so 
unless there's a strong need to follow LLVM here, I'd tend to leave it 
as-is.  But we can certainly discuss tomorrow AM.



jeff


Hello, Jeff and Patrick

In my opinion, this question is not whether to follow LLVM.

I want to make it clear: when we implement RISC-V extensions in GCC, 
do we need to declare dependencies between extensions explicitly, or 
only implicitly?

We discussed this as a group in the patchwork meeting yesterday.  The 
consensus was to keep GCC's behavior as-is.  So for example, enabling 
Zv* would imply V.

Thanks!
jeff




Re: [PATCH v5 3/3] c: Add __lengthof__ operator

2024-08-07 Thread Jens Gustedt
Hi

On 7 August 2024 17:05:48 CEST, Joseph Myers wrote:
> On Wed, 7 Aug 2024, Alejandro Colomar wrote:
> 
> > +@node Length
> > +@section Determining the Length of Arrays
> > +@cindex lengthof
> > +@cindex length
> > +@cindex array length
> > +
> > +The keyword @code{__lengthof__} determines the length of an array operand,
> > +that is, the number of elements in the array.
> > +Its syntax is just like @code{sizeof}.
> > +The operand must be a complete array type.
> 
> I think you mean the operand must be *an expression whose type is a 
> complete array type* or *a type name for a complete array type*.  The 
> wording you have suggests only type names, you need to be clear about both 
> kinds of operands being possible (and include examples for them).
> 
> > +@smallexample
> > +__lengthof__ (int [7][n++]);  // constexpr
> > +__lengthof__ (int [n++][7]);  // run-time value
> > +@end smallexample
> 
> I don't think using "constexpr" to mean "constant expression" is a good 
> idea, they're different things.

It should actually state "integer constant expression", I think. The nuance is 
probably important.


> > +void
> > +incomplete (int p[])
> > +{
> > +  unsigned n;
> > +
> > +  n = __lengthof__ (x);  /* { dg-error "incomplete" } */
> > +
> > +  /* We want to support the following one in the future,
> > + but for now it should fail.  */
> > +  n = __lengthof__ (p);  /* { dg-error "invalid" } */
> 
> This seems to be the only test you have for a non-array operand.  I'd 
> expect such tests (both for type name operands and for expression 
> operands) covering cases that we *don't* want to support in future, not 
> just this one that we would like to be supportable in future.
> 
> I don't see any tests for the constraints on external definitions from 
> 6.9.1 that we discussed - that references to undefined internal linkage 
> identifiers are OK inside __lengthof__ returning a constant (both 
> constant-length arrays of non-VLA and constant-length arrays of VLA) but 
> not in the cases where __lengthof__ is evaluated.
> 


-- 
Jens Gustedt - INRIA & ICube, Strasbourg, France


Re: [PATCH] c++: erroneous partial spec vs primary tmpl [PR116064]

2024-08-07 Thread Jason Merrill

On 8/7/24 11:09 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

When a partial specialization is deemed erroneous at parse time, we
currently flag the primary template as erroneous instead.  Later
at instantiation time we check if the primary template is erroneous
rather than the selected partial specialization, so at least we're
consistent.

But it's better not to conflate a partial specialization with the
primary template since they're instantiated independently.  This avoids
rejecting the instantiation of A in the below testcase.

PR c++/116064

gcc/cp/ChangeLog:

* error.cc (get_current_template): If the current scope is
a partial specialization, return it instead of the primary
template.
* pt.cc (instantiate_class_template): Pass the partial
specialization if any to maybe_diagnose_erroneous_template
instead of the primary template.

gcc/testsuite/ChangeLog:

* g++.dg/template/permissive-error2.C: New test.
---
  gcc/cp/error.cc   |  6 +-
  gcc/cp/pt.cc  |  2 +-
  gcc/testsuite/g++.dg/template/permissive-error2.C | 15 +++
  3 files changed, 21 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/permissive-error2.C

diff --git a/gcc/cp/error.cc b/gcc/cp/error.cc
index 6c22ff55b46..879e5a115cf 100644
--- a/gcc/cp/error.cc
+++ b/gcc/cp/error.cc
@@ -173,7 +173,11 @@ get_current_template ()
  {
if (scope_chain && in_template_context && !current_instantiation ())
  if (tree ti = get_template_info (current_scope ()))
-  return TI_TEMPLATE (ti);
+  {
+   if (PRIMARY_TEMPLATE_P (TI_TEMPLATE (ti)) && TI_PARTIAL_INFO (ti))
+ ti = TI_PARTIAL_INFO (ti);
+   return TI_TEMPLATE (ti);
+  }
  
return NULL_TREE;
  }
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 542962b6387..3e55d5c0fea 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -12381,7 +12381,7 @@ instantiate_class_template (tree type)
if (! push_tinst_level (type))
  return type;
  
-  maybe_diagnose_erroneous_template (templ);
+  maybe_diagnose_erroneous_template (t ? TI_TEMPLATE (t) : templ);
  
int saved_unevaluated_operand = cp_unevaluated_operand;

int saved_inhibit_evaluation_warnings = c_inhibit_evaluation_warnings;
diff --git a/gcc/testsuite/g++.dg/template/permissive-error2.C 
b/gcc/testsuite/g++.dg/template/permissive-error2.C
new file mode 100644
index 000..692e7c7ac82
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/permissive-error2.C
@@ -0,0 +1,15 @@
+// PR c++/116064
+// { dg-additional-options -fpermissive }
+// Verify we correctly mark a partial specialization as erroneous
+// instead its primary template.
+
+template
+struct A { };
+
+template
+struct A { // { dg-error "instantiating erroneous template" }
+  void f(typename A::type); // { dg-warning "does not name a type" }
+};
+
+A a;  // { dg-bogus "" }
+A b; // { dg-message "required from here" }




Re: [PATCH] c++: Implement CWG2387 - Linkage of const-qualified variable template [PR109126]

2024-08-07 Thread Jason Merrill

On 8/7/24 10:40 AM, Jakub Jelinek wrote:

Hi!

The following patch attempts to implement DR2387 by making variable
templates including their specialization TREE_PUBLIC when at file
scope and they don't have static storage class.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2024-08-07  Jakub Jelinek  

PR c++/109126
* decl.cc (grokvardecl): Implement CWG 2387 - Linkage of
const-qualified variable template.  Set TREE_PUBLIC on variable
templates with const qualified types unless static is present.

* g++.dg/DRs/dr2387.C: New test.
* g++.dg/DRs/dr2387-aux.cc: New file.

--- gcc/cp/decl.cc.jj   2024-08-06 11:05:29.147469440 +0200
+++ gcc/cp/decl.cc  2024-08-07 11:58:11.275368835 +0200
@@ -11225,6 +11225,8 @@ grokvardecl (tree type,
|| ! constp
|| volatilep
|| inlinep
+   || in_template_context
+   || processing_specialization
|| module_attach_p ()));
TREE_STATIC (decl) = ! DECL_EXTERNAL (decl);
  }
--- gcc/testsuite/g++.dg/DRs/dr2387.C.jj2024-08-07 12:11:14.372115971 
+0200
+++ gcc/testsuite/g++.dg/DRs/dr2387.C   2024-08-07 12:13:35.089273604 +0200
@@ -0,0 +1,22 @@
+// DR 2387
+// { dg-do run { target c++14 } }
+// { dg-additional-sources "dr2387-aux.cc" }
+
+template 
+const int a = N;
+template 
+static const int b = N;
+template 
+extern const int c = N;
+template 
+const volatile int d = N;
+template 
+const int e = N;
+template <>
+const int e <43> = 44;
+
+const int *pa = &a <42>;
+const int *pb = &b <42>;
+const int *pc = &c <42>;
+const volatile int *pd = &d <42>;
+const int *pe = &e <43>;
--- gcc/testsuite/g++.dg/DRs/dr2387-aux.cc.jj   2024-08-07 12:11:09.388181223 
+0200
+++ gcc/testsuite/g++.dg/DRs/dr2387-aux.cc  2024-08-07 12:13:25.321401491 
+0200
@@ -0,0 +1,25 @@
+// DR 2387
+
+template 
+extern const int a;
+template 
+static const int b = N;
+template 
+extern const int c;
+template 
+extern const volatile int d;
+template 
+extern const int e;
+extern const int *pa, *pb, *pc, *pe;
+extern const volatile int *pd;
+
+int
+main ()
+{
+  if (pa != &a <42>
+  || pb == &b <42>
+  || pc != &c <42>
+  || pd != &d <42>
+  || pe != &e <43>)
+__builtin_abort ();
+}

Jakub





[PATCH] Ada, libgnarl: Fix s-taprop__posix.adb compilation.

2024-08-07 Thread Iain Sandoe
Tested on x86_64-darwin21, OK for trunk?
thanks
Iain

--- 8< ---

Bootstrap on Darwin, and likely on any other target using the posix
implementation of s-taprop, was broken by commits between r15-2743
and r15-2747:
s-taprop.adb:297:15: error: "size_t" is not visible
s-taprop.adb:297:15: error: multiple use clauses cause hiding
s-taprop.adb:297:15: error: hidden declaration at s-osinte.ads:58
s-taprop.adb:297:15: error: hidden declaration at i-c.ads:9

This seems to be caused by an omitted change to use Interfaces.C.size_t
instead of just size_t.  Fixed thus.

gcc/ada/ChangeLog:

* libgnarl/s-taprop__posix.adb (Stack_Guard): Use Interfaces.C.size_t
for the type of Page_Size.

Signed-off-by: Iain Sandoe 
---
 gcc/ada/libgnarl/s-taprop__posix.adb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/libgnarl/s-taprop__posix.adb 
b/gcc/ada/libgnarl/s-taprop__posix.adb
index 3d76679ad4a..5f6a4d69c91 100644
--- a/gcc/ada/libgnarl/s-taprop__posix.adb
+++ b/gcc/ada/libgnarl/s-taprop__posix.adb
@@ -294,7 +294,7 @@ package body System.Task_Primitives.Operations is
  Res :=
mprotect
  (Stack_Base - (Stack_Base mod Page_Size) + Page_Size,
-  size_t (Page_Size),
+  Interfaces.C.size_t (Page_Size),
   prot => (if On then PROT_ON else PROT_OFF));
  pragma Assert (Res = 0);
   end if;
-- 
2.39.2 (Apple Git-143)
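
For readers unfamiliar with the address expression in the hunk above, the
arithmetic can be sketched in C as follows.  The function name
next_page_boundary is hypothetical, and the sketch assumes addresses fit in
uintptr_t and that page_size is nonzero.

```c
#include <stdint.h>

/* Mirror of the expression
   Stack_Base - (Stack_Base mod Page_Size) + Page_Size
   from the patch context: round down to the start of the current
   page, then advance by one page.  An already aligned address
   therefore moves to the following page.  */
uintptr_t
next_page_boundary (uintptr_t stack_base, uintptr_t page_size)
{
  return stack_base - (stack_base % page_size) + page_size;
}
```

The second argument to mprotect in the patch is then simply the page size,
converted to the C size_t type.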


