date:20240905

Re: New version of unsigned patch

2024-09-05 Thread Thomas Koenig


Ping (a little bit)?

With another weekend coming up, I would have some time to
work on incorporating any feedback, or on putting in
more intrinsics.

Best regards

Thomas

[PATCH] c++: Handle attributes on exception declarations [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

This is a continuation of the series for the ignorability of standard
attributes, on top of
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661904.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661905.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661906.html

I've added a test for assume attribute diagnostics appertaining to various
entities (mostly invalid) and, while doing that, discovered that
attributes on exception declarations were mostly ignored.  This patch
adds the missing cplus_decl_attributes call and, in the
cp_parser_type_specifier_seq case, distinguishes between attributes and
std_attributes, so that attributes which apply to the declaration using
the type-specifier-seq can be told apart from attributes after the type
specifiers.
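
For readers less familiar with the grammar, here is a minimal sketch of two
positions where a standard attribute can appear on an exception declaration
and appertain to the declared catch parameter.  The function names are
hypothetical and the snippet is not taken from the new tests; a third
position, after the type specifiers (catch (int [[maybe_unused]] x)),
appertains to the type instead and is left out here because GCC warns that
the attribute is ignored there.

```cpp
#include <stdexcept>

// Attribute before the type-specifier-seq: it appertains to the
// catch parameter `e` (hypothetical example function).
int classify_leading (int v)
{
  try
    {
      if (v < 0)
        throw std::runtime_error ("negative");
      return 0;
    }
  catch ([[maybe_unused]] const std::runtime_error &e)
    {
      return 1;
    }
}

// Attribute after the declarator-id: it also appertains to the
// catch parameter `x` (hypothetical example function).
int classify_trailing (int v)
{
  try
    {
      if (v < 0)
        throw v;
      return 0;
    }
  catch (int x [[maybe_unused]])
    {
      return 2;
    }
}
```

Both functions return 0 when nothing is thrown and a distinct nonzero value
when the handler runs.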

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* parser.cc (cp_parser_type_specifier_seq): Chain cxx11_attribute_p
attributes after any type specifier in the is_declaration case
to std_attributes rather than attributes.  Set also ds_attribute
or ds_std_attribute locations if not yet set.
(cp_parser_exception_declaration): Pass &type_specifiers.attributes
instead of NULL as last argument, call cplus_decl_attributes.

* g++.dg/cpp0x/attr-assume1.C: New test.
* g++.dg/cpp0x/attr-deprecated1.C: Add tests for attributes
after type specifiers before parameter or exception declarations
and after parameter or exception declaration declarators.

--- gcc/cp/parser.cc.jj 2024-09-04 12:36:51.903244117 +0200
+++ gcc/cp/parser.cc 2024-09-04 15:13:22.875960509 +0200
@@ -25375,9 +25375,24 @@ cp_parser_type_specifier_seq (cp_parser*
  || tok->type == CPP_EQ || tok->type == CPP_OPEN_BRACE)
break;
}
- type_specifier_seq->attributes
-   = attr_chainon (type_specifier_seq->attributes,
-   cp_parser_attributes_opt (parser));
+ location_t attr_loc = cp_lexer_peek_token (parser->lexer)->location;
+ tree attrs = cp_parser_attributes_opt (parser);
+ if (seen_type_specifier
+ && is_declaration
+ && cxx11_attribute_p (attrs))
+   {
+ type_specifier_seq->std_attributes
+   = attr_chainon (type_specifier_seq->std_attributes, attrs);
+ if (type_specifier_seq->locations[ds_std_attribute] == 0)
+   type_specifier_seq->locations[ds_std_attribute] = attr_loc;
+   }
+ else
+   {
+ type_specifier_seq->attributes
+   = attr_chainon (type_specifier_seq->attributes, attrs);
+ if (type_specifier_seq->locations[ds_attribute] == 0)
+   type_specifier_seq->locations[ds_attribute] = attr_loc;
+   }
  continue;
}
 
@@ -29630,7 +29645,12 @@ cp_parser_exception_declaration (cp_pars
   if (!type_specifiers.any_specifiers_p)
 return error_mark_node;
 
-  return grokdeclarator (declarator, &type_specifiers, CATCHPARM, 1, NULL);
+  tree decl = grokdeclarator (declarator, &type_specifiers, CATCHPARM, 1,
+ &type_specifiers.attributes);
+  if (decl != error_mark_node && type_specifiers.attributes)
+cplus_decl_attributes (&decl, type_specifiers.attributes, 0);
+
+  return decl;
 }
 
 /* Parse a throw-expression.
--- gcc/testsuite/g++.dg/cpp0x/attr-assume1.C.jj  2024-09-04 12:47:19.356072957 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-assume1.C   2024-09-04 15:28:42.486057664 +0200
@@ -0,0 +1,151 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  [[assume (n > 0)]];
+  [[assume]];  // { dg-error "wrong number of arguments specified for 'assume' attribute" }
+  [[assume ("abc")]];
+  [[assume (1, 2, 3)]];  // { dg-error "wrong number of arguments specified for 'assume' attribute" }
+
+  [[assume (true)]] int x1;  // { dg-error "'assume' attribute ignored" }
+
+  auto a = [] [[assume (true)]] () {}; // { dg-error "'assume' attribute ignored" }
+  auto b = [] constexpr [[assume (true)]] {};  // { dg-error "'assume' attribute ignored" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = []

[PATCH] c++: Add carries_dependency further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

This patch adds additional test coverage for the carries_dependency
attribute (unlike other attributes, the attribute actually isn't implemented
for real, so we warn even in the cases of valid uses because we ignore those
as well).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-carries_dependency2.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency2.C.jj  2024-09-04 15:42:01.250738751 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-carries_dependency2.C   2024-09-04 16:21:50.416239580 +0200
@@ -0,0 +1,152 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+xyzzy (int *a [[carries_dependency]],  // { dg-warning "'carries_dependency' attribute ignored" }
+   int *b [[carries_dependency (1)]])  // { dg-error "'carries_dependency' attribute does not take any arguments" }
+{  // { dg-error "expected ',' or '...' before 'b'" "" { target *-*-* } .-1 }
+}
+
+void
+foo (int n)
+{
+  [[carries_dependency]] int x1;   // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+
+  auto a = [] [[carries_dependency]] () {};  // { dg-warning "'carries_dependency' attribute ignored" }
+  auto b = [] constexpr [[carries_dependency]] {}; // { dg-warning "'carries_dependency' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[carries_dependency]] {};  // { dg-warning "'carries_dependency' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[carries_dependency]] {};  // { dg-warning "'carries_dependency' attribute does not apply to types" }
+  auto e = new int [n] [[carries_dependency]]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[carries_dependency]] [42];   // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[carries_dependency]]; // { dg-warning "'carries_dependency' attribute does not apply to types" }
+  [[carries_dependency]];  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[carries_dependency]] {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[carries_dependency]] if (true) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[carries_dependency]] while (false) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[carries_dependency]] goto lab; // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[carries_dependency]] lab:; // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+  [[carries_dependency]] try {} catch (int) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  if ([[carries_dependency]] int x = 0) {} // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+  switch (n)
+{
+[[carries_dependency]] case 1: // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+[[carries_dependency]] break;  // { dg-warning "attributes at the beginning of statement are ignored" }
+[[carries_dependency]] default:  // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+break;
+}
+  for ([[carries_dependency]] auto a : arr) {} // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+  for ([[carries_dependency]] auto [a, b] : arr2) {}   // { dg-warning "'carries_dependency' attribute can only be applied to functions or parameters" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[carries_dependency]] asm (""); // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[carries_dependency]] int x)

Re: [PATCH] RISC-V: Handle unused-only-live stmts in SLP discovery

2024-09-05 Thread Richard Biener

On Wed, 4 Sep 2024, Palmer Dabbelt wrote:

> On Wed, 04 Sep 2024 04:10:52 PDT (-0700), rguent...@suse.de wrote:
> > The following adds SLP discovery for roots that are only live but
> > otherwise unused.  These are usually inductions.  This allows a
> > few more testcases to be handled fully with SLP, for example
> > gcc.dg/vect/no-scevccp-pr86725-1.c
> >
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >
> >  * tree-vect-slp.cc (vect_analyze_slp): Analyze SLP for live
> >  but otherwise unused defs.
> > ---
> >  gcc/tree-vect-slp.cc | 30 ++
> >  1 file changed, 30 insertions(+)
> 
> Are you putting the "RISC-V" in there just to kick the CI into running it?

Yes.

> If so you can also just CC  (or trip anything that
> matches the filter at [1]).  No big deal on my end, just worried non-RISC-V
> people are going to see the tag and think this is RISC-V-only and thus ignore
> it.
> 
> If you're looking for a RISC-V reviewer, I don't really know this stuff well
> enough to say much here.  Robin would probably be the best bet...

Good points, I'll try CCing patchworks...@rivosinc.com from now on.
I can self-approve those patches but of course still welcome feedback.

Richard.

> [1]:
> https://github.com/patrick-rivos/riscv-gnu-toolchain/blob/1496f76a9ad4081c0afdde8f7f8ffb22573a1789/scripts/create_patches_files.py#L89
> 
> > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > index 41bc92b138a..91d6927016d 100644
> > --- a/gcc/tree-vect-slp.cc
> > +++ b/gcc/tree-vect-slp.cc
> > @@ -4704,6 +4704,36 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
> > saved_stmts.release ();
> >   }
> > }
> > +
> > +  /* Make sure to vectorize only-live stmts, usually inductions.  */
> > +  for (edge e : get_loop_exit_edges (LOOP_VINFO_LOOP (loop_vinfo)))
> > +   for (auto gsi = gsi_start_phis (e->dest); !gsi_end_p (gsi);
> > +gsi_next (&gsi))
> > + {
> > +   gphi *lc_phi = *gsi;
> > +   tree def = gimple_phi_arg_def_from_edge (lc_phi, e);
> > +   stmt_vec_info stmt_info;
> > +   if (TREE_CODE (def) == SSA_NAME
> > +   && !virtual_operand_p (def)
> > +   && (stmt_info = loop_vinfo->lookup_def (def))
> > +   && STMT_VINFO_RELEVANT (stmt_info) == vect_used_only_live
> > +   && STMT_VINFO_LIVE_P (stmt_info)
> > +   && (STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def
> > +   || (STMT_VINFO_DEF_TYPE (stmt_info) == vect_internal_def
> > +   && STMT_VINFO_REDUC_IDX (stmt_info) == -1)))
> > + {
> > +   vec stmts;
> > +   vec roots = vNULL;
> > +   vec remain = vNULL;
> > +   stmts.create (1);
> > +   stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
> > +   vect_build_slp_instance (vinfo,
> > +slp_inst_kind_reduc_group,
> > +stmts, roots, remain,
> > +max_tree_size, &limit,
> > +bst_map, NULL);
> > + }
> > + }
> >  }
> >
> >hash_set visited_patterns;
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

[PATCH] c++: Add fallthrough attribute further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

Similarly for the fallthrough attribute.  I had to add a second testcase
because the diagnostic for fallthrough not used within a switch at all is
emitted during expansion, and expansion won't happen if there are other
errors in the testcase.
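
As a reminder of what a valid use looks like, here is a minimal sketch (the
function name is hypothetical and the snippet is not part of either new
test): [[fallthrough]] forms a null statement whose next statement is a case
or default label.

```cpp
// Hypothetical helper: case 1 deliberately falls through into case 2.
int weight (int n)
{
  int w = 0;
  switch (n)
    {
    case 1:
      w += 1;
      [[fallthrough]];  // valid: immediately precedes a case label
    case 2:
      w += 2;
      break;
    default:
      break;
    }
  return w;
}
```

Calling weight (1) accumulates both increments, while weight (2) only takes
the second one.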

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-fallthrough1.C: New test.
* g++.dg/cpp0x/attr-fallthrough2.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-fallthrough1.C.jj   2024-09-04 16:27:52.786654084 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-fallthrough1.C  2024-09-04 16:44:41.879666097 +0200
@@ -0,0 +1,169 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  switch (n)
+{
+case 1:
+  [[fallthrough (n > 0)]]; // { dg-error "'fallthrough' attribute does not take any arguments" }
+case 2:
+  break;
+case 3:
+  [[fallthrough]];
+case 4:
+  break;
+case 5:
+  [[fallthrough ("abc")]]; // { dg-error "'fallthrough' attribute does not take any arguments" }
+case 6:
+  break;
+case 7:
+  [[fallthrough (1, 2, 3)]];   // { dg-error "'fallthrough' attribute does not take any arguments" }
+case 8:
+  [[fallthrough]]; // { dg-error "attribute 'fallthrough' not preceding a case label or default label" }
+  foo (n - 1);
+  break;
+default:
+  break;
+}
+
+  [[fallthrough]] int x1;  // { dg-error "'fallthrough' attribute ignored" }
+
+  auto a = [] [[fallthrough]] () {};   // { dg-error "'fallthrough' attribute ignored" }
+  auto b = [] constexpr [[fallthrough]] {};  // { dg-error "'fallthrough' attribute ignored" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[fallthrough]] {}; // { dg-error "'fallthrough' attribute ignored" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[fallthrough]] {};   // { dg-error "'fallthrough' attribute ignored" }
+  auto e = new int [n] [[fallthrough]];  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[fallthrough]] [42];  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[fallthrough]];  // { dg-error "'fallthrough' attribute ignored" }
+  [[fallthrough]] {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[fallthrough]] if (true) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[fallthrough]] while (false) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[fallthrough]] goto lab;  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[fallthrough]] lab:;  // { dg-error "'fallthrough' attribute ignored" }
+  [[fallthrough]] try {} catch (int) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  if ([[fallthrough]] int x = 0) {}  // { dg-error "'fallthrough' attribute ignored" }
+  switch (n)
+{
+[[fallthrough]] case 1:  // { dg-error "'fallthrough' attribute ignored" }
+[[fallthrough]] break; // { dg-warning "attributes at the beginning of statement are ignored" }
+[[fallthrough]] default:   // { dg-error "'fallthrough' attribute ignored" }
+break;
+}
+  for ([[fallthrough]] auto a : arr) {}  // { dg-error "'fallthrough' attribute ignored" }
+  for ([[fallthrough]] auto [a, b] : arr2) {}  // { dg-error "'fallthrough' attribute ignored" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[fallthrough]] asm ("");  // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[fallthrough]] int x) {}  // { dg-error "'fallthrough' attribute ignored" }
+  try {} catch ([[fallthrough]] int) {}  // { dg-error "'fallthrough' attribute ignored" }
+  try {} catch (int [[fallthrough]] x) {}  // { dg-warning "attribute ignored" }
+  try {} catch (int [[fallthrough]]) {}  // { dg-warning "attribute ignored" }
+  try {} catch (

[PATCH] c++: Add {,un}likely attribute further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

Similarly for likely/unlikely attributes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-likely1.C: New test.
* g++.dg/cpp0x/attr-unlikely1.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-likely1.C.jj  2024-09-04 16:53:59.829472783 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-likely1.C   2024-09-04 17:04:52.566048248 +0200
@@ -0,0 +1,149 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  [[likely]];
+  [[likely (1)]];  // { dg-error "'likely' attribute does not take any arguments" }
+  [[likely]] ++n;
+  [[likely]] int x1;   // { dg-warning "'likely' attribute ignored" }
+
+  auto a = [] [[likely]] () {};  // { dg-warning "ISO C\\\+\\\+ 'likely' attribute does not apply to functions; treating as '\\\[\\\[gnu::hot\\\]\\\]'" }
+  auto b = [] constexpr [[likely]] {}; // { dg-warning "'likely' attribute ignored" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[likely]] {};  // { dg-warning "'likely' attribute ignored" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[likely]] {};  // { dg-warning "'likely' attribute ignored" }
+  auto e = new int [n] [[likely]]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[likely]] [42];  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[likely]]; // { dg-warning "'likely' attribute ignored" }
+  [[likely]];
+  [[likely]] {}
+  [[likely]] if (true) {}
+  [[likely]] while (false) {}
+  [[likely]] goto lab;
+  [[likely]] lab:;
+  [[likely]] try {} catch (int) {}
+  if ([[likely]] int x = 0) {} // { dg-warning "'likely' attribute ignored" }
+  switch (n)
+{
+[[likely]] case 1:
+[[likely]] break;
+[[likely]] default:
+break;
+}
+  for ([[likely]] auto a : arr) {} // { dg-warning "'likely' attribute ignored" }
+  for ([[likely]] auto [a, b] : arr2) {}  // { dg-warning "'likely' attribute ignored" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[likely]] asm (""); // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[likely]] int x) {}   // { dg-warning "'likely' attribute ignored" }
+  try {} catch ([[likely]] int) {} // { dg-warning "'likely' attribute ignored" }
+  try {} catch (int [[likely]] x) {}   // { dg-warning "attribute ignored" }
+  try {} catch (int [[likely]]) {} // { dg-warning "attribute ignored" }
+  try {} catch (int x [[likely]]) {}   // { dg-warning "'likely' attribute ignored" }
+}
+
+[[likely]] int bar (); // { dg-warning "ISO C\\\+\\\+ 'likely' attribute does not apply to functions; treating as '\\\[\\\[gnu::hot\\\]\\\]'" }
+using foobar [[likely]] = int; // { dg-warning "'likely' attribute ignored" }
+[[likely]] int a;  // { dg-warning "'likely' attribute ignored" }
+[[likely]] auto [b, c] = arr;  // { dg-warning "'likely' attribute ignored" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+[[likely]];  // { dg-warning "attribute ignored" }
+inline [[likely]] void baz () {}   // { dg-warning "attribute ignored" }
+   // { dg-error "standard attributes in middle of decl-specifiers" "" { target *-*-* } .-1 }
+constexpr [[likely]] int qux () { return 0; }  // { dg-warning "attribute ignored" }
+   // { dg-error "standard attributes in middle of decl-specifiers" "" { target *-*-* } .-1 }
+int [[likely]] d;  // { dg-warning "attribute ignored" }
+int const [[likely]] e = 1;  // { dg-warning "attribute ignored" }
+struct A {} [[likely]];  // { dg-warning "attribute ignored in declaration of 'struct A'" }
+struct A [[likely]];   // { dg-warning "attribute ignored" }
+struct A [[likely]] a1;  // { dg-warning "attribute ignored" }
+A [[likely]] a2;   // { dg-warning "attribute ignored" }
+enum B { B0 } [[likely]];  // { dg-warning "attribute ignored in declaration of 'enum B'" }

Re: [PATCH] fab: Cleanup eh after optimize_memcpy [PR116601]

2024-09-05 Thread Richard Biener

On Thu, Sep 5, 2024 at 8:25 AM Andrew Pinski  wrote:
>
> When optimize_memcpy was added in r7-5443-g7b45d0dfeb5f85,
> a path was added such that a statement was turned into a non-throwing
> statement and maybe_clean_or_replace_eh_stmt/gimple_purge_dead_eh_edges
> would not be called for that statement.
> This adds these calls to that path.
>
> Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> Ok? For the trunk, 14, 13 and 12 branches?

I wonder if this can be somehow integrated better with the existing

  old_stmt = stmt;
  stmt = gsi_stmt (i);
  update_stmt (stmt);

  if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
  && gimple_purge_dead_eh_edges (bb))
cfg_changed = true;

which frankly looks odd - update_stmt shouldn't ever change stmt.  Maybe
moving the old_stmt assign before the switch works?

> PR tree-optimization/116601
>
> gcc/ChangeLog:
>
> * tree-ssa-ccp.cc (pass_fold_builtins::execute): Cleanup eh
> after optimize_memcpy on a mem statement.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/torture/except-2.C: New test.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/testsuite/g++.dg/torture/except-2.C | 18 ++
>  gcc/tree-ssa-ccp.cc | 11 +--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/torture/except-2.C
>
> diff --git a/gcc/testsuite/g++.dg/torture/except-2.C b/gcc/testsuite/g++.dg/torture/except-2.C
> new file mode 100644
> index 000..d896937a118
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/except-2.C
> @@ -0,0 +1,18 @@
> +// { dg-do compile }
> +// { dg-additional-options "-fexceptions -fnon-call-exceptions" }
> +// PR tree-optimization/116601
> +
> +struct RefitOption {
> +  char subtype;
> +  int string;
> +} n;
> +void h(RefitOption);
> +void k(RefitOption *__val)
> +{
> +  try {
> +*__val = RefitOption{};
> +RefitOption __trans_tmp_2 = *__val;
> +h(__trans_tmp_2);
> +  }
> +  catch(...){}
> +}
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 44711018e0e..3cd385f476b 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -4325,8 +4325,15 @@ pass_fold_builtins::execute (function *fun)
>if (gimple_code (stmt) != GIMPLE_CALL)
> {
>   if (gimple_assign_load_p (stmt) && gimple_store_p (stmt))
> -   optimize_memcpy (&i, gimple_assign_lhs (stmt),
> -gimple_assign_rhs1 (stmt), NULL_TREE);
> +   {
> + optimize_memcpy (&i, gimple_assign_lhs (stmt),
> +  gimple_assign_rhs1 (stmt), NULL_TREE);
> + old_stmt = stmt;
> + stmt = gsi_stmt (i);
> + if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt)
> + && gimple_purge_dead_eh_edges (bb))
> +   cfg_changed = true;
> +   }
>   gsi_next (&i);
>   continue;
> }
> --
> 2.43.0
>

[PATCH] c++: Fix up maybe_unused attribute handling [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

When adding test coverage for maybe_unused attribute, I've run into
several things:
1) similarly to deprecated attribute, the attribute shouldn't pedantically
   appertain to types other than class/enumeration definitions
2) similarly to deprecated attribute, the attribute shouldn't pedantically
   appertain to unnamed bit-fields
3) the standard says that it can appertain to identifier labels, but
   we handled it silently also on case and default labels
4) I've run into a weird spurious error on
   int f [[maybe_unused]];
   int & [[maybe_unused]] i = f;
   int && [[maybe_unused]] j = 0;
   The problem was that we create an attribute variant for the int &
   type, then create an attribute variant for the int && type, and
   the type_hash_canon hashing just thought those 2 are the same,
   so used int & [[maybe_unused]] type for j rather than
   int && [[maybe_unused]].  As TYPE_REF_IS_RVALUE is a flag in the
   generic code, it was easy to hash that flag and compare it.
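
The hashing problem in point 4 can be illustrated outside GCC with a minimal
sketch.  Everything below (ref_type, ref_hash, ref_equal,
distinct_reference_types) is a hypothetical, heavily simplified stand-in for
the type hash table, not the actual tree.cc code; the point is only that the
equality callback must compare the rvalue flag, and that hashing it as well
keeps the two variants in separate buckets.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for a REFERENCE_TYPE node with attributes.
struct ref_type
{
  int referred;    // id of the referred-to type, e.g. int
  int attrs;       // id of the attribute list, e.g. [[maybe_unused]]
  bool is_rvalue;  // plays the role of TYPE_REF_IS_RVALUE
};

struct ref_hash
{
  std::size_t operator() (const ref_type &t) const
  {
    std::size_t h = std::hash<int>{} (t.referred);
    h = h * 31 + std::hash<int>{} (t.attrs);
    // Mixing in the flag is an optimization: fewer collisions.
    h = h * 31 + std::hash<bool>{} (t.is_rvalue);
    return h;
  }
};

struct ref_equal
{
  bool operator() (const ref_type &a, const ref_type &b) const
  {
    // Comparing the flag is what keeps int & and int && apart.
    return a.referred == b.referred
           && a.attrs == b.attrs
           && a.is_rvalue == b.is_rvalue;
  }
};

// Insert "int & [[maybe_unused]]" and "int && [[maybe_unused]]" and
// report how many distinct canonical entries exist afterwards.
std::size_t distinct_reference_types ()
{
  std::unordered_set<ref_type, ref_hash, ref_equal> table;
  table.insert (ref_type{1, 7, false});
  table.insert (ref_type{1, 7, true});
  return table.size ();
}
```

If ref_equal ignored is_rvalue, the second insert would find the first entry
and the table would hand back the lvalue-reference variant for both.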

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
gcc/
* tree.cc (type_hash_canon_hash): Hash TYPE_REF_IS_RVALUE for
REFERENCE_TYPE.
(type_cache_hasher::equal): Compare TYPE_REF_IS_RVALUE for
REFERENCE_TYPE.
gcc/cp/
* tree.cc (handle_maybe_unused_attribute): New function.
(std_attributes): Use handle_maybe_unused_attribute instead
of handle_unused_attribute for maybe_unused attribute.
gcc/testsuite/
* g++.dg/cpp0x/attr-maybe_unused1.C: New test.

--- gcc/tree.cc.jj  2024-08-30 16:41:34.712367197 +0200
+++ gcc/tree.cc 2024-09-04 18:52:37.792157965 +0200
@@ -6085,6 +6085,10 @@ type_hash_canon_hash (tree type)
   hstate.add_poly_int (TYPE_VECTOR_SUBPARTS (type));
   break;
 
+case REFERENCE_TYPE:
+  hstate.add_flag (TYPE_REF_IS_RVALUE (type));
+  break;
+
 default:
   break;
 }
@@ -6127,7 +6131,6 @@ type_cache_hasher::equal (type_hash *a,
 case OPAQUE_TYPE:
 case COMPLEX_TYPE:
 case POINTER_TYPE:
-case REFERENCE_TYPE:
 case NULLPTR_TYPE:
   return true;
 
@@ -6217,6 +6220,9 @@ type_cache_hasher::equal (type_hash *a,
break;
   return false;
 
+case REFERENCE_TYPE:
+  return TYPE_REF_IS_RVALUE (a->type) == TYPE_REF_IS_RVALUE (b->type);
+
 default:
   return false;
 }
--- gcc/cp/tree.cc.jj   2024-09-04 12:36:51.904244104 +0200
+++ gcc/cp/tree.cc  2024-09-04 17:56:08.946371353 +0200
@@ -5106,6 +5106,27 @@ handle_std_deprecated_attribute (tree *n
   return ret;
 }
 
+/* The C++17 [[maybe_unused]] attribute mostly maps to the GNU unused
+   attribute.  */
+
+static tree
+handle_maybe_unused_attribute (tree *node, tree name, tree args, int flags,
+  bool *no_add_attrs)
+{
+  tree t = *node;
+  tree ret = handle_unused_attribute (node, name, args, flags, no_add_attrs);
+  if (TYPE_P (*node) && t != *node)
+pedwarn (input_location, OPT_Wattributes,
+"%qE on a type other than class or enumeration definition", name);
+  else if (TREE_CODE (*node) == FIELD_DECL && DECL_UNNAMED_BIT_FIELD (*node))
+pedwarn (input_location, OPT_Wattributes, "%qE on unnamed bit-field",
+name);
+  else if (TREE_CODE (*node) == LABEL_DECL && DECL_NAME (*node) == NULL_TREE)
+pedwarn (input_location, OPT_Wattributes,
+"%qE on %<case%> or %<default%> label", name);
+  return ret;
+}
+
 /* Table of valid C++ attributes.  */
 static const attribute_spec cxx_gnu_attributes[] =
 {
@@ -5132,7 +5153,7 @@ static const attribute_spec std_attribut
   { "deprecated", 0, 1, false, false, false, false,
 handle_std_deprecated_attribute, NULL },
   { "maybe_unused", 0, 0, false, false, false, false,
-handle_unused_attribute, NULL },
+handle_maybe_unused_attribute, NULL },
   { "nodiscard", 0, 1, false, false, false, false,
 handle_nodiscard_attribute, NULL },
   { "no_unique_address", 0, 0, true, false, false, false,
--- gcc/testsuite/g++.dg/cpp0x/attr-maybe_unused1.C.jj  2024-09-04 17:28:44.552574792 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-maybe_unused1.C 2024-09-04 18:54:14.207069672 +0200
@@ -0,0 +1,148 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  [[maybe_unused]] int x1;
+  [[maybe_unused ("foobar")]] int x2;  // { dg-error "'maybe_unused' attribute does not take any arguments" }
+   // { dg-error "expected primary-expression before 'int'" "" { target *-*-* } .-1 }
+  [[maybe_unused (0)]] int x3; // { dg-error "'maybe_unused' attribute does not take any arguments" }
+   // { dg-error "expected primary-expression before 'int'" "" { target *-*-* } .-1 }
+
+  auto a = [] [[maybe_unused]] () {};
+  auto b = [] const

[PATCH] RISC-V: Lookup reversely in riscv_select_multilib_by_abi

2024-09-05 Thread YunQiang Su

From: YunQiang Su 

When --print-multilib-os-dir is used, gcc outputs different values
for a full -march option and for the base one only.

$ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc
lib64/lp64d

$ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc_zba
.

The reason is that in multilib.h, the fallback value of multilib
is listed as the 1st one in `multilib_raw[]`.

gcc
* common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi):
Look up in reverse order, as the fallback path is listed as the 1st one.
---
 gcc/common/config/riscv/riscv-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index 62c6e1dab1f..2c1ce7fc7cb 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -2079,7 +2079,7 @@ riscv_select_multilib_by_abi (
   const std::string &riscv_current_abi_str,
   const std::vector &multilib_infos)
 {
-  for (size_t i = 0; i < multilib_infos.size (); ++i)
+  for (ssize_t i = multilib_infos.size () - 1; i >= 0; --i)
 if (riscv_current_abi_str == multilib_infos[i].abi_str)
   return xstrdup (multilib_infos[i].path.c_str ());
 
-- 
2.39.3 (Apple Git-146)
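
A reverse scan over a vector has to start at the last valid index,
size () - 1, so that the first access stays in bounds.  Below is a minimal
standalone sketch of such a lookup using an unsigned index; multilib_entry
and select_path_by_abi are hypothetical stand-ins, not the types or
functions in riscv-common.cc, and the sample data in the usage note only
mirrors the situation described above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one multilib table entry.
struct multilib_entry
{
  std::string abi_str;
  std::string path;
};

// Return the path of the LAST entry whose ABI matches, or an empty
// string if none does.  The `i-- > 0` idiom tests i before decrementing,
// so the body sees size () - 1 down to 0 and never indexes out of range,
// even for an empty vector.
std::string
select_path_by_abi (const std::string &abi,
                    const std::vector<multilib_entry> &infos)
{
  for (std::size_t i = infos.size (); i-- > 0; )
    if (abi == infos[i].abi_str)
      return infos[i].path;
  return "";
}
```

With a table whose first entry is the fallback { "lp64d", "." } and whose
last entry is { "lp64d", "lib64/lp64d" }, the reverse scan picks
lib64/lp64d rather than the fallback.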

[PATCH] vrp: Fix up diagnostics wording

2024-09-05 Thread Jakub Jelinek

Hi!

I've noticed non-standard wording of this diagnostic when looking at
a miscompilation with --param=vrp-block-limit=0.

Diagnostics generally shouldn't start with an uppercase letter (unless
the upper case would also appear in the middle of a sentence) and shouldn't
be separate sentences with a dot as separator; a semicolon is IMHO more
frequently used.

Ok for trunk?

2024-09-05  Jakub Jelinek  

* tree-vrp.cc (pass_vrp::execute): Start diagnostics with
lowercase u rather than capital U, use semicolon instead of dot.

--- gcc/tree-vrp.cc.jj  2024-07-01 11:28:23.489227914 +0200
+++ gcc/tree-vrp.cc 2024-09-04 21:26:35.693634896 +0200
@@ -1337,7 +1337,7 @@ public:
{
  use_fvrp = true;
  warning (OPT_Wdisabled_optimization,
-  "Using fast VRP algorithm. %d basic blocks"
+  "using fast VRP algorithm; %d basic blocks"
   " exceeds %<--param=vrp-block-limit=%d%> limit",
   n_basic_blocks_for_fn (fun),
   param_vrp_block_limit);

Jakub

Re: [PATCH] vrp: Fix up diagnostics wording

2024-09-05 Thread Aldy Hernandez

Ok

On Thu, Sep 5, 2024, 09:30 Jakub Jelinek  wrote:

> Hi!
>
> I've noticed non-standard wording of this diagnostic when looking at
> a miscompilation with --param=vrp-block-limit=0.
>
> Diagnostics generally shouldn't start with an uppercase letter (unless
> the upper case would also appear in the middle of a sentence) and shouldn't
> be separate sentences with a dot as separator; a semicolon is IMHO more
> frequently used.
>
> Ok for trunk?
>
> 2024-09-05  Jakub Jelinek  
>
> * tree-vrp.cc (pass_vrp::execute): Start diagnostics with
> lowercase u rather than capital U, use semicolon instead of dot.
>
> --- gcc/tree-vrp.cc.jj  2024-07-01 11:28:23.489227914 +0200
> +++ gcc/tree-vrp.cc 2024-09-04 21:26:35.693634896 +0200
> @@ -1337,7 +1337,7 @@ public:
> {
>   use_fvrp = true;
>   warning (OPT_Wdisabled_optimization,
> -  "Using fast VRP algorithm. %d basic blocks"
> +  "using fast VRP algorithm; %d basic blocks"
>" exceeds %<--param=vrp-block-limit=%d%> limit",
>n_basic_blocks_for_fn (fun),
>param_vrp_block_limit);
>
> Jakub
>
>

[COMMITTED 1/6] ada: Tweak assertions in Inline.Cannot_Inline

2024-09-05 Thread Marc Poulhiès

From: Ronan Desplanques 

The purpose of this patch is to silence a GNATSAS report.

gcc/ada/

* inline.adb (Cannot_Inline): Remove assertion.
* inline.ads (Cannot_Inline): Add precondition.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/inline.adb | 2 --
 gcc/ada/inline.ads | 5 -
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/inline.adb b/gcc/ada/inline.adb
index 519e26ecec8..5f310abafda 100644
--- a/gcc/ada/inline.adb
+++ b/gcc/ada/inline.adb
@@ -2136,8 +2136,6 @@ package body Inline is
  end;
   end if;
 
-  pragma Assert (Msg (Msg'Last) = '?');
-
   --  Legacy front-end inlining model
 
   if not Back_End_Inlining then
diff --git a/gcc/ada/inline.ads b/gcc/ada/inline.ads
index bc90c0ce6d8..696f4227c7b 100644
--- a/gcc/ada/inline.ads
+++ b/gcc/ada/inline.ads
@@ -165,7 +165,10 @@ package Inline is
   N : Node_Id;
   Subp  : Entity_Id;
   Is_Serious: Boolean := False;
-  Suppress_Info : Boolean := False);
+  Suppress_Info : Boolean := False)
+ with
+   Pre => Msg'First <= Msg'Last
+   and then Msg (Msg'Last) = '?';
--  This procedure is called if the node N, an instance of a call to
--  subprogram Subp, cannot be inlined. Msg is the message to be issued,
--  which ends with ? (it does not end with ?p?, this routine takes care of
-- 
2.45.2
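
The precondition added to Cannot_Inline above can be read as "Msg is non-empty and its last character is '?'". Below is a minimal C++ sketch of the same check, not the Ada code itself; the function names and the sample strings are hypothetical. The emptiness test comes first for the same reason the Ada contract uses "and then": the last character must not be read from an empty string.

```cpp
#include <cassert>
#include <string>

// Hypothetical C++ model of the Ada precondition
//   Msg'First <= Msg'Last and then Msg (Msg'Last) = '?'
// The emptiness test short-circuits, so back() is never called on "".
bool cannot_inline_precondition(const std::string &msg) {
  return !msg.empty() && msg.back() == '?';
}

// Hypothetical caller-side stand-in: the check now guards the call itself
// instead of being asserted somewhere inside the body.
void cannot_inline_model(const std::string &msg) {
  assert(cannot_inline_precondition(msg));
  // ... the message would be issued here ...
}
```

Moving the check from a pragma Assert in the body to a Pre aspect in the spec makes the requirement visible to callers, and presumably to the analysis tool mentioned in the commit message.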

[COMMITTED 5/6] ada: Streamline handling of low-level peculiarities of record field layout

2024-09-05 Thread Marc Poulhiès

From: Eric Botcazou 

This factors out the interface to the low-level field layout machinery.

gcc/ada/

* gcc-interface/gigi.h (default_field_alignment): New function.
* gcc-interface/misc.cc: Include tm_p header file.
(default_field_alignment): New function.
* gcc-interface/trans.cc (addressable_p) : Replace
previous alignment kludge with call to default_field_alignment.
* gcc-interface/utils.cc (finish_record_type): Likewise for the
alignment based on which DECL_BIT_FIELD should be cleared.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/gigi.h   |  4 
 gcc/ada/gcc-interface/misc.cc  | 21 +
 gcc/ada/gcc-interface/trans.cc | 24 +++-
 gcc/ada/gcc-interface/utils.cc |  2 +-
 4 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/gcc/ada/gcc-interface/gigi.h b/gcc/ada/gcc-interface/gigi.h
index 40f3f0d3d13..f4b302be3e0 100644
--- a/gcc/ada/gcc-interface/gigi.h
+++ b/gcc/ada/gcc-interface/gigi.h
@@ -1008,6 +1008,10 @@ extern bool must_pass_by_ref (tree gnu_type);
 /* Return the size of the FP mode with precision PREC.  */
 extern int fp_prec_to_size (int prec);
 
+/* Return the default alignment of a FIELD of TYPE declared in a record or
+   union type as specified by the ABI of the target architecture.  */
+extern unsigned int default_field_alignment (tree field, tree type);
+
 /* Return the precision of the FP mode with size SIZE.  */
 extern int fp_size_to_prec (int size);
 
diff --git a/gcc/ada/gcc-interface/misc.cc b/gcc/ada/gcc-interface/misc.cc
index 13cb39e91cb..ef5de7f5651 100644
--- a/gcc/ada/gcc-interface/misc.cc
+++ b/gcc/ada/gcc-interface/misc.cc
@@ -28,6 +28,7 @@
 #include "coretypes.h"
 #include "target.h"
 #include "tree.h"
+#include "tm_p.h"
 #include "diagnostic.h"
 #include "opts.h"
 #include "alias.h"
@@ -1129,6 +1130,26 @@ must_pass_by_ref (tree gnu_type)
  && TREE_CODE (TYPE_SIZE_UNIT (gnu_type)) != INTEGER_CST));
 }
 
+/* Return the default alignment of a FIELD of TYPE declared in a record or
+   union type as specified by the ABI of the target architecture.  */
+
+unsigned int
+default_field_alignment (tree ARG_UNUSED (field), tree type)
+{
+  /* This is modeled on layout_decl.  */
+  unsigned int align = TYPE_ALIGN (type);
+
+#ifdef BIGGEST_FIELD_ALIGNMENT
+  align = MIN (align, (unsigned int) BIGGEST_FIELD_ALIGNMENT);
+#endif
+
+#ifdef ADJUST_FIELD_ALIGN
+  align = ADJUST_FIELD_ALIGN (field, type, align);
+#endif
+
+  return align;
+}
+
 /* This function is called by the front-end to enumerate all the supported
modes for the machine, as well as some predefined C types.  F is a function
which is called back with the parameters as listed below, first a string,
diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index c99b06670d5..9e9f5f8dcba 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -10291,23 +10291,13 @@ addressable_p (tree gnu_expr, tree gnu_type)
/* Even with DECL_BIT_FIELD cleared, we have to ensure that
   the field is sufficiently aligned, in case it is subject
   to a pragma Component_Alignment.  But we don't need to
-  check the alignment of the containing record, as it is
-  guaranteed to be not smaller than that of its most
-  aligned field that is not a bit-field.  */
-   && (DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
-   >= TYPE_ALIGN (TREE_TYPE (gnu_expr))
-#ifdef TARGET_ALIGN_DOUBLE
-  /* Cope with the misalignment of doubles in records for
- ancient 32-bit ABIs like that of x86/Linux.  */
-  || (DECL_ALIGN (TREE_OPERAND (gnu_expr, 1)) == 32
-  && TYPE_ALIGN (TREE_TYPE (gnu_expr)) == 64
-  && !TARGET_ALIGN_DOUBLE
-#ifdef TARGET_64BIT
-  && !TARGET_64BIT
-#endif
- )
-#endif
-  ))
+  check the alignment of the containing record, since it
+  is guaranteed to be not smaller than that of its most
+  aligned field that is not a bit-field.  However, we need
+  to cope with quirks of ABIs that may misalign fields.  */
+   && DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
+  >= default_field_alignment (TREE_OPERAND (gnu_expr, 1),
+  TREE_TYPE (gnu_expr)))
   /* The field of a padding record is always addressable.  */
   || TYPE_IS_PADDING_P (TREE_TYPE (TREE_OPERAND (gnu_expr, 0
  && addressable_p (TREE_OPERAND (gnu_expr, 0), NULL_TREE));
diff --git a/gcc/ada/gcc-interface/utils.cc b/gcc/ada/gcc-interface/utils.cc
index 66e3192ea4f..60f36b1e50d 100644
--- a/gcc/ada/gcc-interface/utils.cc
+++ b/gcc/ada/gcc-interface/utils.
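
The logic of default_field_alignment in the misc.cc hunk above can be sketched outside of GCC as follows. This is only a model under stated assumptions: TargetAbiModel and its two members are hypothetical stand-ins for the BIGGEST_FIELD_ALIGNMENT and ADJUST_FIELD_ALIGN target macros, and plain integers (alignments in bits) replace tree nodes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the target macros; not GCC's real interface.
struct TargetAbiModel {
  unsigned biggest_field_alignment = 0;              // 0: macro not defined
  unsigned (*adjust_field_align)(unsigned) = nullptr; // null: macro not defined
};

// Mirrors the shape of the patch: start from the type alignment, cap it,
// then let the target adjust it.
unsigned default_field_alignment_model(const TargetAbiModel &abi,
                                       unsigned type_align) {
  assert(type_align != 0);
  unsigned align = type_align;
  if (abi.biggest_field_alignment != 0)
    align = std::min(align, abi.biggest_field_alignment);
  if (abi.adjust_field_align != nullptr)
    align = abi.adjust_field_align(align);
  return align;
}

// Models the rewritten test in addressable_p: the field is aligned enough
// if its alignment reaches what the ABI would give it by default.
bool field_sufficiently_aligned(const TargetAbiModel &abi,
                                unsigned decl_align, unsigned type_align) {
  return decl_align >= default_field_alignment_model(abi, type_align);
}
```

With a hook that caps alignment at 32 bits, a 64-bit-aligned type whose field is only 32-bit aligned passes the check. That is the case the removed TARGET_ALIGN_DOUBLE block handled by hand.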

[COMMITTED 2/6] ada: Binder respects Ada version for checksum of runtime files

2024-09-05 Thread Marc Poulhiès

From: Jose Ruiz 

The parsing to compute the checksums of runtime files (within the
binder) was done using the default Ada version (Ada 2012 currently),
while the creation of the checksum, when the runtime files are
compiled, is performed in a more recent Ada version (Ada 2022
currently). This change forces the checksum computation for runtime
files to be done with the same Ada version as when they were created.

gcc/ada/

* ali-util.adb (Get_File_Checksum): Force the parsing for
the checksum computation of runtime files to be done in
the corresponding recent Ada version.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/ali-util.adb | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/ali-util.adb b/gcc/ada/ali-util.adb
index 61dddb94e85..4bcb06e6a1f 100644
--- a/gcc/ada/ali-util.adb
+++ b/gcc/ada/ali-util.adb
@@ -29,6 +29,7 @@ with Opt; use Opt;
 with Output;  use Output;
 with Osint;   use Osint;
 with Scans;   use Scans;
+with Fname;   use Fname;
 with Scng;
 with Sinput.C;
 with Stringt;
@@ -87,8 +88,10 @@ package body ALI.Util is
---
 
function Get_File_Checksum (Fname : File_Name_Type) return Word is
-  Full_Name: File_Name_Type;
-  Source_Index : Source_File_Index;
+  Full_Name   : File_Name_Type;
+  Source_Index: Source_File_Index;
+  Ada_Version_Current : Ada_Version_Type;
+  Internal_Unit   : constant Boolean := Is_Internal_File_Name (Fname);
 
begin
   Full_Name := Find_File (Fname, Osint.Source);
@@ -109,6 +112,15 @@ package body ALI.Util is
 
   Scanner.Initialize_Scanner (Source_Index);
 
+  --  The runtime files are precompiled with an implicitly defined Ada
+  --  version that we set here to improve the parsing required to compute
+  --  the checksum.
+
+  if Internal_Unit then
+ Ada_Version_Current := Ada_Version;
+ Ada_Version := Ada_Version_Runtime;
+  end if;
+
   --  Scan the complete file to compute its checksum
 
   loop
@@ -116,6 +128,12 @@ package body ALI.Util is
  exit when Token = Tok_EOF;
   end loop;
 
+  --  Restore the Ada version if we changed it
+
+  if Internal_Unit then
+ Ada_Version := Ada_Version_Current;
+  end if;
+
   return Scans.Checksum;
end Get_File_Checksum;
 
-- 
2.45.2
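
The patch above saves Ada_Version, switches it for internal units, and restores it after the scan. The same save/switch/restore idea can be sketched in C++ with a scope guard. Note that this swaps in a different technique (RAII) from the explicit assignments the patch uses, and every name below is a hypothetical stand-in for the binder's state.

```cpp
#include <cassert>

enum class AdaVersion { Ada2012, Ada2022 };

// Hypothetical stand-ins for the global version settings.
AdaVersion ada_version = AdaVersion::Ada2012;
const AdaVersion ada_version_runtime = AdaVersion::Ada2022;

// Switches the global version for the lifetime of the object, but only
// when 'active' is true (i.e. for internal/runtime units).
class ScopedRuntimeVersion {
public:
  explicit ScopedRuntimeVersion(bool active)
      : active_(active), saved_(ada_version) {
    if (active_)
      ada_version = ada_version_runtime;
  }
  ~ScopedRuntimeVersion() {
    if (active_)
      ada_version = saved_;
  }
  ScopedRuntimeVersion(const ScopedRuntimeVersion &) = delete;
  ScopedRuntimeVersion &operator=(const ScopedRuntimeVersion &) = delete;

private:
  bool active_;
  AdaVersion saved_;
};

// Returns the version that a scan of the file would observe.
AdaVersion version_seen_while_scanning(bool internal_unit) {
  ScopedRuntimeVersion guard(internal_unit);
  return ada_version;  // evaluated before the guard restores the old value
}

// Usage check: the global setting is unchanged once the guard is gone.
void check_version_restored() {
  AdaVersion before = ada_version;
  version_seen_while_scanning(true);
  assert(ada_version == before);
}
```

Either way the invariant is the same: the checksum of a runtime file is computed under the version it was compiled with, and the caller's setting is untouched afterwards.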

[COMMITTED 4/6] ada: Remove unused parameters in validity checking routine

2024-09-05 Thread Marc Poulhiès

From: Piotr Trojanek 

Code cleanup; semantics is unaffected.

gcc/ada/

* exp_util.ads, exp_util.adb (Duplicate_Subexpr_No_Checks):
Remove parameters, which are no longer used.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_util.adb | 18 ++
 gcc/ada/exp_util.ads | 16 +++-
 2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/gcc/ada/exp_util.adb b/gcc/ada/exp_util.adb
index 8e5cdb7332e..9b67384755a 100644
--- a/gcc/ada/exp_util.adb
+++ b/gcc/ada/exp_util.adb
@@ -5049,23 +5049,17 @@ package body Exp_Util is
-
 
function Duplicate_Subexpr_No_Checks
- (Exp   : Node_Id;
-  Name_Req  : Boolean   := False;
-  Renaming_Req  : Boolean   := False;
-  Related_Id: Entity_Id := Empty;
-  Is_Low_Bound  : Boolean   := False;
-  Is_High_Bound : Boolean   := False) return Node_Id
+ (Exp  : Node_Id;
+  Name_Req : Boolean := False;
+  Renaming_Req : Boolean := False) return Node_Id
is
   New_Exp : Node_Id;
 
begin
   Remove_Side_Effects
-(Exp   => Exp,
- Name_Req  => Name_Req,
- Renaming_Req  => Renaming_Req,
- Related_Id=> Related_Id,
- Is_Low_Bound  => Is_Low_Bound,
- Is_High_Bound => Is_High_Bound);
+(Exp  => Exp,
+ Name_Req => Name_Req,
+ Renaming_Req => Renaming_Req);
 
   New_Exp := New_Copy_Tree (Exp);
   Remove_Checks (New_Exp);
diff --git a/gcc/ada/exp_util.ads b/gcc/ada/exp_util.ads
index 279feb2e6fe..49e75c79d35 100644
--- a/gcc/ada/exp_util.ads
+++ b/gcc/ada/exp_util.ads
@@ -457,24 +457,14 @@ package Exp_Util is
--  following functions allow this behavior to be modified.
 
function Duplicate_Subexpr_No_Checks
- (Exp   : Node_Id;
-  Name_Req  : Boolean   := False;
-  Renaming_Req  : Boolean   := False;
-  Related_Id: Entity_Id := Empty;
-  Is_Low_Bound  : Boolean   := False;
-  Is_High_Bound : Boolean   := False) return Node_Id;
+ (Exp  : Node_Id;
+  Name_Req : Boolean := False;
+  Renaming_Req : Boolean := False) return Node_Id;
--  Identical in effect to Duplicate_Subexpr, except that Remove_Checks is
--  called on the result, so that the duplicated expression does not include
--  checks. This is appropriate for use when Exp, the original expression is
--  unconditionally elaborated before the duplicated expression, so that
--  there is no need to repeat any checks.
-   --
-   --  Related_Id denotes the entity of the context where Expr appears. Flags
-   --  Is_Low_Bound and Is_High_Bound specify whether the expression to check
-   --  is the low or the high bound of a range. These three optional arguments
-   --  signal Remove_Side_Effects to create an external symbol of the form
-   --  Chars (Related_Id)_FIRST/_LAST. For suggested use of these parameters
-   --  see the warning in the body of Sem_Ch3.Process_Range_Expr_In_Decl.
 
function Duplicate_Subexpr_Move_Checks
  (Exp  : Node_Id;
-- 
2.45.2

[COMMITTED 6/6] ada: Add bypass for internal fields on strict-alignment platforms

2024-09-05 Thread Marc Poulhiès

From: Eric Botcazou 

This is required to support misalignment of tagged types in legacy code.

gcc/ada/

* gcc-interface/trans.cc (addressable_p) : Add bypass
for internal fields on strict-alignment platforms.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 9e9f5f8dcba..92e000686fb 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -10295,9 +10295,14 @@ addressable_p (tree gnu_expr, tree gnu_type)
   is guaranteed to be not smaller than that of its most
   aligned field that is not a bit-field.  However, we need
   to cope with quirks of ABIs that may misalign fields.  */
-   && DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
-  >= default_field_alignment (TREE_OPERAND (gnu_expr, 1),
-  TREE_TYPE (gnu_expr)))
+   && (DECL_ALIGN (TREE_OPERAND (gnu_expr, 1))
+   >= default_field_alignment (TREE_OPERAND (gnu_expr, 1),
+   TREE_TYPE (gnu_expr))
+   /* We do not enforce this on strict-alignment platforms for
+  internal fields in order to keep supporting misalignment
+  of tagged types in legacy code.  */
+   || (!STRICT_ALIGNMENT
+   && DECL_INTERNAL_P (TREE_OPERAND (gnu_expr, 1)
   /* The field of a padding record is always addressable.  */
   || TYPE_IS_PADDING_P (TREE_TYPE (TREE_OPERAND (gnu_expr, 0
  && addressable_p (TREE_OPERAND (gnu_expr, 0), NULL_TREE));
-- 
2.45.2

Re: [PATCH] Vect: use a small step to calculate induction for the unrolled loop (PR tree-optimization/110449)

2024-09-05 Thread Richard Biener

On Thu, Jul 6, 2023 at 7:50 PM Richard Sandiford
 wrote:
>
> Richard Biener via Gcc-patches  writes:
> > On Wed, Jul 5, 2023 at 8:44 AM Hao Liu OS via Gcc-patches
> >  wrote:
> >>
> >> Hi,
> >>
> >> If a loop is unrolled by n times during vectorization, two steps are used to
> >> calculate the induction variable:
> >>   - The small step for the unrolled ith-copy: vec_1 = vec_iv + (VF/n * Step)
> >>   - The large step for the whole loop: vec_loop = vec_iv + (VF * Step)
> >>
> >> This patch calculates an extra vec_n to replace vec_loop:
> >>   vec_n = vec_prev + (VF/n * S) = vec_iv + (VF/n * S) * n = vec_loop.
> >>
> >> So that we can save the large step register and related operations.
> >
> > OK.  It would be nice to avoid the dead stmts created earlier though.
>
> FWIW, I still don't think we should do this.  Part of the point of
> unrolling is to shorten loop-carried dependencies, whereas this patch
> is going in the opposite direction.

Just to note when forcing SLP for the testcase added we're now back
with the shorter dependence (and a testsuite regression).  Either the
SLP path didn't exist or it wasn't updated.

I'll leave the testcase FAILing and will not work to carry over the
"optimization" to the SLP side (I agree that unrolling should avoid
those dependence chains).

Richard.

> Richard
>
> >
> > Thanks,
> > Richard.
> >
> >> gcc/ChangeLog:
> >>
> >> PR tree-optimization/110449
> >> * tree-vect-loop.cc (vectorizable_induction): use vec_n to replace
> >> vec_loop for the unrolled loop.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> * gcc.target/aarch64/pr110449.c: New testcase.
> >> ---
> >>  gcc/testsuite/gcc.target/aarch64/pr110449.c | 40 +
> >>  gcc/tree-vect-loop.cc   | 21 +--
> >>  2 files changed, 58 insertions(+), 3 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr110449.c
> >>
> >> diff --git a/gcc/testsuite/gcc.target/aarch64/pr110449.c 
> >> b/gcc/testsuite/gcc.target/aarch64/pr110449.c
> >> new file mode 100644
> >> index 000..bb3b6dcfe08
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/aarch64/pr110449.c
> >> @@ -0,0 +1,40 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-Ofast -mcpu=neoverse-n2 --param 
> >> aarch64-vect-unroll-limit=2" } */
> >> +/* { dg-final { scan-assembler-not "8.0e\\+0" } } */
> >> +
> >> +/* Calculate the vectorized induction with smaller step for an unrolled 
> >> loop.
> >> +
> >> +   before (suggested_unroll_factor=2):
> >> + fmovs30, 8.0e+0
> >> + fmovs31, 4.0e+0
> >> + dup v27.4s, v30.s[0]
> >> + dup v28.4s, v31.s[0]
> >> + .L6:
> >> + mov v30.16b, v31.16b
> >> + faddv31.4s, v31.4s, v27.4s
> >> + faddv29.4s, v30.4s, v28.4s
> >> + stp q30, q29, [x0]
> >> + add x0, x0, 32
> >> + cmp x1, x0
> >> + bne .L6
> >> +
> >> +   after:
> >> + fmovs31, 4.0e+0
> >> + dup v29.4s, v31.s[0]
> >> + .L6:
> >> + faddv30.4s, v31.4s, v29.4s
> >> + stp q31, q30, [x0]
> >> + add x0, x0, 32
> >> + faddv31.4s, v29.4s, v30.4s
> >> + cmp x0, x1
> >> + bne .L6  */
> >> +
> >> +void
> >> +foo2 (float *arr, float freq, float step)
> >> +{
> >> +  for (int i = 0; i < 1024; i++)
> >> +{
> >> +  arr[i] = freq;
> >> +  freq += step;
> >> +}
> >> +}
> >> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> >> index 3b46c58a8d8..706ecbffd0c 100644
> >> --- a/gcc/tree-vect-loop.cc
> >> +++ b/gcc/tree-vect-loop.cc
> >> @@ -10114,7 +10114,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >>new_vec, step_vectype, NULL);
> >>
> >>vec_def = induc_def;
> >> -  for (i = 1; i < ncopies; i++)
> >> +  for (i = 1; i < ncopies + 1; i++)
> >> {
> >>   /* vec_i = vec_prev + vec_step  */
> >>   gimple_seq stmts = NULL;
> >> @@ -10124,8 +10124,23 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >>   vec_def = gimple_convert (&stmts, vectype, vec_def);
> >>
> >>   gsi_insert_seq_before (&si, stmts, GSI_SAME_STMT);
> >> - new_stmt = SSA_NAME_DEF_STMT (vec_def);
> >> - STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> + if (i < ncopies)
> >> +   {
> >> + new_stmt = SSA_NAME_DEF_STMT (vec_def);
> >> + STMT_VINFO_VEC_STMTS (stmt_info).safe_push (new_stmt);
> >> +   }
> >> + else
> >> +   {
> >> + /* vec_1 = vec_iv + (VF/n * S)
> >> +vec_2 = vec_1 + (VF/n * S)
> >> +...
> >> +vec_n = vec_prev + (VF/n * S) = vec_iv + VF * S = vec_loop
> >> +
> >> +vec_n is used as vec_loop to save the large step register 
> >> and
> >> +related operations.  */
> >> +

Re: [PATCH] RISC-V: Lookup reversely in riscv_select_multilib_by_abi

2024-09-05 Thread Kito Cheng

LGTM, thanks for catching this, but commit log seems not right?
should it be -print-multi-directory or -print-multi-os-directory
rather than --print-multilib-os-dir?
(I guess should be -print-multi-directory per your output)

Anyway, you can go ahead and push that after the fix:)


On Thu, Sep 5, 2024 at 3:30 PM YunQiang Su  wrote:
>
> From: YunQiang Su 
>
> When use --print-multilib-os-dir, gcc outputs different value
> with full -march option and the base one only.
>
> $ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc
> lib64/lp64d
>
> $ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc_zba
> .
>
> The reason is that in multilib.h, the fallback value of multilib
> is listed as the 1st one in `multilib_raw[]`.
>
> gcc
> * common/config/riscv/riscv-common.cc(riscv_select_multilib_by_abi):
> look up reversely as the fallback path is listed as the 1st one.
> ---
>  gcc/common/config/riscv/riscv-common.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 62c6e1dab1f..2c1ce7fc7cb 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -2079,7 +2079,7 @@ riscv_select_multilib_by_abi (
>const std::string &riscv_current_abi_str,
>const std::vector &multilib_infos)
>  {
> -  for (size_t i = 0; i < multilib_infos.size (); ++i)
> +  for (ssize_t i = multilib_infos.size (); i >= 0; --i)
>  if (riscv_current_abi_str == multilib_infos[i].abi_str)
>return xstrdup (multilib_infos[i].path.c_str ());
>
> --
> 2.39.3 (Apple Git-146)
>
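
A reverse lookup of the kind the quoted patch aims for can be sketched as below. This is a minimal model, not the riscv-common.cc code: MultilibInfo is a hypothetical two-field struct and the sample entries are made up. Note that the loop as quoted initializes i to multilib_infos.size (), which is one past the last valid index on the first iteration; the sketch starts at the last element instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal record; the real structure has more fields.
struct MultilibInfo {
  std::string abi_str;
  std::string path;
};

// Returns the path of the LAST entry whose ABI matches, or an empty string
// if there is none.  'i-- > 0' visits size()-1 down to 0 and stays in
// bounds even with an unsigned index and an empty vector.
std::string select_multilib_by_abi_reverse(
    const std::string &abi, const std::vector<MultilibInfo> &infos) {
  for (std::size_t i = infos.size(); i-- > 0;)
    if (infos[i].abi_str == abi)
      return infos[i].path;
  return std::string();
}

// Usage check with made-up data: the fallback entry "." is listed first,
// so scanning from the back prefers the more specific directory.
void check_reverse_lookup() {
  std::vector<MultilibInfo> infos = {
      {"lp64d", "."}, {"lp64", "lib64/lp64"}, {"lp64d", "lib64/lp64d"}};
  assert(select_multilib_by_abi_reverse("lp64d", infos) == "lib64/lp64d");
}
```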

[PATCH] testsuite: Fix xorsign.c, vect-double-2.c fails with -march=x86-64-v2

2024-09-05 Thread Hu, Lin1

Hi, all

These testcases fail with -march=x86-64-v2, so add -mno-sse4 to avoid
the unexpected failures.

Bootstrap and regtest running on x86-64-linux-gnu, pushed as obvious.

BRs,
Lin

gcc/testsuite/ChangeLog:

PR testsuite/116608
* gcc.target/i386/vect-double-2.c: Add extra option -mno-sse4
* gcc.target/i386/xorsign.c: Ditto.
---
 gcc/testsuite/gcc.target/i386/vect-double-2.c | 2 +-
 gcc/testsuite/gcc.target/i386/xorsign.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/vect-double-2.c b/gcc/testsuite/gcc.target/i386/vect-double-2.c
index eea53bfa6b1..065d2e5af08 100644
--- a/gcc/testsuite/gcc.target/i386/vect-double-2.c
+++ b/gcc/testsuite/gcc.target/i386/vect-double-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats" } */
+/* { dg-options "-O2 -ftree-vectorize -fno-tree-loop-distribute-patterns -mfpmath=sse -msse2 -mtune=atom -fdump-tree-vect-stats -mno-sse4" } */
 
 extern void abort (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/xorsign.c b/gcc/testsuite/gcc.target/i386/xorsign.c
index ebed5edccb6..f280dd20d7b 100644
--- a/gcc/testsuite/gcc.target/i386/xorsign.c
+++ b/gcc/testsuite/gcc.target/i386/xorsign.c
@@ -1,5 +1,5 @@
 /* { dg-do run { target sse2_runtime } } */
-/* { dg-options "-O2 -msse2 -mfpmath=sse -ftree-vectorize -fdump-tree-vect-details -save-temps" } */
+/* { dg-options "-O2 -msse2 -mfpmath=sse -ftree-vectorize -fdump-tree-vect-details -save-temps -mno-sse4" } */
 
 extern void abort ();
 
-- 
2.39.1

Re: [PATCH] RISC-V: Lookup reversely in riscv_select_multilib_by_abi

2024-09-05 Thread YunQiang Su

Kito Cheng  于2024年9月5日周四 16:36写道：
>
> LGTM, thanks for catching this, but commit log seems not right?
> should it be -print-multi-directory or -print-multi-os-directory
> rather than --print-multilib-os-dir?

Yes. It is a typo.
I used `--print-multilib-os-dir`, and yes, as you said, `-print-multi-directory`
has the same problem.

> (I guess should be -print-multi-directory per your output)
>
> Anyway, you can go ahead and push that after the fix:)
>
>
> On Thu, Sep 5, 2024 at 3:30 PM YunQiang Su  wrote:
> >
> > From: YunQiang Su 
> >
> > When use --print-multilib-os-dir, gcc outputs different value
> > with full -march option and the base one only.
> >
> > $ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc
> > lib64/lp64d
> >
> > $ ./gcc/xgcc --print-multilib-os-dir -mabi=lp64d -march=rv64gc_zba
> > .
> >
> > The reason is that in multilib.h, the fallback value of multilib
> > is listed as the 1st one in `multilib_raw[]`.
> >
> > gcc
> > * common/config/riscv/riscv-common.cc(riscv_select_multilib_by_abi):
> > look up reversely as the fallback path is listed as the 1st one.
> > ---
> >  gcc/common/config/riscv/riscv-common.cc | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/gcc/common/config/riscv/riscv-common.cc 
> > b/gcc/common/config/riscv/riscv-common.cc
> > index 62c6e1dab1f..2c1ce7fc7cb 100644
> > --- a/gcc/common/config/riscv/riscv-common.cc
> > +++ b/gcc/common/config/riscv/riscv-common.cc
> > @@ -2079,7 +2079,7 @@ riscv_select_multilib_by_abi (
> >const std::string &riscv_current_abi_str,
> >const std::vector &multilib_infos)
> >  {
> > -  for (size_t i = 0; i < multilib_infos.size (); ++i)
> > +  for (ssize_t i = multilib_infos.size (); i >= 0; --i)
> >  if (riscv_current_abi_str == multilib_infos[i].abi_str)
> >return xstrdup (multilib_infos[i].path.c_str ());
> >
> > --
> > 2.39.3 (Apple Git-146)
> >

[PATCH] tree-optimization/116609 - SLP live lane vectorization with partial vectors

2024-09-05 Thread Richard Biener

The following implements the simple case of single-lane SLP when
using partial vectors which can use the VEC_EXTRACT_LAST code
generation without changes.  I'll keep the PR open for further
enhancements.

This avoids FAILs of gcc.target/aarch64/sve/live_1.c when using
single-lane SLP for non-grouped stores.

Bootstrap and regtest on x86_64-unknown-linux-gnu in progress.

PR tree-optimization/116609
* tree-vect-loop.cc (vectorizable_live_operation_1): Support
partial vectors for single-lane SLP.
---
 gcc/tree-vect-loop.cc | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 242d5e2d916..62cf9205059 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10961,7 +10961,8 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 
 where VEC_LHS is the vectorized live-out result and MASK is
 the loop mask for the final iteration.  */
-  gcc_assert (ncopies == 1 && !slp_node);
+  gcc_assert (ncopies == 1
+ && (!slp_node || SLP_TREE_LANES (slp_node) == 1));
   gimple_seq tem = NULL;
   gimple_stmt_iterator gsi = gsi_last (tem);
   tree len = vect_get_loop_len (loop_vinfo, &gsi,
@@ -10995,7 +10996,7 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 
 where VEC_LHS is the vectorized live-out result and MASK is
 the loop mask for the final iteration.  */
-  gcc_assert (!slp_node);
+  gcc_assert (!slp_node || SLP_TREE_LANES (slp_node) == 1);
   tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
   gimple_seq tem = NULL;
   gimple_stmt_iterator gsi = gsi_last (tem);
@@ -11147,7 +11148,7 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   /* No transformation required.  */
   if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
{
- if (slp_node)
+ if (slp_node && SLP_TREE_LANES (slp_node) != 1)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -11166,7 +11167,8 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
}
  else
{
- gcc_assert (ncopies == 1 && !slp_node);
+ gcc_assert (ncopies == 1
+ && (!slp_node || SLP_TREE_LANES (slp_node) == 1));
  if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
  OPTIMIZE_FOR_SPEED))
vect_record_loop_mask (loop_vinfo,
@@ -11213,8 +11215,9 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info,
   if (slp_node)
 {
   gcc_assert (!loop_vinfo
- || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
- && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
+ || ((!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+  && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ || SLP_TREE_LANES (slp_node) == 1));
 
   /* Get the correct slp vectorized stmt.  */
   vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
-- 
2.43.0

[PATCH] x86: Refine V4BF/V2BF FMA testcase

2024-09-05 Thread Levy Hsu

Simple testcase fix, ok for trunk?

This patch removes specific register checks to account for possible
register spills and disables tests in 32-bit mode. This adjustment
is necessary because V4BF operations in 32-bit mode require duplicating
instructions, which leads to unintended test failures. It fixes the
case when testing with --target_board='unix{-m32\ -march=cascadelake}'.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c: Remove specific
register checks to account for potential register spills. Exclude tests
in 32-bit mode to prevent incorrect failure reports due to the need for
multiple instruction executions in handling V4BF operations.
---
 .../gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
index 72e17e99603..17c32c1d36b 100644
--- a/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx10_2-partial-bf-vector-fma-1.c
@@ -1,9 +1,9 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx10.2 -O2" } */
-/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
-/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 2 } } */
+/* { dg-final { scan-assembler-times "vfmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132nepbf16\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
 
 typedef __bf16 v4bf __attribute__ ((__vector_size__ (8)));
 typedef __bf16 v2bf __attribute__ ((__vector_size__ (4)));
-- 
2.31.1

[patch][v2] Fortran: Add OpenMP 'interop' directive parsing support

2024-09-05 Thread Tobias Burnus

Now also supports the following (note the variable name): 
'init(targetsync, target)' – and I fixed an ICE when the variable 
parsing failed.


Comments before I commit it?

Tobias

Tobias Burnus wrote:
This patch adds Fortran parsing support for OpenMP's 'interop' 
directive (which stops with a 'sorry' in trans-openmp.cc as the middle 
end support is still missing).


Tested on x86-64-gnu-linux.

Comments, suggestions, remarks?

* * *

Background:

'interop' makes it easier to call, e.g., a CUDA-BLAS function directly 
as it permits to map an OpenMP device number (→ "target" modifier 
required) to the "foreign runtime" device number or to get directly a 
stream object (→ if "targetsync" modifier specified) with dependency 
tracking.


Just calling '!$omp interop init(obj)' works but that leaves the 
decision which type of object should be returned to the run time.


Using 'prefer_type', the user can ask for a specific type. Permitted is
a string such as "hip" or an integer constant such as
omp_ifr_cuda_driver – and the old-style syntax is 'prefer_type(<integer expr|literal string> [ ,  ...])'.  [Note
that a constant integer expression is permitted.]


The new syntax permits additional attributes, for instance for 'sycl'
requesting an 'in-order' queue (instead of the default 'out-of-order'
queue) when obtaining a stream. The new syntax is 'prefer_type( {...}
[, {...} ... ] )' where '{ ... }' is a list of either
'attr("ompx_...")' (i.e. 'attr(...)' with a literal string arg that
starts with ompx_ and does not contain a ',') or
'fr()' where the identifier is an integer
constant. 'fr' can be present or not, but only once per {...}, while
multiple 'attr' may be used. [Note that as non-string only an
identifier is permitted (i.e. an integer parameter).]


I settled on the encoding of the string used here – but I am open to
other representations as well. In my WIP/RFC patch it is used as shown
in plugin-*.c in the patch
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661207.html


The available foreign runtimes and the values that can be returned
are hidden in that patch and more readable in the documentation patch
at https://gcc.gnu.org/pipermail/gcc-patches/2024-August/661365.html


If someone wants to delve into the details of the 'interop' feature: 
Have a look at OpenMP 5.1 (5.2) *and* TR13 and the additional 
definition document at https://www.openmp.org/specifications/ ('hsa': 
publishing pending).


* * *

Tobias

PS: In the dump, I am a bit lazy and add a spurious trailing ','. As it
is only a dump, I decided that adding a bunch of checks to ensure that a
',' only gets printed if needed is not really required. If you think
otherwise, I can surely add a bunch of 'if' and only print it
conditionally.


PPS: In order to use 'interop', mainly the part in the middle is
missing, i.e. some middle-end gimplification with a call into libgomp
– and the libgomp function. A stub version of the latter and some
(loosely) tested plugin handling does exist as WIP/RFC patch, see
patch link above. - Besides gimplify and the libgomp function, a bunch
of tests and, obviously, the C and C++ FE counterpart to this patch
have to be implemented.

Fortran: Add OpenMP 'interop' directive parsing support

Parse OpenMP's 'interop' directive but stop with a 'sorry, unimplemented'
after resolving.

Additionally, it moves some clause dumping away from the end directive as
that led to 'nowait' not being printed when it should be, because some cases
were missed.

gcc/fortran/ChangeLog:

	* dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_INIT.
	(show_omp_clauses): Handle OMP_LIST_{INIT,USE,DESTROY}; move 'nowait'
	from end-directive to the directive dump.
	(show_omp_node, show_code_node): Handle EXEC_OMP_INTEROP.
	* gfortran.h (enum gfc_statement): Add ST_OMP_INTEROP.
	(OMP_LIST_INIT, OMP_LIST_USE, OMP_LIST_DESTROY): Add.
	(enum gfc_exec_op): Add EXEC_OMP_INTEROP.
	(struct gfc_omp_namelist): Add interop items to union.
	(gfc_free_omp_namelist): Add boolean arg.
	* match.cc (gfc_free_omp_namelist): Update to free
	interop union members.
	* match.h (gfc_match_omp_interop): New.
	* openmp.cc (gfc_omp_directives): Uncomment 'interop' entry.
	(gfc_free_omp_clauses, gfc_match_omp_allocate,
	gfc_match_omp_flush, gfc_match_omp_clause_reduction): Update
	call.
	(enum omp_mask2): Add OMP_CLAUSE_{INIT,USE,DESTROY}.
	(OMP_INTEROP_CLAUSES): Use it.
	(gfc_match_omp_clauses): Match those clauses.
	(gfc_match_omp_prefer_type, gfc_match_omp_init,
	gfc_match_omp_interop): New.
	(resolve_omp_clauses): Handle interop clauses.
	(omp_code_to_statement): Add ST_OMP_INTEROP.
	(gfc_resolve_omp_directive): Add EXEC_OMP_INTEROP.
	* parse.cc (decode_omp_directive): Parse 'interop' directive.
	(next_statement, gfc_ascii_statement): Handle ST_OMP_INTEROP.
	* st.cc (gfc_free_statement): Likewise
	* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_INTEROP.
	* trans.cc (trans_code): Likewise.
	* trans-openmp.cc (gfc_trans_omp_directive): Print 'sorry'
	for EXEC_OMP_

[PATCH] tree-optimization/116610 - wrong SLP induction bias for mask peeling

2024-09-05 Thread Richard Biener

The following fixes a mistake when applying the bias for peeling via
masking to the initial value of SLP inductions.
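
To illustrate the direction of the bias, here is a minimal scalar sketch.
It is a model only, not code from tree-vect-loop.cc, and every name in it
is hypothetical.  It assumes that with peeling via masking the first
`skip' lanes of the first vector iteration are masked off, so the first
active lane has to hold the scalar initial value; that requires the
adjustment to be subtracted from the lane multiplier rather than added.

```cpp
#include <cstdint>
#include <vector>

// Scalar model of the first vector of an induction when the loop is
// peeled via masking.  Lanes [0, skip) are inactive, so lane `skip`
// must hold the scalar initial value `init`.
// All names are hypothetical and only illustrate the arithmetic.
std::vector<int64_t>
first_induction_vector (int64_t init, int64_t step, int lanes, int skip)
{
  std::vector<int64_t> v (lanes);
  for (int lane = 0; lane < lanes; ++lane)
    {
      // Lane multiplier with the peeling adjustment subtracted
      // (the role MINUS_EXPR plays in the fix).  Adding `skip`
      // instead would move every lane the wrong way.
      int64_t step_mul = lane - skip;
      v[lane] = init + step_mul * step;
    }
  return v;
}
```

With init 10, step 3, four lanes and two skipped lanes, lane 2 holds 10
and lane 3 holds 13, which is what the scalar loop would compute for its
first two iterations.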

This resolves gcc.target/aarch64/sve/peel_ind_1.c (unfortunately a
scan-assembler-only test) when forcing single-lane SLP for it.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

PR tree-optimization/116610
* tree-vect-loop.cc (vectorizable_induction): Use MINUS_EXPR
to apply a mask peeling adjustment.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 62cf9205059..c981ab656ae 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10543,7 +10543,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
  vec_steps.safe_push (vec_step);
  tree step_mul = gimple_build_vector (&init_stmts, &mul_elts);
  if (peel_mul)
-   step_mul = gimple_build (&init_stmts, PLUS_EXPR, step_vectype,
+   step_mul = gimple_build (&init_stmts, MINUS_EXPR, step_vectype,
 step_mul, peel_mul);
  if (!init_node)
vec_init = gimple_build_vector (&init_stmts, &init_elts);
-- 
2.43.0

[PATCH] [AARCH64] adjust gcc.target/aarch64/sve/mask_gather_load_7.c

2024-09-05 Thread Richard Biener

The following adjusts the scan-assembler to also allow predicate
registers p8-15 to be used for the destination of the compares.
I see that code generation with a pending vectorizer patch (the
only assembler change is different predicate register allocation).
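
As a rough illustration of why the old pattern is too strict, the sketch
below checks a made-up assembler line against both destination patterns.
It uses std::regex rather than the Tcl regexps the testsuite uses, and the
helper name and the sample lines are assumptions for demonstration only.

```cpp
#include <regex>
#include <string>

// Returns true if `line` contains a .h cmpeq whose destination predicate
// register is matched by `dest_pattern`.  The governing predicate is
// still restricted to p0-p7, as in the test.
// Hypothetical helper, not part of the GCC testsuite.
bool
matches_cmpeq (const std::string &line, const std::string &dest_pattern)
{
  const std::regex re ("\\tcmpeq\\t" + dest_pattern
                       + "\\.h, p[0-7]/z, z[0-9]+\\.h, z[0-9]+\\.h");
  return std::regex_search (line, re);
}
```

A line such as "\tcmpeq\tp15.h, p0/z, z1.h, z2.h" is rejected by
"p[0-7]" but accepted by "p[0-9]+", while a p3 destination is accepted
by both.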

Tested on aarch64.

OK for trunk?

Thanks,
Richard.

* gcc.target/aarch64/sve/mask_gather_load_7.c: Allow
p8-15 to be used for the destination of the compares.
---
 gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c 
b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
index c31fae308a5..7812ae7c928 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
@@ -41,13 +41,13 @@
 TEST_ALL (TEST_LOOP)
 
 /* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, \[x[0-9]+, 
x[0-9]+, lsl 1\]\n} 36 } } */
-/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.h, p[0-7]/z, 
z[0-9]+\.h, z[0-9]+\.h\n} 12 } } */
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.h, p[0-7]/z, 
z[0-9]+\.h, z[0-9]+\.h\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-9]+\.h, p[0-7]/z, 
z[0-9]+\.h, z[0-9]+\.h\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.h, p[0-7]/z, 
z[0-9]+\.h, z[0-9]+\.h\n} 6 } } */
 /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, 
z[0-9]+\.s, sxtw 2\]\n} 18 } } */
 /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, 
z[0-9]+\.s, uxtw 2\]\n} 18 } } */
 
 /* Also used for the TEST32 indices.  */
 /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, 
x[0-9]+, lsl 2\]\n} 72 } } */
-/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, 
z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
-/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, 
z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-9]+\.s, p[0-7]/z, 
z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, 
z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
 /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, 
z[0-9]+\.d, lsl 3\]\n} 36 } } */
-- 
2.43.0

Re: [PATCH][testsuite]: remove -fwrapv from signbit-5.c

2024-09-05 Thread Torbjorn SVENSSON





On 2024-09-05 01:02, Jeff Law wrote:



On 9/4/24 1:13 AM, Torbjorn SVENSSON wrote:



On 2024-09-03 20:23, Richard Biener wrote:



On 03.09.2024 at 19:00, Tamar Christina wrote:


Hi All,

The meaning of the testcase was changed by passing it -fwrapv.  The reason for
the test failures on some platforms was that the test was testing some
implementation-defined behavior wrt INT_MIN in generic code.

Instead of using -fwrapv this just removes the border case from the test so
all the values now have a defined semantic.  It still relies on the handling of
shifting a negative value right, but that wasn't changed with -fwrapv anyway.


The -fwrapv case is being handled already by other testcases.

Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?


Ok


As my patch (r14-10592) that was adding -fwrapv also got backported to 
releases/gcc-14, I assume that this patch should also be backported.


Do you want me to do the backport or will you manage it?

If you have commit privs, go right ahead ;-)


Backported as r14-10643.



Jeff

[PATCH] libsanitizer: On aarch64 use hint #34 in prologue of libsanitizer functions

2024-09-05 Thread Jakub Jelinek

Hi!

When gcc is built with -mbranch-protection=standard, running sanitized
programs doesn't work properly on bti enabled kernels.

This has been fixed upstream with
https://github.com/llvm/llvm-project/pull/84061

The following patch cherry picks that from upstream, ok for trunk/14.3?

For trunk we should eventually do a full merge from upstream, but I'm hoping
they will first fix up the _BitInt libubsan support mess.
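
For readers not familiar with the macros, here is a small stand-alone
sketch of how the stringified prologue is put together.  The macro and
function names are stand-ins (the real ones live in sanitizer_asm.h and
depend on the target), and `hint #34' is, as far as I know, the hint-space
encoding of `bti c', so it also assembles with tools that do not know the
BTI mnemonic.

```cpp
#include <string>

// Two-level stringification so that the macro argument is expanded first.
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

// Stand-in for the CFI directive macro; assumed to expand like this.
#define CFI_STARTPROC .cfi_startproc

// Prologue text without and with the landing pad appended.
#define PLAIN_STARTPROC STRINGIFY(CFI_STARTPROC)
#define BTI_STARTPROC STRINGIFY(CFI_STARTPROC) "\nhint #34"

// Hypothetical accessors so the resulting strings can be inspected.
std::string plain_prologue () { return PLAIN_STARTPROC; }
std::string bti_prologue () { return BTI_STARTPROC; }
```

The BTI variant only adds one extra line to the asm string, so the
trampoline starts with a valid indirect-branch target.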

2024-09-05  Jakub Jelinek  

* sanitizer_common/sanitizer_asm.h: Cherry-pick llvm-project revision
1c792d24e0a228ad49cc004a1c26bbd7cd87f030.
* interception/interception.h: Likewise.

--- libsanitizer/sanitizer_common/sanitizer_asm.h
+++ libsanitizer/sanitizer_common/sanitizer_asm.h
@@ -42,6 +42,16 @@
 # define CFI_RESTORE(reg)
 #endif
 
+#if defined(__aarch64__) && defined(__ARM_FEATURE_BTI_DEFAULT)
+# define ASM_STARTPROC CFI_STARTPROC; hint #34
+# define C_ASM_STARTPROC SANITIZER_STRINGIFY(CFI_STARTPROC) "\nhint #34"
+#else
+# define ASM_STARTPROC CFI_STARTPROC
+# define C_ASM_STARTPROC SANITIZER_STRINGIFY(CFI_STARTPROC)
+#endif
+#define ASM_ENDPROC CFI_ENDPROC
+#define C_ASM_ENDPROC SANITIZER_STRINGIFY(CFI_ENDPROC)
+
 #if defined(__x86_64__) || defined(__i386__) || defined(__sparc__)
 # define ASM_TAIL_CALL jmp
 #elif defined(__arm__) || defined(__aarch64__) || defined(__mips__) || \
@@ -114,9 +124,9 @@
  .globl __interceptor_trampoline_##name;   
\
  ASM_TYPE_FUNCTION(__interceptor_trampoline_##name);   
\
  __interceptor_trampoline_##name:  
\
- CFI_STARTPROC;
\
+ ASM_STARTPROC;
\
  ASM_TAIL_CALL ASM_PREEMPTIBLE_SYM(__interceptor_##name);  
\
- CFI_ENDPROC;  
\
+ ASM_ENDPROC;  
\
  ASM_SIZE(__interceptor_trampoline_##name)
 #  define ASM_INTERCEPTOR_TRAMPOLINE_SUPPORT 1
 # endif  // Architecture supports interceptor trampoline
--- libsanitizer/interception/interception.h
+++ libsanitizer/interception/interception.h
@@ -204,11 +204,11 @@ const interpose_substitution substitution_##func_name[]   
  \
".type  " SANITIZER_STRINGIFY(TRAMPOLINE(func)) ", "
\
  ASM_TYPE_FUNCTION_STR "\n"
\
SANITIZER_STRINGIFY(TRAMPOLINE(func)) ":\n" 
\
-   SANITIZER_STRINGIFY(CFI_STARTPROC) "\n" 
\
+   C_ASM_STARTPROC "\n"
\
C_ASM_TAIL_CALL(SANITIZER_STRINGIFY(TRAMPOLINE(func)),  
\
"__interceptor_"
\
  SANITIZER_STRINGIFY(ASM_PREEMPTIBLE_SYM(func))) "\n"  
\
-   SANITIZER_STRINGIFY(CFI_ENDPROC) "\n"   
\
+   C_ASM_ENDPROC "\n"  
\
".size  " SANITIZER_STRINGIFY(TRAMPOLINE(func)) ", "
\
 ".-" SANITIZER_STRINGIFY(TRAMPOLINE(func)) "\n"
\
  );

Jakub

Re: [PATCH] libsanitizer: On aarch64 use hint #34 in prologue of libsanitizer functions

2024-09-05 Thread Richard Sandiford

Jakub Jelinek  writes:
> Hi!
>
> When gcc is built with -mbranch-protection=standard, running sanitized
> programs doesn't work properly on bti enabled kernels.
>
> This has been fixed upstream with
> https://github.com/llvm/llvm-project/pull/84061
>
> The following patch cherry picks that from upstream, ok for trunk/14.3?

Yes, thanks!

Richard

> For trunk we should eventually do a full merge from upstream, but I'm hoping
> they will first fix up the _BitInt libubsan support mess.
>
> 2024-09-05  Jakub Jelinek  
>
>   * sanitizer_common/sanitizer_asm.h: Cherry-pick llvm-project revision
>   1c792d24e0a228ad49cc004a1c26bbd7cd87f030.
>   * interception/interception.h: Likewise.
>
> --- libsanitizer/sanitizer_common/sanitizer_asm.h
> +++ libsanitizer/sanitizer_common/sanitizer_asm.h
> @@ -42,6 +42,16 @@
>  # define CFI_RESTORE(reg)
>  #endif
>  
> +#if defined(__aarch64__) && defined(__ARM_FEATURE_BTI_DEFAULT)
> +# define ASM_STARTPROC CFI_STARTPROC; hint #34
> +# define C_ASM_STARTPROC SANITIZER_STRINGIFY(CFI_STARTPROC) "\nhint #34"
> +#else
> +# define ASM_STARTPROC CFI_STARTPROC
> +# define C_ASM_STARTPROC SANITIZER_STRINGIFY(CFI_STARTPROC)
> +#endif
> +#define ASM_ENDPROC CFI_ENDPROC
> +#define C_ASM_ENDPROC SANITIZER_STRINGIFY(CFI_ENDPROC)
> +
>  #if defined(__x86_64__) || defined(__i386__) || defined(__sparc__)
>  # define ASM_TAIL_CALL jmp
>  #elif defined(__arm__) || defined(__aarch64__) || defined(__mips__) || \
> @@ -114,9 +124,9 @@
>   .globl __interceptor_trampoline_##name; 
>   \
>   ASM_TYPE_FUNCTION(__interceptor_trampoline_##name); 
>   \
>   __interceptor_trampoline_##name:
>   \
> - CFI_STARTPROC;  
>   \
> + ASM_STARTPROC;  
>   \
>   ASM_TAIL_CALL ASM_PREEMPTIBLE_SYM(__interceptor_##name);
>   \
> - CFI_ENDPROC;
>   \
> + ASM_ENDPROC;
>   \
>   ASM_SIZE(__interceptor_trampoline_##name)
>  #  define ASM_INTERCEPTOR_TRAMPOLINE_SUPPORT 1
>  # endif  // Architecture supports interceptor trampoline
> --- libsanitizer/interception/interception.h
> +++ libsanitizer/interception/interception.h
> @@ -204,11 +204,11 @@ const interpose_substitution substitution_##func_name[] 
> \
> ".type  " SANITIZER_STRINGIFY(TRAMPOLINE(func)) ", "  
>   \
>   ASM_TYPE_FUNCTION_STR "\n"  
>   \
> SANITIZER_STRINGIFY(TRAMPOLINE(func)) ":\n"   
>   \
> -   SANITIZER_STRINGIFY(CFI_STARTPROC) "\n"   
>   \
> +   C_ASM_STARTPROC "\n"  
>   \
> C_ASM_TAIL_CALL(SANITIZER_STRINGIFY(TRAMPOLINE(func)),
>   \
> "__interceptor_"  
>   \
>   SANITIZER_STRINGIFY(ASM_PREEMPTIBLE_SYM(func))) 
> "\n"  \
> -   SANITIZER_STRINGIFY(CFI_ENDPROC) "\n" 
>   \
> +   C_ASM_ENDPROC "\n"
>   \
> ".size  " SANITIZER_STRINGIFY(TRAMPOLINE(func)) ", "  
>   \
>  ".-" SANITIZER_STRINGIFY(TRAMPOLINE(func)) "\n"  
>   \
>   );
>
>   Jakub

[PATCH V4 04/10] arm: Fix arm backend-use of (u|s|us)dot_prod patterns

2024-09-05 Thread Victor Do Nascimento



Changes from previous revision:

As was done for the equivalent aarch64 patch, we rework this patch to do away
with mission creep, keeping changes as simple as possible.

We thus remove the `gimple_fold_builtin' changes that would have replaced the
dot-product builtin calls with DOT_PROD_EXPRs as well as the novel
initialization mechanism for dot-product builtins, choosing instead to
redirect the single-mode CODE_FOR_neon_(u|s|us)dot* values generated from
`arm_neon_builtins.def' to their new 2-mode equivalents.

Regression tested on arm-none-linux-gnueabihf, no new failures identified.

--

Given recent changes to the dot_prod standard pattern name, this patch
fixes the arm back-end by implementing the following changes:

1. Add 2nd mode to all patterns relating to the dot-product in .md
files.
2. redirect the single-mode CODE_FOR_neon_(u|s|us)dot values
generated from `arm_neon_builtins.def' to their new 2-mode
equivalents via means of simple aliases, as per the following example:

  constexpr insn_code CODE_FOR_neon_sdotv8qi
= CODE_FOR_neon_sdotv2siv8qi;
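
A self-contained sketch of the aliasing idea follows.  The enum here is a
hypothetical stand-in for the generated insn_code enum and contains only
two of the new 2-mode names; the point is that a token-pasting macro which
still produces the old single-mode name resolves to the new value.

```cpp
// Hypothetical stand-in for the generated insn_code enum.
enum insn_code
{
  CODE_FOR_nothing,
  CODE_FOR_neon_sdotv2siv8qi,
  CODE_FOR_neon_udotv2siv8qi
};

// Old single-mode names kept alive as aliases of the 2-mode codes.
constexpr insn_code CODE_FOR_neon_sdotv8qi = CODE_FOR_neon_sdotv2siv8qi;
constexpr insn_code CODE_FOR_neon_udotv8qi = CODE_FOR_neon_udotv2siv8qi;

// Token pasting as done by the builtin tables: name plus a single mode.
#define CF(N, X) CODE_FOR_neon_##N##X

constexpr insn_code sdot_code = CF (sdot, v8qi);
constexpr insn_code udot_code = CF (udot, v8qi);

static_assert (sdot_code == CODE_FOR_neon_sdotv2siv8qi, "alias resolves");
static_assert (udot_code == CODE_FOR_neon_udotv2siv8qi, "alias resolves");
```

This keeps the .def file and the macros untouched, at the cost of one
alias per builtin and mode.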

gcc/ChangeLog:

* config/arm/neon.md (dot_prod): Renamed to...
(dot_prod): ...this.
(neon_dot): Renamed to...
(neon_dot): ...this.
(neon_usdot): Renamed to...
(neon_usdot): ...this.
(usdot_prod): Renamed to...
(usdot_prod): ...this.
* config/arm/arm-builtins.cc
(CODE_FOR_neon_sdotv8qi): Define as alias to
new CODE_FOR_neon_sdotv2siv8qi.
(CODE_FOR_neon_udotv8qi): Define as alias to
new CODE_FOR_neon_udotv2siv8qi.
(CODE_FOR_neon_usdotv8qi): Define as alias to
new CODE_FOR_neon_usdotv2siv8qi.
(CODE_FOR_neon_sdotv16qi): Define as alias to
new CODE_FOR_neon_sdotv4siv16qi.
(CODE_FOR_neon_udotv16qi): Define as alias to
new CODE_FOR_neon_udotv4siv16qi.
(CODE_FOR_neon_usdotv16qi): Define as alias to
new CODE_FOR_neon_usdotv4siv16qi.
---
 gcc/config/arm/arm-builtins.cc | 7 +++
 gcc/config/arm/neon.md | 8 
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm-builtins.cc b/gcc/config/arm/arm-builtins.cc
index c9d50bf8fbb..74cea8900b4 100644
--- a/gcc/config/arm/arm-builtins.cc
+++ b/gcc/config/arm/arm-builtins.cc
@@ -908,6 +908,13 @@ typedef struct {
   enum arm_type_qualifiers *qualifiers;
 } arm_builtin_datum;
 
+constexpr insn_code CODE_FOR_neon_sdotv8qi = CODE_FOR_neon_sdotv2siv8qi;
+constexpr insn_code CODE_FOR_neon_udotv8qi = CODE_FOR_neon_udotv2siv8qi;
+constexpr insn_code CODE_FOR_neon_usdotv8qi = CODE_FOR_neon_usdotv2siv8qi;
+constexpr insn_code CODE_FOR_neon_sdotv16qi = CODE_FOR_neon_sdotv4siv16qi;
+constexpr insn_code CODE_FOR_neon_udotv16qi = CODE_FOR_neon_udotv4siv16qi;
+constexpr insn_code CODE_FOR_neon_usdotv16qi = CODE_FOR_neon_usdotv4siv16qi;
+
 #define CF(N,X) CODE_FOR_neon_##N##X
 
 #define VAR1(T, N, A) \
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fa4a7aeda35..6892b7b0f44 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2989,7 +2989,7 @@ (define_expand "cmul3"
 ;; ...
 ;;
 ;; and so the vectorizer provides r, in which the result has to be accumulated.
-(define_insn "dot_prod"
+(define_insn "dot_prod"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
(plus:VCVTI
  (unspec:VCVTI [(match_operand: 1 "register_operand" "w")
@@ -3002,7 +3002,7 @@ (define_insn "dot_prod"
 )
 
 ;; These instructions map to the __builtins for the Dot Product operations
-(define_expand "neon_dot"
+(define_expand "neon_dot"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
(plus:VCVTI
  (unspec:VCVTI [(match_operand: 2 "register_operand")
@@ -3013,7 +3013,7 @@ (define_expand "neon_dot"
 )
 
 ;; These instructions map to the __builtins for the Dot Product operations.
-(define_insn "neon_usdot"
+(define_insn "neon_usdot"
   [(set (match_operand:VCVTI 0 "register_operand" "=w")
(plus:VCVTI
  (unspec:VCVTI
@@ -3112,7 +3112,7 @@ (define_insn "neon_dot_laneq"
 )
 
 ;; Auto-vectorizer pattern for usdot
-(define_expand "usdot_prod"
+(define_expand "usdot_prod"
   [(set (match_operand:VCVTI 0 "register_operand")
(plus:VCVTI (unspec:VCVTI [(match_operand: 1
"register_operand")
-- 
2.34.1

Re: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-09-05 Thread Simon Martin

Hi Jason,

On 4 Sep 2024, at 18:09, Jason Merrill wrote:

> On 9/1/24 2:51 PM, Simon Martin wrote:
>> Hi Jason,
>>
>> On 26 Aug 2024, at 19:23, Jason Merrill wrote:
>>
>>> On 8/25/24 12:37 PM, Simon Martin wrote:
 On 24 Aug 2024, at 23:59, Simon Martin wrote:
> On 24 Aug 2024, at 15:13, Jason Merrill wrote:
>
>> On 8/23/24 12:44 PM, Simon Martin wrote:
>>> We currently emit an incorrect -Woverloaded-virtual warning upon the
>>> following test case
>>>
>>> === cut here ===
>>> struct A {
>>>  virtual operator int() { return 42; }
>>>  virtual operator char() = 0;
>>> };
>>> struct B : public A {
>>>  operator char() { return 'A'; }
>>> };
>>> === cut here ===
>>>
>>> The problem is that warn_hidden relies on get_basefndecls to find the
>>> methods in A possibly hidden by B's operator char(), and gets both the
>>> conversion operator to int and to char. It eventually wrongly concludes
>>> that the conversion to int is hidden.
>>>
>>> This patch fixes this by filtering out conversion operators to different
>>> types from the list returned by get_basefndecls.
>>
>> Hmm, same_signature_p already tries to handle comparing conversion
>> operators, why isn't that working?
>>
> It does indeed.
>
> However, `ovl_range (fns)` does not only contain `char B::operator()`,
> for which `any_override` gets true, but also `conv_op_marker`, for
> which `any_override` gets false, causing `seen_non_override` to get to
> true. Because of that, we run the last loop, which will emit a warning
> for all `base_fndecls` (except `char B::operator()`, which has been
> removed).
>
> We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
> the loop, but we’d still need to inspect the “converting to” type
> in the last loop (for when `warn_overloaded_virtual` is 2). This would
> make the code much more complex than the current patch.
>>>
>>> Makes sense.
>>>
> It would however probably be better if `get_basefndecls` only returned
> the right conversion operator, not all of them. I’ll draft another
> version of the patch that does that and submit it in this thread.
>
 I have explored my suggestion further and it actually ends up more
 complicated than the initial patch.
>>>
>>> Yeah, you'd need to do lookup again for each member of fns.
>>>
 Please find attached a new revision to fix the reported issue, as well
 as new ones I discovered while testing with -Woverloaded-virtual=2.

 It’s pretty close to the initial patch, but (1) adds a missing
 “continue;” (2) fixes a location problem when -Woverloaded-virtual=2
 (3) adds more test cases. The commit log is also more comprehensive, and
 should describe well the various problems and why the patch is correct.
>>>
 +  if (IDENTIFIER_CONV_OP_P (name)
 +  && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
 +   DECL_CONV_FN_TYPE (base_fndecls[k])))
 +{
 +  base_fndecls[k] = NULL_TREE;
 +  continue;
 +}
>>>
>>> So this removes base_fndecls[k] if it doesn't return the same type as
>>> fndecl.  But what if there's another conversion op in fns that does
>>> return the same type as base_fndecls[k]?
>>>
>>> If I add an operator int() to both base and derived in
>>> Woverloaded-virt7.C, the warning disappears.
>>>
>> That was an issue indeed. I’ve reworked the patch, and came up with
>> the attached latest version. It explicitly keeps track both of
>> overloaded and of hidden base methods (and the “hiding method” for
>> the latter), and uses those instead of juggling with bools and nullified
>> base_decls.
>>
>> On top of fixing the issue the PR reports, it fixes a few that I came
>> across while investigating:
>> - wrongly emitting the warning if the base method is not virtual (the
>> lines added to Woverloaded-virt1.C would cause a warning without the
>> patch)
>> - wrongly emitting the warning when the derived class method is a
>> template, which is wrong since template members don’t override virtual
>> base methods (see the change in pr61945.C)
>
> This change seems wrong to me; the warning is documented as "Warn when 
> a function declaration hides virtual functions from a base class," and 
> templates can certainly hide virtual base methods, as indeed they do 
> in that testcase.
Gasp, you’re right. The updated patch fixes this by simply working 
from the TEMPLATE_TEMPLATE_RESULT of TEMPLATE_DECL; so pr61945.C warns 
again (after changing the signature so that it actually hides the base 
class; it was not be

[PATCH V4 10/10] autovectorizer: Test autovectorization of different dot-prod modes.

2024-09-05 Thread Victor Do Nascimento

Changes from previous revision:

Rename new `check_effective_target' and tests to make their intent
clearer.

  * lib/target-supports.exp: For new `check_effective_target',
s/vect_dotprod_twoway/vect_dotprod_hisi/.
  * One test is renamed to `vect-dotprod-conv-optab.c' to emphasize
aim of checking the new dotprod convert optab allows
autovectorization of a given datatype to distinct target
data-types.
  * The aarch64 runtime-correctness check has had the mode supported
for its two-way dot-product added to the test name, resulting in
the new `vect-dotprod-twoway-hisi.c' name.

--

Given the novel treatment of the dot product optab as a conversion, we
are now able to target different relationships between output modes and
input modes.

This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:

uint32_t udot4(int n, uint8_t* data) {
  uint32_t sum = 0;
  for (int i=0; i
+
+uint32_t udot4(int n, uint8_t* data) {
+  uint32_t sum = 0;
+  for (int i=0; i
+#include 
+#pragma GCC target "+sme2"
+
+uint32_t
+udot2 (int n, uint16_t* data)  __arm_streaming
+{
+  uint32_t sum = 0;
+  for (int i=0; i

Re: [PATCH] [AARCH64] adjust gcc.target/aarch64/sve/mask_gather_load_7.c

2024-09-05 Thread Richard Sandiford

Richard Biener  writes:
> The following adjusts the scan-assembler to also allow predicate
> registers p8-15 to be used for the destination of the compares.
> I see that code generation with a pending vectorizer patch (the
> only assembler change is different predicate register allocation).

Oops, yes, I should have realised that 0-7 was overly constrained.

> Tested on aarch64.
>
> OK for trunk?

OK, thanks.

Richard

>
> Thanks,
> Richard.
>
>   * gcc.target/aarch64/sve/mask_gather_load_7.c: Allow
>   p8-15 to be used for the destination of the compares.
> ---
>  gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
> index c31fae308a5..7812ae7c928 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/mask_gather_load_7.c
> @@ -41,13 +41,13 @@
>  TEST_ALL (TEST_LOOP)
>  
>  /* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, 
> \[x[0-9]+, x[0-9]+, lsl 1\]\n} 36 } } */
> -/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.h, p[0-7]/z, 
> z[0-9]+\.h, z[0-9]+\.h\n} 12 } } */
> -/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.h, p[0-7]/z, 
> z[0-9]+\.h, z[0-9]+\.h\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-9]+\.h, p[0-7]/z, 
> z[0-9]+\.h, z[0-9]+\.h\n} 12 } } */
> +/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.h, p[0-7]/z, 
> z[0-9]+\.h, z[0-9]+\.h\n} 6 } } */
>  /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, 
> \[x[0-9]+, z[0-9]+\.s, sxtw 2\]\n} 18 } } */
>  /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, 
> \[x[0-9]+, z[0-9]+\.s, uxtw 2\]\n} 18 } } */
>  
>  /* Also used for the TEST32 indices.  */
>  /* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, 
> \[x[0-9]+, x[0-9]+, lsl 2\]\n} 72 } } */
> -/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, 
> z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
> -/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, 
> z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
> +/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-9]+\.s, p[0-7]/z, 
> z[0-9]+\.s, z[0-9]+\.s\n} 12 } } */
> +/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-9]+\.s, p[0-7]/z, 
> z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
>  /* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, 
> \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 36 } } */

[PATCH v1 1/9] Support weak references

2024-09-05 Thread Evgeny Karpov

Monday, September 2, 2024
Richard Sandiford  wrote:

> On patch 1, do you have a reference for how AArch64 and x86 handle weak
> references for MinGW?  The code looks good, but I didn't really follow
> why it was doing what it was doing.

Monday, September 2, 2024
Martin Storsjö 
>> The patch adds support for weak references. The original MinGW
>> implementation targets ix86, which handles weak symbols differently
>> compared to AArch64.
>
> Please clarify this statement.

Here is an explanation of why this change is needed and what the
difference is between x86_64-w64-mingw32 and aarch64-w64-mingw32.

The way x86_64 calls a weak function:
call  weak_fn2

GCC emits the call and creates the required definitions at the end
of the assembly:

.weak weak_fn2
.def  weak_fn2;   .scl  2;.type 32;   .endef

This is different from aarch64:

weak_fn2 will be legitimized and replaced by .refptr.weak_fn2,
and there will be no other references to weak_fn2 in the code.

adrp  x0, .refptr.weak_fn2
add   x0, x0, :lo12:.refptr.weak_fn2
ldr   x0, [x0]
blr   x0

GCC does not emit the required definitions at the end of the assembly,
and weak_fn2 is tracked only by the mingw stub symbol.

Without the change, the stub definition will emit:

    .section  .rdata$.refptr.weak_fn2, "dr"
    .globl  .refptr.weak_fn2
    .linkonce discard
.refptr.weak_fn2:
    .quad   weak_fn2

which is not enough. This fix will emit the required definitions:

    .weak   weak_fn2
    .def    weak_fn2;   .scl  2;    .type 32;   .endef
    .section  .rdata$.refptr.weak_fn2, "dr"
    .globl  .refptr.weak_fn2
    .linkonce discard
.refptr.weak_fn2:
    .quad   weak_fn2
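
For context, the kind of source that leads to such a weak reference could
look like the sketch below.  It is only an assumption about a typical use
and not taken from the patch or its tests: the symbol is declared weak,
never defined in this translation unit, and only called if it resolves to
a non-null address.

```cpp
// Declared weak and not defined here, so its address may be null at
// run time.  weak_fn2 matches the name used in the assembly above; the
// wrapper function is hypothetical.
extern "C" int weak_fn2 (void) __attribute__ ((weak));

int
call_weak_or_default (int fallback)
{
  // The address test is what needs the symbol to be emitted as weak;
  // without a definition the call is skipped.
  if (weak_fn2)
    return weak_fn2 ();
  return fallback;
}
```

When nothing else in the link provides weak_fn2, the function returns
the fallback value instead of failing to link.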

Regards,
Evgeny

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-05 Thread H.J. Lu

On Tue, Sep 3, 2024 at 4:00 AM Jan Hubicka  wrote:
>
> > > >
> > > > PR ipa/116410
> > > > * ipa-modref.cc (analyze_parms): Always analyze function 
> > > > parameter
> > > > for LTO streaming.
> > > >
> > > > Signed-off-by: H.J. Lu 
> > > > ---
> > > >  gcc/ipa-modref.cc | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> > > > index 59cfe91f987..9275030c254 100644
> > > > --- a/gcc/ipa-modref.cc
> > > > +++ b/gcc/ipa-modref.cc
> > > > @@ -2975,7 +2975,7 @@ analyze_parms (modref_summary *summary, 
> > > > modref_summary_lto *summary_lto,
> > > > summary->arg_flags.safe_grow_cleared (count, true);
> > > >   summary->arg_flags[parm_index] = EAF_UNUSED;
> > > > }
> > > > - else if (summary_lto)
> > > > + if (summary_lto)
> > > > {
> > > >   if (parm_index >= summary_lto->arg_flags.length ())
> > > > summary_lto->arg_flags.safe_grow_cleared (count, true);
> > > > @@ -3034,7 +3034,7 @@ analyze_parms (modref_summary *summary, 
> > > > modref_summary_lto *summary_lto,
> > > > summary->arg_flags.safe_grow_cleared (count, true);
> > > >   summary->arg_flags[parm_index] = flags;
> > > > }
> > > > - else if (summary_lto)
> > > > + if (summary_lto)
> > > > {
> > > >   if (parm_index >= summary_lto->arg_flags.length ())
> > > > summary_lto->arg_flags.safe_grow_cleared (count, true);
> > > > --
> > > > 2.46.0
> > > >
> > >
> > > These are oversights in
> > >
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=85ebbabd85e03bdc3afc190aeb29250606d18322
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3350e59f2985469b2472e4d9a6d387337da4519b
> > >
> > > to have
> > >
> > >   if (summary)
> > >   ...
> > >   else if (summary_lto)
> > >    This disables LTO optimization for  -ffat-lto-objects.
> > >
> > > Is this patch OK for master and backports?
> >
> > OK for master.  Please wait with backports though, eventually Honza has 
> > comments
> > as well.
>
> It looks good to me.  The code was originally written for separate LTO
> and non-LTO paths (since with LTO we can not collect alias sets that
> are not stable across LTO streaming).  Plan was to eventually merge more
> of the logic by templates, but that did not happen (yet).  I will try to
> look into cleaning this up a bit more after adding the nonsequential
> attribute
>
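
To make the effect of the `else if' concrete, here is a reduced model of
the pattern.  The struct and function are hypothetical and only mimic the
shape of analyze_parms; std::vector::resize stands in for
safe_grow_cleared.  With -ffat-lto-objects both summaries exist, so both
have to be updated.

```cpp
#include <vector>

// Hypothetical stand-in for both modref_summary and modref_summary_lto.
struct summary_model
{
  std::vector<int> arg_flags;
};

// Records `flags` for parameter `parm_index` in every summary that is
// present.  Two independent `if`s are used on purpose: with `else if`
// the LTO summary would be skipped whenever the non-LTO one exists.
void
record_flags (summary_model *summary, summary_model *summary_lto,
              unsigned parm_index, unsigned count, int flags)
{
  if (summary)
    {
      if (parm_index >= summary->arg_flags.size ())
        summary->arg_flags.resize (count);
      summary->arg_flags[parm_index] = flags;
    }
  if (summary_lto)
    {
      if (parm_index >= summary_lto->arg_flags.size ())
        summary_lto->arg_flags.resize (count);
      summary_lto->arg_flags[parm_index] = flags;
    }
}
```

Passing both summaries fills both flag vectors; passing only one of them
leaves the other untouched.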

OK for backports?


-- 
H.J.

[PATCH v1 5/9] aarch64: Multiple adjustments to support the SMALL code model correctly

2024-09-05 Thread Evgeny Karpov

Monday, September 2, 2024
Richard Sandiford  wrote:

> I realise this is pre-existing, but the last line should probably be:
>
>   fprintf ((FILE), "," HOST_WIDE_INT_PRINT_UNSIGNED "\n", (ROUNDED)))
>
> to avoid silent truncation.  (Even if the format only supports 32-bit
> code and data, it's better for out-of-bounds values to be flagged by
> the assembler rather than silently truncated.)
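
A small stand-alone sketch of the truncation concern, under the assumption
that the problem is a 64-bit size being printed through a 32-bit format or
type (as `%lu' would be on an LLP64 target).  PRIu64 is used here as a
portable stand-in for HOST_WIDE_INT_PRINT_UNSIGNED, and both function names
are hypothetical.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Prints the full 64-bit value, so an out-of-range size stays visible
// and can be rejected by the assembler.
std::string
format_size (uint64_t rounded)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, ",%" PRIu64 "\n", rounded);
  return buf;
}

// Models the silent truncation: the value is narrowed to 32 bits
// before printing, so the high bits are lost without any diagnostic.
std::string
format_size_truncated (uint64_t rounded)
{
  char buf[32];
  std::snprintf (buf, sizeof buf, ",%u\n", static_cast<unsigned> (rounded));
  return buf;
}
```

For a size of 2^32 + 5 the first function prints 4294967301 while the
second prints 5.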


>> +#undef ASM_DECLARE_OBJECT_NAME
>> +#define ASM_DECLARE_OBJECT_NAME(STREAM, NAME, DECL)  \
>> +  mingw_pe_declare_object_type (STREAM, NAME, TREE_PUBLIC (DECL)); \
>> +  ASM_OUTPUT_LABEL ((STREAM), (NAME))
>> +
>> +
>> +#undef ASM_DECLARE_FUNCTION_NAME
>> +#define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL)   \
>> +  mingw_pe_declare_function_type (STR, NAME, TREE_PUBLIC (DECL)); \
>> +  aarch64_declare_function_name (STR, NAME, DECL)
>> +
>> +
>
> These two should probably either be wrapped in:
>
>  do { ... } while (0)
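
The reason for the wrapping can be shown with a reduced example.  The
macros and helpers below are hypothetical stand-ins for the two-statement
ASM_DECLARE_* macros: without the wrapper, only the first statement is
governed by an unbraced `if'.

```cpp
static int calls = 0;
static void declare_type () { ++calls; }
static void output_label () { ++calls; }

// Two statements without a wrapper: only the first one belongs to an
// enclosing unbraced `if`.
#define DECLARE_UNWRAPPED() declare_type (); output_label ()

// Wrapped form: behaves like a single statement.
#define DECLARE_WRAPPED() \
  do { declare_type (); output_label (); } while (0)

// Returns the number of helper calls made for the unwrapped macro.
int
run_unwrapped (bool enabled)
{
  calls = 0;
  if (enabled)
    DECLARE_UNWRAPPED ();
  return calls;
}

// Returns the number of helper calls made for the wrapped macro.
int
run_wrapped (bool enabled)
{
  calls = 0;
  if (enabled)
    DECLARE_WRAPPED ();
  return calls;
}
```

With `enabled' false the unwrapped variant still calls output_label once,
while the wrapped variant calls nothing.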

> Using "STREAM" rather than "STR" in ASM_DECLARE_FUNCTION_NAME
> would be more consistent with ASM_DECLARE_OBJECT_NAME.

> The new function should have its own comment (the existing one
> describes mingw_pe_declare_function_type).  Could we make "pub"
> a bool for both functions?

> Maybe the two functions are similar enough that it would be worth
> having them forward to an internal helper that takes DT_NON or DT_FCN
> as appropriate.  I suppose that's more personal preference though,
> so let me know if you disagree.

The patch has been refactored to address the review. Thanks!

Regards,
Evgeny

gcc/ChangeLog:

* config/aarch64/aarch64-coff.h (LOCAL_LABEL_PREFIX):
Use "." as the local label prefix.
(ASM_OUTPUT_ALIGNED_LOCAL): Remove.
(ASM_OUTPUT_LOCAL): New.
* config/aarch64/cygming.h (ASM_OUTPUT_EXTERNAL_LIBCALL):
Update.
(ASM_DECLARE_OBJECT_NAME): New.
(ASM_DECLARE_FUNCTION_NAME): New.
* config/i386/cygming.h (ASM_DECLARE_COLD_FUNCTION_NAME):
Update.
(ASM_OUTPUT_EXTERNAL_LIBCALL): Update.
* config/mingw/winnt.cc (mingw_pe_declare_function_type):
Rename into ...
(mingw_pe_declare_type): ... this.
(i386_pe_start_function): Update.
* config/mingw/winnt.h (mingw_pe_declare_function_type):
Rename into ...
(mingw_pe_declare_type): ... this.
---
 gcc/config/aarch64/aarch64-coff.h | 22 ++
 gcc/config/aarch64/cygming.h  | 18 +-
 gcc/config/i386/cygming.h |  8 
 gcc/config/mingw/winnt.cc | 18 +-
 gcc/config/mingw/winnt.h  |  3 +--
 5 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-coff.h 
b/gcc/config/aarch64/aarch64-coff.h
index 81fd9954f75..17f346fe540 100644
--- a/gcc/config/aarch64/aarch64-coff.h
+++ b/gcc/config/aarch64/aarch64-coff.h
@@ -20,9 +20,8 @@
 #ifndef GCC_AARCH64_COFF_H
 #define GCC_AARCH64_COFF_H
 
-#ifndef LOCAL_LABEL_PREFIX
-# define LOCAL_LABEL_PREFIX""
-#endif
+#undef LOCAL_LABEL_PREFIX
+#define LOCAL_LABEL_PREFIX  "."
 
 /* Using long long breaks -ansi and -std=c90, so these will need to be
made conditional for an LLP64 ABI.  */
@@ -54,19 +53,10 @@
 }
 #endif
 
-/* Output a local common block.  /bin/as can't do this, so hack a
-   `.space' into the bss segment.  Note that this is *bad* practice,
-   which is guaranteed NOT to work since it doesn't define STATIC
-   COMMON space but merely STATIC BSS space.  */
-#ifndef ASM_OUTPUT_ALIGNED_LOCAL
-# define ASM_OUTPUT_ALIGNED_LOCAL(STREAM, NAME, SIZE, ALIGN)   \
-{  \
-  switch_to_section (bss_section); \
-  ASM_OUTPUT_ALIGN (STREAM, floor_log2 (ALIGN / BITS_PER_UNIT));   \
-  ASM_OUTPUT_LABEL (STREAM, NAME); \
-  fprintf (STREAM, "\t.space\t%d\n", (int)(SIZE)); \
-}
-#endif
+#define ASM_OUTPUT_LOCAL(FILE, NAME, SIZE, ROUNDED)  \
+( fputs (".lcomm ", (FILE)),   \
+  assemble_name ((FILE), (NAME)),  \
+  fprintf ((FILE), ",%lu\n", (ROUNDED)))
 
 #define ASM_OUTPUT_SKIP(STREAM, NBYTES)\
   fprintf (STREAM, "\t.space\t%d  // skip\n", (int) (NBYTES))
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index e4ceab82b9e..3afeb77110d 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -78,7 +78,7 @@ still needed for compilation.  */
 
 /* Declare the type properly for any external libcall.  */
 #define ASM_OUTPUT_EXTERNAL_LIBCALL(FILE, FUN) \
-  mingw_pe_declare_function_type (FILE, XSTR (FUN, 0), 1)
+  mingw_pe_declare_type (FILE, XSTR (FUN, 0), 1, 1)
 
 /* Use section relative relocations for debugging offsets.  Unlike
other targets that fake this by putting the section VMA at 0, PE
@@ -213,6 +213,22 @@ still needed for compilation.  */
 
 #define SUPPORTS_ONE_ONLY 1
 
+#undef ASM_DECLARE_OBJECT_NAME

Re: [PATCH v1 1/9] Support weak references

2024-09-05 Thread Martin Storsjö


On Thu, 5 Sep 2024, Evgeny Karpov wrote:


Monday, September 2, 2024
Martin Storsjö 

The patch adds support for weak references. The original MinGW
implementation targets ix86, which handles weak symbols differently
compared to AArch64.


Please clarify this statement.


Here is an explanation of why this change is needed and what the
difference is between x86_64-w64-mingw32 and aarch64-w64-mingw32.

The way x86_64 calls a weak function:
call  weak_fn2

GCC emits the call and creates the required definitions at the end
of the assembly:

.weak weak_fn2
.def  weak_fn2;   .scl  2;.type 32;   .endef

This is different from aarch64:

weak_fn2 will be legitimized and replaced by .refptr.weak_fn2,
and there will be no other references to weak_fn2 in the code.

adrp  x0, .refptr.weak_fn2
add   x0, x0, :lo12:.refptr.weak_fn2
ldr   x0, [x0]
blr   x0


Right, this is the core of what I'm arguing here.


Is there any intrinsic reason why there _should_ be a difference to x86_64 
here? Because most of the reasons why aarch64 wants to do 
indirection via .refptr here also apply to x86_64.


Or, to put it more clearly: I think x86_64 should also use .refptr 
for weak symbols.


There are a number of open bugs for GCC targeting x86_64 mingw, regarding 
weak symbols, and I think a few of them could be solved if GCC would use 
.refptr for the weak symbols on x86_64 as well.



So I don't argue that this change is wrong, it probably is correct.

But I'm arguing that there shouldn't be any difference between the 
architectures regarding how it is handled. Whenever there's an 
architecture difference in such a matter which shouldn't be architecture 
specific, there may be a latent bug hiding.


That's not necessarily a blocker for this patch though, but the wording 
should make it clear: There's no specific reason for why aarch64 should 
behave differently than x86_64, but the x86_64 implementation probably 
needs to catch up.


// Martin
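
To make the weak-reference discussion above concrete: at the source level, a
call through a weak reference is normally guarded by a test of the symbol's
address, which is why the address has to be loadable even when nothing
defines the symbol. The following is only a minimal sketch, assuming GCC or
Clang on a target where undefined weak symbols resolve to a null address;
hypothetical_weak_fn and call_if_present are made-up names and do not come
from the patch.

```cpp
#include <cassert>

// Weak declaration (hypothetical name): if no definition is linked in,
// the address of the function is expected to be null.
extern "C" void hypothetical_weak_fn (void) __attribute__ ((weak));

// Calls the weak function only when a definition is present.
bool
call_if_present ()
{
  if (hypothetical_weak_fn)
    {
      hypothetical_weak_fn ();
      return true;
    }
  return false;
}
```

With no definition of hypothetical_weak_fn in the link, call_if_present is
expected to return false instead of jumping to address zero.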

[PATCH v1 0/9] SMALL code model fixes, optimization fixes, LTO and minimal C++ enablement

2024-09-05 Thread Evgeny Karpov

Monday, September 2, 2024
Richard Sandiford  wrote:

> Thanks for submitting this.  I've responded with minor comments to
> some individual patches, but the rest (1, 2, 7, and 8) look good to
> me as-is.

Thank you for the review. The patch series v2 will be submitted after
the validation.

Regards,
Evgeny

[PATCH] RISC-V: Fix out of index in riscv_select_multilib_by_abi

2024-09-05 Thread YunQiang Su

commit b5c2aae48723c9098a8a3dab1409b30fd87bbf56
Author: YunQiang Su 
Date:   Thu Sep 5 15:14:43 2024 +0800

RISC-V: Lookup reversely in riscv_select_multilib_by_abi

The last element should use index
   multilib_infos.size () - 1

gcc
* common/config/riscv/riscv-common.cc (riscv_select_multilib_by_abi):
Fix out of index problem.
---
 gcc/common/config/riscv/riscv-common.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 2c1ce7fc7cb..bd42fd01532 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -2079,7 +2079,7 @@ riscv_select_multilib_by_abi (
   const std::string &riscv_current_abi_str,
   const std::vector &multilib_infos)
 {
-  for (ssize_t i = multilib_infos.size (); i >= 0; --i)
+  for (ssize_t i = multilib_infos.size () - 1; i >= 0; --i)
 if (riscv_current_abi_str == multilib_infos[i].abi_str)
   return xstrdup (multilib_infos[i].path.c_str ());
 
-- 
2.39.3 (Apple Git-146)
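
The off-by-one fixed above can be reproduced outside GCC. The sketch below is
not the GCC code: multilib_info and select_by_abi are illustrative stand-ins
that keep only the two fields the loop touches, and the function returns an
empty string where no entry matches.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in record (hypothetical), holding only the fields the loop reads.
struct multilib_info
{
  std::string abi_str;
  std::string path;
};

// Walks the vector from the last element to the first and returns the path
// of the first match found that way, or "" when nothing matches.
std::string
select_by_abi (const std::string &abi, const std::vector<multilib_info> &infos)
{
  // The last valid index is size () - 1; starting at size () would read
  // one element past the end.
  for (std::ptrdiff_t i = static_cast<std::ptrdiff_t> (infos.size ()) - 1;
       i >= 0; --i)
    if (abi == infos[i].abi_str)
      return infos[i].path;
  return "";
}
```

A signed index is used on purpose: with an unsigned type the condition
i >= 0 would always hold and the loop would never terminate normally.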

[PATCH v2 1/2] Genmatch: Support control flow graph case 1 for phi on condition

2024-09-05 Thread pan2 . li

From: Pan Li 

From day 1, gen_phi_on_cond has only supported the control flow below
for cond:

+--+
| def  |
| ...  |   +-+
| cond |-->| def |
+--+   | ... |
   |   +-+
   |  |
   v  |
+-+   |
| PHI |<--+
+-+

Unfortunately, PHI nodes appear in more control flow scenarios than
that.  For example:

#define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
T __attribute__((noinline))\
sat_s_add_##T##_fmt_3 (T x, T y)   \
{  \
  T sum;   \
  bool overflow = __builtin_add_overflow (x, y, &sum); \
  return overflow ? x < 0 ? MIN : MAX : sum;   \
}

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

This compiles to GIMPLE like below.
__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  signed char _1;
  signed char _2;
  int8_t _3;
  __complex__ signed char _6;
  _Bool _8;
  signed char _9;
  signed char _10;
  signed char _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
  goto ; [100.00%]
;;succ:   5

;;   basic block 4, loop depth 0
;;pred:   2
  _8 = x_4(D) < 0;
  _9 = (signed char) _8;
  _10 = -_9;
  _11 = _10 ^ 127;
;;succ:   5

;;   basic block 5, loop depth 0
;;pred:   3
;;4
  # _3 = PHI <_1(3), _11(4)>
  return _3;
;;succ:   EXIT

}

The above code has the control flow below, which gen_phi_on_cond
does not support.

+--+
| def  |
| ...  |   +-+
| cond |-->| def |
+--+   | ... |
   |   +-+
   |  |
   v  |
+-+   |
| def |   |
| ... |   |
+-+   |
   |  |
   |  |
   v  |
+-+   |
| PHI |<--+
+-+

This patch adds support for the above control flow to
gen_phi_on_cond.  The generated match code looks like below.

Before this patch:
basic_block _b1 = gimple_bb (_a1);
if (gimple_phi_num_args (_a1) == 2)
  {
basic_block _pb_0_1 = EDGE_PRED (_b1, 0)->src;
basic_block _pb_1_1 = EDGE_PRED (_b1, 1)->src;
basic_block _db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) ? 
_pb_0_1 : _pb_1_1;
basic_block _other_db_1 = safe_dyn_cast  (*gsi_last_bb (_pb_0_1)) 
? _pb_1_1 : _pb_0_1;
gcond *_ct_1 = safe_dyn_cast  (*gsi_last_bb (_db_1));
if (_ct_1 && EDGE_COUNT (_other_db_1->preds) == 1
  && EDGE_COUNT (_other_db_1->succs) == 1
  && EDGE_PRED (_other_db_1, 0)->src == _db_1)
  {
...

After this patch:
basic_block _b1 = gimple_bb (_a1);
basic_block _b_cond_1;
if (gimple_phi_num_args (_a1) == 2
&& (control_flow_graph_case_0_match (_b1, &_b_cond_1)
|| control_flow_graph_case_1_match (_b1, &_b_cond_1)))
{
...
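
The two control flow shapes drawn above can be modelled on a toy graph. The
following is only a sketch and not the code added to gimple-match-head.cc:
Block, add_edge, is_single_arm, cfg_case_0_match and cfg_case_1_match are
illustrative names, and the single-arm test simply mirrors the
preds/succs/src checks visible in the "Before this patch" snippet.

```cpp
#include <cassert>
#include <vector>

// Toy basic block: only edges and whether the block ends in a condition.
struct Block
{
  bool ends_in_cond = false;
  std::vector<const Block *> preds;
  std::vector<const Block *> succs;
};

void
add_edge (Block &from, Block &to)
{
  from.succs.push_back (&to);
  to.preds.push_back (&from);
}

// True if B is entered only from COND and has exactly one successor.
static bool
is_single_arm (const Block *b, const Block *cond)
{
  return b->preds.size () == 1 && b->succs.size () == 1
         && b->preds[0] == cond;
}

// Case 0: one PHI predecessor is the cond block, the other is a single arm.
bool
cfg_case_0_match (const Block &phi, const Block **cond_out)
{
  if (phi.preds.size () != 2)
    return false;
  for (int i = 0; i < 2; ++i)
    {
      const Block *cond = phi.preds[i];
      const Block *arm = phi.preds[1 - i];
      if (cond->ends_in_cond && is_single_arm (arm, cond))
        {
          *cond_out = cond;
          return true;
        }
    }
  return false;
}

// Case 1: both PHI predecessors are single arms of the same cond block.
bool
cfg_case_1_match (const Block &phi, const Block **cond_out)
{
  if (phi.preds.size () != 2 || phi.preds[0]->preds.size () != 1)
    return false;
  const Block *cond = phi.preds[0]->preds[0];
  if (cond->ends_in_cond && is_single_arm (phi.preds[0], cond)
      && is_single_arm (phi.preds[1], cond))
    {
      *cond_out = cond;
      return true;
    }
  return false;
}
```

Both matchers hand back the cond block through an out parameter, in the same
spirit as the &_b_cond_1 argument in the generated code.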

The following test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* genmatch.cc (dt_operand::gen_phi_on_cond): Add support control
flow graph case 1 for gen phi on condition.
* gimple-match-head.cc (control_flow_graph_case_0_match): Add
new func impl to match case 0 of cfg.
(control_flow_graph_case_1_match): Ditto but for case 1.

Signed-off-by: Pan Li 
---
 gcc/genmatch.cc  |  37 +
 gcc/gimple-match-head.cc | 115 +++
 2 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a56bd90cb2c..e0ec1c0e928 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -3518,43 +3518,36 @@ dt_operand::gen_phi_on_cond (FILE *f, int indent, int 
depth)
 {
   fprintf_indent (f, indent,
 "basic_block _b%d = gimple_bb (_a%d);\n", depth, depth);
+  fprintf_indent (f, indent, "basic_block _b_cond_%d;\n", depth);
 
-  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2)\n", depth);
+  fprintf_indent (f, indent, "if (gimple_phi_num_args (_a%d) == 2\n", depth);
 
-  indent += 2;
-  fprintf_indent (f, indent, "{\n");
   indent += 2;
 
   fprintf_indent (f, indent,
-"basic_block _pb_0_%d = EDGE_PRED (_b%d, 0)->src;\n", depth, depth);
-  fprintf_indent (f, indent,
-"basic_block _pb_1_%d = EDGE_PRED (_b%d, 1)->src;\n", depth, depth);
-  fprintf_indent (f

[PATCH v2 2/2] Match: Support form 3 for scalar signed integer .SAT_ADD

2024-09-05 Thread pan2 . li

From: Pan Li 

This patch adds support for form 3 of the scalar signed
integer .SAT_ADD, as in the example below:

Form 3:
  #define DEF_SAT_S_ADD_FMT_3(T, UT, MIN, MAX)   \
  T __attribute__((noinline))\
  sat_s_add_##T##_fmt_3 (T x, T y)   \
  {  \
T sum;   \
bool overflow = __builtin_add_overflow (x, y, &sum); \
return overflow ? x < 0 ? MIN : MAX : sum;   \
  }

DEF_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX)

The difference before and after this patch is visible if the backend
implements the ssadd3 pattern, as shown below.

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  signed char _1;
  signed char _2;
  int8_t _3;
  __complex__ signed char _6;
  _Bool _8;
  signed char _9;
  signed char _10;
  signed char _11;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _6 = .ADD_OVERFLOW (x_4(D), y_5(D));
  _2 = IMAGPART_EXPR <_6>;
  if (_2 != 0)
    goto ; [50.00%]
  else
    goto ; [50.00%]
;;succ:   4
;;3

;;   basic block 3, loop depth 0
;;pred:   2
  _1 = REALPART_EXPR <_6>;
  goto ; [100.00%]
;;succ:   5

;;   basic block 4, loop depth 0
;;pred:   2
  _8 = x_4(D) < 0;
  _9 = (signed char) _8;
  _10 = -_9;
  _11 = _10 ^ 127;
;;succ:   5

;;   basic block 5, loop depth 0
;;pred:   3
;;4
  # _3 = PHI <_1(3), _11(4)>
  return _3;
;;succ:   EXIT

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_int8_t_fmt_3 (int8_t x, int8_t y)
{
  int8_t _3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  _3 = .SAT_ADD (x_4(D), y_5(D)); [tail call]
  return _3;
;;succ:   EXIT

}
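
For reference, the semantics being matched can be checked with a small
standalone program. This is a sketch only: sat_s_add_form3 is the form 3
macro instantiated by hand for int8_t, and sat_limit_for is an illustrative
name for the branchless selection -(T)(x < 0) ^ MAX that shows up as
_10 ^ 127 in the "Before" dump. __builtin_add_overflow is assumed to be
available, as it is with GCC and Clang.

```cpp
#include <cassert>
#include <cstdint>

// Form 3 from the patch description, written out for int8_t.
int8_t
sat_s_add_form3 (int8_t x, int8_t y)
{
  int8_t sum;
  bool overflow = __builtin_add_overflow (x, y, &sum);
  return overflow ? (x < 0 ? INT8_MIN : INT8_MAX) : sum;
}

// Branchless choice of the saturation limit: -(T)(x < 0) ^ MAX.
// For x < 0 this is -1 ^ 127 == -128, otherwise 0 ^ 127 == 127.
int8_t
sat_limit_for (int8_t x)
{
  return static_cast<int8_t> (-static_cast<int8_t> (x < 0) ^ INT8_MAX);
}
```

When the addition overflows, both operands have the same sign, so the sign of
x alone is enough to pick between the two limits.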

The following test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.

gcc/ChangeLog:

* match.pd: Add the form 3 of signed .SAT_ADD matching.

Signed-off-by: Pan Li 
---
 gcc/match.pd | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index 621306213e4..1d478d42ed5 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3207,6 +3207,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
   && types_match (type, @0, @1
 
+/* Signed saturation add, case 3:
+   Z = .ADD_OVERFLOW (X, Y)
+   SAT_S_ADD = IMAGPART_EXPR (Z) != 0 ? (-(T)(X < 0) ^ MAX) : sum;  */
+(match (signed_integer_sat_add @0 @1)
+ (cond^ (ne (imagpart (IFN_ADD_OVERFLOW:c@2 @0 @1)) integer_zerop)
+   (bit_xor:c (negate (convert (lt @0 integer_zerop))) max_value)
+   (realpart @2))
+ (if (INTEGRAL_TYPE_P (type) && !TYPE_UNSIGNED (type)
+  && types_match (type, @0, @1
+
 /* Unsigned saturation sub, case 1 (branch with gt):
SAT_U_SUB = X > Y ? X - Y : 0  */
 (match (unsigned_integer_sat_sub @0 @1)
-- 
2.43.0

Re: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Raphael Zinsly

On Wed, Sep 4, 2024 at 8:32 PM Jeff Law  wrote:
> On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
> ...
> > +  bool bit31 = (hival & 0x8000) != 0;
> > +  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
> > +  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
> > +  int shiftval = 0;
> > +
> > +  /* Adjust the shift into the high half accordingly.  */
> > +  if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
> > +   || (trailing_shift < 0 && hival == (loval << trailing_shift)))
> > + shiftval = 32 - trailing_shift;
> > +  else if ((leading_shift < 0 && hival == (loval >> leading_shift))
> > + || (leading_shift > 0 && hival == (loval << leading_shift)))
> Don't these trigger undefined behavior when trailing_shift or
> leading_shift is < 0?  We shouldn't ever generate negative shift counts.

The value of trailing/leading_shift is added to 32, so we will never have
negative shift counts.
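
To make the question about shift counts concrete: in C and C++ a shift by a
negative count is undefined, so a signed distance has to be turned into a
non-negative count before it reaches << or >>. The helper below is only an
illustration of that pattern and is not code from the patch;
shift_by_signed_count is a made-up name.

```cpp
#include <cassert>
#include <cstdint>

// Shifts right for positive counts and left for negative ones, so that
// neither << nor >> ever sees a negative count.
uint64_t
shift_by_signed_count (uint64_t value, int count)
{
  assert (count > -64 && count < 64);
  if (count >= 0)
    return value >> count;
  return value << -count;
}
```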



--
Raphael Moreira Zinsly

Re: [PATCH 3/3] RISC-V: Constant synthesis of inverted halves

2024-09-05 Thread Raphael Zinsly

On Wed, Sep 4, 2024 at 8:35 PM Jeff Law  wrote:
> On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
>...
> > +unsigned long foo_0x4afe605fb5019fa0(void) { return 0x4afe605fb5019fa0UL; }
> > +unsigned long foo_0x07a80d21f857f2de(void) { return 0x07a80d21f857f2deUL; }
> > +unsigned long foo_0x6699f19c99660e63(void) { return 0x6699f19c99660e63UL; }
> > +unsigned long foo_0x6c80e48a937f1b75(void) { return 0x6c80e48a937f1b75UL; }
> > +unsigned long foo_0x47d7193eb828e6c1(void) { return 0x47d7193eb828e6c1UL; }
> > +unsigned long foo_0x7c627816839d87e9(void) { return 0x7c627816839d87e9UL; }
> > +unsigned long foo_0x3d69e83ec29617c1(void) { return 0x3d69e83ec29617c1UL; }
> > +unsigned long foo_0x5bee7ee6a4118119(void) { return 0x5bee7ee6a4118119UL; }
> > +unsigned long foo_0x73fe20828c01df7d(void) { return 0x73fe20828c01df7dUL; }
> > +unsigned long foo_0x0f1dc294f0e23d6b(void) { return 0x0f1dc294f0e23d6bUL; }
> I must be missing something.  All the tests have bit31 on.  But I don't
> think this synthesis is valid when bit31 is on and the code seems to
> check this.  What am I missing?

The upper half is the one that is shifted so we check for bit31 of the hival:
bool bit31 = (hival & 0x8000) != 0;
Maybe we should change the name of the variable to bit63.




--
Raphael Moreira Zinsly

Zen5 tuning part 5: update instruction latencies in x86-tune-costs

2024-09-05 Thread Jan Hubicka

Hi,
there is nothing exciting in this patch.  I measured latencies and also
compared them with the newly released optimization guide, and it seems that
the only important change is that addss is faster now. It can be 2 cycles
instead of 3 in some cases when the input parameter is computed by
another addition. The throughput has increased but we have no model for
that.
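
As a rough illustration of what the latency change means for the cost table:
the entries are written with COSTS_N_INSNS, which is assumed here to scale a
cycle count by 4 as in GCC's rtl.h. The sketch below is not GCC code;
costs_n_insns and chain_cost are illustrative names, and the model only
covers a latency-bound chain of dependent additions.

```cpp
#include <cassert>

// Assumption: mirrors COSTS_N_INSNS (N) being defined as (N) * 4.
constexpr int
costs_n_insns (int n)
{
  return n * 4;
}

// Cost of a chain of dependent additions when latency is the limit.
constexpr int
chain_cost (int additions, int latency_cycles)
{
  return additions * costs_n_insns (latency_cycles);
}
```

Under that assumption, a chain of 8 dependent additions is costed at 96 with
latency 3 and at 64 with latency 2.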

I added comments which should make it easier to update the table for
future revisions.

I also increased the large insn bound since the decoders no longer seem to
require instructions to be 8 bytes or less.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* config/i386/x86-tune-costs.h (znver5_cost): Update instruction
costs.

diff --git a/gcc/config/i386/x86-tune-costs.h b/gcc/config/i386/x86-tune-costs.h
index b90567fbbf2..1b3227ace16 100644
--- a/gcc/config/i386/x86-tune-costs.h
+++ b/gcc/config/i386/x86-tune-costs.h
@@ -2034,6 +2034,7 @@ struct processor_costs znver5_cost = {
   COSTS_N_INSNS (1),   /* cost of a lea instruction.  */
   COSTS_N_INSNS (1),   /* variable shift costs.  */
   COSTS_N_INSNS (1),   /* constant shift costs.  */
+  /* mul has latency 3, executes in 3 integer units.  */
   {COSTS_N_INSNS (3),  /* cost of starting multiply for QI.  */
COSTS_N_INSNS (3),  /*   HI.  */
COSTS_N_INSNS (3),  /*   SI.  */
@@ -2041,6 +2042,8 @@ struct processor_costs znver5_cost = {
COSTS_N_INSNS (3)}, /*  other.  */
   0,   /* cost of multiply per each bit
   set.  */
+  /* integer divide has latency of 8 cycles
+ plus 1 for every 9 bits of quotient.  */
   {COSTS_N_INSNS (10), /* cost of a divide/mod for QI.  */
COSTS_N_INSNS (11), /*  HI.  */
COSTS_N_INSNS (13), /*  SI.  */
@@ -2048,7 +2051,7 @@ struct processor_costs znver5_cost = {
COSTS_N_INSNS (16)},/*  
other.  */
   COSTS_N_INSNS (1),   /* cost of movsx.  */
   COSTS_N_INSNS (1),   /* cost of movzx.  */
-  8,   /* "large" insn.  */
+  15,  /* "large" insn.  */
   9,   /* MOVE_RATIO.  */
   6,   /* CLEAR_RATIO */
   {6, 6, 6},   /* cost of loading integer registers
@@ -2065,12 +2068,13 @@ struct processor_costs znver5_cost = {
   2, 2, 2, /* cost of moving XMM,YMM,ZMM
   register.  */
   6,   /* cost of moving SSE register to 
integer.  */
-  /* VGATHERDPD is 17 uops and throughput is 4, VGATHERDPS is 24 uops,
- throughput 5.  Approx 7 uops do not depend on vector size and every load
- is 5 uops.  */
+
+  /* TODO: gather and scatter instructions are currently disabled in
+ x86-tune.def.  In some cases they are however a win, see PR116582
+ We however need good cost model for them.  */
   14, 10,  /* Gather load static, per_elt.  */
   14, 20,  /* Gather store static, per_elt.  */
-  32,  /* size of l1 cache.  */
+  48,  /* size of l1 cache.  */
   1024,/* size of l2 cache.  */
   64,  /* size of prefetch block.  */
   /* New AMD processors never drop prefetches; if they cannot be performed
@@ -2080,6 +2084,8 @@ struct processor_costs znver5_cost = {
  time).  */
   100, /* number of parallel prefetches.  */
   3,   /* Branch cost.  */
+  /* TODO x87 latencies are still based on znver4.
+ Probably not very important these days.  */
   COSTS_N_INSNS (7),   /* cost of FADD and FSUB insns.  */
   COSTS_N_INSNS (7),   /* cost of FMUL instruction.  */
   /* Latency of fdiv is 8-15.  */
@@ -2089,16 +2095,24 @@ struct processor_costs znver5_cost = {
   /* Latency of fsqrt is 4-10.  */
   COSTS_N_INSNS (25),  /* cost of FSQRT instruction.  */
 
+  /* SSE instructions have typical throughput 4 and latency 1.  */
   COSTS_N_INSNS (1),   /* cost of cheap SSE instruction.  */
-  COSTS_N_INSNS (3),   /* cost of ADDSS/SD SUBSS/SD insns.  */
+  /* ADDSS has throughput 2 and latency 2
+ (in some cases when source is another addition).  */
+  COSTS_N_INSNS (2),   /* cost of ADDSS/SD SUBSS/SD insns.  */
+  /* MULSS has throughput 2 and latency 3.  */
   COSTS_N_INSNS (3),   /* cost of MULSS ins

Move from 'gcc.target/nvptx/nvptx.exp' into 'target-supports.exp' additions for nvptx target (was: [PATCH] Make 'target-supports.exp' additions for nvptx target generally available)

2024-09-05 Thread Thomas Schwinge

Hi!

On 2024-07-18T13:44:37+0200,  wrote:
> OK to push (once testing completes) the attached
> "Make 'target-supports.exp' additions for nvptx target generally available"?
>
> The idea of this new scheme is that explicit feature/target-specific
> stuff isn't kept in 'gcc/testsuite/lib/target-supports.exp', but instead
> in feature/target-specific 'gcc/testsuite/lib/target-supports-*.exp'
> files.  (..., and hoping that other maintainers also pick up this new
> scheme, and likewise move any feature/target-specific stuff from
> 'gcc/testsuite/lib/target-supports.exp', for example, into new
> 'gcc/testsuite/lib/target-supports-*.exp' files, to un-bloat the former
> one.)

I've not yet had any response to that proposal, so I've for now done it
the standard way, and pushed to trunk branch
commit a121af90fe9244258c8620901dd6fa22537767bb
"Move from 'gcc.target/nvptx/nvptx.exp' into 'target-supports.exp' additions 
for nvptx target",
see attached.


Grüße
 Thomas


From a121af90fe9244258c8620901dd6fa22537767bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Jul 2024 14:40:34 +0200
Subject: [PATCH] Move from 'gcc.target/nvptx/nvptx.exp' into
 'target-supports.exp' additions for nvptx target

	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least)
	(check_effective_target_default_ptx_isa_version_at_least_6_0)
	(check_effective_target_runtime_ptx_isa_version_at_least)
	(check_effective_target_runtime_ptx_alias)
	(add_options_for_ptx_alias): Move...
	* lib/target-supports.exp
	(check_nvptx_default_ptx_isa_version_at_least)
	(check_effective_target_nvptx_default_ptx_isa_version_at_least_6_0)
	(check_nvptx_runtime_ptx_isa_version_at_least)
	(check_effective_target_nvptx_runtime_alias_ptx)
	(add_options_for_nvptx_alias_ptx): ... here.
	* gcc.target/nvptx/alias-1.c: Adjust.
	* gcc.target/nvptx/alias-2.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-4.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
	* gcc.target/nvptx/alias-weak-1.c: Likewise.
	* gcc.target/nvptx/uniform-simt-5.c: Likewise.
	gcc/
	* doc/sourcebuild.texi (Effective-Target Keywords): Document
	'nvptx_default_ptx_isa_version_at_least_6_0',
	'nvptx_runtime_alias_ptx'.
	(Add Options): Document 'nvptx_alias_ptx'.
---
 gcc/doc/sourcebuild.texi  | 14 
 gcc/testsuite/gcc.target/nvptx/alias-1.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-2.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-3.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-4.c  |  4 +-
 .../gcc.target/nvptx/alias-to-alias-1.c   |  2 +-
 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c |  2 +-
 gcc/testsuite/gcc.target/nvptx/nvptx.exp  | 66 -
 .../gcc.target/nvptx/uniform-simt-5.c |  4 +-
 gcc/testsuite/lib/target-supports.exp | 72 +++
 10 files changed, 98 insertions(+), 78 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 7c7094dc5a9..6ba72fd44a2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2424,6 +2424,17 @@ MSP430 target has the small memory model enabled (@code{-msmall}).
 MSP430 target has the large memory model enabled (@code{-mlarge}).
 @end table
 
+@subsubsection nvptx-specific attributes
+
+@table @code
+@item nvptx_default_ptx_isa_version_at_least_6_0
+nvptx code by default compiles for at least PTX ISA version 6.0.
+
+@item nvptx_runtime_alias_ptx
+The nvptx runtime environment supports the PTX ISA directive
+@code{.alias}.
+@end table
+
 @subsubsection PowerPC-specific attributes
 
 @table @code
@@ -3302,6 +3313,9 @@ compliance mode.
 @code{mips16} function attributes.
 Only MIPS targets support this feature, and only then in certain modes.
 
+@item nvptx_alias_ptx
+Enable using the PTX ISA directive @code{.alias} on nvptx targets.
+
 @item riscv_a
 Add the 'A' extension to the -march string on RISC-V targets.
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-1.c
index d251eee6e42..1c0642b14d9 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
@@ -1,7 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-do run { target nvptx_runtime_alias_ptx } } */
 /* { dg-options "-save-temps" } */
-/* { dg-add-options ptx_alias } */
+/* { dg-add-options nvptx_alias_ptx } */
 
 int v;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 96cb7e2c1ef..5c4b9c787e1 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -1,7 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-do run { target nvptx_runtime_alias_ptx } } */
 /* { dg-options "-save-temps -O2" } */
-/* { dg-add-options ptx_alias } */
+/* { dg-add-options nvptx_alias_ptx } */
 
 #include "alias-1.c"
 
diff --git a/gc

Fix 'gcc.target/nvptx/alias-2.c' comment (was: [committed][nvptx] Use .alias directive for mptx >= 6.3)

2024-09-05 Thread Thomas Schwinge

Hi!

On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
 wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
> @@ -0,0 +1,27 @@
> +[...]
> +int v;
> +
> +void __f ()
> +{
> +  v = 1;
> +}
> +
> +void f () __attribute__ ((alias ("__f")));
> +
> +int
> +main (void)
> +{
> +  if (v != 0)
> +__builtin_abort ();
> +  f ();
> +  if (v != 1)
> +__builtin_abort ();
> +  return 0;
> +}
> +[...]
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do link } */
> +/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
> +/* { dg-options "-save-temps -malias -mptx=6.3 -O2" } */
> +
> +#include "alias-1.c"
> +
> +/* Inlined, so no alias.  */
> +/* { dg-final { scan-assembler-not "\\.alias.*;" } } */
> +/* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
> +
> +/* Note static and inlined, so still there.  */
> +/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */

Actually: 's%static%extern'.  Pushed to trunk branch
commit 973c1bf51fb0f58fbfe43651bb0a61e1d124b35d
"Fix 'gcc.target/nvptx/alias-2.c' comment", see attached.


Grüße
 Thomas


From 973c1bf51fb0f58fbfe43651bb0a61e1d124b35d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 18 Sep 2023 22:41:56 +0200
Subject: [PATCH] Fix 'gcc.target/nvptx/alias-2.c' comment

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-2.c: Fix comment.
---
 gcc/testsuite/gcc.target/nvptx/alias-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 5c4b9c787e1..7a88b6f4f6f 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -9,6 +9,6 @@
 /* { dg-final { scan-assembler-not "\\.alias.*;" } } */
 /* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
 
-/* Note static and inlined, so still there.  */
+/* Note extern and inlined, so still there.  */
 /* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
 
-- 
2.34.1

Re: [committed][nvptx] Use .alias directive for mptx >= 6.3

2024-09-05 Thread Thomas Schwinge

Hi!

On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
 wrote:
> Starting with ptx isa version 6.3, a ptx directive .alias is available.
> Use this directive to support symbol aliases, as far as possible.

> The alias support has the following [and more] limitations.

> Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
> This is currently not prohibited by the compiler, but with the driver link we
> run into:  "Internal error: alias to unknown symbol" .

Prathamesh in

"[nvptx] Fix code-gen for alias attribute" has proposed a way to make
these work, to a degree, via resolving to 'ultimate_alias_target'.

> Unreferenced aliases are not emitted (these can occur f.i. when inlining a
> call to an alias).  This avoids driver link error "Internal error: reference
> to deleted section".

That is, indeed, (still) necessary, but also problematic: if the
reference (use) of the alias is in a different compilation unit
("there"), we can't detect that "here" when deciding to not emit the
alias that's unused "here", and we then run into an unresolved symbol
"there".  (I've not yet spent further thoughts on this, in the current
GCC/nvptx using PTX '.alias' scenario.)

> At some point we may add support in the nvptx-tools linker for symbol
> aliases, and define f.i. malias=ptx and malias=ld to choose between the two in
> the compiler.

I'm working on that: 
"[nvptx] Need better alias support", via

"[LD] Handle alias in nvptx-ld as nvptx's .alias does not handle it fully".

> [nvptx] Use .alias directive for mptx >= 6.3

> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc

> @@ -968,7 +969,8 @@ static void
>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
> const char *name, const_tree decl)
>  {
> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
> +write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);

This non-emitting of DECL and DEF linker markers for aliases is
problematic, as I'll discuss in the following.

Grüße
 Thomas

Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][nvptx] Use .alias directive for mptx >= 6.3)

2024-09-05 Thread Thomas Schwinge

Hi!

On 2024-09-05T14:36:54+0200, I wrote:
> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
>  wrote:
>> [nvptx] Use .alias directive for mptx >= 6.3
>
>> --- a/gcc/config/nvptx/nvptx.cc
>> +++ b/gcc/config/nvptx/nvptx.cc
>
>> @@ -968,7 +969,8 @@ static void
>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>const char *name, const_tree decl)
>>  {
>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>> +write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>
> This non-emitting of DECL and DEF linker markers for aliases is
> problematic, as I'll discuss in the following.

First, to show what currently is (not) happening, I've pushed to trunk
branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
"Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.


Grüße
 Thomas


From d0f02538494ded78cac12c63f5708a53f5a77bda Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 15:27:51 +0200
Subject: [PATCH] Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning

... in order to demonstrate unexpected behavior (XFAILed here).

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-1.c: Enhance assembler scanning.
	* gcc.target/nvptx/alias-2.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-4.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
---
 gcc/testsuite/gcc.target/nvptx/alias-1.c  | 15 ++---
 gcc/testsuite/gcc.target/nvptx/alias-2.c  | 16 ++
 gcc/testsuite/gcc.target/nvptx/alias-3.c  | 15 ++---
 gcc/testsuite/gcc.target/nvptx/alias-4.c  | 17 ++
 .../gcc.target/nvptx/alias-to-alias-1.c   | 22 ++-
 5 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-1.c
index 1c0642b14d9..0fb06495f67 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
@@ -23,6 +23,15 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "\\.alias f,__f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.visible \\.func f;" 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: f$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: f$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.alias f,__f;$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)\tcall __f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)\tcall f;$} 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 7a88b6f4f6f..8ae8b5cfaed 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -5,10 +5,18 @@
 
 #include "alias-1.c"
 
+/* Note extern and inlined, so still there.  */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f$} 1 } } */
+
 /* Inlined, so no alias.  */
-/* { dg-final { scan-assembler-not "\\.alias.*;" } } */
-/* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
 
-/* Note extern and inlined, so still there.  */
-/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: f$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: f$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^\.alias f,__f;$} 0 } } */
 
+/* { dg-final { scan-assembler-times {(?n)\tcall __f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)\tcall f;$} 0 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-3.c b/gcc/testsuite/gcc.target/nvptx/alias-3.c
index b55ff26269e..1906607f95f 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-3.c
@@ -25,6 +25,15 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "\\.alias f,__f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.func __f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.func f;" 1 } } */
+/* { dg-final { scan-assembler-t

Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C' (was: Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][nvptx] Use .alias directive for mptx >= 6.3))

2024-09-05 Thread Thomas Schwinge

Hi!

On 2024-09-05T14:39:46+0200, I wrote:
> On 2024-09-05T14:36:54+0200, I wrote:
>> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches wrote:
>>> [nvptx] Use .alias directive for mptx >= 6.3
>>
>>> --- a/gcc/config/nvptx/nvptx.cc
>>> +++ b/gcc/config/nvptx/nvptx.cc
>>
>>> @@ -968,7 +969,8 @@ static void
>>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>>   const char *name, const_tree decl)
>>>  {
>>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>>> +    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>
>> This non-emitting of DECL and DEF linker markers for aliases is
>> problematic, as I'll discuss in the following.
>
> First, to show what currently is (not) happening, I've pushed to trunk
> branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
> "Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.

Then, commit a1865fd33897bc6c6e0109df0a12ee73ce386315
"Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'", see attached, as
one representative example of C++ code where the current behavior is an
actual problem.


Grüße
 Thomas


From a1865fd33897bc6c6e0109df0a12ee73ce386315 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 18:02:50 +0200
Subject: [PATCH] Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'

... as one minimized example for the issue that with nvptx '-malias' enabled
(as implemented in commit f8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3"), there are hundreds of
instances of link-time 'unresolved symbol [alias]' across the C++ test suite,
which are regressions compared to a test run with (default) '-mno-alias'.

	PR target/104957
	gcc/testsuite/
	* g++.target/nvptx/alias-g++.dg_init_dtor2-1.C: Add.
---
 .../nvptx/alias-g++.dg_init_dtor2-1.C | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C

diff --git a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
new file mode 100644
index 000..747656d51d6
--- /dev/null
+++ b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
@@ -0,0 +1,33 @@
+/* Reduced from 'g++.dg/init/dtor2.C'.  */
+
+/* { dg-do compile } */
+/* { dg-add-options nvptx_alias_ptx } */
+/* { dg-additional-options -save-temps } */
+/* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
+
+struct B
+{
+  ~B();
+};
+
+B::~B () {
+}
+
+int main()
+{
+  B b;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: _ZN1BD2Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD2Ev \(\.param\.u64 %in_ar0\);$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: _ZN1BD2Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD2Ev \(\.param\.u64 %in_ar0\)$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: _ZN1BD1Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD1Ev \(\.param\.u64 %in_ar0\);$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: _ZN1BD1Ev$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.alias _ZN1BD1Ev,_ZN1BD2Ev;$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)\tcall _ZN1BD1Ev, \(} 1 } }
+   { dg-final { scan-assembler-times {(?n)\tcall _ZN1BD2Ev, \(} 0 } } */
-- 
2.34.1

nvptx: Emit DECL and DEF linker markers for aliases [PR104957] (was: Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C' (was: Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][

2024-09-05 Thread Thomas Schwinge

Hi!

On 2024-09-05T14:42:00+0200, I wrote:
> On 2024-09-05T14:39:46+0200, I wrote:
>> On 2024-09-05T14:36:54+0200, I wrote:
>>> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches wrote:
>>>> [nvptx] Use .alias directive for mptx >= 6.3
>>>
>>>> --- a/gcc/config/nvptx/nvptx.cc
>>>> +++ b/gcc/config/nvptx/nvptx.cc
>>>
>>>> @@ -968,7 +969,8 @@ static void
>>>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>>>   const char *name, const_tree decl)
>>>>  {
>>>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>>>> +    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>>
>>> This non-emitting of DECL and DEF linker markers for aliases is
>>> problematic, as I'll discuss in the following.
>>
>> First, to show what currently is (not) happening, I've pushed to trunk
>> branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
>> "Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.
>
> Then, commit a1865fd33897bc6c6e0109df0a12ee73ce386315
> "Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'", see attached, as
> one representative example of C++ code where the current behavior is an
> actual problem.

Finally, commit 8f5aade15e595b288a2c4ec60ddde8dc80df1a80
"nvptx: Emit DECL and DEF linker markers for aliases [PR104957]", see
attached, to address this issue.
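
As background for the commits discussed in this thread, here is a minimal sketch of what an explicit function alias looks like at the source level. The names 'impl_f', 'f' and 'calls' are hypothetical and not taken from the test cases, and the sketch assumes GCC or a compatible compiler on a target that supports the 'alias' attribute (for example an ELF target). With nvptx '-malias', such an alias is emitted as a '.alias' directive instead of a duplicated function.

```cpp
#include <cassert>

// Counts how often the shared function body runs.
static int calls = 0;

// The real definition.  'extern "C"' keeps the symbol name unmangled so
// that the alias below can refer to it by name.
extern "C" void impl_f (void) { ++calls; }

// A second symbol 'f' that refers to the same function body as 'impl_f'
// (hypothetical names, for illustration only).
extern "C" void f (void) __attribute__ ((alias ("impl_f")));
```

Calling either 'f' or 'impl_f' runs the same body, which is why the consumer of the assembly needs a definition marker for both names.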


Grüße
 Thomas


From 8f5aade15e595b288a2c4ec60ddde8dc80df1a80 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 23:56:25 +0200
Subject: [PATCH] nvptx: Emit DECL and DEF linker markers for aliases
 [PR104957]

With nvptx '-malias' enabled (as implemented in
commit f8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3"), the C++ front end in certain
cases does 'write_fn_proto' before an eventual 'alias' attribute has been
added.  In that case, we do emit (via 'write_fn_marker') a DECL linker marker,
but then never emit a corresponding DEF linker marker for the alias.  This
causes hundreds of instances of link-time 'unresolved symbol [alias]' across
the C++ test suite, which are regressions compared to a test run with (default)
'-mno-alias' (in which case the respective functions get duplicated).

	PR target/104957
	gcc/
	* config/nvptx/nvptx.cc (write_fn_proto_1): Revert 2022-03-22
	change; 'write_fn_marker' also for alias DECL.
	(nvptx_asm_output_def_from_decls): 'write_fn_marker' for alias
	DEF.
	gcc/testsuite/
	* g++.target/nvptx/alias-g++.dg_init_dtor2-1.C: Un-XFAIL.
	* gcc.target/nvptx/alias-1.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
---
 gcc/config/nvptx/nvptx.cc | 6 --
 .../g++.target/nvptx/alias-g++.dg_init_dtor2-1.C  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-1.c  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-3.c  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c | 8 
 5 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 144b8d0c874..4a7c64f05eb 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -997,8 +997,7 @@ static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
 		  const char *name, const_tree decl, bool force_public)
 {
-  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-    write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
+  write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
@@ -7627,6 +7626,9 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
   fputs (s.str ().c_str (), stream);
 
   tree id = DECL_ASSEMBLER_NAME (name);
+  std::stringstream s_def;
+  write_fn_marker (s_def, true, TREE_PUBLIC (name), IDENTIFIER_POINTER (id));
+  fputs (s_def.str ().c_str (), stream);
   NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
 			IDENTIFIER_POINTER (value));
 }
diff --git a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
index 747656d51d6..a30f99af308 100644
--- a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
+++ b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
@@ -1,6 +1,6 @@
 /* Reduced from 'g++.dg/init/dtor2.C'.  */
 
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-add-options nvptx_alias_ptx } */
 /* { dg-additional-options -save-temps } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
@@ -26,7 +26,7 @@ int main()
 
 /* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: _ZN1BD1Ev$} 1 } }
    { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD1Ev \(\.param\.u64 %in_ar0\);$} 1 } }
-   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTIO

c++: Add nodiscard attribute further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

Fairly non-problematic attribute, again on top of the whole series.

Tested on x86_64-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-nodiscard1.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-nodiscard1.C.jj 2024-09-05 13:11:26.914049570 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-nodiscard1.C 2024-09-05 13:38:05.456626161 +0200
@@ -0,0 +1,155 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  struct [[nodiscard]] S1 {};
+  struct [[nodiscard ("foobar")]] S2 {};
+  struct [[nodiscard (0)]] S3 {};  // { dg-error "'nodiscard' attribute argument must be a string constant" }
+  struct [[nodiscard ("foo", "bar", "baz")]] S4 {}; // { dg-error "wrong number of arguments specified for 'nodiscard' attribute" }
+  struct [[nodiscard (0, 1, 2)]] S5 {}; // { dg-error "wrong number of arguments specified for 'nodiscard' attribute" }
+
+  auto a = [] [[nodiscard]] () {};
+  auto b = [] constexpr [[nodiscard]] {};  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[nodiscard]] {};   // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[nodiscard]] {}; // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  auto e = new int [n] [[nodiscard]];  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[nodiscard]] [42]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[nodiscard]];  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  [[nodiscard]];   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[nodiscard]] {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[nodiscard]] if (true) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[nodiscard]] while (false) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[nodiscard]] goto lab;  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[nodiscard]] lab:;  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  [[nodiscard]] try {} catch (int) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  if ([[nodiscard]] int x = 0) {}  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  switch (n)
+    {
+    [[nodiscard]] case 1:  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+    [[nodiscard]] break;   // { dg-warning "attributes at the beginning of statement are ignored" }
+    [[nodiscard]] default: // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+    break;
+    }
+  for ([[nodiscard]] auto a : arr) {}  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  for ([[nodiscard]] auto [a, b] : arr2) {} // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[nodiscard]] asm ("");  // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[nodiscard]] int x) {} // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  try {} catch ([[nodiscard]] int) {}  // { dg-warning "'nodiscard' attribute can only be applied to functions or to class or enumeration types" }
+  try {} catch (int [[nodiscard]] x) {} // { dg-warning "attribute ignored" }
+  try {} cat

c++: Add noreturn attribute further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

Another non-problematic attribute.

Tested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-noreturn1.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-noreturn1.C.jj 2024-09-05 13:45:58.193567109 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-noreturn1.C 2024-09-05 13:57:50.457348169 +0200
@@ -0,0 +1,160 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+[[noreturn]] void foo1 ();
+[[noreturn ("foobar")]] void foo2 ();  // { dg-error "'noreturn' attribute does not take any arguments" }
+[[noreturn (0)]] void foo3 (); // { dg-error "'noreturn' attribute does not take any arguments" }
+
+void
+foo (int n)
+{
+  auto a = [] [[noreturn]] () { do { } while (true); };
+  auto b = [] constexpr [[noreturn]] {};   // { dg-warning "'noreturn' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[noreturn]] {}; // { dg-warning "'noreturn' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[noreturn]] {};  // { dg-warning "'noreturn' attribute does not apply to types" }
+  auto e = new int [n] [[noreturn]];   // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[noreturn]] [42]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[noreturn]];   // { dg-warning "'noreturn' attribute does not apply to types" }
+  [[noreturn]]; // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[noreturn]] {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[noreturn]] if (true) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[noreturn]] while (false) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[noreturn]] goto lab;   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[noreturn]] lab:;   // { dg-warning "'noreturn' attribute ignored" }
+  [[noreturn]] try {} catch (int) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  if ([[noreturn]] int x = 0) {}   // { dg-warning "'noreturn' attribute ignored" }
+  switch (n)
+    {
+    [[noreturn]] case 1:   // { dg-warning "'noreturn' attribute ignored" }
+    [[noreturn]] break; // { dg-warning "attributes at the beginning of statement are ignored" }
+    [[noreturn]] default:  // { dg-warning "'noreturn' attribute ignored" }
+    break;
+    }
+  for ([[noreturn]] auto a : arr) {}   // { dg-warning "'noreturn' attribute ignored" }
+  for ([[noreturn]] auto [a, b] : arr2) {} // { dg-warning "'noreturn' attribute ignored" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[noreturn]] asm ("");   // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[noreturn]] int x) {} // { dg-warning "'noreturn' attribute ignored" }
+  try {} catch ([[noreturn]] int) {}   // { dg-warning "'noreturn' attribute ignored" }
+  try {} catch (int [[noreturn]] x) {} // { dg-warning "attribute ignored" }
+  try {} catch (int [[noreturn]]) {}   // { dg-warning "attribute ignored" }
+  try {} catch (int x [[noreturn]]) {} // { dg-warning "'noreturn' attribute ignored" }
+}
+
+[[noreturn]] int bar ();
+using foobar [[noreturn]] = int;   // { dg-warning "'noreturn' attribute ignored" }
+[[noreturn]] int a; // { dg-warning "'noreturn' attribute ignored" }
+[[noreturn]] auto [b, c] = arr; // { dg-warning "'noreturn' attribute ignored" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+[[noreturn]];  // { dg-warning "attribute ignored" }
+inline [[noreturn]] void baz () {} // { dg-warning "attribute ignored" }
+

c++: Add no_unique_address attribute further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

Another non-problematic attribute.

Tested on x86_64-linux and i686-linux, ok for trunk?

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/attr-no_unique_address1.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/attr-no_unique_address1.C.jj 2024-09-05 14:01:00.396886959 +0200
+++ gcc/testsuite/g++.dg/cpp0x/attr-no_unique_address1.C 2024-09-05 14:11:04.710883438 +0200
@@ -0,0 +1,151 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+struct S2 {
+  [[no_unique_address]] struct {} a;
+  [[no_unique_address ("foobar")]] struct {} b; // { dg-error "'no_unique_address' attribute does not take any arguments" }
+  [[no_unique_address (0)]] struct {} c;   // { dg-error "'no_unique_address' attribute does not take any arguments" }
+  struct {} d [[no_unique_address]];
+};
+
+void
+foo (int n)
+{
+  auto a = [] [[no_unique_address]] () { }; // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  auto b = [] constexpr [[no_unique_address]] {};  // { dg-warning "'no_unique_address' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept [[no_unique_address]] {};   // { dg-warning "'no_unique_address' attribute does not apply to types" }
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () [[no_unique_address]] {}; // { dg-warning "'no_unique_address' attribute does not apply to types" }
+  auto e = new int [n] [[no_unique_address]];  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] [[no_unique_address]] [42]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] [[no_unique_address]];  // { dg-warning "'no_unique_address' attribute does not apply to types" }
+  [[no_unique_address]];   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[no_unique_address]] {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[no_unique_address]] if (true) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[no_unique_address]] while (false) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[no_unique_address]] goto lab;  // { dg-warning "attributes at the beginning of statement are ignored" }
+  [[no_unique_address]] lab:;  // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  [[no_unique_address]] try {} catch (int) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  if ([[no_unique_address]] int x = 0) {}  // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  switch (n)
+    {
+    [[no_unique_address]] case 1:  // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+    [[no_unique_address]] break;   // { dg-warning "attributes at the beginning of statement are ignored" }
+    [[no_unique_address]] default: // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+    break;
+    }
+  for ([[no_unique_address]] auto a : arr) {}  // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  for ([[no_unique_address]] auto [a, b] : arr2) {} // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+   // { dg-error "structured bindings only available with" "" { target c++14_down } .-1 }
+  [[no_unique_address]] asm ("");  // { dg-warning "attributes ignored on 'asm' declaration" }
+  try {} catch ([[no_unique_address]] int x) {} // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  try {} catch ([[no_unique_address]] int) {}  // { dg-warning "'no_unique_address' attribute can only be applied to non-static data members" }
+  try {} catch (int [[no_unique_address]] x) {} // { dg-warning "attribute ignored" }
+

[RFC PATCH] c++: Add alignas further test coverage [PR110345]

2024-09-05 Thread Jakub Jelinek

Hi!

I've tried to do the same thing I did for normal standard attributes
also for alignas, but there are way too many cases which are silently
accepted, although my reading of the following wording is that they
should be diagnosed:

"An alignment-specifier may be applied to a variable or to a class data member,
but it shall not be applied to a bit-field, a function parameter, or an
exception-declaration ([except.handle]).
An alignment-specifier may also be applied to the declaration of a class (in
an elaborated-type-specifier ([dcl.type.elab]) or class-head ([class]),
respectively)."

I've marked the spots where I'd expect some pedwarn with // FIXME.
Clearly we accept it e.g. on bit-fields, exception-declarations, enum
declarations, functions, on e.g. array/reference etc. types, ...

Is some of this intentional?

Though, trying clang trunk, it diagnoses all the // FIXME lines.
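
To make the quoted wording concrete, here is a small sketch of the placements it does allow: a class-head, a class data member and a variable. The names 'Wide', 'Holder' and 'buffer' are made up for illustration and are not part of the new test; the static_asserts only check the resulting alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// On a class-head: every object of type Wide is 16-byte aligned.
struct alignas (16) Wide { char c; };

// On a class data member: this also raises the alignment of Holder.
struct Holder
{
  char tag;
  alignas (8) char payload[3];
};

// On a variable.
alignas (32) char buffer[64];

static_assert (alignof (Wide) == 16, "alignas on class-head");
static_assert (alignof (Holder) >= 8, "alignas on data member");
static_assert (offsetof (Holder, payload) % 8 == 0, "member offset is aligned");
```

Everything else the test marks with // FIXME (bit-fields, function parameters, exception-declarations and so on) falls outside these three placements.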

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/alignas21.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/alignas21.C.jj 2024-09-05 14:16:44.366395041 +0200
+++ gcc/testsuite/g++.dg/cpp0x/alignas21.C 2024-09-05 14:42:42.690465771 +0200
@@ -0,0 +1,156 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  alignas (int) int x1;
+  alignas ("foobar") int x2;   // { dg-error "'alignas' argument has non-integral type 'const char \\\[7\\\]'" }
+  alignas (0) int x3;  // { dg-warning "requested alignment '0' is not a positive power of 2" }
+  alignas ("foo", "bar", "baz") int x4; // { dg-error "'alignas' argument has non-integral type 'const char \\\[4\\\]'" }
+   // { dg-error "expected '\\\)' before ',' token" "" { target *-*-* } .-1 }
+   // { dg-error "expected declaration before ',' token" "" { target *-*-* } .-2 }
+   // { dg-error "expected primary-expression before ',' token" "" { target *-*-* } .-3 }
+  alignas (0, 1, 2) int x5; // { dg-error "expected '\\\)' before ',' token" }
+   // { dg-error "expected declaration before ',' token" "" { target *-*-* } .-1 }
+   // { dg-error "expected primary-expression before ',' token" "" { target *-*-* } .-2 }
+
+  auto a = [] alignas (int) () {}; // FIXME
+  auto b = [] constexpr alignas (int) {};  // FIXME
+   // { dg-error "parameter declaration before lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept alignas (int) {};   // FIXME
+   // { dg-error "parameter declaration before lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () alignas (int) {}; // FIXME
+  auto e = new int [n] alignas (int);  // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto e2 = new int [n] alignas (int) [42]; // { dg-warning "attributes ignored on outermost array type in new expression" }
+  auto f = new int [n][42] alignas (int);  // FIXME
+  alignas (int);   // { dg-warning "attributes at the beginning of statement are ignored" }
+  alignas (int) {} // { dg-warning "attributes at the beginning of statement are ignored" }
+  alignas (int) if (true) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  alignas (int) while (false) {}   // { dg-warning "attributes at the beginning of statement are ignored" }
+  alignas (int) goto lab;  // { dg-warning "attributes at the beginning of statement are ignored" }
+  alignas (int) lab:;  // { dg-error "alignment may not be specified for 'lab'" }
+  alignas (int) try {} catch (int) {}  // { dg-warning "attributes at the beginning of statement are ignored" }
+  if (alignas (int) int x = 0) {}
+  switch (n)
+    {
+    alignas (int) case 1:  // { dg-error "alignment may not be specified for" }
+    alignas (int) break;   // { dg-warning "attributes at the beginning of statement are ignored" }
+    alignas (int) default: // { dg-error "alignment may not be specified for" }
+    break;
+    }
+  for (alignas (int) auto a : arr) {}
+  for (alignas (int) auto [a, b] : arr2) {} // { dg-error "structured bindings only available with" "" { target c++14_down } }
+  alignas (int) asm ("");  // { dg-warning "attributes 
i

RE: [gimplify.cc] Avoid ICE when passing VLA vector to accelerator

2024-09-05 Thread Prathamesh Kulkarni



> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, September 3, 2024 2:09 PM
> To: Prathamesh Kulkarni 
> Cc: gcc-patches@gcc.gnu.org
> Subject: RE: [gimplify.cc] Avoid ICE when passing VLA vector to
> accelerator
> 
> 
> On Tue, 3 Sep 2024, Prathamesh Kulkarni wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, September 2, 2024 12:47 PM
> > > To: Prathamesh Kulkarni 
> > > Cc: gcc-patches@gcc.gnu.org
> > > Subject: Re: [gimplify.cc] Avoid ICE when passing VLA vector to
> > > accelerator
> > >
> > >
> > > On Sun, 1 Sep 2024, Prathamesh Kulkarni wrote:
> > >
> > > > Hi,
> > > > For the following test:
> > > > #include 
> > > >
> > > > int main()
> > > > {
> > > >   svint32_t x;
> > > >   #pragma omp target map(x)
> > > > x;
> > > >   return 0;
> > > > }
> > > >
> > > > compiling with -fopenmp -foffload=nvptx-none results in following ICE:
> > > >
> > > > t_sve.c: In function 'main':
> > > > t_sve.c:6:11: internal compiler error: Segmentation fault
> > > > 6 |   #pragma omp target map(x)
> > > >   |   ^~~
> > > > 0x228ed13 internal_error(char const*, ...)
> > > > ../../gcc/gcc/diagnostic-global-context.cc:491
> > > > 0xfcf68f crash_signal
> > > > ../../gcc/gcc/toplev.cc:321
> > > > 0xc17d9c omp_add_variable
> > > > ../../gcc/gcc/gimplify.cc:7811
> > >
> > > that's not on trunk head?  Anyway, I think that instead
> > >
> > >   /* When adding a variable-sized variable, we have to handle all sorts
> > >      of additional bits of data: the pointer replacement variable, and
> > >      the parameters of the type.  */
> > >   if (DECL_SIZE (decl) && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
> > >
> > > should instead be checking for !POLY_INT_CST_P (DECL_SIZE (decl))
> > Hi Richard,
> > Thanks for the suggestions. The attached patch adds a !POLY_INT_CST_P
> > check in omp_add_variable (and a few more places where it segfaulted),
> > but keeps the TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST check to avoid
> > the above ICE with the -msve-vector-bits= option.
> >
> > The test now fails with:
> > lto1: fatal error: degree of 'poly_int' exceeds 'NUM_POLY_INT_COEFFS' (1)
> > compilation terminated.
> > nvptx mkoffload: fatal error:
> > ../install/bin/aarch64-unknown-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
> > compilation terminated.
> >
> > Which looks reasonable IMO, since we don't yet fully support streaming
> > of poly_ints (and it compiles OK when the length is set with the
> > -msve-vector-bits= option).
> > Bootstrap+test in progress on aarch64-linux-gnu.
> > Does the patch look OK ?
> 
> Please use !poly_int_tree_p which checks for both INTEGER_CST and
> POLY_INT_CST_P.
> 
> OK with that change.
Thanks, I have committed the attached patch in:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ae88e91938af364ef5613e5461b12b484b578bc5

after verifying it passes bootstrap+test on aarch64-linux-gnu and survives
libgomp tests for AArch64/nvptx offloading.
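
For reference, the difference between the size checks discussed in this thread can be modelled outside of GCC. The following is a toy sketch only: 'toy_tree', 'toy_code' and the predicate functions are hypothetical stand-ins and not GCC's real tree API. The only fact taken from the review is that poly_int_tree_p accepts both INTEGER_CST and POLY_INT_CST nodes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the kinds of node a DECL_SIZE can be.
enum toy_code { TOY_INTEGER_CST, TOY_POLY_INT_CST, TOY_OTHER_EXPR };

struct toy_tree { toy_code code; };

// Original check: anything that is not an INTEGER_CST is treated as a
// variable-sized (VLA-like) declaration.
bool variable_sized_orig (const toy_tree &size)
{
  return size.code != TOY_INTEGER_CST;
}

// Model of poly_int_tree_p as described in the review: true for both
// INTEGER_CST and POLY_INT_CST.
bool toy_poly_int_tree_p (const toy_tree &size)
{
  return size.code == TOY_INTEGER_CST || size.code == TOY_POLY_INT_CST;
}

// Reviewed check: only sizes that are neither kind of constant are
// treated as variable-sized.
bool variable_sized_new (const toy_tree &size)
{
  return !toy_poly_int_tree_p (size);
}
```

In this model a scalable-vector size (TOY_POLY_INT_CST) is variable-sized under the original check but not under the reviewed one, while ordinary constants and genuinely variable sizes are classified the same way by both.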

Thanks,
Prathamesh
> 
> Richard.
> 
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > >
> > > > 0xc17d9c omp_add_variable
> > > > ../../gcc/gcc/gimplify.cc:7752
> > > > 0xc4176b gimplify_scan_omp_clauses
> > > > ../../gcc/gcc/gimplify.cc:12881
> > > > 0xc46d53 gimplify_omp_workshare
> > > > ../../gcc/gcc/gimplify.cc:17139
> > > > 0xc23383 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimplify.cc:18668
> > > > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > > > ../../gcc/gcc/gimplify.cc:7646
> > > > 0xc24ef7 gimplify_statement_list
> > > > ../../gcc/gcc/gimplify.cc:2250
> > > > 0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimplify.cc:18565
> > > > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > > > ../../gcc/gcc/gimplify.cc:7646
> > > > 0xc289d3 gimplify_bind_expr
> > > > ../../gcc/gcc/gimplify.cc:1642
> > > > 0xc24b9b gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimplify.cc:18315
> > > > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > > > ../../gcc/gcc/gimplify.cc:7646
> > > > 0xc24ef7 gimplify_statement_list
> > > > ../../gcc/gcc/gimplify.cc:2250
> > > > 0xc24ef7 gimplify_expr(tree_node**, gimple**, gimple**, bool (*)(tree_node*), int)
> > > > ../../gcc/gcc/gimplify.cc:18565
> > > > 0xc27f53 gimplify_stmt(tree_node**, gimple**)
> > > > ../../gcc/gcc/gimplify.cc:7646
> > > > 0xc2aadb gimplify_body(tree_node*, bool)
> > > > ../../gcc/gcc/gimplify.cc:19393
> > > > 0xc2b05f gimplify_function_tree(tree_node*)
> > > > ../../gcc/gcc/gimplify.cc:19594
> > > > 0xa0e4

[PATCH 1/3] tree-optimization/116609 - SLP live lane vectorization with partial vectors

2024-09-05 Thread Richard Biener

The following implements the simple case of single-lane SLP when
using partial vectors which can use the VEC_EXTRACT_LAST code
generation without changes.  I'll keep the PR open for further
enhancements.

This avoids FAILs of gcc.target/aarch64/sve/live_1.c when using
single-lane SLP for non-grouped stores.
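
As a rough illustration of what a VEC_EXTRACT_LAST-style operation
computes for the live-out value in a partially masked final iteration,
here is a minimal sketch.  It is a toy model, not GCC code: the function
name extract_last_active and the use of std::vector for the vector and
the loop mask are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (hypothetical, not GCC code): return the element of VEC in
// the last lane whose MASK entry is set, or nothing if no lane is active.
std::optional<int>
extract_last_active (const std::vector<int> &vec,
                     const std::vector<bool> &mask)
{
  std::optional<int> result;
  for (std::size_t i = 0; i < vec.size () && i < mask.size (); ++i)
    if (mask[i])
      result = vec[i];   // a later active lane overrides an earlier one
  return result;
}
```

With a single SLP lane each vector element corresponds to one scalar
iteration, which is presumably why the existing code generation can be
reused unchanged in that case.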

PR tree-optimization/116609
* tree-vect-loop.cc (vectorizable_live_operation_1): Support
partial vectors for single-lane SLP.
---
 gcc/tree-vect-loop.cc | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 242d5e2d916..31cdc4bf53d 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10961,7 +10961,8 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 
 where VEC_LHS is the vectorized live-out result and MASK is
 the loop mask for the final iteration.  */
-  gcc_assert (ncopies == 1 && !slp_node);
+  gcc_assert (ncopies == 1
+ && (!slp_node || SLP_TREE_LANES (slp_node) == 1));
   gimple_seq tem = NULL;
   gimple_stmt_iterator gsi = gsi_last (tem);
   tree len = vect_get_loop_len (loop_vinfo, &gsi,
@@ -10995,7 +10996,7 @@ vectorizable_live_operation_1 (loop_vec_info loop_vinfo,
 
 where VEC_LHS is the vectorized live-out result and MASK is
 the loop mask for the final iteration.  */
-  gcc_assert (!slp_node);
+  gcc_assert (!slp_node || SLP_TREE_LANES (slp_node) == 1);
   tree scalar_type = TREE_TYPE (STMT_VINFO_VECTYPE (stmt_info));
   gimple_seq tem = NULL;
   gimple_stmt_iterator gsi = gsi_last (tem);
@@ -11147,7 +11148,7 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
   /* No transformation required.  */
   if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
{
- if (slp_node)
+ if (slp_node && SLP_TREE_LANES (slp_node) != 1)
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -11156,7 +11157,8 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
 "the loop.\n");
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
- else if (ncopies > 1)
+ else if (ncopies > 1
+  || (slp_node && SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) > 1))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -11166,7 +11168,8 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
}
  else
{
- gcc_assert (ncopies == 1 && !slp_node);
+ gcc_assert (ncopies == 1
+ && (!slp_node || SLP_TREE_LANES (slp_node) == 1));
  if (direct_internal_fn_supported_p (IFN_EXTRACT_LAST, vectype,
  OPTIMIZE_FOR_SPEED))
vect_record_loop_mask (loop_vinfo,
@@ -11213,8 +11216,9 @@ vectorizable_live_operation (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (slp_node)
 {
   gcc_assert (!loop_vinfo
- || (!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
- && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo)));
+ || ((!LOOP_VINFO_FULLY_MASKED_P (loop_vinfo)
+  && !LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+ || SLP_TREE_LANES (slp_node) == 1));
 
   /* Get the correct slp vectorized stmt.  */
   vec_lhs = SLP_TREE_VEC_DEFS (slp_node)[vec_entry];
-- 
2.43.0

[PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-09-05 Thread Richard Biener

The following enables single-lane loop SLP discovery for non-grouped stores
and adjusts vectorizable_store to properly handle those.

For gfortran.dg/vect/vect-8.f90 we vectorize one additional loop,
not running into the "not falling back to strided accesses" bail-out.
I have not investigated in detail.

There is a set of i386 target assembler test FAILs,
gcc.target/i386/pr88531-2[bc].c in particular fail because the
target cannot identify SLP emulated gathers, see another mail from me.
Others need adjustment, I've adjusted one with this patch only.
In particular there are gcc.target/i386/cond_op_fma_*-1.c FAILs
that are because we no longer fold a VEC_COND_EXPR during the
region value-numbering we do after vectorization since we
code-generate a { 0.0, ... } constant in the VEC_COND_EXPR now
instead of having a separate statement which gets forwarded
and then triggers folding.  This leads to slightly different
code generation.  The solution is probably to use gimple_build
when building stmts or, in this case, directly emit .COND_FMA
instead of .FMA and a VEC_COND_EXPR.

gcc.dg/vect/slp-19a.c mixes contiguous 8-lane SLP with a single
lane contiguous store from one lane of the 8-lane load and we
expect to use load-lanes for this reason but the heuristic for
forcing single-lane rediscovery as implemented doesn't trigger
here as it treats both SLP instances separately.  FAILs on RISC-V.

gcc.dg/vect/slp-19c.c shows we fail to implement an interleaving
scheme for group_size 12 (by extension using the group_size 3
scheme to reduce to 4 lanes and then continue with a pow2 scheme
would work);  we are also not considering load-lanes because of
the above reason, but aarch64 cannot do ld12.  FAILs on AARCH64
(load requires three vectors) and x86_64.

gcc.dg/vect/slp-19c.c FAILs with variable-length vectors because
of "SLP induction not supported for variable-length vectors".

gcc.target/aarch64/pr110449.c will FAIL because the (contested)
optimization in r14-2367-g224fd59b2dc8a5 was only applied to
loop-vect but not SLP vect.  I'll leave it to target maintainers
to either XFAIL (the optimization is bad) or remove the test.

* tree-vect-slp.cc (vect_analyze_slp): Perform single-lane
loop SLP discovery for non-grouped stores.  Move check on the root
for re-doing SLP analysis with a single lane for load/store-lanes
earlier and make sure we are dealing with a grouped access.
* tree-vect-stmts.cc (vectorizable_store): Always set
vec_num for SLP.

* gcc.dg/vect/O3-pr39675-2.c: Adjust expected number of SLP.
* gcc.dg/vect/fast-math-vect-call-1.c: Likewise.
* gcc.dg/vect/no-scevccp-slp-31.c: Likewise.
* gcc.dg/vect/slp-12b.c: Likewise.
* gcc.dg/vect/slp-12c.c: Likewise.
* gcc.dg/vect/slp-19a.c: Likewise.
* gcc.dg/vect/slp-19b.c: Likewise.
* gcc.dg/vect/slp-4-big-array.c: Likewise.
* gcc.dg/vect/slp-4.c: Likewise.
* gcc.dg/vect/slp-5.c: Likewise.
* gcc.dg/vect/slp-7.c: Likewise.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-37.c: Likewise.
* gcc.dg/vect/slp-26.c: RISC-V can now SLP two instances.
* gcc.dg/vect/vect-outer-slp-3.c: Disable vectorization of
initialization loop.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/no-scevccp-outer-12.c: Un-XFAIL.  SLP can handle
inner loop inductions with multiple vector stmt copies.
* gfortran.dg/vect/vect-8.f90: Adjust expected number of
vectorized loops.
* gcc.target/i386/vectorize1.c: Adjust what we scan for.
---
 gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c  |  2 +-
 .../gcc.dg/vect/fast-math-vect-call-1.c   |  2 +-
 .../gcc.dg/vect/fast-math-vect-call-2.c   |  2 +-
 .../gcc.dg/vect/no-scevccp-outer-12.c |  3 +-
 gcc/testsuite/gcc.dg/vect/no-scevccp-slp-31.c |  5 ++-
 gcc/testsuite/gcc.dg/vect/slp-12b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-12c.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19a.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-19b.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-26.c|  3 +-
 gcc/testsuite/gcc.dg/vect/slp-37.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4-big-array.c   |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-4.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-5.c |  2 +-
 gcc/testsuite/gcc.dg/vect/slp-7.c |  4 +-
 gcc/testsuite/gcc.dg/vect/slp-perm-7.c|  2 +-
 gcc/testsuite/gcc.dg/vect/slp-reduc-5.c   |  3 +-
 gcc/testsuite/gcc.dg/vect/vect-outer-slp-3.c  |  1 +
 gcc/testsuite/gcc.target/i386/vectorize1.c|  4 +-
 gcc/testsuite/gfortran.dg/vect/vect-8.f90 |  2 +-
 gcc/tree-vect-slp.cc  | 45 +++
 gcc/tree-vect-stmts.cc| 11 +++--
 22 files changed, 69 insertions(+), 36 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/O3-pr39675-2.c 
b/gcc/testsuite/gcc.dg

[PATCH 2/3] tree-optimization/116610 - wrong SLP induction bias for mask peeling

2024-09-05 Thread Richard Biener

The following fixes a mistake when applying the bias for peeling via
masking to the initial value of SLP inductions.

This resolves gcc.target/aarch64/sve/peel_ind_1.c (a scan-assembler
only unfortunately) when forcing single-lane SLP for it.
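
To make the sign of the adjustment concrete, here is a minimal sketch of
the initial induction vector for a single-lane induction when the first
SKIP lanes are masked off.  It is a simplified scalar model, not the
vectorizer code: the function name initial_induction_vector and its
parameters are hypothetical.

```cpp
#include <vector>

// Toy model (hypothetical, not GCC code).  Lane I of the first vector
// iteration corresponds to scalar iteration (I - SKIP), because the
// first SKIP lanes are masked off.  Lane SKIP must therefore hold INIT,
// so the per-lane multiplier is (I - SKIP), not (I + SKIP).
std::vector<long>
initial_induction_vector (long init, long step, int nunits, int skip)
{
  std::vector<long> lanes (nunits);
  for (int i = 0; i < nunits; ++i)
    lanes[i] = init + (static_cast<long> (i) - skip) * step;
  return lanes;
}
```

For example, with init 100, step 3, four lanes and skip 2, lane 2 holds
100 and lane 3 holds 103; adding the bias instead would put 106 in
lane 2.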

PR tree-optimization/116610
* tree-vect-loop.cc (vectorizable_induction): Use MINUS_EXPR
to apply a mask peeling adjustment.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 31cdc4bf53d..a879a13bbf0 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10543,7 +10543,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
  vec_steps.safe_push (vec_step);
  tree step_mul = gimple_build_vector (&init_stmts, &mul_elts);
  if (peel_mul)
-   step_mul = gimple_build (&init_stmts, PLUS_EXPR, step_vectype,
+   step_mul = gimple_build (&init_stmts, MINUS_EXPR, step_vectype,
 step_mul, peel_mul);
  if (!init_node)
vec_init = gimple_build_vector (&init_stmts, &init_elts);
-- 
2.43.0

Re: PING: [PATCH] ipa: Don't disable function parameter analysis for fat LTO streaming

2024-09-05 Thread Jan Hubicka

> On Tue, Sep 3, 2024 at 4:00 AM Jan Hubicka  wrote:
> >
> > > > >
> > > > > PR ipa/116410
> > > > > * ipa-modref.cc (analyze_parms): Always analyze function 
> > > > > parameter
> > > > > for LTO streaming.
> > > > >
> > > > > Signed-off-by: H.J. Lu 
> > > > > ---
> > > > >  gcc/ipa-modref.cc | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/gcc/ipa-modref.cc b/gcc/ipa-modref.cc
> > > > > index 59cfe91f987..9275030c254 100644
> > > > > --- a/gcc/ipa-modref.cc
> > > > > +++ b/gcc/ipa-modref.cc
> > > > > @@ -2975,7 +2975,7 @@ analyze_parms (modref_summary *summary, 
> > > > > modref_summary_lto *summary_lto,
> > > > > summary->arg_flags.safe_grow_cleared (count, true);
> > > > >   summary->arg_flags[parm_index] = EAF_UNUSED;
> > > > > }
> > > > > - else if (summary_lto)
> > > > > + if (summary_lto)
> > > > > {
> > > > >   if (parm_index >= summary_lto->arg_flags.length ())
> > > > > summary_lto->arg_flags.safe_grow_cleared (count, 
> > > > > true);
> > > > > @@ -3034,7 +3034,7 @@ analyze_parms (modref_summary *summary, 
> > > > > modref_summary_lto *summary_lto,
> > > > > summary->arg_flags.safe_grow_cleared (count, true);
> > > > >   summary->arg_flags[parm_index] = flags;
> > > > > }
> > > > > - else if (summary_lto)
> > > > > + if (summary_lto)
> > > > > {
> > > > >   if (parm_index >= summary_lto->arg_flags.length ())
> > > > > summary_lto->arg_flags.safe_grow_cleared (count, 
> > > > > true);
> > > > > --
> > > > > 2.46.0
> > > > >
> > > >
> > > > These are oversights in
> > > >
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=85ebbabd85e03bdc3afc190aeb29250606d18322
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3350e59f2985469b2472e4d9a6d387337da4519b
> > > >
> > > > to have
> > > >
> > > >   if (summary)
> > > >   ...
> > > >   else if (summary_lto)
> > > >    This disables LTO optimization for  -ffat-lto-objects.
> > > >
> > > > Is this patch OK for master and backports?
> > >
> > > OK for master.  Please wait with backports though, eventually Honza has 
> > > comments
> > > as well.
> >
> > It looks good to me.  The code was originally written for separate LTO
> > and non-LTO paths (since with LTO we can not collect alias sets that
> > are not stable across LTO streaming).  Plan was to eventually merge more
> > of the logic by templates, but that did not happen (yet).  I will try to
> > look into cleaning this up bit more after adding the nonsequential
> > attribute
> >
> 
> OK for backports?
OK,
Thanks!
Honza
> 
> 
> -- 
> H.J.
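
The effect of replacing "else if" with "if" can be sketched in
isolation.  This is a toy model rather than the ipa-modref code: the
summary_model type and record_parm_flags are hypothetical, and it
assumes, as the thread implies for -ffat-lto-objects, that both
summaries can be non-null at the same time.

```cpp
#include <vector>

// Toy stand-in (hypothetical) for a modref summary.
struct summary_model
{
  std::vector<int> arg_flags;
};

// Record FLAGS for parameter PARM_INDEX (which must be < COUNT) in every
// summary that exists.  The two checks are independent: with an else-if
// the LTO summary would be skipped whenever the non-LTO one is present.
void
record_parm_flags (summary_model *summary, summary_model *summary_lto,
                   unsigned parm_index, unsigned count, int flags)
{
  if (summary)
    {
      if (parm_index >= summary->arg_flags.size ())
        summary->arg_flags.resize (count);
      summary->arg_flags[parm_index] = flags;
    }
  if (summary_lto)
    {
      if (parm_index >= summary_lto->arg_flags.size ())
        summary_lto->arg_flags.resize (count);
      summary_lto->arg_flags[parm_index] = flags;
    }
}
```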

[RISCV] target-specific source placement

2024-09-05 Thread Nathan Sidwell


Hi,
looking at the RISCV code, it seems that there are several vendor-specific files 
in config/riscv.  For instance sifive-7.md and xiangshan.md. It seems these are 
unconditionally included for all riscv targets. I guess then one doesn't end up 
with some combinatorial explosion of possible riscv compilers. But it doesn't 
seem scalable, given that one of the points of riscv is to add your own magic 
pixie dust.


In my case, I have a port that also has a bunch of vendor-specific passes, which 
have unfortunately been placed in the main gcc directory. They directly rely on 
an API added to config/riscv. IMHO placing them in a vendor subdirectory of 
config/riscv would seem cleaner. Then have config glue to include them in the 
build under something like a --with-riscv-$vendor-extensions configure flag.


Whether this port gets considered for upstreaming is unknown.

Anyway, I guess I'm suggesting that, for new code:

1) vendor-specific files get put in a config/riscv/$vendor subdirectory

2) configure-time options determine whether a specific vendor's bits are 
included in the build.


$vendor names can be those used in the X$vendor$suffix ISA extension scheme

thoughts?

nathan

--
Nathan Sidwell

Re: [PATCH v2] c++: fn redecl in fn scope wrongly accepted [PR116239]

2024-09-05 Thread Jason Merrill


On 9/4/24 1:18 PM, Marek Polacek wrote:

On Wed, Sep 04, 2024 at 12:28:49PM -0400, Jason Merrill wrote:



+  if (!validate_constexpr_redeclaration (alias, decl))
+return;
+
 retrofit_lang_decl (decl);
 DECL_LOCAL_DECL_ALIAS (decl) = alias;
   }


I don't think we need this in the case that we built a new alias and pushed
it; in that case alias is built from decl, and should certainly match
already.  So let's put this call before we decide to build a new alias,
either in the loop or just after it.


That's right.  How about this?


Hmm, I'd still prefer to have it *before* the !alias case for locality; 
putting it as an else to a 66-line if seems obscure.


But OK however.


dg.exp passed.

-- >8 --
Redeclaration such as

   void f(void);
   consteval void f(void);

is invalid.  In a namespace scope, we detect the collision in
validate_constexpr_redeclaration, but not when one declaration is
at block scope.

When we have

   void f(void);
   void g() { consteval void f(void); }

we call pushdecl on the second f and call push_local_extern_decl_alias.
It finds the namespace-scope f:

 for (ovl_iterator iter (binding); iter; ++iter)
   if (decls_match (decl, *iter, /*record_versions*/false))
 {
   alias = *iter;
   break;
 }

but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
(and do not call another pushdecl leading to duplicate_decls which
would detect mismatching return types, for example).  I don't think
we want to change decls_match, so a simple fix is to detect the
problem in push_local_extern_decl_alias.
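
The shape of the fix can be sketched outside the compiler.  The model
below is purely illustrative: decl_model, decls_match_model and
check_local_extern are hypothetical names, and the matching rule is
reduced to comparing a name and a signature string.  It only shows why
a signature-level match needs a separate constexpr/consteval check
before the alias is accepted.

```cpp
#include <string>
#include <vector>

enum class cexpr_kind { none, constexpr_, consteval_ };

// Toy declaration record (hypothetical, not a GCC tree).
struct decl_model
{
  std::string name;
  std::string signature;
  cexpr_kind kind;
};

// Signature-only matching: deliberately ignores constexpr-ness.
bool
decls_match_model (const decl_model &a, const decl_model &b)
{
  return a.name == b.name && a.signature == b.signature;
}

enum class alias_result { no_match, aliased, rejected };

// Look for a namespace-scope declaration matching LOCAL.  A match is
// accepted as an alias only if both agree on constexpr/consteval.
alias_result
check_local_extern (const std::vector<decl_model> &ns_scope,
                    const decl_model &local)
{
  for (const decl_model &d : ns_scope)
    if (decls_match_model (d, local))
      return d.kind == local.kind ? alias_result::aliased
                                  : alias_result::rejected;
  return alias_result::no_match;
}
```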

PR c++/116239

gcc/cp/ChangeLog:

* cp-tree.h (validate_constexpr_redeclaration): Declare.
* decl.cc (validate_constexpr_redeclaration): No longer static.
* name-lookup.cc (push_local_extern_decl_alias): Call
validate_constexpr_redeclaration.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/redeclaration-6.C: New test.
---
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl.cc|  2 +-
  gcc/cp/name-lookup.cc |  2 ++
  .../g++.dg/diagnostic/redeclaration-6.C   | 34 +++
  4 files changed, 38 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..1a763b683de 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6992,6 +6992,7 @@ extern bool member_like_constrained_friend_p  (tree);
  extern bool fns_correspond(tree, tree);
  extern int decls_match(tree, tree, bool = 
true);
  extern bool maybe_version_functions   (tree, tree, bool);
+extern bool validate_constexpr_redeclaration   (tree, tree);
  extern bool merge_default_template_args   (tree, tree, bool);
  extern tree duplicate_decls   (tree, tree,
 bool hiding = false,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 6458e96bded..4ad68d609d7 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1412,7 +1412,7 @@ check_redeclaration_exception_specification (tree 
new_decl,
  /* Return true if OLD_DECL and NEW_DECL agree on constexprness.
 Otherwise issue diagnostics.  */
  
-static bool

+bool
  validate_constexpr_redeclaration (tree old_decl, tree new_decl)
  {
old_decl = STRIP_TEMPLATE (old_decl);
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 70ad4cbf3b5..98e15878657 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3706,6 +3706,8 @@ push_local_extern_decl_alias (tree decl)
  /* Adjust visibility.  */
  determine_visibility (alias);
}
+  else if (!validate_constexpr_redeclaration (alias, decl))
+   return;
  }
  
retrofit_lang_decl (decl);

diff --git a/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C 
b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
new file mode 100644
index 000..ed8d4af7792
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
@@ -0,0 +1,34 @@
+// PR c++/116239
+// { dg-do compile { target c++20 } }
+
+consteval void f1();
+void f2();
+constexpr void f3();
+void f4();
+consteval void f5();
+constexpr void f6();
+
+void
+g ()
+{
+  void f1();   // { dg-error "differs in .consteval." }
+  consteval void f2(); // { dg-error "differs in .consteval." }
+
+  void f3();   // { dg-error "differs in .constexpr." }
+  constexpr void f4();  // { dg-error "differs in .constexpr." }
+
+  consteval void f5();
+  constexpr void f6();
+
+  void f7();
+  consteval void f7(); // { dg-error "differs in .consteval." }
+
+  consteval void f8();
+  void f8();   // { dg-error "differs in .consteval." }
+
+  void f9();
+  constexpr void f9(); // { dg-error "differs in .constexpr." }
+
+  constexpr void f10();
+  void f10();  // { d

Re: [PATCH] c++, v3: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-05 Thread Jason Merrill


On 9/4/24 2:03 PM, Jakub Jelinek wrote:

On Wed, Sep 04, 2024 at 01:22:47PM -0400, Jason Merrill wrote:

@@ -8985,6 +9003,13 @@ cp_finish_decl (tree decl, tree init, bo
 if (var_definition_p)
abstract_virtuals_error (decl, type);
+  if (decomp && !processing_template_decl)
+   {
+ need_decomp_init = cp_finish_decomp (decl, decomp, true);
+ if (!need_decomp_init)
+   decomp_cl.decomp = NULL;



It seems like all tests of need_decomp_init could instead test
decomp_cl.decomp.  And if we make that field a reference to the decomp
parameter, we could refer to the parameter instead of ever referring to
decomp_cl.


So like this (so far quickly tested on *decomp* dr2867*, full
bootstrap/regtest queued)?

2024-09-04  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here if !processing_template_decl, first with test_p.  For
automatic structured binding bases if the test cp_finish_decomp
returned true wrap the initialization together with what non-test
cp_finish_decomp emits with a CLEANUP_POINT_EXPR, and if there are
any CLEANUP_STMTs needed, emit them around the whole
CLEANUP_POINT_EXPR with guard variables for the cleanups.  Call
cp_finish_decomp using RAII if not called with decomp != NULL
otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call.  Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-08-30 09:09:45.466623869 +0200
+++ gcc/cp/cp-tree.h2024-08-30 11:00:39.861747964 +0200
@@ -7024,7 +7024,7 @@ extern void omp_declare_variant_finalize
  struct cp_decomp { tree decl; unsigned int count; };
  extern void cp_finish_decl(tree, tree, bool, tree, int, 
cp_decomp * = nullptr);
  extern tree lookup_decomp_type(tree);
-extern void cp_finish_decomp   (tree, cp_decomp *);
+extern bool cp_finish_decomp   (tree, cp_decomp *, bool = 
false);
  extern int cp_complete_array_type (tree *, tree, bool);
  extern int cp_complete_array_type_or_error(tree *, tree, bool, 
tsubst_flags_t);
  extern tree build_ptrmemfunc_type (tree);
--- gcc/cp/decl.cc.jj   2024-08-30 09:09:45.495623494 +0200
+++ gcc/cp/decl.cc  2024-09-04 19:55:59.046491602 +0200
@@ -103,7 +103,7 @@ static tree check_special_function_retur
  static tree push_cp_library_fn (enum tree_code, tree, int);
  static tree build_cp_library_fn (tree, enum tree_code, tree, int);
  static void store_parm_decls (tree);
-static void initialize_local_var (tree, tree);
+static void initialize_local_var (tree, tree, bool);
  static void expand_static_init (tree, tree);
  static location_t smallest_type_location (const cp_decl_specifier_seq*);
  static bool identify_goto (tree, location_t, const location_t *,
@@ -8058,14 +8058,13 @@ wrap_temporary_cleanups (tree init, tree
  /* Generate code to initialize DECL (a local variable).  */
  
  static void

-initialize_local_var (tree decl, tree init)
+initialize_local_var (tree decl, tree init, bool decomp)
  {
tree type = TREE_TYPE (decl);
tree cleanup;
int already_used;
  
-  gcc_assert (VAR_P (decl)

- || TREE_CODE (decl) == RESULT_DECL);
+  gcc_assert (VAR_P (decl) || TREE_CODE (decl) == RESULT_DECL);
gcc_assert (!TREE_STATIC (decl));
  
if (DECL_SIZE (decl) == NULL_TREE)

@@ -8085,7 +8084,8 @@ initialize_local_var (tree decl, tree in
  DECL_READ_P (decl) = 1;
  
/* Generate a cleanup, if necessary.  */

-  cleanup = cxx_maybe_build_cleanup (decl, tf_warning_or_error);
+  cleanup = (decomp ? NULL_TREE
+: cxx_maybe_build_cleanup (decl, tf_warning_or_error));
  
/* Perform the initialization.  */

if (init)
@@ -8120,10 +8120,16 @@ initialize_local_var (tree decl, tree in
  
  	  gcc_assert (building_stmt_list_p ());

  saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
- current_stmt_tree

[PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Patrick Palka

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

A lambda within a default template argument used in some template-id
may have a smaller template depth than the context of the template-id.
For example, the lambda in v1's default template argument has template
depth 1, and in v2's has template depth 2, but the template-ids v1<0>
and v2<0> which use these default arguments appear in a depth 3 template
context.  So add_extra_args will ultimately return args with depth 3 --
too many args for the lambda, leading to a bogus substitution.

This patch fixes this by trimming the result of add_extra_args to match
the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH field
is added that tracks the template-ness of a lambda;
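
The trimming step can be illustrated with a small stand-alone model.
This is a sketch only: template argument levels are modelled as a
vector of vectors ordered from outermost to innermost, and
innermost_levels is a hypothetical helper, not the pt.cc function.

```cpp
#include <cstddef>
#include <vector>

// One level of template arguments (toy model, hypothetical).
using arg_level = std::vector<int>;

// Keep only the DEPTH innermost levels of ARGS, where ARGS is ordered
// from outermost to innermost.  If ARGS has no more than DEPTH levels
// it is returned unchanged.
std::vector<arg_level>
innermost_levels (const std::vector<arg_level> &args, std::size_t depth)
{
  if (depth >= args.size ())
    return args;
  return std::vector<arg_level> (args.end ()
                                 - static_cast<std::ptrdiff_t> (depth),
                                 args.end ());
}
```

In this model, a lambda of depth 1 seen from a depth 3 context would
receive only the last level of the three-level argument set.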

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
trim the augmented template arguments to match the template depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
 gcc/cp/pt.cc  | 11 +
 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30 +++
 2 files changed, 41 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..c49a26b4f5e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t 
complain, tree in_decl)
   LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
   return t;
 }
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then it's probably from
+a context (e.g. default template argument context) which may have fewer
+levels than the current context it's embedded in.  Adjust the result of
+add_extra_args accordingly.  */
+  tree ctx_parms = DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (oldfn));
+  if (generic_lambda_fn_p (oldfn))
+   ctx_parms = TREE_CHAIN (ctx_parms);
+  args = get_innermost_template_args (args, TMPL_PARMS_DEPTH (ctx_parms));
+}
 
   tree r = build_lambda_expr ();
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
new file mode 100644
index 000..c5c0525908e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
@@ -0,0 +1,30 @@
+// PR c++/116567
+// { dg-do compile { target c++20 } }
+
+template
+bool v1 = true;
+
+template
+bool v1g = true;
+
+template
+struct A {
+  template
+  static inline bool v2 = true;
+
+  template
+  static inline bool v2g = true;
+
+  template
+  struct B {
+template
+static void f() {
+  v1<0> && v1g<0>;
+  v2<0> && v2g<0>;
+}
+  };
+};
+
+auto main() -> int {
+  A::B::f();
+}
-- 
2.46.0.519.g2e7b89e038

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Patrick Palka

On Thu, 5 Sep 2024, Patrick Palka wrote:

> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> for trunk/14?
> 
> -- >8 --
> 
> A lambda within a default template argument used in some template-id
> may have a smaller template depth than the context of the template-id.
> For example, the lambda in v1's default template argument has template
> depth 1, and in v2's has template depth 2, but the template-ids v1<0>
> and v2<0> which use these default arguments appear in a depth 3 template
> context.  So add_extra_args will ultimately return args with depth 3 --
> too many args for the lambda, leading to a bogus substitution.
> 
> This patch fixes this by trimming the result of add_extra_args to match
> the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH field
> is added that tracks the template-ness of a lambda;

Oops, disregard this last sentence, it's a vestige from an earlier
version of this patch before it occurred to me we can freely obtain the
template depth of a lambda through its operator()'s TEMPLATE_DECL.

> 
>   PR c++/116567
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
>   trim the augmented template arguments to match the template depth
>   of the lambda.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/lambda-targ7.C: New test.
> ---
>  gcc/cp/pt.cc  | 11 +
>  gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30 +++
>  2 files changed, 41 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 747e627f547..c49a26b4f5e 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args, 
> tsubst_flags_t complain, tree in_decl)
>LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
>return t;
>  }
> +  if (LAMBDA_EXPR_EXTRA_ARGS (t))
> +{
> +  /* If we deferred substitution into this lambda, then it's probably 
> from
> +  a context (e.g. default template argument context) which may have fewer
> +  levels than the current context it's embedded in.  Adjust the result of
> +  add_extra_args accordingly.  */
> +  tree ctx_parms = DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (oldfn));
> +  if (generic_lambda_fn_p (oldfn))
> + ctx_parms = TREE_CHAIN (ctx_parms);
> +  args = get_innermost_template_args (args, TMPL_PARMS_DEPTH 
> (ctx_parms));
> +}
>  
>tree r = build_lambda_expr ();
>  
> diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C 
> b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> new file mode 100644
> index 000..c5c0525908e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> @@ -0,0 +1,30 @@
> +// PR c++/116567
> +// { dg-do compile { target c++20 } }
> +
> +template
> +bool v1 = true;
> +
> +template
> +bool v1g = true;
> +
> +template
> +struct A {
> +  template
> +  static inline bool v2 = true;
> +
> +  template
> +  static inline bool v2g = true;
> +
> +  template
> +  struct B {
> +template
> +static void f() {
> +  v1<0> && v1g<0>;
> +  v2<0> && v2g<0>;
> +}
> +  };
> +};
> +
> +auto main() -> int {
> +  A::B::f();
> +}
> -- 
> 2.46.0.519.g2e7b89e038
> 
>

[PATCH] c++: Fix mangling of otherwise unattached class-scope lambdas [PR116568]

2024-09-05 Thread Nathaniel Shead

Bootstrapped and regtested (so far just dg.exp) on x86_64-pc-linux-gnu,
OK for trunk if full regtest passes?  Or would it be better to try to
implement all the rules mentioned in the linked pull request for one
commit; I admit I haven't looked very closely yet at how else we
diverge?

-- >8 --

This is a step closer to implementing the suggested changes for
https://github.com/itanium-cxx-abi/cxx-abi/pull/85.

The main purpose of the patch is to solve testcase PR c++/116568, caused
by lambda expressions within the templates not correctly having the
extra mangling scope attached.

While I was working on this I found some other cases where the mangling
of lambdas was incorrect and causing issues, notably the testcase
lambda-ctx3.C which currently emits the same mangling for the base class
and member lambdas, causing mysterious assembler errors.  Fixing this
ended up also improving the situation for PR c++/107741 as well, though
it doesn't seem easily possible to fix the A::x case at this time so
I've left that as an XFAIL.

PR c++/107741
PR c++/116568

gcc/cp/ChangeLog:

* cp-tree.h (LAMBDA_EXPR_EXTRA_SCOPE): Adjust comment.
* parser.cc (cp_parser_class_head): Start (and do not finish)
lambda scope for all valid types.
(cp_parser_class_specifier): Finish lambda scope after parsing
members instead.
(cp_parser_member_declaration): Adjust comment to mention
missing lambda scoping for static member initializers.
* pt.cc (instantiate_class_template): Add lambda scoping.
(instantiate_template): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/abi/lambda-ctx2.C: New test.
* g++.dg/abi/lambda-ctx3.C: New test.
* g++.dg/modules/lambda-8.h: New test.
* g++.dg/modules/lambda-8_a.H: New test.
* g++.dg/modules/lambda-8_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h  |  3 +-
 gcc/cp/parser.cc  | 31 +
 gcc/cp/pt.cc  | 12 +++-
 gcc/testsuite/g++.dg/abi/lambda-ctx2.C| 34 +++
 gcc/testsuite/g++.dg/abi/lambda-ctx3.C| 21 ++
 gcc/testsuite/g++.dg/modules/lambda-8.h   |  7 +
 gcc/testsuite/g++.dg/modules/lambda-8_a.H |  5 
 gcc/testsuite/g++.dg/modules/lambda-8_b.C |  5 
 8 files changed, 104 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx2.C
 create mode 100644 gcc/testsuite/g++.dg/abi/lambda-ctx3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8.h
 create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/lambda-8_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..af1e254745b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1513,7 +1513,8 @@ enum cp_lambda_default_capture_mode_type {
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->locus)
 
 /* The mangling scope for the lambda: FUNCTION_DECL, PARM_DECL, VAR_DECL,
-   FIELD_DECL or NULL_TREE.  If this is NULL_TREE, we have no linkage.  */
+   FIELD_DECL, TYPE_DECL, or NULL_TREE.  If this is NULL_TREE, we have no
+   linkage.  */
 #define LAMBDA_EXPR_EXTRA_SCOPE(NODE) \
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->extra_scope)
 
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5654bc00e4d..6e5228757a5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -27051,6 +27051,8 @@ cp_parser_class_specifier (cp_parser* parser)
   if (!braces.require_open (parser))
 {
   pop_deferring_access_checks ();
+  if (type != error_mark_node)
+   finish_lambda_scope ();
   return error_mark_node;
 }
 
@@ -27115,7 +27117,10 @@ cp_parser_class_specifier (cp_parser* parser)
   if (cp_parser_allow_gnu_extensions_p (parser))
 attributes = cp_parser_gnu_attributes_opt (parser);
   if (type != error_mark_node)
-type = finish_struct (type, attributes);
+{
+  type = finish_struct (type, attributes);
+  finish_lambda_scope ();
+}
   if (nested_name_specifier_p)
 pop_inner_scope (old_scope, scope);
 
@@ -27955,6 +27960,12 @@ cp_parser_class_head (cp_parser* parser,
   if (flag_concepts)
 type = associate_classtype_constraints (type);
 
+  /* Lambdas in bases and members must have the same mangling scope for ABI.
+ We open this scope now, and will close it in cp_parser_class_specifier
+ after parsing the member list.  */
+  if (type && type != error_mark_node)
+start_lambda_scope (TYPE_NAME (type));
+
   /* We will have entered the scope containing the class; the names of
  base classes should be looked up in that context.  For example:
 
@@ -27969,16 +27980,10 @@ cp_parser_class_head (cp_parser* parser,
   if (cp_lexer_next_token_is (parser->lexer, CPP_COLON))
 {
   if (type)
-   {
- pushclass (type);
- start_lambda_scope (TYPE_NAME (type));
-   }
+

Re: [PATCH] RISC-V: Optimize branches with shifted immediate operands

2024-09-05 Thread Jovan Vukic

> It's worth noting there is a newer way which is usually slightly simpler
> than a match_operator. Specifically code iterators.

Thank you for the very detailed feedback. It is not a problem to add code 
iterators. I would add iterators for "eq" and "ne" in riscv/iterators.md since 
they don't currently exist:

> (define_code_iterator any_eq [eq ne])

I would also add new  values for "eq" and "ne". I assume it would be 
best to submit the patch again as version 2 with these changes.

> The pattern uses shifted_const_arith_operand, which is good as it
> validates that the constant, if normalized by shifting away its trailing
> zeros fits in a simm12.
>
> But the normalization you're doing on the two constants is limited by
> the smaller of trailing zero counts.  So operands2 might be 0x8100 which
> requires an 8 bit shift for normalization.  operands3 might be 0x81000
> which requires a 12 bit shift for normalization.  In that case we'll use
> 8 as our shift count for normalization, resulting in:
>
> 0x8100 >> 8 = 0x81, a valid small operand
> 0x81000 >> 8 = 0x810, not a valid small operand.
>
>
> I think that'll generate invalid RTL at split time.
>
> What I think you need to do is in the main predicate (the same place
> you're currently !SMALL_OPERAND (INTVAL (operands[3]))), you'll need to
> check that both operands are SMALL_OPERAND after normalization.

Regarding the second issue, thanks again for the clear explanation. While at 
first this might seem like a problem, I believe these cases won't actually be a 
problem.

The comparisons you mentioned, (x & 0x81000) == 0x8100 and (x & 0x8100) == 
0x81000, will always evaluate as false, and this pattern will never be used for 
them (https://godbolt.org/z/Y11EGMb4f).

Even in general, I'm quite sure we will never encounter an operand, after 
shifting, greater than 2^11 (i.e. not a SMALL_OPERAND). I will provide my 
reasoning below, but if you find it incorrect, or have any counterexamples, I 
would be happy to make the requested changes, add the mentioned check and 
submit that as PATCH v2.

Let's consider the general expression (x & c1) == c2, where t1 and t2 represent 
the counts of trailing zeros in each constant. There are three cases to 
consider:
1. When t1 == t2: The pattern works fine, with no edge cases.
2. When t1 > t2: The expression will always evaluate as false, and the pattern 
won’t even be considered. For example, (x & 0x81000) == 0x8100.
3. When t1 < t2: In this case:
   - c1 must be of the form 0x0KLM00 (where the highest bit of K cannot be set) 
to meet the shifted_const_arith_operand constraint, ensuring 
SMALL_OPERAND(0x0KLM) == true (i.e. 0x0KLM < 2^11).
   - To prevent the expression from immediately evaluating as false, c2 must be 
in the form 0x0PQ<0bxxx0>00, where this value has to have only 0 or 1 in bit 
positions where c1 has 1 (and 0 elsewhere). Otherwise, (x & c1) == c2 would 
instantly be false, so this pattern wouldn’t be used. Let's call this "the 
critical condition".
   - After shifting c1 and c2, we have c1 == 0xKLM and c2 == 0xPQ<0bxxx0>, 
assuming the LSB of M is set to 1.
   - Due to "the critical condition", c2 == 0xPQ<0bxxx0> cannot have the 
highest bit of P set to 1. Otherwise, (x & c1) == c2 would immediately evaluate 
as false, since 0xKLM is guaranteed not to have the highest bit of K set to 1. 
This guarantees that SMALL_OPERAND(0xPQ<0bxxx0>) will always be true (i.e. c2 < 
2^11).
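
The case analysis above can be checked mechanically.  What follows is only
a sketch under stated assumptions, not code from the patch: small_operand
models SMALL_OPERAND as "fits in a signed 12-bit immediate", only
non-negative constants are considered, and every function name here is
made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: SMALL_OPERAND is modeled as "fits in a signed 12-bit immediate".
constexpr bool small_operand(std::int64_t v) { return v >= -2048 && v <= 2047; }

// Number of trailing zero bits; returns 0 for v == 0 to keep the sketch simple.
constexpr int trailing_zeros(std::uint64_t v) {
  int n = 0;
  while (v != 0 && (v & 1) == 0) { v >>= 1; ++n; }
  return n;
}

// (x & c1) == c2 can hold for some x only if c2 sets no bit that c1 clears.
constexpr bool satisfiable(std::uint64_t c1, std::uint64_t c2) {
  return (c2 & ~c1) == 0;
}

// Shift both constants by the smaller trailing-zero count and test both results.
constexpr bool both_small_after_shift(std::uint64_t c1, std::uint64_t c2) {
  const int shift = std::min(trailing_zeros(c1), trailing_zeros(c2));
  return small_operand(static_cast<std::int64_t>(c1 >> shift))
         && small_operand(static_cast<std::int64_t>(c2 >> shift));
}

// The pair from the review: one constant is not small after an 8-bit shift ...
static_assert(!both_small_after_shift(0x8100, 0x81000), "0x810 is not small");
// ... but neither ordering of that pair can ever compare equal.
static_assert(!satisfiable(0x81000, 0x8100), "always false");
static_assert(!satisfiable(0x8100, 0x81000), "always false");
// A t1 < t2 pair that can compare equal stays small after shifting.
static_assert(satisfiable(0x7F00, 0x7000), "c2 is a subset of c1");
static_assert(both_small_after_shift(0x7F00, 0x7000), "0x7F and 0x70 are small");

// Brute-force search for a counterexample among non-zero constants below
// LIMIT.  c1 must itself satisfy the shifted-constant constraint.
bool claim_holds_below(std::uint64_t limit) {
  for (std::uint64_t c1 = 1; c1 < limit; ++c1) {
    if (!small_operand(static_cast<std::int64_t>(c1 >> trailing_zeros(c1))))
      continue;
    for (std::uint64_t c2 = 1; c2 < limit; ++c2)
      if (satisfiable(c1, c2) && !both_small_after_shift(c1, c2))
        return false;
  }
  return true;
}
```

Calling claim_holds_below (1 << 12) walks about 16 million pairs.  It says
nothing about negative (sign-extended) constants, which would need a
separate look.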

[PATCH v3] c++: fn redecl in fn scope wrongly accepted [PR116239]

2024-09-05 Thread Marek Polacek

On Thu, Sep 05, 2024 at 10:42:22AM -0400, Jason Merrill wrote:
> On 9/4/24 1:18 PM, Marek Polacek wrote:
> > On Wed, Sep 04, 2024 at 12:28:49PM -0400, Jason Merrill wrote:
> 
> > > > +  if (!validate_constexpr_redeclaration (alias, decl))
> > > > +return;
> > > > +
> > > >  retrofit_lang_decl (decl);
> > > >  DECL_LOCAL_DECL_ALIAS (decl) = alias;
> > > >}
> > > 
> > > I don't think we need this in the case that we built a new alias and 
> > > pushed
> > > it; in that case alias is built from decl, and should certainly match
> > > already.  So let's put this call before we decide to build a new alias,
> > > either in the loop or just after it.
> > 
> > That's right.  How about this?
> 
> Hmm, I'd still prefer to have it *before* the !alias case for locality;
> putting it as an else to a 66-line if seems obscure.
> 
> But OK however.

Ah, let me change it.  Still OK?  dg.exp passed.

-- >8 --
Redeclaration such as

  void f(void);
  consteval void f(void);

is invalid.  In a namespace scope, we detect the collision in
validate_constexpr_redeclaration, but not when one declaration is
at block scope.

When we have

  void f(void);
  void g() { consteval void f(void); }

we call pushdecl on the second f and call push_local_extern_decl_alias.
It finds the namespace-scope f:

for (ovl_iterator iter (binding); iter; ++iter)
  if (decls_match (decl, *iter, /*record_versions*/false))
{
  alias = *iter;
  break;
}

but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
(and do not call another pushdecl leading to duplicate_decls which
would detect mismatching return types, for example).  I don't think
we want to change decls_match, so a simple fix is to detect the
problem in push_local_extern_decl_alias.
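
As a rough illustration of that control flow, here is a toy model.  It is
a sketch only: ToyDecl, Spec, Result and the toy_* functions are
hypothetical stand-ins and do not exist in GCC, and the matching rule is
simplified to "same name and signature".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for GCC's tree nodes; none of these names exist in GCC.
enum class Spec { None, Constexpr, Consteval };

struct ToyDecl {
  std::string name;
  std::string signature;
  Spec spec;
};

// As described in the message, matching does not look at constexpr/consteval.
bool toy_decls_match(const ToyDecl& a, const ToyDecl& b) {
  return a.name == b.name && a.signature == b.signature;
}

// Stand-in for validate_constexpr_redeclaration: the specifiers must agree.
bool toy_validate(const ToyDecl& old_decl, const ToyDecl& new_decl) {
  return old_decl.spec == new_decl.spec;
}

enum class Result { NoMatch, Aliased, Rejected };

// Walk the namespace-scope candidates; validate the first match before
// accepting it as the alias of the block-scope declaration.
Result toy_push_local_alias(const std::vector<ToyDecl>& binding,
                            const ToyDecl& local) {
  for (const ToyDecl& candidate : binding)
    if (toy_decls_match(local, candidate))
      return toy_validate(candidate, local) ? Result::Aliased
                                            : Result::Rejected;
  return Result::NoMatch;
}
```

With a namespace-scope plain f and a block-scope consteval f, the toy
returns Result::Rejected, which corresponds to the new early return.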

PR c++/116239

gcc/cp/ChangeLog:

* cp-tree.h (validate_constexpr_redeclaration): Declare.
* decl.cc (validate_constexpr_redeclaration): No longer static.
* name-lookup.cc (push_local_extern_decl_alias): Call
validate_constexpr_redeclaration.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/redeclaration-6.C: New test.
---
 gcc/cp/cp-tree.h  |  1 +
 gcc/cp/decl.cc|  2 +-
 gcc/cp/name-lookup.cc |  2 ++
 .../g++.dg/diagnostic/redeclaration-6.C   | 34 +++
 4 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..1a763b683de 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6992,6 +6992,7 @@ extern bool member_like_constrained_friend_p  (tree);
 extern bool fns_correspond (tree, tree);
 extern int decls_match (tree, tree, bool = true);
 extern bool maybe_version_functions(tree, tree, bool);
+extern bool validate_constexpr_redeclaration   (tree, tree);
 extern bool merge_default_template_args(tree, tree, bool);
 extern tree duplicate_decls(tree, tree,
 bool hiding = false,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7bad3047ad9..f4128dbccdf 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1412,7 +1412,7 @@ check_redeclaration_exception_specification (tree 
new_decl,
 /* Return true if OLD_DECL and NEW_DECL agree on constexprness.
Otherwise issue diagnostics.  */
 
-static bool
+bool
 validate_constexpr_redeclaration (tree old_decl, tree new_decl)
 {
   old_decl = STRIP_TEMPLATE (old_decl);
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 7a6cc244c15..cd3947cbe4f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3637,6 +3637,8 @@ push_local_extern_decl_alias (tree decl)
  if (decls_match (decl, *iter, /*record_versions*/false))
{
  alias = *iter;
+ if (!validate_constexpr_redeclaration (alias, decl))
+   return;
  break;
}
 
diff --git a/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C 
b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
new file mode 100644
index 000..ed8d4af7792
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
@@ -0,0 +1,34 @@
+// PR c++/116239
+// { dg-do compile { target c++20 } }
+
+consteval void f1();
+void f2();
+constexpr void f3();
+void f4();
+consteval void f5();
+constexpr void f6();
+
+void
+g ()
+{
+  void f1();   // { dg-error "differs in .consteval." }
+  consteval void f2(); // { dg-error "differs in .consteval." }
+
+  void f3();   // { dg-error "differs in .constexpr." }
+  constexpr void f4();  // { dg-error "differs in .constexpr." }
+
+  consteval void f5();
+  constexpr void f6();
+
+  void f7();
+  consteval void f7(); // { dg-error "differs in .consteval." }
+
+  consteval void f8();
+  void f

[PATCH] c++, v4: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-05 Thread Jakub Jelinek

On Thu, Sep 05, 2024 at 10:51:47AM -0400, Jason Merrill wrote:
> > @@ -8993,6 +9010,11 @@ cp_finish_decl (tree decl, tree init, bo
> > if (var_definition_p)
> > abstract_virtuals_error (decl, type);
> > +  if (decomp
> > + && !processing_template_decl
> > + && !cp_finish_decomp (decl, decomp, true))
> 
> It looks like when processing_template_decl this patch will do the expanded
> cleanup below, unlike the v2 patch.

No, because around 300 lines earlier there is
  if (processing_template_decl)
{
...
  return;
}

But that means I can leave that && !processing_template_decl part
out.
The earlier patch passed bootstrap/regtest on x86_64-linux and i686-linux,
will obviously retest this one which changes just that single hunk.

Ok for trunk if it passes?

2024-09-05  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here, first with test_p.  For automatic structured binding bases
if the test cp_finish_decomp returned true wrap the initialization
together with what non-test cp_finish_decomp emits with a
CLEANUP_POINT_EXPR, and if there are any CLEANUP_STMTs needed, emit
them around the whole CLEANUP_POINT_EXPR with guard variables for the
cleanups.  Call cp_finish_decomp using RAII if not called with
decomp != NULL otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call.  Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.
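
For readers unfamiliar with the "tuple case" mentioned in the ChangeLog:
it is the form of structured binding where each binding is initialized
through a get<I> call.  The sketch below is plain C++17 and not taken from
the new tests; demo::Pair, demo::get and sum_via_bindings are illustrative
names.  It only shows that one get call is made per binding and does not
exercise the temporary-lifetime rules that CWG 2867 addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo {

struct Pair { int first; int second; };

// Counts how often the tuple protocol calls get<I>.
inline int get_calls = 0;

template <std::size_t I>
int get(const Pair& p) {
  static_assert(I < 2, "Pair has two elements");
  ++get_calls;
  if constexpr (I == 0)
    return p.first;
  else
    return p.second;
}

}  // namespace demo

// Opting a type into the tuple protocol takes precedence over binding
// directly to its public data members.
namespace std {
template <> struct tuple_size<demo::Pair> : integral_constant<size_t, 2> {};
template <size_t I> struct tuple_element<I, demo::Pair> { using type = int; };
}  // namespace std

int sum_via_bindings(const demo::Pair& p) {
  // p is copied into a hidden variable; a and b are then initialized from
  // get<0> and get<1> applied to that hidden variable.
  auto [a, b] = p;
  return a + b;
}
```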

--- gcc/cp/cp-tree.h.jj 2024-08-30 09:09:45.466623869 +0200
+++ gcc/cp/cp-tree.h2024-08-30 11:00:39.861747964 +0200
@@ -7024,7 +7024,7 @@ extern void omp_declare_variant_finalize
 struct cp_decomp { tree decl; unsigned int count; };
 extern void cp_finish_decl (tree, tree, bool, tree, int, 
cp_decomp * = nullptr);
 extern tree lookup_decomp_type (tree);
-extern void cp_finish_decomp   (tree, cp_decomp *);
+extern bool cp_finish_decomp   (tree, cp_decomp *, bool = 
false);
 extern int cp_complete_array_type  (tree *, tree, bool);
 extern int cp_complete_array_type_or_error (tree *, tree, bool, 
tsubst_flags_t);
 extern tree build_ptrmemfunc_type  (tree);
--- gcc/cp/decl.cc.jj   2024-08-30 09:09:45.495623494 +0200
+++ gcc/cp/decl.cc  2024-09-04 19:55:59.046491602 +0200
@@ -103,7 +103,7 @@ static tree check_special_function_retur
 static tree push_cp_library_fn (enum tree_code, tree, int);
 static tree build_cp_library_fn (tree, enum tree_code, tree, int);
 static void store_parm_decls (tree);
-static void initialize_local_var (tree, tree);
+static void initialize_local_var (tree, tree, bool);
 static void expand_static_init (tree, tree);
 static location_t smallest_type_location (const cp_decl_specifier_seq*);
 static bool identify_goto (tree, location_t, const location_t *,
@@ -8058,14 +8058,13 @@ wrap_temporary_cleanups (tree init, tree
 /* Generate code to initialize DECL (a local variable).  */

 static void
-initialize_local_var (tree decl, tree init)
+initialize_local_var (tree decl, tree init, bool decomp)
 {
   tree type = TREE_TYPE (decl);
   tree cleanup;
   int already_used;

-  gcc_assert (VAR_P (decl)
- || TREE_CODE (decl) == RESULT_DECL);
+  gcc_assert (VAR_P (decl) || TREE_CODE (decl) == RESULT_DECL);
   gcc_assert (!TREE_STATIC (decl));

   if (DECL_SIZE (decl) == NULL_TREE)
@@ -8085,7 +8084,8 @@ initialize_local_var (tree decl, tree in
 DECL_READ_P (decl) = 1;

   /* Generate a cleanup, if necessary.  */
-  cleanup = cxx_maybe_build_cleanup (decl, tf_warning_or_error);
+  cleanup = (decomp ? NULL_TREE
+: cxx_maybe_build_cleanup (decl, tf_warning_or_error));

   /* Perform the initialization.  */
   if (init)
@@ -8120,10 +8120,16 @@ initialize_local_var (tree decl, tree in

  gcc_assert (building_stmt_list_p ());
  saved_stmts_are_full_exprs_p = stmts_are_full_exprs_p ();
- current_stmt_tree ()->stmts_are_full_exprs_

Re: [RFC PATCH] c++: Add alignas further test coverage [PR110345]

2024-09-05 Thread Jason Merrill


On 9/5/24 9:03 AM, Jakub Jelinek wrote:

Hi!

I've tried to do the same thing I did for normal standard attributes
also for alignas, but there are way too many cases which are silently
accepted although my reading of:

"An alignment-specifier may be applied to a variable or to a class data member,
but it shall not be applied to a bit-field, a function parameter, or an
exception-declaration ([except.handle]).
An alignment-specifier may also be applied to the declaration of a class (in
an elaborated-type-specifier ([dcl.type.elab]) or class-head ([class]),
respectively)."

I've marked the spots where I'd expect some pedwarn with // FIXME.
Clearly we accept it e.g. on bit-fields, exception-declarations, enum
declarations, functions, to e.g. array/reference etc. types, ...

Is some of this intentional?


Allowing it for functions and enums seems consistent with the GNU 
aligned attribute, I'd complain only when -pedantic.


I think we might want to pedwarn about standard attribute syntax 
appertaining to a type (other than between the class/enum key and name) 
when !affects_type_identity.  But that seems like a separate issue.


Does GNU aligned on a bit-field do anything useful?

Allowing it on an exception-declaration is a bug, those should be 
treated the same as parameters.
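
For contrast with the // FIXME cases, here is a small sketch of the
placements the quoted wording does permit: a class-head, a class data
member and a variable.  The type and variable names are illustrative only
and are not part of alignas21.C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Class-head placement: raises the alignment of the class type itself.
struct alignas(16) Vec4 { float v[4]; };

// Data-member placement: raises the member's alignment, and with it the
// alignment of the enclosing class.
struct Packet {
  char tag;
  alignas(8) char payload[8];
};

// Variable placement.
alignas(32) static unsigned char buffer[64];

static_assert(alignof(Vec4) == 16, "class-head alignas");
static_assert(alignof(Packet) == 8, "member alignas propagates to the class");
static_assert(offsetof(Packet, payload) == 8, "payload starts on an 8-byte boundary");

// Runtime check that the variable really landed on a 32-byte boundary.
bool buffer_is_aligned() {
  return reinterpret_cast<std::uintptr_t>(buffer) % 32 == 0;
}
```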



Though, trying clang trunk, it diagnoses all the // FIXME lines.

2024-09-05  Jakub Jelinek  

PR c++/110345
* g++.dg/cpp0x/alignas21.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/alignas21.C.jj   2024-09-05 14:16:44.366395041 
+0200
+++ gcc/testsuite/g++.dg/cpp0x/alignas21.C  2024-09-05 14:42:42.690465771 
+0200
@@ -0,0 +1,156 @@
+// C++ 26 P2552R3 - On the ignorability of standard attributes
+// { dg-do compile { target c++11 } }
+
+int arr[2];
+struct S { int a, b; };
+S arr2[2];
+
+void
+foo (int n)
+{
+  alignas (int) int x1;
+  alignas ("foobar") int x2; // { dg-error "'alignas' argument has 
non-integral type 'const char \\\[7\\\]'" }
+  alignas (0) int x3;  // { dg-warning "requested alignment 
'0' is not a positive power of 2" }
+  alignas ("foo", "bar", "baz") int x4;  // { dg-error "'alignas' argument 
has non-integral type 'const char \\\[4\\\]'" }
+   // { dg-error "expected '\\\)' before ',' 
token" "" { target *-*-* } .-1 }
+   // { dg-error "expected declaration before ',' 
token" "" { target *-*-* } .-2 }
+   // { dg-error "expected primary-expression 
before ',' token" "" { target *-*-* } .-3 }
+  alignas (0, 1, 2) int x5;// { dg-error "expected '\\\)' 
before ',' token" }
+   // { dg-error "expected declaration before ',' 
token" "" { target *-*-* } .-1 }
+   // { dg-error "expected primary-expression 
before ',' token" "" { target *-*-* } .-2 }
+
+  auto a = [] alignas (int) () {}; // FIXME
+  auto b = [] constexpr alignas (int) {};  // FIXME
+   // { dg-error "parameter declaration before 
lambda declaration specifiers only optional with" "" { target c++20_down } .-1 }
+   // { dg-error "'constexpr' lambda only 
available with" "" { target c++14_down } .-2 }
+  auto c = [] noexcept alignas (int) {};   // FIXME
+   // { dg-error "parameter declaration before 
lambda exception specification only optional with" "" { target c++20_down } .-1 }
+  auto d = [] () alignas (int) {}; // FIXME
+  auto e = new int [n] alignas (int);  // { dg-warning "attributes ignored 
on outermost array type in new expression" }
+  auto e2 = new int [n] alignas (int) [42];// { dg-warning "attributes ignored 
on outermost array type in new expression" }
+  auto f = new int [n][42] alignas (int);  // FIXME
+  alignas (int);   // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  alignas (int) {} // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  alignas (int) if (true) {}   // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  alignas (int) while (false) {}   // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  alignas (int) goto lab;  // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  alignas (int) lab:;  // { dg-error "alignment may not be 
specified for 'lab'" }
+  alignas (int) try {} catch (int) {}  // { dg-warning "attributes at the 
beginning of statement are ignored" }
+  if (alignas (int) int x = 0) {}
+  switch (n)
+{
+alignas (int) case 1:  // { dg-error "alignment may no

Re: [PATCH v3] c++: fn redecl in fn scope wrongly accepted [PR116239]

2024-09-05 Thread Jason Merrill


On 9/5/24 11:10 AM, Marek Polacek wrote:

On Thu, Sep 05, 2024 at 10:42:22AM -0400, Jason Merrill wrote:

On 9/4/24 1:18 PM, Marek Polacek wrote:

On Wed, Sep 04, 2024 at 12:28:49PM -0400, Jason Merrill wrote:



+  if (!validate_constexpr_redeclaration (alias, decl))
+return;
+
  retrofit_lang_decl (decl);
  DECL_LOCAL_DECL_ALIAS (decl) = alias;
}


I don't think we need this in the case that we built a new alias and pushed
it; in that case alias is built from decl, and should certainly match
already.  So let's put this call before we decide to build a new alias,
either in the loop or just after it.


That's right.  How about this?


Hmm, I'd still prefer to have it *before* the !alias case for locality;
putting it as an else to a 66-line if seems obscure.

But OK however.


Ah, let me change it.  Still OK?  dg.exp passed.


OK, thanks.


-- >8 --
Redeclaration such as

   void f(void);
   consteval void f(void);

is invalid.  In a namespace scope, we detect the collision in
validate_constexpr_redeclaration, but not when one declaration is
at block scope.

When we have

   void f(void);
   void g() { consteval void f(void); }

we call pushdecl on the second f and call push_local_extern_decl_alias.
It finds the namespace-scope f:

 for (ovl_iterator iter (binding); iter; ++iter)
   if (decls_match (decl, *iter, /*record_versions*/false))
 {
   alias = *iter;
   break;
 }

but decls_match says they match so we just set DECL_LOCAL_DECL_ALIAS
(and do not call another pushdecl leading to duplicate_decls which
would detect mismatching return types, for example).  I don't think
we want to change decls_match, so a simple fix is to detect the
problem in push_local_extern_decl_alias.

PR c++/116239

gcc/cp/ChangeLog:

* cp-tree.h (validate_constexpr_redeclaration): Declare.
* decl.cc (validate_constexpr_redeclaration): No longer static.
* name-lookup.cc (push_local_extern_decl_alias): Call
validate_constexpr_redeclaration.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/redeclaration-6.C: New test.
---
  gcc/cp/cp-tree.h  |  1 +
  gcc/cp/decl.cc|  2 +-
  gcc/cp/name-lookup.cc |  2 ++
  .../g++.dg/diagnostic/redeclaration-6.C   | 34 +++
  4 files changed, 38 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 2eeb5e3e8b1..1a763b683de 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6992,6 +6992,7 @@ extern bool member_like_constrained_friend_p  (tree);
  extern bool fns_correspond(tree, tree);
  extern int decls_match(tree, tree, bool = 
true);
  extern bool maybe_version_functions   (tree, tree, bool);
+extern bool validate_constexpr_redeclaration   (tree, tree);
  extern bool merge_default_template_args   (tree, tree, bool);
  extern tree duplicate_decls   (tree, tree,
 bool hiding = false,
diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index 7bad3047ad9..f4128dbccdf 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -1412,7 +1412,7 @@ check_redeclaration_exception_specification (tree 
new_decl,
  /* Return true if OLD_DECL and NEW_DECL agree on constexprness.
 Otherwise issue diagnostics.  */
  
-static bool

+bool
  validate_constexpr_redeclaration (tree old_decl, tree new_decl)
  {
old_decl = STRIP_TEMPLATE (old_decl);
diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 7a6cc244c15..cd3947cbe4f 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3637,6 +3637,8 @@ push_local_extern_decl_alias (tree decl)
  if (decls_match (decl, *iter, /*record_versions*/false))
{
  alias = *iter;
+ if (!validate_constexpr_redeclaration (alias, decl))
+   return;
  break;
}
  
diff --git a/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C

new file mode 100644
index 000..ed8d4af7792
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/redeclaration-6.C
@@ -0,0 +1,34 @@
+// PR c++/116239
+// { dg-do compile { target c++20 } }
+
+consteval void f1();
+void f2();
+constexpr void f3();
+void f4();
+consteval void f5();
+constexpr void f6();
+
+void
+g ()
+{
+  void f1();   // { dg-error "differs in .consteval." }
+  consteval void f2(); // { dg-error "differs in .consteval." }
+
+  void f3();   // { dg-error "differs in .constexpr." }
+  constexpr void f4();  // { dg-error "differs in .constexpr." }
+
+  consteval void f5();
+  constexpr void f6();
+
+  void f7();
+  consteval void f7(); // { dg-error "differs in .consteval." }
+
+  consteval void f8();
+  void f8();

Re: [PATCH] RISC-V: Make the setCC/REE tests robust to instruction selection

2024-09-05 Thread Palmer Dabbelt


On Wed, 04 Sep 2024 15:20:45 PDT (-0700), jeffreya...@gmail.com wrote:



On 9/4/24 4:07 PM, Palmer Dabbelt wrote:

These tests were checking that the output of the setCC instruction was bit
flipped, but it looks like they're really designed to test that
redundant sign extension elimination fires on conditionals from function
inputs.  Jeff just posted a patch to clean this code up which trips up on
the arbitrary xori/snez instruction selection decision changing, so
let's just robustify the tests.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/sge.c: Adjust regex to match the input.
* gcc.target/riscv/sgeu.c: Likewise.
* gcc.target/riscv/sle.c: Likewise.
* gcc.target/riscv/sleu.c: Likewise.

Works for me.  Didn't see worth much effort here.  Based on the history
of those tests their main purpose is to ensure we don't have an
extension after the sCC.


Ah, OK, I guess I misunderstood then -- we'd had a bunch of issues 
overly sign extending inputs, so I figured that's what this was trying 
to do.  I think the output side is still covered by the 
scan-assembler-not for W ops, so we should be safe there.





---
I haven't tested this, but it should be independent from Jeff's.

Independent, but it would eliminate the need for my twiddle to these
tests.   I don't much care either way other than making sure they stay
in a PASS state ;-)


Ya, sorry, I guess "independent" is really the wrong word there.  I was 
trying to say we could merge this and it'd pass before/after your 
change.




jeff

Re: [PATCH] c++, v4: Partially implement CWG 2867 - Order of initialization for structured bindings [PR115769]

2024-09-05 Thread Jason Merrill


On 9/5/24 11:14 AM, Jakub Jelinek wrote:

On Thu, Sep 05, 2024 at 10:51:47AM -0400, Jason Merrill wrote:

@@ -8993,6 +9010,11 @@ cp_finish_decl (tree decl, tree init, bo
 if (var_definition_p)
abstract_virtuals_error (decl, type);
+  if (decomp
+ && !processing_template_decl
+ && !cp_finish_decomp (decl, decomp, true))


It looks like when processing_template_decl this patch will do the expanded
cleanup below, unlike the v2 patch.


No, because around 300 lines earlier there is
   if (processing_template_decl)
 {
...
   return;
 }

But that means I can leave that && !processing_template_decl part
out.
The earlier patch passed bootstrap/regtest on x86_64-linux and i686-linux,
will obviously retest this one which changes just that single hunk.

Ok for trunk if it passes?


OK.


2024-09-05  Jakub Jelinek  

PR c++/115769
* cp-tree.h: Partially implement CWG 2867 - Order of initialization
for structured bindings.
(cp_finish_decomp): Add TEST_P argument defaulted to false.
* decl.cc (initialize_local_var): Add DECOMP argument, if true,
don't build cleanup and temporarily override stmts_are_full_exprs_p
to 0 rather than 1.  Formatting fix.
(cp_finish_decl): Invoke cp_finish_decomp for structured bindings
here, first with test_p.  For automatic structured binding bases
if the test cp_finish_decomp returned true wrap the initialization
together with what non-test cp_finish_decomp emits with a
CLEANUP_POINT_EXPR, and if there are any CLEANUP_STMTs needed, emit
them around the whole CLEANUP_POINT_EXPR with guard variables for the
cleanups.  Call cp_finish_decomp using RAII if not called with
decomp != NULL otherwise.
(cp_finish_decomp): Add TEST_P argument, change return type from
void to bool, if TEST_P is true, return true instead of emitting
actual code for the tuple case, otherwise return false.
* parser.cc (cp_convert_range_for): Don't call cp_finish_decomp
after cp_finish_decl.
(cp_parser_decomposition_declaration): Set DECL_DECOMP_BASE
before cp_finish_decl call.  Don't call cp_finish_decomp after
cp_finish_decl.
(cp_finish_omp_range_for): Don't call cp_finish_decomp after
cp_finish_decl.
* pt.cc (tsubst_stmt): Likewise.

* g++.dg/DRs/dr2867-1.C: New test.
* g++.dg/DRs/dr2867-2.C: New test.

--- gcc/cp/cp-tree.h.jj 2024-08-30 09:09:45.466623869 +0200
+++ gcc/cp/cp-tree.h2024-08-30 11:00:39.861747964 +0200
@@ -7024,7 +7024,7 @@ extern void omp_declare_variant_finalize
  struct cp_decomp { tree decl; unsigned int count; };
  extern void cp_finish_decl(tree, tree, bool, tree, int, 
cp_decomp * = nullptr);
  extern tree lookup_decomp_type(tree);
-extern void cp_finish_decomp   (tree, cp_decomp *);
+extern bool cp_finish_decomp   (tree, cp_decomp *, bool = 
false);
  extern int cp_complete_array_type (tree *, tree, bool);
  extern int cp_complete_array_type_or_error(tree *, tree, bool, 
tsubst_flags_t);
  extern tree build_ptrmemfunc_type (tree);
--- gcc/cp/decl.cc.jj   2024-08-30 09:09:45.495623494 +0200
+++ gcc/cp/decl.cc  2024-09-04 19:55:59.046491602 +0200
@@ -103,7 +103,7 @@ static tree check_special_function_retur
  static tree push_cp_library_fn (enum tree_code, tree, int);
  static tree build_cp_library_fn (tree, enum tree_code, tree, int);
  static void store_parm_decls (tree);
-static void initialize_local_var (tree, tree);
+static void initialize_local_var (tree, tree, bool);
  static void expand_static_init (tree, tree);
  static location_t smallest_type_location (const cp_decl_specifier_seq*);
  static bool identify_goto (tree, location_t, const location_t *,
@@ -8058,14 +8058,13 @@ wrap_temporary_cleanups (tree init, tree
  /* Generate code to initialize DECL (a local variable).  */
  
  static void

-initialize_local_var (tree decl, tree init)
+initialize_local_var (tree decl, tree init, bool decomp)
  {
tree type = TREE_TYPE (decl);
tree cleanup;
int already_used;
  
-  gcc_assert (VAR_P (decl)

- || TREE_CODE (decl) == RESULT_DECL);
+  gcc_assert (VAR_P (decl) || TREE_CODE (decl) == RESULT_DECL);
gcc_assert (!TREE_STATIC (decl));
  
if (DECL_SIZE (decl) == NULL_TREE)

@@ -8085,7 +8084,8 @@ initialize_local_var (tree decl, tree in
  DECL_READ_P (decl) = 1;
  
/* Generate a cleanup, if necessary.  */

-  cleanup = cxx_maybe_build_cleanup (decl, tf_warning_or_error);
+  cleanup = (decomp ? NULL_TREE
+: cxx_maybe_build_cleanup (decl, tf_warning_or_error));
  
/* Perform the initialization.  */

if (init)
@@ -8120,10 +8120,16 @@ initialize_local_var (tree decl, tree in
  
  	  gcc_assert (building_stmt_list_p ());

  saved_stmts_are_full_exprs_p = stmts_

[PATCH] testsuite/gcc.dg/pr84877.c: Add machinery to stabilize stack alignment

2024-09-05 Thread Hans-Peter Nilsson

Tested adding 0..more-than-four environment variables,
running cris-sim+cris-elf.  I also checked that foo stays
the same generated code regardless of the new code: this is
not obviously true as foo is "just" noinline, not __noipa__.

Ok to commit?

-- >8 --
This test awkwardly "blinks"; xfails and xpasses apparently
randomly for cris-elf using the "gdb simulator".  On
inspection, I see that the stack address depends on the
number of environment variables, deliberately passed to the
simulator, each adding the size of a pointer.

This test is IMHO important enough not to be skipped just
because it blinks (fixing the actual problem is a different
task).

I guess a random non-16 stack-alignment could happen for
other targets as well, so let's try and add a generic
machinery to "stabilize" the test as failing, by allocating
a dynamic amount to make sure it's misaligned.  The most
target-dependent item here is an offset between the incoming
stack-pointer value (within main in the added framework) and
outgoing (within "xmain" as called from main when setting up
the p0 parameter).  I know there are other wonderful stack
shapes, but such targets would fall under the "complicated
situations"-label and are no worse off than before.
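
The arithmetic behind that idea can be sketched separately from the test.
This is a slight variant of what the patch computes, written as portable
C++ with hypothetical names (outgoing_misalignment, padding_request); it
models only the decision, not what a one-byte alloca actually does to the
stack pointer on a given target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Low four bits of the stack pointer the callee would see, given the
// incoming stack pointer and a target-specific offset (hypothetical names).
constexpr std::uintptr_t outgoing_misalignment(std::uintptr_t incoming_sp,
                                               std::intptr_t outgoing_offset) {
  return (incoming_sp + static_cast<std::uintptr_t>(outgoing_offset)) & 15;
}

// Mirrors the idea of `__builtin_alloca (misalignment == 0)`: request one
// byte only when the callee's stack would otherwise be 16-byte aligned.
constexpr std::size_t padding_request(std::uintptr_t incoming_sp,
                                      std::intptr_t outgoing_offset) {
  return outgoing_misalignment(incoming_sp, outgoing_offset) == 0 ? 1 : 0;
}

// With a 4-byte offset, an incoming pointer ending in 4 yields an aligned
// callee stack, so padding is requested.
static_assert(padding_request(0x1004, -4) == 1, "aligned by accident");
// An incoming pointer ending in 0 is already misaligned for the callee.
static_assert(outgoing_misalignment(0x1000, -4) == 12, "already misaligned");
static_assert(padding_request(0x1000, -4) == 0, "no padding needed");
```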

* gcc.dg/pr84877.c: Try to make the test result consistent by
misaligning the stack.
---
 gcc/testsuite/gcc.dg/pr84877.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/pr84877.c b/gcc/testsuite/gcc.dg/pr84877.c
index e82991f42dd4..2f2e29578df9 100644
--- a/gcc/testsuite/gcc.dg/pr84877.c
+++ b/gcc/testsuite/gcc.dg/pr84877.c
@@ -3,6 +3,32 @@
 
 #include 
 
+#ifdef __CRIS__
+#define OUTGOING_SP_OFFSET (-sizeof (void *))
+/* Suggestion: append #elif defined() after this comment,
+   either defining OUTGOING_SP_OFFSET to whatever the pertinent amount is at -O2,
+   if that makes your target consistently fail this test, or define
+   DO_NOT_TAMPER for more complicated situations.  Either way, compile with
+   -DDO_NOT_TAMPER to avoid any meddling.  */
+#endif
+
+#if defined (OUTGOING_SP_OFFSET) && !defined (DO_NOT_TAMPER)
+extern int xmain () __attribute__ ((__noipa__));
+int main ()
+{
+  uintptr_t misalignment
+= (OUTGOING_SP_OFFSET
++ (15 & (uintptr_t) __builtin_stack_address ()));
+  /* Allocate a minimal amount if the stack was accidentally aligned.  */
+  void *q = __builtin_alloca (misalignment == 0);
+  xmain ();
+  /* Fake use to avoid the "allocation" being optimized out.  */
+  asm volatile ("" : : "rm" (q));
+  return 0;
+}
+#define main xmain
+#endif
+
 struct U {
 int M0;
 int M1;
-- 
2.30.2
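
For readers following the arithmetic, the check in the added main can be modeled on plain integers.  This is only a sketch: the function names and the sample stack-pointer values are made up for illustration, and the real test reads the stack pointer with __builtin_stack_address rather than taking it as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors the expression in the patch: OUTGOING_SP_OFFSET plus the low four
// bits of the incoming stack pointer, computed in unsigned arithmetic so a
// negative offset wraps around just as (-sizeof (void *)) does.
constexpr std::uintptr_t misalignment(std::uintptr_t incoming_sp,
                                      std::uintptr_t outgoing_sp_offset) {
  return outgoing_sp_offset + (15 & incoming_sp);
}

// The size handed to __builtin_alloca in the patch: 1 when the outgoing
// stack pointer would be 16-byte aligned, 0 otherwise.
constexpr std::size_t alloca_request(std::uintptr_t incoming_sp,
                                     std::uintptr_t outgoing_sp_offset) {
  return misalignment(incoming_sp, outgoing_sp_offset) == 0;
}
```

With an offset of minus four bytes, only an incoming stack pointer whose low four bits are 4 yields a request of one byte; every other value leaves the stack untouched because it is already misaligned.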

[PATCH] arm: avoid indirect sibcalls when IP is live [PR116597]

2024-09-05 Thread Richard Earnshaw

On Arm only r0-r3 (the argument registers) and IP are available for
use as an address for an indirect sibcall.  But if all the argument
registers are used and IP is clobbered during the epilogue, or is used
to pass closure information, then there is no spare register to hold
the address and we must reject the sibcall.

arm_function_ok_for_sibcall did try to handle this, but it did this by
examining the function declaration.  That doesn't work if the function
has no prototype, or if the prototype has variadic arguments: we must,
instead, look at the list of actuals for the call rather than the list
of formals.

The old code also worked by laying out all the arguments and then
trying to add one more integer argument at the end of the list, but
this missed a corner case where a hole had been left in the argument
register list due to argument alignment.

We fix all of this by now scanning the list of actual values to be
passed and then checking if a core register has been assigned to that
argument.  If it has, then we record which registers were assigned.
Once done we then look to see if all the argument registers have been
assigned and only block the sibcall if that is the case.  This permits
us to sibcall:

int (*d)(int, ...);
int g(void);
int i () { return d(g(), 2LL);}

because r1 remains free (the 2LL argument is passed in {r2,r3}).

gcc/
PR target/116597
* config/arm/arm.cc (arm_function_ok_for_sibcall): Use the list of
actuals for the call, not the list of formals.

gcc/testsuite/
PR target/116597
* gcc.target/arm/pac-sibcall-2.c: New test.
* gcc.target/arm/pac-sibcall-3.c: New test.
---

I think the above is all OK, but I'm not especially familiar with handling
call expansion.  I'll wait until next week before committing this.
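
As a rough illustration of the bookkeeping described above, the register mask can be modeled outside the compiler.  The helper names here are hypothetical and the register assignment for the example call is hard-coded from the description (r0 for the int, {r2,r3} for the 64-bit value); in the patch the assignment comes from arm_function_arg.

```cpp
#include <cassert>

// Assumption for this sketch: r0-r3 are the argument registers, so the last
// one is number 3 (the patch uses LAST_ARG_REGNUM for this).
constexpr unsigned kLastArgRegnum = 3;
constexpr unsigned kAllArgRegs = (1u << (kLastArgRegnum + 1)) - 1;

// Record that an argument occupies `nregs` consecutive registers starting at
// `first_reg`; bits beyond r3 are dropped, mirroring the final mask test.
constexpr unsigned mark_arg_regs(unsigned used, unsigned first_reg,
                                 unsigned nregs) {
  return used | ((((1u << nregs) - 1) << first_reg) & kAllArgRegs);
}

// True when no argument register is left to hold the call address.
constexpr bool all_arg_regs_used(unsigned used) {
  return (used & kAllArgRegs) == kAllArgRegs;
}

// d(g(), 2LL): the int goes in r0, the 64-bit value in the pair {r2,r3}.
constexpr unsigned example_mask() {
  return mark_arg_regs(mark_arg_regs(0, 0, 1), 2, 2);
}
```

The example mask leaves bit 1 clear, which corresponds to r1 staying free for the address; only once that bit is set as well does the all-used test succeed.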

 gcc/config/arm/arm.cc| 38 ++--
 gcc/testsuite/gcc.target/arm/pac-sibcall-2.c | 14 
 gcc/testsuite/gcc.target/arm/pac-sibcall-3.c | 14 
 3 files changed, 55 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall-2.c
 create mode 100644 gcc/testsuite/gcc.target/arm/pac-sibcall-3.c

diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 17485447693..de34e9867e6 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -8007,10 +8007,11 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   && DECL_WEAK (decl))
 return false;
 
-  /* We cannot tailcall an indirect call by descriptor if all the call-clobbered
- general registers are live (r0-r3 and ip).  This can happen when:
-  - IP contains the static chain, or
-  - IP is needed for validating the PAC signature.  */
+  /* Indirect tailcalls need a call-clobbered register to hold the function
+ address.  But we only have r0-r3 and ip in that class.  If r0-r3 all hold
+ function arguments, then we can only use IP.  But IP may be needed in the
+ epilogue (for PAC validation), or for passing the static chain.  We have
+ to disable the tail call if nothing is available.  */
   if (!decl
   && ((CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
  || arm_current_function_pac_enabled_p()))
@@ -8022,18 +8023,33 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
   arm_init_cumulative_args (&cum, fntype, NULL_RTX, NULL_TREE);
   cum_v = pack_cumulative_args (&cum);
 
-  for (tree t = TYPE_ARG_TYPES (fntype); t; t = TREE_CHAIN (t))
+  tree arg;
+  call_expr_arg_iterator iter;
+  unsigned used_regs = 0;
+
+  /* Layout each actual argument in turn.  If it is allocated to
+core regs, note which regs have been allocated.  */
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
{
- tree type = TREE_VALUE (t);
- if (!VOID_TYPE_P (type))
+ tree type = TREE_TYPE (arg);
+ function_arg_info arg_info (type, /*named=*/true);
+ rtx reg = arm_function_arg (cum_v, arg_info);
+ if (reg && REG_P (reg)
+ && REGNO (reg) <= LAST_ARG_REGNUM)
{
- function_arg_info arg (type, /*named=*/true);
- arm_function_arg_advance (cum_v, arg);
+ /* Avoid any chance of UB here.  We don't care if TYPE
+is very large since it will use up all the argument regs.  */
+ unsigned nregs = MIN (ARM_NUM_REGS2 (GET_MODE (reg), type),
+   LAST_ARG_REGNUM + 1);
+ used_regs |= ((1 << nregs) - 1) << REGNO (reg);
}
+ arm_function_arg_advance (cum_v, arg_info);
}
 
-  function_arg_info arg (integer_type_node, /*named=*/true);
-  if (!arm_function_arg (cum_v, arg))
+  /* We've used all the argument regs, and we know IP is live during the
+epilogue for some reason, so we can't tailcall.  */
+  if ((used_regs & ((1 << (LAST_ARG_REGNUM + 1)) - 1))
+ == ((1 << (LAST_ARG_REGNUM + 1)) - 1))
return

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Jason Merrill


On 9/5/24 10:54 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

A lambda within a default template argument used in some template-id
may have a smaller template depth than the context of the template-id.
For example, the lambda in v1's default template argument has template
depth 1, and in v2's has template depth 2, but the template-ids v1<0>
and v2<0> which use these default arguments appear in a depth 3 template
context.  So add_extra_args will ultimately return args with depth 3 --
too many args for the lambda, leading to a bogus substitution.

This patch fixes this by trimming the result of add_extra_args to match
the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH field
is added that tracks the template-ness of a lambda;

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
trim the augmented template arguments to match the template depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
  gcc/cp/pt.cc  | 11 +
  gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30 +++
  2 files changed, 41 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..c49a26b4f5e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
return t;
  }
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then it's probably from


"probably" seems wrong, given that it wasn't implemented for this case.


+a context (e.g. default template argument context) which may have fewer
+levels than the current context it's embedded in.  Adjust the result of
+add_extra_args accordingly.  */


Hmm, this looks like a situation of not just fewer levels, but 
potentially unrelated levels.  "args" here is for f, which shares no 
template context with v1.  What happens if your templates have non-type 
template parameters?



+  tree ctx_parms = DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (oldfn));
+  if (generic_lambda_fn_p (oldfn))
+   ctx_parms = TREE_CHAIN (ctx_parms);
+  args = get_innermost_template_args (args, TMPL_PARMS_DEPTH (ctx_parms));
+}
  
tree r = build_lambda_expr ();
  
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

new file mode 100644
index 000..c5c0525908e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
@@ -0,0 +1,30 @@
+// PR c++/116567
+// { dg-do compile { target c++20 } }
+
+template
+bool v1 = true;
+
+template
+bool v1g = true;
+
+template
+struct A {
+  template
+  static inline bool v2 = true;
+
+  template
+  static inline bool v2g = true;
+
+  template
+  struct B {
+template
+static void f() {
+  v1<0> && v1g<0>;
+  v2<0> && v2g<0>;
+}
+  };
+};
+
+auto main() -> int {
+  A::B::f();
+}

Re: [PATCH] c++: Fix overeager Woverloaded-virtual with conversion operators [PR109918]

2024-09-05 Thread Jason Merrill


On 9/5/24 7:02 AM, Simon Martin wrote:

Hi Jason,

On 4 Sep 2024, at 18:09, Jason Merrill wrote:


On 9/1/24 2:51 PM, Simon Martin wrote:

Hi Jason,

On 26 Aug 2024, at 19:23, Jason Merrill wrote:


On 8/25/24 12:37 PM, Simon Martin wrote:

On 24 Aug 2024, at 23:59, Simon Martin wrote:

On 24 Aug 2024, at 15:13, Jason Merrill wrote:


On 8/23/24 12:44 PM, Simon Martin wrote:

We currently emit an incorrect -Woverloaded-virtual warning upon the
following test case
=== cut here ===
struct A {
  virtual operator int() { return 42; }
  virtual operator char() = 0;
};
struct B : public A {
  operator char() { return 'A'; }
};
=== cut here ===

The problem is that warn_hidden relies on get_basefndecls to find the
methods in A possibly hidden by B's operator char(), and gets both the
conversion operator to int and to char. It eventually wrongly concludes
that the conversion to int is hidden.

This patch fixes this by filtering out conversion operators to different
types from the list returned by get_basefndecls.


Hmm, same_signature_p already tries to handle comparing conversion
operators, why isn't that working?


It does indeed.

However, `ovl_range (fns)` does not only contain `char B::operator()` -
for which `any_override` gets true - but also `conv_op_marker` - for
which `any_override` gets false, causing `seen_non_override` to get to
true. Because of that, we run the last loop, that will emit a warning
for all `base_fndecls` (except `char B::operator()` that has been
removed).

We could test `fndecl` and `base_fndecls[k]` against `conv_op_marker` in
the loop, but we’d still need to inspect the “converting to” type in the
last loop (for when `warn_overloaded_virtual` is 2). This would make the
code much more complex than the current patch.


Makes sense.


It would however probably be better if `get_basefndecls` only returned
the right conversion operator, not all of them. I’ll draft another
version of the patch that does that and submit it in this thread.


I have explored my suggestion further and it actually ends up more
complicated than the initial patch.


Yeah, you'd need to do lookup again for each member of fns.


Please find attached a new revision to fix the reported issue, as well
as new ones I discovered while testing with -Woverloaded-virtual=2.

It’s pretty close to the initial patch, but (1) adds a missing
“continue;” (2) fixes a location problem when -Woverloaded-virtual=2
(3) adds more test cases. The commit log is also more comprehensive, and
should describe well the various problems and why the patch is correct.



+   if (IDENTIFIER_CONV_OP_P (name)
+   && !same_type_p (DECL_CONV_FN_TYPE (fndecl),
+DECL_CONV_FN_TYPE (base_fndecls[k])))
+ {
+   base_fndecls[k] = NULL_TREE;
+   continue;
+ }


So this removes base_fndecls[k] if it doesn't return the same type as
fndecl.  But what if there's another conversion op in fns that does
return the same type as base_fndecls[k]?

If I add an operator int() to both base and derived in
Woverloaded-virt7.C, the warning disappears.


That was an issue indeed. I’ve reworked the patch, and came up with
the attached latest version. It explicitly keeps track both of
overloaded and of hidden base methods (and the “hiding method” for
the latter), and uses those instead of juggling with bools and nullified
base_decls.

On top of fixing the issue the PR reports, it fixes a few that I came
across while investigating:
- wrongly emitting the warning if the base method is not virtual (the
lines added to Woverloaded-virt1.C would cause a warning without the
patch)
- wrongly emitting the warning when the derived class method is a
template, which is wrong since template members don’t override virtual
base methods (see the change in pr61945.C)


This change seems wrong to me; the warning is documented as "Warn when
a function declaration hides virtual functions from a base class," and
templates can certainly hide virtual base methods, as indeed they do
in that testcase.

Gasp, you’re right. The updated patch fixes this by simply working
from the DECL_TEMPLATE_RESULT of TEMPLATE_DECL; so pr61945.C warns
again (after changing the signature so that it actually hides the base
class; it was not before, hence the warning was actually incorrect).


It was hiding the base function before, the warning was correct; hiding 
is based on name, not signature.  Only overriding depends on the signature.


Jason
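
The distinction drawn above between hiding (by name) and overriding (by signature) can be seen in a small standalone example.  The class and function names are invented for this illustration and are not taken from the testsuite files under discussion.

```cpp
#include <cassert>

struct Base {
  virtual ~Base() = default;
  virtual int f(int) { return 1; }
};

// Same name, different signature: hides Base::f(int) without overriding it.
struct Hides : Base {
  int f(double) { return 2; }
};

// Same name and signature: overrides Base::f(int).
struct Overrides : Base {
  int f(int) override { return 3; }
};

// Virtual dispatch through a Base reference only sees overriders.
int call_through_base(Base& b) { return b.f(0); }

// Name lookup in Hides stops at Hides::f(double); Base::f(int) is hidden,
// so the int argument is converted to double.
int call_on_hides() { Hides h; return h.f(0); }

int base_call_on_hides() { Hides h; return call_through_base(h); }
int base_call_on_overrides() { Overrides o; return call_through_base(o); }
```

Calling f(0) directly on a Hides object reaches the double overload, while the same object used through a Base reference still runs Base::f(int).  That is the situation the warning is documented to flag.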

Re: [PATCH 6/8] gcn: Add else operand to masked loads.

2024-09-05 Thread Andrew Stubbs


(Sorry, I missed this because I was on vacation.)

On 11/08/2024 22:00, Robin Dapp wrote:

This patch adds a zero else operand to the masked loads.


The patch is OK, but I have a question below.


gcc/ChangeLog:

* config/gcn/predicates.md (maskload_else_operand): New
predicate.
* config/gcn/gcn-valu.md: Use new predicate.
---
  gcc/config/gcn/gcn-valu.md   | 6 --
  gcc/config/gcn/predicates.md | 3 +++
  2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index b24cf9be32e..2344bc00ffc 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -4002,7 +4002,8 @@ (define_expand "while_ultsidi"
  (define_expand "maskloaddi"
[(match_operand:V_MOV 0 "register_operand")
 (match_operand:V_MOV 1 "memory_operand")
-   (match_operand 2 "")]
+   (match_operand 2 "")
+   (match_operand:V_MOV 3 "maskload_else_operand")]
""
{
  rtx exec = force_reg (DImode, operands[2]);
@@ -4040,7 +4041,8 @@ (define_expand "mask_gather_load"
 (match_operand: 2 "register_operand")
 (match_operand 3 "immediate_operand")
 (match_operand:SI 4 "gcn_alu_operand")
-   (match_operand:DI 5 "")]
+   (match_operand:DI 5 "")
+   (match_operand:V_MOV 6 "maskload_else_operand")]
""
{
  rtx exec = force_reg (DImode, operands[5]);
diff --git a/gcc/config/gcn/predicates.md b/gcc/config/gcn/predicates.md
index 3f59396a649..9bc806cf990 100644
--- a/gcc/config/gcn/predicates.md
+++ b/gcc/config/gcn/predicates.md
@@ -228,3 +228,6 @@ (define_predicate "ascending_zero_int_parallel"
return gcn_stepped_zero_int_parallel_p (op, 1);
  })
  
+(define_predicate "maskload_else_operand"

+  (and (match_code "const_int,const_vector")
+   (match_test "op == CONST0_RTX (GET_MODE (op))")))


This forces maskload and mask_gather_load to only accept zero here, but 
in fact the hardware would allow us to accept any value (including 
undefined).


Here's the expand code:

  /* Masked lanes are required to hold zero.  */
  emit_move_insn (operands[0], gcn_vec_constant (mode, 0));

  emit_insn (gen_gather_expr_exec (operands[0], addr, as, v,
 operands[0], exec));

In other words, initialize the whole vector to zero, and then use the 
gather_load instruction to implement the masked load (GCN does not have 
a contiguous-memory vector load instruction).


We could easily omit the initialization instruction, or pass through the 
new value.


Would there be any advantage to accepting other values, or is forcing 
zero actually the right choice?


Thanks for the patch

Andrew
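
To make the question about the else operand concrete, here is a scalar model of a masked gather in which inactive lanes take a caller-supplied value.  It is a sketch only: the function names, the lane count, and the sample data are assumptions for illustration and say nothing about how the GCN backend represents vectors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a masked gather: active lanes load memory[indices[i]],
// inactive lanes receive else_value.
template <std::size_t N>
std::array<int, N> masked_gather(const int* memory,
                                 const std::array<std::size_t, N>& indices,
                                 const std::array<bool, N>& mask,
                                 int else_value) {
  std::array<int, N> result;
  result.fill(else_value);             // plays the role of the initial move
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      result[i] = memory[indices[i]];  // plays the role of the gather
  return result;
}

// Fixed sample input so the effect of else_value is easy to see.
inline std::array<int, 4> example_gather(int else_value) {
  static const int memory[] = {10, 20, 30, 40, 50};
  return masked_gather<4>(memory, {4, 0, 2, 1},
                          {true, false, true, false}, else_value);
}
```

Passing zero reproduces the behaviour the predicate currently enforces; passing any other value shows what accepting a wider else operand would mean for the inactive lanes.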

[pushed] doc: remove stray character

2024-09-05 Thread Marek Polacek

Applying to trunk as obvious.

-- >8 --
There's an extra '+'.

gcc/ChangeLog:

* doc/invoke.texi: Remove an extra char in @item sme2.
---
 gcc/doc/invoke.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 193db761d64..0f9b1bab19f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -21847,7 +21847,7 @@ Enable the Scalable Matrix Extension.
 Enable the FEAT_SME_I16I64 extension to SME.
 @item sme-f64f64
 Enable the FEAT_SME_F64F64 extension to SME.
-+@item sme2
+@item sme2
 Enable the Scalable Matrix Extension 2.  This also enables SME instructions.
 @item lse128
 Enable the LSE128 128-bit atomic instructions extension.  This also

base-commit: d44cae2d9310660e3e47f15202e86e4f73f15b37
-- 
2.46.0

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Patrick Palka

On Thu, 5 Sep 2024, Jason Merrill wrote:

> On 9/5/24 10:54 AM, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > for trunk/14?
> > 
> > -- >8 --
> > 
> > A lambda within a default template argument used in some template-id
> > may have a smaller template depth than the context of the template-id.
> > For example, the lambda in v1's default template argument has template
> > depth 1, and in v2's has template depth 2, but the template-ids v1<0>
> > and v2<0> which use these default arguments appear in a depth 3 template
> > context.  So add_extra_args will ultimately return args with depth 3 --
> > too many args for the lambda, leading to a bogus substitution.
> > 
> > This patch fixes this by trimming the result of add_extra_args to match
> > the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH field
> > is added that tracks the template-ness of a lambda;
> > 
> > PR c++/116567
> > 
> > gcc/cp/ChangeLog:
> > 
> > * pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
> > trim the augmented template arguments to match the template depth
> > of the lambda.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp2a/lambda-targ7.C: New test.
> > ---
> >   gcc/cp/pt.cc  | 11 +
> >   gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30 +++
> >   2 files changed, 41 insertions(+)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> > 
> > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > index 747e627f547..c49a26b4f5e 100644
> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
> > tsubst_flags_t complain, tree in_decl)
> > LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
> > return t;
> >   }
> > +  if (LAMBDA_EXPR_EXTRA_ARGS (t))
> > +{
> > +  /* If we deferred substitution into this lambda, then it's probably
> > from
> 
> "probably" seems wrong, given that it wasn't implemented for this case.

I said "probably" because in e.g.

template
bool b = true;

template
void f() {
  b<0>;
}

the lambda context has the same depth as the template-id context.  But
as you point out, the issue is ultimately related vs unrelated
parameters rather than depth.

> 
> > +a context (e.g. default template argument context) which may have
> > fewer
> > +levels than the current context it's embedded in.  Adjust the result
> > of
> > +add_extra_args accordingly.  */
> 
> Hmm, this looks like a situation of not just fewer levels, but potentially
> unrelated levels.  "args" here is for f, which shares no template context with
> v1.  What happens if your templates have non-type template parameters?

Indeed before add_extra_args 'args' will be unrelated, but after doing
add_extra_args the innermost levels of 'args' will correspond to the
lambda's template context, and so using get_innermost_template_args
ought to get rid of the unrelated arguments, keeping only the ones
relevant to the original lambda context.
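
The trimming step can be pictured with a toy model in which a multi-level argument list is a vector of levels, outermost first.  This is an assumption-laden sketch: GCC stores template arguments as trees, and innermost_levels and example_args are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a multi-level template argument list: one inner vector
// per template level, outermost first.
using ArgLevels = std::vector<std::vector<std::string>>;

// Keep only the n innermost levels, analogous in spirit to what the patch
// asks get_innermost_template_args to do.
ArgLevels innermost_levels(const ArgLevels& args, std::size_t n) {
  if (n >= args.size())
    return args;
  return ArgLevels(args.end() - static_cast<std::ptrdiff_t>(n), args.end());
}

// Three levels, as in a depth 3 context.
ArgLevels example_args() {
  ArgLevels args;
  args.push_back({"outer"});
  args.push_back({"middle"});
  args.push_back({"inner"});
  return args;
}
```

Asking for one level of the three-level example keeps only the last entry, which corresponds to discarding the unrelated outer levels while keeping those that belong to the lambda's own context.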

Here's v2 which clarifies the comment to talk about related parameters
rather than differing depth, and extends the testcase to invoke
the lambda etc.

-- >8 --

Subject: [PATCH] c++: template depth of lambda in default targ [PR116567]

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
trim the augmented template arguments to match the template depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
 gcc/cp/pt.cc  | 12 +++
 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 42 +++
 2 files changed, 54 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..e6c10d5bd20 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,18 @@ tsubst_lambda_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
   LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
   return t;
 }
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then its original
+context (e.g. default template argument context) might be unrelated
+to the current context it's embedded in.  After add_extra_args though,
+the innermost levels of 'args' will correspond to the lambda context,
+so get rid of all unrelated levels.  */
+  tree ctx_parms = DECL_TEMPLATE_PARMS (DECL_TI_TEMPLATE (oldfn));
+  if (generic_lambda_fn_p (oldfn))
+   ctx_parms = TREE_CHAIN (ctx_parms);
+  args = get_innermost_template_args (args, TMPL_PARMS_DEPTH (ctx_parms));
+}
 
   tree r = build_lambda_expr ();
 
diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C b/gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
new file mode

Re: [RISCV] target-specific source placement

2024-09-05 Thread Jeff Law





On 9/5/24 8:27 AM, Nathan Sidwell wrote:

Hi,
looking at the RISCV code, it seems that there are several
vendor-specific files in config/riscv.  For instance sifive-7.md and
xiangshan.md. It seems these are unconditionally included for all riscv
targets. I guess then one doesn't end up with some combinatorial
explosion of possible riscv compilers. But it doesn't seem scalable,
given that one of the points of riscv is to add your own magic pixie dust.

In general separating pipeline models is easy (and I think that's mostly
what you're seeing with the core specific md files).


When a core has custom extensions that don't overlap well with existing 
ratified extensions, those can be split out pretty easily too.


Where things get nasty is vendor specific extensions that map closely to 
ratified extensions.  For example, there's parts out there with 
zicond-like extensions or Zb*-like extensions.  Those are tough to 
figure out the best way forward.


And sometimes core specific extensions require changes into the common 
risc-v code.


So it may seem like putting vendor stuff into a subdir/file makes sense, 
but it's probably not as easy as you might think in practice.




In my case, I have a port that also has a bunch of vendor-specific
passes, which have unfortunately been placed in the main gcc directory.
They directly rely on an API added to config/riscv. IMHO placing them in
a vendor subdirectory of config/riscv would seem cleaner. Then have
config glue to include them in the build under something like a
--with-riscv-$vendor-extensions configure flag


Whether this port gets considered for upstreaming is unknown.

Anyway, I guess I'm suggesting that, for new code:

1) vendor-specific files get put in a config/riscv/$vendor subdirectory

2) configure-time options determine whether a specific vendor's bits are 
included in the build.

#1 seems quite reasonable.

#2 could well be a long term maintenance problem.  Not sure.  We're 
actually struggling a bit with this question for the core-v bit extensions.


jeff

ps.  As much as folks think vendor extensions are a secret sauce for 
risc-v adoption, I'm far from convinced it's a good idea...

Re: [RISCV] target-specific source placement

2024-09-05 Thread Palmer Dabbelt


On Thu, 05 Sep 2024 07:27:57 PDT (-0700), nat...@acm.org wrote:

Hi,
looking at the RISCV code, it seems that there are several vendor-specific files
in config/riscv.  For instance sifive-7.md and xiangshan.md. It seems these are
unconditionally included for all riscv targets. I guess then one doesn't end up
with some combinatorial explosion of possible riscv compilers. But it doesn't
seem scalable, given that one of the points of riscv is to add your own magic
pixie dust.


Ya, this has always been a point of tension in RISC-V land: the main 
argument for using RISC-V is that vendors can go do whatever they want, 
but then you end up with a bunch of fragmentation and a very clunky 
software stack.  We've got basically the same problem all over the 
place, and it's not even limited to just vendor stuff -- even if you 
just stick to the standard extensions, we've now got way more RISC-V ISA 
flavors than there are people on earth.  It's just not viable to make 
that all fit together in any sort of sane way, it's too big of a number 
for anyone.


So far we've tried really hard to avoid forks on the software side of 
things -- either proper hard forks (ie, just making a different config/ 
directory and calling it another ISA) or soft forks (ie, scattering a bunch of 
#ifdefs around and ending up with single-vendor builds).  That has led 
to a ton of complexity in software land, but I still think it's a good 
goal as we've seen how much of a mess can come from going the other way 
from other targets.


That said, I do get that it's painful just on a day-to-day level.  It is 
for everyone, I just don't really think there's a way around it -- like 
you said above, it's the design of the ISA.



In my case, I have a port that also has a bunch of vendor-specific passes, which
have unfortunately been placed in the main gcc directory. They directly rely on
an API added to config/riscv. IMHO placing them in a vendor subdirectory of
config/riscv would seem cleaner. Then have config glue to include them in the
build under something like a --with-riscv-$vendor-extensions configure flag

Whether this port gets considered for upstreaming is unknown.


So I think there's generally this "if it's not going upstream it doesn't 
count" type argument in open source land.  It always feels a bit 
antagonistic, though.


IMO these "maybe it's going upstream" things are always the tricky bit: 
if you know it's not going upstream you can just hack out the bits you 
don't care about, and if you know it's going upstream then you'll 
eventually need to play nice so might as well keep them.  It's just kind 
of hard to argue to do the extra work there when you don't know if it's 
going to be useful.



Anyway, I guess I'm suggesting that, for new code:

1) vendor-specific files get put in a config/riscv/$vendor subdirectory


Is that so different than just having them in different files?

That's sort of how we split stuff out now, though there's a lot mixed 
together in the generic files.  At a certain point all this stuff ends 
up somewhat coupled anyway, as you can't just independently write stuff 
in MD files without understanding the rest of the port (and some amount 
of the rest of the vendors' behavior).


The constant gen stuff is a perfect example: there's just a bunch of "if 
you've got this extension flavor, then it's cheap to make those 
constants".  We could throw some code at the problem to let that stuff 
live in its own files, but I'm not sure that's a net win in terms of 
complexity.


There's also some stuff that's vendor-specific-ish -- I'm thinking of 
stuff like the condops, for example, where we've got multiple vendor 
extensions that basically do the same thing and then a standard 
extension that does it slightly differently.  I'd bet we end up in that 
situation fairly often; at least it's happened a few times in 
supervisor land.



2) configure-time options determine whether a specific vendor's bits are
included in the build.


IMO as long as we can turn on all the vendor stuff in the same build 
then I'm fine with something like that.  I just don't know if it'd 
actually help, as there's enough cases with coupling that it kind of 
feels like splitting stuff out is going to be more work than we save.


I'd really strongly be opposed to defaulting to per-vendor builds, 
though.  I think we'd pretty quickly end up with a tangled mess of 
conflicting code that's going to make things hard to maintain.



$vendor names can be those used in the X$vendor$suffix ISA extension scheme

thoughts?

nathan

Re: [RISCV] target-specific source placement

2024-09-05 Thread Palmer Dabbelt

[Sorry I crossed the streams here, I had to run out in the middle of 
writing up that other reply.]


On Thu, 05 Sep 2024 10:49:47 PDT (-0700), jeffreya...@gmail.com wrote:



On 9/5/24 8:27 AM, Nathan Sidwell wrote:

Hi,
looking at the RISCV code, it seems that there are several
vendor-specific files in config/riscv.  For instance sifive-7.md and
xiangshan.md. It seems these are unconditionally included for all riscv
targets. I guess then one doesn't end up with some combinatorial
explosion of possible riscv compilers. But it doesn't seem scalable,
given that one of the points of riscv is to add your own magic pixie dust.

In general separating pipeline models is easy (and I think that's mostly
what you're seeing with the core specific md files).


The pipeline models themselves are split out, but we've still got 
coupling with the types in the generic MD patterns.  Maybe we've got 
enough generic types these days that we can support a wider set of 
pipelines, but I'm betting these pipelines with super-secret magic 
instructions also have some fun performance quirks ;)



When a core has custom extensions that don't overlap well with existing
ratified extensions, those can be split out pretty easily too.

Where things get nasty is vendor specific extensions that map closely to
ratified extensions.  For example, there's parts out there with
zicond-like extensions or Zb*-like extensions.  Those are tough to
figure out the best way forward.

And sometimes core specific extensions require changes into the common
risc-v code.

So it may seem like putting vendor stuff into a subdir/file makes sense,
but it's probably not as easy as you might think in practice.



In my case, I have a port that also has a bunch of vendor-specific
passes, which have unfortunately been placed in the main gcc directory.
They directly rely on an API added to config/riscv. IMHO placing them in
a vendor subdirectory of config/riscv would seem cleaner. Then have
config glue to include them in the build under something like a
--with-riscv-$vendor-extensions configure flag

Whether this port gets considered for upstreaming is unknown.

Anyway, I guess I'm suggesting that, for new code:

1) vendor-specific files get put in a config/riscv/$vendor subdirectory

2) configure-time options determine whether a specific vendor's bits are
included in the build.

#1 seems quite reasonable.

#2 could well be a long term maintenance problem.  Not sure.  We're
actually struggling a bit with this question for the core-v bit extensions.


Ya, I think everyone's struggling with this everywhere in RISC-V land.  
That's kind of how we end up with just a bunch of ad-hoc decisions as to 
how to split stuff up.  That does make things a little clunky to look 
at, but I think it's actually the right way to go -- we're essentially 
just saying "let's go look at each specific extension and do what's 
easiest", and since extensions vary so wildly in complexity I think 
that's the best way to go.


It does involve more "just write some patches and see if they feel 
good", though, which can be kind of scary for the vendors -- basically 
we're not giving them a set of guidelines to go by, so they don't really 
know what to expect.  Those uncertain timelines can look scary for 
hardware company management types, which can scare them off even trying 
to upstream stuff.


I guess at a certain point that's inevitable, though, as everything's 
got to get reviewed upstream.  So maybe this is just as good as it gets?




jeff

ps.  As much as folks think vendor extensions are a secret sauce for
risc-v adoption, I'm far from convinced it's a good idea...


I'd argue those are not mutually exclusive statements ;)

[to-be-committed][V2][RISC-V] Avoid unnecessary extensions after sCC insns

2024-09-05 Thread Jeff Law



So the first patch failed the pre-commit CI; it didn't fail in my 
testing because I'm using --with-arch to set a default configuration 
that includes things like zicond to ensure that's always tested.  And 
the failing test is skipped when zicond is enabled by default.


The failing test is designed to ensure that we don't miss an 
if-conversion due to costing issues around the extension that was 
typically done in an sCC sequence (which is why it's only run when 
zicond is off).


The test failed because we have a little routine that is highly 
dependent on the code generated by the sCC expander and will adjust the 
costing to account for expansion quirks that usually go away in register 
allocation.



That code needs to be enhanced to work after the sCC expansion change. 
Essentially it needs to account for the subreg extraction that shows up 
in the sequence as well as being a bit looser on mode checking.


I kept the code working for the old sequences -- in theory a user could 
conjure up the old sequence so handling them seems useful.


This also drops the testsuite changes.  Palmer's change makes them 
unnecessary.


Waiting on pre-commit CI before taking any further action...

Jeff

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a38cb72f09f..39489c4377e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4218,11 +4218,29 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
  riscv_if_info.original_cost += COSTS_N_INSNS (1);
  riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
}
- last_dest = NULL_RTX;
+
  rtx dest = SET_DEST (x);
- if (COMPARISON_P (src)
+
+ /* Do something similar for the  moves that are likely to
+turn into NOP moves by the time the register allocator is
+done.  These are also side effects of how our sCC expanders
+work.  We'll want to check and update LAST_DEST here too.  */
+ if (last_dest
  && REG_P (dest)
- && GET_MODE (dest) == SImode)
+ && GET_MODE (dest) == SImode
+ && SUBREG_P (src)
+ && SUBREG_PROMOTED_VAR_P (src)
+ && REGNO (SUBREG_REG (src)) == REGNO (last_dest))
+   {
+ riscv_if_info.original_cost += COSTS_N_INSNS (1);
+ riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
+ if (last_dest)
+   last_dest = dest;
+   }
+ else
+   last_dest = NULL_RTX;
+
+ if (COMPARISON_P (src) && REG_P (dest))
last_dest = dest;
}
   else
@@ -4904,13 +4922,31 @@ riscv_expand_int_scc (rtx target, enum rtx_code code, rtx op0, rtx op1, bool *in
   riscv_extend_comparands (code, &op0, &op1);
   op0 = force_reg (word_mode, op0);
 
+  /* For sub-word targets on rv64, do the computation in DImode
+ then extract the lowpart for the final target, marking it
+ as sign extended.  Note that it's also properly zero extended,
+ but it's probably more profitable to expose it as sign extended.  */
+  rtx t;
+  if (TARGET_64BIT && GET_MODE (target) == SImode)
+t = gen_reg_rtx (DImode);
+  else
+t = target;
+
   if (code == EQ || code == NE)
 {
   rtx zie = riscv_zero_if_equal (op0, op1);
-  riscv_emit_binary (code, target, zie, const0_rtx);
+  riscv_emit_binary (code, t, zie, const0_rtx);
 }
   else
-riscv_emit_int_order_test (code, invert_ptr, target, op0, op1);
+riscv_emit_int_order_test (code, invert_ptr, t, op0, op1);
+
+  if (t != target)
+{
+  t = gen_lowpart (SImode, t);
+  SUBREG_PROMOTED_VAR_P (t) = 1;
+  SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+  emit_move_insn (target, t);
+}
 }
 
 /* Like riscv_expand_int_scc, but for floating-point comparisons.  */

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Jason Merrill


On 9/5/24 1:26 PM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 10:54 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

A lambda within a default template argument used in some template-id
may have a smaller template depth than the context of the template-id.
For example, the lambda in v1's default template argument has template
depth 1, and in v2's has template depth 2, but the template-ids v1<0>
and v2<0> which uses these default arguments appear in a depth 3 template
context.  So add_extra_args will ultimately return args with depth 3 --
too many args for the lambda, leading to a bogus substitution.

This patch fixes this by trimming the result of add_extra_args to match
the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH field
is added that tracks the template-ness of a lambda;

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
trim the augmented template arguments to match the template depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
   gcc/cp/pt.cc  | 11 +
   gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30 +++
   2 files changed, 41 insertions(+)
   create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..c49a26b4f5e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
tsubst_flags_t complain, tree in_decl)
 LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args, complain);
 return t;
   }
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then it's probably
from


"probably" seems wrong, given that it wasn't implemented for this case.


I said "probably" because in e.g.

 template
 bool b = true;

 template
 void f() {
   b<0>;
 }

the lambda context has the same depth as the template-id context.  But
as you point out, the issue is ultimately related vs unrelated
parameters rather than depth.




+a context (e.g. default template argument context) which may have
fewer
+levels than the current context it's embedded in.  Adjust the result
of
+add_extra_args accordingly.  */


Hmm, this looks like a situation of not just fewer levels, but potentially
unrelated levels.  "args" here is for f, which shares no template context with
v1.  What happens if your templates have non-type template parameters?


Indeed before add_extra_args 'args' will be unrelated, but after doing
add_extra_args the innermost levels of 'args' will correspond to the
lambda's template context, and so using get_innermost_template_args
ought to get rid of the unrelated arguments, keeping only the ones
relevant to the original lambda context.


Will they?  The original function of add_extra_args was to reintroduce 
outer args that we weren't able to substitute the last time through 
tsubst_lambda_expr.  I expect the innermost levels of 'args' to be the 
same before and after.


Hmm, looking at add_extra_args again, I see that whether the EXTRA_ARGS 
go on the outside or the inside depends on whether they're dependent. 
How does this work other than by accident? >.>


Jason

Re: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Jeff Law





On 9/5/24 6:16 AM, Raphael Zinsly wrote:

On Wed, Sep 4, 2024 at 8:32 PM Jeff Law  wrote:

On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
...

+  bool bit31 = (hival & 0x80000000) != 0;
+  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+  int shiftval = 0;
+
+  /* Adjust the shift into the high half accordingly.  */
+  if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
+   || (trailing_shift < 0 && hival == (loval << trailing_shift)))
+ shiftval = 32 - trailing_shift;
+  else if ((leading_shift < 0 && hival == (loval >> leading_shift))
+ || (leading_shift > 0 && hival == (loval << leading_shift)))

Don't these trigger undefined behavior when trailing_shift or
leading_shift is < 0?  We shouldn't ever generate negative shift counts.


The value of trailing/leading_shift is added to 32, we will never have
negative shift counts.

In the IF you have this conditional:


(trailing_shift < 0 && hival == (loval << trailing_shift))


How could that not be undefined behavior?  You first test that the value 
is less than zero and if it is less than zero you use it as a shift count.


Similarly for:


(leading_shift < 0 && hival == (loval >> leading_shift))


Jeff

[PATCH] c++: vtable referring to "unavailable" virtual fn [PR116606]

2024-09-05 Thread Marek Polacek

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
mark_vtable_entries already has

   /* It's OK for the vtable to refer to deprecated virtual functions.  */
   warning_sentinel w(warn_deprecated_decl);

but that doesn't cover __attribute__((unavailable)).  We can use the
following override to cover both.
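
For readers who have not met this idiom, the general shape of a scoped override can be sketched as below. This is only an illustration of the RAII pattern, not GCC's make_temp_override: the class name scoped_override, the enum diag_state, and the global current_state are all made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the RAII idea: remember a variable's value,
// set it to a temporary value, and restore the saved value when the
// guard object goes out of scope.
template <typename T>
class scoped_override
{
  T &var_;
  T saved_;

public:
  scoped_override (T &var, T tmp) : var_ (var), saved_ (var) { var_ = tmp; }
  ~scoped_override () { var_ = saved_; }

  // Copying a guard would restore the variable twice, so forbid it.
  scoped_override (const scoped_override &) = delete;
  scoped_override &operator= (const scoped_override &) = delete;
};

// Illustrative enum and global; these are not GCC's real declarations.
enum class diag_state { normal, suppress };
diag_state current_state = diag_state::normal;

diag_state state_inside_scope ()
{
  scoped_override<diag_state> guard (current_state, diag_state::suppress);
  return current_state;   // the override is visible here
}
```

Calling state_inside_scope () observes the suppressed state, and current_state is back to its previous value once the call returns, which is the property the patch relies on for the duration of mark_vtable_entries.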

PR c++/116606

gcc/cp/ChangeLog:

* decl2.cc (mark_vtable_entries): Temporarily override deprecated_state
to UNAVAILABLE_DEPRECATED_SUPPRESS.  Remove a warning_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-13.C: New test.
---
 gcc/cp/decl2.cc                                | 3 ++-
 gcc/testsuite/g++.dg/ext/attr-unavailable-13.C | 8 ++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/attr-unavailable-13.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 3c4f34868ee..0279372488c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2180,7 +2180,8 @@ static void
 mark_vtable_entries (tree decl, vec<tree> &consteval_vtables)
 {
   /* It's OK for the vtable to refer to deprecated virtual functions.  */
-  warning_sentinel w(warn_deprecated_decl);
+  auto du = make_temp_override (deprecated_state,
+   UNAVAILABLE_DEPRECATED_SUPPRESS);
 
   bool consteval_seen = false;
 
diff --git a/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C b/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C
new file mode 100644
index 00000000000..9ca40005419
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C
@@ -0,0 +1,8 @@
+// PR c++/116606
+// { dg-do compile }
+
+struct C {
+__attribute__((unavailable)) virtual void f() {}
+};
+
+C c;

base-commit: c880fca6cdb16c5efe3a12ee7ecdb2435f5e7105
-- 
2.46.0

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Patrick Palka

On Thu, 5 Sep 2024, Jason Merrill wrote:

> On 9/5/24 1:26 PM, Patrick Palka wrote:
> > On Thu, 5 Sep 2024, Jason Merrill wrote:
> > 
> > > On 9/5/24 10:54 AM, Patrick Palka wrote:
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
> > > > for trunk/14?
> > > > 
> > > > -- >8 --
> > > > 
> > > > A lambda within a default template argument used in some template-id
> > > > may have a smaller template depth than the context of the template-id.
> > > > For example, the lambda in v1's default template argument has template
> > > > depth 1, and in v2's has template depth 2, but the template-ids v1<0>
> > > > and v2<0> which uses these default arguments appear in a depth 3
> > > > template
> > > > context.  So add_extra_args will ultimately return args with depth 3 --
> > > > too many args for the lambda, leading to a bogus substitution.
> > > > 
> > > > This patch fixes this by trimming the result of add_extra_args to match
> > > > the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH
> > > > field
> > > > is added that tracks the template-ness of a lambda;
> > > > 
> > > > PR c++/116567
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * pt.cc (tsubst_lambda_expr): For a deferred-substitution 
> > > > lambda,
> > > > trim the augmented template arguments to match the template 
> > > > depth
> > > > of the lambda.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp2a/lambda-targ7.C: New test.
> > > > ---
> > > >gcc/cp/pt.cc  | 11 +
> > > >gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30
> > > > +++
> > > >2 files changed, 41 insertions(+)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C
> > > > 
> > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > index 747e627f547..c49a26b4f5e 100644
> > > > --- a/gcc/cp/pt.cc
> > > > +++ b/gcc/cp/pt.cc
> > > > @@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
> > > > tsubst_flags_t complain, tree in_decl)
> > > >  LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args,
> > > > complain);
> > > >  return t;
> > > >}
> > > > +  if (LAMBDA_EXPR_EXTRA_ARGS (t))
> > > > +{
> > > > +  /* If we deferred substitution into this lambda, then it's
> > > > probably
> > > > from
> > > 
> > > "probably" seems wrong, given that it wasn't implemented for this case.
> > 
> > I said "probably" because in e.g.
> > 
> >  template
> >  bool b = true;
> > 
> >  template
> >  void f() {
> >b<0>;
> >  }
> > 
> > the lambda context has the same depth as the template-id context.  But
> > as you point out, the issue is ultimately related vs unrelated
> > parameters rather than depth.
> > 
> > > 
> > > > +a context (e.g. default template argument context) which may 
> > > > have
> > > > fewer
> > > > +levels than the current context it's embedded in.  Adjust the 
> > > > result
> > > > of
> > > > +add_extra_args accordingly.  */
> > > 
> > > Hmm, this looks like a situation of not just fewer levels, but potentially
> > > unrelated levels.  "args" here is for f, which shares no template context
> > > with
> > > v1.  What happens if your templates have non-type template parameters?
> > 
> > Indeed before add_extra_args 'args' will be unrelated, but after doing
> > add_extra_args the innermost levels of 'args' will correspond to the
> > lambda's template context, and so using get_innermost_template_args
> > ought to get rid of the unrelated arguments, keeping only the ones
> > relevant to the original lambda context.
> 
> Will they?  The original function of add_extra_args was to reintroduce outer
> args that we weren't able to substitute the last time through
> tsubst_lambda_expr.  I expect the innermost levels of 'args' to be the same
> before and after.
> 
> Hmm, looking at add_extra_args again, I see that whether the EXTRA_ARGS go on
> the outside or the inside depends on whether they're dependent. How does this
> work other than by accident? >.>

It's kind of a happy accident indeed :P  In the cases this patch is
concerned with, i.e. a template-id using a default argument containing
a lambda, the extra args will always be considered dependent because
during default template argument coercion we substitute an incomplete
set of targs into the default targ (namely the element corresponding to
the default targ is NULL_TREE), which any_dependent_template_arguments_p
considers dependent.  So add_extra_args will reliably put these captured
extra args as the innermost!

> 
> Jason
> 
>

Re: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Raphael Zinsly

On Thu, Sep 5, 2024 at 3:10 PM Jeff Law  wrote:
> On 9/5/24 6:16 AM, Raphael Zinsly wrote:
> > On Wed, Sep 4, 2024 at 8:32 PM Jeff Law  wrote:
> >> On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
> >> ...
> >>> +  bool bit31 = (hival & 0x80000000) != 0;
> >>> +  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
> >>> +  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
> >>> +  int shiftval = 0;
> >>> +
> >>> +  /* Adjust the shift into the high half accordingly.  */
> >>> +  if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
> >>> +   || (trailing_shift < 0 && hival == (loval << trailing_shift)))
> >>> + shiftval = 32 - trailing_shift;
> >>> +  else if ((leading_shift < 0 && hival == (loval >> leading_shift))
> >>> + || (leading_shift > 0 && hival == (loval << leading_shift)))
> >> Don't these trigger undefined behavior when trailing_shift or
> >> leading_shift is < 0?  We shouldn't ever generate negative shift counts.
> >
> > The value of trailing/leading_shift is added to 32, we will never have
> > negative shift counts.
> In the IF you have this conditional:
>
> > (trailing_shift < 0 && hival == (loval << trailing_shift))
>
> How could that not be undefined behavior?  You first test that the value
> is less than zero and if it is less than zero you use it as a shift count.

I'm not using trailing_shift as the shift count, I'm using shiftval:

+ /* Now we want to shift the previously generated constant into the
+high half.  */
+ alt_codes[alt_cost - 2].code = ASHIFT;
+ alt_codes[alt_cost - 2].value = shiftval;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;


-- 
Raphael Moreira Zinsly

Re: [PATCH 3/3] RISC-V: Constant synthesis of inverted halves

2024-09-05 Thread Jeff Law





On 9/5/24 6:18 AM, Raphael Zinsly wrote:

On Wed, Sep 4, 2024 at 8:35 PM Jeff Law  wrote:

On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
...

+unsigned long foo_0x4afe605fb5019fa0(void) { return 0x4afe605fb5019fa0UL; }
+unsigned long foo_0x07a80d21f857f2de(void) { return 0x07a80d21f857f2deUL; }
+unsigned long foo_0x6699f19c99660e63(void) { return 0x6699f19c99660e63UL; }
+unsigned long foo_0x6c80e48a937f1b75(void) { return 0x6c80e48a937f1b75UL; }
+unsigned long foo_0x47d7193eb828e6c1(void) { return 0x47d7193eb828e6c1UL; }
+unsigned long foo_0x7c627816839d87e9(void) { return 0x7c627816839d87e9UL; }
+unsigned long foo_0x3d69e83ec29617c1(void) { return 0x3d69e83ec29617c1UL; }
+unsigned long foo_0x5bee7ee6a4118119(void) { return 0x5bee7ee6a4118119UL; }
+unsigned long foo_0x73fe20828c01df7d(void) { return 0x73fe20828c01df7dUL; }
+unsigned long foo_0x0f1dc294f0e23d6b(void) { return 0x0f1dc294f0e23d6bUL; }

I must be missing something.  All the tests have bit31 on.  But I don't
think this synthesis is valid when bit31 is on and the code seems to
check this.  What am I missing?


The upper half is the one that is shifted so we check for bit31 of the hival:
bool bit31 = (hival & 0x80000000) != 0;
Maybe we should change the name of the variable to bit63.

Ah!  missed that it comes from hival...

But doesn't that highlight the problem?  The lui/addi to construct the 
low bits will sign extend the result out to bit 63 which is why the 
synthesis doesn't work when bit 31 is on.



More concretely for:

unsigned long foo_0x4afe605fb5019fa0(void) { return 0x4afe605fb5019fa0UL; }


The resulting code looks like:


        li      a5,-1258184704
        addi    a5,a5,-96
        xori    a0,a5,-1
        slli    a0,a0,32
        add     a0,a0,a5


Which I think is wrong.

The problem is $a5 is going to have the sign extended low constant after 
the li+addi:


0xffffffffb5019fa0

We invert that resulting in this value for a0:

0x000000004afe605f

We shift that 32 bits with a new value in a0:

0x4afe605f00000000

So we have these values before the last step.

a5:  0xffffffffb5019fa0
a0:  0x4afe605f00000000

That can't be right.  That's going to result in:

0x4afe605eb5019fa0
         ^
         ^-- that should have been 0xf

The sequence will work if bit31 is off.  Bit 63's value doesn't really 
matter.


It may help from a mental model to remember that the two input values to 
that final PLUS must not have any set bits in common.  We're using a 
PLUS because it's more likely to compress vs an IOR.  If we had used the 
more obvious IOR your sequence would have generated:


0xffffffffb5019fa0
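
The arithmetic in this message can be replayed on the host with plain 64-bit integers. The sketch below is only a model of the five-instruction sequence quoted above; the function name synthesize_inverted_halves is invented for the illustration and nothing here is taken from the patch itself.

```cpp
#include <cassert>
#include <cstdint>

// Models the rv64 sequence li/addi, xori, slli, add on 64-bit registers.
// LO32 is the low 32 bits of the wanted constant; the high half is meant
// to be its bitwise inverse.
constexpr uint64_t synthesize_inverted_halves (uint32_t lo32)
{
  // li + addi: a 32-bit value materialised on rv64 is sign extended.
  uint64_t a5 = lo32;
  if (lo32 & 0x80000000u)
    a5 |= 0xffffffff00000000ull;

  uint64_t a0 = ~a5;   // xori a0,a5,-1
  a0 <<= 32;           // slli a0,a0,32
  return a0 + a5;      // add  a0,a0,a5
}

// Bit 31 set: the sign-extension bits of a5 overlap the shifted value, so
// the add borrows from the high half and the result is off by one there.
static_assert (synthesize_inverted_halves (0xb5019fa0u)
	       == 0x4afe605eb5019fa0ull, "bit 31 set gives the wrong high half");

// Bit 31 clear: the two addends share no set bits and the add acts as IOR.
static_assert (synthesize_inverted_halves (0x12345678u)
	       == 0xedcba98712345678ull, "bit 31 clear gives inverted halves");
```

The first check reproduces the 0x4afe605e... high half from the walkthrough; the second shows the same sequence producing properly inverted halves when bit 31 of the low half is clear.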

Jeff

Re: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Jeff Law





On 9/5/24 12:38 PM, Raphael Zinsly wrote:

On Thu, Sep 5, 2024 at 3:10 PM Jeff Law  wrote:

On 9/5/24 6:16 AM, Raphael Zinsly wrote:

On Wed, Sep 4, 2024 at 8:32 PM Jeff Law  wrote:

On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
...

+  bool bit31 = (hival & 0x80000000) != 0;
+  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+  int shiftval = 0;
+
+  /* Adjust the shift into the high half accordingly.  */
+  if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
+   || (trailing_shift < 0 && hival == (loval << trailing_shift)))
+ shiftval = 32 - trailing_shift;
+  else if ((leading_shift < 0 && hival == (loval >> leading_shift))
+ || (leading_shift > 0 && hival == (loval << leading_shift)))

Don't these trigger undefined behavior when trailing_shift or
leading_shift is < 0?  We shouldn't ever generate negative shift counts.


The value of trailing/leading_shift is added to 32, we will never have
negative shift counts.

In the IF you have this conditional:


(trailing_shift < 0 && hival == (loval << trailing_shift))


How could that not be undefined behavior?  You first test that the value
is less than zero and if it is less than zero you use it as a shift count.


I'm not using trailing_shift as the shift count, I'm using shiftval:

+ /* Now we want to shift the previously generated constant into the
+high half.  */
+ alt_codes[alt_cost - 2].code = ASHIFT;
+ alt_codes[alt_cost - 2].value = shiftval;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;

I'm not referring to the generated code.  The compiler itself will 
exhibit undefined behavior due to the negative shift count in that test.



jeff

Re: [to-be-committed][V2][RISC-V] Avoid unnecessary extensions after sCC insns

2024-09-05 Thread Palmer Dabbelt


On Thu, 05 Sep 2024 11:03:18 PDT (-0700), jeffreya...@gmail.com wrote:


So the first patch failed the pre-commit CI; it didn't fail in my
testing because I'm using --with-arch to set a default configuration
that includes things like zicond to ensure that's always tested.  And
the failing test is skipped when zicond is enabled by default.

The failing test is designed to ensure that we don't miss an
if-conversion due to costing issues around the extension that was
typically done in an sCC sequence (which is why it's only run when
zicond is off).

The test failed because we have a little routine that is highly
dependent on the code generated by the sCC expander and will adjust the
costing to account for expansion quirks that usually go away in register
allocation.


That code needs to be enhanced to work after the sCC expansion change.
Essentially it needs to account for the subreg extraction that shows up
in the sequence as well as being a bit looser on mode checking.

I kept the code working for the old sequences -- in theory a user could
conjure up the old sequence so handling them seems useful.

This also drops the testsuite changes.  Palmer's change makes them
unnecessary.


OK, so we'll just go with that one assuming it passes the tests?  I 
don't really care a ton either way, I was mostly just interested in the 
sign extension stuff as we've had so many issues there that I don't know 
how to solve.  So I figured I'd poke around to see if there was anything 
interesting going on, but it was pretty boring.




Waiting on pre-commit CI before taking any further action...

Jeff
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a38cb72f09f..39489c4377e 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4218,11 +4218,29 @@ riscv_noce_conversion_profitable_p (rtx_insn *seq,
  riscv_if_info.original_cost += COSTS_N_INSNS (1);
  riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
}
- last_dest = NULL_RTX;
+
  rtx dest = SET_DEST (x);
- if (COMPARISON_P (src)
+
+ /* Do something similar for the  moves that are likely to
+turn into NOP moves by the time the register allocator is
+done.  These are also side effects of how our sCC expanders
+work.  We'll want to check and update LAST_DEST here too.  */
+ if (last_dest
  && REG_P (dest)
- && GET_MODE (dest) == SImode)
+ && GET_MODE (dest) == SImode
+ && SUBREG_P (src)
+ && SUBREG_PROMOTED_VAR_P (src)
+ && REGNO (SUBREG_REG (src)) == REGNO (last_dest))
+   {
+ riscv_if_info.original_cost += COSTS_N_INSNS (1);
+ riscv_if_info.max_seq_cost += COSTS_N_INSNS (1);
+ if (last_dest)
+   last_dest = dest;
+   }
+ else
+   last_dest = NULL_RTX;
+
+ if (COMPARISON_P (src) && REG_P (dest))
last_dest = dest;
}
   else
@@ -4904,13 +4922,31 @@ riscv_expand_int_scc (rtx target, enum rtx_code code, rtx op0, rtx op1, bool *in
   riscv_extend_comparands (code, &op0, &op1);
   op0 = force_reg (word_mode, op0);

+  /* For sub-word targets on rv64, do the computation in DImode
+ then extract the lowpart for the final target, marking it
+ as sign extended.  Note that it's also properly zero extended,
+ but it's probably more profitable to expose it as sign extended.  */
+  rtx t;
+  if (TARGET_64BIT && GET_MODE (target) == SImode)
+t = gen_reg_rtx (DImode);
+  else
+t = target;
+
   if (code == EQ || code == NE)
 {
   rtx zie = riscv_zero_if_equal (op0, op1);
-  riscv_emit_binary (code, target, zie, const0_rtx);
+  riscv_emit_binary (code, t, zie, const0_rtx);
 }
   else
-riscv_emit_int_order_test (code, invert_ptr, target, op0, op1);
+riscv_emit_int_order_test (code, invert_ptr, t, op0, op1);
+
+  if (t != target)
+{
+  t = gen_lowpart (SImode, t);
+  SUBREG_PROMOTED_VAR_P (t) = 1;
+  SUBREG_PROMOTED_SET (t, SRP_SIGNED);
+  emit_move_insn (target, t);
+}
 }

 /* Like riscv_expand_int_scc, but for floating-point comparisons.  */

[PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-09-05 Thread Palmer Dabbelt

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
---
There's a bunch more discussion in the bug, but it's starting to smell
like this was just a holdover from MIPS (where maybe it also shouldn't
be set).  I haven't tested this, but I figured I'd send the patch to get
a little more visibility.

I guess we should also kick off something like a SPEC run to make sure
there's no regressions?
---
 gcc/config/riscv/riscv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index ead97867eb8..a0ccd1fc762 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -939,8 +939,6 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
 #define TARGET_VECTOR_MISALIGN_SUPPORTED \
riscv_vector_unaligned_access_p
 
-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
-
 /* Control the assembler format that we output.  */
 
 /* Output to assembler file text saying following lines
-- 
2.45.2

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-09-05 Thread Palmer Dabbelt


On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:

We have cheap logical ops, so let's just move this back to the default
to take advantage of the standard branch/op heuristics.

gcc/ChangeLog:

PR target/116615
* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
---
There's a bunch more discussion in the bug, but it's starting to smell
like this was just a holdover from MIPS (where maybe it also shouldn't
be set).  I haven't tested this, but I figured I'd send the patch to get
a little more visibility.

I guess we should also kick off something like a SPEC run to make sure
there's no regressions?


Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where 
short-circuiting the FP comparisons helps on LoongArch.


Not sure if I'm also missing something here, but it kind of feels like 
that should be handled by a more generic optimization decision than just a 
global "should we short circuit logical ops" -- assuming it really is 
the FP comparisons that are causing the cost, as opposed to the actual 
logical ops themselves.


Probably best to actually run the benchmarks, though...
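
For context, the source-level difference the macro is about can be sketched as follows. As far as I understand it, a nonzero LOGICAL_OP_NON_SHORT_CIRCUIT lets GCC's folding rewrite && and || on simple, side-effect-free operands into the non-short-circuit form; the two function names below are made up for the illustration and the actual decision depends on the operands and on branch costs.

```cpp
#include <cassert>

// Short-circuit form: the second comparison is only evaluated when the
// first one is true, which naturally maps to a conditional branch.
constexpr bool in_range_branchy (int x, int lo, int hi)
{
  return x >= lo && x <= hi;
}

// Non-short-circuit form: both comparisons are always evaluated and then
// combined with a bitwise AND, so no branch is needed between them.
constexpr bool in_range_branchless (int x, int lo, int hi)
{
  return (x >= lo) & (x <= hi);
}

// Both operands are free of side effects, so the two forms agree.
static_assert (in_range_branchy (5, 1, 10) == in_range_branchless (5, 1, 10),
	       "forms agree inside the range");
static_assert (in_range_branchy (11, 1, 10) == in_range_branchless (11, 1, 10),
	       "forms agree outside the range");
```

Whether the branchless form is actually cheaper depends on what the comparisons cost, which is the concern raised about the FP case above.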


---
 gcc/config/riscv/riscv.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index ead97867eb8..a0ccd1fc762 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -939,8 +939,6 @@ extern enum riscv_cc get_riscv_cc (const rtx use);
 #define TARGET_VECTOR_MISALIGN_SUPPORTED \
riscv_vector_unaligned_access_p

-#define LOGICAL_OP_NON_SHORT_CIRCUIT 0
-
 /* Control the assembler format that we output.  */

 /* Output to assembler file text saying following lines

Re: [PATCH 2/3] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Raphael Zinsly

On Thu, Sep 5, 2024 at 3:41 PM Jeff Law  wrote:
>
>
>
> On 9/5/24 12:38 PM, Raphael Zinsly wrote:
> > On Thu, Sep 5, 2024 at 3:10 PM Jeff Law  wrote:
> >> On 9/5/24 6:16 AM, Raphael Zinsly wrote:
> >>> On Wed, Sep 4, 2024 at 8:32 PM Jeff Law  wrote:
> >>>> On 9/2/24 2:01 PM, Raphael Moreira Zinsly wrote:
> >>>> ...
> >>>>> +  bool bit31 = (hival & 0x80000000) != 0;
> >>>>> +  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
> >>>>> +  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
> >>>>> +  int shiftval = 0;
> >>>>> +
> >>>>> +  /* Adjust the shift into the high half accordingly.  */
> >>>>> +  if ((trailing_shift > 0 && hival == (loval >> trailing_shift))
> >>>>> +   || (trailing_shift < 0 && hival == (loval << trailing_shift)))
> >>>>> + shiftval = 32 - trailing_shift;
> >>>>> +  else if ((leading_shift < 0 && hival == (loval >> leading_shift))
> >>>>> + || (leading_shift > 0 && hival == (loval << leading_shift)))
> >>>> Don't these trigger undefined behavior when trailing_shift or
> >>>> leading_shift is < 0?  We shouldn't ever generate negative shift counts.
> >>>
> >>> The value of trailing/leading_shift is added to 32, we will never have
> >>> negative shift counts.
> >> In the IF you have this conditional:
> >>
> >>> (trailing_shift < 0 && hival == (loval << trailing_shift))
> >>
> >> How could that not be undefined behavior?  You first test that the value
> >> is less than zero and if it is less than zero you use it as a shift count.
> >
> > I'm not using trailing_shift as the shift count, I'm using shiftval:
> >
> > + /* Now we want to shift the previously generated constant into the
> > +high half.  */
> > + alt_codes[alt_cost - 2].code = ASHIFT;
> > + alt_codes[alt_cost - 2].value = shiftval;
> > + alt_codes[alt_cost - 2].use_uw = false;
> > + alt_codes[alt_cost - 2].save_temporary = false;
> I'm not referring to the generated code.  The compiler itself will
> exhibit undefined behavior due to the negative shift count in that test.
>

Oh sorry. I get it now, the issue is with (loval << trailing_shift).
I was trying to cover all possibilities, but if the trailing_shift is
negative, the leading_shift should be positive and vice versa, so we
could keep only the positive tests.
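
One way to make such a check safe on the host, sketched here under assumptions of my own, is to never hand a negative count to << or >>. The helper name matches_shifted, the use of 32-bit halves, and the sign convention are all invented for the illustration; this is not necessarily what v2 will do.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when HI equals LO moved by SHIFT bit positions, where a
// positive SHIFT means a right shift and a negative SHIFT a left shift.
// The count passed to << or >> is always within [0, 31], so the test
// itself never invokes undefined behavior.
constexpr bool matches_shifted (uint32_t lo, uint32_t hi, int shift)
{
  if (shift <= -32 || shift >= 32)
    return false;
  if (shift >= 0)
    return hi == (lo >> shift);
  return hi == (lo << -shift);
}

static_assert (matches_shifted (0xf0u, 0x0fu, 4), "right shift by 4");
static_assert (matches_shifted (0x0fu, 0xf0u, -4), "left shift by 4");
static_assert (!matches_shifted (0x0fu, 0xf0u, 4), "wrong direction");
```

Negating the count before shifting keeps both directions available; dropping the negative cases entirely, as suggested above, is the simpler alternative when they are redundant.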

I'll prepare a v2.


Thanks,
--
Raphael Moreira Zinsly

Re: [PATCH] c++: template depth of lambda in default targ [PR116567]

2024-09-05 Thread Jason Merrill


On 9/5/24 2:28 PM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 1:26 PM, Patrick Palka wrote:

On Thu, 5 Sep 2024, Jason Merrill wrote:


On 9/5/24 10:54 AM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK
for trunk/14?

-- >8 --

A lambda within a default template argument used in some template-id
may have a smaller template depth than the context of the template-id.
For example, the lambda in v1's default template argument has template
depth 1, and in v2's has template depth 2, but the template-ids v1<0>
and v2<0> which uses these default arguments appear in a depth 3
template
context.  So add_extra_args will ultimately return args with depth 3 --
too many args for the lambda, leading to a bogus substitution.

This patch fixes this by trimming the result of add_extra_args to match
the template depth of the lambda.  A new LAMBDA_EXPR_TEMPLATE_DEPTH
field
is added that tracks the template-ness of a lambda;

PR c++/116567

gcc/cp/ChangeLog:

* pt.cc (tsubst_lambda_expr): For a deferred-substitution lambda,
trim the augmented template arguments to match the template depth
of the lambda.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ7.C: New test.
---
gcc/cp/pt.cc  | 11 +
gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C | 30
+++
2 files changed, 41 insertions(+)
create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ7.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 747e627f547..c49a26b4f5e 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19699,6 +19699,17 @@ tsubst_lambda_expr (tree t, tree args,
tsubst_flags_t complain, tree in_decl)
  LAMBDA_EXPR_EXTRA_ARGS (t) = build_extra_args (t, args,
complain);
  return t;
}
+  if (LAMBDA_EXPR_EXTRA_ARGS (t))
+{
+  /* If we deferred substitution into this lambda, then it's
probably
from


"probably" seems wrong, given that it wasn't implemented for this case.


I said "probably" because in e.g.

  template
  bool b = true;

  template
  void f() {
b<0>;
  }

the lambda context has the same depth as the template-id context.  But
as you point out, the issue is ultimately related vs unrelated
parameters rather than depth.




+a context (e.g. default template argument context) which may have
fewer
+levels than the current context it's embedded in.  Adjust the result
of
+add_extra_args accordingly.  */


Hmm, this looks like a situation of not just fewer levels, but potentially
unrelated levels.  "args" here is for f, which shares no template context
with
v1.  What happens if your templates have non-type template parameters?


Indeed before add_extra_args 'args' will be unrelated, but after doing
add_extra_args the innermost levels of 'args' will correspond to the
lambda's template context, and so using get_innermost_template_args
ought to get rid of the unrelated arguments, keeping only the ones
relevant to the original lambda context.


Will they?  The original function of add_extra_args was to reintroduce outer
args that we weren't able to substitute the last time through
tsubst_lambda_expr.  I expect the innermost levels of 'args' to be the same
before and after.

Hmm, looking at add_extra_args again, I see that whether the EXTRA_ARGS go on
the outside or the inside depends on whether they're dependent. How does this
work other than by accident? >.>


It's kind of a happy accident indeed :P  In the cases this patch is
concerned with, i.e. a template-id using a default argument containing
a lambda, the extra args will always be considered dependent because
during default template argument coercion we substitute an incomplete
set of targs into the default targ (namely the element corresponding to
the default targ is NULL_TREE), which any_dependent_template_arguments_p
considers dependent.  So add_extra_args will reliably put these captured
extra args as the innermost!


I see that the testcases you enabled this code to handle in
r12--g2c699fd29829cd were also about lambda/requires in default 
template arguments.  Can we detect this case some other way than 
uses_template_parms?


Can we do the pruning added by this patch in add_extra_args instead of 
its caller?


Incidentally, why is having extra outer levels causing trouble?  I 
thought we were generally able to safely ignore them.


Jason

Re: [PATCH] c++: vtable referring to "unavailable" virtual fn [PR116606]

2024-09-05 Thread Jason Merrill


On 9/5/24 2:28 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?


OK.


-- >8 --
mark_vtable_entries already has

/* It's OK for the vtable to refer to deprecated virtual functions.  */
warning_sentinel w(warn_deprecated_decl);

but that doesn't cover __attribute__((unavailable)).  We can use the
following override to cover both.

PR c++/116606

gcc/cp/ChangeLog:

* decl2.cc (mark_vtable_entries): Temporarily override deprecated_state 
to
UNAVAILABLE_DEPRECATED_SUPPRESS.  Remove a warning_sentinel.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-unavailable-13.C: New test.
---
  gcc/cp/decl2.cc| 3 ++-
  gcc/testsuite/g++.dg/ext/attr-unavailable-13.C | 8 
  2 files changed, 10 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/ext/attr-unavailable-13.C

diff --git a/gcc/cp/decl2.cc b/gcc/cp/decl2.cc
index 3c4f34868ee..0279372488c 100644
--- a/gcc/cp/decl2.cc
+++ b/gcc/cp/decl2.cc
@@ -2180,7 +2180,8 @@ static void
  mark_vtable_entries (tree decl, vec<tree> &consteval_vtables)
  {
/* It's OK for the vtable to refer to deprecated virtual functions.  */
-  warning_sentinel w(warn_deprecated_decl);
+  auto du = make_temp_override (deprecated_state,
+   UNAVAILABLE_DEPRECATED_SUPPRESS);
  
bool consteval_seen = false;
  
diff --git a/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C b/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C

new file mode 100644
index 00000000000..9ca40005419
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/attr-unavailable-13.C
@@ -0,0 +1,8 @@
+// PR c++/116606
+// { dg-do compile }
+
+struct C {
+__attribute__((unavailable)) virtual void f() {}
+};
+
+C c;

base-commit: c880fca6cdb16c5efe3a12ee7ecdb2435f5e7105
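
The temporary override used in the patch above follows the usual RAII pattern: install a value for the duration of a scope and restore the previous one on exit.  A minimal sketch of that pattern follows.  The names scoped_override, deprecation_mode and current_mode are hypothetical and chosen for illustration; this is not GCC's make_temp_override implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for a global diagnostic state.
enum class deprecation_mode
{
  normal,
  suppress_deprecated,
  suppress_unavailable_and_deprecated
};

deprecation_mode current_mode = deprecation_mode::normal;

// RAII guard: install a new value on construction and restore the
// saved one on destruction, however the scope is left.
template <typename T>
class scoped_override
{
public:
  scoped_override (T &target, T value)
    : target_ (target), saved_ (target)
  {
    target_ = value;
  }
  ~scoped_override () { target_ = saved_; }

  scoped_override (const scoped_override &) = delete;
  scoped_override &operator= (const scoped_override &) = delete;

private:
  T &target_;
  T saved_;
};

// Returns the mode observed while the override is active.  The return
// value is copied before the guard's destructor runs.
deprecation_mode mode_while_marking ()
{
  scoped_override<deprecation_mode> guard
    (current_mode, deprecation_mode::suppress_unavailable_and_deprecated);
  return current_mode;
}

// Small usage example.
void check_override_examples ()
{
  assert (current_mode == deprecation_mode::normal);
  assert (mode_while_marking ()
	  == deprecation_mode::suppress_unavailable_and_deprecated);
  assert (current_mode == deprecation_mode::normal);
}
```

A single state that covers both the deprecated and the unavailable diagnostics is what lets one override replace the narrower warning_sentinel.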

Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-09-05 Thread Xi Ruoyao

On Thu, 2024-09-05 at 11:59 -0700, Palmer Dabbelt wrote:
> On Thu, 05 Sep 2024 11:52:57 PDT (-0700), Palmer Dabbelt wrote:
> > We have cheap logical ops, so let's just move this back to the default
> > to take advantage of the standard branch/op heuristics.
> > 
> > gcc/ChangeLog:
> > 
> > PR target/116615
> > * config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> > ---
> > There's a bunch more discussion in the bug, but it's starting to smell
> > like this was just a holdover from MIPS (where maybe it also shouldn't
> > be set).  I haven't tested this, but I figured I'd send the patch to get
> > a little more visibility.
> > 
> > I guess we should also kick off something like a SPEC run to make sure
> > there's no regressions?
> 
> Sorry I missed it in the bug, but Ruoyao points to dddafe94823 
> ("LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUIT") where 
> short-circuiting the FP comparisons helps on LoongArch.
> 
> Not sure if I'm also missing something here, but it kind of feels like
> that should be handled by a more generic optimization decision than just 
> globally "should we short circuit logical ops" -- assuming it really is 
> the FP comparisons that are causing the cost, as opposed to the actual
> logical ops themselves.

IIUC there are some contributing factors here:

1. On LoongArch FP comparison is slow (costing 5 cycles).
2. On LoongArch the FP comparison result is stored into FCC registers,
and to do logical operations on two comparison results they need to be
moved into GPR first.  The move costs one or two cycles (depending on
the uarch).

and maybe

3. The FP comparison results in the SPEC tests are somewhat predictable.
IIRC when I tested dddafe94823 I made a test program where the FP
comparison results are "randomized" (so the branch predictor is
defeated), then the branch-less code generated with -Ofast --param
logical-op-non-short-circuit=1 was actually faster than the code
generated with -Ofast --param logical-op-non-short-circuit=0.

AFAIK 2 isn't an issue for RISC-V (where FP comparison result is just in
GPR) but 1 and 3 may still need to be considered.
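
For readers following along, the two shapes being compared can be sketched at the source level as below.  This is only an illustration with made-up function names; the compiler is free to emit either form for either function, and which one it picks is what the macro and the --param influence.

```cpp
#include <cassert>

// Short-circuit form: the second comparison is evaluated only when
// the first is true, which typically means a conditional branch.
bool both_less_branchy (double a, double b, double c, double d)
{
  return a < b && c < d;
}

// Non-short-circuit form: both comparisons are always evaluated and
// their boolean results are combined with a bitwise AND.
bool both_less_branchless (double a, double b, double c, double d)
{
  return (a < b) & (c < d);
}

// Small usage example: the two forms agree on every input, since
// neither comparison has side effects.
void check_logical_examples ()
{
  const double v[] = { -1.0, 0.0, 2.5 };
  for (double a : v)
    for (double b : v)
      for (double c : v)
	for (double d : v)
	  assert (both_less_branchy (a, b, c, d)
		  == both_less_branchless (a, b, c, d));
}
```

The branchless form pays for both comparisons (plus any move into a GPR) every time, while the branchy form pays for a possible misprediction, which is why factors 1 to 3 pull in different directions.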

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University

Re: [to-be-committed][V2][RISC-V] Avoid unnecessary extensions after sCC insns

2024-09-05 Thread Jeff Law





On 9/5/24 12:46 PM, Palmer Dabbelt wrote:

On Thu, 05 Sep 2024 11:03:18 PDT (-0700), jeffreya...@gmail.com wrote:


So the first patch failed the pre-commit CI; it didn't fail in my
testing because I'm using --with-arch to set a default configuration
that includes things like zicond to ensure that's always tested.  And
the failing test is skipped when zicond is enabled by default.

The failing test is designed to ensure that we don't miss an
if-conversion due to costing issues around the extension that was
typically done in an sCC sequence (which is why it's only run when
zicond is off).

The test failed because we have a little routine that is highly
dependent on the code generated by the sCC expander and will adjust the
costing to account for expansion quirks that usually go away in register
allocation.


That code needs to be enhanced to work after the sCC expansion change.
Essentially it needs to account for the subreg extraction that shows up
in the sequence as well as being a bit looser on mode checking.

I kept the code working for the old sequences -- in theory a user could
conjure up the old sequence so handling them seems useful.

This also drops the testsuite changes.  Palmer's change makes them
unnecessary.


OK, so we'll just go with that one assuming it passes the tests?

That's the plan.  I pushed your change last night, so I just need a 
clean run on my change now (fingers crossed).


I don't really care a ton either way, I was mostly just interested in the 
sign extension stuff as we've had so many issues there that I don't know 
how to solve.  So I figured I'd poke around to see if there was anything 
interesting going on, but it was pretty boring.

There's still "stuff" in this space, but it's of less and less of a concern.

Extensions are typically less than 1% of our dynamic instruction stream 
for specint these days.   The worst cases are 502.gcc where extensions 
vary from 1% - 1.3% of the dynamic stream and 557.xz where they range 
from 1.2% - 1.4% of the dynamic instruction stream.


If it weren't for the measurable real performance regression we saw 
internally on x264 I wouldn't have been looking in this space at all. 
Finding the nugget for sCC expansion was just a bit of frosting from 
that effort.


As far as "stuff" goes.  There's probably on the order of 2b unnecessary 
extensions in 541.leela.  I haven't chased that down yet -- it 
represents a tiny fraction of the dynamic count.  Whatever it is, it was 
caught by the REP_MODE_EXTENDED bits from VRULL and isn't by any of the 
other mechanisms we have in place right now.



Jeff

[PATCH] Fortran: fix ICE in gfc_create_module_variable [PR100273]

2024-09-05 Thread Harald Anlauf

Dear all,

the attached simple patch fixes a corner case related to pr84868,
which was tracked separately.  While Paul's patch for pr84868 added
the framework for treating len_trim in the specification part of
a character function, it missed the possibility that that function
need not appear at the top level of a module, but could be a contained
function.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald

From 1f462b5072a5e82c35921f7e3bdf3959c4a49dc9 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 5 Sep 2024 21:30:25 +0200
Subject: [PATCH] Fortran: fix ICE in gfc_create_module_variable [PR100273]

gcc/fortran/ChangeLog:

	PR fortran/100273
	* trans-decl.cc (gfc_create_module_variable): Handle module
	variable also when it is needed for the result specification
	of a contained function.

gcc/testsuite/ChangeLog:

	PR fortran/100273
	* gfortran.dg/pr100273.f90: New test.
---
 gcc/fortran/trans-decl.cc  |  3 ++-
 gcc/testsuite/gfortran.dg/pr100273.f90 | 26 ++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr100273.f90

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 6692ac7ef4c..ee41d66e6d2 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -5540,7 +5540,8 @@ gfc_create_module_variable (gfc_symbol * sym)
   /* Create the variable.  */
   pushdecl (decl);
   gcc_assert (sym->ns->proc_name->attr.flavor == FL_MODULE
-	  || (sym->ns->parent->proc_name->attr.flavor == FL_MODULE
+	  || ((sym->ns->parent->proc_name->attr.flavor == FL_MODULE
+		   || sym->ns->parent->proc_name->attr.flavor == FL_PROCEDURE)
 		  && sym->fn_result_spec));
   DECL_CONTEXT (decl) = sym->ns->proc_name->backend_decl;
   rest_of_decl_compilation (decl, 1, 0);
diff --git a/gcc/testsuite/gfortran.dg/pr100273.f90 b/gcc/testsuite/gfortran.dg/pr100273.f90
new file mode 100644
index 00000000000..f71947ad802
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr100273.f90
@@ -0,0 +1,26 @@
+! { dg-do compile }
+! PR fortran/100273 - ICE in gfc_create_module_variable
+!
+! Contributed by G.Steinmetz
+
+module m
+  implicit none
+contains
+  character(4) function g(k)
+integer :: k
+g = f(k)
+  contains
+function f(n)
+  character(3), parameter :: a(2) = ['1  ', '123']
+  integer :: n
+  character(len_trim(a(n))) :: f
+  f = 'abc'
+end
+  end
+end
+program p
+  use m
+  implicit none
+  print *, '>>' // g(1) // '<<'
+  print *, '>>' // g(2) // '<<'
+end
--
2.35.3

Re: [PATCH] Fortran: fix ICE in gfc_create_module_variable [PR100273]

2024-09-05 Thread Jerry D


On 9/5/24 12:42 PM, Harald Anlauf wrote:

Dear all,

the attached simple patch fixes a corner case related to pr84868,
which was tracked separately.  While Paul's patch for pr84868 added
the framework for treating len_trim in the specification part of
a character function, it missed the possibility that that function
need not appear at the top level of a module, but could be a contained
function.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

Thanks,
Harald



OK for mainline. Thanks.

Jerry

[PATCH 1/2 v2] RISC-V: Additional large constant synthesis improvements

2024-09-05 Thread Raphael Moreira Zinsly

Changes since v1:
- Fix bit31.
- Remove negative shift checks.
- Fix synthesis-7.c expected output.

-- >8 --

Improve handling of large constants in riscv_build_integer, generate
better code for constants where the high half can be constructed
by shifting/shiftNadding the low half or if the halves differ by less
than 2k.
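
As a rough illustration of the shiftNadd case only, the arithmetic can be sketched in standalone C++ as follows.  The names shnadd_amount and rebuild_with_shnadd are hypothetical, and the sketch models the values, not the instruction sequence the patch emits:

```cpp
#include <cassert>
#include <cstdint>

// Return N in {1, 2, 3} if the high half equals (lo << N) + lo, i.e.
// lo * 3, lo * 5 or lo * 9, which is what sh1add/sh2add/sh3add of a
// value with itself compute.  Return 0 if there is no such N.
int shnadd_amount (std::uint64_t value)
{
  std::uint64_t loval = value & 0xffffffffu;
  std::uint64_t hival = value >> 32;

  // Mirror the patch's !bit31 condition and skip the trivial zero case.
  if (loval == 0 || (loval & 0x80000000u) != 0)
    return 0;

  for (int n = 1; n <= 3; ++n)
    if (hival == (loval << n) + loval)
      return n;
  return 0;
}

// Rebuild the full constant from the low half and the shNadd amount.
std::uint64_t rebuild_with_shnadd (std::uint64_t loval, int n)
{
  std::uint64_t hival = (loval << n) + loval;
  return (hival << 32) | loval;
}

// Small usage example.
void check_shnadd_examples ()
{
  assert (shnadd_amount (0x000369CF00012345ULL) == 1); // hi = lo * 3
  assert (shnadd_amount (0x0000003F00000007ULL) == 3); // hi = lo * 9
  assert (shnadd_amount (0x0000000200000001ULL) == 0); // no match
  assert (rebuild_with_shnadd (0x12345, 1) == 0x000369CF00012345ULL);
}
```

In each matching case only the low half has to be materialized from scratch; the high half is derived from it.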

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_build_integer): Detect new case
of constants that can be improved.
(riscv_move_integer): Add synthesis for concatenating constants
without Zbkb.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/synthesis-7.c: Adjust expected output.
* gcc.target/riscv/synthesis-12.c: New test.
* gcc.target/riscv/synthesis-13.c: New test.
* gcc.target/riscv/synthesis-14.c: New test.
---
 gcc/config/riscv/riscv.cc | 138 +-
 gcc/testsuite/gcc.target/riscv/synthesis-12.c |  26 
 gcc/testsuite/gcc.target/riscv/synthesis-13.c |  26 
 gcc/testsuite/gcc.target/riscv/synthesis-14.c |  28 
 gcc/testsuite/gcc.target/riscv/synthesis-7.c  |   2 +-
 5 files changed, 213 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-13.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/synthesis-14.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index a38cb72f09f..df8a5a1c1e2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1231,6 +1231,122 @@ riscv_build_integer (struct riscv_integer_op *codes, 
HOST_WIDE_INT value,
}
 
 }
+  else if (cost > 4 && TARGET_64BIT && can_create_pseudo_p ()
+  && allow_new_pseudos)
+{
+  struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
+  int alt_cost;
+
+  unsigned HOST_WIDE_INT loval = value & 0xffffffff;
+  unsigned HOST_WIDE_INT hival = (value & ~loval) >> 32;
+  bool bit31 = (loval & 0x80000000) != 0;
+  int trailing_shift = ctz_hwi (loval) - ctz_hwi (hival);
+  int leading_shift = clz_hwi (loval) - clz_hwi (hival);
+  int shiftval = 0;
+
+  /* Adjust the shift into the high half accordingly.  */
+  if ((trailing_shift > 0 && hival == (loval >> trailing_shift)))
+   shiftval = 32 - trailing_shift;
+  else if ((leading_shift > 0 && hival == (loval << leading_shift)))
+   shiftval = 32 + leading_shift;
+
+  if (shiftval && !bit31)
+   alt_cost = 2 + riscv_build_integer_1 (alt_codes, sext_hwi (loval, 32),
+ mode);
+
+  /* For constants where the upper half is a shift of the lower half we
+can do a shift followed by an or.  */
+  if (shiftval && alt_cost < cost && !bit31)
+   {
+ /* We need to save the first constant we build.  */
+ alt_codes[alt_cost - 3].save_temporary = true;
+
+ /* Now we want to shift the previously generated constant into the
+high half.  */
+ alt_codes[alt_cost - 2].code = ASHIFT;
+ alt_codes[alt_cost - 2].value = shiftval;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;
+
+ /* And the final step, IOR the two halves together.  Since this uses
+the saved temporary, use CONCAT similar to what we do for Zbkb.  */
+ alt_codes[alt_cost - 1].code = CONCAT;
+ alt_codes[alt_cost - 1].value = 0;
+ alt_codes[alt_cost - 1].use_uw = false;
+ alt_codes[alt_cost - 1].save_temporary = false;
+
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+
+  if (cost > 4 && !bit31 && TARGET_ZBA)
+   {
+ int value = 0;
+
+ /* Check for a shNadd.  */
+ if (hival == loval * 3)
+   value = 3;
+ else if (hival == loval * 5)
+   value = 5;
+ else if (hival == loval * 9)
+   value = 9;
+
+ if (value)
+   alt_cost = 2 + riscv_build_integer_1 (alt_codes,
+ sext_hwi (loval, 32), mode);
+
+ /* For constants where the upper half is a shNadd of the lower half
+we can do a similar transformation.  */
+ if (value && alt_cost < cost)
+   {
+ alt_codes[alt_cost - 3].save_temporary = true;
+ alt_codes[alt_cost - 2].code = FMA;
+ alt_codes[alt_cost - 2].value = value;
+ alt_codes[alt_cost - 2].use_uw = false;
+ alt_codes[alt_cost - 2].save_temporary = false;
+ alt_codes[alt_cost - 1].code = CONCAT;
+ alt_codes[alt_cost - 1].value = 0;
+ alt_codes[alt_cost - 1].use_uw = false;
+ alt_codes[alt_cost - 1].save_temporary = false;
+
+ memcpy (codes, alt_codes, sizeof (alt_codes));
+ cost = alt_cost;
+   }
+   }
+
+  if (cost > 4 &
