Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 8:26 PM Jonathan Wakely  wrote:
>
> On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> >
> > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > >
> > > Instead of looping over every byte of the tail, unroll loop manually
> > > using switch statement, then compilers (at least GCC and Clang) will
> > > generate a jump table [1], which is faster on a microbenchmark [2].
> > >
> > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > >
> > > libstdc++-v3/ChangeLog:
> > >
> > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > >   loop using switch statement.
> > >
> > > Signed-off-by: Dmitry Ilvokhin 
> > > ---
> > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > index 3665375096a..294a7323dd0 100644
> > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > @@ -50,10 +50,29 @@ namespace
> > >load_bytes(const char* p, int n)
> > >{
> > >  std::size_t result = 0;
> > > ---n;
> > > -do
> > > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > -while (--n >= 0);
> >
> > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > only hash the first 8 bytes.
>
> Ah, but it's only ever called with load_bytes(end, len & 0x7)

The compiler should do such transforms - you probably want to tell
it that n < 8 though, it likely doesn't (always) know.

>
>
> >
> > > +switch(n & 7)
> > > +  {
> > > +  case 7:
> > > +   result |= std::size_t(p[6]) << 48;
> > > +   [[gnu::fallthrough]];
> > > +  case 6:
> > > +   result |= std::size_t(p[5]) << 40;
> > > +   [[gnu::fallthrough]];
> > > +  case 5:
> > > +   result |= std::size_t(p[4]) << 32;
> > > +   [[gnu::fallthrough]];
> > > +  case 4:
> > > +   result |= std::size_t(p[3]) << 24;
> > > +   [[gnu::fallthrough]];
> > > +  case 3:
> > > +   result |= std::size_t(p[2]) << 16;
> > > +   [[gnu::fallthrough]];
> > > +  case 2:
> > > +   result |= std::size_t(p[1]) << 8;
> > > +   [[gnu::fallthrough]];
> > > +  case 1:
> > > +   result |= std::size_t(p[0]);
> > > +  };
> > >  return result;
> > >}
> > >
> > > --
> > > 2.43.5
> > >
>
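
To make the two suggestions in this thread concrete, here is a minimal sketch, not the committed libstdc++ code. The names load_bytes_loop, load_bytes_hinted and load_bytes_switch are made up for illustration. It assumes a GCC- or Clang-compatible compiler for __builtin_unreachable, uses std::uint64_t rather than std::size_t so that the shift by 48 is valid everywhere, and goes through unsigned char so that bytes >= 0x80 are not sign-extended where plain char is signed.

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below
#include <cstdint>

// Loop form, following the pre-patch code: folds p[n-1] .. p[0] into an
// integer with p[0] in the low byte.  Requires n >= 1.
std::uint64_t load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Same loop, plus a hint that n is in [1, 7] so the optimizer is allowed to
// unroll it fully.  Whether it actually does is up to the compiler.
std::uint64_t load_bytes_hinted(const char* p, int n)
{
  if (n < 1 || n > 7)
    __builtin_unreachable();  // assumption: the caller guarantees the range
  return load_bytes_loop(p, n);
}

// Manually unrolled form, in the spirit of the proposed patch.
std::uint64_t load_bytes_switch(const char* p, int n)
{
  // Zero-extend each byte before shifting it into place.
  auto byte = [p](int i) {
    return static_cast<std::uint64_t>(static_cast<unsigned char>(p[i]));
  };
  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= byte(6) << 48; [[fallthrough]];
    case 6: result |= byte(5) << 40; [[fallthrough]];
    case 5: result |= byte(4) << 32; [[fallthrough]];
    case 4: result |= byte(3) << 24; [[fallthrough]];
    case 3: result |= byte(2) << 16; [[fallthrough]];
    case 2: result |= byte(1) << 8;  [[fallthrough]];
    case 1: result |= byte(0);
    }
  return result;
}
```

For 1 <= n <= 7 all three functions return the same value; for example load_bytes_loop("\x01\x02\x03", 3) is 0x030201. Only the switch form is defined for n == 0, where it returns 0.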


Re: [patch,testsuite] Fix gcc.c-torture/execute/ieee/pr108540-1.c

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 1:30 PM Georg-Johann Lay  wrote:
>
> gcc.c-torture/execute/ieee/pr108540-1.c obviously requires that double
> is a 64-bit type, hence add pr108540-1.x as an according filter.
>
> Ok for trunk?
>
> And is there a reason for why we are still putting test cases in
> these old parts of the testsuite that don't support dg-magic-comments
> like
>
> /* { dg-require-effective-target double64 } */
>
> ?

No, it's better to move the test - OK with that change.

Thanks,
Richard.

> Johann
>
> --
>
> testsuite - Fix gcc.c-torture/execute/ieee/pr108540-1.c
>
>   PR testsuite/108540
> gcc/testsuite/
> * gcc.c-torture/execute/ieee/pr108540-1.c: Un-preprocess
> __SIZE_TYPE__ and __INT64_TYPE__.
> * gcc.c-torture/execute/ieee/pr108540-1.x: New file, requires 
> double64.


Re: [PATCH 1/3] cfgexpand: Expand comment on when non-var clobbers can show up

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 6:09 PM Andrew Pinski  wrote:
>
> The comment here is not wrong, just it would be better if mentioning
> the C++ front-end instead of just the nested function lowering.

OK

> gcc/ChangeLog:
>
> * cfgexpand.cc (add_scope_conflicts_1): Expand comment
> on when non-var clobbers show up.
>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/cfgexpand.cc | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index f32cf1b20c9..6c1096363af 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -639,8 +639,9 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool 
> for_conflict)
> {
>   tree lhs = gimple_assign_lhs (stmt);
>   unsigned *v;
> - /* Nested function lowering might introduce LHSs
> -that are COMPONENT_REFs.  */
> + /* Handle only plain var clobbers.
> +Nested functions lowering and C++ front-end inserts clobbers
> +which are not just plain variables.  */
>   if (!VAR_P (lhs))
> continue;
>   if (DECL_RTL_IF_SET (lhs) == pc_rtx
> --
> 2.34.1
>


Re: [PATCH] testsuite: Make check-function-bodies work with LTO

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 3:48 PM Richard Sandiford
 wrote:
>
> This patch tries to make check-function-bodies automatically
> choose between reading the regular assembly file and reading the
> LTO assembly file.  There should only ever be one right answer,
> since check-function-bodies doesn't make sense on slim LTO output.
>
> Maybe this will turn out to be impossible to get right, but I'd like
> to try at least.
>
> Tested on aarch64-linux-gnu.  OK to install?

OK.

> Richard
>
>
> gcc/testsuite/
> * lib/scanasm.exp (check-function-bodies): Look in ltrans0.ltrans.s
> if the test appears to be using LTO.
> ---
>  gcc/testsuite/lib/scanasm.exp | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/testsuite/lib/scanasm.exp b/gcc/testsuite/lib/scanasm.exp
> index 737eefc655e..26504deb0e6 100644
> --- a/gcc/testsuite/lib/scanasm.exp
> +++ b/gcc/testsuite/lib/scanasm.exp
> @@ -997,16 +997,17 @@ proc check-function-bodies { args } {
> error "too many arguments to check-function-bodies"
>  }
>
> +upvar 2 dg-extra-tool-flags extra_tool_flags
> +set flags $extra_tool_flags
> +
> +global torture_current_flags
> +if { [info exists torture_current_flags] } {
> +   append flags " " $torture_current_flags
> +}
> +
>  if { [llength $args] >= 3 } {
> set required_flags [lindex $args 2]
>
> -   upvar 2 dg-extra-tool-flags extra_tool_flags
> -   set flags $extra_tool_flags
> -
> -   global torture_current_flags
> -   if { [info exists torture_current_flags] } {
> -   append flags " " $torture_current_flags
> -   }
> foreach required_flag $required_flags {
> switch -- $required_flag {
> target -
> @@ -1043,7 +1044,14 @@ proc check-function-bodies { args } {
>
>  global srcdir
>  set input_filename "$srcdir/$filename"
> -set output_filename "[file rootname [file tail $filename]].s"
> +set output_filename "[file rootname [file tail $filename]]"
> +if { [string match "* -flto *" " ${flags} "]
> +&& ![string match "* -fno-use-linker-plugin *" " ${flags} "]
> +&& ![string match "* -ffat-lto-objects *" " ${flags} "] } {
> +   append output_filename ".ltrans0.ltrans.s"
> +} else {
> +   append output_filename ".s"
> +}
>
>  set prefix [lindex $args 0]
>  set prefix_len [string length $prefix]
> --
> 2.25.1
>
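
The file-name selection in the hunk above can be restated outside Tcl. The following is a sketch in C++ under stated assumptions, not part of the patch: has_flag and asm_output_name are hypothetical names, and the whole-word comparison is meant to mirror the `string match "* -flto *" " ${flags} "` idiom for space-separated flags.

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below
#include <sstream>
#include <string>

// True if FLAGS (a whitespace-separated option string) contains FLAG as a
// complete word.  "-flto=auto" does not count as "-flto".
bool has_flag(const std::string& flags, const std::string& flag)
{
  std::istringstream in(flags);
  std::string word;
  while (in >> word)
    if (word == flag)
      return true;
  return false;
}

// Pick the assembly file to scan: the ltrans output when the test looks
// like a slim-LTO, linker-plugin build, the regular .s file otherwise.
std::string asm_output_name(const std::string& root, const std::string& flags)
{
  const bool lto_asm = has_flag(flags, "-flto")
    && !has_flag(flags, "-fno-use-linker-plugin")
    && !has_flag(flags, "-ffat-lto-objects");
  return root + (lto_asm ? ".ltrans0.ltrans.s" : ".s");
}
```

With this sketch, asm_output_name("foo", "-O2 -flto") gives "foo.ltrans0.ltrans.s", while "-O2" alone or "-flto -ffat-lto-objects" give "foo.s". A spelling such as "-flto=auto" also falls back to "foo.s" here, which matches my reading of the glob pattern in the patch but is worth checking against real test flags.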


Re: [patch,testsuite,applied] Fix gcc.dg/signbit-6.c for int != 32-bit targets

2024-10-03 Thread Richard Biener
On Wed, Oct 2, 2024 at 5:01 PM Georg-Johann Lay  wrote:
>
> This test failed on int != 32-bit targets due to
> a[0] = b[0] = INT_MIN instead of using INT32_MIN.

OK.

Richard.

> Johann
>
> --
>
>  testsuite/52641 - Fix gcc.dg/signbit-6.c for int != 32-bit targets.
>
>  PR testsuite/52641
>  gcc/testsuite/
>  * gcc.dg/signbit-6.c (main): Initialize a[0] and b[0]
>  with INT32_MIN (instead of with INT_MIN).
>
> diff --git a/gcc/testsuite/gcc.dg/signbit-6.c
> b/gcc/testsuite/gcc.dg/signbit-6.c
> index da186624cfa..3a522893222 100644
> --- a/gcc/testsuite/gcc.dg/signbit-6.c
> +++ b/gcc/testsuite/gcc.dg/signbit-6.c
> @@ -38,8 +38,10 @@ int main ()
> TYPE a[N];
> TYPE b[N];
>
> -  a[0] = INT_MIN;
> -  b[0] = INT_MIN;
> +  /* This will invoke UB due to -INT32_MIN.  The test is supposed to pass
> + because GCC is supposed to handle this UB case in a predictable
> way.  */
> +  a[0] = INT32_MIN;
> +  b[0] = INT32_MIN;
>
> for (int i = 1; i < N; ++i)
>   {


Re: [PATCH] RISC-V: Define LOGICAL_OP_NON_SHORT_CIRCUIT to 1 [PR116615]

2024-10-03 Thread Richard Biener
On Thu, Oct 3, 2024 at 3:15 AM Andrew Waterman  wrote:
>
> On Wed, Oct 2, 2024 at 4:41 PM Jeff Law  wrote:
> >
> >
> >
> > On 10/2/24 4:39 PM, Andrew Waterman wrote:
> > > On Wed, Oct 2, 2024 at 5:56 AM Jeff Law  wrote:
> > >>
> > >>
> > >>
> > >> On 9/5/24 12:52 PM, Palmer Dabbelt wrote:
> > >>> We have cheap logical ops, so let's just move this back to the default
> > > >>> to take advantage of the standard branch/op heuristics.
> > >>>
> > >>> gcc/ChangeLog:
> > >>>
> > >>>PR target/116615
> > >>>* config/riscv/riscv.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Remove.
> > >> So on the BPI  this is a pretty clear win.  Not surprisingly perlbench
> > >> and gcc are the big winners.  It somewhat surprisingly regresses x264,
> > >> deepsjeng & leela, but the magnitudes are smaller.  The net from a cycle
> > >> perspective is 2.4%.  Every benchmark looks better from a branch count
> > >> perspective.
> > >>
> > >> So in my mind it's just a matter of fixing any testsuite fallout (I
> > >> would expect some) and this is OK.
> > >
> > > Jeff, were you able to measure the change in static code size, too?
> > > These results are very encouraging, but I'd like to make sure we don't
> > > need to retain the current behavior when optimizing for size.
> > Codesize is ever so slightly worse.  As in less than .1%.  Not worth it
> > in my mind to do something different in that range.

It probably helps code-size when not optimizing for size depending on
how you align jumps.

Richard.

> Thanks.  Agreed.
>
> >
> > Jeff
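
For readers who have not met LOGICAL_OP_NON_SHORT_CIRCUIT: roughly, it tells the middle end whether combining cheap, side-effect-free conditions with a logical op is preferable to branching between them. The sketch below only illustrates the two source-level shapes involved; the function names are made up, and which form a given GCC actually emits for either function depends on the target and optimization level.

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below

// Short-circuit form: the second comparison is evaluated only if the first
// one is true, which naively means a conditional branch between the two.
bool in_range_branchy(int x, int lo, int hi)
{
  return x >= lo && x <= hi;
}

// Non-short-circuit form: both comparisons are always evaluated and the
// results are combined with a bitwise AND.  This is only a valid rewrite
// because neither operand has side effects or can trap.
bool in_range_logical(int x, int lo, int hi)
{
  return (x >= lo) & (x <= hi);
}
```

Both functions compute the same result for every input; the trade-off discussed in the thread is fewer branches against slightly more work (and, per the measurements quoted above, marginally larger code).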


Re: [PATCH 3/3] Record template specialization hash

2024-10-03 Thread Jason Merrill

On 10/2/24 7:53 AM, Richard Biener wrote:
> For a specific testcase a lot of compile-time is spent in re-hashing
> hashtable elements upon expansion.  The following records the hash
> in the hash element.  This speeds up compilation by 20%.
>
> There's probably module-related uses that need to be adjusted.
>
> Bootstrap failed (guess I was expecting this), but still I think this
> is a good idea - maybe somebody can pick it up.

Applying the attached, thanks!

> Possibly instead of having a single global hash table having one per ID would be
> better.

That sounds excessive to me.  Is the actual hashtable lookup significant 
in the profile?

> The hashtable also keeps things GC-live ('args' for example).

Those args should also be referenced by TI_ARGS from the respective 
template specialization.

Jason

From 6b5e211b071273174d700f9dea34ff219eb43023 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Thu, 3 Oct 2024 16:29:20 -0400
Subject: [PATCH] c++: record template specialization hash
To: gcc-patches@gcc.gnu.org

A lot of compile time of template-heavy code is spent in re-hashing
hashtable elements upon expansion.  The following records the hash in the
hash element.  This speeds up C++20 compilation of stdc++.h by about 25% for
about a 0.1% increase in memory usage.

With the hash value in the entry, we don't need to pass it separately to the
find functions.

Adding default arguments to the spec and hash fields simplifies spec_entry
initialization and avoids problems from hash starting with an indeterminate
value.

gcc/cp/ChangeLog:

	* cp-tree.h (spec_entry::hash): New member.
	* pt.cc (spec_hasher::hash): Set it and return it.
	(maybe_process_partial_specialization): Clear it when
	changing tmpl/args.
	(lookup_template_class): Likewise, don't pass hash to find.
	(retrieve_specialization): Set it, don't pass hash to find.
	(register_specialization): Don't pass hash to find.
	(reregister_specialization): Likewise.
	(match_mergeable_specialization): Likewise.
	(add_mergeable_specialization): Likewise.

Co-authored-by: Richard Biener 
---
 gcc/cp/cp-tree.h | 11 ---
 gcc/cp/pt.cc | 35 +++
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index c5d02567cb4..dc153a97dc4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5840,9 +5840,14 @@ public:
 /* Entry in the specialization hash table.  */
 struct GTY((for_user)) spec_entry
 {
-  tree tmpl;  /* The general template this is a specialization of.  */
-  tree args;  /* The args for this (maybe-partial) specialization.  */
-  tree spec;  /* The specialization itself.  */
+  /* The general template this is a specialization of.  */
+  tree tmpl;
+  /* The args for this (maybe-partial) specialization.  */
+  tree args;
+  /* The specialization itself.  */
+  tree spec = NULL_TREE;
+  /* The cached result of hash_tmpl_and_args (tmpl, args).  */
+  hashval_t hash = 0;
 };
 
 /* in class.cc */
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 4ceae1d38de..03a1144765b 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -1161,6 +1161,7 @@ maybe_process_partial_specialization (tree type)
 		  elt.tmpl = tmpl;
 		  CLASSTYPE_TI_ARGS (inst)
 		= elt.args = INNERMOST_TEMPLATE_ARGS (elt.args);
+		  elt.hash = 0; /* Recalculate after changing tmpl/args.  */
 
 		  spec_entry **slot
 		= type_specializations->find_slot (&elt, INSERT);
@@ -1282,7 +1283,7 @@ retrieve_specialization (tree tmpl, tree args, hashval_t hash)
   spec_entry elt;
   elt.tmpl = tmpl;
   elt.args = args;
-  elt.spec = NULL_TREE;
+  elt.hash = hash;
 
   spec_hash_table *specializations;
   if (DECL_CLASS_TEMPLATE_P (tmpl))
@@ -1290,9 +1291,7 @@ retrieve_specialization (tree tmpl, tree args, hashval_t hash)
   else
 specializations = decl_specializations;
 
-  if (hash == 0)
-hash = spec_hasher::hash (&elt);
-  if (spec_entry *found = specializations->find_with_hash (&elt, hash))
+  if (spec_entry *found = specializations->find (&elt))
 return found->spec;
 
   return NULL_TREE;
@@ -1551,7 +1550,7 @@ register_specialization (tree spec, tree tmpl, tree args, bool is_friend,
   if (hash == 0)
 hash = spec_hasher::hash (&elt);
 
-  spec_entry **slot = decl_specializations->find_slot_with_hash (&elt, hash, INSERT);
+  spec_entry **slot = decl_specializations->find_slot (&elt, INSERT);
   if (*slot)
 fn = (*slot)->spec;
   else
@@ -1739,7 +1738,9 @@ spec_hasher::hash (tree tmpl, tree args)
 hashval_t
 spec_hasher::hash (spec_entry *e)
 {
-  return spec_hasher::hash (e->tmpl, e->args);
+  if (e->hash == 0)
+e->hash = hash (e->tmpl, e->args);
+  return e->hash;
 }
 
 /* Recursively calculate a hash value for a template argument ARG, for use
@@ -1973,7 +1974,6 @@ reregister_specialization (tree spec, tree tinfo, tree new_spec)
 
   elt.tmpl = most_general_template (TI_TEMPLATE (tinfo));
   elt.args = TI_ARGS (tinfo);
-  elt.spec = NULL_TREE;
 
   entry = decl_specializations->find (&elt);
   if (en
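
(The archived diff is cut off at this point.)

The idea behind the patch, caching the hash inside the table entry and treating 0 as "not computed yet", can be sketched independently of GCC. Everything below is illustrative: entry, compute_hash and entry_hash are hypothetical stand-ins for spec_entry, hash_tmpl_and_args and spec_hasher::hash, and the FNV-1a hash is just a placeholder for the expensive one.

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for spec_entry: a key (tmpl, args), a payload, and
// the cached hash of the key.
struct entry
{
  std::string tmpl;
  std::vector<int> args;
  int spec = 0;            // payload, defaulted as in the patch
  std::uint32_t hash = 0;  // 0 means "not computed yet"
};

static int hash_calls = 0;  // counts how often the expensive hash runs

// Placeholder for the expensive hash: FNV-1a over the key.
std::uint32_t compute_hash(const entry& e)
{
  ++hash_calls;
  std::uint32_t h = 2166136261u;
  for (unsigned char c : e.tmpl) { h ^= c; h *= 16777619u; }
  for (int a : e.args) { h ^= static_cast<std::uint32_t>(a); h *= 16777619u; }
  // Assumption of this sketch, not of the patch: never return the sentinel,
  // otherwise such an entry would be re-hashed on every call.
  return h == 0 ? 1 : h;
}

// Return the cached hash, computing it on first use.
std::uint32_t entry_hash(entry& e)
{
  if (e.hash == 0)
    e.hash = compute_hash(e);
  return e.hash;
}
```

Calling entry_hash twice on the same entry runs compute_hash once. Code that changes tmpl or args afterwards has to reset hash to 0, which is what the maybe_process_partial_specialization hunk above does.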

Re: [PATCH 2/2] c++: -Wdeprecated enables later standard deprecations

2024-10-03 Thread Jason Merrill

On 10/3/24 7:09 PM, Eric Gallager wrote:
> On Thu, Oct 3, 2024 at 12:41 PM Jason Merrill  wrote:
>>
>> By default -Wdeprecated warns about deprecations in the active standard.
>> When specified explicitly, let's also warn about deprecations in later
>> standards.
>
> This strikes me as slightly dangerous. At the very least this should
> get a note in the "Caveats" and/or "Porting To" section of the release
> notes, as I can see the change breaking some builds that also use
> -Werror.

Agreed, but do you think people are using explicit -Wdeprecated?  It's 
had little effect before this patch, since the flag is on by default.

Jason



Re: [PATCH 2/2] c++: -Wdeprecated enables later standard deprecations

2024-10-03 Thread Sam James
Jason Merrill  writes:

> On 10/3/24 7:09 PM, Eric Gallager wrote:
>> On Thu, Oct 3, 2024 at 12:41 PM Jason Merrill  wrote:
>>>
>>> By default -Wdeprecated warns about deprecations in the active standard.
>>> When specified explicitly, let's also warn about deprecations in later
>>> standards.
>> This strikes me as slightly dangerous. At the very least this should
>> get a note in the "Caveats" and/or "Porting To" section of the release
>> notes, as I can see the change breaking some builds that also use
>> -Werror.
>
> Agreed, but do you think people are using explicit -Wdeprecated?  It's
> had little effect before this patch, since the flag is on by default.

On a machine with 2100 installed packages, I only see aflplusplus,
enchant, catch, gcc, and pam.

thanks,
sam


[PATCH] libstdc++/ranges: Implement various small LWG issues

2024-10-03 Thread Patrick Palka
Tested on x86_64-pc-linux-gnu, does this look OK for trunk/14 and perhaps
13?

-- >8 --

This implements the following small LWG issues:

  3848. adjacent_view, adjacent_transform_view and slide_view missing base 
accessor
  3851. chunk_view::inner-iterator missing custom iter_move and iter_swap
  3947. Unexpected constraints on adjacent_transform_view::base()
  4001. iota_view should provide empty
  4012. common_view::begin/end are missing the simple-view check
  4013. lazy_split_view::outer-iterator::value_type should not provide default 
constructor
  4035. single_view should provide empty
  4053. Unary call to std::views::repeat does not decay the argument
  4054. Repeating a repeat_view should repeat the view

libstdc++-v3/ChangeLog:

* include/std/ranges (single_view::empty): Define as per LWG 4035.
(iota_view::empty): Define as per LWG 4001.
(lazy_split_view::_OuterIter::value_type): Remove default
constructor and make other constructor private as per LWG 4013.
(common_view::begin): Disable non-const overload for simple
views as per LWG 4012.
(common_view::end): Likewise.
(adjacent_view::base): Define as per LWG 3848.
(adjacent_transform_view::base): Likewise.
(chunk_view::_InnerIter::iter_move): Define as per LWG 3851.
(chunk_view::_InnerIter::iter_swap): Likewise.
(slide_view::base): Define as per LWG 3848.
(repeat_view): Adjust deduction guide as per LWG 4053.
(_Repeat::operator()): Adjust single-parameter overload
as per LWG 4054.
* testsuite/std/ranges/adaptors/adjacent/1.cc: Verify existence
of base member function.
* testsuite/std/ranges/adaptors/adjacent_transform/1.cc: Likewise.
* testsuite/std/ranges/adaptors/chunk/1.cc: Test LWG 3851 example.
* testsuite/std/ranges/adaptors/slide/1.cc: Verify existence of
base member function.
* testsuite/std/ranges/iota/iota_view.cc: Test LWG 4001 example.
* testsuite/std/ranges/repeat/1.cc: Test LWG 4053/4054 examples.
---
 libstdc++-v3/include/std/ranges   | 84 +--
 .../std/ranges/adaptors/adjacent/1.cc |  3 +
 .../ranges/adaptors/adjacent_transform/1.cc   |  3 +
 .../testsuite/std/ranges/adaptors/chunk/1.cc  | 15 
 .../testsuite/std/ranges/adaptors/slide/1.cc  |  3 +
 .../testsuite/std/ranges/iota/iota_view.cc| 12 +++
 libstdc++-v3/testsuite/std/ranges/repeat/1.cc | 23 +
 7 files changed, 135 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 30f45e0a750..6e6e3b97d82 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -335,6 +335,12 @@ namespace ranges
   end() const noexcept
   { return data() + 1; }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4035. single_view should provide empty
+  static constexpr bool
+  empty() noexcept
+  { return false; }
+
   static constexpr size_t
   size() noexcept
   { return 1; }
@@ -695,6 +701,12 @@ namespace ranges
   end() const requires same_as<_Winc, _Bound>
   { return _Iterator{_M_bound}; }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4001. iota_view should provide empty
+  constexpr bool
+  empty() const
+  { return _M_value == _M_bound; }
+
   constexpr auto
   size() const
   requires (same_as<_Winc, _Bound> && __detail::__advanceable<_Winc>)
@@ -3349,14 +3361,17 @@ namespace views::__adaptor
  private:
_OuterIter _M_i = _OuterIter();
 
- public:
-   value_type() = default;
-
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 4013. lazy_split_view::outer-iterator::value_type should not
+   // provide default constructor
constexpr explicit
value_type(_OuterIter __i)
  : _M_i(std::move(__i))
{ }
 
+   friend _OuterIter;
+
+ public:
constexpr _InnerIter<_Const>
begin() const
{ return _InnerIter<_Const>{_M_i}; }
@@ -3948,8 +3963,10 @@ namespace views::__adaptor
   base() &&
   { return std::move(_M_base); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 4012. common_view::begin/end are missing the simple-view check
   constexpr auto
-  begin()
+  begin() requires (!__detail::__simple_view<_Vp>)
   {
if constexpr (random_access_range<_Vp> && sized_range<_Vp>)
  return ranges::begin(_M_base);
@@ -3969,7 +3986,7 @@ namespace views::__adaptor
   }
 
   constexpr auto
-  end()
+  end() requires (!__detail::__simple_view<_Vp>)
   {
if constexpr (random_access_range<_Vp> && sized_range<_Vp>)
  return ranges::begin(_M_base) + ranges::size(_M_base);
@@ -5316,6 +5333,16 @@ namespace views::__adaptor
   : _M_base(std::move(__base))
 { }
 
+// _GLIBCXX_RESOLVE_LIB_DEFECTS
+// 3848. 

[PATCH] c++: Allow references to internal-linkage vars in C++11 [PR113266]

2024-10-03 Thread Nathaniel Shead
Tested on x86_64-pc-linux-gnu, so far just dg.exp with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26,impcx.

OK for trunk if full bootstrap + regtest passes?

-- >8 --

[temp.arg.nontype] changed in C++11 to allow naming internal-linkage
variables and functions.  We currently already handle internal-linkage
functions, but variables were missed; this patch updates this.

PR c++/113266
PR c++/116911

gcc/cp/ChangeLog:

* parser.cc (cp_parser_template_argument): Allow
internal-linkage variables since C++11.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nontype6.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/parser.cc  | 17 -
 gcc/testsuite/g++.dg/cpp0x/nontype6.C | 19 +++
 2 files changed, 31 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/nontype6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 08f9c89f1f0..e758ddeb1d5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19864,9 +19864,11 @@ cp_parser_template_argument (cp_parser* parser)
 
  -- the name of a non-type template-parameter; or
 
- -- the name of an object or function with external linkage...
+ -- the name of an object or function with external (or internal,
+   since C++11) linkage...
 
- -- the address of an object or function with external linkage...
+ -- the address of an object or function with external (or internal,
+   since C++11) linkage...
 
  -- a pointer to member...  */
   /* Look for a non-type template parameter.  */
@@ -19929,11 +19931,16 @@ cp_parser_template_argument (cp_parser* parser)
probe = TREE_OPERAND (probe, 1);
  if (VAR_P (probe))
{
- /* A variable without external linkage might still be a
+ /* A variable without valid linkage might still be a
 valid constant-expression, so no error is issued here
 if the external-linkage check fails.  */
- if (!address_p && !DECL_EXTERNAL_LINKAGE_P (probe))
-   cp_parser_simulate_error (parser);
+ if (!address_p && cxx_dialect < cxx17)
+   {
+ linkage_kind linkage = decl_linkage (probe);
+ if (linkage != lk_external
+ && (cxx_dialect < cxx11 || linkage != lk_internal))
+   cp_parser_simulate_error (parser);
+   }
}
  else if (is_overloaded_fn (argument))
/* All overloaded functions are allowed; if the external
diff --git a/gcc/testsuite/g++.dg/cpp0x/nontype6.C 
b/gcc/testsuite/g++.dg/cpp0x/nontype6.C
new file mode 100644
index 000..5543d1e8b6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nontype6.C
@@ -0,0 +1,19 @@
+// PR c++/113266, PR c++/116911
+// { dg-do compile }
+
+template <int &> struct a {};
+static int guard1;
+a<guard1> b;  // { dg-error "constant-expression|invalid" "" { target 
c++98_only } }
+
+namespace {
+  int guard2;
+}
+a<guard2> c;  // OK in C++98 because guard2 has external linkage
+  // OK since C++11 because we can refer to an internal linkage 
decl
+
+void nolinkage() {
+  static int guard3;
+  a<guard3> d;  // { dg-error "constant-expression|invalid" "" { target 
c++98_only } }
+  // { dg-error "constant expression|no linkage" "" { target { c++11 && 
c++14_down } } .-1 }
+  // OK since C++17 since we can now refer to no-linkage decls
+}
-- 
2.46.0
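
The linkage condition in the parser hunk is easier to check as a small truth table. This is a sketch of the condition as posted, not GCC code: the dialect and linkage enums and the function name_rejected are hypothetical, and "rejected" here only means that the name is not accepted directly as a template argument (the parser may still accept it later as a general constant expression).

```cpp
#include <cassert>   // only needed for assert-based checks like those in the note below

enum class dialect { cxx98, cxx11, cxx14, cxx17 };
enum class linkage { none, internal, external };

// Mirrors the posted condition:
//   cxx_dialect < cxx17
//   && linkage != lk_external
//   && (cxx_dialect < cxx11 || linkage != lk_internal)
bool name_rejected(dialect d, linkage l)
{
  if (d >= dialect::cxx17)
    return false;                 // any linkage is fine from C++17 on
  if (l == linkage::external)
    return false;                 // always allowed
  return d < dialect::cxx11 || l != linkage::internal;
}
```

Against the new test: guard1 (internal linkage) is rejected only in C++98; guard2 has external linkage in C++98 and internal linkage from C++11, and is accepted either way; guard3 (no linkage) is rejected up to C++14 and accepted from C++17.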



Re: [PATCH] c++: Allow references to internal-linkage vars in C++11 [PR113266]

2024-10-03 Thread Jason Merrill

On 10/3/24 10:41 PM, Nathaniel Shead wrote:

Tested on x86_64-pc-linux-gnu, so far just dg.exp with
GXX_TESTSUITE_STDS=98,11,14,17,20,23,26,impcx.

OK for trunk if full bootstrap + regtest passes?

-- >8 --

[temp.arg.nontype] changed in C++11 to allow naming internal-linkage
variables and functions.  We currently already handle internal-linkage
functions, but variables were missed; this patch updates this.

PR c++/113266
PR c++/116911

gcc/cp/ChangeLog:

* parser.cc (cp_parser_template_argument): Allow
internal-linkage variables since C++11.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nontype6.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/parser.cc  | 17 -
  gcc/testsuite/g++.dg/cpp0x/nontype6.C | 19 +++
  2 files changed, 31 insertions(+), 5 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/nontype6.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 08f9c89f1f0..e758ddeb1d5 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19864,9 +19864,11 @@ cp_parser_template_argument (cp_parser* parser)
  
   -- the name of a non-type template-parameter; or
  
- -- the name of an object or function with external linkage...

+ -- the name of an object or function with external (or internal,
+   since C++11) linkage...
  
- -- the address of an object or function with external linkage...

+ -- the address of an object or function with external (or internal,
+   since C++11) linkage...
  
   -- a pointer to member...  */

/* Look for a non-type template parameter.  */
@@ -19929,11 +19931,16 @@ cp_parser_template_argument (cp_parser* parser)
probe = TREE_OPERAND (probe, 1);
  if (VAR_P (probe))
{
- /* A variable without external linkage might still be a
+ /* A variable without valid linkage might still be a
 valid constant-expression, so no error is issued here
 if the external-linkage check fails.  */
- if (!address_p && !DECL_EXTERNAL_LINKAGE_P (probe))
-   cp_parser_simulate_error (parser);
+ if (!address_p && cxx_dialect < cxx17)


I think you can't get here for cxx17+ because we skipped over this code 
with goto general_expr.  OK without the cxx_dialect check.



+   {
+ linkage_kind linkage = decl_linkage (probe);
+ if (linkage != lk_external
+ && (cxx_dialect < cxx11 || linkage != lk_internal))
+   cp_parser_simulate_error (parser);
+   }
}
  else if (is_overloaded_fn (argument))
/* All overloaded functions are allowed; if the external
diff --git a/gcc/testsuite/g++.dg/cpp0x/nontype6.C 
b/gcc/testsuite/g++.dg/cpp0x/nontype6.C
new file mode 100644
index 000..5543d1e8b6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/nontype6.C
@@ -0,0 +1,19 @@
+// PR c++/113266, PR c++/116911
+// { dg-do compile }
+
+template <int &> struct a {};
+static int guard1;
+a<guard1> b;  // { dg-error "constant-expression|invalid" "" { target 
c++98_only } }
+
+namespace {
+  int guard2;
+}
+a<guard2> c;  // OK in C++98 because guard2 has external linkage
+  // OK since C++11 because we can refer to an internal linkage 
decl
+
+void nolinkage() {
+  static int guard3;
+  a<guard3> d;  // { dg-error "constant-expression|invalid" "" { target 
c++98_only } }
+  // { dg-error "constant expression|no linkage" "" { target { c++11 && 
c++14_down } } .-1 }
+  // OK since C++17 since we can now refer to no-linkage decls
+}




Re: [PATCH] c++: Return the underlying decl rather than the USING_DECL from update_binding [PR116913]

2024-10-03 Thread Jason Merrill

On 10/3/24 8:52 PM, Nathaniel Shead wrote:

Tested on x86_64-pc-linux-gnu (so far just dg.exp), OK for trunk if full
bootstrap + regtest passes?


OK.


-- >8 --

Users of pushdecl assume that the returned decl will be a possibly
updated decl matching the one that was passed in.  My r15-3910 change
broke this since in some cases we would now return USING_DECLs; this
patch fixes the situation.

PR c++/116913

gcc/cp/ChangeLog:

* name-lookup.cc (update_binding): Return the strip_using'd old
decl rather than the binding.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/using70.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/name-lookup.cc |  4 ++--
  gcc/testsuite/g++.dg/lookup/using70.C | 13 +
  2 files changed, 15 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/lookup/using70.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 4754ef5a522..609bd6e8c9b 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3101,7 +3101,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
{
  if (same_type_p (TREE_TYPE (old), TREE_TYPE (decl)))
/* Two type decls to the same type.  Do nothing.  */
-   return old_bval;
+   return old;
  else
goto conflict;
}
@@ -3114,7 +3114,7 @@ update_binding (cp_binding_level *level, cxx_binding 
*binding, tree *slot,
  
  	  /* The new one must be an alias at this point.  */

  gcc_assert (DECL_NAMESPACE_ALIAS (decl));
- return old_bval;
+ return old;
}
else if (TREE_CODE (old) == VAR_DECL)
{
diff --git a/gcc/testsuite/g++.dg/lookup/using70.C 
b/gcc/testsuite/g++.dg/lookup/using70.C
new file mode 100644
index 000..14838eea7ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/using70.C
@@ -0,0 +1,13 @@
+// PR c++/116913
+// { dg-do compile { target c++11 } }
+
+namespace ns {
+  struct c {};
+  using d = int;
+}
+
+using ns::c;
+using ns::d;
+
+using c = ns::c;
+using d = ns::d;




Re: [PATCH] gcc-wwwdocs: Mention check-c++-all target for C++ front end patch testing

2024-10-03 Thread Jason Merrill

On 10/2/24 4:51 AM, Simon Martin wrote:

This is a follow-up to the discussion about testing changes to the C++
front end in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664258.html

It also clarifies that the make invocation examples should be made from
the *build* tree.

Validated fine via https://validator.w3.org.


OK.


---
  htdocs/contribute.html | 10 +-
  1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/htdocs/contribute.html b/htdocs/contribute.html
index 53c27c6e..3ab65323 100644
--- a/htdocs/contribute.html
+++ b/htdocs/contribute.html
@@ -111,9 +111,17 @@ For a normal native configuration, running
  make bootstrap
  make -k check
  
-from the top level of the GCC tree (not the
+from the top level of the GCC build tree (not the
  gcc subdirectory) will accomplish this.
  
+If your change is to the C++ front end, you need to run the C++ testsuite

+in all standard conformance levels. For a normal native configuration,
+running
+
+make -C gcc -k check-c++-all
+
+from the top level of the GCC build tree will accomplish this.
+
  If your change is to a front end other than the C or C++ front end,
  or a runtime library other than libgcc, you need to verify
  only that the runtime library for that language still builds and the




Re: [PATCH] gcc-wwwdocs: Mention check-c++-all target for C++ front end patch testing

2024-10-03 Thread Sam James
Simon Martin  writes:

> This is a follow-up to the discussion about testing changes to the C++
> front end in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664258.html
>
> It also clarifies that the make invocation examples should be made from
> the *build* tree.
>
> Validated fine via https://validator.w3.org.

I've added this to the wiki at https://gcc.gnu.org/wiki/Testing_GCC too, thanks.

> ---
>  htdocs/contribute.html | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> [...]

thanks,
sam


[COMMITTED] gcc: fix typo in gimplify

2024-10-03 Thread Sam James
gcc/ChangeLog:

* gimplify.cc (gimple_add_init_for_auto_var): Fix 'variable' typo.
---
Committed as obvious.

 gcc/gimplify.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index ceb53e5d5bb7..dd7efa71b742 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -2019,7 +2019,7 @@ gimple_add_init_for_auto_var (tree decl,
   gimplify_assign (decl, call, seq_p);
 }
 
-/* Generate padding initialization for automatic vairable DECL.
+/* Generate padding initialization for automatic variable DECL.
C guarantees that brace-init with fewer initializers than members
aggregate will initialize the rest of the aggregate as-if it were
static initialization.  In turn static initialization guarantees
-- 
2.46.2



[COMMITTED] testsuite: gnat.dg: fix dg-do directive syntax

2024-10-03 Thread Sam James
Fix incorrect use of '[' rather than '{' in 'dg-...' directives.

gcc/testsuite/ChangeLog:

* gnat.dg/pack13.adb: Fix 'dg-...' directive syntax.
* gnat.dg/size_attribute.adb: Ditto.
* gnat.dg/subp_elim_errors.adb: Ditto.
---
Committed as obvious.

 gcc/testsuite/gnat.dg/pack13.adb   | 2 +-
 gcc/testsuite/gnat.dg/size_attribute.adb   | 2 +-
 gcc/testsuite/gnat.dg/subp_elim_errors.adb | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gnat.dg/pack13.adb b/gcc/testsuite/gnat.dg/pack13.adb
index dd9cb09cf7b6..6581a53f9987 100644
--- a/gcc/testsuite/gnat.dg/pack13.adb
+++ b/gcc/testsuite/gnat.dg/pack13.adb
@@ -1,4 +1,4 @@
--- [ dg-do compile }
+-- { dg-do compile }
 
 package body Pack13 is
 
diff --git a/gcc/testsuite/gnat.dg/size_attribute.adb b/gcc/testsuite/gnat.dg/size_attribute.adb
index 25642e0b0aad..ec655f67a868 100644
--- a/gcc/testsuite/gnat.dg/size_attribute.adb
+++ b/gcc/testsuite/gnat.dg/size_attribute.adb
@@ -1,5 +1,5 @@
 -- PR middle-end/35823
--- { dg-do compile ]
+-- { dg-do compile }
 
 procedure Size_Attribute (Arg : in String) is
Size : constant Natural := Arg'Size;
diff --git a/gcc/testsuite/gnat.dg/subp_elim_errors.adb b/gcc/testsuite/gnat.dg/subp_elim_errors.adb
index 669e8772117e..1b8c3f23054d 100644
--- a/gcc/testsuite/gnat.dg/subp_elim_errors.adb
+++ b/gcc/testsuite/gnat.dg/subp_elim_errors.adb
@@ -1,4 +1,4 @@
--- [ dg-do compile }
+-- { dg-do compile }
 
 with System;
 
-- 
2.46.2



[PATCH 0/3] aarch64: Clean warnings in libgcc

2024-10-03 Thread Christophe Lyon
These patches fix several warnings which appeared when building libgcc
for aarch64.

* Patch 1 fixes a redefinition of macro 'L' in lse.S after a recent
  patch.

* Patch 2 adds prototypes to avoid a warning emitted because we build
  with -Wmissing-prototypes.

* Patch 3 adds support for -Werror in configure/Makefile, so that it
  will now be enabled when the top-level configure is invoked with
  --enable-werror, or when doing a bootstrap.  The patch only affects
  aarch64 to avoid breaking bootstrap for other architectures.

Thanks,

Christophe

Christophe Lyon (3):
  aarch64: libgcc: Cleanup warnings in lse.S
  aarch64: libgcc: add prototypes in cpuinfo
  aarch64: libgcc: Add -Werror support

 libgcc/Makefile.in  |  1 +
 libgcc/config/aarch64/cpuinfo.c |  2 ++
 libgcc/config/aarch64/lse.S |  4 
 libgcc/config/aarch64/t-aarch64 |  1 +
 libgcc/configure| 31 +++
 libgcc/configure.ac |  5 +
 6 files changed, 44 insertions(+)

-- 
2.34.1



[PATCH 3/3] aarch64: libgcc: Add -Werror support

2024-10-03 Thread Christophe Lyon
When the top-level configure is run with --enable-werror, it passes
--enable-werror-always to subdirs.  Some of them, like libgcc, ignore it.

This patch adds support for it, enabled only for aarch64, to avoid
breaking bootstrap for other targets.

The patch also adds -Wno-prio-ctor-dtor to avoid a warning when
compiling lse_init.c.

libgcc/
* Makefile.in (WERROR): New.
* config/aarch64/t-aarch64: Handle WERROR. Always use
-Wno-prio-ctor-dtor.
* configure.ac: Add support for --enable-werror-always.
* configure: Regenerate.
---
 libgcc/Makefile.in  |  1 +
 libgcc/config/aarch64/t-aarch64 |  1 +
 libgcc/configure| 31 +++
 libgcc/configure.ac |  5 +
 4 files changed, 38 insertions(+)

diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 0e46e9ef768..eca62546642 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -84,6 +84,7 @@ AR_FLAGS = rc
 
 CC = @CC@
 CFLAGS = @CFLAGS@
+WERROR = @WERROR@
 RANLIB = @RANLIB@
 LN_S = @LN_S@
 
diff --git a/libgcc/config/aarch64/t-aarch64 b/libgcc/config/aarch64/t-aarch64
index b70e7b94edd..ae1588ce307 100644
--- a/libgcc/config/aarch64/t-aarch64
+++ b/libgcc/config/aarch64/t-aarch64
@@ -30,3 +30,4 @@ LIB2ADDEH += \
$(srcdir)/config/aarch64/__arm_za_disable.S
 
 SHLIB_MAPFILES += $(srcdir)/config/aarch64/libgcc-sme.ver
+LIBGCC2_CFLAGS += $(WERROR) -Wno-prio-ctor-dtor
diff --git a/libgcc/configure b/libgcc/configure
index cff1eff9625..ae56f7dbdc9 100755
--- a/libgcc/configure
+++ b/libgcc/configure
@@ -592,6 +592,7 @@ enable_execute_stack
 asm_hidden_op
 extra_parts
 cpu_type
+WERROR
 get_gcc_base_ver
 HAVE_STRUB_SUPPORT
 thread_header
@@ -719,6 +720,7 @@ enable_tm_clone_registry
 with_glibc_version
 enable_tls
 with_gcc_major_version_only
+enable_werror_always
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1361,6 +1363,7 @@ Optional Features:
   installations without PT_GNU_EH_FRAME support
   --disable-tm-clone-registrydisable TM clone registry
   --enable-tlsUse thread-local storage [default=yes]
+  --enable-werror-always  enable -Werror despite compiler version
 
 Optional Packages:
   --with-PACKAGE[=ARG]use PACKAGE [ARG=yes]
@@ -5808,6 +5811,34 @@ fi
 
 
 
+# Only enable with --enable-werror-always until existing warnings are
+# corrected.
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+WERROR=
+# Check whether --enable-werror-always was given.
+if test "${enable_werror_always+set}" = set; then :
+  enableval=$enable_werror_always;
+else
+  enable_werror_always=no
+fi
+
+if test $enable_werror_always = yes; then :
+  WERROR="$WERROR${WERROR:+ }-Werror"
+fi
+
+ac_ext=c
+ac_cpp='$CPP $CPPFLAGS'
+ac_compile='$CC -c $CFLAGS $CPPFLAGS conftest.$ac_ext >&5'
+ac_link='$CC -o conftest$ac_exeext $CFLAGS $CPPFLAGS $LDFLAGS conftest.$ac_ext $LIBS >&5'
+ac_compiler_gnu=$ac_cv_c_compiler_gnu
+
+
+
 # Substitute configuration variables
 
 
diff --git a/libgcc/configure.ac b/libgcc/configure.ac
index 4e8c036990f..6b3ea2aea5c 100644
--- a/libgcc/configure.ac
+++ b/libgcc/configure.ac
@@ -13,6 +13,7 @@ sinclude(../config/unwind_ipinfo.m4)
 sinclude(../config/gthr.m4)
 sinclude(../config/sjlj.m4)
 sinclude(../config/cet.m4)
+sinclude(../config/warnings.m4)
 
 AC_INIT([GNU C Runtime Library], 1.0,,[libgcc])
 AC_CONFIG_SRCDIR([static-object.mk])
@@ -746,6 +747,10 @@ AC_SUBST(HAVE_STRUB_SUPPORT)
 # Determine what GCC version number to use in filesystem paths.
 GCC_BASE_VER
 
+# Only enable with --enable-werror-always until existing warnings are
+# corrected.
+ACX_PROG_CC_WARNINGS_ARE_ERRORS([manual])
+
 # Substitute configuration variables
 AC_SUBST(cpu_type)
 AC_SUBST(extra_parts)
-- 
2.34.1



[PATCH 1/3] aarch64: libgcc: Cleanup warnings in lse.S

2024-10-03 Thread Christophe Lyon
Since
  Commit c608ada288ced0268c1fd4136f56c34b24d4
  Author: Zac Walker 
  CommitDate: 2024-01-23 15:32:30 +

  Ifdef `.hidden`, `.type`, and `.size` pseudo-ops for `aarch64-w64-mingw32` target

lse.S includes aarch64-asm.h, leading to a conflicting definition of macro 'L':
- in lse.S it expands to either '' or 'L'
- in aarch64-asm.h it is used to generate .L ## label

lse.S does not use the second, so this patch just undefines L after
the inclusion of aarch64-asm.h.

libgcc/
* config/aarch64/lse.S: Undefine L() macro.
---
 libgcc/config/aarch64/lse.S | 4 
 1 file changed, 4 insertions(+)

diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index ecef47086c6..77b3dc5a981 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -54,6 +54,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #include "aarch64-asm.h"
 #include "auto-target.h"
 
+/* L is defined in aarch64-asm.h for a different purpose than why we
+   use it here.  */
+#undef L
+
 /* Tell the assembler to accept LSE instructions.  */
 #ifdef HAVE_AS_LSE
.arch armv8-a+lse
-- 
2.34.1
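
The name clash described in this patch can be reproduced in plain C. What follows is only a minimal sketch under stated assumptions: the two definitions of L are simplified stand-ins written as string macros, not the real contents of aarch64-asm.h or lse.S, and lse_suffix is a hypothetical helper name.

```c
/* Stand-in for the header's definition: a function-like macro that
   builds a local label name (hypothetical simplification).  */
#define L(label) ".L" #label

/* Without this #undef, defining L again with a different replacement
   list would draw a macro-redefinition diagnostic.  */
#undef L

/* Stand-in for the including file's own, unrelated use of the name.  */
#define L "L"

/* Returns the text that the second definition of L expands to.  */
const char *
lse_suffix (void)
{
  return L;
}
```

Dropping the header's definition is only safe because, as the message says, the including file never uses the label-building form.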



[PATCH 2/3] aarch64: libgcc: add prototypes in cpuinfo

2024-10-03 Thread Christophe Lyon
Add prototypes for __init_cpu_features_resolver and
__init_cpu_features to avoid warnings due to -Wmissing-prototypes.

libgcc/
* config/aarch64/cpuinfo.c (__init_cpu_features_resolver): Add
prototype.
(__init_cpu_features): Likewise.
---
 libgcc/config/aarch64/cpuinfo.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgcc/config/aarch64/cpuinfo.c b/libgcc/config/aarch64/cpuinfo.c
index 4b94fca8695..c62a7453e8e 100644
--- a/libgcc/config/aarch64/cpuinfo.c
+++ b/libgcc/config/aarch64/cpuinfo.c
@@ -418,6 +418,7 @@ __init_cpu_features_constructor(unsigned long hwcap,
   setCPUFeature(FEAT_INIT);
 }
 
+void __init_cpu_features_resolver(unsigned long, const __ifunc_arg_t *);
 void
 __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
   if (__aarch64_cpu_features.features)
@@ -425,6 +426,7 @@ __init_cpu_features_resolver(unsigned long hwcap, const __ifunc_arg_t *arg) {
   __init_cpu_features_constructor(hwcap, arg);
 }
 
+void __init_cpu_features(void);
 void __attribute__ ((constructor))
 __init_cpu_features(void) {
   unsigned long hwcap;
-- 
2.34.1
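
For readers unfamiliar with the warning, here is a minimal C sketch of the same fix in isolation. The names init_features, features_are_ready and features_ready are hypothetical and not taken from cpuinfo.c; the assumption is only that the file is compiled with -Wmissing-prototypes.

```c
/* A prior declaration gives each external function a prototype, so
   -Wmissing-prototypes has nothing to report at the definition.  */
void init_features (void);
int features_are_ready (void);

static int features_ready;   /* hypothetical state flag */

/* Without the declaration above, this definition is what the warning
   would point at.  */
void
init_features (void)
{
  features_ready = 1;
}

/* Small accessor so the effect of init_features is observable.  */
int
features_are_ready (void)
{
  return features_ready;
}
```

Marking the functions static would also silence the warning, but that is not an option for symbols that must stay externally visible, which is presumably why the patch adds declarations instead.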



Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-03 Thread Maciej W. Rozycki
On Thu, 3 Oct 2024, Jeff Law wrote:

> We can remove a couple of XFAILs in the rv32 space as it's behaving much more
> like rv64 at this point.

 I'm glad to see them gone.  I have a couple of concerns with your change 
though.

 Given:

* gcc.target/riscv/cset-sext.c: Similarly.  No longer allow
"not" in asm output.

and:

+/* { dg-final { scan-assembler-not "\\sneg\\s" } } */

I think the assembly snippet in the comment has to be updated accordingly.  
Also I guess s/not/neg/ for the ChangeLog entry.

 More importantly may I ask you to review the second paragraph of commit 
6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to 
see if any of the other issues referred there have also been now sorted 
and mention that in the change description, possibly with a commit hash 
reference to Andrew P's recent improvements?  And in particular can the 
branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?

> Tested in my tester on rv64gcv and rv32gc.  Will wait for the pre-commit
> testers to render their verdict before moving forward.

 Can you please address my concerns too before moving forward?

  Maciej


Re: [to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-03 Thread Andrew Pinski
On Thu, Oct 3, 2024 at 4:41 PM Maciej W. Rozycki  wrote:
>
> On Thu, 3 Oct 2024, Jeff Law wrote:
>
> > We can remove a couple of XFAILs in the rv32 space as it's behaving much
> > more like rv64 at this point.
>
>  I'm glad to see them gone.  I have a couple of concerns with your change
> though.
>
>  Given:
>
> * gcc.target/riscv/cset-sext.c: Similarly.  No longer allow
> "not" in asm output.
>
> and:
>
> +/* { dg-final { scan-assembler-not "\\sneg\\s" } } */
>
> I think the assembly snippet in the comment has to be updated accordingly.
> Also I guess s/not/neg/ for the ChangeLog entry.
>
>  More importantly may I ask you to review the second paragraph of commit
> 6c3365e715fa ("RISC-V: Also handle sign extension in branch costing") to
> see if any of the other issues referred there have also been now sorted
> and mention that in the change description, possibly with a commit hash
> reference to Andrew P's recent improvements?  And in particular can the
> branch costs requested be lowered for gcc.target/riscv/cset-sext.c now?

I suspect it is r15-3992-g698e0ec89bc096 . If so then if you change
the function in cset-sext.c to be:
```
_Bool
foo (long a, long b)
{
  if (!b)
    return 0;
  else if (a)
    return 1;
  else
    return 0;
}
```
You get the same code in GCC 14 as you would get with the unmodified
testcase now (13 didn't have -mmovcc).

Thanks,
Andrew Pinski

>
> > Tested in my tester on rv64gcv and rv32gc.  Will wait for the pre-commit
> > testers to render their verdict before moving forward.
>
>  Can you please address my concerns too before moving forward?
>
>   Maciej
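
As a side note on the function quoted above: it returns 1 exactly when both arguments are nonzero. The sketch below makes that equivalence checkable; foo_reference is a hypothetical helper name, and nothing here says anything about which instructions GCC emits for either form.

```c
#include <stdbool.h>

/* The function from the message, unchanged apart from spelling _Bool
   as bool via <stdbool.h>.  */
bool
foo (long a, long b)
{
  if (!b)
    return 0;
  else if (a)
    return 1;
  else
    return 0;
}

/* Hypothetical reference form: true only when both inputs are nonzero.  */
bool
foo_reference (long a, long b)
{
  return a != 0 && b != 0;
}
```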


Re: [PATCH 2/2] c++: -Wdeprecated enables later standard deprecations

2024-10-03 Thread Eric Gallager
On Thu, Oct 3, 2024 at 12:41 PM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> -- 8< --
>
> By default -Wdeprecated warns about deprecations in the active standard.
> When specified explicitly, let's also warn about deprecations in later
> standards.
>
> gcc/c-family/ChangeLog:
>
> * c-opts.cc (c_common_post_options): Explicit -Wdeprecated enables
> deprecations from later standards.
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Explicit -Wdeprecated enables more warnings.
> ---

This strikes me as slightly dangerous. At the very least this should
get a note in the "Caveats" and/or "Porting To" section of the release
notes, as I can see the change breaking some builds that also use
-Werror.

>  gcc/doc/invoke.texi| 22 --
>  gcc/c-family/c-opts.cc | 17 -
>  2 files changed, 28 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c90f5b4d58e..d38c1feb86f 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -3864,8 +3864,10 @@ for code that is not valid in C++23 but used to be valid but deprecated
>  in C++20 with a pedantic warning that can be disabled with
>  @option{-Wno-comma-subscript}.
>
> -Enabled by default with @option{-std=c++20} unless @option{-Wno-deprecated},
> -and with @option{-std=c++23} regardless of @option{-Wno-deprecated}.
> +Enabled by default with @option{-std=c++20} unless
> +@option{-Wno-deprecated}, and after @option{-std=c++23} regardless of
> +@option{-Wno-deprecated}.  Before @option{-std=c++20}, enabled with
> +explicit @option{-Wdeprecated}.
>
>  This warning is upgraded to an error by @option{-pedantic-errors} in
>  C++23 mode or later.
> @@ -4012,7 +4014,7 @@ int k = f - e;
>
>  @option{-Wdeprecated-enum-enum-conversion} is enabled by default with
>  @option{-std=c++20}.  In pre-C++20 dialects, this warning can be enabled
> -by @option{-Wenum-conversion}.
> +by @option{-Wenum-conversion} or @option{-Wdeprecated}.
>
>  @opindex Wdeprecated-enum-float-conversion
>  @opindex Wno-deprecated-enum-float-conversion
> @@ -4030,14 +4032,14 @@ bool b = e <= 3.7;
>
>  @option{-Wdeprecated-enum-float-conversion} is enabled by default with
>  @option{-std=c++20}.  In pre-C++20 dialects, this warning can be enabled
> -by @option{-Wenum-conversion}.
> +by @option{-Wenum-conversion} or @option{-Wdeprecated}.
>
>  @opindex Wdeprecated-literal-operator
>  @opindex Wno-deprecated-literal-operator
>  @item -Wdeprecated-literal-operator @r{(C++ and Objective-C++ only)}
>  Warn that the declaration of a user-defined literal operator with a
>  space before the suffix is deprecated.  This warning is enabled by
> -default in C++23.
> +default in C++23, or with explicit @option{-Wdeprecated}.
>
>  @smallexample
>  string operator "" _i18n(const char*, std::size_t); // deprecated
> @@ -4740,7 +4742,8 @@ non-class type, @code{volatile}-qualified function return type,
>  @code{volatile}-qualified parameter type, and structured bindings of a
>  @code{volatile}-qualified type.  This usage was deprecated in C++20.
>
> -Enabled by default with @option{-std=c++20}.
> +Enabled by default with @option{-std=c++20}.  Before
> +@option{-std=c++20}, enabled with explicit @option{-Wdeprecated}.
>
>  @opindex Wzero-as-null-pointer-constant
>  @opindex Wno-zero-as-null-pointer-constant
> @@ -10389,6 +10392,13 @@ disable the error when compiled with @option{-Werror} flag.
>  @item -Wno-deprecated
>  Do not warn about usage of deprecated features.  @xref{Deprecated Features}.
>
> +In C++, explicitly specifying @option{-Wdeprecated} also enables
> +warnings about some features that are deprecated in later language
> +standards, specifically @option{-Wcomma-subscript},
> +@option{-Wvolatile}, @option{-Wdeprecated-enum-float-conversion},
> +@option{-Wdeprecated-enum-enum-conversion}, and
> +@option{-Wdeprecated-literal-operator}.
> +
>  @opindex Wno-deprecated-declarations
>  @opindex Wdeprecated-declarations
>  @item -Wno-deprecated-declarations
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index 8ff3d966bb6..510e0870140 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -996,30 +996,37 @@ c_common_post_options (const char **pfilename)
>SET_OPTION_IF_UNSET (&global_options, &global_options_set, warn_register,
>cxx_dialect >= cxx17);
>
> +  /* Explicit -Wdeprecated turns on warnings from later standards.  */
> +  auto deprecated_in = [&](enum cxx_dialect d)
> +  {
> +if (OPTION_SET_P (warn_deprecated)) return !!warn_deprecated;
> +return (warn_deprecated && cxx_dialect >= d);
> +  };
> +
>/* -Wcomma-subscript is enabled by default in C++20.  */
>SET_OPTION_IF_UNSET (&global_options, &global_options_set,
>warn_comma_subscript,
>cxx_dialect >= cxx23
> -  || (cxx_dialect == cxx20 && warn_deprecated));
> +
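
The decision made by the deprecated_in lambda in the quoted hunk can be restated outside GCC. This is a rough C sketch of that logic only; struct options, its fields and the dialect enumerators are hypothetical stand-ins for OPTION_SET_P (warn_deprecated), warn_deprecated and cxx_dialect.

```c
#include <stdbool.h>

/* Hypothetical ordering of language levels, oldest first.  */
enum dialect { CXX17, CXX20, CXX23 };

struct options
{
  bool deprecated_explicit;  /* was -W[no-]deprecated given explicitly?  */
  bool warn_deprecated;      /* current value of the option  */
  enum dialect dialect;      /* active language level  */
};

/* Should a warning for something deprecated in dialect D be on?  */
bool
deprecated_in (const struct options *o, enum dialect d)
{
  /* An explicit setting wins regardless of the active dialect.  */
  if (o->deprecated_explicit)
    return o->warn_deprecated;
  /* Otherwise follow the default, limited to the active dialect.  */
  return o->warn_deprecated && o->dialect >= d;
}
```

With these stand-ins, an explicit -Wdeprecated in a C++17 build turns on a C++20 deprecation warning, while the default setting does not, which is the behaviour change the review comment is concerned about for -Werror builds.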

[PATCH] c++: Return the underlying decl rather than the USING_DECL from update_binding [PR116913]

2024-10-03 Thread Nathaniel Shead
Tested on x86_64-pc-linux-gnu (so far just dg.exp), OK for trunk if full
bootstrap + regtest passes?

-- >8 --

Users of pushdecl assume that the returned decl will be a possibly
updated decl matching the one that was passed in.  My r15-3910 change
broke this since in some cases we would now return USING_DECLs; this
patch fixes the situation.

PR c++/116913

gcc/cp/ChangeLog:

* name-lookup.cc (update_binding): Return the strip_using'd old
decl rather than the binding.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/using70.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/name-lookup.cc |  4 ++--
 gcc/testsuite/g++.dg/lookup/using70.C | 13 +
 2 files changed, 15 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/lookup/using70.C

diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
index 4754ef5a522..609bd6e8c9b 100644
--- a/gcc/cp/name-lookup.cc
+++ b/gcc/cp/name-lookup.cc
@@ -3101,7 +3101,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot,
{
  if (same_type_p (TREE_TYPE (old), TREE_TYPE (decl)))
/* Two type decls to the same type.  Do nothing.  */
-   return old_bval;
+   return old;
  else
goto conflict;
}
@@ -3114,7 +3114,7 @@ update_binding (cp_binding_level *level, cxx_binding *binding, tree *slot,
 
  /* The new one must be an alias at this point.  */
  gcc_assert (DECL_NAMESPACE_ALIAS (decl));
- return old_bval;
+ return old;
}
   else if (TREE_CODE (old) == VAR_DECL)
{
diff --git a/gcc/testsuite/g++.dg/lookup/using70.C b/gcc/testsuite/g++.dg/lookup/using70.C
new file mode 100644
index 000..14838eea7ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/using70.C
@@ -0,0 +1,13 @@
+// PR c++/116913
+// { dg-do compile { target c++11 } }
+
+namespace ns {
+  struct c {};
+  using d = int;
+}
+
+using ns::c;
+using ns::d;
+
+using c = ns::c;
+using d = ns::d;
-- 
2.46.0



Re: [PATCH] libstdc++: Workaround glibc header on ia64-linux

2024-10-03 Thread Frank Scheiner

On 01.10.24 18:02, Jonathan Wakely wrote:
> On Tue, 1 Oct 2024 at 16:53, Frank Scheiner wrote:
> > Though I don't understand why. From the error message it sounds like 'u'
> > was replaced with '(' before the __ctx macro could do its job.
> >
> > But Joseph also wrote that it "prepends __ in standards conformance
> > modes" in [3]. And this might not be the case for these specific tests,
> > so the __ctx macro might not have any effect here.
>
> It has no effect here. G++ unconditionally defines _GNU_SOURCE which
> means the __ctx macro does nothing.


Thanks for the clarification. I could retrace that via [1].

[1]:
https://github.com/linux-ia64/glibc-ia64/blob/release/2.39/master-w-ia64/include/features.h#L201

In the meantime I also ran the build tests (cross-build a T2 "base"
package selection) with and w/o modifications on the glibc side ('bits'
=> '__bits' and 'u' => '__u' in [2] and [3]) and couldn't find any
obvious differences.

[2]:
https://github.com/linux-ia64/glibc-ia64/blob/master-epic/sysdeps/unix/sysv/linux/ia64/bits/sigcontext.h

[3]:
https://github.com/linux-ia64/glibc-ia64/blob/master-epic/sysdeps/unix/sysv/linux/ia64/sys/ucontext.h



Jonathan, thanks for pointing me at these failing libstdc++ tests and
their possible reasons. I'll send v2 of the patch right away.

Cheers,
Frank


[PATCH v2] libstdc++: Workaround glibc headers on ia64-linux

2024-10-03 Thread Frank Scheiner

We see:

```
FAIL: 17_intro/names.cc  -std=gnu++17 (test for excess errors)
FAIL: 17_intro/names_pstl.cc  -std=gnu++17 (test for excess errors)
FAIL: experimental/names.cc  -std=gnu++17 (test for excess errors)
```

...on ia64-linux.

This is due to:

* /usr/include/bits/sigcontext.h:32-38:
```
32 struct __ia64_fpreg
33   {
34 union
35   {
36 unsigned long bits[2];
37   } u;
38   } __attribute__ ((__aligned__ (16)));
```

* /usr/include/sys/ucontext.h:39-45:
```
 39 struct __ia64_fpreg_mcontext
 40   {
 41 union
 42   {
 43 unsigned long __ctx(bits)[2];
 44   } __ctx(u);
 45   } __attribute__ ((__aligned__ (16)));
```

...from glibc 2.39 (w/ia64 support re-added). See the discussion
starting on [1].

[1]: https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654487.html

The following patch adds a workaround for this on the libstdc++
testsuite side.

Signed-off-by: Frank Scheiner 

---
v2: Fix typo in title.
 libstdc++-v3/testsuite/17_intro/names.cc | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libstdc++-v3/testsuite/17_intro/names.cc b/libstdc++-v3/testsuite/17_intro/names.cc
index 9b0ffcb50b2..b45aefe1ccf 100644
--- a/libstdc++-v3/testsuite/17_intro/names.cc
+++ b/libstdc++-v3/testsuite/17_intro/names.cc
@@ -265,6 +265,12 @@
 #undef j
 #endif

+#if defined (__linux__) && defined (__ia64__)
+// <bits/sigcontext.h> defines __ia64_fpreg::u
+// <sys/ucontext.h> defines __ia64_fpreg_mcontext::u
+#undef u
+#endif
+
 #if defined (__linux__) && defined (__powerpc__)
 //  defines __vector128::u
 #undef u
--
2.45.2



Re: [PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-10-03 Thread Segher Boessenkool
On Thu, Mar 21, 2024 at 06:21:48PM +0530, jeevitha wrote:
Hi!

> The following patch has been bootstrapped and regtested on powerpc64le-linux.

Please send v2 patches as their own, new thread.  Replies are for
replies (duh), and for patch series.  If you mix several versions in one
thread things become much, much harder to deal with.

> PTImode assists in generating even/odd register pairs on 128 bits. When the
> user specifies PTImode as an attribute, it breaks because there is no
> internal type to handle this mode. To address this, we have created a tree
> node with dummy type to handle PTImode. We are not documenting this dummy
> type since users are not allowed to use this type externally.

Like discussed before, do not say this.  Users are perfectly well
allowed to use whatever type they want.  But we don't *encourage* using
this type, a very different thing.

> 2024-03-21  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110411

That is the wrong PR #.  To prevent such things, never copy such lines,
always type them from scratch.  It is a very short line anyway!

>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add
>   RS6000_BTI_INTPTI.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.

Please don't break lines early.  Changelog lines are 80 positions wide.

> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.

> +  /* PTImode to get even/odd register pairs.  */
> +  intPTI_type_internal_node = make_node(INTEGER_TYPE);
> +  TYPE_PRECISION (intPTI_type_internal_node) = GET_MODE_BITSIZE (PTImode);
> +  layout_type (intPTI_type_internal_node);
> +  SET_TYPE_MODE (intPTI_type_internal_node, PTImode);
> +  t = build_qualified_type (intPTI_type_internal_node, TYPE_QUAL_CONST);
> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
> +   "__dummypti");

Please use a real name, not something "dummy".  It is a real type after
all!

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
> @@ -0,0 +1,15 @@
> +/* PR target/106895 */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-options "-O2" } */
> +
> +/* Verify the following generates even/odd register pairs.  */
> +
> +typedef __int128 pti __attribute__((mode(PTI)));
> +
> +void
> +set128 (pti val, pti *mem)
> +{
> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
> +}
> +
> +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */

Please use {} quoting, and no backslashes.  Also use \m and \M.

Or something like
  scan-assembler { \mstq .*[02468], }
(you do not have to match the things you don't care about, and you only
need to look at the last digit to see if a number is even).

The patch looks good otherwise, but please fix these things and repost.
In a new thread :-)

Thanks,


Segher


Re: [PATCH v2] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-10-03 Thread Peter Bergner
On 10/3/24 2:24 PM, Segher Boessenkool wrote:
> On Thu, Mar 21, 2024 at 06:21:48PM +0530, jeevitha wrote:
>> PTImode assists in generating even/odd register pairs on 128 bits. When the
>> user specifies PTImode as an attribute, it breaks because there is no
>> internal type to handle this mode. To address this, we have created a tree
>> node with dummy type to handle PTImode. We are not documenting this dummy
>> type since users are not allowed to use this type externally.
> 
> Like discussed before, do not say this.  Users are perfectly well
> allowed to use whatever type they want.  But we don't *encourage* using
> this type, a very different thing.

I think a simple s/allowed/encouraged/ should suffice.




>> +  lang_hooks.types.register_builtin_type (intPTI_type_internal_node,
>> +  "__dummypti");
> 
> Please use a real name, not something "dummy".  It is a real type after
> all!

Segher, how about __internal_pti or __pti_internal instead?



>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr106895.c
>> @@ -0,0 +1,15 @@
>> +/* PR target/106895 */
>> +/* { dg-require-effective-target int128 } */
>> +/* { dg-options "-O2" } */
>> +
>> +/* Verify the following generates even/odd register pairs.  */
>> +
>> +typedef __int128 pti __attribute__((mode(PTI)));
>> +
>> +void
>> +set128 (pti val, pti *mem)
>> +{
>> +asm("stq %1,%0" : "=m"(*mem) : "r"(val));
>> +}
>> +
>> +/* { dg-final { scan-assembler "stq \[123\]?\[02468\]" } } */
> 
> Please use {} quoting, and no backslashes.  Also use \m and \M.
> 
> Or something like
>   scan-assembler { \mstq .*[02468], }
> (you do not have to match the things you don't care about, and you only
> need to look at the last digit to see if a number is even).

I think a better idea is to change this to a { dg-do assemble } test case,
since the assembler will verify that the register number is even and will
also verify the offset is valid too.  Then the dg-final can be just:

/* { dg-final { scan-assembler {\mstq\M} } } */

Peter




Re: [PATCH 2/3] Release expanded template argument vector

2024-10-03 Thread Jason Merrill

On 10/3/24 12:38 PM, Jason Merrill wrote:
> On 10/2/24 7:50 AM, Richard Biener wrote:
> > This reduces peak memory usage by 20% for a specific testcase.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > It's very ugly so I'd appreciate suggestions on how to handle such
> > situations better?
>
> I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.

OK, apparently that was both too clever and not clever enough.
Replacing it with this one that's much closer to yours.


Jason

From d77f073ce66cedbcbb22357c49b9ef19e1b61a43 Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Thu, 3 Oct 2024 16:31:00 -0400
Subject: [PATCH] c++: free garbage vec in coerce_template_parms
To: gcc-patches@gcc.gnu.org

coerce_template_parms can create two different vecs for the inner template
arguments, new_inner_args and (potentially) the result of
expand_template_argument_pack.  One or the other, or possibly both, end up
being garbage: in the typical case, the expanded vec is garbage because it's
only used as the source for convert_template_argument.  In some dependent
cases, the new vec is garbage because we decide to return the original args
instead.  In these cases, ggc_free the garbage vec to reduce the memory
overhead of overload resolution.

gcc/cp/ChangeLog:

	* pt.cc (coerce_template_parms): Free garbage vecs.

Co-authored-by: Richard Biener 
---
 gcc/cp/pt.cc | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 20affcd65a2..4ceae1d38de 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9275,6 +9275,7 @@ coerce_template_parms (tree parms,
 	{
 	  /* We don't know how many args we have yet, just use the
 		 unconverted (and still packed) ones for now.  */
+	  ggc_free (new_inner_args);
 	  new_inner_args = orig_inner_args;
 	  arg_idx = nargs;
 	  break;
@@ -9329,7 +9330,8 @@ coerce_template_parms (tree parms,
 		  = make_pack_expansion (conv, complain);
 
   /* We don't know how many args we have yet, just
- use the unconverted ones for now.  */
+		 use the unconverted (but unpacked) ones for now.  */
+	  ggc_free (new_inner_args);
   new_inner_args = inner_args;
 	  arg_idx = nargs;
   break;
@@ -9442,6 +9444,12 @@ coerce_template_parms (tree parms,
 SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (new_inner_args,
 	 TREE_VEC_LENGTH (new_inner_args));
 
+  /* If we expanded packs in inner_args and aren't returning it now, the
+ expanded vec is garbage.  */
+  if (inner_args != new_inner_args
+  && inner_args != orig_inner_args)
+ggc_free (inner_args);
+
   return return_full_args ? new_args : new_inner_args;
 }
 
-- 
2.46.2



[PATCH v4 0/2] Add support for SVE2 faminmax

2024-10-03 Thread saurabh.jha
From: Saurabh Jha 

This is a revised version of this patch series:
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/664329.html

Unfortunately, I had test case failures which I missed but shouldn't
have. Apologies for that.

This version fixes the failing test cases in the second patch with no
other changes.

Regression tested on aarch64-unknown-linux-gnu and found no regressions.

Ok for master?

Thanks,
Saurabh

Saurabh Jha (2):
  aarch64: Add SVE2 faminmax intrinsics
  aarch64: Add codegen support for SVE2 faminmax

 .../aarch64/aarch64-sve-builtins-base.cc  |   4 +
 .../aarch64/aarch64-sve-builtins-base.def |   5 +
 .../aarch64/aarch64-sve-builtins-base.h   |   2 +
 gcc/config/aarch64/aarch64-sve2.md|  37 +++
 gcc/config/aarch64/aarch64.h  |   1 +
 gcc/config/aarch64/iterators.md   |  24 +-
 .../gcc.target/aarch64/sve/faminmax_1.c   |  44 +++
 .../gcc.target/aarch64/sve/faminmax_2.c   |  60 
 .../aarch64/sve2/acle/asm/amax_f16.c  | 312 ++
 .../aarch64/sve2/acle/asm/amax_f32.c  | 312 ++
 .../aarch64/sve2/acle/asm/amax_f64.c  | 312 ++
 .../aarch64/sve2/acle/asm/amin_f16.c  | 311 +
 .../aarch64/sve2/acle/asm/amin_f32.c  | 312 ++
 .../aarch64/sve2/acle/asm/amin_f64.c  | 312 ++
 14 files changed, 2047 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f64.c

-- 
2.34.1



[PATCH v4 2/2] aarch64: Add codegen support for SVE2 faminmax

2024-10-03 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.

This patch adds code generation for famax and famin in terms of existing
unspecs. With this patch:
1. famax can be expressed as taking the absolute value of each of the
   two operands and then taking UNSPEC_COND_SMAX of the results.
2. famin can be expressed as taking the absolute value of each of the
   two operands and then taking UNSPEC_COND_SMIN of the results.

This fusion of operators is only possible when the
-march=armv9-a+faminmax+sve flags are passed. We also need to pass the
-ffast-math flag; this is what enables the compiler to use
UNSPEC_COND_SMAX and UNSPEC_COND_SMIN.

This code generation is only available at -O2 or -O3, as that is when
auto-vectorization is enabled.
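
As a rough scalar reference for the loop shape the fused pattern is meant
to catch, here is a minimal sketch modelled on the fabs/fmax loops in the
new tests. The names famax_ref and famin_ref are made up for illustration
and are not part of the patch.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar reference: absolute value of each operand first,
// then the maximum of the two results (the famax shape).
void famax_ref(const float* a, const float* b, float* c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    c[i] = std::fmax(std::fabs(a[i]), std::fabs(b[i]));
}

// Same idea with the minimum (the famin shape).
void famin_ref(const float* a, const float* b, float* c, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    c[i] = std::fmin(std::fabs(a[i]), std::fabs(b[i]));
}
```

Note that the order matters: for a = -3 and b = 1, max(|a|, |b|) is 3,
while |max(a, b)| would be 1.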

gcc/ChangeLog:

* config/aarch64/aarch64-sve2.md
(*aarch64_pred_faminmax_fused): Instruction pattern for faminmax
codegen.
* config/aarch64/iterators.md: Iterator and attribute for
faminmax codegen.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/faminmax_1.c: New test.
* gcc.target/aarch64/sve/faminmax_2.c: New test.
---
 gcc/config/aarch64/aarch64-sve2.md        | 37 
 gcc/config/aarch64/iterators.md           |  6 ++
 .../gcc.target/aarch64/sve/faminmax_1.c   | 44 ++
 .../gcc.target/aarch64/sve/faminmax_2.c   | 60 +++
 4 files changed, 147 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/faminmax_2.c

diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index 725092cc95f..5f2697c3179 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -2467,6 +2467,43 @@
   [(set_attr "movprfx" "yes")]
 )
 
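+;; -------------------------------------------------------------------------
+;; -- [FP] Absolute maximum and minimum
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - FAMAX
+;; - FAMIN
+;; -------------------------------------------------------------------------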
+;; Predicated floating-point absolute maximum and minimum.
+(define_insn_and_rewrite "*aarch64_pred_faminmax_fused"
+  [(set (match_operand:SVE_FULL_F 0 "register_operand")
+	(unspec:SVE_FULL_F
+	  [(match_operand:<VPRED> 1 "register_operand")
+	   (match_operand:SI 4 "aarch64_sve_gp_strictness")
+	   (unspec:SVE_FULL_F
+	 [(match_operand 5)
+	  (const_int SVE_RELAXED_GP)
+	  (match_operand:SVE_FULL_F 2 "register_operand")]
+	 UNSPEC_COND_FABS)
+	   (unspec:SVE_FULL_F
+	 [(match_operand 6)
+	  (const_int SVE_RELAXED_GP)
+	  (match_operand:SVE_FULL_F 3 "register_operand")]
+	 UNSPEC_COND_FABS)]
+	  SVE_COND_SMAXMIN))]
+  "TARGET_SVE_FAMINMAX"
+  {@ [ cons: =0 , 1   , 2  , 3 ; attrs: movprfx ]
+     [ w        , Upl , %0 , w ; *              ] <faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+     [ ?&w      , Upl , w  , w ; yes            ] movprfx\t%0, %2\;<faminmax_cond_uns_op>\t%0.<Vetype>, %1/m, %0.<Vetype>, %3.<Vetype>
+  }
+  "&& (!rtx_equal_p (operands[1], operands[5])
+   || !rtx_equal_p (operands[1], operands[6]))"
+  {
+operands[5] = copy_rtx (operands[1]);
+operands[6] = copy_rtx (operands[1]);
+  }
+)
+
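 ;; =========================================================================
 ;; == Complex arithmetic
 ;; =========================================================================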
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c06f8c2c90f..8b18682c341 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3143,6 +3143,9 @@
 	 UNSPEC_COND_FMIN
 	 UNSPEC_COND_FMINNM])
 
+(define_int_iterator SVE_COND_SMAXMIN [UNSPEC_COND_SMAX
+   UNSPEC_COND_SMIN])
+
 (define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
 	  UNSPEC_COND_FMLS
 	  UNSPEC_COND_FNMLA
@@ -4503,6 +4506,9 @@
 
 (define_int_iterator FAMINMAX_UNS [UNSPEC_FAMAX UNSPEC_FAMIN])
 
+(define_int_attr faminmax_cond_uns_op
+  [(UNSPEC_COND_SMAX "famax") (UNSPEC_COND_SMIN "famin")])
+
 (define_int_attr faminmax_uns_op
   [(UNSPEC_FAMAX "famax") (UNSPEC_FAMIN "famin")])
 
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c b/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
new file mode 100644
index 00000000000..3b65ccea065
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/faminmax_1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3 -ffast-math" } */
+
+#include "arm_sve.h"
+
+#pragma GCC target "+sve+faminmax"
+
+#define TEST_FAMAX(TYPE)		\
+  void fn_famax_##TYPE (TYPE * restrict a,\
+			TYPE * restrict b,\
+			TYPE * restrict c,\
+			int n) {	\
+for (int i = 0; i < n; i++) {	\
+  TYPE temp1 = __builtin_fabs (a[i]);\
+  TYPE temp2 = __builtin_fabs (b[i]);\
+  c[i] = __builtin_fmax (temp1, temp2);\
+}	\
+  

[PATCH v4 1/2] aarch64: Add SVE2 faminmax intrinsics

2024-10-03 Thread saurabh.jha

The AArch64 FEAT_FAMINMAX extension introduces instructions for
computing the floating point absolute maximum and minimum of the
two vectors element-wise.

This patch introduces SVE2 faminmax intrinsics. The intrinsics of this
extension are implemented as the following builtin functions:
* sva[max|min]_[m|x|z]
* sva[max|min]_[f16|f32|f64]_[m|x|z]
* sva[max|min]_n_[f16|f32|f64]_[m|x|z]
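
To make the _m/_z suffixes concrete, here is a portable scalar model of
how inactive lanes are treated. It is only a sketch based on the general
SVE ACLE predication convention; amax_model and Inactive are hypothetical
names, not the real intrinsics. The _x form is left out because its
inactive lanes have unspecified values.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical selector for how inactive lanes are filled.
enum class Inactive { Merge, Zero };

// Scalar model of a predicated absolute maximum over N lanes.
// Active lanes compute max(|a|, |b|); inactive lanes either keep the
// first operand (the _m form) or become zero (the _z form).
template <std::size_t N>
std::array<float, N> amax_model(const std::array<bool, N>& pg,
                                const std::array<float, N>& a,
                                const std::array<float, N>& b,
                                Inactive mode) {
  std::array<float, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    if (pg[i])
      result[i] = std::fmax(std::fabs(a[i]), std::fabs(b[i]));
    else
      result[i] = (mode == Inactive::Merge) ? a[i] : 0.0f;
  }
  return result;
}
```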

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins-base.cc
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.def
(REQUIRED_EXTENSIONS): Add faminmax intrinsics behind a flag.
(svamax): Absolute maximum declaration.
(svamin): Absolute minimum declaration.
* config/aarch64/aarch64-sve-builtins-base.h: Declaring function
bases for the new intrinsics.
* config/aarch64/aarch64.h
(TARGET_SVE_FAMINMAX): New flag for SVE2 faminmax.
* config/aarch64/iterators.md: New unspecs, iterators, and attrs
for the new intrinsics.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve2/acle/asm/amax_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amax_f64.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f32.c: New test.
* gcc.target/aarch64/sve2/acle/asm/amin_f64.c: New test.
---
 .../aarch64/aarch64-sve-builtins-base.cc  |   4 +
 .../aarch64/aarch64-sve-builtins-base.def |   5 +
 .../aarch64/aarch64-sve-builtins-base.h   |   2 +
 gcc/config/aarch64/aarch64.h              |   1 +
 gcc/config/aarch64/iterators.md           |  18 +-
 .../aarch64/sve2/acle/asm/amax_f16.c  | 312 ++
 .../aarch64/sve2/acle/asm/amax_f32.c  | 312 ++
 .../aarch64/sve2/acle/asm/amax_f64.c  | 312 ++
 .../aarch64/sve2/acle/asm/amin_f16.c  | 311 +
 .../aarch64/sve2/acle/asm/amin_f32.c  | 312 ++
 .../aarch64/sve2/acle/asm/amin_f64.c  | 312 ++
 11 files changed, 1900 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amax_f64.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f32.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve2/acle/asm/amin_f64.c

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 4b33585d981..b189818d643 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -3071,6 +3071,10 @@ FUNCTION (svadrb, svadr_bhwd_impl, (0))
 FUNCTION (svadrd, svadr_bhwd_impl, (3))
 FUNCTION (svadrh, svadr_bhwd_impl, (1))
 FUNCTION (svadrw, svadr_bhwd_impl, (2))
+FUNCTION (svamax, cond_or_uncond_unspec_function,
+	  (UNSPEC_COND_FAMAX, UNSPEC_FAMAX))
+FUNCTION (svamin, cond_or_uncond_unspec_function,
+	  (UNSPEC_COND_FAMIN, UNSPEC_FAMIN))
 FUNCTION (svand, rtx_code_function, (AND, AND))
 FUNCTION (svandv, reduction, (UNSPEC_ANDV))
 FUNCTION (svasr, rtx_code_function, (ASHIFTRT, ASHIFTRT))
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.def b/gcc/config/aarch64/aarch64-sve-builtins-base.def
index 65fcba91586..95e04e4393d 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.def
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.def
@@ -379,3 +379,8 @@ DEF_SVE_FUNCTION (svzip2q, binary, all_data, none)
 DEF_SVE_FUNCTION (svld1ro, load_replicate, all_data, implicit)
 DEF_SVE_FUNCTION (svmmla, mmla, d_float, none)
 #undef REQUIRED_EXTENSIONS
+
+#define REQUIRED_EXTENSIONS AARCH64_FL_SVE | AARCH64_FL_FAMINMAX
+DEF_SVE_FUNCTION (svamax, binary_opt_single_n, all_float, mxz)
+DEF_SVE_FUNCTION (svamin, binary_opt_single_n, all_float, mxz)
+#undef REQUIRED_EXTENSIONS
diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.h b/gcc/config/aarch64/aarch64-sve-builtins-base.h
index 5bbf3569c4b..978cf7013f9 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.h
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.h
@@ -37,6 +37,8 @@ namespace aarch64_sve
 extern const function_base *const svadrd;
 extern const function_base *const svadrh;
 extern const function_base *const svadrw;
+extern const function_base *const svamax;
+extern const function_base *const svamin;
 extern const function_base *const svand;
 extern const function_base *const svandv;
 extern const function_base *const svasr;
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index ec8fde783b3..34f56a4b869 100644
--- a/gcc/co

[PATCH 1/2] gcov: branch, conds, calls in function summaries

2024-10-03 Thread Jørgen Kvalsvik
The gcov function summaries only output the covered lines, not the
branches and calls. Since function summaries are opt-in, it probably
makes sense to also include branch coverage, calls, and condition
coverage.

$ gcc --coverage -fpath-coverage hello.c -o hello
$ ./hello

Before:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 4

Function 'fn'
Lines executed:100.00% of 7

File 'hello.c'
Lines executed:100.00% of 11
Creating 'hello.c.gcov'

After:
$ gcov -f hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10

With conditions:
$ gcov -fg hello
Function 'main'
Lines executed:100.00% of 3
No branches
Calls executed:100.00% of 1
No conditions

Function 'fn'
Lines executed:100.00% of 7
Branches executed:100.00% of 4
Taken at least once:50.00% of 4
Condition outcomes covered:100.00% of 8
No calls

File 'hello.c'
Lines executed:100.00% of 10
Creating 'hello.c.gcov'

Lines executed:100.00% of 10
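
The per-function output above follows a simple rule: print a percentage
line when a counter is non-zero, otherwise print a "No ..." line. A
minimal standalone sketch of that rule follows. Summary, percent and
branch_call_summary are hypothetical names, and percent is a plain
two-decimal format rather than gcov's own format_gcov, which may round
differently at the extremes.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the counters a function summary needs.
struct Summary {
  int branches;
  int branches_executed;
  int branches_taken;
  int calls;
  int calls_executed;
};

// Plain "NN.NN%" formatting; callers only pass a non-zero bottom.
static std::string percent(int top, int bottom) {
  char buf[32];
  std::snprintf(buf, sizeof buf, "%.2f%%", 100.0 * top / bottom);
  return buf;
}

// Build the branch and call lines of a function summary.
std::string branch_call_summary(const Summary& s) {
  std::string out;
  if (s.branches) {
    out += "Branches executed:" + percent(s.branches_executed, s.branches)
           + " of " + std::to_string(s.branches) + "\n";
    out += "Taken at least once:" + percent(s.branches_taken, s.branches)
           + " of " + std::to_string(s.branches) + "\n";
  } else {
    out += "No branches\n";
  }
  if (s.calls)
    out += "Calls executed:" + percent(s.calls_executed, s.calls)
           + " of " + std::to_string(s.calls) + "\n";
  else
    out += "No calls\n";
  return out;
}
```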

gcc/ChangeLog:

* gcov.cc (generate_results): Count branches, conditions.
(function_summary): Output branch, calls, condition count.
---
 gcc/gcov.cc | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index e1334e75012..1ff36de9569 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -1688,11 +1688,19 @@ generate_results (const char *file_name)
   memset (&coverage, 0, sizeof (coverage));
   coverage.name = fn->get_name ();
   add_line_counts (flag_function_summary ? &coverage : NULL, fn);
-  if (flag_function_summary)
-   {
- function_summary (&coverage);
- fnotice (stdout, "\n");
-   }
+
+  if (!flag_function_summary)
+   continue;
+
+  for (const block_info& block : fn->blocks)
+   for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+ add_branch_counts (&coverage, arc);
+
+  for (const block_info& block : fn->blocks)
+   add_condition_counts (&coverage, &block);
+
+  function_summary (&coverage);
+  fnotice (stdout, "\n");
 }
 
   name_map needle;
@@ -2765,6 +2773,36 @@ function_summary (const coverage_info *coverage)
 {
   fnotice (stdout, "%s '%s'\n", "Function", coverage->name);
   executed_summary (coverage->lines, coverage->lines_executed);
+
+  if (coverage->branches)
+{
+  fnotice (stdout, "Branches executed:%s of %d\n",
+  format_gcov (coverage->branches_executed, coverage->branches, 2),
+  coverage->branches);
+  fnotice (stdout, "Taken at least once:%s of %d\n",
+  format_gcov (coverage->branches_taken, coverage->branches, 2),
+   coverage->branches);
+}
+  else
+fnotice (stdout, "No branches\n");
+
+  if (coverage->calls)
+fnotice (stdout, "Calls executed:%s of %d\n",
+format_gcov (coverage->calls_executed, coverage->calls, 2),
+coverage->calls);
+  else
+fnotice (stdout, "No calls\n");
+
+  if (flag_conditions)
+{
+  if (coverage->conditions)
+   fnotice (stdout, "Condition outcomes covered:%s of %d\n",
+format_gcov (coverage->conditions_covered,
+ coverage->conditions, 2),
+coverage->conditions);
+  else
+   fnotice (stdout, "No conditions\n");
+}
 }
 
 /* Output summary info for a file.  */
-- 
2.39.5



[PATCH 0/2] Prime path coverage to gcc/gcov

2024-10-03 Thread Jørgen Kvalsvik
This is both a ping and a minor update. A few of the patches from the
previous set have been merged, but the big feature still needs review.

Since then it has been quiet, but there are two notable changes:

1. The --prime-paths-{lines,source} flags take an optional argument to
   print covered or uncovered paths, or both. By default, uncovered
   paths are printed like before.
2. Fixed a bad vector access when independent functions share compiler
   generated statements. A reproducing case is in gcov-23.C which
   relied on printing the uncovered path of multiple destructors of
   static objects.

Jørgen Kvalsvik (2):
  gcov: branch, conds, calls in function summaries
  Add prime path coverage to gcc/gcov

 gcc/Makefile.in                        |    6 +-
 gcc/builtins.cc                        |    2 +-
 gcc/collect2.cc                        |    5 +-
 gcc/common.opt                         |   16 +
 gcc/doc/gcov.texi                      |  184 +++
 gcc/doc/invoke.texi                    |   36 +
 gcc/gcc.cc                             |    4 +-
 gcc/gcov-counter.def                   |    3 +
 gcc/gcov-io.h                          |    3 +
 gcc/gcov.cc                            |  531 ++-
 gcc/ipa-inline.cc                      |    2 +-
 gcc/passes.cc                          |    4 +-
 gcc/path-coverage.cc                   |  782 +
 gcc/prime-paths.cc                     | 2031 
 gcc/profile.cc                         |    6 +-
 gcc/selftest-run-tests.cc              |    1 +
 gcc/selftest.h                         |    1 +
 gcc/testsuite/g++.dg/gcov/gcov-22.C    |  170 ++
 gcc/testsuite/g++.dg/gcov/gcov-23-1.h  |    9 +
 gcc/testsuite/g++.dg/gcov/gcov-23-2.h  |    9 +
 gcc/testsuite/g++.dg/gcov/gcov-23.C    |   30 +
 gcc/testsuite/gcc.misc-tests/gcov-29.c |  869 ++
 gcc/testsuite/gcc.misc-tests/gcov-30.c |  869 ++
 gcc/testsuite/gcc.misc-tests/gcov-31.c |   35 +
 gcc/testsuite/gcc.misc-tests/gcov-32.c |   24 +
 gcc/testsuite/gcc.misc-tests/gcov-33.c |   27 +
 gcc/testsuite/gcc.misc-tests/gcov-34.c |   29 +
 gcc/testsuite/lib/gcov.exp             |  118 +-
 gcc/tree-profile.cc                    |   11 +-
 29 files changed, 5795 insertions(+), 22 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-22.C
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-1.h
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23-2.h
 create mode 100644 gcc/testsuite/g++.dg/gcov/gcov-23.C
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-29.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-30.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-31.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-32.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-33.c
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-34.c

-- 
2.39.5



Re: [PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-10-03 Thread Thomas Schwinge
Hi!

On 2024-09-06T11:30:06+0200, Richard Biener  wrote:
> On Thu, 5 Sep 2024, Richard Biener wrote:
>> The following enables single-lane loop SLP discovery for non-grouped stores
>> and adjusts vectorizable_store to properly handle those.

> I have now pushed this as r15-3509-gd34cda72098867

>> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
>> @@ -50,4 +50,5 @@ int main (void)
>>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */

For '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'),
I see:

PASS: gcc.dg/vect/slp-26.c (test for excess errors)
PASS: gcc.dg/vect/slp-26.c execution test
PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

gcc.dg/vect/slp-26.c: pattern found 2 times

..., so I suppose I'll apply the same change to 'amdgcn-*-*' as you did
to 'riscv_v'?


Grüße
 Thomas


Re: SVE intrinsics: Fold constant operands for svlsl.

2024-10-03 Thread Soumya AR
Ping.

> On 24 Sep 2024, at 2:00 PM, Soumya AR  wrote:
> 
> This patch implements constant folding for svlsl. Test cases have been added
> to check for the following cases:
> 
> Zero, merge, and don't care predication.
> Shift by 0.
> Shift by register width.
> Overflow shift on signed and unsigned integers.
> Shift on a negative integer.
> Maximum possible shift, eg. shift by 7 on an 8-bit integer.
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Soumya AR 
> 
> gcc/ChangeLog:
> 
> * config/aarch64/aarch64-sve-builtins-base.cc (svlsl_impl::fold):
> Try constant folding.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/aarch64/sve/const_fold_lsl_1.c: New test.
> 
> <0001-SVE-intrinsics-Fold-constant-operands-for-svlsl.patch>
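
For readers following along, the listed cases can be pictured with a
per-element scalar model of a logical shift left on 8-bit lanes. This is
only a sketch under the assumption that a shift amount of the element
width or more yields zero rather than wrapping; lsl_u8 and lsl_s8 are
hypothetical helper names and not code from the patch.

```cpp
#include <cstdint>

// Hypothetical per-lane model for unsigned 8-bit elements.
// Assumption: shifting by 8 or more clears the lane instead of wrapping.
std::uint8_t lsl_u8(std::uint8_t value, unsigned shift) {
  if (shift >= 8)
    return 0;
  return static_cast<std::uint8_t>(value << shift);
}

// Signed lanes reuse the unsigned model on the same bit pattern, so an
// "overflowing" shift simply drops the high bits.
std::int8_t lsl_s8(std::int8_t value, unsigned shift) {
  return static_cast<std::int8_t>(
      lsl_u8(static_cast<std::uint8_t>(value), shift));
}
```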



Re: [PATCH] tree-optimization/116566 - single lane SLP for VLA inductions

2024-10-03 Thread Richard Biener
On Wed, 2 Oct 2024, Andrew Pinski wrote:

> On Tue, Oct 1, 2024 at 5:04 AM Richard Biener  wrote:
> >
> > The following adds SLP support for vectorizing single-lane inductions
> > with variable length vectors.
> 
> This introduces a bootstrap failure on aarch64 due to a maybe
> uninitialized variable.
> 
> inlined from ‘bool vectorizable_induction(loop_vec_info,
> stmt_vec_info, gimple**, slp_tree, stmt_vector_for_cost*)’ at
> /home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc:10718:33:
> /home/linaro/src/upstream-gcc/gcc/gcc/gimple-fold.h:183:25: error:
> ‘vec_init’ may be used uninitialized [-Werror=maybe-uninitialized]
>   183 |   return gimple_convert (&gsi, false, GSI_CONTINUE_LINKING,
>   |  ~~~^~~
>   184 |  UNKNOWN_LOCATION, type, op);
>   |  ~~~
> /home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc: In function
> ‘bool vectorizable_induction(loop_vec_info, stmt_vec_info, gimple**,
> slp_tree, stmt_vector_for_cost*)’:
> /home/linaro/src/upstream-gcc/gcc/gcc/tree-vect-loop.cc:10281:17:
> note: ‘vec_init’ was declared here
> 10281 |   tree new_vec, vec_init, vec_step, t;
>   | ^~~~
> 
> 
> The issue is around line 10718:
>   if (init_node)
> vec_init = vect_get_slp_vect_def (init_node, ivn);
>   if (!nested_in_vect_loop
>   && step_mul
>   && !integer_zerop (step_mul))
> {
>   gcc_assert (invariant);
>   vec_def = gimple_convert (&init_stmts, step_vectype, vec_init);
> 
> it is hard to follow the code to see if it is actually uninitialized or not.

It is, I agree it's a bit hard to follow - I was torn between interweaving
the code like I did and copying the whole SLP handling for the VLA case 
...

I'll push the obvious fix.

Richard.


> Thanks,
> Andrew
> 
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > PR tree-optimization/116566
> > * tree-vect-loop.cc (vectorizable_induction): Handle single-lane
> > SLP for VLA vectors.
> > ---
> >  gcc/tree-vect-loop.cc | 247 --
> >  1 file changed, 189 insertions(+), 58 deletions(-)
> >
> > diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> > index a5a44613cb2..f5ecf0bdb80 100644
> > --- a/gcc/tree-vect-loop.cc
> > +++ b/gcc/tree-vect-loop.cc
> > @@ -10283,7 +10283,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >gimple *new_stmt;
> >gphi *induction_phi;
> >tree induc_def, vec_dest;
> > -  tree init_expr, step_expr;
> >poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> >unsigned i;
> >tree expr;
> > @@ -10369,7 +10368,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >  iv_loop = loop;
> >gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
> >
> > -  if (slp_node && !nunits.is_constant ())
> > +  if (slp_node && (!nunits.is_constant () && SLP_TREE_LANES (slp_node) != 1))
> >  {
> >/* The current SLP code creates the step value element-by-element.  */
> >if (dump_enabled_p ())
> > @@ -10387,7 +10386,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >return false;
> >  }
> >
> > -  step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
> > +  tree step_expr = STMT_VINFO_LOOP_PHI_EVOLUTION_PART (stmt_info);
> >gcc_assert (step_expr != NULL_TREE);
> >if (INTEGRAL_TYPE_P (TREE_TYPE (step_expr))
> >&& !type_has_mode_precision_p (TREE_TYPE (step_expr)))
> > @@ -10475,9 +10474,6 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> > [i2 + 2*S2, i0 + 3*S0, i1 + 3*S1, i2 + 3*S2].  */
> >if (slp_node)
> >  {
> > -  /* Enforced above.  */
> > -  unsigned int const_nunits = nunits.to_constant ();
> > -
> >/* The initial values are vectorized, but any lanes > group_size
> >  need adjustment.  */
> >slp_tree init_node
> > @@ -10499,11 +10495,12 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >
> >/* Now generate the IVs.  */
> >unsigned nvects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
> > -  gcc_assert ((const_nunits * nvects) % group_size == 0);
> > +  gcc_assert (multiple_p (nunits * nvects, group_size));
> >unsigned nivs;
> > +  unsigned HOST_WIDE_INT const_nunits;
> >if (nested_in_vect_loop)
> > nivs = nvects;
> > -  else
> > +  else if (nunits.is_constant (&const_nunits))
> > {
> >   /* Compute the number of distinct IVs we need.  First reduce
> >  group_size if it is a multiple of const_nunits so we get
> > @@ -10514,21 +10511,43 @@ vectorizable_induction (loop_vec_info loop_vinfo,
> >   nivs = least_common_multiple (group_sizep,
> > const_nunits) / const_nunits;
> > }
> > +  else
> > +   {
> 

[PATCH] Restore aarch64 bootstrap

2024-10-03 Thread Richard Biener
This zero-initializes vec_init to avoid a bogus maybe-uninitialized
diagnostic.

Built on x86_64-unknown-linux-gnu, pushed as obvious.
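
For context, the shape of code that tends to trigger this kind of
diagnostic looks roughly like the following reduced sketch. It is a
hypothetical stand-in, not the vectorizer code: the variable is assigned
under one condition and read under another, and the compiler cannot
always prove the two conditions are linked. Initializing at the
declaration, as the patch does with NULL_TREE, sidesteps the question.

```cpp
// Hypothetical reduction of the pattern; names are illustrative only.
int pick(bool have_init, bool need_convert, int init_value) {
  int vec_init = 0;  // initialized up front, mirroring the fix
  if (have_init)
    vec_init = init_value;
  if (need_convert)
    return vec_init * 2;  // read guarded by a different condition
  return -1;
}
```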

* tree-vect-loop.cc (vectorizable_induction): Initialize
vec_init.
---
 gcc/tree-vect-loop.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f5ecf0bdb80..730888f6275 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10278,7 +10278,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
   tree vec_def;
   edge pe = loop_preheader_edge (loop);
   basic_block new_bb;
-  tree new_vec, vec_init, vec_step, t;
+  tree new_vec, vec_init = NULL_TREE, vec_step, t;
   tree new_name;
   gimple *new_stmt;
   gphi *induction_phi;
-- 
2.43.0


[patch,testsuite] Fix gcc.c-torture/execute/ieee/pr108540-1.c

2024-10-03 Thread Georg-Johann Lay

gcc.c-torture/execute/ieee/pr108540-1.c obviously requires that double
is a 64-bit type, hence add pr108540-1.x as a corresponding filter.

Ok for trunk?

And is there a reason why we are still putting test cases in
these old parts of the testsuite that don't support dg-magic-comments
like

/* { dg-require-effective-target double64 } */

?

Johann
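
To illustrate why the test depends on a 64-bit double: it reads the sign
of a double through a 64-bit integer view of the same bytes. A minimal
sketch of that assumption follows. sign_bit_set is a hypothetical helper,
and it uses memcpy rather than the union from the test.

```cpp
#include <cstdint>
#include <cstring>

// The whole approach only makes sense when double is 64 bits wide; on a
// target with a 32-bit double this fails to compile.
static_assert(sizeof(double) == sizeof(std::int64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: view the bytes of d as a signed 64-bit integer and
// test whether it is negative, i.e. whether the sign bit is set.
bool sign_bit_set(double d) {
  std::int64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits < 0;
}
```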

--

testsuite - Fix gcc.c-torture/execute/ieee/pr108540-1.c

  PR testsuite/108540
gcc/testsuite/
* gcc.c-torture/execute/ieee/pr108540-1.c: Un-preprocess
__SIZE_TYPE__ and __INT64_TYPE__.
* gcc.c-torture/execute/ieee/pr108540-1.x: New file, requires double64.

diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.c b/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.c
index ebd4c502ee5..db094418a79 100644
--- a/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.c
@@ -1,7 +1,7 @@
 /* PR tree-optimization/108540 */
 
 __attribute__((noipa)) void
-bar (const char *cp, unsigned long size, char sign, int dsgn)
+bar (const char *cp, __SIZE_TYPE__ size, char sign, int dsgn)
 {
   if (__builtin_strcmp (cp, "ZERO") != 0 || size != 4 || sign != '-' || dsgn != 1)
 __builtin_abort ();
@@ -11,7 +11,7 @@ __attribute__((noipa)) void
 foo (int x, int ch, double d)
 {
   const char *cp = "";
-  unsigned long size = 0;
+  __SIZE_TYPE__ size = 0;
   char sign = '\0';
   switch (x)
 {
@@ -41,7 +41,7 @@ foo (int x, int ch, double d)
 	sign = '\0';
   if (ch == 'a' || ch == 'A')
 	{
-	  union U { long long l; double d; } u;
+	  union U { __INT64_TYPE__ l; double d; } u;
 	  int dsgn;
 	  u.d = d;
 	  if (u.l < 0)
diff --git a/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.x b/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.x
new file mode 100644
index 00000000000..06d93efeb99
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/ieee/pr108540-1.x
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { ! [check_effective_target_double64] } {
+return 1
+}
+
+return 0


[committed] libstdc++: Fix some warnings seen during bootstrap

2024-10-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/locale_facets_nonio.tcc (money_put::__do_get):
Ignore -Wformat warning for __ibm128 arguments.
* include/tr1/tuple (ignore): Ignore -Wunused warning.
---
 libstdc++-v3/include/bits/locale_facets_nonio.tcc | 3 +++
 libstdc++-v3/include/tr1/tuple                    | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 5fddc1e3b26..53553d113b2 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -637,6 +637,8 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11
 
 #if defined _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT \
   && defined __LONG_DOUBLE_IEEE128__
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat" // '%Lf' expects 'long double'
 extern "C"
 __typeof__(__builtin_snprintf) __glibcxx_snprintfibm128 __asm__("snprintf");
 
@@ -671,6 +673,7 @@ __typeof__(__builtin_snprintf) __glibcxx_snprintfibm128 __asm__("snprintf");
       return __intl ? _M_insert<true>(__s, __io, __fill, __digits)
		    : _M_insert<false>(__s, __io, __fill, __digits);
 }
+#pragma GCC diagnostic pop
 #endif
 
 _GLIBCXX_END_NAMESPACE_LDBL_OR_CXX11
diff --git a/libstdc++-v3/include/tr1/tuple b/libstdc++-v3/include/tr1/tuple
index b5c62b585a9..f66090d3631 100644
--- a/libstdc++-v3/include/tr1/tuple
+++ b/libstdc++-v3/include/tr1/tuple
@@ -423,7 +423,7 @@ namespace tr1
   // TODO: Put this in some kind of shared file.
   namespace
   {
-_Swallow_assign ignore;
+_Swallow_assign ignore  __attribute__((__unused__));
   } // anonymous namespace
 }
 
-- 
2.46.1



[committed] libstdc++: Make Unicode utils work with Parallel Mode

2024-10-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/unicode.h (__unicode::__is_incb_linker): Use
_GLIBCXX_STD_A namespace for std::find.
---
 libstdc++-v3/include/bits/unicode.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h
index a14a17c5dfc..1232f60037c 100644
--- a/libstdc++-v3/include/bits/unicode.h
+++ b/libstdc++-v3/include/bits/unicode.h
@@ -625,7 +625,7 @@ inline namespace __v15_1_0
   {
 const auto __end = std::end(__incb_linkers);
 // Array is small enough that linear search is faster than binary search.
-return std::find(__incb_linkers, __end, __c) != __end;
+return _GLIBCXX_STD_A::find(__incb_linkers, __end, __c) != __end;
   }
 
   // @pre c <= 0x10
-- 
2.46.1



[committed] libstdc++: Fix -Wdeprecated-declarations warning for Parallel Mode [PR116944]

2024-10-03 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk.

-- >8 --

The pragmas to disable warnings need to be moved before the first use of
the deprecated classes.
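
A small standalone sketch of the placement issue, using a made-up
deprecated type (old_base, first_user and second_user are hypothetical
names): the push/ignored pair has to come before the first declaration
that uses the deprecated entity, otherwise that first use still warns.

```cpp
// Hypothetical deprecated base, standing in for the *nary_function types.
struct [[deprecated("use something newer")]] old_base {
  int tag = 0;
};

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdeprecated-declarations"

// First use of the deprecated type: already inside the suppressed region.
struct first_user : old_base {
  int get() const { return tag + 1; }
};

// Later uses are covered as well, until the matching pop.
struct second_user : old_base {
  int get() const { return tag + 2; }
};

#pragma GCC diagnostic pop
```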

libstdc++-v3/ChangeLog:

PR libstdc++/116944
* include/parallel/base.h: Move diagnostic pragmas earlier.
---
 libstdc++-v3/include/parallel/base.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libstdc++-v3/include/parallel/base.h b/libstdc++-v3/include/parallel/base.h
index fcbcc1e0b99..4341e26baf0 100644
--- a/libstdc++-v3/include/parallel/base.h
+++ b/libstdc++-v3/include/parallel/base.h
@@ -150,6 +150,9 @@ namespace __gnu_parallel
 max(const _Tp& __a, const _Tp& __b)
 { return (__a > __b) ? __a : __b; }
 
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
+
   /** @brief Constructs predicate for equality from strict weak
*  ordering predicate
*/
@@ -166,9 +169,6 @@ namespace __gnu_parallel
   { return !_M_comp(__a, __b) && !_M_comp(__b, __a); }
 };
 
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wdeprecated-declarations" // *nary_function
-
   /** @brief Similar to std::unary_negate,
*  but giving the argument types explicitly. */
   template
-- 
2.46.1



Re: [PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-10-03 Thread Richard Biener
On Thu, 3 Oct 2024, Thomas Schwinge wrote:

> Hi!
> 
> On 2024-09-06T11:30:06+0200, Richard Biener  wrote:
> > On Thu, 5 Sep 2024, Richard Biener wrote:
> >> The following enables single-lane loop SLP discovery for non-grouped stores
> >> and adjusts vectorizable_store to properly handle those.
> 
> > I have now pushed this as r15-3509-gd34cda72098867
> 
> >> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
> >> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
> >> @@ -50,4 +50,5 @@ int main (void)
> >>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
> >>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
> >>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
> >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */
> 
> For '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'),
> I see:
> 
> PASS: gcc.dg/vect/slp-26.c (test for excess errors)
> PASS: gcc.dg/vect/slp-26.c execution test
> PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
> [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> 
> gcc.dg/vect/slp-26.c: pattern found 2 times
> 
> ..., so I suppose I'll apply the same change to 'amdgcn-*-*' as you did
> to 'riscv_v'?

I guess yes, I don't remember exactly the reason but IIRC it's about the
unsigned division which gcn might also be able to do - the 32817
value is explicitly excluded from pattern recognition.  We don't have
an effective target for unsigned [short] integer division.

Richard.

> 
> Grüße
>  Thomas
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

Re: [PATCH 2/2] gcc: make Valgrind errors fatal with --enable-checking=valgrind

2024-10-03 Thread Eric Gallager
On Wed, Oct 2, 2024 at 10:43 PM Sam James  wrote:
>
> Valgrind doesn't error out by default which means bootstrap issues like
> in PR116945 can easily be missed: pass --error-exitcode=1 to handle this.
>
> gcc/ChangeLog:
> PR other/116945
> PR other/116947
>
> * gcc.cc (execute): Pass --error-exitcode=2 to Valgrind.

There's a discrepancy here with the values: it's 2 at this point in
the ChangeLog, but 1 in the actual code...
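
For reference, the argv layout that the quoted patch builds can be sketched as below. This is only an illustration: wrap_with_valgrind is a hypothetical helper, not the code in gcc.cc, and it uses std::vector instead of XALLOCAVEC. It shows why the allocation grows to argc + 4: three prefix entries, the original arguments, and the terminating NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of gcc.cc): prefix a NULL-terminated
// argument vector with the Valgrind path and its two options.
inline std::vector<const char*>
wrap_with_valgrind(const char* const* argv_in, const char* valgrind_path)
{
  // Count the original arguments, excluding the terminating NULL.
  std::size_t argc = 0;
  while (argv_in[argc] != nullptr)
    ++argc;

  std::vector<const char*> argv;
  argv.reserve(argc + 4);  // 3 prefix entries + argc + NULL terminator
  argv.push_back(valgrind_path);
  argv.push_back("-q");
  argv.push_back("--error-exitcode=1");  // value taken from the diff, not the ChangeLog
  for (std::size_t i = 0; i < argc; ++i)
    argv.push_back(argv_in[i]);
  argv.push_back(nullptr);
  return argv;
}
```

Whichever exit code is chosen, the ChangeLog entry and the string literal need to agree.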

> ---
>  gcc/gcc.cc | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/gcc.cc b/gcc/gcc.cc
> index 16fed46fb35f..cb3c0be77d31 100644
> --- a/gcc/gcc.cc
> +++ b/gcc/gcc.cc
> @@ -3402,12 +3402,13 @@ execute (void)
>for (argc = 0; commands[i].argv[argc] != NULL; argc++)
> ;
>
> -  argv = XALLOCAVEC (const char *, argc + 3);
> +  argv = XALLOCAVEC (const char *, argc + 4);
>
>argv[0] = VALGRIND_PATH;
>argv[1] = "-q";
> -  for (j = 2; j < argc + 2; j++)
> -   argv[j] = commands[i].argv[j - 2];
> +  argv[2] = "--error-exitcode=1";
> +  for (j = 3; j < argc + 3; j++)
> +   argv[j] = commands[i].argv[j - 3];
>argv[j] = NULL;
>
>commands[i].argv = argv;
> --
> 2.46.2
>


Re: [PATCH] libstdc++: Unroll loop in load_bytes function

2024-10-03 Thread Dmitry Ilvokhin
On Wed, Oct 02, 2024 at 08:15:38PM +0100, Jonathan Wakely wrote:
> On Wed, 2 Oct 2024 at 19:25, Jonathan Wakely  wrote:
> >
> > On Wed, 2 Oct 2024 at 19:16, Jonathan Wakely  wrote:
> > >
> > > On Wed, 2 Oct 2024 at 19:15, Dmitry Ilvokhin  wrote:
> > > >
> > > > Instead of looping over every byte of the tail, unroll loop manually
> > > > using switch statement, then compilers (at least GCC and Clang) will
> > > > generate a jump table [1], which is faster on a microbenchmark [2].
> > > >
> > > > [1]: https://godbolt.org/z/aE8Mq3j5G
> > > > [2]: https://quick-bench.com/q/ylYLW2R22AZKRvameYYtbYxag24
> > > >
> > > > libstdc++-v3/ChangeLog:
> > > >
> > > > * libstdc++-v3/libsupc++/hash_bytes.cc (load_bytes): unroll
> > > >   loop using switch statement.
> > > >
> > > > Signed-off-by: Dmitry Ilvokhin 
> > > > ---
> > > >  libstdc++-v3/libsupc++/hash_bytes.cc | 27 +++
> > > >  1 file changed, 23 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/libstdc++-v3/libsupc++/hash_bytes.cc 
> > > > b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > index 3665375096a..294a7323dd0 100644
> > > > --- a/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > +++ b/libstdc++-v3/libsupc++/hash_bytes.cc
> > > > @@ -50,10 +50,29 @@ namespace
> > > >load_bytes(const char* p, int n)
> > > >{
> > > >  std::size_t result = 0;
> > > > ---n;
> > > > -do
> > > > > -  result = (result << 8) + static_cast<unsigned char>(p[n]);
> > > > -while (--n >= 0);
> > >
> > > Don't we still need to loop, for the case where n >= 8? Otherwise we
> > > only hash the first 8 bytes.
> >
> > Ah, but it's only ever called with load_bytes(end, len & 0x7)
> 
> It seems to be slower for short strings, but a win overall:
> https://quick-bench.com/q/xhh5m1akZzwUAXRiYJ17z9FASc8
> This measures different lengths, and tries to ensure that the string
> contents aren't treated as constant.

Nice find, thanks for spotting it.

In retrospect, I think the case n = 7 is the best one for the unrolled
switch version and the worst one for the loop version, because it
maximizes the overhead from loop control flow instructions. The case
n = 1 is the exact opposite: the loop control flow overhead is minimal.

If the explanation above is correct, then I'm not sure we can do
anything better for small cases. It seems this lunch is not completely
free.
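
To make the comparison concrete, here is a self-contained sketch of the two forms side by side. It is not the code from hash_bytes.cc: it assumes std::uint64_t as the result type (so the shifts are valid even where std::size_t is 32 bits), and the byte lambda and check_forms_agree are hypothetical helpers. Both forms widen each byte through unsigned char, as the original loop does.

```cpp
#include <cassert>
#include <cstdint>

// Loop form, mirroring the removed lines; only valid for 1 <= n <= 7.
inline std::uint64_t
load_bytes_loop(const char* p, int n)
{
  std::uint64_t result = 0;
  --n;
  do
    result = (result << 8) + static_cast<unsigned char>(p[n]);
  while (--n >= 0);
  return result;
}

// Switch form: one jump into a chain of straight-line ORs.
inline std::uint64_t
load_bytes_switch(const char* p, int n)
{
  // Hypothetical local helper: widen one byte without sign extension.
  auto byte = [p](int i) {
    return static_cast<std::uint64_t>(static_cast<unsigned char>(p[i]));
  };

  std::uint64_t result = 0;
  switch (n & 7)
    {
    case 7: result |= byte(6) << 48; /* fall through */
    case 6: result |= byte(5) << 40; /* fall through */
    case 5: result |= byte(4) << 32; /* fall through */
    case 4: result |= byte(3) << 24; /* fall through */
    case 3: result |= byte(2) << 16; /* fall through */
    case 2: result |= byte(1) << 8;  /* fall through */
    case 1: result |= byte(0);
    }
  return result;
}

// Check that both forms agree for every tail length the caller can pass.
inline void
check_forms_agree()
{
  const char buf[7] = {'\x01', '\x80', '\x7f', '\xff', '\x10', '\x20', '\xfe'};
  for (int n = 1; n <= 7; ++n)
    assert(load_bytes_loop(buf, n) == load_bytes_switch(buf, n));
}
```

For n = 1 the loop body runs once and the back-edge is not taken, so there is little for the switch to save; for n = 7 the switch replaces seven iterations of compare-and-branch with a single indirect jump.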

> 
> >
> >
> > >
> > > > +switch(n & 7)
> > > > +  {
> > > > +  case 7:
> > > > +   result |= std::size_t(p[6]) << 48;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 6:
> > > > +   result |= std::size_t(p[5]) << 40;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 5:
> > > > +   result |= std::size_t(p[4]) << 32;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 4:
> > > > +   result |= std::size_t(p[3]) << 24;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 3:
> > > > +   result |= std::size_t(p[2]) << 16;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 2:
> > > > +   result |= std::size_t(p[1]) << 8;
> > > > +   [[gnu::fallthrough]];
> > > > +  case 1:
> > > > +   result |= std::size_t(p[0]);
> > > > +  };
> > > >  return result;
> > > >}
> > > >
> > > > --
> > > > 2.43.5
> > > >
>

Forgot to CC mailing lists. Sorry about that.


[patch,avr,applied] Make gcc.dg/c23-stdarg-9.c work

2024-10-03 Thread Georg-Johann Lay

gcc.dg/c23-stdarg-9.c failed because the code requested too
much stack memory.  With less stack allocated, this test passes.
Applied as obvious.

Johann

--

AVR: Make gcc.dg/c23-stdarg-9.c work.

gcc/testsuite/
* gcc.dg/c23-stdarg-9.c (struct S) [AVR]: Only use int a[500].

diff --git a/gcc/testsuite/gcc.dg/c23-stdarg-9.c 
b/gcc/testsuite/gcc.dg/c23-stdarg-9.c

index e2839e7e2cd..068fe3d4c7a 100644
--- a/gcc/testsuite/gcc.dg/c23-stdarg-9.c
+++ b/gcc/testsuite/gcc.dg/c23-stdarg-9.c
@@ -5,7 +5,12 @@

 #include 

+#ifdef __AVR__
+/* AVR doesn't have that much stack... */
+struct S { int a[500]; };
+#else
 struct S { int a[1024]; };
+#endif

 int
 f1 (...)


[PATCH] testsuite: Fix tail_call and musttail effective targets [PR116080]

2024-10-03 Thread Christophe Lyon
Some of the musttail tests (eg musttail7.c) fail on arm-eabi because
check_effective_target_musttail passes, but the actual code in the test
is rejected.

The reason is that on arm-eabi with the default configuration, the
compiler targets armv4t for which TARGET_INTERWORK is true, making
arm_function_ok_for_sibcall reject a tail-call candidate if
TREE_ASM_WRITTEN (decl) is false.

For more recent architecture versions, TARGET_INTERWORK is false,
hence the problem was not seen on all arm configurations.

musttail7.c is in turn rejected because f2 is recursive, so
TREE_ASM_WRITTEN is false.

However, the same code used in check_effective_target_musttail is not
recursive and the function body for foo has TREE_ASM_WRITTEN == true.

The simplest fix is to remove the (empty) body for foo () in
check_effective_target_musttail.  For consistency, do the same with
check_effective_target_tail_call.

gcc/testsuite/ChangeLog:
PR testsuite/116080
* lib/target-supports.exp (check_effective_target_tail_call):
Remove foo's body.
(check_effective_target_musttail): Likewise.
---
 gcc/testsuite/lib/target-supports.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f92f7f1af9c..d8cdac97b6f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -12883,7 +12883,7 @@ proc check_effective_target_frame_pointer_for_non_leaf 
{ } {
 # most trivial type.
 proc check_effective_target_tail_call { } {
 return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
-   __attribute__((__noipa__)) void foo (void) { }
+   __attribute__((__noipa__)) void foo (void);
__attribute__((__noipa__)) void bar (void) { foo(); }
 } {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed 
dump.
 }
@@ -12893,7 +12893,7 @@ proc check_effective_target_tail_call { } {
 # is supported at -O0.
 proc check_effective_target_musttail { } {
 return [check_no_messages_and_pattern musttail ",SIBCALL" rtl-expand {
-   __attribute__((__noipa__)) void foo (void) { }
+   __attribute__((__noipa__)) void foo (void);
__attribute__((__noipa__)) void bar (void) { [[gnu::musttail]] return 
foo(); }
 } {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
 }
-- 
2.34.1



Re: [PATCH] testsuite: Fix tail_call and musttail effective targets [PR116080]

2024-10-03 Thread Andi Kleen
On Thu, Oct 03, 2024 at 01:48:35PM +, Christophe Lyon wrote:
> Some of the musttail tests (eg musttail7.c) fail on arm-eabi because
> check_effective_target_musttail passes, but the actual code in the test
> is rejected.
 
Looks good to me. Thanks.

-Andi


Re: [PATCH v1] Add -ftime-report-wall

2024-10-03 Thread David Malcolm
On Thu, 2024-10-03 at 00:37 -0700, Andi Kleen wrote:
> > Note that if the user requests SARIF output e.g. with
> >   -fdiagnostics-format=sarif-stderr
> > then any timevar data from -ftime-report is written in JSON form as
> > part of the SARIF, rather than in text form to stderr (see
> > 75d623946d4b6ea80a777b789b116d4b4a2298dc).
> > 
> > I see that the proposed patch leaves the user and sys stats as
> > zero,
> > and conditionalizes what's printed for text output as part of
> > timer::print.  Should it also do something similar in
> > make_json_for_timevar_time_def for the json output, and not add the
> > properties for "user" and "sys" if the data hasn't been gathered?
> 
> > Hope I'm reading the patch correctly.
> 
> Yes that's right.

Thanks.

> 
> I mainly adjusted the human output for cosmetic reasons.
> 
> For machine readable i guess it is better to have a stable schema 
> and not skip fields to avoid pain for parsers. So I left it alone.

The only consumer I know of for the JSON time report data is in the
integration tests I wrote for -fanalyzer, which assumes that all fields
are present when printing, and then goes on to use the "user" times for
summarizing; see this commit FWIW:
https://github.com/davidmalcolm/gcc-analyzer-integration-tests/commit/5420ce968e6eae886e61486555b54fd460e0d35f

I'm not planning to use -ftime-report-wall, but given that my
summarization code is using the "user" times, I think I'd prefer that
the property be absent rather than contain a bogus value that I
might mistakenly use.  The existing docs do say: "The precise format of
this JSON data is subject to change":
https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report
and there isn't a formal schema for this written down anywhere.

Hope this is constructive
Dave
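
A consumer that tolerates missing properties could look roughly like the sketch below. It is only an illustration: TimevarProps and sum_property are hypothetical names, and the map stands in for one parsed JSON object per timevar. Because the format has no written schema, the property names used here are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed JSON object of timing properties.
using TimevarProps = std::map<std::string, double>;

// Sum the property NAME over all entries that carry it.  Returns false,
// leaving TOTAL untouched, when no entry has the property at all, so the
// caller can tell "not gathered" apart from "gathered and zero".
inline bool
sum_property(const std::vector<TimevarProps>& entries,
             const std::string& name, double& total)
{
  bool found = false;
  double sum = 0.0;
  for (const TimevarProps& props : entries)
    {
      TimevarProps::const_iterator it = props.find(name);
      if (it != props.end())
        {
          sum += it->second;
          found = true;
        }
    }
  if (found)
    total = sum;
  return found;
}
```

With the property left out, a summary keyed on "user" reports that no data is available. With a zero left in, it would silently report a total of 0.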



Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-10-03 Thread Richard Sandiford
Soumya AR  writes:
> From 7fafcb5e0174c56205ec05406c9a412196ae93d3 Mon Sep 17 00:00:00 2001
> From: Soumya AR 
> Date: Thu, 3 Oct 2024 11:53:07 +0530
> Subject: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
>
> This patch uses the FSCALE instruction provided by SVE to implement the
> standard ldexp family of functions.
>
> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
> following code:
>
> float
> test_ldexpf (float x, int i)
> {
>   return __builtin_ldexpf (x, i);
> }
>
> double
> test_ldexp (double x, int i)
> {
>   return __builtin_ldexp(x, i);
> }
>
> GCC Output:
>
> test_ldexpf:
>   b ldexpf
>
> test_ldexp:
>   b ldexp
>
> Since SVE has support for an FSCALE instruction, we can use this to process
> scalar floats by moving them to a vector register and performing an fscale 
> call,
> similar to how LLVM tackles an ldexp builtin as well.
>
> New Output:
>
> test_ldexpf:
>   fmov s31, w0
>   ptrue p7.b, all
>   fscale z0.s, p7/m, z0.s, z31.s
>   ret
>
> test_ldexp:
>   sxtw x0, w0
>   ptrue p7.b, all
>   fmov d31, x0
>   fscale z0.d, p7/m, z0.d, z31.d
>   ret
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?

Could we also use the .H form for __builtin_ldexpf16?

I suppose:

> @@ -2286,7 +2289,8 @@
>(VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
>(V8QI "VNx8BI") (V16QI "VNx16BI")
>(V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
> -  (V4SI "VNx4BI") (V2DI "VNx2BI")])
> +  (V4SI "VNx4BI") (V2DI "VNx2BI")
> +  (SF "VNx4BI") (DF "VNx2BI")])

...this again raises the question what we should do for predicate
modes when the data mode isn't a natural SVE mode.  That came up
recently in relation to V1DI in the popcount patch, and for reductions
in the ANDV etc. patch.

Three obvious options are:

(1) Use the nearest SVE mode with a full ptrue (as the patch does).
(2) Use the nearest SVE mode with a 128-bit ptrue.
(3) Add new modes V16BI, V8BI, V4BI, V2BI, and V1BI.  (And possibly BI
for scalars.)

The problem with (1) is that, as Tamar pointed out, it doesn't work
properly with reductions.  It also isn't safe for this patch (without
fast-mathy options) because of FP exceptions.  Although writing to
a scalar FP register zeros the upper bits, and so gives a "safe" value
for this particular operation, nothing guarantees that all SF and DF
values have this zero-extended form.  They could come from subregs of
Advanced SIMD or SVE vectors.  The ABI also doesn't guarantee that
incoming SF and DF values are zero-extended.

(2) would be safe, but would mean that we continue to have an nunits
disagreement between the data mode and the predicate mode.  This would
prevent operations being described in generic RTL in future.

(3) is probably the cleanest representational approach, but has the problem
that we cannot store a fixed-length portion of an SVE predicate.
We would have to load and store the modes via other register classes.
(With PMOV we could use scalar FP loads and stores, but otherwise
we'd probably need secondary memory reloads.)  That said, we could
tell the RA to spill in a full predicate mode, so this shouldn't be
a problem unless the modes somehow get exposed to gimple or frontends.

WDYT?

Richard

>  ;; ...and again in lower case.
>  (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> new file mode 100644
> index 000..251b4ef9188
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +float
> +test_ldexpf (float x, int i)
> +{
> +  return __builtin_ldexpf (x, i);
> +}
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, 
> z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> +
> +double
> +test_ldexp (double x, int i)
> +{
> +  return __builtin_ldexp (x, i);
> +} 
> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.d, p[0-7]/m, 
> z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */


Re: [PATCH] Aarch64: Define WIDEST_HARDWARE_FP_SIZE

2024-10-03 Thread Richard Sandiford
Eric Botcazou  writes:
> Hi,
>
> the macro is documented like this in the internal manual:
>
>  -- Macro: WIDEST_HARDWARE_FP_SIZE
>  A C expression for the size in bits of the widest floating-point
>  format supported by the hardware.  If you define this macro, you
>  must specify a value less than or equal to mode precision of the
>  mode used for C type 'long double' (from hook
>  'targetm.c.mode_for_floating_type' with argument
>  'TI_LONG_DOUBLE_TYPE').  If you do not define this macro, mode
>  precision of the mode used for C type 'long double' is the default.
>
> AArch64 uses 128-bit TFmode for long double but, as far as I know, no FPU 
> implemented in hardware supports it.
>
> WIDEST_HARDWARE_FP_SIZE is taken into account in exactly two places:
>   - in libgcc for the implementation of float[uns]ti{sd}f,
>   - in the Ada front-end to cap the size clauses of floating-point types.
>
> The effect of the change on the first place can be seen by running nm on 
> libgcc/_floatdisf.o (which implements floattisf for Aarch64), from:
>  U __addtf3
>  U __floatditf
>  T __floattisf
>  U __floatunditf
>  U __multf3
>  U __trunctfsf2
> to just
>  T __floattisf

Oops!  Guess no-one looked in detail at the implementation, given that
it did work (if via an incredibly convoluted route).  The new implementation
looks much better...

> The effect of the change on the second place can be seen on the attached Ada 
> testcase, which fails without it and passes with it.
>
> Bootstrapped/regtested on Aarch64/Linux, OK for the mainline?
>
>
> 2024-10-01  Eric Botcazou  
>
>   * config/aarch64/aarch64.h (WIDEST_HARDWARE_FP_SIZE): Define to 64.
>
>
> 2024-10-01  Eric Botcazou  
>
>   * gnat.dg/specs/size_clause6.ads: New test.

OK, thanks.

Richard


Re: [PATCH] [PR86710][PR116826] match.pd: Fold logarithmic identities.

2024-10-03 Thread Jennifer Schmitz


> On 1 Oct 2024, at 14:27, Richard Biener  wrote:
> 
> On Tue, 1 Oct 2024, Jennifer Schmitz wrote:
> 
>> This patch implements 4 rules for logarithmic identities in match.pd
>> under -funsafe-math-optimizations:
>> 1) logN(1.0/a) -> -logN(a). This avoids the division instruction.
>> 2) logN(C/a) -> logN(C) - logN(a), where C is a real constant. Same as 1).
>> 3) logN(a) + logN(b) -> logN(a*b). This reduces the number of calls to
>> log function.
>> 4) logN(a) - logN(b) -> logN(a/b). Same as 3).
>> Tests were added for float, double, and long double.
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu and
>> x86_64-linux-gnu, no regression.
>> Additionally, SPEC 2017 fprate was run. While the transform does not seem
>> to be triggered, we also see no non-noise impact on performance.
>> OK for mainline?
> 
> Since log can set errno, the builtins affect global memory and
> thus have VDEFs; this poses issues for match.pd, which does not assign
> new VDEFs upon materializing the result, esp. for the case where
> you duplicate a call.  There's a similar issue for -frounding-math
> where intermediate FP status changes can be lost.  match.pd simply
> follows the SSA use-def chains without regarding memory side-effects.
> 
> The transforms are guarded by flag_unsafe_math_optimizations but here
> I think we need !HONOR_SIGN_DEPENDENT_ROUNDING, !flag_trapping_math
> (exception state might be different for logN(a) - logN(b) -> logN(a/b),
> at least WRT INEXACT?), and !flag_errno_math (because of the VDEFs).
> 
> +  /* Simplify logN(C/a) into logN(C)-logN(a).  */
> +  (simplify
> +   (logs (rdiv:s REAL_CST@0 @1))
> +(minus (logs @0) (logs @1)))
> 
> I think you want
> 
> (minus (logs! @0) (logs @1))
> 
> here to make sure we constant-fold.
> 
> +  (simplify
> +   (minus (logs:s @0) (logs:s @1))
> +(logs (rdiv @0 @1
> 
> I think that's somewhat dangerous for @1 == 0 given log for
> zero arg results in -HUGE_VAL but an FP division by zero gives a NaN.
> I'm not exactly sure whether !HONOR_INFINITIES && !HONOR_NANS
> is good enough here.
> 
> Your testcases probably all trigger during GENERIC folding,
> bypassing the VDEF issue - you might want to try assigning
> the comparison operands to temporaries to run into the actual
> issues.
Dear Richard,
Thanks for the review and suggesting the additional flags. I added 
- !HONOR_SIGN_DEPENDENT_ROUNDING
- !flag_trapping_math
- !flag_errno_math
- !HONOR_INFINITIES
- !HONOR_NANS
as guard before the patterns.
Can we add anything else to account for HUGE_VAL or will !HONOR_INFINITIES && 
!HONOR_NANS be enough? Or do you have a suggestion how I can check this?
I validated again on aarch64 and x86_64.
Best,
Jennifer
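
One way to see why rules 3) and 4) need the extra guards is to evaluate both sides in plain double arithmetic. The sketch below is not part of the patch or its testcase; LogBothWays and the two helpers are hypothetical names. It only shows that the combined form can overflow or underflow in the intermediate a*b or a/b while the separate form stays finite.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result pair: the expression before and after the fold.
struct LogBothWays
{
  double separate;  // log(a) +/- log(b)
  double combined;  // log(a * b) or log(a / b)
};

// log(a) + log(b) versus log(a * b).
inline LogBothWays
log_sum_both_ways(double a, double b)
{
  volatile double product = a * b;  // forced to double; may overflow to +inf
  LogBothWays r;
  r.separate = std::log(a) + std::log(b);
  r.combined = std::log(product);
  return r;
}

// log(a) - log(b) versus log(a / b).
inline LogBothWays
log_diff_both_ways(double a, double b)
{
  volatile double quotient = a / b;  // forced to double; may underflow to 0
  LogBothWays r;
  r.separate = std::log(a) - std::log(b);
  r.combined = std::log(quotient);
  return r;
}
```

For ordinary operands such as 2.0 and 8.0 the two sides agree to within rounding. For a = b = 1e200 the product overflows, so the combined form yields +inf. For a = 1e-200 and b = 1e200 the quotient underflows to zero, so the combined form yields -inf. In both cases the separate form is finite.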


0001-PR86710-PR116826-match.pd-Fold-logarithmic-identitie.patch
Description: Binary data

> 
> Thanks,
> Richard.
> 
> 
>> Signed-off-by: Jennifer Schmitz 
>> 
>> gcc/
>>  * match.pd: Fold logN(1.0/a) -> -logN(a),
>>  logN(C/a) -> logN(C) - logN(a), logN(a) + logN(b) -> logN(a*b),
>>  and logN(a) - logN(b) -> logN(a/b).
>> 
>> gcc/testsuite/
>>  * gcc.dg/tree-ssa/log_ident.c: New test.
>> 
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)





[PATCH] rs6000, fix test builtins-1-p10-runnable.c

2024-10-03 Thread Carl Love

GCC maintainers:

The builtins-1-p10-runnable.c test has debugging inadvertently enabled.
The test uses #ifdef to enable/disable the debugging.  Unfortunately,
#define DEBUG was set to 0 with the intent of disabling debugging and
enabling the call to abort in case of error.  Because the check is an
#ifdef, the #define should have been removed instead.
Additionally, a change to the expected output that was made for testing
purposes was not removed.  Hence, the test prints that there was an
error instead of calling abort.  The result is that the test does not get
reported as failing.
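
The #ifdef versus #if distinction can be reproduced in isolation. The snippet below is a standalone sketch, not an excerpt from the test; the two constant names are made up for illustration.

```cpp
#include <cassert>

#define DEBUG 0

// #ifdef only asks whether the macro is defined; its value is ignored.
#ifdef DEBUG
constexpr bool ifdef_branch_taken = true;
#else
constexpr bool ifdef_branch_taken = false;
#endif

// #if evaluates the macro's value, so 0 selects the #else branch.
#if DEBUG
constexpr bool if_branch_taken = true;
#else
constexpr bool if_branch_taken = false;
#endif

static_assert(ifdef_branch_taken, "#ifdef DEBUG is true even when DEBUG is 0");
static_assert(!if_branch_taken, "#if DEBUG is false when DEBUG is 0");
```

So with "#define DEBUG 0" in place, code guarded by #ifdef DEBUG is still compiled in, which is why the define has to be removed rather than set to 0.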


This patch removes the #define DEBUG to enable the call to abort and 
restores the expected output to the correct value.  The patch was tested 
on a Power 10 without the #define DEBUG to verify that the test does 
fail with the incorrect expected value.  The correct expected value was 
then restored.  The test reports 19 expected passes and no errors.


Please let me know if this patch is acceptable for mainline. Thanks.

   Carl


---

rs6000, fix test builtins-1-p10-runnable.c

The test has two issues:

1) The test should execute abort() if an error is found.
However, the test contains a #define DEBUG 0, which actually enables the
error prints instead of executing abort(), because the debug code is
protected by an #ifdef, not an #if.  The #define DEBUG needs to be removed
so the test will abort on an error.

2) The vec_i_expected output was tweaked to test that it would fail.
The test value was not removed.

By removing the #define DEBUG, the test fails and reports 1 failure.
Removing the intentionally wrong expected value results in the test
passing with no errors as expected.

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-1-p10-runnable.c: Remove #define
    DEBUG.    Replace vec_i_expected value with correct value.
---
 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

index 222c8b3a409..3e8a1c736e3 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c
@@ -25,8 +25,6 @@
 #include 
 #include 

-#define DEBUG 0
-
 #ifdef DEBUG
 #include 
 #endif
@@ -281,8 +279,7 @@ int main()
 /* Signed word multiply high */
 i_arg1 = (vector int){ 2147483648, 2147483648, 2147483648, 
2147483648 };

 i_arg2 = (vector int){ 2, 3, 4, 5};
-    //    vec_i_expected = (vector int){-1, -2, -2, -3};
-    vec_i_expected = (vector int){1, -2, -2, -3};
+    vec_i_expected = (vector int){-1, -2, -2, -3};

 vec_i_result = vec_mulh (i_arg1, i_arg2);

--
2.46.0




[PATCH v4] RISC-V: Implement TARGET_CAN_INLINE_P

2024-10-03 Thread Yangyu Chen
Currently, we lack support for TARGET_CAN_INLINE_P on the RISC-V
ISA. As a result, certain functions cannot be inlined when specific
target options, such as __attribute__((target("arch=+v"))), are used.
This can lead to performance issues when building retargetable
binaries for RISC-V.

To address this, I have implemented the riscv_can_inline_p function.
It enables inlining when the callee either has no special options or
when some options match, while also ensuring that the callee's ISA
is a subset of the caller's. I also check some other options when
always_inline is not set.
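
The subset condition can be illustrated with a plain bitmask, as in the sketch below. This is a simplification, not the implementation in the patch: the EXT_* constants and isa_is_subset are hypothetical names, whereas the patch walks riscv_ext_flag_table across several option variables.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical extension bits, for illustration only.
constexpr std::uint32_t EXT_M = 1u << 0;
constexpr std::uint32_t EXT_A = 1u << 1;
constexpr std::uint32_t EXT_F = 1u << 2;
constexpr std::uint32_t EXT_V = 1u << 3;

// True when every extension the callee requires is also enabled in the
// caller, i.e. the callee has no bit set that the caller lacks.
inline bool
isa_is_subset(std::uint32_t callee, std::uint32_t caller)
{
  return (callee & ~caller) == 0;
}
```

Under this rule a callee built for the base ISA can be inlined into a caller compiled with "arch=+v", but not the other way around, since the vector callee might use instructions the caller's ISA does not provide.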

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (cl_opt_var_ref_t): Add
cl_opt_var_ref_t pointer to member of cl_target_option.
(struct riscv_ext_flag_table_t): Add new cl_opt_var_ref_t field.
(RISCV_EXT_FLAG_ENTRY): New macro to simplify the definition of
riscv_ext_flag_table.
(riscv_ext_is_subset): New function to check if the callee's ISA
is a subset of the caller's.
(riscv_x_target_flags_isa_mask): New function to get the mask of
ISA extension in x_target_flags of gcc_options.
* config/riscv/riscv-subset.h (riscv_ext_is_subset): Declare
riscv_ext_is_subset function.
(riscv_x_target_flags_isa_mask): Declare
riscv_x_target_flags_isa_mask function.
* config/riscv/riscv.cc (riscv_can_inline_p): New function.
(TARGET_CAN_INLINE_P): Implement TARGET_CAN_INLINE_P.
---
 gcc/common/config/riscv/riscv-common.cc | 372 +---
 gcc/config/riscv/riscv-subset.h |   3 +
 gcc/config/riscv/riscv.cc   |  66 +
 3 files changed, 276 insertions(+), 165 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index bd42fd01532..941828a1566 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1567,191 +1567,196 @@ riscv_arch_str (bool version_p)
 return std::string();
 }
 
-/* Type for pointer to member of gcc_options.  */
+/* Type for pointer to member of gcc_options and cl_target_option.  */
 typedef int (gcc_options::*opt_var_ref_t);
+typedef int (cl_target_option::*cl_opt_var_ref_t);
 
 /* Types for recording extension to internal flag.  */
 struct riscv_ext_flag_table_t {
   const char *ext;
   opt_var_ref_t var_ref;
+  cl_opt_var_ref_t cl_var_ref;
   int mask;
 };
 
+#define RISCV_EXT_FLAG_ENTRY(NAME, VAR, MASK) \
+  {NAME, &gcc_options::VAR, &cl_target_option::VAR, MASK}
+
 /* Mapping table between extension to internal flag.  */
 static const riscv_ext_flag_table_t riscv_ext_flag_table[] =
 {
-  {"e", &gcc_options::x_target_flags, MASK_RVE},
-  {"m", &gcc_options::x_target_flags, MASK_MUL},
-  {"a", &gcc_options::x_target_flags, MASK_ATOMIC},
-  {"f", &gcc_options::x_target_flags, MASK_HARD_FLOAT},
-  {"d", &gcc_options::x_target_flags, MASK_DOUBLE_FLOAT},
-  {"c", &gcc_options::x_target_flags, MASK_RVC},
-  {"v", &gcc_options::x_target_flags, MASK_FULL_V},
-  {"v", &gcc_options::x_target_flags, MASK_VECTOR},
-
-  {"zicsr",&gcc_options::x_riscv_zi_subext, MASK_ZICSR},
-  {"zifencei", &gcc_options::x_riscv_zi_subext, MASK_ZIFENCEI},
-  {"zicond",   &gcc_options::x_riscv_zi_subext, MASK_ZICOND},
-
-  {"za64rs",  &gcc_options::x_riscv_za_subext, MASK_ZA64RS},
-  {"za128rs", &gcc_options::x_riscv_za_subext, MASK_ZA128RS},
-  {"zawrs",   &gcc_options::x_riscv_za_subext, MASK_ZAWRS},
-  {"zaamo",   &gcc_options::x_riscv_za_subext, MASK_ZAAMO},
-  {"zalrsc",  &gcc_options::x_riscv_za_subext, MASK_ZALRSC},
-  {"zabha",   &gcc_options::x_riscv_za_subext, MASK_ZABHA},
-  {"zacas",   &gcc_options::x_riscv_za_subext, MASK_ZACAS},
-
-  {"zba",&gcc_options::x_riscv_zb_subext, MASK_ZBA},
-  {"zbb",&gcc_options::x_riscv_zb_subext, MASK_ZBB},
-  {"zbc",&gcc_options::x_riscv_zb_subext, MASK_ZBC},
-  {"zbs",&gcc_options::x_riscv_zb_subext, MASK_ZBS},
-
-  {"zfinx",&gcc_options::x_riscv_zinx_subext, MASK_ZFINX},
-  {"zdinx",&gcc_options::x_riscv_zinx_subext, MASK_ZDINX},
-  {"zhinx",&gcc_options::x_riscv_zinx_subext, MASK_ZHINX},
-  {"zhinxmin", &gcc_options::x_riscv_zinx_subext, MASK_ZHINXMIN},
-
-  {"zbkb",   &gcc_options::x_riscv_zk_subext, MASK_ZBKB},
-  {"zbkc",   &gcc_options::x_riscv_zk_subext, MASK_ZBKC},
-  {"zbkx",   &gcc_options::x_riscv_zk_subext, MASK_ZBKX},
-  {"zknd",   &gcc_options::x_riscv_zk_subext, MASK_ZKND},
-  {"zkne",   &gcc_options::x_riscv_zk_subext, MASK_ZKNE},
-  {"zknh",   &gcc_options::x_riscv_zk_subext, MASK_ZKNH},
-  {"zkr",&gcc_options::x_riscv_zk_subext, MASK_ZKR},
-  {"zksed",  &gcc_options::x_riscv_zk_subext, MASK_ZKSED},
-  {"zksh",   &gcc_options::x_riscv_zk_subext, MASK_ZKSH},
-  {"zkt",&gcc_options::x_riscv_zk_subext, MASK_ZKT},
-
-  {"zihintntl", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTNTL},
-  {"zihintpause", &gcc_options::x_riscv_zi_subext, MASK_ZIHINTPAUSE},
-  {"ziccamoa", &gcc_o

Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-10-03 Thread Kyrylo Tkachov


> On 3 Oct 2024, at 16:41, Richard Sandiford  wrote:
> 
> Soumya AR  writes:
>> From 7fafcb5e0174c56205ec05406c9a412196ae93d3 Mon Sep 17 00:00:00 2001
>> From: Soumya AR 
>> Date: Thu, 3 Oct 2024 11:53:07 +0530
>> Subject: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
>> 
>> This patch uses the FSCALE instruction provided by SVE to implement the
>> standard ldexp family of functions.
>> 
>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>> following code:
>> 
>> float
>> test_ldexpf (float x, int i)
>> {
>>  return __builtin_ldexpf (x, i);
>> }
>> 
>> double
>> test_ldexp (double x, int i)
>> {
>>  return __builtin_ldexp(x, i);
>> }
>> 
>> GCC Output:
>> 
>> test_ldexpf:
>>  b ldexpf
>> 
>> test_ldexp:
>>  b ldexp
>> 
>> Since SVE has support for an FSCALE instruction, we can use this to process
>> scalar floats by moving them to a vector register and performing an fscale 
>> call,
>> similar to how LLVM tackles an ldexp builtin as well.
>> 
>> New Output:
>> 
>> test_ldexpf:
>>  fmov s31, w0
>>  ptrue p7.b, all
>>  fscale z0.s, p7/m, z0.s, z31.s
>>  ret
>> 
>> test_ldexp:
>>  sxtw x0, w0
>>  ptrue p7.b, all
>>  fmov d31, x0
>>  fscale z0.d, p7/m, z0.d, z31.d
>>  ret
>> 
>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>> OK for mainline?
> 
> Could we also use the .H form for __builtin_ldexpf16?
> 
> I suppose:
> 
>> @@ -2286,7 +2289,8 @@
>>   (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
>>   (V8QI "VNx8BI") (V16QI "VNx16BI")
>>   (V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
>> -  (V4SI "VNx4BI") (V2DI "VNx2BI")])
>> +  (V4SI "VNx4BI") (V2DI "VNx2BI")
>> +  (SF "VNx4BI") (DF "VNx2BI")])
> 
> ...this again raises the question what we should do for predicate
> modes when the data mode isn't a natural SVE mode.  That came up
> recently in relation to V1DI in the popcount patch, and for reductions
> in the ANDV etc. patch.

Thank you for enumerating the options below.

> 
> Three obvious options are:
> 
> (1) Use the nearest SVE mode with a full ptrue (as the patch does).
> (2) Use the nearest SVE mode with a 128-bit ptrue.
> (3) Add new modes V16BI, V8BI, V4BI, V2BI, and V1BI.  (And possibly BI
>for scalars.)

Just to be clear, what do you mean by “nearest SVE mode” in this context?


> 
> The problem with (1) is that, as Tamar pointed out, it doesn't work
> properly with reductions.  It also isn't safe for this patch (without
> fast-mathy options) because of FP exceptions.  Although writing to
> a scalar FP register zeros the upper bits, and so gives a "safe" value
> for this particular operation, nothing guarantees that all SF and DF
> values have this zero-extended form.  They could come from subregs of
> Advanced SIMD or SVE vectors.  The ABI also doesn't guarantee that
> incoming SF and DF values are zero-extended.
> 
> (2) would be safe, but would mean that we continue to have an nunits
> disagreement between the data mode and the predicate mode.  This would
> prevent operations being described in generic RTL in future.
> 
> (3) is probably the cleanest representational approach, but has the problem
> that we cannot store a fixed-length portion of an SVE predicate.
> We would have to load and store the modes via other register classes.
> (With PMOV we could use scalar FP loads and stores, but otherwise
> we'd probably need secondary memory reloads.)  That said, we could
> tell the RA to spill in a full predicate mode, so this shouldn't be
> a problem unless the modes somehow get exposed to gimple or frontends.
> 
> WDYT?

IMO option (2) sounds the most appealing at this stage. To me it feels
conceptually straightforward as we are using an SVE operation clamped at
128 bits to “emulate” what should have been a 128-bit fixed-width mode
operation.
It also feels that, given the complexity of (3) and introducing new modes,
we should go for (3) only if/when we do decide to implement these ops with
generic RTL.

Thanks,
Kyrill

> 
> Richard
> 
>> ;; ...and again in lower case.
>> (define_mode_attr vpred [(VNx16QI "vnx16bi") (VNx8QI "vnx8bi")
>> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fscale.c 
>> b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> new file mode 100644
>> index 000..251b4ef9188
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/sve/fscale.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-Ofast" } */
>> +
>> +float
>> +test_ldexpf (float x, int i)
>> +{
>> +  return __builtin_ldexpf (x, i);
>> +}
>> +/* { dg-final { scan-assembler-times {\tfscale\tz[0-9]+\.s, p[0-7]/m, 
>> z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
>> +
>> +double
>> +test_ldexp (double x, int i)
>> +{
>> +  return __builtin_ldexp (x, i);
>> +}
>> +/* { dg-final { sc

[PATCH 1/3] cfgexpand: Expand comment on when non-var clobbers can show up

2024-10-03 Thread Andrew Pinski
The comment here is not wrong, but it would be better if it mentioned
the C++ front-end in addition to nested function lowering.

gcc/ChangeLog:

* cfgexpand.cc (add_scope_conflicts_1): Expand comment
on when non-var clobbers show up.

Signed-off-by: Andrew Pinski 
---
 gcc/cfgexpand.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index f32cf1b20c9..6c1096363af 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -639,8 +639,9 @@ add_scope_conflicts_1 (basic_block bb, bitmap work, bool 
for_conflict)
{
  tree lhs = gimple_assign_lhs (stmt);
  unsigned *v;
- /* Nested function lowering might introduce LHSs
-that are COMPONENT_REFs.  */
+ /* Handle only plain var clobbers.
+Nested functions lowering and C++ front-end inserts clobbers
+which are not just plain variables.  */
  if (!VAR_P (lhs))
continue;
  if (DECL_RTL_IF_SET (lhs) == pc_rtx
-- 
2.34.1



[PATCH 2/3] cfgexpand: Handle scope conflicts better [PR111422]

2024-10-03 Thread Andrew Pinski
After fixing loop-im to do the correct overflow rewriting
for pointer types too, we end up with code like:
```
  _9 = (unsigned long) &g;
  _84 = _9 + 18446744073709551615;
  _11 = _42 + _84;
  _44 = (signed char *) _11;
...
  *_44 = 10;
  g ={v} {CLOBBER(eos)};
...
  n[0] = &f;
  *_44 = 8;
  g ={v} {CLOBBER(eos)};
```
This was not being recognized by the scope conflicts code, because it
only walked back one level of definitions rather than multiple levels.
This patch fixes it by using a work_list to avoid deep recursion and a
visited bitmap to avoid going into an infinite loop when dealing with loops.
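
A minimal, self-contained sketch of the work-list idea described above. This is
not the GCC code: the Node type, its fields, and collect_address_bases are
hypothetical stand-ins for SSA names, their defining statements, and the
visit callback.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SSA definition: a node is either the address
// of a variable (is_address == true) or a value computed from other nodes.
struct Node
{
  bool is_address = false;
  int var_id = -1;                    // meaningful only when is_address is true
  std::vector<std::size_t> operands;  // indices of the nodes this one is computed from
};

// Walk backwards from START through all operands and return the ids of every
// variable whose address can reach START.  An explicit work list replaces
// recursion, and VISITED guarantees termination on cyclic (PHI-like) graphs.
// All operand indices are assumed to be valid indices into NODES.
std::vector<int>
collect_address_bases (const std::vector<Node> &nodes, std::size_t start)
{
  assert (start < nodes.size ());
  std::vector<int> bases;
  std::vector<bool> visited (nodes.size (), false);
  std::vector<std::size_t> work_list{start};

  while (!work_list.empty ())
    {
      std::size_t idx = work_list.back ();
      work_list.pop_back ();
      if (visited[idx])
        continue;
      visited[idx] = true;

      const Node &n = nodes[idx];
      if (n.is_address)
        bases.push_back (n.var_id);
      else
        for (std::size_t op : n.operands)
          work_list.push_back (op);
    }
  return bases;
}
```

With a chain such as address -> add -> self-referencing PHI -> cast, the walk
still reaches the address and stops, where a one-level look-back would miss it.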

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/111422

gcc/ChangeLog:

* cfgexpand.cc (add_scope_conflicts_2): Rewrite to be a full walk
of all operands and their uses.

Signed-off-by: Andrew Pinski 
---
 gcc/cfgexpand.cc | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index 6c1096363af..2e653d7207c 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -573,32 +573,40 @@ visit_conflict (gimple *, tree op, tree, void *data)
 
 /* Helper function for add_scope_conflicts_1.  For USE on
a stmt, if it is a SSA_NAME and in its SSA_NAME_DEF_STMT is known to be
-   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR.  */
+   based on some ADDR_EXPR, invoke VISIT on that ADDR_EXPR. Also walk
+   the assignments backwards as they might be based on an ADDR_EXPR.  */
 
-static inline void
+static void
 add_scope_conflicts_2 (tree use, bitmap work,
   walk_stmt_load_store_addr_fn visit)
 {
-  if (TREE_CODE (use) == SSA_NAME
-  && (POINTER_TYPE_P (TREE_TYPE (use))
- || INTEGRAL_TYPE_P (TREE_TYPE (use))))
+  auto_vec work_list;
+  auto_bitmap visited_ssa_names;
+  work_list.safe_push (use);
+
+  while (!work_list.is_empty())
 {
-  gimple *g = SSA_NAME_DEF_STMT (use);
-  if (gassign *a = dyn_cast <gassign *> (g))
+  use = work_list.pop();
+  if (!use)
+   continue;
+  if (TREE_CODE (use) == ADDR_EXPR)
+   visit (nullptr, TREE_OPERAND (use, 0), use, work);
+  else if (TREE_CODE (use) == SSA_NAME
+  && (POINTER_TYPE_P (TREE_TYPE (use))
+  || INTEGRAL_TYPE_P (TREE_TYPE (use))))
{
- if (tree op = gimple_assign_rhs1 (a))
-   if (TREE_CODE (op) == ADDR_EXPR)
- visit (a, TREE_OPERAND (op, 0), op, work);
+ gimple *g = SSA_NAME_DEF_STMT (use);
+ if (!bitmap_set_bit (visited_ssa_names, SSA_NAME_VERSION(use)))
+   continue;
+ if (gassign *a = dyn_cast <gassign *> (g))
+   {
+ for (unsigned i = 1; i < gimple_num_ops (g); i++)
+   work_list.safe_push (gimple_op (a, i));
+   }
+ else if (gphi *p = dyn_cast <gphi *> (g))
+   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
+ work_list.safe_push (gimple_phi_arg_def (p, i));
}
-  else if (gphi *p = dyn_cast <gphi *> (g))
-   for (unsigned i = 0; i < gimple_phi_num_args (p); ++i)
- if (TREE_CODE (use = gimple_phi_arg_def (p, i)) == SSA_NAME)
-   if (gassign *a = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (use)))
- {
-   if (tree op = gimple_assign_rhs1 (a))
- if (TREE_CODE (op) == ADDR_EXPR)
-   visit (a, TREE_OPERAND (op, 0), op, work);
- }
 }
 }
 
-- 
2.34.1



[PATCH 3/3] gimple: Add gimple_with_undefined_signed_overflow and use it [PR111276]

2024-10-03 Thread Andrew Pinski
While looking into the ifcombine, I noticed that rewrite_to_defined_overflow
was rewriting already defined code. In the previous attempt at fixing this,
the review mentioned we should not be calling rewrite_to_defined_overflow
in those cases. The places which called rewrite_to_defined_overflow didn't
always check the lhs of the assignment. This fixes the problem by
introducing a helper function which is to be used before calling
rewrite_to_defined_overflow.
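
For readers unfamiliar with the rewrite, here is a source-level sketch of what
"rewriting to defined overflow" amounts to. It is only an illustration, not
what the pass emits: add_wrapping is a hypothetical name, and the conversion
back to int is assumed to wrap (as GCC documents, and as C++20 requires).

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: compute a + b without signed-overflow undefined
// behavior.  The operands are converted to the corresponding unsigned type,
// the addition wraps there, and the result is converted back to int.
int add_wrapping (int a, int b)
{
  unsigned ua = static_cast<unsigned> (a);
  unsigned ub = static_cast<unsigned> (b);
  return static_cast<int> (ua + ub);
}
```

A statement whose lhs type already has wrapping overflow (for example an
unsigned addition) needs no such treatment, which is the case the new helper
is meant to filter out before rewrite_to_defined_overflow is called.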

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR tree-optimization/111276
* gimple-fold.cc (arith_code_with_undefined_signed_overflow): Make 
static.
(gimple_with_undefined_signed_overflow): New function.
* gimple-fold.h (arith_code_with_undefined_signed_overflow): Remove.
(gimple_with_undefined_signed_overflow): Add declaration.
* tree-if-conv.cc (if_convertible_gimple_assign_stmt_p): Use
gimple_with_undefined_signed_overflow instead of manually
checking lhs and the code of the stmt.
(predicate_statements): Likewise.
* tree-ssa-ifcombine.cc (pass_tree_ifcombine::execute): Likewise.
* tree-ssa-loop-im.cc (move_computations_worker): Likewise.
* tree-ssa-reassoc.cc (update_range_test): Likewise. Reformat.
* tree-scalar-evolution.cc (final_value_replacement_loop): Use
gimple_with_undefined_signed_overflow instead of
arith_code_with_undefined_signed_overflow.
* tree-ssa-loop-split.cc (split_loop): Likewise.

Signed-off-by: Andrew Pinski 
---
 gcc/gimple-fold.cc   | 26 ++-
 gcc/gimple-fold.h|  2 +-
 gcc/tree-if-conv.cc  | 16 +++
 gcc/tree-scalar-evolution.cc |  5 +
 gcc/tree-ssa-ifcombine.cc| 10 ++---
 gcc/tree-ssa-loop-im.cc  |  6 +-
 gcc/tree-ssa-loop-split.cc   |  5 +
 gcc/tree-ssa-reassoc.cc  | 40 +++-
 8 files changed, 50 insertions(+), 60 deletions(-)

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index 942de7720fd..0b49d6754e2 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -8991,7 +8991,7 @@ gimple_fold_indirect_ref (tree t)
integer types involves undefined behavior on overflow and the
operation can be expressed with unsigned arithmetic.  */
 
-bool
+static bool
 arith_code_with_undefined_signed_overflow (tree_code code)
 {
   switch (code)
@@ -9008,6 +9008,30 @@ arith_code_with_undefined_signed_overflow (tree_code 
code)
 }
 }
 
+/* Return true if STMT has an operation that operates on a signed
+   integer types involves undefined behavior on overflow and the
+   operation can be expressed with unsigned arithmetic.  */
+
+bool
+gimple_with_undefined_signed_overflow (gimple *stmt)
+{
+  if (!is_gimple_assign (stmt))
+return false;
+  tree lhs = gimple_assign_lhs (stmt);
+  if (!lhs)
+return false;
+  tree lhs_type = TREE_TYPE (lhs);
+  if (!INTEGRAL_TYPE_P (lhs_type)
+  && !POINTER_TYPE_P (lhs_type))
+return false;
+  if (!TYPE_OVERFLOW_UNDEFINED (lhs_type))
+return false;
+  if (!arith_code_with_undefined_signed_overflow
+   (gimple_assign_rhs_code (stmt)))
+return false;
+  return true;
+}
+
 /* Rewrite STMT, an assignment with a signed integer or pointer arithmetic
operation that can be transformed to unsigned arithmetic by converting
its operand, carrying out the operation in the corresponding unsigned
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index dc709d515a9..165325392c9 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -59,7 +59,7 @@ extern tree gimple_get_virt_method_for_vtable (HOST_WIDE_INT, 
tree,
 extern tree gimple_fold_indirect_ref (tree);
 extern bool gimple_fold_builtin_sprintf (gimple_stmt_iterator *);
 extern bool gimple_fold_builtin_snprintf (gimple_stmt_iterator *);
-extern bool arith_code_with_undefined_signed_overflow (tree_code);
+extern bool gimple_with_undefined_signed_overflow (gimple *);
 extern void rewrite_to_defined_overflow (gimple_stmt_iterator *);
 extern gimple_seq rewrite_to_defined_overflow (gimple *);
 extern void replace_call_with_value (gimple_stmt_iterator *, tree);
diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 3b04d1e8d34..f5aa6c04fc9 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1067,11 +1067,7 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
fprintf (dump_file, "tree could trap...\n");
   return false;
 }
-  else if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
-   || POINTER_TYPE_P (TREE_TYPE (lhs)))
-  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
-  && arith_code_with_undefined_signed_overflow
-   (gimple_assign_rhs_code (stmt)))
+  else if (gimple_with_undefined_signed_overflow (stmt))
 /* We have to rewrite stmts with undefined overflow.  */
 need_to_rewrite_undefined = true;
 
@@ -2820,7 +2816,6 @@ predicate_statements (loop_p loop)
   for (gsi = gsi_star

Re: [PATCH] expr: Don't clear whole unions [PR116416]

2024-10-03 Thread Jason Merrill

On 9/28/24 2:39 AM, Jakub Jelinek wrote:

On Fri, Sep 27, 2024 at 04:01:33PM +0200, Jakub Jelinek wrote:

So, I think we should go with (but so far completely untested except
for pr78687.C which is optimized with Marek's patch and the above testcase
which doesn't have the clearing anymore) the following patch.


That patch had a bug in type_has_padding_at_level_p and so it didn't
bootstrap.

Here is a full patch which does.  It regressed the infoleak-1.c test
which I've adjusted, but I think the test had undefined behavior.
In particular the question is whether
   union un_b { unsigned char j; unsigned int i; } u = {0};
leaves (or can leave) some bits uninitialized or not.

I believe it can, it is an explicit initialization of the j member
which is just 8-bit (but see my upcoming mail on padding bits in C23/C++)
and nothing in the C standard from what I can see seems to imply the padding
bits in the union beyond the actually initialized field in this case would
be initialized.


Agreed, the padding bits have indeterminate values (or erroneous in 
C++26), so it's correct for infoleak-1.c to complain about 4b.



Though, looking at godbolt, clang and icc 19 and older gcc all do zero
initialize the whole union before storing the single member in there (if
non-zero, otherwise just clear).

So whether we want to do this or do it by default is another question.


We will want to initialize the padding (for all types) to something for 
C++26, but that's a separate issue...



Anyway, bootstrapped/regtested on x86_64-linux and i686-linux successfully.

2024-09-28  Jakub Jelinek  

PR c++/116416
* expr.cc (categorize_ctor_elements_1): Fix up union handling of
*p_complete.  Clear it only if num_fields is 0 and the union has
at least one FIELD_DECL, set to -1 if either union has no fields
and non-zero size, or num_fields is 1 and complete_ctor_at_level_p
returned false.


Hmm, complete_ctor_at_level_p also seems to need a change for this 
understanding of union semantics: "every meaningful byte" depends on the 
active member, so it seems like it should return true for a union iff 
num_elts == 1.


Jason



[to-be-committed][RISC-V] Add splitters to restore condops generation after recent phiopt changes

2024-10-03 Thread Jeff Law

Andrew P's recent improvements to phiopt regressed on the riscv testsuite.

Essentially the new code presented to the RTL optimizers is straightline 
code rather than branchy for the CE pass to analyze and optimize.  In 
the absence of conditional move support or sfb, the new code would be 
better.


Unfortunately the presented form isn't a great fit for xventanacondops, 
zicond or xtheadcondmov.  The net is the resulting code is actually 
slightly worse than before.  Essentially sne+czero turned into sne+sne+and.


Thankfully, combine is presented with

(and (ne (op1) (const_int 0))
 (ne (op2) (const_int 0)))

As the RHS of a set.  We can use a 3->2 splitter to guide combine on how 
to profitably rewrite the sequence in a form suitable for condops.  Just 
splitting that would be enough to fix the regression, but I'm fairly 
confident that other cases need to be handled and would have regressed 
had the testsuite been more thorough.



One arm of the AND is going to turn into an sCC instruction.  We have a 
variety of those that we support.  The codes vary as do the allowed 
operands of the sCC.  That produces a set of new splitters to handle 
those cases.


The other arm is going to turn into a czero (or similar) instruction. 
That one can be generalized to eq/ne.  So another set for that 
generalization.
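
A small model of the equivalence the splitters rely on, written as plain C++
rather than RTL. The function names are hypothetical; czero_eqz is assumed to
follow the Zicond czero.eqz semantics (result is zero when the condition
register is zero, otherwise the value register).

```cpp
#include <cassert>
#include <cstdint>

// Model of czero.eqz: the result is 0 when COND is zero, else VALUE.
std::uint64_t czero_eqz (std::uint64_t value, std::uint64_t cond)
{
  return cond == 0 ? 0 : value;
}

// Straight-line form presented to combine: (and (ne op1 0) (ne op2 0)).
std::uint64_t and_of_sccs (std::uint64_t op1, std::uint64_t op2)
{
  std::uint64_t a = op1 != 0;  // first sCC
  std::uint64_t b = op2 != 0;  // second sCC
  return a & b;                // and
}

// Form after the split: one sCC feeding a conditional zero.
std::uint64_t scc_then_czero (std::uint64_t op1, std::uint64_t op2)
{
  std::uint64_t tmp = op2 != 0;  // sCC into the scratch register
  return czero_eqz (tmp, op1);   // zero the result when op1 == 0
}
```

Both functions return 1 exactly when op1 and op2 are both nonzero, so the
two-instruction form can replace the three-instruction one.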


We can remove a couple of XFAILs in the rv32 space as it's behaving much 
more like rv64 at this point.


For SFB targets it's unclear if the new code is better or worse.  In 
both cases it's a 3 instruction sequence.   So I just adjusted the test. 
 If the new code is worse for SFB, someone with an understanding of the 
tradeoffs for an SFB target will need to make adjustments.


Tested in my tester on rv64gcv and rv32gc.  Will wait for the pre-commit 
testers to render their verdict before moving forward.



Jeff

* config/riscv/iterators.md (scc_0): New code iterator.
* config/riscv/zicond.md: New splitters to improve code generated for
cases like (and (scc) (scc)) for zicond, xventanacondops, xtheadcondmov.

* gcc.target/riscv/cset-sext-sfb.c: Turn off ssa-phiopt.
* gcc.target/riscv/cset-sext-thead.c: Do not check CE output anymore.
* gcc.target/riscv/cset-sext-ventana.c: Similarly.
* gcc.target/riscv/cset-sext-zicond.c: Similarly.  Drop rv32 xfail.
* gcc.target/riscv/cset-sext.c: Similarly.  No longer allow
"not" in asm output.

diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 2844cb02ff0..872c542e906 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -233,6 +233,8 @@ (define_code_iterator any_gt [gt gtu])
 (define_code_iterator any_ge [ge geu])
 (define_code_iterator any_lt [lt ltu])
 (define_code_iterator any_le [le leu])
+;; Iterators for conditions we can emit a sCC against 0 or a reg directly
+(define_code_iterator scc_0  [eq ne gt gtu])
 
 ; atomics code iterator
 (define_code_iterator any_atomic [plus ior xor and])
diff --git a/gcc/config/riscv/zicond.md b/gcc/config/riscv/zicond.md
index 3876be7f9d2..91cdec00e18 100644
--- a/gcc/config/riscv/zicond.md
+++ b/gcc/config/riscv/zicond.md
@@ -124,3 +125,115 @@ (define_split
 {
   operands[2] = GEN_INT (1 << UINTVAL(operands[2]));
 })
+
+;; In some cases gimple can give us a sequence with a logical and
+;; of two sCC insns.  This can be implemented an sCC feeding a 
+;; conditional zero.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (ne:X (match_operand:X 1 "register_operand") (const_int 0))
+  (scc_0:X (match_operand:X 2 "register_operand")
+   (match_operand:X 3 "reg_or_0_operand"))))
+   (clobber (match_operand:X 4 "register_operand"))]
+  "TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV"
+  [(set (match_dup 4) (scc_0:X (match_dup 2) (match_dup 3)))
+   (set (match_dup 0) (if_then_else:X (eq:X (match_dup 1) (const_int 0))
+ (const_int 0)
+ (match_dup 4)))])
+
+;; Similarly but GE/GEU which requires (const_int 1) as an operand.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (ne:X (match_operand:X 1 "register_operand") (const_int 0))
+  (any_ge:X (match_operand:X 2 "register_operand")
+(const_int 1))))
+   (clobber (match_operand:X 3 "register_operand"))]
+  "TARGET_ZICOND_LIKE || TARGET_XTHEADCONDMOV"
+  [(set (match_dup 3) (any_ge:X (match_dup 2) (const_int 1)))
+   (set (match_dup 0) (if_then_else:X (eq:X (match_dup 1) (const_int 0))
+ (const_int 0)
+ (match_dup 3)))])
+
+;; Similarly but LU/LTU which allows an arith_operand
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+   (and:X (ne:X (match_operand:X 1 "register_operand") (const_int 0))
+  (any_lt:X (match_operand:X 2 "register_operand")
+ 

Re: [PATCH 2/3] Release expanded template argument vector

2024-10-03 Thread Jason Merrill

On 10/2/24 7:50 AM, Richard Biener wrote:

This reduces peak memory usage by 20% for a specific testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

It's very ugly so I'd appreciate suggestions on how to handle such
situations better?


I'm pushing this alternative patch, tested x86_64-pc-linux-gnu.

From 5b08ae503dd4aef2789a667daaf1984e7cc94aaa Mon Sep 17 00:00:00 2001
From: Jason Merrill 
Date: Wed, 2 Oct 2024 08:05:28 -0400
Subject: [PATCH] c++: free garbage vec in coerce_template_parms
To: gcc-patches@gcc.gnu.org

coerce_template_parms can create two different vecs for the inner template
arguments, new_inner_args and (potentially) the result of
expand_template_argument_pack.  One or the other, or possibly both, end up
being garbage: in the typical case, the expanded vec is garbage because it's
only used as the source for convert_template_argument.  In some dependent
cases, the new vec is garbage because we decide to return the original args
instead.  In these cases, ggc_free the garbage vec to reduce the memory
overhead of overload resolution.
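
To illustrate the ownership pattern outside GCC, here is a sketch that uses
malloc/free in place of GC-allocated vecs and ggc_free. The names
free_if_changed and maybe_expand are hypothetical, and the caller below is
made up for the example; only the shape of the guard follows the patch.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical analogue of the proxy: it starts out pointing at data owned
// by someone else, and frees the pointer at scope exit only if it was
// changed to point somewhere else in the meantime.
template <typename T>
struct free_if_changed
{
  T *val;
  T *orig;

  explicit free_if_changed (T *t) : val (t), orig (t) {}
  ~free_if_changed ()
  {
    if (val != orig)
      std::free (val);
  }
  free_if_changed (const free_if_changed &) = delete;
  free_if_changed &operator= (const free_if_changed &) = delete;

  // Expose the current pointer; it is no longer freed at scope exit.
  T *release () { return orig = val; }

  operator T * () const { return val; }
  free_if_changed &operator= (T *t) { val = t; return *this; }
};

// Hypothetical caller: optionally replace INPUT with a scratch copy, then
// return either the scratch copy (released) or the original input.
int *maybe_expand (int *input, bool expand, bool keep_expanded)
{
  free_if_changed<int> args (input);
  if (expand)
    {
      int *copy = static_cast<int *> (std::malloc (sizeof (int)));
      if (!copy)
        return input;
      *copy = *input + 1;
      args = copy;
    }
  if (keep_expanded)
    return args.release ();  // the current pointer escapes, so do not free it
  return args.orig;          // the scratch copy, if any, is freed on scope exit
}
```

The point of the design is that every early return frees the scratch data
automatically, and only the paths that expose it need an explicit release.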

gcc/cp/ChangeLog:

	* pt.cc (struct free_if_changed_proxy): New.
	(coerce_template_parms): Use it.

Co-authored-by: Richard Biener  
---
 gcc/cp/pt.cc | 36 ++--
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 20affcd65a2..6d488128d68 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -9079,6 +9079,30 @@ pack_expansion_args_count (tree args)
   return count;
 }
 
+/* Used for a variable of pointer type T (typically 'tree') that starts out
+   pointing to exposed data, but might get changed to point to internal data
+   that can be safely discarded at scope exit.  Use .release when exposing the
+   internal data to prevent ggc_free.  */
+
+template 
+struct free_if_changed_proxy
+{
+  T val;
+  T orig;
+
+  free_if_changed_proxy (T t): val(t), orig(t) { }
+  ~free_if_changed_proxy ()
+  {
+if (val != orig)
+  ggc_free (val);
+  }
+
+  T release () { return orig = val; }
+
+  operator T () { return val; }
+  free_if_changed_proxy& operator= (const T& t) { val = t; return *this; }
+};
+
 /* Convert all template arguments to their appropriate types, and
return a vector containing the innermost resulting template
arguments.  If any error occurs, return error_mark_node. Error and
@@ -9105,8 +9129,6 @@ coerce_template_parms (tree parms,
 		   bool require_all_args /* = true */)
 {
   int nparms, nargs, parm_idx, arg_idx, lost = 0;
-  tree orig_inner_args;
-  tree inner_args;
 
   /* When used as a boolean value, indicates whether this is a
  variadic template parameter list. Since it's an int, we can also
@@ -9152,7 +9174,6 @@ coerce_template_parms (tree parms,
 	++default_p;
 }
 
-  inner_args = orig_inner_args = INNERMOST_TEMPLATE_ARGS (args);
   /* If there are no parameters that follow a parameter pack, we need to
  expand any argument packs so that we can deduce a parameter pack from
  some non-packed args followed by an argument pack, as in variadic85.C.
@@ -9161,6 +9182,7 @@ coerce_template_parms (tree parms,
  with a nested class inside a partial specialization of a class
  template, as in variadic92.C, or when deducing a template parameter pack
  from a sub-declarator, as in variadic114.C.  */
+  free_if_changed_proxy inner_args = INNERMOST_TEMPLATE_ARGS (args);
   if (!post_variadic_parms)
 inner_args = expand_template_argument_pack (inner_args);
 
@@ -9275,7 +9297,8 @@ coerce_template_parms (tree parms,
 	{
 	  /* We don't know how many args we have yet, just use the
 		 unconverted (and still packed) ones for now.  */
-	  new_inner_args = orig_inner_args;
+	  ggc_free (new_inner_args);
+	  new_inner_args = inner_args.orig;
 	  arg_idx = nargs;
 	  break;
 	}
@@ -9329,8 +9352,9 @@ coerce_template_parms (tree parms,
 		  = make_pack_expansion (conv, complain);
 
   /* We don't know how many args we have yet, just
- use the unconverted ones for now.  */
-  new_inner_args = inner_args;
+		 use the unconverted (but unpacked) ones for now.  */
+	  ggc_free (new_inner_args);
+	  new_inner_args = inner_args.release ();
 	  arg_idx = nargs;
   break;
 }
-- 
2.46.2



[PATCH] Further use of mod_scope in modified_type_die

2024-10-03 Thread Tom Tromey
I am working on some changes to GNAT to emit hierarchical DWARF --
i.e., where entities will have simple names nested in a DW_TAG_module.

While working on this I found a couple of paths in modified_type_die
where "mod_scope" should be used, but is not.  I suspect these cases
are only reachable by Ada code, as in both spots (subrange types and
base types), I believe that other languages don't generally have named
types in a non-top-level scope, and in these other situations,
mod_scope will still be correct.

gcc

* dwarf2out.cc (modified_type_die): Use mod_scope for
ranged types and base types.

Issue: eng/toolchain/gcc#241
---
 gcc/dwarf2out.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/dwarf2out.cc b/gcc/dwarf2out.cc
index 38aedb64470..67d2827c279 100644
--- a/gcc/dwarf2out.cc
+++ b/gcc/dwarf2out.cc
@@ -13927,7 +13927,7 @@ modified_type_die (tree type, int cv_quals, bool 
reverse,
   tree bias = NULL_TREE;
   if (lang_hooks.types.get_type_bias)
bias = lang_hooks.types.get_type_bias (type);
-  mod_type_die = subrange_type_die (type, low, high, bias, context_die);
+  mod_type_die = subrange_type_die (type, low, high, bias, mod_scope);
   item_type = TREE_TYPE (type);
 }
   else if (is_base_type (type))
@@ -13964,10 +13964,10 @@ modified_type_die (tree type, int cv_quals, bool 
reverse,
{
  dw_die_ref after_die
= modified_type_die (type, cv_quals, false, context_die);
- add_child_die_after (comp_unit_die (), mod_type_die, after_die);
+ add_child_die_after (mod_scope, mod_type_die, after_die);
}
   else
-   add_child_die (comp_unit_die (), mod_type_die);
+   add_child_die (mod_scope, mod_type_die);
 
   add_pubtype (type, mod_type_die);
 }
-- 
2.46.2



[PATCH 2/2] c++: -Wdeprecated enables later standard deprecations

2024-10-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

By default -Wdeprecated warns about deprecations in the active standard.
When specified explicitly, let's also warn about deprecations in later
standards.
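
The decision can be sketched as a standalone function. The options struct and
the dialect enum below are hypothetical simplifications for illustration; in
the compiler the same information comes from the option machinery.

```cpp
#include <cassert>

enum dialect { cxx17, cxx20, cxx23 };

// Hypothetical snapshot of the relevant option state.
struct options
{
  bool deprecated_explicit;  // -Wdeprecated or -Wno-deprecated was given
  bool warn_deprecated;      // effective value of -Wdeprecated
  dialect std;               // active language standard
};

// Should a warning for a feature deprecated in dialect D be on by default?
// An explicit -W[no-]deprecated decides on its own; otherwise the feature
// must already be deprecated in the active standard.
bool deprecated_in (const options &o, dialect d)
{
  if (o.deprecated_explicit)
    return o.warn_deprecated;
  return o.warn_deprecated && o.std >= d;
}
```

So with -std=c++17 a C++20 deprecation stays quiet by default, but an explicit
-Wdeprecated turns it on.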

gcc/c-family/ChangeLog:

* c-opts.cc (c_common_post_options): Explicit -Wdeprecated enables
deprecations from later standards.

gcc/ChangeLog:

* doc/invoke.texi: Explicit -Wdeprecated enables more warnings.
---
 gcc/doc/invoke.texi| 22 --
 gcc/c-family/c-opts.cc | 17 -
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c90f5b4d58e..d38c1feb86f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3864,8 +3864,10 @@ for code that is not valid in C++23 but used to be valid 
but deprecated
 in C++20 with a pedantic warning that can be disabled with
 @option{-Wno-comma-subscript}.
 
-Enabled by default with @option{-std=c++20} unless @option{-Wno-deprecated},
-and with @option{-std=c++23} regardless of @option{-Wno-deprecated}.
+Enabled by default with @option{-std=c++20} unless
+@option{-Wno-deprecated}, and after @option{-std=c++23} regardless of
+@option{-Wno-deprecated}.  Before @option{-std=c++20}, enabled with
+explicit @option{-Wdeprecated}.
 
 This warning is upgraded to an error by @option{-pedantic-errors} in
 C++23 mode or later.
@@ -4012,7 +4014,7 @@ int k = f - e;
 
 @option{-Wdeprecated-enum-enum-conversion} is enabled by default with
 @option{-std=c++20}.  In pre-C++20 dialects, this warning can be enabled
-by @option{-Wenum-conversion}.
+by @option{-Wenum-conversion} or @option{-Wdeprecated}.
 
 @opindex Wdeprecated-enum-float-conversion
 @opindex Wno-deprecated-enum-float-conversion
@@ -4030,14 +4032,14 @@ bool b = e <= 3.7;
 
 @option{-Wdeprecated-enum-float-conversion} is enabled by default with
 @option{-std=c++20}.  In pre-C++20 dialects, this warning can be enabled
-by @option{-Wenum-conversion}.
+by @option{-Wenum-conversion} or @option{-Wdeprecated}.
 
 @opindex Wdeprecated-literal-operator
 @opindex Wno-deprecated-literal-operator
 @item -Wdeprecated-literal-operator @r{(C++ and Objective-C++ only)}
 Warn that the declaration of a user-defined literal operator with a
 space before the suffix is deprecated.  This warning is enabled by
-default in C++23.
+default in C++23, or with explicit @option{-Wdeprecated}.
 
 @smallexample
 string operator "" _i18n(const char*, std::size_t); // deprecated
@@ -4740,7 +4742,8 @@ non-class type, @code{volatile}-qualified function return 
type,
 @code{volatile}-qualified parameter type, and structured bindings of a
 @code{volatile}-qualified type.  This usage was deprecated in C++20.
 
-Enabled by default with @option{-std=c++20}.
+Enabled by default with @option{-std=c++20}.  Before
+@option{-std=c++20}, enabled with explicit @option{-Wdeprecated}.
 
 @opindex Wzero-as-null-pointer-constant
 @opindex Wno-zero-as-null-pointer-constant
@@ -10389,6 +10392,13 @@ disable the error when compiled with @option{-Werror} 
flag.
 @item -Wno-deprecated
 Do not warn about usage of deprecated features.  @xref{Deprecated Features}.
 
+In C++, explicitly specifying @option{-Wdeprecated} also enables
+warnings about some features that are deprecated in later language
+standards, specifically @option{-Wcomma-subscript},
+@option{-Wvolatile}, @option{-Wdeprecated-enum-float-conversion},
+@option{-Wdeprecated-enum-enum-conversion}, and
+@option{-Wdeprecated-literal-operator}.
+
 @opindex Wno-deprecated-declarations
 @opindex Wdeprecated-declarations
 @item -Wno-deprecated-declarations
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 8ff3d966bb6..510e0870140 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -996,30 +996,37 @@ c_common_post_options (const char **pfilename)
   SET_OPTION_IF_UNSET (&global_options, &global_options_set, warn_register,
   cxx_dialect >= cxx17);
 
+  /* Explicit -Wdeprecated turns on warnings from later standards.  */
+  auto deprecated_in = [&](enum cxx_dialect d)
+  {
+if (OPTION_SET_P (warn_deprecated)) return !!warn_deprecated;
+return (warn_deprecated && cxx_dialect >= d);
+  };
+
   /* -Wcomma-subscript is enabled by default in C++20.  */
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   warn_comma_subscript,
   cxx_dialect >= cxx23
-  || (cxx_dialect == cxx20 && warn_deprecated));
+  || deprecated_in (cxx20));
 
   /* -Wvolatile is enabled by default in C++20.  */
   SET_OPTION_IF_UNSET (&global_options, &global_options_set, warn_volatile,
-  cxx_dialect >= cxx20 && warn_deprecated);
+  deprecated_in (cxx20));
 
   /* -Wdeprecated-enum-enum-conversion is enabled by default in C++20.  */
   SET_OPTION_IF_UNSET (&global_options, &global_options_set,
   warn_deprecated_enum_enum_co

[PATCH 1/2] c++: add -Wdeprecated-literal-operator [CWG2521]

2024-10-03 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

C++23 CWG issue 2521 (https://wg21.link/cwg2521) deprecates user-defined
literal operators declared with the optional space between "" and the
suffix.
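
For illustration, the two spellings side by side. The _km suffix is a made-up
example; the deprecated form is left in a comment so the snippet compiles
without the new warning.

```cpp
#include <cassert>

// Preferred spelling: no space between "" and the suffix.
constexpr unsigned long long operator""_km (unsigned long long v)
{
  return v * 1000;
}

// Deprecated spelling (same meaning, but with a space before the suffix):
//   constexpr unsigned long long operator "" _km (unsigned long long v);
```

Uses of the literal, such as 5_km, are unaffected; only the declaration
syntax is deprecated.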

Many testcases used that syntax; I removed the space from most of them, and
added C++23 warning tests to a few.

CWG 2521

gcc/ChangeLog:

* doc/invoke.texi: Document -Wdeprecated-literal-operator.

gcc/c-family/ChangeLog:

* c.opt: Add -Wdeprecated-literal-operator.
* c-opts.cc (c_common_post_options): Default on in C++23.
* c.opt.urls: Regenerate.

gcc/cp/ChangeLog:

* parser.cc (location_between): New.
(cp_parser_operator): Handle -Wdeprecated-literal-operator.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-string-literal.h
* g++.dg/cpp0x/Wliteral-suffix2.C
* g++.dg/cpp0x/constexpr-55708.C
* g++.dg/cpp0x/gnu_fext-numeric-literals.C
* g++.dg/cpp0x/gnu_fno-ext-numeric-literals.C
* g++.dg/cpp0x/pr51420.C
* g++.dg/cpp0x/pr60209-neg.C
* g++.dg/cpp0x/pr60209.C
* g++.dg/cpp0x/pr61038.C
* g++.dg/cpp0x/std_fext-numeric-literals.C
* g++.dg/cpp0x/std_fno-ext-numeric-literals.C
* g++.dg/cpp0x/udlit-addr.C
* g++.dg/cpp0x/udlit-args-neg.C
* g++.dg/cpp0x/udlit-args.C
* g++.dg/cpp0x/udlit-args2.C
* g++.dg/cpp0x/udlit-clink-neg.C
* g++.dg/cpp0x/udlit-concat-neg.C
* g++.dg/cpp0x/udlit-concat.C
* g++.dg/cpp0x/udlit-constexpr.C
* g++.dg/cpp0x/udlit-cpp98-neg.C
* g++.dg/cpp0x/udlit-declare-neg.C
* g++.dg/cpp0x/udlit-embed-quote.C
* g++.dg/cpp0x/udlit-extended-id-1.C
* g++.dg/cpp0x/udlit-extended-id-3.C
* g++.dg/cpp0x/udlit-extern-c.C
* g++.dg/cpp0x/udlit-friend.C
* g++.dg/cpp0x/udlit-general.C
* g++.dg/cpp0x/udlit-implicit-conv-neg-char8_t.C
* g++.dg/cpp0x/udlit-implicit-conv-neg.C
* g++.dg/cpp0x/udlit-inline.C
* g++.dg/cpp0x/udlit-mangle.C
* g++.dg/cpp0x/udlit-member-neg.C
* g++.dg/cpp0x/udlit-namespace.C
* g++.dg/cpp0x/udlit-nofunc-neg.C
* g++.dg/cpp0x/udlit-nonempty-str-neg.C
* g++.dg/cpp0x/udlit-nosuffix-neg.C
* g++.dg/cpp0x/udlit-nounder-neg.C
* g++.dg/cpp0x/udlit-operator-neg.C
* g++.dg/cpp0x/udlit-overflow-neg.C
* g++.dg/cpp0x/udlit-overflow.C
* g++.dg/cpp0x/udlit-preproc-neg.C
* g++.dg/cpp0x/udlit-raw-length.C
* g++.dg/cpp0x/udlit-raw-op-string-neg.C
* g++.dg/cpp0x/udlit-raw-op.C
* g++.dg/cpp0x/udlit-raw-str.C
* g++.dg/cpp0x/udlit-resolve-char8_t.C
* g++.dg/cpp0x/udlit-resolve.C
* g++.dg/cpp0x/udlit-shadow-neg.C
* g++.dg/cpp0x/udlit-string-length.C
* g++.dg/cpp0x/udlit-suffix-neg.C
* g++.dg/cpp0x/udlit-template.C
* g++.dg/cpp0x/udlit-tmpl-arg-neg.C
* g++.dg/cpp0x/udlit-tmpl-arg-neg2.C
* g++.dg/cpp0x/udlit-tmpl-arg.C
* g++.dg/cpp0x/udlit-tmpl-parms-neg.C
* g++.dg/cpp0x/udlit-tmpl-parms.C
* g++.dg/cpp1y/pr57640.C
* g++.dg/cpp1y/pr88872.C
* g++.dg/cpp26/unevalstr1.C
* g++.dg/cpp2a/concepts-pr60391.C
* g++.dg/cpp2a/consteval-prop21.C
* g++.dg/cpp2a/nontype-class6.C
* g++.dg/cpp2a/udlit-class-nttp-ctad-neg.C
* g++.dg/cpp2a/udlit-class-nttp-ctad-neg2.C
* g++.dg/cpp2a/udlit-class-nttp-ctad.C
* g++.dg/cpp2a/udlit-class-nttp-neg.C
* g++.dg/cpp2a/udlit-class-nttp-neg2.C
* g++.dg/cpp2a/udlit-class-nttp.C
* g++.dg/ext/is_convertible2.C
* g++.dg/lookup/pr87269.C
* g++.dg/cpp0x/udlit_system_header: Adjust for C++23 deprecated
operator "" _suffix.
* g++.dg/DRs/dr2521.C: New test.
---
 gcc/doc/invoke.texi   | 12 ++
 gcc/c-family/c.opt|  4 ++
 .../g++.dg/cpp0x/udlit-string-literal.h   | 10 ++---
 gcc/c-family/c-opts.cc|  5 +++
 gcc/cp/parser.cc  | 33 +--
 gcc/testsuite/g++.dg/DRs/dr2521.C |  5 +++
 gcc/testsuite/g++.dg/cpp0x/Wliteral-suffix2.C |  5 ++-
 gcc/testsuite/g++.dg/cpp0x/constexpr-55708.C  |  2 +-
 .../g++.dg/cpp0x/gnu_fext-numeric-literals.C  | 32 +++
 .../cpp0x/gnu_fno-ext-numeric-literals.C  | 32 +++
 gcc/testsuite/g++.dg/cpp0x/pr51420.C  |  4 +-
 gcc/testsuite/g++.dg/cpp0x/pr60209-neg.C  | 16 
 gcc/testsuite/g++.dg/cpp0x/pr60209.C  |  2 +
 gcc/testsuite/g++.dg/cpp0x/pr61038.C  |  4 +-
 .../g++.dg/cpp0x/std_fext-numeric-literals.C  | 32 +++
 .../cpp0x/std_fno-ext-numeric-literals.C  | 32 +++
 gcc/testsuite/g++.dg/cpp0x/udlit-addr.C   |  4 +-
 gcc/testsuite/g++.dg/cpp0x/udlit-args-neg.C   | 24 +--
 gcc/testsuite/g++.dg/cpp0x/udlit-args.C   | 22 +++

Re: [PATCH] Fix const constraint in std::stable_sort and std::inplace_merge

2024-10-03 Thread François Dumont



On 02/10/2024 19:07, Jonathan Wakely wrote:

On Wed, 2 Oct 2024 at 17:39, Jonathan Wakely  wrote:

On Wed, 25 Sept 2024 at 18:22, François Dumont  wrote:

Hi

Once https://gcc.gnu.org/pipermail/libstdc++/2024-September/059568.html
is accepted we will be able to fix this long-standing issue that
std::stable_sort and std::inplace_merge force the functor to take
const& parameters even when the iterators used in the range are not const ones.

https://cplusplus.github.io/LWG/issue3031 said that's OK.


At least not "fixing" this in pre-C++11 mode is not a problem then.

And AFAIU the resolution does not say that allowing it is forbidden 
either.


Should I add the test case shown in the issue to the testsuite? It's 
still UB, no?




And ... I guess that means we don't need to worry about the non-const
X::operator<(X&) case?

Before C++20 the standard implied it should work, and that's what
we've traditionally supported. But maybe we can stop supporting that,
if we treat the C++20 change as a DR for previous standards?

Hmm. That would make your clean-up a lot simpler. That's what you had
in the earlier patch, right?
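
For reference, the case that is uncontroversial in every -std mode is a comparator taking its arguments by const reference. A minimal sketch follows; Item, ByKey and sorted_tags are hypothetical names used only for illustration and do not come from the patch. The comparator under discussion, one taking non-const references such as X::operator<(X&), is deliberately not shown, since whether it must be accepted is exactly the open question.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical element type, used only for illustration.
struct Item {
  int key;
  std::string tag;
};

// Const-correct comparator: usable no matter how the algorithm
// qualifies the values it passes in.
struct ByKey {
  bool operator()(const Item& a, const Item& b) const { return a.key < b.key; }
};

// Sorts a copy of the input by key and returns the tags in the
// resulting order, so that stability is observable.
std::string sorted_tags(std::vector<Item> v) {
  std::stable_sort(v.begin(), v.end(), ByKey{});
  std::string out;
  for (const Item& i : v) out += i.tag;
  return out;
}
```

Elements with equal keys keep their relative order, which is the property std::stable_sort guarantees.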

I cannot remember and could check, but to be honest I'm quite proud of
the proposal. I find it already much simpler than the current approach.


I don't think Issue 3031 resolution is proposing to force const-ness but 
if needed it would be very easy to do so with my proposal.


Note that I recently ran the whole testsuite with both patches under Linux x64
using the default -std and got no regressions. I'll run it one more time for
C++98 though.





Re: [PATCH] Aarch64: Change stack checking method on Linux

2024-10-03 Thread Richard Sandiford
Eric Botcazou  writes:
> Hi,
>
> this changes the stack checking method (that of -fstack-check) used on Linux
> from the traditional model (probe then move SP) to the model implemented for
> -fstack-clash-protection (probe while moving SP).  The rationale is that the
> latter is in widespread use on Linux and thus thought to be more robust.
>
> This entails doing a couple of things: defining STACK_CHECK_MOVING_SP to 1:
>
>  -- Macro: STACK_CHECK_MOVING_SP
>  An integer which is nonzero if GCC should move the stack pointer
>  page by page when doing probes.  This can be necessary on systems
>  where the stack pointer contains the bottom address of the memory
>  area accessible to the executing thread at any point in time.  In
>  this situation an alternate signal stack is required in order to be
>  able to recover from a stack overflow.  The default value of this
>  macro is zero

This part I followed :)

> and replacing tests on flag_stack_clash_protection by calls to a new wrapper 
> routine do_stack_clash_protection in the back-end [the implementation is the 
> same as the one present in the i386 back-end].  

But I'm not sure I understand this part.  It seems like it's using the
stack-clash mechanism and stack-clash thresholds for allocating static
parts of the frame while still using the stack-check mechanism for
dynamic allocations.  Is that right?  If so, don't the two have different
assumptions about which part needs to be probed?  I wasn't sure why we
continued to use PROBE_INTERVAL for dynamic allocations but switched
to stack_clash_probe_interval for static ones (if I followed the code
correctly).

Thanks,
Richard

>
> -fstack-check is mainly used for Ada and AdaCore has been using this method in
> its Aarch64/Linux compilers for some time.
>
> Bootstrapped/regtested on Aarch64/Linux, OK for the mainline?
>
>
> 2024-10-01  Eric Botcazou  
>
>   * config/aarch64/aarch64-linux.h (STACK_CHECK_MOVING_SP): Define to 1.
>   * config/aarch64/aarch64-protos.h (do_stack_clash_protection): Declare.
>   * config/aarch64/aarch64.h (STACK_DYNAMIC_OFFSET): Replace
>   flag_stack_clash_protection with call to do_stack_clash_protection.
>   * config/aarch64/aarch64.cc (aarch64_output_probe_stack_range): 
> Likewise.
>   (aarch64_output_probe_sve_stack_clash): Likewise.
>   (aarch64_layout_frame): Likewise.
>   (aarch64_get_separate_components): Likewise.
>   (aarch64_allocate_and_probe_stack_space): Likewise.
>   (aarch64_expand_prologue): Likewise.  And do not check the stack prior
>   to establishing the frame if STACK_CHECK_MOVING_SP is 1.
>   (aarch64_expand_epilogue): Likewise.
>   (do_stack_clash_protection): New predicate.


Re: [PATCH 0/1] Detecting lifetime-dse issues via Valgrind [PR66487]

2024-10-03 Thread Sam James
Alexander Monakov  writes:

> I would like to propose Valgrind integration previously sent as RFC for trunk.
>
> Arsen and Sam, since you commented on the RFC I wonder if you can have
> a look at the proposed configure and documentation changes and let me
> know if they look fine for you? For reference, gccinstall.info will say:
>
> ‘--enable-valgrind-interop’
>  Provide wrappers for Valgrind client requests in libgcc, which are
>  used for ‘-fvalgrind-annotations’.  Requires Valgrind header files
>  for the target (in the build-time sysroot if building a
>  cross-compiler).
>
> and GCC manual will document the new option as:
>
>  -fvalgrind-annotations
>  Emit Valgrind client requests annotating object lifetime
>  boundaries.  This allows to detect attempts to access fields of a
>  C++ object after its destructor has completed (but storage was
>  not deallocated yet), or to initialize it in advance from
>  "operator new" rather than the constructor.
>
>  This instrumentation relies on presence of
>  "__gcc_vgmc_make_mem_undefined" function that wraps the
>  corresponding Valgrind client request. It is provided by libgcc
>  when it is configured with --enable-valgrind-interop.  Otherwise,
>  you can implement it like this:
>
>  #include 
>
>  void
>  __gcc_vgmc_make_mem_undefined (void *addr, size_t size)
>  {
>VALGRIND_MAKE_MEM_UNDEFINED (addr, size);
>  }
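
To make the first kind of bug concrete, here is a minimal sketch of an object whose lifetime ends while its storage remains allocated. Widget and read_while_alive are hypothetical names, not part of the patch, and the sketch deliberately stops short of performing the invalid read.

```cpp
#include <new>

// Hypothetical type used only for illustration.
struct Widget {
  int value;
  explicit Widget(int v) : value(v) {}
  ~Widget() { value = 0; }
};

// Constructs a Widget in caller-provided storage, reads it while it is
// alive, then ends its lifetime without releasing the storage.
int read_while_alive(void* storage) {
  Widget* w = new (storage) Widget(42);
  int seen = w->value;  // fine: the object is alive
  w->~Widget();         // lifetime ends; the storage is still allocated
  // Reading w->value from here on is the kind of access the option is
  // described as detecting; this sketch does not perform it.
  return seen;
}
```

The caller is assumed to pass storage that is suitably sized and aligned for Widget.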

The doc changes look good -- thanks! (And again, sorry for forgetting.)

IIRC, the status we left off with was libgcc needing review, for which
Jakub is probably the best candidate (maybe Alex as well given the cfr
bits)? I think richi was fine with it overall.

I'm also happy to kick the tyres on it if that'd be useful to make
reviewers happier, but I also know you've done the work on that already
(thank you!) and verified that it would've caught all the cases I could dig up.

>
> Changes since the RFC:
>
> * Add documentation and tests.
>
> * Drop 'emit-' from -fvalgrind-emit-annotations.
>
> * Use --enable-valgrind-interop instead of overloading
>   --enable-valgrind-annotations.
>
> * Do not build the wrapper unless --enable-valgrind-interop is given and
>   Valgrind headers are present.
>
> * Clean up libgcc configure changes.
> * Reword comments.
>
> Daniil Frolov (1):
>   object lifetime instrumentation for Valgrind [PR66487]
>
>  gcc/Makefile.in   |   1 +
>  gcc/builtins.def  |   3 +
>  gcc/common.opt|   4 +
>  gcc/doc/install.texi  |   5 +
>  gcc/doc/invoke.texi   |  27 +
>  gcc/gimple-valgrind-interop.cc| 112 ++
>  gcc/passes.def|   1 +
>  gcc/testsuite/g++.dg/valgrind-annotations-1.C |  22 
>  gcc/testsuite/g++.dg/valgrind-annotations-2.C |  12 ++
>  gcc/tree-pass.h   |   1 +
>  libgcc/Makefile.in|   3 +
>  libgcc/config.in  |   6 +
>  libgcc/configure  |  22 +++-
>  libgcc/configure.ac   |  15 ++-
>  libgcc/libgcc2.h  |   2 +
>  libgcc/valgrind-interop.c |  40 +++
>  16 files changed, 274 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/gimple-valgrind-interop.cc
>  create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-1.C
>  create mode 100644 gcc/testsuite/g++.dg/valgrind-annotations-2.C
>  create mode 100644 libgcc/valgrind-interop.c


[patch,avr] Fix PR116953 - jump_over_one_insn_p clobbers recog_data.operand in avr_out_sbxx_branch

2024-10-03 Thread Georg-Johann Lay

avr_out_sbxx_branch calls jump_over_one_insn_p which may clobber
recog_data.operand as it calls extract on the next insn.

A fix is to make a copy of avr_out_sbxx_branch's incoming operands.
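
The shape of the problem and of the fix can be sketched outside GCC. In the sketch below, g_scratch stands in for a global buffer like recog_data.operand and helper_that_clobbers for a callee that overwrites it; all names are hypothetical and none of this is GCC's actual code.

```cpp
#include <array>

// Hypothetical stand-in for a global scratch buffer.
static std::array<int, 4> g_scratch{};

// Hypothetical callee that overwrites the global buffer as a side effect.
static bool helper_that_clobbers() {
  g_scratch.fill(-1);
  return true;
}

// Buggy shape: keeps reading the caller's array after the call, which
// goes wrong when that array is the global buffer itself.
int sum_without_copy(const int* ops) {
  helper_that_clobbers();
  return ops[0] + ops[1] + ops[2] + ops[3];
}

// Fixed shape: snapshot the operands before calling the helper.
int sum_with_copy(const int* ops) {
  const int local[4] = {ops[0], ops[1], ops[2], ops[3]};
  helper_that_clobbers();
  return local[0] + local[1] + local[2] + local[3];
}
```

When the caller passes g_scratch.data(), only the copying variant still sees the original values.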

Ok to apply?

Johann

--

AVR: target/116953 - ICE due to operands clobber in avr_out_sbxx_branch.

PR target/116953
gcc/
* config/avr/avr.cc (avr_out_sbxx_branch): Work on a copy of
the operands rather than on operands itself, which is just
recog_data.operand and may be clobbered by jump_over_one_insn_p.
gcc/testsuite/
* gcc.target/avr/torture/pr116953.c: New test.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 92013c3845d..735d05b1e74 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -13603,8 +13603,12 @@ avr_hard_regno_rename_ok (unsigned int old_reg, unsigned int new_reg)
Operand 3: label to jump to if the test is true.  */
 
 const char *
-avr_out_sbxx_branch (rtx_insn *insn, rtx operands[])
+avr_out_sbxx_branch (rtx_insn *insn, rtx xop[])
 {
+  // jump_over_one_insn_p may call extract on the next insn, clobbering
+  // recog_data.operand.  Hence make a copy of the operands (PR116953).
+  rtx operands[] = { xop[0], xop[1], xop[2], xop[3] };
+
   rtx_code comp = GET_CODE (operands[0]);
   bool long_jump = get_attr_length (insn) >= 4;
   bool reverse = long_jump || jump_over_one_insn_p (insn, operands[3]);
diff --git a/gcc/testsuite/gcc.target/avr/torture/pr116953.c b/gcc/testsuite/gcc.target/avr/torture/pr116953.c
new file mode 100644
index 000..f8e5a38ec65
--- /dev/null
+++ b/gcc/testsuite/gcc.target/avr/torture/pr116953.c
@@ -0,0 +1,7 @@
+unsigned foo (unsigned x, unsigned y)
+{
+  int i;
+  for (i = 8; i--; x <<= 1)
+y ^= (x ^ y) & 0x80 ? 79U : 0U;
+  return y;
+}


Re: [PATCH 2/2]AArch64: support encoding integer immediates using floating point moves

2024-10-03 Thread Richard Sandiford
Tamar Christina  writes:
> Hi,
>
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, September 30, 2024 6:33 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; ktkac...@gcc.gnu.org
>> Subject: Re: [PATCH 2/2]AArch64: support encoding integer immediates using
>> floating point moves
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > This patch extends our immediate SIMD generation cases to support generating
>> > integer immediates using floating point operation if the integer immediate maps
>> > to an exact FP value.
>> >
>> > As an example:
>> >
>> > uint32x4_t f1() {
>> > return vdupq_n_u32(0x3f800000);
>> > }
>> >
>> > currently generates:
>> >
>> > f1:
>> > adrp    x0, .LC0
>> > ldr q0, [x0, #:lo12:.LC0]
>> > ret
>> >
>> > i.e. a load, but with this change:
>> >
>> > f1:
>> > fmov    v0.4s, 1.0e+0
>> > ret
>> >
>> > Such immediates are common in e.g. our Math routines in glibc because they are
>> > created to extract or mark part of an FP immediate as masks.
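
As a rough illustration of "maps to an exact FP value": the single-precision value 1.0 has the bit pattern 0x3f800000. The sketch below tests whether a 32-bit integer, viewed as a float, fits the 8-bit FMOV immediate form (sign, 3-bit exponent, 4-bit fraction). It is written from my reading of the architecture's immediate expansion rule, not taken from the patch, and fits_fmov_imm32 is a hypothetical name.

```cpp
#include <cstdint>

// Returns true if `bits`, viewed as an IEEE 754 single-precision value,
// fits the 8-bit FMOV immediate form: a sign bit, a 3-bit exponent and
// a 4-bit fraction, with all remaining fraction bits zero.
bool fits_fmov_imm32(std::uint32_t bits) {
  // The low 19 fraction bits must all be zero.
  if (bits & 0x7ffffu) return false;
  // Bits 30..25 must be 0b100000 or 0b011111.
  std::uint32_t exp_hi = (bits >> 25) & 0x3fu;
  return exp_hi == 0x20u || exp_hi == 0x1fu;
}
```

Zero does not pass this test, which is consistent with zeros being handled separately.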
>> 
>> I agree this is a good thing to do.  The current code is too beholden
>> to the original vector mode.  This patch relaxes it so that it isn't
>> beholden to the original mode's class (integer vs. float), but it would
>> still be beholden to the original mode's element size.
>
> I've implemented this approach and it works but I'm struggling with an inconsistency
> in how zeros are created.
>
> There are about 800 SVE ACLE tests like acge_f16.c that check that a zero is created
> using a mov of the same sized register as the usage.  So I added an exception for
> zero to use the original input element mode.
>
> But then there are about 400 other SVE ACLE tests that actually check that zeros are
> created using byte moves, like dup_128_s16_z even though they're used as ints.
>
> So these two are in conflict.  Do you care which way I resolve this?  Since it's zero
> it shouldn't matter how they're created but perhaps there's a reason why some
> tests check for the specific instruction?

No, I think it was an oversight.  Any element size would be correct.

Using byte moves sounds like a good thing.  It would be good to
share constants at some point (like we do with ptrues) and using
the smallest element size would then be the natural choice.

Sorry for the drudge work in updating all the tests.  Hope that
generalising them to be size-agnostic turns out to be sed-able,
or at least a simple script.

Thanks,
Richard




Re: [PATCH] aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]

2024-10-03 Thread Richard Sandiford
Andrew Pinski  writes:
> Early-RA was considering throwing instructions as being dead and removing
> them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.
>
> Built and tested for aarch64-linux-gnu.
>
>   PR target/116927
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
>   that throw are not dead with -fno-delete-dead-exceptions.
>
> gcc/testsuite/ChangeLog:
>
>   * g++.dg/torture/pr116927-1.C: New test.

OK, thanks.

I don't like these magic flags that every relevant pass has to remember
to check individually.  In practice, the only way they will be checked
is if code is copied-and-pasted between passes (meaning an abstraction
is missing) or by trial and error.

I couldn't find an obvious helper routine for it, though, so I agree
this is the established best practice.
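
For illustration only, the kind of shared predicate I had in mind could look like the sketch below. ToyInsn, ToyFunction and toy_insn_removable_p are made-up stand-ins, not existing GCC types or routines; the logic just mirrors the check added in the patch.

```cpp
// Toy stand-ins for illustration only; these are not GCC's types.
struct ToyInsn {
  bool has_side_effects;
  bool may_throw;
};

struct ToyFunction {
  bool can_delete_dead_exceptions;
};

// Hypothetical shared predicate: one place that remembers the
// exception rule so that individual passes do not have to.
bool toy_insn_removable_p(const ToyFunction& fn, const ToyInsn& insn) {
  if (insn.has_side_effects) return false;
  // A throwing insn must be kept unless dead exceptions may be deleted.
  if (!fn.can_delete_dead_exceptions && insn.may_throw) return false;
  return true;
}
```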

Richard

>
> Signed-off-by: Andrew Pinski 
> ---
>  gcc/config/aarch64/aarch64-early-ra.cc|  6 ++
>  gcc/testsuite/g++.dg/torture/pr116927-1.C | 15 +++
>  2 files changed, 21 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/torture/pr116927-1.C
>
> diff --git a/gcc/config/aarch64/aarch64-early-ra.cc b/gcc/config/aarch64/aarch64-early-ra.cc
> index 5f269d029b4..6e544dd6191 100644
> --- a/gcc/config/aarch64/aarch64-early-ra.cc
> +++ b/gcc/config/aarch64/aarch64-early-ra.cc
> @@ -3389,6 +3389,12 @@ early_ra::is_dead_insn (rtx_insn *insn)
>if (side_effects_p (set))
>  return false;
>  
> +  /* If we can't delete dead exceptions and the insn throws,
> + then the instruction is not dead.  */
> +  if (!cfun->can_delete_dead_exceptions
> +  && !insn_nothrow_p (insn))
> +return false;
> +
>return true;
>  }
>  
> diff --git a/gcc/testsuite/g++.dg/torture/pr116927-1.C b/gcc/testsuite/g++.dg/torture/pr116927-1.C
> new file mode 100644
> index 000..22fa1dbd7e1
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr116927-1.C
> @@ -0,0 +1,15 @@
> +// { dg-do compile }
> +// { dg-additional-options "-fnon-call-exceptions -fno-delete-dead-exceptions" }
> +
> +// PR target/116927
> +// aarch64's Early ra was removing possiblely trapping
> +// floating point insn
> +
> +void
> +foo (float f)
> +{
> +  try {
> +f ++;
> +  }catch(...)
> +  {}
> +}


Re: [PATCH v1] Add -ftime-report-wall

2024-10-03 Thread Andi Kleen
> Note that if the user requests SARIF output e.g. with
>   -fdiagnostics-format=sarif-stderr
> then any timevar data from -ftime-report is written in JSON form as
> part of the SARIF, rather than in text form to stderr (see
> 75d623946d4b6ea80a777b789b116d4b4a2298dc).
> 
> I see that the proposed patch leaves the user and sys stats as zero,
> and conditionalizes what's printed for text output as part of
> timer::print.  Should it also do something similar in
> make_json_for_timevar_time_def for the json output, and not add the
> properties for "user" and "sys" if the data hasn't been gathered?

> Hope I'm reading the patch correctly.

Yes that's right.

I mainly adjusted the human output for cosmetic reasons.

For machine-readable output I guess it is better to have a stable schema
and not skip fields, to avoid pain for parsers. So I left it alone.

-Andi


Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction

2024-10-03 Thread Soumya AR


> On 1 Oct 2024, at 1:18 PM, Tamar Christina  wrote:
>
> Hi Soumya,
>
> Nice patch!
>
>> -Original Message-
>> From: Kyrylo Tkachov 
>> Sent: Tuesday, October 1, 2024 7:55 AM
>> To: Soumya AR 
>> Cc: gcc-patches@gcc.gnu.org; Richard Sandiford 
>> Subject: Re: [PATCH] aarch64: Optimise calls to ldexp with SVE FSCALE instruction
>>
>> Hi Soumya
>>
>>> On 30 Sep 2024, at 18:26, Soumya AR  wrote:
>>>
>>> This patch uses the FSCALE instruction provided by SVE to implement the
>>> standard ldexp family of functions.
>>>
>>> Currently, with '-Ofast -mcpu=neoverse-v2', GCC generates libcalls for the
>>> following code:
>>>
>>> float
>>> test_ldexpf (float x, int i)
>>> {
>>>   return __builtin_ldexpf (x, i);
>>> }
>>>
>>> double
>>> test_ldexp (double x, int i)
>>> {
>>>   return __builtin_ldexp(x, i);
>>> }
>>>
>>> GCC Output:
>>>
>>> test_ldexpf:
>>>   b ldexpf
>>>
>>> test_ldexp:
>>>   b ldexp
>>>
>>> Since SVE has support for an FSCALE instruction, we can use this to process
>>> scalar floats by moving them to a vector register and performing an fscale call,
>>> similar to how LLVM tackles an ldexp builtin as well.
>>>
>>> New Output:
>>>
>>> test_ldexpf:
>>>   fmov s31, w0
>>>   ptrue p7.b, all
>>>   fscale z0.s, p7/m, z0.s, z31.s
>>>   ret
>>>
>>> test_ldexp:
>>>   sxtw x0, w0
>>>   ptrue p7.b, all
>>>   fmov d31, x0
>>>   fscale z0.d, p7/m, z0.d, z31.d
>>>   ret
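
As a reminder of the semantics being implemented, ldexp(x, i) computes x times 2 to the power i. A minimal sketch using the standard library follows; scale_pow2 is a hypothetical wrapper name and says nothing about how the FSCALE expansion itself is written.

```cpp
#include <cmath>

// Reference semantics of the ldexp family: x * 2^i.
float scale_pow2(float x, int i) {
  return std::ldexp(x, i);
}
```

For exponents that keep the result in range, this matches an exact multiplication by a power of two.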
>>>
>>> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
>>> OK for mainline?
>>>
>>> Signed-off-by: Soumya AR 
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/aarch64/aarch64-sve.md
>>> (ldexp3): Added a new pattern to match ldexp calls with scalar
>>> floating modes and expand to the existing pattern for FSCALE.
>>> (@aarch64_pred_): Extended the pattern to accept SVE
>>> operands as well as scalar floating modes.
>>>
>>> * config/aarch64/iterators.md:
>>> SVE_FULL_F_SCALAR: Added an iterator to match all FP SVE modes as well
>>> as SF and DF.
>>> VPRED: Extended the attribute to handle GPF modes.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> * gcc.target/aarch64/sve/fscale.c: New test.
>>
>> This patch fixes the bugzilla report at
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733
>> So it should be referenced in the ChangeLog entries like so:
>>
>>  PR target/111733
>>  * config/aarch64/aarch64-sve.md 
>>
>> That way the commit hooks will pick it up and update the bug tracker accordingly.
>>
>>>
>>> <0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch>
>>
>> +(define_expand "ldexp3"
>> +  [(set (match_operand:GPF 0 "register_operand" "=w")
>> + (unspec:GPF
>> +   [(match_operand:GPF 1 "register_operand" "w")
>> +(match_operand: 2 "register_operand" "w")]
>> +   UNSPEC_COND_FSCALE))]
>> +  "TARGET_SVE"
>> +  {
>> +rtx ptrue = aarch64_ptrue_reg (mode);
>> +rtx strictness = gen_int_mode (SVE_RELAXED_GP, SImode);
>> +emit_insn (gen_aarch64_pred_fscale (operands[0], ptrue,
>> operands[1], operands[2], strictness));
>> +DONE;
>> +  }
>>
>> Lines should not exceed 80 columns, this should be wrapped around
>
> And Nit: perhaps slightly more idiomatic to the other patterns in SVE is this:
>
> (define_expand "ldexp3"
>  [(set (match_operand:GPF 0 "register_operand")
>(unspec:GPF
>  [(match_dup 3)
>   (const_int SVE_RELAXED_GP)
>   (match_operand:GPF 1 "register_operand")
>   (match_operand: 2 "register_operand")]
>  UNSPEC_COND_FSCALE))]
>  "TARGET_SVE"
>  {
>operands[3] = aarch64_ptrue_reg (mode);
>  }
> )
>
> It removes the dependency on the exact name of the pattern.
> Also note the dropping of the constraints, expand patterns don't use
> the constraints, only the predicates are checked.
>
> Cheers,
> Tamar

Thanks for this suggestion! This makes a lot of sense. Edited the patch with
this change.

Also referenced the PR as suggested by Kyrill earlier.

Thanks,
Soumya

>>
>> The patch looks good to me otherwise.
>> Thanks,
>> Kyrill




0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch
Description: 0001-aarch64-Optimise-calls-to-ldexp-with-SVE-FSCALE-inst.patch


Re: SVE intrinsics: Fold svmul with constant power-of-2 operand to svlsl

2024-10-03 Thread Jennifer Schmitz
Ping.

> On 20 Sep 2024, at 11:28, Jennifer Schmitz  wrote:
> 
> For svmul, if one of the operands is a constant vector with a uniform
> power of 2, this patch folds the multiplication to a left-shift by
> immediate (svlsl).
> Because the shift amount in svlsl is the second operand, the order of the
> operands is switched, if the first operand contained the powers of 2. However,
> this switching is not valid for some predications: If the predication is
> _m and the predicate not ptrue, the result of svlsl might not be the
> same as for svmul. Therefore, we do not apply the fold in this case.
> The transform is also not applied to INTMIN for signed integers and to
> constant vectors of 1 (this case is partially covered by constant folding
> already and the missing cases will be addressed by the follow-up patch
> suggested in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663275.html).
> 
> Tests were added in the existing test harness to check the produced assembly
> - when the first or second operand contains the power of 2
> - when the second operand is a vector or scalar (_n)
> - for _m, _z, _x predication
> - for _m with ptrue or non-ptrue
> - for intmin for signed integer types
> - for the maximum power of 2 for signed and unsigned integer types.
> Note that we used 4 as a power of 2, instead of 2, because a recent
> patch optimizes left-shifts by 1 to an add instruction. But since we
> wanted to highlight the change to an lsl instruction we used a higher
> power of 2.
> To also check correctness, runtime tests were added.
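
A scalar analogue of the decision described above can be sketched as follows. fold_mul_to_shift is a hypothetical helper, not the code in svmul_impl::fold; it only models the constant check (power of 2, excluding 1 and INT_MIN) and ignores predication.

```cpp
#include <cstdint>
#include <limits>

// Returns the shift amount if multiplying by `c` could be replaced by a
// left shift under the conditions described above, or -1 otherwise.
int fold_mul_to_shift(std::int32_t c) {
  // INT_MIN is excluded for signed types.
  if (c == std::numeric_limits<std::int32_t>::min()) return -1;
  // 1 is left to constant folding; 0 and negatives are not handled in
  // this sketch.
  if (c <= 1) return -1;
  std::uint32_t u = static_cast<std::uint32_t>(c);
  if (u & (u - 1)) return -1;  // not a power of two
  int shift = 0;
  while ((u >>= 1) != 0) ++shift;
  return shift;
}
```

For example, a multiplier of 4 yields a shift amount of 2, which is the case the new assembly tests use.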
> 
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
> 
> Signed-off-by: Jennifer Schmitz 
> 
> gcc/
> * config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
> Implement fold to svlsl for power-of-2 operands.
> 
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/asm/mul_s8.c: New test.
> * gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
> * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
> * gcc.target/aarch64/sve/mul_const_run.c: Likewise.
> <0001-SVE-intrinsics-Fold-svmul-with-constant-power-of-2-o.patch>





[PATCH v5] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-03 Thread Yangyu Chen
From: Kito Cheng 

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.
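
A query against this layout could look roughly like the sketch below. FeatureBits is a local mirror of the structure defined in feature_bits.c in this patch and has_feature is a hypothetical helper; the real object lives in libgcc and is not referenced here. The bit positions used in the usage note (21 for V, 27 for Zba, both in group 0) are the ones defined in the patch.

```cpp
// Local mirror of the structure layout from the patch, for illustration.
struct FeatureBits {
  unsigned length;
  unsigned long long features[2];
};

// Hypothetical helper: true if bit `mask` of group `group` is set.
bool has_feature(const FeatureBits& fb, unsigned group,
                 unsigned long long mask) {
  if (group >= fb.length) return false;  // group not provided
  return (fb.features[group] & mask) != 0;
}
```

With features[0] set to 1ULL << 21, has_feature (fb, 0, 1ULL << 21) reports the V extension as present and has_feature (fb, 0, 1ULL << 27) reports Zba as absent.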

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74

Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.

Changes since v4:
- Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*,
  Zimop, Zcmop, Zawrs.
- Rename the return variable name of hwprobe syscall.
- Minor fixes on indentation.

Changes since v3:
- Fix non-linux build.
- Let __init_riscv_feature_bits become a constructor.

Changes since v2:
- Prevent it from initializing more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.

Co-Developed-by: Yangyu Chen 
Signed-off-by: Yangyu Chen 
---
 libgcc/config/riscv/feature_bits.c | 364 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 365 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..b346e72dabf
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,364 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 2
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH_GROUPID 0
+#define ZKNH_BITMASK (1ULL << 43)
+#d

Re: [PATCH v1] Add -ftime-report-wall

2024-10-03 Thread Andi Kleen
> The only consumer I know of for the JSON time report data is in the
> integration tests I wrote for -fanalyzer, which assumes that all fields
> are present when printing, and then goes on to use the "user" times for
> summarizing; see this commit FWIW:
> https://github.com/davidmalcolm/gcc-analyzer-integration-tests/commit/5420ce968e6eae886e61486555b54fd460e0d35f

It seems to be broken even without my changes:


% ./gcc/cc1plus -ftime-report -fdiagnostics-format=sarif-file ../tsrc/tramp3d-v4.i
cc1plus: internal compiler error: Segmentation fault
0x27206ee internal_error(char const*, ...)
../../gcc/gcc/diagnostic-global-context.cc:517
0x133401f crash_signal
../../gcc/gcc/toplev.cc:321
0x27e7934 htab_hash_string
../../gcc/libiberty/hashtab.c:838
0x2715dde string_hash::hash(char const*)
../../gcc/gcc/hash-traits.h:239
0x2715dde simple_hashmap_traits, 
sarif_artifact*>::hash(char const* const&)
../../gcc/gcc/hash-map-traits.h:50
0x2715dde hash_map, sarif_artifact*> 
>::get(char const* const&)
../../gcc/gcc/hash-map.h:191
0x2715dde ordered_hash_map, sarif_artifact*> 
>::get(char const* const&)
../../gcc/gcc/ordered-hash-map.h:76
0x2715dde sarif_builder::get_or_create_artifact(char const*, 
diagnostic_artifact_role, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:2892
0x2716403 sarif_output_format::sarif_output_format(diagnostic_context&, 
line_maps const*, char const*, bool)
../../gcc/gcc/diagnostic-format-sarif.cc:3154
0x2716403 
sarif_file_output_format::sarif_file_output_format(diagnostic_context&, 
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3193
0x2716403 std::enable_if::value, 
std::unique_ptr > >::type 
make_unique(diagnostic_context&, line_maps const*&, char 
const*&, bool&, char const*&)
../../gcc/gcc/make-unique.h:41
0x2716403 diagnostic_output_format_init_sarif_file(diagnostic_context&, 
line_maps const*, char const*, bool, char const*)
../../gcc/gcc/diagnostic-format-sarif.cc:3392
0x26f0522 common_handle_option(gcc_options*, gcc_options*, cl_decoded_option 
const*, unsigned int, int, unsigned int, cl_option_handlers const*, 
diagnostic_context*, void (*)())
../../gcc/gcc/opts.cc:2968
0x26f5728 handle_option
../../gcc/gcc/opts-common.cc:1316
0x26f585e read_cmdline_option(gcc_options*, gcc_options*, cl_decoded_option*, 
unsigned int, unsigned int, cl_option_handlers const*, diagnostic_context*)
../../gcc/gcc/opts-common.cc:1646
0x120f194 read_cmdline_options
../../gcc/gcc/opts-global.cc:242
0x120f194 decode_options(gcc_options*, gcc_options*, cl_decoded_option*, 
unsigned int, unsigned int, diagnostic_context*, void (*)())
../../gcc/gcc/opts-global.cc:329
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.


[PATCH v6] RISC-V: Implement __init_riscv_feature_bits, __riscv_feature_bits, and __riscv_vendor_feature_bits

2024-10-03 Thread Yangyu Chen
From: Kito Cheng 

This provides a common abstraction layer to probe the available extensions at
run-time. These functions can be used to implement function multi-versioning or
to detect available extensions.

The advantages of providing this abstraction layer are:
- Easy to port to other new platforms.
- Easier to maintain in GCC for function multi-versioning.
  - For example, maintaining platform-dependent code in C code/libgcc is much
easier than maintaining it in GCC by creating GIMPLEs...

This API is intended to provide the capability to query minimal common 
available extensions on the system.

Proposal in riscv-c-api-doc: 
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/74

Full function multi-versioning implementation will come later. We are posting
this first because we intend to backport it to the GCC 14 branch to unblock
LLVM 19 to use this with GCC 14.2, rather than waiting for GCC 15.

Changes since v5:
- Minor fixes on indentation.

Changes since v4:
- Bump to newest riscv-c-api-doc with some new extensions like Zve*, Zc*,
  Zimop, Zcmop, Zawrs.
- Rename the return variable name of hwprobe syscall.
- Minor fixes on indentation.

Changes since v3:
- Fix non-linux build.
- Let __init_riscv_feature_bits become a constructor.

Changes since v2:
- Prevent it from initializing more than once.

Changes since v1:
- Fix the format.
- Prevented race conditions by introducing a local variable to avoid load/store
  operations during the computation of the feature bit.
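
The "compute into a local, then store once" shape mentioned above can be sketched as follows. ToyFeatureBits, g_bits and toy_init_feature_bits are hypothetical names, and the two boolean parameters stand in for whatever the hwprobe syscall reports; this shows only the ordering of the stores and is not the code in feature_bits.c.

```cpp
// Toy mirror of the feature-bits layout, for illustration only.
struct ToyFeatureBits {
  unsigned length;
  unsigned long long features[2];
};

static ToyFeatureBits g_bits;

// Hypothetical initialiser: accumulates into a local array and writes
// the shared object only at the end, with the length stored last.
void toy_init_feature_bits(bool has_v, bool has_zba) {
  if (g_bits.length != 0) return;        // already initialised
  unsigned long long local[2] = {0, 0};  // accumulate privately
  if (has_v) local[0] |= 1ULL << 21;
  if (has_zba) local[0] |= 1ULL << 27;
  g_bits.features[0] = local[0];
  g_bits.features[1] = local[1];
  g_bits.length = 2;                     // publish last
}
```

A second call leaves the stored bits untouched, matching the "initialize only once" change from v2.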

libgcc/ChangeLog:

* config/riscv/feature_bits.c: New.
* config/riscv/t-elf (LIB2ADD): Add feature_bits.c.

Co-Developed-by: Yangyu Chen 
Signed-off-by: Yangyu Chen 
---
 libgcc/config/riscv/feature_bits.c | 364 +
 libgcc/config/riscv/t-elf  |   1 +
 2 files changed, 365 insertions(+)
 create mode 100644 libgcc/config/riscv/feature_bits.c

diff --git a/libgcc/config/riscv/feature_bits.c b/libgcc/config/riscv/feature_bits.c
new file mode 100644
index 000..c5339f065c1
--- /dev/null
+++ b/libgcc/config/riscv/feature_bits.c
@@ -0,0 +1,364 @@
+/* Helper function for function multi-versioning for RISC-V.
+
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define RISCV_FEATURE_BITS_LENGTH 2
+struct {
+  unsigned length;
+  unsigned long long features[RISCV_FEATURE_BITS_LENGTH];
+} __riscv_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
+
+#define RISCV_VENDOR_FEATURE_BITS_LENGTH 1
+
+struct {
+  unsigned vendorID;
+  unsigned length;
+  unsigned long long features[RISCV_VENDOR_FEATURE_BITS_LENGTH];
+} __riscv_vendor_feature_bits __attribute__ ((visibility ("hidden"), nocommon));
+
+#define A_GROUPID 0
+#define A_BITMASK (1ULL << 0)
+#define C_GROUPID 0
+#define C_BITMASK (1ULL << 2)
+#define D_GROUPID 0
+#define D_BITMASK (1ULL << 3)
+#define F_GROUPID 0
+#define F_BITMASK (1ULL << 5)
+#define I_GROUPID 0
+#define I_BITMASK (1ULL << 8)
+#define M_GROUPID 0
+#define M_BITMASK (1ULL << 12)
+#define V_GROUPID 0
+#define V_BITMASK (1ULL << 21)
+#define ZACAS_GROUPID 0
+#define ZACAS_BITMASK (1ULL << 26)
+#define ZBA_GROUPID 0
+#define ZBA_BITMASK (1ULL << 27)
+#define ZBB_GROUPID 0
+#define ZBB_BITMASK (1ULL << 28)
+#define ZBC_GROUPID 0
+#define ZBC_BITMASK (1ULL << 29)
+#define ZBKB_GROUPID 0
+#define ZBKB_BITMASK (1ULL << 30)
+#define ZBKC_GROUPID 0
+#define ZBKC_BITMASK (1ULL << 31)
+#define ZBKX_GROUPID 0
+#define ZBKX_BITMASK (1ULL << 32)
+#define ZBS_GROUPID 0
+#define ZBS_BITMASK (1ULL << 33)
+#define ZFA_GROUPID 0
+#define ZFA_BITMASK (1ULL << 34)
+#define ZFH_GROUPID 0
+#define ZFH_BITMASK (1ULL << 35)
+#define ZFHMIN_GROUPID 0
+#define ZFHMIN_BITMASK (1ULL << 36)
+#define ZICBOZ_GROUPID 0
+#define ZICBOZ_BITMASK (1ULL << 37)
+#define ZICOND_GROUPID 0
+#define ZICOND_BITMASK (1ULL << 38)
+#define ZIHINTNTL_GROUPID 0
+#define ZIHINTNTL_BITMASK (1ULL << 39)
+#define ZIHINTPAUSE_GROUPID 0
+#define ZIHINTPAUSE_BITMASK (1ULL << 40)
+#define ZKND_GROUPID 0
+#define ZKND_BITMASK (1ULL << 41)
+#define ZKNE_GROUPID 0
+#define ZKNE_BITMASK (1ULL << 42)
+#define ZKNH