Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
On 11/12/2024 09:54, Tamar Christina wrote: -Original Message- From: Richard Sandiford Sent: Wednesday, December 11, 2024 9:50 AM To: Tamar Christina Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw ; ktkac...@gcc.gnu.org Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it Tamar Christina writes: -Original Message- From: Richard Sandiford Sent: Wednesday, December 11, 2024 9:32 AM To: Tamar Christina Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw ; ktkac...@gcc.gnu.org Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it Tamar Christina writes: Hi All, GCC 15 added two new fusions CMP+CSEL and CMP+CSET. This patch enables them for cores that support based on their Software Optimization Guides and generically on Armv9-A. Even if a core does not support it there's no negative performance impact. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/cortexx925.h: Use it. * config/aarch64/tuning_models/generic_armv9_a.h: Use it. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. --- diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def index f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 df9ae7a196031286d 100644 --- a/gcc/config/aarch64/aarch64-fusion-pairs.def +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) /* Baseline fusion settings suitable for all cores. 
*/ #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) +/* Baseline fusion settings suitable for all Neoverse cores. */ +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL \ + | AARCH64_FUSE_CMP_CSET) + #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 def277230e193d20f 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 10, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h index 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 5a5df7803ff4dc6c 100644 --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think we have to be careful about using it for generic tuning. generic-armv9-a mustn't become "generic Neoverse", since it's supposed to be good for non-Neoverse (and non-Arm) Armv9-A cores as well. So perhaps here we should expand the macro. 
Alternatively, we could add a comment saying that fusion macros for other CPUs can be added here as well, if they are unlikely to have a negative impact on other cores. Perhaps we should expand the macro for cortex-x925 as well. Sorry, that certainly was not the intention. I thought about naming it AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for non-Armv9 cores I thought this would be equally confusing. But perhaps in light of your comments it makes more sense naming-wise? Yeah, I'd also wondered about suggesting that, but then saw that there were quite a few Armv9-A cores that don't have the flag. So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we should use for the Neoverse tunings. But perhaps we should expand the macro for the two files above. If anyone adds any new _FUSE fl
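Stepping back from the naming question, the composition being discussed is plain bitmask arithmetic. A minimal stand-alone sketch (the flag values here are illustrative only, not GCC's actual AARCH64_FUSION_PAIR encodings):

```cpp
#include <cassert>

/* Illustrative fusion flags; the real values come from
   aarch64-fusion-pairs.def and differ from these.  */
enum {
  FUSE_CMP_BRANCH = 1 << 0,
  FUSE_AES_AESMC  = 1 << 1,
  FUSE_CMP_CSEL   = 1 << 2,
  FUSE_CMP_CSET   = 1 << 3,
};

/* Baseline fusion settings suitable for all cores.  */
#define FUSE_BASE (FUSE_CMP_BRANCH | FUSE_AES_AESMC)

/* Neoverse baseline: the base set plus the two fusions new in GCC 15.
   Expanding this macro in a tuning file is equivalent to OR-ing the
   individual flags by hand, which is what the review suggests doing
   for the generic-armv9-a and cortex-x925 tunings.  */
#define FUSE_NEOVERSE_BASE (FUSE_BASE | FUSE_CMP_CSEL | FUSE_CMP_CSET)
```

Because the tunings are plain ORs of these bits, expanding the macro in a tuning file changes nothing in the generated compiler; it only changes what the source implies about which cores the setting is "for".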
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > This was approved in Wrocław as LWG 3900, so that passing an incorrect > argument intended as an allocator will be ill-formed, instead of > silently ignored. > > This also renames the template parameters and function parameters for > the allocators, to align with the names in the standard. I found it too > confusing to have a parameter _Alloc which doesn't correspond to Alloc > in the standard. Rename _Alloc to _Allocator (which the standard calls > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > libstdc++-v3/ChangeLog: > > * include/std/generator (_Promise_alloc): Rename template > parameter. Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Replace constraints with a > static_assert in the body. Rename allocator parameter. > (_Promise_alloc::_M_allocate): Rename allocator parameter. > Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Rename allocator > parameter. > * testsuite/24_iterators/range_generators/alloc.cc: New test. > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > --- > > Tested x86_64-linux. > > libstdc++-v3/include/std/generator| 55 +-- > .../24_iterators/range_generators/alloc.cc| 45 +++ > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > 3 files changed, 87 insertions(+), 29 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc LGTM, with one suggestion below. 
> diff --git a/libstdc++-v3/include/std/generator > b/libstdc++-v3/include/std/generator > index 0a14e70064e..16b953e90af 100644 > --- a/libstdc++-v3/include/std/generator > +++ b/libstdc++-v3/include/std/generator > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept _Stateless_alloc = > (allocator_traits<_All>::is_always_equal::value > && default_initializable<_All>); > > -template > +template >class _Promise_alloc >{ > - using _ATr = allocator_traits<_Alloc>; > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename _ATr > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > static_assert(is_pointer_v, > "Must use allocators for true pointers with generators"); > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >public: > void* > operator new(std::size_t __sz) > - requires default_initializable<_Rebound> // _Alloc is non-void > + requires default_initializable<_Rebound> // _Allocator is non-void > { return _M_allocate({}, __sz); } > > - template > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > + // 3900. The allocator_arg_t overloads of promise_type::operator new > + // should not be constrained > + template > void* > operator new(std::size_t __sz, > - allocator_arg_t, const _Na& __na, > + allocator_arg_t, const _Alloc& __a, >const _Args&...) > - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > - __sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > - template > + template > void* > operator new(std::size_t __sz, >const _This&, > - allocator_arg_t, const _Na& __na, > + allocator_arg_t, const _Alloc& __a, >const _Args&...) 
> - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > - __sz); > + static_assert(convertible_to); It might be worth giving a suggestion here? > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > void > @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > } > > - template > + template > static void* > - _M_allocate(const _Na& __na, std::size_t __csz) > + _M_allocate(const _Alloc& __a, std::size_t __csz) > { > - using _Rebound = typename std::allocator_traits<_Na> > - ::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename std::allocator_traits<_Na> > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, > "Must use allocators for true pointers with > generators"); > > _Dea
[patch,avr] Add new 24-bit address space __flashx
This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of 24-bit space __memx, except that the output routines are simpler and more efficient. Loads are emitted inline when ELPMX or LPMX is available. The address space uses 24-bit addresses even on devices with a program memory size of 64 KiB or less.

Passes without new regressions. Also passes without new regressions when used as AS in the proposed new target hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA, cf. https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671216.html

Ok for trunk?

Johann

--

AVR: Add __flashx as 24-bit named address space.

This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of 24-bit space __memx, except that the output routines are simpler and more efficient. Loads are emitted inline when ELPMX or LPMX is available. The address space uses 24-bit addresses even on devices with a program memory size of 64 KiB or less.

gcc/
	* doc/extend.texi (AVR Named Address Spaces): Document __flashx.
	* config/avr/avr.h (ADDR_SPACE_FLASHX): New enum value.
	* config/avr/avr-protos.h (avr_out_fload, avr_mem_flashx_p)
	(avr_fload_libgcc_p, avr_load_libgcc_mem_p)
	(avr_load_libgcc_insn_p): New.
	* config/avr/avr.cc (avr_addrspace): Add ADDR_SPACE_FLASHX.
	(avr_decl_flashx_p, avr_mem_flashx_p, avr_fload_libgcc_p)
	(avr_load_libgcc_mem_p, avr_load_libgcc_insn_p, avr_out_fload):
	New functions.
	(avr_adjust_insn_length) [ADJUST_LEN_FLOAD]: Handle case.
	(avr_progmem_p) [avr_decl_flashx_p]: Return 2.
	(avr_addr_space_legitimate_address_p) [ADDR_SPACE_FLASHX]: Has
	same behavior as ADDR_SPACE_MEMX.
	(avr_addr_space_convert): Use pointer sizes rather than ASes.
	(avr_addr_space_contains): New function.
	(avr_convert_to_type): Use it.
	(avr_emit_cpymemhi): Handle ADDR_SPACE_FLASHX.
	* config/avr/avr.md (adjust_len) : New attr value.
	(gen_load_libgcc): Renamed from load_libgcc.
	(xload8_A): Iterate over MOVMODE rather than over ALL1.
	(fxmov_A): New from xloadv_A.
	(xmov_8): New from xload_A.
	(fmov): New insns.
	(fxload_A): New from xload_A.
	(fxload__libgcc): New from xload__libgcc.
	(*fxload__libgcc): New from *xload__libgcc.
	(mov) [avr_mem_flashx_p]: Handle ADDR_SPACE_FLASHX.
	(cpymemx_): Make sure the address space is not lost when
	splitting.
	(*cpymemx_) [ADDR_SPACE_FLASHX]: Use __movmemf_ for asm.
	(*ashlqi.1.zextpsi_split): New combine pattern.
	* config/avr/predicates.md (nox_general_operand): Don't match
	when avr_mem_flashx_p is true.
	* config/avr/avr-passes.cc (AVR_LdSt_Props): ADDR_SPACE_FLASHX
	has no post_inc.

gcc/testsuite/
	* gcc.target/avr/torture/addr-space-1.h [AVR_HAVE_ELPM]: Use a
	function to bump .progmemx.data to a high address.
	* gcc.target/avr/torture/addr-space-2.h: Same.
	* gcc.target/avr/torture/addr-space-1-fx.c: New test.
	* gcc.target/avr/torture/addr-space-2-fx.c: New test.

libgcc/
	* config/avr/t-avr (LIB1ASMFUNCS): Add _fload_1, _fload_2,
	_fload_3, _fload_4, _movmemf.
	* config/avr/lib1funcs.S (.branch_plus): New .macro.
	(__xload_1, __xload_2, __xload_3, __xload_4): When the address
	is located in flash, then forward to...
	(__fload_1, __fload_2, __fload_3, __fload_4): ...these new
	functions, respectively.
	(__movmemx_hi): When the address is located in flash, forward
	to...
	(__movmemf_hi): ...this new function.
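For readers unfamiliar with the AVR named address spaces, a hypothetical usage sketch, written by analogy with the existing __memx space (target-specific avr-gcc code, not runnable on a host and not checked against the final patch):

```c
/* Hypothetical avr-gcc usage sketch: __flashx places the object in
   .progmemx.data and reads it through a 24-bit flash address, much
   like __memx but with simpler, flash-only load routines.  */
const __flashx char msg[] = "stored in .progmemx.data";

char
read_first (const __flashx char *p)
{
  /* On devices with ELPMX/LPMX the load is emitted inline;
     otherwise a libgcc __fload_N helper is called.  */
  return *p;
}
```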
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 12:17, Arsen Arsenović wrote: > > Jonathan Wakely writes: > > > This was approved in Wrocław as LWG 3900, so that passing an incorrect > > argument intended as an allocator will be ill-formed, instead of > > silently ignored. > > > > This also renames the template parameters and function parameters for > > the allocators, to align with the names in the standard. I found it too > > confusing to have a parameter _Alloc which doesn't correspond to Alloc > > in the standard. Rename _Alloc to _Allocator (which the standard calls > > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > > > libstdc++-v3/ChangeLog: > > > > * include/std/generator (_Promise_alloc): Rename template > > parameter. Use __alloc_rebind to rebind allocator. > > (_Promise_alloc::operator new): Replace constraints with a > > static_assert in the body. Rename allocator parameter. > > (_Promise_alloc::_M_allocate): Rename allocator parameter. > > Use __alloc_rebind to rebind allocator. > > (_Promise_alloc::operator new): Rename allocator > > parameter. > > * testsuite/24_iterators/range_generators/alloc.cc: New test. > > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > > --- > > > > Tested x86_64-linux. > > > > libstdc++-v3/include/std/generator| 55 +-- > > .../24_iterators/range_generators/alloc.cc| 45 +++ > > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > > 3 files changed, 87 insertions(+), 29 deletions(-) > > create mode 100644 > > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > > create mode 100644 > > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc > > LGTM, with one suggestion below. 
> > > diff --git a/libstdc++-v3/include/std/generator > > b/libstdc++-v3/include/std/generator > > index 0a14e70064e..16b953e90af 100644 > > --- a/libstdc++-v3/include/std/generator > > +++ b/libstdc++-v3/include/std/generator > > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > concept _Stateless_alloc = > > (allocator_traits<_All>::is_always_equal::value > > && default_initializable<_All>); > > > > -template > > +template > >class _Promise_alloc > >{ > > - using _ATr = allocator_traits<_Alloc>; > > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > > - using _Rebound_ATr = typename _ATr > > - ::template rebind_traits<_Alloc_block>; > > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, > > "Must use allocators for true pointers with > > generators"); > > > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >public: > > void* > > operator new(std::size_t __sz) > > - requires default_initializable<_Rebound> // _Alloc is non-void > > + requires default_initializable<_Rebound> // _Allocator is non-void > > { return _M_allocate({}, __sz); } > > > > - template > > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > > + // 3900. The allocator_arg_t overloads of promise_type::operator new > > + // should not be constrained > > + template > > void* > > operator new(std::size_t __sz, > > - allocator_arg_t, const _Na& __na, > > + allocator_arg_t, const _Alloc& __a, > >const _Args&...) > > - requires convertible_to > > { > > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > > - __sz); > > + static_assert(convertible_to); > > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > > } > > > > - template > > + template > > void* > > operator new(std::size_t __sz, > >const _This&, > > - allocator_arg_t, const _Na& __na, > > + allocator_arg_t, const _Alloc& __a, > >const _Args&...) 
> > - requires convertible_to > > { > > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > > - __sz); > > + static_assert(convertible_to); > > It might be worth giving a suggestion here? Like this? /home/jwakely/gcc/15/include/c++/15.0.0/generator:478:25: error: static assertion failed: the argument that follows std::allocator_arg must be convertible to the generator's allocator type 478 | static_assert(convertible_to, | ^ Can we be more precise than "the argument"? Is "the coroutine argument" accurate? The function that the argument is passed to is a coroutine, so I think that's correct. > > > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > > } > > > > void > > @@ -5
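The behavioral difference that LWG 3900 fixes can be seen in miniature with a plain overload set (a hedged sketch; MyAlloc, NotAnAlloc, op_new, and trace are invented names for illustration, not libstdc++ entities):

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct MyAlloc {};     // stands in for the generator's allocator type
struct NotAnAlloc {};  // an argument mistakenly passed as an allocator

std::string trace;

// Pre-LWG-3900 shape: the allocator overload is *constrained*, so an
// argument that is not convertible to the allocator type silently
// falls through to the catch-all overload instead of being diagnosed.
template<typename A,
         typename std::enable_if<
           std::is_convertible<const A&, MyAlloc>::value, int>::type = 0>
void op_new(const A&) { trace = "allocator used"; }

void op_new(...) { trace = "argument silently ignored"; }

// Post-LWG-3900 shape: leave the overload unconstrained and check in
// the body, so the mistake becomes a hard compile-time error
// (uncomment a call with NotAnAlloc to see the static_assert fire):
//
// template<typename A>
// void op_new_checked(const A&)
// {
//   static_assert(std::is_convertible<const A&, MyAlloc>::value,
//                 "the argument that follows std::allocator_arg must be "
//                 "convertible to the generator's allocator type");
// }
```

The patch moves std::generator's operator new overloads from the first shape to the second, which is why the diagnostic wording above matters: the static_assert message is the only thing the user sees.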
[PATCH] ifcombine field-merge: saturate align at inner object size
On Dec 10, 2024, Richard Biener wrote: > On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva wrote: >> Regstrapped on x86_64-linux-gnu. The aarch64 CI had reported a >> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu, >> but it will still be a while. Ok to install if it passes? > OK. Thanks. The aarch64-linux-gnu bootstrap revealed a latent surprise, in the form of a bootstrap failure, that this incremental patch addresses. I'll hold off from installing the larger patch without a fix for this. A bootstrap on aarch64-linux-gnu revealed that sometimes (for example, when building shorten_branches in final.cc) we will find such things as MEM <unsigned int>, where unsigned int happens to be a variant of the original unsigned int type, given 64-bit alignment. This unusual alignment circumstance caused get_best_mode to choose DImode instead of SImode, and that failed gimple verification because there aren't that many bits in the unsigned int object. Arrange for alignment to saturate at the inner object size to avoid tripping this error. Regstrapping on x86_64-linux-gnu. Regstrapped an earlier version without lr_align on aarch64-linux-gnu. Ok to install? for gcc/ChangeLog * gimple-fold.cc (fold_truth_andor_for_ifcombine): Saturate align at inner object size. --- gcc/gimple-fold.cc | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index a31fc283d51b0..967356a950192 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -8204,24 +8204,31 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, to be relative to a field of that size. */ first_bit = MIN (ll_bitpos, rl_bitpos); end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); + /* Guard from types with wider-than-size alignment. We must not widen the + load beyond its total size. This is rare. 
*/ + while (ll_align > BITS_PER_UNIT +&& TYPE_SIZE (TREE_TYPE (ll_inner)) +&& uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))) +&& known_gt ((unsigned HOST_WIDE_INT)ll_align, + tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner) +ll_align /= 2; if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, -TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, -volatilep, &lnmode)) +ll_align, BITS_PER_WORD, volatilep, &lnmode)) l_split_load = false; else { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. */ - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); HOST_WIDE_INT boundary = compute_split_boundary_from_align - (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); + (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); if (boundary < 0 || !get_best_mode (boundary - first_bit, first_bit, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode) +ll_align, BITS_PER_WORD, volatilep, &lnmode) || !get_best_mode (end_bit - boundary, boundary, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode2)) +ll_align, BITS_PER_WORD, volatilep, &lnmode2)) return 0; /* If we can't have a single load, but can with two, figure out whether @@ -8368,16 +8375,24 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, and then we use two separate compares. */ first_bit = MIN (lr_bitpos, rr_bitpos); end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); + HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner)); + /* Guard from types with wider-than-size alignment. We must not widen the +load beyond its total size. This is rare. 
*/ + while (lr_align > BITS_PER_UNIT +&& TYPE_SIZE (TREE_TYPE (lr_inner)) +&& uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))) +&& known_gt ((unsigned HOST_WIDE_INT)lr_align, + tree_to_poly_uint64 (TYPE_SIZE + (TREE_TYPE (lr_inner) + lr_align /= 2; if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD, - volatilep, &rnmode)) + lr_align, BITS_PER_WORD, volatilep, &rnmode)) { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. */ - HOST_WIDE_INT align = TYPE_ALIGN (TRE
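Stripped of the tree accessors, the saturation loop added above is just halving the alignment until it no longer exceeds the object's size. A stand-alone sketch (saturate_align is an invented name for illustration):

```cpp
#include <cassert>

/* Halve an over-wide alignment (in bits) until it no longer exceeds
   the object's total size in bits, stopping at byte alignment.  This
   mirrors the guard added before the get_best_mode calls: it keeps
   get_best_mode from picking a mode (e.g. DImode for an over-aligned
   unsigned int) that would widen a load past the object.  */
static unsigned
saturate_align (unsigned align_bits, unsigned size_bits)
{
  while (align_bits > 8 /* BITS_PER_UNIT */ && align_bits > size_bits)
    align_bits /= 2;
  return align_bits;
}
```

For the case from final.cc: a 64-bit-aligned variant of unsigned int saturates from 64 down to 32, so get_best_mode chooses SImode again.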
RE: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:18 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors > > Hi All, > > This patch adds support for vector constructor from two partial SVE vectors > into > a full SVE vector. It also implements support for the standard vec_init obtab > to > do this. > > gcc/ChangeLog: > > PR target/96342 > * config/aarch64/aarch64-sve.md (vec_init): New. > (@aarch64_pack_partial): New. > * config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init): Special > case constructors of two vectors. > * config/aarch64/iterators.md (SVE_NO2E, SVE_PARTIAL_NO2E): New. > (VHALF, Vhalf, Vwstype): Add SVE partial vectors. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * gcc.target/aarch64/vect-simd-clone-2.c: New test. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > --- > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > index > 9afd11d347626eeb640722fdba2ab763b8479aa7..9e3577be6e943d7a5c95119 > 6463873d4bcfee07c 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -2840,6 +2840,16 @@ (define_expand "vec_init" >} > ) > > +(define_expand "vec_init" > + [(match_operand:SVE_NO2E 0 "register_operand") > +(match_operand 1 "")] > + "TARGET_SVE" > + { > +aarch64_sve_expand_vector_init (operands[0], operands[1]); > +DONE; > + } > +) > + > ;; Shift an SVE vector left and insert a scalar into element 0. > (define_insn "vec_shl_insert_" >[(set (match_operand:SVE_FULL 0 "register_operand") > @@ -9347,6 +9357,20 @@ (define_insn "vec_pack_trunc_" >"uzp1\t%0., %1., %2." 
> ) > > +;; Integer partial pack packing two partial SVE types into a single full SVE > +;; type of the same element type. Use UZP1 on the wider type, which discards > +;; the high part of each wide element. This allows to concat SVE partial > types > +;; into a wider vector. > +(define_insn "@aarch64_pack_partial" > + [(set (match_operand:SVE_PARTIAL_NO2E 0 "register_operand" "=w") > + (unspec:SVE_PARTIAL_NO2E > + [(match_operand: 1 "register_operand" "w") > +(match_operand: 2 "register_operand" "w")] > + UNSPEC_PACK))] > + "TARGET_SVE" > + "uzp1\t%0., %1., %2." > +) > + > ;; - > ;; [INT<-INT] Unpacks > ;; - > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > af6fede102c2be6673c24f8020d000ea56322997..690d54b0a2954327e00d559f > 96c414c81c2e18cd 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24790,6 +24790,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx > vals) > v.quick_push (XVECEXP (vals, 0, i)); >v.finalize (); > > + /* If we have two elements and are concatting vector. */ > + machine_mode elem_mode = GET_MODE (v.elt (0)); > + if (nelts == 2 && VECTOR_MODE_P (elem_mode)) > +{ > + /* We've failed expansion using a dup. Try using a cheeky truncate. */ > + rtx arg0 = force_reg (elem_mode, v.elt(0)); > + rtx arg1 = force_reg (elem_mode, v.elt(1)); > + emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1)); > + return; > +} > + >/* If neither sub-vectors of v could be initialized specially, > then use INSR to insert all elements from v into TARGET. > ??? This might not be optimal for vectors with large > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index > 023893d35f3e955e222c322ce370e84c95c29ee6..77d23d6ad795630d3d5fb5c0 > 76c086a479d46fee 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -138,6 +138,14 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI]) > ;; VQ without 2 element modes. 
> (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF]) > > +;; SVE modes without 2 element modes. > +(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI > VNx8HF > + VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF]) > + > +;; Partial SVE modes without 2 element modes. > +(define_mode_iterator SVE_PARTIAL_NO2E [VNx8QI VNx4QI VNx4HI > + VNx4HF VNx8BF VNx4BF]) > + > ;; 2 element quad vector modes. > (define_mode_iterator VQ_2E [V2DI V2DF]) > > @@ -1678,7 +1686,15 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI > "V8QI") >(V2DI "DI")(V2SF "SF") >(V4SF "V2SF") (V4HF "V2HF") >(V8HF "V4HF") (V2DF "DF") > -
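The UZP1 trick in the aarch64_pack_partial pattern above can be modeled with scalars: a partial SVE vector keeps its payload in the low (even-indexed) halves of wider containers, and UZP1 keeps exactly the even-indexed elements of its two operands. A hedged scalar model (not the architectural instruction definition):

```cpp
#include <cassert>
#include <stdint.h>

/* Scalar model of SVE UZP1 at the narrow element size: the result is
   the even-indexed elements of A followed by the even-indexed elements
   of B.  Viewing each partial SVE vector through the narrow element
   type, the payload sits at the even indices (the low half of each
   wide container), so UZP1 extracts the payloads of both partial
   vectors and concatenates them into one full vector.  */
static void
uzp1_model (const uint16_t *a, const uint16_t *b, uint16_t *out, int n)
{
  for (int i = 0; i < n / 2; ++i)
    {
      out[i] = a[2 * i];          /* payloads of the first partial  */
      out[n / 2 + i] = b[2 * i];  /* payloads of the second partial */
    }
}
```

The odd-indexed (high-half) elements are discarded, which is why the pattern can ignore whatever garbage the containers' high parts hold.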
RE: [PATCH 3/7]AArch64: Disable `omp declare variant' tests for aarch64 [PR96342]
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:17 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 3/7]AArch64: Disable `omp declare variant' tests for aarch64 > [PR96342] > > Hi All, > > These tests are x86 specific and shouldn't be run for aarch64. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 > target > only test. > * gfortran.dg/gomp/declare-variant-14.f90: Likewise. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > --- > diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > index > e3668893afe33a58c029cddd433d9bf43cce2bfa..2b71869787e819dc7bb8ca8f9 > 512792ac2877515 100644 > --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > @@ -1,6 +1,6 @@ > -/* { dg-do compile { target vect_simd_clones } } */ > +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } > } } */ > /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */ > -/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } > */ > +/* { dg-additional-options "-mno-sse3" } */ > > int f01 (int); > int f02 (int); > @@ -15,15 +15,13 @@ int > test1 (int x) > { >/* At gimplification time, we can't decide yet which function to call. */ > - /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { > !aarch64*- > *-* } } } } */ > + /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */ >/* After simd clones are created, the original non-clone test1 shall > call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones > shall call f01 with score 8. 
*/ >/* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */ > - /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { > !aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { > aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { > !aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { > aarch64*-*-* } } } } */ > + /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */ > + /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */ >int a = f04 (x); >int b = f04 (x); >return a + b; > diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > index > 6319df0558f37b95f1b2eb17374bdb4ecbc33295..8db341fd15306a5deeae1468 > 08d7ef55aa713bb1 100644 > --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > @@ -1,6 +1,6 @@ > -! { dg-do compile { target vect_simd_clones } } > +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } > } */ > ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" } > -! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } > +! { dg-additional-options "-mno-sse3" } > > module main >implicit none > @@ -40,10 +40,8 @@ contains > ! call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones > ! shall call f01 with score 8. > ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { > !aarch64*-*-* } } } } > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { > aarch64*-*-* } } } } > -! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { > !aarch64*-*-* } } } } > -! 
{ dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { > aarch64*-*-* } } } } > +! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } > +! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } > a = f04 (x) > b = f04 (x) > test1 = a + b > > > > > --
RE: [PATCH 2/7]AArch64: Add SVE support for simd clones [PR96342]
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:17 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 2/7]AArch64: Add SVE support for simd clones [PR96342] > > Hi All, > > This patch finalizes adding support for the generation of SVE simd clones when > no simdlen is provided, following the ABI rules where the widest data type > determines the minimum amount of elements in a length agnostic vector. > > gcc/ChangeLog: > > PR target/96342 > * config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare. > * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make > visibility global and support use for non_acle types. > * config/aarch64/aarch64.cc > (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd > clone > when no simdlen is provided, according to ABI rules. > (simd_clone_adjust_sve_vector_type): New helper function. > (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones > and modify types to use SVE types. > * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen. > (simd_clone_adjust): Adapt safelen check to be compatible with VLA > simdlen. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan. > * gcc.target/aarch64/vect-simd-clone-1.c: New test. > * g++.target/aarch64/vect-simd-clone-1.c: New test. > > > Co-authored-by: Victor Do Nascimento > Co-authored-by: Tamar Christina > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? 
> > Thanks, > Tamar > > --- > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64- > protos.h > index > c6ce62190bce43fae7b0c9d64202a7c042df6ef4..e7724e0518dd97a120edbc5f0 > 2b20298a57c653f 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1138,6 +1138,8 @@ namespace aarch64_sve { > #ifdef GCC_TARGET_H >bool verify_type_context (location_t, type_context_kind, const_tree, bool); > #endif > + void add_sve_type_attribute (tree, unsigned int, unsigned int, > + const char *, const char *); > } > > extern void aarch64_split_combinev16qi (rtx operands[3]); > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > index > 0fec1cd439e729dca495aac4dea054a25ede20a7..e6c2bdeb00681848a838009c > 833cfe3271a94049 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -998,14 +998,16 @@ static GTY(()) hash_map > *overload_names[2]; > /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE > vectors > and NUM_PR SVE predicates. MANGLED_NAME, if nonnull, is the ABI-defined > mangling of the type. ACLE_NAME is the name of the type. */ > -static void > +void > add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr, > const char *mangled_name, const char *acle_name) > { >tree mangled_name_tree > = (mangled_name ? get_identifier (mangled_name) : NULL_TREE); > + tree acle_name_tree > += (acle_name ? 
get_identifier (acle_name) : NULL_TREE); > > - tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE); > + tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE); >value = tree_cons (NULL_TREE, mangled_name_tree, value); >value = tree_cons (NULL_TREE, size_int (num_pr), value); >value = tree_cons (NULL_TREE, size_int (num_zr), value); > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 4108c09715a5540db87ec4ba74a10804af78054a..af6fede102c2be6673c24f80 > 20d000ea56322997 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -29284,7 +29284,7 @@ > aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, > int num, bool explicit_p) > { >tree t, ret_type; > - unsigned int nds_elt_bits; > + unsigned int nds_elt_bits, wds_elt_bits; >unsigned HOST_WIDE_INT const_simdlen; > >if (!TARGET_SIMD) > @@ -29329,10 +29329,14 @@ > aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >if (TREE_CODE (ret_type) != VOID_TYPE) > { >nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type); > + wds_elt_bits = nds_elt_bits; >vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits)); > } >else > -nds_elt_bits = POINTER_SIZE; > +{ > + nds_elt_bits = POINTER_SIZE; > + wds_elt_bits = 0; > +} > >int i; >tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); > @@ -29340,44 +29344,65 @@ > aarch64_simd_clone_
Re: [PATCH] ifcombine field-merge: saturate align at inner object size
> Am 11.12.2024 um 14:50 schrieb Alexandre Oliva : > > On Dec 10, 2024, Richard Biener wrote: > >>> On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva wrote: >>> Regstrapped on x86_64-linux-gnu. The aarch64 CI had reported a >>> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu, >>> but it will still be a while. Ok to install if it passes? > >> OK. > > Thanks. The aarch64-linux-gnu bootstrap revealed a latent surprise, in > the form of a bootstrap failure, that this incremental patch addresses. > I'll hold off from installing the larger patch without a fix for this. > > > A bootstrap on aarch64-linux-gnu revealed that sometimes (for example, > when building shorten_branches in final.cc) we will find such things > as MEM , where unsigned int happen to be a variant of > the original unsigned int type, given 64-bit alignment. This unusual > alignment circumstance caused get_best_mode to choose DImode instead > of SImode, and that failed gimple verification because there aren't > that many bits in the unsigned int object. > > Arrange for alignment to saturate at the inner object size to avoid > tripping this error. > > Regstrapping on x86_64-linux-gnu. Regstrapped an earlier version > without lr_align on aarch64-linux-gnu. Ok to install? > > > for gcc/ChangeLog > >* gimple-fold.cc (fold_truth_andor_for_ifcombine): Saturate >align at inner object size. > --- > gcc/gimple-fold.cc | 39 +++ > 1 file changed, 27 insertions(+), 12 deletions(-) > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index a31fc283d51b0..967356a950192 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -8204,24 +8204,31 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, > to be relative to a field of that size. 
*/ > first_bit = MIN (ll_bitpos, rl_bitpos); > end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); > + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); > + /* Guard from types with wider-than-size alignment. We must not widen the > + load beyond its total size. This is rare. */ > + while (ll_align > BITS_PER_UNIT > + && TYPE_SIZE (TREE_TYPE (ll_inner)) > + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))) > + && known_gt ((unsigned HOST_WIDE_INT)ll_align, > + tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner) > +ll_align /= 2; > if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, I think These 0, 0 args are supposed to indicate Maximum extent of the resulting Access - but Widening to the largest Mode fitting alignment boundary should be valid, so I Wonder what the failure Mode was? Richard > - TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, > - volatilep, &lnmode)) > + ll_align, BITS_PER_WORD, volatilep, &lnmode)) > l_split_load = false; > else > { > /* Consider the possibility of recombining loads if any of the > fields straddles across an alignment boundary, so that either > part can be loaded along with the other field. */ > - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); > HOST_WIDE_INT boundary = compute_split_boundary_from_align > -(align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); > +(ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); > > if (boundary < 0 > || !get_best_mode (boundary - first_bit, first_bit, 0, 0, > - align, BITS_PER_WORD, volatilep, &lnmode) > + ll_align, BITS_PER_WORD, volatilep, &lnmode) > || !get_best_mode (end_bit - boundary, boundary, 0, 0, > - align, BITS_PER_WORD, volatilep, &lnmode2)) > + ll_align, BITS_PER_WORD, volatilep, &lnmode2)) >return 0; > > /* If we can't have a single load, but can with two, figure out whether > @@ -8368,16 +8375,24 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, > and then we use two separate compares. 
*/ > first_bit = MIN (lr_bitpos, rr_bitpos); > end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); > + HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner)); > + /* Guard from types with wider-than-size alignment. We must not widen > the > + load beyond its total size. This is rare. */ > + while (lr_align > BITS_PER_UNIT > + && TYPE_SIZE (TREE_TYPE (lr_inner)) > + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))) > + && known_gt ((unsigned HOST_WIDE_INT)lr_align, > + tree_to_poly_uint64 (TYPE_SIZE > + (TREE_TYPE (lr_inner) > +lr_align /= 2; > if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, > - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD, > - volatilep, &rnmode)) > + lr_align, BITS_PER_WORD, volatilep, &rnmode)) >{ > /* C
Re: [PATCH] c++: allow stores to anon union vars to change current union member in constexpr [PR117614]
On 12/11/24 4:11 AM, Jakub Jelinek wrote: Hi! Since r14-4771 the FE tries to differentiate between cases where the lhs of a store allows changing the current union member and cases where it doesn't, and cases where it doesn't includes everything that has gone through the cxx_eval_constant_expression path on the lhs. As the testcase shows, DECL_ANON_UNION_VAR_P vars were handled like that too, even when stores to them are the only way how to change the current union member in the sources. So, the following patch just handles that case manually without calling cxx_eval_constant_expression and without setting evaluated to true. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK for trunk and 14. 2024-12-11 Jakub Jelinek PR c++/117614 * constexpr.cc (cxx_eval_store_expression): For stores to DECL_ANON_UNION_VAR_P vars just continue with DECL_VALUE_EXPR of it, without setting evaluated to true or full cxx_eval_constant_expression. * g++.dg/cpp2a/constexpr-union8.C: New test. --- gcc/cp/constexpr.cc.jj 2024-12-06 09:08:20.977872404 +0100 +++ gcc/cp/constexpr.cc 2024-12-09 16:57:23.682959311 +0100 @@ -6418,6 +6418,14 @@ cxx_eval_store_expression (const constex object = probe; else { + tree pvar = tree_strip_any_location_wrapper (probe); + if (VAR_P (pvar) && DECL_ANON_UNION_VAR_P (pvar)) + { + /* Stores to DECL_ANON_UNION_VAR_P var are allowed to change +active union member. 
*/ + probe = DECL_VALUE_EXPR (pvar); + break; + } probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue, non_constant_p, overflow_p); evaluated = true; --- gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C.jj2024-12-09 17:06:02.876721019 +0100 +++ gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C 2024-12-09 17:05:01.832572060 +0100 @@ -0,0 +1,31 @@ +// PR c++/117614 +// { dg-do compile { target c++20 } } + +constexpr int +foo () +{ + union { +int x{0}; +char y; + }; + y = 1; + return y; +} + +constexpr int +bar () +{ + union { +union { + int x{0}; + char y; +}; +long long z; + }; + y = 1; + z = 2; + return z; +} + +static_assert (foo () == 1); +static_assert (bar () == 2); Jakub
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
> Yes, please, I don't have commit access. OK, done. -- Eric Botcazou
Re: [PATCH] ifcombine field-merge: saturate align at inner object size
On Dec 11, 2024, Richard Biener wrote: > Am 11.12.2024 um 14:50 schrieb Alexandre Oliva : >> if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, > I think These 0, 0 args are supposed to indicate Maximum extent of the > resulting Access - but Widening to the largest Mode fitting alignment > boundary should be valid, so I Wonder what the failure Mode was? The generated BIT_FIELD_REF hit this gimple check in tree-cfg.cc: if (!AGGREGATE_TYPE_P (TREE_TYPE (op)) && known_gt (size + bitpos, tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (op) { error ("position plus size exceeds size of referenced object in " "%qs", code_name); return true; } -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: >> + for (auto _ : gen("const This& argument", std::allocator_arg , &mr)) > > I have no idea what the promise_type::operator new overload that takes > a const This& argument is for, but this invocation of gen uses it, so > ... yay? Member coroutines can call that overload: struct foo { std::generator> bar(std::allocator_arg_t, std::allocator) const& { co_yield 123; } }; void f() { foo x; x.bar(std::allocator_arg, {}); } >> +VERIFY(mr.number_of_active_allocations() == 1); >> + >> + VERIFY(mr.number_of_active_allocations() == 0); >> +} >> diff --git a/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> new file mode 100644 >> index 000..957879e8862 >> --- /dev/null >> +++ b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> @@ -0,0 +1,16 @@ >> +// { dg-do compile { target c++23 } } >> + >> +// LWG 3900. allocator_arg_t overloads of generator::promise_type::operator >> new >> +// should not be constrained >> + >> +#include >> +#include >> + >> +std::pmr::generator >> +bar(std::allocator_arg_t, std::pmr::memory_resource& mr) // { dg-error >> "here" } >> +{ >> + co_yield 3; >> +} >> + >> +// { dg-error "static assertion failed" "" { target *-*-* } 0 } >> +// { dg-error "no matching function .*memory_resource&" "" { target *-*-* } >> 0 } >> -- >> 2.47.1 >> -- Arsen Arsenović signature.asc Description: PGP signature
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > Like this? > > /home/jwakely/gcc/15/include/c++/15.0.0/generator:478:25: error: > static assertion failed: the argument that follows std::allocator_arg > must be convertible to the generator's allocator type > 478 | static_assert(convertible_to, > | ^ > > Can we be more precise than "the argument"? Is "the coroutine > argument" accurate? The function that the argument is passed to is a > coroutine, so I think that's correct. Yes, I think 'coroutine argument' is OK - this function is only called for allocating the coroutine state when a coroutine is starting, so it can't be anything else (well, unless the user does the call manually for some reason..). -- Arsen Arsenović
Re: [PATCH] Add COBOL to gcc
I think the term of art is "ping"? If GCC needs something from me to proceed with this, please tell me what it is. --jkl On Thu, 7 Nov 2024 17:28:33 -0500 "James K. Lowden" wrote: > On Fri, 8 Nov 2024 13:52:55 +0100 > Jakub Jelinek wrote: > > > Rather than a diff from /dev/null, > > > it's a blob with the exact file contents. I hope it is correct in > > > this form. > > > > That is just how the web git viewer presents new file commits. > > On gcc-patches those should be posted as normal patches. > > Below is hopefully a well formed patch. It adds ChangeLogs for the > COBOL front end. > > [snip] > From 304f3678dbade1f60abdadb9ddd2baffae88013dpre.patch 4 Oct 2024 > 12:01:22 -0400 From: "James K. Lowden" > Date: Fri 08 Nov 2024 03:30:08 PM EST > Subject: [PATCH] Add 'cobol' to 2 files > > gcc/cobol/ChangeLog [new file with mode: 0644] > libgcobol/ChangeLog [new file with mode: 0644] > > --- > gcc/cobol/ChangeLog | ++- > libgcobol/ChangeLog | ++ > 2 files changed, 12 insertions(+), 2 deletions(-) > diff --git a/gcc/cobol/ChangeLog b/gcc/cobol/ChangeLog > new file mode 100644 > index 000..2988f44a1f1 > --- /dev/null > +++ b/gcc/cobol/ChangeLog > @@ -0,0 +1,6 @@ > +^L > +Copyright (C) 2022 Free Software Foundation, Inc. > + > +Copying and distribution of this file, with or without modification, > +are permitted in any medium without royalty provided the copyright > +notice and this notice are preserved. > diff --git a/libgcobol/ChangeLog b/libgcobol/ChangeLog > new file mode 100644 > index 000..2988f44a1f1 > --- /dev/null > +++ b/libgcobol/ChangeLog > @@ -0,0 +1,6 @@ > +^L > +Copyright (C) 2022 Free Software Foundation, Inc. > + > +Copying and distribution of this file, with or without modification, > +are permitted in any medium without royalty provided the copyright > +notice and this notice are preserved. > [pins]
Re: [PATCH] c++: template-id dependence wrt local static arg [PR117792]
On Wed, 27 Nov 2024, Patrick Palka wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK > for trunk/14/13? Ping. It occurred to me that another way to fix this PR might be to tweak the overload set pruning (for non-dependent calls) to un-specialize the selected function template specialization if it turned out to be dependent: diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index efa748c7dc8..08d05209f44 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -3311,6 +3311,13 @@ finish_call_expr (tree fn, vec **args, bool disallow_virtual, /* Our original callee wasn't wrapped in an ADDR_EXPR, so strip this ADDR_EXPR added by build_over_call. */ sel_fn = TREE_OPERAND (sel_fn, 0); + if (TREE_CODE (sel_fn) == FUNCTION_DECL + && type_dependent_expression_p (sel_fn) + && PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (sel_fn))) + { + sel_fn = ovl_make (DECL_TI_TEMPLATE (sel_fn)); + sel_fn = lookup_template_function (fns, ...); + } orig_fn = sel_fn; } But IMHO this should be impossible in the first place -- if an unresolved overload set is deemed non-dependent then any specialization selected by overload resolution should be deemed non-dependent as well. Or conversely if overload resolution yielded a dependent specialization then the unresolved overload set should be considered dependent. > > -- >8 -- > > Here we end up ICEing at instantiation time for the call to > f ultimately because we wrongly consider the call to be > non-dependent, and so we specialize f ahead of time and then get > confused when fully substituting this specialization. > > The call is dependent due to [temp.dep.temp]/3 and we miss that because > function template-id arguments aren't coerced until overload resolution, > and so the local static template argument lacks an implicit cast to > reference type that value_dependent_expression_p looks for before > considering dependence of the address. 
Other kinds of template-ids aren't > affected since they're coerced ahead of time. > > So when considering dependence of a function template-id, we need to > conservatively consider dependence of the address of each argument (if > applicable). > > PR c++/117792 > > gcc/cp/ChangeLog: > > * pt.cc (type_dependent_expression_p): Consider the dependence > of the address of each template argument of a function > template-id. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp1z/nontype7.C: New test. > --- > gcc/cp/pt.cc | 10 -- > gcc/testsuite/g++.dg/cpp1z/nontype7.C | 22 ++ > 2 files changed, 30 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype7.C > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index 564e368ff43..2f2ec39b083 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -28864,9 +28864,15 @@ type_dependent_expression_p (tree expression) > >if (TREE_CODE (expression) == TEMPLATE_ID_EXPR) > { > - if (any_dependent_template_arguments_p > - (TREE_OPERAND (expression, 1))) > + tree args = TREE_OPERAND (expression, 1); > + if (any_dependent_template_arguments_p (args)) > return true; > + /* Arguments of a function template-id aren't necessarily coerced > + yet so we must conservatively assume that the address (and not > + just value) of the argument matters as per [temp.dep.temp]/3. */ > + for (tree arg : tree_vec_range (args)) > + if (has_value_dependent_address (arg)) > + return true; > expression = TREE_OPERAND (expression, 0); > if (identifier_p (expression)) > return true; > diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype7.C > b/gcc/testsuite/g++.dg/cpp1z/nontype7.C > new file mode 100644 > index 000..b03c643c987 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp1z/nontype7.C > @@ -0,0 +1,22 @@ > +// PR c++/117792 > +// { dg-do compile { target c++17 } } > + > +template > +void f(T) { } > + > +template > +void f(...) 
= delete; > + > +template int v; > + > +template struct A { static const int value = 0; }; > + > +template > +void g() { > + static const int local_static = 0; > + auto x = v; // OK > + auto y = A::value; // OK > + f(0); // ICE > +} > + > +template void g(); > -- > 2.47.1.313.gcc01bad4a9 > >
[PATCH v2] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 14:59, Arsen Arsenović wrote: > > Jonathan Wakely writes: > > >> + for (auto _ : gen("const This& argument", std::allocator_arg , &mr)) > > > > I have no idea what the promise_type::operator new overload that takes > > a const This& argument is for, but this invocation of gen uses it, so > > ... yay? > > Member coroutines can call that overload: > > struct foo { > std::generator> > bar(std::allocator_arg_t, std::allocator) const& > { co_yield 123; } > }; > > void f() { > foo x; > x.bar(std::allocator_arg, {}); > } > Yeah, I figured that out eventually! Here's a v2 patch that adds the static assert messages and now also tests allocation for member coroutines (with implicit and explicit object parameters). commit 220d5bd2d1ff38dce3a8a7dfe7cea476126ff5a4 Author: Jonathan Wakely AuthorDate: Wed Dec 11 10:44:33 2024 Commit: Jonathan Wakely CommitDate: Wed Dec 11 15:23:56 2024 libstdc++: Remove constraints on std::generator::promise_type::operator new This was approved in Wrocław as LWG 3900, so that passing an incorrect argument intended as an allocator will be ill-formed, instead of silently ignored. This also renames the template parameters and function parameters for the allocators, to align with the names in the standard. I found it too confusing to have a parameter _Alloc which doesn't correspond to Alloc in the standard. Rename _Alloc to _Allocator (which the standard calls Allocator) and rename _Na to _Alloc (which the standard calls Alloc). libstdc++-v3/ChangeLog: * include/std/generator (_Promise_alloc): Rename template parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Replace constraints with a static_assert in the body. Rename allocator parameter. (_Promise_alloc::_M_allocate): Rename allocator parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Rename allocator parameter. * testsuite/24_iterators/range_generators/alloc.cc: New test. 
* testsuite/24_iterators/range_generators/lwg3900.cc: New test. Reviewed-by: Arsen Arsenović diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator index 0a14e70064e..bba85bd0aa4 100644 --- a/libstdc++-v3/include/std/generator +++ b/libstdc++-v3/include/std/generator @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept _Stateless_alloc = (allocator_traits<_All>::is_always_equal::value && default_initializable<_All>); -template +template class _Promise_alloc { - using _ATr = allocator_traits<_Alloc>; - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename _ATr - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); @@ -465,30 +463,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: void* operator new(std::size_t __sz) - requires default_initializable<_Rebound> // _Alloc is non-void + requires default_initializable<_Rebound> // _Allocator is non-void { return _M_allocate({}, __sz); } - template + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3900. The allocator_arg_t overloads of promise_type::operator new + // should not be constrained + template void* operator new(std::size_t __sz, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) - requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to, + "the allocator argument to the coroutine must be " + "convertible to the generator's allocator type"); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } - template + template void* operator new(std::size_t __sz, const _This&, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) 
- requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to, + "the allocator argument to the coroutine must be " + "convertible to the generator's allocator type"); + return _M_all
Re: [PATCH] c++: tweak colorization of incompatible declspecs
On Fri, 2024-11-15 at 20:05 -0500, David Malcolm wrote: > Introduce a helper function for complaining about "signed unsigned" > and "short long". Add colorization there so that e.g. the 'signed' > and 'unsigned' are given consistent contrasting colors in both the > message and the quoted source. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. ...and pushed to trunk as r15-6118-g4b4023d52986b2, having re-run the above checks. Dave
[pushed: r15-6114] input.cc: rename file_cache:in_context
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-6114-ge4ef3aa2911f1a. gcc/ChangeLog: * input.cc (file_cache::initialize_input_context): Rename member "in_context" to "m_input_context". (file_cache::add_file): Likewise. * input.h (class file_cache): Likewise. Signed-off-by: David Malcolm --- gcc/input.cc | 6 +++--- gcc/input.h | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/input.cc b/gcc/input.cc index 20b5f62371ac..7fc683db23f1 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -45,8 +45,8 @@ void file_cache::initialize_input_context (diagnostic_input_charset_callback ccb, bool should_skip_bom) { - in_context.ccb = (ccb ? ccb : default_charset_callback); - in_context.should_skip_bom = should_skip_bom; + m_input_context.ccb = (ccb ? ccb : default_charset_callback); + m_input_context.should_skip_bom = should_skip_bom; } /* This is a cache used by get_next_line to store the content of a @@ -435,7 +435,7 @@ file_cache::add_file (const char *file_path) unsigned highest_use_count = 0; file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); - if (!r->create (in_context, file_path, fp, highest_use_count)) + if (!r->create (m_input_context, file_path, fp, highest_use_count)) return NULL; return r; } diff --git a/gcc/input.h b/gcc/input.h index c6eb1aeda3e9..a5863eb9e091 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -162,7 +162,7 @@ class file_cache private: static const size_t num_file_slots = 16; file_cache_slot *m_file_slots; - input_context in_context; + input_context m_input_context; }; extern expanded_location -- 2.26.3
[pushed: r15-6115] diagnostics: tweak output for nested messages [PR116253]
When printing nested messages with -fdiagnostics-set-output=text:experimental-nesting=yes avoid printing a line such as the "cc1plus:" in the following: • note: set ‘-fconcepts-diagnostics-depth=’ to at least 2 for more detail cc1plus: for "special" locations such as UNKNOWN_LOCATION. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-6115-g331226fd87c201. gcc/ChangeLog: PR other/116253 * diagnostic-format-text.cc (on_report_diagnostic): When showing locations for nested messages on new lines, don't print UNKNOWN_LOCATION or BUILTINS_LOCATION. Signed-off-by: David Malcolm --- gcc/diagnostic-format-text.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/diagnostic-format-text.cc b/gcc/diagnostic-format-text.cc index 3650da9a927e..e8074aa05dac 100644 --- a/gcc/diagnostic-format-text.cc +++ b/gcc/diagnostic-format-text.cc @@ -226,11 +226,12 @@ on_report_diagnostic (const diagnostic_info &diagnostic, const int nesting_level = get_context ().get_diagnostic_nesting_level (); if (nesting_level > 0) { + location_t loc = diagnostic_location (&diagnostic); pp_set_prefix (pp, nullptr); char *indent_prefix = build_indent_prefix (false); /* Only print changes of location. */ - if (diagnostic_location (&diagnostic) - != get_context ().m_last_location) + if (loc != get_context ().m_last_location + && loc > BUILTINS_LOCATION) { const expanded_location s = diagnostic_expand_location (&diagnostic); -- 2.26.3
Re: [PATCH 1/2] c++: print z candidate count and number them
On Tue, 2024-12-10 at 19:08 -0500, Jason Merrill wrote: > On 12/10/24 6:05 PM, David Malcolm wrote: > > On Sat, 2024-12-07 at 10:19 -0500, Jason Merrill wrote: > > > On 11/15/24 8:02 PM, David Malcolm wrote: > > > > This patch is a followup to: > > > > "c++: use diagnostic nesting [PR116253]" > > > > > > > > Following Sy Brand's UX suggestions in P2429R0 for example 1, > > > > this > > > > patch > > > > tweaks print_z_candidates to add a note about the number of > > > > candidates, > > > > and adds a candidate number to each one. > > > > > > > > Various examples of output can be seen in the testsuite part of > > > > the > > > > patch. > > > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > > > OK for trunk? > > > > > > > > gcc/cp/ChangeLog: > > > > * call.cc (print_z_candidates): Count the number of > > > > candidates and issue a note stating the count at an > > > > intermediate nesting level. Number the individual > > > > candidates. > > > > > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/concepts/diagnostic9.C: Update expected > > > > results for candidate count and numbering. > > > > * g++.dg/concepts/nested-diagnostics-1-truncated.C: > > > > * g++.dg/concepts/nested-diagnostics-1.C: Likewise. > > > > * g++.dg/concepts/nested-diagnostics-2.C: Likewise. > > > > * g++.dg/cpp23/explicit-obj-lambda11.C: Likewise. > > > > * g++.dg/cpp2a/desig4.C: Likewise. > > > > * g++.dg/cpp2a/desig6.C: Likewise. > > > > * g++.dg/cpp2a/spaceship-eq15.C: Likewise. > > > > * g++.dg/diagnostic/function-color1.C: Likewise. > > > > * g++.dg/diagnostic/param-type-mismatch-2.C: Likewise. > > > > * g++.dg/diagnostic/pr100716-1.C: Likewise. > > > > * g++.dg/diagnostic/pr100716.C: Likewise. > > > > * g++.dg/lookup/operator-2.C: Likewise. > > > > * g++.dg/lookup/pr80891-5.C: Likewise. > > > > * g++.dg/modules/adhoc-1_b.C: Likewise. > > > > * g++.dg/modules/err-1_c.C: Likewise. > > > > * g++.dg/modules/err-1_d.C: Likewise. 
> > > > * g++.dg/other/return2.C: Likewise. > > > > * g++.dg/overload/error6.C: Likewise. > > > > * g++.dg/template/local6.C: Likewise. > > > > > > > > Signed-off-by: David Malcolm > > > > --- > > > > gcc/cp/call.cc | 16 ++- > > > > gcc/testsuite/g++.dg/concepts/diagnostic9.C | 2 +- > > > > .../concepts/nested-diagnostics-1-truncated.C | 25 + > > > > -- > > > > .../g++.dg/concepts/nested-diagnostics-1.C | 43 > > > > ++ > > > > - > > > > .../g++.dg/concepts/nested-diagnostics-2.C | 27 ++--- > > > > --- > > > > .../g++.dg/cpp23/explicit-obj-lambda11.C | 3 +- > > > > gcc/testsuite/g++.dg/cpp2a/desig4.C | 4 +- > > > > gcc/testsuite/g++.dg/cpp2a/desig6.C | 4 +- > > > > gcc/testsuite/g++.dg/cpp2a/spaceship-eq15.C | 8 ++-- > > > > .../g++.dg/diagnostic/function-color1.C | 2 +- > > > > .../g++.dg/diagnostic/param-type-mismatch-2.C | 10 - > > > > gcc/testsuite/g++.dg/diagnostic/pr100716-1.C | 14 +++--- > > > > gcc/testsuite/g++.dg/diagnostic/pr100716.C | 14 +++--- > > > > gcc/testsuite/g++.dg/lookup/operator-2.C | 2 +- > > > > gcc/testsuite/g++.dg/lookup/pr80891-5.C | 6 +-- > > > > gcc/testsuite/g++.dg/modules/adhoc-1_b.C | 4 +- > > > > gcc/testsuite/g++.dg/modules/err-1_c.C | 10 ++--- > > > > gcc/testsuite/g++.dg/modules/err-1_d.C | 6 +-- > > > > gcc/testsuite/g++.dg/other/return2.C | 2 +- > > > > gcc/testsuite/g++.dg/overload/error6.C | 4 +- > > > > gcc/testsuite/g++.dg/template/local6.C | 2 +- > > > > 21 files changed, 116 insertions(+), 92 deletions(-) > > > > > > > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc > > > > index db9eb1a55cfc..f820419ee4ff 100644 > > > > --- a/gcc/cp/call.cc > > > > +++ b/gcc/cp/call.cc > > > > @@ -4134,6 +4134,16 @@ print_z_candidates (location_t loc, > > > > struct > > > > z_candidate *candidates, > > > > > > > > auto_diagnostic_nesting_level sentinel; > > > > > > > > + int num_candidates = 0; > > > > + for (auto iter = candidates; iter; iter = iter->next) > > > > + ++num_candidates; > > > > + > > > > + inform_n (loc, > > > > + 
num_candidates, "we found %i candidate", "we found > > > > %i > > > > candidates", > > > > > > The use of 'we' here is novel, especially since we're talking > > > about > > > objective properties of the TU. I'd say "there are". > > > > FWIW I actually prefer the wording "I found" (like does Elm). > > > > Flow's style guide: > > https://calebmer.com/2019/07/01/writing-good-compiler-error-messages.html > > says "Write messages in first person plural. That is, use “we”. For > > example “we see an error”. This personifies the compiler as a team > > of > > peopl
Re: [PATCH v4 1/1] C++: Add flag to generate resolver at default version implementation.
Alfie Richards writes: > Hi Richard, > > Thank you for the feedback! > > On 10/12/2024 18:49, Richard Sandiford wrote: >>> -/* Returns true if DECL is multi-versioned using the target attribute, and >>> this >>> - is the default version. This function can only be used for targets >>> that do >>> - not support the "target_version" attribute. */ >>> +/* Returns true if DECL could be a default version of a FMV function set >>> + If TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_DECL is false, this is when >>> DECL >> TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_IMPL >> >> But could we key this off !TARGET_HAS_FMV_TARGET_ATTRIBUTE instead? >> At the moment that conditino is essentially "aarch64 semantics", and given >> the general complexity of FMV, it would be good to avoid creating more >> variations than we support. > > This function has always seemed like a bad compromise to me and is > currently trying to answer two different questions. > > I don't think having it depend on TARGET_HAS_FMV_TARGET_ATTRIBUTE is > necessarily any better? Both are false equivalences in my opinion. > > I think I would prefer to create a different function called > "could_function_be_default_version" containing the different semantics > which could more reasonably be conditioned on > TARGET_HAS_FMV_TARGET_ATTRIBUTE. > Then similarly, is_function_default_version can be changed back to > working out if a function definitely is the default version I'm not sure. The AArch64 semantics seem to be that an explicit target_version("default") is usually equivalent to no attribute. The only potential difference I'm aware of is whether: __attribute__((target_version("default"))) void f(void) {} should force a .default alias to be created even when f resolves directly to the function. (I'm asking internally for clarification on that.) So would the distinction between "is" and "could be" be useful in practice? 
This predicate looks at individual decls, whereas the AArch64 semantics
mean that "is" is instead a property of the TU as a whole.

>> AIUI, we rely on get_function_versions_dispatcher to arrange this
>> ordering.  (Each target does it independently, sigh.)  Can we rely on
>> the hook's having been called at this point?
>>
>> That is, it seems like this code assumes that
>> get_function_versions_dispatcher has already been called, whereas the
>> code below calls get_function_versions_dispatcher on-the-fly when it
>> sees the default version.  Does something guarantee that the default
>> node is analysed before all of the other versions?

> Ahh fair point.  I am convinced we can at this point certainly, as this
> case is only triggered if this is the dispatched symbol, which is
> created by that function :)

Oops, yes.  I thought at first it was keyed off non-default versions.

> It is a good point in general and there may be other cases I've made
> the assumption that this is true.
>
> In my other patch, I change the logic for record_function_versions
> to implicitly ensure the default is always the first version in any
> fmv structure, which I think is a good change and I think I will split
> it off into a separate patch for both of these to rely on.

Sounds good if I've understood correctly.

>> The convention is for error messages to start with lower case and not
>> to end with ".".

> Thank you!  I will change this.

>> There should be a test to exercise this codepath.  Locally I tried:
>>
>>   __attribute__((target_version("sve"))) int f() { return 2; }
>>   int g() { return f(); }
>>
>> but it seems to ICE in aarch64_get_function_versions_dispatcher.
>> (Hope I didn't mess up the testing.)

> Oh dear, apologies.  I think I had created tests for this in the
> C frontend as that was more my focus when creating this, and clearly
> didn't investigate the C++ versions thoroughly enough.  I will
> copy my tests there over and make sure these paths and cases work
> correctly.
>> As discussed off-list, this was a last-minute change that doesn't
>> compile 🙂  The earlier:
>>
>>   /* Mark the assembler name as set to prevent it getting mangled again .*/
>>   SET_DECL_ASSEMBLER_NAME (dispatch_decl,
>>                            DECL_ASSEMBLER_NAME (dispatch_decl));
>>
>> is probably correct, but I think it'd be worth moving into
>> make_dispatcher_decl, for !TARGET_HAS_FMV_TARGET_ATTRIBUTE.
>>
>> PRs 117987 and 117989 suggest that, independently of this patch,
>> we seem to have some issues around FMV and assembler names.

> Yes, apologies for the unforced error!  I think I was in the wrong work
> tree or checkout.
>
> I agree it is better in make_dispatcher_decl but I was a little hesitant
> to change other back ends' behaviour.  Specifically, I believe x86
> exhibits the double mangling behaviour at the moment and I was nervous
> to change that as it could be a breaking change for them.

Yeah, that's why I was suggesting keying it off
!TARGET_HAS_FMV_TARGET_ATTRIBUTE (aka the "aarch64 semantics" toggle).
That should ensure that
Re: [PATCH 00/15] arm: [MVE intrinsics] Rework store_scatter and load_gather intrinsics
On 07/11/2024 09:18, Christophe Lyon wrote:
> This patch series re-implements the store_scatter and load_gather
> intrinsics using the new framework, similarly to previous series.
>
> A few points worth mentioning:
> - unlike other intrinsics, these ones have the predicate after the
>   mode in their names, hence the need for patch #1
> - when checking the 'offset' argument of the *_base_* intrinsics, we
>   need ranges with negative lower bounds, unlike what we needed so far
>   (SVE does not have such negative bounds AFAIK), hence the need for
>   patch #5 and the use of 'ss64' instead of 'su64' in signatures.
> - because of some peculiarities in ACLE expected output wrt data type
>   suffix (.16 vs .u16 vs .f16 for instance), I chose to update a few
>   tests in patches #12 and #13, and to introduce a dedicated iterator
>   in other cases (patch #10; using and fixing an existing iterator
>   would have an impact on Neon tests).  I chose the approach which
>   seemed the least invasive, but maybe we should aim at more
>   consistency and update ACLE instead?

Thanks for this, it's a nice cleanup.  This patch series is OK.

In fact, I think you should consider further changes to move things from
arm_mve.h inside the pragma as pre-approved, once stage 1 re-opens, and
provided they don't need to go significantly beyond the type of changes
needed here.

One thing we should perhaps consider in future (ie not right now) is
whether we really need poly types in the Arm back-end code.  Patch 8
contains:

+  machine_mode memory_vector_mode (const function_instance &fi) const override
+  {
+    poly_uint64 nunits = GET_MODE_NUNITS (fi.vector_mode (0));
+    return arm_mve_data_mode (m_to_int_mode, nunits).require ();
+  }

But since poly types on Arm should degenerate into simple host wide
ints, this feels a bit like overkill.

R.
Thanks,

Christophe

Christophe Lyon (15):
  arm: [MVE intrinsics] add mode_after_pred helper in function_shape
  arm: [MVE intrinsics] add store_scatter_offset shape
  arm: [MVE intrinsics] rework vstr?q_scatter_offset
  arm: [MVE intrinsics] rework vstr_scatter_shifted_offset
  arm: [MVE intrinsics] Check immediate is a multiple in a range
  arm: [MVE intrinsics] Add store_scatter_base shape
  arm: [MVE intrinsics] rework vstr scatter_base
  arm: [MVE intrinsics] rework vstr scatter_base_wb
  arm: [MVE intrinsics] add load_ext_gather_offset shape
  arm: [MVE intrinsics] rework vldr gather_offset
  arm: [MVE intrinsics] rework vldr gather_shifted_offset
  arm: [MVE intrinsics] add load_gather_base shape
  arm: [MVE intrinsics] rework vldr gather_base
  arm: [MVE intrinsics] rework vldr gather_base_wb
  arm: [MVE intrinsics] remove useless call_properties implementations.

 gcc/config/arm/arm-builtins.cc               |  146 -
 gcc/config/arm/arm-mve-builtins-base.cc      |  279 +-
 gcc/config/arm/arm-mve-builtins-base.def     |   28 +
 gcc/config/arm/arm-mve-builtins-base.h       |   18 +
 gcc/config/arm/arm-mve-builtins-shapes.cc    |  249 ++
 gcc/config/arm/arm-mve-builtins-shapes.h     |    4 +
 gcc/config/arm/arm-mve-builtins.cc           |   99 +-
 gcc/config/arm/arm-mve-builtins.h            |    5 +
 gcc/config/arm/arm_mve.h                     | 3096 ++---
 gcc/config/arm/arm_mve_builtins.def          |  122 -
 gcc/config/arm/iterators.md                  |   63 +-
 gcc/config/arm/mve.md                        | 2150 ++--
 gcc/config/arm/unspecs.md                    |   78 +-
 .../mve/intrinsics/vldrdq_gather_base_s64.c  |    4 +-
 .../mve/intrinsics/vldrdq_gather_base_u64.c  |    4 +-
 .../intrinsics/vldrdq_gather_base_wb_s64.c   |    4 +-
 .../intrinsics/vldrdq_gather_base_wb_u64.c   |    4 +-
 17 files changed, 1348 insertions(+), 5005 deletions(-)
Re: [PATCH v4 1/1] C++: Add flag to generate resolver at default version implementation.
Hi Richard,

Thank you for the feedback!

On 10/12/2024 18:49, Richard Sandiford wrote:
>> -/* Returns true if DECL is multi-versioned using the target attribute, and this
>> -   is the default version.  This function can only be used for targets that do
>> -   not support the "target_version" attribute. */
>> +/* Returns true if DECL could be a default version of a FMV function set
>> +   If TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_DECL is false, this is when DECL
>
> TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_IMPL
>
> But could we key this off !TARGET_HAS_FMV_TARGET_ATTRIBUTE instead?
> At the moment that condition is essentially "aarch64 semantics", and given
> the general complexity of FMV, it would be good to avoid creating more
> variations than we support.

This function has always seemed like a bad compromise to me and is
currently trying to answer two different questions.

I don't think having it depend on TARGET_HAS_FMV_TARGET_ATTRIBUTE is
necessarily any better?  Both are false equivalences in my opinion.

I think I would prefer to create a different function called
"could_function_be_default_version" containing the different semantics
which could more reasonably be conditioned on
TARGET_HAS_FMV_TARGET_ATTRIBUTE.
Then similarly, is_function_default_version can be changed back to
working out if a function definitely is the default version.

> AIUI, we rely on get_function_versions_dispatcher to arrange this
> ordering.  (Each target does it independently, sigh.)  Can we rely on
> the hook's having been called at this point?
>
> That is, it seems like this code assumes that
> get_function_versions_dispatcher has already been called, whereas the
> code below calls get_function_versions_dispatcher on-the-fly when it
> sees the default version.  Does something guarantee that the default
> node is analysed before all of the other versions?

Ahh, fair point.
I am convinced we can at this point certainly, as this case is only
triggered if this is the dispatched symbol, which is created by that
function :)

It is a good point in general and there may be other cases I've made the
assumption that this is true.

In my other patch, I change the logic for record_function_versions
to implicitly ensure the default is always the first version in any
fmv structure, which I think is a good change and I think I will split
it off into a separate patch for both of these to rely on.

> The convention is for error messages to start with lower case and not
> to end with ".".

Thank you!  I will change this.

> There should be a test to exercise this codepath.  Locally I tried:
>
>   __attribute__((target_version("sve"))) int f() { return 2; }
>   int g() { return f(); }
>
> but it seems to ICE in aarch64_get_function_versions_dispatcher.
> (Hope I didn't mess up the testing.)

Oh dear, apologies.  I think I had created tests for this in the
C frontend as that was more my focus when creating this, and clearly
didn't investigate the C++ versions thoroughly enough.  I will
copy my tests there over and make sure these paths and cases work
correctly.

> As discussed off-list, this was a last-minute change that doesn't
> compile 🙂  The earlier:
>
>   /* Mark the assembler name as set to prevent it getting mangled again .*/
>   SET_DECL_ASSEMBLER_NAME (dispatch_decl,
>                            DECL_ASSEMBLER_NAME (dispatch_decl));
>
> is probably correct, but I think it'd be worth moving into
> make_dispatcher_decl, for !TARGET_HAS_FMV_TARGET_ATTRIBUTE.
>
> PRs 117987 and 117989 suggest that, independently of this patch,
> we seem to have some issues around FMV and assembler names.

Yes, apologies for the unforced error!  I think I was in the wrong work
tree or checkout.

I agree it is better in make_dispatcher_decl but I was a little hesitant
to change other back ends' behaviour.  Specifically, I believe x86
exhibits the double mangling behaviour at the moment and I was nervous
to change that as it could be a breaking change for them.
I would appreciate any advice from your experience with how to handle
such things.

> Could you explain why this is needed in more detail?  I assume the
> premise of the old code was that we didn't need to call
> cgraph_node::record_function_versions if both functions were already
> marked as versioned.  Admittedly, I don't know why that seemed
> necessary, since cgraph_node::record_function_versions itself has an
> early-out.  But it sounded like this hunk was needed for correctness,
> rather than simply being a clean-up.

Basically just as you say.  It seemed like there was an argument here to
prevent the recording, and it was always calculated by checking if the
arguments had already been recorded.  But the record_function_versions
function does that exact check already.

Additionally, I could no longer rely on DECL_FUNCTION_VERSIONED to imply
that a function had been recorded, as this patch means functions get
marked as versioned much earlier (see the start_preparsed_function
change).  Initially I created a new flag to replace this to imply if a
function had been recorded, b
Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types
Jennifer Schmitz writes:
> As a follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul by -1 to svneg for
> unsigned SVE vector types.  The key idea is to reuse the existing code
> that does this fold for signed types and feed it as callback to a
> helper function that adds the necessary type conversions.
>
> For example, for the test case
>
>   svuint64_t foo (svuint64_t x, svbool_t pg)
>   {
>     return svmul_n_u64_x (pg, x, -1);
>   }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
>
>   svuint64_t foo (svuint64_t x, svbool_t pg)
>   {
>     svint64_t D.12921;
>     svint64_t D.12920;
>     svuint64_t D.12919;
>
>     D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>     D.12921 = svneg_s64_x (pg, D.12920);
>     D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>     goto ;
>     :
>     return D.12919;
>   }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts the lhs and all non-boolean vector types to the target type,
> - passes the converted lhs and arguments to the callback,
> - receives the new gimple statement from the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type.  If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no
> regression.  OK for mainline?
>
> Signed-off-by: Jennifer Schmitz
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svmul_impl::fold): Wrap code for folding to svneg in lambda
> function and pass to gimple_folder::convert_and_fold to enable
> the transform for unsigned types.
> * config/aarch64/aarch64-sve-builtins.cc > (gimple_folder::convert_and_fold): New function that converts > operands to target type before calling callback function, adding the > necessary conversion statements. > * config/aarch64/aarch64-sve-builtins.h > (gimple_folder::convert_and_fold): Declare function. > (signed_type_suffix_index): Return type_suffix_index of signed > vector type for given width. > (function_instance::signed_type): Return signed vector type for > given width. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome. > * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise. > * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise. > * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust > expected outcome. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 70 +-- > gcc/config/aarch64/aarch64-sve-builtins.cc| 43 > gcc/config/aarch64/aarch64-sve-builtins.h | 31 > .../gcc.target/aarch64/sve/acle/asm/mul_u16.c | 5 +- > .../gcc.target/aarch64/sve/acle/asm/mul_u32.c | 5 +- > .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++- > .../gcc.target/aarch64/sve/acle/asm/mul_u8.c | 7 +- > 7 files changed, 153 insertions(+), 34 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 87e9909b55a..52401a8c57a 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -2092,33 +2092,61 @@ public: >return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); > > /* If one of the operands is all integer -1, fold to svneg. 
*/ > -tree pg = gimple_call_arg (f.call, 0); > -tree negated_op = NULL; > -if (integer_minus_onep (op2)) > - negated_op = op1; > -else if (integer_minus_onep (op1)) > - negated_op = op2; > -if (!f.type_suffix (0).unsigned_p && negated_op) > +if (integer_minus_onep (op1) || integer_minus_onep (op2)) >{ > - function_instance instance ("svneg", functions::svneg, > - shapes::unary, MODE_none, > - f.type_suffix_ids, GROUP_none, f.pred); > - gcall *call = f.redirect_call (instance); > - unsigned offset_index = 0; > - if (f.pred == PRED_m) > + auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv, > + vec &args_conv) -> gimple * > { > - offset_index = 1; > - gimple_call_set_arg (call, 0, op1); > - } > - else > - gimple_set_num_ops (call, 5); > - gimple_call_set_arg (call, offset_index, pg); > - gimple_call_set_arg (call, offset_index + 1, negated_op); > - return call; > + gcc_assert (lhs_conv && args_conv.length () == 3); > + tree pg = args_conv[0]; > + tree op1 = args_conv[1]; > + tree op2 = args_conv[2]; > + tree
[PATCH] ifcombine field-merge: set upper bound for get_best_mode
On Dec 11, 2024, Richard Biener wrote:

> I think These 0, 0 args are supposed to indicate Maximum extent of the
> resulting Access

Thanks, that looks much better indeed.

A bootstrap on aarch64-linux-gnu revealed that sometimes (for example,
when building shorten_branches in final.cc) we will find such things as
MEM <unsigned int>, where unsigned int happens to be a variant of the
original unsigned int type, but with 64-bit alignment.  This unusual
alignment circumstance caused (i) get_inner_reference to not look inside
the MEM, (ii) get_best_mode to choose DImode instead of SImode to access
the object, so we built a BIT_FIELD_REF that attempted to select all 64
bits of a 32-bit object, and that failed gimple verification ("position
plus size exceeds size of referenced object") because there aren't that
many bits in the unsigned int object.

This patch avoids this failure mode by limiting the bitfield range with
the size of the inner object, if it is a known constant.

This enables us to avoid creating a BIT_FIELD_REF and reusing the load
expr, but we still introduced a separate load, that would presumably get
optimized out, but that is easy enough to avoid in the first place by
reusing the SSA_NAME it was originally loaded into, so I implemented
that in make_bit_field_load.

Regstrapped on x86_64-linux-gnu; tested that it fixes the known issue on
aarch64-linux-gnu, regstrapping now.  Ok to install?

for gcc/ChangeLog

	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit the
	size of the bitregion in get_best_mode calls by the inner
	object's type size, if known.
	(make_bit_field_load): Reuse SSA_NAME if we're attempting to
	issue an identical load.
--- gcc/gimple-fold.cc | 52 ++-- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index a31fc283d51b0..9179010c9eaf1 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -7751,6 +7751,15 @@ make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type, if (!point) return ref; + /* If we're remaking the same load, reuse the SSA NAME it is already loaded + into. */ + if (gimple_assign_load_p (point) + && operand_equal_p (ref, gimple_assign_rhs1 (point))) +{ + gcc_checking_assert (TREE_CODE (gimple_assign_lhs (point)) == SSA_NAME); + return gimple_assign_lhs (point); +} + gimple_seq stmts = NULL; tree ret = force_gimple_operand (ref, &stmts, true, NULL_TREE); @@ -8204,24 +8213,27 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, to be relative to a field of that size. */ first_bit = MIN (ll_bitpos, rl_bitpos); end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); - if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, -TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, -volatilep, &lnmode)) + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); + poly_uint64 ll_end_region = 0; + if (TYPE_SIZE (TREE_TYPE (ll_inner)) + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner +ll_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner))); + if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode)) l_split_load = false; else { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. 
*/ - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); HOST_WIDE_INT boundary = compute_split_boundary_from_align - (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); + (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); if (boundary < 0 - || !get_best_mode (boundary - first_bit, first_bit, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode) - || !get_best_mode (end_bit - boundary, boundary, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode2)) + || !get_best_mode (boundary - first_bit, first_bit, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode) + || !get_best_mode (end_bit - boundary, boundary, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode2)) return 0; /* If we can't have a single load, but can with two, figure out whether @@ -8368,16 +8380,19 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, and then we use two separate compares. */ first_bit = MIN (lr_bitpos, rr_bitpos); end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); - if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, - TYPE_ALIGN (TREE_TYPE (lr_inner
Re: [PATCH v2] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > Here's a v2 patch that adds the static assert messages and now also > tests allocation for member coroutines (with implicit and explicit > object parameters). > > [2. text/plain; patch.txt]... This revision looks good. Thanks! :-) Have a lovely day. -- Arsen Arsenović signature.asc Description: PGP signature
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On 11/12/2024 21:41, Jonathan Wakely wrote: Hmm, the warning is useful if somebody does: std::span s({1,2,3}); So I think we do want the warning, but maybe give it a special case so it doesn't warn for an rvalue std::span. Do you have anything in mind to make this work? Note that the above is akin to: std::vector getVector(); std::span sp = getVector(); which is also similar to: std::string getString(); std::string_view sv = getString(); and none of these warn at the moment. Thank you, -- Giuseppe D'Angelo smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH] c++: Make sure fold_sizeof_expr returns the correct type [PR117775]
Hi, On 29 Nov 2024, at 15:33, Simon Martin wrote: > We currently ICE upon the following code, that is valid under > -Wno-pointer-arith: > > === cut here === > int main() { > decltype( [](auto) { return sizeof(void); } ) x; > return x.operator()(0); > } > === cut here === > > The problem is that "fold_sizeof_expr (sizeof(void))" returns > size_one_node, that has a different TREE_TYPE from that of the sizeof > expression, which later triggers an assert in > cxx_eval_store_expression. > > This patch makes sure that fold_sizeof_expr always returns a tree with > the type requested. > > Successfully tested on x86_64-pc-linux-gnu. Ping. Thanks! Simon > > PR c++/117775 > > gcc/cp/ChangeLog: > > * decl.cc (fold_sizeof_expr): Make sure the folded result has > the requested TREE_TYPE. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp2a/constexpr-117775.C: New test. > > --- > gcc/cp/decl.cc| 1 + > gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C | 13 + > 2 files changed, 14 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > index 80485f0a428..fbe1407a2d2 100644 > --- a/gcc/cp/decl.cc > +++ b/gcc/cp/decl.cc > @@ -11686,6 +11686,7 @@ fold_sizeof_expr (tree t) > false, false); >if (r == error_mark_node) > r = size_one_node; > + r = cp_fold_convert (TREE_TYPE (t), r); >return r; > } > > diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > new file mode 100644 > index 000..59fc0d332b9 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > @@ -0,0 +1,13 @@ > +// PR c++/117775 > +// Check that we don't ICE and have sizeof(void)==1 under > -Wno-pointer-arith > +// { dg-do run { target c++20 } } > +// { dg-additional-options "-Wno-pointer-arith" } > + > +int main() { > + struct why : > +decltype( [](auto) { > + return sizeof(void); > + }) > + {} x; > + return 1 - x.operator()(0); > +} > -- > 2.44.0
[PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
Hi, this patch makes vcompress use the tail-undisturbed policy by default and also uses the proper VL. Regtested on rv64gcv and waiting for the CI. Regards Robin PR target/117383 gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): Use TU policy. * config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Expect tu. * gcc.target/riscv/rvv/autovec/pr117383.c: New test. --- gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 3 +- .../rvv/autovec/binop/vcompress-avlprop-1.c | 2 +- .../gcc.target/riscv/rvv/autovec/pr117383.c | 48 +++ 4 files changed, 52 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 98af41c6e74..ccbb3f3d397 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -511,7 +511,7 @@ enum insn_type : unsigned int COMPRESS_OP = __NORMAL_OP_TA2 | BINARY_OP_P, /* has merge operand but use ta. */ COMPRESS_OP_MERGE - = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P | BINARY_OP_P, + = HAS_DEST_P | HAS_MERGE_P | TU_POLICY_P | BINARY_OP_P, /* For vslideup.up has merge operand but use ta. 
*/ SLIDEUP_OP_MERGE = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 160a415c371..5c14c77068f 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3391,7 +3391,8 @@ shuffle_compress_patterns (struct expand_vec_perm_d *d) insn_code icode = code_for_pred_compress (vmode); rtx ops[] = {d->target, merge, d->op0, mask}; - emit_vlmax_insn (icode, COMPRESS_OP_MERGE, ops); + emit_nonvlmax_insn (icode, COMPRESS_OP_MERGE, ops, + gen_int_mode (vlen, Pmode)); return true; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c index 3654b03e8ed..98e53b38f09 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c @@ -11,7 +11,7 @@ struct s sss[MAX]; /* ** build_linked_list: ** ... -** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*ta,\s*ma +** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*tu,\s*ma ** ... ** vcompress\.vm\s+v[0-9]+,\s*v[0-9]+,\s*v0 ** ... 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c new file mode 100644 index 000..c01612f2902 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-require-effective-target "riscv_v_ok" } */ +/* { dg-add-options "riscv_v" } */ +/* { dg-additional-options "-std=c99 -mrvv-vector-bits=zvl" } */ + +typedef signed char int8_t; +typedef int8_t vnx64i __attribute__ ((vector_size (64))); + +#define MASK_64 \ + 1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \ +37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \ +82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \ +100, 101, 102, 103, 104, 105, 106, 107 + +void __attribute__ ((noipa)) +test_1 (int8_t *x, int8_t *y, int8_t *out) +{ + vnx64i v1 = *(vnx64i *) x; + vnx64i v2 = *(vnx64i *) y; + vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64); + *(vnx64i *) out = v3; +} + +int +main (void) +{ + int8_t x[64]; + int8_t y[64]; + int8_t out[64]; + + for (int i = 0; i < 64; i++) +{ + x[i] = -i; + y[i] = i; +} + + test_1 (x, y, out); + + int mask[] = {MASK_64}; +#pragma GCC novector + for (int i = 0; i < 64; i++) +{ + int idx = mask[i] < 64 ? mask[i] : mask[i] - 64; + int ref = mask[i] < 64 ? x[idx] : y[idx]; + if (ref != out[i]) +__builtin_abort (); +} +} -- 2.47.1
Re: [PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
On Wed, 11 Dec 2024 13:29:29 PST (-0800), Robin Dapp wrote: Hi, this patch makes vcompress use the tail-undisturbed policy by default and also uses the proper VL. Regtested on rv64gcv and waiting for the CI. Regards Robin PR target/117383 gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): Use TU policy. * config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Expect tu. * gcc.target/riscv/rvv/autovec/pr117383.c: New test. --- gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 3 +- .../rvv/autovec/binop/vcompress-avlprop-1.c | 2 +- .../gcc.target/riscv/rvv/autovec/pr117383.c | 48 +++ 4 files changed, 52 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 98af41c6e74..ccbb3f3d397 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -511,7 +511,7 @@ enum insn_type : unsigned int COMPRESS_OP = __NORMAL_OP_TA2 | BINARY_OP_P, /* has merge operand but use ta. */ COMPRESS_OP_MERGE - = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P | BINARY_OP_P, + = HAS_DEST_P | HAS_MERGE_P | TU_POLICY_P | BINARY_OP_P, /* For vslideup.up has merge operand but use ta. */ SLIDEUP_OP_MERGE = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P This one also has a comment saying "but use ta" with TDEFAULT_POLICY_P, IIUC one of those must be wrong too? 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 160a415c371..5c14c77068f 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3391,7 +3391,8 @@ shuffle_compress_patterns (struct expand_vec_perm_d *d) insn_code icode = code_for_pred_compress (vmode); rtx ops[] = {d->target, merge, d->op0, mask}; - emit_vlmax_insn (icode, COMPRESS_OP_MERGE, ops); + emit_nonvlmax_insn (icode, COMPRESS_OP_MERGE, ops, + gen_int_mode (vlen, Pmode)); return true; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c index 3654b03e8ed..98e53b38f09 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c @@ -11,7 +11,7 @@ struct s sss[MAX]; /* ** build_linked_list: ** ... -** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*ta,\s*ma +** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*tu,\s*ma ** ... ** vcompress\.vm\s+v[0-9]+,\s*v[0-9]+,\s*v0 ** ... 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c new file mode 100644 index 000..c01612f2902 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-require-effective-target "riscv_v_ok" } */ +/* { dg-add-options "riscv_v" } */ +/* { dg-additional-options "-std=c99 -mrvv-vector-bits=zvl" } */ + +typedef signed char int8_t; +typedef int8_t vnx64i __attribute__ ((vector_size (64))); + +#define MASK_64 \ + 1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \ +37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \ +82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \ +100, 101, 102, 103, 104, 105, 106, 107 + +void __attribute__ ((noipa)) +test_1 (int8_t *x, int8_t *y, int8_t *out) +{ + vnx64i v1 = *(vnx64i *) x; + vnx64i v2 = *(vnx64i *) y; + vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64); + *(vnx64i *) out = v3; +} + +int +main (void) +{ + int8_t x[64]; + int8_t y[64]; + int8_t out[64]; + + for (int i = 0; i < 64; i++) +{ + x[i] = -i; + y[i] = i; +} + + test_1 (x, y, out); + + int mask[] = {MASK_64}; +#pragma GCC novector + for (int i = 0; i < 64; i++) +{ + int idx = mask[i] < 64 ? mask[i] : mask[i] - 64; + int ref = mask[i] < 64 ? x[idx] : y[idx]; + if (ref != out[i]) +__builtin_abort (); +} +} -- 2.47.1
Re: [PATCH] v2: Allow limited extended asm at toplevel [PR41045]
On Dez 05 2024, Jakub Jelinek wrote: > --- gcc/testsuite/c-c++-common/toplevel-asm-1.c.jj2024-11-01 > 15:09:46.209708353 +0100 > +++ gcc/testsuite/c-c++-common/toplevel-asm-1.c 2024-11-01 > 19:25:47.512567789 +0100 > @@ -0,0 +1,25 @@ > +/* PR c/41045 */ > +/* { dg-do compile } */ > +/* { dg-options "-O0" } */ > +/* { dg-additional-options "-fno-pie" { target pie } } */ > + > +struct S { char a; long long b; int c; }; > +enum E { E0, E1 = sizeof (struct S) + 15 }; > +int v[42]; > +void foo (void) {} > + > +asm ("# %0 %1 %2 %c3 %c4 %5 %% %=" > + :: "i" (sizeof (struct S)), > + "i" (__builtin_offsetof (struct S, c)), > + "i" (E1), > + "s" (foo), > + "i" (v), > +/* Not all targets can satisfy "m" even in non-pic code. */ > +#if !defined(__i386__) && !defined(__x86_64__) > + "s" (v)); > +#else > + "m" (v)); > +asm ("# %0 %1" > + : "=m" (v[16]) > + : "m" (v[41])); > +#endif That fails on riscv64: .../toplevel-asm-1.c:11:1: error: invalid 'asm': invalid use of '%c' .../toplevel-asm-1.c:11:1: error: invalid 'asm': invalid use of '%c' -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
[pushed][PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges
The following patch is a better solution for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116778 The patch was successfully tested and bootstrapped on x86-64, aarch64, ppc64le. commit fca0ab08cd936464b152e9b45356f625eba27575 Author: Vladimir N. Makarov Date: Wed Dec 11 15:36:21 2024 -0500 [PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges This is a better fix for the PR, permitting us to avoid building live ranges after rematerialization. It checks that FP cannot be eliminated now and that pseudos assigned to FP will be spilled. In this case we need to build live ranges after rematerialization for correct assignments of stack slots to spilled pseudos involved in rematerialization. gcc/ChangeLog: PR rtl-optimization/116778 * ira-int.h (x_ira_class_hard_reg_index): Fix comment typo. * lra-eliminations.cc (lra_fp_pseudo_p): New function. * lra-int.h (lra_fp_pseudo_p): External declaration. * lra-spills.cc (lra_need_for_spills_p): Fix formatting. * lra.cc (lra): Use lra_fp_pseudo_p in lra_create_live_range after lra_remat. diff --git a/gcc/ira-int.h b/gcc/ira-int.h index 8c3c5941de5..5ce930b6d22 100644 --- a/gcc/ira-int.h +++ b/gcc/ira-int.h @@ -868,7 +868,7 @@ public: /* Index (in ira_class_hard_regs; for given register class and hard register (in general case a hard register can belong to several - register classes;. The index is negative for hard registers + register classes). The index is negative for hard registers unavailable for the allocation. */ short x_ira_class_hard_reg_index[N_REG_CLASSES][FIRST_PSEUDO_REGISTER]; diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index 96772f2904a..5343d8c5102 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -1428,6 +1428,25 @@ lra_update_fp2sp_elimination (int *spilled_pseudos) return n; } +/* Return true if we have a pseudo assigned to hard frame pointer. 
*/ +bool +lra_fp_pseudo_p (void) +{ + HARD_REG_SET set; + + if (frame_pointer_needed) +/* At this stage it means we have no pseudos assigned to FP: */ +return false; + CLEAR_HARD_REG_SET (set); + add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM); + for (int i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++) +if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0 + && overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (i), +reg_renumber[i])) + return true; + return false; +} + /* Entry function to do final elimination if FINAL_P or to update elimination register offsets (FIRST_P if we are doing it the first time). */ diff --git a/gcc/lra-int.h b/gcc/lra-int.h index 5f605c3ae41..abee008e642 100644 --- a/gcc/lra-int.h +++ b/gcc/lra-int.h @@ -419,6 +419,7 @@ extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode, bool, bool, poly_int64, bool); extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64); extern int lra_update_fp2sp_elimination (int *spilled_pseudos); +extern bool lra_fp_pseudo_p (void); extern void lra_eliminate (bool, bool); extern poly_int64 lra_update_sp_offset (rtx, poly_int64); diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc index 3f5c8d2bcb0..6e9a8c3e34e 100644 --- a/gcc/lra-spills.cc +++ b/gcc/lra-spills.cc @@ -594,8 +594,9 @@ lra_need_for_scratch_reg_p (void) bool lra_need_for_spills_p (void) { - int i; max_regno = max_reg_num (); + int i; + max_regno = max_reg_num (); for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++) if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0 && ! ira_former_scratch_p (i)) diff --git a/gcc/lra.cc b/gcc/lra.cc index 6b740ed2325..55737deba3f 100644 --- a/gcc/lra.cc +++ b/gcc/lra.cc @@ -2555,8 +2555,11 @@ lra (FILE *f, int verbose) rematerialize them first. */ if (lra_remat ()) { - /* We need full live info -- see the comment above. */ - lra_create_live_ranges (true, true); + /* We need full live info -- see the comment above. 
We also might + need live info if we have a pseudo assigned to hard frame pointer + reg and will need FP for usual purposes. */ + lra_create_live_ranges (lra_reg_spill_p || lra_fp_pseudo_p (), + true); live_p = true; if (! lra_need_for_spills_p ()) {
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Tue, 3 Dec 2024 at 16:07, Giuseppe D'Angelo wrote: > > Hello, > > The attached patch adds the span(initializer_list) constructor, added by > P2447R6 for C++26. > > It seems that GCC is somewhat aggressive with its -Winit-list-lifetime > warning, which actively interferes with this feature. The idea of the > new constructor is to allow calls like: > >void f(std::span); >f({1, 2, 3}); > > which is completely fine as the lifetime of the initializer_list > encompasses the one of the std::span parameter. However GCC complains > about the risk of dangling here. I've therefore disabled the warning for > the new constructor. Hmm, the warning is useful if somebody does: std::span s({1,2,3}); So I think we do want the warning, but maybe give it a special case so it doesn't warn for an rvalue std::span. Alternatively, we'd want that constructor to be marked with the opposite of [[nodiscard]], saying to warn if it *isn't* discarded!
[PATCH] MAINTAINERS: add myself to write after approval
ChangeLog: * MAINTAINERS: Add myself to write after approval. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7d65ed64bdd..44153a7a51e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -630,6 +630,7 @@ Sa Liu saliu Ralph Loader ralph Sheldon Lobo smlobo Gabor Loki loki +Matthieu Longo mlongo Sandra Loosemore sandra Manuel López-Ibáñez manu Carl Love carll -- 2.47.1
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On 03/12/24 17:05 +0100, Giuseppe D'Angelo wrote: Hello, The attached patch adds the span(initializer_list) constructor, added by P2447R6 for C++26. It seems that GCC is somewhat aggressive with its -Winit-list-lifetime warning, which actively interferes with this feature. The idea of the new constructor is to allow calls like: void f(std::span); f({1, 2, 3}); which is completely fine as the lifetime of the initializer_list encompasses the one of the std::span parameter. However GCC complains about the risk of dangling here. I've therefore disabled the warning for the new constructor. Thanks, -- Giuseppe D'Angelo From bb1d537ee7b68883403127903834427c6787c505 Mon Sep 17 00:00:00 2001 From: Giuseppe D'Angelo Date: Tue, 3 Dec 2024 16:56:45 +0100 Subject: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447) This commit implements P2447R6. The code is straightforward (just one extra constructor, with constraints and conditional explicit). I decided to suppress -Winit-list-lifetime because otherwise it would give too many false positives. The new constructor is meant to be used as a parameter-passing interface (this is a design choice, see P2447R6/2) and, as such, the initializer_list won't dangle despite GCC's warnings. The new constructor isn't 100% backwards compatible. A couple of examples are included in Annex C, but I have also lifted some more from R4. A new test checks for the old and the new behaviors. libstdc++-v3/ChangeLog: * include/bits/version.def: Added the new feature-testing macro. * include/bits/version.h (defined): Regenerated. * include/std/span: Added constructor from initializer_list. * testsuite/23_containers/span/init_list_cons.cc: New test. * testsuite/23_containers/span/init_list_cons_neg.cc: New test. 
--- libstdc++-v3/include/bits/version.def | 8 +++ libstdc++-v3/include/bits/version.h | 10 +++ libstdc++-v3/include/std/span | 21 ++ .../23_containers/span/init_list_cons.cc | 65 +++ .../23_containers/span/init_list_cons_neg.cc | 31 + 5 files changed, 135 insertions(+) create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 8d4b8e9b383..cfa0469fb2d 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1853,6 +1853,14 @@ ftms = { }; }; +ftms = { + name = span_initializer_list; + values = { +v = 202311; +cxxmin = 26; + }; +}; + ftms = { name = text_encoding; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index c556aca38fa..6a2c66bdf81 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -2055,6 +2055,16 @@ #endif /* !defined(__cpp_lib_saturation_arithmetic) && defined(__glibcxx_want_saturation_arithmetic) */ #undef __glibcxx_want_saturation_arithmetic +#if !defined(__cpp_lib_span_initializer_list) +# if (__cplusplus > 202302L) +# define __glibcxx_span_initializer_list 202311L +# if defined(__glibcxx_want_all) || defined(__glibcxx_want_span_initializer_list) +# define __cpp_lib_span_initializer_list 202311L +# endif +# endif +#endif /* !defined(__cpp_lib_span_initializer_list) && defined(__glibcxx_want_span_initializer_list) */ +#undef __glibcxx_want_span_initializer_list + #if !defined(__cpp_lib_text_encoding) # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (_GLIBCXX_USE_NL_LANGINFO_L) # define __glibcxx_text_encoding 202306L diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span index f1c19b58737..b84ac87b657 100644 --- a/libstdc++-v3/include/std/span +++ b/libstdc++-v3/include/std/span @@ -39,6 +39,7 @@ 
#endif #define __glibcxx_want_span +#define __glibcxx_want_span_initializer_list #include #ifdef __cpp_lib_span // C++ >= 20 && concepts @@ -46,6 +47,9 @@ #include #include #include +#ifdef __cpp_lib_span_initializer_list +# include +#endif namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -226,6 +230,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } +#if __cpp_lib_span_initializer_list >= 202311L +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Winit-list-lifetime" + constexpr + explicit(extent != dynamic_extent) + span(initializer_list __il) + requires (is_const_v<_Type>) + : _M_extent(__il.size()), _M_ptr(__il.begin()) These mem-initializers are in the wrong order (we had an existing constructor with the same problem, but I pushed a fix less than an hour ago). + { + if constexpr (extent != dynamic_extent) + { + __glibcxx_assert(__il.size() == extent); It's just occurred to me that this check should be done in
Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?
On Thu, Dec 5, 2024 at 2:28 PM Bill Wendling wrote: > On Thu, Dec 5, 2024 at 1:09 PM Qing Zhao wrote: > > > On Dec 3, 2024, at 10:29, Qing Zhao wrote: > > >> On Dec 3, 2024, at 10:07, Martin Uecker wrote: > > >> The language extension does not exist yet, so there is no problem. > > > Yeah, I should mention this as “corresponding future language extension” > > > -:) > > >> > > >> But I hope we will get it and then specify it so that this works > > >> correctly without this footgun. > > >> > > >> Of course, if GCC gets the "counted_by" attribute wrong, there will > > >> be arguments later in WG14 why the feature is then different to it. > > > > > > I think that we need to resolve this issue first in the design of > > > “counted_by” for pointer fields. > > > I guess that we might need to come up with some additional limitations > > > for using the “counted_by” > > > attribute for pointer fields at the source code level in order to avoid > > > such potential error. But not > > > sure what exactly the additional limitation should be at this moment. > > > > > > Need some study here. > > > > Actually, I found out that this is really not a problem with the current > > design, for the following new testing case I added for my current > > implementation of the counted_by for pointer field: > > > > [ gcc.dg]$ cat pointer-counted-by-7.c > > /* Test the attribute counted_by for pointer field and its usage in > > * __builtin_dynamic_object_size. 
*/ > > /* { dg-do run } */ > > /* { dg-options "-O2" } */ > > > > #include "builtin-object-size-common.h" > > > > struct annotated { > > int b; > > int *c __attribute__ ((counted_by (b))); > > }; > > > > struct annotated *__attribute__((__noinline__)) setup (int attr_count) > > { > > struct annotated *p_array_annotated > > = (struct annotated *) malloc (sizeof (struct annotated)); > > p_array_annotated->c = (int *) malloc (sizeof (int) * attr_count); > > p_array_annotated->b = attr_count; > > > > return p_array_annotated; > > } > > > > > > int main(int argc, char *argv[]) > > { > > struct annotated *x = setup (10); > > int *p = x->c; > > x = setup (20); > > EXPECT(__builtin_dynamic_object_size (p, 1), 10 * sizeof (int)); > > EXPECT(__builtin_dynamic_object_size (x->c, 1), 20 * sizeof (int)); > > DONE (); > > } > > > > This test case passed without any issue. > > > > Our previous introduction of the new internal function .ACCESS_WITH_SIZE > > already resolved this issue. > > > > So, I think that as long as the whole structure is set at the same time, > > should be fine. > > > > Let me know if you have more comment here. > > > Nice! I think this is going to be an issue with Clang's > implementation. I'll need to create our version of .ACCESS_WITH_SIZE. > It might end up simplifying some of the code. :-) So my assumption that our GetElementPointerInst instruction would "just work" in this case was wrong. By assigning x->c to p it loses pretty much all information, e.g. the __counted_by attribute. It looks like we'll have to go with a similar intrinsic to .ACCESS_WITH_SIZE. -bw
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Wed, 11 Dec 2024 at 21:18, Giuseppe D'Angelo wrote: > > On 11/12/2024 21:41, Jonathan Wakely wrote: > > Hmm, the warning is useful if somebody does: > > > > std::span s({1,2,3}); > > > > So I think we do want the warning, but maybe give it a special case so > > it doesn't warn for an rvalue std::span. > > Do you have anything in mind to make this work? It would need changes to the front end, but I think they'd be useful changes. > > Note that the above is akin to: > > std::vector getVector(); > std::span sp = getVector(); > > > which is also similar to: > > std::string getString(); > std::string_view sv = getString(); > > > and none of these warn at the moment. They should do though :-) If C++ had ref-qualified constructors then we could constrain the std::span(Range&&)& constructor to only work for borrowed_ranges but allow the std::span(Range&&)&& constructor to work for any ranges. We don't have that though.
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
> Difference with v3: Refresh > > Samuel Thibault (3): > ada: Factorize bsd signal definitions > ada: Fix GNU/Hurd priority range > ada: Add GNU/Hurd x86_64 support Thanks, this looks good to me. Do you need me to merge it? -- Eric Botcazou
[PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Hi All, GCC 15 added two new fusions CMP+CSEL and CMP+CSET. This patch enables them for cores that support it, based on their Software Optimization Guides, and generically on Armv9-A. Even if a core does not support it, there's no negative performance impact. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/cortexx925.h: Use it. * config/aarch64/tuning_models/generic_armv9_a.h: Use it. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. --- diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def index f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376df9ae7a196031286d 100644 --- a/gcc/config/aarch64/aarch64-fusion-pairs.def +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) /* Baseline fusion settings suitable for all cores. */ #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL \ + | AARCH64_FUSE_CMP_CSET) + #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496def277230e193d20f 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 10, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h index 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a15a5df7803ff4dc6c 100644 --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h index c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596dbe2d7d4a3de9c08c 100644 --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -143,7 +143,7 @@ static const struct tune_params neoverse512tvb_tunings = 1 /* store_pred. */ }, /* memmov_cost. 
*/ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h index 18199ac206c6cbfcef8695b497401b78a8f77f38..47d861232951f92bebbd09fbca8b1ff8624949fc 100644 --- a/gcc/config/aarch64/tuning_models/neoversen2.h +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -205,7 +205,7 @@ static const struct tune_params neoversen2_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 5, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h b/gcc/config/aarch64/tuning_models/neoversen3.h index 4da85cfac0d185a5d59439f6d19d90ace0354e8f..356b5f8cc83b69e74ddac70db7a3883bc1b7a29b 100644 --- a/gcc/config/aarch64/tuning_models/neoversen3.h +++ b/gcc/config/aarch64/tuning_models/neoversen3.h @@ -205,7 +205,7 @@ static const struct tune_params neoversen3_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 5, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ +
[PATCH 2/2]AArch64: Set L1 data cache size according to size on CPUs
Hi All, This sets the L1 data cache size for some cores based on the size given in their Technical Reference Manuals. Today the port minimum is 256 bytes as explained in commit g:9a99559a478111f7fbeec29bd78344df7651c707, however, like Neoverse V2, most cores actually define the L1 cache size as 64 bytes. The generic Armv9-A model was already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this change follows suit for a few other cores based on their TRMs. This results in less memory pressure when running on large core count machines. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise. --- diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index b3ae1576ade1f701b775496def277230e193d20f..f32454aa808bb2a19bf4d364387ab463a87f9fbf 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -222,7 +222,7 @@ static const struct tune_params cortexx925_tunings = | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ }; diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h index f72505918f3aa64200aa596dbe2d7d4a3de9c08c..007f987154c4634afeb6294b0df142fdd05997cd 100644 --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -158,7 +158,7 @@ static const struct tune_params neoverse512tvb_tunings = (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h index 82def6b2736df8162d9b606440d260c951f3ef99..608e36936ad437befa9b8663bc4c91822b05befc 100644 --- a/gcc/config/aarch64/tuning_models/neoversen1.h +++ b/gcc/config/aarch64/tuning_models/neoversen1.h @@ -52,7 +52,7 @@ static const struct tune_params neoversen1_tunings = 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h index 47d861232951f92bebbd09fbca8b1ff8624949fc..13d4640791ca998ca012002c932e8775b1898733 100644 --- a/gcc/config/aarch64/tuning_models/neoversen2.h +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -222,7 +222,7 @@ static const struct tune_params neoversen2_tunings = | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. 
*/ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h b/gcc/config/aarch64/tuning_models/neoversen3.h index 356b5f8cc83b69e74ddac70db7a3883bc1b7a29b..ef15dc8c3015bb406a6957346cf47bacf305d884 100644 --- a/gcc/config/aarch64/tuning_models/neoversen3.h +++ b/gcc/config/aarch64/tuning_models/neoversen3.h @@ -221,7 +221,7 @@ static const struct tune_params neoversen3_tunings = | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/co
Re: [PATCH] Fix some more is_trivial C++26 deprecation warnings
Hello, Giuseppe D'Angelo wrote: Hello, This patch fixes a C++26 deprecation warning in g++'s (not libstdc++) autotests. Please ignore; this has been independently fixed in 7aab1271b4afb6f3e9e8a01825fcb14750b29312 (thanks!). Thank you, -- Giuseppe D'Angelo
[PATCH] c++: allow stores to anon union vars to change current union member in constexpr [PR117614]
Hi! Since r14-4771 the FE tries to differentiate between cases where the lhs of a store allows changing the current union member and cases where it doesn't, and the latter category includes everything that has gone through the cxx_eval_constant_expression path on the lhs. As the testcase shows, DECL_ANON_UNION_VAR_P vars were handled like that too, even when stores to them are the only way to change the current union member in the sources. So, the following patch just handles that case manually without calling cxx_eval_constant_expression and without setting evaluated to true. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-12-11 Jakub Jelinek PR c++/117614 * constexpr.cc (cxx_eval_store_expression): For stores to DECL_ANON_UNION_VAR_P vars just continue with DECL_VALUE_EXPR of it, without setting evaluated to true or full cxx_eval_constant_expression. * g++.dg/cpp2a/constexpr-union8.C: New test. --- gcc/cp/constexpr.cc.jj 2024-12-06 09:08:20.977872404 +0100 +++ gcc/cp/constexpr.cc 2024-12-09 16:57:23.682959311 +0100 @@ -6418,6 +6418,14 @@ cxx_eval_store_expression (const constex object = probe; else { + tree pvar = tree_strip_any_location_wrapper (probe); + if (VAR_P (pvar) && DECL_ANON_UNION_VAR_P (pvar)) + { + /* Stores to DECL_ANON_UNION_VAR_P var are allowed to change +active union member. 
*/ + probe = DECL_VALUE_EXPR (pvar); + break; + } probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue, non_constant_p, overflow_p); evaluated = true; --- gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C.jj2024-12-09 17:06:02.876721019 +0100 +++ gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C 2024-12-09 17:05:01.832572060 +0100 @@ -0,0 +1,31 @@ +// PR c++/117614 +// { dg-do compile { target c++20 } } + +constexpr int +foo () +{ + union { +int x{0}; +char y; + }; + y = 1; + return y; +} + +constexpr int +bar () +{ + union { +union { + int x{0}; + char y; +}; +long long z; + }; + y = 1; + z = 2; + return z; +} + +static_assert (foo () == 1); +static_assert (bar () == 2); Jakub
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
Eric Botcazou wrote on Wed, 11 Dec 2024 at 09:58:23 +0100: > > Difference with v3: Refresh > > > > Samuel Thibault (3): > > ada: Factorize bsd signal definitions > > ada: Fix GNU/Hurd priority range > > ada: Add GNU/Hurd x86_64 support > > Thanks, this looks good to me. Do you need me to merge it? Yes, please, I don't have commit access. Samuel
[PATCH] RISC-V: optimization by converting LUI operands with SMALL_AFTER_COMMON_TRAILING_SHIFT
The patch optimizes code generation for comparisons of the form X & C1 == C2 when the bitwise AND mask can fit in the 12-bit immediate operand of the RISC-V "andi" instruction after a right shift. For example, with C1 = 0x55500000 and C2 = 0x14500000, both constants share 20 trailing zero bits; shifting all used values 20 bits to the right leaves constants that fit in 12 bits, so the "and immediate" instruction can be used, improving performance. 2024-12-11 Oliver Kozul PR target/114087 gcc/ChangeLog: * config/riscv/riscv.md (*lui_constraint_lshiftrt): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr114087-2.c: New test. 
--- gcc/config/riscv/riscv.md | 25 + gcc/testsuite/gcc.target/riscv/pr114087-2.c | 11 + 2 files changed, 36 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/pr114087-2.c diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 3a4cd1d93a0..310cbd5617e 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -858,6 +858,31 @@ [(set_attr "type" "arith") (set_attr "mode" "SI")]) +(define_insn_and_split "*lui_constraint_lshiftrt" + [(set (match_operand:ANYI 0 "register_operand" "=r") + (plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r") +(match_operand 2 "const_int_operand")) +(match_operand 3 "const_int_operand")))] + "!SMALL_OPERAND (INTVAL (operands[2])) + && !SMALL_OPERAND (INTVAL (operands[3])) + && SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]), + INTVAL (operands[3]))" + "#" + "&& reload_completed" + [(set (match_dup 0) (lshiftrt:X (match_dup 1) (match_dup 4))) + (set (match_dup 0) (and:X (match_dup 0) (match_dup 5))) + (set (match_dup 0) (plus:X (match_dup 0) (match_dup 6)))] + { +HOST_WIDE_INT mask = INTVAL (operands[2]); +HOST_WIDE_INT val = INTVAL (operands[3]); +int trailing_shift = COMMON_TRAILING_ZEROS (mask, val); + +operands[4] = GEN_INT (trailing_shift); +operands[5] = GEN_INT (mask >> trailing_shift); +operands[6] = GEN_INT (val >> trailing_shift); + } +[(set_attr "type" "arith")]) + ;; ;; ;; diff --git a/gcc/testsuite/gcc.target/riscv/pr114087-2.c b/gcc/testsuite/gcc.target/riscv/pr114087-2.c new file mode 100644 index 000..1e158e8b722 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr114087-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv64 } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */ +/* { dg-options "-march=rv64gc -mabi=lp64d" } */ + +int pred2a(int x) { + return ((x & 0x55500000) == 0x14500000); +} + +/* { dg-final { scan-assembler {srli\s*[a-x0-9]+,\s*[a-x0-9]+,\s*20}} } */ +/* { dg-final { scan-assembler 
{andi\s*[a-x0-9]+,\s*[a-x0-9]+,\s*1365}} } */ \ No newline at end of file -- 2.43.0
Re: rs6000: Add -msplit-patch-nops (PR112980)
Hello, even though it is not my work, I would like to ping this patch. Having it upstream would really help us a lot. Thank you very much in advance, Martin On Wed, Nov 13 2024, Michael Matz wrote: > Hello, > > this is essentially > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html > > from Kewen in functionality. When discussing this with Segher at the > Cauldron he expressed reservations about changing the default > implementation of -fpatchable-function-entry. So, to move forward, let's > move it under a new target option -msplit-patch-nops (expressing the > important deviation from the default behaviour, namely that all the > patching nops form a consecutive sequence normally). > > Regstrapping on power9 ppc64le in progress. Okay if that passes? > > > Ciao, > Michael. > > --- > > as the bug report details some uses of -fpatchable-function-entry > aren't happy with the "before" NOPs being inserted between global and > local entry point on powerpc. We want the before NOPs be in front > of the global entry point. That means that the patching NOPs aren't > consecutive for dual entry point functions, but for these usecases > that's not the problem. But let us support both under the control > of a new target option: -msplit-patch-nops. > > gcc/ > > PR target/112980 > * config/rs6000/rs6000.opt (msplit-patch-nops): New option. > * doc/invoke.texi (RS/6000 and PowerPC Options): Document it. > * config/rs6000/rs6000.h (machine_function.stop_patch_area_print): > New member. > * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): > Emit split nops under control of that one. > * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): > Add handling of split patch nops. 
> --- > gcc/config/rs6000/rs6000-logue.cc | 15 +-- > gcc/config/rs6000/rs6000.cc | 27 +++ > gcc/config/rs6000/rs6000.h| 6 ++ > gcc/config/rs6000/rs6000.opt | 4 > gcc/doc/invoke.texi | 17 +++-- > 5 files changed, 57 insertions(+), 12 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-logue.cc > b/gcc/config/rs6000/rs6000-logue.cc > index c87058b435e..aa1e0442f2b 100644 > --- a/gcc/config/rs6000/rs6000-logue.cc > +++ b/gcc/config/rs6000/rs6000-logue.cc > @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file) > >unsigned short patch_area_size = crtl->patch_area_size; >unsigned short patch_area_entry = crtl->patch_area_entry; > - /* Need to emit the patching area. */ > - if (patch_area_size > 0) > + /* Emit non-split patching area now. */ > + if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0) > { > cfun->machine->global_entry_emitted = true; > /* As ELFv2 ABI shows, the allowable bytes between the global > @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file) > patch_area_entry); > rs6000_print_patchable_function_entry (file, patch_area_entry, >true); > - patch_area_size -= patch_area_entry; > } > } > > @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file) >assemble_name (file, name); >fputs ("\n", file); >/* Emit the nops after local entry. 
*/ > - if (patch_area_size > 0) > - rs6000_print_patchable_function_entry (file, patch_area_size, > -patch_area_entry == 0); > + if (patch_area_size > patch_area_entry) > + { > + patch_area_size -= patch_area_entry; > + cfun->machine->stop_patch_area_print = false; > + rs6000_print_patchable_function_entry (file, patch_area_size, > + patch_area_entry == 0); > + } > } > >else if (rs6000_pcrel_p ()) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 950fd947fda..6427e6913ba 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file, > { >bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p > (); >/* For a function which needs global entry point, we will emit the > - patchable area before and after local entry point under the control of > - cfun->machine->global_entry_emitted, see the handling in function > - rs6000_output_function_prologue. */ > - if (!global_entry_needed_p || cfun->machine->global_entry_emitted) > + patchable area when it isn't split before and after local entry point > + under the control of cfun->machine->global_entry_emitted, see the > + handling in function rs6000_output_function_prologue. */ > + if (!TARGET_SPLIT_PATCH_NOPS > + && (!global_entry_needed_p || cfun->machine->global_entry_emitted)) >
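For reference, the NOP bookkeeping in the hunks above boils down to simple arithmetic on `crtl->patch_area_size` and `crtl->patch_area_entry`: ENTRY NOPs go before the entry point and SIZE - ENTRY after it; -msplit-patch-nops only changes *where* each group is emitted on dual-entry functions (before the global entry point vs. after the local one), not the counts. A toy model of that arithmetic — not GCC code, and the struct and function names are made up for illustration:

```cpp
#include <cassert>

// Toy model of -fpatchable-function-entry=SIZE,ENTRY: ENTRY NOPs are
// emitted before the entry point and SIZE - ENTRY after it.  On rs6000
// with -msplit-patch-nops the "before" group moves in front of the
// global entry point while the "after" group stays behind the local
// entry point, so the two groups need not be consecutive -- but the
// counts computed here are unchanged.
struct PatchArea
{
  unsigned before_entry;
  unsigned after_entry;
};

PatchArea distribute_nops(unsigned patch_area_size, unsigned patch_area_entry)
{
  PatchArea p{patch_area_entry, 0};
  // Mirrors the guard in the patch: only emit the trailing group when
  // something is left over after the leading one.
  if (patch_area_size > patch_area_entry)
    p.after_entry = patch_area_size - patch_area_entry;
  return p;
}
```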
Re: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Tamar Christina writes: > Hi All, > > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > > This patch enables them for cores that support based on their Software > Optimization Guides and generically on Armv9-A. Even if a core does not > support it there's no negative performance impact. > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): > New. > * config/aarch64/tuning_models/cortexx925.h: Use it. > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > * config/aarch64/tuning_models/neoversen2.h: Use it. > * config/aarch64/tuning_models/neoversen3.h: Use it. > * config/aarch64/tuning_models/neoversev1.h: Use it. > * config/aarch64/tuning_models/neoversev2.h: Use it. > * config/aarch64/tuning_models/neoversev3.h: Use it. > * config/aarch64/tuning_models/neoversev3ae.h: Use it. > > --- > > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > b/gcc/config/aarch64/aarch64-fusion-pairs.def > index > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376df9ae7a196031286d > 100644 > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > /* Baseline fusion settings suitable for all cores. */ > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) > > +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > AARCH64_FUSE_CMP_CSEL \ > + | AARCH64_FUSE_CMP_CSET) > + > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > index > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496def277230e193d20f > 100644 > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > 2 /* store_pred. */ >}, /* memmov_cost. */ >10, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > index > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a15a5df7803ff4dc6c > 100644 > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = > 1 /* store_pred. */ >}, /* memmov_cost. */ >3, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think we have to be careful about using it for generic tuning. generic-armv9-a mustn't become "generic Neoverse", since it's supposed to be good for non-Neoverse (and non-Arm) Armv9-A cores as well. So perhaps here we should expand the macro. Alternatively, we could add a comment saying that fusion macros for other CPUs can be added here as well, if they are unlikely to have a negative impact on other cores. 
Perhaps we should expand the macro for cortex-x925 as well. LGTM otherwise, but let's see whether there are any other comments. Thanks, Richard > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > index > c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596dbe2d7d4a3de9c08c > 100644 > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > @@ -143,7 +143,7 @@ static const struct tune_params neoverse512tvb_tunings = > 1 /* store_pred. */ >}, /* memmov_cost. */ >3, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ > diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h > b/gcc/config/aarch64/tuning_models/neoversen2.h > index > 18199ac206c6cbfcef8695b497401b78a8f77f38..47d861232951f92bebbd09fbca8b1ff8624949fc > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen2.h > +++ b/gcc/config/aarch64/tuning_models/neoversen2.h > @@ -205,7 +2
RE: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, December 11, 2024 9:32 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > support it > > Tamar Christina writes: > > Hi All, > > > > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > > > > This patch enables them for cores that support based on their Software > > Optimization Guides and generically on Armv9-A. Even if a core does not > > support it there's no negative performance impact. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-fusion-pairs.def > (AARCH64_FUSE_NEOVERSE_BASE): > > New. > > * config/aarch64/tuning_models/cortexx925.h: Use it. > > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > > * config/aarch64/tuning_models/neoversen2.h: Use it. > > * config/aarch64/tuning_models/neoversen3.h: Use it. > > * config/aarch64/tuning_models/neoversev1.h: Use it. > > * config/aarch64/tuning_models/neoversev2.h: Use it. > > * config/aarch64/tuning_models/neoversev3.h: Use it. > > * config/aarch64/tuning_models/neoversev3ae.h: Use it. > > > > --- > > > > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > b/gcc/config/aarch64/aarch64-fusion-pairs.def > > index > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 > df9ae7a196031286d 100644 > > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > > /* Baseline fusion settings suitable for all cores. */ > > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | > AARCH64_FUSE_AES_AESMC) > > > > +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ > > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > AARCH64_FUSE_CMP_CSEL \ > > + | AARCH64_FUSE_CMP_CSET) > > + > > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | > AARCH64_FUSE_MOVK_MOVK) > > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > > index > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 > def277230e193d20f 100644 > > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > > 2 /* store_pred. */ > >}, /* memmov_cost. */ > >10, /* issue_rate */ > > - AARCH64_FUSE_BASE, /* fusible_ops */ > > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >"32:16", /* function_align. */ > >"4", /* jump_align. */ > >"32:16", /* loop_align. */ > > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > index > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 > 5a5df7803ff4dc6c 100644 > > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > @@ -236,7 +236,7 @@ static const struct tune_params > generic_armv9_a_tunings = > > 1 /* store_pred. */ > >}, /* memmov_cost. */ > >3, /* issue_rate */ > > - AARCH64_FUSE_BASE, /* fusible_ops */ > > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >"32:16", /* function_align. */ > >"4", /* jump_align. */ > >"32:16", /* loop_align. */ > > Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think > we have to be careful about using it for generic tuning. generic-armv9-a > mustn't become "generic Neoverse", since it's supposed to be good for > non-Neoverse (and non-Arm) Armv9-A cores as well. > > So perhaps here we should expand the macro. 
Alternatively, we could add > a comment saying that fusion macros for other CPUs can be added here as > well, if they are unlikely to have a negative impact on other cores. > > Perhaps we should expand the macro for cortex-x925 as well. Sorry that certainly was not the intention, I thought about naming it AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for non-Armv9 cores I thought this would be equally confusing. But perhaps In light of your comments it may make more sense naming wise? Thanks, Tamar > > LGTM otherwise, but let's see whether there are any other comments. > > Thanks, > Richard > > > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > > index > c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596db > e2d7d4a3de9c08c 100644 > > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > > +++ b/gcc/config/aarch64/tuning_model
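For readers following the thread: the macro under discussion is just a bitwise OR of fusion flags, so expanding it at a use site, as suggested for generic_armv9_a.h, is functionally identical. A self-contained sketch — the bit values below are illustrative, not the real assignments generated from aarch64-fusion-pairs.def:

```cpp
#include <cassert>

// Illustrative bit assignments -- not the real values generated from
// aarch64-fusion-pairs.def.
enum : unsigned
{
  AARCH64_FUSE_CMP_BRANCH = 1u << 0,
  AARCH64_FUSE_AES_AESMC  = 1u << 1,
  AARCH64_FUSE_CMP_CSEL   = 1u << 2,
  AARCH64_FUSE_CMP_CSET   = 1u << 3,
};

// Baseline fusion settings suitable for all cores.
constexpr unsigned AARCH64_FUSE_BASE
  = AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC;

// The macro the patch adds: the base set plus the two new CMP fusions.
// Writing the OR out at a use site yields exactly the same bits.
constexpr unsigned AARCH64_FUSE_NEOVERSE_BASE
  = AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL | AARCH64_FUSE_CMP_CSET;
```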
Re: [PATCH V2] RISC-V: Update Xsfvqmacc and Xsfvfnrclip extension's testcases.
> diff --git > a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > index 813f7860f645..3b46505e8f99 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,mf2+,ta+,ma+ This should be e8, mf8, you can also check with llvm's implementation: https://godbolt.org/z/xoeYrK6o7 > ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+ > ** ... > */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > index f2058a14779b..6bb659b5d233 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vqmacc_2x8x2_i32m1_vint32m1_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,m1,ta,ma+ This should be e8, m1, you can also check with llvm's implementation: https://godbolt.org/z/ns8T65YTh > ** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+ > ** ... > */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > index 3bd6f1c273cd..8106d0dbbaba 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vqmacc_4x8x4_i32m1_vint32m1_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,m1,ta,ma+ This should be e8, mf2, you can also check with llvm's implementation: https://godbolt.org/z/h3q6q369q > ** sf\.vqmacc\.4x8x4\tv[0-9]+,v[0-9]+,v[0-9]+ > ** ... > */
Re: [PATCH 2/2] AArch64: Set L1 data cache size according to size on CPUs
Tamar Christina writes: > Hi All, > > This sets the L1 data cache size for some cores based on their size in their > Technical Reference Manuals. > > Today the port minimum is 256 bytes as explained in commit > g:9a99559a478111f7fbeec29bd78344df7651c707, however like Neoverse V2 most > cores > actually define the L1 cache size as 64-bytes. The generic Armv9-A model was > already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this > change follows suite for a few other cores based on their TRMs. > > This results in less memory pressure when running on large core count > machines. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. > * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. > * config/aarch64/tuning_models/neoversen1.h: Likewise. > * config/aarch64/tuning_models/neoversen2.h: Likewise. > * config/aarch64/tuning_models/neoversen3.h: Likewise. > * config/aarch64/tuning_models/neoversev1.h: Likewise. > * config/aarch64/tuning_models/neoversev2.h: Likewise. > * config/aarch64/tuning_models/neoversev3.h: Likewise. > * config/aarch64/tuning_models/neoversev3ae.h: Likewise. > > --- > > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > index > b3ae1576ade1f701b775496def277230e193d20f..f32454aa808bb2a19bf4d364387ab463a87f9fbf > 100644 > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > @@ -222,7 +222,7 @@ static const struct tune_params cortexx925_tunings = > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > index > f72505918f3aa64200aa596dbe2d7d4a3de9c08c..007f987154c4634afeb6294b0df142fdd05997cd > 100644 > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > @@ -158,7 +158,7 @@ static const struct tune_params neoverse512tvb_tunings = >(AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h > b/gcc/config/aarch64/tuning_models/neoversen1.h > index > 82def6b2736df8162d9b606440d260c951f3ef99..608e36936ad437befa9b8663bc4c91822b05befc > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen1.h > +++ b/gcc/config/aarch64/tuning_models/neoversen1.h > @@ -52,7 +52,7 @@ static const struct tune_params neoversen1_tunings = >0, /* max_case_values. */ >tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ >(AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model. 
*/ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h > b/gcc/config/aarch64/tuning_models/neoversen2.h > index > 47d861232951f92bebbd09fbca8b1ff8624949fc..13d4640791ca998ca012002c932e8775b1898733 > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen2.h > +++ b/gcc/config/aarch64/tuning_models/neoversen2.h > @@ -222,7 +222,7 @@ static const struct tune_params neoversen2_tunings = > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h > b/gcc/config/aarch64/tuning_models/neoversen3.h > index > 356b5f8cc83b69e74ddac70db7a3883bc1b7a29b..ef15dc8c3015bb406a6957346cf47bacf305d884 > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen3.h > +++ b/gcc/config/aarch64/tuning_models/neoversen3.h > @@ -221,7 +221,7 @@ static const struct tune_params neoversen3_tunings = > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &gener
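As background to the patch above: the tuning change models a 64-byte L1 data cache line rather than the old 256-byte port minimum, and that modelled line size is what code typically uses to pad shared data against false sharing. A hedged C++ illustration — the constant and struct below are hypothetical, not GCC source:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: the 64-byte L1 data cache line the patch models
// for these cores (the previous port-wide minimum was 256 bytes).
constexpr std::size_t kAssumedL1LineSize = 64;

// Padding hot per-thread data out to a full line keeps two adjacent
// elements from sharing a cache line (false sharing).
struct alignas(kAssumedL1LineSize) PerThreadCounter
{
  unsigned long value;
};
```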
Re: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, December 11, 2024 9:32 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that >> support it >> >> Tamar Christina writes: >> > Hi All, >> > >> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. >> > >> > This patch enables them for cores that support based on their Software >> > Optimization Guides and generically on Armv9-A. Even if a core does not >> > support it there's no negative performance impact. >> > >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> >* config/aarch64/aarch64-fusion-pairs.def >> (AARCH64_FUSE_NEOVERSE_BASE): >> >New. >> >* config/aarch64/tuning_models/cortexx925.h: Use it. >> >* config/aarch64/tuning_models/generic_armv9_a.h: Use it. >> >* config/aarch64/tuning_models/neoverse512tvb.h: Use it. >> >* config/aarch64/tuning_models/neoversen2.h: Use it. >> >* config/aarch64/tuning_models/neoversen3.h: Use it. >> >* config/aarch64/tuning_models/neoversev1.h: Use it. >> >* config/aarch64/tuning_models/neoversev2.h: Use it. >> >* config/aarch64/tuning_models/neoversev3.h: Use it. >> >* config/aarch64/tuning_models/neoversev3ae.h: Use it. >> > >> > --- >> > >> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def >> b/gcc/config/aarch64/aarch64-fusion-pairs.def >> > index >> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 >> df9ae7a196031286d 100644 >> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def >> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def >> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) >> > /* Baseline fusion settings suitable for all cores. 
*/ >> > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | >> AARCH64_FUSE_AES_AESMC) >> > >> > +/* Baseline fusion settings suitable for all Neoverse cores. */ >> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | >> AARCH64_FUSE_CMP_CSEL \ >> > + | AARCH64_FUSE_CMP_CSET) >> > + >> > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | >> AARCH64_FUSE_MOVK_MOVK) >> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h >> b/gcc/config/aarch64/tuning_models/cortexx925.h >> > index >> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 >> def277230e193d20f 100644 >> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h >> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h >> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = >> > 2 /* store_pred. */ >> >}, /* memmov_cost. */ >> >10, /* issue_rate */ >> > - AARCH64_FUSE_BASE, /* fusible_ops */ >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >> >"32:16",/* function_align. */ >> >"4",/* jump_align. */ >> >"32:16",/* loop_align. */ >> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > index >> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 >> 5a5df7803ff4dc6c 100644 >> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > @@ -236,7 +236,7 @@ static const struct tune_params >> generic_armv9_a_tunings = >> > 1 /* store_pred. */ >> >}, /* memmov_cost. */ >> >3, /* issue_rate */ >> > - AARCH64_FUSE_BASE, /* fusible_ops */ >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >> >"32:16",/* function_align. */ >> >"4",/* jump_align. */ >> >"32:16",/* loop_align. */ >> >> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think >> we have to be careful about using it for generic tuning. 
generic-armv9-a >> mustn't become "generic Neoverse", since it's supposed to be good for >> non-Neoverse (and non-Arm) Armv9-A cores as well. >> >> So perhaps here we should expand the macro. Alternatively, we could add >> a comment saying that fusion macros for other CPUs can be added here as >> well, if they are unlikely to have a negative impact on other cores. >> >> Perhaps we should expand the macro for cortex-x925 as well. > > Sorry that certainly was not the intention, I thought about naming it > AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for > non-Armv9 cores I thought this would be equally confusing. But perhaps > In light of your comments it may make more sense naming wise? Yeah, I'd also wondered about suggesting that, but then saw that there were quite a few Armv9-A cores that don't have the flag. So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we should use for the Neoverse tunings. But perhaps we should expand the macro for the two
RE: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, December 11, 2024 9:50 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > support it > > Tamar Christina writes: > >> -Original Message- > >> From: Richard Sandiford > >> Sent: Wednesday, December 11, 2024 9:32 AM > >> To: Tamar Christina > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > >> ; ktkac...@gcc.gnu.org > >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > >> support it > >> > >> Tamar Christina writes: > >> > Hi All, > >> > > >> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > >> > > >> > This patch enables them for cores that support based on their Software > >> > Optimization Guides and generically on Armv9-A. Even if a core does not > >> > support it there's no negative performance impact. > >> > > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > > >> > Ok for master? > >> > > >> > Thanks, > >> > Tamar > >> > > >> > gcc/ChangeLog: > >> > > >> > * config/aarch64/aarch64-fusion-pairs.def > >> (AARCH64_FUSE_NEOVERSE_BASE): > >> > New. > >> > * config/aarch64/tuning_models/cortexx925.h: Use it. > >> > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > >> > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > >> > * config/aarch64/tuning_models/neoversen2.h: Use it. > >> > * config/aarch64/tuning_models/neoversen3.h: Use it. > >> > * config/aarch64/tuning_models/neoversev1.h: Use it. > >> > * config/aarch64/tuning_models/neoversev2.h: Use it. > >> > * config/aarch64/tuning_models/neoversev3.h: Use it. > >> > * config/aarch64/tuning_models/neoversev3ae.h: Use it. 
> >> > > >> > --- > >> > > >> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > >> b/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > index > >> > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 > >> df9ae7a196031286d 100644 > >> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > >> > /* Baseline fusion settings suitable for all cores. */ > >> > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | > >> AARCH64_FUSE_AES_AESMC) > >> > > >> > +/* Baseline fusion settings suitable for all Neoverse cores. */ > >> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > >> AARCH64_FUSE_CMP_CSEL \ > >> > +| AARCH64_FUSE_CMP_CSET) > >> > + > >> > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | > >> AARCH64_FUSE_MOVK_MOVK) > >> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > >> b/gcc/config/aarch64/tuning_models/cortexx925.h > >> > index > >> > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 > >> def277230e193d20f 100644 > >> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > >> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > >> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > >> > 2 /* store_pred. */ > >> >}, /* memmov_cost. */ > >> >10, /* issue_rate */ > >> > - AARCH64_FUSE_BASE, /* fusible_ops */ > >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >> >"32:16", /* function_align. */ > >> >"4", /* jump_align. */ > >> >"32:16", /* loop_align. 
*/ > >> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > index > >> > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 > >> 5a5df7803ff4dc6c 100644 > >> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > @@ -236,7 +236,7 @@ static const struct tune_params > >> generic_armv9_a_tunings = > >> > 1 /* store_pred. */ > >> >}, /* memmov_cost. */ > >> >3, /* issue_rate */ > >> > - AARCH64_FUSE_BASE, /* fusible_ops */ > >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >> >"32:16", /* function_align. */ > >> >"4", /* jump_align. */ > >> >"32:16", /* loop_align. */ > >> > >> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think > >> we have to be careful about using it for generic tuning. generic-armv9-a > >> mustn't become "generic Neoverse", since it's supposed to be good for > >> non-Neoverse (and non-Arm) Armv9-A cores as well. > >> > >> So perhaps here we should expand the macro. Alternatively, we could add > >> a comment saying that fusion macros for other CPUs can be added here as > >> well, if they are unlikely to have a negative impact on other cores. > >> > >> Perhaps we should expand the macro for cortex-x925 as well. > > > > Sorry that certainly was not the intention, I thought about naming it > > AARCH6
Re: [PATCH] AIX Build failure with default -std=gnu23.
Thank you everyone. I have pushed the changes. Thanks, Sangamesh On Mon, Dec 9, 2024 at 12:48 PM swamy sangamesh wrote: > Hi David, > > I don't have write privileges for GCC repo. > I have raised a request for write access and provided your email address > as approver. > Please approve the request. > > Thanks, > Sangamesh > > > On Mon, Dec 9, 2024 at 3:17 AM David Edelsohn wrote: > >> This revised patch is okay. >> >> You are listed in the FSF copyrights file for GCC GDB GLIBC BINUTILS, but >> do you have write privileges for the GCC repo? You are not listed in >> gcc/MAINTAINERS for write-after-approval. >> >> Thanks, David >> >> On Sun, Dec 8, 2024 at 10:49 AM swamy sangamesh < >> swamy.sangam...@gmail.com> wrote: >> >>> Thank you all for the review and comments. >>> >>> David, I have tested the changes against the latest master and ran >>> default test cases. >>> I have also built and ran the default test cases on RHEL9.0 ppc64le. >>> Please let me know if any other testing needs to be covered. >>> >>> Please find the latest patch attached. >>> >>> >>> >>> On Sat, Dec 7, 2024 at 2:31 AM David Edelsohn wrote: >>> On Fri, Dec 6, 2024 at 2:17 PM Ian Lance Taylor wrote: > David Edelsohn writes: > > > On Fri, Dec 6, 2024 at 12:25 PM Rainer Orth < > r...@cebitec.uni-bielefeld.de> > > wrote: > > > >> Hi David, > >> > >> > No objection from me, but Ian is the maintainer of libiberty, so > I'll > >> defer > >> > to him, especially about style and overall software engineering. > >> > > >> > The C23 change presumably will break on Alpha OSF/1 as well. > Does GCC > >> > still support OSF/1? It might be preferred to delete the block > entirely > >> > instead of #ifndef _AIX. > >> > >> GCC 4.7 was the last release to support Tru64 UNIX (ex-OSF/1). > However, > >> libiberty is also used outside of the toolchain, so that may affect > the > >> decision. > >> > >> However, IMO the Tru64 UNIX support can go for good now. 
> >> > > > > Hi, Rainer > > > > Thanks for taking a look and commenting. > > > > It seems we both agree that it would be better to remove the entire > block > > defining _NO_PROTO because both of the systems are no longer > supported. > > > > I'll give Ian the opportunity to comment. > > Looks good to me. Thanks. > > Ian > Sangamesh, Can you respin and test a revised patch that removes the conditional _NO_PROTO definition instead of adding #ifndef _AIX? I think that is what Rainer and I would prefer because neither of the OSes is supported and we don't need a fragile work-around. Thanks, David >>> >>> -- >>> Thanks & Regards, >>> Sangamesh >>> >> > > -- > Thanks & Regards, > Sangamesh > -- Thanks & Regards, Sangamesh
[PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
This was approved in Wrocław as LWG 3900, so that passing an incorrect argument intended as an allocator will be ill-formed, instead of silently ignored. This also renames the template parameters and function parameters for the allocators, to align with the names in the standard. I found it too confusing to have a parameter _Alloc which doesn't correspond to Alloc in the standard. Rename _Alloc to _Allocator (which the standard calls Allocator) and rename _Na to _Alloc (which the standard calls Alloc). libstdc++-v3/ChangeLog: * include/std/generator (_Promise_alloc): Rename template parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Replace constraints with a static_assert in the body. Rename allocator parameter. (_Promise_alloc::_M_allocate): Rename allocator parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Rename allocator parameter. * testsuite/24_iterators/range_generators/alloc.cc: New test. * testsuite/24_iterators/range_generators/lwg3900.cc: New test. --- Tested x86_64-linux. 
libstdc++-v3/include/std/generator| 55 +-- .../24_iterators/range_generators/alloc.cc| 45 +++ .../24_iterators/range_generators/lwg3900.cc | 16 ++ 3 files changed, 87 insertions(+), 29 deletions(-) create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator index 0a14e70064e..16b953e90af 100644 --- a/libstdc++-v3/include/std/generator +++ b/libstdc++-v3/include/std/generator @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept _Stateless_alloc = (allocator_traits<_All>::is_always_equal::value && default_initializable<_All>); -template +template class _Promise_alloc { - using _ATr = allocator_traits<_Alloc>; - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename _ATr - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: void* operator new(std::size_t __sz) - requires default_initializable<_Rebound> // _Alloc is non-void + requires default_initializable<_Rebound> // _Allocator is non-void { return _M_allocate({}, __sz); } - template + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3900. The allocator_arg_t overloads of promise_type::operator new + // should not be constrained + template void* operator new(std::size_t __sz, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) 
- requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } - template + template void* operator new(std::size_t __sz, const _This&, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) - requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } void @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } - template + template static void* - _M_allocate(const _Na& __na, std::size_t __csz) + _M_allocate(const _Alloc& __a, std::size_t __csz) { - using _Rebound = typename std::allocator_traits<_Na> - ::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename std::allocator_traits<_Na> - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); _Dealloc_fn __d = &_M_deallocator<_Rebound>; - auto __b = static_cast<_Rebound>(__na); + auto __b = static_cast<_Rebound>(__a); auto __asz = _M_alloc_size<_Rebound>(__csz);
[committed] libstdc++: Make std::println use locale from ostream (LWG 4088)
This was just approved in Wrocław. libstdc++-v3/ChangeLog: * include/std/ostream (println): Pass stream's locale to std::format, as per LWG 4088. * testsuite/27_io/basic_ostream/print/1.cc: Check std::println with custom locale. Remove unused brit_punc class. --- Tested x86_64-linux. Pushed to trunk. libstdc++-v3/include/std/ostream | 6 -- .../testsuite/27_io/basic_ostream/print/1.cc | 18 +- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream index 637aad5a5a4..327313a881d 100644 --- a/libstdc++-v3/include/std/ostream +++ b/libstdc++-v3/include/std/ostream @@ -1028,8 +1028,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline void println(ostream& __os, format_string<_Args...> __fmt, _Args&&... __args) { - std::print(__os, "{}\n", -std::format(__fmt, std::forward<_Args>(__args)...)); + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 4088. println ignores the locale imbued in std::ostream + std::print(__os, "{}\n", std::format(__os.getloc(), __fmt, + std::forward<_Args>(__args)...)); } // Defined for C++26, supported as an extension to C++23. diff --git a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc index cd4b116ac1c..183e08733d2 100644 --- a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc +++ b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc @@ -63,14 +63,6 @@ test_vprint_nonunicode() // { dg-output "garbage in . garbage out" } } -struct brit_punc : std::numpunct -{ - std::string do_grouping() const override { return "\3\3"; } - char do_thousands_sep() const override { return ','; } - std::string do_truename() const override { return "yes mate"; } - std::string do_falsename() const override { return "nah bruv"; } -}; - void test_locale() { @@ -82,7 +74,7 @@ test_locale() // The default C locale. std::locale cloc = std::locale::classic(); - // A custom locale using comma digit separators. 
+ // A custom locale using tilde digit separators. std::locale bloc(cloc, new stream_punc); { @@ -101,6 +93,14 @@ test_locale() std::print(os, "{:L} {}", 12345, 6789); VERIFY(os.str() == "1~23~45 6789"); } + + { +// LWG 4088. println ignores the locale imbued in std::ostream +std::ostringstream os; +os.imbue(bloc); +std::println(os, "{:L} {}", 12345, 6789); +VERIFY(os.str() == "1~23~45 6789\n"); + } } void -- 2.47.1
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 11:02, Jonathan Wakely wrote: > > This was approved in Wrocław as LWG 3900, so that passing an incorrect > argument intended as an allocator will be ill-formed, instead of > silently ignored. > > This also renames the template parameters and function parameters for > the allocators, to align with the names in the standard. I found it too > confusing to have a parameter _Alloc which doesn't correspond to Alloc > in the standard. Rename _Alloc to _Allocator (which the standard calls > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > libstdc++-v3/ChangeLog: > > * include/std/generator (_Promise_alloc): Rename template > parameter. Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Replace constraints with a > static_assert in the body. Rename allocator parameter. > (_Promise_alloc::_M_allocate): Rename allocator parameter. > Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Rename allocator > parameter. > * testsuite/24_iterators/range_generators/alloc.cc: New test. > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > --- > > Tested x86_64-linux. 
> > libstdc++-v3/include/std/generator| 55 +-- > .../24_iterators/range_generators/alloc.cc| 45 +++ > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > 3 files changed, 87 insertions(+), 29 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc > > diff --git a/libstdc++-v3/include/std/generator > b/libstdc++-v3/include/std/generator > index 0a14e70064e..16b953e90af 100644 > --- a/libstdc++-v3/include/std/generator > +++ b/libstdc++-v3/include/std/generator > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept _Stateless_alloc = > (allocator_traits<_All>::is_always_equal::value > && default_initializable<_All>); > > -template > +template >class _Promise_alloc >{ > - using _ATr = allocator_traits<_Alloc>; > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename _ATr > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > static_assert(is_pointer_v, > "Must use allocators for true pointers with > generators"); > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >public: > void* > operator new(std::size_t __sz) > - requires default_initializable<_Rebound> // _Alloc is non-void > + requires default_initializable<_Rebound> // _Allocator is non-void > { return _M_allocate({}, __sz); } > > - template > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > + // 3900. The allocator_arg_t overloads of promise_type::operator new > + // should not be constrained > + template > void* > operator new(std::size_t __sz, > -allocator_arg_t, const _Na& __na, > +allocator_arg_t, const _Alloc& __a, > const _Args&...) 
> - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > -__sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > - template > + template > void* > operator new(std::size_t __sz, > const _This&, > -allocator_arg_t, const _Na& __na, > +allocator_arg_t, const _Alloc& __a, > const _Args&...) > - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > -__sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > void > @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > } > > - template > + template > static void* > - _M_allocate(const _Na& __na, std::size_t __csz) > + _M_allocate(const _Alloc& __a, std::size_t __csz) > { > - using _Rebound = typename std::allocator_traits<_Na> > - ::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename std::allocator_traits<_Na> > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, >
[PATCH] libstdc++: Avoid redundant assertions in std::span constructors
Any std::span constructor with a runtime length has a precondition that
the length is equal to N (except when N == std::dynamic_extent).
Currently every constructor with a runtime length does:

  if constexpr (extent != dynamic_extent)
    __glibcxx_assert(n == extent);

We can move those assertions into the __detail::__extent_storage
constructor so they are only done in one place. To avoid checking the
assertions when we have a constant length we can add a second
constructor which is consteval and takes an integral_constant argument.
The std::span constructors can pass a size_t for runtime lengths and a
std::integral_constant for constant lengths that don't need to be
checked. The __detail::__extent_storage<dynamic_extent> specialization
only needs one constructor, as a std::integral_constant argument can
implicitly convert to size_t.

For the member functions that return a subspan with a constant extent
we return std::span(ptr, C), which is redundant in two ways. Repeating
the constant length C when it's already a template argument is
redundant, and using the std::span(T*, size_t) constructor implies a
runtime length, which will do a redundant assertion check. Even though
that assertion won't fail and should be optimized away, it's still
unnecessary code that doesn't need to be instantiated and then
optimized away again. We can avoid that by adding a new private
constructor that only takes a pointer (wrapped in a custom tag struct
to avoid accidentally using that constructor) and automatically sets
_M_extent to the correct value.

libstdc++-v3/ChangeLog:

	* include/std/span (__detail::__extent_storage): Check
	precondition in constructor.  Add consteval constructor for
	valid lengths and deleted constructor for invalid constant
	lengths.  Make member functions always_inline.
	(__detail::__span_ptr): New class template.
	(span): Adjust constructors to use a std::integral_constant
	value for constant lengths.  Declare all specializations of
	std::span as friends.
(span::first, span::last, span::subspan): Use new private constructor. (span(__span_ptr)): New private constructor for constant lengths. --- As suggested at https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671431.html Tested x86_64-linux. libstdc++-v3/include/std/span | 78 --- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span index 67227348c90..e8043c02c9a 100644 --- a/libstdc++-v3/include/std/span +++ b/libstdc++-v3/include/std/span @@ -73,10 +73,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __extent_storage { public: + // Used for runtime sizes that must satisfy the precondition. constexpr - __extent_storage(size_t) noexcept + __extent_storage([[maybe_unused]] size_t __n) noexcept + { __glibcxx_assert(__n == _Extent); } + + // Used for constant sizes that are already known to be correct. + consteval + __extent_storage(integral_constant) noexcept { } + // "I've made a huge mistake" - George Oscar Bluth II + template + __extent_storage(integral_constant) = delete; + + [[__gnu__::__always_inline__]] static constexpr size_t _M_extent() noexcept { return _Extent; } @@ -86,11 +97,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __extent_storage { public: + [[__gnu__::__always_inline__]] constexpr __extent_storage(size_t __extent) noexcept : _M_extent_value(__extent) { } + [[__gnu__::__always_inline__]] constexpr size_t _M_extent() const noexcept { return this->_M_extent_value; } @@ -98,6 +111,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION private: size_t _M_extent_value; }; + +template struct __span_ptr { _Type* const _M_ptr; }; + } // namespace __detail template @@ -128,6 +144,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Nested type so that _Type is not an associated class of iterator. 
struct __iter_tag; + template + static inline constexpr integral_constant __v{}; + public: // member types using element_type = _Type; @@ -153,7 +172,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr span() noexcept requires (_Extent == dynamic_extent || _Extent == 0) - : _M_ptr(nullptr), _M_extent(0) + : _M_ptr(nullptr), _M_extent(__v<0>) { } template @@ -162,13 +181,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION span(_It __first, size_type __count) noexcept : _M_ptr(std::to_address(__first)), _M_extent(__count) - { - if constexpr (_Extent != dynamic_extent) - { - __glibcxx_assert(__count == _Extent); - } - __glibcxx_requires_valid_range(__first, __first + __count); - } + {
[PATCH] libstdc++: Add lvalue overload for generator::yield_value
This was approved in Wrocław as LWG 3899.

libstdc++-v3/ChangeLog:

	* include/std/generator (generator::yield_value): Add overload
	taking lvalue element_of view, as per LWG 3899.
---
The issue suggests that this change avoids creating a new coroutine
frame, so I thought if I used a custom allocator I'd be able to see a
reduction in memory allocation after this fix. I was unable to see any
difference. I would welcome a testcase for it.

Tested x86_64-linux.

 libstdc++-v3/include/std/generator | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator
index bba85bd0aa4..807e724c0c8 100644
--- a/libstdc++-v3/include/std/generator
+++ b/libstdc++-v3/include/std/generator
@@ -153,6 +153,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	noexcept
 	{ return _Recursive_awaiter { std::move(__r.range) }; }

+      // _GLIBCXX_RESOLVE_LIB_DEFECTS
+      // 3899. co_yielding elements of an lvalue generator is
+      // unnecessarily inefficient
+      template
+	requires std::same_as<_Yield2_t<_R2, _V2>, _Yielded>
+	auto
+	yield_value(ranges::elements_of&, _U2> __r)
+	noexcept
+	{ return _Recursive_awaiter { std::move(__r.range) }; }
+
       template
 	requires convertible_to, _Yielded>
 	auto
-- 
2.47.1
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Wed, 11 Dec 2024 at 22:53, Jonathan Wakely wrote: > > On 03/12/24 17:05 +0100, Giuseppe D'Angelo wrote: > >Hello, > > > >The attached patch adds the span(initializer_list) constructor, added > >by P2447R6 for C++26. > > > >It seems that GCC is somewhat aggressive with its -Winit-list-lifetime > >warning, which actively interferes with this feature. The idea of the > >new constructor is to allow calls like: > > > > void f(std::span); > > f({1, 2, 3}); > > > >which is completely fine as the lifetime of the initializer_list > >encompasses the one of the std::span parameter. However GCC complains > >about the risk of dangling here. I've therefore disabled the warning > >for the new constructor. > > > >Thanks, > >-- > >Giuseppe D'Angelo > > >From bb1d537ee7b68883403127903834427c6787c505 Mon Sep 17 00:00:00 2001 > >From: Giuseppe D'Angelo > >Date: Tue, 3 Dec 2024 16:56:45 +0100 > >Subject: [PATCH] libstdc++: add initializer_list constructor to std::span > > (P2447) > > > >This commit implements P2447R6. The code is straightforward (just one > >extra constructor, with constraints and conditional explicit). > > > >I decided to suppress -Winit-list-lifetime because otherwise it would > >give too many false positives. The new constructor is meant to be used > >as a parameter-passing interface (this is a design choice, see > >P2447R6/2) and, as such, the initializer_list won't dangle despite GCC's > >warnings. > > > >The new constructor isn't 100% backwards compatible. A couple of > >examples are included in Annex C, but I have also lifted some more > >from R4. A new test checks for the old and the new behaviors. > > > >libstdc++-v3/ChangeLog: > > > > * include/bits/version.def: Added the new feature-testing macro. > > * include/bits/version.h (defined): Regenerated. > > * include/std/span: Added constructor from initializer_list. > > * testsuite/23_containers/span/init_list_cons.cc: New test. > > * testsuite/23_containers/span/init_list_cons_neg.cc: New test. 
> >--- > > libstdc++-v3/include/bits/version.def | 8 +++ > > libstdc++-v3/include/bits/version.h | 10 +++ > > libstdc++-v3/include/std/span | 21 ++ > > .../23_containers/span/init_list_cons.cc | 65 +++ > > .../23_containers/span/init_list_cons_neg.cc | 31 + > > 5 files changed, 135 insertions(+) > > create mode 100644 > > libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc > > create mode 100644 > > libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc > > > >diff --git a/libstdc++-v3/include/bits/version.def > >b/libstdc++-v3/include/bits/version.def > >index 8d4b8e9b383..cfa0469fb2d 100644 > >--- a/libstdc++-v3/include/bits/version.def > >+++ b/libstdc++-v3/include/bits/version.def > >@@ -1853,6 +1853,14 @@ ftms = { > > }; > > }; > > > >+ftms = { > >+ name = span_initializer_list; > >+ values = { > >+v = 202311; > >+cxxmin = 26; > >+ }; > >+}; > >+ > > ftms = { > > name = text_encoding; > > values = { > >diff --git a/libstdc++-v3/include/bits/version.h > >b/libstdc++-v3/include/bits/version.h > >index c556aca38fa..6a2c66bdf81 100644 > >--- a/libstdc++-v3/include/bits/version.h > >+++ b/libstdc++-v3/include/bits/version.h > >@@ -2055,6 +2055,16 @@ > > #endif /* !defined(__cpp_lib_saturation_arithmetic) && > > defined(__glibcxx_want_saturation_arithmetic) */ > > #undef __glibcxx_want_saturation_arithmetic > > > >+#if !defined(__cpp_lib_span_initializer_list) > >+# if (__cplusplus > 202302L) > >+# define __glibcxx_span_initializer_list 202311L > >+# if defined(__glibcxx_want_all) || > >defined(__glibcxx_want_span_initializer_list) > >+# define __cpp_lib_span_initializer_list 202311L > >+# endif > >+# endif > >+#endif /* !defined(__cpp_lib_span_initializer_list) && > >defined(__glibcxx_want_span_initializer_list) */ > >+#undef __glibcxx_want_span_initializer_list > >+ > > #if !defined(__cpp_lib_text_encoding) > > # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && > > (_GLIBCXX_USE_NL_LANGINFO_L) > > # define __glibcxx_text_encoding 202306L > >diff 
--git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span > >index f1c19b58737..b84ac87b657 100644 > >--- a/libstdc++-v3/include/std/span > >+++ b/libstdc++-v3/include/std/span > >@@ -39,6 +39,7 @@ > > #endif > > > > #define __glibcxx_want_span > >+#define __glibcxx_want_span_initializer_list > > #include > > > > #ifdef __cpp_lib_span // C++ >= 20 && concepts > >@@ -46,6 +47,9 @@ > > #include > > #include > > #include > >+#ifdef __cpp_lib_span_initializer_list > >+# include > >+#endif > > namespace std _GLIBCXX_VISIBILITY(default) > > { > > _GLIBCXX_BEGIN_NAMESPACE_VERSION > >@@ -226,6 +230,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > } > > } > > > >+#if __cpp_lib_span_initializer_list >= 202311L > >+#pragma GCC diagnostic push > >+#pragma GCC diagnostic ignored "-Winit-list-lifetime" > >+ constexpr > >+ explicit(extent != dy
Re: [PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
> This one also has a comment saying "but use ta" with TDEFAULT_POLICY_P,
> IIUC one of those must be wrong too?

Yeah, I didn't update the comment; thanks for catching that.

-- 
Regards
 Robin
[COMMITTED] Fix misplaced x86 -mstack-protector-guard-symbol documentation [PR117150]
Commit e1769bdd4cef522ada32aec863feba41116b183a accidentally inserted the documentation for the x86 -mstack-protector-guard-symbol option in the wrong place. Fixed thusly. gcc/ChangeLog PR target/117150 * doc/invoke.texi (RS/6000 and PowerPC Options): Move description of -mstack-protector-guard-symbol from here... (x86 Options): ...to here. --- gcc/doc/invoke.texi | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 3cb9a50b690..b85084459b1 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -32376,11 +32376,9 @@ The @option{-mno-compat-align-parm} option is the default. @opindex mstack-protector-guard @opindex mstack-protector-guard-reg @opindex mstack-protector-guard-offset -@opindex mstack-protector-guard-symbol @item -mstack-protector-guard=@var{guard} @itemx -mstack-protector-guard-reg=@var{reg} @itemx -mstack-protector-guard-offset=@var{offset} -@itemx -mstack-protector-guard-symbol=@var{symbol} Generate stack protection code using canary at @var{guard}. Supported locations are @samp{global} for global canary or @samp{tls} for per-thread canary in the TLS block (the default with GNU libc version 2.4 or later). @@ -32390,8 +32388,7 @@ With the latter choice the options @option{-mstack-protector-guard-offset=@var{offset}} furthermore specify which register to use as base register for reading the canary, and from what offset from that base register. The default for those is as specified in the -relevant ABI. @option{-mstack-protector-guard-symbol=@var{symbol}} overrides -the offset with a symbol reference to a canary in the TLS block. +relevant ABI. @opindex mpcrel @opindex mno-pcrel @@ -36216,9 +36213,11 @@ Split 32-byte AVX unaligned load and store. 
@opindex mstack-protector-guard @opindex mstack-protector-guard-reg @opindex mstack-protector-guard-offset +@opindex mstack-protector-guard-symbol @item -mstack-protector-guard=@var{guard} @itemx -mstack-protector-guard-reg=@var{reg} @itemx -mstack-protector-guard-offset=@var{offset} +@itemx -mstack-protector-guard-symbol=@var{symbol} Generate stack protection code using canary at @var{guard}. Supported locations are @samp{global} for global canary or @samp{tls} for per-thread canary in the TLS block (the default). This option has effect only when @@ -36231,6 +36230,9 @@ which segment register (@code{%fs} or @code{%gs}) to use as base register for reading the canary, and from what offset from that base register. The default for those is as specified in the relevant ABI. +@option{-mstack-protector-guard-symbol=@var{symbol}} overrides +the offset with a symbol reference to a canary in the TLS block. + @opindex mgeneral-regs-only @item -mgeneral-regs-only Generate code that uses only the general-purpose registers. This -- 2.25.1