Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
On 11/12/2024 09:54, Tamar Christina wrote: -Original Message- From: Richard Sandiford Sent: Wednesday, December 11, 2024 9:50 AM To: Tamar Christina Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw ; ktkac...@gcc.gnu.org Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it Tamar Christina writes: -Original Message- From: Richard Sandiford Sent: Wednesday, December 11, 2024 9:32 AM To: Tamar Christina Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw ; ktkac...@gcc.gnu.org Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it Tamar Christina writes: Hi All, GCC 15 added two new fusions CMP+CSEL and CMP+CSET. This patch enables them for cores that support based on their Software Optimization Guides and generically on Armv9-A. Even if a core does not support it there's no negative performance impact. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/cortexx925.h: Use it. * config/aarch64/tuning_models/generic_armv9_a.h: Use it. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. --- diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def index f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 df9ae7a196031286d 100644 --- a/gcc/config/aarch64/aarch64-fusion-pairs.def +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) /* Baseline fusion settings suitable for all cores. 
*/ #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) +/* Baseline fusion settings suitable for all Neoverse cores. */ +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL \ + | AARCH64_FUSE_CMP_CSET) + #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 def277230e193d20f 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 10, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h index 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 5a5df7803ff4dc6c 100644 --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think we have to be careful about using it for generic tuning. generic-armv9-a mustn't become "generic Neoverse", since it's supposed to be good for non-Neoverse (and non-Arm) Armv9-A cores as well. So perhaps here we should expand the macro. 
Alternatively, we could add a comment saying that fusion macros for other CPUs can be added here as well, if they are unlikely to have a negative impact on other cores. Perhaps we should expand the macro for cortex-x925 as well. Sorry, that certainly was not the intention. I thought about naming it AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for non-Armv9 cores I thought this would be equally confusing. But perhaps in light of your comments it makes more sense naming-wise? Yeah, I'd also wondered about suggesting that, but then saw that there were quite a few Armv9-A cores that don't have the flag. So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we should use for the Neoverse tunings. But perhaps we should expand the macro for the two files above. If anyone adds any new _FUSE fl
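Stepping back from the naming question, the composition being discussed is plain bitmask arithmetic. A minimal stand-alone sketch (the flag values here are illustrative only, not GCC's actual AARCH64_FUSION_PAIR encodings):

```cpp
#include <cassert>

/* Illustrative fusion flags; the real values come from
   aarch64-fusion-pairs.def and differ from these.  */
enum {
  FUSE_CMP_BRANCH = 1 << 0,
  FUSE_AES_AESMC  = 1 << 1,
  FUSE_CMP_CSEL   = 1 << 2,
  FUSE_CMP_CSET   = 1 << 3,
};

/* Baseline fusion settings suitable for all cores.  */
#define FUSE_BASE (FUSE_CMP_BRANCH | FUSE_AES_AESMC)

/* Neoverse baseline: the base set plus the two fusions new in GCC 15.
   Expanding this macro in a tuning file is equivalent to OR-ing the
   individual flags by hand, which is what the review suggests doing
   for the generic-armv9-a and cortex-x925 tunings.  */
#define FUSE_NEOVERSE_BASE (FUSE_BASE | FUSE_CMP_CSEL | FUSE_CMP_CSET)
```

Because the tunings are plain ORs of these bits, expanding the macro in a tuning file changes nothing in the generated compiler; it only changes what the source implies about which cores the setting is "for".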
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > This was approved in Wrocław as LWG 3900, so that passing an incorrect > argument intended as an allocator will be ill-formed, instead of > silently ignored. > > This also renames the template parameters and function parameters for > the allocators, to align with the names in the standard. I found it too > confusing to have a parameter _Alloc which doesn't correspond to Alloc > in the standard. Rename _Alloc to _Allocator (which the standard calls > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > libstdc++-v3/ChangeLog: > > * include/std/generator (_Promise_alloc): Rename template > parameter. Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Replace constraints with a > static_assert in the body. Rename allocator parameter. > (_Promise_alloc::_M_allocate): Rename allocator parameter. > Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Rename allocator > parameter. > * testsuite/24_iterators/range_generators/alloc.cc: New test. > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > --- > > Tested x86_64-linux. > > libstdc++-v3/include/std/generator| 55 +-- > .../24_iterators/range_generators/alloc.cc| 45 +++ > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > 3 files changed, 87 insertions(+), 29 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc LGTM, with one suggestion below. 
> diff --git a/libstdc++-v3/include/std/generator > b/libstdc++-v3/include/std/generator > index 0a14e70064e..16b953e90af 100644 > --- a/libstdc++-v3/include/std/generator > +++ b/libstdc++-v3/include/std/generator > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept _Stateless_alloc = > (allocator_traits<_All>::is_always_equal::value > && default_initializable<_All>); > > -template > +template >class _Promise_alloc >{ > - using _ATr = allocator_traits<_Alloc>; > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename _ATr > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > static_assert(is_pointer_v, > "Must use allocators for true pointers with generators"); > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >public: > void* > operator new(std::size_t __sz) > - requires default_initializable<_Rebound> // _Alloc is non-void > + requires default_initializable<_Rebound> // _Allocator is non-void > { return _M_allocate({}, __sz); } > > - template > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > + // 3900. The allocator_arg_t overloads of promise_type::operator new > + // should not be constrained > + template > void* > operator new(std::size_t __sz, > - allocator_arg_t, const _Na& __na, > + allocator_arg_t, const _Alloc& __a, >const _Args&...) > - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > - __sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > - template > + template > void* > operator new(std::size_t __sz, >const _This&, > - allocator_arg_t, const _Na& __na, > + allocator_arg_t, const _Alloc& __a, >const _Args&...) 
> - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > - __sz); > + static_assert(convertible_to); It might be worth giving a suggestion here? > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > void > @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > } > > - template > + template > static void* > - _M_allocate(const _Na& __na, std::size_t __csz) > + _M_allocate(const _Alloc& __a, std::size_t __csz) > { > - using _Rebound = typename std::allocator_traits<_Na> > - ::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename std::allocator_traits<_Na> > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, > "Must use allocators for true pointers with > generators"); > > _Dea
[patch,avr] Add new 24-bit address space __flashx
This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of 24-bit space __memx, except that the output routines are simpler and more efficient. Loads are emitted inline when ELPMX or LPMX is available. The address space uses 24-bit addresses even on devices with a program memory size of 64 KiB or less.

Passes without new regressions. Also passes without new regressions when used as AS in the proposed new target hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA, cf. https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671216.html

Ok for trunk?

Johann

--

AVR: Add __flashx as 24-bit named address space.

This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of 24-bit space __memx, except that the output routines are simpler and more efficient. Loads are emitted inline when ELPMX or LPMX is available. The address space uses 24-bit addresses even on devices with a program memory size of 64 KiB or less.

gcc/
	* doc/extend.texi (AVR Named Address Spaces): Document __flashx.
	* config/avr/avr.h (ADDR_SPACE_FLASHX): New enum value.
	* config/avr/avr-protos.h (avr_out_fload, avr_mem_flashx_p)
	(avr_fload_libgcc_p, avr_load_libgcc_mem_p)
	(avr_load_libgcc_insn_p): New.
	* config/avr/avr.cc (avr_addrspace): Add ADDR_SPACE_FLASHX.
	(avr_decl_flashx_p, avr_mem_flashx_p, avr_fload_libgcc_p)
	(avr_load_libgcc_mem_p, avr_load_libgcc_insn_p, avr_out_fload):
	New functions.
	(avr_adjust_insn_length) [ADJUST_LEN_FLOAD]: Handle case.
	(avr_progmem_p) [avr_decl_flashx_p]: Return 2.
	(avr_addr_space_legitimate_address_p) [ADDR_SPACE_FLASHX]: Has
	same behavior as ADDR_SPACE_MEMX.
	(avr_addr_space_convert): Use pointer sizes rather than ASes.
	(avr_addr_space_contains): New function.
	(avr_convert_to_type): Use it.
	(avr_emit_cpymemhi): Handle ADDR_SPACE_FLASHX.
	* config/avr/avr.md (adjust_len) : New attr value.
	(gen_load_libgcc): Renamed from load_libgcc.
	(xload8_A): Iterate over MOVMODE rather than over ALL1.
	(fxmov_A): New from xloadv_A.
	(xmov_8): New from xload_A.
	(fmov): New insns.
	(fxload_A): New from xload_A.
	(fxload__libgcc): New from xload__libgcc.
	(*fxload__libgcc): New from *xload__libgcc.
	(mov) [avr_mem_flashx_p]: Handle ADDR_SPACE_FLASHX.
	(cpymemx_): Make sure the address space is not lost when
	splitting.
	(*cpymemx_) [ADDR_SPACE_FLASHX]: Use __movmemf_ for asm.
	(*ashlqi.1.zextpsi_split): New combine pattern.
	* config/avr/predicates.md (nox_general_operand): Don't match
	when avr_mem_flashx_p is true.
	* config/avr/avr-passes.cc (AVR_LdSt_Props): ADDR_SPACE_FLASHX
	has no post_inc.

gcc/testsuite/
	* gcc.target/avr/torture/addr-space-1.h [AVR_HAVE_ELPM]: Use a
	function to bump .progmemx.data to a high address.
	* gcc.target/avr/torture/addr-space-2.h: Same.
	* gcc.target/avr/torture/addr-space-1-fx.c: New test.
	* gcc.target/avr/torture/addr-space-2-fx.c: New test.

libgcc/
	* config/avr/t-avr (LIB1ASMFUNCS): Add _fload_1, _fload_2,
	_fload_3, _fload_4, _movmemf.
	* config/avr/lib1funcs.S (.branch_plus): New .macro.
	(__xload_1, __xload_2, __xload_3, __xload_4): When the address
	is located in flash, then forward to...
	(__fload_1, __fload_2, __fload_3, __fload_4): ...these new
	functions, respectively.
	(__movmemx_hi): When the address is located in flash, forward
	to...
	(__movmemf_hi): ...this new function.
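For readers unfamiliar with the AVR named address spaces, a hypothetical usage sketch, written by analogy with the existing __memx space (target-specific avr-gcc code, not runnable on a host and not checked against the final patch):

```c
/* Hypothetical avr-gcc usage sketch: __flashx places the object in
   .progmemx.data and reads it through a 24-bit flash address, much
   like __memx but with simpler, flash-only load routines.  */
const __flashx char msg[] = "stored in .progmemx.data";

char
read_first (const __flashx char *p)
{
  /* On devices with ELPMX/LPMX the load is emitted inline;
     otherwise a libgcc __fload_N helper is called.  */
  return *p;
}
```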
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 12:17, Arsen Arsenović wrote: > > Jonathan Wakely writes: > > > This was approved in Wrocław as LWG 3900, so that passing an incorrect > > argument intended as an allocator will be ill-formed, instead of > > silently ignored. > > > > This also renames the template parameters and function parameters for > > the allocators, to align with the names in the standard. I found it too > > confusing to have a parameter _Alloc which doesn't correspond to Alloc > > in the standard. Rename _Alloc to _Allocator (which the standard calls > > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > > > libstdc++-v3/ChangeLog: > > > > * include/std/generator (_Promise_alloc): Rename template > > parameter. Use __alloc_rebind to rebind allocator. > > (_Promise_alloc::operator new): Replace constraints with a > > static_assert in the body. Rename allocator parameter. > > (_Promise_alloc::_M_allocate): Rename allocator parameter. > > Use __alloc_rebind to rebind allocator. > > (_Promise_alloc::operator new): Rename allocator > > parameter. > > * testsuite/24_iterators/range_generators/alloc.cc: New test. > > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > > --- > > > > Tested x86_64-linux. > > > > libstdc++-v3/include/std/generator| 55 +-- > > .../24_iterators/range_generators/alloc.cc| 45 +++ > > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > > 3 files changed, 87 insertions(+), 29 deletions(-) > > create mode 100644 > > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > > create mode 100644 > > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc > > LGTM, with one suggestion below. 
> > > diff --git a/libstdc++-v3/include/std/generator > > b/libstdc++-v3/include/std/generator > > index 0a14e70064e..16b953e90af 100644 > > --- a/libstdc++-v3/include/std/generator > > +++ b/libstdc++-v3/include/std/generator > > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > concept _Stateless_alloc = > > (allocator_traits<_All>::is_always_equal::value > > && default_initializable<_All>); > > > > -template > > +template > >class _Promise_alloc > >{ > > - using _ATr = allocator_traits<_Alloc>; > > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > > - using _Rebound_ATr = typename _ATr > > - ::template rebind_traits<_Alloc_block>; > > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, > > "Must use allocators for true pointers with > > generators"); > > > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >public: > > void* > > operator new(std::size_t __sz) > > - requires default_initializable<_Rebound> // _Alloc is non-void > > + requires default_initializable<_Rebound> // _Allocator is non-void > > { return _M_allocate({}, __sz); } > > > > - template > > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > > + // 3900. The allocator_arg_t overloads of promise_type::operator new > > + // should not be constrained > > + template > > void* > > operator new(std::size_t __sz, > > - allocator_arg_t, const _Na& __na, > > + allocator_arg_t, const _Alloc& __a, > >const _Args&...) > > - requires convertible_to > > { > > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > > - __sz); > > + static_assert(convertible_to); > > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > > } > > > > - template > > + template > > void* > > operator new(std::size_t __sz, > >const _This&, > > - allocator_arg_t, const _Na& __na, > > + allocator_arg_t, const _Alloc& __a, > >const _Args&...) 
> > - requires convertible_to > > { > > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > > - __sz); > > + static_assert(convertible_to); > > It might be worth giving a suggestion here? Like this? /home/jwakely/gcc/15/include/c++/15.0.0/generator:478:25: error: static assertion failed: the argument that follows std::allocator_arg must be convertible to the generator's allocator type 478 | static_assert(convertible_to, | ^ Can we be more precise than "the argument"? Is "the coroutine argument" accurate? The function that the argument is passed to is a coroutine, so I think that's correct. > > > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > > } > > > > void > > @@ -5
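The behavioral difference that LWG 3900 fixes can be seen in miniature with a plain overload set (a hedged sketch; MyAlloc, NotAnAlloc, op_new, and trace are invented names for illustration, not libstdc++ entities):

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct MyAlloc {};     // stands in for the generator's allocator type
struct NotAnAlloc {};  // an argument mistakenly passed as an allocator

std::string trace;

// Pre-LWG-3900 shape: the allocator overload is *constrained*, so an
// argument that is not convertible to the allocator type silently
// falls through to the catch-all overload instead of being diagnosed.
template<typename A,
         typename std::enable_if<
           std::is_convertible<const A&, MyAlloc>::value, int>::type = 0>
void op_new(const A&) { trace = "allocator used"; }

void op_new(...) { trace = "argument silently ignored"; }

// Post-LWG-3900 shape: leave the overload unconstrained and check in
// the body, so the mistake becomes a hard compile-time error
// (uncomment a call with NotAnAlloc to see the static_assert fire):
//
// template<typename A>
// void op_new_checked(const A&)
// {
//   static_assert(std::is_convertible<const A&, MyAlloc>::value,
//                 "the argument that follows std::allocator_arg must be "
//                 "convertible to the generator's allocator type");
// }
```

The patch moves std::generator's operator new overloads from the first shape to the second, which is why the diagnostic wording above matters: the static_assert message is the only thing the user sees.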
[PATCH] ifcombine field-merge: saturate align at inner object size
On Dec 10, 2024, Richard Biener wrote: > On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva wrote: >> Regstrapped on x86_64-linux-gnu. The aarch64 CI had reported a >> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu, >> but it will still be a while. Ok to install if it passes? > OK. Thanks. The aarch64-linux-gnu bootstrap revealed a latent surprise, in the form of a bootstrap failure, that this incremental patch addresses. I'll hold off from installing the larger patch without a fix for this. A bootstrap on aarch64-linux-gnu revealed that sometimes (for example, when building shorten_branches in final.cc) we will find such things as MEM <unsigned int>, where unsigned int happens to be a variant of the original unsigned int type, given 64-bit alignment. This unusual alignment circumstance caused get_best_mode to choose DImode instead of SImode, and that failed gimple verification because there aren't that many bits in the unsigned int object. Arrange for alignment to saturate at the inner object size to avoid tripping this error. Regstrapping on x86_64-linux-gnu. Regstrapped an earlier version without lr_align on aarch64-linux-gnu. Ok to install? for gcc/ChangeLog * gimple-fold.cc (fold_truth_andor_for_ifcombine): Saturate align at inner object size. --- gcc/gimple-fold.cc | 39 +++ 1 file changed, 27 insertions(+), 12 deletions(-) diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index a31fc283d51b0..967356a950192 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -8204,24 +8204,31 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, to be relative to a field of that size. */ first_bit = MIN (ll_bitpos, rl_bitpos); end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); + /* Guard from types with wider-than-size alignment. We must not widen the + load beyond its total size. This is rare. 
*/ + while (ll_align > BITS_PER_UNIT +&& TYPE_SIZE (TREE_TYPE (ll_inner)) +&& uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))) +&& known_gt ((unsigned HOST_WIDE_INT)ll_align, + tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner) +ll_align /= 2; if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, -TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, -volatilep, &lnmode)) +ll_align, BITS_PER_WORD, volatilep, &lnmode)) l_split_load = false; else { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. */ - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); HOST_WIDE_INT boundary = compute_split_boundary_from_align - (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); + (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); if (boundary < 0 || !get_best_mode (boundary - first_bit, first_bit, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode) +ll_align, BITS_PER_WORD, volatilep, &lnmode) || !get_best_mode (end_bit - boundary, boundary, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode2)) +ll_align, BITS_PER_WORD, volatilep, &lnmode2)) return 0; /* If we can't have a single load, but can with two, figure out whether @@ -8368,16 +8375,24 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, and then we use two separate compares. */ first_bit = MIN (lr_bitpos, rr_bitpos); end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); + HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner)); + /* Guard from types with wider-than-size alignment. We must not widen the +load beyond its total size. This is rare. 
*/ + while (lr_align > BITS_PER_UNIT +&& TYPE_SIZE (TREE_TYPE (lr_inner)) +&& uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))) +&& known_gt ((unsigned HOST_WIDE_INT)lr_align, + tree_to_poly_uint64 (TYPE_SIZE + (TREE_TYPE (lr_inner) + lr_align /= 2; if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD, - volatilep, &rnmode)) + lr_align, BITS_PER_WORD, volatilep, &rnmode)) { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. */ - HOST_WIDE_INT align = TYPE_ALIGN (TRE
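Stripped of the tree accessors, the saturation loop added above is just halving the alignment until it no longer exceeds the object's size. A stand-alone sketch (saturate_align is an invented name for illustration):

```cpp
#include <cassert>

/* Halve an over-wide alignment (in bits) until it no longer exceeds
   the object's total size in bits, stopping at byte alignment.  This
   mirrors the guard added before the get_best_mode calls: it keeps
   get_best_mode from picking a mode (e.g. DImode for an over-aligned
   unsigned int) that would widen a load past the object.  */
static unsigned
saturate_align (unsigned align_bits, unsigned size_bits)
{
  while (align_bits > 8 /* BITS_PER_UNIT */ && align_bits > size_bits)
    align_bits /= 2;
  return align_bits;
}
```

For the case from final.cc: a 64-bit-aligned variant of unsigned int saturates from 64 down to 32, so get_best_mode chooses SImode again.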
RE: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:18 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 7/7]AArch64: Implement vector concat of partial SVE vectors > > Hi All, > > This patch adds support for vector constructor from two partial SVE vectors > into > a full SVE vector. It also implements support for the standard vec_init obtab > to > do this. > > gcc/ChangeLog: > > PR target/96342 > * config/aarch64/aarch64-sve.md (vec_init): New. > (@aarch64_pack_partial): New. > * config/aarch64/aarch64.cc (aarch64_sve_expand_vector_init): Special > case constructors of two vectors. > * config/aarch64/iterators.md (SVE_NO2E, SVE_PARTIAL_NO2E): New. > (VHALF, Vhalf, Vwstype): Add SVE partial vectors. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * gcc.target/aarch64/vect-simd-clone-2.c: New test. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > --- > diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64- > sve.md > index > 9afd11d347626eeb640722fdba2ab763b8479aa7..9e3577be6e943d7a5c95119 > 6463873d4bcfee07c 100644 > --- a/gcc/config/aarch64/aarch64-sve.md > +++ b/gcc/config/aarch64/aarch64-sve.md > @@ -2840,6 +2840,16 @@ (define_expand "vec_init" >} > ) > > +(define_expand "vec_init" > + [(match_operand:SVE_NO2E 0 "register_operand") > +(match_operand 1 "")] > + "TARGET_SVE" > + { > +aarch64_sve_expand_vector_init (operands[0], operands[1]); > +DONE; > + } > +) > + > ;; Shift an SVE vector left and insert a scalar into element 0. > (define_insn "vec_shl_insert_" >[(set (match_operand:SVE_FULL 0 "register_operand") > @@ -9347,6 +9357,20 @@ (define_insn "vec_pack_trunc_" >"uzp1\t%0., %1., %2." 
> ) > > +;; Integer partial pack packing two partial SVE types into a single full SVE > +;; type of the same element type. Use UZP1 on the wider type, which discards > +;; the high part of each wide element. This allows to concat SVE partial > types > +;; into a wider vector. > +(define_insn "@aarch64_pack_partial" > + [(set (match_operand:SVE_PARTIAL_NO2E 0 "register_operand" "=w") > + (unspec:SVE_PARTIAL_NO2E > + [(match_operand: 1 "register_operand" "w") > +(match_operand: 2 "register_operand" "w")] > + UNSPEC_PACK))] > + "TARGET_SVE" > + "uzp1\t%0., %1., %2." > +) > + > ;; - > ;; [INT<-INT] Unpacks > ;; - > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > af6fede102c2be6673c24f8020d000ea56322997..690d54b0a2954327e00d559f > 96c414c81c2e18cd 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24790,6 +24790,17 @@ aarch64_sve_expand_vector_init (rtx target, rtx > vals) > v.quick_push (XVECEXP (vals, 0, i)); >v.finalize (); > > + /* If we have two elements and are concatting vector. */ > + machine_mode elem_mode = GET_MODE (v.elt (0)); > + if (nelts == 2 && VECTOR_MODE_P (elem_mode)) > +{ > + /* We've failed expansion using a dup. Try using a cheeky truncate. */ > + rtx arg0 = force_reg (elem_mode, v.elt(0)); > + rtx arg1 = force_reg (elem_mode, v.elt(1)); > + emit_insn (gen_aarch64_pack_partial (mode, target, arg0, arg1)); > + return; > +} > + >/* If neither sub-vectors of v could be initialized specially, > then use INSR to insert all elements from v into TARGET. > ??? This might not be optimal for vectors with large > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md > index > 023893d35f3e955e222c322ce370e84c95c29ee6..77d23d6ad795630d3d5fb5c0 > 76c086a479d46fee 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -138,6 +138,14 @@ (define_mode_iterator VQ_I [V16QI V8HI V4SI V2DI]) > ;; VQ without 2 element modes. 
> (define_mode_iterator VQ_NO2E [V16QI V8HI V4SI V8HF V4SF V8BF]) > > +;; SVE modes without 2 element modes. > +(define_mode_iterator SVE_NO2E [VNx16QI VNx8QI VNx4QI VNx8HI VNx4HI > VNx8HF > + VNx4HF VNx8BF VNx4BF VNx4SI VNx4SF]) > + > +;; Partial SVE modes without 2 element modes. > +(define_mode_iterator SVE_PARTIAL_NO2E [VNx8QI VNx4QI VNx4HI > + VNx4HF VNx8BF VNx4BF]) > + > ;; 2 element quad vector modes. > (define_mode_iterator VQ_2E [V2DI V2DF]) > > @@ -1678,7 +1686,15 @@ (define_mode_attr VHALF [(V8QI "V4QI") (V16QI > "V8QI") >(V2DI "DI")(V2SF "SF") >(V4SF "V2SF") (V4HF "V2HF") >(V8HF "V4HF") (V2DF "DF") > -
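The UZP1 trick in the aarch64_pack_partial pattern above can be modeled with scalars: a partial SVE vector keeps its payload in the low (even-indexed) halves of wider containers, and UZP1 keeps exactly the even-indexed elements of its two operands. A hedged scalar model (not the architectural instruction definition):

```cpp
#include <cassert>
#include <stdint.h>

/* Scalar model of SVE UZP1 at the narrow element size: the result is
   the even-indexed elements of A followed by the even-indexed elements
   of B.  Viewing each partial SVE vector through the narrow element
   type, the payload sits at the even indices (the low half of each
   wide container), so UZP1 extracts the payloads of both partial
   vectors and concatenates them into one full vector.  */
static void
uzp1_model (const uint16_t *a, const uint16_t *b, uint16_t *out, int n)
{
  for (int i = 0; i < n / 2; ++i)
    {
      out[i] = a[2 * i];          /* payloads of the first partial  */
      out[n / 2 + i] = b[2 * i];  /* payloads of the second partial */
    }
}
```

The odd-indexed (high-half) elements are discarded, which is why the pattern can ignore whatever garbage the containers' high parts hold.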
RE: [PATCH 3/7]AArch64: Disable `omp declare variant' tests for aarch64 [PR96342]
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:17 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 3/7]AArch64: Disable `omp declare variant' tests for aarch64 > [PR96342] > > Hi All, > > These tests are x86 specific and shouldn't be run for aarch64. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 > target > only test. > * gfortran.dg/gomp/declare-variant-14.f90: Likewise. > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? > > Thanks, > Tamar > > --- > diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > index > e3668893afe33a58c029cddd433d9bf43cce2bfa..2b71869787e819dc7bb8ca8f9 > 512792ac2877515 100644 > --- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > +++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c > @@ -1,6 +1,6 @@ > -/* { dg-do compile { target vect_simd_clones } } */ > +/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } > } } */ > /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */ > -/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } > */ > +/* { dg-additional-options "-mno-sse3" } */ > > int f01 (int); > int f02 (int); > @@ -15,15 +15,13 @@ int > test1 (int x) > { >/* At gimplification time, we can't decide yet which function to call. */ > - /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { > !aarch64*- > *-* } } } } */ > + /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */ >/* After simd clones are created, the original non-clone test1 shall > call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones > shall call f01 with score 8. 
*/ >/* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */ > - /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { > !aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { > aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { > !aarch64*-*-* } } } } */ > - /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { > aarch64*-*-* } } } } */ > + /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */ > + /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */ >int a = f04 (x); >int b = f04 (x); >return a + b; > diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > index > 6319df0558f37b95f1b2eb17374bdb4ecbc33295..8db341fd15306a5deeae1468 > 08d7ef55aa713bb1 100644 > --- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > +++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 > @@ -1,6 +1,6 @@ > -! { dg-do compile { target vect_simd_clones } } > +! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } > } */ > ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" } > -! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } > +! { dg-additional-options "-mno-sse3" } > > module main >implicit none > @@ -40,10 +40,8 @@ contains > ! call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones > ! shall call f01 with score 8. > ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { > !aarch64*-*-* } } } } > -! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { > aarch64*-*-* } } } } > -! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { > !aarch64*-*-* } } } } > -! 
{ dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { > aarch64*-*-* } } } } > +! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } > +! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } > a = f04 (x) > b = f04 (x) > test1 = a + b > > > > > --
RE: [PATCH 2/7]AArch64: Add SVE support for simd clones [PR96342]
ping > -Original Message- > From: Tamar Christina > Sent: Wednesday, December 4, 2024 12:17 PM > To: gcc-patches@gcc.gnu.org > Cc: nd ; Richard Earnshaw ; > ktkac...@gcc.gnu.org; Richard Sandiford > Subject: [PATCH 2/7]AArch64: Add SVE support for simd clones [PR96342] > > Hi All, > > This patch finalizes adding support for the generation of SVE simd clones when > no simdlen is provided, following the ABI rules where the widest data type > determines the minimum amount of elements in a length agnostic vector. > > gcc/ChangeLog: > > PR target/96342 > * config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare. > * config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make > visibility global and support use for non_acle types. > * config/aarch64/aarch64.cc > (aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd > clone > when no simdlen is provided, according to ABI rules. > (simd_clone_adjust_sve_vector_type): New helper function. > (aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones > and modify types to use SVE types. > * omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen. > (simd_clone_adjust): Adapt safelen check to be compatible with VLA > simdlen. > > gcc/testsuite/ChangeLog: > > PR target/96342 > * gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan. > * gcc.target/aarch64/vect-simd-clone-1.c: New test. > * g++.target/aarch64/vect-simd-clone-1.c: New test. > > > Co-authored-by: Victor Do Nascimento > Co-authored-by: Tamar Christina > > Bootstrapped Regtested on aarch64-none-linux-gnu, > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > -m32, -m64 and no issues. > > Ok for master? 
> > Thanks, > Tamar > > --- > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64- > protos.h > index > c6ce62190bce43fae7b0c9d64202a7c042df6ef4..e7724e0518dd97a120edbc5f0 > 2b20298a57c653f 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -1138,6 +1138,8 @@ namespace aarch64_sve { > #ifdef GCC_TARGET_H >bool verify_type_context (location_t, type_context_kind, const_tree, bool); > #endif > + void add_sve_type_attribute (tree, unsigned int, unsigned int, > + const char *, const char *); > } > > extern void aarch64_split_combinev16qi (rtx operands[3]); > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > index > 0fec1cd439e729dca495aac4dea054a25ede20a7..e6c2bdeb00681848a838009c > 833cfe3271a94049 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -998,14 +998,16 @@ static GTY(()) hash_map > *overload_names[2]; > /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE > vectors > and NUM_PR SVE predicates. MANGLED_NAME, if nonnull, is the ABI-defined > mangling of the type. ACLE_NAME is the name of the type. */ > -static void > +void > add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr, > const char *mangled_name, const char *acle_name) > { >tree mangled_name_tree > = (mangled_name ? get_identifier (mangled_name) : NULL_TREE); > + tree acle_name_tree > += (acle_name ? 
get_identifier (acle_name) : NULL_TREE); > > - tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE); > + tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE); >value = tree_cons (NULL_TREE, mangled_name_tree, value); >value = tree_cons (NULL_TREE, size_int (num_pr), value); >value = tree_cons (NULL_TREE, size_int (num_zr), value); > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index > 4108c09715a5540db87ec4ba74a10804af78054a..af6fede102c2be6673c24f80 > 20d000ea56322997 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -29284,7 +29284,7 @@ > aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, > int num, bool explicit_p) > { >tree t, ret_type; > - unsigned int nds_elt_bits; > + unsigned int nds_elt_bits, wds_elt_bits; >unsigned HOST_WIDE_INT const_simdlen; > >if (!TARGET_SIMD) > @@ -29329,10 +29329,14 @@ > aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node, >if (TREE_CODE (ret_type) != VOID_TYPE) > { >nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type); > + wds_elt_bits = nds_elt_bits; >vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits)); > } >else > -nds_elt_bits = POINTER_SIZE; > +{ > + nds_elt_bits = POINTER_SIZE; > + wds_elt_bits = 0; > +} > >int i; >tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl)); > @@ -29340,44 +29344,65 @@ > aarch64_simd_clone_
Re: [PATCH] ifcombine field-merge: saturate align at inner object size
> Am 11.12.2024 um 14:50 schrieb Alexandre Oliva : > > On Dec 10, 2024, Richard Biener wrote: > >>> On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva wrote: >>> Regstrapped on x86_64-linux-gnu. The aarch64 CI had reported a >>> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu, >>> but it will still be a while. Ok to install if it passes? > >> OK. > > Thanks. The aarch64-linux-gnu bootstrap revealed a latent surprise, in > the form of a bootstrap failure, that this incremental patch addresses. > I'll hold off from installing the larger patch without a fix for this. > > > A bootstrap on aarch64-linux-gnu revealed that sometimes (for example, > when building shorten_branches in final.cc) we will find such things > as MEM , where unsigned int happen to be a variant of > the original unsigned int type, given 64-bit alignment. This unusual > alignment circumstance caused get_best_mode to choose DImode instead > of SImode, and that failed gimple verification because there aren't > that many bits in the unsigned int object. > > Arrange for alignment to saturate at the inner object size to avoid > tripping this error. > > Regstrapping on x86_64-linux-gnu. Regstrapped an earlier version > without lr_align on aarch64-linux-gnu. Ok to install? > > > for gcc/ChangeLog > >* gimple-fold.cc (fold_truth_andor_for_ifcombine): Saturate >align at inner object size. > --- > gcc/gimple-fold.cc | 39 +++ > 1 file changed, 27 insertions(+), 12 deletions(-) > > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc > index a31fc283d51b0..967356a950192 100644 > --- a/gcc/gimple-fold.cc > +++ b/gcc/gimple-fold.cc > @@ -8204,24 +8204,31 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, > to be relative to a field of that size. 
*/ > first_bit = MIN (ll_bitpos, rl_bitpos); > end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); > + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); > + /* Guard from types with wider-than-size alignment. We must not widen the > + load beyond its total size. This is rare. */ > + while (ll_align > BITS_PER_UNIT > + && TYPE_SIZE (TREE_TYPE (ll_inner)) > + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner))) > + && known_gt ((unsigned HOST_WIDE_INT)ll_align, > + tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner) > +ll_align /= 2; > if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, I think These 0, 0 args are supposed to indicate Maximum extent of the resulting Access - but Widening to the largest Mode fitting alignment boundary should be valid, so I Wonder what the failure Mode was? Richard > - TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, > - volatilep, &lnmode)) > + ll_align, BITS_PER_WORD, volatilep, &lnmode)) > l_split_load = false; > else > { > /* Consider the possibility of recombining loads if any of the > fields straddles across an alignment boundary, so that either > part can be loaded along with the other field. */ > - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); > HOST_WIDE_INT boundary = compute_split_boundary_from_align > -(align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); > +(ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); > > if (boundary < 0 > || !get_best_mode (boundary - first_bit, first_bit, 0, 0, > - align, BITS_PER_WORD, volatilep, &lnmode) > + ll_align, BITS_PER_WORD, volatilep, &lnmode) > || !get_best_mode (end_bit - boundary, boundary, 0, 0, > - align, BITS_PER_WORD, volatilep, &lnmode2)) > + ll_align, BITS_PER_WORD, volatilep, &lnmode2)) >return 0; > > /* If we can't have a single load, but can with two, figure out whether > @@ -8368,16 +8375,24 @@ fold_truth_andor_for_ifcombine (enum tree_code code, > tree truth_type, > and then we use two separate compares. 
*/ > first_bit = MIN (lr_bitpos, rr_bitpos); > end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); > + HOST_WIDE_INT lr_align = TYPE_ALIGN (TREE_TYPE (lr_inner)); > + /* Guard from types with wider-than-size alignment. We must not widen > the > + load beyond its total size. This is rare. */ > + while (lr_align > BITS_PER_UNIT > + && TYPE_SIZE (TREE_TYPE (lr_inner)) > + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (lr_inner))) > + && known_gt ((unsigned HOST_WIDE_INT)lr_align, > + tree_to_poly_uint64 (TYPE_SIZE > + (TREE_TYPE (lr_inner) > +lr_align /= 2; > if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, > - TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD, > - volatilep, &rnmode)) > + lr_align, BITS_PER_WORD, volatilep, &rnmode)) >{ > /* C
Re: [PATCH] c++: allow stores to anon union vars to change current union member in constexpr [PR117614]
On 12/11/24 4:11 AM, Jakub Jelinek wrote: Hi! Since r14-4771 the FE tries to differentiate between cases where the lhs of a store allows changing the current union member and cases where it doesn't, and cases where it doesn't includes everything that has gone through the cxx_eval_constant_expression path on the lhs. As the testcase shows, DECL_ANON_UNION_VAR_P vars were handled like that too, even when stores to them are the only way how to change the current union member in the sources. So, the following patch just handles that case manually without calling cxx_eval_constant_expression and without setting evaluated to true. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK for trunk and 14. 2024-12-11 Jakub Jelinek PR c++/117614 * constexpr.cc (cxx_eval_store_expression): For stores to DECL_ANON_UNION_VAR_P vars just continue with DECL_VALUE_EXPR of it, without setting evaluated to true or full cxx_eval_constant_expression. * g++.dg/cpp2a/constexpr-union8.C: New test. --- gcc/cp/constexpr.cc.jj 2024-12-06 09:08:20.977872404 +0100 +++ gcc/cp/constexpr.cc 2024-12-09 16:57:23.682959311 +0100 @@ -6418,6 +6418,14 @@ cxx_eval_store_expression (const constex object = probe; else { + tree pvar = tree_strip_any_location_wrapper (probe); + if (VAR_P (pvar) && DECL_ANON_UNION_VAR_P (pvar)) + { + /* Stores to DECL_ANON_UNION_VAR_P var are allowed to change +active union member. 
*/ + probe = DECL_VALUE_EXPR (pvar); + break; + } probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue, non_constant_p, overflow_p); evaluated = true; --- gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C.jj2024-12-09 17:06:02.876721019 +0100 +++ gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C 2024-12-09 17:05:01.832572060 +0100 @@ -0,0 +1,31 @@ +// PR c++/117614 +// { dg-do compile { target c++20 } } + +constexpr int +foo () +{ + union { +int x{0}; +char y; + }; + y = 1; + return y; +} + +constexpr int +bar () +{ + union { +union { + int x{0}; + char y; +}; +long long z; + }; + y = 1; + z = 2; + return z; +} + +static_assert (foo () == 1); +static_assert (bar () == 2); Jakub
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
> Yes, please, I don't have commit access. OK, done. -- Eric Botcazou
Re: [PATCH] ifcombine field-merge: saturate align at inner object size
On Dec 11, 2024, Richard Biener wrote: > Am 11.12.2024 um 14:50 schrieb Alexandre Oliva : >> if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, > I think These 0, 0 args are supposed to indicate Maximum extent of the > resulting Access - but Widening to the largest Mode fitting alignment > boundary should be valid, so I Wonder what the failure Mode was? The generated BIT_FIELD_REF hit this gimple check in tree-cfg.cc: if (!AGGREGATE_TYPE_P (TREE_TYPE (op)) && known_gt (size + bitpos, tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (op) { error ("position plus size exceeds size of referenced object in " "%qs", code_name); return true; } -- Alexandre Oliva, happy hacker https://FSFLA.org/blogs/lxo/ Free Software Activist GNU Toolchain Engineer More tolerance and less prejudice are key for inclusion and diversity Excluding neuro-others for not behaving ""normal"" is *not* inclusive
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: >> + for (auto _ : gen("const This& argument", std::allocator_arg , &mr)) > > I have no idea what the promise_type::operator new overload that takes > a const This& argument is for, but this invocation of gen uses it, so > ... yay? Member coroutines can call that overload: struct foo { std::generator> bar(std::allocator_arg_t, std::allocator) const& { co_yield 123; } }; void f() { foo x; x.bar(std::allocator_arg, {}); } >> +VERIFY(mr.number_of_active_allocations() == 1); >> + >> + VERIFY(mr.number_of_active_allocations() == 0); >> +} >> diff --git a/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> new file mode 100644 >> index 000..957879e8862 >> --- /dev/null >> +++ b/libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc >> @@ -0,0 +1,16 @@ >> +// { dg-do compile { target c++23 } } >> + >> +// LWG 3900. allocator_arg_t overloads of generator::promise_type::operator >> new >> +// should not be constrained >> + >> +#include >> +#include >> + >> +std::pmr::generator >> +bar(std::allocator_arg_t, std::pmr::memory_resource& mr) // { dg-error >> "here" } >> +{ >> + co_yield 3; >> +} >> + >> +// { dg-error "static assertion failed" "" { target *-*-* } 0 } >> +// { dg-error "no matching function .*memory_resource&" "" { target *-*-* } >> 0 } >> -- >> 2.47.1 >> -- Arsen Arsenović signature.asc Description: PGP signature
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > Like this? > > /home/jwakely/gcc/15/include/c++/15.0.0/generator:478:25: error: > static assertion failed: the argument that follows std::allocator_arg > must be convertible to the generator's allocator type > 478 | static_assert(convertible_to, > | ^ > > Can we be more precise than "the argument"? Is "the coroutine > argument" accurate? The function that the argument is passed to is a > coroutine, so I think that's correct. Yes, I think 'coroutine argument' is OK - this function is only called for allocating the coroutine state when a coroutine is starting, so it can't be anything else (well, unless the user does the call manually for some reason..). -- Arsen Arsenović
Re: [PATCH] Add COBOL to gcc
I think the term of art is "ping"? If GCC needs something from me to proceed with this, please tell me what it is. --jkl On Thu, 7 Nov 2024 17:28:33 -0500 "James K. Lowden" wrote: > On Fri, 8 Nov 2024 13:52:55 +0100 > Jakub Jelinek wrote: > > > Rather than a diff from /dev/null, > > > it's a blob with the exact file contents. I hope it is correct in > > > this form. > > > > That is just how the web git viewer presents new file commits. > > On gcc-patches those should be posted as normal patches. > > Below is hopefully a well formed patch. It adds ChangeLogs for the > COBOL front end. > > [snip] > From 304f3678dbade1f60abdadb9ddd2baffae88013dpre.patch 4 Oct 2024 > 12:01:22 -0400 From: "James K. Lowden" > Date: Fri 08 Nov 2024 03:30:08 PM EST > Subject: [PATCH] Add 'cobol' to 2 files > > gcc/cobol/ChangeLog [new file with mode: 0644] > libgcobol/ChangeLog [new file with mode: 0644] > > --- > gcc/cobol/ChangeLog | ++- > libgcobol/ChangeLog | ++ > 2 files changed, 12 insertions(+), 2 deletions(-) > diff --git a/gcc/cobol/ChangeLog b/gcc/cobol/ChangeLog > new file mode 100644 > index 000..2988f44a1f1 > --- /dev/null > +++ b/gcc/cobol/ChangeLog > @@ -0,0 +1,6 @@ > +^L > +Copyright (C) 2022 Free Software Foundation, Inc. > + > +Copying and distribution of this file, with or without modification, > +are permitted in any medium without royalty provided the copyright > +notice and this notice are preserved. > diff --git a/libgcobol/ChangeLog b/libgcobol/ChangeLog > new file mode 100644 > index 000..2988f44a1f1 > --- /dev/null > +++ b/libgcobol/ChangeLog > @@ -0,0 +1,6 @@ > +^L > +Copyright (C) 2022 Free Software Foundation, Inc. > + > +Copying and distribution of this file, with or without modification, > +are permitted in any medium without royalty provided the copyright > +notice and this notice are preserved. > [pins]
Re: [PATCH] c++: template-id dependence wrt local static arg [PR117792]
On Wed, 27 Nov 2024, Patrick Palka wrote: > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK > for trunk/14/13? Ping. It occurred to me that another way to fix this PR might be to tweak the overload set pruning (for non-dependent calls) to un-specialize the selected function template specialization if it turned out to be dependent: diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index efa748c7dc8..08d05209f44 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -3311,6 +3311,13 @@ finish_call_expr (tree fn, vec **args, bool disallow_virtual, /* Our original callee wasn't wrapped in an ADDR_EXPR, so strip this ADDR_EXPR added by build_over_call. */ sel_fn = TREE_OPERAND (sel_fn, 0); + if (TREE_CODE (sel_fn) == FUNCTION_DECL + && type_dependent_expression_p (sel_fn) + && PRIMARY_TEMPLATE_P (DECL_TI_TEMPLATE (sel_fn))) + { + sel_fn = ovl_make (DECL_TI_TEMPLATE (sel_fn)); + sel_fn = lookup_template_function (fns, ...); + } orig_fn = sel_fn; } But IMHO this should be impossible in the first place -- if an unresolved overload set is deemed non-dependent then any specialization selected by overload resolution should be deemed non-dependent as well. Or conversely if overload resolution yielded a dependent specialization then the unresolved overload set should be considered dependent. > > -- >8 -- > > Here we end up ICEing at instantiation time for the call to > f ultimately because we wrongly consider the call to be > non-dependent, and so we specialize f ahead of time and then get > confused when fully substituting this specialization. > > The call is dependent due to [temp.dep.temp]/3 and we miss that because > function template-id arguments aren't coerced until overload resolution, > and so the local static template argument lacks an implicit cast to > reference type that value_dependent_expression_p looks for before > considering dependence of the address. 
Other kinds of template-ids aren't > affected since they're coerced ahead of time. > > So when considering dependence of a function template-id, we need to > conservatively consider dependence of the address of each argument (if > applicable). > > PR c++/117792 > > gcc/cp/ChangeLog: > > * pt.cc (type_dependent_expression_p): Consider the dependence > of the address of each template argument of a function > template-id. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp1z/nontype7.C: New test. > --- > gcc/cp/pt.cc | 10 -- > gcc/testsuite/g++.dg/cpp1z/nontype7.C | 22 ++ > 2 files changed, 30 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/cpp1z/nontype7.C > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc > index 564e368ff43..2f2ec39b083 100644 > --- a/gcc/cp/pt.cc > +++ b/gcc/cp/pt.cc > @@ -28864,9 +28864,15 @@ type_dependent_expression_p (tree expression) > >if (TREE_CODE (expression) == TEMPLATE_ID_EXPR) > { > - if (any_dependent_template_arguments_p > - (TREE_OPERAND (expression, 1))) > + tree args = TREE_OPERAND (expression, 1); > + if (any_dependent_template_arguments_p (args)) > return true; > + /* Arguments of a function template-id aren't necessarily coerced > + yet so we must conservatively assume that the address (and not > + just value) of the argument matters as per [temp.dep.temp]/3. */ > + for (tree arg : tree_vec_range (args)) > + if (has_value_dependent_address (arg)) > + return true; > expression = TREE_OPERAND (expression, 0); > if (identifier_p (expression)) > return true; > diff --git a/gcc/testsuite/g++.dg/cpp1z/nontype7.C > b/gcc/testsuite/g++.dg/cpp1z/nontype7.C > new file mode 100644 > index 000..b03c643c987 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp1z/nontype7.C > @@ -0,0 +1,22 @@ > +// PR c++/117792 > +// { dg-do compile { target c++17 } } > + > +template > +void f(T) { } > + > +template > +void f(...) 
= delete; > + > +template int v; > + > +template struct A { static const int value = 0; }; > + > +template > +void g() { > + static const int local_static = 0; > + auto x = v; // OK > + auto y = A::value; // OK > + f(0); // ICE > +} > + > +template void g(); > -- > 2.47.1.313.gcc01bad4a9 > >
[PATCH v2] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 14:59, Arsen Arsenović wrote: > > Jonathan Wakely writes: > > >> + for (auto _ : gen("const This& argument", std::allocator_arg , &mr)) > > > > I have no idea what the promise_type::operator new overload that takes > > a const This& argument is for, but this invocation of gen uses it, so > > ... yay? > > Member coroutines can call that overload: > > struct foo { > std::generator> > bar(std::allocator_arg_t, std::allocator) const& > { co_yield 123; } > }; > > void f() { > foo x; > x.bar(std::allocator_arg, {}); > } > Yeah, I figured that out eventually! Here's a v2 patch that adds the static assert messages and now also tests allocation for member coroutines (with implicit and explicit object parameters). commit 220d5bd2d1ff38dce3a8a7dfe7cea476126ff5a4 Author: Jonathan Wakely AuthorDate: Wed Dec 11 10:44:33 2024 Commit: Jonathan Wakely CommitDate: Wed Dec 11 15:23:56 2024 libstdc++: Remove constraints on std::generator::promise_type::operator new This was approved in Wrocław as LWG 3900, so that passing an incorrect argument intended as an allocator will be ill-formed, instead of silently ignored. This also renames the template parameters and function parameters for the allocators, to align with the names in the standard. I found it too confusing to have a parameter _Alloc which doesn't correspond to Alloc in the standard. Rename _Alloc to _Allocator (which the standard calls Allocator) and rename _Na to _Alloc (which the standard calls Alloc). libstdc++-v3/ChangeLog: * include/std/generator (_Promise_alloc): Rename template parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Replace constraints with a static_assert in the body. Rename allocator parameter. (_Promise_alloc::_M_allocate): Rename allocator parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Rename allocator parameter. * testsuite/24_iterators/range_generators/alloc.cc: New test. 
* testsuite/24_iterators/range_generators/lwg3900.cc: New test. Reviewed-by: Arsen Arsenović diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator index 0a14e70064e..bba85bd0aa4 100644 --- a/libstdc++-v3/include/std/generator +++ b/libstdc++-v3/include/std/generator @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept _Stateless_alloc = (allocator_traits<_All>::is_always_equal::value && default_initializable<_All>); -template +template class _Promise_alloc { - using _ATr = allocator_traits<_Alloc>; - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename _ATr - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); @@ -465,30 +463,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: void* operator new(std::size_t __sz) - requires default_initializable<_Rebound> // _Alloc is non-void + requires default_initializable<_Rebound> // _Allocator is non-void { return _M_allocate({}, __sz); } - template + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3900. The allocator_arg_t overloads of promise_type::operator new + // should not be constrained + template void* operator new(std::size_t __sz, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) - requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to, + "the allocator argument to the coroutine must be " + "convertible to the generator's allocator type"); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } - template + template void* operator new(std::size_t __sz, const _This&, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) 
- requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to, + "the allocator argument to the coroutine must be " + "convertible to the generator's allocator type"); + return _M_all
Re: [PATCH] c++: tweak colorization of incompatible declspecs
On Fri, 2024-11-15 at 20:05 -0500, David Malcolm wrote: > Introduce a helper function for complaining about "signed unsigned" > and "short long". Add colorization there so that e.g. the 'signed' > and 'unsigned' are given consistent contrasting colors in both the > message and the quoted source. > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. ...and pushed to trunk as r15-6118-g4b4023d52986b2, having re-run the above checks. Dave
[pushed: r15-6114] input.cc: rename file_cache:in_context
No functional change intended. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-6114-ge4ef3aa2911f1a. gcc/ChangeLog: * input.cc (file_cache::initialize_input_context): Rename member "in_context" to "m_input_context". (file_cache::add_file): Likewise. * input.h (class file_cache): Likewise. Signed-off-by: David Malcolm --- gcc/input.cc | 6 +++--- gcc/input.h | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/gcc/input.cc b/gcc/input.cc index 20b5f62371ac..7fc683db23f1 100644 --- a/gcc/input.cc +++ b/gcc/input.cc @@ -45,8 +45,8 @@ void file_cache::initialize_input_context (diagnostic_input_charset_callback ccb, bool should_skip_bom) { - in_context.ccb = (ccb ? ccb : default_charset_callback); - in_context.should_skip_bom = should_skip_bom; + m_input_context.ccb = (ccb ? ccb : default_charset_callback); + m_input_context.should_skip_bom = should_skip_bom; } /* This is a cache used by get_next_line to store the content of a @@ -435,7 +435,7 @@ file_cache::add_file (const char *file_path) unsigned highest_use_count = 0; file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count); - if (!r->create (in_context, file_path, fp, highest_use_count)) + if (!r->create (m_input_context, file_path, fp, highest_use_count)) return NULL; return r; } diff --git a/gcc/input.h b/gcc/input.h index c6eb1aeda3e9..a5863eb9e091 100644 --- a/gcc/input.h +++ b/gcc/input.h @@ -162,7 +162,7 @@ class file_cache private: static const size_t num_file_slots = 16; file_cache_slot *m_file_slots; - input_context in_context; + input_context m_input_context; }; extern expanded_location -- 2.26.3
[pushed: r15-6115] diagnostics: tweak output for nested messages [PR116253]
When printing nested messages with -fdiagnostics-set-output=text:experimental-nesting=yes avoid printing a line such as the "cc1plus:" in the following: • note: set ‘-fconcepts-diagnostics-depth=’ to at least 2 for more detail cc1plus: for "special" locations such as UNKNOWN_LOCATION. Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-6115-g331226fd87c201. gcc/ChangeLog: PR other/116253 * diagnostic-format-text.cc (on_report_diagnostic): When showing locations for nested messages on new lines, don't print UNKNOWN_LOCATION or BUILTINS_LOCATION. Signed-off-by: David Malcolm --- gcc/diagnostic-format-text.cc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/diagnostic-format-text.cc b/gcc/diagnostic-format-text.cc index 3650da9a927e..e8074aa05dac 100644 --- a/gcc/diagnostic-format-text.cc +++ b/gcc/diagnostic-format-text.cc @@ -226,11 +226,12 @@ on_report_diagnostic (const diagnostic_info &diagnostic, const int nesting_level = get_context ().get_diagnostic_nesting_level (); if (nesting_level > 0) { + location_t loc = diagnostic_location (&diagnostic); pp_set_prefix (pp, nullptr); char *indent_prefix = build_indent_prefix (false); /* Only print changes of location. */ - if (diagnostic_location (&diagnostic) - != get_context ().m_last_location) + if (loc != get_context ().m_last_location + && loc > BUILTINS_LOCATION) { const expanded_location s = diagnostic_expand_location (&diagnostic); -- 2.26.3
Re: [PATCH 1/2] c++: print z candidate count and number them
On Tue, 2024-12-10 at 19:08 -0500, Jason Merrill wrote: > On 12/10/24 6:05 PM, David Malcolm wrote: > > On Sat, 2024-12-07 at 10:19 -0500, Jason Merrill wrote: > > > On 11/15/24 8:02 PM, David Malcolm wrote: > > > > This patch is a followup to: > > > > "c++: use diagnostic nesting [PR116253]" > > > > > > > > Following Sy Brand's UX suggestions in P2429R0 for example 1, > > > > this > > > > patch > > > > tweaks print_z_candidates to add a note about the number of > > > > candidates, > > > > and adds a candidate number to each one. > > > > > > > > Various examples of output can be seen in the testsuite part of > > > > the > > > > patch. > > > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. > > > > OK for trunk? > > > > > > > > gcc/cp/ChangeLog: > > > > * call.cc (print_z_candidates): Count the number of > > > > candidates and issue a note stating the count at an > > > > intermediate nesting level. Number the individual > > > > candidates. > > > > > > > > gcc/testsuite/ChangeLog: > > > > * g++.dg/concepts/diagnostic9.C: Update expected > > > > results for candidate count and numbering. > > > > * g++.dg/concepts/nested-diagnostics-1-truncated.C: > > > > * g++.dg/concepts/nested-diagnostics-1.C: Likewise. > > > > * g++.dg/concepts/nested-diagnostics-2.C: Likewise. > > > > * g++.dg/cpp23/explicit-obj-lambda11.C: Likewise. > > > > * g++.dg/cpp2a/desig4.C: Likewise. > > > > * g++.dg/cpp2a/desig6.C: Likewise. > > > > * g++.dg/cpp2a/spaceship-eq15.C: Likewise. > > > > * g++.dg/diagnostic/function-color1.C: Likewise. > > > > * g++.dg/diagnostic/param-type-mismatch-2.C: Likewise. > > > > * g++.dg/diagnostic/pr100716-1.C: Likewise. > > > > * g++.dg/diagnostic/pr100716.C: Likewise. > > > > * g++.dg/lookup/operator-2.C: Likewise. > > > > * g++.dg/lookup/pr80891-5.C: Likewise. > > > > * g++.dg/modules/adhoc-1_b.C: Likewise. > > > > * g++.dg/modules/err-1_c.C: Likewise. > > > > * g++.dg/modules/err-1_d.C: Likewise. 
> > > > * g++.dg/other/return2.C: Likewise. > > > > * g++.dg/overload/error6.C: Likewise. > > > > * g++.dg/template/local6.C: Likewise. > > > > > > > > Signed-off-by: David Malcolm > > > > --- > > > > gcc/cp/call.cc | 16 ++- > > > > gcc/testsuite/g++.dg/concepts/diagnostic9.C | 2 +- > > > > .../concepts/nested-diagnostics-1-truncated.C | 25 + > > > > -- > > > > .../g++.dg/concepts/nested-diagnostics-1.C | 43 > > > > ++ > > > > - > > > > .../g++.dg/concepts/nested-diagnostics-2.C | 27 ++--- > > > > --- > > > > .../g++.dg/cpp23/explicit-obj-lambda11.C | 3 +- > > > > gcc/testsuite/g++.dg/cpp2a/desig4.C | 4 +- > > > > gcc/testsuite/g++.dg/cpp2a/desig6.C | 4 +- > > > > gcc/testsuite/g++.dg/cpp2a/spaceship-eq15.C | 8 ++-- > > > > .../g++.dg/diagnostic/function-color1.C | 2 +- > > > > .../g++.dg/diagnostic/param-type-mismatch-2.C | 10 - > > > > gcc/testsuite/g++.dg/diagnostic/pr100716-1.C | 14 +++--- > > > > gcc/testsuite/g++.dg/diagnostic/pr100716.C | 14 +++--- > > > > gcc/testsuite/g++.dg/lookup/operator-2.C | 2 +- > > > > gcc/testsuite/g++.dg/lookup/pr80891-5.C | 6 +-- > > > > gcc/testsuite/g++.dg/modules/adhoc-1_b.C | 4 +- > > > > gcc/testsuite/g++.dg/modules/err-1_c.C | 10 ++--- > > > > gcc/testsuite/g++.dg/modules/err-1_d.C | 6 +-- > > > > gcc/testsuite/g++.dg/other/return2.C | 2 +- > > > > gcc/testsuite/g++.dg/overload/error6.C | 4 +- > > > > gcc/testsuite/g++.dg/template/local6.C | 2 +- > > > > 21 files changed, 116 insertions(+), 92 deletions(-) > > > > > > > > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc > > > > index db9eb1a55cfc..f820419ee4ff 100644 > > > > --- a/gcc/cp/call.cc > > > > +++ b/gcc/cp/call.cc > > > > @@ -4134,6 +4134,16 @@ print_z_candidates (location_t loc, > > > > struct > > > > z_candidate *candidates, > > > > > > > > auto_diagnostic_nesting_level sentinel; > > > > > > > > + int num_candidates = 0; > > > > + for (auto iter = candidates; iter; iter = iter->next) > > > > + ++num_candidates; > > > > + > > > > + inform_n (loc, > > > > + 
num_candidates, "we found %i candidate", "we found > > > > %i > > > > candidates", > > > > > > The use of 'we' here is novel, especially since we're talking > > > about > > > objective properties of the TU. I'd say "there are". > > > > FWIW I actually prefer the wording "I found" (like does Elm). > > > > Flow's style guide: > > https://calebmer.com/2019/07/01/writing-good-compiler-error-messages.html > > says "Write messages in first person plural. That is, use “we”. For > > example “we see an error”. This personifies the compiler as a team > > of > > peopl
Re: [PATCH v4 1/1] C++: Add flag to generate resolver at default version implementation.
Alfie Richards writes: > Hi Richard, > > Thank you for the feedback! > > On 10/12/2024 18:49, Richard Sandiford wrote: >>> -/* Returns true if DECL is multi-versioned using the target attribute, and >>> this >>> - is the default version. This function can only be used for targets >>> that do >>> - not support the "target_version" attribute. */ >>> +/* Returns true if DECL could be a default version of a FMV function set >>> + If TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_DECL is false, this is when >>> DECL >> TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_IMPL >> >> But could we key this off !TARGET_HAS_FMV_TARGET_ATTRIBUTE instead? >> At the moment that conditino is essentially "aarch64 semantics", and given >> the general complexity of FMV, it would be good to avoid creating more >> variations than we support. > > This function has always seemed like a bad compromise to me and is > currently trying to answer two different questions. > > I don't think having it depend on TARGET_HAS_FMV_TARGET_ATTRIBUTE is > necessarily any better? Both are false equivalences in my opinion. > > I think I would prefer to create a different function called > "could_function_be_default_version" containing the different semantics > which could more reasonably be conditioned on > TARGET_HAS_FMV_TARGET_ATTRIBUTE. > Then similarly, is_function_default_version can be changed back to > working out if a function definitely is the default version I'm not sure. The AArch64 semantics seem to be that an explicit target_version("default") is usually equivalent to no attribute. The only potential difference I'm aware of is whether: __attribute__((target_version("default"))) void f(void) {} should force a .default alias to be created even when f resolves directly to the function. (I'm asking internally for clarification on that.) So would the distinction between "is" and "could be" be useful in practice? 
This predicate looks at individual decls, whereas the AArch64 semantics
mean that "is" is instead a property of the TU as a whole.

>> AIUI, we rely on get_function_versions_dispatcher to arrange this
>> ordering.  (Each target does it independently, sigh.)  Can we rely on
>> the hook's having been called at this point?
>>
>> That is, it seems like this code assumes that
>> get_function_versions_dispatcher has already been called, whereas the
>> code below calls get_function_versions_dispatcher on-the-fly when it
>> sees the default version.  Does something guarantee that the default
>> node is analysed before all of the other versions?

> Ahh fair point.  I am convinced we can at this point certainly, as this
> case is only triggered if this is the dispatched symbol, which is
> created by that function :)

Oops, yes.  I thought at first it was keyed off non-default versions.

> It is a good point in general and there may be other cases I've made
> the assumption that this is true.
>
> In my other patch, I change the logic for record_function_versions
> to implicitly ensure the default is always the first version in any
> fmv structure, which I think is a good change and I think I will split
> it off into a separate patch for both of these to rely on.

Sounds good if I've understood correctly.

>> The convention is for error messages to start with lower case and not
>> to end with ".".

> Thank you!  I will change this.

>> There should be a test to exercise this codepath.  Locally I tried:
>>
>>   __attribute__((target_version("sve"))) int f() { return 2; }
>>   int g() { return f(); }
>>
>> but it seems to ICE in aarch64_get_function_versions_dispatcher.
>> (Hope I didn't mess up the testing.)

> Oh dear, apologies.  I think I had created tests for this in the
> C frontend as that was more my focus when creating this, and clearly
> didn't investigate the C++ versions thoroughly enough.  I will
> copy my tests there over and make sure these paths and cases work
> correctly.
>> As discussed off-list, this was a last-minute change that doesn't
>> compile 🙂  The earlier:
>>
>>   /* Mark the assembler name as set to prevent it getting mangled again .*/
>>   SET_DECL_ASSEMBLER_NAME (dispatch_decl,
>>                            DECL_ASSEMBLER_NAME (dispatch_decl));
>>
>> is probably correct, but I think it'd be worth moving into
>> make_dispatcher_decl, for !TARGET_HAS_FMV_TARGET_ATTRIBUTE.
>>
>> PRs 117987 and 117989 suggest that, independently of this patch,
>> we seem to have some issues around FMV and assembler names.

> Yes, apologies for the unforced error!  I think I was in the wrong work
> tree or checkout.
>
> I agree it is better in make_dispatcher_decl but I was a little hesitant
> to change other back ends' behaviour.  Specifically, I believe x86
> exhibits the double mangling behaviour at the moment and I was nervous
> to change that as it could be a breaking change for them.

Yeah, that's why I was suggesting keying it off
!TARGET_HAS_FMV_TARGET_ATTRIBUTE (aka the "aarch64 semantics" toggle).
That should ensure that
Re: [PATCH 00/15] arm: [MVE intrinsics] Rework store_scatter and load_gather intrinsics
On 07/11/2024 09:18, Christophe Lyon wrote:
> This patch series re-implements the store_scatter and load_gather
> intrinsics using the new framework, similarly to previous series.
>
> A few points worth mentioning:
> - unlike other intrinsics, these ones have the predicate after the
>   mode in their names, hence the need for patch #1
> - when checking the 'offset' argument of the *_base_* intrinsics, we
>   need ranges with negative lower bounds, unlike what we needed so far
>   (SVE does not have such negative bounds AFAIK), hence the need for
>   patch #5 and the use of 'ss64' instead of 'su64' in signatures.
> - because of some peculiarities in ACLE expected output wrt data type
>   suffix (.16 vs .u16 vs .f16 for instance), I chose to update a few
>   tests in patches #12 and #13, and to introduce a dedicated iterator
>   in other cases (patch #10; using and fixing an existing iterator
>   would have an impact on Neon tests).  I chose the approach which
>   seemed the least invasive, but maybe we should aim at more
>   consistency and update ACLE instead?

Thanks for this, it's a nice cleanup.  This patch series is OK.

In fact, I think you should consider further changes to move things from
arm_mve.h inside the pragma as pre-approved, once stage 1 re-opens, and
provided they don't need to go significantly beyond the type of changes
needed here.

One thing we should perhaps consider in future (ie not right now) is
whether we really need poly types in the Arm back-end code.  Patch 8
contains:

+  machine_mode memory_vector_mode (const function_instance &fi) const override
+  {
+    poly_uint64 nunits = GET_MODE_NUNITS (fi.vector_mode (0));
+    return arm_mve_data_mode (m_to_int_mode, nunits).require ();
+  }

But since poly types on Arm should degenerate into simple host wide
ints, this feels a bit like overkill.

R.
Thanks,

Christophe

Christophe Lyon (15):
  arm: [MVE intrinsics] add mode_after_pred helper in function_shape
  arm: [MVE intrinsics] add store_scatter_offset shape
  arm: [MVE intrinsics] rework vstr?q_scatter_offset
  arm: [MVE intrinsics] rework vstr_scatter_shifted_offset
  arm: [MVE intrinsics] Check immediate is a multiple in a range
  arm: [MVE intrinsics] Add store_scatter_base shape
  arm: [MVE intrinsics] rework vstr scatter_base
  arm: [MVE intrinsics] rework vstr scatter_base_wb
  arm: [MVE intrinsics] add load_ext_gather_offset shape
  arm: [MVE intrinsics] rework vldr gather_offset
  arm: [MVE intrinsics] rework vldr gather_shifted_offset
  arm: [MVE intrinsics] add load_gather_base shape
  arm: [MVE intrinsics] rework vldr gather_base
  arm: [MVE intrinsics] rework vldr gather_base_wb
  arm: [MVE intrinsics] remove useless call_properties implementations.

 gcc/config/arm/arm-builtins.cc               |  146 -
 gcc/config/arm/arm-mve-builtins-base.cc      |  279 +-
 gcc/config/arm/arm-mve-builtins-base.def     |   28 +
 gcc/config/arm/arm-mve-builtins-base.h       |   18 +
 gcc/config/arm/arm-mve-builtins-shapes.cc    |  249 ++
 gcc/config/arm/arm-mve-builtins-shapes.h     |    4 +
 gcc/config/arm/arm-mve-builtins.cc           |   99 +-
 gcc/config/arm/arm-mve-builtins.h            |    5 +
 gcc/config/arm/arm_mve.h                     | 3096 ++---
 gcc/config/arm/arm_mve_builtins.def          |  122 -
 gcc/config/arm/iterators.md                  |   63 +-
 gcc/config/arm/mve.md                        | 2150 ++--
 gcc/config/arm/unspecs.md                    |   78 +-
 .../mve/intrinsics/vldrdq_gather_base_s64.c  |    4 +-
 .../mve/intrinsics/vldrdq_gather_base_u64.c  |    4 +-
 .../intrinsics/vldrdq_gather_base_wb_s64.c   |    4 +-
 .../intrinsics/vldrdq_gather_base_wb_u64.c   |    4 +-
 17 files changed, 1348 insertions(+), 5005 deletions(-)
Re: [PATCH v4 1/1] C++: Add flag to generate resolver at default version implementation.
Hi Richard,

Thank you for the feedback!

On 10/12/2024 18:49, Richard Sandiford wrote:
>> -/* Returns true if DECL is multi-versioned using the target attribute, and this
>> -   is the default version.  This function can only be used for targets that do
>> -   not support the "target_version" attribute. */
>> +/* Returns true if DECL could be a default version of a FMV function set
>> +   If TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_DECL is false, this is when DECL
>
> TARGET_CREATE_FMV_DISPATCHER_AT_DEFAULT_IMPL
>
> But could we key this off !TARGET_HAS_FMV_TARGET_ATTRIBUTE instead?
> At the moment that condition is essentially "aarch64 semantics", and given
> the general complexity of FMV, it would be good to avoid creating more
> variations than we support.

This function has always seemed like a bad compromise to me and is
currently trying to answer two different questions.

I don't think having it depend on TARGET_HAS_FMV_TARGET_ATTRIBUTE is
necessarily any better?  Both are false equivalences in my opinion.

I think I would prefer to create a different function called
"could_function_be_default_version" containing the different semantics
which could more reasonably be conditioned on
TARGET_HAS_FMV_TARGET_ATTRIBUTE.
Then similarly, is_function_default_version can be changed back to
working out if a function definitely is the default version.

> AIUI, we rely on get_function_versions_dispatcher to arrange this
> ordering.  (Each target does it independently, sigh.)  Can we rely on
> the hook's having been called at this point?
>
> That is, it seems like this code assumes that
> get_function_versions_dispatcher has already been called, whereas the
> code below calls get_function_versions_dispatcher on-the-fly when it
> sees the default version.  Does something guarantee that the default
> node is analysed before all of the other versions?

Ahh, fair point.
I am convinced we can at this point certainly, as this case is only
triggered if this is the dispatched symbol, which is created by that
function :)

It is a good point in general and there may be other cases I've made the
assumption that this is true.

In my other patch, I change the logic for record_function_versions
to implicitly ensure the default is always the first version in any
fmv structure, which I think is a good change and I think I will split
it off into a separate patch for both of these to rely on.

> The convention is for error messages to start with lower case and not
> to end with ".".

Thank you!  I will change this.

> There should be a test to exercise this codepath.  Locally I tried:
>
>   __attribute__((target_version("sve"))) int f() { return 2; }
>   int g() { return f(); }
>
> but it seems to ICE in aarch64_get_function_versions_dispatcher.
> (Hope I didn't mess up the testing.)

Oh dear, apologies.  I think I had created tests for this in the
C frontend as that was more my focus when creating this, and clearly
didn't investigate the C++ versions thoroughly enough.  I will
copy my tests there over and make sure these paths and cases work
correctly.

> As discussed off-list, this was a last-minute change that doesn't
> compile 🙂  The earlier:
>
>   /* Mark the assembler name as set to prevent it getting mangled again .*/
>   SET_DECL_ASSEMBLER_NAME (dispatch_decl,
>                            DECL_ASSEMBLER_NAME (dispatch_decl));
>
> is probably correct, but I think it'd be worth moving into
> make_dispatcher_decl, for !TARGET_HAS_FMV_TARGET_ATTRIBUTE.
>
> PRs 117987 and 117989 suggest that, independently of this patch,
> we seem to have some issues around FMV and assembler names.

Yes, apologies for the unforced error!  I think I was in the wrong work
tree or checkout.

I agree it is better in make_dispatcher_decl but I was a little hesitant
to change other back ends' behaviour.  Specifically, I believe x86
exhibits the double mangling behaviour at the moment and I was nervous
to change that as it could be a breaking change for them.
I would appreciate any advice from your experience with how to handle
such things.

> Could you explain why this is needed in more detail?  I assume the
> premise of the old code was that we didn't need to call
> cgraph_node::record_function_versions if both functions were already
> marked as versioned.  Admittedly, I don't know why that seemed
> necessary, since cgraph_node::record_function_versions itself has an
> early-out.  But it sounded like this hunk was needed for correctness,
> rather than simply being a clean-up.

Basically just as you say.  It seemed like there was an argument here to
prevent the recording, and it was always calculated by checking if the
arguments had already been recorded.  But the record_function_versions
function does that exact check already.

Additionally, I could no longer rely on DECL_FUNCTION_VERSIONED to imply
that a function had been recorded, as this patch means functions get
marked as versioned much earlier (see the start_preparsed_function
change).  Initially I created a new flag to replace this to imply if a
function had been recorded, b
Re: [PATCH] SVE intrinsics: Fold svmul and svdiv by -1 to svneg for unsigned types
Jennifer Schmitz writes:
> As a follow-up to
> https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665472.html,
> this patch implements folding of svmul by -1 to svneg for
> unsigned SVE vector types.  The key idea is to reuse the existing code
> that does this fold for signed types and feed it as callback to a
> helper function that adds the necessary type conversions.
>
> For example, for the test case
>
>   svuint64_t foo (svuint64_t x, svbool_t pg)
>   {
>     return svmul_n_u64_x (pg, x, -1);
>   }
>
> the following gimple sequence is emitted (-O2 -mcpu=grace):
>
>   svuint64_t foo (svuint64_t x, svbool_t pg)
>   {
>     svint64_t D.12921;
>     svint64_t D.12920;
>     svuint64_t D.12919;
>
>     D.12920 = VIEW_CONVERT_EXPR<svint64_t>(x);
>     D.12921 = svneg_s64_x (pg, D.12920);
>     D.12919 = VIEW_CONVERT_EXPR<svuint64_t>(D.12921);
>     goto ;
>     :
>     return D.12919;
>   }
>
> In general, the new helper gimple_folder::convert_and_fold
> - takes a target type and a function pointer,
> - converts the lhs and all non-boolean vector types to the target type,
> - passes the converted lhs and arguments to the callback,
> - receives the new gimple statement from the callback function,
> - adds the necessary view converts to the gimple sequence,
> - and returns the new call.
>
> Because all arguments are converted to the same target types, the helper
> function is only suitable for folding calls whose arguments are all of
> the same type.  If necessary, this could be extended to convert the
> arguments to different types differentially.
>
> The patch was bootstrapped and tested on aarch64-linux-gnu, no
> regression.  OK for mainline?
>
> Signed-off-by: Jennifer Schmitz
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svmul_impl::fold): Wrap code for folding to svneg in lambda
> function and pass to gimple_folder::convert_and_fold to enable
> the transform for unsigned types.
> * config/aarch64/aarch64-sve-builtins.cc > (gimple_folder::convert_and_fold): New function that converts > operands to target type before calling callback function, adding the > necessary conversion statements. > * config/aarch64/aarch64-sve-builtins.h > (gimple_folder::convert_and_fold): Declare function. > (signed_type_suffix_index): Return type_suffix_index of signed > vector type for given width. > (function_instance::signed_type): Return signed vector type for > given width. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Adjust expected outcome. > * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise. > * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise. > * gcc.target/aarch64/sve/acle/asm/mul_u64.c: New test and adjust > expected outcome. > --- > .../aarch64/aarch64-sve-builtins-base.cc | 70 +-- > gcc/config/aarch64/aarch64-sve-builtins.cc| 43 > gcc/config/aarch64/aarch64-sve-builtins.h | 31 > .../gcc.target/aarch64/sve/acle/asm/mul_u16.c | 5 +- > .../gcc.target/aarch64/sve/acle/asm/mul_u32.c | 5 +- > .../gcc.target/aarch64/sve/acle/asm/mul_u64.c | 26 ++- > .../gcc.target/aarch64/sve/acle/asm/mul_u8.c | 7 +- > 7 files changed, 153 insertions(+), 34 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > index 87e9909b55a..52401a8c57a 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc > @@ -2092,33 +2092,61 @@ public: >return f.fold_active_lanes_to (build_zero_cst (TREE_TYPE (f.lhs))); > > /* If one of the operands is all integer -1, fold to svneg. 
*/ > -tree pg = gimple_call_arg (f.call, 0); > -tree negated_op = NULL; > -if (integer_minus_onep (op2)) > - negated_op = op1; > -else if (integer_minus_onep (op1)) > - negated_op = op2; > -if (!f.type_suffix (0).unsigned_p && negated_op) > +if (integer_minus_onep (op1) || integer_minus_onep (op2)) >{ > - function_instance instance ("svneg", functions::svneg, > - shapes::unary, MODE_none, > - f.type_suffix_ids, GROUP_none, f.pred); > - gcall *call = f.redirect_call (instance); > - unsigned offset_index = 0; > - if (f.pred == PRED_m) > + auto mul_by_m1 = [](gimple_folder &f, tree lhs_conv, > + vec &args_conv) -> gimple * > { > - offset_index = 1; > - gimple_call_set_arg (call, 0, op1); > - } > - else > - gimple_set_num_ops (call, 5); > - gimple_call_set_arg (call, offset_index, pg); > - gimple_call_set_arg (call, offset_index + 1, negated_op); > - return call; > + gcc_assert (lhs_conv && args_conv.length () == 3); > + tree pg = args_conv[0]; > + tree op1 = args_conv[1]; > + tree op2 = args_conv[2]; > + tree
[PATCH] ifcombine field-merge: set upper bound for get_best_mode
On Dec 11, 2024, Richard Biener wrote:

> I think These 0, 0 args are supposed to indicate Maximum extent of the
> resulting Access

Thanks, that looks much better indeed.

A bootstrap on aarch64-linux-gnu revealed that sometimes (for example,
when building shorten_branches in final.cc) we will find such things as
MEM <unsigned int>, where unsigned int happens to be a variant of the
original unsigned int type, but with 64-bit alignment.  This unusual
alignment circumstance caused (i) get_inner_reference to not look inside
the MEM, (ii) get_best_mode to choose DImode instead of SImode to access
the object, so we built a BIT_FIELD_REF that attempted to select all 64
bits of a 32-bit object, and that failed gimple verification ("position
plus size exceeds size of referenced object") because there aren't that
many bits in the unsigned int object.

This patch avoids this failure mode by limiting the bitfield range with
the size of the inner object, if it is a known constant.

This enables us to avoid creating a BIT_FIELD_REF and reusing the load
expr, but we still introduced a separate load, that would presumably get
optimized out, but that is easy enough to avoid in the first place by
reusing the SSA_NAME it was originally loaded into, so I implemented
that in make_bit_field_load.

Regstrapped on x86_64-linux-gnu; tested that it fixes the known issue on
aarch64-linux-gnu, regstrapping now.  Ok to install?

for gcc/ChangeLog

	* gimple-fold.cc (fold_truth_andor_for_ifcombine): Limit the
	size of the bitregion in get_best_mode calls by the inner
	object's type size, if known.
	(make_bit_field_load): Reuse SSA_NAME if we're attempting to
	issue an identical load.
--- gcc/gimple-fold.cc | 52 ++-- 1 file changed, 34 insertions(+), 18 deletions(-) diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc index a31fc283d51b0..9179010c9eaf1 100644 --- a/gcc/gimple-fold.cc +++ b/gcc/gimple-fold.cc @@ -7751,6 +7751,15 @@ make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type, if (!point) return ref; + /* If we're remaking the same load, reuse the SSA NAME it is already loaded + into. */ + if (gimple_assign_load_p (point) + && operand_equal_p (ref, gimple_assign_rhs1 (point))) +{ + gcc_checking_assert (TREE_CODE (gimple_assign_lhs (point)) == SSA_NAME); + return gimple_assign_lhs (point); +} + gimple_seq stmts = NULL; tree ret = force_gimple_operand (ref, &stmts, true, NULL_TREE); @@ -8204,24 +8213,27 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, to be relative to a field of that size. */ first_bit = MIN (ll_bitpos, rl_bitpos); end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize); - if (get_best_mode (end_bit - first_bit, first_bit, 0, 0, -TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD, -volatilep, &lnmode)) + HOST_WIDE_INT ll_align = TYPE_ALIGN (TREE_TYPE (ll_inner)); + poly_uint64 ll_end_region = 0; + if (TYPE_SIZE (TREE_TYPE (ll_inner)) + && uniform_integer_cst_p (TYPE_SIZE (TREE_TYPE (ll_inner +ll_end_region = tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (ll_inner))); + if (get_best_mode (end_bit - first_bit, first_bit, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode)) l_split_load = false; else { /* Consider the possibility of recombining loads if any of the fields straddles across an alignment boundary, so that either part can be loaded along with the other field. 
*/ - HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner)); HOST_WIDE_INT boundary = compute_split_boundary_from_align - (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); + (ll_align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize); if (boundary < 0 - || !get_best_mode (boundary - first_bit, first_bit, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode) - || !get_best_mode (end_bit - boundary, boundary, 0, 0, -align, BITS_PER_WORD, volatilep, &lnmode2)) + || !get_best_mode (boundary - first_bit, first_bit, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode) + || !get_best_mode (end_bit - boundary, boundary, 0, ll_end_region, +ll_align, BITS_PER_WORD, volatilep, &lnmode2)) return 0; /* If we can't have a single load, but can with two, figure out whether @@ -8368,16 +8380,19 @@ fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type, and then we use two separate compares. */ first_bit = MIN (lr_bitpos, rr_bitpos); end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize); - if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0, - TYPE_ALIGN (TREE_TYPE (lr_inner
Re: [PATCH v2] libstdc++: Remove constraints on std::generator::promise_type::operator new
Jonathan Wakely writes: > Here's a v2 patch that adds the static assert messages and now also > tests allocation for member coroutines (with implicit and explicit > object parameters). > > [2. text/plain; patch.txt]... This revision looks good. Thanks! :-) Have a lovely day. -- Arsen Arsenović signature.asc Description: PGP signature
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On 11/12/2024 21:41, Jonathan Wakely wrote: Hmm, the warning is useful if somebody does: std::span s({1,2,3}); So I think we do want the warning, but maybe give it a special case so it doesn't warn for an rvalue std::span. Do you have anything in mind to make this work? Note that the above is akin to: std::vector getVector(); std::span sp = getVector(); which is also similar to: std::string getString(); std::string_view sv = getString(); and none of these warn at the moment. Thank you, -- Giuseppe D'Angelo smime.p7s Description: S/MIME Cryptographic Signature
Re: [PATCH] c++: Make sure fold_sizeof_expr returns the correct type [PR117775]
Hi, On 29 Nov 2024, at 15:33, Simon Martin wrote: > We currently ICE upon the following code, that is valid under > -Wno-pointer-arith: > > === cut here === > int main() { > decltype( [](auto) { return sizeof(void); } ) x; > return x.operator()(0); > } > === cut here === > > The problem is that "fold_sizeof_expr (sizeof(void))" returns > size_one_node, that has a different TREE_TYPE from that of the sizeof > expression, which later triggers an assert in > cxx_eval_store_expression. > > This patch makes sure that fold_sizeof_expr always returns a tree with > the type requested. > > Successfully tested on x86_64-pc-linux-gnu. Ping. Thanks! Simon > > PR c++/117775 > > gcc/cp/ChangeLog: > > * decl.cc (fold_sizeof_expr): Make sure the folded result has > the requested TREE_TYPE. > > gcc/testsuite/ChangeLog: > > * g++.dg/cpp2a/constexpr-117775.C: New test. > > --- > gcc/cp/decl.cc| 1 + > gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C | 13 + > 2 files changed, 14 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > > diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc > index 80485f0a428..fbe1407a2d2 100644 > --- a/gcc/cp/decl.cc > +++ b/gcc/cp/decl.cc > @@ -11686,6 +11686,7 @@ fold_sizeof_expr (tree t) > false, false); >if (r == error_mark_node) > r = size_one_node; > + r = cp_fold_convert (TREE_TYPE (t), r); >return r; > } > > diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > new file mode 100644 > index 000..59fc0d332b9 > --- /dev/null > +++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-117775.C > @@ -0,0 +1,13 @@ > +// PR c++/117775 > +// Check that we don't ICE and have sizeof(void)==1 under > -Wno-pointer-arith > +// { dg-do run { target c++20 } } > +// { dg-additional-options "-Wno-pointer-arith" } > + > +int main() { > + struct why : > +decltype( [](auto) { > + return sizeof(void); > + }) > + {} x; > + return 1 - x.operator()(0); > +} > -- > 2.44.0
[PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
Hi, this patch makes vcompress use the tail-undisturbed policy by default and also uses the proper VL. Regtested on rv64gcv and waiting for the CI. Regards Robin PR target/117383 gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): Use TU policy. * config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Expect tu. * gcc.target/riscv/rvv/autovec/pr117383.c: New test. --- gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 3 +- .../rvv/autovec/binop/vcompress-avlprop-1.c | 2 +- .../gcc.target/riscv/rvv/autovec/pr117383.c | 48 +++ 4 files changed, 52 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 98af41c6e74..ccbb3f3d397 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -511,7 +511,7 @@ enum insn_type : unsigned int COMPRESS_OP = __NORMAL_OP_TA2 | BINARY_OP_P, /* has merge operand but use ta. */ COMPRESS_OP_MERGE - = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P | BINARY_OP_P, + = HAS_DEST_P | HAS_MERGE_P | TU_POLICY_P | BINARY_OP_P, /* For vslideup.up has merge operand but use ta. 
*/ SLIDEUP_OP_MERGE = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 160a415c371..5c14c77068f 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3391,7 +3391,8 @@ shuffle_compress_patterns (struct expand_vec_perm_d *d) insn_code icode = code_for_pred_compress (vmode); rtx ops[] = {d->target, merge, d->op0, mask}; - emit_vlmax_insn (icode, COMPRESS_OP_MERGE, ops); + emit_nonvlmax_insn (icode, COMPRESS_OP_MERGE, ops, + gen_int_mode (vlen, Pmode)); return true; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c index 3654b03e8ed..98e53b38f09 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c @@ -11,7 +11,7 @@ struct s sss[MAX]; /* ** build_linked_list: ** ... -** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*ta,\s*ma +** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*tu,\s*ma ** ... ** vcompress\.vm\s+v[0-9]+,\s*v[0-9]+,\s*v0 ** ... 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c new file mode 100644 index 000..c01612f2902 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-require-effective-target "riscv_v_ok" } */ +/* { dg-add-options "riscv_v" } */ +/* { dg-additional-options "-std=c99 -mrvv-vector-bits=zvl" } */ + +typedef signed char int8_t; +typedef int8_t vnx64i __attribute__ ((vector_size (64))); + +#define MASK_64 \ + 1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \ +37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \ +82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \ +100, 101, 102, 103, 104, 105, 106, 107 + +void __attribute__ ((noipa)) +test_1 (int8_t *x, int8_t *y, int8_t *out) +{ + vnx64i v1 = *(vnx64i *) x; + vnx64i v2 = *(vnx64i *) y; + vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64); + *(vnx64i *) out = v3; +} + +int +main (void) +{ + int8_t x[64]; + int8_t y[64]; + int8_t out[64]; + + for (int i = 0; i < 64; i++) +{ + x[i] = -i; + y[i] = i; +} + + test_1 (x, y, out); + + int mask[] = {MASK_64}; +#pragma GCC novector + for (int i = 0; i < 64; i++) +{ + int idx = mask[i] < 64 ? mask[i] : mask[i] - 64; + int ref = mask[i] < 64 ? x[idx] : y[idx]; + if (ref != out[i]) +__builtin_abort (); +} +} -- 2.47.1
Re: [PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
On Wed, 11 Dec 2024 13:29:29 PST (-0800), Robin Dapp wrote: Hi, this patch makes vcompress use the tail-undisturbed policy by default and also uses the proper VL. Regtested on rv64gcv and waiting for the CI. Regards Robin PR target/117383 gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_type): Use TU policy. * config/riscv/riscv-v.cc (shuffle_compress_patterns): Set VL. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c: Expect tu. * gcc.target/riscv/rvv/autovec/pr117383.c: New test. --- gcc/config/riscv/riscv-protos.h | 2 +- gcc/config/riscv/riscv-v.cc | 3 +- .../rvv/autovec/binop/vcompress-avlprop-1.c | 2 +- .../gcc.target/riscv/rvv/autovec/pr117383.c | 48 +++ 4 files changed, 52 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 98af41c6e74..ccbb3f3d397 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -511,7 +511,7 @@ enum insn_type : unsigned int COMPRESS_OP = __NORMAL_OP_TA2 | BINARY_OP_P, /* has merge operand but use ta. */ COMPRESS_OP_MERGE - = HAS_DEST_P | HAS_MERGE_P | TDEFAULT_POLICY_P | BINARY_OP_P, + = HAS_DEST_P | HAS_MERGE_P | TU_POLICY_P | BINARY_OP_P, /* For vslideup.up has merge operand but use ta. */ SLIDEUP_OP_MERGE = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P This one also has a comment saying "but use ta" with TDEFAULT_POLICY_P, IIUC one of those must be wrong too? 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index 160a415c371..5c14c77068f 100644 --- a/gcc/config/riscv/riscv-v.cc +++ b/gcc/config/riscv/riscv-v.cc @@ -3391,7 +3391,8 @@ shuffle_compress_patterns (struct expand_vec_perm_d *d) insn_code icode = code_for_pred_compress (vmode); rtx ops[] = {d->target, merge, d->op0, mask}; - emit_vlmax_insn (icode, COMPRESS_OP_MERGE, ops); + emit_nonvlmax_insn (icode, COMPRESS_OP_MERGE, ops, + gen_int_mode (vlen, Pmode)); return true; } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c index 3654b03e8ed..98e53b38f09 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vcompress-avlprop-1.c @@ -11,7 +11,7 @@ struct s sss[MAX]; /* ** build_linked_list: ** ... -** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*ta,\s*ma +** vsetivli\s+zero,\s*8,\s*e64,\s*m1,\s*tu,\s*ma ** ... ** vcompress\.vm\s+v[0-9]+,\s*v[0-9]+,\s*v0 ** ... 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c new file mode 100644 index 000..c01612f2902 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr117383.c @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-require-effective-target "riscv_v_ok" } */ +/* { dg-add-options "riscv_v" } */ +/* { dg-additional-options "-std=c99 -mrvv-vector-bits=zvl" } */ + +typedef signed char int8_t; +typedef int8_t vnx64i __attribute__ ((vector_size (64))); + +#define MASK_64 \ + 1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \ +37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \ +82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \ +100, 101, 102, 103, 104, 105, 106, 107 + +void __attribute__ ((noipa)) +test_1 (int8_t *x, int8_t *y, int8_t *out) +{ + vnx64i v1 = *(vnx64i *) x; + vnx64i v2 = *(vnx64i *) y; + vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64); + *(vnx64i *) out = v3; +} + +int +main (void) +{ + int8_t x[64]; + int8_t y[64]; + int8_t out[64]; + + for (int i = 0; i < 64; i++) +{ + x[i] = -i; + y[i] = i; +} + + test_1 (x, y, out); + + int mask[] = {MASK_64}; +#pragma GCC novector + for (int i = 0; i < 64; i++) +{ + int idx = mask[i] < 64 ? mask[i] : mask[i] - 64; + int ref = mask[i] < 64 ? x[idx] : y[idx]; + if (ref != out[i]) +__builtin_abort (); +} +} -- 2.47.1
Re: [PATCH] v2: Allow limited extended asm at toplevel [PR41045]
On Dez 05 2024, Jakub Jelinek wrote: > --- gcc/testsuite/c-c++-common/toplevel-asm-1.c.jj2024-11-01 > 15:09:46.209708353 +0100 > +++ gcc/testsuite/c-c++-common/toplevel-asm-1.c 2024-11-01 > 19:25:47.512567789 +0100 > @@ -0,0 +1,25 @@ > +/* PR c/41045 */ > +/* { dg-do compile } */ > +/* { dg-options "-O0" } */ > +/* { dg-additional-options "-fno-pie" { target pie } } */ > + > +struct S { char a; long long b; int c; }; > +enum E { E0, E1 = sizeof (struct S) + 15 }; > +int v[42]; > +void foo (void) {} > + > +asm ("# %0 %1 %2 %c3 %c4 %5 %% %=" > + :: "i" (sizeof (struct S)), > + "i" (__builtin_offsetof (struct S, c)), > + "i" (E1), > + "s" (foo), > + "i" (v), > +/* Not all targets can satisfy "m" even in non-pic code. */ > +#if !defined(__i386__) && !defined(__x86_64__) > + "s" (v)); > +#else > + "m" (v)); > +asm ("# %0 %1" > + : "=m" (v[16]) > + : "m" (v[41])); > +#endif That fails on riscv64: .../toplevel-asm-1.c:11:1: error: invalid 'asm': invalid use of '%c' .../toplevel-asm-1.c:11:1: error: invalid 'asm': invalid use of '%c' -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
[pushed][PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges
The following patch is a better solution for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116778 The patch was successfully tested and bootstrapped on x86-64, aarch64, ppc64le. commit fca0ab08cd936464b152e9b45356f625eba27575 Author: Vladimir N. Makarov Date: Wed Dec 11 15:36:21 2024 -0500 [PR116778][LRA]: Check pseudos assigned to FP after rematerialization to build live ranges This is a better fix for the PR, permitting us to avoid building live ranges after rematerialization. It checks that FP cannot be eliminated now and that pseudos assigned to FP will be spilled. In this case we need to build live ranges after rematerialization for correct assignments of stack slots to spilled pseudos involved in rematerialization. gcc/ChangeLog: PR rtl-optimization/116778 * ira-int.h (x_ira_class_hard_reg_index): Fix comment typo. * lra-eliminations.cc (lra_fp_pseudo_p): New function. * lra-int.h (lra_fp_pseudo_p): External declaration. * lra-spills.cc (lra_need_for_spills_p): Fix formatting. * lra.cc (lra): Use lra_fp_pseudo_p in lra_create_live_range after lra_remat. diff --git a/gcc/ira-int.h b/gcc/ira-int.h index 8c3c5941de5..5ce930b6d22 100644 --- a/gcc/ira-int.h +++ b/gcc/ira-int.h @@ -868,7 +868,7 @@ public: /* Index (in ira_class_hard_regs; for given register class and hard register (in general case a hard register can belong to several - register classes;. The index is negative for hard registers + register classes). The index is negative for hard registers unavailable for the allocation. */ short x_ira_class_hard_reg_index[N_REG_CLASSES][FIRST_PSEUDO_REGISTER]; diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc index 96772f2904a..5343d8c5102 100644 --- a/gcc/lra-eliminations.cc +++ b/gcc/lra-eliminations.cc @@ -1428,6 +1428,25 @@ lra_update_fp2sp_elimination (int *spilled_pseudos) return n; } +/* Return true if we have a pseudo assigned to hard frame pointer. 
*/ +bool +lra_fp_pseudo_p (void) +{ + HARD_REG_SET set; + + if (frame_pointer_needed) +/* At this stage it means we have no pseudos assigned to FP: */ +return false; + CLEAR_HARD_REG_SET (set); + add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM); + for (int i = FIRST_PSEUDO_REGISTER; i < max_reg_num (); i++) +if (lra_reg_info[i].nrefs != 0 && reg_renumber[i] >= 0 + && overlaps_hard_reg_set_p (set, PSEUDO_REGNO_MODE (i), +reg_renumber[i])) + return true; + return false; +} + /* Entry function to do final elimination if FINAL_P or to update elimination register offsets (FIRST_P if we are doing it the first time). */ diff --git a/gcc/lra-int.h b/gcc/lra-int.h index 5f605c3ae41..abee008e642 100644 --- a/gcc/lra-int.h +++ b/gcc/lra-int.h @@ -419,6 +419,7 @@ extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode, bool, bool, poly_int64, bool); extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64); extern int lra_update_fp2sp_elimination (int *spilled_pseudos); +extern bool lra_fp_pseudo_p (void); extern void lra_eliminate (bool, bool); extern poly_int64 lra_update_sp_offset (rtx, poly_int64); diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc index 3f5c8d2bcb0..6e9a8c3e34e 100644 --- a/gcc/lra-spills.cc +++ b/gcc/lra-spills.cc @@ -594,8 +594,9 @@ lra_need_for_scratch_reg_p (void) bool lra_need_for_spills_p (void) { - int i; max_regno = max_reg_num (); + int i; + max_regno = max_reg_num (); for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++) if (lra_reg_info[i].nrefs != 0 && lra_get_regno_hard_regno (i) < 0 && ! ira_former_scratch_p (i)) diff --git a/gcc/lra.cc b/gcc/lra.cc index 6b740ed2325..55737deba3f 100644 --- a/gcc/lra.cc +++ b/gcc/lra.cc @@ -2555,8 +2555,11 @@ lra (FILE *f, int verbose) rematerialize them first. */ if (lra_remat ()) { - /* We need full live info -- see the comment above. */ - lra_create_live_ranges (true, true); + /* We need full live info -- see the comment above. 
We also might + need live info if we have a pseudo assigned to hard frame pointer + reg and will need FP for usual purposes. */ + lra_create_live_ranges (lra_reg_spill_p || lra_fp_pseudo_p (), + true); live_p = true; if (! lra_need_for_spills_p ()) {
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Tue, 3 Dec 2024 at 16:07, Giuseppe D'Angelo wrote: > > Hello, > > The attached patch adds the span(initializer_list) constructor, added by > P2447R6 for C++26. > > It seems that GCC is somewhat aggressive with its -Winit-list-lifetime > warning, which actively interferes with this feature. The idea of the > new constructor is to allow calls like: > >void f(std::span); >f({1, 2, 3}); > > which is completely fine as the lifetime of the initializer_list > encompasses the one of the std::span parameter. However GCC complains > about the risk of dangling here. I've therefore disabled the warning for > the new constructor. Hmm, the warning is useful if somebody does: std::span s({1,2,3}); So I think we do want the warning, but maybe give it a special case so it doesn't warn for an rvalue std::span. Alternatively, we'd want that constructor to be marked with the opposite of [[nodiscard]], saying to warn if it *isn't* discarded!
[PATCH] MAINTAINERS: add myself to write after approval
ChangeLog: * MAINTAINERS: Add myself to write after approval. --- MAINTAINERS | 1 + 1 file changed, 1 insertion(+) diff --git a/MAINTAINERS b/MAINTAINERS index 7d65ed64bdd..44153a7a51e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -630,6 +630,7 @@ Sa Liu saliu Ralph Loader ralph Sheldon Lobo smlobo Gabor Loki loki +Matthieu Longo mlongo Sandra Loosemore sandra Manuel López-Ibáñez manu Carl Love carll -- 2.47.1
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On 03/12/24 17:05 +0100, Giuseppe D'Angelo wrote: Hello, The attached patch adds the span(initializer_list) constructor, added by P2447R6 for C++26. It seems that GCC is somewhat aggressive with its -Winit-list-lifetime warning, which actively interferes with this feature. The idea of the new constructor is to allow calls like: void f(std::span); f({1, 2, 3}); which is completely fine as the lifetime of the initializer_list encompasses the one of the std::span parameter. However GCC complains about the risk of dangling here. I've therefore disabled the warning for the new constructor. Thanks, -- Giuseppe D'Angelo From bb1d537ee7b68883403127903834427c6787c505 Mon Sep 17 00:00:00 2001 From: Giuseppe D'Angelo Date: Tue, 3 Dec 2024 16:56:45 +0100 Subject: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447) This commit implements P2447R6. The code is straightforward (just one extra constructor, with constraints and conditional explicit). I decided to suppress -Winit-list-lifetime because otherwise it would give too many false positives. The new constructor is meant to be used as a parameter-passing interface (this is a design choice, see P2447R6/2) and, as such, the initializer_list won't dangle despite GCC's warnings. The new constructor isn't 100% backwards compatible. A couple of examples are included in Annex C, but I have also lifted some more from R4. A new test checks for the old and the new behaviors. libstdc++-v3/ChangeLog: * include/bits/version.def: Added the new feature-testing macro. * include/bits/version.h (defined): Regenerated. * include/std/span: Added constructor from initializer_list. * testsuite/23_containers/span/init_list_cons.cc: New test. * testsuite/23_containers/span/init_list_cons_neg.cc: New test. 
--- libstdc++-v3/include/bits/version.def | 8 +++ libstdc++-v3/include/bits/version.h | 10 +++ libstdc++-v3/include/std/span | 21 ++ .../23_containers/span/init_list_cons.cc | 65 +++ .../23_containers/span/init_list_cons_neg.cc | 31 + 5 files changed, 135 insertions(+) create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc create mode 100644 libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 8d4b8e9b383..cfa0469fb2d 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1853,6 +1853,14 @@ ftms = { }; }; +ftms = { + name = span_initializer_list; + values = { +v = 202311; +cxxmin = 26; + }; +}; + ftms = { name = text_encoding; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index c556aca38fa..6a2c66bdf81 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -2055,6 +2055,16 @@ #endif /* !defined(__cpp_lib_saturation_arithmetic) && defined(__glibcxx_want_saturation_arithmetic) */ #undef __glibcxx_want_saturation_arithmetic +#if !defined(__cpp_lib_span_initializer_list) +# if (__cplusplus > 202302L) +# define __glibcxx_span_initializer_list 202311L +# if defined(__glibcxx_want_all) || defined(__glibcxx_want_span_initializer_list) +# define __cpp_lib_span_initializer_list 202311L +# endif +# endif +#endif /* !defined(__cpp_lib_span_initializer_list) && defined(__glibcxx_want_span_initializer_list) */ +#undef __glibcxx_want_span_initializer_list + #if !defined(__cpp_lib_text_encoding) # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (_GLIBCXX_USE_NL_LANGINFO_L) # define __glibcxx_text_encoding 202306L diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span index f1c19b58737..b84ac87b657 100644 --- a/libstdc++-v3/include/std/span +++ b/libstdc++-v3/include/std/span @@ -39,6 +39,7 @@ 
#endif #define __glibcxx_want_span +#define __glibcxx_want_span_initializer_list #include #ifdef __cpp_lib_span // C++ >= 20 && concepts @@ -46,6 +47,9 @@ #include #include #include +#ifdef __cpp_lib_span_initializer_list +# include +#endif namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -226,6 +230,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } +#if __cpp_lib_span_initializer_list >= 202311L +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Winit-list-lifetime" + constexpr + explicit(extent != dynamic_extent) + span(initializer_list __il) + requires (is_const_v<_Type>) + : _M_extent(__il.size()), _M_ptr(__il.begin()) These mem-initializers are in the wrong order (we had an existing constructor with the same problem, but I pushed a fix less than an hour ago). + { + if constexpr (extent != dynamic_extent) + { + __glibcxx_assert(__il.size() == extent); It's just occurred to me that this check should be done in
Re: Should -fsanitize=bounds support counted-by attribute for pointers inside a structure?
On Thu, Dec 5, 2024 at 2:28 PM Bill Wendling wrote: > On Thu, Dec 5, 2024 at 1:09 PM Qing Zhao wrote: > > > On Dec 3, 2024, at 10:29, Qing Zhao wrote: > > >> On Dec 3, 2024, at 10:07, Martin Uecker wrote: > > >> The language extension does not exist yet, so there is no problem. > > > Yeah, I should mention this as “corresponding future language extension” > > > -:) > > >> > > >> But I hope we will get it and then specify it so that this works > > >> correctly without this footgun. > > >> > > >> Of course, if GCC gets the "counted_by" attribute wrong, there will > > >> be arguments later in WG14 why the feature is then different to it. > > > > > > I think that we need to resolve this issue first in the design of > > > “counted_by” for pointer fields. > > > I guess that we might need to come up with some additional limitations > > > for using the “counted_by” > > > attribute for pointer fields at the source code level in order to avoid > > > such potential error. But not > > > sure what exactly the additional limitation should be at this moment. > > > > > > Need some study here. > > > > Actually, I found out that this is really not a problem with the current > > design, for the following new testing case I added for my current > > implementation of the counted_by for pointer field: > > > > [ gcc.dg]$ cat pointer-counted-by-7.c > > /* Test the attribute counted_by for pointer field and its usage in > > * __builtin_dynamic_object_size. 
*/ > > /* { dg-do run } */ > > /* { dg-options "-O2" } */ > > > > #include "builtin-object-size-common.h" > > > > struct annotated { > > int b; > > int *c __attribute__ ((counted_by (b))); > > }; > > > > struct annotated *__attribute__((__noinline__)) setup (int attr_count) > > { > > struct annotated *p_array_annotated > > = (struct annotated *) malloc (sizeof (struct annotated)); > > p_array_annotated->c = (int *) malloc (sizeof (int) * attr_count); > > p_array_annotated->b = attr_count; > > > > return p_array_annotated; > > } > > > > > > int main(int argc, char *argv[]) > > { > > struct annotated *x = setup (10); > > int *p = x->c; > > x = setup (20); > > EXPECT(__builtin_dynamic_object_size (p, 1), 10 * sizeof (int)); > > EXPECT(__builtin_dynamic_object_size (x->c, 1), 20 * sizeof (int)); > > DONE (); > > } > > > > This test case passed without any issue. > > > > Our previous introduction of the new internal function .ACCESS_WITH_SIZE > > already resolved this issue. > > > > So, I think that as long as the whole structure is set at the same time, > > should be fine. > > > > Let me know if you have more comment here. > > > Nice! I think this is going to be an issue with Clang's > implementation. I'll need to create our version of .ACCESS_WITH_SIZE. > It might end up simplifying some of the code. :-) So my assumption that our GetElementPointerInst instruction would "just work" in this case was wrong. By assigning x->c to p it loses pretty much all information, e.g. the __counted_by attribute. It looks like we'll have to go with a similar intrinsic to .ACCESS_WITH_SIZE. -bw
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Wed, 11 Dec 2024 at 21:18, Giuseppe D'Angelo wrote: > > On 11/12/2024 21:41, Jonathan Wakely wrote: > > Hmm, the warning is useful if somebody does: > > > > std::span s({1,2,3}); > > > > So I think we do want the warning, but maybe give it a special case so > > it doesn't warn for an rvalue std::span. > > Do you have anything in mind to make this work? It would need changes to the front end, but I think they'd be useful changes. > > Note that the above is akin to: > > std::vector getVector(); > std::span sp = getVector(); > > > which is also similar to: > > std::string getString(); > std::string_view sv = getString(); > > > and none of these warn at the moment. They should do though :-) If C++ had ref-qualified constructors then we could constrain the std::span(Range&&)& constructor to only work for borrowed_ranges but allow the std::span(Range&&)&& constructor to work for any ranges. We don't have that though.
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
> Difference with v3: Refresh > > Samuel Thibault (3): > ada: Factorize bsd signal definitions > ada: Fix GNU/Hurd priority range > ada: Add GNU/Hurd x86_64 support Thanks, this looks good to me. Do you need me to merge it? -- Eric Botcazou
[PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Hi All, GCC 15 added two new fusions CMP+CSEL and CMP+CSET. This patch enables them for cores that support it, based on their Software Optimization Guides, and generically on Armv9-A. Even if a core does not support it, there's no negative performance impact. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/cortexx925.h: Use it. * config/aarch64/tuning_models/generic_armv9_a.h: Use it. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. --- diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def b/gcc/config/aarch64/aarch64-fusion-pairs.def index f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376df9ae7a196031286d 100644 --- a/gcc/config/aarch64/aarch64-fusion-pairs.def +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) /* Baseline fusion settings suitable for all cores. */ #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL \ + | AARCH64_FUSE_CMP_CSET) + #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496def277230e193d20f 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 10, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h b/gcc/config/aarch64/tuning_models/generic_armv9_a.h index 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a15a5df7803ff4dc6c 100644 --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h index c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596dbe2d7d4a3de9c08c 100644 --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -143,7 +143,7 @@ static const struct tune_params neoverse512tvb_tunings = 1 /* store_pred. */ }, /* memmov_cost. 
*/ 3, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h index 18199ac206c6cbfcef8695b497401b78a8f77f38..47d861232951f92bebbd09fbca8b1ff8624949fc 100644 --- a/gcc/config/aarch64/tuning_models/neoversen2.h +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -205,7 +205,7 @@ static const struct tune_params neoversen2_tunings = 1 /* store_pred. */ }, /* memmov_cost. */ 5, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ "32:16", /* function_align. */ "4", /* jump_align. */ "32:16", /* loop_align. */ diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h b/gcc/config/aarch64/tuning_models/neoversen3.h index 4da85cfac0d185a5d59439f6d19d90ace0354e8f..356b5f8cc83b69e74ddac70db7a3883bc1b7a29b 100644 --- a/gcc/config/aarch64/tuning_models/neoversen3.h +++ b/gcc/config/aarch64/tuning_models/neoversen3.h @@ -205,7 +205,7 @@ static const struct tune_params neoversen3_tunings = 2 /* store_pred. */ }, /* memmov_cost. */ 5, /* issue_rate */ - AARCH64_FUSE_BASE, /* fusible_ops */ +
[PATCH 2/2]AArch64: Set L1 data cache size according to size on CPUs
Hi All, This sets the L1 data cache size for some cores based on the size given in their Technical Reference Manuals. Today the port minimum is 256 bytes as explained in commit g:9a99559a478111f7fbeec29bd78344df7651c707, however, like Neoverse V2, most cores actually define the L1 cache size as 64 bytes. The generic Armv9-A model was already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this change follows suit for a few other cores based on their TRMs. This results in less memory pressure when running on large core count machines. Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise. --- diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h b/gcc/config/aarch64/tuning_models/cortexx925.h index b3ae1576ade1f701b775496def277230e193d20f..f32454aa808bb2a19bf4d364387ab463a87f9fbf 100644 --- a/gcc/config/aarch64/tuning_models/cortexx925.h +++ b/gcc/config/aarch64/tuning_models/cortexx925.h @@ -222,7 +222,7 @@ static const struct tune_params cortexx925_tunings = | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ }; diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h b/gcc/config/aarch64/tuning_models/neoverse512tvb.h index f72505918f3aa64200aa596dbe2d7d4a3de9c08c..007f987154c4634afeb6294b0df142fdd05997cd 100644 --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h @@ -158,7 +158,7 @@ static const struct tune_params neoverse512tvb_tunings = (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h b/gcc/config/aarch64/tuning_models/neoversen1.h index 82def6b2736df8162d9b606440d260c951f3ef99..608e36936ad437befa9b8663bc4c91822b05befc 100644 --- a/gcc/config/aarch64/tuning_models/neoversen1.h +++ b/gcc/config/aarch64/tuning_models/neoversen1.h @@ -52,7 +52,7 @@ static const struct tune_params neoversen1_tunings = 0, /* max_case_values. */ tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model. */ (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h b/gcc/config/aarch64/tuning_models/neoversen2.h index 47d861232951f92bebbd09fbca8b1ff8624949fc..13d4640791ca998ca012002c932e8775b1898733 100644 --- a/gcc/config/aarch64/tuning_models/neoversen2.h +++ b/gcc/config/aarch64/tuning_models/neoversen2.h @@ -222,7 +222,7 @@ static const struct tune_params neoversen2_tunings = | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. 
*/ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h b/gcc/config/aarch64/tuning_models/neoversen3.h index 356b5f8cc83b69e74ddac70db7a3883bc1b7a29b..ef15dc8c3015bb406a6957346cf47bacf305d884 100644 --- a/gcc/config/aarch64/tuning_models/neoversen3.h +++ b/gcc/config/aarch64/tuning_models/neoversen3.h @@ -221,7 +221,7 @@ static const struct tune_params neoversen3_tunings = | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),/* tune_flags. */ - &generic_prefetch_tune, + &generic_armv9a_prefetch_tune, AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ }; diff --git a/gcc/co
Re: [PATCH] Fix some more is_trivial C++26 deprecation warnings
Hello, Giuseppe D'Angelo wrote: Hello, This patch fixes a C++26 deprecation warning in g++'s (not libstdc++) autotests. Please ignore; this has been independently fixed in 7aab1271b4afb6f3e9e8a01825fcb14750b29312 (thanks!). Thank you, -- Giuseppe D'Angelo
[PATCH] c++: allow stores to anon union vars to change current union member in constexpr [PR117614]
Hi! Since r14-4771 the FE tries to differentiate between cases where the lhs of a store allows changing the current union member and cases where it doesn't, and the latter category includes everything that has gone through the cxx_eval_constant_expression path on the lhs. As the testcase shows, DECL_ANON_UNION_VAR_P vars were handled like that too, even when stores to them are the only way to change the current union member in the sources. So, the following patch just handles that case manually without calling cxx_eval_constant_expression and without setting evaluated to true. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2024-12-11 Jakub Jelinek PR c++/117614 * constexpr.cc (cxx_eval_store_expression): For stores to DECL_ANON_UNION_VAR_P vars just continue with DECL_VALUE_EXPR of it, without setting evaluated to true or full cxx_eval_constant_expression. * g++.dg/cpp2a/constexpr-union8.C: New test. --- gcc/cp/constexpr.cc.jj 2024-12-06 09:08:20.977872404 +0100 +++ gcc/cp/constexpr.cc 2024-12-09 16:57:23.682959311 +0100 @@ -6418,6 +6418,14 @@ cxx_eval_store_expression (const constex object = probe; else { + tree pvar = tree_strip_any_location_wrapper (probe); + if (VAR_P (pvar) && DECL_ANON_UNION_VAR_P (pvar)) + { + /* Stores to DECL_ANON_UNION_VAR_P var are allowed to change +active union member. 
*/ + probe = DECL_VALUE_EXPR (pvar); + break; + } probe = cxx_eval_constant_expression (ctx, probe, vc_glvalue, non_constant_p, overflow_p); evaluated = true; --- gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C.jj2024-12-09 17:06:02.876721019 +0100 +++ gcc/testsuite/g++.dg/cpp2a/constexpr-union8.C 2024-12-09 17:05:01.832572060 +0100 @@ -0,0 +1,31 @@ +// PR c++/117614 +// { dg-do compile { target c++20 } } + +constexpr int +foo () +{ + union { +int x{0}; +char y; + }; + y = 1; + return y; +} + +constexpr int +bar () +{ + union { +union { + int x{0}; + char y; +}; +long long z; + }; + y = 1; + z = 2; + return z; +} + +static_assert (foo () == 1); +static_assert (bar () == 2); Jakub
Re: [PATCHv4 0/3] ada: Add GNU/Hurd x86_64 support
Eric Botcazou wrote on Wed, 11 Dec 2024 at 09:58:23 +0100: > > Difference with v3: Refresh > > > > Samuel Thibault (3): > > ada: Factorize bsd signal definitions > > ada: Fix GNU/Hurd priority range > > ada: Add GNU/Hurd x86_64 support > > Thanks, this looks good to me. Do you need me to merge it? Yes, please, I don't have commit access. Samuel
[PATCH] RISC-V: optimization by converting LUI operands with SMALL_AFTER_COMMON_TRAILING_SHIFT
The patch optimizes code generation for comparisons of the form X & C1 == C2 when the bitwise AND mask can fit in the 12-bit immediate operand of the RISC-V "andi" instruction after a right shift. For example, with C1 = 0x55500000 and C2 = 0x14500000, both constants share 20 trailing zero bits; shifting all used values 20 bits to the right leaves constants that fit in 12 bits, so the "and immediate" instruction can be used, improving performance. 2024-12-11 Oliver Kozul PR target/114087 gcc/ChangeLog: * config/riscv/riscv.md (*lui_constraint_lshiftrt): New pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr114087-2.c: New test. 
--- gcc/config/riscv/riscv.md | 25 + gcc/testsuite/gcc.target/riscv/pr114087-2.c | 11 + 2 files changed, 36 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/pr114087-2.c diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md index 3a4cd1d93a0..310cbd5617e 100644 --- a/gcc/config/riscv/riscv.md +++ b/gcc/config/riscv/riscv.md @@ -858,6 +858,31 @@ [(set_attr "type" "arith") (set_attr "mode" "SI")]) +(define_insn_and_split "*lui_constraint_lshiftrt" + [(set (match_operand:ANYI 0 "register_operand" "=r") + (plus:ANYI (and:ANYI (match_operand:ANYI 1 "register_operand" "r") +(match_operand 2 "const_int_operand")) +(match_operand 3 "const_int_operand")))] + "!SMALL_OPERAND (INTVAL (operands[2])) + && !SMALL_OPERAND (INTVAL (operands[3])) + && SMALL_AFTER_COMMON_TRAILING_SHIFT (INTVAL (operands[2]), + INTVAL (operands[3]))" + "#" + "&& reload_completed" + [(set (match_dup 0) (lshiftrt:X (match_dup 1) (match_dup 4))) + (set (match_dup 0) (and:X (match_dup 0) (match_dup 5))) + (set (match_dup 0) (plus:X (match_dup 0) (match_dup 6)))] + { +HOST_WIDE_INT mask = INTVAL (operands[2]); +HOST_WIDE_INT val = INTVAL (operands[3]); +int trailing_shift = COMMON_TRAILING_ZEROS (mask, val); + +operands[4] = GEN_INT (trailing_shift); +operands[5] = GEN_INT (mask >> trailing_shift); +operands[6] = GEN_INT (val >> trailing_shift); + } +[(set_attr "type" "arith")]) + ;; ;; ;; diff --git a/gcc/testsuite/gcc.target/riscv/pr114087-2.c b/gcc/testsuite/gcc.target/riscv/pr114087-2.c new file mode 100644 index 000..1e158e8b722 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/pr114087-2.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv64 } */ +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */ +/* { dg-options "-march=rv64gc -mabi=lp64d" } */ + +int pred2a(int x) { + return ((x & 0x55500000) == 0x14500000); +} + +/* { dg-final { scan-assembler {srli\s*[a-x0-9]+,\s*[a-x0-9]+,\s*20}} } */ +/* { dg-final { scan-assembler 
{andi\s*[a-x0-9]+,\s*[a-x0-9]+,\s*1365}} } */ \ No newline at end of file -- 2.43.0
Re: rs6000: Add -msplit-patch-nops (PR112980)
Hello, even though it is not my work, I would like to ping this patch. Having it upstream would really help us a lot. Thank you very much in advance, Martin On Wed, Nov 13 2024, Michael Matz wrote: > Hello, > > this is essentially > > https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651025.html > > from Kewen in functionality. When discussing this with Segher at the > Cauldron he expressed reservations about changing the default > implementation of -fpatchable-function-entry. So, to move forward, let's > move it under a new target option -msplit-patch-nops (expressing the > important deviation from the default behaviour, namely that all the > patching nops form a consecutive sequence normally). > > Regstrapping on power9 ppc64le in progress. Okay if that passes? > > > Ciao, > Michael. > > --- > > as the bug report details some uses of -fpatchable-function-entry > aren't happy with the "before" NOPs being inserted between global and > local entry point on powerpc. We want the before NOPs be in front > of the global entry point. That means that the patching NOPs aren't > consecutive for dual entry point functions, but for these usecases > that's not the problem. But let us support both under the control > of a new target option: -msplit-patch-nops. > > gcc/ > > PR target/112980 > * config/rs6000/rs6000.opt (msplit-patch-nops): New option. > * doc/invoke.texi (RS/6000 and PowerPC Options): Document it. > * config/rs6000/rs6000.h (machine_function.stop_patch_area_print): > New member. > * config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry): > Emit split nops under control of that one. > * config/rs6000/rs6000-logue.cc (rs6000_output_function_prologue): > Add handling of split patch nops. 
> --- > gcc/config/rs6000/rs6000-logue.cc | 15 +-- > gcc/config/rs6000/rs6000.cc | 27 +++ > gcc/config/rs6000/rs6000.h| 6 ++ > gcc/config/rs6000/rs6000.opt | 4 > gcc/doc/invoke.texi | 17 +++-- > 5 files changed, 57 insertions(+), 12 deletions(-) > > diff --git a/gcc/config/rs6000/rs6000-logue.cc > b/gcc/config/rs6000/rs6000-logue.cc > index c87058b435e..aa1e0442f2b 100644 > --- a/gcc/config/rs6000/rs6000-logue.cc > +++ b/gcc/config/rs6000/rs6000-logue.cc > @@ -4005,8 +4005,8 @@ rs6000_output_function_prologue (FILE *file) > >unsigned short patch_area_size = crtl->patch_area_size; >unsigned short patch_area_entry = crtl->patch_area_entry; > - /* Need to emit the patching area. */ > - if (patch_area_size > 0) > + /* Emit non-split patching area now. */ > + if (!TARGET_SPLIT_PATCH_NOPS && patch_area_size > 0) > { > cfun->machine->global_entry_emitted = true; > /* As ELFv2 ABI shows, the allowable bytes between the global > @@ -4027,7 +4027,6 @@ rs6000_output_function_prologue (FILE *file) > patch_area_entry); > rs6000_print_patchable_function_entry (file, patch_area_entry, >true); > - patch_area_size -= patch_area_entry; > } > } > > @@ -4037,9 +4036,13 @@ rs6000_output_function_prologue (FILE *file) >assemble_name (file, name); >fputs ("\n", file); >/* Emit the nops after local entry. 
*/ > - if (patch_area_size > 0) > - rs6000_print_patchable_function_entry (file, patch_area_size, > -patch_area_entry == 0); > + if (patch_area_size > patch_area_entry) > + { > + patch_area_size -= patch_area_entry; > + cfun->machine->stop_patch_area_print = false; > + rs6000_print_patchable_function_entry (file, patch_area_size, > + patch_area_entry == 0); > + } > } > >else if (rs6000_pcrel_p ()) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 950fd947fda..6427e6913ba 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -15226,11 +15226,25 @@ rs6000_print_patchable_function_entry (FILE *file, > { >bool global_entry_needed_p = rs6000_global_entry_point_prologue_needed_p > (); >/* For a function which needs global entry point, we will emit the > - patchable area before and after local entry point under the control of > - cfun->machine->global_entry_emitted, see the handling in function > - rs6000_output_function_prologue. */ > - if (!global_entry_needed_p || cfun->machine->global_entry_emitted) > + patchable area when it isn't split before and after local entry point > + under the control of cfun->machine->global_entry_emitted, see the > + handling in function rs6000_output_function_prologue. */ > + if (!TARGET_SPLIT_PATCH_NOPS > + && (!global_entry_needed_p || cfun->machine->global_entry_emitted)) >
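For reference, the NOP bookkeeping in the hunks above boils down to simple arithmetic on `crtl->patch_area_size` and `crtl->patch_area_entry`: ENTRY NOPs go before the entry point and SIZE - ENTRY after it; -msplit-patch-nops only changes *where* each group is emitted on dual-entry functions (before the global entry point vs. after the local one), not the counts. A toy model of that arithmetic — not GCC code, and the struct and function names are made up for illustration:

```cpp
#include <cassert>

// Toy model of -fpatchable-function-entry=SIZE,ENTRY: ENTRY NOPs are
// emitted before the entry point and SIZE - ENTRY after it.  On rs6000
// with -msplit-patch-nops the "before" group moves in front of the
// global entry point while the "after" group stays behind the local
// entry point, so the two groups need not be consecutive -- but the
// counts computed here are unchanged.
struct PatchArea
{
  unsigned before_entry;
  unsigned after_entry;
};

PatchArea distribute_nops(unsigned patch_area_size, unsigned patch_area_entry)
{
  PatchArea p{patch_area_entry, 0};
  // Mirrors the guard in the patch: only emit the trailing group when
  // something is left over after the leading one.
  if (patch_area_size > patch_area_entry)
    p.after_entry = patch_area_size - patch_area_entry;
  return p;
}
```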
Re: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Tamar Christina writes: > Hi All, > > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > > This patch enables them for cores that support based on their Software > Optimization Guides and generically on Armv9-A. Even if a core does not > support it there's no negative performance impact. > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): > New. > * config/aarch64/tuning_models/cortexx925.h: Use it. > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > * config/aarch64/tuning_models/neoversen2.h: Use it. > * config/aarch64/tuning_models/neoversen3.h: Use it. > * config/aarch64/tuning_models/neoversev1.h: Use it. > * config/aarch64/tuning_models/neoversev2.h: Use it. > * config/aarch64/tuning_models/neoversev3.h: Use it. > * config/aarch64/tuning_models/neoversev3ae.h: Use it. > > --- > > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > b/gcc/config/aarch64/aarch64-fusion-pairs.def > index > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376df9ae7a196031286d > 100644 > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > /* Baseline fusion settings suitable for all cores. */ > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC) > > +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > AARCH64_FUSE_CMP_CSEL \ > + | AARCH64_FUSE_CMP_CSET) > + > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_MOVK_MOVK) > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > index > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496def277230e193d20f > 100644 > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > 2 /* store_pred. */ >}, /* memmov_cost. */ >10, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > index > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a15a5df7803ff4dc6c > 100644 > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > @@ -236,7 +236,7 @@ static const struct tune_params generic_armv9_a_tunings = > 1 /* store_pred. */ >}, /* memmov_cost. */ >3, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think we have to be careful about using it for generic tuning. generic-armv9-a mustn't become "generic Neoverse", since it's supposed to be good for non-Neoverse (and non-Arm) Armv9-A cores as well. So perhaps here we should expand the macro. Alternatively, we could add a comment saying that fusion macros for other CPUs can be added here as well, if they are unlikely to have a negative impact on other cores. 
Perhaps we should expand the macro for cortex-x925 as well. LGTM otherwise, but let's see whether there are any other comments. Thanks, Richard > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > index > c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596dbe2d7d4a3de9c08c > 100644 > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > @@ -143,7 +143,7 @@ static const struct tune_params neoverse512tvb_tunings = > 1 /* store_pred. */ >}, /* memmov_cost. */ >3, /* issue_rate */ > - AARCH64_FUSE_BASE, /* fusible_ops */ > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >"32:16", /* function_align. */ >"4", /* jump_align. */ >"32:16", /* loop_align. */ > diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h > b/gcc/config/aarch64/tuning_models/neoversen2.h > index > 18199ac206c6cbfcef8695b497401b78a8f77f38..47d861232951f92bebbd09fbca8b1ff8624949fc > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen2.h > +++ b/gcc/config/aarch64/tuning_models/neoversen2.h > @@ -205,7 +2
RE: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, December 11, 2024 9:32 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > support it > > Tamar Christina writes: > > Hi All, > > > > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > > > > This patch enables them for cores that support based on their Software > > Optimization Guides and generically on Armv9-A. Even if a core does not > > support it there's no negative performance impact. > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > > > Ok for master? > > > > Thanks, > > Tamar > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64-fusion-pairs.def > (AARCH64_FUSE_NEOVERSE_BASE): > > New. > > * config/aarch64/tuning_models/cortexx925.h: Use it. > > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > > * config/aarch64/tuning_models/neoversen2.h: Use it. > > * config/aarch64/tuning_models/neoversen3.h: Use it. > > * config/aarch64/tuning_models/neoversev1.h: Use it. > > * config/aarch64/tuning_models/neoversev2.h: Use it. > > * config/aarch64/tuning_models/neoversev3.h: Use it. > > * config/aarch64/tuning_models/neoversev3ae.h: Use it. > > > > --- > > > > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > b/gcc/config/aarch64/aarch64-fusion-pairs.def > > index > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 > df9ae7a196031286d 100644 > > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > > /* Baseline fusion settings suitable for all cores. */ > > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | > AARCH64_FUSE_AES_AESMC) > > > > +/* Baseline fusion settings suitable for all Neoverse cores. 
*/ > > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > AARCH64_FUSE_CMP_CSEL \ > > + | AARCH64_FUSE_CMP_CSET) > > + > > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | > AARCH64_FUSE_MOVK_MOVK) > > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > > index > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 > def277230e193d20f 100644 > > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > > 2 /* store_pred. */ > >}, /* memmov_cost. */ > >10, /* issue_rate */ > > - AARCH64_FUSE_BASE, /* fusible_ops */ > > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >"32:16", /* function_align. */ > >"4", /* jump_align. */ > >"32:16", /* loop_align. */ > > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > index > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 > 5a5df7803ff4dc6c 100644 > > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > > @@ -236,7 +236,7 @@ static const struct tune_params > generic_armv9_a_tunings = > > 1 /* store_pred. */ > >}, /* memmov_cost. */ > >3, /* issue_rate */ > > - AARCH64_FUSE_BASE, /* fusible_ops */ > > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >"32:16", /* function_align. */ > >"4", /* jump_align. */ > >"32:16", /* loop_align. */ > > Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think > we have to be careful about using it for generic tuning. generic-armv9-a > mustn't become "generic Neoverse", since it's supposed to be good for > non-Neoverse (and non-Arm) Armv9-A cores as well. > > So perhaps here we should expand the macro. 
Alternatively, we could add > a comment saying that fusion macros for other CPUs can be added here as > well, if they are unlikely to have a negative impact on other cores. > > Perhaps we should expand the macro for cortex-x925 as well. Sorry that certainly was not the intention, I thought about naming it AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for non-Armv9 cores I thought this would be equally confusing. But perhaps In light of your comments it may make more sense naming wise? Thanks, Tamar > > LGTM otherwise, but let's see whether there are any other comments. > > Thanks, > Richard > > > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > > index > c407b89a22f1aecbfd594b493be4fbaf1f9b0437..f72505918f3aa64200aa596db > e2d7d4a3de9c08c 100644 > > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > > +++ b/gcc/config/aarch64/tuning_model
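For readers following the thread: the macro under discussion is just a bitwise OR of fusion flags, so expanding it at a use site, as suggested for generic_armv9_a.h, is functionally identical. A self-contained sketch — the bit values below are illustrative, not the real assignments generated from aarch64-fusion-pairs.def:

```cpp
#include <cassert>

// Illustrative bit assignments -- not the real values generated from
// aarch64-fusion-pairs.def.
enum : unsigned
{
  AARCH64_FUSE_CMP_BRANCH = 1u << 0,
  AARCH64_FUSE_AES_AESMC  = 1u << 1,
  AARCH64_FUSE_CMP_CSEL   = 1u << 2,
  AARCH64_FUSE_CMP_CSET   = 1u << 3,
};

// Baseline fusion settings suitable for all cores.
constexpr unsigned AARCH64_FUSE_BASE
  = AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC;

// The macro the patch adds: the base set plus the two new CMP fusions.
// Writing the OR out at a use site yields exactly the same bits.
constexpr unsigned AARCH64_FUSE_NEOVERSE_BASE
  = AARCH64_FUSE_BASE | AARCH64_FUSE_CMP_CSEL | AARCH64_FUSE_CMP_CSET;
```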
Re: [PATCH V2] RISC-V: Update Xsfvqmacc and Xsfvfnrclip extension's testcases.
> diff --git > a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > index 813f7860f645..3b46505e8f99 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vfnrclip_x_f_qf.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vfnrclip_x_f_qf_i8mf8_vint8mf8_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,mf2+,ta+,ma+ This should be e8, mf8, you can also check with llvm's implementation: https://godbolt.org/z/xoeYrK6o7 > ** sf\.vfnrclip\.x\.f\.qf\tv[0-9]+,v[0-9]+,fa[0-9]+ > ** ... > */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > index f2058a14779b..6bb659b5d233 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_2x8x2.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vqmacc_2x8x2_i32m1_vint32m1_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,m1,ta,ma+ This should be e8, m1, you can also check with llvm's implementation: https://godbolt.org/z/ns8T65YTh > ** sf\.vqmacc\.2x8x2\tv[0-9]+,v[0-9]+,v[0-9]+ > ** ... > */ > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > index 3bd6f1c273cd..8106d0dbbaba 100644 > --- a/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > +++ b/gcc/testsuite/gcc.target/riscv/rvv/xsfvector/sf_vqmacc_4x8x4.c > @@ -7,6 +7,7 @@ > /* > ** test_sf_vqmacc_4x8x4_i32m1_vint32m1_t: > ** ... > +** vsetivli\s+zero+,0+,e32+,m1,ta,ma+ This should be e8, mf2, you can also check with llvm's implementation: https://godbolt.org/z/h3q6q369q > ** sf\.vqmacc\.4x8x4\tv[0-9]+,v[0-9]+,v[0-9]+ > ** ... > */
Re: [PATCH 2/2] AArch64: Set L1 data cache size according to size on CPUs
Tamar Christina writes: > Hi All, > > This sets the L1 data cache size for some cores based on their size in their > Technical Reference Manuals. > > Today the port minimum is 256 bytes as explained in commit > g:9a99559a478111f7fbeec29bd78344df7651c707, however like Neoverse V2 most > cores > actually define the L1 cache size as 64-bytes. The generic Armv9-A model was > already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this > change follows suite for a few other cores based on their TRMs. > > This results in less memory pressure when running on large core count > machines. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. > * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. > * config/aarch64/tuning_models/neoversen1.h: Likewise. > * config/aarch64/tuning_models/neoversen2.h: Likewise. > * config/aarch64/tuning_models/neoversen3.h: Likewise. > * config/aarch64/tuning_models/neoversev1.h: Likewise. > * config/aarch64/tuning_models/neoversev2.h: Likewise. > * config/aarch64/tuning_models/neoversev3.h: Likewise. > * config/aarch64/tuning_models/neoversev3ae.h: Likewise. > > --- > > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > b/gcc/config/aarch64/tuning_models/cortexx925.h > index > b3ae1576ade1f701b775496def277230e193d20f..f32454aa808bb2a19bf4d364387ab463a87f9fbf > 100644 > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > @@ -222,7 +222,7 @@ static const struct tune_params cortexx925_tunings = > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. 
*/ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > index > f72505918f3aa64200aa596dbe2d7d4a3de9c08c..007f987154c4634afeb6294b0df142fdd05997cd > 100644 > --- a/gcc/config/aarch64/tuning_models/neoverse512tvb.h > +++ b/gcc/config/aarch64/tuning_models/neoverse512tvb.h > @@ -158,7 +158,7 @@ static const struct tune_params neoverse512tvb_tunings = >(AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen1.h > b/gcc/config/aarch64/tuning_models/neoversen1.h > index > 82def6b2736df8162d9b606440d260c951f3ef99..608e36936ad437befa9b8663bc4c91822b05befc > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen1.h > +++ b/gcc/config/aarch64/tuning_models/neoversen1.h > @@ -52,7 +52,7 @@ static const struct tune_params neoversen1_tunings = >0, /* max_case_values. */ >tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ >(AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS/* stp_policy_model. 
*/ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen2.h > b/gcc/config/aarch64/tuning_models/neoversen2.h > index > 47d861232951f92bebbd09fbca8b1ff8624949fc..13d4640791ca998ca012002c932e8775b1898733 > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen2.h > +++ b/gcc/config/aarch64/tuning_models/neoversen2.h > @@ -222,7 +222,7 @@ static const struct tune_params neoversen2_tunings = > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT > | AARCH64_EXTRA_TUNE_AVOID_PRED_RMW), /* tune_flags. */ > - &generic_prefetch_tune, > + &generic_armv9a_prefetch_tune, >AARCH64_LDP_STP_POLICY_ALWAYS, /* ldp_policy_model. */ >AARCH64_LDP_STP_POLICY_ALWAYS /* stp_policy_model. */ > }; > diff --git a/gcc/config/aarch64/tuning_models/neoversen3.h > b/gcc/config/aarch64/tuning_models/neoversen3.h > index > 356b5f8cc83b69e74ddac70db7a3883bc1b7a29b..ef15dc8c3015bb406a6957346cf47bacf305d884 > 100644 > --- a/gcc/config/aarch64/tuning_models/neoversen3.h > +++ b/gcc/config/aarch64/tuning_models/neoversen3.h > @@ -221,7 +221,7 @@ static const struct tune_params neoversen3_tunings = > | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS > | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS > | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT), /* tune_flags. */ > - &gener
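As background to the patch above: the tuning change models a 64-byte L1 data cache line rather than the old 256-byte port minimum, and that modelled line size is what code typically uses to pad shared data against false sharing. A hedged C++ illustration — the constant and struct below are hypothetical, not GCC source:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: the 64-byte L1 data cache line the patch models
// for these cores (the previous port-wide minimum was 256 bytes).
constexpr std::size_t kAssumedL1LineSize = 64;

// Padding hot per-thread data out to a full line keeps two adjacent
// elements from sharing a cache line (false sharing).
struct alignas(kAssumedL1LineSize) PerThreadCounter
{
  unsigned long value;
};
```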
Re: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
Tamar Christina writes: >> -Original Message- >> From: Richard Sandiford >> Sent: Wednesday, December 11, 2024 9:32 AM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; ktkac...@gcc.gnu.org >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that >> support it >> >> Tamar Christina writes: >> > Hi All, >> > >> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. >> > >> > This patch enables them for cores that support based on their Software >> > Optimization Guides and generically on Armv9-A. Even if a core does not >> > support it there's no negative performance impact. >> > >> > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> > >> > Ok for master? >> > >> > Thanks, >> > Tamar >> > >> > gcc/ChangeLog: >> > >> >* config/aarch64/aarch64-fusion-pairs.def >> (AARCH64_FUSE_NEOVERSE_BASE): >> >New. >> >* config/aarch64/tuning_models/cortexx925.h: Use it. >> >* config/aarch64/tuning_models/generic_armv9_a.h: Use it. >> >* config/aarch64/tuning_models/neoverse512tvb.h: Use it. >> >* config/aarch64/tuning_models/neoversen2.h: Use it. >> >* config/aarch64/tuning_models/neoversen3.h: Use it. >> >* config/aarch64/tuning_models/neoversev1.h: Use it. >> >* config/aarch64/tuning_models/neoversev2.h: Use it. >> >* config/aarch64/tuning_models/neoversev3.h: Use it. >> >* config/aarch64/tuning_models/neoversev3ae.h: Use it. >> > >> > --- >> > >> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def >> b/gcc/config/aarch64/aarch64-fusion-pairs.def >> > index >> f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 >> df9ae7a196031286d 100644 >> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def >> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def >> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) >> > /* Baseline fusion settings suitable for all cores. 
*/ >> > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | >> AARCH64_FUSE_AES_AESMC) >> > >> > +/* Baseline fusion settings suitable for all Neoverse cores. */ >> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | >> AARCH64_FUSE_CMP_CSEL \ >> > + | AARCH64_FUSE_CMP_CSET) >> > + >> > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | >> AARCH64_FUSE_MOVK_MOVK) >> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h >> b/gcc/config/aarch64/tuning_models/cortexx925.h >> > index >> eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 >> def277230e193d20f 100644 >> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h >> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h >> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = >> > 2 /* store_pred. */ >> >}, /* memmov_cost. */ >> >10, /* issue_rate */ >> > - AARCH64_FUSE_BASE, /* fusible_ops */ >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >> >"32:16",/* function_align. */ >> >"4",/* jump_align. */ >> >"32:16",/* loop_align. */ >> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > index >> 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 >> 5a5df7803ff4dc6c 100644 >> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h >> > @@ -236,7 +236,7 @@ static const struct tune_params >> generic_armv9_a_tunings = >> > 1 /* store_pred. */ >> >}, /* memmov_cost. */ >> >3, /* issue_rate */ >> > - AARCH64_FUSE_BASE, /* fusible_ops */ >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ >> >"32:16",/* function_align. */ >> >"4",/* jump_align. */ >> >"32:16",/* loop_align. */ >> >> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think >> we have to be careful about using it for generic tuning. 
generic-armv9-a >> mustn't become "generic Neoverse", since it's supposed to be good for >> non-Neoverse (and non-Arm) Armv9-A cores as well. >> >> So perhaps here we should expand the macro. Alternatively, we could add >> a comment saying that fusion macros for other CPUs can be added here as >> well, if they are unlikely to have a negative impact on other cores. >> >> Perhaps we should expand the macro for cortex-x925 as well. > > Sorry that certainly was not the intention, I thought about naming it > AARCH64_FUSE_ARMV9_BASE instead, but since I was also using it for > non-Armv9 cores I thought this would be equally confusing. But perhaps > In light of your comments it may make more sense naming wise? Yeah, I'd also wondered about suggesting that, but then saw that there were quite a few Armv9-A cores that don't have the flag. So I think AARCH64_FUSE_NEOVERSE_BASE is a good name, and is what we should use for the Neoverse tunings. But perhaps we should expand the macro for the two
RE: [PATCH 1/2] AArch64: Add CMP+CSEL and CMP+CSET for cores that support it
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, December 11, 2024 9:50 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > support it > > Tamar Christina writes: > >> -Original Message- > >> From: Richard Sandiford > >> Sent: Wednesday, December 11, 2024 9:32 AM > >> To: Tamar Christina > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > >> ; ktkac...@gcc.gnu.org > >> Subject: Re: [PATCH 1/2]AArch64: Add CMP+CSEL and CMP+CSET for cores that > >> support it > >> > >> Tamar Christina writes: > >> > Hi All, > >> > > >> > GCC 15 added two new fusions CMP+CSEL and CMP+CSET. > >> > > >> > This patch enables them for cores that support based on their Software > >> > Optimization Guides and generically on Armv9-A. Even if a core does not > >> > support it there's no negative performance impact. > >> > > >> > > >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > >> > > >> > Ok for master? > >> > > >> > Thanks, > >> > Tamar > >> > > >> > gcc/ChangeLog: > >> > > >> > * config/aarch64/aarch64-fusion-pairs.def > >> (AARCH64_FUSE_NEOVERSE_BASE): > >> > New. > >> > * config/aarch64/tuning_models/cortexx925.h: Use it. > >> > * config/aarch64/tuning_models/generic_armv9_a.h: Use it. > >> > * config/aarch64/tuning_models/neoverse512tvb.h: Use it. > >> > * config/aarch64/tuning_models/neoversen2.h: Use it. > >> > * config/aarch64/tuning_models/neoversen3.h: Use it. > >> > * config/aarch64/tuning_models/neoversev1.h: Use it. > >> > * config/aarch64/tuning_models/neoversev2.h: Use it. > >> > * config/aarch64/tuning_models/neoversev3.h: Use it. > >> > * config/aarch64/tuning_models/neoversev3ae.h: Use it. 
> >> > > >> > --- > >> > > >> > diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def > >> b/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > index > >> > f8413ab0c802c28290ebcc171bfd131622cb33be..0123430d988b96b20d20376 > >> df9ae7a196031286d 100644 > >> > --- a/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > +++ b/gcc/config/aarch64/aarch64-fusion-pairs.def > >> > @@ -45,4 +45,8 @@ AARCH64_FUSION_PAIR ("cmp+cset", CMP_CSET) > >> > /* Baseline fusion settings suitable for all cores. */ > >> > #define AARCH64_FUSE_BASE (AARCH64_FUSE_CMP_BRANCH | > >> AARCH64_FUSE_AES_AESMC) > >> > > >> > +/* Baseline fusion settings suitable for all Neoverse cores. */ > >> > +#define AARCH64_FUSE_NEOVERSE_BASE (AARCH64_FUSE_BASE | > >> AARCH64_FUSE_CMP_CSEL \ > >> > +| AARCH64_FUSE_CMP_CSET) > >> > + > >> > #define AARCH64_FUSE_MOVK (AARCH64_FUSE_MOV_MOVK | > >> AARCH64_FUSE_MOVK_MOVK) > >> > diff --git a/gcc/config/aarch64/tuning_models/cortexx925.h > >> b/gcc/config/aarch64/tuning_models/cortexx925.h > >> > index > >> > eb9b89984b0472858bc08dba924c962ec4ba53bd..b3ae1576ade1f701b775496 > >> def277230e193d20f 100644 > >> > --- a/gcc/config/aarch64/tuning_models/cortexx925.h > >> > +++ b/gcc/config/aarch64/tuning_models/cortexx925.h > >> > @@ -205,7 +205,7 @@ static const struct tune_params cortexx925_tunings = > >> > 2 /* store_pred. */ > >> >}, /* memmov_cost. */ > >> >10, /* issue_rate */ > >> > - AARCH64_FUSE_BASE, /* fusible_ops */ > >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >> >"32:16", /* function_align. */ > >> >"4", /* jump_align. */ > >> >"32:16", /* loop_align. 
*/ > >> > diff --git a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > index > >> > 48353a59939d84647c6981d6d0551af7ce9df751..e971e645dfc4b805b7f994a1 > >> 5a5df7803ff4dc6c 100644 > >> > --- a/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > +++ b/gcc/config/aarch64/tuning_models/generic_armv9_a.h > >> > @@ -236,7 +236,7 @@ static const struct tune_params > >> generic_armv9_a_tunings = > >> > 1 /* store_pred. */ > >> >}, /* memmov_cost. */ > >> >3, /* issue_rate */ > >> > - AARCH64_FUSE_BASE, /* fusible_ops */ > >> > + AARCH64_FUSE_NEOVERSE_BASE, /* fusible_ops */ > >> >"32:16", /* function_align. */ > >> >"4", /* jump_align. */ > >> >"32:16", /* loop_align. */ > >> > >> Having AARCH64_FUSE_NEOVERSE_BASE seems like a good thing, but I think > >> we have to be careful about using it for generic tuning. generic-armv9-a > >> mustn't become "generic Neoverse", since it's supposed to be good for > >> non-Neoverse (and non-Arm) Armv9-A cores as well. > >> > >> So perhaps here we should expand the macro. Alternatively, we could add > >> a comment saying that fusion macros for other CPUs can be added here as > >> well, if they are unlikely to have a negative impact on other cores. > >> > >> Perhaps we should expand the macro for cortex-x925 as well. > > > > Sorry that certainly was not the intention, I thought about naming it > > AARCH6
Re: [PATCH] AIX Build failure with default -std=gnu23.
Thank you everyone. I have pushed the changes. Thanks, Sangamesh On Mon, Dec 9, 2024 at 12:48 PM swamy sangamesh wrote: > Hi David, > > I don't have write privileges for GCC repo. > I have raised a request for write access and provided your email address > as approver. > Please approve the request. > > Thanks, > Sangamesh > > > On Mon, Dec 9, 2024 at 3:17 AM David Edelsohn wrote: > >> This revised patch is okay. >> >> You are listed in the FSF copyrights file for GCC GDB GLIBC BINUTILS, but >> do you have write privileges for the GCC repo? You are not listed in >> gcc/MAINTAINERS for write-after-approval. >> >> Thanks, David >> >> On Sun, Dec 8, 2024 at 10:49 AM swamy sangamesh < >> swamy.sangam...@gmail.com> wrote: >> >>> Thank you all for the review and comments. >>> >>> David, I have tested the changes against the latest master and ran >>> default test cases. >>> I have also built and ran the default test cases on RHEL9.0 ppc64le. >>> Please let me know if any other testing needs to be covered. >>> >>> Please find the latest patch attached. >>> >>> >>> >>> On Sat, Dec 7, 2024 at 2:31 AM David Edelsohn wrote: >>> On Fri, Dec 6, 2024 at 2:17 PM Ian Lance Taylor wrote: > David Edelsohn writes: > > > On Fri, Dec 6, 2024 at 12:25 PM Rainer Orth < > r...@cebitec.uni-bielefeld.de> > > wrote: > > > >> Hi David, > >> > >> > No objection from me, but Ian is the maintainer of libiberty, so > I'll > >> defer > >> > to him, especially about style and overall software engineering. > >> > > >> > The C23 change presumably will break on Alpha OSF/1 as well. > Does GCC > >> > still support OSF/1? It might be preferred to delete the block > entirely > >> > instead of #ifndef _AIX. > >> > >> GCC 4.7 was the last release to support Tru64 UNIX (ex-OSF/1). > However, > >> libiberty is also used outside of the toolchain, so that may affect > the > >> decision. > >> > >> However, IMO the Tru64 UNIX support can go for good now. 
> >> > > > > Hi, Rainer > > > > Thanks for taking a look and commenting. > > > > It seems we both agree that it would be better to remove the entire > block > > defining _NO_PROTO because both of the systems are no longer > supported. > > > > I'll give Ian the opportunity to comment. > > Looks good to me. Thanks. > > Ian > Sangamesh, Can you respin and test a revised patch that removes the conditional _NO_PROTO definition instead of adding #ifndef _AIX? I think that is what Rainer and I would prefer because neither of the OSes is supported and we don't need a fragile work-around. Thanks, David >>> >>> -- >>> Thanks & Regards, >>> Sangamesh >>> >> > > -- > Thanks & Regards, > Sangamesh > -- Thanks & Regards, Sangamesh
[PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
This was approved in Wrocław as LWG 3900, so that passing an incorrect argument intended as an allocator will be ill-formed, instead of silently ignored. This also renames the template parameters and function parameters for the allocators, to align with the names in the standard. I found it too confusing to have a parameter _Alloc which doesn't correspond to Alloc in the standard. Rename _Alloc to _Allocator (which the standard calls Allocator) and rename _Na to _Alloc (which the standard calls Alloc). libstdc++-v3/ChangeLog: * include/std/generator (_Promise_alloc): Rename template parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Replace constraints with a static_assert in the body. Rename allocator parameter. (_Promise_alloc::_M_allocate): Rename allocator parameter. Use __alloc_rebind to rebind allocator. (_Promise_alloc::operator new): Rename allocator parameter. * testsuite/24_iterators/range_generators/alloc.cc: New test. * testsuite/24_iterators/range_generators/lwg3900.cc: New test. --- Tested x86_64-linux. 
libstdc++-v3/include/std/generator| 55 +-- .../24_iterators/range_generators/alloc.cc| 45 +++ .../24_iterators/range_generators/lwg3900.cc | 16 ++ 3 files changed, 87 insertions(+), 29 deletions(-) create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc create mode 100644 libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator index 0a14e70064e..16b953e90af 100644 --- a/libstdc++-v3/include/std/generator +++ b/libstdc++-v3/include/std/generator @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION concept _Stateless_alloc = (allocator_traits<_All>::is_always_equal::value && default_initializable<_All>); -template +template class _Promise_alloc { - using _ATr = allocator_traits<_Alloc>; - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename _ATr - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION public: void* operator new(std::size_t __sz) - requires default_initializable<_Rebound> // _Alloc is non-void + requires default_initializable<_Rebound> // _Allocator is non-void { return _M_allocate({}, __sz); } - template + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 3900. The allocator_arg_t overloads of promise_type::operator new + // should not be constrained + template void* operator new(std::size_t __sz, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) 
- requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } - template + template void* operator new(std::size_t __sz, const _This&, -allocator_arg_t, const _Na& __na, +allocator_arg_t, const _Alloc& __a, const _Args&...) - requires convertible_to { - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), -__sz); + static_assert(convertible_to); + return _M_allocate(_Rebound(_Allocator(__a)), __sz); } void @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } - template + template static void* - _M_allocate(const _Na& __na, std::size_t __csz) + _M_allocate(const _Alloc& __a, std::size_t __csz) { - using _Rebound = typename std::allocator_traits<_Na> - ::template rebind_alloc<_Alloc_block>; - using _Rebound_ATr = typename std::allocator_traits<_Na> - ::template rebind_traits<_Alloc_block>; + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; + using _Rebound_ATr = allocator_traits<_Rebound>; static_assert(is_pointer_v, "Must use allocators for true pointers with generators"); _Dealloc_fn __d = &_M_deallocator<_Rebound>; - auto __b = static_cast<_Rebound>(__na); + auto __b = static_cast<_Rebound>(__a); auto __asz = _M_alloc_size<_Rebound>(__csz);
[committed] libstdc++: Make std::println use locale from ostream (LWG 4088)
This was just approved in Wrocław. libstdc++-v3/ChangeLog: * include/std/ostream (println): Pass stream's locale to std::format, as per LWG 4088. * testsuite/27_io/basic_ostream/print/1.cc: Check std::println with custom locale. Remove unused brit_punc class. --- Tested x86_64-linux. Pushed to trunk. libstdc++-v3/include/std/ostream | 6 -- .../testsuite/27_io/basic_ostream/print/1.cc | 18 +- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream index 637aad5a5a4..327313a881d 100644 --- a/libstdc++-v3/include/std/ostream +++ b/libstdc++-v3/include/std/ostream @@ -1028,8 +1028,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline void println(ostream& __os, format_string<_Args...> __fmt, _Args&&... __args) { - std::print(__os, "{}\n", -std::format(__fmt, std::forward<_Args>(__args)...)); + // _GLIBCXX_RESOLVE_LIB_DEFECTS + // 4088. println ignores the locale imbued in std::ostream + std::print(__os, "{}\n", std::format(__os.getloc(), __fmt, + std::forward<_Args>(__args)...)); } // Defined for C++26, supported as an extension to C++23. diff --git a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc index cd4b116ac1c..183e08733d2 100644 --- a/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc +++ b/libstdc++-v3/testsuite/27_io/basic_ostream/print/1.cc @@ -63,14 +63,6 @@ test_vprint_nonunicode() // { dg-output "garbage in . garbage out" } } -struct brit_punc : std::numpunct -{ - std::string do_grouping() const override { return "\3\3"; } - char do_thousands_sep() const override { return ','; } - std::string do_truename() const override { return "yes mate"; } - std::string do_falsename() const override { return "nah bruv"; } -}; - void test_locale() { @@ -82,7 +74,7 @@ test_locale() // The default C locale. std::locale cloc = std::locale::classic(); - // A custom locale using comma digit separators. 
+ // A custom locale using tilde digit separators. std::locale bloc(cloc, new stream_punc); { @@ -101,6 +93,14 @@ test_locale() std::print(os, "{:L} {}", 12345, 6789); VERIFY(os.str() == "1~23~45 6789"); } + + { +// LWG 4088. println ignores the locale imbued in std::ostream +std::ostringstream os; +os.imbue(bloc); +std::println(os, "{:L} {}", 12345, 6789); +VERIFY(os.str() == "1~23~45 6789\n"); + } } void -- 2.47.1
Re: [PATCH] libstdc++: Remove constraints on std::generator::promise_type::operator new
On Wed, 11 Dec 2024 at 11:02, Jonathan Wakely wrote: > > This was approved in Wrocław as LWG 3900, so that passing an incorrect > argument intended as an allocator will be ill-formed, instead of > silently ignored. > > This also renames the template parameters and function parameters for > the allocators, to align with the names in the standard. I found it too > confusing to have a parameter _Alloc which doesn't correspond to Alloc > in the standard. Rename _Alloc to _Allocator (which the standard calls > Allocator) and rename _Na to _Alloc (which the standard calls Alloc). > > libstdc++-v3/ChangeLog: > > * include/std/generator (_Promise_alloc): Rename template > parameter. Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Replace constraints with a > static_assert in the body. Rename allocator parameter. > (_Promise_alloc::_M_allocate): Rename allocator parameter. > Use __alloc_rebind to rebind allocator. > (_Promise_alloc::operator new): Rename allocator > parameter. > * testsuite/24_iterators/range_generators/alloc.cc: New test. > * testsuite/24_iterators/range_generators/lwg3900.cc: New test. > --- > > Tested x86_64-linux. 
> > libstdc++-v3/include/std/generator| 55 +-- > .../24_iterators/range_generators/alloc.cc| 45 +++ > .../24_iterators/range_generators/lwg3900.cc | 16 ++ > 3 files changed, 87 insertions(+), 29 deletions(-) > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/alloc.cc > create mode 100644 > libstdc++-v3/testsuite/24_iterators/range_generators/lwg3900.cc > > diff --git a/libstdc++-v3/include/std/generator > b/libstdc++-v3/include/std/generator > index 0a14e70064e..16b953e90af 100644 > --- a/libstdc++-v3/include/std/generator > +++ b/libstdc++-v3/include/std/generator > @@ -416,13 +416,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > concept _Stateless_alloc = > (allocator_traits<_All>::is_always_equal::value > && default_initializable<_All>); > > -template > +template >class _Promise_alloc >{ > - using _ATr = allocator_traits<_Alloc>; > - using _Rebound = typename _ATr::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename _ATr > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Allocator, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > static_assert(is_pointer_v, > "Must use allocators for true pointers with > generators"); > > @@ -465,30 +463,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >public: > void* > operator new(std::size_t __sz) > - requires default_initializable<_Rebound> // _Alloc is non-void > + requires default_initializable<_Rebound> // _Allocator is non-void > { return _M_allocate({}, __sz); } > > - template > + // _GLIBCXX_RESOLVE_LIB_DEFECTS > + // 3900. The allocator_arg_t overloads of promise_type::operator new > + // should not be constrained > + template > void* > operator new(std::size_t __sz, > -allocator_arg_t, const _Na& __na, > +allocator_arg_t, const _Alloc& __a, > const _Args&...) 
> - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > -__sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > - template > + template > void* > operator new(std::size_t __sz, > const _This&, > -allocator_arg_t, const _Na& __na, > +allocator_arg_t, const _Alloc& __a, > const _Args&...) > - requires convertible_to > { > - return _M_allocate(static_cast<_Rebound>(static_cast<_Alloc>(__na)), > -__sz); > + static_assert(convertible_to); > + return _M_allocate(_Rebound(_Allocator(__a)), __sz); > } > > void > @@ -578,20 +577,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > } > > - template > + template > static void* > - _M_allocate(const _Na& __na, std::size_t __csz) > + _M_allocate(const _Alloc& __a, std::size_t __csz) > { > - using _Rebound = typename std::allocator_traits<_Na> > - ::template rebind_alloc<_Alloc_block>; > - using _Rebound_ATr = typename std::allocator_traits<_Na> > - ::template rebind_traits<_Alloc_block>; > + using _Rebound = __alloc_rebind<_Alloc, _Alloc_block>; > + using _Rebound_ATr = allocator_traits<_Rebound>; > > static_assert(is_pointer_v, >
[PATCH] libstdc++: Avoid redundant assertions in std::span constructors
Any std::span constructor with a runtime length has a precondition that
the length is equal to N (except when N == std::dynamic_extent).
Currently every constructor with a runtime length does:

  if constexpr (extent != dynamic_extent)
    __glibcxx_assert(n == extent);

We can move those assertions into the __detail::__extent_storage
constructor so they are only done in one place. To avoid checking the
assertions when we have a constant length we can add a second
constructor which is consteval and takes an integral_constant argument.
The std::span constructors can pass a size_t for runtime lengths and a
std::integral_constant for constant lengths that don't need to be
checked. The __detail::__extent_storage<dynamic_extent> specialization
only needs one constructor, as a std::integral_constant argument can
implicitly convert to size_t.

For the member functions that return a subspan with a constant extent
we return std::span(ptr, C), which is redundant in two ways. Repeating
the constant length C when it's already a template argument is
redundant, and using the std::span(T*, size_t) constructor implies a
runtime length, which will do a redundant assertion check. Even though
that assertion won't fail and should be optimized away, it's still
unnecessary code that doesn't need to be instantiated and then
optimized away again. We can avoid that by adding a new private
constructor that only takes a pointer (wrapped in a custom tag struct
to avoid accidentally using that constructor) and automatically sets
_M_extent to the correct value.

libstdc++-v3/ChangeLog:

	* include/std/span (__detail::__extent_storage): Check
	precondition in constructor.  Add consteval constructor for
	valid lengths and deleted constructor for invalid constant
	lengths.  Make member functions always_inline.
	(__detail::__span_ptr): New class template.
	(span): Adjust constructors to use a std::integral_constant
	value for constant lengths.  Declare all specializations of
	std::span as friends.
(span::first, span::last, span::subspan): Use new private constructor. (span(__span_ptr)): New private constructor for constant lengths. --- As suggested at https://gcc.gnu.org/pipermail/gcc-patches/2024-December/671431.html Tested x86_64-linux. libstdc++-v3/include/std/span | 78 --- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span index 67227348c90..e8043c02c9a 100644 --- a/libstdc++-v3/include/std/span +++ b/libstdc++-v3/include/std/span @@ -73,10 +73,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __extent_storage { public: + // Used for runtime sizes that must satisfy the precondition. constexpr - __extent_storage(size_t) noexcept + __extent_storage([[maybe_unused]] size_t __n) noexcept + { __glibcxx_assert(__n == _Extent); } + + // Used for constant sizes that are already known to be correct. + consteval + __extent_storage(integral_constant) noexcept { } + // "I've made a huge mistake" - George Oscar Bluth II + template + __extent_storage(integral_constant) = delete; + + [[__gnu__::__always_inline__]] static constexpr size_t _M_extent() noexcept { return _Extent; } @@ -86,11 +97,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION class __extent_storage { public: + [[__gnu__::__always_inline__]] constexpr __extent_storage(size_t __extent) noexcept : _M_extent_value(__extent) { } + [[__gnu__::__always_inline__]] constexpr size_t _M_extent() const noexcept { return this->_M_extent_value; } @@ -98,6 +111,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION private: size_t _M_extent_value; }; + +template struct __span_ptr { _Type* const _M_ptr; }; + } // namespace __detail template @@ -128,6 +144,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Nested type so that _Type is not an associated class of iterator. 
struct __iter_tag; + template + static inline constexpr integral_constant __v{}; + public: // member types using element_type = _Type; @@ -153,7 +172,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION constexpr span() noexcept requires (_Extent == dynamic_extent || _Extent == 0) - : _M_ptr(nullptr), _M_extent(0) + : _M_ptr(nullptr), _M_extent(__v<0>) { } template @@ -162,13 +181,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION span(_It __first, size_type __count) noexcept : _M_ptr(std::to_address(__first)), _M_extent(__count) - { - if constexpr (_Extent != dynamic_extent) - { - __glibcxx_assert(__count == _Extent); - } - __glibcxx_requires_valid_range(__first, __first + __count); - } + {
[PATCH] libstdc++: Add lvalue overload for generator::yield_value
This was approved in Wrocław as LWG 3899.

libstdc++-v3/ChangeLog:

	* include/std/generator (generator::yield_value): Add overload
	taking lvalue element_of view, as per LWG 3899.
---
The issue suggests that this change avoids creating a new coroutine
frame, so I thought if I used a custom allocator I'd be able to see a
reduction in memory allocation after this fix. I was unable to see any
difference. I would welcome a testcase for it.

Tested x86_64-linux.

 libstdc++-v3/include/std/generator | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/libstdc++-v3/include/std/generator b/libstdc++-v3/include/std/generator
index bba85bd0aa4..807e724c0c8 100644
--- a/libstdc++-v3/include/std/generator
+++ b/libstdc++-v3/include/std/generator
@@ -153,6 +153,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	noexcept
 	{ return _Recursive_awaiter { std::move(__r.range) }; }

+      // _GLIBCXX_RESOLVE_LIB_DEFECTS
+      // 3899. co_yielding elements of an lvalue generator is
+      // unnecessarily inefficient
+      template
+	requires std::same_as<_Yield2_t<_R2, _V2>, _Yielded>
+	auto
+	yield_value(ranges::elements_of&, _U2> __r)
+	noexcept
+	{ return _Recursive_awaiter { std::move(__r.range) }; }
+
       template
 	requires convertible_to, _Yielded>
 	auto
-- 
2.47.1
Re: [PATCH] libstdc++: add initializer_list constructor to std::span (P2447)
On Wed, 11 Dec 2024 at 22:53, Jonathan Wakely wrote: > > On 03/12/24 17:05 +0100, Giuseppe D'Angelo wrote: > >Hello, > > > >The attached patch adds the span(initializer_list) constructor, added > >by P2447R6 for C++26. > > > >It seems that GCC is somewhat aggressive with its -Winit-list-lifetime > >warning, which actively interferes with this feature. The idea of the > >new constructor is to allow calls like: > > > > void f(std::span); > > f({1, 2, 3}); > > > >which is completely fine as the lifetime of the initializer_list > >encompasses the one of the std::span parameter. However GCC complains > >about the risk of dangling here. I've therefore disabled the warning > >for the new constructor. > > > >Thanks, > >-- > >Giuseppe D'Angelo > > >From bb1d537ee7b68883403127903834427c6787c505 Mon Sep 17 00:00:00 2001 > >From: Giuseppe D'Angelo > >Date: Tue, 3 Dec 2024 16:56:45 +0100 > >Subject: [PATCH] libstdc++: add initializer_list constructor to std::span > > (P2447) > > > >This commit implements P2447R6. The code is straightforward (just one > >extra constructor, with constraints and conditional explicit). > > > >I decided to suppress -Winit-list-lifetime because otherwise it would > >give too many false positives. The new constructor is meant to be used > >as a parameter-passing interface (this is a design choice, see > >P2447R6/2) and, as such, the initializer_list won't dangle despite GCC's > >warnings. > > > >The new constructor isn't 100% backwards compatible. A couple of > >examples are included in Annex C, but I have also lifted some more > >from R4. A new test checks for the old and the new behaviors. > > > >libstdc++-v3/ChangeLog: > > > > * include/bits/version.def: Added the new feature-testing macro. > > * include/bits/version.h (defined): Regenerated. > > * include/std/span: Added constructor from initializer_list. > > * testsuite/23_containers/span/init_list_cons.cc: New test. > > * testsuite/23_containers/span/init_list_cons_neg.cc: New test. 
> >--- > > libstdc++-v3/include/bits/version.def | 8 +++ > > libstdc++-v3/include/bits/version.h | 10 +++ > > libstdc++-v3/include/std/span | 21 ++ > > .../23_containers/span/init_list_cons.cc | 65 +++ > > .../23_containers/span/init_list_cons_neg.cc | 31 + > > 5 files changed, 135 insertions(+) > > create mode 100644 > > libstdc++-v3/testsuite/23_containers/span/init_list_cons.cc > > create mode 100644 > > libstdc++-v3/testsuite/23_containers/span/init_list_cons_neg.cc > > > >diff --git a/libstdc++-v3/include/bits/version.def > >b/libstdc++-v3/include/bits/version.def > >index 8d4b8e9b383..cfa0469fb2d 100644 > >--- a/libstdc++-v3/include/bits/version.def > >+++ b/libstdc++-v3/include/bits/version.def > >@@ -1853,6 +1853,14 @@ ftms = { > > }; > > }; > > > >+ftms = { > >+ name = span_initializer_list; > >+ values = { > >+v = 202311; > >+cxxmin = 26; > >+ }; > >+}; > >+ > > ftms = { > > name = text_encoding; > > values = { > >diff --git a/libstdc++-v3/include/bits/version.h > >b/libstdc++-v3/include/bits/version.h > >index c556aca38fa..6a2c66bdf81 100644 > >--- a/libstdc++-v3/include/bits/version.h > >+++ b/libstdc++-v3/include/bits/version.h > >@@ -2055,6 +2055,16 @@ > > #endif /* !defined(__cpp_lib_saturation_arithmetic) && > > defined(__glibcxx_want_saturation_arithmetic) */ > > #undef __glibcxx_want_saturation_arithmetic > > > >+#if !defined(__cpp_lib_span_initializer_list) > >+# if (__cplusplus > 202302L) > >+# define __glibcxx_span_initializer_list 202311L > >+# if defined(__glibcxx_want_all) || > >defined(__glibcxx_want_span_initializer_list) > >+# define __cpp_lib_span_initializer_list 202311L > >+# endif > >+# endif > >+#endif /* !defined(__cpp_lib_span_initializer_list) && > >defined(__glibcxx_want_span_initializer_list) */ > >+#undef __glibcxx_want_span_initializer_list > >+ > > #if !defined(__cpp_lib_text_encoding) > > # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && > > (_GLIBCXX_USE_NL_LANGINFO_L) > > # define __glibcxx_text_encoding 202306L > >diff 
--git a/libstdc++-v3/include/std/span b/libstdc++-v3/include/std/span > >index f1c19b58737..b84ac87b657 100644 > >--- a/libstdc++-v3/include/std/span > >+++ b/libstdc++-v3/include/std/span > >@@ -39,6 +39,7 @@ > > #endif > > > > #define __glibcxx_want_span > >+#define __glibcxx_want_span_initializer_list > > #include > > > > #ifdef __cpp_lib_span // C++ >= 20 && concepts > >@@ -46,6 +47,9 @@ > > #include > > #include > > #include > >+#ifdef __cpp_lib_span_initializer_list > >+# include > >+#endif > > namespace std _GLIBCXX_VISIBILITY(default) > > { > > _GLIBCXX_BEGIN_NAMESPACE_VERSION > >@@ -226,6 +230,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > } > > } > > > >+#if __cpp_lib_span_initializer_list >= 202311L > >+#pragma GCC diagnostic push > >+#pragma GCC diagnostic ignored "-Winit-list-lifetime" > >+ constexpr > >+ explicit(extent != dy
Re: [PATCH] RISC-V: Fix compress shuffle pattern [PR117383].
> This one also has a comment saying "but use ta" with TDEFAULT_POLICY_P,
> IIUC one of those must be wrong too?

Yeah, I didn't update the comment; thanks for catching that.

-- 
Regards
 Robin
[COMMITTED] Fix misplaced x86 -mstack-protector-guard-symbol documentation [PR117150]
Commit e1769bdd4cef522ada32aec863feba41116b183a accidentally inserted the documentation for the x86 -mstack-protector-guard-symbol option in the wrong place. Fixed thusly. gcc/ChangeLog PR target/117150 * doc/invoke.texi (RS/6000 and PowerPC Options): Move description of -mstack-protector-guard-symbol from here... (x86 Options): ...to here. --- gcc/doc/invoke.texi | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 3cb9a50b690..b85084459b1 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -32376,11 +32376,9 @@ The @option{-mno-compat-align-parm} option is the default. @opindex mstack-protector-guard @opindex mstack-protector-guard-reg @opindex mstack-protector-guard-offset -@opindex mstack-protector-guard-symbol @item -mstack-protector-guard=@var{guard} @itemx -mstack-protector-guard-reg=@var{reg} @itemx -mstack-protector-guard-offset=@var{offset} -@itemx -mstack-protector-guard-symbol=@var{symbol} Generate stack protection code using canary at @var{guard}. Supported locations are @samp{global} for global canary or @samp{tls} for per-thread canary in the TLS block (the default with GNU libc version 2.4 or later). @@ -32390,8 +32388,7 @@ With the latter choice the options @option{-mstack-protector-guard-offset=@var{offset}} furthermore specify which register to use as base register for reading the canary, and from what offset from that base register. The default for those is as specified in the -relevant ABI. @option{-mstack-protector-guard-symbol=@var{symbol}} overrides -the offset with a symbol reference to a canary in the TLS block. +relevant ABI. @opindex mpcrel @opindex mno-pcrel @@ -36216,9 +36213,11 @@ Split 32-byte AVX unaligned load and store. 
@opindex mstack-protector-guard @opindex mstack-protector-guard-reg @opindex mstack-protector-guard-offset +@opindex mstack-protector-guard-symbol @item -mstack-protector-guard=@var{guard} @itemx -mstack-protector-guard-reg=@var{reg} @itemx -mstack-protector-guard-offset=@var{offset} +@itemx -mstack-protector-guard-symbol=@var{symbol} Generate stack protection code using canary at @var{guard}. Supported locations are @samp{global} for global canary or @samp{tls} for per-thread canary in the TLS block (the default). This option has effect only when @@ -36231,6 +36230,9 @@ which segment register (@code{%fs} or @code{%gs}) to use as base register for reading the canary, and from what offset from that base register. The default for those is as specified in the relevant ABI. +@option{-mstack-protector-guard-symbol=@var{symbol}} overrides +the offset with a symbol reference to a canary in the TLS block. + @opindex mgeneral-regs-only @item -mgeneral-regs-only Generate code that uses only the general-purpose registers. This -- 2.25.1